Hello!

Federico Calò

Software Developer | Technology Communicator

I build modern web applications and custom digital tools that help businesses grow through technological innovation. My passion is combining computer science and economics to create real value.


About

My passion for computing was born in the classrooms of the Istituto Tecnico Commerciale di Maglie, where I discovered the power of programming and the appeal of building digital solutions. I understood right away that computing was not just code, but an extraordinary tool for turning ideas into reality.

During my high-school studies in Business Information Systems, I began weaving computer science and economics together, understanding how technology can be the engine of growth for any business. That vision followed me to the Università degli Studi di Bari, where I earned a degree in Computer Science, deepening my technical skills and my passion for software development.

Today I put this experience at the service of companies, professionals, and startups, building tailored digital solutions that automate processes, optimize resources, and open up new business opportunities. Because real innovation begins when technology meets people's actual needs.

Skills

Data Analysis & Predictive Models

I turn data into strategic insights through in-depth analysis and predictive models that support informed decisions

Process Automation

I build custom tools that automate repetitive operations and free up time for higher-value work

Custom Systems

I develop tailored software systems, from platform integrations to custom dashboards

const federico = {
  nome: "Federico Calò",
  ruolo: "Sviluppatore Software",
  città: "Bari, Italia",
  missione: "Aiutare attraverso l'informatica",
  passioni: [
    "Codice Pulito",
    "Innovazione",
    "Crescita Continua"
  ]
};

Mission

I firmly believe that computing is the most powerful tool for turning ideas into reality and improving people's lives.

🚀

Democratizing Technology

My mission is to make computing accessible to everyone: from small local businesses to innovative startups, to professionals who want to digitalize their work. Every organization deserves to benefit from what digital technology can offer.

💡

Integrating IT and Business

It is not just about writing code: it is about understanding how technology can generate real value. By combining IT skills with an economic outlook, I help businesses grow, optimize processes, and reach new levels of efficiency and profitability.

🎯

Custom Solutions

Every business is unique, and its solutions should be too. I develop custom tools that meet each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.

Transform your business with technology

December 2024

Master SQL

RoadMap.sh

November 2024

Oracle Certified Foundations Associate

Oracle

October 2024

People Leadership Credential

Connect

September 2024

💻 Languages & Technologies

☕Java

🐍Python

📜JavaScript

🅰️Angular

⚛️React

🔷TypeScript

🗄️SQL

🐘PHP

🎨CSS/SCSS

🔧Node.js

🐳Docker

🌿Git

💼

12/2024 - Present

Custom Software Engineering Analyst

Accenture

Bari, Apulia, Italy · Hybrid. Analysis and development of IT systems using Java and Quarkus in the Health and Public Sector. Ongoing training on modern technologies for building custom, efficient software solutions, and on agents.

💼

06/2022 - 12/2024

Software Analyst and Back End Developer, Associate Consultant

Links Management and Technology SpA

Experience analyzing as-is software systems and ETL flows using PowerCenter. Completed training on Spring Boot for building modern, scalable backend applications. Backend developer specialized in Spring Boot, with experience in database design and in the analysis, development, and testing of assigned tasks.

💼

02/2021 - 10/2021

Software Programmer

Adesso.it (formerly WebScience srl)

Experience in AS-IS and TO-BE analysis, SEO improvements, and website enhancements aimed at better performance and user engagement.

🎓

2018 - 2025

Bachelor's Degree in Computer Science

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Business Information Systems

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.


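AI-Assisted Detection: LLMs for Sigma Rule Generation

Integrating large language models into detection engineering is one of the most radical changes SOC environments have seen in recent years. It is not just a matter of automating repetitive tasks: using an LLM to generate Sigma rules compresses into seconds a process that takes an expert analyst hours, from reading a threat report to writing a tested, deployment-ready rule.

Frameworks such as SigmaGen, presented at MITRE ATT&CK APAC 2025, show how a fine-tuned model can ingest threat intelligence reports, extract ATT&CK techniques, and generate Sigma rules with high-precision mappings. At the same time, open-source tools and n8n-based workflows let small teams build AI-assisted pipelines without enterprise-level investment.

This article walks through the architecture of an AI-assisted system for generating detection rules, from prompt engineering to automatic validation, testing with synthetic logs, and integration into existing CI/CD pipelines.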

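What You Will Learn

How LLMs reason about the Sigma format, and why it yields better results than asking for SPL directly
Concrete prompt engineering techniques for generating detection rules
The architecture of an end-to-end AI-assisted pipeline
Automatic validation and testing of generated rules
Integration with SigmaGen, pySigma, and CI/CD workflows
Anti-patterns to avoid and the real limits of LLMs in a security context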

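Why LLMs Excel at Generating Sigma Rules

One of the most interesting observations to emerge from applied research (and to be confirmed in practice) is that LLMs produce markedly better results when asked to generate Sigma than when asked for SPL or KQL queries directly. The reason is structural.

Sigma's YAML format clearly separates:

Title and description: the model must state what it detects and why
Logsource: specifies the data source (category, product, service)
Detection: the boolean matching logic
Condition: how the selectors are combined
False positives: explicit reasoning about edge cases

This structure forces the model to "think" sequentially and declaratively, which reduces the logic errors that appear when it is asked to emit a platform-specific query language directly. In practice, Sigma acts as an implicit chain of thought for LLMs applied to detection.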
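To make that separation concrete, here is a minimal sketch. It assumes the rule has already been parsed into a plain dictionary (what a YAML parser would return); `EXAMPLE_RULE` and `describe_sections` are illustrative names introduced here, not part of Sigma or pySigma.

```python
# Illustrative rule, shown as the dictionary a YAML parser would return.
EXAMPLE_RULE = {
    "title": "Suspicious PowerShell Encoded Command Execution",
    "description": "Detects PowerShell started with an encoded command.",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {
            "Image|endswith": ["\\powershell.exe", "\\pwsh.exe"],
            "CommandLine|contains": [" -enc ", " -EncodedCommand "],
        },
        "condition": "selection",
    },
    "falsepositives": ["Enterprise deployment scripts"],
    "level": "medium",
}


def describe_sections(rule: dict) -> dict[str, str]:
    """Map each declarative section of a rule to the question it answers."""
    detection = rule["detection"]
    # Every key except 'condition' is a named selector.
    selectors = sorted(name for name in detection if name != "condition")
    logsource = rule["logsource"]
    return {
        "what": rule["title"],
        "where": ", ".join(f"{key}={logsource[key]}" for key in sorted(logsource)),
        "match": ", ".join(selectors),
        "combine": detection["condition"],
        "caveats": "; ".join(rule.get("falsepositives", [])),
    }


if __name__ == "__main__":
    for question, answer in describe_sections(EXAMPLE_RULE).items():
        print(f"{question:>8}: {answer}")
```

Each key of the returned dictionary corresponds to one decision the model has to make explicitly, which is the point the paragraph above argues.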

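Benchmark Data

Researchers on the LLMCloudHunter project (2024) showed that general-purpose LLMs such as GPT-4 generate valid Sigma rules from structured CTI reports in 73% of cases, against 41% when asked for direct SPL output. Fine-tuning on security-specific datasets raises the rate to 89%.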

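Architecture of the AI-Assisted Pipeline

An AI-assisted pipeline for generating detection rules consists of five distinct stages:

Ingestion: collecting threat reports, CTI blog posts, and CVE advisories
Extraction: extracting IOCs, ATT&CK techniques, and the behaviors described
Generation: producing Sigma rules through the LLM
Validation: automatic syntactic and semantic validation
Testing: testing with synthetic logs and CI/CD integration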
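The five stages above can be sketched as a chain of small functions that pass a shared state along. This is only an illustration of the control flow: every name here (`PipelineState`, `run_pipeline`, the stage functions) is hypothetical, and the generation, validation, and testing stages are stand-ins for the LLM call and checks described later in the article.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    raw_report: str
    techniques: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def ingest(state: PipelineState) -> PipelineState:
    # Stand-in for fetching a report: here the text is only normalized.
    state.raw_report = state.raw_report.strip()
    return state


def extract(state: PipelineState) -> PipelineState:
    # Stand-in for LLM extraction: pick up explicit ATT&CK technique IDs.
    found = re.findall(r"\bT\d{4}(?:\.\d{3})?\b", state.raw_report)
    state.techniques = sorted(set(found))
    return state


def generate(state: PipelineState) -> PipelineState:
    # Stand-in for the LLM call that would return Sigma YAML.
    state.rules = [f"title: Placeholder rule for {t}" for t in state.techniques]
    return state


def validate(state: PipelineState) -> PipelineState:
    # Stand-in for syntactic and semantic validation.
    state.rules = [rule for rule in state.rules if rule.startswith("title:")]
    return state


def run_synthetic_tests(state: PipelineState) -> PipelineState:
    # Stand-in for true-positive and false-positive tests.
    return state


STAGES: list[tuple[str, Stage]] = [
    ("ingestion", ingest),
    ("extraction", extract),
    ("generation", generate),
    ("validation", validate),
    ("testing", run_synthetic_tests),
]


def run_pipeline(state: PipelineState) -> PipelineState:
    """Run every stage in order and record which ones completed."""
    for name, stage in STAGES:
        state = stage(state)
        state.completed.append(name)
    return state
```

Keeping each stage a separate function makes it easy to swap a stand-in for a real implementation without touching the others.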

# Basic architecture of the AI-assisted pipeline
# File: ai_sigma_pipeline.py

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

import openai

@dataclass
class ThreatReport:
    content: str
    source_url: str
    report_type: str  # 'cti_blog', 'advisory', 'malware_analysis'

@dataclass
class GeneratedRule:
    sigma_yaml: str
    mitre_techniques: list[str]
    confidence: float
    validation_passed: bool
    test_results: Optional[dict] = None

class AISigmaPipeline:
    def __init__(self, openai_api_key: str, rules_output_dir: str):
        self.client = openai.OpenAI(api_key=openai_api_key)
        self.output_dir = Path(rules_output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def process_report(self, report: ThreatReport) -> list[GeneratedRule]:
        """Full pipeline from an ingested report to validated rules."""
        # Stage 2: Extraction (stage 1, ingestion, produced the ThreatReport)
        techniques = self._extract_attack_techniques(report)
        behaviors = self._extract_behaviors(report)  # helper not shown here

        # Stage 3: Generation
        rules = []
        for technique in techniques:
            rule = self._generate_sigma_rule(  # helper not shown here
                report=report,
                technique=technique,
                behaviors=behaviors
            )
            if rule:
                rules.append(rule)

        # Stages 4-5: Validation + Testing (helper not shown here)
        return [self._validate_and_test(r) for r in rules]

    def _extract_attack_techniques(self, report: ThreatReport) -> list[str]:
        """Extract ATT&CK techniques from the report through the LLM."""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a threat intelligence analyst and MITRE ATT&CK expert. "
                        "Extract ONLY the ATT&CK techniques (format T1234 or T1234.001) "
                        "explicitly described in the text. Reply with a JSON list only."
                    )
                },
                {
                    "role": "user",
                    "content": f"Report:\n{report.content[:4000]}"
                }
            ],
            temperature=0.1  # Low temperature for more deterministic output
        )

        try:
            techniques = json.loads(response.choices[0].message.content)
        except (json.JSONDecodeError, TypeError):
            return []
        # Guard against a valid JSON answer that is not a list
        return techniques if isinstance(techniques, list) else []
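In practice a model often wraps its JSON answer in a Markdown code fence or returns items that are not technique IDs, so the bare `json.loads` above can silently lose a usable answer. The following sketch shows one way to harden that step. `parse_technique_ids` is a hypothetical helper, not part of the pipeline above, and the ID pattern simply mirrors the `T1234` / `T1234.001` format named in the system prompt.

```python
import json
import re

# ATT&CK technique or sub-technique ID, e.g. T1059 or T1059.001
TECHNIQUE_ID = re.compile(r"^T\d{4}(?:\.\d{3})?$")
FENCE = "`" * 3  # Markdown code fence marker


def parse_technique_ids(raw: str) -> list[str]:
    """Return the unique, well-formed technique IDs found in an LLM answer."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with its optional language tag)...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ...and everything from the closing fence onwards.
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    techniques: list[str] = []
    for item in data:
        if not isinstance(item, str):
            continue
        candidate = item.strip().upper()
        if TECHNIQUE_ID.match(candidate) and candidate not in techniques:
            techniques.append(candidate)
    return techniques
```

Filtering by pattern also keeps free-text answers such as "Phishing" out of the generation stage, where they would produce a rule for a technique that was never identified.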

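Prompt Engineering for Quality Sigma Rules

Output quality depends critically on how the prompt is structured. Three basic patterns produce consistent results for Sigma rule generation.

Pattern 1: Structured System Prompt

The system prompt must contain exactly the meta-information the model needs to produce valid Sigma: the YAML structure, the valid values for category and product, best practices for falsepositives, and the accepted severity levels.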

# System prompt tuned for Sigma rule generation
SIGMA_SYSTEM_PROMPT = """
You are a Detection Engineer and an expert at writing Sigma rules.
When you generate a Sigma rule, ALWAYS follow this YAML structure:

title: [descriptive title, max 80 chars]
id: [UUID v4]
status: experimental
description: [detailed description of the detected behavior]
references:
  - [URL of the original report, if available]
author: AI-Assisted Detection
date: [today's date in YYYY-MM-DD format]
tags:
  - attack.[tactic]
  - attack.[technique]
logsource:
  category: [process_creation | network_connection | file_event | registry_event]
  product: [windows | linux | macos]
detection:
  [selector_name]:
    [field]: [value or list of values]
  condition: [selector_name]
falsepositives:
  - [plausible legitimate cases]
level: [informational | low | medium | high | critical]

CRITICAL RULES:
- Prefer modifiers (contains, startswith, endswith) or wildcards (*) in string values to avoid brittle exact matches
- Prefer widely available fields (Image, CommandLine, ParentImage)
- Always list at least one realistic false positive
- The 'condition' field must be simple and readable
- Do not use complex regexes when a keyword approach is sufficient
"""

def build_generation_prompt(technique_id: str, behaviors: list[str],
                             logsource_hint: str, report_excerpt: str) -> str:
    # At most five behaviors, one bullet per line
    behavior_lines = "\n".join(f"- {b}" for b in behaviors[:5])
    return f"""Generate a Sigma rule that detects MITRE ATT&CK technique {technique_id}.

Behaviors observed in the report:
{behavior_lines}

Suggested log type: {logsource_hint}

Excerpt from the original report:
{report_excerpt[:1500]}

Generate EXACTLY ONE Sigma rule in valid YAML. Do not add explanations outside the YAML."""

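Pattern 2: Few-Shot with Quality Examples

Including two or three high-quality example rules in the prompt (few-shot) significantly improves output consistency, especially for unusual log sources or complex conditions.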
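One way to supply the examples is as prior conversation turns rather than as text pasted into a single prompt. The sketch below assumes the chat message format used elsewhere in this article (a list of `role` / `content` dictionaries); `build_few_shot_messages` and its parameters are hypothetical names chosen for illustration.

```python
def build_few_shot_messages(system_prompt: str,
                            examples: list[tuple[str, str]],
                            request: str,
                            max_examples: int = 3) -> list[dict[str, str]]:
    """Assemble chat messages where each example is a (task, rule_yaml) pair.

    Every example becomes a user turn (the task) followed by an assistant
    turn (the reference rule), so the model sees the expected output shape
    before it receives the real request.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for task, rule_yaml in examples[:max_examples]:
        messages.append({"role": "user", "content": task})
        messages.append({"role": "assistant", "content": rule_yaml})
    messages.append({"role": "user", "content": request})
    return messages
```

Capping the number of examples keeps the prompt within a predictable size; the two-to-three range mentioned above is the default here.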

# Few-shot: a quality example rule included in the prompt.
# In this Python string '\\' yields a single backslash, so the YAML the model
# sees contains '\powershell.exe', the form used in Sigma rules.
FEW_SHOT_EXAMPLE = """
Example of a high-quality rule, for inspiration:

title: Suspicious PowerShell Encoded Command Execution
id: 5b4f6d89-1234-4321-ab12-fedcba987654
status: stable
description: >
  Detects PowerShell executed with encoding parameters (-enc, -EncodedCommand),
  which malware frequently uses to obfuscate malicious payloads.
references:
  - https://attack.mitre.org/techniques/T1059/001/
author: SigmaHQ Community
date: 2025-01-15
tags:
  - attack.execution
  - attack.t1059.001
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith:
      - '\\powershell.exe'
      - '\\pwsh.exe'
    CommandLine|contains:
      - ' -enc '
      - ' -EncodedCommand '
      - ' -ec '
  condition: selection
falsepositives:
  - Legitimate software that uses encoded PowerShell for complex configuration
  - Enterprise deployment scripts
level: medium
"""

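Pattern 3: Explicit Chain of Thought

For complex techniques, requiring the model to reason before it writes the rule produces more accurate output. This approach adds latency but significantly reduces the number of iterations needed.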

def generate_with_cot(self, technique_id: str, report: ThreatReport) -> GeneratedRule:
    """Generation with an explicit chain of thought."""

    # Step 1: ask the model to reason first
    reasoning_response = self.client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SIGMA_SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"""Before writing the rule for {technique_id}, analyze:
1. Which forensic artifacts does this technique leave in the logs?
2. Which logsource is the most appropriate?
3. Which fields discriminate best between signal and noise?
4. What are the most common false positives?

Report: {report.content[:2000]}"""
            }
        ],
        temperature=0.3
    )

    reasoning = reasoning_response.choices[0].message.content

    # Step 2: use the reasoning to guide the generation
    rule_response = self.client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SIGMA_SYSTEM_PROMPT},
            {"role": "user", "content": f"Technical analysis:\n{reasoning}"},
            {
                "role": "assistant",
                "content": "Based on this analysis, I will generate the best Sigma rule:"
            },
            {"role": "user", "content": "Go ahead and generate the YAML."}
        ],
        temperature=0.1
    )

    return GeneratedRule(
        sigma_yaml=rule_response.choices[0].message.content,
        mitre_techniques=[technique_id],
        confidence=0.0,  # Computed during validation
        validation_passed=False
    )

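Automatic Validation of Generated Rules

LLMs can produce YAML that is syntactically valid but semantically wrong: non-existent log sources, misnamed fields, conditions that do not correctly reference their selectors. Automatic validation is the critical gate before a rule enters the repository.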

import re
import uuid
from fnmatch import fnmatchcase

import yaml
from sigma.rule import SigmaRule
from sigma.exceptions import SigmaError

class SigmaRuleValidator:
    # Most common valid logsource categories (not exhaustive: extend as needed)
    VALID_CATEGORIES = {
        'process_creation', 'network_connection', 'file_event',
        'registry_event', 'registry_add', 'registry_set',
        'dns_query', 'image_load', 'pipe_created', 'raw_access_read'
    }

    VALID_LEVELS = {'informational', 'low', 'medium', 'high', 'critical'}
    VALID_STATUSES = {'stable', 'test', 'experimental', 'deprecated', 'unsupported'}
    # Words of the condition grammar that are not selector names
    CONDITION_KEYWORDS = {'and', 'or', 'not', 'of', 'all', 'them'}

    def validate(self, sigma_yaml: str) -> tuple[bool, list[str]]:
        """Validate a Sigma rule. Returns (is_valid, list of errors)."""
        errors = []

        # 1. YAML syntax
        try:
            rule_dict = yaml.safe_load(sigma_yaml)
        except yaml.YAMLError as e:
            return False, [f"Invalid YAML: {e}"]
        if not isinstance(rule_dict, dict):
            return False, ["The rule must be a YAML mapping"]

        # 2. Required fields
        required_fields = ['title', 'description', 'logsource', 'detection']
        for field in required_fields:
            if field not in rule_dict:
                errors.append(f"Missing required field: {field}")

        if errors:
            return False, errors

        # 3. Logsource validation
        logsource = rule_dict.get('logsource') or {}
        if 'category' in logsource:
            if logsource['category'] not in self.VALID_CATEGORIES:
                errors.append(
                    f"Invalid logsource category: {logsource['category']}. "
                    f"Valid: {', '.join(sorted(self.VALID_CATEGORIES))}"
                )

        # 4. Detection validation
        detection = rule_dict.get('detection') or {}
        if 'condition' not in detection:
            errors.append("Missing 'condition' field in detection")
        else:
            # A condition may be a single string or a list of strings
            condition = detection['condition']
            conditions = condition if isinstance(condition, list) else [condition]
            selectors = [k for k in detection if k != 'condition']
            for cond in conditions:
                # Identifiers may end with '*' (e.g. "1 of selection_*")
                for ref in re.findall(r'\b[a-zA-Z_][a-zA-Z0-9_]*\*?', str(cond)):
                    if ref in self.CONDITION_KEYWORDS:
                        continue
                    if not any(fnmatchcase(s, ref) for s in selectors):
                        errors.append(
                            f"Condition references '{ref}', which is not among the selectors: {selectors}"
                        )

        # 5. Level validation
        level = rule_dict.get('level', '')
        if level and level not in self.VALID_LEVELS:
            errors.append(f"Invalid level: {level}. Valid: {sorted(self.VALID_LEVELS)}")

        # 6. Status validation
        status = rule_dict.get('status', '')
        if status and status not in self.VALID_STATUSES:
            errors.append(f"Invalid status: {status}. Valid: {sorted(self.VALID_STATUSES)}")

        # 7. UUID check
        rule_id = rule_dict.get('id', '')
        if rule_id:
            try:
                uuid.UUID(str(rule_id))
            except ValueError:
                errors.append(f"ID is not a valid UUID: {rule_id}")
        else:
            errors.append("Missing 'id' field: generate a UUID v4")

        # 8. pySigma validation (full parse of the rule)
        try:
            SigmaRule.from_yaml(sigma_yaml)
        except SigmaError as e:
            errors.append(f"pySigma error: {e}")

        return len(errors) == 0, errors

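Anti-Pattern: Blindly Trusting LLM Output

A common mistake among teams adopting AI-assisted detection is deploying generated rules without validation. LLMs make specific, repeatable mistakes when generating Sigma:

Using the field ProcessName instead of Image (Sysmon)
Writing conditions that reference non-existent selectors
Inventing non-standard logsource categories
Using contains on fields that do not support wildcards in the target log source

Automatic validation and testing against real logs are not optional.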
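The first mistake in the list above lends itself to a cheap automatic check. The sketch below scans a parsed `detection` block for field names known to be wrong and suggests the expected one. `KNOWN_FIELD_FIXES` is a hypothetical lookup table seeded only with the two Sysmon mix-ups this article mentions; a real deployment would maintain its own list per log source.

```python
# Hypothetical correction table: wrong field name -> expected Sysmon field.
KNOWN_FIELD_FIXES = {
    "ProcessName": "Image",
    "ProcessHash": "Hashes",
}


def find_suspect_fields(detection: dict) -> list[tuple[str, str, str]]:
    """Return (selector, wrong_field, suggested_field) for each known mistake."""
    findings = []
    for selector, criteria in detection.items():
        if selector == "condition":
            continue
        # A selector is either one field map or a list of field maps.
        field_maps = criteria if isinstance(criteria, list) else [criteria]
        for field_map in field_maps:
            if not isinstance(field_map, dict):
                continue  # keyword lists carry no field names
            for key in field_map:
                # Strip modifiers such as '|contains' before the lookup.
                field = key.split("|", 1)[0]
                if field in KNOWN_FIELD_FIXES:
                    findings.append((selector, field, KNOWN_FIELD_FIXES[field]))
    return findings
```

Findings like these are useful as feedback for a regeneration attempt, since they tell the model exactly which field to replace.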

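Automated Testing with Synthetic Logs

After syntactic validation, every rule must be tested against logs that simulate both the expected malicious behavior (true-positive tests) and normal traffic (false-positive tests). This approach, called rule unit testing, is the practice that separates mature pipelines from experimental ones.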

import json
from typing import Any

import yaml
from sigma.backends.test import TextQueryTestBackend

class SigmaRuleTester:
    def __init__(self):
        # Kept for backend-based conversion; the simplified matcher below does not use it
        self.backend = TextQueryTestBackend()

    def generate_test_events(self, sigma_yaml: str,
                              llm_client) -> dict[str, list[dict]]:
        """Generate test events through the LLM, based on the rule."""
        # Literal braces of the JSON example are doubled because this is an f-string
        prompt = f"""Given this Sigma rule:
{sigma_yaml}

Generate two lists of log events in JSON format:
1. "true_positives": 3 events that MUST trigger the rule
2. "false_positives": 3 legitimate events that must NOT trigger the rule

Each event must contain the exact fields the rule uses for matching.
Required format:
{{
  "true_positives": [
    {{"Image": "C:\\\\Windows\\\\System32\\\\cmd.exe", "CommandLine": "...", ...}}
  ],
  "false_positives": [
    {{"Image": "C:\\\\Program Files\\\\...", "CommandLine": "...", ...}}
  ]
}}"""

        response = llm_client.chat.completions.create(
            model="gpt-4o-mini",  # Cheaper model for test generation
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2
        )

        try:
            return json.loads(response.choices[0].message.content)
        except (json.JSONDecodeError, TypeError):
            return {"true_positives": [], "false_positives": []}

    def run_tests(self, sigma_yaml: str, test_events: dict) -> dict[str, Any]:
        """Run the tests and return detailed results."""
        results = {
            "tp_tests": {"passed": 0, "failed": 0, "details": []},
            "fp_tests": {"passed": 0, "failed": 0, "details": []},
            "overall_pass": False
        }

        # Note: this simulates the test mechanism.
        # In production, use tools such as sigma-test or a sandboxed SIEM.
        rule_dict = yaml.safe_load(sigma_yaml)
        detection = rule_dict.get('detection', {})

        for event in test_events.get('true_positives', []):
            matched = self._simulate_match(event, detection)
            if matched:
                results["tp_tests"]["passed"] += 1
                results["tp_tests"]["details"].append({"event": event, "result": "PASS"})
            else:
                results["tp_tests"]["failed"] += 1
                results["tp_tests"]["details"].append({"event": event, "result": "FAIL - not matched"})

        for event in test_events.get('false_positives', []):
            matched = self._simulate_match(event, detection)
            if not matched:
                results["fp_tests"]["passed"] += 1
                results["fp_tests"]["details"].append({"event": event, "result": "PASS"})
            else:
                results["fp_tests"]["failed"] += 1
                results["fp_tests"]["details"].append({"event": event, "result": "FAIL - false positive"})

        tp_ok = results["tp_tests"]["failed"] == 0
        fp_ok = results["fp_tests"]["failed"] == 0
        results["overall_pass"] = tp_ok and fp_ok
        return results

    def _simulate_match(self, event: dict, detection: dict) -> bool:
        """Simplified match simulation. For production, use sigma-test.

        Fields inside a selector are ANDed and list values are ORed, as in Sigma.
        The 'condition' expression is not evaluated: any matching selector counts.
        """
        for selector_name, selector_criteria in detection.items():
            if selector_name == 'condition':
                continue
            if not isinstance(selector_criteria, dict):
                continue
            if all(self._field_matches(event, field, value)
                   for field, value in selector_criteria.items()):
                return True
        return False

    def _field_matches(self, event: dict, field: str, value: Any) -> bool:
        """True if the event field matches the value, or any value of a list."""
        parts = field.split('|')
        actual_field = parts[0]
        modifier = parts[1] if len(parts) > 1 else 'exact'
        event_value = str(event.get(actual_field, ''))
        values = value if isinstance(value, list) else [value]
        return any(self._apply_modifier(event_value, str(v), modifier) for v in values)

    def _apply_modifier(self, event_val: str, pattern: str, modifier: str) -> bool:
        # Wildcards are dropped: the modifier alone decides how to compare
        pattern_clean = pattern.replace('*', '')
        if modifier == 'contains':
            return pattern_clean.lower() in event_val.lower()
        elif modifier == 'endswith':
            return event_val.lower().endswith(pattern_clean.lower())
        elif modifier == 'startswith':
            return event_val.lower().startswith(pattern_clean.lower())
        return event_val.lower() == pattern_clean.lower()
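The simplified matcher above ignores the rule's `condition`, so a rule such as `selection and not filter` cannot be tested faithfully with it. As a sketch of how that gap could be narrowed, the function below evaluates a condition given a precomputed map of which selectors matched. It is an illustration only: `evaluate_condition` is a hypothetical helper that supports just `and`, `or`, `not`, and parentheses, and deliberately rejects quantifiers such as `1 of selection_*`.

```python
import re

_TOKEN = re.compile(r"\(|\)|[A-Za-z_][A-Za-z0-9_]*")
_KEYWORDS = {"and", "or", "not", "(", ")"}


def evaluate_condition(condition: str, selector_hits: dict[str, bool]) -> bool:
    """Evaluate a boolean Sigma-style condition over selector match results."""
    if re.search(r"[^A-Za-z0-9_()\s]", condition):
        raise ValueError("unsupported syntax in condition")
    tokens = _TOKEN.findall(condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of condition")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse, so tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "not":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token in _KEYWORDS or token not in selector_hits:
            raise ValueError(f"unknown selector: {token}")
        return selector_hits[token]

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()}")
    return result
```

The precedence follows the usual order (`not` binds tightest, then `and`, then `or`), so `a or b and c` is read as `a or (b and c)`.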

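SigmaGen: An Open-Source Framework for AI-Assisted Detection

SigmaGen, presented at MITRE ATT&CK APAC 2025, represents the state of the art among open-source frameworks for AI-driven detection rule generation. The project combines fine-tuning on curated datasets with a pipeline architecture that covers the full lifecycle of a rule.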

# Integration with SigmaGen (conceptual workflow)
# SigmaGen uses a three-phase approach:
# 1. Ingestion of CTI (blogs, advisories, STIX feeds)
# 2. Extraction of ATT&CK techniques through specialized NER
# 3. Sigma generation through a fine-tuned model

# Alternative workflow with n8n and general-purpose LLMs:
# n8n workflow JSON (conceptual excerpt)
N8N_WORKFLOW_STRUCTURE = {
    "nodes": [
        {
            "name": "RSS_CTI_Feed",
            "type": "n8n-nodes-base.rssFeedRead",
            "parameters": {
                "url": "https://example-cti-blog.com/feed.xml"
            }
        },
        {
            "name": "Extract_Techniques",
            "type": "n8n-nodes-base.openAi",
            "parameters": {
                "model": "gpt-4o",
                "prompt": "Extract ATT&CK techniques from: {{$json.content}}",
                "system_prompt": "You are an expert CTI analyst..."
            }
        },
        {
            "name": "Generate_Sigma",
            "type": "n8n-nodes-base.openAi",
            "parameters": {
                "model": "gpt-4o",
                "prompt": "Generate a Sigma rule for: {{$json.techniques}}",
                "system_prompt": SIGMA_SYSTEM_PROMPT
            }
        },
        {
            "name": "Validate_Rule",
            "type": "n8n-nodes-base.code",
            "parameters": {
                "code": "// Call the Python validation API"
            }
        },
        {
            "name": "GitHub_PR",
            "type": "n8n-nodes-base.github",
            "parameters": {
                "operation": "createPullRequest",
                "repository": "org/detection-rules"
            }
        }
    ]
}

# Complete Python pipeline with error handling
class FullAISigmaPipeline:
    def __init__(self, config: dict):
        self.llm_client = openai.OpenAI(api_key=config['openai_key'])
        self.validator = SigmaRuleValidator()
        self.tester = SigmaRuleTester()
        self.max_retries = config.get('max_retries', 3)

    def generate_validated_rule(self, technique_id: str,
                                 report: ThreatReport) -> Optional[GeneratedRule]:
        """Generate, validate, and test a rule with automatic retries."""
        errors_history = []

        for attempt in range(self.max_retries):
            # Generate the rule (with earlier errors in the prompt, if any)
            sigma_yaml = self._generate_with_error_feedback(
                technique_id, report, errors_history
            )

            # Validate
            is_valid, errors = self.validator.validate(sigma_yaml)
            if not is_valid:
                errors_history.extend(errors)
                continue

            # Test with synthetic events
            test_events = self.tester.generate_test_events(sigma_yaml, self.llm_client)
            test_results = self.tester.run_tests(sigma_yaml, test_events)

            if test_results['overall_pass']:
                return GeneratedRule(
                    sigma_yaml=sigma_yaml,
                    mitre_techniques=[technique_id],
                    confidence=self._calculate_confidence(test_results),
                    validation_passed=True,
                    test_results=test_results
                )
            else:
                errors_history.append(f"Tests failed: {test_results}")

        return None  # Could not produce a valid rule

    def _generate_with_error_feedback(self, technique_id: str,
                                        report: ThreatReport,
                                        errors: list[str]) -> str:
        """Generate with feedback about the errors from earlier attempts."""
        prompt = build_generation_prompt(
            technique_id=technique_id,
            behaviors=[],
            logsource_hint="process_creation",
            report_excerpt=report.content[:2000]
        )

        # Append the feedback to the finished prompt: build_generation_prompt
        # truncates report_excerpt, which would otherwise cut the errors off.
        if errors:
            prompt += "\n\nPrevious attempt failed with these errors:\n" + \
                      "\n".join(f"- {e}" for e in errors[-3:])  # Last 3 errors

        response = self.llm_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SIGMA_SYSTEM_PROMPT},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1
        )

        return response.choices[0].message.content

    def _calculate_confidence(self, test_results: dict) -> float:
        """Compute a confidence score from the test results."""
        tp = test_results["tp_tests"]
        fp = test_results["fp_tests"]

        total_tp = tp["passed"] + tp["failed"]
        total_fp = fp["passed"] + fp["failed"]

        # Without events on both sides there is no evidence either way
        if total_tp == 0 or total_fp == 0:
            return 0.5

        tp_rate = tp["passed"] / total_tp
        fp_ok_rate = fp["passed"] / total_fp

        return (tp_rate + fp_ok_rate) / 2

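Integration into the CI/CD Pipeline

Generated and validated rules do not go to production automatically: they must pass through human review and the CI/CD gates of the Detection-as-Code pipeline. In the recommended flow, the AI opens a Pull Request rather than merging directly.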

# GitHub Actions workflow for AI-generated rules
# File: .github/workflows/ai-sigma-generation.yml

"""
name: AI Sigma Rule Generation

on:
  schedule:
    - cron: '0 6 * * *'  # Every day at 6:00 UTC
  workflow_dispatch:
    inputs:
      cti_url:
        description: 'URL of the CTI report to process'
        required: false

jobs:
  generate-rules:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install openai pySigma pySigma-backend-splunk pyyaml

      - name: Run AI pipeline
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python scripts/ai_sigma_pipeline.py

      - name: Validate generated rules
        run: python scripts/validate_all_rules.py rules/ai-generated/

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          title: '[AI-Generated] Detection rules from CTI feed'
          body: |
            ## Rules generated automatically by AI

            ATT&CK techniques detected and translated into Sigma rules.

            **REVIEW REQUIRED** - Check before merging:
            - [ ] Logsource is correct for the target SIEM
            - [ ] Condition is logically correct
            - [ ] False positives are acceptable
            - [ ] Test on the staging environment completed

          labels: 'ai-generated,needs-review'
          branch: 'ai-generated-rules'
"""

# Bulk validation script
# File: scripts/validate_all_rules.py
import sys
from pathlib import Path

# SigmaRuleValidator is the class defined earlier in this article.
# The module name below is an assumption: adjust it to where the class lives.
from sigma_validator import SigmaRuleValidator

def validate_directory(rules_dir: str) -> int:
    """Validate every rule in a directory. Returns the process exit code."""
    validator = SigmaRuleValidator()
    rules_path = Path(rules_dir)
    failed = []
    total = 0

    for rule_file in rules_path.glob("**/*.yml"):
        total += 1
        content = rule_file.read_text(encoding="utf-8")
        is_valid, errors = validator.validate(content)
        if not is_valid:
            failed.append((rule_file.name, errors))
            print(f"FAIL: {rule_file.name}")
            for e in errors:
                print(f"  - {e}")
        else:
            print(f"OK: {rule_file.name}")

    print(f"\nResults: {len(failed)} failed out of {total} total")

    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(validate_directory(sys.argv[1] if len(sys.argv) > 1 else "rules/"))

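Alternative Models and Cost Considerations

Not every organization can, or wants to, send sensitive CTI data to cloud APIs. Running models on-premises through Ollama or vLLM is a concrete alternative for environments with data residency requirements.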

Model	Sigma quality	Cost per 100 rules	Average latency	Data residency
GPT-4o	High (87% valid)	~$2.50	3-8 s	Cloud (OpenAI)
GPT-4o-mini	Good (71% valid)	~$0.15	1-3 s	Cloud (OpenAI)
Claude 3.5 Sonnet	High (84% valid)	~$3.00	3-6 s	Cloud (Anthropic)
Llama 3.1 70B (local)	Fair (58% valid)	~$0 (infra)	15-45 s	On-premises
Mistral 7B fine-tuned	Good (69% valid)	~$0 (infra)	5-15 s	On-premises

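Cost-Efficient Strategy

Use GPT-4o-mini for the initial generation and for the retry iterations (low cost, good quality), and fall back to GPT-4o only when mini fails after two attempts. This hybrid approach keeps quality comparable while bringing the average cost down to about 30% of using GPT-4o alone.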
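The escalation logic can be sketched independently of any API client. In the example below, `generate` and `is_valid` are injected callables standing in for the LLM call and the validator; `generate_with_escalation` and its parameter names are hypothetical, and the default attempt counts follow the "two tries on mini, then GPT-4o" rule described above.

```python
from typing import Callable, Optional


def generate_with_escalation(
    generate: Callable[[str], str],
    is_valid: Callable[[str], bool],
    cheap_model: str = "gpt-4o-mini",
    strong_model: str = "gpt-4o",
    cheap_attempts: int = 2,
    strong_attempts: int = 1,
) -> tuple[Optional[str], list[str]]:
    """Try the cheap model first and escalate only after it keeps failing.

    Returns the first valid rule (or None) together with the list of models
    that were called, which is handy for tracking the real cost per rule.
    """
    schedule = [cheap_model] * cheap_attempts + [strong_model] * strong_attempts
    models_tried: list[str] = []
    for model in schedule:
        models_tried.append(model)
        rule = generate(model)
        if is_valid(rule):
            return rule, models_tried
    return None, models_tried
```

Logging `models_tried` for every rule gives the data needed to check whether the expected savings actually materialize on your own reports.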

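Real Limits and Anti-Patterns

Honesty is fundamental when evaluating this technology. LLMs are not infallible at generating detection rules, and knowing their limits matters as much as exploiting their strengths.

Documented Limitations of LLMs for Detection Engineering

Field hallucination: the model can invent field names that do not exist in the real logs (for example ProcessHash instead of Hashes in Sysmon).
Overfitting to the report: generated rules can be too specific (IOC-based) instead of capturing the general behavior.
Unrealistic false positives: the "false positives" an LLM lists are often generic and do not reflect real cases in the organization's specific context.
Obscure techniques: for rare or very recent (post-cutoff) ATT&CK techniques, quality drops sharply without retrieval augmentation.
Missing SIEM context: the model does not know the specific normalization of your SIEM (for example, how Splunk normalizes Windows fields compared with Elastic ECS).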
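The last limitation is easier to see with a small example. The sketch below renames Sysmon-style field names in a parsed `detection` block to their Elastic ECS equivalents and reports any field it has no mapping for. `SYSMON_TO_ECS` is a deliberately partial, illustrative table and `translate_fields` is a hypothetical helper; in a real pipeline this job is normally left to a pySigma processing pipeline for the target backend.

```python
# Partial, illustrative mapping from Sysmon field names to Elastic ECS.
SYSMON_TO_ECS = {
    "Image": "process.executable",
    "CommandLine": "process.command_line",
    "ParentImage": "process.parent.executable",
}


def translate_fields(detection: dict, mapping: dict[str, str]) -> tuple[dict, list[str]]:
    """Rename fields in every selector; return (new_detection, unmapped_fields)."""
    translated: dict = {}
    unmapped: set[str] = set()
    for selector, criteria in detection.items():
        # The condition and non-mapping selectors are copied unchanged.
        if selector == "condition" or not isinstance(criteria, dict):
            translated[selector] = criteria
            continue
        new_criteria = {}
        for key, value in criteria.items():
            # Keep modifiers such as '|endswith' attached to the new name.
            field, sep, modifiers = key.partition("|")
            if field not in mapping:
                unmapped.add(field)
            new_criteria[mapping.get(field, field) + sep + modifiers] = value
        translated[selector] = new_criteria
    return translated, sorted(unmapped)
```

The list of unmapped fields doubles as a hallucination check: a name that appears in neither schema is a strong hint that the model invented it.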

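RAG for SIEM Context

RAG (Retrieval-Augmented Generation) lets you inject organization-specific context into the prompt: the field normalization of your SIEM, existing rules, and false-positive tuning data. This approach significantly reduces the errors caused by missing context.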

# RAG for context-aware Sigma generation
from pathlib import Path

import chromadb
import yaml
from chromadb.utils import embedding_functions

class RAGSigmaGenerator:
    def __init__(self, chroma_persist_dir: str, openai_key: str):
        # Persistent client, so the index survives restarts
        self.chroma_client = chromadb.PersistentClient(path=chroma_persist_dir)
        self.embed_fn = embedding_functions.OpenAIEmbeddingFunction(
            api_key=openai_key,
            model_name="text-embedding-3-small"
        )

        # Collection of existing rules, used as contextual few-shot examples
        self.rules_collection = self.chroma_client.get_or_create_collection(
            name="sigma_rules",
            embedding_function=self.embed_fn
        )

        # Collection of SIEM-specific field mappings
        self.field_mappings_collection = self.chroma_client.get_or_create_collection(
            name="siem_field_mappings",
            embedding_function=self.embed_fn
        )

    def index_existing_rules(self, rules_dir: str) -> None:
        """Index the existing rules for few-shot retrieval."""
        for rule_file in Path(rules_dir).glob("**/*.yml"):
            content = rule_file.read_text(encoding="utf-8")
            rule_dict = yaml.safe_load(content) or {}
            self.rules_collection.add(
                documents=[content],
                metadatas=[{
                    "title": rule_dict.get('title', ''),
                    "category": rule_dict.get('logsource', {}).get('category', ''),
                    "tags": str(rule_dict.get('tags', []))
                }],
                ids=[str(rule_file)]
            )

    def generate_with_rag(self, technique_id: str, report: ThreatReport) -> str:
        """Build the generation prompt enriched with context retrieved via RAG."""

        # Retrieve similar rules to use as few-shot examples
        similar_rules = self.rules_collection.query(
            query_texts=[report.content[:500]],
            n_results=3
        )

        # Retrieve SIEM-specific field mappings
        field_mappings = self.field_mappings_collection.query(
            query_texts=[f"process_creation windows {technique_id}"],
            n_results=2
        )

        # Build the prompt enriched with the retrieved context
        context = "\n\n".join(similar_rules['documents'][0][:2])
        mappings_context = "\n".join(field_mappings['documents'][0])

        enhanced_prompt = f"""
Similar rules that already exist in our repository (use them as inspiration):
{context}

Field mappings specific to our SIEM:
{mappings_context}

Now generate a new rule for technique {technique_id}
based on the following report: {report.content[:2000]}
"""

        # Continue with the standard generation...
        return enhanced_prompt

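Conclusions and Next Steps

AI-assisted detection engineering is not a passing trend: it is a way to keep pace with the speed at which new attack techniques emerge. The combination of LLM-based generation, rigorous automatic validation, and testing with synthetic logs cuts the time to detection for a newly published threat report from days to hours.

Key Takeaways

LLMs generate higher-quality Sigma rules than direct SPL queries, thanks to the YAML structure
Prompt engineering (structured system prompt, few-shot, chain of thought) is critical for quality
Automatic syntactic and semantic validation is not optional
Testing with AI-generated synthetic logs closes the quality loop
Generated rules must go through human PR review, not automatic merge
RAG for SIEM context reduces field errors and false positives
Local models (Llama, fine-tuned Mistral) are a valid alternative when data residency matters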