GitOps for Terraform: Flux TF Controller, Spacelift and Drift Detection
Bring Terraform into the GitOps paradigm: Flux Terraform Controller for continuous reconciliation from repository state, Spacelift for advanced policies and RBAC, and drift alerts for critical environments.
GitOps and Terraform: Why the Combination is Powerful
GitOps transformed Kubernetes application deployment: Git becomes the source of truth, a controller continuously reconciles the desired state with the actual one, and every change goes through a pull request. In 2026, the same paradigm is taking hold for cloud infrastructure managed with Terraform, with one crucial difference from traditional CI/CD: instead of a push-and-forget trigger, you get continuous reconciliation that detects drift and corrects it automatically.
The problem with traditional Terraform workflows based on GitHub Actions or Atlantis is that they are reactive: someone makes a manual change in the AWS console, and no one knows about it until the next pipeline run. With GitOps for Terraform, every discrepancy between the HCL code and the actual state either raises an alert or is corrected automatically, depending on the configured policy.
What You Will Learn
- GitOps architecture for IaC: pull model vs push model
- Flux Terraform Controller: installation, the Terraform CRD and reconciliation
- Terraform state management from Kubernetes with an S3 backend and IRSA
- Spacelift: stacks, Rego policies, RBAC and approval workflows
- Drift detection: Slack/PagerDuty alerts for unauthorized deviations
- Patterns for critical environments: auto-remediation vs manual approval
Pull Model vs Push Model for IaC
The key distinction between GitOps and traditional CI/CD is the synchronization model. In the push model (GitHub Actions, Jenkins), the pipeline fires on every commit and "pushes" changes to the infrastructure. In the pull model (pure GitOps), an agent running inside the cluster continuously "pulls" the desired state from the repository and reconciles the infrastructure against it. This difference has profound implications for security and resilience:
# Push Model (GitHub Actions): requires cloud credentials in the pipeline
# The GitHub runner needs outbound access to the cloud provider
# Problem: if the job fails midway, the state can be left inconsistent
# Pull Model (Flux TF Controller): the agent lives inside the cluster
# Only the Kubernetes cluster holds cloud credentials (via IRSA or Workload Identity)
# Advantage: single point of trust, no cloud credentials in GitHub Secrets
# Advantage: continuous reconciliation every N minutes, not only on commit
# Security comparison:
# Push Model: GitHub runner  --[credentials]--> AWS/Azure/GCP
# Pull Model: Kubernetes pod --[IRSA/WI]-->     AWS/Azure/GCP
#             Git repository --[SSH/HTTPS]-->   Flux controller (inside the cluster)
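The pull model comes down to a step that compares the desired state with the observed one and acts only when they differ. The sketch below illustrates that idea only; it is not how Flux is implemented, and the function name `reconcile_once` and the revision strings are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of one pull-model reconciliation step.
# $1 = desired revision (read from Git), $2 = revision last applied.
reconcile_once() {
  desired="$1"
  actual="$2"
  if [ "$desired" = "$actual" ]; then
    echo "in-sync"
  else
    echo "apply:$desired"
  fi
}

# A real agent would run this on a timer (every N minutes),
# not only when a commit arrives.
reconcile_once "rev-abc" "rev-abc"   # prints "in-sync"
reconcile_once "rev-def" "rev-abc"   # prints "apply:rev-def"
```

Because the step is idempotent, running it on a schedule is safe: when nothing has changed it does nothing.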
Flux Terraform Controller
The Flux Terraform Controller (tf-controller) is an open-source Kubernetes controller that brings Terraform into the GitOps world. It is a Flux community project (Weaveworks plus independent maintainers) that extends Flux with the ability to run Terraform plan and apply as native Kubernetes reconciliation loops.
Installation
# Prerequisites: a Kubernetes cluster with Flux installed
# Install Flux on the cluster (if not already present)
flux install
# Install the TF Controller through a HelmRelease
cat <<'EOF' | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: tf-controller
  namespace: flux-system
spec:
  interval: 1h
  url: https://weaveworks.github.io/tf-controller
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: tf-controller
  namespace: flux-system
spec:
  interval: 1h
  chart:
    spec:
      chart: tf-controller
      version: "0.16.x"
      sourceRef:
        kind: HelmRepository
        name: tf-controller
        namespace: flux-system
  values:
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 512Mi
    # Runner pods: they run the actual terraform process
    runner:
      image:
        # Placeholder pattern: image tags cannot be wildcards, pin an exact published tag
        tag: "v1.5.x-flux"
EOF
# Verify the installation
kubectl get pods -n flux-system | grep tf-controller
# NAME                            READY   STATUS    RESTARTS
# tf-controller-6d8f9b4b5-xn7q2   1/1     Running   0
GitRepository and Terraform CRD configuration
The workflow is based on two Kubernetes objects: a GitRepository that points to the repository containing the HCL code, and a Terraform object (a custom resource) that defines what to reconcile.
# 1. GitRepository: source of the HCL code
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: infra-repo
  namespace: flux-system
spec:
  interval: 1m # Check the repo every minute
  url: https://github.com/myorg/terraform-infra
  ref:
    branch: main
  secretRef:
    # Secret with Git credentials: an HTTPS URL needs a username and token,
    # an SSH key needs an ssh:// URL instead
    name: github-ssh-key
---
# 2. Terraform CRD: defines the module to reconcile
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: aws-networking
  namespace: flux-system
spec:
  # Reconciliation interval
  interval: 10m
  # HCL source
  sourceRef:
    kind: GitRepository
    name: infra-repo
  path: ./environments/prod/networking # Path inside the repo
  # Automatic approval (auto-apply); leave unset to approve plans manually
  approvePlan: auto
  # Drift handling: with approvePlan: auto, drift is corrected on the next
  # reconciliation; per the tf-controller docs, approvePlan: disable only
  # detects and reports drift without applying anything
  # enableInventory records the managed resources in the object's status
  enableInventory: true
  # State backend (S3 with IRSA)
  backendConfig:
    customConfiguration: |
      backend "s3" {
        bucket         = "myorg-terraform-state-prod"
        key            = "networking/terraform.tfstate"
        region         = "eu-west-1"
        dynamodb_table = "terraform-state-lock"
        encrypt        = true
      }
  # Variables passed to the module
  vars:
    - name: environment
      value: prod
    - name: aws_region
      value: eu-west-1
  # Variables read from a Kubernetes Secret (for sensitive values)
  varsFrom:
    - kind: Secret
      name: terraform-vars-prod
      varsKeys:
        - db_password
        - api_key
IRSA for AWS Access from Kubernetes
The recommended way to authenticate to AWS from Kubernetes is IRSA (IAM Roles for Service Accounts): the Terraform runner pod receives a JWT signed by the cluster's OIDC provider and exchanges it for temporary AWS credentials, so no long-lived key is stored in the cluster.
# Create the Service Account (if it does not already exist) and add the IRSA annotation
kubectl create serviceaccount tf-runner -n flux-system
kubectl annotate serviceaccount tf-runner \
  -n flux-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/TerraformRunnerRole
# IAM Role trust policy (to be configured on AWS; 123456789012 is a placeholder account ID):
# {
#   "Version": "2012-10-17",
#   "Statement": [{
#     "Effect": "Allow",
#     "Principal": {
#       "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/..."
#     },
#     "Action": "sts:AssumeRoleWithWebIdentity",
#     "Condition": {
#       "StringEquals": {
#         "oidc.eks.eu-west-1.amazonaws.com/...:sub":
#           "system:serviceaccount:flux-system:tf-runner"
#       }
#     }
#   }]
# }
# Update the Terraform object to use the Service Account
# Add to its spec:
#   serviceAccountName: tf-runner
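A common IRSA mistake is a `sub` condition in the trust policy that does not match the namespace and Service Account the runner actually uses. As a small illustration, the helper below (the name `irsa_subject` is hypothetical) builds the expected value from those two inputs so it can be compared with the policy.

```shell
#!/bin/sh
# Hypothetical helper: builds the "sub" claim the IAM trust policy must match.
# $1 = Kubernetes namespace, $2 = Service Account name.
irsa_subject() {
  namespace="$1"
  service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

irsa_subject flux-system tf-runner
# prints: system:serviceaccount:flux-system:tf-runner
```

If the Terraform object moves to another namespace or uses a different Service Account, the trust policy condition has to change with it.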
Drift Detection and Notifications
Drift occurs when the actual state of the infrastructure differs from the one described in the HCL code, usually because of manual changes in the cloud console. The TF Controller detects drift at each reconciliation cycle and reports it through Flux Alert objects.
# Flux Alert for Slack notifications on drift
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: slack-infra
  namespace: flux-system
spec:
  type: slack
  channel: "#infra-alerts"
  secretRef:
    name: slack-webhook-url
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: terraform-drift-alert
  namespace: flux-system
spec:
  providerRef:
    name: slack-infra
  # Flux accepts "info" or "error" here; "info" forwards error events as well
  eventSeverity: info
  eventSources:
    - kind: Terraform
      name: "*" # All Terraform objects
  # Sends alerts for events such as:
  # - drift detected
  # - reconciliation failed
  # - plan pending approval
# Check the drift status manually (illustrative output; exact wording depends on the controller version)
kubectl get terraform -n flux-system
# NAME             READY   STATUS                       AGE
# aws-networking   True    Reconciliation succeeded     2h
# aws-database     False   Drift detected: 3 resources  15m
# Drift details
kubectl describe terraform aws-database -n flux-system | grep -A 20 "Conditions:"
# Conditions:
#   Last Transition Time:  2026-03-20T10:30:00Z
#   Message:               Drift detected: aws_db_instance.main (tags changed),
#                          aws_security_group.db (ingress rule added manually)
#   Reason:                DriftDetected
#   Status:                False
#   Type:                  Ready
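For a quick check from a terminal or a cron job, the table printed by `kubectl get terraform` can be filtered for objects that are not ready. This is a sketch that assumes the column layout shown in the sample output above (NAME first, READY second); the function name `not_ready` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads `kubectl get terraform` table output on stdin,
# skips the header row and prints the NAME of every row whose READY column
# is not "True".
not_ready() {
  awk 'NR > 1 && $2 != "True" { print $1 }'
}

# Example with the sample output from this section:
printf '%s\n' \
  'NAME READY STATUS AGE' \
  'aws-networking True Reconciliation succeeded 2h' \
  'aws-database False Drift detected: 3 resources 15m' | not_ready
# prints: aws-database
```

Against a live cluster the same filter would be used as `kubectl get terraform -n flux-system | not_ready`.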
Spacelift: GitOps Enterprise for Terraform
Spacelift is a SaaS platform (with a self-hosted option) designed for teams running Terraform in enterprise environments. Unlike the TF Controller, which lives inside the Kubernetes cluster, Spacelift offers a full UI, advanced policies written in Rego (the same language as OPA), granular RBAC and approval workflows with a complete audit trail.
Key Spacelift Concepts
# Spacelift structure
# Stack = rough equivalent of a Terraform workspace
# Each stack has:
# - Source: GitHub/GitLab repository + branch + path
# - Runner image: Docker image with Terraform + providers
# - Environment variables: variables and secrets
# - Policies: Rego rules applied to plan/apply
# - Contexts: sets of variables that can be shared across stacks
# Create a stack with the Spacelift Terraform provider:
resource "spacelift_stack" "networking_prod" {
  name         = "networking-prod"
  repository   = "terraform-infra"
  branch       = "main"
  project_root = "environments/prod/networking"
  # Auto-deploy on push to the branch
  autodeploy = false # For prod: require manual approval
  # Terraform version (check which versions your Spacelift account supports)
  terraform_version = "1.9.x"
  labels            = ["team:platform", "env:prod", "tier:networking"]
}
# Assumes a spacelift_context.aws_prod resource defined elsewhere
resource "spacelift_context_attachment" "networking_prod" {
  context_id = spacelift_context.aws_prod.id
  stack_id   = spacelift_stack.networking_prod.id
  priority   = 1
}
Rego Policies in Spacelift
Rego policies are Spacelift's strong point: they let you define complex guardrails that are evaluated on each plan before deciding whether to request approval, block, or auto-apply. A policy is essentially a programmable gate.
# policy: require-approval-for-destructive-changes.rego
# Requires human approval if the plan contains destroys
package spacelift
# Warn on every resource to be destroyed: in a Spacelift plan policy a warning
# holds a tracked run for human review, whereas deny fails the run outright
warn[sprintf("Destroy requires approval: %s", [resource])] {
  change := input.terraform.resource_changes[_]
  change.change.actions[_] == "delete"
  resource := change.address
}
# Block completely if more than 5 resources are destroyed
deny["More than 5 destroys in a single plan: senior approval required"] {
  destroy_count := count([c |
    c := input.terraform.resource_changes[_]
    c.change.actions[_] == "delete"
  ])
  destroy_count > 5
}
# Warn (does not block) on security group changes
warn[sprintf("Security group modified: %s", [resource])] {
  change := input.terraform.resource_changes[_]
  change.type == "aws_security_group"
  change.change.actions[_] != "no-op"
  resource := change.address
}
# policy: cost-control.rego
# Blocks large instances in non-prod environments
package spacelift
expensive_instance_types := {
  "m5.4xlarge", "m5.8xlarge", "m5.16xlarge",
  "c5.4xlarge", "c5.9xlarge",
  "r5.4xlarge", "r5.8xlarge"
}
# True when the Spacelift stack carries the env:prod label.
# A helper rule keeps the negation below safe: a wildcard inside "not" would not
# express "no label equals env:prod"
is_prod {
  input.spacelift.stack.labels[_] == "env:prod"
}
deny[msg] {
  # Apply only to stacks that are not labeled as prod
  not is_prod
  # Look for EC2 instances with an expensive instance_type
  change := input.terraform.resource_changes[_]
  change.type == "aws_instance"
  instance_type := change.change.after.instance_type
  expensive_instance_types[instance_type]
  msg := sprintf(
    "Instance %s of type %s is not allowed in non-prod environments",
    [change.address, instance_type]
  )
}
Spacelift Approval Workflow
# Spacelift approval workflow with Slack notifications
# 1. A developer pushes to the feature/add-rds branch
# 2. Spacelift automatically creates a preview run
# 3. The Rego policy evaluates the plan: it contains 1 destroy (the old RDS instance)
# 4. Spacelift blocks the auto-deploy and notifies Slack
#    "Run #abc123 requires approval: destroy aws_db_instance.old_db"
# 5. A senior engineer reviews the plan in the Spacelift UI
# 6. They approve by clicking "Confirm", or add a comment and reject
# 7. Spacelift runs the apply, or notifies the developer that the run was blocked
# Via the Spacelift CLI (spacectl); output is illustrative, check `spacectl stack --help` for your version:
spacectl stack run list --id networking-prod
# ID      COMMIT   STATE        CREATED AT
# abc123  f3a8b91  UNCONFIRMED  2026-03-20 10:30
# xyz789  a1c2d3e  FINISHED     2026-03-19 14:22
spacectl stack confirm --id networking-prod --run abc123
# Run abc123 confirmed, applying...
Advanced Drift Detection: Alert and Auto-Remediation
Drift detection is not enough if it is not accompanied by a clear response strategy. There are three approaches, each with their own trade-offs:
# Approach 1: Alert only (critical environments, audit trail required)
# Drift is detected and reported, but not corrected automatically
# Use for: production databases, critical networking
# Approach 2: Auto-remediation for minor drift
# Tag changes, patch updates: correct automatically
# Block and alert on structural changes
# Approach 3: Full auto-apply (dev/staging environments)
# Any drift is corrected immediately by the controller
---
# Flux TF Controller example: configuration for the hybrid approach
# (sourceRef, path and backendConfig omitted for brevity)
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: aws-networking-prod
  namespace: flux-system
spec:
  interval: 5m
  # "auto" suits non-critical environments; leave it unset to approve plans manually
  approvePlan: "auto"
  # planOnly: when true, the controller generates the plan but does NOT apply it,
  # so the apply needs a second step (manual or automatic)
  planOnly: false
  # Critical alerts are configured separately, via a Flux Alert with eventSeverity: error
  # retryInterval: how long to wait before retrying a failed reconciliation
  retryInterval: 1m
  timeout: 5m
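The three approaches can be combined into a single decision rule. The sketch below shows one possible mapping, not a feature of Flux or Spacelift: the function name `drift_response`, the environment names and the `minor`/`structural` labels are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical decision rule combining the three approaches.
# $1 = environment (dev | staging | prod)
# $2 = drift kind  (minor = tags, patch updates | structural)
drift_response() {
  env="$1"
  kind="$2"
  case "$env" in
    dev|staging)
      echo "auto-apply" ;;          # Approach 3: correct any drift immediately
    prod)
      case "$kind" in
        minor) echo "auto-remediate" ;;   # Approach 2: safe to fix automatically
        *)     echo "alert-only" ;;       # Approach 1: report, never auto-correct
      esac ;;
    *)
      echo "alert-only" ;;          # Unknown environment: take the safest option
  esac
}

drift_response staging structural   # prints "auto-apply"
drift_response prod minor           # prints "auto-remediate"
drift_response prod structural      # prints "alert-only"
```

Falling back to `alert-only` for anything unrecognized means a misconfigured environment name can never trigger an automatic apply.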
#!/bin/bash
# drift-check.sh: scheduled drift check (a lightweight alternative without a GitOps controller)
# Run every hour via cron or a scheduled GitHub Actions workflow
# Assumes jq is installed; it is used to build a correctly escaped JSON payload
set -euo pipefail
ENVIRONMENTS=("dev" "staging" "prod")
SLACK_WEBHOOK="${SLACK_DRIFT_WEBHOOK:?SLACK_DRIFT_WEBHOOK must be set}"
for ENV in "${ENVIRONMENTS[@]}"; do
  cd "/infra/environments/${ENV}"
  # Initialize quietly
  terraform init -reconfigure -input=false -no-color > /dev/null 2>&1
  # Run plan and capture the exit code
  # 0 = no changes, 1 = error, 2 = changes detected (drift)
  set +e
  terraform plan -detailed-exitcode -input=false -no-color -out="/tmp/plan-${ENV}" 2>&1
  EXITCODE=$?
  set -e
  if [ "$EXITCODE" -eq 2 ]; then
    # "|| true": an empty grep result, or head closing the pipe early, must not abort the script
    CHANGES=$(terraform show -no-color "/tmp/plan-${ENV}" | \
      grep -E "^[[:space:]]+(#|~|\+|-)" | head -20 || true)
    # Build the payload with jq so quotes and newlines in the plan are escaped
    PAYLOAD=$(jq -n --arg env "$ENV" --arg changes "$CHANGES" \
      '{text: ("*DRIFT DETECTED* in environment: " + $env + "\n```" + $changes + "```")}')
    curl -s -X POST "$SLACK_WEBHOOK" \
      -H "Content-Type: application/json" \
      -d "$PAYLOAD"
    echo "Drift alert sent for ${ENV}"
  elif [ "$EXITCODE" -eq 0 ]; then
    echo "${ENV}: no drift detected"
  else
    echo "ERROR: terraform plan failed for ${ENV}" >&2
    exit 1
  fi
done
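The script depends on the three exit codes of `terraform plan -detailed-exitcode`. Isolating that mapping in a small function makes it easy to test without running Terraform; the function name `classify_plan_exit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: maps the exit code of
# `terraform plan -detailed-exitcode` to a drift status.
# 0 = no changes, 2 = changes detected (drift), anything else = error.
classify_plan_exit() {
  case "$1" in
    0) echo "no-drift" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

classify_plan_exit 0   # prints "no-drift"
classify_plan_exit 2   # prints "drift"
classify_plan_exit 1   # prints "error"
```

Treating every code other than 0 and 2 as an error keeps an unexpected failure from being read as "no drift".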
Comparison: TF Controller vs Spacelift vs Atlantis
When to Use Which Tool
- Flux TF Controller: for teams that already use Flux/Argo for Kubernetes, want pure open-source GitOps, and manage AWS infrastructure with IRSA. Self-hosted, free, moderate learning curve.
- Spacelift: for enterprise teams with complex RBAC requirements, audit trails, approval workflows with multiple approvers, and advanced Rego policies. Paid SaaS, polished UX, out-of-the-box integrations (Slack, PagerDuty, Jira).
- Atlantis: for teams that want to stay in the PR-based paradigm without pure GitOps. Plan and apply are triggered by comments directly in the PR. Self-hosted, free, very mature. It has no native continuous reconciliation.
- Terraform Cloud/Enterprise: the natural choice if you are already in the HashiCorp ecosystem, with the native Sentinel policy language and Vault integration. See Article 10.
Best Practices for GitOps IaC in Production
# Repository structure for GitOps Terraform
terraform-infra/
├── modules/                  # Reusable modules (not reconciled directly)
│   ├── networking/
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── networking/       # Separate stacks for each layer
│   │   │   ├── main.tf
│   │   │   └── terraform.auto.tfvars
│   │   ├── compute/
│   │   └── database/
│   ├── staging/
│   └── prod/
│       ├── networking/       # Each environment has its own isolated state
│       ├── compute/
│       └── database/
├── flux/                     # Flux manifests for the Terraform objects
│   ├── dev/
│   │   ├── networking-tf.yaml
│   │   └── compute-tf.yaml
│   └── prod/
│       ├── networking-tf.yaml   # approvePlan: "auto" or manual
│       └── compute-tf.yaml
└── policies/                 # Rego policies (if using Spacelift)
    ├── require-approval.rego
    └── cost-control.rego
Anti-Pattern: Reconciliation Too Aggressive
Setting interval: 1m together with approvePlan: auto on production environments is dangerous: a change that lands on main is applied within a minute, before anyone has reviewed the resulting plan. The golden rule: the more critical the environment, the longer the interval and the stricter the approval process. In prod, use an interval of 30m or more and always require manual approval for structural changes.
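A rule like this can be enforced by a small check in CI before the Flux manifests are merged. The sketch below assumes the interval (in minutes) and the approvePlan value have already been extracted from a prod manifest; the function name `check_prod_reconciliation` and the three result labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical lint for prod Terraform objects.
# $1 = reconciliation interval in minutes, $2 = approvePlan value ("" = manual).
check_prod_reconciliation() {
  interval_min="$1"
  approve_plan="$2"
  if [ "$approve_plan" = "auto" ] && [ "$interval_min" -lt 30 ]; then
    echo "anti-pattern"   # short interval combined with auto-approval
  elif [ "$approve_plan" = "auto" ] || [ "$interval_min" -lt 30 ]; then
    echo "review"         # only one of the two risk factors is present
  else
    echo "ok"
  fi
}

check_prod_reconciliation 1 auto    # prints "anti-pattern"
check_prod_reconciliation 30 auto   # prints "review"
check_prod_reconciliation 30 ""     # prints "ok"
```

The 30-minute threshold comes from the guidance above; a team could tighten or relax it per environment.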
Conclusions and Next Steps
GitOps for Terraform represents the maturity of Infrastructure as Code: continuous reconciliation instead of trigger-based pipelines, native cluster identities instead of credentials in pipelines, and a complete audit trail in Git instead of asking "who made that change". The Flux TF Controller is the ideal choice for Kubernetes-native teams, while Spacelift meets enterprise requirements with its Rego policy engine.
The Complete Series: Terraform and IaC
- Article 01 — Terraform from Scratch: HCL, Provider and Plan-Apply-Destroy
- Article 02 — Designing Reusable Terraform Modules
- Article 03 — Terraform State: Remote Backend with S3/GCS
- Article 04 — Terraform in CI/CD: GitHub Actions and Atlantis
- Article 05 — IaC Testing: Terratest and Terraform Test
- Article 06 — IaC security: Checkov, Trivy and OPA
- Article 07 — Terraform Multi-Cloud: AWS + Azure + GCP
- Article 08 (this) — GitOps for Terraform: Flux TF Controller, Spacelift, and Drift Detection
- Article 09 — Terraform vs Pulumi vs OpenTofu: Comparison 2026
- Article 10 — Terraform Enterprise Patterns: Workspace, Sentinel, and Team Scaling







