GitOps and Terraform: Why the Combination is Powerful

GitOps transformed Kubernetes application deployment: Git becomes the source of truth, a controller continuously reconciles the desired state with the real one, and every change goes through a pull request. In 2026, the same paradigm is taking hold for cloud infrastructure managed with Terraform, with one crucial difference compared to traditional CI/CD: instead of a push-and-forget trigger, you get continuous reconciliation that detects and corrects drift automatically.

The problem with traditional Terraform workflows based on GitHub Actions or Atlantis is that they are reactive: someone makes a manual change in the AWS console, and no one knows about it until the next pipeline run. With GitOps for Terraform, every discrepancy between the HCL code and the actual state becomes an alert — or is corrected automatically, based on the configured policy.

What You Will Learn

  • GitOps architecture for IaC: pull model vs push model
  • Flux Terraform Controller: installation, Terraform object CRD and reconciliation
  • Terraform state management from Kubernetes with an S3 backend and IRSA
  • Spacelift: stacks, Rego policies, RBAC and approval workflows
  • Drift detection: Slack/PagerDuty alert for unauthorized deviations
  • Pattern for critical environments: auto-remediation vs manual approval

Pull Model vs Push Model for IaC

The key distinction between GitOps and traditional CI/CD is the synchronization model. In the push model (GitHub Actions, Jenkins), the pipeline fires on every commit and "pushes" changes to the infrastructure. In the pull model (pure GitOps), an agent running inside the cluster continuously "pulls" the desired state from the repository and reconciles it. This difference has profound implications for security and resilience:

# Push Model (GitHub Actions) — requires cloud credentials in the pipeline
# The GitHub runner needs outbound access to the cloud provider
# Problem: if the job fails halfway through, the state can be left inconsistent

# Pull Model (Flux TF Controller) — the agent lives inside the cluster
# Only the Kubernetes cluster holds cloud credentials (via IRSA or Workload Identity)
# Advantage: single point of trust, no credentials in GitHub Secrets
# Advantage: continuous reconciliation every N minutes, not only on commit

# Security comparison:
# Push Model: GitHub runner --[credentials]--> AWS/Azure/GCP
# Pull Model: Kubernetes pod --[IRSA/WI]--> AWS/Azure/GCP
#             Git repository --[SSH/HTTPS]--> Flux controller (inside cluster)
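The reconciliation loop at the heart of the pull model can be sketched in a few lines. This is a toy illustration in Python (the real controller is a Go Kubernetes operator, and the function names here are invented):

```python
# Toy sketch of a pull-model reconciliation loop (illustrative only;
# the real tf-controller is a Go Kubernetes controller).

def reconcile(fetch_desired, read_actual, apply_plan):
    """One reconciliation cycle: diff desired vs actual, apply if they differ."""
    desired = fetch_desired()   # e.g. git pull + terraform plan
    actual = read_actual()      # e.g. state backend / cloud APIs
    if desired != actual:
        apply_plan(desired)     # e.g. terraform apply
        return "reconciled"
    return "in-sync"

# Example with stubbed functions: the "cloud" is a dict that drifts
# from the desired configuration until the loop corrects it.
state = {"vpc": "10.0.0.0/16"}
result = reconcile(
    fetch_desired=lambda: {"vpc": "10.0.0.0/16", "subnet": "10.0.1.0/24"},
    read_actual=lambda: dict(state),
    apply_plan=lambda d: state.update(d),
)
# state now matches the desired configuration; the controller simply
# repeats this cycle every `interval`, not only on commit
```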

Flux Terraform Controller

The Flux Terraform Controller (tf-controller) is an open-source Kubernetes controller that brings Terraform into the GitOps world. It is a Flux community project (Weaveworks + independent maintainers) that extends Flux with the ability to run Terraform plan and apply as native Kubernetes reconciliation loops.

Installation

# Prerequisites: Kubernetes cluster + Flux installed
# Install Flux on the cluster (if not already present)
flux install

# Install the TF Controller via HelmRelease
cat <<'EOF' | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: tf-controller
  namespace: flux-system
spec:
  interval: 1h
  url: https://weaveworks.github.io/tf-controller
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: tf-controller
  namespace: flux-system
spec:
  interval: 1h
  chart:
    spec:
      chart: tf-controller
      version: "0.16.x"
      sourceRef:
        kind: HelmRepository
        name: tf-controller
        namespace: flux-system
  values:
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 512Mi
    # Runner pods: execute the actual terraform process
    runner:
      image:
        tag: "v1.5.x-flux"
EOF

# Verify the installation
kubectl get pods -n flux-system | grep tf-controller
# NAME                                          READY   STATUS    RESTARTS
# tf-controller-6d8f9b4b5-xn7q2               1/1     Running   0

GitRepository and Terraform CRD configuration

The workflow is based on two Kubernetes objects: a GitRepository that points to the repository containing the HCL code, and a Terraform object (a custom CRD) that defines what to reconcile.

# 1. GitRepository: source of the HCL code
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: infra-repo
  namespace: flux-system
spec:
  interval: 1m          # Check the repo every minute
  url: https://github.com/myorg/terraform-infra
  ref:
    branch: main
  secretRef:
    name: github-ssh-key  # Secret with an SSH key or token

---
# 2. Terraform CRD: defines the module to reconcile
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: aws-networking
  namespace: flux-system
spec:
  # Reconciliation interval
  interval: 10m

  # HCL source
  sourceRef:
    kind: GitRepository
    name: infra-repo
  path: ./environments/prod/networking   # Path inside the repo

  # Automatic (auto-apply) or manual plan approval
  approvePlan: auto

  # Drift handling: when the real state differs from the desired one
  # force: reconcile automatically
  # drift: alert only, do not correct
  enableInventory: true

  # State backend (S3 with IRSA)
  backendConfig:
    customConfiguration: |
      backend "s3" {
        bucket         = "myorg-terraform-state-prod"
        key            = "networking/terraform.tfstate"
        region         = "eu-west-1"
        dynamodb_table = "terraform-state-lock"
        encrypt        = true
      }

  # Variables passed to the module
  vars:
    - name: environment
      value: prod
    - name: aws_region
      value: eu-west-1

  # Variables from a Kubernetes Secret (for sensitive values)
  varsFrom:
    - kind: Secret
      name: terraform-vars-prod
      varsKeys:
        - db_password
        - api_key

IRSA for AWS Access from Kubernetes

The best practice for AWS authentication from Kubernetes is IRSA (IAM Roles for Service Accounts): the Terraform pod receives a JWT token signed by the cluster, which is exchanged for temporary AWS credentials, with no keys hardcoded in the cluster.

# Create the Service Account with the IRSA annotation
kubectl create serviceaccount tf-runner -n flux-system

kubectl annotate serviceaccount tf-runner \
  -n flux-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789:role/TerraformRunnerRole

# IAM Role Trust Policy (to configure on AWS):
# {
#   "Version": "2012-10-17",
#   "Statement": [{
#     "Effect": "Allow",
#     "Principal": {
#       "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/..."
#     },
#     "Action": "sts:AssumeRoleWithWebIdentity",
#     "Condition": {
#       "StringEquals": {
#         "oidc.eks.eu-west-1.amazonaws.com/...:sub":
#           "system:serviceaccount:flux-system:tf-runner"
#       }
#     }
#   }]
# }

# Update the Terraform CRD to use the Service Account
# Add to the spec:
# serviceAccountName: tf-runner

Drift Detection and Notifications

Drift occurs when the actual state of the infrastructure differs from what is described in the HCL code — usually due to manual changes in the cloud console. The TF Controller detects drift on each reconciliation cycle and reports it via Flux Alerts.

# Flux Alert for Slack drift notifications
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: slack-infra
  namespace: flux-system
spec:
  type: slack
  channel: "#infra-alerts"
  secretRef:
    name: slack-webhook-url

---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: terraform-drift-alert
  namespace: flux-system
spec:
  providerRef:
    name: slack-infra
  eventSeverity: warning
  eventSources:
    - kind: Terraform
      name: "*"   # All Terraform objects
  # Send alerts for these events:
  # - drift detected
  # - reconciliation failed
  # - plan pending approval

# Check drift status manually
kubectl get terraform -n flux-system
# NAME              READY   STATUS                          AGE
# aws-networking    True    Reconciliation succeeded        2h
# aws-database      False   Drift detected: 3 resources     15m

# Drift details
kubectl describe terraform aws-database -n flux-system | grep -A 20 "Conditions:"
# Conditions:
#   Last Transition Time:  2026-03-20T10:30:00Z
#   Message:               Drift detected: aws_db_instance.main (tags changed),
#                           aws_security_group.db (ingress rule added manually)
#   Reason:                TerraformOutputsWritten
#   Status:                False
#   Type:                  Ready
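For scripting around these statuses (for example, feeding a custom dashboard or alerting pipeline), the output of `kubectl get terraform -n flux-system -o json` can be filtered for non-Ready objects. A minimal sketch; the condition fields follow the standard Kubernetes conventions shown above:

```python
# Minimal sketch: extract non-Ready Terraform objects from
# `kubectl get terraform -n flux-system -o json` output.
import json

def drifted_objects(kubectl_json: str) -> list:
    """Return names of Terraform objects whose Ready condition is False."""
    items = json.loads(kubectl_json)["items"]
    out = []
    for item in items:
        for cond in item.get("status", {}).get("conditions", []):
            if cond["type"] == "Ready" and cond["status"] == "False":
                out.append(item["metadata"]["name"])
    return out

# Example payload mirroring the kubectl output above:
payload = json.dumps({"items": [
    {"metadata": {"name": "aws-networking"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "aws-database"},
     "status": {"conditions": [{"type": "Ready", "status": "False",
                                "message": "Drift detected: 3 resources"}]}},
]})
print(drifted_objects(payload))  # ['aws-database']
```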

Spacelift: GitOps Enterprise for Terraform

Spacelift is a SaaS platform (with a self-hosted option) designed for teams running Terraform in enterprise environments. Unlike the TF Controller, which lives inside the Kubernetes cluster, Spacelift offers a comprehensive UI, advanced policies written in Rego (the same language as OPA), granular RBAC, and approval workflows with a complete audit trail.

Key Spacelift Concepts

# Spacelift structure
# Stack = the equivalent of a Terraform workspace
# Each stack has:
# - Source: GitHub/GitLab repository + branch + path
# - Runner image: Docker image with Terraform + providers
# - Environment variables: variables and secrets
# - Policies: Rego rules applied to plan/apply
# - Contexts: sets of variables shareable across stacks

# Create a stack via the Spacelift API (spacelift Terraform provider):
resource "spacelift_stack" "networking_prod" {
  name        = "networking-prod"
  repository  = "terraform-infra"
  branch      = "main"
  project_root = "environments/prod/networking"

  # Auto-deploy on push to the branch
  autodeploy = false   # For prod: require manual approval

  # Terraform version
  terraform_version = "1.9.x"

  labels = ["team:platform", "env:prod", "tier:networking"]
}

resource "spacelift_context_attachment" "networking_prod" {
  context_id = spacelift_context.aws_prod.id
  stack_id   = spacelift_stack.networking_prod.id
  priority   = 1
}

Policy Rego in Spacelift

Rego policies are Spacelift's strong point: they let you define complex guardrails that are evaluated on each plan before deciding whether to request approval, block, or auto-apply. It is essentially a programmable gate.

# policy: require-approval-for-destructive-changes.rego
# Requires human approval if the plan contains destroys

package spacelift

# Deny auto-apply if there are resources to destroy
deny[sprintf("Destroy requires approval: %s", [resource])] {
  change := input.terraform.resource_changes[_]
  change.change.actions[_] == "delete"
  resource := change.address
}

# Block entirely if more than 5 resources would be destroyed
deny["More than 5 destroys in a single plan: requires senior approval"] {
  destroy_count := count([c |
    c := input.terraform.resource_changes[_]
    c.change.actions[_] == "delete"
  ])
  destroy_count > 5
}

# Warn (does not block) on security group changes
warn[sprintf("Security group modified: %s", [resource])] {
  change := input.terraform.resource_changes[_]
  change.type == "aws_security_group"
  change.change.actions[_] != "no-op"
  resource := change.address
}

# policy: cost-control.rego
# Block large instances in non-prod environments

package spacelift

expensive_instance_types := {
  "m5.4xlarge", "m5.8xlarge", "m5.16xlarge",
  "c5.4xlarge", "c5.9xlarge",
  "r5.4xlarge", "r5.8xlarge"
}

deny[msg] {
  # Read the labels from the Spacelift stack
  not contains(input.spacelift.stack.labels[_], "env:prod")

  # Look for EC2 instances with an expensive instance_type
  change := input.terraform.resource_changes[_]
  change.type == "aws_instance"
  instance_type := change.change.after.instance_type
  expensive_instance_types[instance_type]

  msg := sprintf(
    "Instance %s of type %s not allowed in non-prod environments",
    [change.address, instance_type]
  )
}
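The gate logic above is easy to prototype outside Spacelift before committing to Rego, for example against the `resource_changes` array of `terraform show -json` output. A sketch in Python; the thresholds mirror the policies above, but this is not the Spacelift input schema verbatim:

```python
# Sketch: the destroy-approval gate from the Rego policies above,
# expressed in Python against `terraform show -json` plan output.

def destroy_gate(plan: dict, max_destroys: int = 5) -> str:
    """Return 'auto', 'approval', or 'senior-approval' based on planned destroys."""
    destroys = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]
    if len(destroys) > max_destroys:
        return "senior-approval"   # too many destroys: escalate
    if destroys:
        return "approval"          # any destroy: human in the loop
    return "auto"                  # no destroys: safe to auto-apply

# Example plan containing one destroy and one in-place update:
plan = {"resource_changes": [
    {"address": "aws_db_instance.old_db", "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
]}
print(destroy_gate(plan))  # approval
```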

Spacelift Approval Workflow

# Spacelift approval workflow with Slack notifications

# 1. A developer pushes to the feature/add-rds branch
# 2. Spacelift automatically creates a preview run
# 3. The Rego policy evaluates the plan: it contains 1 destroy (the old RDS)
# 4. Spacelift blocks the auto-deploy and notifies Slack:
#    "Run #abc123 requires approval: destroy aws_db_instance.old_db"
# 5. A senior engineer reviews the plan in the Spacelift UI
# 6. They approve by clicking "Confirm", or add a comment and reject
# 7. Spacelift runs the apply, or notifies the developer of the block

# Via Spacelift CLI (spacectl):
spacectl stack run list --id networking-prod
# ID        COMMIT    STATE           CREATED AT
# abc123    f3a8b91   PENDING_REVIEW  2026-03-20 10:30
# xyz789    a1c2d3e   FINISHED        2026-03-19 14:22

spacectl run confirm --run abc123 --stack networking-prod
# Run abc123 confirmed, applying...

Advanced Drift Detection: Alert and Auto-Remediation

Drift detection is not enough unless it is paired with a clear response strategy. There are three approaches, each with its own trade-offs:

# Approach 1: Alert only (critical environments, audit trail required)
# Drift is detected and reported, but not corrected automatically
# Use for: production databases, critical networking

# Approach 2: Auto-remediation for minor drift
# Tag changes, patch updates: correct automatically
# Block and alert on structural changes

# Approach 3: Full auto-apply (dev/staging environments)
# Any drift is corrected immediately by the controller

---
# Flux TF Controller example: configuration for a hybrid approach
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: aws-networking-prod
  namespace: flux-system
spec:
  interval: 5m
  approvePlan: "auto"    # "auto" for non-critical environments

  # Plan runner: generates the plan but does NOT apply it
  # The apply requires a second step (manual or automatic)
  planOnly: false

  # How many consecutive drifts before sending a critical alert
  # (configured via a Flux Alert with error severity)
  retryInterval: 1m
  timeout: 5m

# Scheduled drift-check script (a lightweight alternative without a GitOps controller)
#!/bin/bash
# drift-check.sh — run hourly via cron or a scheduled GitHub Actions workflow

set -euo pipefail

ENVIRONMENTS=("dev" "staging" "prod")
SLACK_WEBHOOK="${SLACK_DRIFT_WEBHOOK}"

for ENV in "${ENVIRONMENTS[@]}"; do
  cd "/infra/environments/${ENV}"

  # Initialize without output
  terraform init -reconfigure -input=false -no-color > /dev/null 2>&1

  # Run the plan and capture the exit code
  # 0 = no changes, 1 = error, 2 = changes detected (drift)
  set +e
  terraform plan -detailed-exitcode -no-color -out=/tmp/plan-${ENV} 2>&1
  EXITCODE=$?
  set -e

  if [ $EXITCODE -eq 2 ]; then
    CHANGES=$(terraform show -no-color /tmp/plan-${ENV} | \
      grep -E "^\s+(#|~|\+|-)" | head -20)

    curl -s -X POST "$SLACK_WEBHOOK" \
      -H "Content-Type: application/json" \
      -d "{
        \"text\": \"*DRIFT DETECTED* in environment: ${ENV}\n\`\`\`${CHANGES}\`\`\`\"
      }"
    echo "Drift alert sent for ${ENV}"
  elif [ $EXITCODE -eq 0 ]; then
    echo "${ENV}: no drift detected"
  else
    echo "ERROR: terraform plan failed for ${ENV}" >&2
    exit 1
  fi
done

Comparison: TF Controller vs Spacelift vs Atlantis

When to Use Which Tool

  • Flux TF Controller: Teams that already use Flux/Argo for Kubernetes, want pure, open-source GitOps, and manage AWS infrastructure with IRSA. Self-hosted, free, medium learning curve.
  • Spacelift: Enterprise teams with complex requirements for RBAC, audit trails, approval workflows with multiple approvers, and advanced Rego policies. Paid SaaS, great UX, out-of-the-box integrations (Slack, PagerDuty, Jira).
  • Atlantis: Teams that want to stay in the PR-based paradigm without pure GitOps. Plan/apply can be triggered via comments directly in the PR. Self-hosted, free, very mature. No native continuous reconciliation.
  • Terraform Cloud/Enterprise: The natural choice if you are already in the HashiCorp ecosystem: native Sentinel policy language, Vault integration. See Article 10.

Best Practices for GitOps IaC in Production

# Repository structure for GitOps Terraform
terraform-infra/
├── modules/                    # Reusable modules (not reconciled directly)
│   ├── networking/
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── networking/         # Separate stacks for each layer
│   │   │   ├── main.tf
│   │   │   └── terraform.auto.tfvars
│   │   ├── compute/
│   │   └── database/
│   ├── staging/
│   └── prod/
│       ├── networking/         # Each environment has its own isolated state
│       ├── compute/
│       └── database/
├── flux/                       # Flux manifests for the Terraform CRDs
│   ├── dev/
│   │   ├── networking-tf.yaml
│   │   └── compute-tf.yaml
│   └── prod/
│       ├── networking-tf.yaml  # approvePlan: "auto" or manual
│       └── compute-tf.yaml
└── policies/                   # Rego policies (if using Spacelift)
    ├── require-approval.rego
    └── cost-control.rego

Anti-Pattern: Overly Aggressive Reconciliation

Setting interval: 1m with approvePlan: auto on production environments is dangerous: a change not yet merged into main could be applied before review. The golden rule: the more critical the environment, the longer the interval and the more stringent the approval process. In prod, use an interval of 30m or more and always require manual approval for structural changes.
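This golden rule can also be enforced mechanically, for example with a small CI check over the Flux manifests that flags aggressive prod configurations. A minimal sketch in Python; it assumes specs already parsed from the `flux/prod/` manifests and only inspects `interval` and `approvePlan`:

```python
# Sketch of a CI guardrail: flag Terraform CRD specs that combine a short
# reconciliation interval with approvePlan: auto in a prod environment.
# (Hypothetical helper; field names match the Terraform CRD shown earlier.)

def parse_interval_minutes(value: str) -> int:
    """Convert a Flux-style duration like '5m' or '1h' to minutes (simplified)."""
    if value.endswith("h"):
        return int(value[:-1]) * 60
    if value.endswith("m"):
        return int(value[:-1])
    raise ValueError("unsupported interval: %s" % value)

def violations(spec: dict, min_prod_interval: int = 30) -> list:
    """Return golden-rule violations for one Terraform CRD spec in prod."""
    problems = []
    if spec.get("approvePlan") == "auto":
        problems.append("prod stacks should require manual plan approval")
    if parse_interval_minutes(spec.get("interval", "0m")) < min_prod_interval:
        problems.append("interval below %dm is too aggressive for prod"
                        % min_prod_interval)
    return problems

# The anti-pattern from the text: 1m interval with auto-apply in prod
spec = {"interval": "1m", "approvePlan": "auto"}
print(violations(spec))  # flags both the auto-approve and the 1m interval
```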

Conclusions and Next Steps

GitOps for Terraform represents the maturity of Infrastructure as Code: no more trigger-based pipelines but continuous reconciliation, no more credentials in pipelines but native cluster identities, no more "who made that change?" but complete audit trails in Git. The Flux TF Controller is the ideal choice for Kubernetes-native teams, while Spacelift meets enterprise requirements with its Rego policy engine.

The Complete Series: Terraform and IaC

  • Article 01 — Terraform from Scratch: HCL, Provider and Plan-Apply-Destroy
  • Article 02 — Designing Reusable Terraform Modules
  • Article 03 — Terraform State: Remote Backend with S3/GCS
  • Article 04 — Terraform in CI/CD: GitHub Actions and Atlantis
  • Article 05 — IaC Testing: Terratest and Terraform Test
  • Article 06 — IaC security: Checkov, Trivy and OPA
  • Article 07 — Terraform Multi-Cloud: AWS + Azure + GCP
  • Article 08 (this) — GitOps for Terraform: Flux TF Controller, Spacelift, and Drift Detection
  • Article 09 — Terraform vs Pulumi vs OpenTofu: Comparison 2026
  • Article 10 — Terraform Enterprise Patterns: Workspace, Sentinel, and Team Scaling