IaC at Enterprise Scale: The Real Challenges

When Terraform grows from a team of 3 people to an organization with 50 teams managing hundreds of cloud environments, challenges emerge that introductory articles never mention: how do you ensure that no one can accidentally create a $50,000/month instance? How do you prevent one team from overwriting another's state? How do you enforce company security policies without blocking productivity? How do you manage secrets centrally?

In this final article of the series we cover enterprise patterns for Terraform: HCP Terraform (formerly Terraform Cloud) as a governance platform, Sentinel as a policy language for financial guardrails and compliance, and HashiCorp Vault for centralized secret management.

What You Will Learn

  • Workspace strategy: flat, monorepo, per-team/environment/layer
  • Sentinel: HashiCorp policy language, mock-based testing, enforcement levels
  • Financial guardrails: block expensive instances, automatic budget alerts
  • Private Module Registry: versioning, governance, RBAC for modules
  • Vault + Terraform: dynamic secrets, secret injection without hardcoding
  • Team scaling: variable sets, run triggers, agent pools

Workspace Strategy: Organizing Infrastructure at Scale

An HCP Terraform/Enterprise workspace corresponds to an isolated state file. The naming and organization strategy for workspaces is the first decision that impacts the entire life cycle of enterprise infrastructure. There are three main patterns, each with distinct pros and cons.

Pattern 1: By Environment (most common)

# Naming convention: {team}-{service}-{environment}
# Example for a platform team with networking, compute, and database services:

platform-networking-dev
platform-networking-staging
platform-networking-prod

platform-compute-dev
platform-compute-staging
platform-compute-prod

platform-database-dev
platform-database-staging
platform-database-prod

# Pro: complete isolation per environment
# Pro: different policies for prod vs non-prod (stricter on prod)
# Con: 20 teams with 5 services each = 300 workspaces to manage
# Con: updating a shared variable means touching many workspaces

Pattern 2: Monorepo with Dynamic Workspaces

# Monorepo structure with HCP Terraform
terraform-infra/
├── modules/                     # Shared private modules
├── stacks/                      # "Stack" = deployment unit
│   ├── networking/
│   │   ├── main.tf
│   │   └── variables.tf
│   ├── eks-cluster/
│   └── rds-aurora/
└── workspaces/                  # Per-workspace configuration
    ├── platform-networking-prod.tfvars
    ├── platform-networking-staging.tfvars
    └── team-a-eks-prod.tfvars

# HCP Terraform workspace configuration (via Terraform provider)
resource "tfe_workspace" "networking_prod" {
  name         = "platform-networking-prod"
  organization = "myorg"

  # Trigger: automatically queue a run on pushes to the main branch,
  # but only for files under the stacks/networking directory
  trigger_prefixes  = ["stacks/networking/"]
  working_directory = "stacks/networking"

  # Auto-apply only in non-prod
  auto_apply = false  # prod: always require manual approval

  # Terraform version
  terraform_version = "1.9.x"

  tag_names = ["env:prod", "team:platform", "tier:networking"]
}
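The workspaces/ directory shown in the monorepo layout maps naturally onto a small "workspace factory": one tfe_workspace resource per tfvars file. A minimal sketch, assuming Terraform 1.3+ (for endswith) and illustrative names and paths:

```hcl
# Workspace factory: derive workspace names from the files in workspaces/
locals {
  # "platform-networking-prod.tfvars" -> "platform-networking-prod"
  workspace_names = toset([
    for f in fileset("${path.module}/workspaces", "*.tfvars") :
    trimsuffix(f, ".tfvars")
  ])
}

resource "tfe_workspace" "managed" {
  for_each     = local.workspace_names
  name         = each.key
  organization = "myorg"

  # Manual approval on prod, auto-apply everywhere else
  auto_apply = endswith(each.key, "-prod") ? false : true
}
```

This keeps adding a workspace down to adding one tfvars file, with the naming convention enforced by construction.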

Variable Sets: Shared Configuration Across Workspaces

# Variable Sets = a group of variables applied to multiple workspaces
# Avoids duplicating the same variables across 50 workspaces

# Example: Variable Set with production AWS configuration
resource "tfe_variable_set" "aws_prod" {
  name         = "AWS Production Credentials"
  description  = "AWS credentials for production environments"
  organization = "myorg"
  global       = false  # Not global: attached explicitly to selected workspaces
}

# Variables in the set (environment variables for the runner)
resource "tfe_variable" "aws_role_arn" {
  key             = "AWS_ROLE_ARN"
  value           = "arn:aws:iam::123456789:role/TerraformProdRole"
  category        = "env"
  variable_set_id = tfe_variable_set.aws_prod.id
  sensitive       = false
}

# Attach the variable set to the production networking workspace
resource "tfe_workspace_variable_set" "prod_networking" {
  variable_set_id = tfe_variable_set.aws_prod.id
  workspace_id    = tfe_workspace.networking_prod.id
}
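Besides attaching a variable set to individual workspaces, recent versions of the tfe provider can also scope it to a project, covering every workspace in that project. A sketch, assuming projects are enabled in the organization (the project name is illustrative):

```hcl
resource "tfe_project" "platform" {
  name         = "platform"
  organization = "myorg"
}

# Attach the variable set to all workspaces in the project
resource "tfe_project_variable_set" "prod_platform" {
  variable_set_id = tfe_variable_set.aws_prod.id
  project_id      = tfe_project.platform.id
}
```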

Sentinel: Policy as Code for Enterprise Governance

Sentinel is HashiCorp's proprietary policy framework, available in the Plus and Enterprise tiers of HCP Terraform. Unlike the open-source OPA/Rego, Sentinel is designed specifically for the HashiCorp ecosystem: it has native access to the Terraform plan through a rich data model and a three-level enforcement system.

Enforcement Levels

# Sentinel has three enforcement levels:

# 1. advisory: the policy fails but the run proceeds (log/warning only)
#    Use: soft notifications, metrics, awareness

# 2. soft-mandatory: the policy fails and blocks the apply,
#    BUT a user with the right permissions can override it with a justification
#    Use: important policies that may have legitimate exceptions

# 3. hard-mandatory: the policy fails and NOBODY can override it
#    Use: legal compliance, critical security, financial guardrails

# Enforcement configuration in the policy set (HCP Terraform):
resource "tfe_policy_set" "global_policies" {
  name         = "global-security-policies"
  organization = "myorg"
  global       = true   # Applies to all workspaces

  policy_ids = [
    tfe_sentinel_policy.no_public_s3.id,
    tfe_sentinel_policy.cost_limits.id,
    tfe_sentinel_policy.required_tags.id,
  ]
}
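The policy IDs referenced in the set come from tfe_sentinel_policy resources, which is also where the enforcement level is chosen. A sketch, with illustrative file paths:

```hcl
resource "tfe_sentinel_policy" "no_public_s3" {
  name         = "no-public-s3"
  description  = "Forbid publicly readable S3 buckets"
  organization = "myorg"
  policy       = file("${path.module}/policies/no-public-s3.sentinel")
  enforce_mode = "hard-mandatory"   # nobody can override
}

resource "tfe_sentinel_policy" "required_tags" {
  name         = "required-tags"
  organization = "myorg"
  policy       = file("${path.module}/policies/required-tags.sentinel")
  enforce_mode = "soft-mandatory"   # override allowed with justification
}
```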

Sentinel Policy: Financial Guardrails

# cost-control.sentinel
# Hard-block when the estimated cost exceeds per-environment thresholds

import "tfplan/v2" as tfplan
import "decimal"

# Read the "environment" input variable to determine the thresholds
env = tfplan.variables["environment"].value

# Monthly cost thresholds (USD) per environment
cost_limits = {
  "dev":     decimal.new(500),
  "staging": decimal.new(2000),
  "prod":    decimal.new(50000),
}

# EC2 instance types forbidden in non-prod
expensive_types = [
  "m5.4xlarge", "m5.8xlarge", "m5.16xlarge", "m5.24xlarge",
  "c5.9xlarge", "c5.18xlarge",
  "r5.8xlarge", "r5.16xlarge", "r5.24xlarge",
  "p3.2xlarge", "p3.8xlarge", "p4d.24xlarge",
]

# Rule 1: block expensive instances in non-prod
deny_expensive_instances = rule {
  env is "prod" or
  all tfplan.resource_changes as _, rc {
    rc.type is not "aws_instance" or
    rc.change.after.instance_type not in expensive_types
  }
}

# Rule 2: check the estimated cost when available
check_estimated_cost = rule {
  # HCP Terraform exposes native cost estimates to Sentinel policies
  # via the "tfrun" import (cost_estimate.proposed_monthly_cost)
  true  # Placeholder: implement with the tfrun import
}

# Main: combine the rules
main = rule {
  deny_expensive_instances and
  check_estimated_cost
}

Sentinel Policy: Mandatory Tags

# required-tags.sentinel
# Ensures that every taggable resource carries the mandatory tags

import "tfplan/v2" as tfplan

# Mandatory tags for every resource
required_tags = ["Environment", "Team", "CostCenter", "ManagedBy"]

# Resource types that support tags (partial list)
taggable_types = [
  "aws_instance", "aws_vpc", "aws_subnet", "aws_s3_bucket",
  "aws_db_instance", "aws_eks_cluster", "aws_lb",
  "azurerm_virtual_machine", "azurerm_virtual_network",
  "google_compute_instance", "google_storage_bucket",
]

# Check every resource being created or updated
violations = []

for tfplan.resource_changes as address, rc {
  if rc.type in taggable_types and
     (rc.change.actions contains "create" or rc.change.actions contains "update") {
    tags = rc.change.after.tags else {}
    for required_tags as tag {
      if tag not in tags or tags[tag] is "" {
        append(violations, "Resource " + address +
          " is missing required tag: " + tag)
      }
    }
  }
}

# Print the violations as feedback for the user
print("Tag violations:", violations)

main = rule { length(violations) is 0 }

Testing Sentinel Policies

# Sentinel CLI for local policy testing
# Install the Sentinel CLI, then verify:
sentinel version
# Sentinel v0.26.x

# Directory structure expected by `sentinel test`:
policies/
├── cost-control.sentinel
├── required-tags.sentinel
└── test/
    ├── cost-control/
    │   ├── pass.hcl             # Test case: plan with cheap instances
    │   └── fail.hcl             # Test case: plan with expensive instances
    └── required-tags/
        ├── pass.hcl             # All resources carry the required tags
        └── fail.hcl             # Resources with missing tags

# Each .hcl test case pairs mock data with the expected rule results.
# Mocks for tfplan/v2 can be downloaded from an HCP Terraform run
# ("Download Sentinel mocks") and edited to build pass/fail scenarios.

# Run the tests:
sentinel test cost-control.sentinel
# PASS - cost-control.sentinel
#   PASS - test/cost-control/pass.hcl
#   PASS - test/cost-control/fail.hcl

sentinel apply cost-control.sentinel
# Execution trace...
# Policy result: true
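A Sentinel CLI test case is itself written in HCL: it binds mock data to the imports and declares the expected rule outcomes. A sketch of a failing-scenario case (the mock filename is illustrative; use the files downloaded from HCP Terraform):

```hcl
# test/cost-control/fail.hcl
mock "tfplan/v2" {
  module {
    source = "mock-tfplan-fail.sentinel"
  }
}

test {
  rules = {
    main = false   # this scenario is expected to violate the policy
  }
}
```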

Private Module Registry

In an enterprise organization, Terraform modules are the primary mechanism for standardizing infrastructure: the platform team publishes "golden path" modules that application teams consume. HCP Terraform offers a private registry with semantic versioning, auto-generated documentation, and RBAC.

# Structure of a module published to the private registry
terraform-module-aws-vpc/
├── main.tf
├── variables.tf
├── outputs.tf
├── versions.tf
├── README.md           # Documentation rendered in the registry
├── examples/
│   ├── simple/
│   │   └── main.tf     # Minimal example
│   └── full/
│       └── main.tf     # Complete example
└── tests/
    └── vpc_test.go     # Terratest tests

# Publish via Git tags (the registry reads SemVer tags):
git tag v1.2.0
git push origin v1.2.0

# HCP Terraform registry: the module is imported automatically
# from the GitHub/GitLab repository when a vX.Y.Z tag is found

# Consume the private module from other workspaces:
module "vpc" {
  source  = "app.terraform.io/myorg/aws-vpc/aws"
  version = "~> 1.2"

  environment  = var.environment
  cidr_block   = "10.0.0.0/16"
  subnet_count = 3
}

HashiCorp Vault: Secret Management for Terraform

The most common problem in enterprise Terraform pipelines is secret management: database passwords, API keys, certificates. HashiCorp Vault solves this with dynamic secrets: instead of static credentials, Vault generates temporary credentials on demand that expire automatically.

Vault Dynamic Secrets for AWS

# Vault configuration to issue temporary AWS credentials

# Vault provider in Terraform
provider "vault" {
  address = "https://vault.myorg.internal:8200"
  # Authentication via AppRole (in CI/CD) or AWS IAM auth (on EC2/EKS)
}

# Read dynamic AWS credentials from Vault
data "vault_aws_access_credentials" "terraform_runner" {
  backend = "aws"
  role    = "terraform-role"
  type    = "creds"      # or "sts" for STS-based temporary credentials
}

# Use the credentials in the AWS provider
provider "aws" {
  region     = var.aws_region
  access_key = data.vault_aws_access_credentials.terraform_runner.access_key
  secret_key = data.vault_aws_access_credentials.terraform_runner.secret_key
  token      = data.vault_aws_access_credentials.terraform_runner.security_token
}

# Vault issues credentials with a 1-hour TTL
# After terraform apply, the credentials expire automatically
# No long-lived credentials in the pipelines
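The data source above assumes a Vault AWS secrets engine with a terraform-role role already configured. The Vault side can itself be managed with Terraform; a sketch, with an illustrative (and deliberately broad) IAM policy:

```hcl
# Mount the AWS secrets engine and define the role Terraform consumes
resource "vault_aws_secret_backend" "aws" {
  path                      = "aws"
  region                    = "eu-west-1"
  default_lease_ttl_seconds = 3600   # 1-hour TTL for issued credentials
}

resource "vault_aws_secret_backend_role" "terraform" {
  backend         = vault_aws_secret_backend.aws.path
  name            = "terraform-role"
  credential_type = "iam_user"

  # Inline policy attached to the generated IAM user (illustrative only;
  # scope it down to what your workspaces actually provision)
  policy_document = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:*", "s3:*"]
      Resource = "*"
    }]
  })
}
```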

Vault for Application Secrets

# Retrieve secrets from Vault and pass them to resources
# (e.g. the RDS database password)

data "vault_kv_secret_v2" "db_password" {
  mount = "secret"
  name  = "prod/database/rds-main"
}

# Use the secret in the RDS resource
resource "aws_db_instance" "main" {
  identifier        = "${local.name_prefix}-rds"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.r6g.xlarge"

  # Password from Vault: not hardcoded, not in tfvars
  # (note: the value still ends up in the Terraform state file)
  password          = data.vault_kv_secret_v2.db_password.data["password"]

  username          = "app_user"
  db_name           = "appdb"

  # The DB endpoint is written back to Vault after the apply
  # (via a dedicated resource or a separate pipeline)
}

# Write the DB endpoint to Vault after creation
resource "vault_kv_secret_v2" "db_connection" {
  mount = "secret"
  name  = "prod/database/connection-info"

  data_json = jsonencode({
    host     = aws_db_instance.main.address
    port     = aws_db_instance.main.port
    database = aws_db_instance.main.db_name
  })
}

Run Triggers and Dependency Graph between Workspaces

# In large organizations, workspaces have dependencies:
# networking -> compute -> application
# A networking change must trigger a re-apply of compute

# HCP Terraform: Run Triggers
resource "tfe_run_trigger" "compute_from_networking" {
  workspace_id  = tfe_workspace.compute_prod.id        # downstream workspace
  sourceable_id = tfe_workspace.networking_prod.id     # upstream workspace
}

# When networking-prod completes a successful apply,
# HCP Terraform automatically triggers a plan on compute-prod,
# which reads the networking outputs via the tfe_outputs data source

# Read outputs from the upstream workspace
data "tfe_outputs" "networking" {
  organization = "myorg"
  workspace    = "platform-networking-prod"
}

# Use the outputs in the compute workspace
resource "aws_eks_cluster" "main" {
  name = "${local.name_prefix}-eks"

  vpc_config {
    subnet_ids = data.tfe_outputs.networking.values.private_subnet_ids
  }
}

Agent Pools: Self-Hosted Runners in Private Environments

# HCP Terraform agents: run plans inside private networks
# (e.g. on-premise infrastructure, VPCs without public access)

# Run the agent in your environment
docker run -e TFE_AGENT_TOKEN="your-token" \
  -e TFE_AGENT_NAME="datacenter-agent-01" \
  hashicorp/tfc-agent:latest

# Or as a Kubernetes Deployment in the private cluster
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfc-agent
  namespace: terraform-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tfc-agent
  template:
    metadata:
      labels:
        app: tfc-agent
    spec:
      containers:
        - name: tfc-agent
          image: hashicorp/tfc-agent:latest
          env:
            - name: TFE_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: tfc-agent-token
                  key: token
            - name: TFE_AGENT_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
EOF

# Configure the workspace to use the private agent pool
resource "tfe_workspace" "private_datacenter" {
  name              = "datacenter-networking-prod"
  organization      = "myorg"
  agent_pool_id     = tfe_agent_pool.datacenter.id
  execution_mode    = "agent"   # Use the agent instead of HCP-hosted runners
}
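The workspace above references tfe_agent_pool.datacenter, which is defined alongside the token the agents authenticate with. A sketch:

```hcl
resource "tfe_agent_pool" "datacenter" {
  name         = "datacenter-pool"
  organization = "myorg"
}

resource "tfe_agent_token" "datacenter" {
  agent_pool_id = tfe_agent_pool.datacenter.id
  description   = "Token for on-premise datacenter agents"
}

# tfe_agent_token.datacenter.token (sensitive) is the value to inject
# as TFE_AGENT_TOKEN in the container or Kubernetes Secret
```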

Team RBAC in HCP Terraform

# Organize permissions by team
resource "tfe_team" "platform_engineers" {
  name         = "platform-engineers"
  organization = "myorg"
  visibility   = "organization"
}

resource "tfe_team" "app_developers" {
  name         = "app-developers"
  organization = "myorg"
}

# Platform Engineers: full access to all networking workspaces
resource "tfe_team_access" "platform_networking" {
  access       = "admin"   # plan, apply, destroy, admin
  team_id      = tfe_team.platform_engineers.id
  workspace_id = tfe_workspace.networking_prod.id
}

# App Developers: plan/read only on prod, write on dev
resource "tfe_team_access" "dev_compute_prod" {
  access       = "read"   # View only, cannot trigger runs
  team_id      = tfe_team.app_developers.id
  workspace_id = tfe_workspace.compute_prod.id
}

resource "tfe_team_access" "dev_compute_dev" {
  access       = "write"  # Plan and apply, but not admin
  team_id      = tfe_team.app_developers.id
  workspace_id = tfe_workspace.compute_dev.id
}
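When the fixed access levels are too coarse, tfe_team_access also accepts a custom permissions block instead of access. A sketch of a middle ground between read and write (the staging workspace is hypothetical; verify attribute values against your tfe provider version):

```hcl
resource "tfe_team_access" "dev_compute_staging" {
  team_id      = tfe_team.app_developers.id
  workspace_id = tfe_workspace.compute_staging.id  # hypothetical workspace

  permissions {
    runs              = "plan"   # can queue plans but not applies
    variables         = "read"
    state_versions    = "read"
    sentinel_mocks    = "none"
    workspace_locking = false
    run_tasks         = false
  }
}
```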

Conclusions: The Maturity of Enterprise IaC

Enterprise Terraform is not just "Terraform with multiple workspaces": it is an infrastructure governance system with a policy engine (Sentinel), secret management (Vault), a full audit trail, and granular RBAC. The patterns described in this article represent the maturity of Infrastructure as Code in organizations that manage dozens of teams and hundreds of cloud environments.

With this article we conclude the Terraform and IaC series: from basic HCL to enterprise patterns, you now have all the tools to build and manage professional cloud infrastructure at any scale.

The Complete Series: Terraform and IaC

  • Article 01 — Terraform from Scratch: HCL, Provider and Plan-Apply-Destroy
  • Article 02 — Designing Reusable Terraform Modules
  • Article 03 — Terraform State: Remote Backend with S3/GCS
  • Article 04 — Terraform in CI/CD: GitHub Actions and Atlantis
  • Article 05 — IaC Testing: Terratest and Terraform Test
  • Article 06 — IaC security: Checkov, Trivy and OPA
  • Article 07 — Terraform Multi-Cloud: AWS + Azure + GCP
  • Article 08 — GitOps for Terraform: Flux TF Controller and Spacelift
  • Article 09 — Terraform vs Pulumi vs OpenTofu
  • Article 10 (this) — Terraform Enterprise Patterns: Workspace, Sentinel, and Team Scaling