Kubernetes Operators: CRD, Controller Pattern and Operator SDK
How do you manage a PostgreSQL cluster in production? You have to monitor the primary, detect failures, promote a replica, update configurations, perform backups scheduled and manage the rotation of certificates. These are operations that an expert DBA knows how to do by heart, but which require hours of manual labor every time something goes wrong crooked at 3am.
The pattern Kubernetes Operator allows you to codify this knowledge operational in Kubernetes-native software: a controller that observes the state of the cluster, compares it with the desired state, and takes the necessary actions to reconcile them. Automatically. Continuously. Without human intervention. In this article we will build a complete Operator using Operator SDK and Kubebuilder, understanding the controller thoroughly pattern and the reconcile loop.
What You Will Learn
- What is a Kubernetes Operator and when does it make sense to build one
- Custom Resource Definition (CRD): schema, versioning, validation
- The Controller Pattern and the Reconcile Loop
- Operator SDK vs Kubebuilder: differences and when to use which
- Implement a complete Operator with Kubebuilder
- Operator Hub and Operator Lifecycle Manager (OLM)
- Operator testing with envtest
- Production Operator: Zalando Postgres Operator, Strimzi for Kafka
What is a Kubernetes Operator
The term "Operator" was introduced by CoreOS in 2016 to describe a pattern: a software that encapsulates the operational know-how of a specific application such as a set of Kubernetes controllers. Google's formal definition:
An Operator and a method of packaging, deploying, and managing a Kubernetes application. An Operator implements and automates common tasks of a human operator when managing that type of application: deployment, updates, backup, failover, scaling.
An Operator extends the Kubernetes declarative model to specific domains. Instead of “give me a Pod”, you can say “give me a PostgreSQL cluster with 3 replicas, daily backup on S3, automatic failover and TLS certificates". The Operator knows how to transform this high-level specification in concrete Kubernetes resources.
Operator Maturity Model
The Operator Capability Model defines 5 levels of increasing maturity:
| Level | Name | Capacity |
|---|---|---|
| 1 | Basic Install | Automated application provisioning |
| 2 | Seamless Upgrades | Patches and minor version upgrades |
| 3 | Full Lifecycle | Backup, failure recovery, reconfiguration |
| 4 | Deep Insights | Metrics, alerting, log processing, workload analysis |
| 5 | Auto Pilot | Auto-scaling, auto-config, anomaly detection |
Custom Resource Definition (CRD)
A CRD extends the Kubernetes API with custom resource types. Instead of just using
native resources (Pod, Deployment, Service), you can define specific domain resources
how PostgresCluster, KafkaTopic, MLModel.
Define a CRD
# postgres-cluster-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: postgresclusters.database.example.com
spec:
group: database.example.com
scope: Namespaced
names:
plural: postgresclusters
singular: postgrescluster
kind: PostgresCluster
shortNames:
- pgc
versions:
- name: v1alpha1
served: true
storage: true
# Schema di validazione OpenAPI v3
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
required:
- replicas
- version
properties:
replicas:
type: integer
minimum: 1
maximum: 5
description: "Numero di repliche PostgreSQL"
version:
type: string
enum: ["14", "15", "16"]
description: "Versione PostgreSQL"
storage:
type: object
properties:
size:
type: string
pattern: "^[0-9]+Gi$"
default: "10Gi"
storageClass:
type: string
default: "fast-ssd"
backup:
type: object
properties:
enabled:
type: boolean
default: false
schedule:
type: string
description: "Cron expression per backup schedulato"
s3Bucket:
type: string
resources:
type: object
properties:
requests:
type: object
properties:
memory:
type: string
cpu:
type: string
limits:
type: object
properties:
memory:
type: string
cpu:
type: string
status:
type: object
properties:
phase:
type: string
enum: ["Pending", "Creating", "Running", "Degraded", "Failed"]
readyReplicas:
type: integer
primaryEndpoint:
type: string
conditions:
type: array
items:
type: object
properties:
type:
type: string
status:
type: string
reason:
type: string
message:
type: string
lastTransitionTime:
type: string
format: date-time
# Stampa colonne aggiuntive in kubectl get
additionalPrinterColumns:
- name: Replicas
type: integer
jsonPath: .spec.replicas
- name: Version
type: string
jsonPath: .spec.version
- name: Status
type: string
jsonPath: .status.phase
- name: Age
type: date
jsonPath: .metadata.creationTimestamp
# Subresource status (necessario per UpdateStatus)
subresources:
status: {}
A Custom Resource in Action
# my-postgres-cluster.yaml
apiVersion: database.example.com/v1alpha1
kind: PostgresCluster
metadata:
name: myapp-db
namespace: production
spec:
replicas: 3
version: "16"
storage:
size: "100Gi"
storageClass: fast-ssd
backup:
enabled: true
schedule: "0 2 * * *" # ogni notte alle 2:00
s3Bucket: "my-postgres-backups"
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
The Controller Pattern and the Reconcile Loop
The heart of an Operator is the controller: a process that continuously observes the current state of resources in the cluster and compares it to the desired state declared in the Custom Resource. When there is a difference (drift), the controller executes the actions necessary to reconcile the two states. This cycle is called reconcile loop.
// Pseudocodice del reconcile loop
for {
desiredState = getDesiredState(customResource)
currentState = getCurrentState(cluster)
if currentState != desiredState {
actions = computeActions(desiredState, currentState)
execute(actions)
}
// Attendi il prossimo trigger (evento API server o requeueing)
waitForTrigger()
}
The controller does not use a "pure event-driven" approach (where each event triggers an action specification) but an approach level-based: observe the total state and the reconcile. This makes controllers more robust: if events (crash, restart) are missing, the controller will restart and converge to the correct state anyway.
Kubebuilder: Building an Operator
Kubebuilder is the official CNCF framework for building Operator in Go. Generate the project scaffolding, manages communication with the API server and provides helpers for the reconcile loop. Operator SDK is based on Kubebuilder and adds support for Helm and Ansible operators.
Project Setup
# Installa Kubebuilder
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder
sudo mv kubebuilder /usr/local/bin/
# Crea un nuovo progetto Operator
mkdir postgres-operator && cd postgres-operator
kubebuilder init \
--domain database.example.com \
--repo github.com/myorg/postgres-operator
# Genera l'API e il controller per PostgresCluster
kubebuilder create api \
--group database \
--version v1alpha1 \
--kind PostgresCluster \
--resource \
--controller
# Struttura generata:
# api/v1alpha1/
# postgrescluster_types.go <- Definizione della struct CRD
# groupversion_info.go
# internal/controller/
# postgrescluster_controller.go <- Logica del reconcile loop
# config/crd/ <- Manifest YAML della CRD
Define the API Type
// api/v1alpha1/postgrescluster_types.go
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
corev1 "k8s.io/api/core/v1"
)
// PostgresClusterSpec definisce lo stato desiderato
type PostgresClusterSpec struct {
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=5
Replicas int32 `json:"replicas"`
// +kubebuilder:validation:Enum={"14","15","16"}
Version string `json:"version"`
Storage PostgresStorageSpec `json:"storage,omitempty"`
Backup PostgresBackupSpec `json:"backup,omitempty"`
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
}
type PostgresStorageSpec struct {
// +kubebuilder:default="10Gi"
Size string `json:"size,omitempty"`
StorageClass string `json:"storageClass,omitempty"`
}
type PostgresBackupSpec struct {
Enabled bool `json:"enabled,omitempty"`
Schedule string `json:"schedule,omitempty"`
S3Bucket string `json:"s3Bucket,omitempty"`
}
// PostgresClusterStatus descrive lo stato osservato
type PostgresClusterStatus struct {
Phase string `json:"phase,omitempty"`
ReadyReplicas int32 `json:"readyReplicas,omitempty"`
PrimaryEndpoint string `json:"primaryEndpoint,omitempty"`
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=".spec.replicas"
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=".status.phase"
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=".metadata.creationTimestamp"
type PostgresCluster struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec PostgresClusterSpec `json:"spec,omitempty"`
Status PostgresClusterStatus `json:"status,omitempty"`
}
The Reconcile Loop: Implementation
// internal/controller/postgrescluster_controller.go
package controller
import (
"context"
"fmt"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
databasev1alpha1 "github.com/myorg/postgres-operator/api/v1alpha1"
)
type PostgresClusterReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// +kubebuilder:rbac:groups=database.example.com,resources=postgresclusters,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=database.example.com,resources=postgresclusters/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
func (r *PostgresClusterReconciler) Reconcile(
ctx context.Context,
req ctrl.Request,
) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Ottieni la Custom Resource
pgCluster := &databasev1alpha1.PostgresCluster{}
if err := r.Get(ctx, req.NamespacedName, pgCluster); err != nil {
if errors.IsNotFound(err) {
// CR eliminata, pulizia gia gestita dai finalizer
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
}
logger.Info("Reconciling PostgresCluster",
"name", pgCluster.Name,
"namespace", pgCluster.Namespace,
"replicas", pgCluster.Spec.Replicas)
// 2. Reconcilia il Service headless
if err := r.reconcileHeadlessService(ctx, pgCluster); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to reconcile headless service: %w", err)
}
// 3. Reconcilia lo StatefulSet
sts, err := r.reconcileStatefulSet(ctx, pgCluster)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to reconcile statefulset: %w", err)
}
// 4. Aggiorna lo status della CR
pgCluster.Status.ReadyReplicas = sts.Status.ReadyReplicas
pgCluster.Status.PrimaryEndpoint = fmt.Sprintf(
"%s-0.%s.%s.svc.cluster.local:5432",
pgCluster.Name,
pgCluster.Name,
pgCluster.Namespace,
)
if sts.Status.ReadyReplicas == pgCluster.Spec.Replicas {
pgCluster.Status.Phase = "Running"
} else if sts.Status.ReadyReplicas > 0 {
pgCluster.Status.Phase = "Degraded"
} else {
pgCluster.Status.Phase = "Creating"
}
if err := r.Status().Update(ctx, pgCluster); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to update status: %w", err)
}
logger.Info("Reconciliation complete",
"phase", pgCluster.Status.Phase,
"readyReplicas", pgCluster.Status.ReadyReplicas)
return ctrl.Result{}, nil
}
func (r *PostgresClusterReconciler) reconcileStatefulSet(
ctx context.Context,
pgCluster *databasev1alpha1.PostgresCluster,
) (*appsv1.StatefulSet, error) {
desired := r.buildStatefulSet(pgCluster)
// Imposta il owner reference per la garbage collection automatica
if err := ctrl.SetControllerReference(pgCluster, desired, r.Scheme); err != nil {
return nil, err
}
existing := &appsv1.StatefulSet{}
err := r.Get(ctx, client.ObjectKeyFromObject(desired), existing)
if errors.IsNotFound(err) {
// StatefulSet non esiste: crealo
if err := r.Create(ctx, desired); err != nil {
return nil, fmt.Errorf("failed to create StatefulSet: %w", err)
}
return desired, nil
}
if err != nil {
return nil, err
}
// StatefulSet esiste: aggiornalo se necessario
existing.Spec.Replicas = desired.Spec.Replicas
existing.Spec.Template = desired.Spec.Template
if err := r.Update(ctx, existing); err != nil {
return nil, fmt.Errorf("failed to update StatefulSet: %w", err)
}
return existing, nil
}
func (r *PostgresClusterReconciler) buildStatefulSet(
pgCluster *databasev1alpha1.PostgresCluster,
) *appsv1.StatefulSet {
image := fmt.Sprintf("postgres:%s", pgCluster.Spec.Version)
return &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: pgCluster.Name,
Namespace: pgCluster.Namespace,
},
Spec: appsv1.StatefulSetSpec{
Replicas: &pgCluster.Spec.Replicas,
ServiceName: pgCluster.Name,
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{"app": pgCluster.Name},
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{"app": pgCluster.Name},
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "postgres",
Image: image,
Resources: pgCluster.Spec.Resources,
},
},
},
},
VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
{
ObjectMeta: metav1.ObjectMeta{
Name: "data",
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
StorageClassName: &pgCluster.Spec.Storage.StorageClass,
Resources: corev1.VolumeResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: pgCluster.Spec.Storage.ParsedSize(),
},
},
},
},
},
},
}
}
// SetupWithManager registra il controller con il manager
func (r *PostgresClusterReconciler) SetupWithManager(
mgr ctrl.Manager,
) error {
return ctrl.NewControllerManagedBy(mgr).
For(&databasev1alpha1.PostgresCluster{}).
Owns(&appsv1.StatefulSet{}). // Reconcilia quando cambia lo StatefulSet owned
Owns(&corev1.Service{}).
Complete(r)
}
Finalizer: Resource Cleanup
Finalizers allow you to perform cleanup operations before a resource is released eliminated. Without finalizer, deleting a PostgresCluster CR would delete the CR but not necessarily the data on S3 or backups. With finalizers you can manage this cleaning:
// Aggiungi finalizer handling al Reconcile
const pgClusterFinalizer = "database.example.com/finalizer"
func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
pgCluster := &databasev1alpha1.PostgresCluster{}
if err := r.Get(ctx, req.NamespacedName, pgCluster); err != nil {
return ctrl.Result{}, client.IgnoreNotFound(err)
}
// Gestione eliminazione
if !pgCluster.DeletionTimestamp.IsZero() {
if controllerutil.ContainsFinalizer(pgCluster, pgClusterFinalizer) {
// Esegui cleanup
if err := r.cleanupExternalResources(ctx, pgCluster); err != nil {
return ctrl.Result{}, err
}
// Rimuovi il finalizer
controllerutil.RemoveFinalizer(pgCluster, pgClusterFinalizer)
if err := r.Update(ctx, pgCluster); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
// Aggiungi finalizer se non presente
if !controllerutil.ContainsFinalizer(pgCluster, pgClusterFinalizer) {
controllerutil.AddFinalizer(pgCluster, pgClusterFinalizer)
if err := r.Update(ctx, pgCluster); err != nil {
return ctrl.Result{}, err
}
}
// ... resto della logica di reconcile
return ctrl.Result{}, nil
}
Operator testing with envtest
Kubebuilder provides envtest, a testing framework that launches an API server Real Kubernetes (without kubelets and nodes) to test controllers in an integrated way:
// internal/controller/postgrescluster_controller_test.go
package controller
import (
"context"
"time"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
appsv1 "k8s.io/api/apps/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
databasev1alpha1 "github.com/myorg/postgres-operator/api/v1alpha1"
)
var _ = Describe("PostgresCluster Controller", func() {
const (
timeout = time.Second * 10
interval = time.Millisecond * 250
)
Context("Quando crea un PostgresCluster", func() {
It("Deve creare lo StatefulSet corrispondente", func() {
ctx := context.Background()
pgCluster := &databasev1alpha1.PostgresCluster{
ObjectMeta: metav1.ObjectMeta{
Name: "test-postgres",
Namespace: "default",
},
Spec: databasev1alpha1.PostgresClusterSpec{
Replicas: 1,
Version: "16",
Storage: databasev1alpha1.PostgresStorageSpec{
Size: "10Gi",
StorageClass: "standard",
},
},
}
Expect(k8sClient.Create(ctx, pgCluster)).Should(Succeed())
// Verifica che lo StatefulSet venga creato
stsLookupKey := types.NamespacedName{
Name: "test-postgres",
Namespace: "default",
}
createdSts := &appsv1.StatefulSet{}
Eventually(func() bool {
err := k8sClient.Get(ctx, stsLookupKey, createdSts)
return err == nil
}, timeout, interval).Should(BeTrue())
// Verifica le specifiche dello StatefulSet
Expect(*createdSts.Spec.Replicas).Should(Equal(int32(1)))
Expect(createdSts.Spec.Template.Spec.Containers[0].Image).
Should(Equal("postgres:16"))
// Cleanup
Expect(k8sClient.Delete(ctx, pgCluster)).Should(Succeed())
})
})
})
Operator Build and Deploy
# Build dell'immagine
make docker-build docker-push IMG="myregistry/postgres-operator:v0.1.0"
# Deploy dell'Operator nel cluster
make deploy IMG="myregistry/postgres-operator:v0.1.0"
# Verifica il deployment
kubectl get pods -n postgres-operator-system
kubectl logs -n postgres-operator-system deployment/postgres-operator-controller-manager
# Applica una CR
kubectl apply -f my-postgres-cluster.yaml
kubectl get postgresclusters -n production
kubectl describe postgrescluster myapp-db -n production
Production Operator: Real Examples
It is not necessary to build an Operator for every common application. The ecosystem Kubernetes offers mature Operators for major stateful applications:
Zalando Postgres Operator
# Installa il Postgres Operator di Zalando (level 5 maturity)
helm repo add postgres-operator-charts \
https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm install postgres-operator \
postgres-operator-charts/postgres-operator \
-n postgres-operator --create-namespace
# Crea un cluster PostgreSQL con HA e backup su S3
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: myapp-postgres
namespace: production
spec:
teamId: "myteam"
volume:
size: 100Gi
storageClass: fast-ssd
numberOfInstances: 3
users:
myapp:
- superuser
- createdb
databases:
myapp: myapp
postgresql:
version: "16"
parameters:
shared_buffers: "1GB"
max_connections: "200"
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
patroni:
failsafe_mode: false
# Backup automatico su S3 con WAL-G
enableLogicalBackup: true
logicalBackupSchedule: "00 02 * * *"
Strimzi: Kafka on Kubernetes
# Kafka cluster con Strimzi (level 5 maturity)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: production-cluster
namespace: kafka
spec:
kafka:
version: 3.7.0
replicas: 3
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
default.replication.factor: 3
min.insync.replicas: 2
inter.broker.protocol.version: "3.7"
storage:
type: jbod
volumes:
- id: 0
type: persistent-claim
size: 200Gi
class: fast-ssd
deleteClaim: false
resources:
requests:
memory: 4Gi
cpu: 2000m
limits:
memory: 8Gi
cpu: 4000m
zookeeper:
replicas: 3
storage:
type: persistent-claim
size: 10Gi
class: fast-ssd
deleteClaim: false
entityOperator:
topicOperator: {}
userOperator: {}
Operator Lifecycle Manager (OLM)
The OLM manages the installation, upgrade and lifecycle management of the Operator in the cluster. And the mechanism used by OperatorHub.io to distribute Operators.
# Installa OLM nel cluster
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.28.0/install.sh | bash -s v0.28.0
# Installa un Operator da OperatorHub tramite OLM
kubectl create -f https://operatorhub.io/install/postgres-operator.yaml
# Verifica gli Operator installati
kubectl get csv -n operators # ClusterServiceVersion
kubectl get subscription -n operators
Best Practices for Operators
Checklist for Production Operators
- Use finalizer: Always for resources that have external side effects (S3 bucket, DNS records, etc.)
- Implement status conditions: Follow the Kubernetes condition pattern (type, status, reason, message)
- Idempotence: The reconcile loop must be safe to execute multiple times with the same result
- Handle errors with retry: USA
ctrl.Result{RequeueAfter: time.Minute}for transient errors - Don't do blocking operations: The reconcile must not block; use goroutines for long operations
- Minimum RBAC: Use only the strictly necessary permissions in the annotations
+kubebuilder:rbac - Testing with envtest: Write integration tests for each reconcile scenario
- Versioning API: Use versioning (v1alpha1 -> v1beta1 -> v1) and conversion webhooks for migrations
When NOT to Build an Operator
Operators have a significant development and maintenance cost. Build one it only makes sense if: (1) there is a mature Operator on OperatorHub for the application that you are managing, use it; (2) the application is complex, stateful and requires knowledge specialized operations to be automated; (3) you have dedicated teams that they can maintain Go code over time. For simple deploymentin this is not necessary.
Conclusions and Next Steps
The Kubernetes Operator pattern is the natural extension of the declarative philosophy of Kubernetes to complex application domains. Instead of manually managing databases, messaging systems and stateful services, you codify operational knowledge in a controller that works 24/7 to keep the system in the desired state.
Kubebuilder and Operator SDK provide the foundation: project scaffolding, management of the server API, reconcile framework. But business logic - how to manage a PostgreSQL failover, how to scale a Kafka cluster, how to rotate TLS certificates - must be implemented with deep knowledge of the specific application.
Upcoming Articles in the Kubernetes at Scale Series
Related Resources and Series
- Kubernetes Networking: CNI, Cilium with eBPF
- Persistent Storage in Kubernetes
- MLOps: Scaling ML on Kubernetes — Operator for training pipeline







