Persistent Storage in Kubernetes: CSI, PV, StorageClass and StatefulSet
Kubernetes was born as a platform for stateless workloads, but the reality of enterprise applications is very different: databases, message queues, persistent caches, shared file systems. All of these require storage that survives the lifecycle of a Pod. Managing this storage in a way that is reliable, high-performance and portable across different cloud providers is one of the most concrete day-to-day challenges in production.
In this article we will explore the entire Kubernetes storage layer: from the Container Storage Interface (CSI), which standardizes integration with providers, through PersistentVolume and StorageClass for dynamic provisioning, up to StatefulSet for managing databases such as PostgreSQL, Cassandra and Redis on Kubernetes.
What You Will Learn
- The Kubernetes storage model: Volume, PersistentVolume, PersistentVolumeClaim
- How the Container Storage Interface (CSI) works and the most used drivers
- StorageClass and dynamic provisioning: configuration for AWS EBS, GCE PD, Azure Disk
- Access Mode: ReadWriteOnce, ReadOnlyMany, ReadWriteMany - when to use which
- StatefulSet: stable identity, automatic PVCs, orderly rolling update
- How to run PostgreSQL on Kubernetes with StatefulSet
- Backup and restore of PersistentVolumes with Velero
The Kubernetes Storage Model
Kubernetes defines an abstraction hierarchy for storage that separates "what is needed" (PersistentVolumeClaim) from "how it is provided" (PersistentVolume and StorageClass). This allows developers to request storage without knowing the details of the underlying cloud provider.
Storage Primitives
| Resource | Scope | Who manages it | Description |
|---|---|---|---|
| Volume | Pod | Developer | Ephemeral storage tied to the lifecycle of the Pod |
| PersistentVolume (PV) | Cluster | Admin / Provisioner | A piece of storage in the cluster, with a lifecycle independent of Pods |
| PersistentVolumeClaim (PVC) | Namespace | Developer | A request for storage made by a Pod |
| StorageClass | Cluster | Admin | Defines the storage "type" and the provisioner |
Lifecycle of a PersistentVolume
The life cycle of a PV goes through different states. Understanding them is essential for troubleshooting:
- Available: the PV exists and is free, not bound to any PVC
- Bound: the PV has been bound to a PVC that meets its requirements
- Released: the PVC has been deleted, but the PV is not yet available (the data is still there)
- Failed: automatic reclamation of the PV failed
# Check the state of the PersistentVolumes in the cluster
kubectl get pv -o wide
# Typical output:
# NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS
# pv-db-001 100Gi RWO Retain Bound production/postgres fast-ssd
# pv-db-002 100Gi RWO Retain Available fast-ssd
# Detailed description of a PV
kubectl describe pv pv-db-001
# Check the PVCs in a namespace
kubectl get pvc -n production
kubectl describe pvc postgres-data -n production
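A PV stuck in the Released state can be returned to Available by clearing its claimRef, a common manual step when reclaimPolicy is Retain. A minimal sketch, reusing the PV name from the example above:

```shell
# Inspect the claimRef that keeps the PV in the Released state
kubectl get pv pv-db-001 -o jsonpath='{.spec.claimRef}'

# Remove the claimRef so the PV becomes Available again.
# WARNING: the old data is still on the volume; wipe it first if needed.
kubectl patch pv pv-db-001 --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef"}]'

# The PV should now show STATUS=Available
kubectl get pv pv-db-001
```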
Container Storage Interface (CSI)
Before CSI, every storage provider had to maintain plugins built into the Kubernetes source code (in-tree plugins). This created strong coupling and made it difficult to update plugins independently of Kubernetes releases. CSI solves this with a standard gRPC interface that allows third parties to ship storage drivers as independent Pods running in the cluster.
Main CSI Drivers
| CSI Driver | Provider | Storage type | ReadWriteMany |
|---|---|---|---|
| aws-ebs-csi-driver | AWS | Block (gp3, io2) | No |
| aws-efs-csi-driver | AWS | NFS (EFS) | Yes |
| gce-pd-csi-driver | GCP | Block (pd-ssd, pd-balanced) | No (use Filestore for RWX) |
| azuredisk-csi-driver | Azure | Block (Premium SSD) | No |
| azurefile-csi-driver | Azure | SMB/NFS (Azure Files) | Yes |
| csi-rook-ceph | Rook/Ceph | Block/FS/Object | Yes (CephFS) |
| longhorn | Rancher | Distributed block | Yes (via NFS) |
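To see which CSI drivers are actually installed in a given cluster, the API exposes dedicated objects. A quick check (the label selector in the last command is the one commonly used by the EBS CSI driver charts and may differ in your installation):

```shell
# List the CSI drivers registered in the cluster
kubectl get csidrivers

# Per-node view: which drivers are available on each node
kubectl get csinodes

# The driver Pods themselves usually run in kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver
```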
Installation of the CSI EBS Driver on EKS
# Install the EBS CSI driver on Amazon EKS
# First, create an IAM Role with the required policies
aws eks create-addon \
--cluster-name my-cluster \
--addon-name aws-ebs-csi-driver \
--service-account-role-arn arn:aws:iam::ACCOUNT_ID:role/EBSCSIRole
# Verify the CSI driver DaemonSet and controller Deployment
kubectl get daemonset -n kube-system ebs-csi-node
kubectl get deployment -n kube-system ebs-csi-controller
StorageClass: Dynamic Provisioning
Static provisioning (manually creating PVs) is impractical in production. With dynamic provisioning, Kubernetes automatically creates the PV when a PVC is created, using the CSI driver configured in the StorageClass.
StorageClass for AWS EBS
# storage-classes-aws.yaml
# StorageClass for gp3 disks (optimal performance)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
annotations:
storageclass.kubernetes.io/is-default-class: "false"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer # IMPORTANT: avoids cross-AZ mounting
reclaimPolicy: Retain # Protects production data
allowVolumeExpansion: true
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
kmsKeyId: "arn:aws:kms:eu-west-1:ACCOUNT:key/KEY_ID"
---
# StorageClass for io2 (high-IOPS databases)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ultra-fast-ssd
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
type: io2
iops: "32000"
encrypted: "true"
---
# StorageClass for EFS (ReadWriteMany, NFS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: shared-storage
provisioner: efs.csi.aws.com
reclaimPolicy: Retain
parameters:
provisioningMode: efs-ap
fileSystemId: fs-XXXXXXXX
directoryPerms: "700"
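Once applied, the StorageClasses can be inspected and, if desired, one of them promoted to cluster default. A sketch using the names defined above:

```shell
# Apply the StorageClasses and list them
kubectl apply -f storage-classes-aws.yaml
kubectl get storageclass

# Inspect provisioner, binding mode and reclaim policy
kubectl describe storageclass fast-ssd

# Optionally make fast-ssd the default StorageClass
kubectl patch storageclass fast-ssd \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```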
StorageClass for GKE (Google Kubernetes Engine)
# storage-classes-gke.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd-gke
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
type: pd-ssd
replication-type: regional-pd # replicated across 2 zones
availability-class: regional-hard-failover
volumeBindingMode: Why WaitForFirstConsumer
Always use WaitForFirstConsumer instead of Immediate when
the cluster spans multiple Availability Zones. With Immediate, the volume is
provisioned as soon as the PVC is created, in a zone chosen without any
knowledge of where the Pod will be scheduled. Result: the Pod can land in a
different zone and fail to mount the volume. WaitForFirstConsumer delays
provisioning until a Pod uses the PVC, so the volume is created in the same zone as the Pod.
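This behavior is visible directly on the cluster: with a WaitForFirstConsumer StorageClass, the PVC stays Pending until a Pod references it, and the events say so explicitly (output abbreviated):

```shell
# A PVC bound to a WaitForFirstConsumer StorageClass stays Pending...
kubectl get pvc postgres-data -n production
# NAME            STATUS    VOLUME   CAPACITY   STORAGECLASS
# postgres-data   Pending                       fast-ssd

# ...and the events explain why
kubectl describe pvc postgres-data -n production
# Events:
#   Normal  WaitForFirstConsumer  waiting for first consumer to be created before binding
```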
PersistentVolumeClaim in Practice
A PVC is how a Pod requests storage. The PVC specifies size, access mode and StorageClass. Kubernetes finds or creates a compatible PV and binds it to the PVC.
# pvc-database.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
namespace: production
labels:
app: postgres
tier: database
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
# Expand to 200Gi in the future with:
# kubectl patch pvc postgres-data -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
Access Mode: Choose the Right One
| Access Mode | Abbreviation | Meaning | Typical use |
|---|---|---|---|
| ReadWriteOnce | RWO | One node, read/write | Databases, single-instance applications |
| ReadOnlyMany | ROX | Many nodes, read-only | Shared static data, configurations |
| ReadWriteMany | RWX | Many nodes, read/write | Shared NFS, upload handlers |
| ReadWriteOncePod | RWOP | One Pod, read/write | Exclusive storage for a single Pod (K8s 1.29+) |
StatefulSet: Workload with Stable Identity
Deployments are great for stateless workloads, but databases and applications that require a stable identity (predictable hostname, dedicated volume, ordered startup) need a StatefulSet. The key differences compared to a Deployment:
- Stable identity: Pods have predictable names: myapp-0, myapp-1, myapp-2
- Ordered startup: Pods are started in order (0, then 1, then 2) and shut down in reverse order
- Dedicated PVCs: each Pod gets its own PVC via volumeClaimTemplates
- Headless Service: each Pod has a stable DNS entry: myapp-0.myapp.namespace.svc.cluster.local
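These stable DNS entries can be verified from inside the cluster with a throwaway Pod; a sketch using the PostgreSQL names from the next section (the busybox image is just a convenient choice, not a requirement):

```shell
# Resolve a specific replica via the headless Service
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup postgres-0.postgres.production.svc.cluster.local

# The headless Service itself resolves to ALL Pod IPs
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup postgres.production.svc.cluster.local
```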
PostgreSQL on Kubernetes with StatefulSet
Here is a complete and production-ready setup for PostgreSQL with StatefulSet, including ConfigMap for configuration, Secret for credentials, and headless Service:
# postgres-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: postgres
namespace: production
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
clusterIP: None # Headless Service - enables DNS for individual Pods
selector:
app: postgres
---
# Service for access to the master (read/write)
apiVersion: v1
kind: Service
metadata:
name: postgres-master
namespace: production
spec:
ports:
- port: 5432
selector:
app: postgres
role: master
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: production
spec:
serviceName: postgres # must match the name of the headless Service
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
# Anti-affinity: spread the replicas across different nodes
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: postgres
topologyKey: kubernetes.io/hostname
initContainers:
# init container that sets the volume permissions
- name: init-postgres
image: postgres:16
command:
- bash
- "-c"
- |
chown -R 999:999 /var/lib/postgresql/data
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
containers:
- name: postgres
image: postgres:16
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_DB
value: myapp
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-credentials
key: username
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-credentials
key: password
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
- name: postgres-config
mountPath: /etc/postgresql/postgresql.conf
subPath: postgresql.conf
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- $(POSTGRES_USER)
- -d
- $(POSTGRES_DB)
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- $(POSTGRES_USER)
- -d
- $(POSTGRES_DB)
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: postgres-config
configMap:
name: postgres-config
# PVC template: each Pod gets its own 100Gi volume
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
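The StatefulSet above references a postgres-credentials Secret via secretKeyRef, which must exist before the Pods start. A minimal sketch (the values shown are placeholders, not recommendations):

```yaml
# postgres-credentials.yaml - referenced by the StatefulSet's env section
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: production
type: Opaque
stringData:
  username: myapp_user   # placeholder - choose your own
  password: CHANGE_ME    # placeholder - generate a strong password
```

The same Secret can be created imperatively with kubectl create secret generic postgres-credentials -n production --from-literal=username=... --from-literal=password=..., which keeps credentials out of version control.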
ConfigMap for PostgreSQL Configuration
# postgres-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
namespace: production
data:
postgresql.conf: |
# Performance tuning for 4GB RAM
shared_buffers = 1GB
work_mem = 64MB
maintenance_work_mem = 256MB
effective_cache_size = 3GB
# WAL settings
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
# Checkpoint
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
# Connection settings
max_connections = 200
# Logging
log_min_duration_statement = 1000 # log slow queries >1s
log_checkpoints = on
log_connections = on
log_disconnections = on
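One caveat worth knowing: the official postgres image generates its own postgresql.conf inside PGDATA, so the file mounted at /etc/postgresql/postgresql.conf only takes effect if the server is explicitly pointed at it. A sketch of the extra container args this would require:

```yaml
# Added to the postgres container in the StatefulSet spec:
# start the server with the mounted configuration file
args:
  - "-c"
  - "config_file=/etc/postgresql/postgresql.conf"
```

To confirm the settings are live, query the running server, e.g. kubectl exec -n production postgres-0 -- psql -U <username> -d myapp -c "SHOW shared_buffers;" (the username is whatever the postgres-credentials Secret contains).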
Volume Snapshot and Backup
Volume Snapshots allow you to create point-in-time backups of PersistentVolumes using the cloud provider's native capabilities (EBS snapshot, GCE PD snapshot, etc.).
# Install the Volume Snapshot CRDs (if not already present)
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
# VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Retain
---
# Create a snapshot of the PostgreSQL volume
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: postgres-snapshot-20260801
namespace: production
spec:
volumeSnapshotClassName: ebs-snapshot-class
source:
persistentVolumeClaimName: postgres-data-postgres-0
---
# Check the snapshot status
kubectl get volumesnapshot -n production
# Restore from snapshot: create a new PVC from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data-restored
namespace: production
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
dataSource:
name: postgres-snapshot-20260801
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
Velero for Complete Cluster Backup
Velero is the reference tool for backup and restore of entire Kubernetes clusters, including PersistentVolumes:
# Install Velero with the AWS plugin
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket my-velero-backups \
--backup-location-config region=eu-west-1 \
--snapshot-location-config region=eu-west-1 \
--secret-file ./credentials-velero
# Create a backup of the production namespace including volumes
velero backup create production-backup-20260801 \
--include-namespaces production \
--snapshot-volumes \
--wait
# Verify the backup
velero backup describe production-backup-20260801
velero backup logs production-backup-20260801
# Schedule: daily backup at midnight
velero schedule create daily-production \
--schedule="0 0 * * *" \
--include-namespaces production \
--snapshot-volumes \
--ttl 720h # keep for 30 days
# Restore into a new cluster
velero restore create --from-backup production-backup-20260801 \
--namespace-mappings production:production-restored
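After a restore, verify both the Velero objects and the restored workloads. The restore name is auto-generated from the backup name, so copy it from the list (shown here as a placeholder):

```shell
# List restores and inspect details and warnings
velero restore get
velero restore describe <restore-name> --details

# Check the remapped namespace defined by --namespace-mappings
kubectl get pods,pvc -n production-restored
```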
Storage for AI/ML Workloads
ML training workloads have special storage requirements: high-throughput parallel access to large datasets, often from multiple GPU workers simultaneously.
# PVC with ReadWriteMany for distributed training
# Use EFS (AWS) or CephFS (on-premise) for RWX
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ml-dataset-storage
namespace: ml-training
spec:
accessModes:
- ReadWriteMany
storageClassName: shared-storage # EFS o CephFS
resources:
requests:
storage: 10Ti # 10TB for an ImageNet-scale dataset, etc.
---
# Training Job that accesses the data in parallel
apiVersion: batch/v1
kind: Job
metadata:
name: distributed-training
namespace: ml-training
spec:
parallelism: 8 # 8 GPU workers in parallel
completions: 8
template:
spec:
containers:
- name: trainer
image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
resources:
limits:
nvidia.com/gpu: "1"
volumeMounts:
- name: dataset
mountPath: /data
readOnly: true # all workers read, none write
- name: checkpoints
mountPath: /checkpoints
volumes:
- name: dataset
persistentVolumeClaim:
claimName: ml-dataset-storage
readOnly: true
- name: checkpoints
persistentVolumeClaim:
claimName: ml-checkpoints-rwx # RWX for shared checkpoints
Best Practices for Kubernetes Storage
Production Storage Checklist
- Always use reclaimPolicy: Retain for production data: Delete removes the data automatically when the PVC is deleted
- volumeBindingMode: WaitForFirstConsumer: avoids cross-AZ binding issues in multi-zone clusters
- allowVolumeExpansion: true: Configure StorageClasses to allow volume expansion without downtime
- Monitor disk usage: Configure alerts on Prometheus when a PVC exceeds 80% capacity
- Automatic snapshots: Configure VolumeSnapshotClass and scheduled backups
- Test the restore: an untested backup is a useless backup. Do monthly restore tests
- Separate PVCs by role: One PVC for data, one for logs, one for temporary backups
- StatefulSet with anti-affinity: Distribute replicas across different nodes and zones
Anti-Pattern: Don't Do This
- Don't use hostPath in production: Ties the Pod to a specific node and is not portable
- Don't use emptyDir for persistent data: it is deleted when the Pod is removed from its node
- Do not use reclaimPolicy: Delete for production data: You can lose everything by mistake
- Do not mount the same PVC (RWO) on multiple Pods: Causes data corruption
Storage Monitoring with Prometheus
# Key storage metrics come from the kubelet (volume stats) and kube-state-metrics (PVC phase)
# Add these alerts to Prometheus
# Alert: PVC close to maximum capacity
groups:
- name: kubernetes-storage
rules:
- alert: PVCStorageUsageHigh
expr: |
kubelet_volume_stats_used_bytes /
kubelet_volume_stats_capacity_bytes > 0.80
for: 5m
labels:
severity: warning
annotations:
summary: "PVC {{ $labels.persistentvolumeclaim }} is at 80% of capacity"
description: "Namespace: {{ $labels.namespace }}"
- alert: PVCStorageFull
expr: |
kubelet_volume_stats_used_bytes /
kubelet_volume_stats_capacity_bytes > 0.95
for: 2m
labels:
severity: critical
annotations:
summary: "PVC {{ $labels.persistentvolumeclaim }} is almost full!"
- alert: PVCNotBound
expr: |
kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "PVC {{ $labels.persistentvolumeclaim }} has been Pending for 10 minutes"
Conclusions and Next Steps
Kubernetes storage is one of the most critical layers for enterprise applications in production. The Container Storage Interface standardizes integration with any provider, dynamic provisioning with StorageClasses eliminates manual work, and StatefulSets provide the primitives needed to manage databases with stable identities.
The key to robust storage in production is a combination of correct architectural choices (reclaimPolicy Retain, WaitForFirstConsumer, anti-affinity), proactive monitoring with Prometheus, and a regularly tested backup strategy with Velero or VolumeSnapshots.