FinOps for Kubernetes: Rightsizing, Spot Instances and Cost Reduction
68% of organizations running Kubernetes in production spend 20-40% more than necessary (CNCF Cost Survey 2026). Not because Kubernetes is inefficient, but because generous provisioning is easy while rightsizing requires operational discipline. The result is a cluster with average CPU usage at 15-20% and memory at 30-40%: resources paid for but never used.
FinOps for Kubernetes is the practice of measuring, optimizing and governing cluster spending. In this article we will see how to use Kubecost for per-namespace cost visibility, how VPA helps with automatic rightsizing of requests, how Karpenter optimizes node bin-packing with spot instances, and which policies to implement at the organizational level to keep costs under control over time.
What You Will Learn
- Install and configure Kubecost for namespace/team cost visibility
- VPA in Off mode: Automatic rightsizing recommendations without disruption
- Karpenter: Consolidation for optimal bin-packing of nodes
- Spot instances on AWS/GCP/Azure with Karpenter and interrupt management
- ResourceQuota as a team budgeting tool
- Alert Prometheus for waste and cost anomalies
- How to build a chargeback system for teams
- Benchmark: real savings achievable with each technique
Why Kubernetes Tends to be Overprovisioned
Before optimizing, you need to understand the causes of the problem:
- Conservative Requests: Teams set requests high "to be safe" (e.g. 1 CPU for an app that uses 0.1). Requests determine the number of nodes needed
- Limits = requests: Many CI/CD templates set limits equal to requests, creating systematic over-provisioning
- Empty namespaces: Development namespace with "Hello World" Pods that stay on 24/7
- Oversized nodes: Large cloud instances with inefficient bin-packing (a 2 vCPU Pod alone on a 32 vCPU node)
- No visibility for teams: If teams don't see their cost, they have no incentive to optimize
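To see the gap between requested and used resources concretely before installing any tooling, you can compare a pod's CPU request with its live usage (`kubectl top` requires metrics-server). A minimal sketch; `cpu_waste_pct` is a helper name of our own choosing:

```shell
# Compute what percentage of a CPU request goes unused.
cpu_waste_pct() {
  # args: requested millicores, used millicores -> waste percentage
  awk -v req="$1" -v use="$2" 'BEGIN { printf "%.0f", (req - use) / req * 100 }'
}

# Example: a pod requesting 1000m but using 150m wastes 85% of its request
cpu_waste_pct 1000 150

# In a real cluster, feed it from the pod spec (requests) and metrics-server (usage):
#   kubectl get pod <pod> -o jsonpath='{.spec.containers[0].resources.requests.cpu}'
#   kubectl top pod <pod> --no-headers
```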
Kubecost: Visibility of Namespace Costs
Kubecost is the most widely used FinOps tool in the Kubernetes ecosystem. It collects metrics from Prometheus, retrieves cloud instance pricing (AWS, GCP, Azure) and calculates the allocated cost per namespace, deployment, service account and label.
Kubecost installation
# Install Kubecost with Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token-here" \
--set prometheus.enabled=true \
--set grafana.enabled=true \
--set global.prometheus.enabled=false \
--set global.prometheus.fqdn="http://prometheus.monitoring.svc:9090"
# Port-forward to access the UI
kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost &
# Open: http://localhost:9090
# Open-source alternative without a token: OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace
Configure Chargeback for Teams
# Kubecost uses Kubernetes labels for cost allocation
# Configure standard labels for all workloads:
# In each application's values.yaml (Helm):
podLabels:
  team: "team-alpha"
  cost-center: "CC-2024-ENG"
  environment: "production"
  product: "checkout-service"
# Kubecost automatically shows costs aggregated by these labels.
# E.g.: total team-alpha cost for the month = CPU + Memory + Storage + GPU + Networking
# Kubecost API for automated reports (e.g. sent via weekly email):
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=label:team&accumulate=true" \
| jq '.data[0] | to_entries[] | {team: .key, cost: .value.totalCost}'
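Reports like this are usually pushed to email or Slack on a schedule. A hedged sketch of that wiring, assuming the allocation endpoint above and a `SLACK_WEBHOOK_URL` you provide; `format_cost_report` is a hypothetical helper and the `__idle__` filter assumes Kubecost's idle-cost key:

```shell
# Turn a Kubecost allocation response into "team: EUR <cost>" lines,
# most expensive team first.
format_cost_report() {
  # stdin: Kubecost allocation JSON
  jq -r '.data[0] | to_entries[]
         | select(.key != "__idle__")
         | [.key, (.value.totalCost | tostring)] | join(": EUR ")' \
  | sort -t' ' -k3 -rn
}

# Weekly cron job (illustrative):
# report=$(curl -s "http://kubecost:9090/model/allocation?window=7d&aggregate=label:team&accumulate=true" | format_cost_report)
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "{\"text\": \"Weekly Kubernetes cost report:\n${report}\"}" "$SLACK_WEBHOOK_URL"
```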
VPA for Automatic Rightsizing
The Vertical Pod Autoscaler (VPA) analyzes the historical CPU and memory consumption of Pods and calculates optimal requests. In Off mode (recommendations without automatic application) it is the safest tool for rightsizing: it provides precise numbers without risk of disruption.
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Or with Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
--namespace vpa \
--create-namespace
---
# vpa-recommendation.yaml - Off mode: recommendations only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: team-alpha
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # does not apply changes automatically, only recommends
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 8Gi
# Read the recommendations after 24-48 hours:
kubectl describe vpa api-service-vpa -n team-alpha
# Output:
#   Container Recommendations:
#     Container Name: api
#     Lower Bound:
#       Cpu:     120m
#       Memory:  256Mi
#     Target:            <-- use these values for the requests
#       Cpu:     250m
#       Memory:  512Mi
#     Upper Bound:
#       Cpu:     800m
#       Memory:  1500Mi
#     Uncapped Target:
#       Cpu:     230m
#       Memory:  480Mi
Script for Rightsizing Analysis in Bulk
# List VPA target recommendations across all namespaces
# (compare them with current requests to spot overprovisioned Deployments)
kubectl get vpa -A -o json | jq -r '
.items[] |
.metadata.namespace + "/" + .metadata.name + ": " +
(.status.recommendation.containerRecommendations[]? |
"target CPU=" + .target.cpu + " Mem=" + .target.memory)
'
#!/bin/bash
# Bash script to generate a rightsizing report
echo "=== Rightsizing Recommendations ==="
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
echo "--- Namespace: $ns ---"
kubectl get vpa -n "$ns" -o custom-columns=\
"NAME:.metadata.name,\
TARGET-CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
TARGET-MEM:.status.recommendation.containerRecommendations[0].target.memory" \
2>/dev/null
done
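Once a Target has been stable for a few days, it can be copied back into the Deployment's requests. A hedged sketch reusing the example names from the VPA manifest above; `build_requests_patch` is a hypothetical helper, and the generated patch should be reviewed before applying:

```shell
# Build a strategic-merge patch that sets a container's CPU/memory requests.
build_requests_patch() {
  # args: container name, target cpu, target memory -> JSON patch body
  jq -cn --arg name "$1" --arg cpu "$2" --arg mem "$3" \
    '{spec: {template: {spec: {containers: [
       {name: $name, resources: {requests: {cpu: $cpu, memory: $mem}}}]}}}}'
}

# Read the VPA Target and patch the Deployment (illustrative):
# cpu=$(kubectl get vpa api-service-vpa -n team-alpha -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
# mem=$(kubectl get vpa api-service-vpa -n team-alpha -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}')
# kubectl patch deployment api-service -n team-alpha --patch "$(build_requests_patch api "$cpu" "$mem")"
```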
Karpenter: Consolidation and Spot Instances
Karpenter optimizes costs from two directions: it uses spot instances to reduce the per-node cost by 60-70%, and it consolidates nodes to remove underutilized ones (bin-packing).
NodePool with Spot + Consolidation
# karpenter-cost-optimized.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Instance types suited to bin-packing (several sizes)
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m7i.xlarge    # 4 vCPU, 16GB ~$0.20/h on-demand, ~$0.05/h spot
            - m7i.2xlarge   # 8 vCPU, 32GB
            - m7i.4xlarge   # 16 vCPU, 64GB
            - m7i.8xlarge   # 32 vCPU, 128GB
            - c7i.2xlarge   # CPU-optimized for compute-intensive workloads
            - r7i.2xlarge   # Memory-optimized for RAM-heavy workloads
        # Mix of spot and on-demand
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot       # spot preferred (cheaper)
            - on-demand  # on-demand fallback
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # named "WhenUnderutilized" before Karpenter v1
    consolidateAfter: 1m  # consolidate underutilized nodes quickly
  limits:
    cpu: "1000"      # max 1000 vCPU total
    memory: "4000Gi" # max 4TB RAM total
---
# Spot interruption handling (AWS sends a 2-minute notice)
# Karpenter can drain interrupted nodes natively when configured with an SQS
# interruption queue (--set settings.interruptionQueue=<queue-name> on its chart);
# aws-node-termination-handler remains an option for nodes Karpenter does not manage:
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.eu-west-1.amazonaws.com/123456789/NodeTerminationHandler
Savings Analysis with Karpenter
# Check what percentage of nodes are spot
kubectl get nodes -o json | jq '
[.items[] |
{type: .metadata.labels["karpenter.sh/capacity-type"],
instance: .metadata.labels["node.kubernetes.io/instance-type"]}
] |
group_by(.type) |
map({type: .[0].type, count: length})'
# Example output:
# [{"type": "on-demand", "count": 3}, {"type": "spot", "count": 12}]
# 80% of nodes are spot -> average saving 65% = -$2400/month
# Kubecost: view spot vs on-demand cost over time
# Go to Cost Allocation > Filter by node type
Namespace Budget: ResourceQuota as a FinOps Tool
ResourceQuotas aren't just for technical isolation — they're also a tool of financial governance. By assigning quotas based on budgets, you create a system of incentives that push teams to optimize:
# budget-quota-team.yaml
# Derive the quotas from the team's monthly budget.
# Example: 500 EUR/month at a blended rate of ~0.115 EUR per requested
# vCPU-hour (CPU plus proportional memory): 500 / (720 h * 0.115) ~= 6 vCPU
# of continuous requests. The rate is illustrative; use your own pricing.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-budget-quota
  namespace: team-alpha
  annotations:
    finops/monthly-budget-eur: "500"
    finops/last-reviewed: "2026-01-15"
    finops/owner: "alice@company.com"
spec:
  hard:
    # Based on the EUR/month budget
    requests.cpu: "6"        # ~500 EUR/month at the blended rate above
    requests.memory: "12Gi"  # proportional
    requests.storage: "200Gi"
---
# Alert when the team approaches its budget
# NOTE: the kubecost_* metric names below are illustrative; check the
# metrics actually exported by your Kubecost version
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: finops-budget-alerts
  namespace: monitoring
spec:
  groups:
    - name: finops
      rules:
        - alert: TeamBudgetUsageHigh
          expr: |
            kubecost_namespace_allocation_cpu_cost_hourly * 24 * 30 +
            kubecost_namespace_allocation_memory_cost_hourly * 24 * 30 > 400
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Team {{ $labels.namespace }}: projected monthly cost > 400 EUR"
Savings Strategies for Different Workloads
| Workload type | Strategy | Expected Savings |
|---|---|---|
| Web API (stateless) | Spot + HPA + rightsizing VPA | 50-65% |
| ML batch jobs | Spot + checkpoint + scale-to-zero | 60-70% |
| Databases (stateful) | On-demand reserved + rightsizing | 20-30% |
| CI/CD runners | Spot + scale-to-zero (KEDA) | 70-80% |
| Critical services 24/7 | Reserved instances + rightsizing | 30-40% |
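The "checkpoint" entry for ML batch jobs means the workload must tolerate the 2-minute spot notice: trap SIGTERM, persist progress, and exit cleanly so the rescheduled Pod can resume. A minimal sketch of that pattern; the checkpoint path and the 5 units of work are illustrative:

```shell
#!/bin/sh
# Spot-tolerant batch loop: persist progress so a rescheduled Pod can
# resume after an interruption. CHECKPOINT path is an example.
CHECKPOINT="${CHECKPOINT:-/tmp/job.checkpoint}"
step=$( [ -f "$CHECKPOINT" ] && cat "$CHECKPOINT" || echo 0 )

save_and_exit() {
  echo "$step" > "$CHECKPOINT"   # persist progress on SIGTERM (spot notice)
  exit 0
}
trap save_and_exit TERM

while [ "$step" -lt 5 ]; do
  step=$((step + 1))             # one unit of work per iteration
  echo "$step" > "$CHECKPOINT"   # checkpoint after each unit
done
```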
CI/CD Job Optimization with KEDA + Spot
# CI/CD runners are idle most of the time:
# scale to 0 when there are no jobs, and run them on spot
# KEDA ScaledJob for GitHub Actions runners
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: github-actions-runner
  namespace: actions-runners
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never  # required for Job Pods
        containers:
          - name: runner
            image: summerwind/actions-runner:latest
            resources:
              limits:
                cpu: "2"
                memory: "4Gi"
        nodeSelector:
          karpenter.sh/capacity-type: spot  # run runners on spot
        tolerations:
          - key: "spot"
            operator: "Exists"
            effect: "NoSchedule"
  pollingInterval: 30
  minReplicaCount: 0   # scale to 0 when there are no jobs
  maxReplicaCount: 20  # at most 20 parallel runners
  triggers:
    - type: github-runner
      metadata:
        owner: "myorg"
        repos: "myrepo"
        targetWorkflowQueueLength: "1"
FinOps Dashboard with Grafana
# PromQL queries for a Kubernetes FinOps dashboard
# Total hourly cluster cost per namespace (0.033 EUR/vCPU-hour and 0.004 EUR/GiB-hour are example rates)
sum(
kube_pod_container_resource_requests{resource="cpu"} * 0.033 +
kube_pod_container_resource_requests{resource="memory"} / 1073741824 * 0.004
) by (namespace)
# CPU efficiency per namespace (actual usage / requested)
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace) /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace) * 100
# Pods with requests far above real consumption (waste > 70%)
(
kube_pod_container_resource_requests{resource="cpu"} -
rate(container_cpu_usage_seconds_total[24h])
) / kube_pod_container_resource_requests{resource="cpu"} > 0.7
# Import the Kubecost Grafana dashboard: ID 11270
FinOps Best Practices for Kubernetes
Monthly FinOps Checklist
- Review VPA recommendations: Analyze VPA recommendations and apply rightsizing to the 10 biggest sources of waste
- Check idle nodes: Nodes with CPU and memory usage < 10% for 24 hours must be terminated (Karpenter does this automatically)
- Check spot coverage: The goal is 70-80% of non-stateful nodes on spot instances
- Chargeback reports: Send monthly cost report per team via Kubecost API → email or Slack
- Review ResourceQuota: Increase quotas only with business justification, not by default
- Scale-to-zero for dev environments: Development environments should be shut down outside business hours (-65% cost)
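For the scale-to-zero item, a minimal sketch driven by cron: Deployments in namespaces labeled `environment=dev` (an assumed labeling convention) are scaled to 0 outside an 08:00-19:00 business window:

```shell
# Decide whether a given hour (0-23) falls in the business window.
in_business_hours() {
  [ "$1" -ge 8 ] && [ "$1" -lt 19 ]
}

# Scale every Deployment in dev-labeled namespaces to the given replica count.
scale_dev_namespaces() {
  replicas="$1"
  for ns in $(kubectl get ns -l environment=dev -o jsonpath='{.items[*].metadata.name}'); do
    kubectl scale deployment --all --replicas="$replicas" -n "$ns"
  done
}

# Example crontab entries (down at 19:00, up at 08:00, Mon-Fri):
# 0 19 * * 1-5  scale_dev_namespaces 0
# 0 8  * * 1-5  scale_dev_namespaces 1
```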
Conclusions and Next Steps
FinOps for Kubernetes is not a one-time optimization: it is an ongoing process that requires visibility (Kubecost), intelligent recommendations (VPA), efficient provisioning (Karpenter with consolidation and spot instances) and organizational governance (ResourceQuota as budget). By implementing these techniques together, organizations typically achieve savings of 35-55% on their Kubernetes cloud bill within the first 3-6 months.
The next step is to integrate Kubernetes FinOps into the development cycle: developers should see the estimated cost of their deployment before merging to production, a "shift-left" approach applied to costs.







