Autoscaling in Kubernetes: HPA, VPA, KEDA and Karpenter
One of the main advantages of Kubernetes in production is its ability to scale workloads automatically in response to demand. Yet most teams use only a fraction of the available autoscaling capabilities: they configure an HPA on CPU and leave it at that. The result? Under-provisioned pods that slow down under load, or over-provisioned nodes that burn budget for no reason.
Kubernetes offers four complementary levels of autoscaling: the HPA scales Pods horizontally on CPU/memory/custom metrics, the VPA fixes resource requests automatically, KEDA enables event-driven scaling on any source (queues, databases, Prometheus metrics), and Karpenter provisions nodes in under 30 seconds, 40% faster than the traditional Cluster Autoscaler according to CNCF 2025 benchmarks. This article shows how to use them together in production.
What You Will Learn
- How the Horizontal Pod Autoscaler (HPA) works with custom and external metrics
- Configure the Vertical Pod Autoscaler (VPA) for automatic rightsizing
- KEDA: Event-driven autoscaling on SQS, Kafka and Redis queues and on Prometheus metrics
- Karpenter: Just-in-time node provisioning with NodePool and NodeClass
- Combination Pattern: Use HPA and KEDA together without conflicts
- Troubleshooting: why your HPA isn't scaling as you expect
- Best practices to avoid flap loops and cold starts
Horizontal Pod Autoscaler (HPA)
The HPA is the Kubernetes component that scales the number of replicas of a Deployment, StatefulSet, or ReplicaSet based on observed metrics. The HPA controller queries the metrics every 15 seconds (configurable) and calculates the desired number of replicas with the formula:
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
To avoid flapping (continuously scaling up and down), the HPA applies a stabilization window: by default, 5 minutes for scale-down and 0 seconds for scale-up.
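As a sanity check, the formula can be evaluated directly. The sketch below is a toy reproduction of the replica computation (not the actual HPA code), with the result clamped to minReplicas/maxReplicas as the controller does:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     desired_metric: float, min_r: int, max_r: int) -> int:
    """Replica count per the HPA formula, clamped to the min/max bounds."""
    raw = math.ceil(current_replicas * (current_metric / desired_metric))
    return max(min_r, min(max_r, raw))

# 5 replicas at 90% average CPU with a 60% target -> ceil(5 * 1.5) = 8
print(desired_replicas(5, 90, 60, min_r=2, max_r=20))   # 8
```

Note how the ceiling rounds up: even a small overshoot of the target adds a replica, which is why the stabilization window matters.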
HPA on CPU and Memory
Here is the basic configuration with CPU and memory. Note that to scale on memory, the application must release memory when load decreases, otherwise scale-down never happens:
# hpa-basic.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # scale when average CPU > 60%
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: "512Mi"    # scale when average memory > 512Mi
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60             # at most double the replicas per minute
      - type: Pods
        value: 4
        periodSeconds: 60             # or at most 4 Pods per minute
      selectPolicy: Max               # use the more aggressive policy
    scaleDown:
      stabilizationWindowSeconds: 300 # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60             # remove at most 25% of replicas per minute
      selectPolicy: Min
HPA with Custom Metrics via Prometheus Adapter
To scale on application metrics (requests per second, queue length, etc.), you need the Prometheus Adapter, which exposes Prometheus metrics through the Kubernetes Custom Metrics API:
# Install the Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://kube-prometheus-stack-prometheus.monitoring.svc \
  --set prometheus.port=9090

# prometheus-adapter-config.yaml - metric mapping rule
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
# hpa-custom-metric.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-rps-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"   # 1000 req/s per Pod
HPA with External Metrics
External metrics let you scale on sources outside the cluster, such as the length of an SQS queue or the number of unconsumed Kafka messages:
# hpa-external-metric.yaml
# Scale based on the length of an SQS queue
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-queue-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metric:
        name: sqs_approximate_number_of_messages_visible
        selector:
          matchLabels:
            queue: "job-queue-prod"
      target:
        type: AverageValue
        averageValue: "10"   # 10 messages per worker
# Check the HPA status
kubectl get hpa -n production -w
kubectl describe hpa api-server-hpa -n production
Vertical Pod Autoscaler (VPA)
The VPA monitors the actual CPU and memory usage of your Pods and automatically adjusts their resources.requests and limits. It is the answer to the "garbage in, garbage out" problem of resource requests: if you don't know how many resources your Pod needs, the VPA finds out for you.
VPA and HPA: Beware of Conflicts
Do not use the VPA in Auto mode together with an HPA that scales on CPU or memory: the two controllers will conflict. The correct combination is: VPA in Off or Initial mode for resource requests, and HPA on custom metrics for replicas. Alternatively, use KEDA instead of the HPA to avoid the problem.
VPA Installation and Configuration
# Install the VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# or with Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace vpa --create-namespace
---
# vpa-recommendation.yaml - Off mode (recommendations only)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # Off|Initial|Recreate|Auto
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "4Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
# Read the VPA recommendations
kubectl describe vpa api-server-vpa -n production
# Typical output:
# Recommendation:
#   Container Recommendations:
#     Container Name: api-server
#     Lower Bound:     cpu: 100m, memory: 256Mi
#     Target:          cpu: 450m, memory: 512Mi
#     Uncapped Target: cpu: 450m, memory: 512Mi
#     Upper Bound:     cpu: 2000m, memory: 2Gi
VPA in Auto Mode
# vpa-auto.yaml - updates resources automatically (restarts Pods)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: background-worker-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  updatePolicy:
    updateMode: "Auto"   # restarts Pods with the new resources
    minReplicas: 2       # do not evict if fewer than 2 replicas are running
  resourcePolicy:
    containerPolicies:
    - containerName: worker
      minAllowed:
        cpu: "200m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
KEDA: Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF project that extends the HPA with 60+ pre-built scalers: AWS SQS, Azure Service Bus, Kafka, RabbitMQ, Redis, Prometheus, Datadog, and many more. KEDA can scale a Deployment down to 0 replicas when there are no events, and back up to 1 when the first event arrives.
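Conceptually, for a queue scaler KEDA hands the HPA an AverageValue target, so the desired replica count is roughly the lag divided by the per-replica threshold, with an explicit zero case. This simplified sketch (function names are illustrative, not KEDA internals) captures the idea:

```python
import math

def keda_desired_replicas(queue_length: int, per_replica_target: int,
                          min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for a KEDA queue scaler."""
    if queue_length == 0:
        return min_replicas      # with minReplicaCount: 0 this scales to zero
    raw = math.ceil(queue_length / per_replica_target)
    return max(min_replicas, min(max_replicas, raw))

# 750 messages of Kafka lag with lagThreshold "100" -> 8 consumers
print(keda_desired_replicas(750, 100, min_replicas=0, max_replicas=50))  # 8
```

The zero-to-one transition is the part the HPA alone cannot do: KEDA's own controller activates the workload, then delegates 1-to-N scaling to the HPA it manages.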
KEDA installation
# Install KEDA via Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.14.0

# Verify
kubectl get pods -n keda
ScaledObject for Kafka
A worker consuming from a Kafka topic scales based on consumer group lag:
# keda-kafka-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  pollingInterval: 15   # check every 15 seconds
  cooldownPeriod: 30    # wait 30s before scaling to 0
  minReplicaCount: 0    # scale to zero when there are no messages
  maxReplicaCount: 50
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker.kafka.svc:9092
      consumerGroup: my-consumer-group
      topic: orders-topic
      lagThreshold: "100"   # 100 messages per replica
      offsetResetPolicy: latest
      allowIdleConsumers: "false"
      scaleToZeroOnInvalidOffset: "false"
    authenticationRef:
      name: kafka-auth   # TriggerAuthentication with the Kafka credentials
ScaledObject for AWS SQS
# keda-sqs-scaledobject.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: production
data:
  AWS_ACCESS_KEY_ID: BASE64_KEY
  AWS_SECRET_ACCESS_KEY: BASE64_SECRET
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-trigger-auth
  namespace: production
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID
    name: aws-credentials
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: aws-credentials
    key: AWS_SECRET_ACCESS_KEY
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-trigger-auth
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/job-queue
      queueLength: "5"   # 5 messages per replica
      awsRegion: eu-west-1
      identityOwner: pod # use IRSA when available
ScaledObject on Prometheus
# keda-prometheus-scaledobject.yaml
# Scale based on a custom Prometheus query
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-latency-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc:9090
      metricName: http_request_duration_p99
      threshold: "0.5"   # scale when P99 latency > 500ms
      query: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="api-server"}[2m])) by (le))

# Check the KEDA status
kubectl get scaledobject -n production
kubectl describe scaledobject kafka-consumer-scaler -n production
Karpenter: Just-in-Time Node Provisioning
Karpenter is a next-generation node provisioner created by AWS and now a CNCF project. Unlike the Cluster Autoscaler, which works with predefined node groups, Karpenter provisions nodes with the exact characteristics that pending Pods require: instance type, zone, on-demand or spot capacity, CPU/GPU. The result: provisioning in 30-60 seconds versus 3-5 minutes with the Cluster Autoscaler.
Karpenter Architecture
Karpenter completely replaces the Cluster Autoscaler. It has two main CRDs:
- NodePool: defines the requirements of the nodes that Karpenter can create (instance types, zones, taints, labels, limits)
- NodeClass (EC2NodeClass on AWS): cloud-provider-specific configuration (AMI, subnet, security groups, user data)
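To build intuition for what happens when Pods go pending, here is a deliberately simplified sketch of Karpenter's selection idea: from the instance types the NodePool allows, pick the cheapest one that covers the aggregate requests. Real Karpenter bin-packs across many dimensions and uses live pricing; the catalog and prices below are made up for illustration:

```python
# Toy catalog: (name, vCPU, memory GiB, hourly price). Prices are illustrative.
CATALOG = [
    ("c6i.xlarge",  4,  8, 0.17),
    ("m6i.xlarge",  4, 16, 0.19),
    ("m6i.2xlarge", 8, 32, 0.38),
    ("r6i.2xlarge", 8, 64, 0.50),
]

def pick_instance(pending_cpu: float, pending_mem_gib: float) -> str:
    """Cheapest catalog entry whose capacity covers the aggregate requests."""
    fitting = [(price, name) for name, cpu, mem, price in CATALOG
               if cpu >= pending_cpu and mem >= pending_mem_gib]
    if not fitting:
        raise ValueError("no instance type fits the pending Pods")
    return min(fitting)[1]

# Pending Pods requesting a total of 6 vCPU and 20 GiB -> m6i.2xlarge
print(pick_instance(6, 20))
```

This is why accurate resource requests matter for Karpenter too: the requests of pending Pods are the only signal it has when sizing new nodes.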
Karpenter installation on EKS
# Prerequisite: IRSA configured for Karpenter
export CLUSTER_NAME="my-production-cluster"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=eu-west-1

# Install Karpenter with Helm (1.x charts are published to the public ECR OCI registry)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version 1.0.0 \
  --set serviceAccount.annotations."eks.amazonaws.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=${CLUSTER_NAME} \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi
NodePool and EC2NodeClass for Production
# karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        node-type: general-purpose
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand", "spot"]   # prefers spot, falls back to on-demand
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]         # compute, memory, general purpose
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["2"]                   # only generation 3+ instances
      - key: karpenter.k8s.aws/instance-cpu
        operator: In
        values: ["4", "8", "16", "32"]
      taints: []
      expireAfter: 720h                 # recycle nodes every 30 days
      terminationGracePeriod: 48h
  limits:
    cpu: "500"        # max 500 vCPUs in this NodePool
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m   # consolidate empty nodes after 1 minute
    budgets:
    - nodes: "20%"         # do not drain more than 20% of the nodes at a time
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  amiSelectorTerms:
  - alias: al2023@latest   # always use the latest AMI
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: "my-production-cluster"
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "my-production-cluster"
  instanceProfile: KarpenterNodeInstanceProfile
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeSize: 100Gi
      volumeType: gp3
      iops: 3000
      encrypted: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1   # security: blocks IMDS access from containers
    httpTokens: required         # require IMDSv2
NodePool for GPU Workloads
# karpenter-gpu-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-nodes
spec:
  template:
    metadata:
      labels:
        node-type: gpu
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-nodeclass
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]   # spot GPUs are not available in every zone
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["g5", "p3", "p4d"]   # GPU instance families
      taints:
      - key: nvidia.com/gpu
        effect: NoSchedule      # only Pods that tolerate this taint
  limits:
    cpu: "128"
    memory: 1024Gi
    nvidia.com/gpu: "32"        # max 32 GPUs in this NodePool
Consolidation and Cost Optimization
# Remove the do-not-disrupt annotation so consolidation can proceed (useful for testing)
kubectl annotate node <node-name> karpenter.sh/do-not-disrupt-

# List the nodes created by Karpenter
kubectl get nodes -l karpenter.sh/nodepool=general-purpose -o wide

# Watch Karpenter's decisions in real time
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -E "launched|terminated|consolidated"

# Inspect the NodeClaims Karpenter created (pair with Kubecost for per-node cost)
kubectl get nodeclaims -o json | jq '.items[] | {name: .metadata.name, providerID: .status.providerID, nodepool: .metadata.labels["karpenter.sh/nodepool"]}'
Combine HPA, KEDA and Karpenter
In a mature production cluster, these three components work in synergy:
- KEDA scales Pods from 0 to N based on events (Kafka lag, SQS queue depth, Prometheus queries)
- Karpenter detects pending Pods and provisions nodes with the exact characteristics required in 30-60 seconds
- VPA (in Off mode) provides resource request recommendations that you apply manually or via CI/CD pipeline
Recommended Pattern for Production
- Stateless API servers: KEDA on Prometheus (P99 latency) + Karpenter general-purpose NodePool
- Queue workers: KEDA on SQS/Kafka with minReplicaCount=0 + Karpenter with an on-demand/spot mix
- Databases/StatefulSets: VPA in Auto mode with minReplicas >= 2, no HPA on memory
- Batch jobs: KEDA ScaledJob (not ScaledObject) for Kubernetes Jobs that run to completion
- Do not use an HPA on CPU together with KEDA on the same workload: the two controllers issue conflicting scaling decisions
KEDA ScaledJob for Batch
# keda-scaledjob.yaml - for batch jobs that run to completion
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: ml-training-job
  namespace: production
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: trainer
          image: my-registry/ml-trainer:latest
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
              nvidia.com/gpu: "1"
            limits:
              nvidia.com/gpu: "1"
        tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 20
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 3
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-trigger-auth
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789/ml-jobs
      queueLength: "1"   # 1 Job per message
      awsRegion: eu-west-1
HPA and KEDA troubleshooting
# HPA not scaling? Check its status
kubectl describe hpa api-server-hpa -n production
# Look for: "AbleToScale", "ScalingActive", "DesiredReplicas"
# Common error: "failed to get cpu utilization" = metrics-server is not installed

# Verify that metrics-server works
kubectl top pods -n production
kubectl top nodes

# KEDA not scaling to zero? Check the cooldownPeriod
kubectl get scaledobject kafka-consumer-scaler -n production -o yaml | grep -A5 "conditions"

# Karpenter not provisioning?
kubectl get pods --field-selector=status.phase=Pending -A
kubectl describe pod <pending-pod> | grep "Events" -A20
# Look for: "0/N nodes are available" + the reason the Pod is pending

# View the Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50 | grep -i "error\|warning\|launched"

# Protect a Pod from Karpenter disruption (e.g. while debugging)
kubectl annotate pod <pod-name> karpenter.sh/do-not-disrupt=true
Best Practices and Anti-Patterns
Best Practices for Autoscaling
- Always set minReplicas >= 2 for critical services: scaling from 0 implies a cold start; for production APIs, keep at least 2 replicas at all times
- Use PodDisruptionBudgets: prevent Karpenter/HPA from draining too many Pods during consolidation
- Configure accurate resource requests: the HPA computes percentage utilization against resources.requests; if requests are too high, utilization stays low and the HPA never scales up; if they are too low, it scales constantly
- Strict readiness probes: Kubernetes waits for a Pod to be Ready before sending it traffic; without readiness probes, freshly scaled Pods receive traffic before they are ready
- Monitor flapping: if the HPA scales up and down every few minutes, increase the stabilizationWindowSeconds of the scale-down
- Use topology spread constraints with Karpenter: distribute Pods across zones for high availability even during provisioning
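The point about accurate resource requests is worth making concrete: the same absolute CPU usage yields very different utilization percentages depending on the request, and that percentage is what the HPA compares against its target. The numbers below are illustrative:

```python
def hpa_utilization(usage_millicores: int, request_millicores: int) -> int:
    """Utilization percentage as the HPA sees it: usage / request."""
    return round(100 * usage_millicores / request_millicores)

# A Pod actually using 400m of CPU, measured against a 60% target:
print(hpa_utilization(400, 500))    # 80  -> over target, scales up
print(hpa_utilization(400, 2000))   # 20  -> requests too high, never scales up
print(hpa_utilization(400, 100))    # 400 -> requests too low, scales aggressively
```

This is why rightsizing requests (for example from VPA recommendations) comes before any HPA tuning: with bad requests, every target is the wrong target.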
Anti-Patterns to Avoid
- HPA without resource requests defined: the HPA cannot compute percentage utilization without requests in the container spec
- VPA Auto + HPA on CPU/memory: the two controllers compete over the same resources and cause inconsistent scaling; use KEDA on custom metrics if you want both
- maxReplicas too low: if your peak traffic needs 100 Pods but maxReplicas is 20, autoscaling is not enough and the service degrades
- Karpenter without disruption budgets: without disruption.budgets, Karpenter can drain 100% of the nodes during a night-time consolidation
- Polling interval too low in KEDA: a pollingInterval of 5 seconds against external sources (SQS, external APIs) generates too many API calls and possible throttling
Conclusions and Next Steps
Effective autoscaling in Kubernetes is not a single solution but a strategy on multiple levels: KEDA for event-driven scaling of Pods, HPA for usage-based scaling on resource metrics, VPA to optimize resource requests, and Karpenter for fast node provisioning. Used together, these tools can cut costs by 30-50% compared to statically provisioned clusters while maintaining high SLAs.
The keys to success are accurate resource requests (the VPA helps here), choosing the right metrics to scale on (CPU is not always the answer), and tuning the scaling behavior (stabilization windows, rate limiting) to avoid oscillations that worsen performance instead of improving it.