Cluster observability: Prometheus, Grafana and OpenTelemetry
“If you can't measure it, you can't manage it.” For a production Kubernetes cluster, this means having visibility along three dimensions: metrics (how much each component consumes), logs (what is happening in the system), and traces (how requests move through the microservices). The Prometheus + Grafana + Loki + OpenTelemetry stack is the open-source answer to this need.
In this article we will build a complete observability stack for Kubernetes: we will install kube-prometheus-stack for infrastructure metrics, configure Loki for log aggregation, and integrate the OpenTelemetry Collector to collect distributed traces from applications and forward them to Tempo (Grafana's tracing backend). The result is a unified observability platform surfaced in Grafana.
What You Will Learn
- Install kube-prometheus-stack: Prometheus Operator, kube-state-metrics, Node Exporter
- Create ServiceMonitor and PodMonitor for automatic app scraping
- PrometheusRule for critical cluster alerts (OOMKill, CrashLoopBackOff, etc.)
- Loki + Promtail for aggregated logs with Kubernetes labels
- OpenTelemetry Collector: configurable telemetry pipeline
- Grafana Tempo for distributed tracing
- Prebuilt Grafana dashboards for Kubernetes
- Metrics-log-trace correlation in Grafana (Exemplars)
Architecture of the Observability Stack
Before installing, let's understand how the components relate to each other:
- Prometheus: Collects metrics via HTTP scraping; stores data for 15-30 days
- kube-state-metrics: Exposes metrics on the status of K8s objects (Deployment, Pod, etc.)
- Node Exporter: Exposes node hardware metrics (CPU, disk, network)
- Loki: Aggregates Pod logs. It does not index log content, only the labels
- Promtail: DaemonSet that sends container logs to Loki
- OpenTelemetry Collector: Receives traces/metrics/logs from apps and routes them to backends
- Grafana Tempo: Backend for distributed tracing (traces)
- Grafana: Unified UI to view metrics (Prometheus), logs (Loki), and traces (Tempo)
Installing kube-prometheus-stack
# Install kube-prometheus-stack (includes Prometheus, Alertmanager, Grafana, kube-state-metrics, Node Exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Production values.yaml
cat > kube-prometheus-values.yaml << 'EOF'
# Prometheus
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: "50GB"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ssd
          resources:
            requests:
              storage: 100Gi
    # Scrape all ServiceMonitors/PodMonitors in the cluster
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi

# Alertmanager
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: ssd
          resources:
            requests:
              storage: 10Gi
  config:
    global:
      slack_api_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    route:
      receiver: 'slack-critical'
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      routes:
        - receiver: 'slack-critical'
          matchers:
            - alertname =~ ".*Critical.*"
        - receiver: 'slack-warning'
          matchers:
            - severity = warning
    receivers:
      - name: 'slack-critical'
        slack_configs:
          - channel: '#alerts-critical'
            send_resolved: true
            title: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
      - name: 'slack-warning'
        slack_configs:
          - channel: '#alerts-warning'
            send_resolved: true

# Grafana
grafana:
  enabled: true
  ingress:
    enabled: true
    hosts:
      - grafana.company.com
  persistence:
    enabled: true
    size: 10Gi
  # Pre-configured Loki datasource
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki.monitoring.svc:3100
      jsonData:
        derivedFields:
          - datasourceUid: tempo
            matcherRegex: '"traceID":"(\w+)"'
            name: TraceID
            url: '${__value.raw}'
    - name: Tempo
      type: tempo
      url: http://tempo.monitoring.svc:3100

# kube-state-metrics
kube-state-metrics:
  metricLabelsAllowlist:
    - pods=[team,environment,app]
    - deployments=[team,environment,app]
EOF
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --version 65.0.0 \
  -f kube-prometheus-values.yaml

# Verify
kubectl get pods -n monitoring
kubectl get servicemonitors -A
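Once the stack is running, you can sanity-check it by querying the Prometheus HTTP API directly. A minimal sketch, assuming a port-forward to the Prometheus Service on localhost:9090 (the exact Service name depends on your Helm release name, so adjust it as needed):

```python
import urllib.parse

def build_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus' /api/v1/query endpoint."""
    return f"{base}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"

# Check that node-exporter targets are up
url = build_query_url("http://localhost:9090", 'up{job="node-exporter"}')
print(url)

# To execute the query (requires the port-forward to be active):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     result = json.load(resp)
#     print(result["status"])
```

Start the port-forward with `kubectl port-forward -n monitoring svc/<prometheus-service> 9090` before running the query.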
ServiceMonitor for Application Scraping
The Prometheus Operator uses ServiceMonitor and PodMonitor resources to configure application metrics scraping dynamically. There is no need to modify the Prometheus configuration by hand:
# servicemonitor-api-service.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-service-monitor
  namespace: team-alpha-production
  labels:
    team: team-alpha        # label used to select this monitor
spec:
  selector:
    matchLabels:
      app: api-service      # selects the Service with this label
  endpoints:
    - port: metrics         # port name in the Service
      interval: 30s
      path: /metrics
      # Basic auth if the metrics endpoint is protected
      # basicAuth:
      #   username: { name: metrics-auth, key: username }
      #   password: { name: metrics-auth, key: password }
  namespaceSelector:
    matchNames:
      - team-alpha-production
---
# Add the metrics port to the application's Service
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: team-alpha-production
  labels:
    app: api-service
spec:
  selector:
    app: api-service
  ports:
    - name: http
      port: 8080
    - name: metrics   # dedicated metrics port
      port: 9090
      targetPort: 9090
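The `metrics` port declared above must serve the Prometheus text exposition format. A real service would use a Prometheus client library; as a stdlib-only sketch (the metric name and counter are illustrative), here is what the scraped endpoint actually returns:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative in-memory counter; a client library would manage this for you
REQUEST_COUNT = {"GET /healthz": 0}

def render_metrics() -> str:
    """Emit the Prometheus text exposition format by hand."""
    lines = [
        "# HELP http_requests_total Total HTTP requests.",
        "# TYPE http_requests_total counter",
    ]
    for endpoint, count in REQUEST_COUNT.items():
        method, path = endpoint.split(" ")
        lines.append(f'http_requests_total{{method="{method}",path="{path}"}} {count}')
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To serve on the "metrics" port of the Service above (blocks forever):
# HTTPServer(("0.0.0.0", 9090), MetricsHandler).serve_forever()
```

Prometheus scrapes this endpoint every `interval` (30s here) and stores each sample with the labels from the exposition plus the Kubernetes metadata labels.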
Critical Cluster Alerts
# prometheusrule-kubernetes-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-critical-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
spec:
  groups:
    - name: kubernetes.critical
      rules:
        # Pod in CrashLoopBackOff
        - alert: PodCrashLoopBackOff
          expr: |
            rate(kube_pod_container_status_restarts_total[15m]) > 0
            and
            kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} in CrashLoopBackOff"
        # Pod OOMKilled
        - alert: PodOOMKilled
          expr: |
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} terminated by OOM"
            description: "Increase the memory limits of container {{ $labels.container }}"
        # Node under memory pressure
        - alert: NodeMemoryPressure
          expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} under MemoryPressure"
        # PVC almost full
        - alert: PersistentVolumeFillingUp
          expr: |
            kubelet_volume_stats_available_bytes /
            kubelet_volume_stats_capacity_bytes < 0.15
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 85% full"
        # Deployment with zero available replicas
        - alert: DeploymentUnavailable
          expr: kube_deployment_status_replicas_available == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has 0 available replicas"
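The PVC rule is worth a second look: the expression fires when available/capacity drops below 0.15, i.e. when the volume is more than 85% full. A quick Python mirror of the arithmetic:

```python
def pvc_alert_fires(available_bytes: float, capacity_bytes: float) -> bool:
    """Mirror of the PromQL expression:
    kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.15"""
    return available_bytes / capacity_bytes < 0.15

GIB = 1024 ** 3
print(pvc_alert_fires(20 * GIB, 100 * GIB))  # 20% free: no alert
print(pvc_alert_fires(10 * GIB, 100 * GIB))  # 10% free: alert fires
```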
Loki + Promtail for Aggregated Logs
# Install Loki (monolithic mode for a medium-sized cluster)
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki \
  --namespace monitoring \
  --set loki.commonConfig.replication_factor=1 \
  --set loki.storage.type=filesystem \
  --set singleBinary.replicas=1 \
  --set monitoring.selfMonitoring.enabled=false

# Install Promtail (DaemonSet that ships container logs to Loki)
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set config.clients[0].url=http://loki.monitoring.svc:3100/loki/api/v1/push \
  --set config.snippets.extraScrapeConfigs='
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      pipeline_stages:
        - cri: {}
        - labeldrop:
            - filename
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_team]
          target_label: team
        - source_labels: [__meta_kubernetes_pod_label_app]
          target_label: app
        - source_labels: [__meta_kubernetes_namespace]
          target_label: namespace'

# Loki queries in Grafana (LogQL):
# All error logs from team-alpha:
#   {namespace="team-alpha-production"} |= "ERROR"
# Error rate per app over the last 5 minutes:
#   sum(rate({namespace="team-alpha-production"} |= "ERROR" [5m])) by (app)
# Structured JSON logs - extract a field:
#   {app="api-service"} | json | level="error" | line_format "{{.message}}"
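To make the last LogQL pipeline concrete, here is a plain-Python mirror of what each stage does: `| json` parses the line, `| level="error"` filters on the extracted field, and `| line_format "{{.message}}"` rewrites the output line (the sample log lines are invented for illustration):

```python
import json

raw_lines = [
    '{"level": "info", "message": "request served", "traceID": "abc123"}',
    '{"level": "error", "message": "db timeout", "traceID": "def456"}',
]

def logql_like(lines):
    out = []
    for line in lines:
        fields = json.loads(line)           # | json
        if fields.get("level") != "error":  # | level="error"
            continue
        out.append(fields["message"])       # | line_format "{{.message}}"
    return out

print(logql_like(raw_lines))  # ['db timeout']
```

This is also why Loki stays cheap: the JSON parsing happens at query time, so only the stream labels (namespace, app, team) are indexed, never the log content.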
OpenTelemetry Collector for Distributed Traces
# Install the OpenTelemetry Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
---
# otel-collector.yaml - telemetry pipeline
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  mode: daemonset   # one collector per node
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
      # Adds Kubernetes metadata to traces (namespace, Pod name, etc.)
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        extract:
          metadata:
            - k8s.pod.name
            - k8s.namespace.name
            - k8s.deployment.name
            - k8s.node.name
          labels:
            - tag_name: team
              key: team
              from: pod
      # Sampling: keep only 10% of traces in production (high volume)
      probabilistic_sampler:
        sampling_percentage: 10
    exporters:
      otlp/tempo:
        endpoint: http://tempo.monitoring.svc:4317
        tls:
          insecure: true
      prometheus:
        endpoint: "0.0.0.0:8889"
        const_labels:
          cluster: production
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [k8sattributes, probabilistic_sampler, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          processors: [k8sattributes, batch]
          exporters: [prometheus]
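For the Collector to receive a coherent trace, every service must forward the trace context on outgoing requests. OTel SDKs do this automatically via the W3C `traceparent` header; as a stdlib-only sketch of what that header contains (not part of the Collector config):

```python
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context traceparent header:
    version-traceid-spanid-flags."""
    version = "00"
    trace_id = secrets.token_hex(16)   # 16 bytes -> 32 hex chars, shared by all spans in the trace
    span_id = secrets.token_hex(8)     # 8 bytes -> 16 hex chars, identifies this hop's span
    flags = "01" if sampled else "00"  # sampled bit hints downstream sampling decisions
    return f"{version}-{trace_id}-{span_id}-{flags}"

header = make_traceparent()
print(header)  # e.g. 00-<32 hex chars>-<16 hex chars>-01
```

Each downstream service keeps the trace ID, generates a fresh span ID, and exports its span to the Collector, which is how Tempo can later reassemble the full request path.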
Installation of Grafana Tempo
# Install Grafana Tempo (distributed mode)
helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --set storage.trace.backend=local

# Alternative: monolithic Tempo for small/medium clusters
helm install tempo grafana/tempo \
  --namespace monitoring \
  --set tempo.storage.trace.backend=filesystem \
  --set tempo.storage.trace.local.path=/var/tempo
Grafana Dashboards for Kubernetes
Grafana has a catalog of pre-built dashboards. Import these IDs from the UI (Dashboards > Import):
| Dashboard | Grafana ID | Purpose |
|---|---|---|
| Kubernetes Cluster Overview | 7249 | CPU/memory/Pod overview by node |
| Kubernetes Deployments | 8588 | Deployment status, restart rate, replicas |
| Node Exporter Full | 1860 | Node hardware metrics (CPU, disk, network) |
| Kubernetes PVC | 13646 | PVC storage usage |
| Loki Dashboard | 15141 | Aggregated logs, error rate, log explorer |
| NGINX Ingress Controller | 9614 | Request rate, latency, ingress status codes |
Metrics-Log-Trace Correlation (Exemplars)
The true power of this stack is correlation: from a high-latency metric you can jump directly to the corresponding trace, and from the trace to the logs of that Pod at that moment. The links from metrics to traces are called exemplars:
# Enable exemplars in Prometheus (already enabled in kube-prometheus-stack)

# In the application, attach the traceID to the histogram metric:
# Go/Python with OpenTelemetry:
#   when recording a histogram observation, attach an exemplar with the current trace ID
#   the Prometheus scraper collects and stores it alongside the sample

# In Grafana:
# 1. Open the API latency dashboard
# 2. Spot a latency spike
# 3. Click the diamond (exemplar) on the graph
# 4. Grafana opens the corresponding trace in Tempo
# 5. In the trace, click the failing service
# 6. Grafana shows that Pod's logs in Loki for that timestamp

# This metrics -> traces -> logs flow is the "holy grail" of observability
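On the wire, an exemplar is a trailing annotation on a histogram bucket line in the OpenMetrics exposition format. Client libraries emit this for you; a stdlib sketch of the line Prometheus actually scrapes (metric name and values are illustrative):

```python
def bucket_line_with_exemplar(metric: str, le: str, count: int,
                              trace_id: str, observed: float) -> str:
    """Format an OpenMetrics histogram bucket line carrying an exemplar:
    <metric>_bucket{le="..."} <count> # {trace_id="..."} <observed value>"""
    return (f'{metric}_bucket{{le="{le}"}} {count} '
            f'# {{trace_id="{trace_id}"}} {observed}')

line = bucket_line_with_exemplar(
    "http_request_duration_seconds", "0.5", 42,
    "4bf92f3577b34da6a3ce929d0e0e4736", 0.43,
)
print(line)
```

When Grafana renders the histogram, each stored exemplar becomes a clickable diamond whose `trace_id` is matched against the Tempo datasource.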
Best Practices for Kubernetes Observability
Production Observability Checklist
- USE Method for resources: For each resource (CPU, memory, disk, network): Utilization, Saturation, Errors. These are the 3 fundamental alerts for each node
- RED Method for services: For each service: Rate (req/s), Errors (error rate), Duration (latency). Alert on all three
- SLO-based alerting: Don't alert on every anomaly, but only when you are using up your SLO error budget. Less noise, more signal
- Differentiated retention: Raw metrics 15 days, monthly aggregates 1 year. Raw logs 7 days, audit logs 1 year
- Sampling for traces: Don't keep 100% of traces in production; it costs too much. 1-10% is enough for debugging
- Consistent labels: Each metric, log and trace must have team, environment, app, version for filtering
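To make SLO-based alerting concrete: with a 99.9% availability SLO the error budget is 0.1%, and the burn rate is the observed error ratio divided by that budget. A burn rate of 14.4 over a short window is the commonly cited page-worthy threshold (it would exhaust a 30-day budget in roughly two days):

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being consumed.
    1.0 means exactly on budget; >1 means burning faster than allowed."""
    budget = 1.0 - slo          # 99.9% SLO -> 0.1% error budget
    return error_ratio / budget

print(burn_rate(0.0144, 0.999))  # ~14.4: page immediately
print(burn_rate(0.0005, 0.999))  # ~0.5: within budget, no alert
```

Alerting on burn rate instead of raw error count is what turns "every anomaly pages someone" into "only budget-threatening incidents page someone".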
Conclusions and Next Steps
A complete observability stack (Prometheus for metrics, Loki for logs, OpenTelemetry + Tempo for traces) transforms Kubernetes from a black box into an understandable system. Correlating the three signals in Grafana cuts average debugging time from hours to minutes.
The next and last article in this series, Kubernetes Multi-Cloud with Federation and Submariner, addresses the challenge of managing multiple clusters across different cloud providers as if they were one, extending all the concepts of this series to the multi-cluster scenario.
Related Series
- Observability and OpenTelemetry — in-depth analysis of application instrumentation
- GitOps with ArgoCD — Argo Rollouts uses Prometheus for automatic canary analysis