Service Mesh: Istio vs Linkerd, mTLS and Traffic Management
In a microservices architecture on Kubernetes, each service communicates with dozens of others. Who guarantees that these communications are encrypted? Who manages automatic retries when a service is temporarily unreachable? Who provides latency and error-rate metrics for each source-destination pair? Without a service mesh, answering these questions requires custom code in each service.
A service mesh solves these problems at the infrastructure level, transparently to applications. The two dominant players in the Kubernetes ecosystem are Istio, the most feature-rich, and Linkerd, designed for simplicity and minimal overhead. This article shows how to install both, configure automatic mTLS, manage traffic with canary deployments and circuit breaking, and when to choose one or the other.
What You Will Learn
- How a service mesh works: data plane (sidecar) and control plane
- Automatic mTLS: what it means, how to check it, how to handle exceptions
- Istio: installation, VirtualService, DestinationRule, canary and blue/green
- Linkerd: lightweight installation, traffic splitting with Gateway API HTTPRoute, extensions
- Circuit breaking with Istio OutlierDetection
- Retry and timeout at the infrastructure level
- Observability: golden signals metrics from the service mesh
- Istio vs Linkerd: when to choose which one
How a Service Mesh Works
The service mesh is based on the automatic injection of a sidecar proxy (Envoy for Istio, linkerd2-proxy for Linkerd) into each Pod. The sidecar intercepts all traffic in and out of the application container, without requiring changes to the code.
The data plane is the set of all sidecar proxies that handle the actual traffic. The control plane (Istiod for Istio, linkerd-control-plane for Linkerd) distributes configuration to the proxies, manages certificates for mTLS, and collects telemetry.
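A quick way to confirm that the data plane is in place is to list the container names in each Pod: with injection active you should see the application container plus the sidecar. The `production` namespace here is just an example:

```shell
# List containers per Pod: look for istio-proxy (Istio) or
# linkerd-proxy (Linkerd) next to the application container
kubectl get pods -n production \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'
```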
Istio vs Linkerd comparison
| Characteristic | Istio | Linkerd |
|---|---|---|
| Sidecar proxy | Envoy (C++, 50-100 MB) | linkerd2-proxy (Rust, 10-20 MB) |
| Memory overhead per Pod | 100-200 MB | 20-30 MB |
| P99 latency overhead | 2-5 ms | 0.5-1 ms |
| Automatic mTLS | Yes (cert-manager or built-in) | Yes (24h automatic rotation) |
| Traffic management | Full (VirtualService, DR) | Basic (HTTPRoute, TrafficSplit) |
| L7 policy | HTTP, gRPC, TCP | HTTP, gRPC |
| Ingress | API Gateway + Istio Gateway | API Gateway |
| Learning curve | Steep | Moderate |
| Production maturity | High (Google, Airbnb) | High (Shopify, Microsoft) |
Istio: Basic Installation and Configuration
Installation with istioctl
# Download and install istioctl
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.22.0 sh -
export PATH=$PWD/istio-1.22.0/bin:$PATH
# Install Istio with the default profile (the one recommended for production)
istioctl install --set profile=default -y
# The default profile enables:
# - Istiod control plane and ingress gateway
# - Only the necessary features (no addons such as Kiali, Jaeger)
# For HA, add control plane replicas and affinity rules to spread them
# across different nodes via IstioOperator overrides
# Enable automatic sidecar injection in the namespace
kubectl label namespace production istio-injection=enabled
# Check the mesh status
istioctl proxy-status
# Analyze the configuration for possible problems
istioctl analyze --namespace production
mTLS with Istio: PeerAuthentication and DestinationRule
Istio implements automatic mTLS between all Pods with injected sidecars. With PeerAuthentication you can enforce STRICT mTLS (required) at the namespace or mesh level:
# mtls-strict.yaml
# Enable STRICT mTLS for the whole production namespace
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT  # DISABLE, PERMISSIVE, STRICT
---
# Exception: a legacy service that has no sidecar
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-service-exception
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-service
  mtls:
    mode: PERMISSIVE  # also accepts plain-text connections
# Verify mTLS
kubectl exec -n production frontend-pod -c istio-proxy -- \
  pilot-agent request GET /config_dump | grep -A5 "tls_context"
# Show the mTLS status with istioctl
istioctl x describe pod frontend-pod.production
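To see STRICT mode in action, you can attempt a plain-text call from a Pod outside the mesh; under STRICT the server-side sidecar should refuse it. Service and namespace names here are illustrative:

```shell
# From a namespace WITHOUT sidecar injection: plain HTTP is rejected
kubectl run mtls-test --rm -it --image=curlimages/curl -n default -- \
  curl -sv --max-time 5 http://api-service.production:8080/healthz
# The connection should be reset, because the server requires mTLS
```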
VirtualService: Routing and Traffic Splitting
A VirtualService defines how traffic to a service is routed, with rules based on HTTP headers, weights, and path matching:
# virtual-service-canary.yaml
# Canary deployment: 90% of traffic to v1, 10% to v2
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-service-vs
  namespace: production
spec:
  hosts:
    - api-service  # name of the Kubernetes Service
  http:
    # Header-based routing: beta testers always receive v2
    - match:
        - headers:
            x-user-group:
              exact: "beta-testers"
      route:
        - destination:
            host: api-service
            subset: v2
    # General traffic: 90/10 split
    - route:
        - destination:
            host: api-service
            subset: v1
          weight: 90
        - destination:
            host: api-service
            subset: v2
          weight: 10
      # Timeout and retry at the infrastructure level
      timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: "5xx,reset,connect-failure,retriable-4xx"
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api-service-dr
  namespace: production
spec:
  host: api-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    loadBalancer:
      simple: LEAST_CONN  # ROUND_ROBIN, RANDOM, LEAST_CONN
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
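The v1/v2 subsets only resolve if the underlying Deployments carry matching `version` labels. A minimal sketch, assuming both Deployments sit behind the same Service selector; names and image are illustrative:

```yaml
# Deployment for v1; the DestinationRule subset matches the version label
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service-v1
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
      version: v1
  template:
    metadata:
      labels:
        app: api-service   # matched by the Service selector
        version: v1        # matched by the DestinationRule subset
    spec:
      containers:
        - name: api-service
          image: registry.example.com/api-service:1.0  # illustrative image
          ports:
            - containerPort: 8080
# api-service-v2 is identical except for version: v2 and the image tag
```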
Circuit Breaker with OutlierDetection
The circuit breaker temporarily removes "unhealthy" hosts from the load-balancing pool when they exceed defined error thresholds:
# destination-rule-circuit-breaker.yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payment-service-circuit-breaker
  namespace: production
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 50
        connectTimeout: 3s
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 500
        maxRetries: 3
    outlierDetection:
      # Eject a host if it returns 5 consecutive 5xx errors within 10 seconds
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 10s
      # Keep it out for 30 seconds
      baseEjectionTime: 30s
      # At most 50% of the hosts can be ejected
      maxEjectionPercent: 50
      # Disable ejection if fewer than 50% of hosts remain healthy
      minHealthPercent: 50
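Ejections can be observed in the Envoy statistics of a client-side sidecar; the Pod name below is illustrative:

```shell
# Outlier-detection counters live in the client sidecar's Envoy stats
kubectl exec -n production frontend-pod -c istio-proxy -- \
  pilot-agent request GET /stats | grep outlier_detection.ejections
```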
Istio Gateway for Ingress Traffic
# istio-gateway.yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: wildcard-tls-cert  # Secret with the TLS certificate
      hosts:
        - "*.federicocalo.dev"
    - port:
        number: 80
        name: http
        protocol: HTTP
      tls:
        httpsRedirect: true  # redirect all HTTP to HTTPS
      hosts:
        - "*.federicocalo.dev"
---
# Bind the Gateway to the VirtualService
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-gateway-vs
  namespace: production
spec:
  hosts:
    - "api.federicocalo.dev"
  gateways:
    - istio-system/production-gateway
    - mesh  # also for traffic inside the mesh
  http:
    - route:
        - destination:
            host: api-service
            port:
              number: 8080
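The `credentialName` above refers to a TLS Secret that must exist in the same namespace as the ingress gateway. Assuming you already have a certificate and key on disk (file paths are illustrative):

```shell
# The Secret must live in the gateway's namespace (istio-system here)
kubectl create secret tls wildcard-tls-cert \
  --cert=./wildcard.federicocalo.dev.crt \
  --key=./wildcard.federicocalo.dev.key \
  -n istio-system
```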
AuthorizationPolicy: Access Control
# authorization-policy.yaml
# Only the frontend can call the backend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: backend-access-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend-service-account"
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/v1/*"]
---
# Block everything else (default deny).
# An empty spec is an "allow-nothing" policy: once at least one ALLOW
# policy exists, any request not matched by an ALLOW rule is denied.
# (An action: DENY policy without rules would match nothing.)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
# No selector = applies to all Pods in the namespace
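A quick smoke test of the policy, with illustrative Deployment names: a call from the frontend (running as `frontend-service-account`) should succeed, while any other workload should be rejected by the sidecar:

```shell
# Allowed: the frontend matches the ALLOW rule
kubectl exec -n production deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend:8080/api/v1/status
# Denied: other workloads receive 403 with body "RBAC: access denied"
kubectl exec -n production deploy/other-service -- \
  curl -s http://backend:8080/api/v1/status
```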
Linkerd: Installation and Configuration
Linkerd favors operational simplicity and performance. Its proxy, written in Rust, has an overhead of ~1 ms P99 latency and ~20 MB of memory per Pod, making it ideal for clusters with many microservices or tight resource constraints.
Linkerd installation
# Install the Linkerd CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$HOME/.linkerd2/bin:$PATH
# Prerequisites: verify that the cluster is compatible
linkerd check --pre
# Install the control plane
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
# Verify the installation
linkerd check
# Enable injection on the namespace
kubectl annotate namespace production linkerd.io/inject=enabled
# Install the Viz extension for observability
linkerd viz install | kubectl apply -f -
linkerd viz check
linkerd viz dashboard &
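Injection can also be enabled per workload instead of per namespace, by annotating the Pod template. A sketch with an illustrative Deployment name and image:

```yaml
# Opt a single Deployment into the mesh via the Pod template annotation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
      annotations:
        linkerd.io/inject: enabled  # sidecar added at admission time
    spec:
      containers:
        - name: api-service
          image: registry.example.com/api-service:1.0  # illustrative image
```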
Canary with Linkerd and HTTPRoute (Gateway API)
# linkerd-canary-httproute.yaml
# Linkerd uses the Gateway API for traffic splitting
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-service-canary
  namespace: production
spec:
  parentRefs:
    - name: api-service
      kind: Service
      group: core
      port: 80
  rules:
    - backendRefs:
        - name: api-service-v1
          port: 80
          weight: 90
        - name: api-service-v2
          port: 80
          weight: 10
# Monitor the canary with linkerd viz
linkerd viz stat httproute/api-service-canary -n production
linkerd viz routes deploy/api-service-v2 -n production
Observability with Linkerd
# Show golden-signals metrics for all deployments
linkerd viz stat deploy -n production
# Typical output:
# NAME              MESHED  SUCCESS  RPS  LATENCY_P50  LATENCY_P99  TCP_CONN
# api-service       4/4     99.8%    245  1ms          12ms         42
# backend-service   3/3     98.2%    180  3ms          45ms         28
# payment-service   2/2     100.0%   65   8ms          89ms         12
# Show the live traffic of a single Pod
linkerd viz tap pod/api-service-xyz -n production
# Generate an observability report
linkerd viz check --proxy -n production
Istio Telemetry: Metrics, Tracing and Logging
Istio automatically emits the golden signals (latency, traffic, errors, saturation) for each source-destination pair. Metrics are exposed in Prometheus format and are visible in Grafana with the official Istio dashboards.
# telemetry-config.yaml
# Configure sampling for distributed tracing
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-tracing
  namespace: istio-system
spec:
  tracing:
    - providers:
        - name: tempo  # or jaeger, zipkin
      randomSamplingPercentage: 1.0  # sample 1% of traffic in production
---
# Custom metrics for a specific service
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: payment-service-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
          tagOverrides:
            destination_version:
              value: "request.headers['x-version'] | 'unknown'"
# Import the official Istio Grafana dashboards
# Dashboard IDs: 7639 (Mesh Overview), 11829 (Service), 12378 (Workload)
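The standard Istio metrics can also be queried directly in Prometheus. Two example PromQL queries over the standard `istio_requests_total` and `istio_request_duration_milliseconds` metrics (a sketch; label usage may vary with your Istio version):

```promql
# Error rate (5xx) per destination service, over 5 minutes
sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service_name)
/
sum(rate(istio_requests_total[5m])) by (destination_service_name)

# P99 latency per destination service
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m]))
  by (destination_service_name, le))
```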
Service Mesh vs Cilium eBPF: When to Use Them
With the advent of Cilium, which offers mTLS and L7 policies without sidecars (via eBPF in the kernel), the choice has become more nuanced. Here is a practical guide:
| Scenario | Recommended Choice | Reason |
|---|---|---|
| Microservices with advanced traffic management (canary, header-based routing) | Istio | VirtualService/DestinationRule irreplaceable |
| mTLS security with minimal overhead | Linkerd or Cilium | Rust proxy or minimal eBPF |
| Cluster with hundreds of microservices, limited resources | Linkerd or Cilium mTLS | Sidecar overhead multiplied by each Pod |
| Multi-cluster or multi-cloud | Istio | Mesh federation built-in |
| Team new to service mesh | Linkerd | Significantly lower learning curve |
Best Practices for Service Mesh
Checklist for Service Mesh in Production
- Start in PERMISSIVE mode: enable mTLS in PERMISSIVE before making it STRICT, to identify services that do not yet have sidecar
- Set timeout on all VirtualServices: without timeout, a slow dependency blocks the entire call chain
- Circuit breaker for each external dependency: databases, external APIs, payment services
- Use retry only for idempotent errors: do not do automatic retry on POST without idempotency key, otherwise you will create duplicate orders
- Monitor CPU/memory sidecar: in clusters with many Pods, the aggregate sidecar overhead can be significant
- Test failover: manually disable a pod and verify that the circuit breaker and retries work as expected
- Keep the mesh updated: Istio and Linkerd versions have short life cycles (6-12 months); plan regular upgrades
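The "retry only for idempotent errors" rule above can be enforced at the mesh level by splitting routes per HTTP method. A hedged sketch with an illustrative service name:

```yaml
# VirtualService that retries GETs but never POSTs
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: orders-vs
  namespace: production
spec:
  hosts:
    - orders-service
  http:
    # Idempotent reads: safe to retry
    - match:
        - method:
            exact: GET
      route:
        - destination:
            host: orders-service
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: "5xx,reset,connect-failure"
    # Everything else (POST, PUT, ...): no automatic retry
    - route:
        - destination:
            host: orders-service
      timeout: 5s
```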
Common Anti-Patterns
- Infinite retries with amplification: if A calls B with 3 retries, and B calls C with 3 retries, a single error generates 9 attempts towards C (retry storm)
- Too generous timeouts: a 60 second timeout on a critical API means that during an outage all application threads/goroutines are blocked for 60s
- Ignoring control plane overhead: Istiod consumes significant CPU/memory for each connected proxy; in clusters with 1000+ Pods, a dedicated node is needed
- Selective injection without a plan: if only some Pods have sidecars, mTLS STRICT breaks communication with the Pods that lack them
Conclusions and Next Steps
The service mesh has stopped being a "nice to have" and has become a requirement for any serious microservices architecture. Automatic mTLS, granular observability, circuit breaking, and declarative traffic management solve real problems that, without a service mesh, would require custom code in each service.
The choice between Istio and Linkerd depends on your needs: Istio for complex scenarios with advanced traffic management and multi-cluster support, Linkerd for operational simplicity and minimal overhead. In both cases, the initial investment in learning and setup is paid off by the reduction of infrastructural code in applications and by unprecedented visibility into intra-cluster traffic.
Related Series
- Kubernetes Networking: Cilium with eBPF — alternative to service mesh for mTLS
- Observability and OpenTelemetry — distributed tracing integrated with Istio
- Platform Engineering — service mesh as an internal platform