The Problem of Cold Starts in Traditional Serverless

In 2024, a Datadog analysis documented that 40% of Lambda invocations in production suffer cold starts exceeding 500ms. For Python or Java functions the figure rises to 1-3 seconds. The cold start is the time that passes between the moment a request arrives and the moment the function is actually ready to process it: the time it takes to start a container, load the runtime, and initialize the dependencies.

Cloudflare Workers takes a radically different approach: instead of containers, it uses V8 Isolates. The measurable result: average startup below 1ms and zero "cold starts" in the traditional sense of the term. Understanding why requires digging into the V8 engine architecture and its process isolation model.

What You Will Learn

  • What a V8 Isolate is and how it differs from an OS process or a Docker container
  • Why containers suffer from structural cold starts and how they manifest themselves
  • The Cloudflare Workers execution model: from request routing to isolate pool
  • V8 snapshots: the technique that eliminates JavaScript initialization cost
  • Limitations of the isolate model: CPU time, memory, available APIs
  • Benchmark comparison: Workers vs Lambda vs Lambda@Edge
  • When to use Workers and when containers remain the right choice

Containers, Processes and Isolates: a Taxonomy

To understand isolates you must first understand what they replace. Each level of the abstraction stack has a different startup cost:

| Primitive | Isolation boundary | Typical startup | Memory overhead | Examples |
|---|---|---|---|---|
| VM (hypervisor) | Hardware | 10-60 seconds | 512MB - 2GB | EC2, GCE, Azure VM |
| Container (Linux namespaces) | Kernel (cgroups + namespaces) | 100ms - 2s | 50-500MB | Docker, Lambda, Cloud Run |
| OS process | Kernel (PID, virtual address space) | 10-100ms | 10-100MB | Node.js, Python process |
| V8 Isolate | Runtime (separate heap, no shared memory) | < 1ms | 1-10MB | Cloudflare Workers, Deno Deploy |

A V8 Isolate is an isolated instance of the V8 JavaScript engine: it has its own heap allocator, its own JavaScript objects, its own garbage collector. Two isolates share no JavaScript memory and cannot interfere with each other. This is the foundation of Workers' security isolation, and the architectural consequence is that creating an isolate takes microseconds, not milliseconds.
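
The "separate globals, no shared state" property can be approximated in plain Node.js with the built-in node:vm module. This is only an analogy — vm contexts share one V8 heap and are not a security boundary, unlike real isolates — but it makes the memory-separation idea tangible:

```typescript
import { createContext, runInContext } from "node:vm";

// Two separate contexts: each gets its own global object,
// so code running in one cannot see the other's globals.
const ctxA = createContext({ result: 0 });
const ctxB = createContext({ result: 0 });

runInContext("globalThis.secret = 42; result = secret;", ctxA);
// ctxB has no `secret`; a typeof check avoids a ReferenceError.
runInContext("result = typeof secret === 'undefined' ? -1 : secret;", ctxB);

console.log(ctxA.result); // 42
console.log(ctxB.result); // -1: the contexts share no global state
```

A real V8 isolate takes this further: not just separate globals, but a separate heap and garbage collector per tenant.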

Why Lambda Suffers from Cold Starts

When a request arrives at a "cold" Lambda function (one without a pre-allocated execution environment), AWS must run through this sequence:

# Lambda cold start sequence (Node.js 20)
1. Compute resource allocation (EC2/Firecracker VM)      ~50-200ms
2. Container image download from the registry            ~50-300ms (depends on size)
3. Network setup: VPC, ENI attachment (if VPC configured) ~200-1000ms (!)
4. Node.js runtime startup                               ~30-80ms
5. Dependency loading (node_modules)                     ~20-200ms
6. Handler initialization code execution                 ~10-500ms
7. First actual invocation                               ----------

Total cold start: 360ms - 2.28 seconds (VPC worst case: up to 3-5s)

Firecracker (the microVM technology used by AWS) reduced VM startup time to around 125ms, but the structural problem remains: each execution environment is a kernel-level isolated container that must be started from scratch on every cold start.

V8 Isolates: Workers Architecture

Cloudflare Workers runs on workerd, the open-source runtime released by Cloudflare in 2022. Every PoP (Point of Presence; there are 300+ worldwide) runs a fleet of workerd processes. When a request arrives:

Request handling sequence in Cloudflare Workers:

1. Request arrives at the nearest Cloudflare PoP         ~0ms (BGP anycast routing)
2. Routing to the appropriate workerd process            ~0.1ms
3. Lookup/allocation of the isolate for this Worker      ~0.5ms (from a pre-created pool)
4. Execution of the fetch handler                        ~0.1ms overhead
5. Response

Total "startup": < 1ms

The key point is step 3: workerd maintains a pool of pre-initialized isolates. Each isolate has already loaded the Worker's code thanks to V8 snapshots.
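
The warm-pool lookup in step 3 can be sketched as follows. This is a purely illustrative model — the names (`IsolatePool`, `acquire`) are hypothetical, and real workerd additionally handles eviction, memory pressure, and snapshot deserialization:

```typescript
// Hypothetical sketch of a warm isolate pool, keyed by Worker script ID.
type Isolate = { workerId: string; handle: (req: string) => string };

class IsolatePool {
  private pool = new Map<string, Isolate>();

  // Reuse a warm isolate if one exists; otherwise create one
  // (in real workerd: deserialize it from the V8 snapshot).
  acquire(workerId: string): Isolate {
    let isolate = this.pool.get(workerId);
    if (!isolate) {
      isolate = { workerId, handle: (req) => `handled ${req} by ${workerId}` };
      this.pool.set(workerId, isolate); // stays warm for later requests
    }
    return isolate;
  }
}

const pool = new IsolatePool();
const first = pool.acquire("worker-a");  // miss: isolate created
const second = pool.acquire("worker-a"); // hit: same isolate reused
console.log(first === second); // true
```

Because creation-from-snapshot costs microseconds, even a pool miss is cheap — which is why the "cold" path in Workers is indistinguishable from the warm one.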

V8 Snapshots: Serialize Heap State

V8 supports heap serialization into a binary format called a "snapshot". When you deploy a Worker, Cloudflare:

  1. Runs the Worker's JavaScript code in a temporary isolate
  2. Lets the code complete its initialization (module evaluation)
  3. Serializes the isolate heap into a binary snapshot
  4. Distributes this snapshot to the PoPs
  5. Creates new isolates from this snapshot (deserialization), not from scratch

// Your Worker has this structure:
import { Router } from 'itty-router';

// This code runs during module evaluation
// and is serialized into the snapshot
const router = Router();

router.get('/users/:id', async ({ params }, env) => {
  const user = await env.DB.get(`user:${params.id}`);
  return Response.json({ user });
});

// The fetch handler is the entry point for every request
export default {
  fetch: router.fetch
};

// When an isolate is created from the snapshot:
// - router is already built and configured
// - all routes are already registered
// - there is NO module evaluation cost on any request

This is fundamentally different from what happens in Lambda: in Lambda, each cold start must re-execute all module initialization code. In Workers, this phase runs only once, at deploy time.

Security Isolation: V8 Sandbox

A common objection is: "if multiple Workers run in the same process, how are they isolated?" The answer is the V8 Sandbox, a multi-layer mechanism:

V8 Isolation Layers

  • Heap separation: Each isolate has a completely separate heap. Another isolate's memory cannot be accessed via JavaScript.
  • No shared mutable state: A Worker's global variables are not visible to other Workers.
  • Controlled APIs: Access to dangerous functionality (filesystem, raw network, process) is mediated by the workerd runtime, not by V8 itself.
  • V8 sandbox: V8 includes sandbox mechanisms that prevent JavaScript from executing arbitrary machine code or accessing memory outside its heap.
  • Additional process isolation: workerd can be configured with seccomp-bpf to filter the available syscalls.

The model is not without risks: vulnerabilities in the V8 parser could theoretically allow an escape from the isolate. Cloudflare runs a dedicated bug bounty program and updates V8 regularly. For high-security workloads, Cloudflare offers even stricter isolation modes.

Limitations of the Isolate Model

The absence of cold starts has a cost: the isolate model imposes constraints that containers do not have:

| Limit | Cloudflare Workers (free/paid) | AWS Lambda |
|---|---|---|
| CPU time per request | 10ms (free) / 30s (paid, with pauses) | 15 minutes |
| Maximum memory | 128MB | 10GB |
| Bundle size | 1MB (free) / 10MB compressed (paid) | 250MB unzipped |
| Direct TCP | Limited (Workers TCP Socket API) | Full access |
| Filesystem | Not available | /tmp (512MB) |
| Runtime languages | JavaScript/TypeScript/WASM | Any |
| Node.js compatibility | Partial (nodejs_compat flag) | Complete |

The CPU time limit is the most important one to understand: it is not a wall-clock timeout. Workers measures actual CPU time consumed. A Worker can make many asynchronous I/O calls (fetch, KV) that do not consume CPU time. The limit applies only to actively running JavaScript code.
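
The distinction can be observed in plain Node.js. This measures the Node process's CPU time with `process.cpuUsage()`, not the Workers limiter itself, but the principle is the same: time spent awaiting I/O costs wall-clock time, not CPU time:

```typescript
// Compare wall-clock time vs CPU time across an async wait.
// An awaited timer stands in for I/O such as fetch() or a KV read.
const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function main(): Promise<void> {
  const wallStart = Date.now();
  const cpuStart = process.cpuUsage();

  await sleep(200); // "I/O": ~200ms of wall time, almost no CPU time

  const wallMs = Date.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart);
  const cpuMs = (cpu.user + cpu.system) / 1000;

  // Typically wall is ~200ms while CPU is a few milliseconds at most
  console.log(`wall: ~${wallMs}ms, cpu: ~${cpuMs.toFixed(1)}ms`);
}

main();
```

This is why a Worker on the 10ms free-tier CPU budget can still orchestrate slow upstream APIs: the clock only runs while its JavaScript is executing.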

Benchmark: Workers vs Lambda vs Lambda@Edge

Here is a comparison based on public benchmarks and official documentation (2025):

| Metric | Cloudflare Workers | AWS Lambda (Node.js) | Lambda@Edge | Vercel Edge Functions |
|---|---|---|---|---|
| Cold start P50 | < 1ms | ~200ms | ~100ms | < 5ms |
| Cold start P99 | < 5ms | ~1500ms | ~500ms | < 50ms |
| Global PoPs | 300+ | ~30 regions | ~450 (but limited) | ~100+ |
| Global average latency | ~10ms (at the edge) | ~50-200ms (regional) | ~25-100ms | ~30ms |
| Free tier | 100K req/day | 1M req/month | 1M req/month (shared with Lambda) | 100GB-hr/month |

A Minimal Worker: Understanding the Execution Model

To make the above concrete, here is the simplest possible Worker, annotated to show its life cycle:

// worker.ts - ES Module format (required by the modern isolate runtime)

// PHASE 1: Module evaluation (runs ONCE, serialized into the snapshot)
console.log('This runs only at deploy time, not on every request');

const CONFIG = {
  version: '1.0',
  region: 'auto',
};

// PHASE 2: Handler export - the entry point for every request
export default {
  // Called for every HTTP request
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // request: the HTTP request
    // env: the bindings (KV, R2, D1, secrets, variables)
    // ctx: context for waitUntil() - post-response work

    const url = new URL(request.url);
    const path = url.pathname;

    // ctx.waitUntil() allows async work to continue after the response
    ctx.waitUntil(logRequest(request, env));

    if (path === '/health') {
      return new Response(JSON.stringify({
        status: 'ok',
        config: CONFIG,
        timestamp: Date.now(),
      }), {
        headers: { 'Content-Type': 'application/json' },
      });
    }

    return new Response('Not Found', { status: 404 });
  },
};

async function logRequest(request: Request, env: Env): Promise<void> {
  // This runs after the response has already been sent
  // and does not block the client
  await env.ANALYTICS_KV.put(
    `log:${Date.now()}`,
    JSON.stringify({
      url: request.url,
      method: request.method,
      cf: request.cf, // Cloudflare metadata: country, ASN, datacenter, etc.
    })
  );
}

// Interface for the bindings defined in wrangler.toml
interface Env {
  ANALYTICS_KV: KVNamespace;
  API_KEY: string; // secret
}

The Concurrency Model: One Isolate, Many Requests

A subtle but important aspect: unlike Lambda, where each invocation has its own isolated environment, in Workers the same isolate can handle multiple concurrent requests. This is possible because JavaScript is single-threaded with an event loop, so there is no parallel access to shared state, but I/O-bound requests can be interleaved.

// WARNING: global state shared between requests
// In Lambda this would not be a problem (each invocation has its own environment)
// In Workers, multiple requests can share the same isolate

// WRONG: this counter is shared across all requests
let requestCount = 0;

export default {
  async fetch(request: Request): Promise<Response> {
    requestCount++; // Race condition! Don't do this.
    return new Response(`Request #${requestCount}`);
  },
};

// CORRECT (alternative version): use ctx.waitUntil() for post-response work
// and external storage (KV) for shared counters
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const count = parseInt(await env.COUNTERS.get('requests') ?? '0') + 1;
    ctx.waitUntil(env.COUNTERS.put('requests', String(count)));
    return new Response(`Requests: ${count}`);
  },
};

Global State in Workers: A Caveat

Unlike Lambda (where each invocation is isolated), in Workers global JavaScript state can be shared between concurrent requests within the same isolate. Always use external storage (KV, D1, R2) for data that needs to persist or be shared. Immutable globals (configuration, a router) are safe; mutable globals are dangerous.

Workerd: The Open-Source Runtime

In September 2022, Cloudflare open-sourced workerd (github.com/cloudflare/workerd), the runtime that powers Workers. This had important consequences:

  • Deno Deploy adopted V8 isolates with a similar architecture
  • Miniflare (the local simulator) has been using workerd internally since v3
  • You can run workerd on-premise for private environments
  • The community can contribute to the runtime and inspect the security code

# workerd architecture (simplified)

┌─────────────────────────────────────────────────┐
│                  workerd process                │
│                                                 │
│  ┌──────────────┐  ┌──────────────┐            │
│  │  Isolate #1  │  │  Isolate #2  │   ...      │
│  │  (Worker A)  │  │  (Worker B)  │            │
│  │              │  │              │            │
│  │  V8 Heap A   │  │  V8 Heap B   │            │
│  │  (isolated)  │  │  (isolated)  │            │
│  └──────┬───────┘  └──────┬───────┘            │
│         │                 │                    │
│  ┌──────▼─────────────────▼───────┐            │
│  │        I/O Subsystem           │            │
│  │  (fetch, KV, R2, D1, DO, AI)  │            │
│  └───────────────────────────────┘            │
│                                                 │
│  ┌─────────────────────────────────────────┐   │
│  │        Network Layer                    │   │
│  │  (Cloudflare anycast, TLS, HTTP/3)     │   │
│  └─────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘

When NOT to Use Workers

V8 isolates are not the answer to everything. There are scenarios where Lambda or containers remain the right choice:

  • Long CPU-intensive computation: ML training, video rendering, audio encoding. The 30s CPU-time limit and 128MB of RAM are prohibitive.
  • Non-JavaScript code: Workers supports WebAssembly, but not all languages compile well to WASM. Native Python, Java, or Ruby require Lambda.
  • Complete Node.js ecosystem: Many npm libraries use native Node.js APIs (fs, advanced crypto, buffer manipulation) not available in Workers.
  • Complex state management: If you need durable WebSocket sessions with a lot of state, consider Durable Objects (see article 4 in the series).
  • Persistent database connections: Workers does not maintain persistent TCP connections between requests. Use Hyperdrive for connection pooling to traditional databases.

Conclusions and Next Steps

V8 isolates represent an architectural paradigm shift, not just an optimization: they move the isolation boundary from the kernel level to the runtime level, resulting in sub-millisecond startup times with security sufficient for multi-tenant edge computing. The cost is a more constrained execution environment than traditional containers.

For most RESTful APIs, authentication middleware, request transformation, and content personalization, the constraints of Workers are more than acceptable, and the latency gain is global and measurable.

Next Articles in the Series

  • Article 2: Your First Cloudflare Worker — Fetch Handler, Wrangler and Deploy: from concept to practice, with a working worker in production.
  • Article 3: Edge Persistence — Workers KV, R2, and D1 SQLite: when and how to use each storage layer available in Workers.
  • Article 4: Durable Objects — Strongly Consistent State and WebSocket: The most powerful primitive for stateful applications at the edge.