Why 2PC Doesn't Work in Microservices

Il Two-Phase Commit (2PC) is the classic protocol for distributed transactions: a Coordinator asks all participants to prepare (prepare phase), then order the commit. This works well for homogeneous databases (e.g. two PostgreSQL instances), but is problematic in microservices:

  • Coupling: All services must be available during the transaction
  • Performance: resources remain locked until committed (distributed locks)
  • Limited support: Many cloud services, REST APIs, and NoSQL databases do not support 2PC
  • Scalability: the coordinator becomes a single point of failure

The Saga Pattern replaces one large distributed transaction with one sequence of local transactions, each completed independently, coordinated through events or an orchestrator.

Example: E-commerce Purchasing Saga

Saga steps for a purchase:

  1. Create Order (Orders Service) → OrdineCreato
  2. Reserve Inventory (Inventory Service) → Reserved Inventory
  3. Process Payment (Payment Service) → PaymentConfirmed
  4. Confirm Order (Order Service) → OrderConfirmed

If the payment fails, the compensating transactions are:

  1. Release Inventory (Inventory Service)
  2. Cancel Order (Order Service)

Choreography Saga: Service Autonomy

In the Choreography Saga, there is no central orchestrator. Each service listens to the events produced by the previous service and reacts accordingly, in turn producing new events. The coordination logic is distributed between services: each knows only its own step and the events to which it reacts.

Pro: maximum decoupling, no single point of failure, simple to implement. Against: hard to debug (saga logic is distributed), no visibility central to the state of the saga, risk of cycles of poorly managed events.

// CHOREOGRAPHY SAGA con Spring Kafka

// ===== STEP 1: Servizio Ordini - crea l'ordine =====
@Service
public class OrdiniService {

    private final KafkaTemplate<String, Object> kafkaTemplate;

    public String creaOrdine(CreaOrdineCommand cmd) {
        String ordineId = UUID.randomUUID().toString();

        // Salva l'ordine in stato PENDING
        Ordine ordine = Ordine.builder()
            .id(ordineId)
            .clienteId(cmd.getClienteId())
            .prodotti(cmd.getProdotti())
            .stato(StatoOrdine.PENDING)
            .build();
        ordineRepository.save(ordine);

        // Pubblica evento: il Servizio Inventario lo ascolterà
        OrdineCreato event = new OrdineCreato(ordineId, cmd.getProdotti());
        kafkaTemplate.send("ordini-events", ordineId, event);

        return ordineId;
    }

    // Ascolta la conferma del pagamento per finalizzare l'ordine
    @KafkaListener(topics = "pagamenti-events", groupId = "ordini-service")
    public void onPagamento(ConsumerRecord<String, Object> record) {
        if (record.value() instanceof PagamentoConfermato event) {
            ordineRepository.updateStato(event.getOrdineId(), StatoOrdine.CONFERMATO);
            kafkaTemplate.send("ordini-events", event.getOrdineId(),
                new OrdineConfermato(event.getOrdineId()));
        }
        if (record.value() instanceof PagamentoFallito event) {
            // Compensating: annulla l'ordine
            ordineRepository.updateStato(event.getOrdineId(), StatoOrdine.ANNULLATO);
        }
    }
}

// ===== STEP 2: Servizio Inventario - riserva i prodotti =====
@Service
public class InventarioService {

    @KafkaListener(topics = "ordini-events", groupId = "inventario-service")
    public void onOrdinCreato(ConsumerRecord<String, Object> record) {
        if (record.value() instanceof OrdineCreato event) {
            try {
                // Tenta la prenotazione dell'inventario
                prenotaInventario(event.getOrdineId(), event.getProdotti());

                // Successo: pubblica evento per il Servizio Pagamenti
                kafkaTemplate.send("inventario-events", event.getOrdineId(),
                    new InventarioRiservato(event.getOrdineId(), event.getTotale()));

            } catch (InventarioInsufficienterException e) {
                // Fallimento: pubblica evento di fallimento
                // Il Servizio Ordini annullerà l'ordine
                kafkaTemplate.send("inventario-events", event.getOrdineId(),
                    new InventarioNonDisponibile(event.getOrdineId(), e.getMessage()));
            }
        }

        // Compensating: rilascia l'inventario se il pagamento fallisce
        if (record.value() instanceof PagamentoFallito event) {
            rilasciaInventario(event.getOrdineId());
        }
    }
}

// ===== STEP 3: Servizio Pagamenti - processa il pagamento =====
@Service
public class PagamentiService {

    @KafkaListener(topics = "inventario-events", groupId = "pagamenti-service")
    public void onInventarioRiservato(ConsumerRecord<String, Object> record) {
        if (record.value() instanceof InventarioRiservato event) {
            try {
                processaPagamento(event.getOrdineId(), event.getTotale());

                kafkaTemplate.send("pagamenti-events", event.getOrdineId(),
                    new PagamentoConfermato(event.getOrdineId()));

            } catch (PagamentoRifiutatoException e) {
                kafkaTemplate.send("pagamenti-events", event.getOrdineId(),
                    new PagamentoFallito(event.getOrdineId(), e.getMessage()));
            }
        }
    }
}

Orchestration Saga: Centralized Control

In the'Orchestration Saga, a central component called Saga Orchestrator (or Process Manager) coordinates the entire sequence. The orchestrator knows all the steps, send Command to services and wait for their response events. In case of failure, the orchestrator explicitly sends compensation commands.

Pro: centralized and visible logic, easy to debug, explicit saga state, more precise error handling. Against: major coupling, the orchestrator knows all the services involved.

// ORCHESTRATION SAGA con Axon Framework (Java)
// Axon gestisce automaticamente la persistenza dello stato della saga

import org.axonframework.saga.*;
import org.axonframework.eventhandling.*;

@Saga
public class AcquistoSaga {

    @Autowired
    private transient CommandGateway commandGateway;

    // Lo stato della saga persiste automaticamente tra gli step
    private String ordineId;
    private List<ProdottoOrdinato> prodotti;
    private BigDecimal totale;

    // STEP 1: Avvio della saga quando viene creato un ordine
    @StartSaga
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(OrdineCreato event) {
        this.ordineId = event.getOrdineId();
        this.prodotti = event.getProdotti();
        this.totale = event.getTotale();

        // Invia comando al servizio inventario
        commandGateway.send(new RiservaInventarioCommand(
            ordineId,
            prodotti
        ));
    }

    // STEP 2: Inventario riservato con successo - procedi con il pagamento
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(InventarioRiservato event) {
        commandGateway.send(new ProcessaPagamentoCommand(
            ordineId,
            totale,
            "CARTA_CREDITO"
        ));
    }

    // STEP 3: Pagamento confermato - finalizza l'ordine
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(PagamentoConfermato event) {
        commandGateway.send(new ConfermaOrdineCommand(ordineId));
    }

    // Fine saga (successo)
    @EndSaga
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(OrdineConfermato event) {
        System.out.println("Saga completata con successo per ordine: " + ordineId);
    }

    // ===== COMPENSATING TRANSACTIONS =====

    // Pagamento fallito: rilascia inventario, poi annulla ordine
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(PagamentoFallito event) {
        // Compensating step 1: rilascia inventario
        commandGateway.send(new RilasciaInventarioCommand(ordineId, prodotti));
    }

    // Inventario rilasciato - annulla l'ordine
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(InventarioRilasciato event) {
        // Compensating step 2: annulla l'ordine
        commandGateway.send(new AnnullaOrdineCommand(ordineId, "Pagamento fallito"));
    }

    // Inventario non disponibile - annulla direttamente
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(InventarioNonDisponibile event) {
        commandGateway.send(new AnnullaOrdineCommand(ordineId, "Inventario insufficiente"));
    }

    // Fine saga per percorso di fallimento
    @EndSaga
    @SagaEventHandler(associationProperty = "ordineId")
    public void on(OrdineAnnullato event) {
        System.out.println("Saga fallita/annullata per ordine: " + ordineId + ". Motivo: " + event.getMotivo());
    }
}

Saga with AWS Step Functions (Serverless)

For serverless architectures on AWS, Step Functions it is the managed service to implement Orchestration Saga. Each step is a Lambda function or an AWS action, and Step Functions automatically manages state, retry and compensations by definition in Amazon States Language (ASL).

{
  "Comment": "Saga acquisto e-commerce con compensating transactions",
  "StartAt": "CreaOrdine",
  "States": {
    "CreaOrdine": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:crea-ordine",
      "Next": "RiservaInventario",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "FallimentoSaga"
        }
      ]
    },
    "RiservaInventario": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:riserva-inventario",
      "Next": "ProcessaPagamento",
      "Catch": [
        {
          "ErrorEquals": ["InventarioInsufficienterException"],
          "Next": "CompensaAnnullaOrdine"
        }
      ]
    },
    "ProcessaPagamento": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:processa-pagamento",
      "Retry": [
        {
          "ErrorEquals": ["TransientPaymentError"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2
        }
      ],
      "Next": "ConfermaOrdine",
      "Catch": [
        {
          "ErrorEquals": ["PagamentoRifiutatoException"],
          "Next": "CompensaRilasciaInventario"
        }
      ]
    },
    "ConfermaOrdine": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:conferma-ordine",
      "End": true
    },

    "CompensaRilasciaInventario": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:rilascia-inventario",
      "Next": "CompensaAnnullaOrdine"
    },
    "CompensaAnnullaOrdine": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456:function:annulla-ordine",
      "Next": "FallimentoSaga"
    },
    "FallimentoSaga": {
      "Type": "Fail",
      "Error": "SagaFailed",
      "Cause": "La transazione distribuita e fallita, compensazioni eseguite"
    }
  }
}

Idempotence in Compensating Transactions

Compensating transactions must be idempotent: if invoked several times (due to network or orchestrator retry), the result must be the same. The most common technique is to use a idempotency key: the orderId + the operation type form a unique key that prevents duplicate operations.

// RilasciaInventarioHandler.java - Compensating transaction idempotente
@CommandHandler
public void handle(RilasciaInventarioCommand cmd) {
    // Controllo idempotenza: questa compensazione e gia stata eseguita?
    String idempotencyKey = cmd.getOrdineId() + ":RILASCIA_INVENTARIO";

    if (compensazioniEseguite.contains(idempotencyKey)) {
        System.out.println("Compensazione gia eseguita per " + idempotencyKey + ", skip");
        return;
    }

    // Esegui la compensazione
    for (ProdottoOrdinato prodotto : cmd.getProdotti()) {
        inventarioRepository.incrementaDisponibilita(
            prodotto.getProductId(),
            prodotto.getQuantita()
        );
    }

    // Segna la compensazione come eseguita
    compensazioniEseguite.add(idempotencyKey);

    // Pubblica evento per continuare la catena di compensazione
    eventBus.publish(new InventarioRilasciato(cmd.getOrdineId()));
}

Saga State Tracking: Visibility into the State

One of the criticisms of Orchestration Saga is the difficulty of tracking the state in high volume systems. An explicit saga tracking table solves this problem:

-- Tabella di tracking per le saga in corso
CREATE TABLE saga_state (
    saga_id       VARCHAR(36) PRIMARY KEY,
    saga_type     VARCHAR(100) NOT NULL,
    ordine_id     VARCHAR(36) NOT NULL,
    current_step  VARCHAR(100) NOT NULL,
    stato         VARCHAR(50) NOT NULL,  -- IN_CORSO, COMPLETATA, FALLITA, COMPENSATA
    started_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at  TIMESTAMPTZ,
    error_message TEXT
);

-- Query utili per monitoring
-- Saga in corso da piu di 5 minuti (potenzialmente bloccate)
SELECT * FROM saga_state
WHERE stato = 'IN_CORSO'
  AND started_at < NOW() - INTERVAL '5 minutes';

-- Distribuzione per step corrente
SELECT current_step, COUNT(*) as count
FROM saga_state WHERE stato = 'IN_CORSO'
GROUP BY current_step;

Comparison: Choreography vs Orchestration

I wait Choreography Orchestration
Coupling Low (events) Medium (orchestrator knows services)
Status visibility Hard (distributed) High (centralized)
Debugging Hard (distributed logic) Easy (explicit state)
Complexity of services High (each service knows the compensations) Low (simple services, logic in orchestrator)
Scalability High (no single points) Medium (Orchestrator to scale)
Ideal use case Few steps, autonomous services, independent teams Many steps, complex logic, necessary visibility

Next Steps in the Series

  • Article 6 – AWS EventBridge: Choreography Saga uses a message bus to exchange events. EventBridge is the ideal AWS managed bus for intelligent routing of events between Lambda and SQS.
  • Article 10 – Outbox Pattern: Ensure atomic dispatch of Saga events together with the local database update.

Link with Other Series

  • Event Sourcing + CQRS (Article 4): each step of the Saga produces events which can be used as a source of truth to reconstruct the aggregate state. Axon Framework natively supports the Saga + Event Sourcing combination.
  • Apache Kafka (Series 38): Kafka is the ideal message bus for Choreography Saga in production, with guarantees of durability and ordering per key partition.