In a monolith with a single database, maintaining consistency is simple: you wrap everything in an ACID transaction and, if something fails, you ROLLBACK and it is as if nothing had happened. But in a microservices architecture each service has its own database, and a business operation (such as "process an order") spans several of them. There is no global ROLLBACK that spans different databases and remote services. So how do we guarantee the consistency of an operation that touches orders, payments, inventory, and shipping?

The answer is the Saga pattern: we split the distributed transaction into a sequence of local transactions, each in one service, coordinated through events or messages. If a step fails, instead of a rollback we execute compensating transactions that semantically undo the previous steps. In this lesson we will study the two ways of implementing sagas (choreography and orchestration), the concept of compensation, and a complete order example.

The problem of distributed transactions
What a Saga is and compensating transactions
Choreography-based saga
Orchestration-based saga
Comparison: choreography vs orchestration
Complete example: processing an order (with diagram)
Common mistakes and tips
Exercises and solutions
Conclusion

The problem of distributed transactions

Imagine the "process order" flow:

1. Orders creates the order.
2. Payments charges the customer.
3. Inventory reserves the stock.
4. Shipping schedules the delivery.

If step 3 fails (no stock), the charge from step 2 has already been made. We cannot do a distributed ROLLBACK because:

Each service has its own DB; they do not share a transaction.
Classic solutions such as Two-Phase Commit (2PC) hold locks on resources until every participant has voted, scale poorly, and couple the services tightly to a shared coordinator. For these reasons 2PC is rarely used between microservices.

The saga accepts an uncomfortable truth: in distributed systems we seek eventual consistency, not immediate consistency. There will be a moment when the charge is done but the stock is not reserved; the saga guarantees that the system ends up in a consistent state (everything confirmed or everything compensated).

What a Saga is and compensating transactions

A saga is a sequence of local transactions T1, T2, ..., Tn. Each Ti has a compensating transaction Ci that undoes its effect. If T1, T2, T3 succeed but T4 fails, the saga executes C3, C2, C1 in reverse order.

Crucial aspects of compensations:

A compensation is not a technical rollback but a semantic one. You do not "delete" the charge; you issue a refund. The history remains (important for auditing).
Compensations must be idempotent and must eventually succeed (in practice, through robust retries), because if a compensation fails for good, the system is left inconsistent.
Some actions cannot be compensated (an email that was already sent). The saga is ordered so that those actions run at the end, after the pivot transaction, that is, the step after which the saga can no longer be rolled back.

Transaction (Ti)	Compensation (Ci)
Charge the customer	Refund the customer
Reserve stock	Release the reservation
Create order	Cancel order
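The rule "run T1..Tn, and on failure run the compensations of the completed steps in reverse order" can be sketched in a few lines. This is a minimal in-process illustration, not a production saga engine: the names SagaSketch, Step, run, and compensateWithRetry are hypothetical, steps are plain Runnable values instead of remote calls, and compensations are assumed to be idempotent so that retrying them is safe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SagaSketch {

    /** One saga step: a local transaction Ti plus its compensation Ci. */
    public record Step(String name, Runnable action, Runnable compensation) {}

    /** Runs the steps in order; on failure, compensates the completed steps in reverse. */
    public static boolean run(List<Step> steps, int maxCompensationAttempts) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // the most recently completed step sits on top
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    compensateWithRetry(completed.pop(), maxCompensationAttempts);
                }
                return false;
            }
        }
        return true;
    }

    /** Retries a compensation a bounded number of times (assumes it is idempotent). */
    static void compensateWithRetry(Step step, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.compensation().run();
                return;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    // A real system would park the saga for manual intervention here.
                    throw new IllegalStateException("Compensation failed: " + step.name(), e);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Step> steps = List.of(
            new Step("T1", () -> log.add("T1"), () -> log.add("C1")),
            new Step("T2", () -> log.add("T2"), () -> log.add("C2")),
            new Step("T3", () -> log.add("T3"), () -> log.add("C3")),
            new Step("T4", () -> { throw new IllegalStateException("T4 failed"); },
                     () -> log.add("C4")));
        boolean ok = run(steps, 3);
        System.out.println(ok + " " + log); // false [T1, T2, T3, C3, C2, C1]
    }
}
```

Note that C4 never runs: T4 did not complete, so there is nothing of it to undo.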

Choreography-based saga

In choreography there is no central coordinator. Each service listens to events, does its work, and emits a new event that triggers the next one. The flow "emerges" from the chained reactions (it is the broker topology from lesson 05-01).

flowchart LR
    P[Orders] -->|OrderCreated| Pay[Payments]
    Pay -->|PaymentConfirmed| Inv[Inventory]
    Inv -->|StockReserved| Ship[Shipping]
    Inv -.->|OutOfStock| Pay
    Pay -.->|PaymentRefunded| P

Happy path (solid lines): each event triggers the next service.
Failure path (dashed lines): if Inventory emits OutOfStock, Payments reacts by refunding and Orders cancels. Compensations also travel as events.

Advantages: maximum decoupling, no single point of failure. Drawbacks: the overall flow is not defined in any one place; with many steps it becomes hard to follow and debug, and services risk ending up with cyclic dependencies between events.
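To make the chained reactions concrete, here is a minimal sketch of the same flow with a synchronous in-memory event bus standing in for the message broker. Everything in it is an assumption made for illustration: the EventBus class, the runSaga method, and the ShippingScheduled and OrderCancelled event names (the flowchart above stops at StockReserved and PaymentRefunded). Each subscription represents one service reacting to one event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ChoreographySketch {

    /** Minimal synchronous stand-in for a broker: event type -> subscribers. */
    static class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
        final List<String> published = new ArrayList<>();

        void subscribe(String eventType, Consumer<String> handler) {
            subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
        }

        void publish(String eventType, String orderId) {
            published.add(eventType);
            for (Consumer<String> handler : subscribers.getOrDefault(eventType, List.of())) {
                handler.accept(orderId);
            }
        }
    }

    /** Wires the services; each one only knows the events it consumes and emits. */
    static List<String> runSaga(boolean stockAvailable) {
        EventBus bus = new EventBus();
        // Payments: charge on OrderCreated, refund on OutOfStock
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentConfirmed", id));
        bus.subscribe("OutOfStock", id -> bus.publish("PaymentRefunded", id));
        // Inventory: reserve on PaymentConfirmed, or report the failure
        bus.subscribe("PaymentConfirmed",
            id -> bus.publish(stockAvailable ? "StockReserved" : "OutOfStock", id));
        // Shipping: schedule on StockReserved
        bus.subscribe("StockReserved", id -> bus.publish("ShippingScheduled", id));
        // Orders: cancel on PaymentRefunded
        bus.subscribe("PaymentRefunded", id -> bus.publish("OrderCancelled", id));

        bus.publish("OrderCreated", "order-1"); // Orders starts the saga
        return bus.published;
    }

    public static void main(String[] args) {
        // [OrderCreated, PaymentConfirmed, StockReserved, ShippingScheduled]
        System.out.println(runSaga(true));
        // [OrderCreated, PaymentConfirmed, OutOfStock, PaymentRefunded, OrderCancelled]
        System.out.println(runSaga(false));
    }
}
```

No class in the sketch describes the whole flow; it only exists as the sum of the subscriptions, which is exactly the visibility drawback of choreography.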

Orchestration-based saga

In orchestration, a central component —the orchestrator— directs the saga: it sends commands to each service, waits for the response, and decides the next step or triggers the compensations (it is the mediator topology).

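// Orchestrator of the "Process Order" saga
import org.springframework.stereotype.Component;

// OrdersClient, PaymentsClient, InventoryClient, ShippingClient, SagaState and
// SagaStepException are application types assumed to exist; they are not shown here.
@Component
public class ProcessOrderOrchestrator {

    private final OrdersClient orders;
    private final PaymentsClient payments;
    private final InventoryClient inventory;
    private final ShippingClient shipping;

    // Constructor injection: Spring supplies one client per remote service
    public ProcessOrderOrchestrator(OrdersClient orders, PaymentsClient payments,
                                    InventoryClient inventory, ShippingClient shipping) {
        this.orders = orders;
        this.payments = payments;
        this.inventory = inventory;
        this.shipping = shipping;
    }

    public void process(String orderId) {
        var saga = new SagaState(orderId);
        try {
            // Each step is a local transaction in another service;
            // a step is marked only after its call has returned successfully
            saga.mark("ORDER_CREATED",    orders.create(orderId));
            saga.mark("PAYMENT_DONE",     payments.charge(orderId));
            saga.mark("STOCK_RESERVED",   inventory.reserve(orderId));
            saga.mark("SHIPPING_SCHEDULED", shipping.schedule(orderId));
            saga.complete();
        } catch (SagaStepException ex) {
            // If something fails, we compensate in REVERSE order
            compensate(saga);
        }
    }

    // Undoes only the steps that completed, newest first
    private void compensate(SagaState saga) {
        if (saga.did("STOCK_RESERVED")) inventory.release(saga.orderId());
        if (saga.did("PAYMENT_DONE"))   payments.refund(saga.orderId());
        if (saga.did("ORDER_CREATED"))  orders.cancel(saga.orderId());
    }
}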

Detailed explanation:

The orchestrator knows the whole flow: the order of the steps and their compensations. This centralizes the process logic in a single readable place.
SagaState records which steps completed. It is essential to persist it (in a DB), because if the orchestrator restarts midway it must know where it was in order to resume or compensate.
In compensate, we only undo the steps that did execute, in reverse order: first release stock, then refund, then cancel the order.
In production, the calls are usually asynchronous (commands via queue and responses via events) instead of blocking invocations, but the coordination logic is the same.
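The point about persisting the saga's state can be illustrated with a small sketch. It assumes a hypothetical SagaStateSketch class in which a plain in-memory map stands in for the database table; a real implementation would write each completed step durably before sending the next command. The method names mark and did mirror the ones used by the orchestrator, while stepsToCompensate is an invented helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SagaStateSketch {

    /** Stand-in for a durable table keyed by order id (a plain map, for illustration only). */
    private final Map<String, List<String>> store = new HashMap<>();

    /** Saves a completed step; a real orchestrator would commit this before the next command. */
    public void mark(String orderId, String step) {
        store.computeIfAbsent(orderId, k -> new ArrayList<>()).add(step);
    }

    /** Tells whether a given step was recorded as completed. */
    public boolean did(String orderId, String step) {
        return store.getOrDefault(orderId, List.of()).contains(step);
    }

    /** Completed steps, newest first: the order in which to compensate after a restart. */
    public List<String> stepsToCompensate(String orderId) {
        List<String> done = new ArrayList<>(store.getOrDefault(orderId, List.of()));
        Collections.reverse(done);
        return done;
    }

    public static void main(String[] args) {
        SagaStateSketch state = new SagaStateSketch();
        state.mark("order-1", "ORDER_CREATED");
        state.mark("order-1", "PAYMENT_DONE");
        // Imagine the orchestrator crashes here and a new instance reloads the state.
        System.out.println(state.stepsToCompensate("order-1")); // [PAYMENT_DONE, ORDER_CREATED]
    }
}
```

Because the record of completed steps lives outside the orchestrator's memory, a restarted instance can decide whether to resume the saga or to compensate it.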

Comparison: choreography vs orchestration

Criterion	Choreography	Orchestration
Coordination	Distributed (events)	Centralized (orchestrator)
Coupling	Minimal	Medium (everyone depends on the orchestrator)
Flow visibility	Low (spread out)	High (in one place)
Ease of debugging	Hard	Easy
Single point of failure	No	Yes (the orchestrator)
Risk	Cyclic event dependencies	Overloaded "god" orchestrator
Ideal for	Few steps (2-4), maximum decoupling	Long and complex flows

Practical rule: use choreography for short, simple sagas; use orchestration when the process has many steps, conditional logic, or you need clear visibility and control.

Complete example: processing an order (with diagram)

Sequence diagram of the orchestrated saga, including the compensation path when stock is missing:

sequenceDiagram
    participant O as Orchestrator
    participant P as Orders
    participant Pay as Payments
    participant I as Inventory
    O->>P: create(order)
    P-->>O: OrderCreated
    O->>Pay: charge(order)
    Pay-->>O: PaymentConfirmed
    O->>I: reserve(stock)
    I-->>O: OutOfStock (FAILURE)
    Note over O: Starts compensation in reverse order
    O->>Pay: refund(order)
    Pay-->>O: PaymentRefunded
    O->>P: cancel(order)
    P-->>O: OrderCancelled

Notice that there is no stock reservation to release (that step failed), so the compensation starts directly with the payment refund and ends by cancelling the order. The system is left in a consistent state: the customer has been refunded and the order is cancelled, while both the charge and the refund remain in the history.

A key implementation detail: each compensation must be idempotent. If the orchestrator retries refund(order) because it did not receive a response, the Payments service must use the idempotency key (lesson 05-02) so as not to refund twice.
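A minimal sketch of such an idempotent refund follows. The PaymentsService class, its refund signature, and the key format are hypothetical; the only technique being shown is deduplication by idempotency key, with an in-memory map standing in for the table where a real service would store processed keys in the same local transaction as the refund.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentRefundSketch {

    /** Hypothetical Payments-side handler that deduplicates refunds by idempotency key. */
    static class PaymentsService {
        private final Map<String, Long> processed = new HashMap<>();
        private long totalRefundedCents = 0;

        /** Returns true if the refund was applied, false if this key was already handled. */
        synchronized boolean refund(String idempotencyKey, long amountCents) {
            if (processed.putIfAbsent(idempotencyKey, amountCents) != null) {
                return false; // duplicate delivery: acknowledge without refunding again
            }
            totalRefundedCents += amountCents;
            return true;
        }

        synchronized long totalRefundedCents() {
            return totalRefundedCents;
        }
    }

    public static void main(String[] args) {
        PaymentsService payments = new PaymentsService();
        String key = "refund:order-1"; // assumed format: one key per saga step and order

        System.out.println(payments.refund(key, 2500)); // true  (first attempt)
        System.out.println(payments.refund(key, 2500)); // false (retry is ignored)
        System.out.println(payments.totalRefundedCents()); // 2500
    }
}
```

The orchestrator can therefore retry as often as it needs to: every retry carries the same key, and only the first one moves money.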

Common mistakes and tips

Compensations that can fail and are not retried. If a compensation fails and is never retried, the system stays inconsistent until someone repairs it by hand. Make compensations idempotent and retry them persistently.
Not persisting the saga's state. If the orchestrator restarts and does not know which steps it did, it cannot compensate correctly. The saga's state must survive crashes.
Trying to compensate irreversible actions. A sent email cannot be "un-sent". Place those actions at the end of the saga (after the pivot transaction) or emit a corrective action (an apology email).
Confusing a saga with 2PC. The saga does not offer isolation: during its execution, others may see intermediate states. You must design with that in mind (statuses like "PENDING", temporary reservations, etc.).
Tip: explicitly model each intermediate state of the order (PENDING_PAYMENT, PAID, CANCELLED) so that the temporary inconsistency is visible and controlled, not accidental.
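The tip about explicit intermediate states could look like the following sketch. The three status names come from the text; the allowed transitions (including PAID to CANCELLED for the case where a later step fails and the payment is refunded) and the names OrderStatusSketch, next, and transition are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class OrderStatusSketch {

    enum OrderStatus {
        PENDING_PAYMENT, PAID, CANCELLED;

        /** Allowed next states; CANCELLED is terminal. */
        Set<OrderStatus> next() {
            switch (this) {
                case PENDING_PAYMENT: return EnumSet.of(PAID, CANCELLED);
                case PAID:            return EnumSet.of(CANCELLED); // via compensation
                default:              return EnumSet.noneOf(OrderStatus.class);
            }
        }
    }

    /** Applies a transition, rejecting any move the saga does not define. */
    static OrderStatus transition(OrderStatus from, OrderStatus to) {
        if (!from.next().contains(to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }

    public static void main(String[] args) {
        OrderStatus status = OrderStatus.PENDING_PAYMENT;
        status = transition(status, OrderStatus.PAID);      // payment confirmed
        status = transition(status, OrderStatus.CANCELLED); // stock missing, saga compensated
        System.out.println(status); // CANCELLED
    }
}
```

With the states spelled out, any reader of the order (a UI, a report, another service) can tell a half-finished saga from a completed one instead of guessing from missing data.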

Exercises

1. Define the compensating transactions for this travel-booking saga: T1 book flight, T2 book hotel, T3 rent car. If T3 fails, which compensations are executed and in what order?
2. For the "opening an insurance policy" flow (validate data, rate, charge premium, issue policy, send welcome email) decide whether you would use choreography or orchestration and where you would place the email sending. Justify your answer.
3. Explain why compensating transactions must be idempotent, using as an example a payment refund in a saga with retries.

Solutions

1. Compensations: C1 cancel flight, C2 cancel hotel, C3 cancel car. If T3 fails, there is no car to cancel; the previous steps are compensated in reverse order: first C2 (cancel hotel) and then C1 (cancel flight).
2. Orchestration, because it is a long flow with several steps and conditional logic (rating and charging can be rejected), and visibility/control is needed for a regulated process. The welcome email goes at the end, as a non-compensable action after the pivot transaction (issue policy): it is only sent when there is no going back, avoiding having to "un-send" an email if something fails earlier.
3. With retries (at-least-once), the orchestrator may invoke refund(order) more than once if it does not receive confirmation. If the compensation were not idempotent, we would refund the customer twice or more. Using the payment's idempotency key, the service detects that this refund was already made and ignores it, guaranteeing a single refund despite the retries.

Conclusion

The Saga pattern solves the problem of distributed transactions by decomposing them into coordinated local transactions, with semantic compensations that undo the work when a step fails. We distinguished choreography (chained events, maximum decoupling) from orchestration (central coordinator, maximum visibility), and we saw that the idempotency of compensations is non-negotiable. We also accepted that we gain eventual consistency in exchange for tolerating intermediate states.

In the next lesson, "Real-Time Data Streaming", we close the module by seeing how to process continuous flows of events —no longer individual operations— for analytics and real-time reaction with tools such as Kafka Streams.

Managing Distributed Transactions: The Saga Pattern
