In a monolith with a single database, maintaining consistency is simple: you wrap everything in an ACID transaction and, if something fails, you ROLLBACK and it is as if nothing had happened. But in a microservices architecture each service has its own database, and a business operation (such as "process an order") spans several of them. There is no global ROLLBACK that spans different databases and remote services. So how do we guarantee the consistency of an operation that touches orders, payments, inventory, and shipping?
The answer is the Saga pattern: we split the distributed transaction into a sequence of local transactions, each in one service, coordinated through events or messages. If a step fails, instead of a rollback we execute compensating transactions that semantically undo the previous steps. In this lesson we will study the two ways of implementing sagas (choreography and orchestration), the concept of compensation, and a complete order example.
Contents
- The problem of distributed transactions
- What a Saga is and compensating transactions
- Choreography-based saga
- Orchestration-based saga
- Comparison: choreography vs orchestration
- Complete example: processing an order (with diagram)
- Common mistakes and tips
- Exercises and solutions
- Conclusion
- The problem of distributed transactions
Imagine the "process order" flow:
- Orders creates the order.
- Payments charges the customer.
- Inventory reserves the stock.
- Shipping schedules the delivery.
If step 3 fails (no stock), the charge from step 2 has already been made. We cannot do a distributed ROLLBACK because:
- Each service has its own DB; they do not share a transaction.
- Classic solutions such as Two-Phase Commit (2PC) hold locks on every participant until the coordinator decides, scale poorly, and couple all services to the coordinator's availability. In practice they are rarely used between microservices.
The saga accepts an uncomfortable truth: in distributed systems we seek eventual consistency, not immediate consistency. There will be a moment when the charge is done but the stock is not reserved; the saga guarantees that the system ends up in a consistent state (everything confirmed or everything compensated).
- What a Saga is and compensating transactions
A saga is a sequence of local transactions T1, T2, ..., Tn. Each Ti has a compensating transaction Ci that undoes its effect. If T1, T2, T3 succeed but T4 fails, the saga executes C3, C2, C1 in reverse order.
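The rule can be illustrated with a minimal in-process sketch. `SagaStep` and `SimpleSaga` are hypothetical names, not a library API. Each step pairs a local transaction Ti with its compensation Ci. The runner keeps no persistent state and does no retries, and a real saga needs both.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical step type: a local transaction Ti paired with its compensation Ci.
record SagaStep(String name, Runnable action, Runnable compensation) {}

// Hypothetical in-memory saga runner (no persistence, no retries).
final class SimpleSaga {
    private final List<SagaStep> steps;

    SimpleSaga(List<SagaStep> steps) {
        this.steps = List.copyOf(steps);
    }

    /** Returns true if every step succeeded, false if the saga was compensated. */
    boolean run() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();                       // Ti
            } catch (RuntimeException failure) {
                // Undo the completed steps; the stack pops the most recent one first
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();  // C(i-1), ..., C1
                }
                return false;
            }
            completed.push(step);  // recorded only after Ti succeeded
        }
        return true;
    }
}
```

With T1 to T3 succeeding and T4 throwing, `run()` executes C3, C2 and C1 in that order and returns `false`.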
Crucial aspects of compensations:
- It is not a technical rollback, but a semantic one. You do not "delete" the charge; you issue a refund. The history remains (important for auditing).
- They must be idempotent and must eventually succeed: either they cannot fail, or they are retried robustly until they do. A compensation that fails for good leaves the system inconsistent.
- Some actions cannot be compensated (an email that was already sent cannot be un-sent). For those, the saga is ordered so that they run at the end, after the pivot transaction: the point of no return after which the saga is committed to completing and is no longer compensated.
| Transaction (Ti) | Compensation (Ci) |
|---|---|
| Charge the customer | Refund the customer |
| Reserve stock | Release the reservation |
| Create order | Cancel order |
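Since a compensation that gives up leaves the system inconsistent, compensations are usually wrapped in retries. A minimal sketch, assuming the compensation is idempotent (so repeating it is harmless). `CompensationRetry.retryUntilSuccess` is a hypothetical helper; a production version would also persist pending compensations so that the retries survive a crash.

```java
import java.time.Duration;

// Hypothetical helper (not a library API): retries an idempotent compensation
// with exponential backoff. State lives in memory only, so it does not survive a crash.
final class CompensationRetry {
    private CompensationRetry() {}

    /** Returns the number of attempts it took for the compensation to succeed. */
    static int retryUntilSuccess(Runnable compensation, int maxAttempts, Duration initialBackoff) {
        long backoffMillis = initialBackoff.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                compensation.run();
                return attempt;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    // Never swallow the failure: escalate it (alert, dead-letter queue, operator)
                    throw new IllegalStateException(
                            "compensation failed after " + attempt + " attempts", e);
                }
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while retrying compensation", ie);
            }
            backoffMillis *= 2;  // double the wait before the next attempt
        }
    }
}
```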
- Choreography-based saga
In choreography there is no central coordinator. Each service listens to events, does its work, and emits a new event that triggers the next one. The flow "emerges" from the chained reactions (it is the broker topology from lesson 05-01).
```mermaid
flowchart LR
    P[Orders] -->|OrderCreated| Pay[Payments]
    Pay -->|PaymentConfirmed| Inv[Inventory]
    Inv -->|StockReserved| Ship[Shipping]
    Inv -.->|OutOfStock| Pay
    Pay -.->|PaymentRefunded| P
```

- Happy path (solid lines): each event triggers the next service.
- Failure path (dashed lines): if Inventory emits `OutOfStock`, Payments reacts by refunding and Orders cancels the order. Compensations also travel as events.
Advantages: maximum decoupling and no single point of failure. Drawbacks: the overall flow is not written down anywhere; with many steps it becomes hard to follow and debug, and there is a risk of cyclic dependencies between events.
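A toy in-memory simulation can make the choreography concrete. `EventBus` is a hypothetical synchronous stand-in for a real broker, and the services are reduced to subscriptions. Event names follow the diagram, plus `ShippingScheduled` and `OrderCancelled`, which are assumed for the sketch. No component holds the whole flow: each subscription only knows which event it reacts to and which one it emits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy synchronous event bus standing in for a real message broker (hypothetical).
final class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> published = new ArrayList<>();  // log of every event, in order

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        handlers.getOrDefault(eventType, List.of()).forEach(h -> h.accept(orderId));
    }
}

final class ChoreographyDemo {
    // Wires each service to the events it reacts to; inStock simulates Inventory's answer.
    static void wire(EventBus bus, boolean inStock) {
        // Payments: charge on OrderCreated, refund on OutOfStock (compensation)
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentConfirmed", id));
        bus.subscribe("OutOfStock", id -> bus.publish("PaymentRefunded", id));
        // Inventory: try to reserve stock once the payment is confirmed
        bus.subscribe("PaymentConfirmed",
                id -> bus.publish(inStock ? "StockReserved" : "OutOfStock", id));
        // Shipping: schedule the delivery once stock is reserved
        bus.subscribe("StockReserved", id -> bus.publish("ShippingScheduled", id));
        // Orders: cancel the order once the payment has been refunded
        bus.subscribe("PaymentRefunded", id -> bus.publish("OrderCancelled", id));
    }
}
```

With `inStock` set to `false`, publishing `OrderCreated` yields OrderCreated, PaymentConfirmed, OutOfStock, PaymentRefunded, OrderCancelled. That sequence exists only in the log of published events, which is the visibility problem described above.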
- Orchestration-based saga
In orchestration, a central component —the orchestrator— directs the saga: it sends commands to each service, waits for the response, and decides the next step or triggers the compensations (it is the mediator topology).
```java
import org.springframework.stereotype.Component;

// Orchestrator of the "Process Order" saga. The *Client types, SagaState and
// SagaStepException are illustrative types, not part of any library.
@Component
public class ProcessOrderOrchestrator {
    private final OrdersClient orders;
    private final PaymentsClient payments;
    private final InventoryClient inventory;
    private final ShippingClient shipping;

    public ProcessOrderOrchestrator(OrdersClient orders, PaymentsClient payments,
                                    InventoryClient inventory, ShippingClient shipping) {
        this.orders = orders;
        this.payments = payments;
        this.inventory = inventory;
        this.shipping = shipping;
    }

    public void process(String orderId) {
        var saga = new SagaState(orderId);
        try {
            // Each step is a local transaction in another service, marked only after it returns
            orders.create(orderId);
            saga.mark("ORDER_CREATED");
            payments.charge(orderId);
            saga.mark("PAYMENT_DONE");
            inventory.reserve(orderId);
            saga.mark("STOCK_RESERVED");
            shipping.schedule(orderId);
            saga.mark("SHIPPING_SCHEDULED");
            saga.complete();
        } catch (SagaStepException ex) {
            // If something fails, we compensate in REVERSE order
            compensate(saga);
        }
    }

    // Each compensation must be idempotent and retried until it succeeds
    private void compensate(SagaState saga) {
        if (saga.did("STOCK_RESERVED")) inventory.release(saga.orderId());
        if (saga.did("PAYMENT_DONE")) payments.refund(saga.orderId());
        if (saga.did("ORDER_CREATED")) orders.cancel(saga.orderId());
    }
}
```

Detailed explanation:
- The orchestrator knows the whole flow: the order of the steps and their compensations. This centralizes the process logic in a single readable place.
- `SagaState` records which steps completed. It is essential to persist it (in a DB), because if the orchestrator restarts midway it must know where it was in order to resume or compensate.
- In `compensate`, we only undo the steps that did execute, in reverse order: first release stock, then refund, then cancel the order.
- In production, the calls are usually asynchronous (commands via a queue and responses via events) instead of blocking invocations, but the coordination logic is the same.
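One possible shape for a persisted `SagaState`, as a sketch. `SagaStore` is a hypothetical interface that would be backed by a database table keyed by order id; the in-memory implementation only stands in for it, and here `mark` takes just the step name and the constructor takes the store explicitly. Each step is saved before the orchestrator moves on, so a new instance created after a restart reloads the completed steps and `did(...)` still answers correctly.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical persistence port; a real implementation would write to a DB table.
interface SagaStore {
    void save(String orderId, Set<String> completedSteps);
    Set<String> load(String orderId);
}

// In-memory stand-in for the database (illustration only: it does not survive a real crash).
final class InMemorySagaStore implements SagaStore {
    private final Map<String, Set<String>> rows = new ConcurrentHashMap<>();

    @Override
    public void save(String orderId, Set<String> completedSteps) {
        rows.put(orderId, new LinkedHashSet<>(completedSteps));  // store a copy
    }

    @Override
    public Set<String> load(String orderId) {
        return new LinkedHashSet<>(rows.getOrDefault(orderId, Set.of()));
    }
}

// Every mark() is persisted before the orchestrator continues with the next step.
final class SagaState {
    private final String orderId;
    private final SagaStore store;
    private final Set<String> done;

    // Reloads previously persisted steps, so a restarted orchestrator knows where it was.
    SagaState(String orderId, SagaStore store) {
        this.orderId = orderId;
        this.store = store;
        this.done = store.load(orderId);
    }

    String orderId() { return orderId; }

    void mark(String step) {
        done.add(step);
        store.save(orderId, done);
    }

    boolean did(String step) { return done.contains(step); }

    void complete() { mark("COMPLETED"); }
}
```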
- Comparison: choreography vs orchestration
| Criterion | Choreography | Orchestration |
|---|---|---|
| Coordination | Distributed (events) | Centralized (orchestrator) |
| Coupling | Minimal | Medium (everyone depends on the orchestrator) |
| Flow visibility | Low (spread out) | High (in one place) |
| Ease of debugging | Hard | Easy |
| Single point of failure | No | Yes (the orchestrator) |
| Risk | Cyclic event dependencies | Overloaded "god" orchestrator |
| Ideal for | Few steps (2-4), maximum decoupling | Long and complex flows |
Practical rule: use choreography for short, simple sagas; use orchestration when the process has many steps, conditional logic, or you need clear visibility and control.
- Complete example: processing an order (with diagram)
Sequence diagram of the orchestrated saga, including the compensation path when stock is missing:
```mermaid
sequenceDiagram
    participant O as Orchestrator
    participant P as Orders
    participant Pay as Payments
    participant I as Inventory
    O->>P: create(order)
    P-->>O: OrderCreated
    O->>Pay: charge(order)
    Pay-->>O: PaymentConfirmed
    O->>I: reserve(stock)
    I-->>O: OutOfStock (FAILURE)
    Note over O: Starts compensation in reverse order
    O->>Pay: refund(order)
    Pay-->>O: PaymentRefunded
    O->>P: cancel(order)
    P-->>O: OrderCancelled
```

Notice that there is no stock reservation to release (that step failed), so the compensation starts directly with the payment refund and ends by cancelling the order. The system is left in a consistent state: the customer has not paid and has no order.
A key implementation detail: each compensation must be idempotent. If the orchestrator retries refund(order) because it did not receive a response, the Payments service must use the idempotency key (lesson 05-02) so as not to refund twice.
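A minimal sketch of an idempotent refund on the Payments side, assuming the orchestrator sends the same hypothetical idempotency key on every retry of the same compensation. `PaymentsService` is illustrative; in a real service the processed-keys record and the refund would be written in the same local transaction (for example, with a unique constraint on the key) rather than kept in memory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical fragment of the Payments service (not a real payment API).
final class PaymentsService {
    private final Map<String, Long> chargedCents = new HashMap<>();
    private final Set<String> processedRefundKeys = new HashSet<>();
    private long refundedCents = 0;

    synchronized void charge(String orderId, long amountCents) {
        chargedCents.put(orderId, amountCents);
    }

    /**
     * Refunds at most once per idempotency key. Returns true if money moved,
     * false if this key was already processed.
     */
    synchronized boolean refund(String orderId, String idempotencyKey) {
        if (!processedRefundKeys.add(idempotencyKey)) {
            return false;  // duplicate request, e.g. a retry after a lost response
        }
        refundedCents += chargedCents.getOrDefault(orderId, 0L);
        return true;
    }

    synchronized long refundedCents() {
        return refundedCents;
    }
}
```

A second call with the same key returns `false` and moves no money; the orchestrator should treat that answer as success, since the refund already happened.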
Common Mistakes and Tips
- Compensations that can fail and are not retried. If a compensation fails and there are no retries, the system is left inconsistent with no remedy. Make compensations idempotent and retry them persistently.
- Not persisting the saga's state. If the orchestrator restarts and does not know which steps it did, it cannot compensate correctly. The saga's state must survive crashes.
- Trying to compensate irreversible actions. A sent email cannot be "un-sent". Place those actions at the end of the saga (after the pivot transaction) or emit a corrective action (an apology email).
- Confusing a saga with 2PC. The saga does not offer isolation: during its execution, others may see intermediate states. You must design with that in mind (statuses like "PENDING", temporary reservations, etc.).
- Tip: explicitly model each intermediate state of the order (`PENDING_PAYMENT`, `PAID`, `CANCELLED`) so that the temporary inconsistency is visible and controlled, not accidental.
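A sketch of that tip in code: the three statuses come from the text, while the transition table (which status may follow which) is an assumption for illustration. Rejecting illegal transitions, such as `CANCELLED` back to `PAID`, turns an accidental inconsistency into a visible error.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Order statuses from the tip above; the allowed transitions are an illustrative assumption.
enum OrderStatus {
    PENDING_PAYMENT, PAID, CANCELLED;

    private static final Map<OrderStatus, Set<OrderStatus>> ALLOWED = Map.of(
            PENDING_PAYMENT, EnumSet.of(PAID, CANCELLED),
            PAID, EnumSet.of(CANCELLED),                     // compensation path: refund, then cancel
            CANCELLED, EnumSet.noneOf(OrderStatus.class));   // terminal state

    boolean canMoveTo(OrderStatus next) {
        return ALLOWED.get(this).contains(next);
    }
}

// Hypothetical order aggregate that only changes status through legal transitions.
final class Order {
    private OrderStatus status = OrderStatus.PENDING_PAYMENT;

    OrderStatus status() { return status; }

    void moveTo(OrderStatus next) {
        if (!status.canMoveTo(next)) {
            throw new IllegalStateException(status + " -> " + next + " is not allowed");
        }
        status = next;
    }
}
```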
Exercises
- Define the compensating transactions for this travel-booking saga: T1 book flight, T2 book hotel, T3 rent car. If T3 fails, which compensations are executed and in what order?
- For the "opening an insurance policy" flow (validate data, rate, charge premium, issue policy, send welcome email) decide whether you would use choreography or orchestration and where you would place the email sending. Justify your answer.
- Explain why compensating transactions must be idempotent, using as an example a payment refund in a saga with retries.
Solutions
- Compensations: C1 cancel flight, C2 cancel hotel, C3 cancel car. If T3 fails, there is no car to cancel; the previous steps are compensated in reverse order: first C2 (cancel hotel) and then C1 (cancel flight).
- Orchestration, because it is a long flow with several steps and conditional logic (rating and charging can be rejected), and visibility/control is needed for a regulated process. The welcome email goes at the end, as a non-compensable action after the pivot transaction (issue policy): it is only sent when there is no going back, avoiding having to "un-send" an email if something fails earlier.
- With retries (at-least-once delivery), the orchestrator may invoke `refund(order)` more than once if it does not receive confirmation. If the compensation were not idempotent, we would refund the customer twice or more. Using the payment's idempotency key, the service detects that this refund was already made and ignores it, guaranteeing a single refund despite the retries.
Conclusion
The Saga pattern solves the problem of distributed transactions by decomposing them into coordinated local transactions, with semantic compensations that undo the work when a step fails. We distinguished choreography (chained events, maximum decoupling) from orchestration (central coordinator, maximum visibility), and we saw that the idempotency of compensations is non-negotiable. We also accepted the underlying trade-off: instead of immediate consistency we get eventual consistency, and we must tolerate intermediate states.
In the next lesson, "Real-Time Data Streaming", we close the module by seeing how to process continuous flows of events —no longer individual operations— for analytics and real-time reaction with tools such as Kafka Streams.
