In a microservices architecture, the independence of services is not just a matter of code: it also concerns data. The principle of database per service establishes that each microservice must be the exclusive owner of its data and no one else may touch it directly. This rule, simple to state, blows up many habits inherited from the monolithic world: there is no longer a JOIN that spans all tables, nor a single transaction that encompasses the entire operation. In exchange, we gain truly autonomous services, deployable and scalable separately. In this lesson we will study what this pattern implies, what problems it creates (distributed queries and transactions), and the techniques for living with them: sharding, replication, and the separation of views through CQRS.
Contents
- The Database per Service pattern
- The problem of cross-service queries
- The problem of cross-service transactions
- Data scaling: sharding and replication
- CQRS to build read views
- Synthesis: how the pieces fit together
- The Database per Service pattern
The central idea is that each service owns its own database and is the only way to access that data. Another service that needs information must request it through the API, never by querying the other's database.
┌──────────────────┐ ┌──────────────────┐
│ Orders Service │ │ Customers Service│
│ ┌──────────┐ │ │ ┌──────────┐ │
│ │ DB │ │ │ │ DB │ │
│ │ orders │ │ │ │ customers│ │
│ └──────────┘ │ │ └──────────┘ │
└────────┬─────────┘ └─────────┬────────┘
│ API (HTTP / events) │
└────────────────────────────┘Benefits:
- Loose coupling: a service can change its schema without breaking the others.
- Technology freedom: Orders can use PostgreSQL and Customers, MongoDB (polyglot persistence).
- Fault isolation and independent scaling: the load of one service does not affect the other's database.
The price is that we lose the two great conveniences of the monolith: queries that cross data and transactions that span it. Let's look at them.
- The problem of cross-service queries
In a monolith, displaying "Ana's orders with her name and email" is a trivial JOIN:
-- This ONLY works in a monolith with a single database SELECT c.name, c.email, o.id, o.total FROM customers c JOIN orders o ON o.customer_id = c.id WHERE c.id = 1042;
In microservices this JOIN is impossible: customers and orders live in different databases, possibly with different engines. There are three strategies to solve it:
2.1 API Composition
A service (or the gateway) calls several services and joins the results in memory:
public OrderDetailDto detail(long orderId) {
Order o = ordersClient.get(orderId); // call to the Orders service
Customer c = customersClient.get(o.customerId()); // call to the Customers service
return new OrderDetailDto(
c.name(), c.email(), o.id(), o.total()); // "JOIN" done in code
}Simple, but each query involves several network calls and does not scale well for large listings or complex filters (the classic "N+1" problem multiplied by network latency).
2.2 Replication of reference data
The service that needs the data keeps a local read-only copy of the other's data, synchronized through events. For example, Orders keeps a copy of the customer's name, updated when Customers publishes a CustomerUpdated event.
2.3 CQRS with a materialized view
Building a dedicated read database that already has the combined data (we will see this in section 5).
- The problem of cross-service transactions
In a monolith, confirming an order and deducting the stock was a single ACID transaction. In microservices, "order" and "stock" are in different databases: there is no transaction that spans both. Classic distributed transactions (2PC, two-phase commit) exist, but they are fragile, slow, and frowned upon in modern architectures because they lock resources and reduce availability.
The recommended solution is the Saga pattern (studied in Module 5): the operation is decomposed into a sequence of local transactions, each in its own service, coordinated by events. If a step fails, compensating transactions are executed that undo the previous ones.
Create Order (Orders) --ok--> Reserve Stock (Inventory) --ok--> Charge (Payments)
▲ │ fails
└──── Compensate: cancel order ◄───┘The most important conceptual consequence: we give up immediate consistency in exchange for eventual consistency. For a moment the system may be "halfway" (order created, stock not yet deducted), and we must design the application to tolerate it.
// Local step + event publishing, within ONE local transaction
@Transactional
public void createOrder(CreateOrderCommand cmd) {
Order order = Order.of(cmd);
orderRepo.save(order); // LOCAL transaction of the Orders service
publisher.publish(new OrderCreated(order.id(), order.lines()));
// The Inventory service will react to OrderCreated in ITS own transaction
}Here atomicity is only guaranteed within the Orders service. Coordination with Inventory happens asynchronously via the OrderCreated event. So that saving and publishing don't get out of sync, the Transactional Outbox pattern is used, which writes the event in the same local transaction.
- Data scaling: sharding and replication
When a service's data grows, two (complementary) techniques help to scale.
4.1 Replication
Copies of the same data are kept on several nodes. Usually one is the primary (accepts writes) and the others are read-only replicas.
# Conceptual primary-replica replication configuration
database:
primary:
host: db-primary.internal # receives all WRITES
role: read-write
replicas:
- host: db-replica-1.internal # serve READS, offload the primary
role: read-only
- host: db-replica-2.internal
role: read-only- Advantages: it distributes the read load, improves availability (if the primary goes down, a replica can be promoted).
- Cost: replicas may lag slightly behind (replication lag), introducing eventual consistency in reads.
4.2 Sharding (horizontal partitioning)
The data set is divided across several nodes according to a partition key (shard key). Each node stores a different subset; no node has everything.
| Sharding strategy | How it distributes | Advantage | Risk |
|---|---|---|---|
| By range | By intervals of the key (A-M, N-Z) | Efficient range queries | Hotspots |
| By hash | hash(key) % number of shards | Uniform distribution | Expensive range queries |
| By directory | Explicit lookup table | Flexible | The directory is a bottleneck |
shard = hash(customer_id) % 4 customer_id=1042 -> shard 2 → node C customer_id=2001 -> shard 1 → node B
Essential difference: replication copies the same data (scales reads and provides resilience); sharding divides different data (scales writes and volume). In large systems they are combined: each shard also has its replicas.
- CQRS to build read views
CQRS (Command Query Responsibility Segregation) separates the write model (commands that change state) from the read model (queries). We introduced it in Module 5; here we apply it to the problem of distributed queries.
The idea: instead of fighting with impossible JOINs between services, we build a read-only materialized view that already combines the data, fed by the events that the services publish.
// Projector: listens to events and maintains a denormalized read view
@Component
public class OrderSummaryProjector {
private final OrderViewRepository view;
@EventListener
public void on(OrderCreated e) {
// Creates/updates a combined row ready to query
view.save(new OrderSummary(e.orderId(), e.customerId(), "PENDING"));
}
@EventListener
public void on(CustomerRenamed e) {
// Keeps the customer name updated in the view
view.updateCustomerName(e.customerId(), e.newName());
}
}What we achieve:
- The query "orders with the customer's name" is resolved by reading a single table already combined (the
OrderSummaryview), without calls between services. - The read model is denormalized and optimized for the actual queries of the interface.
- In exchange, the view is updated asynchronously: it is eventually consistent with respect to the source data.
-- Query over the materialized view: fast and without cross-service JOINs SELECT customer_name, order_id, status FROM order_summaries WHERE customer_id = 1042;
CQRS is not free: it duplicates data and adds the complexity of maintaining the projectors. Use it when the asymmetry between reads and writes justifies it, not by default.
Common Mistakes and Tips
- Sharing the database between services "just for this query." This is the gateway to the distributed monolith: it breaks autonomy and reintroduces coupling through the database. Forbidden.
- Trying global ACID transactions with 2PC. They sacrifice availability and scalability. Prefer Sagas and eventual consistency.
- Choosing the shard key poorly. A key with an uneven distribution (for example, "country" when 90% are from a single country) creates hotspots that nullify the benefit of sharding.
- Forgetting replication lag. If you read from a replica right after writing to the primary, you may not see your own change. For "read-your-writes" read from the primary or use consistent reads.
- Not designing the Saga's compensations. Each step needs its inverse action; if you forget it, a failure leaves the system in an inconsistent state with no automatic remedy.
- Tip: eventual consistency must be an explicit decision communicated to the business, not a silent side effect.
Exercises
Exercise 1. Explain why a JOIN between the orders table (Orders service) and the customers table (Customers service) is impossible in a database-per-service architecture, and name two ways to solve the need to combine that data.
Exercise 2. You have 100 million customers and a service's database cannot keep up with writes. Replication or sharding? Propose a reasonable shard key and explain a risk of your choice.
Exercise 3. Describe, step by step, how a Saga manages the "create order → reserve stock → charge" operation when the charge fails.
Solutions
Solution 1. It is impossible because both tables reside in different databases, governed by different services and possibly with different engines; the engine of one database cannot execute a JOIN against tables it does not own. Two solutions: (a) API Composition (call both services and join in memory) and (b) CQRS with a materialized view (maintain a read view already combined, fed by events). Local replication of reference data also works.
Solution 2. Sharding, because the problem is one of write volume and replication only scales reads. A reasonable shard key is hash(customer_id), which distributes uniformly. Risk: queries that span ranges of customers or global aggregations become expensive, because they must query all shards and combine results (scatter-gather).
Solution 3. (1) The Orders service creates the order in PENDING status (local transaction) and publishes OrderCreated. (2) The Inventory service reserves the stock (local transaction) and publishes StockReserved. (3) The Payments service attempts to charge and fails, publishing ChargeRejected. (4) Inventory reacts to ChargeRejected by executing the compensation: it releases the reserved stock. (5) Orders reacts by marking the order as CANCELLED. The system returns to a consistent state without having used a global transaction.
Conclusion
You have learned that the database per service is the piece that makes microservices truly autonomous, at the cost of giving up JOINs and global transactions. To live with this you now command a set of techniques: API composition and CQRS with materialized views for queries, Sagas and eventual consistency for transactions, and replication plus sharding for scaling. They all share a common thread: doing more reads and doing them fast. And when we talk about speeding up reads, the tool par excellence is the cache, with its own strategies and, above all, its difficult problem of invalidation. That is exactly what we will see in the last lesson of the module: Caching and Invalidation Strategies.
Application Architecture Course
Module 1: Fundamentals of Application Architecture
- What Is Application Architecture?
- The Role of the Software Architect
- Quality Attributes and Non-Functional Requirements
- Architectural Decisions and Trade-offs
- Architecture Documentation: Views and the C4 Model
Module 2: Design Principles and Tactics
- Coupling, Cohesion and Separation of Concerns
- SOLID Principles Applied to Architecture
- DRY, KISS, YAGNI and Other Design Principles
- Architectural Tactics for Quality Attributes
- Managing Technical Debt
Module 3: Architectural Styles and Patterns
- Monolithic Architecture
- Layered Architecture (N-Tier)
- Client-Server Architecture
- Hexagonal Architecture (Ports and Adapters)
- Clean and Onion Architecture
Module 4: Distributed Architectures and Microservices
- Introduction to Distributed Systems
- Microservices Architecture
- Service Decomposition and Bounded Contexts
- API Gateway, Service Discovery and Inter-Service Communication
- Resilience Patterns: Circuit Breaker, Retry and Bulkhead
- The CAP Theorem and Data Consistency
Module 5: Event-Driven Architectures and Messaging
- Fundamentals of Event-Driven Architecture
- Asynchronous Messaging: Queues and Brokers
- Event Patterns: Event Sourcing and CQRS
- Managing Distributed Transactions: The Saga Pattern
- Real-Time Data Streaming
Module 6: Domain-Driven Design (DDD)
- Core DDD Concepts
- Strategic Design: Bounded Contexts and Ubiquitous Language
- Tactical Design: Entities, Aggregates and Repositories
- Context Mapping
Module 7: Data and Persistence
- Persistence Strategies: SQL vs NoSQL
- Data Access Patterns: Repository, Unit of Work and DAO
- Database per Service and Distributed Data Management
- Caching and Invalidation Strategies
Module 8: Cloud Architecture and Deployment
- Cloud Computing Fundamentals (IaaS, PaaS, SaaS)
- Containers and Orchestration with Docker and Kubernetes
- Serverless Architecture
- Cloud-Native Design Patterns
- Infrastructure as Code (IaC)
Module 9: Quality, Security and Observability
- Scalability: Horizontal vs Vertical and Load Balancing
- High Availability and Fault Tolerance
- Security by Design and Authentication/Authorization
- Observability: Logging, Metrics and Tracing
- Performance and Load Testing
