The CAP Theorem and Data Consistency

When data is spread and replicated across several nodes, a fundamental question arises: can we simultaneously guarantee that all reads see the most recent data, that the system always responds, and that it tolerates network failures? The CAP Theorem proves that we cannot: we must choose. Understanding this trade-off is essential for designing the data storage of any distributed system, and for choosing correctly among the dozens of databases available today.

In this lesson we will study the CAP Theorem, its PACELC refinement, the difference between strong and eventual consistency, and classify the most common databases according to the guarantees they offer.

The three properties: C, A, and P
The statement of the CAP Theorem
Why P is not optional in practice
CP versus AP: the real choice
PACELC: beyond CAP
Strong versus eventual consistency
Table of databases by CAP
Common mistakes and tips
Exercises
Conclusion

The three properties: C, A, and P

The CAP Theorem, conjectured by Eric Brewer in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002, deals with three properties of a distributed data system:

Property	Meaning
C (Consistency)	Every read returns the most recent write or an error. The system behaves as if there were a single copy of the data, so all nodes appear to see the same data at the same time.
A (Availability)	Every request received by a non-failing node gets a response (not an error), even though it may not contain the most recent write.
P (Partition tolerance)	The system keeps working even when messages between nodes are lost or delayed (a network "partition").

Beware: the "Consistency" of CAP is not the "C" of ACID. Here it means that all replicas agree, not transactional integrity.

The statement of the CAP Theorem

The theorem states:

A distributed system cannot guarantee all three properties at once. In particular, when a network partition occurs, it must give up either consistency or availability: it cannot keep both.

graph TD
    CAP[CAP Theorem] --> C[Consistency]
    CAP --> A[Availability]
    CAP --> P[Partition tolerance]
    C -.- note["During a partition:<br/>choose C or A"]
    A -.- note

Imagine two replicated nodes that stop communicating (partition). A write arrives at one of them. Now a read arrives at the other node, which does not have that write. The system must decide:

Return the old data (maintains availability, sacrifices consistency).
Reject the read or wait (maintains consistency, sacrifices availability).

There is no third option. That is why CAP is a choice, not a buffet.
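The scenario above can be sketched as a toy simulation. The class `ToyReplicatedStore` and the exception `Unavailable` are hypothetical names invented for this illustration; the sketch assumes just two replicas, A and B, and deliberately ignores how real CP systems coordinate writes (for example with majority quorums or leader election). It only shows what each choice means for the read that lands on the replica that missed the write.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class ToyReplicatedStore:
    """Hypothetical two-replica key-value store (A and B), for illustration only."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"A": {}, "B": {}}
        self.partitioned = False

    def write(self, node: str, key: str, value: str) -> None:
        # As in the scenario, the write is stored on the replica it reaches.
        self.replicas[node][key] = value
        if not self.partitioned:
            # Without a partition the write is copied to the other replica.
            other = "B" if node == "A" else "A"
            self.replicas[other][key] = value
        # During a partition the write stays on `node` only.

    def read(self, node: str, key: str):
        if self.partitioned and self.mode == "CP":
            # CP: the replica cannot confirm it has the latest write, so it refuses.
            raise Unavailable(f"replica {node} cannot reach its peer")
        # AP (or no partition): answer from local data, possibly stale.
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    store = ToyReplicatedStore(mode)
    store.write("A", "balance", "100")  # replicated to both replicas
    store.partitioned = True            # the network splits
    store.write("A", "balance", "80")   # only A receives this write
    try:
        print(mode, "read on B ->", store.read("B", "balance"))
    except Unavailable as err:
        print(mode, "read on B -> error:", err)
```

Running it, the CP store answers the read on B with an error, while the AP store answers with the stale balance 100.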

Why P is not optional in practice

Many people believe they can choose "CA" (consistency and availability, giving up P). In a truly distributed system this is an illusion: network partitions happen (cables get cut, switches fail, extreme latencies). You cannot guarantee that there will never be a partition.

Therefore, in a distributed system P is mandatory, and the real choice comes down to:

CP: during a partition, I prioritize consistency (I may stop responding).
AP: during a partition, I prioritize availability (I may return old data).

A "CA" system only makes sense on a single machine (a classic relational database on a single server), where there is no network to partition.

CP versus AP: the real choice

The decision depends on the business case:

Scenario	Choice	Reason
Bank balance, critical stock	CP	Better an error than incorrect data.
Shopping cart, catalog	AP	Better to respond with something than to go down.
Booking limited seats	CP	We don't want to sell the same thing twice.
Visit counter, "likes"	AP	A small lag is irrelevant.

graph LR
    P[There is a network partition] --> D{What do I prioritize?}
    D -->|Consistency| CP[CP: reject or wait<br/>to avoid serving wrong data]
    D -->|Availability| AP[AP: respond with<br/>possibly stale data]

The guiding question is always: what is worse for my business, giving an incorrect response or giving no response?

PACELC: beyond CAP

CAP only talks about what happens during a partition, which is a rare event. PACELC, formulated by Daniel Abadi, completes the picture by adding what happens the rest of the time:

Partition (P): Availability (A) or Consistency (C); Else (E, in normal operation): Latency (L) or Consistency (C).

In other words:

If there is a Partition → you choose between A and C (as in CAP).
If there is not (Else) → you still choose between Latency and Consistency, because guaranteeing strong consistency between replicas requires coordination, which costs time.
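The "Else" half of PACELC can be made concrete with a small calculation. Assuming a coordinator that contacts all replicas in parallel, a write that waits for k acknowledgements completes when the k-th fastest replica answers. The function name and the delay figures below are made up for illustration; the labels ONE, QUORUM and ALL borrow the Cassandra level names used later in this lesson.

```python
def write_latency_ms(replica_delays_ms: list[float], acks_required: int) -> float:
    """Time until `acks_required` replicas have acknowledged, assuming all
    replicas are contacted in parallel: the k-th fastest response time."""
    if not 1 <= acks_required <= len(replica_delays_ms):
        raise ValueError("acks_required must be between 1 and the number of replicas")
    return sorted(replica_delays_ms)[acks_required - 1]


# Hypothetical round-trip times to three replicas: two nearby, one in another region.
delays = [2.0, 5.0, 80.0]
for label, k in (("ONE", 1), ("QUORUM", 2), ("ALL", 3)):
    print(label, write_latency_ms(delays, k), "ms")
```

With these made-up delays it prints 2.0 ms for ONE, 5.0 ms for QUORUM and 80.0 ms for ALL: each extra replica a write waits for, in the name of consistency, can only add latency, even when the network is healthy.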

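System	PACELC classification	Interpretation
Cassandra	PA/EL	Prioritizes availability and low latency.
MongoDB	PA/EC (configurable)	Abadi's classification: available during a partition, consistent in normal operation. Its actual partition behavior depends on read and write concerns; the CAP table below lists it as CP by default.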
Systems with strong consensus	PC/EC	Always prioritize consistency, at the cost of latency.

PACELC is more useful than CAP in day-to-day work, because most of the time there are no partitions, and yet you still pay a price for consistency.

Strong versus eventual consistency

Model	Guarantee	Cost	When to use it
Strong	Every read sees the last write.	Higher latency, lower availability.	Money, stock, bookings.
Eventual	Replicas converge "over time".	Possible stale reads.	Catalogs, counters, feeds.

Eventual consistency does not mean "incorrect data forever", but rather that, if writes stop arriving, all replicas eventually agree. There are useful intermediate levels, such as read-your-own-writes (a user always sees their own changes).
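The convergence promise can be illustrated with last-write-wins (LWW), one common reconciliation rule among several; the lesson does not prescribe it. In this sketch the names `merge` and `sync` are hypothetical, each replica stores a `(timestamp, value)` pair per key, and timestamps are assumed to be comparable across replicas. Whenever two replicas synchronize (anti-entropy), both keep the newer version of every key.

```python
# (timestamp, value): tuples compare by timestamp first, then by value,
# which gives a deterministic winner when timestamps tie.
Version = tuple[int, str]
State = dict[str, Version]


def merge(local: State, remote: State) -> State:
    """Last-write-wins merge: for each key, keep the larger version."""
    merged = dict(local)
    for key, version in remote.items():
        if key not in merged or version > merged[key]:
            merged[key] = version
    return merged


def sync(a: State, b: State) -> None:
    """Anti-entropy between two replicas: both end up with the merged state."""
    merged = merge(a, b)
    a.clear()
    a.update(merged)
    b.clear()
    b.update(merged)


# Three replicas that each received different writes while out of contact.
r1: State = {"feed": (1, "post-a")}
r2: State = {"feed": (3, "post-c")}
r3: State = {"feed": (2, "post-b"), "likes": (1, "10")}

# No new writes arrive; the replicas synchronize pairwise.
sync(r1, r2)
sync(r2, r3)
sync(r1, r2)

print(r1 == r2 == r3)  # True
print(r1["feed"])      # (3, 'post-c')
```

Once writes stop and the replicas keep synchronizing, all three end up identical, which is exactly the eventual-consistency guarantee. The price is visible in the rule itself: of two concurrent writes to the same key, LWW keeps one and silently discards the other, which is why some databases instead keep the conflicting versions for the application to resolve.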

-- In cqlsh, CONSISTENCY sets the level for the statements that follow;
-- client drivers let you set it on each individual query.
-- QUORUM: a majority of replicas must acknowledge the write (balance).
CONSISTENCY QUORUM;
INSERT INTO policies (id, line) VALUES ('POL-00123', 'home');

-- ONE: a single replica answers the read (fast, but the result may be stale).
CONSISTENCY ONE;
SELECT * FROM policies WHERE id = 'POL-00123';

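What is interesting about systems like Cassandra is that you tune consistency per operation: critical writes with QUORUM, tolerant reads with ONE. The rule behind this is quorum overlap. Let W be the number of replicas that must acknowledge a write, R the number that must answer a read, and N the replication factor: if W + R > N, every read contacts at least one replica that holds the latest acknowledged write, so you obtain strong reads on top of a system that is AP by design. With N = 3, QUORUM means 2 replicas: QUORUM writes plus QUORUM reads give 2 + 2 = 4 > 3, whereas QUORUM writes plus ONE reads give 2 + 1 = 3, which is not greater than 3, so those reads may be stale.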
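The overlap rule can be checked by brute force. The helper names are hypothetical: `replicas_for` maps three of Cassandra's single-datacenter levels to a replica count (QUORUM being a majority, N // 2 + 1), and `always_overlap` enumerates every possible write set and read set to test whether they always share a replica.

```python
from itertools import combinations


def replicas_for(level: str, n: int) -> int:
    """Replicas that must respond at a given level, out of n
    (simplified: multi-datacenter levels such as LOCAL_QUORUM are ignored)."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares at least one replica
    with every read set of size r, among n replicas."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


N = 3  # replication factor
for write_level, read_level in (("QUORUM", "QUORUM"), ("QUORUM", "ONE"), ("ONE", "ALL")):
    w, r = replicas_for(write_level, N), replicas_for(read_level, N)
    print(f"W={write_level} R={read_level}: W+R>N is {w + r > N}, "
          f"overlap is {always_overlap(N, w, r)}")
```

For N = 3 the brute force agrees with the formula in all three cases: QUORUM/QUORUM and ONE/ALL always overlap, QUORUM/ONE does not. Overlap only guarantees that the newest acknowledged write is among the replies; the coordinator still has to pick it, which Cassandra does by comparing write timestamps.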

Table of databases by CAP

An indicative classification (many are configurable and can move between categories):

Database	Type	Typical CAP	Notes
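PostgreSQL / MySQL (single node)	Relational	CA	A single node has no network to partition.
MongoDB	Document	CP (by default)	Prioritizes consistency; configurable.
HBase	Wide-column	CP	Built on HDFS; each region is served by a single RegionServer, so reads and writes are consistent.
Redis (cluster)	Key-value	CP	Favors consistency over availability (the minority side stops accepting writes), but asynchronous replication means acknowledged writes can still be lost.
Cassandra	Wide-column	AP	Consistency tunable per query.
DynamoDB	Key-value	AP (configurable)	Eventually consistent reads by default; strongly consistent reads optional.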
Riak	Key-value	AP	Designed for high availability.
CouchDB	Document	AP	Replication and conflict resolution.

Take this table as a guide, not as dogma: the specific configuration (replication factors, consistency levels) determines the actual behavior.

Common mistakes and tips

Confusing the C in CAP with the C in ACID: in CAP it is "all replicas agree"; in ACID it is "transactional integrity". They are not the same.
Believing you can choose CA in a distributed system: partitions happen; you must choose between CP and AP.
Applying strong consistency to everything: it increases latency and reduces availability needlessly. Reserve strong consistency for critical data.
Ignoring PACELC: the cost of consistency is also paid when there are NO partitions, in the form of latency.
Choosing a database because it's trendy: choose according to the guarantees your business case needs, not the most talked-about technology.

Exercises

State the CAP Theorem and explain why, in a real distributed system, the practical choice comes down to CP or AP.
For each case, indicate whether you would choose strong consistency (CP) or eventual (AP) and justify: (a) the balance of a bank account, (b) the number of "likes" on a post, (c) booking a seat at a cinema.
Explain what PACELC adds compared to CAP and what the "PA/EL" classification means.

Solutions

CAP says that a distributed system cannot guarantee all three properties (C, A, P) at once: during a network partition it must give up either consistency or availability. Since partitions are inevitable in a truly distributed system, P is mandatory; therefore, the real choice is between CP (sacrificing availability to maintain consistency) and AP (sacrificing consistency to maintain availability).
(a) CP/strong: an incorrect balance is unacceptable; better an error than wrong data. (b) AP/eventual: a momentary lag in "likes" is irrelevant; always responding takes priority. (c) CP/strong: we cannot sell the same seat twice, so consistency rules.
PACELC adds what happens when there is no partition: even then you must choose between Latency and Consistency. "PA/EL" means: during a Partition prioritize availability (A), and in normal operation (Else) prioritize low Latency (at the cost of consistency). It is the profile of systems like Cassandra.

Conclusion

We have learned that the CAP Theorem imposes an inevitable trade-off: during a network partition, you must choose between consistency (CP) and availability (AP). PACELC refines the idea by reminding us that, even without partitions, we pay in latency for consistency. The choice between strong and eventual consistency, and the database that underpins it, must always respond to the needs of the business: is an incorrect response or no response worse?

With this lesson we close Module 4, Distributed Architectures and Microservices. You have traveled the complete path: from the fundamentals of distributed systems and their fallacies, through decomposition into microservices and bounded contexts, communication mechanisms, resilience patterns, all the way to data consistency trade-offs. This knowledge enables you to design robust distributed systems, aware of their limits and suited to each business problem.
