"There are only two hard things in computer science: cache invalidation and naming things." This famous joke by Phil Karlton sums up the paradox of caching: it is the most effective technique for speeding up a system and, at the same time, one of the subtlest sources of bugs. A cache stores copies of expensive-to-obtain data in a fast-access location, so as not to recompute them or request them from the database every time. Used well, it reduces latency from milliseconds to microseconds and drastically offloads the main storage. Used poorly, it serves stale data, hides errors, and causes cascading failures. In this lesson we will study cache levels, read and write patterns, invalidation and TTL strategies, classic problems such as the cache stampede, and we will look at concrete examples with Redis.
Contents
- What a cache is and why it works
- Cache levels
- Read patterns: cache-aside and read-through
- Write patterns: write-through and write-behind
- Invalidation, TTL, and eviction policies
- Cache problems and how to mitigate them
- Practical example with Redis
1. What a cache is and why it works
A cache is an intermediate store, fast and of limited capacity, that keeps copies of data to serve them without repeating the work of obtaining them from the source. It works thanks to two principles:
- Temporal locality: a piece of data queried now is likely to be queried again soon.
- Pareto principle: a small percentage of the data concentrates the majority of the accesses (the best-selling products, the most active users).
The two fundamental metrics are the hit ratio (percentage of requests served from the cache) and the latency. A hit avoids going to the source; a miss involves the full cost plus that of storing in the cache.
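To see how the two metrics interact, the average cost per request can be written as a weighted sum of the hit cost and the miss cost. The following is only a minimal sketch: the class name `CacheMath`, the method name, and the latency figures in the note below are illustrative assumptions, not measurements.

```java
// Hypothetical helper (not part of any library): average latency for a given hit ratio.
final class CacheMath {
    private CacheMath() {}

    /**
     * @param hitRatio   fraction of requests served from the cache, in [0, 1]
     * @param hitMicros  cost of a hit, in microseconds
     * @param missMicros cost of a miss (source lookup plus storing in the cache), in microseconds
     * @return expected latency per request, in microseconds
     */
    static double expectedLatencyMicros(double hitRatio, double hitMicros, double missMicros) {
        if (hitRatio < 0.0 || hitRatio > 1.0) {
            throw new IllegalArgumentException("hitRatio must be in [0, 1]");
        }
        // Hits and misses are weighted by how often each one happens.
        return hitRatio * hitMicros + (1.0 - hitRatio) * missMicros;
    }
}
```

Assuming, purely for illustration, a 100 µs hit and a 5,000 µs miss, a 90% hit ratio averages about 590 µs per request and a 99% hit ratio about 149 µs: the remaining misses dominate the average, which is why small gains in hit ratio matter.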
2. Cache levels
The cache appears at many layers of an architecture. From closest to the user to closest to the data:
| Level | Where it lives | Example | Scope |
|---|---|---|---|
| Browser / client | User's device | Cache-Control headers | One user |
| CDN | Edge network | Cloudflare, CloudFront | Global, static content |
| Gateway / reverse proxy | Front end of the system | Nginx, Varnish | All requests |
| Application cache (local) | Process memory | Caffeine, Guava | One instance |
| Distributed cache | Shared external service | Redis, Memcached | All instances |
| Database cache | DB engine | Buffer pool | Internal |
The key distinction for the architect is between the local cache (in-process: extremely fast, but not shared and lost on restart) and the distributed cache (for example Redis: slightly slower because of the network hop, but shared across all instances and unaffected by application restarts). In systems with several instances, a distributed cache gives every instance the same view of the cached data, instead of each one holding its own, possibly divergent, copy.
3. Read patterns: cache-aside and read-through
3.1 Cache-Aside (Lazy Loading)
This is the most common pattern. The application manages the cache explicitly: it looks in the cache first and, if it isn't there, goes to the database and stores the result.
public Product getProduct(long id) {
    String key = "product:" + id;
    Product cached = cache.get(key);                      // 1. is it in the cache?
    if (cached != null) {
        return cached;                                    // HIT: we return without touching the DB
    }
    Product product = repository.findById(id);            // 2. MISS: we go to the DB
    if (product != null) {
        cache.set(key, product, Duration.ofMinutes(10));  // 3. we store with a TTL
    }
    return product;
}
Step by step:
- Step 1: the cache is queried. If there is a hit, it is returned immediately.
- Step 2: in case of a miss, we go to the data source.
- Step 3: it is stored in the cache with a TTL (time to live) of 10 minutes for future requests.
Advantage: only what is actually requested is cached (lazy). Drawback: the cache logic gets mixed with the business logic and the first access is always slow (a mandatory cache miss).
3.2 Read-Through
The cache is responsible for loading the data from the source when it is missing; the application only talks to the cache. The difference from cache-aside is one of responsibility: here the loading code lives inside the cache layer (configured with a "cache loader"), not in the service.
// The cache knows how to load what it doesn't have; the service just asks
LoadingCache<Long, Product> cache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))
    .build(id -> repository.findById(id)); // loader: invoked only on a miss
Product p = cache.get(42L); // if it's not there, Caffeine calls the loader for us

| Aspect | Cache-Aside | Read-Through |
|---|---|---|
| Who loads from the source | The application | The cache (loader) |
| Coupling | Cache logic in the service | Encapsulated in the cache |
| Control | Maximum | Less, cleaner |
4. Write patterns: write-through and write-behind
When data changes, you have to decide how the cache is updated.
4.1 Write-Through
Each write goes to the cache and to the database synchronously, in the same operation.
public void updatePrice(long id, BigDecimal newPrice) {
    Product p = repository.findById(id);
    p.setPrice(newPrice);
    repository.save(p);                                    // 1. writes to the DB
    cache.set("product:" + id, p, Duration.ofMinutes(10)); // 2. and to the cache
}
- Advantage: the cache is refreshed in the same operation as the database, so a read that follows a successful write sees the new value.
- Drawback: each write pays the cost of updating both stores, increasing write latency.
4.2 Write-Behind (Write-Back)
The write goes first to the cache and is persisted to the database asynchronously at a later point (in batches or after a delay).
- Advantage: very fast writes; it allows grouping and absorbing spikes.
- Serious drawback: if the cache goes down before flushing to the database, data is lost. Only suitable when that loss is tolerated or the durability of the cache is ensured.
| Pattern | Write latency | Loss risk | Cache-DB consistency |
|---|---|---|---|
| Write-through | High (synchronous double) | Low | Strong |
| Write-behind | Low (asynchronous) | High if the cache goes down | Eventual |
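Since write-behind is described above without code, here is a minimal single-process sketch of the idea. Everything in it is an assumption made for illustration: the class name `WriteBehindCache`, the `BiConsumer` that stands in for the database write, and the explicit `flush()` method that a timer or background thread would call in a real system.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical write-behind sketch: writes land in memory and reach the store only on flush().
final class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new LinkedHashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>(); // pending writes, last value per key
    private final BiConsumer<K, V> store;                   // stand-in for the database write

    WriteBehindCache(BiConsumer<K, V> store) {
        this.store = store;
    }

    synchronized void put(K key, V value) {
        cache.put(key, value); // fast path: only memory is touched
        dirty.put(key, value); // repeated writes to the same key collapse into one pending entry
    }

    synchronized V get(K key) {
        return cache.get(key);
    }

    // Persists all pending writes as one batch and returns how many entries were written.
    synchronized int flush() {
        int written = dirty.size();
        dirty.forEach(store);
        dirty.clear();
        return written;
    }

    synchronized int pendingWrites() {
        return dirty.size();
    }
}
```

Anything still counted by `pendingWrites()` when the process dies is lost, which is exactly the risk shown in the table. The sketch also shows where the speed comes from: two updates to the same key cost a single database write.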
5. Invalidation, TTL, and eviction policies
The central challenge: when do cached data stop being valid? There are two complementary approaches.
5.1 TTL expiration
Each entry is assigned a Time To Live: after that time, the cache considers it expired and reloads it on the next access. It is simple and self-cleaning.
- Short TTL: fresher data, lower hit ratio.
- Long TTL: better hit ratio, greater risk of serving stale data.
The choice depends on how much staleness the business tolerates. A catalog can tolerate minutes; a balance, seconds or none.
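The expiration mechanism itself is small. The sketch below is an assumed, simplified in-memory version (the class `TtlCache` and its injected clock are hypothetical names): each entry remembers when it expires, and expiration is checked lazily on access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical in-memory cache with per-entry TTL and lazy expiration on read.
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clockMillis; // injected so expiry can be tested without sleeping

    TtlCache(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    synchronized void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, clockMillis.getAsLong() + ttlMillis));
    }

    // Returns null on a miss or when the entry has expired.
    synchronized V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key); // expired: drop it so the caller reloads from the source
            return null;
        }
        return e.value;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`. A library cache or Redis handles this bookkeeping for you; the sketch only makes the trade-off visible, since the TTL passed to `put` is the maximum time a stale value can be served.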
5.2 Explicit invalidation
When a piece of data changes, we delete or update its cache entry immediately:
public void updateProduct(Product p) {
    repository.save(p);
    cache.delete("product:" + p.getId()); // invalidates; the next access will reload
}
Deleting (instead of updating) is often safer: the next read reloads whatever the database currently holds, so concurrent updates are less likely to leave an older value sitting in the cache.
5.3 Eviction policies
Since the cache has limited capacity, when it fills up it must evict entries:
| Policy | Criterion | Ideal when |
|---|---|---|
| LRU (Least Recently Used) | Evicts the least recently used | There is temporal locality (the usual case) |
| LFU (Least Frequently Used) | Evicts the least frequent | There is stable "hot" data |
| FIFO | Evicts the oldest | Simple cases |
| TTL/Random | By expiration or at random | The access pattern is uniform |
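As an illustration of the first row, the JDK's `LinkedHashMap` can act as a small LRU cache when it is created in access order and `removeEldestEntry` is overridden. This is a sketch for a single thread; the class name `LruCache` is an assumption, and a production system would more likely rely on a cache library or on the eviction policy of its cache server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache built on LinkedHashMap; not thread-safe.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was touched more recently.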
6. Cache problems and how to mitigate them
- Cache Stampede (thundering herd): when a very popular entry expires, thousands of simultaneous requests suffer a miss at the same time and hit the database in unison, potentially bringing it down. Mitigations: (a) a lock or single-flight so that only one request recomputes while the others wait; (b) early recomputation (refresh before it expires); (c) TTL with jitter (add randomness so they don't all expire at the same time).
- Cache Penetration: queries for keys that do not exist in the database; they are never cached and always hit the source. Mitigation: cache the "does not exist" (a null value with a short TTL) or use a Bloom filter.
- Cache Avalanche: many entries expire simultaneously (e.g., all with the same TTL set at startup). Mitigation: staggered TTL with jitter.
- Stale data: the data changed in the database but the cache still serves the old value. It is the inherent cost; it is managed with the combination of an appropriate TTL and explicit invalidation.
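Two of the mitigations above, caching the "does not exist" answer and adding jitter to the TTL, can be sketched in a few lines. The names `NegativeCachingLookup` and `ttlWithJitterSeconds` are hypothetical, and the in-memory map deliberately omits expiration to stay short: in a real cache the empty marker would be stored with a short TTL, as described above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Hypothetical sketch: remembers misses as Optional.empty() so they stop reaching the source.
final class NegativeCachingLookup<K, V> {
    private final Map<K, Optional<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> source; // returns null when the key does not exist

    NegativeCachingLookup(Function<K, V> source) {
        this.source = source;
    }

    V get(K key) {
        Optional<V> cached = cache.get(key);
        if (cached != null) {
            return cached.orElse(null); // hit, possibly a cached "does not exist"
        }
        V loaded = source.apply(key);
        cache.put(key, Optional.ofNullable(loaded)); // store the miss as well as the value
        return loaded;
    }

    // Base TTL plus a random extra of 0..maxJitterSeconds, so keys do not all expire together.
    static long ttlWithJitterSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}
```

A second request for the same nonexistent key is then answered from the cache, and `ttlWithJitterSeconds(600, 60)` yields a value between 600 and 660 inclusive.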
7. Practical example with Redis
Redis is the most widely used distributed cache: an in-memory key-value store, blazing fast and shared across instances. Let's look at cache-aside against Redis with anti-stampede protection.
public Product get(long id) throws InterruptedException {
    String key = "product:" + id;
    String json = redis.get(key);                   // 1. query to Redis
    if (json != null) {
        return deserialize(json);                   // HIT
    }
    // 2. MISS: we try to acquire a lock to avoid the stampede
    String lockKey = "lock:" + key;
    // SET with NX returns "OK" when the key was set and null when it already existed
    boolean acquired = "OK".equals(
        redis.set(lockKey, "1", SetParams.setParams().nx().px(3000)));
    if (!acquired) {
        Thread.sleep(50);                           // another thread is recomputing: we wait
        return get(id);                             // we retry: it's probably there already
    }
    try {
        Product p = repository.findById(id);        // 3. only ONE thread goes to the DB
        if (p != null) {
            int ttl = 600 + new Random().nextInt(61); // 4. TTL with jitter (600-660s)
            redis.setex(key, ttl, serialize(p));    // stores with expiration
        }
        return p;
    } finally {
        redis.del(lockKey);                         // 5. we release the lock
    }
}
Analysis of the code:
- Step 1: direct read from Redis with GET. If there is a value, it's a hit and we deserialize.
- Step 2: on a miss, we try SET ... NX PX 3000. NX means "only if it does not exist," so only one thread obtains the lock; PX 3000 gives it a 3 s expiration so the lock doesn't hang if the thread dies.
- If we don't get the lock, we wait a bit and retry: by then, the "winning" thread has probably already populated the cache.
- Step 3: only the thread with the lock queries the database, avoiding the stampede.
- Step 4: we store with SETEX and a TTL with jitter (600 to 660 s) so that not all keys expire at the same time (anti-avalanche).
- Step 5: we release the lock in the finally, no matter what.
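The Redis lock coordinates threads across instances. Inside a single instance, the same single-flight idea can be sketched without any polling by sharing one in-progress load per key. This is an assumed, simplified helper (the class `SingleFlight` is not part of Redis or of any client library); it does not cache results, it only coalesces loads that overlap in time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical in-process single-flight: concurrent loads of the same key share one call.
final class SingleFlight<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // A load is already running: wait for it and reuse its result.
            // If that load failed, join() rethrows the failure wrapped in a CompletionException.
            return existing.join();
        }
        try {
            V value = loader.apply(key); // only the thread that won putIfAbsent gets here
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e); // waiters fail too instead of blocking forever
            throw e;
        } finally {
            inFlight.remove(key, mine); // later calls start a fresh load
        }
    }
}
```

A service could wrap its database call as `singleFlight.load(id, repository::findById)` on a miss, so that at most one query per key is in progress in each instance while the Redis lock limits the work across instances.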
And the invalidation on update:
public void update(Product p) {
    repository.save(p);
    redis.del("product:" + p.getId()); // invalidates; the next GET will reload from the DB
}
Common Mistakes and Tips
- Caching data that changes constantly. If a piece of data changes faster than it is read, the cache almost never hits and adds complexity without benefit. Cache what is read a lot and changes little.
- Not setting a TTL. A cache without expiration accumulates stale data indefinitely. Always set a TTL, even a long one, as a safety net.
- Identical TTLs for all keys. This causes avalanches. Add jitter.
- Local cache in multi-instance systems without coordination. Each instance has its own copy; when you invalidate on one, the others keep serving the old value. Use a distributed cache or an invalidation channel (pub/sub).
- Treating the cache as the source of truth. The cache is a disposable copy; the database is the authority. Your system must work (more slowly) even if the cache is emptied entirely.
- Tip: measure the hit ratio in production. A cache with a low hit ratio is not helping; review what you cache and the TTLs.
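For the last tip, the measurement can start as two counters. The sketch below assumes a hypothetical `HitRatioCounter` that the cache-access code calls on every lookup; in practice the numbers would usually be exported to whatever metrics system the application already uses.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter for tracking the cache hit ratio under concurrent access.
final class HitRatioCounter {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    // Fraction of lookups served from the cache; 0.0 before any lookup is recorded.
    double hitRatio() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

In the cache-aside example, `recordHit()` would go in the branch that returns the cached value and `recordMiss()` in the branch that queries the database.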
Exercises
Exercise 1. Explain the difference between cache-aside and read-through in terms of "who is responsible for loading the data from the source."
Exercise 2. A very popular product has a TTL of 600 s. Right when it expires, 5,000 requests arrive in the same second. Describe what problem occurs and propose two concrete mitigations.
Exercise 3. You want to cache the balance of a bank account that must always appear up to date after a transfer. Which write and invalidation pattern would you use and why? Which pattern would you avoid?
Solutions
Solution 1. In cache-aside, the responsibility for loading the data from the source falls on the application: the code checks the cache and, in case of a miss, queries the database and repopulates it manually. In read-through, that responsibility is assumed by the cache itself through a configured loader; the application only asks the cache for the data, which internally loads it if missing.
Solution 2. The problem is a cache stampede (thundering herd): when the entry expires, the 5,000 requests suffer a simultaneous miss and hit the database at once, potentially saturating it. Two mitigations: (a) a single-flight lock with Redis SET NX so that only one request recomputes while the others wait and reuse the result; (b) TTL with jitter and/or early refresh of the value before it expires, so that there is never a window of massive miss.
Solution 3. I would use write-through (write to the cache and the database synchronously) or, simpler and safer, explicit invalidation: after persisting the transfer, delete the balance key so that the next read reloads it up to date from the database. I would avoid write-behind, because its asynchronous persistence can lose data if the cache goes down, something unacceptable for a bank balance. In addition, a very short TTL would be advisable as a safety net.
Conclusion
You have completed the tour of caching: you know what it is and why it works, at which levels it appears (from the CDN to Redis), how to choose between read patterns (cache-aside, read-through) and write patterns (write-through, write-behind), and how to manage invalidation by combining TTL, explicit deletion, and eviction policies. You also know the classic dangers (stampede, penetration, avalanche, stale data) and how to mitigate them with locks, jitter, and negative caching, illustrated with a real example in Redis. With this lesson you close Module 7, in which you have learned to decide where to store data (SQL vs NoSQL), how to access it cleanly (Repository, Unit of Work, DAO), how to manage it in distributed systems (database per service, Sagas, CQRS), and how to speed up its reading without sacrificing consistency (caching). The next module takes us outside our own data center to explore cloud architecture and deployment.
