In the previous lesson we saw what an event is and who produces or consumes it. Now we go one level down: how does that message physically travel from one service to another reliably when the two do not know each other and may be starting up, going down, or overloaded at different times? The answer is asynchronous messaging through message brokers. A broker is an intermediate piece of infrastructure that receives messages from producers, stores them durably, and delivers them to consumers when the latter are ready.
Asynchronous messaging matters because much of a distributed system's reliability and consistency rests on how its messages are delivered. In this lesson we will study the two basic primitives (queues and topics), delivery guarantees, the problem of duplicates and how to solve it with idempotency, and we will compare three widely used technologies: RabbitMQ, Apache Kafka, and Amazon SQS.
Contents
- Why asynchronous messaging?
- Queues (point-to-point) vs Topics (publish/subscribe)
- Delivery guarantees: at-most-once, at-least-once, exactly-once
- The problem of duplicates and idempotency
- Comparison: RabbitMQ vs Kafka vs Amazon SQS
- Practical example: producer and idempotent consumer
- Common mistakes and tips
- Exercises and solutions
- Conclusion
Why asynchronous messaging?
In a synchronous call, if service B is down, A's call fails. With asynchronous messaging, A deposits the message in the broker and keeps working; when B recovers, it will consume it. This provides:
- Temporal decoupling: producer and consumer do not need to be alive at the same time.
- Peak buffering: if 10,000 messages arrive at once, the broker holds them and the consumer processes them at its own pace.
- Resilience: a slow or down consumer does not bring down the producer.
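To make temporal decoupling and peak buffering concrete, here is a minimal sketch that uses an in-memory queue as a stand-in for the broker. `ToyBroker` and its methods are hypothetical names invented for this illustration; a real broker also persists messages to disk, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy in-memory "broker": a hypothetical stand-in, not a real messaging client. */
final class ToyBroker {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Producer side: deposit the message and return immediately. */
    void publish(String message) {
        queue.add(message);
    }

    /** Consumer side: take everything that accumulated while we were away. */
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    /** Number of messages waiting to be consumed. */
    int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        // Service B is "down": nobody is consuming, yet A keeps publishing.
        for (int i = 1; i <= 3; i++) {
            broker.publish("order-" + i);
        }
        System.out.println("pending while B is down: " + broker.pending()); // 3
        // B recovers later and processes the backlog at its own pace.
        System.out.println("B consumed: " + broker.drain()); // [order-1, order-2, order-3]
    }
}
```

The producer never waits for the consumer: `publish` returns as soon as the message is stored, which is the property the three bullets above rely on.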
Queues (point-to-point) vs Topics (publish/subscribe)
There are two message distribution models.
Queue (point-to-point)
A message in a queue is processed by a single consumer. If there are several consumers connected to the same queue, the broker distributes (balances) the messages among them, but each message goes to only one. It is used to distribute work (work queue).
Topic (publish/subscribe)
A message published to a topic is delivered to all subscribers. It is used to broadcast events to multiple interested parties.

    flowchart TB
      subgraph Queue["QUEUE (point-to-point)"]
        P1[Producer] --> Q[(Queue)]
        Q --> CA[Consumer A]
        Q --> CB[Consumer B]
        Q -. each message to ONLY one .-> CA
      end
      subgraph Topic["TOPIC (pub/sub)"]
        P2[Producer] --> T{{Topic}}
        T --> SA[Subscriber A]
        T --> SB[Subscriber B]
        T -. each message to ALL .-> SA
      end

| Aspect | Queue (point-to-point) | Topic (pub/sub) |
|---|---|---|
| Receivers per message | One | All subscribers |
| Goal | Distribute the workload | Broadcast events |
| Typical pattern | Commands / tasks | Domain events |
| Example | Email-sending queue | "OrderCreated" to inventory, billing, shipping |
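
The difference between the two models can be sketched in a few lines. The class and method names below are hypothetical, and the round-robin dispatch is an assumption made for simplicity; real brokers choose the receiving consumer with their own policies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy comparison of the two distribution models (illustrative only). */
final class DistributionModels {

    /** Queue: each message goes to exactly one consumer (round-robin here). */
    static List<List<String>> deliverAsQueue(List<String> messages, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            inboxes.get(i % consumers).add(messages.get(i));
        }
        return inboxes;
    }

    /** Topic: every subscriber receives its own copy of every message. */
    static List<List<String>> deliverAsTopic(List<String> messages, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (String message : messages) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
        return inboxes;
    }

    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("m1", "m2", "m3");
        System.out.println(deliverAsQueue(messages, 2)); // [[m1, m3], [m2]]
        System.out.println(deliverAsTopic(messages, 2)); // [[m1, m2, m3], [m1, m2, m3]]
    }
}
```

With a queue the total work is split among consumers; with a topic it is multiplied by the number of subscribers, which is why sending a command through a topic can execute it several times.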
Delivery guarantees: at-most-once, at-least-once, exactly-once
When a message travels over the network, acknowledgments (acks) can be lost, processes can crash, etc. That is why different levels of guarantee exist:
| Guarantee | Meaning | Risk | Cost |
|---|---|---|---|
| At-most-once | Delivered 0 or 1 times | A message may be lost | Minimal (no retries) |
| At-least-once | Delivered 1 or more times | May be duplicated | Medium (requires retries and acks) |
| Exactly-once | Delivered exactly 1 time | None (ideal) | High (complex, not always real) |
Key points:
- At-most-once: the consumer acknowledges before processing. If it fails, the message is lost. Only valid if losing some data is acceptable (e.g., non-critical metrics).
- At-least-once: the consumer acknowledges after processing successfully. If it fails just before the ack, the message is redelivered → there may be duplicates. It is the most common and recommended level.
- Exactly-once: semantically perfect, but costly. In distributed systems it is very hard to guarantee end to end; it is usually approximated by combining at-least-once with idempotency in the consumer (we will see this next). Kafka offers "exactly-once" within its own boundaries (transactions), but as soon as the effect leaves Kafka (writing to another DB, calling an API), it again depends on idempotency.
Real-world golden rule: design for at-least-once and make your consumers idempotent. In practice, that gives you the effect of exactly-once.
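The effect of the acknowledgment's position can be simulated without any broker. The sketch below is a toy model under stated assumptions: a single message, a consumer that crashes exactly once halfway through its first attempt, and a broker that redelivers until it receives an ack. `AckTiming` and `effectsApplied` are hypothetical names.

```java
/** Toy simulation of how ack timing changes the delivery guarantee. */
final class AckTiming {

    /**
     * Returns how many times the business effect ran for ONE message when the
     * consumer crashes once, between the two steps of its first attempt.
     */
    static int effectsApplied(boolean ackBeforeProcessing) {
        boolean acked = false;
        boolean crashPending = true; // the first attempt dies midway
        int effects = 0;

        // The toy broker redelivers until the message is acknowledged.
        while (!acked) {
            if (ackBeforeProcessing) {
                acked = true;              // step 1: acknowledge
                if (crashPending) {        // crash before processing
                    crashPending = false;
                    continue;
                }
                effects++;                 // step 2: process
            } else {
                effects++;                 // step 1: process
                if (crashPending) {        // crash before acknowledging
                    crashPending = false;
                    continue;
                }
                acked = true;              // step 2: acknowledge
            }
        }
        return effects;
    }

    public static void main(String[] args) {
        System.out.println("ack before processing: " + effectsApplied(true));  // 0 (lost)
        System.out.println("ack after processing:  " + effectsApplied(false)); // 2 (duplicate)
    }
}
```

Acknowledging first yields zero effects (the message is lost), while acknowledging last yields two (the message is processed twice). Neither outcome is exactly one, which is why the duplicate has to be neutralized in the consumer.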
The problem of duplicates and idempotency
An operation is idempotent if executing it several times produces the same result as executing it once. If our consumers are idempotent, at-least-once duplicates stop being a problem.
Common techniques to achieve idempotency:
- Idempotency key / message ID: record the IDs already processed and discard repeats.
- Naturally idempotent operations: `UPDATE balance SET value = 100` (absolute assignment) is idempotent; `UPDATE balance SET value = value + 100` (increment) is not.
- `UPSERT` with a unique key: insert or ignore if it already exists.

    -- Table that records which messages we have already processed.
    CREATE TABLE processed_messages (
        message_id   VARCHAR(64) PRIMARY KEY,  -- idempotency key
        processed_at TIMESTAMP NOT NULL
    );

    -- Upon receiving a message, we try to insert its ID.
    -- If it already exists (duplicate primary key), we know it is a duplicate.
    INSERT INTO processed_messages (message_id, processed_at)
    VALUES ('msg-abc-123', CURRENT_TIMESTAMP)
    ON CONFLICT (message_id) DO NOTHING;  -- PostgreSQL: ignore if it already exists

Explanation:
- `message_id` is the primary key: the database guarantees there are no two identical ones.
- `ON CONFLICT ... DO NOTHING` makes the second attempt to insert the same ID produce neither an error nor an effect. This way we detect the duplicate without complex additional logic.
- If the `INSERT` affected 0 rows, it was a duplicate and we can skip processing.
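The same idea can be sketched in memory. The class below is a hypothetical illustration, not a production repository: it keeps the processed IDs in a set instead of a table, so they are lost on restart, and it assumes a single-threaded consumer. It shows how an ID check makes a non-idempotent increment safe to retry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the processed_messages table (illustrative only). */
final class InMemoryIdempotency {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private long balance = 0;

    /** Mirrors INSERT ... ON CONFLICT DO NOTHING: true only the first time an ID is seen. */
    boolean registerIfNew(String messageId) {
        return processed.add(messageId);
    }

    /** A non-idempotent effect (an increment) made safe by the ID check. */
    void handleDeposit(String messageId, long amount) {
        if (!registerIfNew(messageId)) {
            return; // duplicate delivery: skip the effect
        }
        balance += amount;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        InMemoryIdempotency account = new InMemoryIdempotency();
        account.handleDeposit("msg-abc-123", 100);
        account.handleDeposit("msg-abc-123", 100); // redelivery of the same message
        System.out.println(account.balance()); // 100
    }
}
```

`Set.add` returns `false` when the element was already present, which plays the role of the "0 rows affected" signal from the SQL version.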
Comparison: RabbitMQ vs Kafka vs Amazon SQS
| Characteristic | RabbitMQ | Apache Kafka | Amazon SQS |
|---|---|---|---|
| Main model | Queue broker (AMQP) | Distributed event log | Managed queue (cloud) |
| Paradigm | Queues + exchanges (pub/sub) | Partitioned topics + offsets | Queues (Standard and FIFO) |
| Message retention | Until consumed and acknowledged (classic queues) | Configurable (days/weeks), replayable | Up to 14 days |
| Replay | Not with classic queues (stream queues allow it) | Yes (re-read from an offset) | No |
| Ordering | Per queue | Per partition | FIFO only in FIFO queues |
| Throughput | High | Very high (millions/s) | High (auto-scaling) |
| Typical guarantee | At-least-once | At-least-once / exactly-once* | At-least-once (Std) / exactly-once (FIFO) |
| Management | Self-managed / cloud | Self-managed / managed | Fully managed (AWS) |
| Ideal case | Complex routing, RPC, tasks | Streaming, event sourcing, big data | Simple decoupling in AWS without operating infra |
Practical summary:
- RabbitMQ: excellent when you need flexible routing (exchanges with rules) and traditional work-queue patterns.
- Kafka: the choice for high volume, retention, and replay of events; the foundation of event sourcing and streaming (lesson 05-05).
- SQS: the simplest option if you are already on AWS and just want to decouple without managing servers.
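The "Retention" and "Replay" rows are easier to picture with a toy append-only log. The sketch below only illustrates the offset idea and is not Kafka's API; `ToyLog`, `append`, and `readFrom` are hypothetical names, and partitions, consumer groups, and retention limits are all left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy append-only log: records stay after being read, so consumers can replay. */
final class ToyLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns its offset (its position in the log). */
    int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Reads from the given offset to the end; reading never removes anything. */
    List<String> readFrom(int offset) {
        return new ArrayList<>(records.subList(offset, records.size()));
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("OrderCreated");
        log.append("OrderPaid");
        log.append("OrderShipped");
        System.out.println(log.readFrom(1)); // [OrderPaid, OrderShipped]
        System.out.println(log.readFrom(0)); // replay: [OrderCreated, OrderPaid, OrderShipped]
    }
}
```

Because each consumer only remembers an offset, a new consumer (or one that needs to rebuild its state) can start again from offset 0, something a queue that deletes messages on consumption cannot offer.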
Practical example: producer and idempotent consumer
Let's look at a Kafka consumer in Spring that applies at-least-once + idempotency.

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.support.Acknowledgment;
    import org.springframework.stereotype.Component;

    @Component
    public class PaymentConsumer {

        private final IdempotencyRepository idempotency;
        private final AccountingService accounting;

        public PaymentConsumer(IdempotencyRepository idempotency,
                               AccountingService accounting) {
            this.idempotency = idempotency;
            this.accounting = accounting;
        }

        @KafkaListener(topics = "payments.confirmed", groupId = "accounting")
        public void consume(PaymentConfirmedEvent event, Acknowledgment ack) {
            // 1. Have we already processed this message? -> idempotency
            if (!idempotency.registerIfNew(event.paymentId())) {
                ack.acknowledge(); // duplicate: we acknowledge and exit
                return;
            }
            // 2. Actual business logic
            accounting.postEntry(event);
            // 3. We acknowledge ONLY after processing successfully (at-least-once)
            ack.acknowledge();
        }
    }

Detailed explanation:
- `@KafkaListener` subscribes the method to the topic `payments.confirmed`. The `groupId` "accounting" identifies this consumer group; Kafka distributes the partitions among the group members.
- `registerIfNew(...)` attempts to insert the ID (as in the SQL above). It returns `false` if it already existed → it is a duplicate, we acknowledge and exit without reprocessing.
- The acknowledgment (`ack.acknowledge()`) is done after `postEntry`. If the process dies before the ack, Kafka will redeliver the message (at-least-once), but idempotency will prevent the double entry.
- This only holds if `registerIfNew` and `postEntry` commit together (for example, in the same database transaction). If the ID were recorded and the process died before the entry was posted, the redelivered message would be skipped as a duplicate and the entry would be lost.

    # Spring Kafka configuration for manual acknowledgment (key to at-least-once)
    spring:
      kafka:
        consumer:
          group-id: accounting
          enable-auto-commit: false    # do NOT acknowledge automatically
          auto-offset-reset: earliest  # read from the beginning if there is no offset
        listener:
          ack-mode: manual             # we acknowledge ourselves with ack.acknowledge()

- `enable-auto-commit: false` is essential: if Kafka acknowledged on its own, it could acknowledge before we finished processing, and we would lose messages upon a failure.
- `ack-mode: manual` delegates to our code the exact moment of acknowledgment.
Common Mistakes and Tips
- Acknowledging before processing. It turns your at-least-once into an accidental at-most-once and you lose messages upon failures. Always acknowledge at the end.
- Believing that "exactly-once" eliminates the need for idempotency. As soon as the effect crosses the broker's boundary (another DB, an external API), you need idempotency all the same.
- Not setting up a dead-letter queue (DLQ). Messages that fail repeatedly must be moved to a separate queue after a bounded number of attempts so they do not block the main queue in an infinite loop of retries.
- Using a topic when you wanted a queue (or vice versa). Broadcasting a command to "everyone" can execute the same action N times.
- Tip: always define a unique `message_id` field in your events from day one; adding it later is painful.
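The dead-letter idea from the list above can be sketched as a bounded retry loop. Everything here is hypothetical: the class name, the `MAX_ATTEMPTS` budget of 3, and the convention that the handler returns `true` on success. It also assumes message texts are unique, since attempts are counted per message string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy retry loop with a dead-letter list (illustrative only). */
final class DeadLetterSketch {
    static final int MAX_ATTEMPTS = 3; // hypothetical retry budget

    /** Processes the queue; returns the messages that exhausted their retries. */
    static List<String> run(List<String> messages, Predicate<String> handler) {
        Deque<String> mainQueue = new ArrayDeque<>(messages);
        Map<String, Integer> attempts = new HashMap<>();
        List<String> deadLetters = new ArrayList<>();

        while (!mainQueue.isEmpty()) {
            String message = mainQueue.poll();
            int attempt = attempts.merge(message, 1, Integer::sum);
            if (handler.test(message)) {
                continue;                  // processed successfully
            }
            if (attempt >= MAX_ATTEMPTS) {
                deadLetters.add(message);  // give up: park it for inspection
            } else {
                mainQueue.add(message);    // retry later, behind the others
            }
        }
        return deadLetters;
    }

    public static void main(String[] args) {
        List<String> dlq = run(
                List.of("ok-1", "poison", "ok-2"),
                message -> !message.equals("poison")); // "poison" always fails
        System.out.println(dlq); // [poison]
    }
}
```

Without the attempt limit, the failing message would be re-enqueued forever and the loop would never end, which is the infinite-retry situation the DLQ is meant to prevent.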
Exercises
- A team processes payments with a consumer that acknowledges the message right upon receiving it and then contacts the banking gateway. If the process dies between the ack and the banking call, what real guarantee do they have and what problem occurs? How would you fix it?
- Indicate for each case whether you would use a queue or a topic: (a) send the welcome email exactly once; (b) notify inventory, billing, and CRM of a new order; (c) distribute 1,000 PDF-generation tasks among 5 workers.
- Write an idempotent SQL statement to "mark an order as paid" that can be executed several times without side effects.
Solutions
- They have at-most-once: if it dies after the ack, the message is not redelivered and the payment never reaches the gateway (it is lost). Fix: acknowledge after the banking call (at-least-once) and make the operation idempotent using the payment's idempotency key so as not to charge twice upon a retry.
- (a) Queue (a single receiver, a single time). (b) Topic (three subscribers receive the event). (c) Queue (work distribution among workers).
- For example:

      -- Absolute assignment of the state: idempotent.
      UPDATE orders
      SET status  = 'PAID',
          paid_at = COALESCE(paid_at, CURRENT_TIMESTAMP)
      WHERE order_id = :orderId
        AND status <> 'PAID';

Running it again changes nothing because the condition status <> 'PAID' is no longer met and paid_at is preserved with COALESCE.
Conclusion
Asynchronous messaging is the circulatory system of event-driven architectures. We distinguished queues (one receiver, load distribution) from topics (all subscribers, broadcast). We understood the three delivery guarantees and why at-least-once + idempotency is the pragmatic combination the industry uses. Finally, we compared RabbitMQ, Kafka, and SQS to know when to choose each one.
In the next lesson, "Event Patterns: Event Sourcing and CQRS", we will see how, instead of storing only the current state, we can store the complete sequence of events as the source of truth, and how to separate the read and write models to scale.
