We begin the chapter on messaging and events: the pieces that connect the components of a modern architecture. The first and most fundamental is Amazon SQS (Simple Queue Service), AWS's managed message queue service. We already met it as a Lambda trigger (subchapter 14.2); now we'll look at it in depth.

The problem: connecting services without making them dependent on each other

Imagine an online store. When a customer places an order, you need to charge the customer, update inventory, send a confirmation email, notify the warehouse, and so on. If the service that receives the order had to do all of that itself and wait for each step to finish, it would be slow and fragile: if the email service is down, is the entire order lost?

The solution is to decouple the services with a queue in between. The order service just leaves a "message" ("process this order") in the queue and moves on. Other services pull messages from the queue and process them at their own pace.

What is an SQS queue

A queue is like a shared to-do list. One service puts messages (tasks) in on one side, and another service takes them out and processes them on the other.

  Producer                SQS Queue                 Consumer
 (puts messages)    [msg][msg][msg][msg]      (takes and processes)
       │ ──────────────►                ──────────────► │
  • Producer: the one who creates messages and puts them in the queue (e.g., the order service).
  • Consumer: the one who takes messages out and processes them (e.g., a Lambda or a server).
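
To make the two roles concrete, here is a minimal sketch in Python. It is not the AWS API: the queue is a plain in-memory deque standing in for SQS, and the names place_order and consume_one are hypothetical, chosen only for this illustration. The point to notice is that the producer returns as soon as the message is in the queue.

```python
from collections import deque

# In-memory stand-in for an SQS queue (an assumption for illustration only).
queue = deque()

def place_order(order_id):
    """Producer: leave a message in the queue and return immediately."""
    queue.append({"order_id": order_id, "action": "process_order"})
    return "accepted"

def consume_one(processed):
    """Consumer: take the oldest message, if any, and process it."""
    if not queue:
        return False
    message = queue.popleft()
    processed.append(message["order_id"])
    return True

# The producer accepts three orders without waiting for any processing.
statuses = [place_order(order_id) for order_id in (101, 102, 103)]

# Later, the consumer drains the queue at its own pace.
processed = []
while consume_one(processed):
    pass
```

After this runs, every order was accepted right away and the consumer handled all three afterwards, without the producer ever waiting on it.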

Analogy: an SQS queue is like the order slip in a restaurant. The waiter (producer) writes down the orders and hangs them in the kitchen. The cooks (consumers) pick up the slips and prepare them at their own pace. The waiter doesn't wait in the kitchen: he leaves the slip and keeps serving tables. If many orders come in at once, they pile up and get done as time allows; nothing is lost.

The big advantage: decoupling and resilience

The queue separates producer and consumer, which brings huge advantages:

  • Resilience: if the consumer goes down, the messages wait in the queue without being lost. When it comes back, it processes them. The producer doesn't even notice.
  • Peak smoothing: if a flood of messages arrives (Black Friday), the queue accumulates them and the consumer processes them at a sustainable pace, without being overwhelmed.
  • Independent scaling: you can add more consumers to empty the queue faster, without touching the producer.
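
These three effects can be seen in a small simulation. The sketch below is a simplified model under stated assumptions, not a measurement of SQS: time advances in ticks, the consumer handles a fixed number of messages per tick, and the function simulate and its parameters are hypothetical names.

```python
from collections import deque

def simulate(arrivals, capacity, down_ticks=frozenset()):
    """Return (backlog after each tick, total messages processed).

    arrivals:   number of messages the producer sends in each tick
    capacity:   messages the consumers can process per tick
    down_ticks: ticks during which the consumers are offline
    """
    queue = deque()
    backlog = []
    processed = 0
    next_id = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            queue.append(next_id)  # the producer never waits
            next_id += 1
        if tick not in down_ticks:
            for _ in range(min(capacity, len(queue))):
                queue.popleft()
                processed += 1
        backlog.append(len(queue))
    return backlog, processed

# A burst of 10 messages at tick 1; the consumer is offline during tick 2.
arrivals = [2, 10, 2, 0, 0, 0, 0]
backlog, processed = simulate(arrivals, capacity=3, down_ticks={2})

# Same traffic, but with twice the consumer capacity (independent scaling).
backlog_scaled, _ = simulate(arrivals, capacity=6, down_ticks={2})
```

In the first run the backlog grows during the burst and the outage and then shrinks back to zero, with all 14 messages processed. In the second run the queue empties sooner, and nothing changed on the producer side.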

We'll dive deeper into this decoupling in subchapter 15.4.

The two types of queues: Standard vs FIFO

SQS offers two types of queues, and choosing the right one is important.

Standard Queue

This is the default queue. It prioritizes maximum throughput (handles huge amounts of messages). In exchange, it has two peculiarities you should know:

  • No order guarantee: ordering is best-effort, so messages may be delivered in a slightly different order than they were sent.
  • "At least once" delivery: on rare occasions, the same message may be delivered more than once, so the consumer should tolerate duplicates (be idempotent).
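
A common way to live with both peculiarities is to make the consumer safe to run twice on the same message. The following is a minimal sketch, assuming each message carries a unique "id" field; the wrapper make_idempotent_handler is a hypothetical helper, and in a real system the set of seen IDs would have to live in durable storage rather than in memory.

```python
def make_idempotent_handler(handler):
    """Wrap handler so that each message ID is processed at most once."""
    seen_ids = set()  # assumption: in-memory only, for illustration

    def handle(message):
        if message["id"] in seen_ids:
            return False  # duplicate delivery: skip it
        handler(message)
        seen_ids.add(message["id"])
        return True

    return handle

resized = []
handle = make_idempotent_handler(lambda m: resized.append(m["body"]))

# The message with id "a1" is delivered twice, and the order is arbitrary.
deliveries = [
    {"id": "b2", "body": "cat.png"},
    {"id": "a1", "body": "dog.png"},
    {"id": "a1", "body": "dog.png"},
]
results = [handle(message) for message in deliveries]
```

The second delivery of "a1" is recognized and skipped, so each image is handled only once.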

FIFO Queue (First In, First Out)

FIFO means "first in, first out." It guarantees two things that the standard queue does not:

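  • Strict order: messages are delivered exactly in the order they were sent (within each message group, if you use several).
  • No duplicates: each message is processed exactly once; if the same message is sent again within the 5-minute deduplication interval, it is not delivered a second time.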

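In exchange, it has more limited throughput than the standard queue (though more than enough for most cases): by default, up to 300 API calls per second per action, or about 3,000 messages per second when batching 10 messages per call.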
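
The two FIFO guarantees can be sketched with a toy model. This is an in-memory approximation, not the SQS implementation: the class FifoQueueSketch and its methods are hypothetical, time is passed in explicitly as seconds, and the sketch returns False for a duplicate only to make the effect visible. The dedup_id argument plays the role of the deduplication ID that SQS FIFO queues use to recognize a repeated send.

```python
from collections import deque

DEDUP_WINDOW_SECONDS = 300  # SQS FIFO deduplication interval: 5 minutes

class FifoQueueSketch:
    """Toy FIFO queue: keeps send order and drops repeated sends."""

    def __init__(self):
        self._messages = deque()
        self._accepted_at = {}  # dedup_id -> time the message was accepted

    def send(self, body, dedup_id, now):
        first = self._accepted_at.get(dedup_id)
        if first is not None and now - first < DEDUP_WINDOW_SECONDS:
            return False  # duplicate within the window: not enqueued again
        self._accepted_at[dedup_id] = now
        self._messages.append(body)
        return True

    def receive(self):
        # Oldest message first; None when the queue is empty.
        return self._messages.popleft() if self._messages else None

q = FifoQueueSketch()
accepted = [
    q.send("deposit 100", dedup_id="op-1", now=0),
    q.send("deposit 100", dedup_id="op-1", now=10),  # retry of the same send
    q.send("withdraw 50", dedup_id="op-2", now=20),
]
delivered = [q.receive(), q.receive(), q.receive()]
```

The retried "deposit 100" is dropped, and the consumer sees the deposit before the withdrawal, exactly as they were sent.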

Which should I choose?

                 Standard Queue               FIFO Queue
  Order          Not guaranteed               Strict (order of arrival)
  Duplicates     Possible (rare)              No (exactly once)
  Throughput     Very high                    High, but limited
  When to use    When order doesn't matter    When order or duplicates are critical

Examples to decide:

  • Standard: resizing uploaded images. The order in which they're processed doesn't matter, and processing one twice isn't a big deal.
  • FIFO: bank account operations. Order matters ("deposit 100" before "withdraw 50") and a duplicate would be a disaster (charging twice).

Practical rule: use standard by default; use FIFO only when preserving order or avoiding duplicates really matters.

Dead Letter Queue (DLQ): the queue for problematic messages

What happens if a message can't be processed? For example, it arrives with corrupted data and the function fails every time it tries. Without protection, that message would be redelivered over and over until its retention period expires, wasting resources and, in a FIFO queue, holding up the messages behind it in the same group (a "poison message").

That's what the Dead Letter Queue (DLQ) is for. You configure the main queue so that, after a message has been received a set number of times without being processed successfully (for example, 3; this is the maxReceiveCount setting of the queue's redrive policy), it is automatically moved to a separate queue:

  Main queue
   [msg ✓][msg ✗][msg ✓]
            │
            │ fails 3 times
            ▼
  Dead Letter Queue (DLQ)
   [msg ✗]  ← here it's saved for later review

This way, the problematic message stops blocking the main queue, but is not lost: it's saved in the DLQ so you can investigate it later.
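
The retry-then-set-aside behavior can be sketched as follows. This is a simplified in-memory model, not the SQS mechanism itself: the function drain, the handler parse_order, and the message format are hypothetical, and the sketch moves a message to the dead-letter list right after its third failed attempt to mirror the "fails 3 times" example above.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # the "3 failed attempts" from the example above

def drain(bodies, handler, max_receive_count=MAX_RECEIVE_COUNT):
    """Process messages; set aside any that keep failing."""
    queue = deque({"body": body, "receive_count": 0} for body in bodies)
    done, dead_letters = [], []
    while queue:
        message = queue.popleft()
        message["receive_count"] += 1
        try:
            handler(message["body"])
            done.append(message["body"])
        except ValueError:
            if message["receive_count"] >= max_receive_count:
                dead_letters.append(message["body"])  # saved for review
            else:
                queue.append(message)  # back in the queue for a retry
    return done, dead_letters

def parse_order(body):
    """Hypothetical handler: rejects messages that are not orders."""
    if not body.startswith("order:"):
        raise ValueError("corrupted message")

done, dead_letters = drain(["order:1", "###", "order:2"], parse_order)
```

The two valid orders are processed, while the corrupted message is retried, gives up after the third attempt, and ends up in the dead-letter list instead of circulating forever.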

Analogy: the DLQ is like the "problem orders" drawer in the kitchen. If an order slip is illegible or impossible to prepare, instead of blocking the cooks, it's set aside in that drawer for the manager to review later. The kitchen keeps running.

What you should remember

  • SQS is AWS's queue service: a producer puts messages in and a consumer takes them out and processes them at their own pace.
  • The queue decouples services: it provides resilience (messages wait if the consumer goes down), smooths peaks, and allows you to scale consumers independently.
  • Standard queue: maximum throughput, but no guaranteed order and possible duplicates. It's the default option.
  • FIFO queue: guarantees strict order and no duplicates, with more limited throughput. Use it when order or duplicates are critical (e.g., banking operations).
  • The Dead Letter Queue (DLQ) collects messages that fail repeatedly, so they stop blocking the main queue but are not lost and can be reviewed later.

In the next subchapter we'll see the complement to queues: SNS, the notification service, which instead of "one to one" lets you send a message to many recipients at once.
