This chapter covers messaging and events, the pieces that connect the different components of a modern architecture. We start with the first and most fundamental one: SQS (Simple Queue Service), AWS's queue service. We already mentioned it as a Lambda trigger (subchapter 14.2); now we'll study it in depth.
The problem: connecting services without making them dependent on each other
Imagine an online store. When a customer places an order, you need to: charge, update inventory, send an email, notify the warehouse... If the service that receives the order had to do all that at once and wait for each step to finish, it would be slow and fragile: if the email service is down, does the entire order get lost?
The solution is to decouple the services with a queue in between. The order service just leaves a "message" ("process this order") in the queue and moves on. Other services pull messages from the queue and process them at their own pace.
What is an SQS queue
A queue is like a shared to-do list. One service puts messages (tasks) in on one side, and another service takes them out and processes them on the other.
Producer                 SQS Queue                  Consumer
(puts messages)    [msg][msg][msg][msg]      (takes and processes)
      │ ──────────────►            ──────────────► │
- Producer: the one who creates messages and puts them in the queue (e.g., the order service).
- Consumer: the one who takes messages out and processes them (e.g., a Lambda or a server).
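To make the producer/consumer split concrete, here is a minimal sketch of a toy in-memory queue in Python. It is not the SQS API: `ToyQueue`, `place_order` and `drain` are hypothetical names used only for illustration. One simplification to keep in mind: in real SQS, receiving a message does not remove it; the consumer deletes it explicitly after processing. This toy collapses both steps into `receive`.

```python
from __future__ import annotations

from collections import deque


class ToyQueue:
    """Hypothetical in-memory stand-in for a queue; NOT the SQS API."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, body: str) -> None:
        # The producer leaves the message and returns right away.
        self._messages.append(body)

    def receive(self) -> str | None:
        # Simplification: real SQS needs a separate delete after processing.
        return self._messages.popleft() if self._messages else None


def place_order(queue: ToyQueue, order_id: int) -> str:
    # Producer: enqueue the work and move on (no waiting on email, inventory...).
    queue.send(f"process order {order_id}")
    return f"order {order_id} accepted"


def drain(queue: ToyQueue) -> list[str]:
    # Consumer: take messages out one by one and process them at its own pace.
    processed = []
    while (message := queue.receive()) is not None:
        processed.append(message)
    return processed


q = ToyQueue()
for order_id in (1, 2, 3):
    print(place_order(q, order_id))
print(drain(q))
```

The producer's `place_order` returns as soon as the message is enqueued; when and how fast `drain` runs is entirely the consumer's business.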
Analogy: an SQS queue is like the order slip in a restaurant. The waiter (producer) writes down the orders and hangs them in the kitchen. The cooks (consumers) pick up the slips and prepare them at their own pace. The waiter doesn't wait in the kitchen: he leaves the slip and keeps serving tables. If many orders come in at once, they pile up and get done as time allows; nothing is lost.
The big advantage: decoupling and resilience
The queue separates producer and consumer, which brings huge advantages:
- Resilience: if the consumer goes down, the messages wait in the queue instead of being lost (SQS retains them for 4 days by default, configurable up to 14 days). When the consumer comes back, it processes them. The producer doesn't even notice.
- Peak smoothing: if a flood of messages arrives (Black Friday), the queue accumulates them and the consumer processes them at a sustainable pace, without being overwhelmed.
- Independent scaling: you can add more consumers to empty the queue faster, without touching the producer.
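The peak-smoothing and resilience points can be checked with a small tick-based simulation. This is a sketch under simplifying assumptions (a hypothetical `simulate` function, discrete time ticks, a consumer that handles a fixed number of messages per tick), not a model of real SQS throughput.

```python
from __future__ import annotations

from collections import deque


def simulate(arrivals: list[int], capacity: int, down_ticks: set[int]) -> tuple[list[int], int]:
    """Toy model: a bursty producer, a queue, and a rate-limited consumer.

    arrivals[t] -- messages the producer sends at tick t
    capacity    -- max messages the consumer processes per tick
    down_ticks  -- ticks during which the consumer is offline
    Returns (queue depth after each tick, total messages processed).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    queue: deque[int] = deque()
    depths: list[int] = []
    processed = 0
    tick = 0
    while tick < len(arrivals) or queue:
        # Producer: never blocked by the consumer, it only enqueues.
        incoming = arrivals[tick] if tick < len(arrivals) else 0
        queue.extend(range(incoming))
        # Consumer: sustainable pace; messages simply wait while it is down.
        if tick not in down_ticks:
            for _ in range(min(capacity, len(queue))):
                queue.popleft()
                processed += 1
        depths.append(len(queue))
        tick += 1
    return depths, processed


print(simulate([10, 0, 0, 0, 0], capacity=3, down_ticks={1}))
```

With a burst of 10 messages, a consumer that handles 3 per tick and is offline at tick 1, the call returns `([7, 7, 4, 1, 0], 10)`: the backlog grows during the burst and the outage, nothing is lost, and the queue drains at the consumer's pace. Raising `capacity` plays the role of adding consumers, without touching the producer side.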
We'll dive deeper into this decoupling in subchapter 15.4.
The two types of queues: Standard vs FIFO
SQS offers two types of queues, and choosing the right one is important.
Standard Queue
This is the default queue type. It prioritizes maximum throughput (it can handle very large volumes of messages). In exchange, it has two peculiarities you should know:
- No order guarantee: ordering is best-effort, so messages may be processed in a slightly different order than they were sent.
- "At least once" delivery: every message is delivered at least once, but on rare occasions the same message may be delivered more than once.
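Because a standard queue delivers at least once, a common practice (not something SQS does for you) is to make the consumer idempotent: handling the same message twice has the same effect as handling it once. A minimal sketch, assuming each message carries a hypothetical business key `order_id` and that the set of already-handled keys is kept in memory; in production that set would have to live in durable storage.

```python
from __future__ import annotations


def process_all(deliveries: list[dict[str, str]], already_done: set[str]) -> list[str]:
    """Idempotent consumer: skip any message whose business key was handled."""
    handled = []
    for message in deliveries:
        key = message["order_id"]  # hypothetical business key set by the producer
        if key in already_done:
            continue  # duplicate delivery: ignoring it is safe
        # ... real work would go here (e.g., send the confirmation email) ...
        handled.append(key)
        already_done.add(key)
    return handled


# A standard queue may deliver out of order and, rarely, more than once.
deliveries = [
    {"order_id": "B", "action": "send confirmation email"},
    {"order_id": "A", "action": "send confirmation email"},
    {"order_id": "B", "action": "send confirmation email"},  # duplicate
]
print(process_all(deliveries, set()))  # ['B', 'A']
```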
FIFO Queue (First In, First Out)
FIFO means "first in, first out." It guarantees two things that the standard queue does not:
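- Strict order: messages are processed exactly in the order they were sent (order is kept per message group, identified by a message group ID).
- No duplicates: if the same message is sent again within the 5-minute deduplication interval, SQS discards the copy, so each message is processed exactly once.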
In exchange, it has more limited throughput than the standard queue (though more than enough for most cases).
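The two FIFO guarantees can be illustrated with a toy model. In the real service, producers send a `MessageGroupId` (ordering is guaranteed within each group) and a `MessageDeduplicationId` (copies with the same ID sent within the 5-minute deduplication interval are not delivered). `ToyFifoQueue` below is a hypothetical in-memory sketch of those two rules, not the SQS API; times are plain seconds passed in by the caller.

```python
from __future__ import annotations

from collections import deque

DEDUP_INTERVAL_SECONDS = 300  # SQS FIFO deduplication interval: 5 minutes


class ToyFifoQueue:
    """Hypothetical model of FIFO ordering and deduplication; NOT the SQS API."""

    def __init__(self) -> None:
        self._groups: dict[str, deque[str]] = {}
        self._accepted_at: dict[str, float] = {}  # dedup_id -> time it was accepted

    def send(self, body: str, group_id: str, dedup_id: str, now: float) -> bool:
        accepted_at = self._accepted_at.get(dedup_id)
        if accepted_at is not None and now - accepted_at < DEDUP_INTERVAL_SECONDS:
            return False  # duplicate within the interval: not enqueued
        self._accepted_at[dedup_id] = now
        self._groups.setdefault(group_id, deque()).append(body)
        return True

    def receive(self, group_id: str) -> str | None:
        # Strict order within a group: always hand out the oldest message first.
        group = self._groups.get(group_id)
        return group.popleft() if group else None


q = ToyFifoQueue()
q.send("deposit 100", group_id="account-42", dedup_id="op-1", now=0)
q.send("withdraw 50", group_id="account-42", dedup_id="op-2", now=1)
q.send("withdraw 50", group_id="account-42", dedup_id="op-2", now=2)  # producer retry
print(q.receive("account-42"), q.receive("account-42"), q.receive("account-42"))
# deposit 100 withdraw 50 None
```

Using the account as the group keeps each account's operations in order, while different accounts (different groups) can be processed in parallel. One difference from the real service: SQS still answers a duplicate send with success and simply does not deliver the copy; the toy returns `False` only to make the drop visible.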
Which should I choose?
| | Standard Queue | FIFO Queue |
|---|---|---|
| Order | Not guaranteed | Strict (order of arrival) |
| Duplicates | Possible (rare) | No (exactly once) |
| Throughput | Very high | High, but limited |
| When to use | When order doesn't matter | When order or duplicates are critical |
Examples to decide:
- Standard: resizing uploaded images. The order in which they're processed doesn't matter, and processing one twice isn't a big deal.
- FIFO: bank account operations. Order matters ("deposit 100" before "withdraw 50") and a duplicate would be a disaster (charging twice).
Practical rule: use a standard queue by default; use FIFO only when ordering or avoiding duplicates really matters.
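The practical rule maps directly onto queue creation parameters. The sketch below builds keyword arguments for boto3's `create_queue` (a real call in the AWS SDK for Python); the helper `create_queue_kwargs` and the queue names are hypothetical. It encodes two real constraints: a FIFO queue's name must end in `.fifo`, and the queue type is fixed at creation time, so this is a decision you make once per queue.

```python
from __future__ import annotations


def create_queue_kwargs(base_name: str, order_or_dedup_critical: bool) -> dict:
    """Build kwargs for boto3's SQS create_queue, following the practical rule."""
    if not order_or_dedup_critical:
        # Standard queue is the default: no extra attributes needed.
        return {"QueueName": base_name}
    return {
        "QueueName": f"{base_name}.fifo",  # FIFO queue names must end in ".fifo"
        "Attributes": {"FifoQueue": "true"},  # SQS attribute values are strings
    }


print(create_queue_kwargs("image-resize", order_or_dedup_critical=False))
print(create_queue_kwargs("bank-operations", order_or_dedup_critical=True))
# With AWS credentials configured, the result could be passed to boto3:
#   import boto3
#   boto3.client("sqs").create_queue(**create_queue_kwargs("bank-operations", True))
```

`ContentBasedDeduplication` is deliberately left off here: it derives the deduplication ID from a hash of the message body, which would also merge two genuine, identical operations (two separate "withdraw 50" requests sent within 5 minutes). Sending an explicit `MessageDeduplicationId` per operation avoids that.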
Dead Letter Queue (DLQ): the queue for problematic messages
What happens if a message can't be processed? For example, it contains corrupted data and the consumer fails every time it tries. Without protection, that message would be retried over and over, clogging the queue and wasting resources. This is known as a "poison message".
That's what the Dead Letter Queue (DLQ) is for: a separate queue for "dead" messages. You configure a redrive policy on the main queue so that, once a message has been received a set number of times without being processed successfully (the maxReceiveCount, for example 3), SQS automatically moves it to the DLQ:
Main queue
[msg ✓][msg ✗][msg ✓]
│
│ fails 3 times
▼
Dead Letter Queue (DLQ)
[msg ✗] ← here it's saved for later review
This way, the problematic message stops blocking the main queue, but is not lost: it's saved in the DLQ so you can investigate it later.
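The retry-then-park behavior can be sketched as a toy loop. It assumes a simplified model of the real mechanism: a received message that the consumer does not delete becomes visible again once its visibility timeout expires, and SQS counts each receive; after a message has been received `maxReceiveCount` times without being deleted, it is moved to the DLQ instead of being handed out again. `run_with_dlq` and `handle` are hypothetical names, and a failure is modeled as an exception raised by the handler.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def run_with_dlq(
    bodies: list[str], handler: Callable[[str], None], max_receive_count: int = 3
) -> tuple[list[str], list[str]]:
    """Toy model of a main queue with a redrive policy; NOT the real service."""
    main = deque((body, 0) for body in bodies)  # (body, times received so far)
    done: list[str] = []
    dlq: list[str] = []
    while main:
        body, receives = main.popleft()
        if receives >= max_receive_count:
            dlq.append(body)  # moved to the DLQ: parked for review, not lost
            continue
        try:
            handler(body)
            done.append(body)  # success: the consumer deletes the message
        except Exception:
            # Failure: not deleted, so it reappears later with one more receive.
            main.append((body, receives + 1))
    return done, dlq


def handle(body: str) -> None:
    if body == "corrupt":
        raise ValueError("cannot parse message")  # a poison message


print(run_with_dlq(["order-1", "corrupt", "order-2"], handle))
# (['order-1', 'order-2'], ['corrupt'])
```

The healthy messages keep flowing while the poison one is retried; in this toy the failing message is attempted exactly three times before being parked in the DLQ.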
Analogy: the DLQ is like the "problem orders" drawer in the kitchen. If an order slip is illegible or impossible to prepare, instead of blocking the cooks, it's set aside in that drawer for the manager to review later. The kitchen keeps running.
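To attach a DLQ in practice, the main queue gets a `RedrivePolicy` attribute whose value is a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`. The helper below builds that attribute; its name and the ARN in the example are hypothetical placeholders. The DLQ must already exist, live in the same account and Region, and be of the same type as the main queue (a FIFO queue needs a FIFO DLQ).

```python
from __future__ import annotations

import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict[str, str]:
    """Build the RedrivePolicy attribute for the MAIN queue (not the DLQ)."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be positive")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    # SQS expects the policy serialized as a JSON string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN: hypothetical account ID and queue name.
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(attrs)
# With AWS credentials configured, it could be applied with boto3, e.g.:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=main_queue_url, Attributes=attrs)
```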
What you should remember
- SQS is AWS's queue service: a producer puts messages in and a consumer takes them out and processes them at their own pace.
- The queue decouples services: it provides resilience (messages wait if the consumer goes down), smooths peaks, and allows you to scale consumers independently.
- Standard queue: maximum throughput, but no guaranteed order and possible duplicates. It's the default option.
- FIFO queue: guarantees strict order and no duplicates, with more limited throughput. Use it when order or duplicates are critical (e.g., banking operations).
- The Dead Letter Queue (DLQ) collects messages that fail repeatedly, so they stop blocking the main queue but are kept for later review.
In the next subchapter we'll see the complement to queues: SNS, the notification service, which instead of "one to one" lets you send a message to many recipients at once.
