We close Part V with a very real problem that affects all infrastructure managed with code: drift. It occurs when the real infrastructure no longer matches what your code says. In this subchapter, you’ll understand why it happens, why it’s dangerous, and how to detect and correct it automatically. It’s the cherry on top of a mature Infrastructure as Code workflow.

What is drift

Remember Terraform’s central idea: your code describes how the infrastructure should be, and Terraform makes reality match it (Chapter 9). Drift is when the real infrastructure deviates from what the code says, without the code having changed.

Your code says:        server with 2 CPUs, port 443 open
Reality is now:        server with 4 CPUs, port 22 also open
                       ↑ someone changed it outside = DRIFT

The code and reality no longer match. That difference is drift.
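To make the idea concrete, the comparison can be modeled as a diff between two dictionaries. This is only an illustrative sketch: `find_drift` and the attribute names (`cpus`, `open_ports`) are hypothetical, and this is not how Terraform represents resources internally.

```python
def find_drift(declared: dict, actual: dict) -> dict:
    """Return {attribute: (declared_value, actual_value)} for every mismatch."""
    drift = {}
    # Look at every attribute present on either side.
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical values mirroring the example above.
declared = {"cpus": 2, "open_ports": [443]}
actual = {"cpus": 4, "open_ports": [22, 443]}

for attribute, (want, have) in find_drift(declared, actual).items():
    print(f"{attribute}: code says {want}, reality is {have}")
```

An empty result means code and reality agree; any entry in the result is drift.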

Why drift occurs

Drift appears when someone or something modifies the infrastructure outside of Terraform:

  • Manual changes: someone logs into the AWS console and modifies a resource “quickly” to solve an emergency (opens a port, changes a size...), without updating the code.
  • Other tools or scripts that touch the same resources.
  • Automatic AWS processes that modify something (rare, but possible).
  • Autoscaling or other systems that change the number of resources.

The most common and dangerous case: a manual “emergency” change. At 3 a.m. there’s an incident, someone logs into the AWS console and changes something by hand to put out the fire, and then forgets to reflect it in the code. From then on, the code lies: it no longer describes reality.

Analogy: drift is like having blueprints of a house that no longer match the real house because someone knocked down a wall without updating the blueprints. If later a builder works guided by the old blueprints, it can cause a disaster, because reality is different.

Why drift is dangerous

Drift undermines all trust in your Infrastructure as Code:

  • The code stops being the source of truth: you can no longer trust that the code describes what’s really there.
  • Surprises in the next apply: when someone applies Terraform again, it will try to “correct” the manual change (revert it to what the code says), which can break what was fixed by hand, or remove a security patch!
  • Hidden security risks: if the manual change opened a dangerous port, the code doesn’t reflect it, so security reviews (Chapter 21) won’t detect it. The hole remains hidden.
  • Loss of reproducibility: if you recreate the infrastructure from the code, you won’t get what was really there, because the code is outdated.

How to detect drift

The good news is that detecting drift is simple, because Terraform already knows how to compare code with reality. Remember that terraform plan (subchapter 11.4) does exactly that: it refreshes what it knows about the real infrastructure, then compares code, state, and reality. If there’s drift, the plan shows it:

terraform plan
   → if NO drift → "No changes" ✓ (code and reality match)
   → if THERE IS drift → shows the differences:
       ~ aws_security_group.web: port 22 open (not in the code) ⚠️
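For scripts, reading the plan's text output is fragile. `terraform plan` accepts a `-detailed-exitcode` flag that makes the exit status carry the result: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded and there are changes. A minimal sketch of wrapping that in Python follows. The function names, the labels they return, and the `workdir` parameter are assumptions made for illustration, and the script assumes `terraform` is on the PATH and the directory has already been initialized.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit status of `terraform plan -detailed-exitcode` to a label."""
    if code == 0:
        return "in_sync"  # plan succeeded, no changes
    if code == 2:
        return "drift"    # plan succeeded, changes are pending
    return "error"        # 1 (or anything unexpected): the plan itself failed


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` (hypothetical path) and classify the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Exit code 2 only says that the plan is not empty. It indicates drift only when the plan runs against code that has already been applied, because pending code changes produce the same status.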

Automatic and periodic detection

The key is not to wait for someone to run a plan by chance. Automatic drift detection means running terraform plan on a schedule (for example, every night) and alerting the team when it finds differences:

Every night, automatically:
   terraform plan
      → are there unexpected changes?
         → YES → ALERT the team (Slack, email...): "there is drift in production"
         → NO → all good, nothing to report

This way, if someone made a manual change, the team finds out the next day, not weeks later when it has already caused a problem. Platforms like HCP Terraform (subchapter 22.3) offer built-in drift detection; you can also set it up yourself with a scheduled pipeline (remember EventBridge schedules, subchapter 15.3, or a cron job in your CI).
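The alerting step of such a nightly job can be sketched as follows. It assumes the plan was saved with `terraform plan -out=tfplan` and converted with `terraform show -json tfplan`, whose output contains a `resource_changes` list where each entry has an `address` and a `change.actions` list. The function names, the message format, and the sample data are illustrative assumptions; delivering the message to Slack or email is left out.

```python
import json
from typing import List, Optional


def drifted_addresses(plan_json: str) -> List[str]:
    """Return addresses of resources whose planned actions change something."""
    plan = json.loads(plan_json)
    addresses = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Entries that only read or do nothing do not modify real resources.
        if any(action not in ("no-op", "read") for action in actions):
            addresses.append(rc["address"])
    return addresses


def build_alert(environment: str, addresses: List[str]) -> Optional[str]:
    """Return an alert message, or None when there is nothing to report."""
    if not addresses:
        return None
    lines = [f"Drift detected in {environment}: {len(addresses)} resource(s)"]
    lines += [f"  - {address}" for address in addresses]
    return "\n".join(lines)


# Hand-written sample in the shape of `terraform show -json` output.
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_instance.app", "change": {"actions": ["no-op"]}},
    ]
})

print(build_alert("production", drifted_addresses(sample)))
```

Returning `None` when nothing changed keeps the "all good, nothing to report" branch silent, so the team only hears from the job when there is something to look at.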

Reconciliation: correcting drift

Detecting drift is only half the battle; then you have to reconcile (realign code and reality). There are two ways, depending on which change is the “correct” one:

Option A: code is the truth → revert the manual change

If the manual change shouldn’t have been made, you run terraform apply (after reviewing the plan) so that Terraform returns the infrastructure to what the code says, eliminating the deviation.

The manual change was a mistake → terraform apply → returns to what the code says

Option B: the manual change was necessary → update the code

If the manual change was correct (an adjustment that needs to be kept), you update the code to reflect it and submit the update via a PR (subchapter 12.5). Once the code matches reality, terraform plan should report no changes, and the code is the truth again.

The manual change was valid → update the CODE to reflect it → PR → apply

Automatic reconciliation: some teams configure the pipeline so that, for certain kinds of drift, it automatically reverts the infrastructure to what the code says (option A) with no human intervention. This is a powerful way to enforce that all changes go through code, but it must be used with care: automatically reverting a legitimate emergency patch could reopen the problem it fixed. That’s why many teams prefer automatic detection plus a human decision on how to reconcile.
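One way to combine automatic detection with a human decision is a small policy that auto-reverts drift only for resource types on an explicit allowlist and escalates everything else. The sketch below is hypothetical: the allowlist contents, the function name, and the returned labels are assumptions for illustration, not a Terraform feature.

```python
# Hypothetical allowlist: resource types where reverting to the code is
# considered safe without asking anyone. Everything else goes to a human.
AUTO_REVERT_TYPES = {"aws_s3_bucket_public_access_block"}


def reconciliation_action(address: str) -> str:
    """Decide how to handle drift on one resource address."""
    # An address such as "module.db.aws_security_group.main" ends in
    # "<type>.<name>". This simple split does not handle for_each keys
    # that themselves contain dots.
    parts = address.split(".")
    resource_type = parts[-2] if len(parts) >= 2 else ""
    if resource_type in AUTO_REVERT_TYPES:
        return "auto_revert"  # option A, applied by the pipeline
    return "needs_human"      # a person chooses option A or option B
```

Defaulting to `needs_human` means that a resource type nobody thought about is never reverted automatically.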

The underlying lesson: every change, through code

Drift reinforces the central message of all Infrastructure as Code: all changes must be made through code, never by hand. Drift detection is the watchdog that enforces that rule, alerting when someone breaks it.

Real-world example: a company runs drift detection every night in production. One morning, the alert says: “the database Security Group has port 5432 open to the internet, and it’s not in the code.” They investigate: a developer opened it by hand the previous afternoon for a test and forgot to close it. Thanks to drift detection, they discover it within hours (not when an attacker finds it) and fix it. Without that vigilance, that hole could have gone unnoticed for months.

What you should remember

  • Drift is when the real infrastructure no longer matches the code, usually due to manual changes made outside of Terraform (the classic emergency patch not reflected in the code).
  • It’s dangerous: the code stops being the source of truth, the next apply can revert important changes, it hides security risks, and breaks reproducibility. Like blueprints that no longer match the house.
  • It’s detected with terraform plan (compares code and reality); ideally with automatic and periodic detection (e.g. every night) that alerts the team to differences.
  • It’s reconciled in two ways: revert the manual change with apply (if the code is the truth) or update the code via PR (if the manual change was valid).
  • The underlying lesson: all changes must be made through code; drift detection is the watchdog that enforces that rule.

You’ve finished Chapter 22 and Part V! You now master Terraform at an advanced level: modules, environments, state, testing, and CI/CD. In Part VI we shift focus to the cross-cutting aspects of AWS that set a professional apart, starting with defense-in-depth security.

Cloud, AWS & Terraform — From Zero to Expert


© Copyright 2024. All rights reserved