In the previous subchapter, when talking about the Saga pattern, we mentioned that we needed a way to orchestrate multi-step processes: direct the order, decide what to do if a step fails, wait, retry... When you have many Lambdas (Chapter 14) that must collaborate in a complex flow, coordinating them "by hand" becomes chaos. That's what AWS Step Functions is for: a service that allows you to orchestrate multi-step workflows in a visual, orderly, and reliable way. It's like having a conductor for your serverless functions.

The problem: coordinating many functions is a mess

Imagine a business process with several steps: validate an order, charge, reserve stock, schedule shipping, notify... Each step could be a Lambda. If you try to coordinate them by having one call the next directly, problems arise:

❌ "By hand" coordination (Lambda calls Lambda):
   - What if a step fails? How do I retry?
   - How do I know which step the process is at right now?
   - How do I handle steps that take time, waits, decisions (if this, do that)?
   - The flow is "hidden" inside the code, hard to see and change

This "coordination hidden in the code" is fragile, hard to follow, and hard to modify. You need to separate the flow logic (the order of the steps, what to do if they fail) from the logic of each step (what each Lambda does).

What is Step Functions

AWS Step Functions is a service for orchestrating workflows: you define a sequence of steps (with their order, decisions, retries, and error handling) and Step Functions executes and coordinates it for you. The flow is defined declaratively, as a state machine written in Amazon States Language (a JSON format), and is displayed as a diagram, separate from the code of each step.

   Step Functions executes a flow like this:

   [Validate order] → Valid?
                       ├─ yes → [Charge payment] → [Reserve stock] → [Ship] → [End]
                       └─ no → [Reject] → [End]
   (with retries and error handling at each step)
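To make the diagram above concrete, here is a minimal sketch of how that flow could be written as a state machine definition. It builds the Amazon States Language JSON from a Python dict. The Lambda function names (validate-order, charge-payment, and so on) are hypothetical, and the sketch assumes the validation Lambda returns a boolean field called valid.

```python
import json


def lambda_task(function_name, next_state=None):
    """Build a Task state that invokes a Lambda function (names are hypothetical)."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        # Keep only the Lambda's return value as this state's output.
        "OutputPath": "$.Payload",
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


order_flow = {
    "Comment": "Order processing flow (illustrative sketch)",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": lambda_task("validate-order", "IsValid"),
        # The decision lives in the flow, not inside any Lambda.
        # Assumption: validate-order returns {"valid": true} or {"valid": false}.
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ChargePayment"}
            ],
            "Default": "Reject",
        },
        "ChargePayment": lambda_task("charge-payment", "ReserveStock"),
        "ReserveStock": lambda_task("reserve-stock", "Ship"),
        "Ship": lambda_task("ship-order"),
        "Reject": lambda_task("reject-order"),
    },
}

definition_json = json.dumps(order_flow, indent=2)
print(definition_json)
```

The printed JSON is what you would give to Step Functions as the state machine definition (for example through the console, Terraform, or an SDK). Notice that the order of the steps and the yes/no decision are visible in one place, while each Lambda only knows how to do its own task.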

Analogy: Step Functions is like the conductor of an orchestra. Each musician (Lambda) knows how to play their instrument (do their task), but it's the conductor who sets the order, when each one comes in, what to do if someone makes a mistake, and keeps everyone coordinated so it sounds like a symphony and not chaos. Without a conductor, 50 musicians playing at once without coordination would be a disaster. Step Functions conducts your functions so they collaborate in an orderly flow.

Another way to see it: it's like a flowchart that actually runs. You draw the process (this step, then this, if this happens do that) and Step Functions carries it out.

What Step Functions gives you

  1. Visual and clear flow

The workflow looks like a diagram: at a glance you understand the entire process (what steps there are, in what order, what decisions are made). This makes the process easy to understand and modify, instead of being buried in the code.

  2. Integrated error handling and retries

Step Functions manages what happens when a step fails, based on rules you declare on that step: it can retry (with increasing waits between attempts), jump to an error-handling step, or run compensations (exactly what the Saga pattern from subchapter 28.2 needs). You don't have to code that logic by hand inside each Lambda.
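As a rough illustration, retries and error routes are declared on a step with the Retry and Catch fields of Amazon States Language. The sketch below shows one Task state and a small helper that computes the nominal waits its retry rule produces. The function name charge-payment and the state names HandlePaymentError and ReserveStock are hypothetical, and the helper ignores optional settings such as a maximum delay or jitter.

```python
charge_payment = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {"FunctionName": "charge-payment", "Payload.$": "$"},
    # Retry up to 3 times, doubling the wait each time.
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    # If retries are exhausted (or another error occurs), go to a handler step
    # and keep the error details next to the original input.
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",
            "Next": "HandlePaymentError",
        }
    ],
    "Next": "ReserveStock",
}


def retry_waits(retrier):
    """Nominal wait in seconds before each retry attempt of one Retry rule."""
    interval = retrier.get("IntervalSeconds", 1)  # documented default: 1
    rate = retrier.get("BackoffRate", 2.0)        # documented default: 2.0
    attempts = retrier.get("MaxAttempts", 3)      # documented default: 3
    return [interval * rate ** n for n in range(attempts)]


print(retry_waits(charge_payment["Retry"][0]))  # [2.0, 4.0, 8.0]
```

With these values the step is retried after roughly 2, 4, and 8 seconds. If it still fails, the Catch rule sends the execution to the handler step instead of ending it abruptly.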

  3. State tracking

Step Functions remembers which step each execution is at and keeps its history. You can see exactly where a process is, which steps were completed, and where it failed if something went wrong. This gives you a lot of visibility and complements the observability tools from Chapter 24.
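To give an idea of what that history looks like, here is a small sketch that summarizes a list of history events: which steps completed, where the execution stopped, and with what error. The sample events are invented for illustration and only imitate the shape of the events returned by the Step Functions execution history API. The state names and the OutOfStock error are hypothetical.

```python
def summarize_history(events):
    """Return completed steps, the step where execution stopped, and the error."""
    entered, exited, error = [], set(), None
    for event in events:
        kind = event["type"]
        if kind.endswith("StateEntered"):
            entered.append(event["stateEnteredEventDetails"]["name"])
        elif kind.endswith("StateExited"):
            exited.add(event["stateExitedEventDetails"]["name"])
        elif kind == "ExecutionFailed":
            error = event["executionFailedEventDetails"].get("error")
    # A step that was entered but never exited is where the execution stopped.
    unfinished = [name for name in entered if name not in exited]
    return {
        "completed": [name for name in entered if name in exited],
        "stopped_at": unfinished[-1] if unfinished else None,
        "error": error,
    }


# Hypothetical sample data, shaped like execution history events.
sample_events = [
    {"id": 1, "type": "ExecutionStarted"},
    {"id": 2, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "ChargePayment"}},
    {"id": 3, "type": "TaskStateExited",
     "stateExitedEventDetails": {"name": "ChargePayment"}},
    {"id": 4, "type": "TaskStateEntered",
     "stateEnteredEventDetails": {"name": "ReserveStock"}},
    {"id": 5, "type": "TaskFailed"},
    {"id": 6, "type": "ExecutionFailed",
     "executionFailedEventDetails": {"error": "OutOfStock"}},
]

summary = summarize_history(sample_events)
print(summary)
```

In a real account you would not build this list by hand: the console shows the same information graphically, and an SDK call such as boto3's get_execution_history returns the events for a given execution.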

  4. Steps that wait or take time

It naturally handles flows that take time (minutes, hours, or even days; a Standard workflow can run for up to one year) or that must wait for something (a human approval, an external response). This is very hard to do with Lambdas alone, which have a maximum execution time of 15 minutes (remember subchapter 14.5).
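As a sketch of how waiting is expressed, the definition below combines two mechanisms: a Wait state that pauses for a fixed time, and a Task that uses the waitForTaskToken integration so the execution stays paused until someone reports a result for that token. The function name request-approval, the state names, and the chosen durations are hypothetical.

```python
import json

THREE_DAYS = 3 * 24 * 3600

approval_flow = {
    "Comment": "Pause, then wait for a human decision (illustrative sketch)",
    "StartAt": "CoolDown",
    "States": {
        # Fixed pause: no Lambda is running (or billed) while the flow waits.
        "CoolDown": {"Type": "Wait", "Seconds": 3600, "Next": "RequestApproval"},
        # Callback pattern: the Lambda only sends the request (e.g. an email)
        # and returns; the flow then stays paused until the token is answered.
        "RequestApproval": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "request-approval",
                "Payload": {
                    "request.$": "$",
                    "taskToken.$": "$$.Task.Token",
                },
            },
            # Give up if nobody answers within three days.
            "TimeoutSeconds": THREE_DAYS,
            "Next": "Approved",
        },
        "Approved": {"Type": "Succeed"},
    },
}

print(json.dumps(approval_flow, indent=2))
```

When the approver decides, whatever system receives the decision resumes the execution by calling the SendTaskSuccess or SendTaskFailure API with the saved token. The wait can last far longer than any single Lambda invocation could.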

Step Functions and the Saga pattern

Step Functions is the ideal tool to implement an orchestration Saga (subchapter 28.2): you define the process steps and, for each one, the compensation to run if a later step fails. When a step fails, Step Functions follows the error routes you defined and runs those compensations in order, restoring a consistent state, all in a visual and controlled way.

Saga with Step Functions:
   [Charge] → [Reserve] → [Ship] ✗ fails
        └─ Step Functions automatically executes compensations:
              [Release reservation] → [Refund] → consistent state
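One possible way to model the Saga above is to give each step a Catch route that points to the compensation of the steps already completed. The sketch below builds such a definition and includes a tiny local walker that follows the Next and Catch links, only to show which path an execution would take. The walker is a teaching aid, not how Step Functions is implemented, and all function and state names are hypothetical.

```python
def task(function_name, next_state=None, on_error=None):
    """Task state invoking a (hypothetical) Lambda, with an optional Catch route."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
    }
    if on_error:
        state["Catch"] = [
            {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": on_error}
        ]
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


saga = {
    "StartAt": "Charge",
    "States": {
        # Forward path: each step names the compensation to start if it fails.
        "Charge": task("charge", "Reserve", on_error="OrderFailed"),
        "Reserve": task("reserve", "Ship", on_error="Refund"),
        "Ship": task("ship", on_error="ReleaseReservation"),
        # Compensation path: undo completed steps in reverse order.
        "ReleaseReservation": task("release-reservation", "Refund"),
        "Refund": task("refund", "OrderFailed"),
        "OrderFailed": {"Type": "Fail", "Error": "OrderFailed",
                        "Cause": "Order rolled back"},
    },
}


def trace(definition, failing=()):
    """Follow Next/Catch links locally; states listed in `failing` raise an error."""
    path, name = [], definition["StartAt"]
    while name is not None:
        path.append(name)
        state = definition["States"][name]
        if state["Type"] == "Fail":
            break
        if name in failing:
            catchers = state.get("Catch")
            if not catchers:
                break  # uncaught error: the execution would simply fail here
            name = catchers[0]["Next"]
        elif state.get("End"):
            name = None
        else:
            name = state["Next"]
    return path


print(trace(saga))                    # ['Charge', 'Reserve', 'Ship']
print(trace(saga, failing={"Ship"}))
# ['Charge', 'Reserve', 'Ship', 'ReleaseReservation', 'Refund', 'OrderFailed']
```

If Ship fails, the path goes through the release and the refund before ending in the failed state, which matches the diagram. In a real flow you would probably also add Retry rules to the compensation steps, since a compensation that fails leaves the system inconsistent.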

Real-world example: a company processes loan applications, a process with many steps: validate the data, check the credit history (a call to an external service that takes time), calculate the risk, wait for human approval if the amount is high, and finally disburse or reject. They implement it with Step Functions: the complete flow looks like a clear diagram, steps that take time or wait (like human approval, which can take days) are handled without issue, and every step has defined retries and error handling in case it fails. The team can see in real time which step each application is at. Coordinated by hand with Lambdas, this process would have been fragile and unreadable; with Step Functions it is clear, robust, and easy to modify.

When to use Step Functions

  • When you have a multi-step process to coordinate (especially with Lambdas).
  • When you need robust error handling, retries, or compensations (like in a Saga).
  • When the flow has decisions ("if this, do that"), waits, or steps that take time.
  • When you want to see and understand the process clearly, not hide it in the code.

💡 For a single simple task, a standalone Lambda is enough. Step Functions shines when there is a multi-step flow to coordinate.

What you should remember

  • Coordinating many functions by having one call the next is fragile and hard to follow; it's better to separate the flow logic from the logic of each step.
  • AWS Step Functions orchestrates multi-step workflows: you define the order, decisions, retries, and error handling in a visual and declarative way, and it executes and coordinates them. Like the conductor of an orchestra (or a flowchart that runs).
  • It gives you: clear visual flow, integrated error handling and retries, state tracking (which step each execution is at), and support for steps that wait or take time (even days).
  • It's the ideal tool to implement the Saga pattern by orchestration (it runs the compensations you defined when a step fails).
  • Use it for multi-step processes with decisions, waits, or robust error handling. 💡 For a simple task, a standalone Lambda is enough.

In the last subchapter of the chapter, we'll see how to run serverless logic very close to users, at the edge of the network, with Lambda@Edge and CloudFront Functions.

Cloud, AWS & Terraform — From Zero to Expert

© Copyright 2024. All rights reserved