In the previous subchapter, we looked at CloudWatch logs, metrics, and alarms. You now have a lot of valuable information, but it's scattered. How do you present it so it can be understood at a glance? That's what Dashboards are for: they bring your metrics together on a single visual screen. We'll also look at Contributor Insights, a tool for discovering who or what is behind a given behavior.

The problem: lots of data, little overview

You have metrics from your servers, your database, your load balancer, your Lambdas... each one separately. To know "how the system is doing overall," you'd have to check them one by one. You need an overview: a single screen where you can see the status of everything important at once.

What is a CloudWatch Dashboard

A CloudWatch Dashboard is a customizable screen where you place the graphs of the metrics you care about, together, to see them at a glance. You choose what to show and how to organize it:

┌─────────────── Dashboard "Production" ───────────────┐
│  Server CPU          │  Requests/sec                 │
│   ▁▂▅▇▅▂▁            │   ▃▅▆▇▆▅▃                     │
│──────────────────────┼───────────────────────────────│
│  HTTP Errors         │  Database Latency             │
│   ▁▁▁▂▁▁▁            │   ▂▂▃▂▂▂▃                     │
└──────────────────────┴───────────────────────────────┘

Analogy: a Dashboard is like the instrument panel in an airplane cockpit or the control room of a power plant. All the important indicators are gathered in one place and organized so the pilot (or operator) can see the complete system status at a glance, instead of checking instruments scattered across different places.
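To make the idea of "choosing what to show and how to organize it" concrete, here is a minimal sketch of how the 2x2 "Production" dashboard could be described as data. A dashboard is defined by a JSON body containing a list of widgets positioned on a grid. The instance ID, load balancer name, database name, and region below are hypothetical placeholders, and the choice of metrics is only one reasonable assumption about what each panel would plot.

```python
import json

# All resource identifiers below are hypothetical placeholders.
INSTANCE_ID = "i-0123456789abcdef0"
LOAD_BALANCER = "app/example-alb/0123456789abcdef"
DB_INSTANCE = "example-db"


def metric_widget(title, metric, x, y, stat="Average", region="us-east-1"):
    """Return one graph widget placed at column x, row y of the dashboard grid."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,  # the grid is 24 units wide, so 12 gives two columns
        "height": 6,
        "properties": {
            "title": title,
            # [namespace, metric name, dimension name, dimension value]
            "metrics": [metric],
            "stat": stat,
            "period": 60,  # seconds per data point
            "region": region,
        },
    }


def build_dashboard_body():
    """Build the JSON body for a 2x2 dashboard like the "Production" sketch."""
    widgets = [
        metric_widget("Server CPU",
                      ["AWS/EC2", "CPUUtilization", "InstanceId", INSTANCE_ID], 0, 0),
        # Summed over 60-second periods, so this panel shows requests per minute.
        metric_widget("Requests per minute",
                      ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", LOAD_BALANCER],
                      12, 0, stat="Sum"),
        metric_widget("HTTP Errors",
                      ["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count",
                       "LoadBalancer", LOAD_BALANCER],
                      0, 6, stat="Sum"),
        metric_widget("Database Latency",
                      ["AWS/RDS", "ReadLatency", "DBInstanceIdentifier", DB_INSTANCE], 12, 6),
    ]
    return json.dumps({"widgets": widgets}, indent=2)


body = build_dashboard_body()
print(body)

# Publishing requires boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("cloudwatch").put_dashboard(DashboardName="Production", DashboardBody=body)
```

Keeping the layout in a small helper like this makes it easy to add a panel or build a second dashboard for another application. The same JSON body can also be managed from Terraform instead of a script.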

What Dashboards are for

  • Overview at a glance: is the whole system healthy right now? A look at the dashboard tells you.
  • Operation screens: many teams have dashboards on big screens in the office (or always open) to continuously monitor production.
  • Investigating incidents: when something goes wrong, a good dashboard shows you all related metrics together, which helps correlate ("when CPU went up, latency also went up... they're related").
  • Sharing status: they let the whole team (technical or not) see how the system is doing.
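The "correlating" idea from the incident-investigation point can also be expressed as a number. The sketch below computes a Pearson correlation coefficient between two metric series sampled at the same timestamps. The CPU and latency values are invented purely for illustration, and this is a plain-Python calculation, not a CloudWatch feature.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long series (-1.0 to 1.0)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative, made-up samples taken at the same minutes.
cpu_percent = [20, 22, 35, 60, 85, 90, 70]
latency_ms = [110, 115, 140, 210, 330, 360, 250]

r = pearson(cpu_percent, latency_ms)
print(f"CPU vs latency correlation: {r:.2f}")
```

A value close to 1 means the two metrics rise and fall together, which is what your eye picks up when the graphs sit side by side. It does not prove that one causes the other, so treat it as a starting point for the investigation.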

You can have several dashboards: a general one, one per application, one for the business team with business metrics (orders, revenue), etc.

Real-world example: a team managing a streaming platform keeps a dashboard always visible on a screen in the office. It shows connected users, bandwidth used, playback errors, and latency. During the premiere of a highly anticipated series, the team watches live as connected users increase and checks that errors stay low. When they see a small spike in errors, they act before it affects more people. The dashboard gives them the system's pulse in near real time.

The hardest problem: WHO is causing this?

Normal metrics tell you how much (e.g., "there are 10,000 requests per minute"). But sometimes you need to know who or what is behind a number:

  • "There's a ton of traffic... which user or IP is it coming from?"
  • "The database is saturated... which query or client is overloading it?"
  • "Who are the top 10 users consuming the most resources?"

This is hard to see in a normal graph, which only shows the total. This is where Contributor Insights comes in.

What is Contributor Insights

CloudWatch Contributor Insights analyzes log data in CloudWatch Logs, using rules you define, to identify the top contributors to a behavior: who or what is generating most of a certain activity. It shows you rankings of the "main culprits":

Contributor Insights — "IPs with most requests":
   1. 203.0.113.5    → 45,000 requests  ← suspicious
   2. 198.51.100.2   →  3,200 requests
   3. 192.0.2.10     →  2,800 requests
   ...

Analogy: Contributor Insights is like the detective who, in a crowd, identifies the ringleaders. The normal metric tells you "there are a lot of people and a lot of noise"; Contributor Insights tells you "the noise is being caused by these three specific people." It takes you from "how much" to "who."
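To see what "going from how much to who" means in practice, here is a small local sketch of the underlying idea: group log events by a key and rank the groups by count. It does not call AWS at all. The JSON log format and the `sourceIp` field name are assumptions made for the example.

```python
import json
from collections import Counter


def top_contributors(log_lines, key, n=10):
    """Rank the values of `key` by how many log events they appear in."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(event, dict) and event.get(key) is not None:
            counts[event[key]] += 1
    return counts.most_common(n)


# Hypothetical access-log lines; the field names are illustrative.
sample_logs = [
    '{"sourceIp": "203.0.113.5", "path": "/api/orders"}',
    '{"sourceIp": "203.0.113.5", "path": "/api/orders"}',
    '{"sourceIp": "198.51.100.2", "path": "/api/users"}',
    '{"sourceIp": "203.0.113.5", "path": "/api/orders"}',
    'not a json line',
]

for rank, (ip, requests) in enumerate(top_contributors(sample_logs, "sourceIp"), start=1):
    print(f"{rank}. {ip} -> {requests} requests")
```

The total (4 valid requests) is what a normal metric would show. The ranking is the extra information that Contributor Insights provides, computed continuously over your log groups instead of over a list in memory.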

What Contributor Insights is for

  • Detecting abuse: identify the IP or user making abnormal use (possible attack or client overloading the system).
  • Finding the culprit of a problem: "what's overloading the database?" → see which client or query dominates.
  • Understanding real usage: "which endpoints of my API are used most?", "which clients consume the most resources?"
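Each of these uses comes down to a rule that says which log group to read and which field identifies a contributor. The sketch below builds such a rule definition as JSON. It follows the rule syntax as I understand it, so check it against the current AWS documentation before relying on it. The log group name and the `$.sourceIp` field are hypothetical.

```python
import json


def build_rule_definition(log_group, key_path):
    """Build a Contributor Insights rule that counts log events per contributor key."""
    definition = {
        "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
        "LogGroupNames": [log_group],
        "LogFormat": "JSON",                   # the log events are assumed to be JSON
        "Contribution": {"Keys": [key_path]},  # field that identifies the contributor
        "AggregateOn": "Count",                # rank contributors by number of events
    }
    return json.dumps(definition, indent=2)


# Hypothetical log group and field name.
rule = build_rule_definition("/example/api/access-logs", "$.sourceIp")
print(rule)

# Creating the rule requires boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("cloudwatch").put_insight_rule(
#     RuleName="top-ips-by-requests", RuleState="ENABLED", RuleDefinition=rule)
```

Changing the key to a user ID or an endpoint path would turn the same rule into a ranking of the heaviest users or the most-used endpoints.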

Real-world example: an API starts to slow down because it is receiving a huge amount of traffic. The "total requests" metric confirms the high volume, but not the cause. With Contributor Insights, the team quickly sees that a single IP is generating 80% of the requests: a client with a bug that calls the API in a loop. They block that IP and the system returns to normal. Without Contributor Insights, it could have taken hours to find the culprit among thousands of clients.
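The "single IP generating 80% of the requests" observation is just a share calculation over the ranking. Here is a minimal sketch of it. The request counts and the 50% threshold are assumptions chosen for illustration, not values taken from any AWS default.

```python
def dominant_contributor(counts, threshold=0.5):
    """Return (contributor, share) if one contributor reaches `threshold` of the total."""
    total = sum(counts.values())
    if total == 0:
        return None
    name, top = max(counts.items(), key=lambda item: item[1])
    share = top / total
    return (name, share) if share >= threshold else None


# Hypothetical per-IP request counts for one time window.
requests_by_ip = {
    "203.0.113.5": 8000,
    "198.51.100.2": 1200,
    "192.0.2.10": 800,
}

result = dominant_contributor(requests_by_ip)
if result is not None:
    ip, share = result
    print(f"{ip} is generating {share:.0%} of the requests")
```

When no contributor reaches the threshold, the function returns `None`. That usually means the load is spread across many clients and the cause lies elsewhere.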

How they fit together: from "how much" to "who"

DASHBOARDS           → overview at a glance (how is everything?)
METRICS + ALARMS     → how much is happening and when to alert (Ch. 24.1)
CONTRIBUTOR INSIGHTS → WHO or what is behind a behavior

Dashboards give you the big picture; metrics and alarms, the numbers and alerts; and Contributor Insights, when needed, takes you to the detail of who is causing something.

What you should remember

  • A CloudWatch Dashboard is a customizable screen that brings together the graphs of the metrics you care about, to see the status of everything at a glance. Like the cockpit control panel.
  • They're used for: overview, always-visible operation screens, investigating incidents (correlating metrics), and sharing status with the team.
  • Normal metrics tell you how much; sometimes you need to know who or what is causing it.
  • Contributor Insights analyzes logs to identify the top contributors to a behavior (rankings of "main culprits"). Like the detective who identifies the ringleaders in a crowd.
  • It's used to detect abuse (an attacking IP), find the culprit of a problem (what's overloading the DB), and understand real usage.

In the next subchapter, we'll take observability a step further: tracing the path of a request through many services with distributed tracing and X-Ray.

Cloud, AWS & Terraform — From Zero to Expert

© Copyright 2024. All rights reserved