We close the observability chapter with two very popular open source tools that AWS offers as managed services: Prometheus (for collecting and storing metrics) and Grafana (for visualizing them in dashboards). They are the de facto standard in many companies, especially those running Kubernetes, and understanding what they are and why you might use the managed versions opens the door to a huge ecosystem.

The context: the open source observability ecosystem

In addition to CloudWatch (AWS's native tool), there is a widespread open source ecosystem for observability. Two of the most popular tools are:

  • Prometheus: for collecting and storing metrics.
  • Grafana: for visualizing those metrics (and others) in dashboards.

Many teams use them together, and the pairing is almost a standard, especially in Kubernetes environments (remember EKS, subchapter 17.4). The problem: installing and maintaining them yourself is a lot of work (servers, updates, scaling, backups...). That's why AWS offers managed versions of both, where AWS takes on that operational work (remember the idea of a "managed service" that we saw with RDS in Chapter 8).

What is Prometheus (and Managed Prometheus)

Prometheus is a very popular open source system for collecting and storing metrics, especially in the world of containers and Kubernetes. It collects metrics from your applications and services and stores them as time series (values indexed by timestamp and labels), a format optimized for querying with its own query language, PromQL.

Amazon Managed Service for Prometheus is the managed version offered by AWS: you use Prometheus, but AWS takes care of the servers, scaling, availability, and maintenance. You focus on your metrics, not on operating the Prometheus infrastructure.

Your applications / Kubernetes
        │ (emit metrics)
        ▼
Managed Prometheus (collects and stores the metrics)
        │ AWS manages the servers, scaling, availability...
        ▼
   ready to query and visualize

Analogy: Prometheus is like a warehouse specialized in storing measurements (millions of numbers over time), very well organized to find them quickly. The managed version is like renting that warehouse with all the staff included: you put in and query the measurements, but you don't worry about maintaining the building, security, or expanding it when it gets full. AWS operates it for you.
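To make "emit metrics" concrete, here is a minimal sketch of the plain-text exposition format that Prometheus understands. It is illustrative only: the `Metric` class, the `render` function, and the metric and label names are hypothetical, and a real application would normally rely on a Prometheus client library instead of formatting text by hand. How the samples then reach a managed workspace depends on your collector setup, which this section does not cover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Metric:
    """One metric family. Every name used with it here is hypothetical."""
    name: str
    help: str
    kind: str  # "counter" or "gauge"
    # Each sample is a (labels, value) pair.
    samples: List[Tuple[Dict[str, str], float]] = field(default_factory=list)


def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(metrics: List[Metric]) -> str:
    """Render metrics as text in the Prometheus exposition format."""
    lines: List[str] = []
    for metric in metrics:
        # HELP and TYPE comment lines describe the metric family once.
        lines.append(f"# HELP {metric.name} {metric.help}")
        lines.append(f"# TYPE {metric.name} {metric.kind}")
        for labels, value in metric.samples:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            label_part = "{" + pairs + "}" if pairs else ""
            lines.append(f"{metric.name}{label_part} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests_total = Metric(
        name="http_requests_total",
        help="Total HTTP requests.",
        kind="counter",
        samples=[({"method": "GET", "code": "200"}, 1027.0)],
    )
    print(render([requests_total]), end="")
```

Each sample is just a name, a set of labels, and a number. That simplicity is what lets the "warehouse" store millions of them and find them quickly.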

What is Grafana (and Managed Grafana)

Grafana is an open source tool for visualizing data in very powerful, flexible, and attractive dashboards. It is famous for its spectacular graphs and for being able to combine data from many different sources in a single dashboard (from Prometheus, from CloudWatch, from databases...).

Amazon Managed Grafana is the managed version: AWS operates Grafana for you (servers, updates, scaling, security), and you just create and use your dashboards.

   ┌────────────── Grafana Dashboard ───────────────┐
   │  Data from Managed Prometheus  +  CloudWatch   │
   │  + database  +  other sources, TOGETHER        │
   │   📊 powerful and customizable graphs          │
   └────────────────────────────────────────────────┘

Analogy: Grafana is like a professional control panel design studio: it takes data from anywhere and turns it into clear, beautiful, and highly configurable visual screens. The managed version is like hiring that studio "turnkey": you design your panels, but you don't maintain the premises or the equipment.
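As a rough illustration of "many sources in a single dashboard", the sketch below builds a dashboard definition with two panels, each bound to a different data source. Treat it as an approximation: the field names follow the general shape of Grafana's dashboard JSON as best understood here, the data source UIDs, metric names, and queries are hypothetical, and an export from your own workspace is the reliable reference.

```python
import json
from typing import Any, Dict, List


def panel(title: str, source_type: str, source_uid: str,
          target: Dict[str, Any]) -> Dict[str, Any]:
    """Build one time series panel bound to a single data source."""
    return {
        "type": "timeseries",
        "title": title,
        "datasource": {"type": source_type, "uid": source_uid},
        "targets": [{"refId": "A", **target}],
    }


def dashboard(title: str, panels: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Group panels under one dashboard title."""
    return {"title": title, "panels": panels}


def source_types(board: Dict[str, Any]) -> List[str]:
    """List the distinct data source types a dashboard draws from."""
    return sorted({p["datasource"]["type"] for p in board["panels"]})


# Hypothetical UIDs and queries, for illustration only.
unified = dashboard("Service overview", [
    panel("Request rate", "prometheus", "amp-uid",
          {"expr": "sum(rate(http_requests_total[5m]))"}),
    panel("Database CPU", "cloudwatch", "cw-uid",
          {"namespace": "AWS/RDS", "metricName": "CPUUtilization",
           "statistic": "Average"}),
])

if __name__ == "__main__":
    print(source_types(unified))
    print(json.dumps(unified, indent=2))
```

The point is that the data source is chosen per panel, not per dashboard, which is what allows Prometheus and CloudWatch data to sit side by side on one screen.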

How Prometheus and Grafana work together

The classic combination is "Prometheus collects, Grafana visualizes":

Applications → Managed Prometheus (collects and stores metrics)
                        │
                        ▼
              Managed Grafana (visualizes those metrics in dashboards)

Prometheus is the "warehouse of numbers" and Grafana is the "pretty screen" that displays them. Together they form a complete and widely used observability solution in the industry.
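Under the hood, a dashboard panel asks Prometheus for data through its HTTP API, sending a query written in PromQL (Prometheus's query language) together with a time range. The sketch below only builds such a request URL and sends nothing. The base URL and metric name are hypothetical, and a managed workspace exposes its own endpoint and expects signed AWS requests, which this sketch leaves out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_range_query_url(base_url: str, promql: str,
                          start: int, end: int, step_seconds: int) -> str:
    """Build a URL for the Prometheus HTTP API range query endpoint.

    start and end are Unix timestamps in seconds; step_seconds is the
    spacing between the returned data points.
    """
    params = urlencode({
        "query": promql,
        "start": start,
        "end": end,
        "step": step_seconds,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical server and metric: per-second request rate over 5-minute windows.
    url = build_range_query_url(
        "https://prometheus.example.com",
        "rate(http_requests_total[5m])",
        start=1700000000,
        end=1700003600,
        step_seconds=60,
    )
    print(url)
    # Decode the query string again to show what the server would receive.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Grafana issues requests of this kind for you each time a panel refreshes, so in day-to-day use you write only the PromQL expression.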

Why use these managed versions?

The key question: if CloudWatch already exists, why use managed Prometheus and Grafana? Common reasons:

  • Industry standard: Prometheus and Grafana are the de facto standard in many companies, especially with Kubernetes. If your team already knows them or your ecosystem uses them, it makes a lot of sense.
  • Without the pain of operating them: you get these powerful open source tools without having to install or maintain them (AWS does it).
  • Grafana's flexibility: Grafana can combine data from many sources (Prometheus, CloudWatch, other clouds, databases...) in a single dashboard, ideal for multi-cloud or hybrid environments.
  • Portability: since they are standard tools, your investment in dashboards and configuration is portable (this fits the OpenTelemetry philosophy from subchapter 24.4: avoiding lock-in).

Real-world example: a company running its applications on Kubernetes (EKS) already uses Prometheus and Grafana, as is common in that world. Instead of maintaining those systems themselves (with the operational work that entails), they adopt Managed Prometheus and Managed Grafana. They keep exactly the tools their team has mastered and their dashboards work the same, but now AWS takes care of keeping them available and scaled. In Grafana they also combine the Prometheus metrics with some from CloudWatch in a single dashboard, which gives them a unified view. The best of both worlds: standard tools they know, operated by AWS.

CloudWatch vs Prometheus/Grafana: which one?

It's not that one is better; it depends on the context:

                      | CloudWatch                   | Managed Prometheus + Grafana
----------------------+------------------------------+----------------------------------
Origin                | Native to AWS                | Open source (industry standard)
Integration with AWS  | Total and immediate          | Good, but less "native"
Ideal if              | You are focused on AWS and   | You use Kubernetes, multi-cloud,
                      | want the simplest option     | or your team already masters
                      |                              | these tools
Portability           | Tied to AWS                  | High (standard tools)

If you are just starting out and only use AWS, CloudWatch is the most straightforward choice. If you come from the Kubernetes/open source world or work multi-cloud, managed Prometheus + Grafana fit better.

What you should remember

  • There is a very widespread open source observability ecosystem; two key pieces are Prometheus (collects and stores metrics) and Grafana (visualizes them in dashboards), widely used together, especially with Kubernetes.
  • AWS offers managed versions: Amazon Managed Service for Prometheus and Amazon Managed Grafana, where AWS operates the servers, scaling, and maintenance (like any managed service).
  • Prometheus = optimized "measurement warehouse"; Grafana = "dashboard studio" that combines data from many sources into powerful graphs. Classic combination: Prometheus collects, Grafana visualizes.
  • They are used because they are the industry standard (especially with Kubernetes), to avoid the pain of operating them, for Grafana's flexibility with multiple sources, and for their portability (no lock-in, in line with OpenTelemetry).
  • CloudWatch is ideal if you focus on AWS and want simplicity; managed Prometheus + Grafana, if you use Kubernetes/multi-cloud or your team already knows them.

You have completed Chapter 24 and, with it, you master observability in AWS: logs, metrics, alarms, dashboards, distributed tracing, the OpenTelemetry standard, and the managed open source tools! In Chapter 25 we will tackle another crucial aspect of operating in the cloud: cost optimization.

Cloud, AWS & Terraform — From Zero to Expert

© Copyright 2024. All rights reserved