CloudWatch Essentials

Learn the three CloudWatch signals — metrics, logs, and alarms — and how to use them for monitoring and troubleshooting.

15 min
Introductory
No AWS account needed (free)

This lesson is purely conceptual — no AWS usage required.

What CloudWatch does

CloudWatch is AWS's monitoring and observability service. At its core are three signals:
  • Metrics tell you how the system is behaving
  • Logs tell you what happened
  • Alarms tell you when to pay attention

Metrics: the numbers over time

Metrics are numeric signals collected over time, such as CPU utilization for EC2 or invocation counts and duration for Lambda. In CloudWatch, metrics are organized by namespace (for example, AWS/EC2 or AWS/Lambda) and by dimension (for example, an instance ID or a function name).

Examples:

  • EC2: CPU utilization, network traffic, status checks
  • Lambda: invocation count, errors, duration
  • S3: daily storage metrics by default; request metrics only when enabled on the bucket

A good habit is to ask: "What number would move if my app were unhealthy?" That number is probably a metric you should watch.
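
To make "namespace" and "dimension" concrete, here is a minimal sketch of what a single custom metric datapoint could look like. The shape follows the CloudWatch PutMetricData request; the namespace `PhotoApp`, the metric `FailedUploads`, and the helper `build_metric_request` are made up for illustration. The code only builds the request, so it runs without an AWS account.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, metric_name, dimensions, value, unit="None"):
    """Build keyword arguments shaped like a CloudWatch PutMetricData request."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                # Dimensions are name/value pairs that identify one series.
                "Dimensions": [
                    {"Name": name, "Value": val} for name, val in dimensions.items()
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }


# Hypothetical custom metric for a photo upload app.
request = build_metric_request(
    namespace="PhotoApp",
    metric_name="FailedUploads",
    dimensions={"Environment": "dev"},
    value=3,
    unit="Count",
)
print(request["Namespace"], request["MetricData"][0]["MetricName"])

# With boto3 installed and AWS credentials configured, this could be sent with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
```

AWS services publish their own metrics in the same shape, just under namespaces such as AWS/EC2 or AWS/Lambda.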


Logs: the event trail

Logs are text records that describe events, requests, failures, messages, and system behavior. In CloudWatch Logs, log groups hold related log streams, and each log stream contains the sequence of log events for a source.

Examples:

  • A Lambda function writing execution logs
  • An app server writing application logs
  • A service writing error messages or audit-like events
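
The group, stream, and event hierarchy can be pictured as nested data. This is only an in-memory model for intuition, not how you query CloudWatch Logs; the log group name, stream name, messages, and the helper `events_containing` are all hypothetical.

```python
# Log group -> log streams -> ordered log events (timestamp in ms, message).
log_groups = {
    "/aws/lambda/photo-upload": {          # hypothetical log group
        "2024/01/01/[$LATEST]abc123": [    # hypothetical log stream
            {"timestamp": 1704067200000, "message": "START RequestId: 1"},
            {"timestamp": 1704067200450, "message": "ERROR upload failed"},
            {"timestamp": 1704067200500, "message": "END RequestId: 1"},
        ],
    },
}


def events_containing(groups, group_name, text):
    """Return messages in one log group that contain `text`, oldest first."""
    matches = []
    for stream_events in groups[group_name].values():
        matches.extend(e for e in stream_events if text in e["message"])
    return [e["message"] for e in sorted(matches, key=lambda e: e["timestamp"])]


print(events_containing(log_groups, "/aws/lambda/photo-upload", "ERROR"))
```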

Note

By default, CloudWatch Logs retains log data indefinitely unless you set a retention policy on the log group. Stored logs keep adding to your storage cost, so set a retention period on any log group you do not need to keep forever.
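
A retention policy is just a log group name plus a number of days. The sketch below builds the arguments for the CloudWatch Logs PutRetentionPolicy call without contacting AWS. The log group name and the helper `retention_request` are hypothetical, and the list of accepted values is an assumption to confirm against the API reference.

```python
# Assumed list of retention periods (in days) that CloudWatch Logs accepts.
# Confirm against the PutRetentionPolicy API reference before relying on it.
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def retention_request(log_group_name, days):
    """Build keyword arguments for a PutRetentionPolicy call."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    return {"logGroupName": log_group_name, "retentionInDays": days}


request = retention_request("/aws/lambda/photo-upload", 30)
print(request)

# With boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("logs").put_retention_policy(**request)
```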


Alarms: the "tell me when this crosses the line" layer

A CloudWatch alarm watches a metric or metric math expression and changes state (OK, ALARM, or INSUFFICIENT_DATA) based on whether the value crosses your threshold over one or more evaluation periods. Alarms can trigger actions, such as sending a notification through Amazon SNS.

Examples:

  • Alert me if EC2 CPU stays too high
  • Alert me if Lambda errors increase
  • Alert me if a queue depth grows too much
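
The idea of "crosses the threshold over one or more evaluation periods" can be sketched in a few lines. This is a deliberately simplified model, not CloudWatch's actual evaluation logic: it requires every recent period to breach, and it treats any missing datapoint as insufficient data. The function name and the sample values are made up.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return a simplified alarm state for the most recent periods.

    `datapoints` holds one value per period, oldest first; None means the
    period had no data. `evaluation_periods` is assumed to be positive.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


cpu_percent = [45.0, 62.0, 81.0, 88.0, 93.0]  # made-up samples, one per period
print(evaluate_alarm(cpu_percent, threshold=80.0, evaluation_periods=3))
print(evaluate_alarm(cpu_percent, threshold=80.0, evaluation_periods=4))
```

With three evaluation periods the last three samples all exceed 80, so the sketch reports ALARM. With four, the 62.0 sample falls inside the window and the state stays OK, which is why a longer window makes an alarm less jumpy.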

Tip

Do not alarm on everything. Alarm on things that need action. CloudWatch alarms are most useful when a threshold actually tells you something meaningful.


What CloudWatch often shows automatically

Some AWS services publish useful metrics automatically:

  • Lambda automatically sends invocation-related metrics to CloudWatch with no extra setup needed
  • EC2 sends instance metrics by default using basic monitoring at 5-minute intervals; detailed monitoring at 1-minute intervals is optional and must be enabled per instance

This is why you will often see something in CloudWatch right away for Lambda and EC2.


What usually needs extra setup

Not everything appears automatically.

For EC2 application logs and more detailed system metrics like memory and disk usage, you typically use the CloudWatch agent. The agent can collect metrics, logs, and traces from EC2, on-prem servers, and containers.

Key rule:

  • EC2 built-in metrics such as CPU and network: published by default
  • EC2 app logs and richer host metrics such as memory and disk usage: usually require the agent or additional setup
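
For a sense of what "extra setup" means, the CloudWatch agent reads a JSON configuration file. The sketch below builds a small example of that JSON in Python and prints it. The file path and log group name are hypothetical, and a real configuration would likely need more settings than shown here.

```python
import json

# Hypothetical file path and log group name; adjust for a real host.
agent_config = {
    "metrics": {
        "metrics_collected": {
            # Host-level metrics that EC2 does not publish on its own.
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/photo-app/app.log",
                        "log_group_name": "photo-app",
                        # The agent replaces {instance_id} with the EC2 instance ID.
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```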

The easiest mental model for troubleshooting

When something breaks, check in this order:

  1. Metrics: Is something clearly high, low, or missing?
  2. Logs: What exact error or event happened?
  3. Alarms: Did the system already warn me?

Examples:

  • High CPU on EC2 → check metrics first, then inspect app/system logs
  • Lambda errors spike → check Lambda metrics first, then read the function logs in CloudWatch Logs
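
The "metrics first, then logs" habit can be sketched with plain data: find when the error count first spikes, then read only the log lines near that moment. All values below are made up, and the helpers `first_spike` and `logs_near` are hypothetical names, not CloudWatch APIs.

```python
def first_spike(error_counts, threshold):
    """Return the timestamp of the first period whose count exceeds threshold."""
    for timestamp, count in error_counts:
        if count > threshold:
            return timestamp
    return None


def logs_near(log_events, center, window_seconds):
    """Return log messages within `window_seconds` of `center`."""
    return [
        message
        for timestamp, message in log_events
        if abs(timestamp - center) <= window_seconds
    ]


# Made-up data: (seconds since start, value) pairs.
error_counts = [(0, 0), (60, 1), (120, 9), (180, 7)]
log_events = [
    (30, "INFO upload ok"),
    (115, "ERROR timeout writing to storage"),
    (125, "ERROR timeout writing to storage"),
    (400, "INFO upload ok"),
]

# Step 1: metrics tell you when something went wrong.
spike_at = first_spike(error_counts, threshold=5)

# Step 2: logs from around that time tell you what went wrong.
if spike_at is not None:
    for line in logs_near(log_events, spike_at, window_seconds=30):
        print(line)
```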

Quick comparison

What it answers:

  • Metric: "How is this behaving over time?"
  • Log: "What happened?"
  • Alarm: "When should I be notified or take action?"

Example:

  • Metric: CPU usage, error count, duration
  • Log: stack trace, request log, error line
  • Alarm: CPU above threshold, errors above threshold

Micro-activity 1: Sort the signal

CPUUtilization = 82% — is this a metric, log, or alarm?


Micro-activity 2: First troubleshooting drill

A learner says: "My photo upload app is failing sometimes."

What would you check first?


Summary

CloudWatch helps you monitor AWS systems with metrics, logs, and alarms. Metrics show numeric behavior over time, logs show event details, and alarms help you react when thresholds are crossed.

Lambda is often the easiest place to see CloudWatch in action because it automatically publishes function metrics. EC2 often needs extra setup, typically the CloudWatch agent, for application logs and host-level metrics such as memory and disk usage.

CloudWatch Logs uses log groups and log streams, and log retention is configurable per log group. If you do not change retention, logs are stored indefinitely by default.


Quiz

What is the best description of a metric?