CloudWatch Essentials
Learn the three CloudWatch signals — metrics, logs, and alarms — and how to use them for monitoring and troubleshooting.
This lesson is purely conceptual — no AWS usage required.
What CloudWatch does
CloudWatch is AWS's monitoring and observability service. At its core:
- Metrics tell you how the system is behaving
- Logs tell you what happened
- Alarms tell you when to pay attention
Metrics: the numbers over time
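Metrics are numeric signals collected over time, such as CPU utilization for EC2 or invocation counts and duration for Lambda. CloudWatch groups them by namespace (for example, AWS/EC2 or AWS/Lambda) and identifies each one by its dimensions (for example, an instance ID or a function name).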
Examples:
- EC2: CPU and instance metrics
- Lambda: invocation count, errors, duration
- S3: daily storage metrics by default; request metrics only after you enable them on the bucket
A good habit is to ask: "What number would move if my app were unhealthy?" That number is probably a metric you should watch.
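To make "numbers over time" concrete, here is a minimal sketch in plain Python (no AWS calls). It models a datapoint with a namespace, metric name, and dimensions, then averages the values per fixed period, which is roughly what you see when you pick a period on a CloudWatch graph. The `Datapoint` class, the `average_per_period` helper, the instance ID, and the CPU readings are all made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Datapoint:
    namespace: str      # e.g. "AWS/EC2"
    metric_name: str    # e.g. "CPUUtilization"
    dimensions: tuple   # e.g. (("InstanceId", "i-0example"),)
    timestamp: int      # seconds since the start of observation
    value: float


def average_per_period(points, period_seconds):
    """Group datapoints into fixed-width periods and average each period."""
    buckets = {}
    for p in points:
        start = (p.timestamp // period_seconds) * period_seconds
        buckets.setdefault(start, []).append(p.value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


dims = (("InstanceId", "i-0example"),)  # hypothetical instance ID
cpu = [
    Datapoint("AWS/EC2", "CPUUtilization", dims, t, v)
    for t, v in [(0, 20.0), (60, 30.0), (300, 90.0), (360, 70.0)]
]
averages = average_per_period(cpu, period_seconds=300)
print(averages)  # {0: 25.0, 300: 80.0}
```

The second period's average jumps from 25 to 80. That kind of movement is what "a number that would move if my app were unhealthy" looks like in practice.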
Logs: the event trail
Logs are text records that describe events, requests, failures, messages, and system behavior. In CloudWatch Logs, log groups hold related log streams, and each log stream contains the sequence of log events for a source.
Examples:
- A Lambda function writing execution logs
- An app server writing application logs
- A service writing error messages or audit-like events
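The group, stream, and event hierarchy can be sketched as a small in-memory model. This is only an illustration of the structure described above, not the CloudWatch Logs API. The class names, the `put_event` and `search` helpers, the function name, and the log messages are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LogEvent:
    timestamp: int
    message: str


@dataclass
class LogStream:
    name: str
    events: list = field(default_factory=list)


@dataclass
class LogGroup:
    name: str
    streams: dict = field(default_factory=dict)

    def put_event(self, stream_name, timestamp, message):
        """Append an event to a stream, creating the stream if needed."""
        stream = self.streams.setdefault(stream_name, LogStream(stream_name))
        stream.events.append(LogEvent(timestamp, message))

    def search(self, text):
        """Return (stream name, message) pairs whose message contains text."""
        return [
            (s.name, e.message)
            for s in self.streams.values()
            for e in s.events
            if text in e.message
        ]


group = LogGroup("/aws/lambda/photo-upload")  # hypothetical function name
group.put_event("stream-a", 1, "START request 1")
group.put_event("stream-a", 2, "ERROR: upload timed out")
group.put_event("stream-b", 3, "START request 2")
errors = group.search("ERROR")
print(errors)  # [('stream-a', 'ERROR: upload timed out')]
```

Searching at the group level and getting back the stream name mirrors the usual workflow: start from the log group for your app, then narrow down to the stream where the failure occurred.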
Note
By default, CloudWatch Logs retains log data indefinitely unless you set a retention policy on the log group. Be aware of this for cost management.
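The effect of a retention policy can be illustrated with a simplified simulation. The `apply_retention` function below is hypothetical and only models the idea: with no policy every event is kept, and with a policy of N days anything older than the cutoff becomes eligible for deletion. The real service deletes expired data on its own schedule.

```python
DAY = 86_400  # seconds in a day


def apply_retention(events, now, retention_days=None):
    """Keep events inside the retention window; None keeps everything."""
    if retention_days is None:
        return list(events)
    cutoff = now - retention_days * DAY
    return [(ts, msg) for ts, msg in events if ts >= cutoff]


now = 100 * DAY
events = [(now - 90 * DAY, "old"), (now - 10 * DAY, "recent"), (now, "new")]

kept_forever = apply_retention(events, now)                 # no policy set
kept_30d = apply_retention(events, now, retention_days=30)  # 30-day policy

print(len(kept_forever), [msg for _, msg in kept_30d])  # 3 ['recent', 'new']
```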
Alarms: the "tell me when this crosses the line" layer
A CloudWatch alarm watches a metric or metric math expression and changes state based on whether the value crosses your threshold over one or more evaluation periods. Alarms can trigger actions such as notifications.
Examples:
- Alert me if EC2 CPU stays too high
- Alert me if Lambda errors increase
- Alert me if a queue depth grows too much
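Here is a simplified sketch of how an alarm state could be derived from recent metric values. It assumes the alarm fires only when every one of the last `evaluation_periods` values is above the threshold. The state names OK, ALARM, and INSUFFICIENT_DATA are the ones CloudWatch uses, but the `alarm_state` function and the sample values are made up, and real alarms offer more options (for example, other comparison operators and rules for missing data).

```python
def alarm_state(values, threshold, evaluation_periods):
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for the latest periods."""
    if len(values) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = values[-evaluation_periods:]
    # Simplification: every recent period must breach the threshold.
    return "ALARM" if all(v > threshold for v in recent) else "OK"


sustained = alarm_state([50, 85, 90, 95], threshold=80, evaluation_periods=3)
brief_dip = alarm_state([50, 85, 70, 95], threshold=80, evaluation_periods=3)
too_little = alarm_state([95], threshold=80, evaluation_periods=3)

print(sustained, brief_dip, too_little)  # ALARM OK INSUFFICIENT_DATA
```

Requiring several consecutive breaching periods is what turns "CPU is high" into "CPU stays too high": a single spike does not change the state.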
Tip
Do not alarm on everything. Alarm on things that need action: an alarm is most useful when crossing its threshold means someone should respond.
What CloudWatch often shows automatically
Some AWS services publish useful metrics automatically:
- Lambda automatically sends invocation-related metrics to CloudWatch with no extra setup needed
- EC2 sends instance metrics by default using basic monitoring at 5-minute intervals; detailed monitoring at 1-minute intervals is optional, must be enabled per instance, and costs extra
This is why you will often see something in CloudWatch right away for Lambda and EC2.
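The difference between 5-minute and 1-minute intervals is easy to work out by hand, and a short sketch shows why it can matter. The CPU readings below are made up, and the averaging is only an illustration of how a coarser period can smooth over a short spike.

```python
from statistics import mean


def datapoints_per_hour(period_seconds):
    """How many datapoints one metric produces per hour at a given period."""
    return 3600 // period_seconds


basic_per_hour = datapoints_per_hour(300)    # 5-minute periods
detailed_per_hour = datapoints_per_hour(60)  # 1-minute periods

# Made-up CPU readings, one per minute, with a single short spike.
one_minute_cpu = [10.0, 10.0, 100.0, 10.0, 10.0]
five_minute_average = mean(one_minute_cpu)

print(basic_per_hour, detailed_per_hour)         # 12 60
print(max(one_minute_cpu), five_minute_average)  # 100.0 28.0
```

At 1-minute resolution the spike to 100% is visible. Averaged over 5 minutes, the same data reads as 28%.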
What usually needs extra setup
Not everything appears automatically.
For EC2 application logs and more detailed system metrics like memory and disk usage, you typically use the CloudWatch agent. The agent can collect metrics, logs, and traces from EC2, on-prem servers, and containers.
Key rule:
- EC2 built-in metrics (such as CPU, network, and status checks): available by default
- EC2 app logs and richer host metrics: usually require the agent or additional setup
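For a sense of what "additional setup" looks like, the sketch below builds a small CloudWatch agent configuration as a Python dictionary and prints it as JSON. The key names follow the agent's configuration file format as best I understand it, so check them against the agent documentation before use. The log file path and log group name are hypothetical.

```python
import json

# Sketch of an agent configuration: memory and disk metrics plus one log file.
agent_config = {
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",  # hypothetical path
                        "log_group_name": "myapp-logs",         # hypothetical name
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_text = json.dumps(agent_config, indent=2)
print(config_text)
```

The two top-level sections map directly onto the key rule above: `metrics` covers the richer host metrics, and `logs` covers the application log files.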
The easiest mental model for troubleshooting
When something breaks, check in this order:
- Metrics: Is something clearly high, low, or missing?
- Logs: What exact error or event happened?
- Alarms: Did the system already warn me?
Examples:
- High CPU on EC2 → check metrics first, then inspect app/system logs
- Lambda errors spike → check Lambda metrics first, then read the function logs in CloudWatch Logs
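The check order above can be written down as a tiny triage routine. This is a hypothetical helper that works on plain Python data, not an AWS integration. The metric names, limits, log lines, and alarm name are invented for the example.

```python
def triage(metrics, log_lines, alarms):
    """Collect findings in the order: metrics, then logs, then alarms."""
    findings = []

    # 1. Metrics: is something clearly high, low, or missing?
    for name, (value, low, high) in metrics.items():
        if value is None:
            findings.append(f"metric {name}: missing")
        elif value > high:
            findings.append(f"metric {name}: high ({value} > {high})")
        elif value < low:
            findings.append(f"metric {name}: low ({value} < {low})")

    # 2. Logs: what exact error or event happened?
    findings.extend(f"log: {line}" for line in log_lines if "ERROR" in line)

    # 3. Alarms: did the system already warn me?
    findings.extend(
        f"alarm {name}: in ALARM state"
        for name, state in alarms.items()
        if state == "ALARM"
    )
    return findings


findings = triage(
    metrics={"Errors": (12, 0, 5), "Duration": (850, 0, 3000)},  # (value, low, high)
    log_lines=["START request", "ERROR: access denied writing object"],
    alarms={"upload-errors": "ALARM"},
)
for finding in findings:
    print(finding)
```

With this made-up data the routine reports the high error count first, then the matching log line, then the alarm that was already firing.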
Micro-activity 2: First troubleshooting drill
A learner says: "My photo upload app is failing sometimes."
Summary
CloudWatch helps you monitor AWS systems with metrics, logs, and alarms. Metrics show numeric behavior over time, logs show event details, and alarms help you react when thresholds are crossed.
Lambda is often the easiest place to see CloudWatch in action because it automatically publishes function metrics. EC2 often needs extra setup for rich logs and host-level metrics through the CloudWatch agent.
CloudWatch Logs uses log groups and log streams, and log retention is configurable per log group. If you do not set a retention policy, logs are kept indefinitely.