Add Basic Alarms and Health Checks for the Dashboard
Learn how to monitor your serverless application using CloudWatch Alarms for errors and latency, and implement a simple health check endpoint.
Core services are free, but some optional features may incur small costs.
AWS Services Used
CloudWatch Synthetics Canary is optional and costs $0.0012 per run. You can skip it without missing the core lesson.
Learning Outcomes
By the end of this lesson, you will be able to:
- Explain the difference between a metric, an alarm, and a health check.
- Add practical CloudWatch alarms for your Lambda functions and API.
- Choose sensible first alarm targets for serverless workloads.
- Add a simple /health endpoint to your API.
- Explain the role of external synthetic monitoring.
Why This Lesson Matters
Your app now works, is secured, and is deployed across environments. The next hardening step is making sure you notice problems before your users do.
A good monitoring setup has three layers:
- Metrics: Data points telling you what changed (e.g., "5 errors occurred").
- Alarms: Notifications triggered when metrics cross a threshold (e.g., "Alert me if errors > 0").
- Health Checks: Active tests that confirm your app is actually reachable from the outside world.
The Core Idea
CloudWatch alarms watch metrics and change state when thresholds are crossed. Because Lambda and API Gateway send metrics to CloudWatch automatically, setting up basic alerts is a "low-effort, high-reward" hardening step.
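To make the "change state when thresholds are crossed" idea concrete, here is a deliberately simplified model of alarm evaluation. It is a sketch for intuition only: the function name `alarm_state` is made up for this lesson, and real CloudWatch evaluation has more options (missing-data handling, "M out of N" datapoints) than shown here.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int = 1,
) -> str:
    """Simplified model of a 'greater than threshold' alarm.

    Returns one of the three CloudWatch alarm state names.
    """
    # Not enough data yet to judge the metric.
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"

    # Only the most recent periods count toward the decision.
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


if __name__ == "__main__":
    # Error counts per minute for a hypothetical function.
    print(alarm_state([0, 0, 3], threshold=0))
    print(alarm_state([3, 0], threshold=0))
```

In this model, a single error in the latest period is enough to move an "Errors > 0" alarm into the ALARM state, and a clean period afterwards moves it back to OK.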
What to Monitor First
1) Lambda Errors
Lambda automatically publishes invocation metrics. Monitoring Errors is the single most important first step for any backend function.
2) API Gateway 5XX Errors
HTTP APIs publish a 5xx metric automatically (REST APIs publish the same signal as 5XXError). A 5xx response usually indicates a server-side crash or integration failure, making it more urgent than a 4xx response (which might just be a user typo).
3) Lambda Duration
A function can be "working" but becoming dangerously slow. Monitoring duration helps you spot performance regressions or functions nearing their timeout limit.
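One possible way to pick a first Duration threshold is as a share of the function's configured timeout. The sketch below assumes 80% as the warning level, a hypothetical function name, and three consecutive one-minute periods before alarming; all three are illustrative choices, not recommendations from AWS. Lambda reports Duration in milliseconds, so the threshold is converted accordingly.

```python
def duration_threshold_ms(timeout_seconds: int, percent: int = 80) -> int:
    """Return an alarm threshold in milliseconds as a share of the timeout."""
    return timeout_seconds * 1000 * percent // 100


def duration_alarm_params(function_name: str, timeout_seconds: int) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{function_name}-duration-high",
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        # Maximum catches the slowest invocation in each period.
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": duration_threshold_ms(timeout_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no invocations are treated as healthy.
        "TreatMissingData": "notBreaching",
    }


if __name__ == "__main__":
    # "dashboard-upload" and the 30-second timeout are placeholder values.
    print(duration_alarm_params("dashboard-upload", timeout_seconds=30))
```

The resulting dictionary can be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)` once you are happy with the numbers.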
Part 1: Create a Lambda Errors Alarm
In the CloudWatch console:
- Go to Alarms > All alarms and choose Create alarm.
- Choose Select metric and navigate to Lambda > By Function Name.
- Select the Errors metric for your Upload or Delete function.
- Set the threshold to Static > Greater than 0.
- Configure an action (like an SNS topic to email you) or skip for now to just see the state change in the console.
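If you prefer to script the console steps above, a minimal sketch using boto3 might look like this. The function name `dashboard-upload` and the alarm naming scheme are assumptions for illustration; the SNS topic is optional, matching the "skip for now" choice in the last step.

```python
from typing import Optional


def lambda_errors_alarm_params(
    function_name: str,
    sns_topic_arn: Optional[str] = None,
) -> dict:
    """Build put_metric_alarm arguments for an 'Errors > 0' alarm."""
    params = {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        # Sum counts every failed invocation within the period.
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # No invocations means no errors, so stay in OK.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params: dict) -> None:
    """Send the alarm definition to CloudWatch (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Prints the definition only; call create_alarm(...) to apply it.
    print(lambda_errors_alarm_params("dashboard-upload"))
```

Keeping the parameter builder separate from the AWS call makes it easy to review the alarm definition before anything is created in your account.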
Part 2: Create an API Gateway 5XX Alarm
Repeat the alarm process, but select the 5xx metric for your HTTP API (REST APIs publish the same signal as 5XXError).
- Threshold: 5xx >= 1.
- Period: 1 minute.
This catches integration failures, such as when API Gateway lacks permission to invoke your Lambda function.
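The same alarm can be expressed as parameters for CloudWatch's `put_metric_alarm` call. This sketch assumes an HTTP API, which reports the metric as `5xx` under the `ApiId` dimension; for a REST API you would pass `5XXError` and use the `ApiName` dimension instead. The API ID shown is a placeholder.

```python
def api_5xx_alarm_params(api_id: str, metric_name: str = "5xx") -> dict:
    """Build put_metric_alarm arguments for an HTTP API server-error alarm."""
    return {
        "AlarmName": f"{api_id}-server-errors",
        "Namespace": "AWS/ApiGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ApiId", "Value": api_id}],
        "Statistic": "Sum",
        # One-minute period with a threshold of one or more errors.
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Minutes with no traffic are treated as healthy.
        "TreatMissingData": "notBreaching",
    }


if __name__ == "__main__":
    # "a1b2c3d4e5" is a placeholder API ID.
    print(api_5xx_alarm_params("a1b2c3d4e5"))
```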
Part 3: Add a Lightweight /health Endpoint
Metrics tell you what's happening inside AWS. A health endpoint lets you verify the app is reachable from the outside.
Create a new route: GET /health. Back it with a tiny Lambda function:
import json


def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "ok", "environment": "prod"})
    }
This endpoint confirms that API Gateway is alive, the route is mapped, and Lambda can execute.
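Since the app is deployed across several environments, one possible variation reads the environment name from a Lambda environment variable instead of hardcoding it. The variable name `APP_ENVIRONMENT` is an assumption for this sketch; use whatever your deployment already sets. The block also shows how to call the handler locally before deploying.

```python
import json
import os


def lambda_handler(event, context):
    """Return a small JSON payload proving the function can execute."""
    # APP_ENVIRONMENT is a hypothetical variable name; default to "unknown".
    environment = os.environ.get("APP_ENVIRONMENT", "unknown")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "ok", "environment": environment}),
    }


if __name__ == "__main__":
    # Local smoke test: no AWS account needed.
    os.environ["APP_ENVIRONMENT"] = "dev"
    response = lambda_handler({}, None)
    print(response["statusCode"], response["body"])
```

This way the same code can be deployed to every stage, and the response tells you which one you actually reached.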
Part 4: Optional - CloudWatch Synthetics
Once your /health endpoint is live, you can add a Canary. Canaries are scripts that run on a schedule (e.g., every 5 minutes) to "ping" your API. They simulate a real user experience and alert you if the endpoint stops responding, even if no real users are active.
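If you skip the Canary, you can still see what an external probe does with a few lines of standard-library Python. This is not the Synthetics API, just a hand-rolled stand-in that performs the same kind of check; the URL is a placeholder, and the probe assumes the `{"status": "ok"}` body returned by the /health handler in Part 3.

```python
import json
import urllib.request
from typing import Callable, Tuple


def fetch(url: str, timeout: float) -> Tuple[int, str]:
    """Perform a GET request and return (status code, response body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def is_healthy(
    url: str,
    timeout: float = 5.0,
    fetcher: Callable[[str, float], Tuple[int, str]] = fetch,
) -> bool:
    """Return True only if the endpoint answers 200 with status 'ok'."""
    try:
        status, body = fetcher(url, timeout)
        payload = json.loads(body)
    except (OSError, ValueError):
        # Network failures, HTTP errors, timeouts, and invalid JSON
        # all count as unhealthy.
        return False
    return (
        status == 200
        and isinstance(payload, dict)
        and payload.get("status") == "ok"
    )


if __name__ == "__main__":
    # Placeholder URL: replace with your API's invoke URL.
    print(is_healthy("https://example.invalid/health"))
```

Running a script like this on a schedule from outside AWS approximates what a Canary does for you, minus the managed scheduling, screenshots, and built-in alarms.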
Lab Checklist
| Step | Success Condition |
|---|---|
| Lambda Alarm | Alarm exists for Errors > 0 |
| API Alarm | Alarm exists for 5xx > 0 |
| Health Route | GET /health returns a 200 OK |
| Test Alarm | Manually trigger an error and see the alarm turn red |
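For the last checklist row, one low-risk way to trigger an error on purpose is a temporary switch in a test function. An unhandled exception in a Lambda handler is counted in the Errors metric, so invoking this sketch with the flag set should move the alarm into the ALARM state within a period or two. The `force_error` key is a made-up convention for this lab; remove it once you have seen the alarm fire.

```python
def lambda_handler(event, context):
    """Fail deliberately when the test event asks for it."""
    # "force_error" is a hypothetical flag used only for this lab.
    if event.get("force_error"):
        raise RuntimeError("Deliberate failure to test the Errors alarm")
    return {"statusCode": 200, "body": "ok"}


if __name__ == "__main__":
    # Normal invocation succeeds.
    print(lambda_handler({}, None))
    # Flagged invocation raises, which Lambda records as an error.
    try:
        lambda_handler({"force_error": True}, None)
    except RuntimeError as exc:
        print(f"Raised as expected: {exc}")
```

In the Lambda console, you can send `{"force_error": true}` as a test event to do the same thing against the deployed function.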
Micro-activity 1: Monitoring Strategy
Think about it
Plan your monitoring strategy. For your critical function, API endpoint, and performance targets, which metrics would you alarm on, and what thresholds would you set? Think about Errors, 5xx, and Duration as starting points.
Summary
In this lesson, you added "eyes" to your dashboard. By monitoring Lambda errors and API failures, and providing a public health signal, you've moved from "hoping it works" to "knowing it works." This observability is a core pillar of operational excellence in the cloud.