
Add Basic Alarms and Health Checks for the Dashboard

Learn how to monitor your serverless application using CloudWatch Alarms for errors and latency, and implement a simple health check endpoint.

30 min
Introductory
Mostly Free Tier (free with caveats)

Core services are free, but some optional features may incur small costs.

AWS Services Used

  • CloudWatch Alarms: 10 alarms always free
  • CloudWatch Metrics: always free for AWS services

CloudWatch Synthetics Canary is optional and costs $0.0012 per run. You can skip it without missing the core lesson.

Learning Outcomes

By the end of this lesson, you will be able to:

  1. Explain the difference between a metric, an alarm, and a health check.
  2. Add practical CloudWatch alarms for your Lambda functions and API.
  3. Choose sensible first alarm targets for serverless workloads.
  4. Add a simple /health endpoint to your API.
  5. Explain the role of external synthetic monitoring.

Why This Lesson Matters

Your app now works, is secured, and is deployed across environments. The next hardening step is making sure you notice problems before your users do.

A good monitoring setup has three layers:

  • Metrics: Data points telling you what changed (e.g., "5 errors occurred").
  • Alarms: Notifications triggered when metrics cross a threshold (e.g., "Alert me if errors > 0").
  • Health Checks: Active tests that confirm your app is actually reachable from the outside world.

The Core Idea

CloudWatch alarms watch metrics and change state when thresholds are crossed. Because Lambda and API Gateway send metrics to CloudWatch automatically, setting up basic alerts is a "low-effort, high-reward" hardening step.
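To make the "change state when thresholds are crossed" idea concrete, here is a minimal pure-Python sketch of the decision an alarm makes. It is a simplification and not how CloudWatch is implemented: real alarms also support "M out of N" datapoints and configurable handling of missing data. The function name `alarm_state` is hypothetical; the three state names match the ones CloudWatch displays.

```python
from typing import Optional, Sequence


def alarm_state(datapoints: Sequence[Optional[float]], threshold: float) -> str:
    """Return a simplified alarm state for one evaluation window.

    A value of None stands for a period in which no data was reported.
    """
    present = [value for value in datapoints if value is not None]
    if not present:
        # Nothing to evaluate, so the alarm cannot decide either way.
        return "INSUFFICIENT_DATA"
    if any(value > threshold for value in present):
        # At least one datapoint crossed the threshold.
        return "ALARM"
    return "OK"


if __name__ == "__main__":
    print(alarm_state([0, 0, 0], threshold=0))
    print(alarm_state([0, 2, 0], threshold=0))
    print(alarm_state([None, None], threshold=0))
```

With a threshold of 0, a single error in any period is enough to move the alarm out of the OK state, which is why "errors > 0" is a common first alarm.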

[Figure: Serverless Monitoring Architecture]

What to Monitor First

1) Lambda Errors

Lambda automatically publishes invocation metrics. Monitoring Errors is the single most important first step for any backend function.

2) API Gateway 5XX Errors

HTTP APIs publish a 5xx metric automatically (REST APIs publish the same signal under the name 5XXError). A 5XX error usually indicates a server-side crash or integration failure, making it more urgent than a 4XX error (which might just be a user typo).

3) Lambda Duration

A function can be "working" but becoming dangerously slow. Monitoring duration helps you spot performance regressions or functions nearing their timeout limit.
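One way to pick a first Duration threshold is to alarm when a function uses most of its configured timeout. The sketch below assumes an 80% cutoff, which is an illustrative choice rather than an AWS recommendation, and the helper name `duration_alarm_threshold_ms` is hypothetical. Lambda reports the Duration metric in milliseconds, while timeouts are configured in seconds, so the helper converts between the two.

```python
def duration_alarm_threshold_ms(timeout_seconds: float, fraction: float = 0.8) -> float:
    """Return a Duration alarm threshold in milliseconds.

    timeout_seconds: the function's configured timeout.
    fraction: share of the timeout that should trigger the alarm (assumed 0.8).
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    # Convert seconds to milliseconds, then take the chosen share of it.
    return timeout_seconds * 1000 * fraction


if __name__ == "__main__":
    # A function with a 10-second timeout, alarming at 80% of that budget.
    print(duration_alarm_threshold_ms(10))
```

A threshold derived from the timeout catches functions that are about to fail; a tighter value based on your own latency target is better for spotting regressions.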


Part 1: Create a Lambda Errors Alarm

In the CloudWatch console:

  1. Go to Alarms > All alarms and choose Create alarm.
  2. Choose Select metric and navigate to Lambda > By Function Name.
  3. Select the Errors metric for your Upload or Delete function.
  4. Set the threshold to Static > Greater than 0.
  5. Configure an action (like an SNS topic to email you) or skip for now to just see the state change in the console.
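The same alarm can be created from code instead of the console. The sketch below builds the arguments for the CloudWatch `put_metric_alarm` call in boto3. The function name `upload-function`, the alarm naming scheme, the 60-second period, and the choice to treat missing data as not breaching are all assumptions for illustration; adjust them to match your own setup.

```python
from typing import Any, Dict, Optional


def lambda_errors_alarm_params(
    function_name: str, sns_topic_arn: Optional[str] = None
) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for a Lambda "Errors > 0" alarm."""
    params: Dict[str, Any] = {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",           # total errors per period
        "Period": 60,                 # seconds (assumed)
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: a quiet function (no invocations) should not alarm.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Optional notification target, e.g. an SNS topic that emails you.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Prints the request only; call create_alarm(...) to actually create it.
    print(lambda_errors_alarm_params("upload-function"))
```

Keeping the parameter builder separate from the API call makes it easy to review the alarm definition, or reuse it for the Delete function, before anything is created in your account.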

Part 2: Create an API Gateway 5XX Alarm

Repeat the alarm process, but select the 5xx metric for your HTTP API under the ApiGateway namespace. (If you built a REST API instead, the equivalent metric is named 5XXError.)

  • Threshold: 5xx >= 1.
  • Period: 1 minute.

This catches integration failures, such as when your API Gateway doesn't have the right permissions to invoke your Lambda.
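As a rough sketch, the API alarm above could be expressed as arguments for boto3's `put_metric_alarm`. The API ID `abc123` is a placeholder and the alarm name is a hypothetical convention. The default metric name `5xx` and the `ApiId` dimension apply to HTTP APIs; for a REST API you would likely pass `5XXError` and use the `ApiName` dimension instead.

```python
from typing import Any, Dict


def api_5xx_alarm_params(api_id: str, metric_name: str = "5xx") -> Dict[str, Any]:
    """Build put_metric_alarm arguments for an API Gateway server-error alarm."""
    return {
        "AlarmName": f"{api_id}-server-errors",  # hypothetical naming scheme
        "Namespace": "AWS/ApiGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ApiId", "Value": api_id}],
        "Statistic": "Sum",
        "Period": 60,                 # 1 minute, as in the settings above
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # 5xx >= 1
        # Assumption: periods without traffic should not trigger the alarm.
        "TreatMissingData": "notBreaching",
    }


if __name__ == "__main__":
    params = api_5xx_alarm_params("abc123")  # placeholder API ID
    print(params)
    # To create the alarm (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```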


Part 3: Add a Lightweight /health Endpoint

Metrics tell you what's happening inside AWS. A health endpoint lets you verify the app is reachable from the outside.

Create a new route: GET /health. Back it with a tiny Lambda function:

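import json

def lambda_handler(event, context):
    """Return a static 200 response so callers can confirm the API is reachable."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # "prod" is hardcoded; change it if you deploy this function to other environments.
        "body": json.dumps({"status": "ok", "environment": "prod"})
    }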

This endpoint confirms that API Gateway is alive, the route is mapped, and Lambda can execute.
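Because the app is deployed across several environments, one possible variation reads the environment name from a Lambda environment variable instead of hardcoding it. The variable name `ENVIRONMENT` is an assumption; use whatever name your deployment already sets. The block also shows how the handler can be smoke-tested locally without any AWS resources.

```python
import json
import os


def lambda_handler(event, context):
    """Health check that reports which environment it is running in."""
    # ENVIRONMENT is a hypothetical variable name; set it per stage
    # in the function's configuration.
    environment = os.environ.get("ENVIRONMENT", "unknown")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "ok", "environment": environment}),
    }


if __name__ == "__main__":
    # Local smoke test: call the handler directly with an empty event.
    response = lambda_handler({}, None)
    print(response["statusCode"], response["body"])
```

The handler deliberately avoids touching databases or other services, so a failing health check points at API Gateway, routing, or Lambda itself rather than at a downstream dependency.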


Part 4: Optional - CloudWatch Synthetics

Once your /health endpoint is live, you can add a Canary. Canaries are scripts that run on a schedule (e.g., every 5 minutes) to "ping" your API. They simulate a real user experience and alert you if the endpoint stops responding, even if no real users are active.
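To illustrate what such a check does on each run, here is a standard-library stand-in. It is not a CloudWatch Synthetics canary script and does not use the Synthetics runtime; it only shows the core logic of requesting /health and deciding pass or fail. The URL in the example is a placeholder, and the function names are hypothetical.

```python
import json
import urllib.request


def response_is_healthy(status_code: int, body: str) -> bool:
    """Pass only on HTTP 200 with a JSON object containing status == "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str, timeout_seconds: float = 5.0) -> bool:
    """Request the health endpoint once; any failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            body = response.read().decode("utf-8")
            return response_is_healthy(response.status, body)
    except (OSError, ValueError):
        # OSError covers network errors, timeouts, and HTTP error statuses
        # raised by urllib; ValueError covers malformed URLs or bodies.
        return False


if __name__ == "__main__":
    # Placeholder URL: replace with your API's invoke URL.
    print(probe("https://example.invalid/health"))
```

Treating timeouts and connection errors as failures matters here: an endpoint that never answers produces no 5XX metric at all, which is exactly the gap an external probe is meant to cover.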


Lab Checklist

Step           Success Condition
Lambda Alarm   Alarm exists for Errors > 0
API Alarm      Alarm exists for 5xx errors > 0
Health Route   GET /health returns a 200 OK
Test Alarm     Manually trigger an error and see the alarm turn red
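For the last checklist item, one simple way to trigger an error on purpose is a handler that raises when the test event carries a flag. Lambda counts an unhandled exception toward the Errors metric, so a single flagged invocation should be enough to move an "Errors > 0" alarm into the alarm state after the next evaluation. The `force_error` key is a hypothetical name, and this should be deployed to a throwaway or test function rather than left in production code.

```python
def lambda_handler(event, context):
    """Succeed normally, but fail on demand so the Errors alarm can be tested."""
    # "force_error" is a hypothetical flag used only for this test.
    if isinstance(event, dict) and event.get("force_error"):
        # An unhandled exception is recorded by Lambda as a function error.
        raise RuntimeError("Intentional failure to test the Errors alarm")
    return {"statusCode": 200, "body": "ok"}


if __name__ == "__main__":
    print(lambda_handler({}, None))
    try:
        lambda_handler({"force_error": True}, None)
    except RuntimeError as error:
        print(f"raised as expected: {error}")
```

In the Lambda console, invoking the function with the test event {"force_error": true} produces the failure; the alarm returns to OK once a later period reports no errors.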

Micro-activity 1: Monitoring Strategy

Think about it

Plan your monitoring strategy: for your critical function, your API endpoint, and your performance targets, which metrics would you alarm on, and what thresholds would you set? Think about Lambda Errors, API Gateway 5xx errors, and Lambda Duration as starting points.


Micro-activity 2: Match the Monitoring Concepts

Match each observability concept to its description


Summary

In this lesson, you added "eyes" to your dashboard. By monitoring Lambda errors and API failures, and providing a public health signal, you've moved from "hoping it works" to "knowing it works." This observability is a core pillar of operational excellence in the cloud.


Quiz

Knowledge Check

What is the primary difference between a metric and an alarm?