Upgrade the Upload Flow to Save File Metadata with DynamoDB
Save upload metadata from an S3-triggered Lambda function into a DynamoDB table with a composite primary key.
All services used in this lesson are covered by the AWS Free Tier.
AWS Services Used
- Amazon S3
- AWS Lambda
- Amazon DynamoDB
- AWS IAM
- Amazon CloudWatch Logs
Learning Outcomes
By the end of this lesson, you will be able to:
- Save upload metadata from an S3-triggered Lambda function into DynamoDB.
- Explain why DynamoDB is a good fit for simple metadata records in a serverless workflow.
- Use a composite primary key to store metadata by bucket and object key.
- Use Lambda environment variables for table configuration.
- Explain one overwrite risk with PutItem and one way to reduce it.
Key Terms
- DynamoDB table: A collection of items. Each item is uniquely identified by its primary key. DynamoDB supports a simple primary key or a composite primary key made of a partition key and sort key.
- Item: A collection of attributes in a DynamoDB table. Basic CRUD operations include PutItem, GetItem, UpdateItem, and DeleteItem.
- Partition key / sort key: In a composite primary key, the partition key groups related items and the sort key distinguishes items within that partition.
- Environment variable: A configuration value stored on the Lambda function, useful for passing operational parameters like a table name without hard-coding them in code.
The Core Idea
Your S3 event already contains useful metadata, including the bucket name, object key, object size, eTag, event time, and a sequencer value. The object key in the event is URL-encoded, so your Lambda code should decode it before saving it.
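As a small illustration of the decoding step, the sketch below runs a few made-up encoded keys through Python's urllib.parse.unquote_plus. The key values are hypothetical examples, not taken from a real event.

```python
from urllib.parse import unquote_plus

# Hypothetical keys as they might appear in an S3 event record.
encoded_keys = [
    "incoming/my+notes.txt",          # a space arrives as "+"
    "incoming/r%C3%A9sum%C3%A9.pdf",  # UTF-8 bytes arrive percent-encoded
    "incoming/a%2Bb.txt",             # a literal "+" arrives as %2B
]

decoded_keys = [unquote_plus(key, encoding="utf-8") for key in encoded_keys]

for encoded, decoded in zip(encoded_keys, decoded_keys):
    print(f"{encoded!r} -> {decoded!r}")
```

Saving the decoded form means the stored object_key matches the name you see in the S3 console.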
In this lesson, you will keep the S3 trigger from the previous lesson, then upgrade the Lambda function so it writes one metadata record into DynamoDB for each uploaded object. DynamoDB is a strong fit here because it is schemaless beyond the primary key attributes, and PutItem gives you a simple way to create an item.
What You Will Build
You will create:
- One DynamoDB table for upload metadata
- One Lambda environment variable named TABLE_NAME
- One small IAM permission update so Lambda can write to DynamoDB
- One Lambda function that stores:
  - bucket
  - object_key
  - size
  - etag
  - event_time
  - event_name
  - sequencer
Table Design for this Lesson
Use this DynamoDB key design:
- Partition key: bucket (String)
- Sort key: object_key (String)
Why this design works:
- Many uploads can belong to the same bucket.
- Each object key distinguishes one upload record within that bucket.
- Composite keys are a common DynamoDB pattern for items that share one grouping attribute and are told apart by a second attribute.
A simple table like this is also flexible because DynamoDB does not require you to predefine non-key attributes and their data types ahead of time.
Part 1: Create the DynamoDB Table
Open DynamoDB and create a table with:
- Table name: upload_metadata
- Partition key: bucket (String)
- Sort key: object_key (String)
When you create a DynamoDB table, you must specify the primary key. For a composite primary key, you provide both the partition key and the sort key.
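If you prefer to script the table instead of using the console, the same key design can be expressed as parameters for the DynamoDB CreateTable call. The sketch below is one possible way to do that with boto3. The helper names are hypothetical, and the on-demand billing mode is an assumption, since this lesson does not specify a capacity mode.

```python
def build_create_table_params(table_name="upload_metadata"):
    """Return keyword arguments for CreateTable with a composite primary key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes need no schema.
        "AttributeDefinitions": [
            {"AttributeName": "bucket", "AttributeType": "S"},
            {"AttributeName": "object_key", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "bucket", "KeyType": "HASH"},       # partition key
            {"AttributeName": "object_key", "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: on-demand billing; the lesson does not name a capacity mode.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table():
    """Create the table. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the parameter builder runs without boto3

    client = boto3.client("dynamodb")
    return client.create_table(**build_create_table_params())


params = build_create_table_params()
print([entry["KeyType"] for entry in params["KeySchema"]])
```

In the API, HASH marks the partition key and RANGE marks the sort key, which corresponds to the two fields you fill in on the console form.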
Part 2: Update the Lambda Execution Role
Your Lambda function now needs permission to write to the DynamoDB table. A minimal policy for this lesson can look like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WriteMetadataToDynamoDB",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/upload_metadata"
    },
    {
      "Sid": "WriteLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```
Attach that to the Lambda execution role, replacing the placeholders with your account details. The role is the identity Lambda uses to call AWS services on your behalf.
Part 3: Add an Environment Variable
Set a Lambda environment variable:
- Key: TABLE_NAME
- Value: upload_metadata
AWS recommends using environment variables to pass operational parameters instead of hard-coding them. Lambda environment variables are stored as function configuration and are available to your code at runtime.
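One way to read this setting defensively is to fail early with a clear message when the variable is missing. The helper below is a hypothetical sketch, not part of the lesson's function. It accepts any mapping so that it can be tried outside Lambda with a plain dict.

```python
import os


def get_table_name(env=os.environ):
    """Return the configured table name, or raise a clear error if it is missing."""
    table_name = env.get("TABLE_NAME", "").strip()
    if not table_name:
        raise RuntimeError("TABLE_NAME environment variable is not set")
    return table_name


# A plain dict stands in for the Lambda environment in this example.
print(get_table_name({"TABLE_NAME": "upload_metadata"}))
```

Failing at startup tends to produce a more readable log entry than a KeyError raised deep inside the handler.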
Part 4: Update the Lambda Code
Replace the simple logging-only function with this version:
```python
import json
import os
import urllib.parse

import boto3

# Create the table handle once so warm invocations can reuse it.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])


def lambda_handler(event, context):
    records = event.get("Records", [])
    written = 0

    for record in records:
        # Skip anything that is not an S3 event record.
        if "s3" not in record:
            continue

        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event, so decode before saving.
        object_key = urllib.parse.unquote_plus(
            record["s3"]["object"]["key"],
            encoding="utf-8"
        )

        item = {
            "bucket": bucket,          # partition key
            "object_key": object_key,  # sort key
            "size": int(record["s3"]["object"].get("size", 0)),
            "etag": record["s3"]["object"].get("eTag", ""),
            "event_time": record.get("eventTime", ""),
            "event_name": record.get("eventName", ""),
            "sequencer": record["s3"]["object"].get("sequencer", "")
        }

        # PutItem replaces any existing item that has the same primary key.
        table.put_item(Item=item)
        print(f"Saved metadata for {bucket}/{object_key}")
        written += 1

    return {
        "statusCode": 200,
        "body": json.dumps({"written": written})
    }
```
Why this code works:
- The S3 event carries the bucket name, object key, size, eTag, event time, and sequencer.
- The object key must be URL-decoded before you treat it like a normal path or filename.
- PutItem creates the DynamoDB item for the metadata record.
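To try the parsing logic without deploying anything, you can pull the item-building step into a pure function and feed it a sample record. The function name build_item and the trimmed-down record below are hypothetical; real S3 events carry more fields than shown.

```python
from urllib.parse import unquote_plus


def build_item(record):
    """Turn one S3 event record into the metadata item used in this lesson."""
    s3_object = record["s3"]["object"]
    return {
        "bucket": record["s3"]["bucket"]["name"],
        "object_key": unquote_plus(s3_object["key"], encoding="utf-8"),
        "size": int(s3_object.get("size", 0)),
        "etag": s3_object.get("eTag", ""),
        "event_time": record.get("eventTime", ""),
        "event_name": record.get("eventName", ""),
        "sequencer": s3_object.get("sequencer", ""),
    }


# Hypothetical sample record with made-up values.
sample_record = {
    "eventTime": "2024-01-01T12:00:00.000Z",
    "eventName": "ObjectCreated:Put",
    "s3": {
        "bucket": {"name": "example-upload-bucket"},
        "object": {
            "key": "incoming/my+notes.txt",
            "size": 1024,
            "eTag": "0123456789abcdef0123456789abcdef",
            "sequencer": "0055AED6DCD90281E5",
        },
    },
}

item = build_item(sample_record)
print(item["object_key"], item["size"])
```

Keeping the parsing separate from the DynamoDB call makes it possible to check the item shape locally before any AWS resources are involved.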
Part 5: Test the Flow
Upload a file like incoming/notes.txt. Because your S3 trigger already listens for uploads in incoming/, Lambda should run asynchronously and process the event.
Then verify two places:
- CloudWatch Logs: You should see a log line like Saved metadata for ... from the function.
- DynamoDB Table Items: You should see one item with keys bucket and object_key, plus the extra attributes.
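You can also check the table from code by reading the item back with its full composite key. The sketch below assumes an object that behaves like a boto3 DynamoDB Table resource; the helper names and the bucket name in the usage comment are hypothetical.

```python
def build_key(bucket, object_key):
    """Build the composite primary key used by the upload_metadata table."""
    return {"bucket": bucket, "object_key": object_key}


def fetch_metadata(table, bucket, object_key):
    """Return the stored item, or None if no item has that key.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    """
    response = table.get_item(Key=build_key(bucket, object_key))
    return response.get("Item")


# Usage with real AWS credentials (not executed here):
#   import boto3
#   table = boto3.resource("dynamodb").Table("upload_metadata")
#   print(fetch_metadata(table, "your-bucket-name", "incoming/notes.txt"))
```

GetItem needs both parts of the key, so a lookup by bucket alone would require a Query instead. With the boto3 resource interface, numeric attributes such as size come back as decimal.Decimal values.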
Important Note about Overwrites
PutItem writes an item for the key you provide. If an item with the same primary key already exists, the new write replaces the entire old item unless you add a condition expression.
For this lesson, that means:
- If the same bucket and object key are uploaded again, you may overwrite the previous metadata row.
- That is acceptable for a learning lab.
- Later, you can add a condition expression to the write, or include a version or timestamp in the key, if you want every event stored separately.
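As a rough sketch of the condition approach, the helper below writes an item only when no item with the same primary key exists yet. The function name put_if_absent is hypothetical, and `table` is assumed to behave like a boto3 DynamoDB Table resource. The error is inspected through its `response` attribute so the example does not need to import botocore; in real code you might catch botocore's ClientError directly.

```python
def put_if_absent(table, item):
    """Write `item` only if no item with the same primary key exists.

    Returns True if the item was written, False if one was already there.
    """
    try:
        table.put_item(
            Item=item,
            # Evaluated against any existing item that has the same key.
            ConditionExpression="attribute_not_exists(#pk)",
            # A placeholder avoids clashes with DynamoDB reserved words.
            ExpressionAttributeNames={"#pk": "bucket"},
        )
        return True
    except Exception as exc:
        # boto3 errors carry a `response` dict with the service error code.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False
        raise
```

With this guard, a second upload of the same bucket and object key leaves the first metadata row untouched, which may or may not be what you want. Storing every event separately would call for a different key design instead.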
Optional Improvement: Think about Ordering and Duplicates
AWS notes two useful things for S3 event processing:
- Event notifications are not guaranteed to arrive in the exact order events occurred.
- The sequencer field can help compare order for events on the same object key.
Lambda best practices also recommend writing idempotent code, because duplicate processing can happen in event-driven systems.
For now, just store the sequencer field so you have it available later.
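When you do come back to it, a comparison could look like the sketch below. It follows the approach the S3 documentation describes for sequencer values: they are hexadecimal strings that may differ in length, so the shorter one is right-padded with zeros before a lexicographic comparison. The function name and sample values are hypothetical, and the comparison is only meaningful for events on the same object key.

```python
def is_later(candidate, stored):
    """Return True if the `candidate` sequencer is later than `stored`."""
    width = max(len(candidate), len(stored))
    # Right-pad the shorter value with zeros, then compare lexicographically.
    # Upper-casing keeps the comparison consistent if letter case differs.
    left = candidate.upper().ljust(width, "0")
    right = stored.upper().ljust(width, "0")
    return left > right


# Hypothetical sequencer values for two events on the same object key.
print(is_later("0055AED6DCD90281E6", "0055AED6DCD90281E5"))
```

A function could use a check like this to skip a write when the stored sequencer is already newer than the incoming one.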
Lab Checklist
| Step | Success Condition |
|---|---|
| Create DynamoDB table | upload_metadata exists |
| Add Lambda permissions | Execution role can call dynamodb:PutItem |
| Add environment variable | TABLE_NAME=upload_metadata is set |
| Update Lambda code | Function writes item to DynamoDB |
| Upload a test file | S3 trigger fires |
| Check logs | Function logs success |
| Check table | One metadata item appears |
Micro-activity 1: Inspect the Saved Item
Think about it
After your test upload, check the saved item: What bucket value was saved? What object key? What file size? What event name? Was a sequencer value stored? These values come from the S3 event structure that Lambda receives.
Micro-activity 2: Match DynamoDB Operations
Match each DynamoDB operation to what it does
Summary
In this lesson, you upgraded a simple S3-to-Lambda workflow into a real metadata pipeline. S3 sent the upload event, Lambda parsed the event, and DynamoDB stored a structured metadata record.
You also used two good serverless habits:
- Storing configuration in a Lambda environment variable instead of hard-coding it.
- Using a DynamoDB primary key that matches the shape of the event data you are saving.