AWS Analytics Services Overview
Understand the data lake analytics pattern and when to use streaming vs batch analytics on AWS.
Learning outcomes
By the end of this lesson, the learner can:
- Explain the "data lake analytics" pattern (S3 → Glue → Athena → QuickSight).
- Identify when to use streaming (Kinesis) vs batch analytics.
- Map analytics services to their roles in data processing.
The data lake pattern
AWS analytics services commonly work together in a data lake architecture:
Amazon Athena
Meaning
Serverless SQL query service for data in S3
Examples
Run ad-hoc queries on Apache Parquet files in your data lake
When it's ideal: You want to pay per query, based on the data each query scans, with no servers to manage
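Because Athena bills by the data each query scans, a rough cost estimate is simple arithmetic. The sketch below assumes a price of 5 USD per TB scanned and a 10 MB minimum per query. Both figures, the function name, and the sample sizes are assumptions for illustration, so check current pricing for your Region before relying on them.

```python
# Rough Athena cost estimate. PRICE_PER_TB_USD and MIN_BYTES_PER_QUERY are
# assumptions for illustration; check current pricing for your Region.
PRICE_PER_TB_USD = 5.0
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for a single query."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: a full scan of 1 TB of CSV versus a query that
# reads only 50 GB because the data is stored as columnar Parquet.
csv_cost = estimate_query_cost(1 * 1024**4)
parquet_cost = estimate_query_cost(50 * 1024**3)
print(f"CSV scan: ${csv_cost:.2f}, Parquet scan: ${parquet_cost:.2f}")
```

Under these assumptions, the less data a query has to read, the less it costs, which is one reason columnar formats such as Parquet are a common choice for data lakes.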
AWS Glue
Meaning
Serverless data integration and ETL service
Examples
Crawl S3 buckets to populate the Data Catalog for Athena
When it's ideal: You need a persistent metadata store, which Glue provides through the Glue Data Catalog
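As a minimal sketch of the crawler example above, the code below builds the parameters for a Glue crawler that scans one S3 path and writes table definitions to a Data Catalog database. The crawler name, role ARN, database name, and bucket path are all hypothetical. The boto3 calls are isolated in a separate function that is not invoked here, since it needs AWS credentials and an IAM role with suitable permissions.

```python
def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request: dict) -> None:
    """Send the request with boto3 (needs AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_lake",
    s3_path="s3://example-data-lake/sales/",
)
print(request)
```

Once a crawler like this has run, the tables it registers in the Data Catalog are what Athena queries refer to by name.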
Amazon QuickSight
Meaning
Cloud-scale BI and dashboard service
Examples
Create dashboards showing sales trends from Athena queries
When it's ideal: You need fast, interactive analysis, which the in-memory SPICE engine supports
Amazon Kinesis
Meaning
Real-time streaming data ingestion and processing
Examples
Ingest IoT sensor data at scale with sub-second latency
When it's ideal: You need to scale ingestion horizontally, which Kinesis does by splitting each stream into shards
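To illustrate the shard model, the sketch below shows how a record's partition key can decide which shard receives it. Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The evenly split ranges, the function names, and the sensor IDs are assumptions for illustration. A real stream's ranges can be uneven after resharding.

```python
import hashlib

HASH_SPACE = 2**128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space into contiguous, roughly equal ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash value outside the key space")


ranges = shard_ranges(4)
for device in ["sensor-1", "sensor-2", "sensor-3"]:
    print(device, "-> shard", shard_for_key(device, ranges))
```

The same key always maps to the same shard, so records from one device stay ordered within that shard, while many distinct keys spread the load across shards.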
Amazon OpenSearch Service
Meaning
Managed search and analytics cluster
Examples
Centralize and analyze application logs with full-text search
When it's ideal: You need cost-effective long-term retention, which the UltraWarm and cold storage tiers provide
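A log search like the one in the example above is typically expressed as a JSON query body. The sketch below builds one that combines a full-text match with a time filter. The field names message and @timestamp, the function name, and the default values are assumptions about how the logs were indexed, not something this lesson specifies.

```python
import json


def build_log_search(term: str, since: str = "now-1h", size: int = 20) -> dict:
    """Build a query body: full-text match on a term, limited to recent logs."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match against the (assumed) message field.
                "must": [{"match": {"message": term}}],
                # Restrict results to documents newer than `since`.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        # Newest entries first.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


body = build_log_search("timeout")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the search endpoint of an index or index pattern. The endpoint and authentication setup depend on the domain and are left out here.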
When to use each analytics service
Use Athena when...
- Data already lives in S3 (data lake)
- You need ad-hoc SQL queries
- No ETL pipeline exists yet
Use Kinesis when...
- Real-time ingestion required
- Sub-second latency matters
- Streaming events (IoT, clicks, logs)
Use OpenSearch when...
- Full-text search needed
- Log analytics at scale
- Complex aggregations and visualizations
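The three checklists above can be read as a small decision function. The sketch below encodes them directly. The Workload fields and the function name are hypothetical, and real architecture decisions involve more factors (cost, data volume, team skills) than this captures.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of what an analytics workload needs."""
    data_in_s3: bool = False
    needs_adhoc_sql: bool = False
    needs_realtime_ingestion: bool = False
    needs_full_text_search: bool = False
    is_log_analytics: bool = False


def suggest_services(w: Workload) -> list[str]:
    """Map workload needs to the services named in the checklists."""
    suggestions = []
    if w.needs_realtime_ingestion:
        suggestions.append("Amazon Kinesis")
    if w.data_in_s3 and w.needs_adhoc_sql:
        suggestions.append("Amazon Athena")
    if w.needs_full_text_search or w.is_log_analytics:
        suggestions.append("Amazon OpenSearch Service")
    return suggestions


# Example: clickstream events arriving in real time, later queried in S3.
clickstream = Workload(
    data_in_s3=True, needs_adhoc_sql=True, needs_realtime_ingestion=True
)
print(suggest_services(clickstream))
```

The services are not mutually exclusive: a workload can match more than one checklist, in which case the function returns several suggestions.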
Summary
The AWS analytics stack follows clear patterns:
- Data lake: S3 → Glue → Athena → QuickSight
- Streaming: Kinesis for real-time ingestion
- Log analytics: OpenSearch for search and visualization
Understanding when to use streaming vs batch processing is key to designing efficient analytics architectures.