AWS Analytics Services Overview

Understand the data lake analytics pattern and when to use streaming vs batch analytics on AWS.

15 min
Introductory

Learning outcomes

By the end of this lesson, the learner can:

  1. Explain the "data lake analytics" pattern (S3 → Glue → Athena → QuickSight).
  2. Identify when to use streaming (Kinesis) vs batch analytics.
  3. Map analytics services to their roles in data processing.

The data lake pattern

AWS analytics services commonly work together in a data lake architecture:

Figure: Analytics workflow showing the data lake pattern alongside streaming and log analysis
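
As a rough illustration of the S3 → Glue → Athena → QuickSight flow, the stages can be written down as an ordered list of service and role pairs. This is only a sketch: the `Stage` type, the `DATA_LAKE_PIPELINE` name, and the role wording are hypothetical and do not correspond to any AWS API.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    """One step in the data lake pattern (hypothetical helper type)."""
    service: str
    role: str


# Ordered stages of the pattern described in this lesson.
DATA_LAKE_PIPELINE: List[Stage] = [
    Stage("Amazon S3", "store the data files that make up the data lake"),
    Stage("AWS Glue", "crawl the data and record table metadata in the Data Catalog"),
    Stage("Amazon Athena", "run SQL queries directly on the data in S3"),
    Stage("Amazon QuickSight", "build dashboards on top of the query results"),
]


def describe(pipeline: List[Stage]) -> str:
    """Render the pipeline as a single arrow-separated line."""
    return " → ".join(stage.service for stage in pipeline)


if __name__ == "__main__":
    print(describe(DATA_LAKE_PIPELINE))
    for stage in DATA_LAKE_PIPELINE:
        print(f"{stage.service}: {stage.role}")
```

Reading the list top to bottom follows the direction in which data moves: it is stored first, described second, queried third, and visualized last.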

AWS Analytics Services

01

Amazon Athena

Meaning

Serverless SQL query service for data in S3

Examples

Run ad-hoc queries on Apache Parquet files in your data lake

When it's ideal: You pay per query, based on the amount of data scanned, with no servers to manage
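
Because Athena bills by the amount of data a query scans, a back-of-the-envelope estimate helps show why columnar formats such as Parquet matter. The sketch below assumes a price of 5 USD per TB scanned and a 10 MB minimum charge per query; both figures are assumptions that vary by region and over time, so check the current pricing page before relying on them. The function name is hypothetical.

```python
TB = 1024 ** 4
MB = 1024 ** 2

# Assumed minimum billable scan per query; verify against current pricing.
MIN_BILLED_BYTES = 10 * MB


def estimate_query_cost(bytes_scanned: int, price_per_tb_usd: float = 5.0) -> float:
    """Estimate the cost of one query from the bytes it scans.

    price_per_tb_usd is an assumed list price, not a guaranteed value.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / TB * price_per_tb_usd


if __name__ == "__main__":
    full_scan = estimate_query_cost(1 * TB)
    # Hypothetical case: a columnar layout lets the query read a tenth of the data.
    column_subset = estimate_query_cost(TB // 10)
    print(f"full scan: ${full_scan:.2f}, column subset: ${column_subset:.2f}")
```

The point of the comparison is the ratio, not the absolute numbers: reading fewer columns or fewer partitions reduces the bytes scanned and therefore the cost.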

02

AWS Glue

Meaning

Serverless data integration and ETL service

Examples

Crawl S3 buckets to populate the Data Catalog for Athena

When it's ideal: Includes the Glue Data Catalog as a persistent metadata store
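
To give a feel for the kind of metadata a crawler records, the sketch below pulls Hive-style `name=value` partition columns out of an S3 object key. It is a simplified illustration, not Glue's actual crawler logic; the function name and the example key are hypothetical.

```python
from typing import Dict


def parse_hive_partitions(s3_key: str) -> Dict[str, str]:
    """Extract Hive-style partition columns (name=value) from an object key."""
    partitions: Dict[str, str] = {}
    # The last path segment is the file name, so it is skipped.
    for segment in s3_key.split("/")[:-1]:
        name, separator, value = segment.partition("=")
        if separator and name and value:
            partitions[name] = value
    return partitions


if __name__ == "__main__":
    key = "sales/year=2024/month=01/part-0000.parquet"
    print(parse_hive_partitions(key))
```

A catalog entry built from keys like this lets a query engine skip whole prefixes when a query filters on `year` or `month`.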

03

Amazon QuickSight

Meaning

Cloud-scale BI and dashboard service

Examples

Create dashboards showing sales trends from Athena queries

When it's ideal: SPICE engine for fast, interactive analysis

04

Amazon Kinesis

Meaning

Real-time streaming data ingestion and processing

Examples

Ingest IoT sensor data at scale with sub-second latency

When it's ideal: Streams/shards model for horizontal scaling
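
The streams/shards model can be sketched in a few lines. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes the hash space is split evenly across shards, which may not hold after a stream has been resharded; the function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

MAX_HASH_KEY = 2 ** 128 - 1


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the 128-bit hash key space into contiguous, near-equal ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for index in range(shard_count):
        start = index * size
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if index == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the range that contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("ranges do not cover the hash key space")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for sensor_id in ("sensor-1", "sensor-2", "sensor-3"):
        print(sensor_id, "-> shard", shard_for_key(sensor_id, ranges))
```

Records with the same partition key always land on the same shard, which preserves their order, while adding shards spreads distinct keys across more capacity.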

05

Amazon OpenSearch Service

Meaning

Managed search and analytics cluster

Examples

Centralize and analyze application logs with full-text search

When it's ideal: UltraWarm and cold storage tiers for cost-effective retention
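
For the log analysis example, a full-text search is expressed as a JSON query body. The sketch below builds one as a Python dictionary using the `bool`, `match`, and `range` queries from the OpenSearch query DSL. The field names `message` and `@timestamp` are assumptions about how the logs are indexed, and the helper name is hypothetical; sending the request to a cluster is left out.

```python
import json
from typing import Any, Dict


def build_log_search(text: str, since: str = "now-1h", size: int = 20) -> Dict[str, Any]:
    """Build a search body: full-text match on the log line, limited by time.

    Assumes documents have a text field "message" and a date field "@timestamp".
    """
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Scored full-text match on the log message.
                "must": [{"match": {"message": text}}],
                # Unscored time filter using date math relative to now.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_search("connection timeout"), indent=2))
```

Keeping the time restriction in `filter` rather than `must` means it narrows the result set without affecting relevance scoring.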


When to use each analytics service

Use Athena when...

  • Data already lives in S3 (data lake)
  • You need ad-hoc SQL queries
  • No ETL pipeline exists yet

Use Kinesis when...

  • Real-time ingestion required
  • Sub-second latency matters
  • Streaming events (IoT, clicks, logs)

Use OpenSearch when...

  • Full-text search needed
  • Log analytics at scale
  • Complex aggregations and visualizations
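
The criteria above can be condensed into a small decision helper. This is a teaching sketch rather than an official selection rule: the `Workload` fields and the `suggest_services` function are hypothetical names, and a real workload often combines more than one service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Requirements of a hypothetical analytics workload."""
    data_in_s3: bool = False
    needs_ad_hoc_sql: bool = False
    needs_real_time_ingestion: bool = False
    needs_full_text_search: bool = False


def suggest_services(workload: Workload) -> List[str]:
    """Map workload requirements to the services covered in this lesson."""
    suggestions: List[str] = []
    if workload.needs_real_time_ingestion:
        suggestions.append("Amazon Kinesis")
    if workload.needs_full_text_search:
        suggestions.append("Amazon OpenSearch Service")
    if workload.data_in_s3 and workload.needs_ad_hoc_sql:
        suggestions.append("Amazon Athena")
    return suggestions


if __name__ == "__main__":
    clickstream = Workload(needs_real_time_ingestion=True)
    sales_lake = Workload(data_in_s3=True, needs_ad_hoc_sql=True)
    print(suggest_services(clickstream))
    print(suggest_services(sales_lake))
```

The helper returns a list because the choices are not mutually exclusive: streamed events can be stored in S3 and later queried in batch with SQL.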

Knowledge Check


Which service lets you run SQL queries directly on data stored in S3 without loading it first?


Summary

The AWS analytics stack follows clear patterns:

  • Data lake: S3 → Glue → Athena → QuickSight
  • Streaming: Kinesis for real-time ingestion
  • Log analytics: OpenSearch for search and visualization

Understanding when to use streaming vs batch processing is key to designing efficient analytics architectures.