AWS Integration & Messaging: SQS, SNS & Kinesis

zhuting
8 min read · Jan 30, 2022

When we start deploying multiple applications, they will inevitably need to communicate with one another. There are two patterns of application communication:

  • Synchronous communications (application to application)
  • Asynchronous / Event based (application to queue to application)

Synchronous communication between applications can be problematic when there are sudden spikes of traffic. What if you suddenly need to encode 1,000 videos when it's usually 10?

In that case, it’s better to decouple your application:

  • Using SQS: queue model
  • Using SNS: pub/sub model
  • Using Kinesis: real-time streaming model

Amazon SQS — Standard Queue

  • Oldest offering (over 10 years old)
  • Fully managed service used to decouple applications
  • Unlimited throughput, unlimited number of messages in queue
  • Default message retention: 4 days, maximum of 14 days
  • Low latency (< 10 ms on publish and receive)
  • Limit of 256 KB per message sent
  • Can have duplicate messages (at-least-once delivery, occasionally)
  • Can have out-of-order messages (best-effort ordering)

SQS — Producing Messages

  • Produced to SQS using the SDK (SendMessage API)
  • The message is persisted in SQS until a consumer deletes it
  • Message retention: default 4 days, up to 14 days

SQS — Consuming Messages

  • Consumers (running on EC2 instances, servers, or AWS Lambda)…
  • Poll SQS for messages (receive up to 10 messages at a time)
  • Process the message (example: insert the message into an RDS database)
  • Delete the message using the DeleteMessage API

After you send messages to a queue, you can receive and delete them. When you request messages from a queue, you can’t specify which messages to retrieve. Instead, you specify the maximum number of messages (up to 10) that you want to retrieve.
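The send/receive/delete cycle above can be sketched with a minimal in-memory model (an illustrative simulation, not the real SQS API — `MiniQueue` and its method names are invented for this sketch):

```python
import uuid

# Minimal in-memory sketch of the SQS consume loop: SendMessage persists
# a message, ReceiveMessage returns up to 10 messages you cannot pick,
# and DeleteMessage removes one by its receipt handle.
class MiniQueue:
    def __init__(self):
        self._messages = {}  # receipt_handle -> body

    def send_message(self, body):
        self._messages[str(uuid.uuid4())] = body

    def receive_messages(self, max_number=10):
        # You only cap how many you get back (up to 10), never which ones.
        max_number = min(max_number, 10)
        return list(self._messages.items())[:max_number]

    def delete_message(self, receipt_handle):
        self._messages.pop(receipt_handle, None)

q = MiniQueue()
for i in range(15):
    q.send_message(f"video-{i}")
batch = q.receive_messages(max_number=10)
for handle, body in batch:
    q.delete_message(handle)  # delete only after successful processing
print(len(batch), len(q._messages))  # 10 received, 5 still in the queue
```

Note that deletion is the consumer's explicit responsibility: a message that is received but never deleted will eventually become visible again.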

SQS — Multiple EC2 Instances Consumers

  • Consumers receive and process messages in parallel
  • At least once delivery
  • Best-effort message ordering
  • Consumers delete messages after processing them
  • We can scale consumers horizontally to improve throughput of processing

SQS with Auto Scaling Group (ASG)

SQS can also be used to decouple application tiers

Amazon SQS — Security

  • Encryption: In-flight encryption using HTTPS API; At-rest encryption using KMS keys; Client-side encryption if the client wants to perform encryption / decryption itself
  • Access Controls: IAM policies to regulate access to the SQS API
  • SQS Access Policies (similar to S3 bucket policies): useful for cross-account access to SQS queues; useful for allowing other services (SNS, S3…) to write to an SQS queue

SQS Message Visibility Timeout

  • After a message is polled by a consumer, it becomes invisible to other consumers; by default, the “message visibility timeout” is 30 seconds
  • That means the message has 30 seconds to be processed
  • After the visibility timeout is over, the message becomes “visible” in SQS again
  • If a message is not processed within the visibility timeout, it will be received (and potentially processed) twice
  • A consumer can call the ChangeMessageVisibility API to get more time
  • If the visibility timeout is high (hours) and a consumer crashes, re-processing will take time
  • If the visibility timeout is too low (seconds), we may get duplicates
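The timeout semantics can be sketched with a fake clock (a simplified simulation, not the SQS API — the class and the `now` parameter are invented for illustration):

```python
# Sketch of visibility-timeout semantics: a received message is hidden
# until now + visibility_timeout, then becomes receivable again.
class VisibleQueue:
    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.invisible_until = {}  # message -> timestamp when visible again
        self.messages = []

    def receive(self, now):
        for msg in self.messages:
            if self.invisible_until.get(msg, 0) <= now:
                self.invisible_until[msg] = now + self.visibility_timeout
                return msg
        return None  # everything currently invisible

    def change_message_visibility(self, msg, now, extra):
        # ChangeMessageVisibility: ask for more processing time
        self.invisible_until[msg] = now + extra

q = VisibleQueue(visibility_timeout=30)
q.messages.append("m1")
first = q.receive(now=0)    # consumer A gets the message
second = q.receive(now=10)  # consumer B sees nothing: still invisible
third = q.receive(now=31)   # timeout expired -> delivered again (duplicate!)
print(first, second, third)  # m1 None m1
```

The third call is exactly the duplicate-delivery case described above: the slow consumer never deleted the message, so it reappeared.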

Amazon SQS — Dead Letter Queue

  • If a consumer fails to process a message within the Visibility Timeout, the message goes back to the queue
  • We can set a threshold for how many times a message can go back to the queue
  • After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue (DLQ)
  • Useful for debugging
  • Make sure to process messages in the DLQ before they expire:
    Good to set a retention of 14 days in the DLQ
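The redrive behavior can be sketched as follows (a simplified simulation; the `redrive` helper is invented for illustration, while `maxReceiveCount` is the real redrive-policy setting):

```python
# Sketch of the DLQ redrive policy: each receive increments the message's
# receive count; once it exceeds maxReceiveCount, the message is moved to
# the dead letter queue instead of returning to the source queue.
def redrive(queue, dlq, receive_counts, msg, max_receive_count=3):
    receive_counts[msg] = receive_counts.get(msg, 0) + 1
    if receive_counts[msg] > max_receive_count:
        dlq.append(msg)    # poison message: park it for debugging
    else:
        queue.append(msg)  # back to the source queue for a retry

queue, dlq, counts = [], [], {}
for _ in range(4):  # the message fails processing 4 times in a row
    redrive(queue, dlq, counts, "bad-payload", max_receive_count=3)
    if queue:
        queue.pop()  # a consumer picks it up again and fails again
print(dlq)  # ['bad-payload']
```

After three failed retries the fourth receive pushes the message over the threshold and into the DLQ, where it waits (ideally with 14-day retention) for a human to inspect it.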

Amazon SQS — Delay Queue

  • Delay a message (consumers don’t see it immediately) up to 15 minutes
  • Default is 0 seconds (message is available right away)
  • Can set a default at queue level
  • Can override the default on send using the DelaySeconds parameter
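Delay behavior is easy to model with a fake clock (a simplified simulation; the `send`/`receive` helpers are invented, while `DelaySeconds` and the 900-second cap are the real limits):

```python
# Sketch: DelaySeconds hides a message from consumers until the delay
# elapses. Queue default is 0 s; per-message override caps at 15 minutes.
def send(queue, body, now, delay_seconds=0):
    delay_seconds = min(delay_seconds, 900)  # SQS cap: 15 minutes
    queue.append((now + delay_seconds, body))

def receive(queue, now):
    return [body for visible_at, body in queue if visible_at <= now]

queue = []
send(queue, "instant", now=0)                  # default: available right away
send(queue, "later", now=0, delay_seconds=60)  # per-message override
print(receive(queue, now=0), receive(queue, now=60))
```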

Amazon SQS — Long Polling

  • When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  • This is called Long Polling
  • Long Polling decreases the number of API calls made to SQS while increasing the efficiency and decreasing the latency of your application
  • The wait time can be between 1 sec to 20 sec (20 sec preferable)
  • Long Polling is preferable to Short Polling
  • Long Polling can be enabled at the queue level or at the API level using WaitTimeSeconds
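The API-call savings can be illustrated with a toy model (an assumption-laden sketch: it imagines a message arriving 18 seconds after we start polling an empty queue, with short polls retried every second):

```python
# Sketch comparing short vs long polling API call counts.
def short_polling_calls(arrival_s, poll_interval_s=1):
    # Each ReceiveMessage returns immediately; we retry every interval.
    return arrival_s // poll_interval_s + 1

def long_polling_calls(arrival_s, wait_time_seconds=20):
    # Each ReceiveMessage blocks for up to WaitTimeSeconds (1-20 s).
    calls, waited = 0, 0
    while waited < arrival_s:
        waited += wait_time_seconds
        calls += 1
    return calls

print(short_polling_calls(18), long_polling_calls(18))  # 19 vs 1
```

One long-polling call with `WaitTimeSeconds=20` replaces nineteen short-polling calls in this scenario, which is why the 20-second maximum is the usual recommendation.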

Amazon SQS — FIFO Queue

  • FIFO = First In First Out (ordering of messages in the queue)
  • Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching
  • Exactly-once send capability (by removing duplicates)
  • Messages are processed in order by the consumer

SQS FIFO — Deduplication

  • De-duplication interval is 5 minutes
  • Two de-duplication methods
    💎 Content-based deduplication: will do a SHA-256 hash of the message body
    💎 Explicitly provide a Message Deduplication ID
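Content-based deduplication can be sketched locally (a simplified model; the `accept` helper and `seen` map are invented, while the SHA-256 hash and 5-minute interval match the behavior described above):

```python
import hashlib

# Sketch of content-based deduplication: hash the message body with
# SHA-256 and drop any message whose hash was already seen within the
# 5-minute deduplication interval.
DEDUP_INTERVAL_S = 5 * 60

def accept(seen, body, now):
    dedup_id = hashlib.sha256(body.encode()).hexdigest()
    last = seen.get(dedup_id)
    if last is not None and now - last < DEDUP_INTERVAL_S:
        return False  # duplicate within the interval: dropped
    seen[dedup_id] = now
    return True

seen = {}
a = accept(seen, "order-42", now=0)    # first send: accepted
b = accept(seen, "order-42", now=100)  # same body, 100 s later: dropped
c = accept(seen, "order-42", now=400)  # 400 s later: outside the interval
print(a, b, c)  # True False True
```

Providing an explicit Message Deduplication ID works the same way, except the caller supplies `dedup_id` instead of hashing the body.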

SQS FIFO — Message Grouping

  • If you specify the same value of MessageGroupId for every message in an SQS FIFO queue, you can only have one consumer, and all the messages are in order
  • To get ordering at the level of a subset of messages, specify different values for MessageGroupId
    💎 Messages that share a common Message Group ID will be in order within the group
    💎 Each Group ID can have a different consumer (parallel processing)
    💎 Ordering across groups is not guaranteed
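The grouping guarantee can be sketched as follows (a simplified model; `assign_to_consumers` is invented for illustration):

```python
from collections import defaultdict

# Sketch of FIFO message grouping: ordering is guaranteed only within a
# MessageGroupId, so each group can be handed to a different consumer
# and processed in parallel.
def assign_to_consumers(messages):
    groups = defaultdict(list)
    for group_id, body in messages:    # global arrival order
        groups[group_id].append(body)  # order preserved per group
    return dict(groups)

messages = [("user-a", "a1"), ("user-b", "b1"),
            ("user-a", "a2"), ("user-b", "b2")]
groups = assign_to_consumers(messages)
print(groups)  # {'user-a': ['a1', 'a2'], 'user-b': ['b1', 'b2']}
```

Within `user-a`, `a1` always precedes `a2`; whether `b1` is processed before or after `a2` is not guaranteed, since the two groups run independently.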

Amazon SNS

  • The “event producer” only sends messages to one SNS topic
  • As many “event receivers” (subscriptions) as we want can listen to the SNS topic notifications
  • Each subscriber to the topic will get all the messages (note: new feature to filter messages)
  • Up to 10,000,000 subscriptions per topic
  • 100,000 topics limit
  • Subscribers can be:
    SQS
    HTTP / HTTPS
    Lambda
    Emails
    SMS messages
    Mobile Notifications
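The pub/sub fan-out (including the message-filtering feature mentioned above) can be sketched with a toy topic (a simplified local model; the `Topic` class and its policy-matching logic are invented for illustration):

```python
# Sketch of SNS fan-out: one publish to the topic reaches every
# subscription; a subscription may attach a filter policy so it only
# receives messages whose attributes match.
class Topic:
    def __init__(self):
        self.subscriptions = []  # (deliver_fn, filter_policy)

    def subscribe(self, deliver, filter_policy=None):
        self.subscriptions.append((deliver, filter_policy or {}))

    def publish(self, message, attributes=None):
        attributes = attributes or {}
        for deliver, policy in self.subscriptions:
            if all(attributes.get(k) in v for k, v in policy.items()):
                deliver(message)

topic = Topic()
all_msgs, fraud_msgs = [], []
topic.subscribe(all_msgs.append)                         # e.g. an SQS queue
topic.subscribe(fraud_msgs.append, {"type": ["fraud"]})  # filtered Lambda
topic.publish("tx-1", {"type": "purchase"})
topic.publish("tx-2", {"type": "fraud"})
print(all_msgs, fraud_msgs)  # ['tx-1', 'tx-2'] ['tx-2']
```

The producer publishes once; the topic takes care of delivering to every matching subscription.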

Kinesis Overview

  • Makes it easy to collect, process and analyze streaming data in real time
  • Ingest real-time data such as application logs, metrics, website clickstreams, IoT telemetry data…
  • Kinesis Data Streams: capture, process and store data streams
  • Kinesis Data Firehose: load data streams into AWS data stores
  • Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
  • Kinesis Video Streams: capture, process and store video streams

Kinesis Data Streams

  • Billing is per shard provisioned; can have as many shards as you want
  • Retention between 1 day (default) and 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted in Kinesis, it can’t be deleted (immutability)
  • Data that shares the same partition key goes to the same shard (ordering)
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  • Consumers:
    Write your own: Kinesis Client Library (KCL), AWS SDK
    Managed: AWS Lambda, Kinesis Data Firehose
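The partition-key-to-shard routing can be sketched as follows (a simplified model of the real behavior, where the key is MD5-hashed onto a 128-bit key space and each shard owns a contiguous range; `shard_for` is invented for illustration):

```python
import hashlib

# Sketch of Kinesis record routing: hash the partition key onto the
# 128-bit key space and pick the shard owning that range. Records that
# share a partition key always land on the same shard, preserving order.
def shard_for(partition_key, num_shards):
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    key_space = 2 ** 128
    return min(h // (key_space // num_shards), num_shards - 1)

s1 = shard_for("device-17", 4)
s2 = shard_for("device-17", 4)  # same key -> same shard, every time
print(s1 == s2, 0 <= s1 < 4)    # True True
```

This is why the partition key matters: a poorly chosen key (e.g. a constant) funnels all traffic to one "hot" shard.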

Kinesis Producers

  • Put data records into data streams
  • Data record consists of:
    Sequence number (unique per partition key within a shard)
    Partition key (must be specified when putting records into the stream)
    Data blob (up to 1 MB)
  • Producers:
    AWS SDK: simple producer
    Kinesis Producer Library (KPL): C++, Java; batching, compression, retries
    Kinesis Agent: monitors log files
  • Write throughput: 1 MB/sec or 1,000 records/sec per shard
  • PutRecord API
  • Use batching with PutRecords API to reduce costs & increase throughput
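The cost benefit of batching follows from simple arithmetic (a toy cost model assuming fixed per-call overhead; the 500-records-per-call figure is the PutRecords batch limit):

```python
# Sketch of why batching reduces costs: PutRecords sends up to 500
# records per API call instead of one call per record with PutRecord.
def api_calls_needed(num_records, batch_size=1):
    return -(-num_records // batch_size)  # ceiling division

single = api_calls_needed(1000)                   # PutRecord: one call each
batched = api_calls_needed(1000, batch_size=500)  # PutRecords batches
print(single, batched)  # 1000 2
```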

Worth reading: Amazon Kinesis Data Streams FAQs

Kinesis Operation — Shard Splitting

  • Used to increase the Stream capacity (1 MB/s data in per shard)
  • Used to divide a “hot shard”
  • The old shard is closed and will be deleted once the data is expired
  • No automatic scaling (manually increase/decrease capacity)
  • Can’t split into more than two shards in a single operation

Kinesis Operation — Shard Merging

  • Used to decrease the Stream capacity and save costs
  • Can be used to group two shards with low traffic (cold shards)
  • Old shards are closed and will be deleted once the data is expired
  • Can’t merge more than two shards in a single operation
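Both operations act on the shards' hash-key ranges, which can be sketched as follows (a simplified model of the 128-bit key space; the helpers are invented for illustration):

```python
# Sketch of shard splitting and merging on hash-key ranges: a split
# divides one shard's range in two (at most two children per operation);
# a merge joins two adjacent ("cold") shards back into one.
def split_shard(shard):
    start, end = shard
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)  # parent shard is closed afterwards

def merge_shards(a, b):
    assert a[1] + 1 == b[0], "only adjacent shards can be merged"
    return (a[0], b[1])

hot = (0, 2 ** 128 - 1)
left, right = split_shard(hot)    # ingest capacity: 1 MB/s -> 2 MB/s
cold = merge_shards(left, right)  # back to one shard, halving the cost
print(cold == hot)  # True
```

Since each operation touches at most two shards, scaling a large stream up or down requires many split or merge calls, which is part of why there is no automatic scaling here.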

Kinesis Data Firehose

  • Fully Managed Service, no administration, automatic scaling, serverless
  • Pay for data going through Firehose
  • Near Real Time
  • Supports many data formats, conversions, transformations, compression
  • Supports custom data transformations using AWS Lambda
  • Can send failed or all data to a backup S3 bucket

Kinesis Data Analytics (SQL application)

  • Perform real-time analytics on Kinesis Stream using SQL
  • Fully managed, no servers to provision
  • Automatic scaling
  • Real-time analytics
  • Pay for actual consumption rate
  • Can create streams out of the real-time queries
  • Use cases: Time-series analytics, Real-time dashboards, Real-time metrics

Use the Enhanced Fan-Out feature of Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.

By default, the 2MB/second/shard output is shared between all of the applications consuming data from the stream. You should use enhanced fan-out if you have multiple consumers retrieving data from a stream in parallel. With enhanced fan-out developers can register stream consumers to use enhanced fan-out and receive their own 2MB/second pipe of read throughput per shard, and this throughput automatically scales with the number of shards in a stream.
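The throughput difference is just the arithmetic below (using the 2 MB/s-per-shard figure quoted above; the helper function is invented for illustration):

```python
# Sketch of the read-throughput math: standard consumers share 2 MB/s
# per shard; with enhanced fan-out, each registered consumer gets its
# own dedicated 2 MB/s pipe per shard.
def read_throughput_mb_s(shards, consumers, enhanced_fan_out):
    per_shard = 2  # MB/s read limit per shard
    if enhanced_fan_out:
        return shards * per_shard * consumers  # dedicated pipe per consumer
    return shards * per_shard                  # shared among all consumers

shared = read_throughput_mb_s(shards=4, consumers=3, enhanced_fan_out=False)
dedicated = read_throughput_mb_s(shards=4, consumers=3, enhanced_fan_out=True)
print(shared, dedicated)  # 8 24
```

With three standard consumers on four shards, each application averages well under 3 MB/s; registering them as enhanced fan-out consumers gives each one the full 8 MB/s.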
