AWS DynamoDB — NoSQL Serverless Database

zhuting
5 min read · Jan 29, 2022

NoSQL databases

  • NoSQL databases are non-relational databases and are distributed
  • Examples of NoSQL databases include MongoDB and DynamoDB
  • NoSQL databases do not support query joins (or offer only limited support)
  • All the data that is needed for a query is typically present in one row
  • NoSQL databases typically don’t perform aggregations such as “SUM”, “AVG”…
  • NoSQL databases scale horizontally

There is no “right or wrong” choice between NoSQL and SQL. They simply require you to model the data differently and to think about user queries differently.

Amazon DynamoDB

  • Fully managed, highly available with replication across multiple AZs
  • NoSQL database — not a relational database
  • Scales to massive workloads, distributed database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance (low latency on retrieval)
  • Integrated with IAM for security, authorization, and administration
  • Enables event driven programming with DynamoDB Streams
  • Low cost and auto-scaling capabilities

DynamoDB Basics

  • DynamoDB is made of Tables
  • Each table has a Primary Key (must be decided at creation time)
  • Each table can have an infinite number of items (= rows)
  • Each item has attributes (can be added over time, can be null)
  • The maximum size of an item is 400 KB
  • Data types supported are:
    Scalar (String, Number, Binary, Boolean, Null),
    Document (List, Map),
    Set (String Set, Number Set, Binary Set)

DynamoDB Primary Keys

  • Option 1: Partition Key(HASH)
  • Option 2: Partition Key + Sort Key (HASH + RANGE)
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
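As a rough illustration of the two options, the sketch below builds the KeySchema and AttributeDefinitions structures in the shape the CreateTable API expects. The attribute names (user_id, game_ts) are hypothetical, both keys are assumed to be strings, and the dictionaries are only built here, not sent to AWS.

```python
def key_schema(partition_key, sort_key=None):
    """Return (KeySchema, AttributeDefinitions) for a string-typed primary key."""
    # Option 1: the partition key alone (HASH).
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # Option 2: partition key plus sort key (HASH + RANGE).
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return schema, definitions


# Hypothetical attribute names, for illustration only.
simple_schema, simple_definitions = key_schema("user_id")
composite_schema, composite_definitions = key_schema("user_id", "game_ts")
```

With option 1 every item needs a unique partition key value. With option 2 only the combination of partition key and sort key has to be unique.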

DynamoDB — Read / Write Capacity Modes

  • Control how you manage your table’s capacity (read / write throughput)
  • Provisioned Mode (default)
  • On-Demand Mode
  • You can switch between different modes once every 24 hours

Read / Write Capacity Modes — Provisioned

  • The table must have provisioned read and write capacity units
  • RCU (Read Capacity Units)
  • WCU (Write Capacity Units)
  • Option to set up auto-scaling of throughput to meet demand
  • Throughput can be exceeded temporarily using “Burst Capacity”
  • If Burst Capacity has been consumed, you will get a ProvisionedThroughputExceededException
  • It’s then advised to do an exponential backoff retry
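The retry advice above can be sketched as follows. This is a minimal illustration, not the SDK's own retry logic: ThroughputExceeded is a hypothetical stand-in for ProvisionedThroughputExceededException, make_flaky_write fakes a throttled table, and the delay values are arbitrary assumptions.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff when throttled."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Wait a random time of at most base_delay * 2^attempt (jitter).
            sleep(random.uniform(0, base_delay * 2 ** attempt))


def make_flaky_write(failures):
    """Return a fake write that is throttled `failures` times, then succeeds."""
    state = {"remaining": failures}

    def write():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ThroughputExceeded()
        return "written"

    return write


print(call_with_backoff(make_flaky_write(2)))  # prints "written" after two retries
```

The upper bound on the wait doubles with each attempt, and the random jitter keeps many throttled clients from retrying at the same instant.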

DynamoDB — Read / Write Capacity Units (RCU & WCU)

  • One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size
  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • Eventually Consistent Read (default): If we read just after a write, it’s possible we will get some stale data because of replication
  • Strongly Consistent Read: If we read just after a write, we will get the correct data. Set the “ConsistentRead” parameter to True in API calls (GetItem, BatchGetItem, Query, Scan). Consumes twice the RCUs of an eventually consistent read
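The capacity arithmetic above can be worked through with a small calculator. It is a sketch that assumes item sizes are rounded up to the next 1 KB for writes and the next 4 KB for reads. The function names are made up for this example.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed: one unit per 1 KB (rounded up) per write per second."""
    return math.ceil(item_size_kb / 1) * writes_per_second


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=False):
    """RCUs needed: one unit per 4 KB (rounded up) per strongly consistent read."""
    total = math.ceil(item_size_kb / 4) * reads_per_second
    # An eventually consistent read costs half as much.
    return total if strongly_consistent else math.ceil(total / 2)


print(write_capacity_units(2, 10))                            # 20
print(write_capacity_units(4.5, 6))                           # 30
print(read_capacity_units(4, 10, strongly_consistent=True))   # 10
print(read_capacity_units(12, 16))                            # 24
print(read_capacity_units(6, 10, strongly_consistent=True))   # 20
```

For instance, 6 writes per second of 4.5 KB items need 30 WCUs because each item is billed as 5 KB.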

DynamoDB — Partition Internals

  • Data is stored in partitions
  • Each partition key goes through a hashing algorithm that determines which partition stores the item
  • WCUs and RCUs are spread evenly across partitions
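The idea can be illustrated with a toy model. DynamoDB's real hash function and partition layout are internal, so the MD5 hash, the partition count, and both function names below are assumptions made purely for illustration.

```python
import hashlib


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition index with a stable hash (MD5 here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def capacity_per_partition(provisioned_units, num_partitions):
    """Provisioned capacity is spread evenly across partitions."""
    return provisioned_units / num_partitions


# The same key always lands on the same partition.
print(partition_for("user_1", 3) == partition_for("user_1", 3))  # True
# 300 provisioned WCUs over 3 partitions leaves 100 WCUs for each partition.
print(capacity_per_partition(300, 3))  # 100.0
```

Because capacity is split per partition, a single very popular key can exhaust its partition's share while the table as a whole still has spare capacity.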

DynamoDB — Throttling

  • If we exceed provisioned RCUs or WCUs, we get a ProvisionedThroughputExceededException
  • Reasons:
    Hot Keys,
    Hot Partitions,
    Very large items
  • Solutions:
    Exponential backoff when an exception is encountered (already built into the SDK),
    Distributed partition keys (as much as possible),
    If RCU issue, we can use DynamoDB Accelerator (DAX)
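One common way to distribute partition keys is to append a shard suffix to a hot key, often called write sharding. The sketch below assumes a hypothetical key layout of base key, "#", and shard number, plus a shard count of 4. Neither comes from the original notes.

```python
import zlib


def sharded_key(base_key, item_id, shard_count):
    """Spread writes for one hot base key over shard_count partition keys."""
    # crc32 is deterministic, so the same item always maps to the same shard.
    shard = zlib.crc32(item_id.encode("utf-8")) % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count):
    """Every partition key a reader must query to see all items for base_key."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]


# Hypothetical example: orders written under a single busy date key.
write_key = sharded_key("2022-01-29", "order-1", 4)
read_keys = all_shard_keys("2022-01-29", 4)
```

The trade-off is on the read side: fetching everything for the base key now takes one query per shard.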

Read / Write Capacity Modes — On-Demand

  • Reads / writes automatically scale up / down with your workloads
  • No capacity planning needed (WCU / RCU)
  • Unlimited WCU & RCU, no throttle, more expensive
  • You are charged for reads / writes that you use in terms of RRU and WRU
  • Read Request Units — throughput for reads
  • Write Request Units — throughput for writes
  • 2.5x more expensive than provisioned capacity (use with care)
  • Use cases: unknown workloads, unpredictable application traffic…

DynamoDB — Writing Data

  • PutItem
    * Creates a new item or fully replaces an old item (same Primary Key)
    * Consumes WCUs
  • UpdateItem
    * Edits an existing item’s attributes or adds a new item if it doesn’t exist
    * Can be used to implement Atomic Counter
  • Conditional Writes
    * Accepts a write/update/delete only if the conditions are met; otherwise returns an error
    * Helps with concurrent access to items
    * No performance impact
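As a sketch of the request shapes involved, the helpers below build the parameters for an UpdateItem call that increments an atomic counter and for a conditional PutItem that only succeeds when the item does not exist yet. The helper names, the PageViews table, and the page_id and views attributes are hypothetical. The dictionaries follow the low-level DynamoDB API format and are only built here, not sent.

```python
def atomic_increment_request(table_name, key, counter_attribute, amount=1):
    """Parameters for an UpdateItem call that atomically adds to a number."""
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "ADD #counter :amount",
        "ExpressionAttributeNames": {"#counter": counter_attribute},
        "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


def put_if_absent_request(table_name, item, partition_key_name):
    """Parameters for a PutItem call that fails if the item already exists."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": partition_key_name},
    }


# Hypothetical table and attribute names.
increment = atomic_increment_request("PageViews", {"page_id": {"S": "home"}}, "views")
create = put_if_absent_request("PageViews", {"page_id": {"S": "home"}}, "page_id")
```

With an SDK client these dictionaries would typically be passed as keyword arguments. When the condition does not hold, the write is rejected with a conditional check failure instead of overwriting the item.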

DynamoDB — Reading Data

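  • GetItem reads a single item by its primary key
  • Query returns the items that share a given partition key value, optionally narrowed by a condition on the sort key
  • Scan reads the entire table and then filters out data (inefficient)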
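To show why Scan is the inefficient option, here is a toy in-memory table that counts how many items each read operation touches. TinyTable and its methods are invented for this example and only mimic the behaviour described above. They are not the DynamoDB API.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory table keyed by (partition key, sort key)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put_item(self, pk, sk, **attributes):
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attributes}

    def get_item(self, pk, sk):
        """Look up exactly one item by its full primary key."""
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        """Read matching items from one partition; returns (items, items read)."""
        partition = self._partitions.get(pk, {})
        items = [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]
        return items, len(items)

    def scan(self, predicate):
        """Read every item in the table, then filter; returns (items, items read)."""
        everything = [i for part in self._partitions.values() for i in part.values()]
        return [i for i in everything if predicate(i)], len(everything)


table = TinyTable()
for user in ("alice", "bob", "carol"):
    for day in ("2022-01-28", "2022-01-29"):
        table.put_item(user, day, score=len(user))

matches, read_by_query = table.query("alice")
same, read_by_scan = table.scan(lambda item: item["pk"] == "alice")
print(read_by_query, read_by_scan)  # 2 6
```

Both calls return the same two items, but the scan had to read all six items in the table to find them.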

DynamoDB — Deleting Data

  • DeleteItem
  • DeleteTable

DynamoDB — Batch Operations

  • BatchWriteItem
  • BatchGetItem

DynamoDB Index

  • Local Secondary Index (LSI) — must be defined at table creation time
  • Global Secondary Index (GSI) — can be added/modified after table creation; if writes are throttled on the GSI, the main table is throttled as well

DynamoDB — Indexes and Throttling

  • Global Secondary Index (GSI)
    If the writes are throttled on the GSI, then the main table will be throttled
    Even if the WCUs on the main table are fine
    Choose your GSI partition key carefully
    Assign your WCU capacity carefully
  • Local Secondary Index (LSI)
    Uses the WCUs and RCUs of the main table
    No special throttling considerations

DynamoDB — Optimistic Locking

  • DynamoDB has a feature called “Conditional Writes”
  • A strategy to ensure an item hasn’t changed before you update / delete it
  • Each item has an attribute that acts as a version number
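The version-number strategy can be simulated in memory. VersionedStore and ConditionalCheckFailed below are hypothetical stand-ins: in DynamoDB the same check would be expressed as a condition on the version attribute of a conditional write.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version differs from the version the caller read."""


class VersionedStore:
    """Toy store in which every item carries a 'version' attribute."""

    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        self._items[key] = {**attributes, "version": 1}

    def get(self, key):
        return dict(self._items[key])  # return a copy, as a real read would

    def update(self, key, changes, expected_version):
        """Apply changes only if nobody has updated the item since it was read."""
        current = self._items[key]
        if current["version"] != expected_version:
            raise ConditionalCheckFailed(f"expected version {expected_version}")
        self._items[key] = {**current, **changes, "version": expected_version + 1}


store = VersionedStore()
store.put("user_1", {"name": "John"})

# Two clients read the same item at version 1.
seen_by_a = store.get("user_1")
seen_by_b = store.get("user_1")

store.update("user_1", {"name": "Lisa"}, expected_version=seen_by_a["version"])
try:
    store.update("user_1", {"name": "Michel"}, expected_version=seen_by_b["version"])
except ConditionalCheckFailed:
    print("second writer must re-read and retry")
```

The first update bumps the version to 2, so the second writer's condition fails and its stale change is rejected instead of silently overwriting the first one.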

DynamoDB Streams

DynamoDB Streams & AWS Lambda

DynamoDB Streams — Time To Live (TTL)

DynamoDB — Write Types

DynamoDB Large Objects Pattern

DynamoDB Indexing S3 Objects Metadata

DynamoDB Operations

DynamoDB Security & Other Features

DynamoDB Fine-Grained Access Control
