AWS Fundamentals

zhuting
9 min read · Jan 21, 2022

--

  • ELB + ASG
  • RDS + Aurora + ElastiCache

What is load balancing?

  • Load balancers are servers that forward traffic to multiple servers downstream

Why use a load balancer?

  • Spread load across multiple downstream instances
  • Expose a single point of access (DNS) to your application
  • Seamlessly handle failures of downstream instances
  • Provide SSL termination (HTTPS) for your websites
  • Enforce stickiness with cookies
  • High availability across zones
  • Separate public traffic from private traffic

When your load balancer is created, it receives a public DNS name that clients can use to send requests. The DNS servers resolve the DNS name of your load balancer to the public IP addresses of the load balancer nodes for your load balancer. Never hard-code the IP addresses of a load balancer, as they can change over time. You should always use the DNS name.
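
As a quick illustration, here is a minimal boto3 sketch (the load balancer name my-app-alb is a placeholder) that looks up the DNS name you would give to clients or reference from Route 53:

```python
import boto3

elbv2 = boto3.client("elbv2")

# "my-app-alb" is a hypothetical load balancer name; replace it with your own.
response = elbv2.describe_load_balancers(Names=["my-app-alb"])

# Always hand out the DNS name, never the IPs it currently resolves to.
print(response["LoadBalancers"][0]["DNSName"])
```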

Why use an Elastic Load Balancer?

  • An Elastic Load Balancer is a managed load balancer
    * AWS guarantees that it will be working
    * AWS takes care of upgrades, maintenance, high availability
    * AWS provides only a few configuration knobs
  • It costs less to set up your own load balancer, but it will be a lot more effort on your end
  • It is integrated with many AWS offerings / services
    * EC2, EC2 Auto Scaling Groups, Amazon ECS
    * AWS Certificate Manager (ACM), CloudWatch
    * Route 53, AWS WAF, AWS Global Accelerator

Health Checks

  • Health Checks are crucial for Load Balancers
  • They enable the load balancer to know if instances it forwards traffic to are available to reply to requests
  • The health check is done on a port and a route (/health is common)
  • If the response is not 200 (OK), then the instance is unhealthy
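
As a sketch, this is roughly what the health check configuration looks like when creating an ALB target group with boto3 (the target group name and VPC ID are placeholders):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical name and VPC ID; adjust for your environment.
response = elbv2.create_target_group(
    Name="my-app-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",        # the common /health route
    HealthCheckIntervalSeconds=30,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
    Matcher={"HttpCode": "200"},      # any other response code marks the target unhealthy
)
print(response["TargetGroups"][0]["TargetGroupArn"])
```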

Types of load balancer on AWS

  • Classic Load Balancer (v1 old generation) — 2009 — CLB
  • Application Load Balancer (v2 new generation) — 2016 — ALB
  • Network Load Balancer (v2 new generation) — 2017 — NLB
  • Gateway Load Balancer — 2020 — GWLB
    * Deploy, scale, and manage a fleet of 3rd-party network virtual appliances in AWS
    * Examples: Firewalls, Intrusion Detection and Prevention Systems, Deep Packet Inspection Systems, payload manipulation…
    * Operates at Layer 3 (Network Layer) — IP packets
    * Combines the following functions: Transparent Network Gateway; Load Balancer
    * Uses the GENEVE protocol on port 6081

Sticky Sessions — Cookie Names

  • Application-based Cookies
    * Custom cookie
    * Application cookie
  • Duration-based Cookies
    * Cookie generated by the load balancer
    * Cookie name is AWSALB for ALB, AWSELB for CLB
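
As a sketch, duration-based stickiness is switched on through target group attributes; the target group ARN below is a placeholder:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical target group ARN; replace with your own.
target_group_arn = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/my-app-targets/abc1234567890def"
)

# Duration-based stickiness: the ALB generates the AWSALB cookie itself.
elbv2.modify_target_group_attributes(
    TargetGroupArn=target_group_arn,
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},  # 1 day
    ],
)
```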

What is an Auto Scaling Group?

  • In real life, the load on your websites and applications can change
  • In the cloud, you can create and get rid of servers very quickly
  • The goal of an Auto Scaling Group (ASG) is to:
    * Scale out (add EC2 instances) to match an increased load
    * Scale in (remove EC2 instances) to match a decreased load
    * Ensure we have a minimum and a maximum number of machines running
    * Automatically Register new instances to a load balancer
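
A minimal boto3 sketch of such a group, wired to a load balancer target group (the launch template name, subnet IDs, and target group ARN are hypothetical):

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",
    LaunchTemplate={"LaunchTemplateName": "my-app-template", "Version": "$Latest"},
    MinSize=1,                    # never fewer than 1 instance
    MaxSize=4,                    # never more than 4 instances
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-0aaa1111,subnet-0bbb2222",  # spread across AZs
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/my-app-targets/abc1234567890def"
    ],
    HealthCheckType="ELB",        # rely on the load balancer's health checks
    HealthCheckGracePeriod=300,
)
```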

AWS RDS Overview

  • RDS stands for Relational Database Service
  • It’s a managed DB service for databases that use SQL as a query language
  • It allows you to create databases in the cloud that are managed by AWS

Advantages of using RDS versus deploying a DB on EC2

  • RDS is a managed service, automated provisioning, OS patching
  • Continuous backups and restore to specific timestamp (Point in Time Restore)
  • Monitoring dashboards
  • Read replicas for improved read performance
  • Multi AZ setup for DR (Disaster Recovery)
  • Maintenance windows for upgrades
  • Scaling capability (vertical and horizontal)
  • Storage backed by EBS (gp2 or io1)
  • But you can’t SSH into your instances

RDS Backups

  • Backups are automatically enabled in RDS
  • Automated backups:
    * Daily full backup of the database (during the maintenance window)
    * Transaction logs are backed-up by RDS every 5 minutes
    * => ability to restore to any point in time (from the oldest backup to 5 minutes ago)
    * 7 days retention (can be increased to 35 days)
  • DB Snapshots: Manually triggered by the user; Retention of backup for as long as you want
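
To make both backup paths concrete, here is a hedged boto3 sketch (all identifiers are hypothetical): a manual snapshot you keep as long as you want, and a point-in-time restore built on the automated backups and transaction logs:

```python
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

# Manual DB snapshot, retained until you delete it.
rds.create_db_snapshot(
    DBInstanceIdentifier="my-db",
    DBSnapshotIdentifier="my-db-before-migration",
)

# Point-in-time restore: creates a new instance restored to the given moment.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="my-db",
    TargetDBInstanceIdentifier="my-db-restored",
    RestoreTime=datetime(2022, 1, 20, 10, 30, tzinfo=timezone.utc),
)
```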

RDS — Storage Auto Scaling

  • Helps you increase storage on your RDS DB instance dynamically
  • When RDS detects you are running out of free database storage, it scales automatically
  • Avoid manually scaling your database storage
  • You have to set Maximum Storage Threshold
  • Automatically modify storage if:
    free storage is less than 10% of allocated storage;
    low-storage lasts at least 5 minutes;
    6 hours have passed since the last modification
  • Useful for applications with unpredictable workloads
  • Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)
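
A sketch of turning this on for an existing instance (the identifier is hypothetical); the MaxAllocatedStorage parameter is the Maximum Storage Threshold mentioned above, in GiB:

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="my-db",   # hypothetical instance identifier
    MaxAllocatedStorage=1000,       # storage may auto-scale up to 1000 GiB
    ApplyImmediately=True,
)
```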

RDS Read Replicas for read scalability

  • Up to 5 Read Replicas
  • Within AZ, Cross AZ or Cross Region
  • Replication is ASYNC, so reads are eventually consistent
  • Replicas can be promoted to their own DB
  • Applications must update the connection string to leverage read replicas
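
For illustration, a boto3 sketch that creates a replica of a hypothetical my-db instance and reads back the replica's own endpoint, which is what the application's read-only connection string must point at:

```python
import boto3

rds = boto3.client("rds")

# The source instance must have automated backups enabled.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="my-db-replica-1",
    SourceDBInstanceIdentifier="my-db",
)

# The replica gets its own endpoint, separate from the source's endpoint.
replica = rds.describe_db_instances(DBInstanceIdentifier="my-db-replica-1")
print(replica["DBInstances"][0].get("Endpoint"))  # populated once the replica is available
```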

RDS Read Replicas — Network Cost

  • In AWS there is a network cost when data goes from one AZ to another
  • For RDS Read Replicas within the same region, you don’t pay that fee

RDS Multi-AZ — Maintenance and Failover

Running a DB instance as a Multi-AZ deployment can reduce the impact of a maintenance event, because Amazon RDS applies operating system updates by following these steps:

  • Perform maintenance on the standby
  • Promote the standby to primary
  • Perform maintenance on the old primary, which becomes the new standby

When you modify the database engine for your DB instance in a Multi-AZ deployment, Amazon RDS upgrades both the primary and secondary DB instances at the same time. In this case, the database engine for the entire Multi-AZ deployment is shut down during the upgrade.

Amazon RDS automatically initiates a failover to the standby if the primary database fails for any reason. You also benefit from enhanced database availability when running your DB instance as a Multi-AZ deployment: if an Availability Zone failure or DB instance failure occurs, your availability impact is limited to the time automatic failover takes to complete.

Another implied benefit of running your DB instance as a Multi-AZ deployment is that DB instance failover is automatic and requires no administration. In an Amazon RDS context, this means you are not required to monitor DB instance events and initiate manual DB instance recovery in the event of an Availability Zone failure or DB instance failure.

RDS — From Single-AZ to Multi-AZ

  • Zero downtime operation (no need to stop the DB)
  • Just click on “modify” for the database
  • The following happens internally: a snapshot is taken; a new DB is restored from the snapshot in a new AZ; synchronization is established between the two databases
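
In practice this is a single modification call; a sketch with a hypothetical instance identifier:

```python
import boto3

rds = boto3.client("rds")

# Converts the instance to Multi-AZ with no downtime; RDS handles the
# snapshot, the standby restore, and the synchronization internally.
rds.modify_db_instance(
    DBInstanceIdentifier="my-db",
    MultiAZ=True,
    ApplyImmediately=True,
)
```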

RDS Security — Encryption

At Rest encryption
* Possibility to encrypt the master & read replicas with AWS KMS — AES-256 encryption
* Encryption has to be defined at launch time
* If the master is not encrypted, the read replicas cannot be encrypted
* Transparent Data Encryption (TDE) available for Oracle and SQL Server

In-flight encryption
* SSL certificates to encrypt data to RDS in flight
* Provide SSL options with trust certificate when connecting to database

RDS Encryption Operations

Encrypting RDS backups

  • Snapshots of un-encrypted RDS databases are un-encrypted
  • Snapshots of encrypted RDS databases are encrypted
  • Can copy a snapshot into an encrypted one

To encrypt an un-encrypted RDS database

  • Create a snapshot of the un-encrypted database
  • Copy the snapshot and enable encryption for the snapshot
  • Restore the database from the encrypted snapshot
  • Migrate applications to the new database, and delete the old database
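
The same steps expressed as a boto3 sketch (identifiers and the KMS key are placeholders; the application cut-over and deletion of the old database are left out, and a real script would wait for each snapshot to become available before the next call):

```python
import boto3

rds = boto3.client("rds")

# 1. Snapshot the un-encrypted database.
rds.create_db_snapshot(
    DBInstanceIdentifier="my-db",
    DBSnapshotIdentifier="my-db-plain",
)

# 2. Copy the snapshot with encryption enabled (specifying a KMS key).
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="my-db-plain",
    TargetDBSnapshotIdentifier="my-db-encrypted",
    KmsKeyId="alias/aws/rds",   # hypothetical key alias
)

# 3. Restore a new, encrypted instance from the encrypted snapshot.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="my-db-new",
    DBSnapshotIdentifier="my-db-encrypted",
)
```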

RDS Security — Network & IAM

Network Security

  • RDS databases are usually deployed within a private subnet, not in a public one
  • RDS security works by leveraging security groups (the same concept as for EC2 instances); they control which IP addresses / security groups can communicate with RDS
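
As an illustration, a boto3 sketch of a typical rule: only the application tier's security group (both group IDs below are hypothetical) may reach the RDS security group on the MySQL port:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0db1111111111111a",   # security group attached to the RDS instance
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,             # MySQL / Aurora MySQL port
        "ToPort": 3306,
        "UserIdGroupPairs": [{"GroupId": "sg-0app22222222222b"}],  # app tier SG
    }],
)
```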

Access Management

  • IAM policies help control who can manage AWS RDS (through the RDS API)
  • Traditional username and password can be used to log in to the database
  • IAM-based authentication can be used to log in to RDS MySQL & PostgreSQL

RDS — IAM Authentication

  • IAM database authentication works with MySQL and PostgreSQL
  • You don’t need a password, just an authentication token obtained through IAM & RDS API calls
  • Auth token has a lifetime of 15 minutes
  • Benefits:
    Network in/out must be encrypted using SSL
    IAM to centrally manage users instead of DB
    Can leverage IAM Roles and EC2 Instance profiles for easy integration
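
A sketch of the flow for RDS MySQL, assuming the PyMySQL driver, a hypothetical endpoint, and a DB user already configured for IAM authentication:

```python
import boto3
import pymysql  # assumes a MySQL-flavored RDS instance and the PyMySQL driver

# Hypothetical endpoint and user name.
host = "my-db.abcdefgh1234.us-east-1.rds.amazonaws.com"
user = "iam_db_user"

rds = boto3.client("rds", region_name="us-east-1")

# Short-lived (15 minutes) token, used in place of a password.
token = rds.generate_db_auth_token(
    DBHostname=host, Port=3306, DBUsername=user, Region="us-east-1"
)

conn = pymysql.connect(
    host=host,
    user=user,
    password=token,
    port=3306,
    ssl={"ca": "rds-ca-bundle.pem"},  # the connection must be SSL-encrypted; CA bundle downloaded from AWS
)
```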

Amazon Aurora

  • Aurora is a proprietary technology from AWS (not open source)
  • Postgres and MySQL are both supported as Aurora DB (that means your drivers will work as if Aurora was a Postgres or MySQL database)
  • Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10 GB, up to 64 TB
  • Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
  • Failover in Aurora is instantaneous. It’s HA (High Availability) native.
  • Aurora costs more than RDS (20% more) but is more efficient

Features of Aurora

  • Automatic fail-over
  • Backup and Recovery
  • Isolation and security
  • Industry compliance
  • Push-button scaling
  • Automated Patching with Zero Downtime
  • Advanced Monitoring
  • Routine Maintenance
  • Backtrack: restore data at any point of time without using backups

Aurora Security

  • Similar to RDS because it uses the same engines
  • Encryption at rest using KMS
  • Automated backups, snapshots and replicas are also encrypted
  • Encryption in flight using SSL (same process as MySQL or Postgres)
  • Possibility to authenticate using IAM token (same method as RDS)
  • You are responsible for protecting the instance with security groups
  • You can’t SSH

Amazon ElastiCache Overview

  • In the same way that RDS gives you a managed relational database…
  • ElastiCache gives you managed Redis or Memcached
  • Caches are in-memory databases with really high performance, low latency
  • Helps reduce load off of databases for read intensive workloads
  • Helps make your application stateless
  • AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
  • Using ElastiCache involves heavy application code changes
  • Applications query ElastiCache; if the data is not there, they fetch it from RDS and store it in ElastiCache
  • Helps relieve load on RDS
  • The cache must have an invalidation strategy to make sure only the most current data is used there

Caching Implementation Considerations

  • Is it safe to cache data? Data may be out of date, eventually consistent
  • Is caching effective for that data?
    Pattern: data changing slowly, few keys are frequently needed
    Anti-patterns: data changing rapidly, all of a large key space frequently needed
  • Is data structured well for caching?
    Example: key-value caching, or caching of aggregation results

Which caching design pattern is the most appropriate?

Lazy Loading / Cache-Aside / Lazy Population — load data into the cache only when there is a cache miss

Write Through — Add or Update cache when database is updated
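
A compact sketch of both patterns, assuming a Redis-mode ElastiCache endpoint, the redis-py client, PyMySQL against a hypothetical users table in RDS, and placeholder endpoints and credentials:

```python
import json

import pymysql  # hypothetical RDS MySQL backing store
import redis    # redis-py client pointed at a Redis-compatible ElastiCache endpoint

cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com", port=6379)
db = pymysql.connect(
    host="my-db.abcdefgh1234.us-east-1.rds.amazonaws.com",
    user="app", password="change-me", database="app",
    cursorclass=pymysql.cursors.DictCursor,
)

def get_user(user_id):
    """Lazy loading / cache-aside: read the cache first, fall back to RDS on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit
    with db.cursor() as cur:                      # cache miss: query RDS
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        user = cur.fetchone()
    if user is not None:
        cache.setex(key, 3600, json.dumps(user))  # populate the cache with a 1-hour TTL
    return user

def update_user(user_id, name):
    """Write-through: update RDS, then refresh the cache in the same code path."""
    with db.cursor() as cur:
        cur.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    db.commit()
    cache.setex(f"user:{user_id}", 3600, json.dumps({"id": user_id, "name": name}))
```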

Cache Evictions and Time-to-Live (TTL)

  • Cache eviction can occur in 3 ways: you delete the item explicitly in the cache; the item is evicted because memory is full and it was not recently used (LRU); you set a time-to-live (TTL) on the item
  • TTL are helpful for any kind of data: Leaderboards, Comments, Activity streams
  • TTL can range from a few seconds to hours or days
  • If too many evictions happen due to memory, you should scale up or out

Final words of wisdom

  • Lazy loading / cache-aside is easy to implement and works for many solutions as a foundation, especially on the read side
  • Write-through is usually combined with lazy loading, targeted at the queries or workloads that benefit from this optimization
  • Setting a TTL is usually not a bad idea, except when you are using Write-through. Set it to a sensible value for your application
  • Only Cache the data that makes sense (user profiles, blogs, etc…)
