
Course Material

For
Module III
BECE355L-AWS for Cloud Computing
Course Contents
Module 1: AWS cloud concepts
Module 2: AWS core services
Module 3: AWS Database services
Module 4: AWS Security and compliance
Module 5: AWS Architectural Best practices
Module 6: AWS Operational Experiences
Module 7: AWS networking and content delivery


Topics in Module 3

Module 3: AWS Database services

AWS Lambda, Amazon DynamoDB, Amazon ECS (Elastic Container Service),
Amazon S3 Glacier, Amazon Kinesis, Amazon Redshift, Amazon EMR (Elastic
MapReduce), AWS Disaster Recovery and Backup.
TOPIC 1: AWS LAMBDA
AWS Lambda
• AWS Lambda is a newer computing offering from Amazon Web Services that provides benefits such as real-time data processing and custom back-end services.
• AWS Lambda is Amazon's serverless computing service: it runs your code and automatically manages the underlying compute resources (such as EC2).
• It is an event-driven computing service. It lets you run code automatically in response to many types of events, such as HTTP requests from Amazon API Gateway, table updates in Amazon DynamoDB, and state transitions.
• It also lets you extend other AWS services with custom logic and even build your own back-end services.
• A serverless service, in general, does not require you to provision any servers to run the application. When you run an application on serverless, you do not have to worry about setting up the operating system, patching, or scaling the servers, which you would have to consider when running your application on a physical server.
AWS Lambda
• AWS Lambda performs all the administrative duties of that compute resource, such as: maintaining the server and operating system, auto-scaling and managing capacity provisioning, handling security patch deployment, Lambda extensions, code monitoring, logging, concurrency management, and function blueprints.

• AWS Lambda is very helpful when you know how to write code but do not know how to provision the underlying infrastructure in AWS. AWS Lambda scales applications up rapidly when there is a sudden surge of incoming traffic and scales down to zero when the incoming traffic subsides.
How AWS Lambda works? Topic 1: AWS Lambda
• AWS Lambda is an event-driven, serverless computing platform provided by Amazon as part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code.
• Lambda can be triggered from over 200 AWS services, which means it has natural integrations with AWS resources and can also integrate with SaaS applications.
• Run code without provisioning or managing infrastructure. Simply write and upload code as a .zip file or container image. Lambda automatically responds to code execution requests at any scale, from a dozen events per day to hundreds of thousands per second.
• Save costs by paying only for the compute time you use, billed per millisecond, instead of provisioning infrastructure upfront for peak capacity. This is the pay-as-you-go model.
• Optimize code execution time and performance with the right function memory size. Respond to high demand in double-digit milliseconds with Provisioned Concurrency.
How AWS Lambda works ? Topic 1: AWS Lambda

Detailed Example Workflow


The infrastructure provided by AWS, including the S3 bucket
and Lambda service, operates within the AWS Cloud.
1.Image Upload:
•Action: A user uploads an image to the S3 bucket.
•Event: The upload triggers an event configured to invoke
the Lambda function.
2.Event Trigger:
•S3 Event: The S3 bucket is configured to send an event to
Lambda when a new image is uploaded.
3.Lambda Execution:
•Lambda Function: The Lambda service invokes the specified
Lambda function with the event data.
•Function Code: The function code retrieves the uploaded
image, processes it to generate a thumbnail, and uploads the
thumbnail back to a specific location in the S3 bucket.
4.Post-Processing:
•S3 Storage: The thumbnail is stored in the thumbnails/ directory of the same S3 bucket.
•Completion: The Lambda function execution completes, and the generated thumbnail is now available in the S3 bucket.
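A minimal sketch of what such a thumbnail function could look like in Python is shown below. This is not the course's official solution: the bucket layout follows the example above, and it assumes the Pillow library is packaged with the deployment (for instance as a Lambda layer).

```python
# Hypothetical sketch of the thumbnail workflow described above.
# Assumes Pillow is available in the deployment package or a layer.
import io
import boto3
from PIL import Image

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # S3 places the bucket name and object key of the uploaded image in the event.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Download the original image from S3.
    original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Resize it in memory to a 128x128 thumbnail.
    image = Image.open(io.BytesIO(original)).convert("RGB")
    image.thumbnail((128, 128))
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    buffer.seek(0)

    # Upload the thumbnail to the thumbnails/ prefix of the same bucket.
    s3.put_object(Bucket=bucket, Key=f"thumbnails/{key}", Body=buffer)
    return {"status": "thumbnail created", "key": f"thumbnails/{key}"}
```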
How AWS Lambda works? Topic 1: AWS Lambda

• Each Lambda function runs in its own container. You can think of every Lambda function as a standalone Docker-like container.
• When a function is created, Lambda packages it into a new container and then executes that container on a multi-region cluster of servers managed by AWS.
• Before the functions start running, each function's container is allocated its necessary RAM and CPU capacity; these parameters are configurable in AWS Lambda.
How AWS Lambda works? Topic 1: AWS Lambda
• When a function finishes running, the allocated RAM is multiplied by the function execution time to calculate the cost charged to the customer. In other words, customers are charged based on the allocated memory and the amount of execution time the function used.
• AWS Lambda's entire infrastructure layer is managed by AWS. Clients do not get much visibility into how the system is running, but they also do not need to know about underlying machines, network contention, and so on; AWS handles this itself.
• Using AWS Lambda can save you time on operational tasks because the service is fully managed.
• When there is no infrastructure to maintain, you can spend more time on application code and your actual business logic; the trade-off is that you give up flexibility over your infrastructure.
How AWS Lambda works ? :STEPS Topic 1: AWS Lambda
1. Upload your AWS Lambda code or write it directly in the Lambda code editor in a supported
language such as Java, Python, Go, or C#.
2. AWS Lambda helps upload code and event information to trigger it, specifying the triggering
conditions.
3. An AWS Lambda function runs in the Lambda runtime environment and is typically used for short-lived executions.
4. AWS manages the entire infrastructure layer of Lambda, selecting the best resources to execute the
code when an event occurs, allowing efficient IT infrastructure management.
5. The control plane component of Lambda consists of APIs that simplify using AWS resources for
application execution, allowing concurrent execution of multiple instances of the same or different
functions within the same AWS account.
6. Consumers are charged based on allocated memory and execution time, with pricing applicable only
when the Lambda code runs.
AWS Lambda :Key Concepts Topic 1: AWS Lambda
Function : A function in AWS Lambda is a small, self-contained piece of code
written in a supported language that performs a specific task.
• Example: An image processing function that creates thumbnails.
Event Source: An event source is a resource or service that triggers a Lambda
function to execute.
• Example: An S3 bucket event, or an API Gateway request.
Handler: The entry point for the Lambda function. It's the method or function in
your code that AWS Lambda calls to start execution.
• Example: In a Python Lambda function, the handler might be
lambda_function.lambda_handler.
Runtime: The runtime is the execution environment that AWS Lambda uses to
run your function code. It supports several languages.
• Example: Node.js, Python, Java, Go, Ruby, .NET Core.
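To make the handler concept concrete, here is a minimal sketch assuming the code lives in a file named lambda_function.py, so the configured handler string would be lambda_function.lambda_handler; the event fields shown are illustrative only.

```python
# lambda_function.py -- minimal handler sketch.
# The configured handler is "lambda_function.lambda_handler" (file name . function name).
import json

def lambda_handler(event, context):
    # "event" carries the trigger payload (for example an API Gateway request);
    # "context" exposes runtime metadata such as the remaining execution time.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"})
    }
```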
AWS Lambda :Key Concepts Topic 1: AWS Lambda
Memory Allocation: The amount of memory allocated to your function. This
affects both the performance and cost of execution.
• Example: Allocating 512 MB of memory for a Lambda function.
Execution Role: An IAM role that the function assumes during execution to
access other AWS services.
• Example: A role with permissions to access an S3 bucket and DynamoDB table.
Concurrency: Concurrency controls the number of instances of your function
that can run simultaneously.
• Example: Setting reserved concurrency to 10 to limit the function to 10
concurrent executions.
Timeout: The maximum amount of time that a Lambda function can run before it
is terminated.
• Example: Setting a timeout of 30 seconds for a Lambda function.
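As an illustration of how memory allocation, execution role, timeout, and reserved concurrency map onto the API, here is a hedged sketch using boto3; the function name, role ARN, and zip file are placeholders, not values from the course.

```python
# Sketch: configuring memory, timeout, execution role, and reserved concurrency.
# Names, ARNs, and the zip file below are placeholders.
import boto3

lambda_client = boto3.client("lambda")

with open("function.zip", "rb") as f:
    lambda_client.create_function(
        FunctionName="thumbnail-generator",
        Runtime="python3.12",
        Role="arn:aws:iam::123456789012:role/lambda-s3-access",  # execution role
        Handler="lambda_function.lambda_handler",                # entry point
        Code={"ZipFile": f.read()},
        MemorySize=512,   # memory allocation in MB
        Timeout=30,       # maximum run time in seconds
    )

# Reserved concurrency: cap this function at 10 concurrent executions.
lambda_client.put_function_concurrency(
    FunctionName="thumbnail-generator",
    ReservedConcurrentExecutions=10,
)
```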
AWS Lambda Main Features Topic 1: AWS Lambda
Lambda is a compute service that lets you run
code without provisioning or managing any
servers.
• Cost Saving with Pay-as-you-go model: Customers are charged based on the allocated memory and the amount of execution time the function used. You only pay for the compute time, and there is no charge when your code is not running.
• Event-driven Architecture with Lambda
Lambda is an on-demand compute service that
runs custom code in response to events. Most
AWS services generate events, and many can
act as an event source for Lambda.
AWS Lambda Main Features Topic 1: AWS Lambda
• Scalability and Availability
• With Lambda you can run code for virtually any type of application or backend service, all with zero administration. Upload your code, and Lambda runs and scales it automatically with high availability.
• Supports Multiple Languages and Frameworks
• Lambda has native support for a number of programming languages including Java,
Go, PowerShell, Node.js, C#, Python, and Ruby code, and provides a Runtime API that
lets you use any additional programming language to write your functions.
• It offers sample code (function blueprints) that helps users integrate Lambda without additional help.
• Database Proxy is a feature that manages a pool of database connections and relays requests from your functions to the database.
• Amazon Elastic File System integration lets functions securely read, write, and persist large amounts of data of any size.
AWS Lambda Use cases Topic 1: AWS Lambda
HTTP API
• Using AWS Lambda, you can deploy logical backends to the cloud and invoke functions simply by making HTTP calls.
• When integrating Amazon API Gateway with AWS Lambda, not only are costs minimized, but users also minimize the effort needed to operate and scale servers.
Data Processing
• For example, if an application handles a lot of data stored in Amazon DynamoDB, you can trigger Lambda functions whenever
you write, update, or delete items in that table.
• These events trigger lambda functions that will
process, analyze, and can push this data to other
AWS services such as Amazon S3 to store
results.
• In other words, users can create an entire data
processing process by combining different AWS
resources.

Amazon DynamoDB: It's designed to handle large amounts of data and provide high performance, scalability, and reliability.
AWS Lambda Use cases Topic 1: AWS Lambda
Real-time file processing
• Usually, Content Management System (CMS) applications always have a function for uploading images. The image is saved in an Amazon S3 bucket, and AWS Lambda can then be used to create an automated task.
• For example, after an image is uploaded, a trigger creates another version of the image at a lower resolution (a thumbnail) and stores it in another bucket.
Real-time stream processing
For applications with very heavy traffic, the system often uses AWS Lambda and Amazon Kinesis Data Streams to process real-time streaming data for application activity tracking, or for real-time analysis of metrics collected from many data sources such as website clickstreams, payment transactions, social media timelines, IT logs, or location-based tracking.
Topic 1: AWS Lambda
The service costs
• With AWS Lambda, you pay only for what you use. You are charged based on the number of requests for your functions and the duration (code execution time).
• Duration is calculated from the time your code starts executing until it completes or terminates, rounded up to the nearest 1 ms. The cost is based on the amount of memory allocated to the function.
• The AWS Lambda free usage tier includes 1 million free requests and 400,000 GB-seconds of compute time per month.
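A rough back-of-the-envelope calculation of how duration, memory, and the free tier combine is sketched below; the per-GB-second and per-million-request prices are indicative assumptions only, since actual prices vary by Region and architecture.

```python
# Rough monthly cost sketch (prices are assumptions; check the current
# AWS Lambda pricing page for your Region and architecture).
PRICE_PER_GB_SECOND = 0.0000166667   # assumed compute price
PRICE_PER_MILLION_REQUESTS = 0.20    # assumed request price

invocations = 20_000_000         # requests per month
avg_duration_ms = 120            # average execution time, rounded to 1 ms
memory_gb = 512 / 1024           # 512 MB allocated

gb_seconds = invocations * (avg_duration_ms / 1000) * memory_gb
compute_cost = max(gb_seconds - 400_000, 0) * PRICE_PER_GB_SECOND        # free tier: 400,000 GB-s
request_cost = max(invocations - 1_000_000, 0) / 1_000_000 * PRICE_PER_MILLION_REQUESTS

print(f"{gb_seconds:,.0f} GB-s -> compute ${compute_cost:.2f}, requests ${request_cost:.2f}")
# Roughly: 1,200,000 GB-s -> compute $13.33, requests $3.80 under these assumptions.
```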
INTRODUCTION: Amazon DynamoDB Topic 2: Amazon DynamoDB
WHY NoSQL?
• Amazon DynamoDB is a fully managed NoSQL database service provided by AWS.
• It is designed for applications that need consistent, single-digit millisecond latency at any scale.
• DynamoDB is ideal for use cases that require high performance and scalability, such as web applications, mobile backends, gaming, and IoT.
[Slide graphic contrasting online analytical processing (OLAP) with online transaction processing (OLTP) workloads.]
• For example, you are developing an online retail application similar to Amazon or eBay. This application needs to handle millions of transactions per second during peak shopping times, like the festive season. You need a database that can provide fast performance and can automatically scale to meet demand.
• Requirements: High Availability, Low Latency, Scalability, Data Model Flexibility.
INTRODUCTION: Amazon Dynamo DB Topic 2: Amazon Dynamo DB
• Non-relational databases, or NoSQL databases, differ from traditional relational databases by storing
data in non-tabular formats. These systems are highly available, scalable, and optimized for
performance. Unlike relational databases, NoSQL databases like Amazon DynamoDB use data models
such as key-value pairs and document storage for data management.
• Amazon DynamoDB is a serverless, NoSQL database service that enables you to develop modern
applications at any scale.
• As a serverless database, you only pay for what you use and DynamoDB scales to zero, has no cold
starts, no version upgrades, no maintenance windows, no patching, and no downtime maintenance.
• DynamoDB offers a broad set of security controls and compliance standards.
• DynamoDB is cloud-native in that it does not run on-premises or even in a hybrid cloud; it only runs on
Amazon Web Services (AWS).
• This enables it to scale as needed without requiring a customer’s capital investment in hardware.
• It also has attributes common to other cloud-native applications, such as elastic infrastructure
deployment (meaning that AWS will provision more servers in the background as you request
additional capacity).
• DynamoDB is NoSQL in that it does not support ANSI Structured Query Language (SQL). Instead, it
uses a proprietary API based on JavaScript Object Notation (JSON).
INTRODUCTION: Amazon DynamoDB Topic 2: Amazon DynamoDB
• It can store any amount of data and serve any amount of traffic. With DynamoDB, you can expect great performance even as it scales up. It exposes a simple, small API that follows the key-value model to store, access, and perform advanced data retrieval.
• DynamoDB is a web service, and
interactions with it are stateless.
Applications are not required to maintain
persistent network connections. Instead,
interaction with DynamoDB occurs
using HTTP(S) requests and responses.
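A minimal sketch of this key-value style of access using boto3 follows; the table name "Users", its partition key "user_id", and the item attributes are assumptions for illustration, not part of the course material.

```python
# Sketch: basic key-value access to a DynamoDB table via boto3.
# Table name and key schema ("user_id" as partition key) are assumptions.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")

# Write an item (key-value/document style; no fixed schema beyond the key).
table.put_item(Item={
    "user_id": "u-1001",
    "name": "Asha",
    "cart": ["sku-1", "sku-7"],
})

# Read it back by primary key -- a single lookup over HTTPS.
response = table.get_item(Key={"user_id": "u-1001"})
print(response.get("Item"))
```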
Features: Amazon DynamoDB Topic 2: Amazon DynamoDB
• Fully Managed: DynamoDB handles database management tasks like provisioning, patching, setup, configuration, scaling, and backups, letting you focus on application development.
• Scalable: DynamoDB allows easy scaling of tables up or down to meet application demands, avoiding overprovisioning or underprovisioning.
• High Availability and Durability: DynamoDB ensures high availability and data durability by replicating data across multiple Availability Zones within an AWS region.
• Security: DynamoDB provides encryption at rest and in transit, fine-grained access control, and integration with AWS IAM for authentication.
• Global Tables: DynamoDB supports global tables, enabling data replication across AWS regions for low-latency access worldwide.
Key Indicators: Amazon DynamoDB Topic 2: Amazon DynamoDB
DynamoDB is designed in such a way that the user can get high performance and run scalable applications that might not be possible with a conventional database system.
• On-demand capacity mode: For applications using the on-demand mode, DynamoDB automatically scales up/down to accommodate the traffic.
• Built-in support for ACID (Atomicity, Consistency, Isolation, and Durability) transactions: DynamoDB provides native, server-side support for transactions.
• On-demand backup: This feature allows you to make an entire backup of your table at any given point in time.
• Point-in-time recovery: This feature helps protect your data in case of accidental write or delete operations.
• Encryption at rest: It keeps the information encrypted even when the table is not in use. This enhances security with the assistance of encryption keys.
Key Indicators: Amazon Dynamo DB Topic 2: Amazon Dynamo DB
Serverless performance with limitless scalability
• Supports key-value and document data models with a flexible schema
• Serverless, with no servers to manage and automatic scaling to zero
• Pay-as-you-go pricing with zero administration and downtime maintenance
• Supports ACID transactions for mission-critical workloads, ensuring:
Atomicity-Consistency-Isolation- Durability.
Active-active replication with global tables
• Provides active-active replication across AWS Regions with 99.999% availability.
• Enables multi-active reads and writes from any replica.
• Offers single-digit millisecond performance with local access in selected Regions.
• Automatically scales capacity for multi-Region workloads.
• Enhances multi-Region resiliency and business continuity.
Key Indicators: Amazon Dynamo DB Topic 2: Amazon Dynamo DB
• Amazon DynamoDB Streams:
• Captures changes to DynamoDB table items in near-real time.
• Records create, update, and delete events in a time-ordered sequence.
• Ideal for event-driven architecture applications.
• Stores changes for 24 hours, with de-duplication and exactly-once delivery.
• Allows applications to access and view item changes before and after modification.
• Security and Reliability:
• Encryption: Default encryption at rest using AWS KMS keys, with optional attribute-level
encryption.
• Access Control: Fine-grained access control using IAM policies, resource-based policies, and
conditions.
• Point-in-time Recovery: Continuous backups for up to 35 days, with restore capability to any point in
time.
• On-demand Backup and Restore: Full backups for data archiving, with AWS Backup integration.
• Private Network Connectivity: Supports VPC endpoints and AWS PrivateLink for private
connections
How it works: Amazon Dynamo DB Topic 2: Amazon Dynamo DB

• Configuring Key Features: Configure core features such as global tables for multi-region replication,
encryption at rest for security, and on-demand capacity mode for automatic scaling based on demand.
• Data Management: Use NoSQL Workbench to design and visualize tables, and PartiQL for SQL-like
querying. Enable point-in-time recovery to back up and restore data as needed.
• Integrating with AWS Services: DynamoDB integrates with other AWS services seamlessly. Export
data to Amazon S3 for backup. Use AWS Glue Elastic Views for materialized views and Amazon Kinesis
Data Streams for real-time processing. Monitor activities and performance using AWS CloudTrail and
Amazon CloudWatch.
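As an example of the PartiQL-style querying mentioned above, here is a small sketch using the low-level boto3 client; the table and attribute names are the same hypothetical "Users" table used earlier.

```python
# Sketch: SQL-like access to DynamoDB with PartiQL via the low-level client.
# Table and attribute names are hypothetical.
import boto3

client = boto3.client("dynamodb")

# Insert and query items with PartiQL statements.
client.execute_statement(
    Statement="INSERT INTO Users VALUE {'user_id': 'u-2002', 'name': 'Ravi'}"
)
result = client.execute_statement(
    Statement="SELECT * FROM Users WHERE user_id = ?",
    Parameters=[{"S": "u-2002"}],
)
print(result["Items"])
```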
Auto Scaling: Amazon DynamoDB Topic 2: Amazon DynamoDB
• Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling service to automatically adjust provisioned throughput capacity on your behalf in response to actual traffic patterns.
• This lets a table or global secondary index increase its provisioned read and write capacity to handle sudden increases in traffic without throttling.
• Application Auto Scaling reduces throughput when the workload falls, so that you are not charged for unused provisioned capacity.
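A sketch of how this can be wired up programmatically through Application Auto Scaling follows; the table name, capacity limits, and target utilization are illustrative values.

```python
# Sketch: registering a DynamoDB table's read capacity with Application
# Auto Scaling and attaching a target-tracking policy. Values are illustrative.
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Users",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

autoscaling.put_scaling_policy(
    PolicyName="users-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/Users",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # keep consumed/provisioned capacity near 70%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```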
Use cases: Amazon Dynamo DB Topic 2: Amazon Dynamo DB
Develop software applications
• Build internet-scale applications supporting user-content metadata and caches that require
high concurrency and connections for millions of users and millions of requests per second.
Create media metadata stores
• Scale throughput and concurrency for media and entertainment workloads such as real-time
video streaming and interactive content, and deliver lower latency with multi-Region
replication across AWS Regions.
Deliver seamless retail experiences
• Use design patterns for deploying shopping carts, workflow engines, inventory tracking, and
customer profiles. DynamoDB supports high-traffic, extreme-scaled events and can handle
millions of queries per second.
Scale gaming platforms
• Focus on driving innovation with no operational overhead. Build out your game platform
with player data, session history, and leaderboards for millions of concurrent users.
• What is Amazon DynamoDB used for?
• Amazon DynamoDB is a NoSQL database designed for high-performance
applications at any scale. It offers high read/write throughput with single-
digit millisecond performance and limitless scalability across multiple
regions, supporting horizontal scaling and fully managed automation.
• What are the advantages of Amazon DynamoDB?
• Amazon DynamoDB's key advantages include being a fully managed, scale-
to-zero serverless database with single-digit millisecond performance and up
to 99.999% availability. It provides consistent performance at scale, built-in
security, durability, and reliability for global applications.
• What are the main benefits of using Amazon DynamoDB?
• Amazon DynamoDB is a fully managed, serverless NoSQL database
offering limitless scalability, active-active data replication for multi-region
resiliency, and consistent single-digit millisecond response times. It is easy
to start and use, ideal for demanding applications.
Amazon ECS (Elastic
Container Service)
Introduction: Amazon ECS Topic 3: Amazon ECS (Elastic Container Service)
1. Container:
• Definition: A container is a lightweight, standalone, and executable software package that
includes everything needed to run a piece of software, including the code, runtime, system
tools, libraries, and settings.
• Key Feature: Containers are isolated from each other and the host system, which ensures
that the software will run consistently across different environments.
• Example: Think of a container as a virtualized package that can run your application on any
machine without worrying about the underlying hardware or operating system.
2. Docker:
• Definition: Docker is a platform that allows you to develop, ship, and run applications inside
containers. It simplifies the process of building and deploying containers.
• Key Feature: Docker provides tools to create, manage, and deploy containers efficiently.
• Example: Docker allows developers to package their application into a Docker container on
their laptop, test it, and then deploy the same container to production without worrying about
differences in environments.
Introduction: Amazon ECS Topic 3: Amazon ECS (Elastic Container Service)

• Amazon Elastic Container Service (ECS), also known as Amazon EC2 Container Service, is a managed service that allows users to run Docker-based applications packaged as containers across a cluster of EC2 instances.
• Running a simple container on a single EC2 instance is easy, but running these applications on a cluster of instances, and managing that cluster, is an administratively heavy process.
• With the ECS Fargate launch type (a serverless compute engine that allows you to run containers without managing the underlying infrastructure), the load and responsibility of managing the EC2 cluster is transferred to AWS, and you can focus on application development rather than the management of your cluster architecture.
• AWS Fargate is the AWS service that allows ECS to run containers without having to manage and provision the resources required for running these applications.
• ECS integrates deeply with the AWS environment to provide an easy-to-use solution for running container workloads in the cloud and on premises, with advanced security features, using Amazon ECS Anywhere.
Introduction: Amazon ECS Topic 3: Amazon ECS (Elastic Container Service)
Amazon ECS terminology and components
• There are three layers in Amazon ECS:
• Capacity - The infrastructure where your containers run.
• Controller - Deploys and manages the applications that run on the containers.
• Provisioning - The tools that you can use to interface with the scheduler to deploy and manage your applications and containers.
Introduction: Amazon ECS Topic 3: Amazon ECS (Elastic Container Service)
Amazon ECS capacity
Amazon ECS capacity is the infrastructure where your containers run. The following is an overview of the capacity options:
• Amazon EC2 instances in the AWS cloud.
You choose the instance type and the number of instances, and you manage the capacity.
• Serverless (AWS Fargate) in the AWS cloud.
Fargate is a serverless, pay-as-you-go compute engine. With Fargate you don't need to manage servers, handle capacity planning, or isolate container workloads for security.
• On-premises virtual machines (VM) or servers.
Amazon ECS Anywhere provides support for registering an external instance, such as an on-premises server or virtual machine (VM), to your Amazon ECS cluster.
How Elastic Container Service Works? Topic 3: Amazon ECS (Elastic Container Service)
• Amazon Elastic Container Service is a fully managed service provided by AWS. It is mainly used to deploy Docker-based containers, which you can scale up and down depending on the traffic you are going to get. The containers run inside Amazon Elastic Compute Cloud (EC2) instances.
1.Container: A container is a package
that includes an application and all its
dependencies, making it independent of
the underlying operating system. This
ensures portability, flexibility, and
scalability, allowing the application to
run consistently across different
environments.
2.Docker: Docker is software that
facilitates and automates the installation
and deployment of applications
inside Linux containers.
3.Cluster: A logical group of EC2 instances running as a single application.
4.Container Instance: Each EC2 in an
ECS Cluster is called a container
instance.
How Elastic Container Service Works? Topic 3: Amazon ECS (Elastic Container Service)

• Developers Role: Developers begin by defining their applications, including specifying the
required resources such as compute power, storage, and networking. They also provide
container images that encapsulate the application and its dependencies.
How Elastic Container Service Works? Topic 3: Amazon ECS (Elastic Container Service)
• Operators' Role: Operators then create and customize scaling and capacity rules to manage how
applications should scale based on demand. They also set up monitoring and logging to observe
application performance and health.
• Amazon ECS Management: Amazon ECS takes over to manage the application lifecycle. It handles:
• Configuration: Automatically integrates with AWS services such as Elastic Load Balancing for
traffic distribution, AWS Secrets Manager for sensitive data management, and Amazon Elastic File
System for shared storage.
• Deployment and Scaling: Deploys applications using various compute options:
• AWS Fargate: A serverless compute engine that abstracts away the underlying infrastructure
management.
• Amazon EC2: Provides instances optimized for various workloads, including EC2 Graviton and Intel-
based options.
• Amazon ECS Anywhere: Extends ECS to on-premises servers.
• AWS Outposts, Local Zones, and Wavelength: Offers deployment flexibility for local and edge
computing needs.
• Container Images: Container images are built and stored using Amazon Elastic Container Registry
(ECR) or other container registries. ECS uses these images to launch and run containers as specified by
the developers.
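To make the task-definition and deployment flow concrete, here is a hedged boto3 sketch; the cluster name, image URI, role ARN, subnet and security group IDs are placeholders rather than values from the course.

```python
# Sketch: registering a task definition and running it on Fargate.
# Cluster name, image URI, role ARN, subnet and security group IDs are placeholders.
import boto3

ecs = boto3.client("ecs")

task_def = ecs.register_task_definition(
    family="web-app",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",      # 0.25 vCPU
    memory="512",   # MB
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "web",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest",
        "portMappings": [{"containerPort": 80}],
        "essential": True,
    }],
)

# Launch one task on Fargate using the registered task definition.
ecs.run_task(
    cluster="demo-cluster",
    launchType="FARGATE",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-0abc1234"],
        "securityGroups": ["sg-0abc1234"],
        "assignPublicIp": "ENABLED",
    }},
)
```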
Use cases Topic 3: Amazon ECS (Elastic Container Service)

Microservices Architecture
• Use Case: An e-commerce platform with different services like user
management, payment processing, product catalog, and order management.
• How ECS Helps: Each service can be deployed as a separate container,
allowing independent scaling, updates, and management. ECS enables
orchestrating these containers, ensuring smooth communication between
services and automatic scaling based on demand.
Batch Processing
• Use Case: A financial institution needs to run intensive batch processing jobs
like data transformation, report generation, or fraud detection.
• How ECS Helps: Batch jobs can be containerized and run on ECS. The
service can automatically scale up the number of containers to process large
data sets quickly and scale down when the tasks are completed, optimizing
resource usage.
Use cases Topic 3: Amazon ECS (Elastic Container Service)
Web Application Hosting
• Use Case: A media company needs to host a high-traffic website with
dynamic content and media streaming capabilities.
• How ECS Helps: The web application and media streaming services can be
containerized and deployed on ECS. ECS ensures that the application scales
to handle traffic spikes and balances the load across multiple containers.
Gaming Backend Services
• Use Case: A gaming company needs a scalable backend to handle
matchmaking, player data storage, and real-time analytics.
• How ECS Helps: ECS can run backend services in containers that
automatically scale based on player activity. This ensures that the gaming
experience is smooth and responsive even during peak times.
Features Topic 3: Amazon ECS (Elastic Container Service)
• Scheduling: Schedulers place containers over clusters according to the desired resources -- such as RAM or CPU -- and availability requirements. This feature can be used to schedule batch jobs and long-running applications or services.
• Amazon ECS includes two schedulers to deploy containers based on
computing needs or availability requirements. AWS Blox, an open source
container orchestration tool, integrates with ECS to schedule containers.
Long-running applications and batch jobs benefit from the use of schedulers
for their responsiveness; ECS also supports third-party scheduling options.
• Docker integration: Amazon ECS supports Docker, which enables AWS
users to manage Docker containers across clusters of Amazon EC2 instances.
Each EC2 instance in a cluster runs a Docker daemon that deploys and runs
any application packaged as a container locally on Amazon ECS without the
need to make any changes to the container.
Features Topic 3: Amazon ECS (Elastic Container Service)
• Networking: Amazon ECS supports Docker networking, as well as integration
with Amazon Virtual Private Cloud (Amazon VPC), to provide isolation for
containers. This provides developers with control over how the containers
interact with other services and external traffic. There are four networking
modes available for the containers; each one supports different use cases. The
modes include:
• Host mode: Adds containers directly to the host's network stack and exposes
containers on the network that are not isolated.
• Task networking mode: Assigns every running Amazon ECS task a dedicated
elastic networking interface which provides the containers with full networking
features in Amazon VPC similar to EC2 instances.
• None mode: Deactivates external networking for containers.
• Bridge mode: Creates a Linux bridge to connect all containers operating on the
host in a local virtual network and accessed through the host's default network
connection.
Features Topic 3: Amazon ECS (Elastic Container Service)
• Cluster Management: Amazon ECS automates cluster management, including installation, operation, scaling, and monitoring. Developers only need to launch a cluster of container instances and specify the tasks to perform.
• Tasks: Tasks are JSON templates where developers specify container requirements
(memory, CPU, Docker images, and data volumes) and their connections. This also allows
version control of application specifications.
• Load Balancing: Amazon ECS integrates with AWS ELB (Elastic Load Balancing) to
distribute traffic across containers, automatically managing container addition and removal
based on the specified Task and ELB.
• Repository Support: Developers can use any third-party or private Docker registry,
including Docker Hub, as specified in the Task.
• Local Development: The AWS CLI (Command Line Interface) simplifies local
development by setting up an ECS cluster and related resources and supports Docker
Compose for managing multi-container applications.
Features Topic 3: Amazon ECS (Elastic Container Service)
• Programmatic Control: APIs enable management of Amazon ECS by creating/deleting clusters, launching/destroying containers, and accessing cluster status. AWS CloudFormation can also deploy ECS clusters and manage Tasks.
• Logging: Amazon CloudWatch Logs collects ECS agent and Docker container logs for
troubleshooting, while AWS CloudTrail records all ECS API calls and delivers log files to
users.
• Monitoring: Monitoring tools track container and cluster health, including average CPU and
memory utilization, with insights provided by Amazon CloudWatch. Users can set CloudWatch
alarms to get notifications for scaling needs.
• Container Deployments: ECS automatically starts new containers with updated images and
disables old versions, managing container registration with AWS ELB.
• Container Auto-Recovery: The ECS service scheduler recovers unhealthy containers to
ensure consistent application support.
• Container Security: EC2 instances in Amazon VPC can be selectively exposed to the internet,
and ECS tasks use IAM roles with security groups and network ACLs to control access.
Benefits Topic 3: Amazon ECS (Elastic Container Service)
Amazon ECS is ideal for small, cross-functional teams due to its easy setup and fast
deployment. As a fully managed AWS service, it lets teams focus on app migration rather
than platform issues.
• Improved Security: Amazon ECR and ECS enhance application security together.
• Cost Efficient: ECS allows high-density container scheduling on Amazon EC2 nodes.
• Performance at Scale: ECS can launch thousands of Docker containers quickly and
efficiently.
• Improved Compatibility: Container-based pipelines minimize deployment issues across
environments.
• Designed for Collaboration: ECS integrates with AWS services like ECR and ELB for a
complete containerized application solution.
• Manageable at Any Scale: ECS eliminates the need for cluster management, allowing
developers to focus on their applications.
• Extensible: ECS provides full visibility and control, easily integrating with or extending
through APIs.
Use Cases Topic 3: Amazon ECS (Elastic Container Service)
• Modernize applications
• Empower developers to build and deploy applications with enhanced security
features in a fast, standardized, compliant, and cost-efficient manner with Amazon
ECS.
• Automatically scale web applications
• Automatically scale and run web applications in multiple Availability Zones with the
performance, scale, reliability, and availability of AWS.
• Support batch processing
• Plan, schedule, and run batch computing workloads across AWS services, including
Amazon Elastic Compute Cloud (EC2), AWS Fargate, and Amazon EC2 Spot
Instances.
• Train NLP and AI/ML models
• Train natural language processing (NLP) and other artificial intelligence (AI) /
machine learning (ML) models without managing the infrastructure by using
Amazon ECS with AWS Fargate.
Amazon S3 Glacier
Introduction: Amazon S3 Glacier Topic 3: Amazon S3 Glacier
AWS provides a variety of storage services tailored to different needs, including:
• Highly Confidential Data: Services for secure storage.
• Frequently Accessed Data: Options for high-performance access.
• Infrequently Accessed Data: Cost-effective storage for less frequent use.
• AWS Glacier is a low-cost, long-term storage service designed for backups and archival.
• It offers durable, secure storage at a fraction of the cost of AWS S3, with prices as low as about $1 per terabyte per month for the deepest archive tier ($0.00099 per GB-month).
• Glacier interacts with S3 through S3 lifecycle policies but differs primarily in its cost structure, making it ideal for infrequent access and archival needs.
Introduction Topic 3: Amazon S3 Glacier

• Amazon S3 Glacier storage classes are designed for data archiving, offering high performance, flexible retrieval, and low-cost storage. They provide virtually unlimited scalability and 99.999999999% (11 nines) data durability, with options for fast access and minimal cost.
• You can choose from three S3 Glacier archive storage classes based on access
needs and cost:
• S3 Glacier Instant Retrieval: For immediate access to data (e.g., medical
images), with millisecond retrieval and the lowest storage cost.
• S3 Glacier Flexible Retrieval: For data not needing immediate access but
flexible retrieval (e.g., backups), offering retrieval in minutes or free bulk
retrievals in 5-12 hours.
• S3 Glacier Deep Archive: For long-term storage (e.g., compliance archives)
with the lowest cost and retrieval within 12 hours.
How does it work Topic 3: Amazon S3 Glacier

• The Amazon S3 Glacier storage classes are purpose-built for data archiving,
providing you with the highest performance, most retrieval flexibility, and the
lowest cost archive storage in the cloud. You can now choose from three
archive storage classes optimized for different access patterns and storage
duration.
How does it work Topic 3: Amazon S3 Glacier
• S3 Glacier Instant Retrieval offers up to 68% lower storage costs compared to S3 Standard-
IA for long-lived data accessed quarterly.
• It is ideal for rarely accessed, performance-sensitive use cases like image hosting, medical
imaging, and news media. It provides 99.999999999% (11 nines) durability and 99.9%
availability, with data stored redundantly across multiple AWS Availability Zones.
• Storage costs are lower than S3 Standard, but retrieval costs are slightly higher.
How does it work Topic 3: Amazon S3 Glacier
• S3 Glacier Flexible Retrieval offers up to 10% lower storage costs compared to S3 Glacier
Instant Retrieval for archive data accessed 1-2 times per year.
• It provides flexible retrieval options, balancing cost and access times from minutes to hours,
with free bulk retrievals.
• This class is ideal for backups, disaster recovery, and offsite storage, offering 99.999999999% (11 nines) durability and 99.99% availability, with data stored redundantly across multiple AWS Availability Zones.
How does it work Topic 3: Amazon S3 Glacier

• S3 Glacier Deep Archive offers the lowest storage cost, up to 75% less than S3 Glacier
Flexible Retrieval, for data accessed less than once per year.
• At $0.00099 per GB-month, it's a cost-effective alternative to on-premises tape. Suitable for long-term data retention (7-10 years), it provides 99.999999999% (11 nines) durability and 99.99% availability, with data stored across multiple AWS Availability Zones.
AWS Glacier Terminology Topic 3: Amazon S3 Glacier
1. Vaults: Vaults are virtual
containers that are used to store
data. Vaults in AWS Glacier are
similar to buckets in S3.
• Each Vault has its specific access
policies (Vault lock/access policies).
Thus providing you with more
control over who has what kind of
access to your data.
• Vaults are region-specific.
• 2. Archives: Archives are the
fundamental entity type stored in
Vaults. Archives in AWS Glacier are
similar to Objects in S3. Virtually
you have unlimited storage capacity
on AWS Glacier and hence, can store
an unlimited number of archives in a
vault.
AWS Glacier Terminology Topic 3: Amazon S3 Glacier
• 3. Vault Access Policies: In addition to the
basic IAM controls AWS Glacier offers Vault
access policies that help managers and
administrators have more granular control of
their data.
• Each vault has its own set of Vault Access
Policies.
• If either the Vault Access Policy or the IAM control does not pass for some user action, the user is not automatically declared unauthorized; access can still be granted if the other policy allows it.
• 4. Vault Lock Policies: Vault lock policies are exactly like Vault access policies, but once set, they cannot be changed.
• Specific to each vault.
• This helps you with data compliance controls.
For example- Your business administrators
might want some highly confidential data to be
only accessible to the root user of the account.
Vault lock policy for such a use case can be
written for the required vaults.
Features of AWS Glacier Topic 3: Amazon S3 Glacier
• Given the extremely cheap storage provided by AWS Glacier, it doesn't provide as many features as AWS S3, and access to data in AWS Glacier is an extremely slow process.
• Just like S3, AWS Glacier can essentially store all kinds of data types and objects.
• Durability: AWS Glacier, just like Amazon S3, is designed for 99.999999999% (11 nines) durability, so the chance of losing data stored in these services is vanishingly small. AWS Glacier replicates data across multiple Availability Zones to provide this high durability.
• Data Retrieval Time: Data retrieval from AWS Glacier can take from 1-5 minutes (high-cost retrieval) to 5-12 hours (cheap data retrieval).
Features of AWS Glacier Topic 3: Amazon S3 Glacier
• AWS Glacier Console: The AWS Glacier dashboard is not as intuitive and friendly as the AWS S3 console. The Glacier console can only be used to create vaults; data transfer to and from AWS Glacier can only be done programmatically. This functionality is provided via the AWS Glacier API and the AWS SDKs.
Features of AWS Glacier Topic 3: Amazon S3 Glacier
• Security:
• AWS Glacier automatically encrypts
your data using the AES-256
algorithm and manages its keys for
you.
• Apart from normal IAM controls
AWS Glacier also has resource
policies (vault access policies and
vault lock policies) that can be used
to manage access to your Glacier
vaults.
• Infinite Storage Capacity: AWS Glacier offers virtually unlimited storage capacity.
Data Transfer In Glacier Topic 3: Amazon S3 Glacier
• 1. Data Upload
• Data can be uploaded to AWS Glacier by creating a vault from the
Glacier console and using one of the following methods:
• Write code that uses AWS Glacier SDK (Software Development Kit) to
upload data.
• Write code that uses AWS Glacier API to upload data.
• S3 Lifecycle policies: S3 lifecycle policies can be set to move S3 objects to AWS Glacier after some time. This can be used to back up old and infrequently accessed data stored in S3. A sketch of such a lifecycle rule follows below.
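The following is a hedged sketch of such a lifecycle rule set through boto3; the bucket name, prefix, and day thresholds are illustrative choices, not values from the course.

```python
# Sketch: lifecycle rule that moves objects to Glacier storage classes over time.
# Bucket name, prefix, and day thresholds are illustrative.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",
    LifecycleConfiguration={"Rules": [{
        "ID": "archive-old-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 90,  "StorageClass": "GLACIER"},       # Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # Deep Archive
        ],
    }]},
)
```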
• 2. Data Transfer Between Regions
• AWS Glacier is a region-specific service. Data in one region can be transferred to another from the AWS console. The cost of such a data transfer is $0.02 per GB.
Data Transfer In Glacier Topic 3: Amazon S3 Glacier
• 3. Data Retrieval
• As mentioned before, AWS Glacier is a backup and data archive service; given its low cost of storage, AWS Glacier data is not readily available for immediate consumption.
• Data retrieval from Glacier can only be done via some sort of code, using AWS Glacier SDK or the
Glacier API.
• Data Retrieval in AWS Glacier is of three types:
• Expedited:
• This mode of data retrieval is only suggested for urgent requirements of data.
• A single expedited retrieval request can only be used to retrieve 250MB of data at max.
• This data is then provided to you within 1-5 minutes.
• The cost of expedited retrieval is $0.03 per GB and $0.01 per request.
• Standard:
• This data retrieval mode can be used for any size of data, full or partial archive.
• This data is then provided to you within 3-5 hours.
• The cost of standard retrieval is $0.01 per GB and $0.05 per 1000 requests.
• Bulk:
• This data retrieval is suggested for mass retrieval of data (petabytes of data).
• It is the cheapest data retrieval option offered by AWS Glacier.
• This data is then provided to you within 5-12 hours.
• The cost of bulk retrieval is $0.0025 per GB and $0.025 per 1,000 requests.
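For data archived through the S3 Glacier storage classes, retrieval is requested with a restore call; a hedged sketch is below (vault archives use the Glacier API's initiate-job operation instead). The bucket, key, and tier values are placeholders.

```python
# Sketch: asking S3 to restore an archived object using the Bulk tier.
# Bucket and key are placeholders; Tier can be 'Expedited', 'Standard', or 'Bulk'.
import boto3

s3 = boto3.client("s3")

s3.restore_object(
    Bucket="my-archive-bucket",
    Key="logs/2020/app.log.gz",
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)

# Poll the restore status via head_object; the "Restore" header reports progress.
status = s3.head_object(Bucket="my-archive-bucket", Key="logs/2020/app.log.gz")
print(status.get("Restore"))
```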
Benefits Topic 3: Amazon S3 Glacier

• Lower Cost: Glacier is intended to be Amazon's most affordable storage class. Compared to other Amazon storage services, this allows an organization to store large amounts of data at a lower cost.
• Upholds archive database: An organization does not have to maintain its own archival database. Administrative duties, including hardware and capacity planning, are handled by AWS.
• Durability: Glacier stores data redundantly across at least three AWS Availability Zones at any one time, so data can be restored more easily in the event of a loss.
• Scalability: Businesses are able to adjust the amount of data they store as needed. Businesses
have a choice between bulk, standard, and expedited retrievals.
Amazon Glacier vs Amazon S3 Topic 3: Amazon S3 Glacier

Feature                    | Amazon Glacier                                                          | Amazon S3
Use Case                   | Long-term archival of infrequently accessed data.                      | Frequently accessed data with low-latency requirements.
Storage Cost               | Lower storage costs, making it economical for long-term retention.     | Generally higher storage costs, suitable for actively used data.
Durability                 | High durability with redundant storage across multiple facilities.     | High durability with multiple storage classes, each with its own redundancy strategy.
Retrieval Time             | Longer retrieval times (Standard, Expedited, Bulk options available).  | Near-instantaneous retrieval for frequently accessed data.
Lifecycle Policies         | Supports lifecycle policies for automated data management.             | Supports lifecycle policies but focused on different storage classes.
Data Transfer Acceleration | Supports AWS Transfer Acceleration for faster data uploads.            | Supports AWS Transfer Acceleration for faster data uploads.
Security                   | Robust security features, including encryption at rest and in transit. | Robust security features, including encryption at rest and in transit.
Amazon Kinesis
Amazon Kinesis
• Amazon Kinesis is a service provided by Amazon Web Service that allows users to process a
large amount of data (which can be audio, video, application logs, website clickstreams, and
IoT telemetry) per second in real time.
• In today's scenario, handling large amounts of data has become very important, and there is an entire field, known as Big Data, devoted to how to process and handle streams of large amounts of data.
• So Amazon came up with a solution known as Amazon Kinesis which is fully managed and
automated and can handle the real-time large streams of data with ease.
• It allows users to collect, store, capture, and process many logs from distributed streams such
as social media feeds.
• It makes users focus on the development by taking any amount of data from any source to
process it.
• After processing all the data, Kinesis also distributes all the data to the consumers
simultaneously.
Amazon Kinesis
• Amazon Kinesis is used to analyze streaming data and process it for further use at large scale.
• It is fully managed by Amazon itself, so it is easy to capture, process, and store streaming data in the cloud.
• Kinesis Data Streams can be used for rapid and continuous data intake and
aggregation. The type of data used can include IT infrastructure log data, application
logs, social media, market data feeds, and web clickstream data. Because the response
time for the data intake and processing is in real time, the processing is typically
lightweight.
• It is very useful for the developers to build an application by which they can
continuously ingest process and analyze data streams from various sources as
mentioned below
• Application and service logs
• Clickstream data
• Sensor data
• In-app user events
How Amazon Kinesis Works?
• Data Ingestion
• Amazon Kinesis collects or receives data from different data sources such as applications, sensors, and so on. The data received from the different sources can be in different formats, such as JSON or binary, and can include data from real-time applications.
• Sharding and Scaling
• The stream is divided into smaller parts called shards; the data received from the different sources is distributed across shards for redundancy and fault tolerance. There is effectively no limit on the number of shards, and Amazon Kinesis can scale shards horizontally depending on the requirement.

Example: Imagine an e-commerce platform processing millions of transactions per second. Kinesis Data Streams can ingest this data, allowing real-time analytics like identifying trending products.
How Amazon Kinesis Works?
• Processing and Buffering
• After sharding, the data is prepared for further use; for example, filtering or record aggregation may be applied before the data is stored.
• Making the data accessible
• After completing all the steps mentioned above, the data needs to be made accessible. Kinesis offers various ways to access and utilize your data stream.
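A small sketch of a producer pushing records into a data stream is shown below; the stream name "clickstream" and the payload fields are hypothetical.

```python
# Sketch: a producer writing clickstream events into a Kinesis data stream.
# Stream name and payload fields are hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis")

event = {"user_id": "u-1001", "action": "add_to_cart", "sku": "sku-7"}

kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],   # determines which shard receives the record
)
```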
Types of Services Offered by Amazon Kinesis
There are three types of services which Amazon Kinesis offers:
1. Kinesis Data Firehose
2. Kinesis Analytics
3. Kinesis Data Streams
Types of Services Offered by Amazon Kinesis
• Kinesis Firehose
Firehose allows users to load, and optionally transform, their streams of data into AWS services for later use, such as analysis or storage. It does not require continuous management, as it is fully automated and scales automatically according to the data.
• Example: Stream IoT sensor data to Amazon S3 for storage or analysis.
• Kinesis Analytics
Kinesis Analytics allows the streams of data delivered by Kinesis Firehose to be analyzed and processed with standard SQL. It detects the data format, automatically parses the data, and provides an interactive schema editor to adjust the recommended schema. It also provides pre-built stream processing templates so that users can select a suitable template for their data analytics.
Example: Analyze log data from applications to detect patterns and anomalies as they happen.
Types of Services Offered by Amazon Kinesis
• Kinesis Streams
Kinesis Data Streams provides a platform for real-time and continuous processing of data. It can also encrypt sensitive data using KMS master keys and server-side encryption for security purposes. The architecture of Amazon Kinesis looks somewhat like the image below.

Example: Imagine an e-commerce platform processing millions of transactions per second. Kinesis Data Streams can ingest this data, allowing real-time analytics like identifying trending products.
• Data Streams is for custom real-time processing of data.
• Firehose is for automatically delivering streaming data to specific destinations.
• Data Analytics is for real-time SQL-based analysis of streaming data.
Amazon Kinesis Working
Using Amazon Kinesis, real-time data such as audio, video, website clickstreams, application logs, and IoT telemetry can be ingested for artificial intelligence, machine learning, and other analytics applications. Amazon Kinesis also helps process and analyze data as it arrives, so you can respond instantly without having to wait for the entire collection of data before processing can begin.

Note: Amazon Redshift is a fully managed data warehouse service provided by AWS.
AMAZON KINESIS DATA STREAMS
Amazon Kinesis is a serverless streaming data service that simplifies the
capture, processing and storage of data streams at any scale
This architecture demonstrates
how Amazon Kinesis can ingest
and process large volumes of
real-time data securely and
reliably across multiple
availability zones. It supports a
variety of use cases from simple
data archiving to complex real-
time analytics, machine learning,
and large-scale batch processing.
The modular nature of Kinesis
allows it to be integrated into
diverse systems to provide
timely and actionable insights.
Features of Amazon Kinesis
• Cost-efficient: All the services provided by Amazon are cost-efficient, as Kinesis follows the pay-as-you-go model, which means you pay for the service according to usage rather than a flat price. This is advantageous for users because they pay only for what they use.
• Integrates with other AWS services: Amazon Kinesis integrates with other AWS services, such as Amazon DynamoDB, Amazon Redshift, and other services that deal with large amounts of data.
• Availability: You can access it from anywhere and at any time; you just need good network connectivity.
• Real-time processing: It allows you to work on data that must be updated instantaneously as changes arrive. This is the most advantageous feature of Kinesis, because real-time processing becomes important when you are dealing with such a huge amount of data.
TERMINOLOGIES - AMAZON KINESIS
• Kinesis Data Streams
• A Kinesis data stream is a set of shards (pieces). Each shard has a sequence of data records. Each data
record has a sequence number that is assigned by Kinesis Data Streams
• Data Record
• A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes. Kinesis Data Streams does not inspect, interpret, or change the data in the blob in any way. A data blob can be up to 1 MB.
• Capacity Mode
• A data stream capacity mode determines how capacity is managed and how you are charged for the
usage of your data stream. Currently, in Kinesis Data Streams, you can choose between an on demand
mode and a provisioned mode for your data streams.
• With the on-demand mode, Kinesis Data Streams automatically manages the shards in order to
provide the necessary throughput. You are charged only for the actual throughput that you use and
Kinesis Data Streams automatically accommodates your workloads’ throughput needs as they ramp up
or down.
• With the provisioned mode, you must specify the number of shards for the data stream. The total
capacity of a data stream is the sum of the capacities of its shards. You can increase or decrease the
number of shards in a data stream as needed and you are charged for the number of shards at an hourly
rate.
TERMINOLOGIES - AMAZON KINESIS DATA STREAMS
• Retention Period
• The retention period is the length of time that data records are accessible after they are
added to the stream. A stream’s retention period is set to a default of 24 hours after
creation. You can increase the retention period up to 8760 hours (365 days) using the
IncreaseStreamRetentionPeriod operation, and decrease the retention period down to a
minimum of 24 hours using the DecreaseStreamRetentionPeriod operation. Additional
charges apply for streams with a retention period set to more than 24 hours
• Producer
• Producers put records into Amazon Kinesis Data Streams. For example, a web server
sending log data to a stream is a producer
• Consumer
• Consumers get records from Amazon Kinesis Data Streams and process them. These
consumers are known as Amazon Kinesis Data Streams Application.
Amazon Kinesis Data Streams Application
• An Amazon Kinesis Data Streams application is a consumer of a stream that
commonly runs on a fleet of EC2 instances.
• There are two types of consumers that you can develop: shared fan-out
consumers and enhanced fan-out consumers
AMAZON KINESIS DATA STREAMS
The producers continually push data to Kinesis Data Streams, and the consumers process the
data in real time. Consumers (such as a custom application running on Amazon EC2 or an
Amazon Data Firehose delivery stream) can store their results using an AWS service such as
Amazon DynamoDB, Amazon Redshift, or Amazon S3
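Below is a minimal polling-consumer sketch using the low-level API; in practice the Kinesis Client Library or a Lambda event source mapping is more common, and the stream name and single-shard handling here are simplifying assumptions.

```python
# Sketch: a simple polling consumer reading from one shard of a stream.
# Real applications typically use the Kinesis Client Library instead.
import time
import boto3

kinesis = boto3.client("kinesis")
stream = "clickstream"

# Read from the first shard only (a real consumer would iterate over all shards).
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in batch["Records"]:
        print(record["Data"])          # process each record (raw bytes)
    iterator = batch["NextShardIterator"]
    time.sleep(1)                      # stay under the per-shard read limits
```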
Benefits of Amazon Kinesis Data Streams
• Although we can use Kinesis Data Streams to solve a variety of streaming data
problems, a common use is the real-time aggregation of data followed by
loading the aggregate data into a data warehouse or map-reduce cluster.
• Data is put into Kinesis data streams, which ensures durability and elasticity.
• The delay between the time a record is put into the stream and the time it can be
retrieved (put-to-get delay) is typically less than 1 second. In other words, a
Kinesis Data Streams application can start consuming the data from the stream
almost immediately after the data is added.
• The managed service aspect of Kinesis Data Streams relieves you of the
operational burden of creating and running a data intake pipeline. You can create
streaming map-reduce–type applications. The elasticity of Kinesis Data Streams
enables you to scale the stream up or down, so that you never lose data records
before they expire.
Benefits of Amazon Kinesis Data Streams (Contd…)
• Multiple Kinesis Data Streams applications can consume data from a stream, so
that multiple actions, like archiving and processing, can take place concurrently
and independently
Example: Two applications can read data from the same stream. The first application
calculates running aggregates and updates an Amazon DynamoDB table, and the
second application compresses and archives data to a data store like Amazon
Simple Storage Service (Amazon S3). The DynamoDB table with running
aggregates is then read by a dashboard for up-to-the-minute reports. The Kinesis
Client Library enables fault-tolerant consumption of data from streams and
provides scaling support for Kinesis Data Streams applications.
AMAZON KINESIS VIDEO STREAMS
Amazon Kinesis Video Streams is a serverless streaming service that simplifies the capture, processing, and storage of video streams from connected devices at any scale.
USE CASE EXAMPLE – INDUSTRIAL ROBOTICS
Amazon Kinesis Data Streams Vs Video Streams
• Amazon Kinesis Data Streams
A serverless data streaming service that can capture, process, and store data
streams at any scale. It's highly available, durable, and low latency.
• Amazon Kinesis Video Streams
A data streaming service that's tailored to video streaming. It can capture,
process, and store video streams from various sources, and supports both live
video streaming and archival storage. It can manage thousands of video streams
simultaneously
Amazon Kinesis Data Streams Vs Video Streams
• Here are some other differences between the two services:
• Use cases
Kinesis Data Streams can be used for a variety of applications, while Kinesis
Video Streams is designed for video streaming, analytics, and machine learning.
• Integration
Kinesis Data Streams integrates with other AWS services. Kinesis Video Streams
works well with machine learning services for tasks like facial recognition or
motion detection.
• Features
Kinesis Data Streams offers on-demand and provisioned capacity mode, and
dedicated throughput per consumer. Kinesis Video Streams is designed for cost-
effective, efficient ingestion and storage of time-encoded data
Use cases of Amazon Kinesis
• Create Real time applications
• Build apps for application monitoring, fraud detection, and live leaderboards
• Analyze data and emit the results to any data store or application
• Evolve from batch to real time analytics
• Perform real-time analytics on data that has been traditionally analyzed using batch processing
• Get the latest information without delay
• Analyze IoT device data
• Process streaming data from IoT devices, and then use the data to programmatically send real-
time alerts and respond when a sensor exceeds certain operating thresholds
• Build Video Analytics applications
• Securely stream video from camera-equipped devices
• Use streams for video playback, security monitoring, face detection, ML, and other analytics
Amazon Redshift
Amazon Redshift - Introduction
• Amazon Redshift is a fully managed, petabyte (1 PB = 1,024 TB)-scale data warehouse
service in the cloud, designed to enable fast, scalable, and cost-effective data analysis.
• Launched by Amazon Web Services (AWS) in 2013, it has quickly become a cornerstone for
organizations looking to perform complex queries on large datasets without the burden of
managing physical hardware.
• Designed for online analytical processing (OLAP) of petabyte-scale data, making it ideal for
big data analytics, business intelligence, and reporting.
• Redshift shines in its ability to handle huge volumes of data — capable of processing
structured and semi-structured data in the range of exabytes (10^18 bytes). However, the service
can also be used for large-scale data migrations.
• Redshift helps to gather valuable insights from a large amount of data. With the easy-to-use
interface of AWS, you can start a new cluster in a couple of minutes, and you don’t have to
worry about managing infrastructure.
• Customers can use the Redshift for just $0.25 per hour with no commitments or upfront costs
and scale to a petabyte or more for $1,000 per terabyte per year.
• RedShift is an enterprise-level,
petabyte-scale and fully managed data
warehousing service.
• So, what is a Data Warehouse? The answer lies in the general meaning of the word itself: a warehouse is a place where raw materials or manufactured goods may be stored prior to their distribution for sale. The same holds for data. A data warehouse is a place for collecting, storing, and managing data from various sources in order to provide relevant and meaningful business insights.
So Amazon provides an enterprise-level warehousing tool, Redshift, where we can process and manage data. The range for these datasets varies from 100s of gigabytes to a petabyte.
Amazon Redshift - Introduction (Contd....)
Some of its key attributes include:
• High Performance: Optimized for high-speed query performance through a
combination of Massively Parallel Processing (MPP), columnar storage, and
advanced compression techniques.
• Scalability: Easily scales from a few hundred gigabytes to petabytes or more,
allowing businesses to start small and grow as their data needs increase.
• Cost Efficiency: Offers a pay-as-you-go pricing model, with options to reserve
instances for long-term savings, making it accessible for businesses of all sizes.
Amazon Redshift
• Redshift is an OLAP-style (Online Analytical Processing) column-oriented
database. It is based on PostgreSQL version 8.0.2. This means regular SQL
queries can be used with Redshift. But this is not what separates it from other
services. The fast delivery to queries made on a large database with exabytes of
data is what helps Redshift stand out.
• Fast querying is made possible by the Massively Parallel Processing (MPP) design. The technology was developed by ParAccel. With MPP, a large number of computer processors work in parallel to deliver the required computations. Sometimes processors situated across multiple servers can be used to deliver a process.
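Because Redshift speaks PostgreSQL-style SQL, ordinary queries can be submitted from any SQL client or, as in this hedged sketch, through the Redshift Data API from Python. The cluster identifier, database, user, and table names are assumptions; the sketch also assumes the query succeeds before fetching results.

import time
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Submit a regular SQL query to a (hypothetical) provisioned cluster.
resp = rsd.execute_statement(
    ClusterIdentifier="bmw-sales",   # assumed cluster identifier
    Database="dev",                  # assumed database
    DbUser="analyst",                # assumed database user
    Sql="SELECT model, SUM(quantity) AS units "
        "FROM sales WHERE region = 'INDIA' "
        "GROUP BY model ORDER BY units DESC LIMIT 10;",
)

# The Data API is asynchronous: poll until the statement finishes, then fetch rows.
statement_id = resp["Id"]
while rsd.describe_statement(Id=statement_id)["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for row in rsd.get_statement_result(Id=statement_id)["Records"]:
    print(row)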
Architecture
• Empowering Data-Driven Decisions: By providing the ability to quickly analyze
massive amounts of data, Amazon Redshift empowers businesses to make
informed, data-driven decisions.
• Integrated Ecosystem: Seamlessly integrates with other AWS services such as
Amazon S3, AWS Glue, and Amazon QuickSight, creating a powerful
ecosystem for end-to-end data management and analytics.
• Adoption Across Industries: Trusted by thousands of companies across various
sectors, from retail to healthcare, finance, and beyond, Amazon Redshift is a
versatile solution that addresses diverse data warehousing needs
Data warehouse system architecture
Example: Suppose BMW's online store uses Amazon Redshift for analytics. They have a cluster named "bmw-sales" with 4 nodes:
- Leader Node (Node 1): Coordinates queries
- Compute Node 1 (Node 2): Stores customer data
- Compute Node 2 (Node 3): Stores sales data
- Compute Node 3 (Node 4): Stores product data
When a BMW analyst runs a query to analyze sales trends, here's what happens:
1. The leader node (Node 1) receives the query: "What were the top-selling car models in INDIA last quarter?"
2. The leader node breaks down the query into tasks: Task 1: Retrieve INDIA sales data from Node 3; Task 2: Retrieve car model data from Node 4; Task 3: Join sales and model data on Node 2.
3. The leader node assigns tasks to compute nodes: Node 3 processes Task 1, retrieving INDIA sales data; Node 4 processes Task 2, retrieving car model data; Node 2 processes Task 3, joining sales and model data.
4. Each compute node processes its task and returns results to the leader node.
5. The leader node aggregates results and returns the top-selling car models to the analyst.
Data warehouse system architecture
• Client applications
Amazon Redshift integrates with various data loading and ETL (extract, transform,
and load) tools and business intelligence (BI) reporting, data mining, and analytics
tools. Amazon Redshift is based on open standard PostgreSQL, so most existing
SQL client applications will work with only minimal changes.

• Clusters
The core infrastructure component of an Amazon Redshift data warehouse is a
cluster. A cluster is composed of one or more compute nodes. If a cluster is
provisioned with two or more compute nodes, an additional leader node
coordinates the compute nodes and handles external communication. Your client
application interacts directly only with the leader node. The compute nodes are
transparent to external applications.
Data warehouse system architecture
• Leader Node
The leader node manages communications with client programs and all communication
with compute nodes. It parses and develops execution plans to carry out database
operations, in particular, the series of steps necessary to obtain results for complex
queries. Based on the execution plan, the leader node compiles code, distributes the
compiled code to the compute nodes, and assigns a portion of the data to each
compute node. The leader node distributes SQL statements to the compute nodes only
when a query references tables that are stored on the compute nodes. All other queries
run exclusively on the leader node.
• Compute Node
The leader node compiles code for individual elements of the execution plan and
assigns the code to individual compute nodes. The compute nodes run the compiled
code and send intermediate results back to the leader node for final aggregation. Each
compute node has its own dedicated CPU and memory, which are determined by the
node type. As your workload grows, you can increase the compute capacity of a cluster
by increasing the number of nodes, upgrading the node type, or both.
Data warehouse system architecture
• Redshift managed storage
Data warehouse data is stored in a separate storage tier Redshift Managed Storage (RMS).
RMS provides the ability to scale your storage to petabytes using Amazon S3 storage. RMS
lets you scale and pay for computing and storage independently, so that you can size your
cluster based only on your computing needs. It automatically uses high-performance SSD-
based local storage as tier-1 cache. It also takes advantage of optimizations, such as data
block temperature, data block age, and workload patterns to deliver high performance
while scaling storage automatically to Amazon S3 when needed without requiring any
action.
• Node Slices
A compute node is partitioned into slices. Each slice is allocated a portion of the node's
memory and disk space, where it processes a portion of the workload assigned to the
node. The leader node manages distributing data to the slices and apportions the
workload for any queries or other database operations to the slices. The slices then work
in parallel to complete the operation.
Data warehouse system architecture
• Data Distribution
KEY distribution: Distributes data based on a column value, ensuring that rows with the same value are stored together, optimizing joins.
EVEN distribution: Evenly distributes rows across all slices, which is useful for uniform data distribution.
ALL distribution: Copies data to all nodes, ideal for small tables that are frequently joined with others. (A short DDL sketch for these styles follows this section.)
• Massively Parallel Processing (MPP):
Parallel Execution: MPP enables Redshift to process queries in parallel across all the
nodes in a cluster. This massively improves query performance, especially for complex
analytical queries.
Data Distribution and Replication: Data is distributed across nodes based on the chosen
distribution style, ensuring that each node works on its own subset of the data.
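To make the distribution styles above concrete, here is a hedged sketch that creates one table with each style (KEY, EVEN, ALL) through the same Redshift Data API client used earlier; the cluster identifier, database, user, and all table and column names are invented for illustration.

import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

ddl = [
    # KEY: rows with the same customer_id land on the same slice (good for joins).
    "CREATE TABLE sales (sale_id BIGINT, customer_id BIGINT, amount DECIMAL(10,2)) "
    "DISTSTYLE KEY DISTKEY (customer_id);",
    # EVEN: rows are spread round-robin across slices (uniform distribution).
    "CREATE TABLE web_events (event_id BIGINT, url VARCHAR(256)) DISTSTYLE EVEN;",
    # ALL: a full copy on every node (small, frequently joined dimension table).
    "CREATE TABLE country_dim (country_code CHAR(2), country_name VARCHAR(64)) DISTSTYLE ALL;",
]

rsd.batch_execute_statement(
    ClusterIdentifier="bmw-sales",   # assumed cluster identifier
    Database="dev",
    DbUser="analyst",
    Sqls=ddl,
)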
How it works?
• Amazon Redshift uses SQL to analyze structured and semi-structured data across data
warehouses, operational databases, and data lakes, using AWS-designed hardware and
machine learning to deliver the best price performance at any scale.
Security Features
• Encryption:
Data Encryption at Rest: Redshift supports encryption of data at rest using AWS Key Management
Service (KMS) or customer-managed keys.

• Network Isolation:
Amazon VPC Integration: Redshift clusters can be deployed within an Amazon Virtual Private
Cloud (VPC), providing network isolation and enhanced security.
PrivateLink Support: Allows secure and private communication between Redshift and other AWS
services without using the public internet
Security Features
• Access Control:
AWS Identity and Access Management (IAM): Redshift integrates with IAM to provide
fine-grained access control to data and resources, allowing you to define roles and
permissions based on the principle of least privilege.
Database User Management: Supports creating users and roles within Redshift with varying
levels of access to the database and its objects.

• Audit Logging:
Database Audit Logs: Redshift can generate audit logs that record database activity,
including connection attempts, queries run, and changes to database objects.
Integration with CloudTrail: Audit logs can be sent to AWS CloudTrail for long-term storage
and detailed analysis
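A hedged sketch of how several of these controls might be enabled when provisioning a cluster with boto3: encryption at rest with a KMS key, deployment into VPC subnets with no public access, and an attached IAM role. Every identifier here (cluster name, key ARN, subnet group, security group, role ARN) is a placeholder, and in practice the master password would come from a secrets store rather than source code.

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="secure-dwh",                    # hypothetical cluster name
    NodeType="ra3.xlplus",
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="REPLACE_WITH_SECRET",          # fetch from AWS Secrets Manager in practice
    # Encryption at rest with a customer-managed KMS key
    Encrypted=True,
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/placeholder",
    # Network isolation: launch into private subnets of a VPC
    ClusterSubnetGroupName="private-subnet-group",
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],
    PubliclyAccessible=False,
    # Access control: IAM role the cluster can assume (e.g., to COPY data from S3)
    IamRoles=["arn:aws:iam::111122223333:role/RedshiftS3AccessRole"],
)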
Use cases
• Amazon Redshift is used when the data to be analyzed is humongous. The data has to be at
least of a petabyte-scale (10^15 bytes) for Redshift to be a viable solution. The MPP
technology used by Redshift can be leveraged only at that scale. Beyond the size of data,
there are some specific use cases that warrant its use.

• Real-time analytics
• Many companies need to make decisions based on real-time data and often need to
implement solutions quickly too. Take Uber for example, based on historical and current
data, Uber has to make decisions quickly. It has to decide surge pricing, where to send
drivers, what route to take, expected traffic, and a whole host of data.
• Thousands of such decisions have to be made every minute for a company like Uber with
operations across the globe. The current stream of data and historical data has to be
processed in order to make those decisions and ensure smooth operations. Such instances can
use Redshift as the MPP technology to make accessing and processing data faster
Use cases
• Combining multiple data sources
There are occasions where structured data, semi-structured data, and/or unstructured data
have to be processed to gain insights. Traditional business intelligence tools lack the
capability to handle the varied structures of data from different sources. Amazon Redshift is a
potent tool in such use cases.

• Business intelligence
The data of an organization needs to be handled by a lot of different people. All of them are
not necessarily data scientists and will not be familiar with the programming tools used by
engineers.
They can rely on detailed reports and information dashboards that have an easy-to-use
interface. Highly functional dashboards and automatic report creation can be built using
Redshift. It can be used with tools like Amazon Quicksight and also third-party tools created
by AWS partners.
Use cases
• Log analysis
• Behavior analytics is a powerful source for useful insights. Behavior analytics provide
information on how a user uses an application, how they interact with it, the duration of
use, their clicks, sensor data, and a plethora of other data.

• The data can be collected from multiple sources — including a web application used on a
desktop, mobile phone, or tablet — and can be aggregated and analyzed to gain insight into
user behavior. This coalescing of complex datasets and computing data can be done using
Redshift.

• Redshift can also be used for traditional data warehousing. But solutions like the S3 data
lake would likely be better suited for that. Redshift can be used to perform operations on
data in S3, and save the output in S3 or Redshift
Use cases
• Improve financial and demand forecasts
• Ingests hundreds of megabytes of data per second so you can query data in near real time and build
low latency analytics applications for fraud detection, live leaderboards, and IoT.
• Optimize business intelligence
• Build insight-driven reports and dashboards using Amazon Redshift and BI tools such as Amazon
QuickSight, Tableau, Microsoft PowerBI, or others.
• Accelerate Machine Learning in SQL
• Use SQL to build, train, and deploy ML models for many use cases including predictive analytics,
classification, regression, and more to support advanced analytics on large amounts of data.
• Monetize data
• Build applications on top of all your data across databases, data warehouses, and data lakes.
• Seamlessly and securely share and collaborate on data to create more value for your customers, monetize your data as a service, and unlock new revenue streams.
• Combine data with third party datasets
• Whether it's market data, social media analytics, weather data or more, subscribe to and combine third
party data in AWS Data Exchange with your data in Amazon Redshift, without hassling over licensing
and onboarding processes and moving the data to the warehouse.
Benefits of AWS Redshift
• Speed. With the use of MPP technology, the speed of delivering output on large data sets is unparalleled. Few other cloud service providers can match this speed at the cost at which AWS provides the service.
• Data Encryption. Amazon provides the facility for data encryption for any part of Redshift operation.
You as the user can decide which operations need encryption and those that do not need encryption.
Data encryption provides an added layer of security.
• Use familiar tools. Redshift is based on PostgreSQL. All the SQL queries work with it. Additionally,
you are free to choose any SQL, ETL (Extract, Transform, Load), and Business Intelligence (BI) tools
you are familiar with. There is no requirement to use the tools provided by Amazon.
• Intelligent Optimization. For a large data set, there would be a number of ways to query data with the
same parameters. The different commands will have different levels of data utilization. AWS Redshift
provides tools and information to improve queries. It will also provide tips to improve the database
automatically. These can be utilized for an even faster operation that is less intensive on resources.
• Automate repetitive tasks. Redshift has the provisions by which you can automate tasks that have to be
done repeatedly. This could be administrative tasks like generating, daily, weekly, or monthly reports. It
could be resource and cost auditing. It can also be regular maintenance tasks to clean up data. You can
automate all these with the provisions offered by Redshift.
• Concurrent Scaling. AWS Redshift will scale up automatically to support increasing concurrent
workloads.
Example
Real-time User Activity Streaming:
•Amazon Kinesis Data Streams collects real-time data on user actions, such as browsing history, clicks, add-to-
cart actions, and purchases. This real-time stream is processed for:
• Real-time analytics and personalized recommendations.
• Updating inventory levels in near real-time.
• Triggering alerts for low stock or fraud detection.
Data Storage and Centralization:
•Amazon S3 stores raw and structured data (e.g., product information, transaction logs, customer profiles,
images, etc.). S3 is the data lake where all data sources (Kinesis streams, logs, transactional data) are centralized.
•Amazon RDS stores customer, order, and transactional data in a relational format. This data can be used for
more structured and relational queries.
•Amazon DynamoDB stores real-time session data (such as users’ shopping cart status) for quick retrieval.
Data Warehousing and Analytics:
•Amazon Redshift is used as the data warehouse where large volumes of structured data from the RDS, S3, and
Kinesis streams are aggregated. Redshift enables:
• Running complex SQL queries for business intelligence.
• Analyzing customer behavior, sales performance, and inventory trends.
• Processing large-scale historical data for long-term trends.
Amazon Elastic MapReduce
(EMR)
Amazon EMR Elastic MapReduce
• Amazon Elastic MapReduce (EMR) is an important cloud-based platform service designed for the effective scaling and processing of large-volume datasets. The platform lets users quickly and easily set up clusters of Amazon EC2 instances that come pre-configured with big data frameworks.
• The Map function takes a dataset and transforms it into key-value pairs. It essentially maps
or applies a specific function to each element of the input data, processing each item
independently.
• The Reduce function aggregates the key-value pairs produced by the Map function. It combines the values associated with the same key to produce a final output (a minimal word-count sketch follows this list).
• Amazon markets EMR as an expandable, low-configuration service that provides an
alternative to running on-premises cluster computing.
• Amazon EMR is based on Apache Hadoop, a Java-based programming framework that
supports the processing of large data sets in a distributed computing environment. Using
MapReduce, a core component of the Hadoop software framework, developers can write
programs that process massive amounts of unstructured data across a distributed cluster of
processors or standalone computers.
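As a minimal illustration of the Map and Reduce functions described above, the plain-Python sketch below counts words: the map step emits (word, 1) key-value pairs and the reduce step sums the values per key. On EMR this logic would normally run inside Hadoop or Spark across many nodes rather than in a single process; this toy version only shows the two phases.

from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) key-value pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce: aggregate the values associated with the same key
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["AWS EMR runs Hadoop", "Hadoop uses MapReduce"]
print(reduce_phase(map_phase(lines)))
# {'aws': 1, 'emr': 1, 'runs': 1, 'hadoop': 2, 'uses': 1, 'mapreduce': 1}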
Working of Amazon EMR Elastic MapReduce
• Amazon EMR (Elastic MapReduce) is an AWS-based platform service that processes large-volume datasets using distributed computing frameworks such as Apache Hadoop and Apache Spark. It facilitates users in quickly setting up, configuring, and scaling virtual server clusters for analyzing and processing vast amounts of data efficiently.
• Amazon EMR functionalities simplify the complex processing of large datasets over the
cloud.
• Users can create clusters that take advantage of the elastic nature of Amazon EC2 instances.
• These Amazon EC2 instances come configured with pre-existing frameworks like Apache Hadoop and Apache Spark.
• A framework is a platform or set of tools, libraries, and standards that developers use to
build applications, handle common tasks, and structure their code more efficiently.
• Apache Hadoop is an open-source software framework used for distributed storage and
processing of large datasets across clusters of computers.
• Apache Spark is another open-source framework for distributed data processing but is
designed to be faster and more flexible than Hadoop.
Working of Amazon EMR
• By distributing processing jobs across several nodes, these clusters handle workloads effectively and guarantee parallel execution with faster outcomes.
• It provides scalability by automatically adjusting the cluster size in accordance with workload needs.
• It optimizes data storage by integrating with other AWS services, making things easier.
• Users can focus on their analysis rather than on the complicated details of infrastructure and administration.
• It provides a simplified approach to big data analytics.
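The hedged boto3 sketch below shows how a transient EMR cluster with Spark might be launched and given a single step to run. The cluster name, release label, instance types, and S3 paths are assumptions, and the sketch presumes the default EMR service and instance-profile roles already exist in the account.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="clickstream-analytics",                     # hypothetical cluster name
    ReleaseLabel="emr-6.10.0",                        # assumed EMR release
    Applications=[{"Name": "Spark"}],
    LogUri="s3://my-emr-logs/",                       # placeholder log bucket
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,         # terminate the cluster when the step finishes
    },
    Steps=[{
        "Name": "Process clickstream",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-scripts/process_clickstream.py"],   # placeholder script
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",                # assumed default instance profile
    ServiceRole="EMR_DefaultRole",                    # assumed default service role
)
print("Cluster started:", response["JobFlowId"])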
Architecture of Amazon EMR
1. Amazon EMR on EKS runs your Spark
and other big data applications on a
Kubernetes cluster using EKS.
2. Different Spark applications (version 2.4
and 3.0) and other custom analytics
applications are deployed and run in the
same cluster.
3. The compute resources used by these
applications come from a mix of EC2
instances and Fargate, depending on the
configuration and workload type.
4. Data is stored and retrieved from Amazon
S3, which acts as the primary storage for
the input/output of the applications.
5. The infrastructure is distributed across
multiple Availability Zones to ensure
resiliency and performance.
Use case example for architecture
Use Case: Real-Time Analytics on Streaming Data with Apache Spark on EMR and EKS
1. Data Ingestion:
• Website clickstream data is continuously uploaded to Amazon S3.
2. Amazon EMR on EKS Setup:
• Apache Spark 3.0 is deployed on Amazon EMR on EKS for real-time data processing.
• Another analytics application is also running in the same EKS cluster for deeper insights.
3. Compute Resources:
• Amazon EC2 instances handle heavy Spark processing tasks.
• AWS Fargate is used for lightweight tasks without manual server management.
4. Data Storage:
• Real-time processed data and insights are stored back in Amazon S3.
• Historical data is also maintained in S3 for future batch analysis.
5. High Availability:
• The infrastructure is spread across multiple Availability Zones to ensure high availability
and fault tolerance.
Architecture of Amazon EMR
• Amazon EMR architecture is designed for efficient big data processing using a distributed computing
framework.
• Clusters: Consist of a master node (manages the cluster), core nodes (process data and store data in the Hadoop Distributed File System, HDFS), and optional task nodes (handle additional processing).
• Hadoop Ecosystem: Utilizes tools like Apache Spark, HBase, and Hive, pre-configured and
optimized for big data analytics.
• AWS Integration: Seamlessly integrates with AWS services like S3 (storage), IAM (security),
CloudWatch (monitoring), and Amazon VPC (network isolation), enhancing functionality and security.
Understanding clusters and nodes
• The central component of Amazon EMR is the cluster.
• A cluster is a collection of Amazon Elastic Compute Cloud (Amazon EC2) instances.
• Each instance in the cluster is called a node.
• Each node has a role within the cluster, referred to as the node type.
• Amazon EMR also installs different software components on each node type, giving each node a role
in a distributed application like Apache Hadoop.
Architecture of Amazon EMR
• The node types in Amazon EMR are as follows:
Primary node: A node that manages the cluster by running software
components to coordinate the distribution of data and tasks among other nodes
for processing. The primary node tracks the status of tasks and monitors the
health of the cluster. Every cluster has a primary node, and it's possible to create
a single-node cluster with only the primary node.
Core node: A node with software components that run tasks and store data in
the Hadoop Distributed File System (HDFS) on your cluster. Multi-node clusters
have at least one core node
Task node: A node with software components that only runs tasks and does not
store data in HDFS. Task nodes are optional.
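To see how these node types show up in practice, here is a short hedged boto3 sketch that lists the instance groups of an existing cluster and prints each group's role (MASTER, CORE, or TASK); the cluster ID is a placeholder.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Each instance group corresponds to one node type: MASTER (primary), CORE, or TASK.
groups = emr.list_instance_groups(ClusterId="j-XXXXXXXXXXXXX")["InstanceGroups"]  # placeholder cluster ID
for group in groups:
    print(group["InstanceGroupType"], group["InstanceType"], group["RequestedInstanceCount"])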
Features of Amazon EMR
• Integration: It supports integration with other AWS services, which enhances the efficiency of data processing; for example, connecting with Amazon S3 streamlines the workflow.
• Scalability: Amazon EMR provides dynamic scaling and handling of workloads. It supports automatic adjustment of cluster size, optimizing performance and minimizing costs.
• Ease Of Use: Amazon EMR makes big data deployments easier by offering pre-configured environments for Apache Hadoop and Apache Spark. Setting up and maintaining clusters is easier for users, with no need for complex manual setup on Amazon EMR.
• Cost Management: EMR facilitates cost optimization by letting users pay only for the resources used during big data processing, making analytics more affordable. Spot Instances and Reserved Instances further minimize costs.
• Security: EMR provides strong security features such as data encryption, IAM roles, and fine-grained access controls. It ensures data protection throughout the processing pipeline.
Deployment options of Amazon EMR
• On-Demand Instances: Without making any advance commitment, users can easily create EMR clusters using On-Demand Instances as they need them and pay for the resources on an hourly basis. This is a flexible choice for shifting workloads.
• Reserved Instances: Reserved Instances let customers commit to a specific instance type for a duration of 1 or 3 years in a particular Region. This option suits steady workloads with predictable usage and is less expensive than on-demand pricing.
• Spot Instances: By using Amazon EC2 Spot Instances, users can request unused EC2 capacity, potentially saving a lot of money. Spot Instances are best suited for workloads that are fault tolerant and can withstand interruptions (see the sketch below for adding a Spot task group to a cluster).
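As a hedged example of the Spot option, the sketch below attaches a Spot-priced task instance group to an already-running cluster; the cluster ID, instance type, count, and bid price are all placeholders.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Attach a Spot-priced TASK instance group to an existing (placeholder) cluster.
emr.add_instance_groups(
    JobFlowId="j-XXXXXXXXXXXXX",         # placeholder cluster ID
    InstanceGroups=[{
        "Name": "SpotTasks",
        "InstanceRole": "TASK",          # task nodes run work but store no HDFS data
        "InstanceType": "m5.xlarge",     # assumed instance type
        "InstanceCount": 4,
        "Market": "SPOT",
        "BidPrice": "0.10",              # optional maximum hourly price in USD (assumption)
    }],
)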
Advantages of Amazon EMR
• Scalability: EMR allows users to easily scale up or down the number of instances in
a cluster to handle varying amounts of data processing and analysis tasks.
• Cost Effectiveness: EMR allows users to pay for the resources they need, when they
need them, making it a cost-effective solution for big data processing.
• Integration With Other AWS Services: EMR can be easily integrated with other AWS
services such as Amazon S3, Amazon DynamoDB, and Amazon Redshift for data
storage and analysis.
• Flexibility: EMR supports a wide range of open-source big data frameworks,
including Hadoop, Spark, and Hive, giving users the flexibility to choose the tools
that best fit their needs.
• Easy To Use: EMR provides an easy-to-use web interface that allows users to launch
and manage clusters, as well as monitor and troubleshoot performance issues.
Disadvantages of Amazon EMR
• Limited Customization: EMR is pre-configured with popular big data frameworks such
as Hadoop and Spark, so users may have limited options for customizing their cluster.
• Latency: The latency of data processing tasks may increase as the size of the data set
increases.
• Cost: EMR can be expensive for users with large amounts of data or high-performance
requirements, as costs are based on the number of instances and the amount of storage
used.
• Limited Control Over The Infrastructure: EMR is a managed service, which means that
users have limited control over the underlying infrastructure. This can be a
disadvantage for users who need more control over their big data environments.
• Limited Support For Certain Big Data Frameworks: EMR does not support every big data framework out of the box, which may be a deal breaker for some organizations.
• Limited Support For Certain Applications: EMR is not suitable for all types of applications; it mainly supports big data processing and analytics.
Use cases of Amazon EMR
• Big Data Processing: Amazon EMR is ideal for business organizations that deal with distributed processing of large amounts of data. It is capable of managing large-volume data conversions, data warehousing, and log analysis efficiently.
• Data Analysis: EMR is well known for performing complicated data analytics. It works with big data frameworks like Apache Spark and facilitates companies in making well-informed decisions by letting them extract insightful information from various types of datasets.
• Genomic Analysis: EMR is used in bioinformatics for analyzing genomic data. Large-scale genomic datasets are processed and analyzed to help researchers enhance the scalability and interoperability of genomic technologies in life sciences and healthcare.
• Machine Learning: EMR supports seamless integration with other AWS services such as Amazon SageMaker. It facilitates organizations in running distributed ML algorithms on large datasets. Its usage is very beneficial for predictive analysis and model training.
AWS Disaster Recovery and
Backup
What is a disaster?
• When planning for disaster recovery, evaluate your plan for these
three main categories of disaster:
• Natural disasters, such as earthquakes or floods.
• Technical failures, such as power failure or network connectivity.
• Human actions, such as inadvertent misconfiguration or
unauthorized/outside party access or modification.
AWS Disaster Recovery and Backup
• Disaster recovery is the process of preparing for and recovering from a disaster. An
event that prevents a workload or system from fulfilling its business objectives in its
primary deployed location is considered a disaster.
• Disaster recovery (DR) is an important part of the resiliency strategy and concerns
how the workload responds when a disaster strikes (a disaster is an event that causes
a serious negative impact on your business).
• This response must be based on the organization's business objectives which specify
the workload's strategy for avoiding loss of data, known as the Recovery Point
Objective (RPO), and reducing downtime where the workload is not available for
use, known as the Recovery Time Objective (RTO).
• We must therefore implement resilience in the design of your workloads in the cloud
to meet your recovery objectives (RPO and RTO) for a given one-time disaster
event.
Disaster recovery and availability
• Disaster recovery can be compared to availability, which is another important
component of your resiliency strategy. Whereas disaster recovery measures
objectives for one-time events, availability objectives measure mean values over a
period of time
Typical measures of availability are MTBF (Mean Time Between Failures) and MTTR (Mean Time to Recovery).
How it works?
• AWS Elastic Disaster Recovery (AWS DRS) minimizes downtime and data loss
with fast, reliable recovery of on-premises and cloud-based applications using
affordable storage, minimal compute, and point-in-time recovery
Recovery objectives (RTO and RPO)
• When creating a Disaster Recovery (DR) strategy, organizations most commonly
plan for the Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
Recovery objectives (RTO and RPO)
• Recovery Time Objective (RTO) is the maximum acceptable delay between the
interruption of service and restoration of service. This objective determines what is
considered an acceptable time window when service is unavailable and is defined by
the organization

• Recovery Point Objective (RPO) is the maximum acceptable amount of time since
the last data recovery point. This objective determines what is considered an
acceptable loss of data between the last recovery point and the interruption of
service and is defined by the organization.
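The RPO directly drives how often recovery points are taken: for a 1-hour RPO, backups (or replication) must run at least hourly. The hedged AWS Backup sketch below creates a plan with an hourly schedule to match such an RPO; the plan and vault names are placeholders, and the achievable RTO also depends on how quickly the restore itself completes.

import boto3

backup = boto3.client("backup", region_name="us-east-1")

backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "hourly-rpo-1h",              # hypothetical plan name
        "Rules": [{
            "RuleName": "hourly-backups",
            "TargetBackupVaultName": "primary-vault",   # assumed existing backup vault
            "ScheduleExpression": "cron(0 * * * ? *)",  # every hour, matching a 1-hour RPO
            "StartWindowMinutes": 60,
            "CompletionWindowMinutes": 120,
            "Lifecycle": {"DeleteAfterDays": 35},       # assumed retention for recovery points
        }],
    }
)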
Disaster recovery options in the cloud
• Disaster recovery strategies available to you within AWS can be broadly categorized into four
approaches, ranging from the low cost and low complexity of making backups to more complex
strategies using multiple active Regions
• Active/passive strategies use an active site (such as an AWS Region) to host the workload and serve
traffic
• The passive site (such as a different AWS Region) is used for recovery
• The passive site does not actively serve traffic until a failover event is triggered
Disaster recovery options in the cloud
• For a disaster event based on disruption or loss of one physical data center for a well-architected,
highly available workload, you may only require a backup and restore approach to disaster recovery
• If your definition of a disaster goes beyond the disruption or loss of a physical data center to that of
a Region or if you are subject to regulatory requirements that require it, then you should consider
Pilot Light, Warm Standby, or Multi-Site Active/Active
• When choosing your strategy, and the AWS resources to implement it, keep in mind that within
AWS, we commonly divide services into the data plane and the control plane
• The data plane is responsible for delivering real-time service while control planes are used to
configure the environment
• For maximum resiliency, you should use only data plane operations as part of your failover
operation
• This is because the data planes typically have higher availability design goals than the control
planes
Backup and restore
• Backup and restore is a suitable approach for mitigating against data loss or corruption. This
approach can also be used to mitigate against a regional disaster by replicating data to other AWS
Regions, or to mitigate lack of redundancy for workloads deployed to a single Availability Zone.
• In addition to data, you must redeploy the infrastructure, configuration, and application code in the
recovery Region
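One common way to get data into another Region for this strategy is S3 cross-Region replication. The hedged sketch below enables replication from a primary bucket to a bucket in a recovery Region; the bucket names and replication role ARN are placeholders, and both buckets are assumed to already have versioning enabled.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

s3.put_bucket_replication(
    Bucket="primary-backups-us-east-1",                 # source bucket (placeholder)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",   # assumed IAM role
        "Rules": [{
            "ID": "replicate-backups-to-dr-region",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                               # replicate the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::dr-backups-us-west-2"},   # placeholder destination
        }],
    },
)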
Pilot Light
• With the pilot light approach, you replicate your data from one Region to another and
provision a copy of your core workload infrastructure.
• Resources required to support data replication and backup, such as databases and object
storage, are always on.
• Other elements, such as application servers, are loaded with application code and
configurations, but are "switched off" and are only used during testing or when disaster
recovery failover is invoked.
• In the cloud, you have the flexibility to deprovision resources when you do not need them,
and provision them when you do.
• A best practice for “switched off” is to not deploy the resource, and then create the
configuration and capabilities to deploy it (“switch on”) when needed.
• Unlike the backup and restore approach, your core infrastructure is always available and you
always have the option to quickly provision a full scale production environment by
switching on and scaling out your application servers.
Pilot Light (Contd...)
A pilot light approach minimizes the ongoing cost of disaster recovery by minimizing the
active resources, and simplifies recovery at the time of a disaster because the core
infrastructure requirements are all in place. This recovery option requires you to change your
deployment approach. You need to make core infrastructure changes to each Region and
deploy workload (configuration, code) changes simultaneously to each Region.
Shared Responsibility Model for Resiliency
• Resiliency is a shared responsibility between AWS and you, the customer. It is
important that you understand how disaster recovery and availability, as part of
resiliency, operate under this shared model.
Shared Responsibility Model for Resiliency
• AWS responsibility “Resiliency of the Cloud”
• AWS is responsible for resiliency of the infrastructure that runs all of the services offered in the
AWS Cloud. This infrastructure comprises the hardware, software, networking, and facilities that
run AWS Cloud services. AWS uses commercially reasonable efforts to make these AWS Cloud
services available, ensuring service availability meets or exceeds
• Customer responsibility “Resiliency in the Cloud”
• Customer responsibility will be determined by the AWS Cloud services that are selected. This
determines the amount of configuration work you must perform as part of your resiliency
responsibilities. For example, a service such as Amazon Elastic Compute Cloud (Amazon EC2)
requires the customer to perform all of the necessary resiliency configuration and management
tasks. Customers that deploy Amazon EC2 instances are responsible for deploying EC2 instances
across multiple locations (such as AWS Availability Zones), implementing self-healing using
services like Amazon EC2 Auto Scaling, as well as using resilient workload architecture best
practices for applications installed on the instances. For managed services, such as Amazon S3 and
Amazon DynamoDB, AWS operates the infrastructure layer, the operating system, and platforms,
and customers access the endpoints to store and retrieve data
Restoring and testing backups