
Session 5 - Azure Components

The document discusses Azure Cloud Storage, Virtual Machines (VMs), and Azure Data Factory (ADF) in the context of modern data integration challenges. It highlights the advantages of using Azure's scalable platform, including high availability and low latency through multiple regions and availability zones, as well as various managed services for relational and non-relational databases. Additionally, it outlines the differences between IaaS, PaaS, and SaaS models, and details the types of data storage solutions available on Azure, including Azure SQL Database, Azure Cosmos DB, and Azure Storage services.

Uploaded by kushwahtanu2609
Copyright © All Rights Reserved

Session 5: Azure Cloud Storage, VM & ADF

Let us start with a real-life data problem. Data is being generated at a much faster rate, in larger volumes, and from different types of sources, both on premises and in the cloud (SaaS apps).

Businesses want to maintain data in a secure manner and ingest it consistently, so that they can derive insights from the data without any interruption.

Traditional data integration solutions are not able to handle this due to a lack of connectors and an inability to scale.

Cloud solves this problem by providing a scalable data platform with a variety of connectors to other applications and databases (Salesforce, SQL Server), supporting both real-time and batch integration.
Azure has 200+ services.

Let us start by creating an Azure account and logging in.

We need to understand the customer's requirements before proposing solutions based on Azure components. To gather requirements, we can frame a questionnaire like:

- Is the application meant to serve a single region or multiple regions? A single-region data center may face several challenges:
Challenge 1: Slow access for users from other parts of the world (high latency)
Challenge 2: What if the data center crashes? Your application goes down (low availability)

- Adding a new region (for example, Mumbai) can partially resolve the slowness issue and keep the application available if the server in one region crashes.
- Azure provides 60+ regions around the world

Region: Specific geographical location to host your resources

Advantages:
- High Availability
- Low Latency
- Global Footprint
- Adhere to government regulations
But how do you achieve high availability within the same region (or geographic location)?

Azure provides a solution with Availability Zones (AZs) in some regions.

A region that supports AZs has multiple AZs (at least 3), and each AZ consists of one or more discrete data centers:

- Each AZ has independent & redundant power, networking & connectivity
- AZs in a region are connected through low-latency links
- Increased availability and fault tolerance within the same region
- Survive the failure of a complete data center
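A rough way to see why multiple AZs help is to compute the chance that at least one zone is still up. This sketch assumes zone failures are fully independent and uses a hypothetical per-zone availability of 99.9%; real failures are partly correlated, so real-world gains are smaller than this idealised figure, and it is not an Azure SLA calculation.

```python
def availability_any_up(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up.

    Assumes independent failures: P(all down) = (1 - per_zone) ** zones.
    """
    return 1 - (1 - per_zone) ** zones


PER_ZONE = 0.999  # hypothetical availability of a single zone (data center)

for zones in (1, 2, 3):
    print(f"{zones} zone(s): {availability_any_up(PER_ZONE, zones):.9f}")
```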

Azure VMs

In corporate data centers, data stores are deployed on physical servers. But where do you deploy data stores in the cloud?

You can rent virtual servers on Azure:
- Virtual Machines - Virtual servers in Azure
- Azure Virtual Machines - Provision & Manage Virtual Machines

Problems with running databases on VMs:
- You need to take care of:
  - OS installation & upgrades
  - Database installation & upgrades
  - Availability (create a standby database)
  - Durability (take regular backups)
  - Scaling compute & storage
Managed Services:
Do you want to continue running databases in the cloud the same way you run them in your data centre, or are there OTHER approaches?

Let's understand some terminology used with cloud services:

• IaaS (Infrastructure as a Service)


• PaaS (Platform as a Service)
• SaaS (Software as a Service)

Let's get on a quick journey to understand these!

IaaS:

Use only infrastructure from the cloud provider

Example: Running SQL Server on a VM

Cloud provider is responsible for:
- Virtualization, Hardware and Networking

You are responsible for:
- OS upgrades and patches
- Database software and upgrades
- Database Configuration (Tables, Indexes, Views etc)
- Data
- Scaling of compute & storage, Availability and Durability

PaaS:
Use a platform provided by the cloud provider

Cloud provider is responsible for:
- Virtualization, Hardware and Networking
- OS upgrades and patches
- Database software and upgrades
- Scaling, Availability, Durability etc.

You are responsible for:
- Database Configuration (Tables, Views, Indexes, ...)
- Data
Examples: Azure SQL Database, Azure Cosmos DB and a lot more

You will NOT have access to the OS and database software (most of the time!)

SaaS:
Centrally hosted software (mostly on the cloud), offered on a subscription basis (pay-as-you-go)

Examples:
Email, calendaring & office tools (such as Outlook 365, Microsoft Office 365, Gmail, Google Docs)

Customer relationship management (CRM), enterprise resource planning (ERP) and document management tools

Cloud provider is responsible for:
- OS (incl. upgrades and patches)
- Application Runtime
- Auto scaling, Availability & Load balancing etc.
- Application code and/or Application Configuration (How much memory? How many instances? ..)

Customer is responsible for:
- Configuring the software!

Data Stores:

The amount of data generated is increasing exponentially. Data formats:
• Structured: Tables, Rows and Columns (Relational)
• Semi Structured: Key-Value, Document (JSON), Graph, etc
• Unstructured: Video, Audio, Image, Text files, Binary files

Types of Data stores:
• Relational databases
• NoSQL databases
• Analytical databases
• Object/Block/File storage

Relational Database – OLTP

• Applications where a large number of users make a large number (millions) of transactions
• Transaction - small, discrete, unit of work
• Example: Transfer money from your account to your friend's account
• Heavy writes and moderate reads
• Quick processing expected
• Use cases: Most traditional applications - banking, e-commerce, ..
• Popular databases: MySQL, Oracle, SQL Server etc
• Some Azure Managed Services:
▪ Azure SQL Database: Managed Microsoft SQL Server
▪ Azure Database for MySQL: Managed MySQL
▪ Azure Database for PostgreSQL: Managed PostgreSQL
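The money-transfer example is the classic OLTP transaction: both the debit and the credit must happen, or neither does. The sketch below uses Python's built-in sqlite3 module as a local stand-in for a managed database such as Azure SQL Database; the table and account names are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as one atomic unit of work."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        if credit.rowcount != 1:
            raise ValueError("unknown destination account")  # undoes the debit too


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("you", 100), ("friend", 20)])
conn.commit()

transfer(conn, "you", "friend", 30)  # succeeds: both updates are committed together

try:
    transfer(conn, "you", "nobody", 10)  # fails after the debit, so everything rolls back
except ValueError as err:
    print("rolled back:", err)

print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

The failed second transfer leaves both balances exactly as they were after the first one, which is the "small, discrete unit of work" property in action.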

Relational Database – OLAP

Applications allowing users to analyze petabytes of data by gathering data into a data warehouse and making it available for reporting or further analysis.
- Data is consolidated from multiple (typically transactional) databases

Examples: Reporting applications, Data warehouses, Business intelligence applications, Analytics systems
Sample application: Decide insurance premiums by analyzing data from the last hundred years

Azure Managed Service: Azure Synapse Analytics

Petabyte-scale distributed data warehouse
- Unified experience for developing end-to-end analytics solutions
- Data integration + Data warehouse + Data analytics
- Run complex queries across petabytes of data
- Earlier called Azure SQL Data Warehouse
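The OLAP pattern (consolidate many transactional sources, then run large aggregations) can be illustrated in miniature. This is only a toy: sqlite3 stands in for a warehouse such as Azure Synapse Analytics, and the claims data is made up to echo the insurance-premium example.

```python
import sqlite3

# Hypothetical transactional sources, e.g. two regional claims systems.
claims_north = [("2023", "car", 1200), ("2024", "car", 900), ("2024", "home", 400)]
claims_south = [("2023", "home", 300), ("2024", "car", 600)]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE claims (year TEXT, line TEXT, amount INTEGER, source TEXT)"
)

# Consolidate: load rows from every source into one warehouse table,
# tagging each row with where it came from.
for source, rows in (("north", claims_north), ("south", claims_south)):
    warehouse.executemany(
        "INSERT INTO claims VALUES (?, ?, ?, ?)",
        [(*row, source) for row in rows],
    )

# Analyze: aggregate across all sources for reporting.
report = warehouse.execute(
    "SELECT line, year, SUM(amount) FROM claims GROUP BY line, year ORDER BY line, year"
).fetchall()

for line, year, total in report:
    print(line, year, total)
```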
Semi Structured Data:

• Data has some structure BUT not very strict


• Semi Structured Data is stored in NoSQL
databases
▪ NoSQL = not only SQL
▪ Flexible schema
o Structure data the way your
application needs it
o Let the structure evolve with time
• Horizontally scale to petabytes of data with
millions of TPS
• Managed Service: Azure Cosmos DB
• Types of Semi Structured Data:
▪ Document
▪ Key Value
▪ Graph
▪ Column Family

Semi structured Data - Document

• Data stored as collection of documents


▪ Typically JSON (JavaScript Object Notation)
o Be careful with formatting (name/value pairs, commas
etc)
o address - Child Object - {}
o socialProfiles - Array - []
▪ Documents are retrieved by unique id (called the key)
o Typically, you can define additional indexes
▪ Documents don't need to have the same structure
o No strict schema defined on database
o Apps should handle variations (application defined
schema)
Typically, information in one document would be stored in
multiple tables, if you were using a relational database.

Use cases: Product Catalog, Profile, Shopping Cart etc

Managed Services: Azure Cosmos DB SQL API, Azure Cosmos DB MongoDB API
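To make the address/socialProfiles description concrete, here is a small sketch. The documents and field values are hypothetical, and a plain Python dict plays the role of the database; this is not the Cosmos DB API.

```python
import json

# Hypothetical profile documents. The first has a child object (address, {})
# and an array (socialProfiles, []); the second has a different shape.
raw_documents = [
    """{
        "id": "user-1",
        "name": "Asha",
        "address": {"city": "Mumbai", "country": "India"},
        "socialProfiles": [{"network": "twitter", "handle": "@asha"}]
    }""",
    '{"id": "user-2", "name": "Ravi", "nickname": "R"}',
]

# Documents are stored and retrieved by their unique id (the key).
store = {}
for raw in raw_documents:
    doc = json.loads(raw)
    store[doc["id"]] = doc

# An additional index: city -> ids, built only from documents that have an address.
by_city = {}
for doc in store.values():
    city = doc.get("address", {}).get("city")
    if city is not None:
        by_city.setdefault(city, []).append(doc["id"])

print(store["user-1"]["socialProfiles"][0]["handle"])
# No strict schema: the application must handle fields that may be missing.
print(store["user-2"].get("address", "no address on file"))
print(by_city)
```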

Semi structured Data – Key-Value

• Key-value data is similar to a HashMap where


▪ Key - Unique identifier to retrieve a specific value
▪ Value - Number or a String or a complex object, like a JSON
file
▪ Supports simple lookups - query by keys
▪ NOT optimized for query by values
▪ Typically, no other indexes allowed

Use cases: Session Store, Caching Data


Managed Services: Azure Cosmos DB Table API, Azure Table
Storage
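The HashMap analogy maps directly onto a Python dict. In this hypothetical session-store sketch, `get` is a direct key lookup while `find_keys_by_value` has to scan every entry, which is why key-value stores are not optimized for querying by value.

```python
from typing import Any


class KeyValueStore:
    """Toy in-memory key-value store (illustration only, not Azure Table Storage)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookup by key is a hash-table lookup: fast regardless of size.
        return self._data.get(key, default)

    def find_keys_by_value(self, value: Any) -> list[str]:
        # There is no index on values, so every entry has to be scanned.
        return [k for k, v in self._data.items() if v == value]


sessions = KeyValueStore()
sessions.put("session:abc123", {"user": "asha", "cart_items": 2})
sessions.put("session:def456", {"user": "ravi", "cart_items": 0})

print(sessions.get("session:abc123"))
print(sessions.find_keys_by_value({"user": "ravi", "cart_items": 0}))
```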
Semi structured Data - Graph

• Social media applications have data with complex relationships. How do you store such data?
▪ As a graph in Graph Databases
▪ Used to store data with complex relationships
• Contains nodes and edges (relationships)
• Use cases: People and relationships, Organizational charts,
Fraud Detection
• Managed Service: Azure Cosmos DB Gremlin API
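Graph databases answer relationship questions by walking edges. The sketch below stores a small, made-up friendship graph as an adjacency list in plain Python and runs a breadth-first search over it; a graph database (for example one used through the Cosmos DB Gremlin API) would express the same traversal as a query instead.

```python
from collections import deque
from typing import Optional

# Hypothetical social graph: nodes are people, edges are "friend" relationships.
friendships = [("asha", "ravi"), ("ravi", "meera"), ("meera", "john"), ("asha", "li")]

graph: dict[str, set[str]] = {}
for a, b in friendships:
    # Friendship is mutual, so store each edge in both directions.
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def degrees_of_separation(graph: dict[str, set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search: number of hops from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Friends-of-friends who are not already direct friends ("people you may know").
direct = graph["asha"]
suggestions = {fof for friend in direct for fof in graph[friend]} - direct - {"asha"}

print(suggestions)
print(degrees_of_separation(graph, "asha", "john"))
```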

Semi structured Data – Column Family

• Data organized into rows and columns


• Can appear similar to a relational database
• IMPORTANT FEATURE: Columns are divided into groups called
column-family
• Rows can be sparse (do NOT need to have a value for every column)
• Use cases: IoT streams and real time analytics, financial data - transaction histories, stock prices etc
• Managed Service: Azure Cosmos DB Cassandra API
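A column-family row can be pictured as a nested map where columns are grouped into families and missing columns are simply not stored. This plain-Python sketch (hypothetical tickers, made-up prices) only illustrates the data shape; it is not how Cassandra or the Cosmos DB Cassandra API is accessed.

```python
from typing import Any, Optional

# Hypothetical stock-price table: row key -> column family -> column -> value.
# Values are made up for illustration.
table: dict[str, dict[str, dict[str, Any]]] = {
    "ABC#2024-01-02": {
        "price": {"open": 10.0, "close": 10.4},
        "meta": {"exchange": "EXAMPLE-EX"},
    },
    "XYZ#2024-01-02": {
        # Sparse row: no "open" column and no "meta" family at all.
        "price": {"close": 12.5},
    },
}


def read_family(row_key: str, family: str) -> dict[str, Any]:
    """Return all columns of one column family for a row (empty if absent)."""
    return table.get(row_key, {}).get(family, {})


def read_cell(row_key: str, family: str, column: str) -> Optional[Any]:
    """Return a single value, or None when the row simply does not store it."""
    return read_family(row_key, family).get(column)


print(read_family("ABC#2024-01-02", "price"))
print(read_cell("XYZ#2024-01-02", "price", "open"))
```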
Unstructured Data

• Data which does not have any structure (Audio files, Video
files, Binary files)
• What type of storage is your hard disk?
• Block Storage (Azure Managed Service: Azure Disks)
• You've created a file share to share a set of files with your colleagues in an enterprise. What type of storage are you using?
• File Storage (Azure Managed Service: Azure Files)
• You want to be able to upload/download objects using a REST
API without mounting them onto your VM. What type of
storage are you using?
• Object Storage (Azure Managed Service: Azure Blob Storage)
Relational vs Non-Relational

• Relational Data (Structured Data)


▪ OLTP: SQL Server on Azure VMs, Azure SQL Database (or
Azure SQL Managed Instance), Azure Database for
PostgreSQL, MariaDB, MySQL
▪ OLAP: Azure Synapse Analytics

• Non Relational Data (Semi Structured/Unstructured Data)


▪ Semi Structured - Document (JSON)
o Azure Cosmos DB SQL API and Cosmos DB MongoDB
API
▪ Semi Structured - Key-Value
o Azure Cosmos DB Table API, Azure Table Storage
▪ Semi Structured - Column-Family
o Azure Cosmos DB Cassandra API
▪ Semi Structured - Graph
o Azure Cosmos DB Gremlin API
▪ Unstructured Data
o Block Storage (Azure Disks), File Storage (Azure Files),
Object Storage (Azure Blob Storage)
Relational Databases on Azure

Structured Query Language (SQL) is used for retrieving and managing data. Relational databases are recommended when strong transactional consistency guarantees are needed. A database schema is mandatory.
Azure Managed Services:
Azure SQL Database
Azure SQL Managed Instance
Azure Database for PostgreSQL
Azure Database for MySQL
Azure Database for MariaDB

Azure SQL Database:
Fully Managed Service for Microsoft SQL Server
99.99% availability
Built-in high availability, automatic updates and backups
Flexible and responsive serverless compute
Hyperscale (up to 100 TB) storage
Transparent data encryption (TDE) - Data is automatically encrypted at rest
Authentication: SQL Server authentication or Active Directory (and MFA)

vCore-based: Choose between provisioned or serverless compute
OPTIONAL: Hyperscale (Autoscale storage)
Higher compute, memory, I/O, and storage limits
Supports BYOL (Bring Your Own License)

Serverless Compute: Database can be automatically paused during inactive periods
You are only billed for storage during inactive periods
If there is any activity, the database is automatically resumed
DTU-based: Bundled compute and storage packages
Balanced allocation of CPU, memory and IO
Assign DTUs (relative - Double DTU => Double resources)
Recommended when you want to keep things simple
You CANNOT scale CPU, memory and IO independently
Use DTUs for small and medium databases (< a few hundred DTUs)
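The serverless billing rule can be sketched as a simple cost model. Everything here is a simplifying assumption: the prices are made up, billing is per whole hour, and details such as the auto-pause delay and minimum compute are ignored. It only shows why an intermittently used database costs less on serverless compute.

```python
# Hypothetical unit prices for illustration only, NOT real Azure pricing.
COMPUTE_PER_ACTIVE_HOUR = 0.50
STORAGE_PER_HOUR = 0.02


def serverless_cost(hourly_activity: list[bool]) -> float:
    """Compute is billed only for active hours; storage is billed for every hour."""
    active_hours = sum(hourly_activity)  # True counts as 1
    return active_hours * COMPUTE_PER_ACTIVE_HOUR + len(hourly_activity) * STORAGE_PER_HOUR


# A database used 09:00-17:00 and idle (paused) for the rest of the day.
office_day = [9 <= hour < 17 for hour in range(24)]
always_busy = [True] * 24

print(round(serverless_cost(office_day), 2))
print(round(serverless_cost(always_busy), 2))
```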

Single Database - Great fit for modern, cloud-born applications
Fully managed database with predictable performance
Hyperscale storage (up to 100TB)
Serverless compute

Elastic pool - Cost-effective solution for multiple databases with variable usage patterns
Manage multiple databases within a fixed budget

Database Server - Database servers are used to manage groups of single databases and elastic pools. Things configured at the database server level: Access management, Backup management

Prerequisites to connect and query from Azure SQL Database:

1: Connection Security: The database should allow connections from your IP address
2: A user should be created in the database
3: The user should have grants (permissions) to perform queries - Select, Insert etc.
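The three prerequisites are checked in order, and a failure at any step blocks the query. The sketch below models that checklist in plain Python with hypothetical IP ranges, user names and grants; in Azure these checks are enforced by the service itself (firewall rules, database users and grants), not by client code like this.

```python
import ipaddress

# Hypothetical server configuration; addresses come from documentation ranges.
FIREWALL_RULES = [("203.0.113.10", "203.0.113.20")]  # allowed (start IP, end IP)
USER_GRANTS = {
    "report_reader": {"SELECT"},
    "etl_writer": {"SELECT", "INSERT"},
}


def can_run(client_ip: str, user: str, permission: str) -> tuple[bool, str]:
    """Walk the three prerequisites in order and report the first one that fails."""
    ip = ipaddress.ip_address(client_ip)
    allowed = any(
        ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)
        for start, end in FIREWALL_RULES
    )
    if not allowed:
        return False, "1: client IP address is not allowed to connect"
    if user not in USER_GRANTS:
        return False, "2: user does not exist in the database"
    if permission not in USER_GRANTS[user]:
        return False, f"3: user lacks the {permission} permission"
    return True, "ok"


print(can_run("203.0.113.15", "etl_writer", "INSERT"))
print(can_run("198.51.100.7", "etl_writer", "INSERT"))
print(can_run("203.0.113.15", "report_reader", "INSERT"))
```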
Use BYOL to reduce license costs
Use read-only replicas (Read scale-out) for offloading read-only
query workloads
Azure SQL Managed Instance:

Another Fully Managed Service for Microsoft SQL Server
What's New: Near 100% SQL Server feature compatibility
Recommended when migrating on-premise SQL Servers to Azure
Azure SQL Managed Instance features NOT in Azure SQL Database:
Cross-database queries (and transactions) within a single SQL Server instance
Database Mail
Built-in SQL Server Agent: Service to execute scheduled administrative tasks (jobs) in SQL Server
Native virtual network support
Supports only the vCore-based purchasing model
(Remember) SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS), PolyBase: NOT supported by either Azure SQL Database or SQL Managed Instance
SQL Server on Azure Virtual Machines: Provides full administrative control over the SQL Server instance and underlying OS for migration to Azure

Azure SQL Database - Fully Managed Service for Microsoft SQL Server. Recommended for cloud-born applications

Azure SQL Managed Instance - Full (near 100%) SQL Server access and feature compatibility. Recommended for migrating on-premise SQL Server databases
Azure SQL Managed Instance ONLY features: Cross-database queries, Database Mail support, SQL Server Agent etc.

Azure Database for MySQL:

Fully managed, scalable MySQL database
Supports 5.6, 5.7 and 8.0 community editions of MySQL
99.99% availability
Choose single zone or zone redundant high availability
Automatic updates and backups
Alternative: Azure Database for MariaDB
MariaDB: community-developed, commercially supported fork
of MySQL

Azure Database for PostgreSQL

Fully managed, intelligent and scalable PostgreSQL
99.99% availability
Choose single zone or zone redundant high availability
Automatic updates and backups
Single Server and Hyperscale Options
Hyperscale: Scale to hundreds of nodes and execute queries across multiple nodes

Cosmos DB:

Fully managed NoSQL database service
Global database: Automatically replicates data across multiple Azure regions
Single-digit millisecond response times
99.999% availability
Automatic scaling (serverless) - Storage and Compute
Multi-region writes
Data distribution to any Azure region with the click of a button
Your app doesn't need to be paused or redeployed to add or remove a region
Structure: Azure Cosmos account(s) > database(s) > container(s) > item(s)


Core (SQL): SQL-based API for working with documents
MongoDB: Document with MongoDB API
Move existing MongoDB workloads
Table: Key Value
Ideal for moving existing Azure Table storage workloads
Gremlin: Graph
Store complex relationships between data
Cassandra: Column Family
REMEMBER: You need a separate Cosmos DB account for each
type of API
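The resource hierarchy and the one-API-per-account rule can be modelled with a small data class. The account, database and container names are hypothetical, and the class illustrates the structure only; it is not the Cosmos DB SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CosmosAccount:
    """Toy model of account > database(s) > container(s) > item(s).

    Illustration only; this is not the Azure Cosmos DB SDK.
    """

    name: str
    api: str  # fixed per account, e.g. "SQL", "MongoDB", "Table", "Gremlin", "Cassandra"
    databases: dict[str, dict[str, dict[str, dict[str, Any]]]] = field(default_factory=dict)

    def upsert_item(self, database: str, container: str, item: dict[str, Any]) -> None:
        # Create the database/container on first use, then insert or replace by "id".
        containers = self.databases.setdefault(database, {})
        items = containers.setdefault(container, {})
        items[item["id"]] = item


# One account per API: a document workload and a graph workload need separate accounts.
docs_account = CosmosAccount(name="shop-docs", api="SQL")
graph_account = CosmosAccount(name="shop-graph", api="Gremlin")

docs_account.upsert_item("shop", "products", {"id": "p1", "name": "Laptop"})
docs_account.upsert_item("shop", "products", {"id": "p2", "name": "Mouse"})
print(sorted(docs_account.databases["shop"]["products"]))
```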

Azure Storage

It is Azure-managed storage: highly available, durable and scalable
• Managed Cloud Storage Solution
Highly available, durable and massively scalable (up to a few petabytes)
• Core Storage Services:
▪ Azure Disks: Block storage (hard disks) for Azure VMs
▪ Azure Files: File shares for cloud and on-premises
▪ Azure Blobs: Object store for text and binary data
▪ Azure Queues: Decouple applications using messaging
▪ Azure Tables: NoSQL store (Very Basic)
o Prefer Azure Cosmos DB for NoSQL
• (PRE-REQUISITE) Storage Account is needed for Azure Files,
Azure Blobs, Azure Queues and Azure Tables

Durability can be achieved by replicating data from one location to another. Data redundancy is important.

Option | Redundancy | Discussion
Locally redundant storage (LRS) | Three synchronous copies in the same data center | Least expensive and least availability
Zone-redundant storage (ZRS) | Three synchronous copies in three AZs in the primary region | -
Geo-redundant storage (GRS) | LRS + asynchronous copy to a secondary region (three more copies using LRS) | -
Geo-zone-redundant storage (GZRS) | ZRS + asynchronous copy to a secondary region (three more copies using LRS) | Most expensive and highest availability
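The options differ in how many copies exist and where they live. The sketch below encodes the copy counts from the redundancy options and asks a simplified question: does any copy survive the loss of one data center, or of a whole region? It is a mental model only; it ignores failover mechanics and treats losing a data center and losing an AZ as the same event.

```python
# (copies in primary region, copies in secondary region, AZs used in primary region)
# taken from the LRS/ZRS/GRS/GZRS descriptions; LRS keeps its copies in one data center.
REDUNDANCY = {
    "LRS": (3, 0, 1),
    "ZRS": (3, 0, 3),
    "GRS": (3, 3, 1),
    "GZRS": (3, 3, 3),
}


def total_copies(option: str) -> int:
    primary, secondary, _ = REDUNDANCY[option]
    return primary + secondary


def copy_survives(option: str, failure: str) -> bool:
    """Does at least one copy remain after losing a data center or a whole region?

    Secondary-region copies are asynchronous, so the most recent writes may be missing.
    """
    _, secondary, zones = REDUNDANCY[option]
    if failure == "datacenter":
        return zones > 1 or secondary > 0
    if failure == "region":
        return secondary > 0
    raise ValueError(f"unknown failure type: {failure}")


for option in REDUNDANCY:
    print(option, total_copies(option),
          copy_survives(option, "datacenter"), copy_survives(option, "region"))
```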

• Block Storage :
Use case: Hard-disks attached to your computers
• Typically, ONE Block Storage device can be connected to
ONE virtual server
• You can connect multiple block storage devices into one
virtual server

Disk Storage:
• Disk storage: Disks for Azure VMs
• Types:
o Standard HDD: Recommended for Backup, non-
critical, infrequent access
• Standard SSD: Recommended for Web servers, lightly used
enterprise applications and dev/test environments
• Premium SSD disks: Recommended for production and
performance sensitive workloads
Ultra disks (SSD): Recommended for IO-intensive workloads
such as SAP HANA, top tier databases (for example, SQL,
Oracle), and other transaction-heavy workloads
Premium and Ultra provide very high availability
• Managed vs Unmanaged Disks:
▪ Managed Disks are easy to use:
▪ Azure handles storage
▪ High fault tolerance and availability
▪ Unmanaged Disks are old and tricky (Avoid them if you can)
• You need to manage storage and storage account
• Disks stored in Containers (NOT Docker containers
Completely unrelated )

Files :

• Media workflows need huge shared storage for things like


video editing
• Enterprise users need a quick way to share files in a secure &
organized way
• Azure Files:
▪ Managed File Shares
▪ Connect from multiple devices concurrently:
o From cloud or on-premises
o From different OS: Windows, Linux, and macOS
▪ Supports Server Message Block (SMB) and Network File
System (NFS) protocols
▪ Use case: Shared files between multiple VMs (example: configuration files)

Blob Storage:
• Azure Blob Storage: Object storage in Azure
• Structure: Storage Account > Container(s) > Blob(s)
• Store massive volumes of unstructured data
▪ Store all file types - text, binary, backup & archives:
o Media files and archives, Application packages and logs
o Backups of your databases or storage devices
• Three Types of Blobs
▪ Block Blobs: Store text or binary files (videos, archives etc)
▪ Append Blobs: Store log files (Ideal for append operations)
▪ Page Blobs: Foundation for Azure IaaS Disks (512-byte
pages up to 8 TB)
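Page blobs are addressed in fixed 512-byte pages, which is what makes them suitable as virtual disks: reads and writes land on page boundaries. The helpers below are a hypothetical client-side check written from that rule; they are not part of the Azure Storage SDK.

```python
PAGE_SIZE = 512                # page blobs are made of 512-byte pages
MAX_PAGE_BLOB = 8 * 1024**4    # 8 TiB, the "up to 8 TB" limit from the notes


def is_valid_page_write(offset: int, length: int) -> bool:
    """A page write must start and end on 512-byte boundaries and stay within the max size."""
    aligned = offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0
    return aligned and length > 0 and offset + length <= MAX_PAGE_BLOB


def pages_touched(offset: int, length: int) -> range:
    """Indexes of the 512-byte pages covered by a valid write."""
    if not is_valid_page_write(offset, length):
        raise ValueError("offset/length must be 512-byte aligned and within the blob size limit")
    first = offset // PAGE_SIZE
    return range(first, first + length // PAGE_SIZE)


print(list(pages_touched(1024, 1024)))   # bytes 1024..2047 cover pages 2 and 3
print(is_valid_page_write(100, 512))     # misaligned offset
print(MAX_PAGE_BLOB // PAGE_SIZE)        # number of pages in a maximum-size page blob
```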

• Azure Data Lake Storage Gen2: Azure Blob Storage Enhanced


▪ Designed for enterprise big data analytics (exabytes,
hierarchical)
▪ Low-cost, tiered storage, with high availability/disaster
recovery

Blob Storage Access Tier

• Different kinds of data can be stored in Blob Storage
▪ Media files, website static content
▪ Backups of your databases or storage devices
▪ Long term archives
• Huge variations in access patterns
• Can I pay a cheaper price for objects I access less frequently?
▪ Access tiers
o Hot: Store frequently accessed data
o Cool: Infrequently accessed data stored for min. 30 days
o Archive: Rarely accessed data stored for min. 180 days
- Lowest storage cost BUT highest access cost
- Access latency: In hours
- To access: Rehydrate (change access tier to hot or cool) OR copy to another blob with access tier hot or cool
▪ You can change the access tier of an object at any point in time
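The trade-off between storage cost, access cost and access latency can be turned into a rule of thumb. The function below is a hypothetical helper: the "one read per month" cut-off is an arbitrary assumption, and only the 30-day and 180-day minimums come from the notes. Real decisions should be based on actual prices and observed access patterns.

```python
FREQUENT_READS_PER_MONTH = 1.0  # hypothetical cut-off for "frequently accessed"
MIN_DAYS = {"Hot": 0, "Cool": 30, "Archive": 180}  # minimum storage periods from the notes


def suggest_tier(reads_per_month: float, retention_days: int, needs_immediate_read: bool) -> str:
    """Rule-of-thumb tier choice based on access frequency and how long data is kept."""
    if reads_per_month >= FREQUENT_READS_PER_MONTH:
        return "Hot"
    # Archive reads need rehydration (latency in hours), so only pick it when that is acceptable.
    if retention_days >= MIN_DAYS["Archive"] and not needs_immediate_read:
        return "Archive"
    if retention_days >= MIN_DAYS["Cool"]:
        return "Cool"
    return "Hot"  # short-lived data does not meet the Cool/Archive minimum periods


def readable_without_rehydration(tier: str) -> bool:
    return tier != "Archive"


print(suggest_tier(100, 7, True))      # website static content
print(suggest_tier(0.1, 365, False))   # yearly backup kept for compliance
print(suggest_tier(0.1, 365, True))    # rarely read, but must be readable instantly
```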
ADF (Azure Data Factory) provides an end-to-end solution for data integration, transformation and orchestration needs.

Although it is hard to implement very complex transformations using ADF, they can be achieved with external tools like Spark, HDInsight and Databricks. We can orchestrate these transformations from Data Factory.

Data Factory can orchestrate publishing dashboards created in Power BI.

ADF:

Fully Managed Service: Microsoft takes care of all the management for the Data Factory that you create as a resource in your subscription.

We need not worry about scaling VMs, managing OS patches, scalability, availability, security requirements etc.

Serverless: The compute environment can be scaled to any size without managing any servers.

Data Integration capabilities: Provides 90+ connectors for connecting to external applications.
Data Transformation capability: With code-free development tools, most transformations can be achieved.

Data Orchestration Service: We can orchestrate workflows in ADF with excellent monitoring capabilities.
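Orchestration boils down to running activities in dependency order and recording each outcome for monitoring. The sketch below shows that idea with Python's standard graphlib module; the activity names are hypothetical, and real ADF pipelines are authored in the ADF studio or as JSON definitions rather than in Python.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: activity -> activities that must finish before it starts.
pipeline = {
    "CopyFromSalesforce": set(),
    "CopyFromSqlServer": set(),
    "TransformWithDatabricks": {"CopyFromSalesforce", "CopyFromSqlServer"},
    "RefreshPowerBIDataset": {"TransformWithDatabricks"},
}


def run_activity(name: str) -> str:
    # Stand-in for real work (a copy, a Databricks job, a dashboard refresh).
    return f"{name}: Succeeded"


# static_order() yields each activity only after all of its dependencies,
# which is the essence of dependency-driven orchestration.
run_log = [run_activity(name) for name in TopologicalSorter(pipeline).static_order()]
for entry in run_log:
    print(entry)
```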

ADF is not:
• A data migration tool
• A streaming service (it does not support streaming)
• Suitable for complex transformations
• A data storage solution; it provides compute
Appendix:
What is Entra ID in Azure?

Entra ID is the new name for Azure Active Directory (Azure AD),
a cloud-based identity and access management service provided
by Microsoft. It helps secure and manage access to resources
such as Microsoft 365 applications, Azure services, third-party
SaaS applications, and custom apps.
Key Features of Entra ID:
1. Identity Management: Manages user identities and their
access to resources.
2. Authentication: Provides Single Sign-On (SSO), Multi-Factor
Authentication (MFA), and passwordless login options.
3. Conditional Access: Controls resource access based on
policies such as device state, user location, and application
being accessed.
4. B2B Collaboration: Allows secure collaboration with
external users (partners, vendors).
5. B2C Identity Management: Manages identities for
customer-facing apps.
6. Identity Protection: Detects and mitigates identity risks,
such as compromised accounts.

Example Use Case: Securing Access to an Application


Imagine you have an internal web application used by your
organization's employees. You want to:
• Ensure only authorized employees can access it.
• Require additional authentication for sensitive actions.
• Provide seamless login for employees with their work
accounts.
Here’s how Entra ID helps:
1. Integration with Entra ID:
o You register the application in the Entra ID App
Registration portal.
o Assign user roles and permissions for accessing the
app.
2. Authentication:
o Employees sign in using their work email (managed in
Entra ID).
o The application uses OAuth 2.0 or OpenID Connect
protocols for authentication.
3. Access Control with Conditional Access:
o Define policies such as:
▪ Allow access only from corporate devices.
▪ Require MFA if accessing from an unknown
location.
o Employees meeting the conditions can seamlessly
access the app.
4. SSO:
o If your organization uses other applications integrated
with Entra ID, employees only need to log in once to
access all resources (SSO).

Example in Action:
Scenario:
Your company builds an employee portal for managing payroll,
leaves, and internal communication. Employees need to log in
securely using their corporate accounts.
Steps with Entra ID:
1. Register the App:
o Register the employee portal in the Entra ID portal.
o Configure redirect URIs to enable authentication
callbacks.
2. Define Roles:
o Assign roles like "Admin", "HR", and "Employee" with
different access levels.
3. Enable Conditional Access:
o Policy: Access is granted only during working hours
from corporate devices.
o Enforce MFA for sensitive actions like accessing
payroll details.
4. User Authentication:
o Employees log in using their Entra ID credentials (e.g., their corporate email address).
o Entra ID handles authentication and issues a token for
the portal.
5. User Experience:
o Employees enjoy SSO across all integrated
applications.
o IT admins can monitor access and enforce security
policies via Entra ID dashboards.
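The decisions in this scenario can be summarised as a small decision function. This is a hypothetical model for understanding only: in practice Entra ID evaluates Conditional Access policies during sign-in and the application checks the user's roles from the token it receives, and neither is implemented with code like this. The working hours, role names and action names are assumptions based on the example.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical policy settings for the employee-portal scenario.
WORK_START, WORK_END = time(9, 0), time(18, 0)
ROLES_ALLOWED = {
    "view_leaves": {"Employee", "HR", "Admin"},
    "view_payroll": {"HR", "Admin"},
}
SENSITIVE_ACTIONS = {"view_payroll"}


@dataclass
class SignIn:
    roles: set[str]
    corporate_device: bool
    local_time: time
    mfa_completed: bool
    action: str


def evaluate(sign_in: SignIn) -> str:
    """Return 'allow', 'require MFA' or a 'block: ...' reason."""
    if not sign_in.corporate_device:
        return "block: not a corporate device"
    if not (WORK_START <= sign_in.local_time < WORK_END):
        return "block: outside working hours"
    if not (sign_in.roles & ROLES_ALLOWED.get(sign_in.action, set())):
        return "block: role not permitted"
    if sign_in.action in SENSITIVE_ACTIONS and not sign_in.mfa_completed:
        return "require MFA"
    return "allow"


print(evaluate(SignIn({"HR"}, True, time(10, 30), False, "view_payroll")))
print(evaluate(SignIn({"HR"}, True, time(10, 30), True, "view_payroll")))
print(evaluate(SignIn({"Employee"}, True, time(20, 0), True, "view_leaves")))
```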

Entra ID not only secures access but also provides centralized


identity management, reducing the complexity of managing users
and applications.
