
Eshant Garg: Azure Data Engineer, Architect, Advisor

The document discusses data lakes and how they compare to data warehouses. Using James Dixon's analogy, it describes a data lake as a large body of water in a natural state that various users can examine, dive into, or take samples from, whereas a data mart or warehouse is a structured store of clean, packaged data for easy consumption. The document also gives an overview of Azure Data Lake Storage Gen 2, including how it is optimized for big data analytics and is compatible with Hadoop, and of its security features.

Uploaded by

Asif Al Hye

Eshant Garg

Azure Data Engineer, Architect, Advisor


What is a Data Lake?

A data lake is a big container for storing data.

“If you think of a DataMart as a store of bottled water – clean and packaged and structured
for easy consumption – the data lake is a large body of water in a more natural state. The
contents of the data lake stream in from a source to fill the lake, and various users of the lake
can come to examine, dive in, or take samples.”

James Dixon
CTO, Pentaho

Data Warehouse vs Data Lake


Data lake vs Hadoop
Data Lake can use:
Azure Data Lake Gen1 evolution

Data Lake Gen 1

• Fault-tolerant file system
• Runs on commodity hardware
• MapReduce, Pig, Hive, Spark, etc.
• HDFS in the cloud -> Data Lake Storage Gen1
Cloud storage challenge

Processing
- Easy to optimize processing by increasing vCPU and RAM

Storage
- Different requirements
- No direct solution
Azure Blob Storage

• Large object storage in the cloud
• Optimized for storing massive amounts of unstructured data
• Text or binary data
• General purpose object storage
• Cost efficient
• Provides multiple access tiers
Azure Data lake Gen 2

Data Lake Gen 1


Microsoft recommends Data Lake Storage Gen2 for your big data storage needs.

Note: U-SQL is currently not supported in Gen 2.


Blob Storage vs Data Lake Storage

Azure Blob Storage
• General purpose data storage
• Container based object storage
• Available in every Azure region
• Local and global redundancy
• Processing performance limit

Azure Data Lake Storage (Gen 2)
• Optimized for big data analytics
• Hierarchical namespace on Blob Storage
• Available in every Azure region
• Local and global redundancy
• Supports a subset of Blob storage features
• Supports multiple Azure integrations
• Compatible with Hadoop
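
To illustrate why a hierarchical namespace matters for analytics workloads, here is a toy in-memory sketch. It is not how Azure implements either service; the function names and paths are hypothetical. It only shows the general idea that, with flat object names, renaming a "directory" means touching every object under that prefix, while a real directory tree can rename the directory node once.

```python
def rename_prefix_flat(blobs: dict, old: str, new: str) -> int:
    """Flat namespace: a 'directory' is just a name prefix, so every
    object under it has to be re-keyed. Returns objects touched."""
    touched = 0
    for name in [n for n in blobs if n.startswith(old + "/")]:
        blobs[new + name[len(old):]] = blobs.pop(name)
        touched += 1
    return touched


def rename_dir_hierarchical(root: dict, parent_path: list, old: str, new: str) -> int:
    """Hierarchical namespace: directories are real nodes, so a rename
    re-links one node. Returns the number of nodes touched (always 1)."""
    node = root
    for part in parent_path:
        node = node[part]
    node[new] = node.pop(old)
    return 1


# Hypothetical sample data
flat = {"raw/2024/a.csv": b"", "raw/2024/b.csv": b"", "raw/2023/c.csv": b""}
tree = {"raw": {"2024": {"a.csv": b"", "b.csv": b""}, "2023": {"c.csv": b""}}}

flat_ops = rename_prefix_flat(flat, "raw/2024", "raw/archive")
tree_ops = rename_dir_hierarchical(tree, ["raw"], "2024", "archive")
print(flat_ops, tree_ops)
```

In this toy model the flat rename grows with the number of objects under the prefix, while the hierarchical rename stays constant.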
Data Lake Architecture

APIs: Blob API, ADLS API
Hierarchical Namespace: performance enhancements, scale and cost effectiveness, security
Blob Storage:
• Object tiering and lifecycle policy management
• AAD integration, RBAC, storage account security
• HA/DR (ZRS/RA-GRS)
• Data governance and management
Learning objective
• Authentication
  - Storage account keys
  - Shared access signature (SAS)
  - Azure Active Directory (Azure AD)
• Access Control
  - Role based access control (RBAC)
  - Access control list (ACL)
• Network access
  - Firewall and virtual network
• Data Protection
  - Data encryption in transit
  - Data encryption at rest
  - Advanced threat protection
Authentication
Shared Access Signature (SAS)

• A security token string (the “SAS token”)
• Contains the granted permissions plus constraints such as start and end time
• Azure doesn’t track a SAS after creation
• To invalidate a SAS, regenerate the storage account key used to sign it
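
The points above can be illustrated with a minimal sketch of a signed token. This is a simplified model, not Azure's actual string-to-sign format: the field layout, function names, and keys are hypothetical. It shows only the general mechanism: the token carries its permissions and validity window, the service verifies it by recomputing an HMAC with the account key rather than by looking the token up, and so regenerating the key is what invalidates it.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone


def sign_token(account_key: bytes, permissions: str, start: str, expiry: str) -> str:
    """Sign the token fields with the account key (simplified layout)."""
    string_to_sign = "\n".join([permissions, start, expiry])
    digest = hmac.new(account_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_token_valid(account_key: bytes, permissions: str, start: str,
                   expiry: str, signature: str, now: datetime) -> bool:
    """Recompute the signature and check the validity window.
    Nothing is looked up: the service keeps no record of issued tokens."""
    expected = sign_token(account_key, permissions, start, expiry)
    if not hmac.compare_digest(expected, signature):
        return False
    return datetime.fromisoformat(start) <= now <= datetime.fromisoformat(expiry)


# Hypothetical keys and times
old_key = b"hypothetical-account-key-1"
new_key = b"hypothetical-account-key-2"
start = "2024-01-01T00:00:00+00:00"
expiry = "2024-01-02T00:00:00+00:00"
sig = sign_token(old_key, "r", start, expiry)

noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
later = datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc)

valid_before = is_token_valid(old_key, "r", start, expiry, sig, noon)
valid_after_rotation = is_token_valid(new_key, "r", start, expiry, sig, noon)
valid_after_expiry = is_token_valid(old_key, "r", start, expiry, sig, later)
print(valid_before, valid_after_rotation, valid_after_expiry)
```

Changing any field, such as widening the permissions, also changes the expected signature, so a tampered token fails the same check.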
Stored access policy

• Reused by multiple SAS
• Defined on a resource container
• Permissions + validity period
• Service level SAS only
• Can be revoked
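
As a rough sketch of why a stored access policy makes revocation possible, the toy model below keeps the permissions and validity period on the container and lets each SAS carry only a policy identifier. The class names, the policy id, and the token shape are hypothetical and are not the Azure SDK; the point is only that deleting the policy invalidates every SAS that refers to it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class StoredAccessPolicy:
    permissions: str      # e.g. "rl" for read + list
    start: datetime
    expiry: datetime


class Container:
    """Toy container holding named policies (hypothetical model)."""

    def __init__(self) -> None:
        self.policies: Dict[str, StoredAccessPolicy] = {}

    def check(self, policy_id: str, wanted: str, now: datetime) -> bool:
        policy = self.policies.get(policy_id)
        if policy is None:  # policy revoked or never defined
            return False
        return wanted in policy.permissions and policy.start <= now <= policy.expiry


container = Container()
container.policies["read-only"] = StoredAccessPolicy(
    permissions="rl",
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    expiry=datetime(2024, 12, 31, tzinfo=timezone.utc),
)

# Two SAS tokens reuse the same policy instead of embedding permissions.
token_a = {"policy_id": "read-only"}
token_b = {"policy_id": "read-only"}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

before = [container.check(t["policy_id"], "r", now) for t in (token_a, token_b)]
del container.policies["read-only"]  # revoke the policy
after = [container.check(t["policy_id"], "r", now) for t in (token_a, token_b)]
print(before, after)
```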
Authentication
Azure Active Directory (AD)

• Grant access to Azure Active Directory (AD) identities
• AD is an enterprise identity provider, Identity as a Service (IDaaS)
• Globally available from virtually any device
• Identities: user, group, or application principal
• Assign roles at the subscription, resource group (RG), storage account, or container level
• No longer need to store credentials in application config files
• Similar to the IIS application pool identity approach
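
The scope levels listed above nest inside one another, so a role assigned at a higher level applies to everything beneath it. The sketch below models that inheritance with simple path prefixes. The scope strings are shortened, hypothetical stand-ins rather than real Azure resource IDs, and the helper names and identities are made up for illustration.

```python
from typing import List, Tuple

# (identity, role, scope) triples; a hypothetical in-memory representation
Assignment = Tuple[str, str, str]


def scope_covers(assignment_scope: str, resource_scope: str) -> bool:
    """True if the resource is the assignment scope itself or sits below it."""
    a = assignment_scope.rstrip("/").lower()
    r = resource_scope.rstrip("/").lower()
    return r == a or r.startswith(a + "/")


def has_role(assignments: List[Assignment], identity: str, role: str,
             resource_scope: str) -> bool:
    return any(
        who == identity and what == role and scope_covers(scope, resource_scope)
        for who, what, scope in assignments
    )


# Simplified, hypothetical scopes and identities
RG = "/subscriptions/sub1/resourceGroups/rg-data"
CONTAINER = RG + "/storageAccounts/lake1/containers/raw"
OTHER = "/subscriptions/sub1/resourceGroups/rg-other/storageAccounts/x/containers/y"

assignments = [
    ("analyst-group", "Storage Blob Data Reader", RG),
    ("etl-app", "Storage Blob Data Contributor", CONTAINER),
]

inherited = has_role(assignments, "analyst-group", "Storage Blob Data Reader", CONTAINER)
outside = has_role(assignments, "analyst-group", "Storage Blob Data Reader", OTHER)
not_upward = has_role(assignments, "etl-app", "Storage Blob Data Contributor", RG)
print(inherited, outside, not_upward)
```

Assigning at the narrowest scope that works, as with `etl-app` here, keeps an identity from gaining access to sibling containers.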
Access Control
Firewalls and Virtual Networks

Diagram: traffic reaching Azure Storage from an IP address, my virtual network (VN), or the Internet.
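
A storage firewall rule set can be thought of as an allow list of address ranges. The sketch below uses Python's standard `ipaddress` module to show that kind of check, assuming the firewall is configured to deny traffic unless a rule matches. The ranges are reserved documentation addresses and the function name is hypothetical; this is not an Azure API.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Allow the request only if the client address falls in a listed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(r) for r in allowed_ranges)


# Hypothetical rules: one office range and one single address
rules = ["203.0.113.0/24", "198.51.100.7/32"]

office = is_allowed("203.0.113.42", rules)
single = is_allowed("198.51.100.7", rules)
internet = is_allowed("192.0.2.10", rules)
print(office, single, internet)
```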
Data Protection
Encrypting Data in Transit – Advanced

• Site-to-site VPN
• Point-to-site VPN
• Azure ExpressRoute

Client-side Encryption

• Encrypt data within the application
• Data is encrypted in transit and at rest
• The application decrypts data when it is retrieved
• HTTPS has integrity checks built in
• .NET and Java storage client libraries
• Can leverage Azure Key Vault to generate and/or store encryption keys
Data Protection
Encrypting Data at Rest

• Encryption enabled by default
• Can’t be disabled
• Storage Service Encryption (SSE)
• Automatically encrypts and decrypts data while writing and reading
• It’s free, no charge
• Applied to both standard and premium tiers
• 256-bit AES encryption
• Option: use your own encryption keys
  - Blobs and files only
Advanced threat protection
