MIE1628 Big Data Analytics Lecture 10
Azure Data Platform: Revision
Objectives
Cloud Big Data concepts and terminologies
Azure Storage
Key features:
Azure Storage accounts are scalable, secure, durable, and highly available. Azure handles hardware maintenance, updates, and critical issues for you.
Azure Data Lake Storage
Key features:
• Unlimited scalability
• Hadoop compatibility
• Security support for both access control lists (ACLs) and POSIX permissions
• An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
• Zone-redundant storage
• Geo-redundant storage
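As a small illustration of the ABFS driver listed above: analytics engines such as Spark typically address Data Lake Storage Gen2 data through an `abfss://` URI. The sketch below only builds such a URI string. The helper `abfss_uri` is hypothetical (it is not part of any Azure SDK), and the container, account, and path names are made-up placeholders.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ABFS (TLS) URI of the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    clean_path = path.lstrip("/")  # avoid a double slash after the host
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, for illustration only.
uri = abfss_uri("raw", "mydatalake", "/nyc/taxi/2019.csv")
print(uri)
```

A string like this is what would be passed to a reader such as Spark's file loaders, assuming the cluster is already configured with credentials for the storage account.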
Azure Cosmos DB
Key feature:
Azure Cosmos DB supports 99.999 percent uptime.
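To put an availability percentage in perspective, it can be converted into the maximum downtime it allows per year. The sketch below is plain arithmetic, not a statement about any service's SLA terms; the function name is illustrative and a 365-day year is assumed.

```python
def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = max_downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999 percent corresponds to a little over five minutes of downtime per year, while 99.9 percent corresponds to almost nine hours.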
Azure SQL Database
Key feature:
SQL Database delivers predictable performance for multiple resource types, service tiers, and compute sizes.
Azure Synapse Analytics
Key feature:
SQL pools use massively parallel processing (MPP) to run queries quickly across petabytes of data. Because storage is separated from the compute nodes, you can scale the compute nodes independently to meet demand.
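The MPP idea can be sketched in a few lines: rows are hash-distributed across compute nodes, each node aggregates its own shard, and a control step merges the partial results. This is a toy model, not how Synapse is implemented. Threads stand in for compute nodes, and all function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def distribute(rows, num_nodes):
    """Hash-distribute (key, value) rows across nodes by key."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        shards[node].append((key, value))
    return shards


def local_sum(shard):
    """Work done independently on one node: a partial SUM grouped by key."""
    totals = defaultdict(int)
    for key, value in shard:
        totals[key] += value
    return dict(totals)


def mpp_sum(rows, num_nodes):
    """Control step: fan the shards out to the nodes, then merge partial sums."""
    shards = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, shards))
    merged = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            merged[key] += total
    return dict(merged)


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
print(mpp_sum(rows, num_nodes=2))
print(mpp_sum(rows, num_nodes=4))  # more nodes, same totals
```

Because the data (`rows`) lives apart from the workers, `num_nodes` can change without touching the data, which mirrors the point about scaling compute independently of storage.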
Azure Stream Analytics
Azure HDInsight
Key features:
HDInsight is a low-cost cloud solution. It includes Apache Hadoop, Spark, Kafka, HBase, Storm, and Interactive Query.
Azure Databricks
Data Factory
Data Catalog
IoT Services
• IoT devices
• An IoT device is typically made up of a circuit board with sensors attached that use Wi-Fi to connect to the internet. For example:
  • A pressure sensor on a remote oil pump.
  • Temperature and humidity sensors in an air-conditioning unit.
  • An accelerometer in an elevator.
  • Presence sensors in a room.
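To make the device examples concrete, the sketch below simulates the air-conditioning unit case: it takes a temperature and humidity reading and serializes it as a JSON telemetry message, which is a common shape for data sent to a cloud ingestion service. The device ID, field names, and value ranges are hypothetical, and the random readings stand in for real sensor hardware.

```python
import json
import random
from datetime import datetime, timezone


def read_sensors(rng: random.Random) -> dict:
    """Stand-in for real sensor reads on an air-conditioning unit."""
    return {
        "temperature_c": round(rng.uniform(18.0, 30.0), 1),
        "humidity_pct": round(rng.uniform(30.0, 70.0), 1),
    }


def build_message(device_id: str, readings: dict) -> str:
    """Serialize one telemetry message as JSON, as a device might before sending."""
    payload = {
        "deviceId": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **readings,
    }
    return json.dumps(payload)


rng = random.Random(42)  # seeded so repeated runs produce the same readings
message = build_message("ac-unit-01", read_sensors(rng))
print(message)
```

A real device would send such a message over a protocol like MQTT or HTTPS rather than printing it.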
Big Data Analytics
AI and ML Services
Hybrid: The Microsoft offering
Pillars: Reason over any data, anywhere; Flexibility of choice; Security and performance
Callouts from the slide:
• Easiest lift and shift with no code changes
• Industry leader 4 years in a row
• T-SQL query over any data
• Operational databases
• Data lakes
• 70% faster
• 99.9% SLA
Modern Data Platform Solution Scenarios
Big Data and advanced analytics
• “We want to integrate all our data, including Big Data, with our data warehouse”
• “We’re trying to predict when our customers churn”
• “We’re trying to get insights from our devices in real-time”
Modern data warehousing pattern
Data sources: LOB, CRM, Graph, Image, Social, IoT
Stages:
• INGEST: Data orchestration and monitoring
• STORE: Big data store
• PREP: Transform & Clean
• MODEL & SERVE (& store): Data warehouse
Consumers: BI + Reporting, Advanced Analytics, AI
Modern Data Platform Reference Architecture
Stages: Load and Ingest -> Process -> Store -> Serve
λ Lambda Architecture and Real-time
• Hot Path (V=Velocity): IoT devices, sensors, gadgets (loosely-typed) -> Event Hubs -> Stream Analytics -> Stream Datasets -> Stream Dashboards (Real Time Analytics) -> Business User
• Cold Path (V=Volume, History and Trend Analysis): Semi-structured data such as csv, logs, json, xml (loosely-typed) -> Data Factory (scheduled / event-triggered data ingestion) -> Azure Data Lake Gen2 -> Databricks -> CosmosDB -> Application
• Relational Databases (strongly-typed, structured) -> Azure Synapse Analytics -> Power BI Premium (enterprise-grade semantic model) -> Analytics
• Fast load data with PolyBase/ParquetDirect
• Integrate big data scenarios with traditional data warehouse
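The hot and cold paths of a Lambda architecture can be contrasted with a toy example. In this sketch the hot path emits an average per tumbling time window as events stream in, while the cold path keeps every raw event and computes full-history averages per device in a batch step. The event format `(timestamp_seconds, device, value)`, the device names, and the 10-second window are all assumptions made for illustration; Python lists stand in for Event Hubs and the data lake.

```python
from collections import defaultdict


def hot_path(events, window_seconds=10):
    """Speed layer: average per tumbling window, emitted as each window closes."""
    results = []
    current_window, values = None, []
    for ts, _device, value in events:  # events are assumed to arrive in time order
        window = ts // window_seconds
        if current_window is not None and window != current_window:
            results.append((current_window * window_seconds, sum(values) / len(values)))
            values = []
        current_window = window
        values.append(value)
    if values:  # flush the last, still-open window
        results.append((current_window * window_seconds, sum(values) / len(values)))
    return results


def cold_path(stored_events):
    """Batch layer: full-history average per device over everything stored."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _ts, device, value in stored_events:
        sums[device] += value
        counts[device] += 1
    return {device: sums[device] / counts[device] for device in sums}


events = [(1, "pump-a", 10.0), (4, "pump-b", 20.0), (12, "pump-a", 30.0), (15, "pump-b", 50.0)]
data_lake = list(events)  # the cold path retains every raw event

print(hot_path(events))      # low-latency, per-window view
print(cold_path(data_lake))  # historical, per-device view
```

The two functions answer different questions from the same events: the hot path favors freshness for dashboards, and the cold path favors completeness for history and trend analysis.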
Azure Data Platform End2End
Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines
Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows
Lab environment resources (from the architecture diagram):
• ADPLogicApp: Logic App
• ADPEventHubs-suffix: Event Hubs
• SynapseStreamAnalytics-suffix: Stream Analytics
• ADPCosmosDB-suffix: Azure CosmosDB
• PowerBI: Power BI Desktop / Workspace
• MDWResources: Storage Account
• SynapseDataFactory-suffix: Azure Data Factory
• SynapseDataLakesuffix: Azure Data Lake Storage Gen2
• ADPDatabricks: Azure Databricks
• ADPComputerVision: Computer Vision API
• operationalsql-suffix\NYCDataSets: Azure SQL Database
• synapsesql-suffix\SynapseDW: Azure Synapse Analytics
• ADPDesktop: Virtual Machine, reached from the student's computer over an RDP connection or Azure Bastion
• ADPVirtualNetwork: Virtual Network
Cloud Comparison
References
• AWS to Azure services comparison