
MIE1628 Big Data Analytics Lecture10

The document discusses Microsoft Azure data platform and services. It provides an overview of Azure data architecture, cloud data concepts and terminologies, and various Azure data services like Azure Storage, Data Lake Storage, Cosmos DB, SQL Database, Synapse Analytics, Stream Analytics, HDInsight, and Databricks.

Uploaded by Viola Song

Cloud-based Data Analytics
Lecture 10

Azure Data Platform: Revision
Objectives
• Cloud Big Data concepts and terminologies
• Microsoft Azure Data Services
• Azure modern data platform architecture
• Cloud platform comparison

The modern data world out there
Terms in the landscape: SQL, NoSQL, Databricks, Storm, Data Catalog, IoT, PaaS vs IaaS, Streaming, Deep Learning, Predictive, Prescriptive, Data Mart, SMP vs MPP, ETL vs ELT, Data Visualisation, Master Data, Big Data, Cloud vs On-prem, Data Quality, Velocity, Variety and Volume, Semantic Layer, Spark, AI
Azure Data Architecture
Azure Architecture Solutions
Data Platform Concepts
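Explore data types in Azure
• Structured data: data with a fixed schema of rows and columns, such as relational database tables
• Unstructured data: data with no schema, such as images, video, audio, and free text
• Semi-structured data: self-describing data whose structure can vary, such as CSV, logs, JSON, and XML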
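The following Python sketch handles one sample of each data type. The structured sample is CSV parsed into typed fields that share a fixed schema. The semi-structured sample is a pair of JSON documents whose fields differ. The unstructured sample is free text with no schema. All sample values and field names are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (named columns with known types)
structured_csv = "trip_id,distance_km,fare\n1,3.2,12.50\n2,7.9,24.00\n"
rows = [
    {"trip_id": int(r["trip_id"]), "distance_km": float(r["distance_km"]), "fare": float(r["fare"])}
    for r in csv.DictReader(io.StringIO(structured_csv))
]

# Semi-structured: self-describing JSON documents whose fields can differ per record
documents = [
    json.loads('{"deviceId": "pump-01", "pressure": 101.3}'),
    json.loads('{"deviceId": "hvac-07", "temperature": 21.5, "humidity": 40}'),
]
all_fields = sorted({key for doc in documents for key in doc})

# Unstructured: free text with no schema; meaning must be extracted by further analysis
note = "The elevator stopped twice today and made a grinding noise."
word_count = len(note.split())

print(len(rows), sum(r["fare"] for r in rows))  # 2 36.5
print(all_fields)  # ['deviceId', 'humidity', 'pressure', 'temperature']
print(word_count)  # 10
```

You can query the structured rows directly. You can only list the fields of the semi-structured documents after reading each one. The free text needs further analysis, for example with a text analytics service, before you can query it.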
Cloud Models
IaaS vs PaaS vs SaaS
Cloud Terminologies
Database Services
Azure Data Services
Azure Storage

Key features:
Azure Storage accounts are scalable, secure, durable, and highly available. Azure handles hardware maintenance, updates, and critical issues for you.
Azure Data Lake Storage
Key features:
• Unlimited scalability
• Hadoop compatibility
• Security support through access control lists (ACLs)
• POSIX compliance
• An optimized Azure Blob File System (ABFS) driver designed for big-data analytics
• Zone-redundant storage
• Geo-redundant storage
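Analytics engines reach Data Lake Storage through the ABFS driver by addressing files with URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The abfss scheme is the TLS-secured variant of abfs. The sketch below builds and parses such URIs. The helper names, the container name, the account name, and the file path are all hypothetical.

```python
from urllib.parse import urlparse

def build_abfs_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build an ABFS URI: abfs[s]://<container>@<account>.dfs.core.windows.net/<path>."""
    scheme = "abfss" if secure else "abfs"  # abfss = ABFS over TLS
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

def parse_abfs_uri(uri: str) -> dict:
    """Split an ABFS URI back into its container, storage account, and file path."""
    parts = urlparse(uri)
    if parts.scheme not in ("abfs", "abfss") or parts.hostname is None:
        raise ValueError(f"not an ABFS URI: {uri}")
    account = parts.hostname.split(".")[0]  # first label of <account>.dfs.core.windows.net
    return {"container": parts.username, "account": account, "path": parts.path.lstrip("/")}

uri = build_abfs_uri("raw", "synapsedatalake", "nyctaxi/2019/trips.parquet")
print(uri)  # abfss://raw@synapsedatalake.dfs.core.windows.net/nyctaxi/2019/trips.parquet
print(parse_abfs_uri(uri))
```

In a Spark notebook on Databricks or Synapse, you would pass a URI like this to a reader such as spark.read.parquet(...). Authentication is configured separately and is not shown here.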
Azure Cosmos DB
Key feature:
Azure Cosmos DB supports up to 99.999 percent uptime.
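An availability percentage is easier to compare once you convert it into permitted downtime. The small worked calculation below assumes a 365-day year. At 99.999 percent, a service may be down for roughly 5.3 minutes per year. At the 99.9 percent SLA quoted later in this lecture, the allowance is about 525.6 minutes, or roughly 8.8 hours. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def allowed_downtime_minutes(availability_pct: float, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime permitted by an availability percentage over a period."""
    return period_minutes * (100.0 - availability_pct) / 100.0

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
# 99.9% -> 525.6 minutes/year
# 99.99% -> 52.6 minutes/year
# 99.999% -> 5.3 minutes/year
```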
Azure SQL Database

Key feature:
SQL Database delivers predictable performance for multiple resource types, service tiers, and compute sizes.
Azure Synapse Analytics

Key feature:
SQL pools use massively parallel processing (MPP) to run queries quickly across petabytes of data. Because storage is separated from the compute nodes, you can scale the compute nodes independently to meet demand at any time.
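The MPP idea can be simulated in a few lines of Python. A dedicated SQL pool spreads each table across 60 distributions. Each compute node aggregates the distributions it owns, and the control node combines the partial results. This sketch hash-distributes rows on a distribution key and sums them in that two-step way. Three simplifications apply. The crc32 hash is a stand-in and is not the hash Synapse uses. The function names are hypothetical. The data movement between distributions that real query plans may need is omitted.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions

def distribution_for(key: str) -> int:
    # Stand-in hash (crc32); Synapse's real hash function differs, but the idea is the same
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS

def hash_distribute(rows, key_column):
    """Assign each row to a distribution based on the hash of its distribution key."""
    distributions = defaultdict(list)
    for row in rows:
        distributions[distribution_for(row[key_column])].append(row)
    return distributions

def mpp_sum(distributions, value_column):
    # Step 1: each distribution computes a partial sum (in parallel on a real cluster)
    partials = [sum(r[value_column] for r in part) for part in distributions.values()]
    # Step 2: the control node combines the partial results into the final answer
    return sum(partials)

sales = [{"customer": f"c{i}", "amount": i % 7 + 1} for i in range(1000)]
dists = hash_distribute(sales, "customer")
print(mpp_sum(dists, "amount"))  # 3997, identical to a single-node sum
```

The same key always hashes to the same distribution, so rows that share a key are co-located. That is why choosing a good distribution key matters for join and aggregation performance.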
Azure Stream Analytics
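Azure Stream Analytics runs SQL-like queries over event streams. A common pattern groups events into fixed, non-overlapping time windows, for example GROUP BY deviceId, TumblingWindow(second, 10). The Python sketch below mimics that grouping for a list of (timestamp, device, value) events and labels each window by its end time. In this sketch each window covers [start, start + size). Stream Analytics' exact boundary handling may differ. The function name and sample events are hypothetical.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds):
    """Average readings per device in fixed, non-overlapping windows, keyed by window end.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        # Each event falls into exactly one window: [start, start + window_seconds)
        window_end = (int(ts) // window_seconds + 1) * window_seconds
        sums[(window_end, device)] += value
        counts[(window_end, device)] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}

events = [
    (1, "hvac-07", 20.0), (4, "hvac-07", 22.0), (9, "hvac-07", 21.0),
    (11, "hvac-07", 25.0), (14, "hvac-07", 27.0),
]
print(tumbling_window_avg(events, 10))
# {(10, 'hvac-07'): 21.0, (20, 'hvac-07'): 26.0}
```

A real streaming engine emits each window's result as soon as the window closes, rather than after reading the whole input as this batch sketch does.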
Azure HDInsight

Key features:
HDInsight is a low-cost cloud solution that includes Apache Hadoop, Spark, Kafka, HBase, Storm, and Interactive Query.
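Hadoop, one of the frameworks HDInsight provides, processes data with the MapReduce model. Mappers emit key/value pairs, a shuffle groups the pairs by key, and reducers combine each group. The single-process Python sketch below walks through those three phases for a word count. It only illustrates the programming model; a cluster runs the phases in parallel across many nodes. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in an input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values emitted for the same key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the grouped values for each key (sorted for stable output)
    return {key: sum(values) for key, values in sorted(groups.items())}

lines = ["Spark and Hadoop", "Hadoop and HBase and Kafka"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map_phase(line) for line in lines)))
print(counts)  # {'and': 3, 'hadoop': 2, 'hbase': 1, 'kafka': 1, 'spark': 1}
```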
Azure Databricks
Data Factory
Data Catalog
IoT Services
• IoT devices
  • An IoT device is typically made up of a circuit board with attached sensors that use Wi-Fi to connect to the internet. For example:
    • A pressure sensor on a remote oil pump
    • Temperature and humidity sensors in an air-conditioning unit
    • An accelerometer in an elevator
    • Presence sensors in a room
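Devices like these typically report readings as small messages, which an ingestion service such as Event Hubs or IoT Hub then receives. The sketch below generates a few JSON telemetry messages for a simulated air-conditioning temperature sensor. The field names, device ID, and one-minute interval are all hypothetical. Sending the messages to Azure is out of scope.

```python
import json
import random
from datetime import datetime, timedelta, timezone

def make_telemetry(device_id, sensor, start, count, rng):
    """Generate `count` JSON telemetry messages, one per minute, for a single sensor."""
    messages = []
    for i in range(count):
        reading = {
            "deviceId": device_id,  # hypothetical field names
            "sensor": sensor,
            "timestamp": (start + timedelta(minutes=i)).isoformat(),
            "value": round(rng.uniform(20.0, 25.0), 2),
        }
        messages.append(json.dumps(reading))  # serialize as JSON, a common telemetry format
    return messages

rng = random.Random(42)  # fixed seed so the sample is reproducible
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch = make_telemetry("ac-unit-3", "temperature", start, 3, rng)
for msg in batch:
    print(msg)
```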
Big Data Analytics
AI and ML Services
• Azure Machine Learning Studio is a web portal in Azure Machine Learning that contains low-code and no-code options for project authoring and asset management.
Modern data warehousing
What is a Data Warehouse?
• It is not an operational database
  • Different workload types: transactional (DB) versus analytics (DW)
• It is not a data lake
  • These are different concepts; they can coexist and complement each other
• It is not a data mart
  • A data mart is a subject-oriented database populated from a subset of the data warehouse
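The three distinctions above can be shown with Python's built-in sqlite3 module. The transactional workload writes and corrects individual rows. The analytical workload scans and aggregates many rows at once. The data mart is a subset populated from the larger table. A single in-memory SQLite database stands in for all three here, although in practice these would be separate systems. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)")

# Transactional (operational database) workload: small reads and writes of single rows
cur.executemany(
    "INSERT INTO orders (region, product, amount) VALUES (?, ?, ?)",
    [("East", "A", 10.0), ("East", "B", 15.0), ("West", "A", 20.0), ("West", "B", 5.0)],
)
cur.execute("UPDATE orders SET amount = 12.0 WHERE id = 1")  # correct one order in place
conn.commit()

# Analytical (data warehouse) workload: scan and aggregate many rows at once
totals = cur.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('East', 27.0), ('West', 25.0)]

# Data mart: a subset populated from the warehouse, here for an East-region sales team
cur.execute("CREATE TABLE east_mart AS SELECT product, amount FROM orders WHERE region = 'East'")
print(cur.execute("SELECT COUNT(*) FROM east_mart").fetchone())  # (2,)
```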
The modern data estate
Data sources: LOB, CRM, graph, image, social, IoT
Hybrid: operational databases, data warehouses, and data lakes on each side of the hybrid estate
Reason over any data, anywhere | Flexibility of choice | Security and performance
The Microsoft offering
Data sources: LOB, CRM, graph, image, social, IoT
Hybrid: easiest lift and shift with no code changes
SQL Server (operational databases, data warehouses, data lakes):
• Industry leader 4 years in a row
• #1 TPC-H performance
• T-SQL query over any data
Azure Data Services (operational databases, data warehouses, data lakes):
• 70% faster
• 2x the global reach
• 99.9% SLA
AI built-in | Most secure | Lowest TCO
Reason over any data, anywhere | Flexibility of choice | Security and performance
Modern Data Platform Solution Scenarios
Big Data and advanced analytics
• Modern data warehousing: “We want to integrate all our data, including Big Data, with our data warehouse.”
• Advanced analytics: “We’re trying to predict when our customers churn.”
• Real-time analytics: “We’re trying to get insights from our devices in real time.”
Modern data warehousing pattern
Sources: LOB, CRM, graph, image, social, IoT
• INGEST: data orchestration and monitoring
• STORE: big data store
• PREP: transform & clean
• MODEL & SERVE (& store): data warehouse
Consumers: BI + reporting, advanced analytics, AI
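The four stages of the pattern can be traced end to end in a small Python sketch:

- INGEST: a raw extract arrives from a source system.
- STORE: the extract is landed unchanged, with a dict standing in for the big data store.
- PREP: the records are cleaned.
- MODEL & SERVE: the clean records are loaded into a warehouse table that reporting can query, with SQLite standing in for the warehouse.

In Azure, these roles would typically be played by services such as Data Factory, Data Lake Storage, Databricks, and Synapse Analytics. The file path, column names, and cleaning rules here are hypothetical.

```python
import csv
import io
import sqlite3

# INGEST: raw extract from a source system (hypothetical CSV from a LOB application)
raw_extract = "order_id,region,amount\n1,east,10\n2,WEST,\n3,West,7.5\n"

# STORE: land the file unchanged; a dict stands in for the big data store
lake = {"raw/orders/2024-01-01.csv": raw_extract}

# PREP: transform & clean (normalize region casing, drop rows with a missing amount)
def prep(csv_text):
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["amount"]:
            continue  # reject incomplete records
        cleaned.append((int(row["order_id"]), row["region"].strip().title(), float(row["amount"])))
    return cleaned

# MODEL & SERVE: load into a warehouse fact table that BI and reporting can query
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", prep(lake["raw/orders/2024-01-01.csv"]))
report = warehouse.execute(
    "SELECT region, SUM(amount) FROM fact_orders GROUP BY region ORDER BY region"
).fetchall()
print(report)  # [('East', 10.0), ('West', 7.5)]
```

Keeping the raw file untouched in the store means you can rerun the PREP step later with different cleaning rules without ingesting the data again.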
Modern Data Platform Reference Architecture
Stages: Load and Ingest, Store, Process, Serve
λ Lambda architecture and real time
• V = Velocity: IoT devices, sensors, and gadgets (loosely-typed) flow through Event Hubs into Stream Analytics. The hot path feeds stream datasets and real-time dashboards for business users; the cold path supports history and trend analysis.
• V = Variety: non-structured data (images, video, audio, free text; no structure) is processed with Cognitive Services, and Azure ML is used to build and score ML models, with analytics served through Power BI Premium.
• V = Volume: semi-structured data (CSV, logs, JSON, XML; loosely-typed) arrives through Data Factory (scheduled or event-triggered data ingestion), is stored in Azure Data Lake Gen2, and is processed with Databricks to integrate big data scenarios with the traditional data warehouse; Cosmos DB serves applications.
• Relational databases (strongly-typed, structured): data is fast-loaded into Azure Synapse Analytics with PolyBase/ParquetDirect, and Power BI Premium provides an enterprise-grade semantic model for analytics.
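The lambda architecture's hot and cold paths can be sketched with a per-device event counter. The hot path updates a real-time view immediately as each event arrives. The cold path periodically recomputes a batch view from the full, append-only history. A query merges the two views. This is a simplified, hypothetical illustration. Resetting the real-time view after each batch run ignores events that arrive while the batch is running, which a production system must handle. In this architecture, Event Hubs and Stream Analytics would roughly play the hot-path role, and the data lake with batch processing the cold-path role.

```python
from collections import Counter

class LambdaCounter:
    """Minimal lambda-architecture sketch: per-device event counts."""

    def __init__(self):
        self.master_log = []            # append-only history that the cold path reads
        self.batch_view = Counter()     # cold path: recomputed from the full history
        self.realtime_view = Counter()  # hot path: incremented as each event arrives

    def ingest(self, device_id):
        self.master_log.append(device_id)
        self.realtime_view[device_id] += 1  # low latency, covers events since the last batch

    def run_batch(self):
        # Cold path: recompute from all history, then clear the increments it now covers
        self.batch_view = Counter(self.master_log)
        self.realtime_view = Counter()

    def query(self, device_id):
        # Serving layer: merge the batch view with the real-time increments
        return self.batch_view[device_id] + self.realtime_view[device_id]

lam = LambdaCounter()
for device in ["pump-01", "pump-01", "hvac-07"]:
    lam.ingest(device)
lam.run_batch()
lam.ingest("pump-01")
print(lam.query("pump-01"), lam.query("hvac-07"))  # 3 1
```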
Azure Data Platform End2End Lab Architecture
• Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines
• Lab 2: Transform Big Data using Azure Data Factory Mapping Data Flows
• Lab 3: Explore Big Data with Azure Databricks
• Lab 4: Add AI to your Big Data pipeline with Cognitive Services
• Lab 5: Ingest and Analyse Real-Time Data with Event Hubs and Stream Analytics

Azure Data Platform resource group
• ADPLogicApp: Logic App
• ADPEventHubs-suffix: Event Hubs
• SynapseStreamAnalytics-suffix
• ADPCosmosDB-suffix: Azure Cosmos DB
• MDWResources: Storage Account
• SynapseDataFactory-suffix: Azure Data Factory
• SynapseDataLake-suffix: Azure Data Lake Storage Gen2
• ADPDatabricks: Azure Databricks Workspace
• ADPComputerVision: Computer Vision API
• operationalsql-suffix\NYCDataSets: Azure SQL Database
• synapsesql-suffix\SynapseDW: Azure Synapse Analytics
• PowerBI: Power BI Desktop
• ADPDesktop: Virtual Machine, reached from the student’s computer through an RDP connection or Azure Bastion
• ADPVirtualNetwork: Virtual Network
Cloud Comparison
References
• AWS to Azure services comparison
• GCP to Azure services comparison
• Azure Data Platform End to End

You might also like