
ADF Workshop by Amit Navgire

19 July 2021 09:28 PM

Day 1
Day 2

Course Link: https://www.amitnavgire.com/azure-data-factory-5-days-workshop/
How to reach Amit Navgire: [email protected]
-------------------------------------------------------------

Day 1
Topics covered:
- Cloud basics
- Create a free Azure account
- Data Factory
○ Start Azure
○ Start ADF
○ Basics of ADF

Day 2
Topics covered:
- Continuation of Basics of ADF
○ More about Integration Runtime installation and configuration (first hour)
○ Basics of Data Flow & Control Flow (just after the break)
- How to create an Azure SQL Database
- How to create an Azure Data Lake
- First Task: copy data from the database to the data lake

-------------------------------------------------------------

Data Factory :-
- It is a data integration / ETL service
- It orchestrates the overall workflow execution
- It is a PaaS offering

Start Azure :-
- Go to https://portal.azure.com; this is the Azure portal homepage.

- Search for a particular service using the search bar

- Use "Create a resource" to provision IaaS and PaaS services (100+ services available)


- Go to "Subscriptions" to check all the subscriptions.

- When changing your subscription to pay as you go, then go to "Cost
Management + Billing" to compare cost for each subscription.
- If you are not using a resource, then delete it. Otherwise pay for that
without any usage.

Start ADF :-
- Go to the Azure portal
- Create a resource for "Data Factories"
- Create a new ADF

Basics of ADF:
- Pipelines
- Activities
- Linked Services
- Datasets
- Triggers
- Integration Runtime
- Control Flow
- Data Flow

Pipelines:
- A pipeline is a logical grouping of activities that together perform a task
- A data factory can have one or multiple pipelines
Activities:
- The activities in a pipeline define the actions to perform on your data
- 3 types of activities:
○ Data movement activities
○ Data transformation activities
○ Control activities
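
Behind the ADF authoring UI, every pipeline is stored as a JSON document. A minimal sketch of what that looks like (the pipeline name and the single Wait activity, a control activity, are illustrative, not from the workshop):

    {
      "name": "DemoPipeline",
      "properties": {
        "description": "A pipeline is a named, logical grouping of activities",
        "activities": [
          {
            "name": "WaitOneMinute",
            "type": "Wait",
            "typeProperties": { "waitTimeInSeconds": 60 }
          }
        ]
      }
    }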
Linked Services:
- Linked services are much like connection strings, which define the
connection information needed for Data Factory to connect to external
resources
- For example, to copy data from Blob storage to a SQL database, you create
two linked services: one for Azure Storage and one for Azure SQL Database
- Everything can be parameterized
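
A linked service is likewise a small JSON document behind the UI. A sketch for an Azure SQL Database linked service (the name and the server/database/credential values are placeholders):

    {
      "name": "AzureSqlDbLinkedService",
      "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
          "connectionString": "Server=tcp:<server-name>.database.windows.net,1433;Database=<db-name>;User ID=<user>;Password=<password>;Encrypt=true;Connection Timeout=30;"
        }
      }
    }

In practice the credential would usually be parameterized or pulled from Azure Key Vault rather than stored inline.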
Datasets:
- Datasets identify data within different data stores, such as tables, files,
folders and documents
- Before you create a dataset, you must create a linked service to link your
data to the data factory
- Same dataset cant be used for input & output in a pipeline
- Dataset points to single data source
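
A dataset definition, sketched as JSON, simply references the linked service and names the data it points to (the dataset, schema and table names below are illustrative):

    {
      "name": "SqlEmployeeDataset",
      "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
          "referenceName": "AzureSqlDbLinkedService",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "schema": "dbo",
          "table": "Employee"
        }
      }
    }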
Triggers:
- Triggers are used to schedule the execution of a pipeline
- Pipelines and triggers have a many-to-many relationship
○ multiple triggers can kick off a single pipeline, or a single trigger can
kick off multiple pipelines
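
A schedule trigger, sketched as JSON, defines a recurrence and the pipelines it kicks off (names and times are illustrative):

    {
      "name": "DailyTrigger",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": {
            "frequency": "Day",
            "interval": 1,
            "startTime": "2021-07-20T00:00:00Z",
            "timeZone": "UTC"
          }
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "CopySqlToDataLake",
              "type": "PipelineReference"
            }
          }
        ]
      }
    }

The "pipelines" array is what gives the many-to-many relationship: one trigger can list several pipelines, and several triggers can reference the same pipeline.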
Integration Runtime:
- The Integration Runtime (IR) is the compute infrastructure used by ADF
- It is referenced when a linked service is created
- The following types of integration runtime are available:
○ Azure (default)
○ Self-hosted
○ Azure-SSIS
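
The Azure IR is the default and needs no definition. A self-hosted IR is registered as its own resource; a minimal sketch of its JSON definition (the name is illustrative):

    {
      "name": "MySelfHostedIR",
      "properties": {
        "type": "SelfHosted",
        "description": "Registered in ADF, then linked to the machine where the IR software is installed"
      }
    }

After creating it, you install the self-hosted IR software on the host machine and register it with the key shown in the portal.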
Control Flow:
- It orchestrates a set of control activities within a pipeline, such as the
Lookup activity, ForEach activity, etc.
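
A sketch of a control activity, here a ForEach that loops over a pipeline parameter (the parameter name and the inner Wait activity are illustrative):

    {
      "name": "ForEachTable",
      "type": "ForEach",
      "typeProperties": {
        "isSequential": true,
        "items": {
          "value": "@pipeline().parameters.tableList",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "WaitPerItem",
            "type": "Wait",
            "typeProperties": { "waitTimeInSeconds": 5 }
          }
        ]
      }
    }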
Data Flow:
- Data flows allow data engineers to develop graphical data transformation
logic without writing code
- Data flows are executed as activities within Azure Data Factory pipelines
using scaled-out Apache Spark clusters managed by ADF.

How to create Azure SQL Database?


Steps are like below:
- Go to the Azure portal
- Create a resource for "SQL Database"

- Give username & password which you can remember.

- Click "Configure database"

- Database is created.

- Click "Set server firewall"


- Click "Query Editor"

- You can access the database using "Query editor"

- Also you can access using "SQL server management studio"

How to create an Azure Data Lake?

Notes on ADL:
- Data is stored as blobs in the Azure Data Lake (ADLS Gen2 is built on blob storage)
- Schema-on-read
Steps:
- Search for "storage accounts"

- Go to "Review + create"
- Click "Review + create" and then "Create" once validation passed.
- Click "Go to resource" once deployment is completed.

- Go to "Storage Explorer"
- Right click on "CONTAINERS" and click "Create file system"

- Upload files manually using Microsoft Azure Storage Explorer
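
The same storage account can also be created from an ARM template instead of the portal; the key setting that turns it into Data Lake Storage Gen2 is the hierarchical namespace flag. A minimal sketch (the account name, location and SKU are placeholders):

    {
      "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "resources": [
        {
          "type": "Microsoft.Storage/storageAccounts",
          "apiVersion": "2019-06-01",
          "name": "mydatalakestorage",
          "location": "eastus",
          "sku": { "name": "Standard_LRS" },
          "kind": "StorageV2",
          "properties": { "isHnsEnabled": true }
        }
      ]
    }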

First Task: copy data from database to datalake
- Create a database table and insert records

- No transformation is applied here, so "COPY DATA" activity is used from
ADF.
- Provide appropriate "Name"

- Specify source
○ Specify "Source dataset" --> Click "+ New" to create new dataset
○ Search and select "Azure SQL Database"

○ Set properties -->
▪ specify "Name"
▪ create new linked service

▪ Select "Table name" and press OK

- Source configuration is completed.
- Note:
○ "Use query" specifies whether to unload the full table, the output of a
query, or the output of a SQL stored procedure (see the snippet below)
○ If the source table has any partitions, specify that using "partition
option"
○ Use "Preview data" to check sample records.

- Specify sink (a sketch of the resulting sink dataset appears after these steps)
○ Specify "Sink dataset" --> Click "+ New" to create a new dataset
○ Search and select "Azure Data Lake Storage Gen 2"
○ Specify

○ Select file format

○ Set properties
▪ specify "Name"
▪ create new linked service

▪ Set properties to specify where to store the output data

○ Can select "block size", "maz rows perfile" etc.


○ Specify "file extension"

- Can specify "Mapping"

- Click "Publish all"

- Once publishing is completed, we can run it.
○ Can run in debug mode using "Debug"
○ Can run using "Add trigger" and monitor the run from the trigger runs view.

