Big Data Technologies PG-DBDA March 2022

The document outlines teaching guidelines for a course on Big Data technologies that includes 66 classroom hours and 84 lab hours. The objective is to teach skills in Hadoop, MapReduce, HBase, Pig, and Spark. The course covers topics like HDFS, MapReduce, HBase, Hive, Spark, and Apache Airflow. Students learn through lectures and lab exercises that involve installing and configuring the technologies and writing queries and programs. Evaluation includes exams and assignments focused on both theoretical and practical concepts. Reference materials include books on Hadoop, Big Data, Hive, and related topics.

Uploaded by

srinivasa helwar

Suggested Teaching Guidelines for

Big Data Technologies PG-DBDA March 2022

Duration: 66 Classroom hours + 84 Lab hours

Objective: To reinforce knowledge of Big Data technologies such as Hadoop, MapReduce, HBase, Pig, and Spark (PySpark)

Prerequisites: Knowledge of Linux commands, SQL, and Core Java

Evaluation method:
Theory exam – 40% weightage
Lab exam – 40% weightage
Internal exam – 20% weightage

List of Books / Other training material

Textbook:
1. Hadoop: The Definitive Guide, SPD

Reference:
1. Big Data, Black Book by DreamTech
2. Programming Hive by O’Reilly (Authors: Edward Capriolo, Dean Wampler, and Jason Rutherglen)
3. Hadoop: The Definitive Guide, 4th Edition by O’Reilly (Author: Tom White)
4. Hadoop in Practice by Manning (Author: Alex Holmes)
5. Pro Hadoop by Apress (Author: Jason Venner)
6. Hadoop with Python
7. Hadoop Real-World Solutions Cookbook by Packt Publishing (Authors: Jonathan R. Owens, Jon Lentz, and Brian Femiano)
8. Hadoop in Action by Manning Publications (Author: Chuck Lam)
9. Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault
10. Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
11. Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

Note: Each session is 2 hours long.

Introduction to Big Data and Hadoop (Theory – 16 Hrs & Lab – 06 Hrs)

Session: 1, 2 & 3
Introduction to Big Data
o Big Data - Beyond the Hype,
o Big Data Skills and Sources of Big Data,
o Big Data Adoption,
o Research and Changing Nature of Data Repositories,
o Data Sharing and Reuse Practices and Their Implications for Repository Data Curation,
o Overlooked and Overrated Data Sharing,
o Data Curation Services in Action,
o Open Exit: Reaching the End of The Data Life Cycle,
o The Current State of Meta-Repositories for Data,
o Curation of Scientific Data at Risk of Loss: Data Rescue And Dissemination
Introduction to Hadoop
o A Brief History of Hadoop,
o Evolution of Hadoop,
o Introduction to Hadoop and its components
o Comparison with Other Systems,
o Hadoop Releases
o Hadoop Distributions and Vendors

Hadoop Distributed File System (HDFS)

Session: 4 & 5
Hadoop Distributed File System (HDFS)
o Distributed File System,
o What is HDFS,
o Where does HDFS fit in,
o Core components of HDFS,
o HDFS Daemons,
o Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode
HDFS Architecture
o HDFS Architecture,
o Scaling and Rebalancing,
o Replication,
o Rack Awareness,
o Data Pipelining,
o Node Failure Management.
o HDFS High Availability NameNode
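
As a rough illustration of the replication topic above, the sketch below estimates how many blocks a file occupies and how much raw storage its replicas consume. It assumes the usual Hadoop defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); a real cluster may be configured differently. The function name hdfs_footprint and the example file size are hypothetical.

```python
# Assumed defaults; check dfs.blocksize and dfs.replication on a real cluster.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB
REPLICATION = 3


def hdfs_footprint(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, raw bytes stored across all replicas)."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = -(-file_size // block_size)
    # The last block is not padded, so raw usage is simply size * replication.
    raw_bytes = file_size * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    one_gb = 1024 ** 3
    blocks, raw = hdfs_footprint(one_gb)
    print(f"{blocks} blocks, {raw // 1024 ** 2} MB of raw storage")
```

Each block is stored on several DataNodes, so losing one node leaves other copies available for re-replication.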

Lab-Assignment:
o Run the HDFS commands and add a one-line explanation of each command.
o Execute the provided HDFS code step by step and understand each step.

Hadoop Installation and Cluster Configuration (Lab – 02 Hrs)

Getting Started: Hadoop Installation
o Hadoop Operation modes
o Setting up a Hadoop Cluster,
o Cluster specification,
o Single and Multi-Node Cluster Setup on Virtual & Physical Machines,
o Remote Login using PuTTY/Mac Terminal/Ubuntu Terminal,
o Hadoop Configuration, Security in Hadoop, Administering Hadoop,
o HDFS – Monitoring & Maintenance, Hadoop benchmarks,
o Hadoop in the cloud.

Session: 7
Hadoop Architecture
o Hadoop Architecture,
o Core components of Hadoop,

o Common Hadoop Shell commands.

Session: 8
HDFS Data Storage Process
o HDFS Data storage process,
o Anatomy of writing and reading file in HDFS,
o Handling Read/Write failures
o HDFS user and admin commands,
o HDFS Web Interface.

Map Reduce (Theory – 06 Hrs & Lab – 12 Hrs)

Session: 9
Getting in touch with Map Reduce Framework
o Hadoop Map Reduce paradigm,
o Map and Reduce tasks,
o Map Reduce Execution Framework,
o Map Reduce Daemons
o Anatomy of a Map Reduce Job run
More Map Reduce Concepts
o Partitioners and Combiners,
o Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs),
o Output Formats (Text Output, Binary Output, Multiple Output).
o Distributed Cache
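
To make the partitioner and combiner topics above concrete, here is a minimal single-machine sketch in plain Python, not Hadoop code. Hadoop's default HashPartitioner takes the key's hash, masks off the sign bit, and takes the remainder modulo the number of reducers; the sketch follows that pattern but uses zlib.crc32 as a stand-in for Java's hashCode, which is an assumption made so that results are repeatable. The function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key (crc32 stands in for Java's hashCode)."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def combine(map_output):
    """Pre-aggregate (word, count) pairs on the map side to shrink shuffle traffic."""
    totals = Counter()
    for word, count in map_output:
        totals[word] += count
    return sorted(totals.items())


def shuffle(combined, num_reducers):
    """Group the combined pairs by the reducer that will receive them."""
    buckets = defaultdict(list)
    for word, count in combined:
        buckets[partition(word, num_reducers)].append((word, count))
    return dict(buckets)


if __name__ == "__main__":
    pairs = [("big", 1), ("data", 1), ("big", 1), ("node", 1)]
    print(shuffle(combine(pairs), num_reducers=2))
```

The combiner is safe here only because addition is associative and commutative; an operation such as a mean would need a different intermediate form.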

Session: 10
Basics of Map Reduce Programming
o Hadoop Data Types,
o Java and Map Reduce,
o Map Reduce program structure,
o Map-only program, Reduce-only program,
o Use of combiner and partitioner,
o Counters, Schedulers (Job Scheduling),
o Custom Writables, Compression

Lab-Assignment:
o Execute the train data example.
o Execute the train data example using chained methods.

Session: 11
Map Reduce Streaming
o Complex Map Reduce programming,
o Map Reduce streaming,
o Python and Map Reduce,
o Map Reduce on image dataset
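
The Python and Map Reduce topic above can be sketched as a word count in the style used with Hadoop Streaming, where the mapper and reducer exchange tab-separated key/value lines and the framework sorts mapper output by key before the reduce step. This is a minimal local simulation under those assumptions, not the course's provided program; the names mapper, reducer, and run_local are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word, as a streaming mapper would write to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_local(["big data big cluster", "data node"]):
        print(out)
```

On a cluster, the two functions would live in separate scripts that read standard input and print to standard output, and the sort would be performed by Hadoop rather than by sorted().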

Hadoop ETL
Session: 12
o Hadoop ETL Development,
o ETL Process in Hadoop,
o Discussion of ETL functions,
o Data Extractions,
o Need of ETL tools,
o Advantages of ETL tools.

Lab-Assignment:
o Understand the file formats and read the provided links

HBase (Theory – 06 Hrs & Lab – 06 Hrs)

Session: 13
Introduction to HBase
o Overview of HBase
o HBase architecture
o Installation
Session: 14 and 15
The HBaseAdmin and HBase Security
o Various Operations on Tables
o HBase general commands and shell,
o Java client API for HBase
o Admin API
o CRUD operations
o Client API
o HBase – Scan, Count and Truncate
o HBase Security

Lab-Assignment:
o Run the HBase shell commands
o Run HBase operations using the Java client

Hive (Theory – 08 Hrs & Lab – 18 Hrs)

Session: 16
The Hive Data Warehouse
o Introduction to Hive,
o Hive architecture and Installation,
o Comparison with Traditional Database,
o Basics of Hive Query Language.

Session: 17
Working with Hive QL
o Datatypes,
o Operators and Functions,
o Hive Tables (Managed Tables and External Tables),
o Partitions and Buckets,
o Storage Formats,
o Importing data,
o Altering and Dropping Tables.

Lab-Assignment:
o Create a Hive database and tables (internal and external)
o Load the data into a Hive table (using local inpath and HDFS inpath)

Session: 18
Querying with Hive QL
o Querying Data-Sorting,
o Aggregating,
o Map Reduce Scripts,
o Joins and Subqueries,
o Views,
o Map-side and reduce-side joins to optimize queries.

Lab-Assignment:
o Run all the types of joins in Hive
o Partition the provided data and run queries on the partitioned table

Session: 19
More on Hive QL
o Data manipulation with Hive,
o UDFs,
o Appending data into existing Hive table,
o custom map/reduce in Hive
o Writing HQL scripts

Apache Airflow (Theory – 06 Hrs & Lab – 06 Hrs)

Session: 20, 21 and 22
o Introduction to Data Warehousing and Data Lakes
o Designing Data warehousing for an ETL Data Pipeline
o Designing Data Lakes for an ETL Data Pipeline
o ETL vs ELT
o Fundamentals of Airflow
o Work management with Airflow
o Automating an entire Data Pipeline with Airflow

Lab-Assignment:
o Create an Airflow DAG for Extract -> Transform -> Load
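
The Extract -> Transform -> Load ordering in this assignment can be sketched without Airflow installed. The example below is plain Python, not the Airflow API: it declares the task dependencies as a graph and runs the tasks in topological order using the standard-library graphlib module (Python 3.9+). The CSV columns, function names, and in-memory target list are hypothetical. In Airflow itself, each function would typically be wrapped in a PythonOperator and chained as extract >> transform >> load.

```python
import csv
import io
from graphlib import TopologicalSorter


def extract(raw_csv):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Drop rows with no score, tidy names, and convert scores to int."""
    return [
        {"name": row["name"].strip().title(), "score": int(row["score"])}
        for row in rows
        if row["score"].strip()
    ]


def load(rows, target):
    """Append cleaned rows to the target store (a plain list in this sketch)."""
    target.extend(rows)
    return len(rows)


# Each key runs only after every task in its set has finished.
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline(raw_csv, target):
    """Run the tasks in dependency order and return that order."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    data = raw_csv
    for task in order:
        if task == "extract":
            data = extract(data)
        elif task == "transform":
            data = transform(data)
        elif task == "load":
            load(data, target)
    return order


if __name__ == "__main__":
    warehouse = []
    print(run_pipeline("name,score\n alice ,90\nbob,\ncarol,75\n", warehouse))
    print(warehouse)
```

Declaring dependencies separately from the task bodies is the same design idea a DAG scheduler relies on: the scheduler decides when a task runs, and the task only does its own work.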

Introduction to Apache Spark & Kafka (Theory – 24 Hrs & Lab – 36 Hrs)

Session: 23, 24 and 25

Apache Spark APIs for large-scale data processing
o Overview, Linking with Spark, Initializing Spark,
o Resilient Distributed Datasets (RDDs), External Datasets
o RDDs vs. DataFrames vs. Datasets
o DataFrame operations
o Spark Structured Streaming
o Passing Functions to Spark, Working with Key-Value Pairs, Shuffle operations,
o RDD Persistence, Removing Data, Shared Variables, Deploying to a Cluster

Lab-Assignment:
o Run the provided Hadoop Streaming program using Python

Session: 26
o Map Reduce with Spark
o Working with Spark with Hadoop
o Working with Spark without Hadoop and their Differences
Lab-Assignment:
o Execute all the provided code step by step, line by line
o Set up the JDBC configuration and run the Spark JDBC connectivity program
o Run the Spark integrations using the provided code

Session: 27
o Data preprocessing
o Exploratory data analysis (EDA)

Session: 28 and 29
o Introduction to Kafka
o Working with Kafka using Spark
o Spark streaming Architecture
o Spark Streaming APIs
o Building Stream Processing Application with Spark

Lab-Assignment:
o Run Spark Streaming with Kafka

Session: 30
o Setting up Kafka Producer and Consumer
o Kafka Connect API

Session: 31
o Spark SQL

Lab-Assignment:
o Run the Spark SQL programs step by step, line by line
o Run all the Spark SQL programs
o Analyse the election data using Spark and report your findings

Session: 32 and 33
o Spark MLlib
o Predictive Analysis

Lab Assignment:
o Deep Learning with Spark
o Connecting databases with Spark
o Accessing and manipulating the databases
o Demo: Capstone Project

o Create a complex workflow using the Bash operator and a simple workflow using the Python operator
o Using the Airflow Python operator, read data from your local drive, ingest the data into HDFS, and perform a Spark word count
