Big Data Analytics- sem 7 CVMU
Semester: VII
Course Objectives: This course gives an overview of Big Data, its characteristics, and
its applications in Big Data Analytics. It also focuses on tools and algorithms that
cover a wide range of analytics platforms and databases, including Hadoop, Sqoop, Hive, Pig, HBase
and Spark.
Detailed Syllabus:
Sr. Contents Hours
1 Introduction to Big Data: 05
Classification of Digital Data, Structured Data, Semi-Structured Data, Unstructured
Data, Characteristics of Data, Evolution of Big Data, Definition of Big Data, 4 Vs of Data:
Volume, Velocity, Variety and Veracity, Big Data Requirements, Traditional Business
Intelligence versus Big Data, Introduction to Big Data Analytics.
2 NoSQL: 05
What is NoSQL?, Where is it Used?, Types of NoSQL Databases, Why NoSQL?, Advantages
of NoSQL, Use of NoSQL in Industry, SQL vs NoSQL, NewSQL.
3 Introduction to Hadoop: 10
Features of Hadoop, Key Advantages of Hadoop, Versions of Hadoop, Hadoop
Ecosystem, Hadoop vs SQL, Hadoop Components, Use Cases of Hadoop, Processing
data with Hadoop, YARN Components, YARN Architecture, YARN MapReduce
Application, Execution Flow, YARN Workflow, Anatomy of MapReduce Program,
Input Splits, Relation between Input Splits and HDFS Blocks.
4 HDFS, SQOOP, HIVE, PIG AND HBASE: 12
HDFS: Daemons, Anatomy of File Read, Anatomy of File Write, Replica Placement
Strategy, Working with HDFS Commands
Sqoop: Introduction, import and export command
Hive: Hive Architecture and Installation, Comparison with Traditional Database,
HiveQL Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins &
Subqueries
Pig: Pig Architecture & Data Types, Shell and Utility Components, Pig Latin
Relational Operators, Pig Latin: File Loaders and UDF, Programming Structure in
UDF, Pig Jars Import, Limitations of Pig.
HBase: HBase Concepts, Advanced Usage, Schema Design, Advanced Indexing
ZooKeeper: How it helps in monitoring a cluster, how HBase uses ZooKeeper, and how to
build applications with ZooKeeper.
5 SPARK: 08
Introduction to Data Analysis with Spark, Features of Apache Spark, Components of
Spark, Downloading Spark and Getting Started, RDD Transformations, RDD Actions,
Programming with RDDs, Machine Learning with MLlib.
Total 40
Reference Books:
1 Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley.
2 DT Editorial Services, “Black Book- Big Data (Covers Hadoop 2, MapReduce, Hive, Yarn, PIG,
R, Data visualization)”, Dreamtech Press, 2016.
3 Holden Karau, “Learning Spark: Lightning-Fast Big Data Analysis”.
4 Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
5 Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.
6 Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.
7 http://www.bigdatauniversity.com/
Pedagogy:
● Direct classroom teaching
● Audio Visual presentations/demonstrations
● Assignments/Quiz
● Continuous assessment
● Interactive methods
● Seminar/Poster Presentation
● Industrial/Field visits
● Course Projects
Curriculum Revision:
Version: 1.0
Drafted on (Month-Year): June-2020
Last Reviewed on (Month-Year): -
Next Review on (Month-Year): June-2025