Unit 1 - ETI (BDA)
Topics
● What is Big Data
● Why Big Data
The history of Big Data analytics can be traced back to the early days of computing, when organizations
first began using computers to store and analyze large amounts of data. However, it was not until the
late 1990s and early 2000s that Big Data analytics really began to take off, as organizations increasingly
turned to computers to help them make sense of the rapidly growing volumes of data being generated
by their businesses.
Today, Big Data analytics has become an essential tool for organizations of all sizes across a wide range
of industries. By harnessing the power of Big Data, organizations are able to gain insights into their
customers, their businesses, and the world around them that were simply not possible before.
As the field of Big Data analytics continues to evolve, we can expect to see even more amazing and
transformative applications of this technology in the years to come.
Data can be categorized into two main types:
➢ Structured Data: This type of data is highly organized and formatted in a way that is easily searchable and can be processed by machines. Examples include databases, spreadsheets, and tables.
➢ Unstructured Data: This type of data lacks a predefined data model or structure and is not easily searchable in traditional databases. Examples include text documents, images, videos, and social media posts.
Characteristics of Data
1. Accuracy
2. Relevance
3. Completeness
4. Consistency
5. Timeliness
6. Accessibility
7. Granularity
8. Consolidation
9. Validity
10. Security
11. Scalability
12. Volatility
Together, these characteristics determine the quality, usefulness, and reliability of data in various contexts.
Big Data
● Big Data is a collection of data that is huge in volume and keeps growing
exponentially over time.
● Its size and complexity are so great that traditional data management tools
cannot store or process it efficiently.
● In short, Big Data is still data, but of an enormous size.
Big Data analytics provides many advantages: it supports better decision
making, helps prevent fraudulent activities, and more.
Classification of analytics
1. Descriptive Analytics: This summarizes past data into a form that people can
easily read. This helps in creating reports, like a company’s revenue, profit, sales,
and so on. Also, it helps in the tabulation of social media metrics.
2. Diagnostic Analytics: This is done to understand what caused a problem in the
first place. Techniques like drill-down, data mining, and data recovery are all
examples. Organizations use diagnostic analytics because it provides
in-depth insight into a particular problem.
3. Predictive Analytics: This type of analytics looks at historical and present
data to make predictions about the future. Predictive analytics uses data mining, AI,
and machine learning to analyze current data and forecast what is likely to happen.
It works on predicting customer trends, market trends, and so on.
4. Prescriptive Analytics: This type of analytics prescribes the solution to a
particular problem. Prescriptive analytics works with both descriptive and
predictive analytics. Most of the time, it relies on AI and machine learning.
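Returning to descriptive analytics (type 1 above), here is a minimal sketch using pandas; the revenue figures and column names are invented for illustration, not taken from any real dataset.

```python
# Descriptive analytics: summarize past data into readable figures.
# The sales numbers below are illustrative only.
import pandas as pd

sales = pd.DataFrame({
    "month":   ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [12000, 15000, 11000, 18000],
})

print(sales["revenue"].describe())              # count, mean, std, min, max, ...
print("Total revenue:", sales["revenue"].sum())
```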
Why is Big Data analytics important
● Data is the oil of today's world. With the right tools, technologies, and
algorithms, we can use data and convert it into a distinct business advantage.
● Data science can help you detect fraud using advanced machine learning algorithms.
● It helps you prevent significant monetary losses.
● It allows you to build intelligent capabilities into machines.
● You can perform sentiment analysis to gauge customer brand loyalty.
● It enables you to make better and faster decisions.
● It helps you recommend the right product to the right customer to enhance your business.
Data Science Roles
• Data Scientist
• Data Engineer
• Data Analyst
• Statistician
• Data Architect
• Data Admin
• Business Analyst
• Data/Analytics Manager
Big Data Terminologies
In the realm of Big Data, several terminologies and concepts are commonly used
to describe various aspects of data processing, storage, and analytics. Here are
some key terminologies used in Big Data environments:
Big Data:
● Refers to large and complex datasets that are challenging to process
using traditional data management tools.
Volume, Velocity, Variety:
● The three Vs of Big Data describe its main characteristics: Volume (the
sheer size of data), Velocity (the speed at which data is generated and
processed), and Variety (the diversity of data types and sources).
Structured Data:
● Data that is organized in a tabular format with a fixed schema. Examples include relational
databases.
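As a small illustration, the sketch below stores rows under a fixed schema using Python's built-in sqlite3; the table name and rows are hypothetical.

```python
# Structured data: rows that conform to a fixed, machine-readable schema.
# Uses Python's built-in sqlite3; the table and rows are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Asha", "Pune"))

# The fixed schema makes the data easy to search and query.
for row in conn.execute("SELECT id, name, city FROM customers"):
    print(row)
conn.close()
```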
Unstructured Data:
● Data that lacks a predefined data model or structure. Examples include text, images, and
videos.
Semi-structured Data:
● Data that is partially organized and may have some level of structure. Examples include JSON
and XML files.
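For instance, a JSON record carries keys and nesting (some structure) but no enforced schema; the record below is made up for illustration.

```python
# Semi-structured data: JSON has keys and nesting, but no fixed schema,
# so two records may legally differ in shape. Illustrative record.
import json

record = '{"user": "asha", "tags": ["sports", "news"], "profile": {"age": 30}}'
data = json.loads(record)

print(data["user"])            # asha
print(data["profile"]["age"])  # 30
# Another record could omit "profile" entirely or add new keys.
```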
Hadoop:
● An open-source framework for distributed storage and processing of large datasets. It
includes the Hadoop Distributed File System (HDFS) and MapReduce programming model.
MapReduce:
● A programming model for processing and generating large datasets in parallel across a
distributed cluster.
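The classic example is word count. The sketch below simulates the map, shuffle, and reduce phases in plain Python on one machine; a real MapReduce job would distribute these phases across a Hadoop cluster.

```python
# Word count: the canonical MapReduce example, simulated in-memory.
from collections import defaultdict

documents = ["big data is big", "data is everywhere"]

# Map phase: emit (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum the counts for each word.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```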
Apache Spark:
● An open-source, distributed computing system that provides fast and general-purpose data
processing for Big Data.
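The same word count expressed with Spark's RDD API looks like the sketch below; it assumes pyspark is installed and runs against a local session.

```python
# Word count using PySpark's RDD API on a local Spark session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

lines = spark.sparkContext.parallelize(["big data is big", "data is everywhere"])
counts = (lines.flatMap(lambda line: line.split())   # map each line to words
               .map(lambda word: (word, 1))          # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))     # sum counts per word

print(counts.collect())
spark.stop()
```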
NoSQL:
● Stands for "Not Only SQL" and refers to a category of databases that do not strictly adhere to
the traditional relational database model. Examples include MongoDB and Cassandra.
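A minimal MongoDB sketch with pymongo is shown below; it assumes a MongoDB server is reachable on localhost:27017, and the database and collection names are hypothetical.

```python
# NoSQL: documents in a collection need no fixed schema.
# Assumes a local MongoDB server; names are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client["demo_db"]["posts"]

# These two documents deliberately differ in shape.
posts.insert_one({"user": "asha", "text": "hello", "likes": 3})
posts.insert_one({"user": "ravi", "text": "hi", "tags": ["intro"]})

print(posts.find_one({"user": "asha"}))
client.close()
```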
Data Lake:
● A centralized repository that allows organizations to store structured and
unstructured data at any scale. It enables data exploration and analysis.
Data Warehouse:
● A centralized repository for storing and analyzing structured data from different
sources. Data warehouses are designed for efficient querying and reporting.
ETL (Extract, Transform, Load):
● The process of extracting data from various sources, transforming it into a
suitable format, and loading it into a target system, such as a data warehouse.
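A toy end-to-end ETL pipeline might look like the sketch below: extract from a CSV source, transform (trim and cast values), and load into sqlite3 standing in for a warehouse; the data and names are illustrative.

```python
# Toy ETL: CSV source -> cleaning step -> sqlite3 "warehouse".
import csv, io, sqlite3

raw = "name,amount\nasha, 100 \nravi,250\n"   # illustrative source data

# Extract: read records from the source.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: trim whitespace and cast amounts to integers.
clean = [(r["name"].strip(), int(r["amount"].strip())) for r in rows]

# Load: write the cleaned rows into the target system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean)
print(conn.execute("SELECT * FROM sales").fetchall())
```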
Machine Learning:
● A field of artificial intelligence (AI) that involves the development of algorithms
that enable systems to learn and make predictions or decisions based on data.
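As a minimal sketch with scikit-learn, the model below learns a linear trend from past (x, y) pairs and predicts the next value; the numbers are invented for illustration.

```python
# Supervised learning in miniature: fit on historical data, then predict.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # e.g. month numbers (illustrative)
y = [10, 20, 30, 40]       # e.g. units sold (illustrative)

model = LinearRegression().fit(X, y)
print(model.predict([[5]]))  # the learned trend predicts about 50
```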
Data Mining:
● The process of discovering patterns, trends, and insights from large datasets
using various techniques.
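One classic technique is frequent-pattern (market-basket) mining. The sketch below just counts co-occurring item pairs across a handful of made-up transactions; real data mining would work at far larger scale.

```python
# Tiny pattern discovery: count which item pairs co-occur in baskets.
from collections import Counter
from itertools import combinations

transactions = [                      # illustrative shopping baskets
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "butter"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))     # most frequent co-occurring pairs
```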
Data Governance:
● The overall management of the availability, usability, integrity, and security of data
within an organization.
Data Security:
● The practice of protecting data from unauthorized access, disclosure, alteration,
and destruction.
Data Privacy:
● Concerned with protecting individuals' personal information and ensuring that data
is handled in compliance with privacy regulations.
Streaming Analytics:
● Analyzing and processing real-time data streams as they are generated, allowing
for immediate insights and actions.
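A minimal sketch of the idea: keep a sliding window over an incoming stream and act on each event as it arrives. Here the stream is a simple generator; a real system would read from a source such as Kafka.

```python
# Streaming sketch: sliding-window average with an immediate action.
from collections import deque

def stream():                          # stand-in for a real event source
    yield from [10, 12, 55, 11, 13, 60, 12]

window = deque(maxlen=3)               # keep the last 3 readings
for reading in stream():
    window.append(reading)
    avg = sum(window) / len(window)
    if reading > 2 * avg:              # act immediately on a spike
        print(f"spike detected: {reading} (window avg {avg:.1f})")
```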
Lambda Architecture:
● A data processing architecture that combines batch processing and stream
processing to handle large-scale data.
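A toy rendering of the idea: a batch view precomputed over the full master dataset, a speed layer holding deltas for events that arrived since the last batch run, and a query that merges the two. All names and numbers are illustrative.

```python
# Lambda idea in miniature: batch view + real-time deltas, merged at query time.
from collections import Counter

master_dataset = ["click", "view", "click"]   # all historical events
batch_view = Counter(master_dataset)          # batch layer: recomputed periodically

realtime_view = Counter()                     # speed layer: recent events only
for event in ["click", "view"]:               # events since the last batch run
    realtime_view[event] += 1

def query(key):
    # Serving layer: merge the precomputed view with real-time deltas.
    return batch_view[key] + realtime_view[key]

print(query("click"))  # 3 = 2 from batch + 1 real-time
```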
Edge Computing:
● Processing and analyzing data closer to the source (at the edge of the network)
rather than in a centralized data center.