
Big Data Analytics

Module 1
Introduction to Big Data
Data
• Data is a set of values that represent a concept or concepts. It can be raw
information, such as numbers or text, or it can be more complex, such as images,
graphics, or videos.
Characteristics of Data
Composition: deals with structure of data, that is, the sources of data, the types, and
the nature of the data as to whether it is static or real-time streaming.
Condition: The condition of data deals with the state of the data that is “can one use
this data as is for analysis?” or “Does it require cleansing for further enhancement and
enrichment?”
Context: deals with “Where has this data been generated?”, “Why was this data
generated?” and so on.
In simple terms, the characteristics of data include the following quality dimensions; a small check for each is sketched after the list.
• Accuracy
• Completeness
• Consistency
• Timeliness
• Validity
• Uniqueness
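
A minimal sketch in Python of how these dimensions can be checked in practice, using the pandas library on a hypothetical customers.csv file with email and signup_date columns (all file and column names are assumptions for illustration):

import pandas as pd

# Hypothetical customer dataset with "email" and "signup_date" columns.
df = pd.read_csv("customers.csv", parse_dates=["signup_date"])

# Completeness: share of non-missing values per column.
completeness = df.notna().mean()

# Uniqueness: duplicated records violate the uniqueness dimension.
duplicate_rows = df.duplicated().sum()

# Validity: values should conform to an expected format (a crude email check).
valid_email_share = df["email"].str.contains("@", na=False).mean()

# Timeliness: how stale is the newest record?
staleness = pd.Timestamp.now() - df["signup_date"].max()

print(completeness, duplicate_rows, valid_email_share, staleness, sep="\n")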
Characteristics of Big Data
The characteristics of big data are commonly summarized as the "Vs": Volume (the sheer scale of data), Velocity (the speed at which data is generated and processed), Variety (the many forms data takes), Veracity (the uncertainty and quality of data), and Value (the usefulness of the insights drawn from it).
Evolution of Big Data
• 1970s and before – Mainframes: basic data storage; data has a structure.
• 1980s and 1990s – Relational databases: data has a structure, and relationships within the data are captured.
• 2000s and beyond – Structured, unstructured, and multimedia data at the scale of the WWW.
Several milestones mark the evolution of Big Data, described below:
Data Warehousing:
In the 1990s, data warehousing emerged as a solution to store and analyze large
volumes of structured data.
Hadoop:
Hadoop, introduced in 2006 by Doug Cutting and Mike Cafarella, is an open-source framework that provides distributed storage and large-scale data processing.
NoSQL Databases:
Around 2009, NoSQL databases gained wide adoption; they provide a flexible way to store and retrieve unstructured data.

At present, technologies such as cloud computing and machine learning are widely used by companies to reduce maintenance and infrastructure costs and to draw proper insights from big data effectively.
Challenges with Big Data

• Data Volume: Managing and Storing Massive Amounts of Data


• Data Variety: Handling Diverse Data Types
• Data Velocity: Processing Data in Real-Time
• Data Veracity: Ensuring Data Quality and Accuracy
• Data Security and Privacy: Protecting Sensitive Information
• Data Integration: Combining Data from Multiple Sources
• Data Analytics: Extracting Valuable Insights
• Data Governance: Establishing Policies and Standards
Data Warehouse Environment
Traditional Business Intelligence versus Big Data
Importance of Big Data
• Enhanced Decision-Making (vast amounts of data, discovering new patterns and
trends)
• Understanding Consumer Behavior (for recommendations)
• Competitive Advantage (Competitor analysis, market trends)
• Innovation and New Opportunities (reveals gaps in existing products or services)
• Efficiency and Cost Reduction (optimize processes for reducing waste and improve
resource allocation)
• Improved Risk Management (advanced modelling and simulation)
• Enhanced Public Services (traffic management and disease control)
• Better Workforce Insights (employee engagement, performance, and retention)
• AI and Machine Learning (predict accurately)
• Advancements in Research (academia, healthcare, etc.)
Big Data Technologies

Big data technologies can be categorized into four main types:
• Data storage
• Data mining
• Data analytics
• Data visualization
1. Data Storage:
Big data technology that deals with data storage has the capability to fetch, store, and
manage big data. Two commonly used tools are Hadoop and MongoDB.
Hadoop:
• It is the most widely used big data tool.
• It is an open-source software platform that enables fast, distributed data processing.
• The framework is designed to tolerate hardware faults and to process all data formats; a word-count sketch for Hadoop Streaming follows.
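
Hadoop's classic demonstration is word count, run as a MapReduce job. The Python sketch below targets the real Hadoop Streaming utility, which lets any executable act as mapper or reducer; the file name and the map/reduce command-line switch are our own conventions, not Hadoop's:

#!/usr/bin/env python3
# wordcount.py: run as "python3 wordcount.py map" for the mapper phase
# and "python3 wordcount.py reduce" for the reducer phase.
import sys

def mapper():
    # Emit "word<TAB>1" for every word on stdin; Hadoop shuffles by key.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

The job would be submitted with the hadoop-streaming JAR, passing this script as both the -mapper and the -reducer command.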
MongoDB:
• It is a NoSQL database that stores large volumes of data as JSON-like documents made up of field-value pairs.
• It is one of the most popular big data databases because it can manage and store unstructured data; a short sketch follows.
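
A minimal sketch using the pymongo driver against a local MongoDB server; the database, collection, and field names are hypothetical:

from pymongo import MongoClient  # pip install pymongo

# Assumes a MongoDB server listening on localhost.
client = MongoClient("mongodb://localhost:27017")
reviews = client["shop"]["reviews"]

# Documents need not share a schema, which suits unstructured data.
reviews.insert_one({"user": "asha", "stars": 5, "text": "Great phone"})
reviews.insert_one({"user": "ben", "stars": 2, "tags": ["battery", "slow"]})

# Query by field value; documents come back as Python dicts.
for doc in reviews.find({"stars": {"$gte": 4}}):
    print(doc["user"], doc["stars"])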
2. Data mining
Data mining extracts useful patterns and trends from raw data. Big data technologies such as RapidMiner and Presto can turn unstructured and structured data into usable information.
RapidMiner:
• RapidMiner is a data mining tool that can be used to build predictive models.
• It is used for processing and preparing data and for building machine learning and deep learning models; a rough code analogue is sketched below.
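
RapidMiner builds such predictive pipelines visually; a rough code analogue in Python with scikit-learn, using a made-up churn.csv dataset and column names, might look like this:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")               # hypothetical dataset
X = df[["tenure_months", "monthly_spend"]]  # hypothetical feature columns
y = df["churned"]                           # hypothetical label column

# Hold out a quarter of the data to estimate predictive accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))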
Presto:
• Presto is an open-source distributed SQL query engine, originally developed by Facebook to run analytic queries against their large datasets; it is now widely available.
• One Presto query can combine data from multiple sources within an organization and perform analytics on them, as the sketch below illustrates.
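
A sketch of such a federated query from Python, assuming the trino client package (Presto's widely used descendant) and hypothetical Hive and MySQL catalogs; the host, catalog, schema, and table names are all assumptions:

import trino  # pip install trino

conn = trino.dbapi.connect(host="presto.example.com", port=8080,
                           user="analyst", catalog="hive", schema="web")
cur = conn.cursor()

# One query joins a Hive data-lake table with a MySQL CRM table.
cur.execute("""
    SELECT c.region, COUNT(*) AS clicks
    FROM hive.web.clickstream AS c
    JOIN mysql.crm.customers AS m ON c.user_id = m.user_id
    GROUP BY c.region
""")
for region, clicks in cur.fetchall():
    print(region, clicks)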
3. Data analytics
In big data analytics, technologies are used to clean and transform data into information that can drive business decisions. In this next step (after data mining), users apply algorithms, build models, and run predictive analytics using tools such as Spark and Splunk.
Spark:
• Spark is a popular big data tool for data analysis because it is fast and efficient at running applications.
• Spark supports a wide variety of data analytics tasks and queries; a short PySpark sketch follows.
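
A minimal PySpark sketch that aggregates a hypothetical sales.csv file with region and amount columns (file and column names are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-demo").getOrCreate()

# Read a CSV of sales records into a distributed DataFrame.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Transformations are lazy; Spark computes only when an action (show) runs.
totals = (sales.groupBy("region")
               .sum("amount")
               .withColumnRenamed("sum(amount)", "total_amount")
               .orderBy("total_amount", ascending=False))
totals.show()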
Splunk:
• Splunk is another popular big data analytics tool for deriving insights from large
datasets. It has the ability to generate graphs, charts, reports, and dashboards.
• Splunk also enables users to incorporate artificial intelligence (AI) into data outcomes.
4. Data visualization
Finally, big data technologies can be used to create good visualizations from the data. In data-oriented roles, data visualization is a skill that is beneficial for presenting recommendations to stakeholders about business profitability and operations: a simple graph can tell an impactful story.
Tableau:
• Tableau is a very popular tool in data visualization because its drag-and-drop interface
makes it easy to create pie charts, bar charts, box plots, Gantt charts, and more.
• It is a secure platform that allows users to share visualizations and dashboards in real
time.
Looker:
• Looker is a business intelligence (BI) tool used to make sense of big data analytics and
then share those insights with other teams.
• Charts, graphs, and dashboards can be configured with a query, such as monitoring weekly brand engagement through social media analytics; the sketch below builds the same kind of chart in code.
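
Tableau and Looker are point-and-click tools, but for comparison, the same kind of chart in Python with matplotlib, using made-up engagement numbers:

import matplotlib.pyplot as plt

weeks = ["W1", "W2", "W3", "W4"]
engagement = [1200, 1450, 1300, 1700]  # hypothetical weekly brand mentions

plt.bar(weeks, engagement)
plt.title("Weekly brand engagement on social media")
plt.xlabel("Week")
plt.ylabel("Mentions")
plt.tight_layout()
plt.show()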
What kind of technologies are we looking toward to meet the challenges posed by big data?
1. Cheap and abundant storage.
2. Fast processors for quick processing of big data.
3. Open-source software.
4. Advanced analysis capabilities.
5. Resource allocation arrangements.
Data Science
• Data science is the science of extracting knowledge from data.
• It is the science of drawing out hidden patterns in data using statistical and mathematical techniques.
• It is a multidisciplinary approach that combines principles and practices from the fields
of mathematics, statistics, artificial intelligence, and computer engineering to analyze
large amounts of data.
• This analysis helps data scientists to ask and answer questions like what happened,
why it happened, what will happen, and what can be done with the results.
The basic business acumen skills required of a data scientist are
1. Understanding of Domain
2. Business Strategy
3. Problem Solving
4. Communication
Responsibilities of Data Scientist
• Prepares and integrates large and varied datasets
• Applies business domain knowledge to provide context
• Builds models and performs analyses to understand and interpret relationships, patterns, and trends
• Communicates and presents the findings and results.

In simple words, the responsibilities of a data scientist include:
• Data Management
• Applying Analytical Techniques
• Communicating with the Stakeholders
Soft State and Eventual Consistency

Soft state refers to a system design principle where the state of a system or its data is allowed to change over time, even without direct user interaction (for example, as replication catches up).

Eventual consistency is a consistency model used in distributed systems where updates to a data item are propagated asynchronously across nodes: different nodes may briefly return different values, but all converge to the same value once updates stop. The sketch below simulates this behavior.
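
A toy simulation in Python (all class and variable names are ours), showing a replica whose state changes on its own as a delayed update arrives, and how reads are stale until replication catches up:

import time

class Replica:
    """A toy replica that applies updates after a delay."""
    def __init__(self):
        self.value = None
        self.pending = []  # (apply_at, value) updates still in flight

    def receive(self, value, delay):
        self.pending.append((time.monotonic() + delay, value))

    def read(self):
        now = time.monotonic()
        # Soft state: the value changes without any new user write.
        for update in [u for u in self.pending if u[0] <= now]:
            self.value = update[1]
            self.pending.remove(update)
        return self.value

primary, replica = Replica(), Replica()
primary.receive("v1", delay=0.0)  # primary applies immediately
replica.receive("v1", delay=0.5)  # replica applies half a second later

print(primary.read(), replica.read())  # v1 None -> temporarily inconsistent
time.sleep(0.6)
print(primary.read(), replica.read())  # v1 v1   -> eventually consistent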
Role / Elements of Big Data Ecosystem

The elements of the big data ecosystem include:

1. Sensing
2. Collection
3. Wrangling
4. Analysis
5. Storage
Role / Elements of Big Data Ecosystem
1. Sensing
Sensing refers to the process of identifying and evaluating data sources for your project.
This evaluation includes asking questions such as:
• Is the data accurate?
• Is the data recent and up to date?
• Is the data complete? Is the data valid? Can it be trusted?
Key pieces of the data ecosystem leveraged in this stage include:
• Internal data sources: Spreadsheets and other resources that originate from within your organization.
• External data sources: Databases, spreadsheets, websites that originate from outside
your organization.
• Software: Custom software that exists for the sole purpose of data sensing.
• Algorithms: A set of steps or rules that automates the process of evaluating data for accuracy and completeness before it is used; a toy example follows the list.
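
A toy sensing check in Python: the function, field names, and freshness threshold are all assumptions, meant only to show how source evaluation can be automated:

from datetime import datetime, timedelta

def sense_source(records, expected_fields, last_updated, max_age_days=30):
    """Decide whether a candidate source looks complete and recent enough."""
    # Timeliness: is the data recent and up to date?
    if datetime.now() - last_updated > timedelta(days=max_age_days):
        return False, "data is stale"
    # Completeness: does every record carry the required fields?
    complete = all(all(f in r and r[f] is not None for f in expected_fields)
                   for r in records)
    if not complete:
        return False, "records are missing required fields"
    return True, "source passed basic checks"

sample = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}]
print(sense_source(sample, ["id", "price"], datetime.now()))
# -> (False, 'records are missing required fields')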
2. Collection
Once a potential data source has been identified, data must be collected. Data collection
can be completed through manual or automated processes.

Key pieces of the data ecosystem leveraged in this stage include:


• Various programming languages: These include R, Python, SQL, and JavaScript.
• Code packages and libraries: Existing code that’s been written and tested and allows
data scientists to generate programs more quickly and efficiently.
• APIs (Application Programming Interfaces): Interfaces that allow programs to interact with other applications and extract data, as in the sketch below.
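
A minimal collection sketch in Python using the requests library against a hypothetical API endpoint (the URL and parameters are assumptions; real collection code would add authentication, retries, and pagination):

import requests  # pip install requests

url = "https://api.example.com/v1/measurements"  # hypothetical endpoint
resp = requests.get(url, params={"city": "Pune", "limit": 100}, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors

rows = resp.json()       # many APIs return JSON
print(f"collected {len(rows)} records")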
3. Wrangling
• Data wrangling is a set of processes designed to transform raw data into a more usable
format.
• Depending on the quality of the data in question, it may involve merging multiple
datasets, identifying and filling gaps in data, deleting unnecessary or incorrect data,
and “cleaning” and structuring data for future analysis.
Key pieces of the data ecosystem leveraged in this stage include:
• Algorithms: A series of steps or rules to be followed to solve a problem.
• Various programming languages: These include R, Python, SQL, and JavaScript, and can be used to write algorithms; a typical wrangling pass in Python is sketched below.
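
A minimal wrangling pass with pandas, assuming two hypothetical extracts (orders.csv and customers.csv) that must be merged and cleaned:

import pandas as pd

orders = pd.read_csv("orders.csv")        # order_id, customer_id, amount
customers = pd.read_csv("customers.csv")  # customer_id, region

df = orders.merge(customers, on="customer_id", how="left")   # merge datasets
df = df.drop_duplicates()                                    # delete bad rows
df["region"] = df["region"].fillna("unknown")                # fill gaps
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # enforce types

print(df.head())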
4. Analysis
• After raw data has been inspected and transformed into a readily usable state, it can be analyzed.
• Analysis may be descriptive, diagnostic, predictive, or prescriptive, echoing the questions of what happened, why it happened, what will happen, and what can be done with the results; a small descriptive pass is sketched after the list below.
Key pieces of the data ecosystem leveraged in this stage include:
• Algorithms: A series of steps or rules to be followed to solve a problem.
• Various programming languages: These include R, Python, SQL, and JavaScript, and
can be used to write algorithms.
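
A small descriptive and diagnostic pass with pandas, continuing the hypothetical orders data from the wrangling sketch (file and column names are still assumptions):

import pandas as pd

df = pd.read_csv("orders_clean.csv")  # hypothetical wrangled output

# Descriptive analysis: what happened?
print(df["amount"].describe())

# Diagnostic slice: which regions drive revenue?
print(df.groupby("region")["amount"].agg(["count", "sum", "mean"]))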
5. Storage
• Throughout all of the data life cycle stages, data must be stored in a way that’s both
secure and accessible.
Key pieces of the data ecosystem leveraged in this stage include:
• Cloud-based storage solutions: These allow an organization to store data off-site and
access it remotely.
• On-site servers: These give organizations a greater sense of control over how data is
stored and used.
• Other storage media: These include hard drives, USB devices, CD-ROMs, and floppy disks.
