BigDataAnalytics _ Unit1
For Example: A flight booking service may record data like the number of tickets booked each day.
Descriptive analysis will reveal booking spikes, booking slumps, and high-performing months for this service.
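A minimal sketch of what descriptive analysis could look like in pandas, assuming a hypothetical daily bookings file with "date" and "tickets" columns (the file and column names are illustrative, not from the source):

```python
# Descriptive analysis sketch: summarize daily bookings by month and flag spikes.
import pandas as pd

bookings = pd.read_csv("daily_bookings.csv", parse_dates=["date"])  # hypothetical file
monthly = bookings.groupby(bookings["date"].dt.to_period("M"))["tickets"].sum()

print("Best month:", monthly.idxmax(), monthly.max())
print("Worst month:", monthly.idxmin(), monthly.min())
print("Months more than 20% above average (booking spikes):")
print(monthly[monthly > 1.2 * monthly.mean()])
```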
2. Diagnostic analysis -
● Detailed data examination to understand why something happened.
● It is characterized by techniques such as drill-down, data discovery, data mining, and
correlations.
● In each of these techniques, multiple data operations and transformations may be performed on a given data set to discover unique patterns.
For example: the flight service might drill down on a particularly high-performing month to better
understand the booking spike. This may lead to the discovery that many customers visit a particular city to
attend a monthly sporting event.
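A hedged drill-down sketch continuing the same hypothetical dataset; the "destination" column and the day-of-week breakdown are assumptions added for illustration:

```python
# Diagnostic drill-down sketch: break a high-performing month down by destination
# and by day of week to look for a recurring driver (e.g., a monthly event).
import pandas as pd

bookings = pd.read_csv("daily_bookings.csv", parse_dates=["date"])       # hypothetical file
spike = bookings[bookings["date"].dt.strftime("%Y-%m") == "2024-05"]     # the spike month

by_city = spike.groupby("destination")["tickets"].sum().sort_values(ascending=False)
print(by_city.head(5))

# Check whether the spike clusters around particular days of the week.
print(spike.groupby(spike["date"].dt.day_name())["tickets"].sum())
```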
3. Predictive analysis -
● It uses historical data to make accurate forecasts about patterns that may occur in the future.
● It is characterized by techniques such as machine learning, forecasting, pattern matching, and
predictive modeling.
● In each of these techniques, computers are trained to reverse-engineer causal connections in the data.
For example: the flight service team might use data science to predict flight booking patterns for the coming year at the
start of each year. The computer program or algorithm may look at past data and predict booking spikes for certain
destinations in May. Having anticipated their customer’s future travel requirements, the company could start targeted
advertising for those cities from February.
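As a rough illustration of predictive analysis, the sketch below fits a simple linear trend to monthly booking totals and projects it twelve months ahead. A real forecasting model would also capture seasonality; the data file and columns are assumed:

```python
# Predictive analysis sketch: fit a linear trend to monthly totals and extrapolate.
import numpy as np
import pandas as pd

bookings = pd.read_csv("daily_bookings.csv", parse_dates=["date"])   # hypothetical file
monthly = bookings.groupby(bookings["date"].dt.to_period("M"))["tickets"].sum()

x = np.arange(len(monthly))                       # month index 0..n-1
slope, intercept = np.polyfit(x, monthly.values, deg=1)

future_x = np.arange(len(monthly), len(monthly) + 12)
forecast = slope * future_x + intercept
print("Forecast for the next 12 months:", forecast.round())
```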
4. Prescriptive analysis -
● It not only predicts what is likely to happen but also suggests an optimum response to that outcome.
● It can analyze the potential implications of different choices and recommend the best course of action.
● It uses graph analysis, simulation, complex event processing, neural networks, and
recommendation engines from machine learning.
For Example: Prescriptive analysis could look at historical marketing campaigns to maximize the advantage of the
upcoming booking spike. A data scientist could project booking outcomes for different levels of marketing spend on
various marketing channels. These data forecasts would give the flight booking company greater confidence in their
marketing decisions.
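A toy prescriptive sketch under heavy assumptions: the per-channel response curves, spend options, and profit per booking are made up for illustration, standing in for estimates that would come from historical campaign data:

```python
# Prescriptive analysis sketch: evaluate hypothetical marketing-spend options and
# recommend the one with the best projected net return.
import math

AVG_TICKET_PROFIT = 40.0  # assumed profit per additional booking

def projected_extra_bookings(channel: str, spend: float) -> float:
    # Assumed diminishing-returns response curves per channel (illustrative only).
    lift = {"search_ads": 3.0, "social": 2.2, "email": 1.5}[channel]
    return lift * 100 * math.log1p(spend / 1000)

options = [(c, s) for c in ("search_ads", "social", "email")
           for s in (5_000, 10_000, 20_000)]

best = max(options,
           key=lambda o: projected_extra_bookings(*o) * AVG_TICKET_PROFIT - o[1])
print("Recommended action (channel, spend):", best)
```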
Data Collection and Management
● Data collection is the process of gathering information from relevant sources to answer a given statistical or research question. It is the first and foremost step in any statistical investigation, research study, or business intelligence effort.
Primary Data
● Primary data refers to information collected directly from first-hand sources specifically for a
particular research purpose.
● This type of data is gathered through various methods, including surveys, interviews,
experiments, observations, and focus groups.
● One of the main advantages of primary data is that it provides current, relevant, and specific
information tailored to the researcher’s needs, offering a high level of accuracy and control over
data quality.
Secondary Data
● Secondary data refers to information that has already been collected, processed, and
published by others.
● This type of data can be sourced from existing research papers, government reports, books,
statistical databases, and company records.
● The advantage of secondary data is that it is readily available and often free or less expensive to
obtain compared to primary data.
Primary Data are collected from:
● Surveys and Questionnaires: In a survey, a sample of the population is asked a set of predetermined questions to gather data. This approach helps acquire demographic data as well as subjective preferences and opinions. Online questionnaires, telephone interviews, and in-person interviews are all options for conducting surveys.
● Observational Studies: In observational studies, information is gathered by directly observing and documenting occurrences, actions, or events. This approach is frequently employed in disciplines like anthropology, psychology, and the social sciences. Field observations, video recordings, or existing records and documentation are all methods for gathering observational data.
● Experiments: Experiments involve manipulating variables to examine how they affect an outcome of interest. Data are gathered by comparing a control group with one or more experimental groups, which lets researchers establish cause-and-effect relationships. Experimental data may be collected in controlled lab environments or in real-world settings.
● Interviews: Individuals are interviewed one-on-one or in groups to collect data. Interviews can be structured around a series of questions or left unstructured to allow free-flowing discussion. This approach works well for obtaining in-depth knowledge, insights, and qualitative data.
● Web scraping: This technique automatically extracts data from websites. Large amounts of structured or unstructured data can be gathered from a variety of web sources. Web scraping requires programming expertise and adherence to ethical and legal standards (a minimal sketch appears after this list).
● Sensor data collection: Data from real-world objects or settings is gathered using sensors.
Examples include heart rate monitors, accelerometers, temperature sensors, and GPS trackers. In industries
including the Internet of Things, healthcare, and environmental monitoring, sensor data collection is common.
● Social Media Monitoring: As social media platforms have grown in popularity, researchers are
gathering information from sites like Twitter, Facebook, and Instagram to analyze trends,
attitudes, and general public opinion. This approach aids in the comprehension of user behavior and social
dynamics.
● Existing Databases & Records: Information may be gathered from historical archives, databases, or records that already exist. This technique is time- and cost-efficient, especially when working with huge datasets. Government records, client databases, and medical records are a few examples.
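As referenced in the web scraping bullet above, here is a minimal, hedged scraping sketch; the URL and CSS selector are placeholders, and any real use should respect a site's robots.txt and terms of service:

```python
# Web scraping sketch using requests and BeautifulSoup (placeholder URL/selector).
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/reviews", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
reviews = [tag.get_text(strip=True) for tag in soup.select("div.review-text")]
print(f"Collected {len(reviews)} reviews")
```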
● Data management is the practice of collecting, organizing, protecting, and storing an
organization’s data so it can be analyzed for business decisions.
Types of Data Management
● Data management techniques include the following:
a. Data preparation is used to clean and transform raw data into the right shape and format
for analysis, including making corrections and combining data sets.
b. Data pipelines enable the automated transfer of data from one system to another.
c. ETLs (Extract, Transform, Load) are built to take data from one system, transform it, and load it into the organization’s data warehouse (see the sketch after this list).
d. Data catalogs help manage metadata to create a complete picture of the data, providing a
summary of its changes, locations, and quality while also making the data easy to find.
e. Data warehouses are places to consolidate various data sources, contend with the many
data types businesses store, and provide a clear route for data analysis.
f. Data governance defines standards, processes, and policies to maintain data security and
integrity.
g. Data architecture provides a formal approach for creating and managing data flow.
h. Data security protects data from unauthorized access and corruption.
i. Data modeling documents the flow of data through an application or organization.
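A minimal sketch of items a–c (preparation, a simple pipeline step, and an ETL load), using pandas with SQLite standing in for a data warehouse; the file, table, and column names are assumptions:

```python
# ETL sketch: extract raw data, clean and transform it, load it into a warehouse table.
import sqlite3
import pandas as pd

# Extract: pull raw data from a source system (here, a CSV export).
raw = pd.read_csv("sales_export.csv")

# Transform: clean and reshape it for analysis.
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
raw = raw.dropna(subset=["order_date", "customer_id"]).drop_duplicates()
raw["revenue"] = raw["quantity"] * raw["unit_price"]

# Load: write the prepared table into the warehouse.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("fact_sales", conn, if_exists="replace", index=False)
```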
Sources of Data
● A data source is a place or origin from which data is received in the context of
data management and collecting in data science.
● A database, website, API, sensor, or any other platform or system that produces
or stores data can be a data source. To obtain the information required for
analysis and decision-making processes, data scientists locate and access
pertinent data sources.
● Internal data: This refers to information gathered within a company, such as financial, customer, or sales
data.
● External data: This refers to information gathered from sources outside of an organization, such as the
government, social media, or the weather.
● Sensor data: This refers to information gathered through sensors, such as GPS, temperature, or heart rate
readings.
● Text data: This is information gathered from written materials like news stories, social media posts, and
product reviews.
● Image data: This is information gathered from visual sources like pictures, x-rays, or satellite images.
● Audio data: This is information that has been gathered from audio sources like voice, music, or noises in the
environment.
Using Multiple Data Sources
Importance of Multiple Data Sources
● Comprehensive Insights: Accessing varied perspectives allows for deeper understanding and better
decision-making.
● Enhanced Accuracy: Cross-verification of data ensures reliable outcomes.
● Rich Context: Adding external data enriches internal datasets, providing broader context.
● Improved Predictions: Aggregating diverse data improves the performance of predictive and prescriptive models.
Challenges of Using Multiple Data Sources
1. Data Integration
○ Handling different formats (structured, semi-structured, unstructured).
○ Combining siloed data from disparate sources.
2. Data Quality
○ Cleaning inconsistent or missing values.
○ Removing duplicates and irrelevant data.
3. Data Governance
○ Ensuring privacy, security, and compliance (e.g., GDPR, HIPAA).
4. Scalability
○ Managing the increasing volume, velocity, and variety of data.
5. Real-Time Processing
○ Managing latency for streaming data sources.
Steps for Using Multiple Data Sources
● Merge datasets using unique identifiers (e.g., user ID, location ID), as shown in the sketch below.
● Add external data for context (e.g., weather, market trends).
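A small sketch of the two bullets above, assuming hypothetical internal sales, CRM, and external weather files with the join keys shown:

```python
# Combining multiple data sources: join internal tables on a shared ID,
# then enrich with external context data keyed on date and location.
import pandas as pd

sales = pd.read_csv("internal_sales.csv")        # internal source
crm = pd.read_csv("crm_customers.csv")           # second internal source
weather = pd.read_csv("external_weather.csv")    # external context data

combined = sales.merge(crm, on="customer_id", how="left")
combined = combined.merge(weather, on=["date", "location_id"], how="left")
print(combined.head())
```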
1. Data Collection: Data exploration commences with collecting data from diverse sources such as databases, APIs,
or through web scraping techniques. This phase emphasizes recognizing data formats, structures, and
interrelationships. Comprehensive data profiling is conducted to grasp fundamental statistics, distributions, and
ranges of the acquired data.
2. Data Cleaning: Integral to this process is the rectification of outliers, inconsistent data points, and addressing
missing values, all of which are vital for ensuring the reliability of subsequent analyses. This step involves employing
methodologies like standardizing data formats, identifying outliers, and imputing missing values. Data organization
and transformation further streamline data for analysis and interpretation.
3. Exploratory Data Analysis (EDA): This EDA phase involves the application of various statistical tools such as
box plots, scatter plots, histograms, and distribution plots. Additionally, correlation matrices and descriptive
statistics are utilized to uncover links, patterns, and trends within the data.
4. Feature Engineering: Feature engineering focuses on enhancing prediction models by introducing or modifying
features. Techniques like data normalization, scaling, encoding, and creating new variables are applied. This step
ensures that features are relevant and consistent, ultimately improving model performance.
5. Model Building and Validation: During this stage, preliminary models are developed to test hypotheses or
predictions. Regression, classification, or clustering techniques are employed based on the problem at hand.
Cross-validation methods are used to assess model performance and generalizability.
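A compact sketch tying together steps 2–5 (cleaning, EDA, feature engineering, and model building with cross-validation); the dataset, column names, and target variable are assumptions:

```python
# Exploration-to-model sketch on a hypothetical customer dataset.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("customers.csv")                      # hypothetical dataset

# Step 2. Data cleaning: drop duplicates, impute a numeric column.
df = df.drop_duplicates()
df["income"] = df["income"].fillna(df["income"].median())

# Step 3. EDA: summary statistics and correlations.
print(df.describe())
print(df.corr(numeric_only=True))

# Steps 4-5. Feature engineering + model building with cross-validation.
X, y = df[["age", "income", "region"]], df["churned"]
prep = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])
model = Pipeline([("prep", prep), ("clf", LogisticRegression(max_iter=1000))])
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```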
Fixing Data
● Data fixing is a critical step in data preprocessing where errors or inaccuracies in the dataset are identified and corrected to ensure data reliability and usability. Common types of errors include the following:
1. Incorrect Values:
○ Negative values where only positive ones are valid (e.g., negative age or income).
○ Out-of-range values (e.g., temperatures exceeding physical limits).
2. Typographical Errors:
○ Misspelled names, places, or labels.
○ Inconsistent entries (e.g., "NY" and "New York" for the same entity).
3. Data Mismatches:
○ Data discrepancies between sources (e.g., mismatched customer IDs across tables).
4. Logical Errors:
○ Start dates occurring after end dates.
○ Invalid combinations of categorical data (e.g., "Male" listed as "Pregnant").
5. Incomplete Data:
○ Missing critical fields that require imputation or manual entry.
6. Duplicate Entries:
○ Redundant rows that inflate results or create biases.
Methods for Fixing Data
1. Handling Missing Values:
○ Fill missing data with:
■ Mean/Median/Mode (for numerical data).
■ Interpolation (for time-series data).
■ Domain-Specific Defaults.
○ Drop rows or columns if the missing values are not critical.
2. Correcting Invalid Values:
○ Replace with nearest valid values (e.g., cap outliers at upper/lower bounds).
○ Use regex or string matching for correcting typos (e.g., "Nwe York" → "New York").
3. Resolving Duplicates:
○ Drop duplicate rows using drop_duplicates() in Python or similar tools.
4. Standardizing Data:
○ Convert inconsistent formats (e.g., "MM-DD-YYYY" → "YYYY-MM-DD").
○ Standardize text case, remove whitespace, and normalize units.
5. Cross-Referencing Data:
○ Match and validate entries against reference datasets or lookup tables.
6. Handling Outliers:
○ Apply statistical methods to detect and cap outliers (e.g., z-scores, IQR).
○ Decide whether to keep or remove outliers based on context.
7. Fixing Logical Errors:
○ Write conditional rules to correct invalid logic (e.g., swap dates if end < start).
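A minimal sketch combining several of the methods above (missing values, typo correction, duplicates, date standardization, IQR outlier capping, and a logical date check); the column names and the typo map are illustrative assumptions:

```python
# Data fixing sketch on a hypothetical raw orders table.
import pandas as pd

df = pd.read_csv("orders_raw.csv")

# 1. Missing values: median for a numeric column, interpolation for a time series.
df["income"] = df["income"].fillna(df["income"].median())
df["daily_usage"] = df["daily_usage"].interpolate()

# 2. Correcting typos / inconsistent labels via a lookup map.
df["city"] = df["city"].replace({"Nwe York": "New York", "NY": "New York"})

# 3. Duplicates.
df = df.drop_duplicates()

# 4. Standardizing dates to a single format.
df["start"] = pd.to_datetime(df["start"], errors="coerce")
df["end"] = pd.to_datetime(df["end"], errors="coerce")

# 6. Outliers: cap a numeric column at the IQR fences.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["amount"] = df["amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 7. Logical errors: swap start/end where end < start.
bad = df["end"] < df["start"]
df.loc[bad, ["start", "end"]] = df.loc[bad, ["end", "start"]].values
```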
Data Storage and Management
● Data storage is a key component of computing devices, as consumers and organizations have come to depend on it to save data ranging from personal information to business-critical data.
● It is used to capture and retain digital data on storage devices.
Network Attached Storage (NAS):
● A network-attached storage device allows authorized network users to store and retrieve data from a centralized location.
● These devices are flexible and scalable.
● NAS connects to a wireless router, making it simple for distributed workplaces to access files from any device connected to the network.
Cloud storage:
● Cloud storage is a storage option that utilizes remote servers and is accessible from any computer with
Internet access.
● It is maintained, operated, and managed by a cloud storage service provider on storage servers built on virtualization techniques. Examples of cloud storage providers are Google Drive, iCloud, Citrix ShareFile, ownCloud, Dropbox, Amazon Cloud Drive, MediaFire, etc.
Direct Attached Storage:
● Direct-attached storage is storage connected directly to a single computer.
● It is attached to one computer and is not accessible to other computers.
● DAS can give users better performance than networked storage because the server does not have to traverse a network to read and write data.
● A hard drive or USB flash drive is an example of direct-attached storage.
Storage Area Network:
● The storage area network is a network-based storage system.
● SAN systems connect to the network using high-speed interfaces, enabling improved performance and the ability to connect numerous servers to a centralized pool of disk storage.
● Storage area networks are highly scalable because capacity can be added as needed.
Object storage:
● Object storage is a technique for organizing data into distinct components called objects that are kept with unique identifiers and metadata.
● Each object is given a unique address, and data can be retrieved by using that address as a reference.
● It is designed for large-scale, unstructured data storage, including multimedia files, backups, and archives. Distributed storage systems and cloud storage platforms are examples of object storage.
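A minimal object-storage sketch using boto3 against an S3-compatible store, as one possible example; the bucket and key names are placeholders and credentials are assumed to be configured in the environment:

```python
# Object storage sketch: store an object with metadata, then retrieve it by its key.
import boto3

s3 = boto3.client("s3")

# Store an object under a unique key, with descriptive metadata attached.
s3.put_object(
    Bucket="analytics-archive",                       # placeholder bucket name
    Key="backups/2024/sales.parquet",                 # unique object address
    Body=open("sales.parquet", "rb"),
    Metadata={"source": "sales-db", "owner": "data-team"},
)

# Retrieve the object later by referencing the same address (bucket + key).
obj = s3.get_object(Bucket="analytics-archive", Key="backups/2024/sales.parquet")
data = obj["Body"].read()
print(len(data), "bytes retrieved; metadata:", obj["Metadata"])
```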