Data Science

The document discusses the distinction between data and information, emphasizing that processed data provides meaningful insights for decision-making. It covers topics such as data recovery, data loss, data collection methods, types of data, big data, and the importance of data visualization. Additionally, it highlights ethical guidelines and governance frameworks necessary for responsible data management.

Uploaded by

31samidhanarkar


Introduction:

DATA V/S INFORMATION:

Data can be a number, symbol, or text which may or may not mean anything to individuals on
its own.

When the data is processed and put in context, it bears a meaning. This data can be used for
decision-making, calculations, and discussion. Data then becomes information.

For example, a raw list of temperature readings may not convey much on its own. But when the
list is arranged and organized, it can show that the global temperature is rising. The data in
the list has now become information.
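The temperature example can be sketched in a few lines of Python. This is only an illustration: the readings below are made-up values, and `average_by_year` is a hypothetical helper name, not something from the notes. It shows how arranging raw readings by year turns them into something that can be read as a trend.

```python
from collections import defaultdict

# Hypothetical raw data: (year, temperature anomaly in degrees C), in no particular order.
readings = [
    (2020, 1.0), (1980, 0.2), (2000, 0.4), (1980, 0.4),
    (2020, 1.2), (2000, 0.6),
]

def average_by_year(rows):
    """Group readings by year and return {year: mean}, ordered by year."""
    groups = defaultdict(list)
    for year, value in rows:
        groups[year].append(value)
    return {
        year: round(sum(values) / len(values), 2)
        for year, values in sorted(groups.items())
    }

yearly_average = average_by_year(readings)
for year, avg in yearly_average.items():
    print(year, avg)
```

The unordered pairs are data; the yearly averages, read in order, are information because they support a statement such as "the average is rising".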

DATA RECOVERY

Data can be lost, corrupted, damaged, or deleted for many reasons, such as a system crash,
disk failure, or transaction failure. The process of restoring this inaccessible, corrupted,
deleted, or damaged data is called data recovery.

DATA LOSS

The intentional or unintentional destruction of information by people or processes is called
data loss.

Causes of Data Loss:

Hardware Failure
Software Issues
Natural Disaster
Viruses or Malware
Human Error

TYPES OF DATA LOSS:-

System Failure - hardware crash or failure, software crash, power failure.
Natural Disaster - fire and other natural disasters.
Crime - theft, hacking, computer viruses, ransomware.
Unintentional Action - accidental deletion of files, loss of pen drives or laptops.
Intentional Action - deliberate deletion of files or programs.

Arranging and Collecting Data-

DATA COLLECTION-

Data collection is the method of gathering data in order to calculate and analyze reliable
insights. It is done using standardized, validated techniques. Researchers and scientists base
their work on the collected data, so data collection is the primary and essential step in most
cases. The approach to collecting data differs widely from field to field.

VARIABLES:-

A variable is an attribute of an object that may vary from case to case. A variable can be a
number, a characteristic, or a quantity that can be measured, and it can take different values
in different cases. It is of two types:-

Numerical Variable-

It is a variable whose values are numbers, e.g. heights, weights, ages, etc. It is a
quantifiable characteristic.

Categorical Variable-

It is a variable whose values are words, e.g. name, origin/country of birth, etc. It is not a
quantifiable characteristic.
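As a rough sketch of the two variable types, the Python snippet below labels each field of a small record set as numerical or categorical. The student records, field names, and the `variable_type` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical student records; the field names are illustrative only.
students = [
    {"name": "Asha", "country": "India", "height_cm": 152.5, "age": 14},
    {"name": "Ben", "country": "Kenya", "height_cm": 160.0, "age": 15},
]

def variable_type(values):
    """Return 'numerical' if every value is a number, otherwise 'categorical'."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_number else "categorical"

variable_types = {
    field: variable_type([row[field] for row in students])
    for field in students[0]
}
print(variable_types)
```

This check is only a heuristic: a categorical variable can be stored as numbers (a roll number or a postal code, for instance), so the meaning of the variable still has to be considered.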

TYPES OF DATA :-

Quantitative Data-

Quantitative data are numbers or values which can be measured. For example:-

Height, weight and age of a student.
Number of times an item is sold in a month.
Number of items sold in a month.

Since this data can be quantified, it is easier to analyze.

Qualitative Data-

Qualitative data, on the other hand, is subjective. For example:-

A traveller's feedback on a hotel.
Feedback on customer service.
An opinion on something.

This data helps us understand experiences in depth.

SOURCES OF DATA-

Primary Data Sources:-

Physical interviews
Online surveys
Feedback forms
Marketing campaigns

SECONDARY DATA SOURCES:-

Satellite data
IoT sensor data
Transactional databases
Social media
Web traffic

BIG DATA-

When the data exceeds the capacity of traditional databases and a specialised system is
required to manage it, it is called Big Data.

Characteristics of Big Data are-

Volume: Refers to the size of the data. It determines whether the data can be classified as
big data or not.
Variety: Data sets are collected from a wide range of sources, including traditional
databases, sensor data, etc., and include images, audio, video, etc. Variety is an essential
characteristic of big data.
Velocity: Refers to the rate at which data is generated. Data is generally created at rapid
speed, resulting in high volumes very soon. Social media, for example, generates massive
amounts of data every minute.

RETAIL:
Retail chains are spread across the world and handle millions of customers every minute. They
store and analyze customer data and transactions using big data systems.

SCIENCE:
On its Discover supercomputing cluster, the NASA Center for Climate Simulation (NCCS)
generates 32 petabytes of data on climate simulations and observations.

SOCIAL MEDIA:
Popular social media platforms store and analyze petabytes of data every day. They use Big
Data techniques for storage and analysis.

HEALTHCARE:
During Covid-19, many governments used Big Data to locate infected people. Big Data was also
used for case identification and medical treatment.

ALGORITHMS TO INTERPRET DATA

Binary Classification - Is this A or B?
Regression Algorithm - How much or how many? (Frauds, Recommendation, Protection)
Anomaly Detection - Is this odd?
Clustering Algorithm - Can I group the data?
Reinforcement Algorithm - What should I do now? (Robots)
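
To make one of these questions concrete, here is a minimal sketch of the "Is this odd?" question. It assumes a simple rule that is not taken from the notes: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The `find_anomalies` helper, the threshold of 2, and the transaction amounts are all hypothetical; real anomaly detection methods are usually more robust than this.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts with one unusual value.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```

Because the mean and standard deviation are themselves pulled by extreme values, this rule works best on small examples with a single clear outlier.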

UNIVARIATE DATA

Univariate data has a single variable.

e.g. the height of a student.

MULTIVARIATE DATA

Multivariate data has multiple variables that are related to each other.
e.g. sales of umbrellas depend on rainfall.
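
The umbrella example can be explored with a small sketch that measures how strongly two variables move together. It uses the Pearson correlation coefficient, which is a choice made here for illustration rather than something the notes prescribe, and the monthly figures are invented.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical monthly figures for the two variables.
rainfall_mm = [10, 40, 80, 120, 60]
umbrellas_sold = [5, 18, 35, 52, 27]

r = pearson_correlation(rainfall_mm, umbrellas_sold)
print(round(r, 3))
```

A value close to 1 means the two variables rise together in this sample. It does not by itself prove that one causes the other.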

DATA VISUALISATIONS

Data visualization is the mechanism of representing raw data in graphical form so that users
can explore the data and uncover insights quickly.

IMPORTANCE
It makes complex data simple and enables the human mind to understand its significance.
It helps us recognize trends, patterns and outliers in seemingly meaningless records of data.
Data visualization techniques use visuals as a universal, fast and powerful way to
communicate information.

REAL LIFE EXAMPLES

Monitoring student progress with scorecards.
Identifying usage trends of a website.
Monitoring goals and results of a sales executive.
Visualizing spread and impact of pandemics.

DOT PLOT

A dot plot is a graphical representation of data using dots. The dots in a dot plot
illustrate the quantitative values associated with qualitative (categorical) values.
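
A dot plot can be approximated in plain text, as in the sketch below. The survey data and the `dot_plot` helper are hypothetical, and `*` stands in for a dot. A real chart would normally be drawn with a plotting tool.

```python
def dot_plot(counts):
    """Return one text line per category, with one '*' per unit of its value."""
    width = max(len(label) for label in counts)
    return [f"{label.ljust(width)} | {'*' * value}" for label, value in counts.items()]

# Hypothetical data: number of students (quantitative) per favourite sport (categorical).
favourite_sport = {"Cricket": 5, "Football": 3, "Chess": 1}

lines = dot_plot(favourite_sport)
print("\n".join(lines))
```

Each row pairs a categorical value (the sport) with a quantitative value (the number of dots).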

BAR GRAPH

A bar graph is a graphical representation of data using bars of different heights.

The bars can be either vertical or horizontal.

• A vertical bar graph is also called a column graph or column chart.

• In a bar graph, the bars are drawn so that they do not touch each other.

MINIMUM:

The smallest number in a dataset is called the minimum. A data set has only one minimum
value, even if that value occurs more than once.

=MIN(Range)

MAXIMUM:

The largest number in a dataset is called the maximum. A data set has only one maximum value,
even if that value occurs more than once.

=MAX(Range)

FREQUENCY:

The number of times a data value occurs in a data set is called the frequency of the data
value.

=COUNTIFS(Range, "criteria")
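
For readers working outside a spreadsheet, the same three quantities can be computed in Python. This is a small sketch with made-up test scores. The comments name the spreadsheet functions MIN, MAX and COUNTIFS that each line corresponds to.

```python
# Hypothetical test scores.
scores = [72, 85, 85, 60, 91, 60, 85]

smallest = min(scores)               # spreadsheet counterpart: =MIN(Range)
largest = max(scores)                # spreadsheet counterpart: =MAX(Range)
frequency_of_85 = scores.count(85)   # spreadsheet counterpart: =COUNTIFS(Range, 85)

print(smallest, largest, frequency_of_85)
```

Here 60 appears twice, yet the data set still has a single minimum value, 60.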

HISTOGRAM

A histogram is a graphical representation of data that plots frequency against intervals of
values.

In other words, a histogram counts the data points that fall within each of a set of value
ranges, called bins, to provide a visual representation of numeric data.
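
The binning step behind a histogram can be sketched as follows. The heights, the bin width of 10, and the `histogram_counts` helper are assumptions made for illustration. Bins that contain no values are simply left out of the result.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [low, low + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical heights in cm (whole numbers), grouped in 10 cm bins starting at 140.
heights = [142, 148, 151, 153, 155, 158, 161, 163, 172]

bins = histogram_counts(heights, bin_width=10, start=140)
for low, count in bins.items():
    print(f"{low}-{low + 9}: {'#' * count}")
```

The length of each row of `#` characters plays the role of the bar height in a drawn histogram.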

SHAPES OF HISTOGRAM:

Normal
Bimodal
Right-Skewed
Left-Skewed
Random

SINGLE VARIABLE

NORMAL DISTRIBUTION-

A normal distribution follows a common bell-shaped curve. In a normal distribution, the data
points are equally distributed on either side of the average. Statistical calculations must be
done to confirm that data follows a normal distribution. It is also known as a symmetrical or
bell-shaped distribution.

DIFFERENCE-

A normal distribution has one peak, which represents the average.

A bimodal distribution has two peaks, which suggest that the data was collected from two
different systems.

BIMODAL (distribution)

It has two peaks and looks like a combination of two normal histograms.

RANDOM (distribution)

It lacks an apparent pattern and has several peaks.

RANGE

The difference between maximum and minimum values is called range.

FREQUENCY TABLE:-

A frequency table is a tabular representation that summarizes raw categorical data.
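
Both the range and a frequency table are quick to compute. The sketch below uses invented survey answers and ages, and relies on `collections.Counter` from the Python standard library to do the counting.

```python
from collections import Counter

# Hypothetical survey answers (categorical) and ages (numerical).
transport = ["bus", "car", "bus", "walk", "bus", "car"]
ages = [14, 15, 13, 16, 15]

age_range = max(ages) - min(ages)      # range = maximum - minimum
frequency_table = Counter(transport)   # category -> number of occurrences

print("Range of ages:", age_range)
print("Mode of transport | Frequency")
for category, count in frequency_table.most_common():
    print(f"{category:<17} | {count}")
```

The table lists each category once with its count, which is usually easier to read than the raw list of answers.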

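DIFFERENCE

Right Skewed - It is also called a positively skewed distribution.
In a right-skewed distribution, most data points occur on the left, with a long tail of fewer points on the right.
Such a distribution is common when the data has a lower limit (for example, when all values are greater than 0), but positive values alone do not make a distribution right skewed.
Left Skewed - It is also called a negatively skewed distribution.
In a left-skewed distribution, most data points occur on the right, with a long tail of fewer points on the left.
Such a distribution is common when the data has an upper limit (for example, when all values are less than 0).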
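
One rough way to tell the two skews apart is to compare the mean with the median, since a long tail pulls the mean toward it. The sketch below applies that rule of thumb. It is an assumption added here rather than a method from the notes, it can mislead on unusual data, and the data sets and the `skew_direction` helper are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Guess the skew by comparing mean and median (a rule of thumb only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"   # a long right tail pulls the mean above the median
    if m < med:
        return "left-skewed"    # a long left tail pulls the mean below the median
    return "symmetric"

# Hypothetical data sets.
incomes = [20, 22, 25, 27, 30, 90]      # a few large values on the right
exam_marks = [35, 80, 85, 88, 90, 92]   # a few small values on the left

print(skew_direction(incomes))
print(skew_direction(exam_marks))
```

In both sample data sets every value is positive, which shows that the sign of the values does not decide the direction of the skew.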

Ethics in Data Science

ETHICAL GUIDELINES:

Data governance is critical.
Protect your customer.
Do not lie.
Understand the link of data quality.
Private identity and information should remain private.
Shared private information should be treated confidentially.

NEED FOR ETHICAL GUIDELINES:

To collect minimal data.

To identify and search sensitive data.
To have a backup plan in case the insights backfire.

GOAL FOR ETHICAL GUIDELINES:

To secure customers' private information.
To distinguish between legal and ethical policies.
To consolidate data collection methods.
To follow well-instructed and approved rules.
Integrity of data and methods - ensuring the accuracy, consistency and reliability of both the
data and the process used to collect, store and analyze it.
To implement compliance requirements - actively putting into practice the rules, regulations
and standards set by the government.
To establish internal rules for data use - creating a set of guidelines within an organization
that defines how employees access the system to collect, store and process data properly.

KEY GOALS OF ETHICAL GUIDELINES

Professional integrity and accountability.
Integrity of data and methods
Follow informed consent rules.
Respect confidentiality and privacy.

DATA GOVERNANCE FRAMEWORK

A data governance framework provides a comprehensive approach to managing, storing,
securing and collecting data.
Data governance means cleaner, leaner and better data, which in turn means better analytics,
better decisions and better results.

GOALS OF DATA GOVERNANCE

To improve external and internal communication
To reduce cost
To minimize risks
To increase the value of data
To increase revenue.
To implement compliance requirements
To establish internal rules for data.
