Data Science
Data can be a number, symbol, or text that may or may not mean anything on its own.
When the data is processed and put in context, it bears a meaning. This data can be used for
decision-making, calculations, and discussion. Data then becomes information.
For example, a raw list of temperature readings may not make sense on its own. But when
the list is arranged and organized, it can show that the global temperature is rising. The list
has then turned from data into information.
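As a minimal sketch of the temperature example, the Python snippet below arranges unordered readings by year and compares the earliest and latest values. The readings and the function name summarize_trend are made up for illustration; they are not real measurements.

```python
# Hypothetical (year, average temperature in °C) readings, deliberately unordered.
readings = [(2010, 14.7), (1990, 14.4), (2020, 15.0), (2000, 14.5)]

def summarize_trend(data):
    """Sort readings by year and describe the change from first to last."""
    ordered = sorted(data)  # tuples sort by their first element, the year
    first_temp = ordered[0][1]
    last_temp = ordered[-1][1]
    change = round(last_temp - first_temp, 2)
    if change > 0:
        direction = "rising"
    elif change < 0:
        direction = "falling"
    else:
        direction = "flat"
    return ordered, change, direction

ordered, change, direction = summarize_trend(readings)
print(ordered)
print(f"Change from {ordered[0][0]} to {ordered[-1][0]}: {change} °C ({direction})")
```

The unordered list is data; the sorted list together with the computed change is the information that can support a decision.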
DATA RECOVERY
Data can be lost, corrupted, damaged, or deleted for many reasons, such as a system crash,
disk failure, or transaction failure. The process of restoring this inaccessible, corrupted, deleted,
or damaged data is called data recovery.
DATA LOSS
Hardware Failure
Software Issues
Natural Disaster
Viruses or Malware
Human Error
TYPES OF DATA LOSS:-
System Failure-
Hardware failure or crash
Software crash
Power Failure.
Natural disaster:-
Fire.
Other natural disasters.
Crime-
Theft, hacking, etc.
Computer viruses, ransomware.
Unintentional action-
Accidental deletion of files.
Loss of pen drives or laptops.
Intentional action -
Deletion of files or programs.
Arranging and Collecting Data-
DATA COLLECTION-
Data collection is the process of gathering data in order to calculate and analyze reliable
insights. It is done using standardized, validated techniques. Researchers and scientists base
their work on the collected data, so data collection is the primary and essential step in most
cases. The approach to data collection differs widely between fields.
VARIABLES:-
A variable is an attribute of an object that may vary from case to case. A variable can be a
number, characteristic, or quantity that can be measured. It is of two types:-
Numerical Variable-
It is a variable whose values are numbers, e.g. heights, weights, ages, etc. It is a quantifiable
characteristic.
Categorical Variable-
It is a variable whose values are words, e.g. name, origin/country of birth, etc. It is not a
quantifiable characteristic.
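A rough sketch of the two variable types in Python: the snippet labels each field of a record as numerical or categorical based on the kind of value it holds. The record, its field names, and the helper variable_type are hypothetical and used only for illustration.

```python
# Hypothetical record describing one person; field names are illustrative.
record = {"name": "Asha", "country_of_birth": "India", "height_cm": 162.5, "age": 29}

def variable_type(value):
    """Return 'numerical' for numbers and 'categorical' for everything else."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numerical"
    return "categorical"

types = {field: variable_type(value) for field, value in record.items()}
for field, kind in types.items():
    print(f"{field}: {kind}")
```

This check is deliberately simple. Numbers that only act as labels, such as postal codes, are still categorical even though they are written in digits, so a real classification needs some knowledge of what the field means.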
TYPES OF DATA :-
Quantitative Data-
Quantitative data are numbers or values that can be measured, e.g. heights, weights, and ages.
Qualitative Data-
Qualitative data, on the other hand, is subjective. For e.g.:-
Traveller's feedback on a hotel.
Feedback on customer service.
Opinion on something.
SOURCES OF DATA-
Physical interviews
Online Surveys
Feedback forms
Marketing campaigns
Satellite data
IoT sensor data
Transactional databases
Social media
Web traffic
BIG DATA-
When data exceeds the capacity of traditional databases and a specialised system is
required to manage it, it is called Big Data.
RETAIL:
Retail chains are spread across the world. They handle millions of customers every
minute. They store and analyze customer data and transactions using Big Data systems.
SCIENCE:
On the Discover supercomputing cluster, the NASA Center for Climate Simulation (NCCS)
generates 32 petabytes of data on climate simulations and observations.
SOCIAL MEDIA:
Popular social media platforms store and analyze petabytes of data every day. They use
Big Data techniques for storage and analysis.
HEALTHCARE:
During COVID-19, many governments used Big Data to locate infected people. Big Data
was also used for case identification and medical treatment.
UNIVARIATE DATA
IMPORTANCE
It makes complex data simple and enables the human mind to understand its significance.
It helps us recognize trends, patterns, and outliers in seemingly meaningless data records.
Data visualization techniques use visuals as a universal, fast, and powerful way to
communicate information.
DOT PLOT
A dot plot is a graphical representation of data using dots. The dots illustrate the
quantitative values associated with qualitative (categorical) values.
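A dot plot can be sketched in plain text: one row per category, one dot per unit of the quantitative value. The survey data and the helper dot_plot below are hypothetical, and the letter "o" stands in for a dot.

```python
# Hypothetical survey: number of pets (quantitative) per household type (categorical).
counts = {"Apartment": 2, "House": 5, "Farm": 7}

def dot_plot(data):
    """Return one text line per category, with one 'o' per unit of value."""
    width = max(len(label) for label in data)  # pad labels to the same width
    return [f"{label.ljust(width)} | {'o' * value}" for label, value in data.items()]

lines = dot_plot(counts)
print("\n".join(lines))
```

Each row pairs a categorical label with as many dots as its count, which is the relationship a dot plot is meant to show.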
BAR GRAPH
In a bar graph, bars represent the elements and are drawn so that they do not touch each other.
MINIMUM:
The smallest number in a dataset is called the minimum. A dataset has only one minimum
value, although that value may occur more than once.
=MIN(Range)
MAXIMUM:
The largest number in a dataset is called the maximum. A dataset has only one maximum
value, although that value may occur more than once.
=MAX(Range)
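As a rough Python counterpart to the spreadsheet formulas, the built-in min and max functions return the smallest and largest values of a list. The scores list is hypothetical; it repeats its extreme values on purpose to show that a dataset has one minimum and one maximum value even when that value occurs several times.

```python
# Hypothetical dataset in which the smallest and largest values each appear twice.
scores = [12, 7, 19, 7, 15, 19]

smallest = min(scores)  # the single minimum value
largest = max(scores)   # the single maximum value

# count() tells how many times each extreme value occurs in the list.
print(f"Minimum: {smallest} (occurs {scores.count(smallest)} times)")
print(f"Maximum: {largest} (occurs {scores.count(largest)} times)")
```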
FREQUENCY:
The number of times a data value occurs in a dataset is called the frequency of the
data value.
=COUNTIFS(Range, "criteria")
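Frequency can also be counted in Python. The sketch below uses collections.Counter from the standard library on a hypothetical list of colours; it is only an illustration, not a replacement for the spreadsheet formula.

```python
from collections import Counter

# Hypothetical dataset of categorical values.
colors = ["red", "blue", "red", "green", "blue", "red"]

frequency = Counter(colors)  # maps each distinct value to its number of occurrences

print(frequency["red"])         # frequency of one chosen value
print(frequency.most_common())  # all values, most frequent first
```

A value that never appears has a frequency of zero, which Counter reports without raising an error.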
HISTOGRAM
A histogram displays how many data points fall into each of a set of value ranges called bins,
providing a visual representation of numeric data.
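The binning idea can be sketched in a few lines of Python. The ages list, the bin width of 10, and the starting value of 20 are all assumptions chosen for illustration; the helper histogram is hypothetical.

```python
def histogram(values, bin_width, start):
    """Count how many values fall into each bin [low, low + bin_width)."""
    counts = {}
    for value in values:
        # Lower edge of the bin that contains this value.
        low = start + ((value - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical whole-number ages.
ages = [21, 25, 34, 38, 39, 42, 47, 55]
BIN_WIDTH = 10

bins = histogram(ages, bin_width=BIN_WIDTH, start=20)
for low, count in bins.items():
    # The inclusive upper label assumes whole-number data.
    print(f"{low}-{low + BIN_WIDTH - 1}: {'#' * count}")
```

Changing the bin width changes the shape of the picture, so it is worth trying a few widths before reading too much into one histogram.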
SHAPES OF HISTOGRAM:
Normal
Bimodal
Right-Skewed
Left-Skewed
Random
SINGLE VARIABLE
NORMAL DISTRIBUTION-
Normal distribution is a common bell-shaped curve pattern. In a normal distribution the data
points are equally distributed on either side of the average. Statistical calculations must be done
to confirm that data is normally distributed. It is also known as symmetrical or bell-shaped distribution.
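One simple calculation of this kind is a symmetry check: in symmetric data the mean and the median are close and the skewness is near zero. The sketch below uses Python's statistics module on a hypothetical dataset; the helper symmetry_summary is illustrative. Symmetry alone does not prove normality, and formal tests such as Shapiro-Wilk exist for that purpose.

```python
import statistics

def symmetry_summary(values):
    """Return (mean, median, skewness) for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    # Population skewness: the average cubed z-score, near 0 for symmetric data.
    skew = sum(((v - mean) / stdev) ** 3 for v in values) / len(values)
    return mean, median, skew

# Hypothetical dataset that is symmetric around 4.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6]

mean, median, skew = symmetry_summary(data)
print(f"mean={mean}, median={median}, skewness={skew:.3f}")
```

A clearly positive skewness points to a right-skewed shape and a clearly negative one to a left-skewed shape.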
DIFFERENCE-
A bimodal distribution has two peaks, which suggests that the data was collected from two
different systems.
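A rough way to tell one peak from two is to count the local maxima in a histogram's bin counts. The sketch below does this for two hypothetical lists of bin counts; the helper count_peaks is illustrative, ignores the first and last bins, and assumes smooth counts. Real data is usually noisy and may need smoothing or wider bins first.

```python
def count_peaks(bin_counts):
    """Count interior bins that are strictly higher than both neighbours."""
    peaks = 0
    for i in range(1, len(bin_counts) - 1):
        if bin_counts[i - 1] < bin_counts[i] > bin_counts[i + 1]:
            peaks += 1
    return peaks

# Hypothetical bin counts read off two histograms.
normal_like = [1, 3, 6, 3, 1]
bimodal_like = [1, 5, 2, 1, 4, 1]

print(count_peaks(normal_like))
print(count_peaks(bimodal_like))
```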
BIMODAL (distribution)
RANDOM (distribution)
RANGE
FREQUENCY TABLE:-