Statistics for Data Science

What is Statistics?
Statistics is a fundamental component of data science, providing methods for collecting, analyzing,
interpreting, and presenting data. It is broadly classified into descriptive and inferential statistics.
Descriptive statistics focus on summarizing data using measures such as mean, median, mode,
variance, and standard deviation, along with visual tools like histograms and box plots. Inferential
statistics, on the other hand, help in making predictions and generalizations about a population
based on a sample, using techniques like hypothesis testing, regression analysis, and confidence
intervals. Probability theory is also an essential part of statistics in data science, helping to model
uncertainties and make data-driven predictions. Concepts such as probability distributions (normal,
binomial, Poisson), Bayes’ theorem, and the central limit theorem are widely used in statistical
analysis. In data science, statistical techniques are employed in exploratory data analysis (EDA),
hypothesis testing, regression modeling, and time series analysis to uncover patterns and
relationships within data. Moreover, statistics play a crucial role in machine learning, aiding in feature
selection, model evaluation, and understanding bias-variance tradeoffs to build robust predictive
models. By leveraging statistical methods, data scientists can make informed decisions, validate
hypotheses, and develop accurate machine learning algorithms. Understanding statistics is,
therefore, essential for deriving meaningful insights from data and ensuring the reliability of
analytical models.

Why is Statistics Important?


Statistics is crucial in data science because it provides the mathematical foundation for analyzing
data, identifying patterns, and making informed decisions. It enables data scientists to summarize
large datasets, extract meaningful insights, and quantify uncertainty using probability theory.
Descriptive statistics help in organizing and visualizing data, while inferential statistics allow for
making predictions and generalizations about populations based on samples. Statistical methods
are essential for hypothesis testing, regression analysis, and experimental design, ensuring that
conclusions drawn from data are valid and reliable. In machine learning, statistics are fundamental
for model evaluation, feature selection, and performance optimization, helping to prevent issues like
overfitting and underfitting. Additionally, probability distributions, statistical tests, and Bayesian
inference are widely used to model real-world uncertainties and optimize predictive algorithms.
Without statistics, data-driven decision-making would lack accuracy, leading to misleading
conclusions and ineffective models. Thus, statistics are the backbone of data science, ensuring that
data analysis is rigorous, evidence-based, and capable of driving meaningful insights across various
industries, from healthcare and finance to technology and business analytics.
Scales and Levels of Measurement
The scales and levels of measurement in statistics define how data is categorized, measured, and
analyzed. There are four primary levels of measurement: Nominal, Ordinal, Interval, and Ratio.
Each level has different properties and determines the types of statistical analyses that can be
performed.
1. Nominal Scale (Categorical Data)
• Definition: Data is classified into distinct categories with no inherent order.
• Characteristics: No numerical value or ranking.
• Examples:
o Gender (Male, Female, Other)
o Blood type (A, B, AB, O)
o Types of cars (Sedan, SUV, Truck)
2. Ordinal Scale (Ranked Data)
• Definition: Data is categorized in a meaningful order, but the difference between values is
not uniform.
• Characteristics: Can be ranked, but gaps between ranks are not equal.
• Examples:
o Education level (High School, Bachelor's, Master's, PhD)
o Customer satisfaction rating (Poor, Average, Good, Excellent)
o Competition rankings (1st, 2nd, 3rd place)
3. Interval Scale (Equal Differences, No True Zero)
• Definition: Data is measured on a scale with equal intervals between values but no true zero
point.
• Characteristics: Can perform addition and subtraction but not multiplication or division.
• Examples:
o Temperature in Celsius or Fahrenheit (0°C doesn’t mean "no temperature")
o IQ scores (Difference between 100 and 110 is the same as between 120 and 130)
o SAT scores
4. Ratio Scale (Equal Differences, True Zero Exists)
• Definition: Data has equal intervals and a meaningful zero, allowing all arithmetic
operations.
• Characteristics: Allows for the full range of mathematical operations (addition, subtraction,
multiplication, division).
• Examples:
o Height (cm, inches)
o Weight (kg, lbs)
o Age (years)
o Income ($0 means no income)
Differences Between Quantitative and Qualitative Data

Discrete, Continuous and Boolean Datasets


What is a Time Series?
A time series is a sequence of data points recorded at specific time intervals, typically in chronological order.
It is used to analyze trends, patterns, and dependencies over time. Time series data can be collected at
regular intervals (e.g., daily, monthly, yearly) or irregular intervals.

Key Characteristics of Time Series Data:


1. Temporal Dependence: The value of a data point depends on past values.
2. Trend: A long-term increase or decrease in the data over time.
3. Seasonality: Regular and repeating patterns (e.g., higher sales in December).
4. Cyclic Patterns: Fluctuations that occur over irregular time periods.
5. Stationarity: The statistical properties of data remain constant over time.
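
The sketch below is a minimal illustration of these characteristics, building a hypothetical monthly
sales series in pandas with an explicit trend, a repeating seasonal pattern, and random noise; the
dates and numbers are invented for demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales: upward trend + yearly seasonality + random noise
dates = pd.date_range("2020-01-01", periods=36, freq="MS")            # 3 years, monthly
trend = np.linspace(100, 160, 36)                                     # long-term increase
seasonality = 10 * np.sin(2 * np.pi * dates.month.to_numpy() / 12)    # repeating yearly pattern
noise = np.random.default_rng(0).normal(0, 3, 36)                     # random variation

sales = pd.Series(trend + seasonality + noise, index=dates, name="sales")
print(sales.head())
print(sales.rolling(window=12).mean().dropna().head())  # 12-month average smooths out seasonality and reveals the trend
```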

What is Spatial Data?


Spatial data (also called geospatial data) refers to data that represents the location, shape,
and relationship of objects in a geographical space. It includes coordinates (latitude,
longitude), maps, and geographic features. Spatial data is crucial for analyzing patterns and
trends in a physical space and is used in Geographic Information Systems (GIS) and
spatial analytics.

Types of Spatial Data:


1. Vector Data: Uses points, lines, and polygons to represent geographical features
(e.g., city locations, roads, boundaries).
2. Raster Data: Uses grid-based structures like satellite images and heatmaps to
represent spatial information.
Difference between Categorical and Numerical Data
What is the Multivariate data and what are its types?
Multivariate data refers to datasets containing multiple variables or attributes observed
simultaneously for each data point. It helps in understanding relationships, patterns, and
dependencies between multiple factors, making it essential in data science, machine
learning, and statistics.
Types of Multivariate Data
1. Multivariate Categorical Data
o Definition: Data where multiple categorical variables are recorded for each
observation.
o Example: A customer survey recording gender, preferred product category,
and payment method (Male, Electronics, Credit Card).
2. Multivariate Numerical Data
o Definition: Data where multiple numerical variables are measured for each
observation.
o Example: A student's dataset with height, weight, and exam scores (5.7 ft, 65
kg, 85%).
3. Mixed Multivariate Data
o Definition: A combination of categorical and numerical variables in the same
dataset.
o Example: A dataset containing a person’s age (numeric), education level
(categorical), and income (numeric).
4. Time-Series Multivariate Data
o Definition: Data where multiple variables are recorded over time at regular
intervals.
o Example: Weather data containing temperature, humidity, and wind speed
recorded every hour.
5. Spatial Multivariate Data
o Definition: Data where multiple variables are associated with specific
geographic locations.
o Example: A dataset mapping city with pollution levels, temperature, and
population density.
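
As a small illustration, the sketch below builds a hypothetical mixed multivariate dataset (numeric
and categorical variables recorded for each person) as a pandas DataFrame; the column names and
values are invented.

```python
import pandas as pd

# Hypothetical mixed multivariate data: several variables per observation
df = pd.DataFrame({
    "age": [25, 34, 29, 41],                                        # numerical
    "education": ["Bachelor's", "Master's", "PhD", "Bachelor's"],   # categorical
    "income": [42000, 58000, 61000, 75000],                         # numerical
})

print(df.dtypes)      # shows the mix of numeric and object (categorical) columns
print(df.describe())  # summary statistics for the numerical variables
```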
Difference between Structured and Unstructured Data

What are the Boolean data types?


A Boolean data type is a data type that can hold only two possible values: True (1) or False
(0). It is used in logic-based operations, decision-making, and control flow in programming
and data science. Boolean data is fundamental in computer science, databases, and
machine learning, where binary conditions are evaluated.
Characteristics of Boolean Data Type:
1. Binary Nature: Can only be True/False, Yes/No, or 1/0.
2. Used in Logical Operations: Boolean algebra operations like AND, OR, and NOT.
3. Decision Making: Commonly used in conditional statements in programming (if-else
conditions).
4. Data Filtering: Used in queries and searches (e.g., filtering data where "Is Active =
True").

What are True Score and Error Score? Types of Errors


In statistics and measurement theory, True Score and Error Score are components of the
Observed Score in any measurement process.
1. True Score (T):
o The actual, correct, and consistent value of a measurement if there were no
errors.
o It represents the real ability, knowledge, or characteristic being measured.
o Example: If a student’s actual math ability is 85%, their true score should be
85%.
2. Error Score (E):
o The difference between the observed score and the true score, caused by
various errors.
o It includes factors like measurement inaccuracies, environmental factors, or
respondent mistakes.
o Example: If a student was distracted during an exam and scored 78% instead
of 85%, the error score is 85% - 78% = 7%.
Formula:
Observed Score (X) = True Score (T) + Error Score (E).

Types of Errors in Measurement


1. Systematic Error:
o An error that consistently occurs in one direction due to faulty measurement
tools or bias.
o Example: A weighing scale that always shows 2 kg extra.
2. Random Error:
o An unpredictable error that occurs due to temporary influences like mood,
fatigue, or environmental conditions.
o Example: A student performing poorly on a test due to illness.
3. Type I Error (False Positive):
o Rejecting a true null hypothesis (detecting an effect that does not exist).
o Example: A COVID-19 test incorrectly detecting the virus in a healthy person.
4. Type II Error (False Negative):
o Failing to reject a false null hypothesis (failing to detect an actual effect).
o Example: A smoke alarm failing to detect a real fire.

Type I & II Error


In hypothesis testing, errors can occur when making decisions about a hypothesis. These
errors are classified into Type I Error (False Positive) and Type II Error (False Negative).
1. Type I Error (False Positive)
• Occurs when a true null hypothesis (H₀) is incorrectly rejected.
• This means we conclude that an effect exists when it does not.
• Example: A fire alarm goes off when there is no fire.
2. Type II Error (False Negative)
• Occurs when a false null hypothesis (H₀) is incorrectly accepted.
• This means we fail to detect an effect when it exists.
• Example: A smoke detector failing to go off when there is a fire.
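
The simulation sketch below illustrates both error types under assumed settings (normally
distributed data, alpha = 0.05, repeated two-sample t-tests): when H₀ is true, the rejection rate
approximates the Type I error rate; when H₀ is false, the non-rejection rate approximates the
Type II error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, n = 0.05, 2000, 30

# Case 1: H0 is true (both groups have the same mean) -> Type I error rate ~ alpha
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1
print("Estimated Type I error rate:", false_positives / n_sims)   # close to 0.05

# Case 2: H0 is false (means differ) -> failing to reject is a Type II error
false_negatives = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n)
    b = rng.normal(0.5, 1, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        false_negatives += 1
print("Estimated Type II error rate:", false_negatives / n_sims)
```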

Reliability and Validity


Reliability and validity are two key concepts in research and measurement to ensure the
accuracy and consistency of data.
1. Reliability
• Definition: The consistency and repeatability of a measurement or test. A
measurement is considered reliable if it produces the same results under consistent
conditions.
• Key Types of Reliability:
o Test-Retest Reliability: Consistency of results over time.
o Inter-Rater Reliability: Consistency between different observers or raters.
o Internal Consistency: Consistency within the test itself (e.g., survey
questions measuring the same concept).
• Example: A weighing scale that gives the same weight every time you step on it under
the same conditions is reliable.
2. Validity
• Definition: The accuracy and correctness of a measurement, meaning it measures
what it is supposed to measure.
• Key Types of Validity:
o Content Validity: The extent to which a test covers all aspects of the concept
being measured.
o Construct Validity: Whether the test truly measures the theoretical concept
it claims to measure.
o Criterion Validity: How well the test correlates with an external standard or
outcome.
• Example: If a weighing scale shows incorrect weight (e.g., 5 kg extra) every time, it
is reliable but not valid. A valid scale should give the correct weight.
What is Bias, and how do you remove it?
Bias in data refers to systematic errors that lead to inaccurate or misleading conclusions.
It occurs when data is collected, analyzed, or interpreted in a way that favors certain
outcomes over others, reducing fairness and accuracy.
Example of Bias:
• Hiring Bias: A company uses an AI recruitment system trained on past hiring data,
which favors male candidates over female candidates because past hiring decisions
were biased.

How to Remove Bias in Data?


1. Collect Representative Data: Ensure data comes from diverse and unbiased
sources.
2. Eliminate Sampling Bias: Use random sampling to avoid over-representing certain
groups.
3. Preprocess Data Fairly: Remove or adjust biased features (e.g., gender, race) in
machine learning models.
4. Use Fair Algorithms: Apply bias-detection tools and fairness-aware models.
5. Check for Data Imbalance: Ensure all groups have equal representation in training
data.
6. Regularly Audit Models: Continuously monitor and correct biases in AI and data-
driven decisions.

What is Statistics, and what are its types?


Statistics is the branch of mathematics that deals with collecting, organizing, analyzing,
interpreting, and presenting data to make informed decisions. It is widely used in research,
business, healthcare, and data science. It is broadly classified into two types: descriptive
statistics, which summarizes and organizes data, and inferential statistics, which draws
conclusions and makes predictions about a population based on a sample.
What is Data Analysis, and what are its types?
Data Analysis is the process of inspecting, cleaning, transforming, and interpreting data
to discover useful insights, patterns, and trends for decision-making. It is widely used in
business, healthcare, finance, and data science. Common types include descriptive, diagnostic,
predictive, and prescriptive analysis.
Sample, Population and Central Tendency
Sample
• Definition: A subset of data selected from a larger group (population) used for
analysis.
• Example: Surveying 1,000 students from a university of 10,000 students to study
average study hours.
Population
• Definition: The entire group from which a sample is taken for research.
• Example: The 10,000 students in a university represent the population in a study on
study habits.
Central Tendency
• Definition: A statistical measure that represents the center of a dataset using Mean,
Median, or Mode.
• Example: The average (Mean) height of students in a class is 5.7 feet, summarizing
the dataset.

Mean, Median and Mode of data


Mean (Average)
• Definition: The sum of all values divided by the total number of values.
• Example: If the test scores of five students are 70, 80, 90, 85, and 75, the mean is:
(70+80+90+85+75) / 5 = 80.
So, the mean score is 80.

Median (Middle Value)


• Definition: The middle value in an ordered dataset.
• Example: If the test scores are 70, 75, 80, 85, 90, the middle value (Median) is 80.
Mode (Most Frequent Value)
• Definition: The value that appears most frequently in a dataset.
• Example: If the test scores are 70, 80, 80, 85, 90, the most repeated score is 80 (Mode
= 80).
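
The same worked examples can be reproduced with Python's built-in statistics module, as in the
short sketch below.

```python
# Reproducing the worked examples above with Python's statistics module
import statistics

scores = [70, 80, 90, 85, 75]
print(statistics.mean(scores))    # (70+80+90+85+75) / 5 = 80
print(statistics.median(scores))  # sorted: 70, 75, 80, 85, 90 -> middle value is 80

scores_with_repeat = [70, 80, 80, 85, 90]
print(statistics.mode(scores_with_repeat))  # 80 appears most often
```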

Limitations of Mean, Median and Mode


Limitations of Mean:
• Affected by Outliers: A single extreme value can distort the mean, making it
unreliable in skewed distributions.
• Not Suitable for Categorical Data: Mean can only be calculated for numerical data,
not for categories like "Male" or "Female."
Limitations of Median:
• Ignores All Data Except the Middle Value: It does not consider how far values are
from each other, losing important details.
• Less Sensitive to Small Changes: If numbers change slightly, the median might stay
the same, reducing its responsiveness.
Limitations of Mode:
• May Not Exist or May Have Multiple Values: In some datasets, there may be no
mode or multiple modes, making interpretation difficult.
• Not Useful for Small Data Sets: In small datasets, mode may not provide
meaningful insights compared to mean or median.

What is the mean of a sample and a population?
The sample mean (x̄) is the average of the values in a sample: x̄ = (sum of sample values) / n,
where n is the sample size. The population mean (μ) is the average of all values in the entire
population: μ = (sum of all population values) / N. The sample mean is commonly used as an
estimate of the population mean.


Validity, Dispersion and Spread of Data
1. Validity
• Definition: Validity refers to how accurately a measurement or test represents what
it is supposed to measure.
• Example: A weighing scale is valid if it consistently shows the correct weight of an
object.

2. Dispersion
• Definition: Dispersion measures how spread out the data points are in a dataset.
It shows the variability in the data.
• Common Measures: Range, Variance, Standard Deviation.
• Example: In two classes, if students' scores are (50, 52, 55, 58, 60) in one and (30,
40, 50, 60, 60, 70) in another, the second class has higher dispersion in scores.

3. Spread of Data
• Definition: The spread of data describes how values in a dataset differ from each
other and from the central value.
• Example: In a marathon, if runners finish within 1–2 minutes of each other, the
spread is small. If some finish in 2 hours and others in 4 hours, the spread is large.
Range, IQR, Variance, Standard Deviation and Standard Error
1. Range
o Definition: The difference between the maximum and minimum values in a
dataset. It measures the total spread of data.
o Formula: Range=Max Value−Min Value.
o Example: If exam scores are (45, 50, 60, 70, 90),
then: Range=90−45=45
o Interpretation: The data is spread out over a range of 45 points.

2. Interquartile Range (IQR)


o Definition: The difference between the third quartile (Q3) and first quartile
(Q1) of a dataset. It measures the spread of the middle 50% of data.
o Formula: IQR=Q3−Q1
o Example: If Q1 = 50 and Q3 = 80,
then: IQR=80−50=30
o Interpretation: The middle 50% of the data lies within a 30-point spread.
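
The NumPy sketch below computes the spread measures named in this section's heading (range, IQR,
variance, standard deviation, and standard error) for the example scores (45, 50, 60, 70, 90);
the sample (n - 1) convention is assumed for variance and standard deviation.

```python
import numpy as np

scores = np.array([45, 50, 60, 70, 90])

data_range = scores.max() - scores.min()        # Range = 90 - 45 = 45
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1                                   # IQR = Q3 - Q1
variance = scores.var(ddof=1)                   # sample variance (n - 1 in the denominator)
std_dev = scores.std(ddof=1)                    # sample standard deviation
std_error = std_dev / np.sqrt(len(scores))      # standard error of the mean

print(data_range, iqr, variance, std_dev, std_error)
```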
Normal Distribution and Standard Deviation
1. Normal Distribution
• Definition: A bell-shaped, symmetrical probability distribution where most values
cluster around the mean.
• Characteristics:
o Mean, median, and mode are equal.
o Data is symmetrically distributed around the mean.
o Follows the 68-95-99.7 rule (Empirical Rule).
• Real-Life Example:
o Height of People: In a large population, most people have an average height,
with fewer people being extremely short or tall, forming a bell curve.

2. Standard Deviation in Normal Distribution


• Definition: Measures how spread out values are from the mean in a normal
distribution.
• Empirical Rule (68-95-99.7 Rule):
o 68% of data falls within 1 standard deviation (σ) of the mean.
o 95% falls within 2σ.
o 99.7% falls within 3σ.
• Real-Life Example:
o IQ Scores: The average IQ is 100 with a standard deviation of 15.
▪ 68% of people have an IQ between 85 and 115.
▪ 95% have IQs between 70 and 130.
▪ 99.7% have IQs between 55 and 145.
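
A quick simulation sketch of the empirical rule is shown below, using simulated IQ-like data with
an assumed mean of 100 and standard deviation of 15.

```python
import numpy as np

rng = np.random.default_rng(42)
iq = rng.normal(loc=100, scale=15, size=100_000)   # simulated IQ scores

for k in (1, 2, 3):
    within = np.mean(np.abs(iq - 100) <= k * 15)   # fraction within k standard deviations
    print(f"within {k} sigma: {within:.3f}")        # roughly 0.68, 0.95, 0.997
```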

Data Distribution and Types of Data Distribution


1. What is Data Distribution?
• Definition: Data distribution refers to how data points are spread across a dataset.
It helps in understanding the shape, central tendency, and variability of data.
• Importance: Used in statistics, machine learning, and data science to make
predictions and analyze patterns.

2. Types of Data Distribution


1. Normal Distribution (Gaussian Distribution)
• Definition: A symmetrical, bell-shaped distribution where most values cluster
around the mean.
• Example: Heights of people in a large population follow a normal distribution.
2. Skewed Distribution
• Definition: A distribution where data is asymmetrically spread.
• Types:
o Right-Skewed (Positive Skew): Tail extends to the right (e.g., income
distribution).
o Left-Skewed (Negative Skew): Tail extends to the left (e.g., age at retirement).
• Example: Salaries in a company (a few high earners create right skew).
3. Uniform Distribution
• Definition: All values have equal probability of occurring.
• Example: Rolling a fair die, where each number (1-6) has an equal chance of
appearing.
4. Binomial Distribution
• Definition: Represents two possible outcomes (Success/Failure) over multiple
trials.
• Example: Flipping a coin 10 times and counting heads.
5. Poisson Distribution
• Definition: Models the probability of rare events occurring in a fixed interval of time
or space.
• Example: Number of customer arrivals at a bank per hour.
6. Exponential Distribution
• Definition: Models the time between events in a Poisson process.
• Example: Time between earthquakes in a region.

3. Real-Life Example

• Traffic Flow: The number of cars passing through a toll booth per hour often follows
a Poisson distribution, with peak times having a higher probability of traffic
congestion.
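
The sketch below draws samples from several of the distributions above using NumPy; all
parameters (heights, coin-flip probability, arrival rate, and so on) are illustrative
assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

normal_heights = rng.normal(loc=170, scale=10, size=1000)   # normal: heights in cm
die_rolls = rng.integers(1, 7, size=1000)                   # uniform: fair die (1-6)
heads = rng.binomial(n=10, p=0.5, size=1000)                # binomial: heads in 10 coin flips
arrivals = rng.poisson(lam=4, size=1000)                    # Poisson: customer arrivals per hour
wait_times = rng.exponential(scale=1 / 4, size=1000)        # exponential: hours between arrivals

print(normal_heights.mean(), die_rolls.mean(), heads.mean(),
      arrivals.mean(), wait_times.mean())
```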

Skewness and Kurtosis


1. Skewness
• Definition: Skewness measures the asymmetry of data distribution. It tells whether
data is symmetrically distributed or leans toward one side.
• Types of Skewness:
o Positive Skew (Right Skewed): Tail extends toward the right (higher values).
o Negative Skew (Left-Skewed): Tail extends toward the left (lower values).
o Zero Skewness: Data is perfectly symmetrical (normal distribution).
• Real-Life Example: Income Distribution is right-skewed because most people earn
average wages, while a few people earn extremely high salaries.

2. Kurtosis
• Definition: Kurtosis measures the "tailedness" of a data distribution, showing how
extreme values (outliers) affect it.
• Types of Kurtosis:
o Leptokurtic (High Kurtosis): Tall, sharp peak with heavy tails (more extreme
outliers).
o Mesokurtic (Normal Kurtosis): Moderate peak and normal tails (follows
normal distribution).
o Platykurtic (Low Kurtosis): Flat peak with light tails (fewer outliers).
• Real-Life Example: Stock Market Returns often show leptokurtic behavior
because stock prices sometimes have extreme fluctuations.
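
The sketch below measures skewness and kurtosis with SciPy on simulated data; the lognormal
"income" parameters are assumed purely for illustration, and note that SciPy reports excess
kurtosis (a normal distribution scores about 0).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
incomes = rng.lognormal(mean=10, sigma=0.6, size=10_000)   # right-skewed data
normal_data = rng.normal(0, 1, size=10_000)                # roughly symmetric data

print(stats.skew(incomes))        # positive -> right-skewed
print(stats.skew(normal_data))    # close to 0 -> symmetric

print(stats.kurtosis(incomes))      # > 0 -> heavier tails (leptokurtic)
print(stats.kurtosis(normal_data))  # ~ 0 -> mesokurtic
```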

Differentiating Between Primary and Secondary Data

Sampling and Sampling Techniques


1. What is Sampling?
• Definition: Sampling is the process of selecting a subset of individuals from a larger
population to analyze and make predictions about the whole population.
• Importance: Saves time, cost, and effort compared to studying the entire
population.

2. Types of Sampling Techniques


A. Probability Sampling (Random Selection, Equal Chance)
1. Simple Random Sampling: Every individual has an equal chance of being selected.
o Example: Drawing names from a hat for a prize.
2. Stratified Sampling: Population is divided into groups (strata), and samples are
taken proportionally from each group.
o Example: Selecting students from different grades in a school for a survey.
3. Cluster Sampling: Population is divided into clusters (groups), and entire clusters
are randomly selected.
o Example: Choosing random neighborhoods in a city to survey.
4. Systematic Sampling: Every nth person is selected from a list.
o Example: Surveying every 10th customer entering a mall.

B. Non-Probability Sampling (Non-Random, Based on Convenience)


1. Convenience Sampling: Choosing participants based on ease of access.
o Example: Surveying people at a nearby coffee shop.
2. Judgmental (Purposive) Sampling: Selecting individuals based on expert
judgment.
o Example: Interviewing top doctors about a new treatment.
3. Snowball Sampling: Existing participants refer new participants (useful for
hard-to-reach populations).
o Example: Researching drug addiction by asking participants to refer others.
4. Quota Sampling: Selecting a fixed quota from specific groups.
o Example: Surveying 50 men and 50 women for a market study.

3. Real-Life Example:
• A company launching a new product uses stratified sampling to collect feedback from
different age groups, ensuring a balanced and representative sample (see the sketch below).
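
The pandas sketch below contrasts simple random sampling and stratified sampling on an invented
student table; the column names and group sizes are assumptions.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": range(1, 1001),
    "grade": ["9th", "10th", "11th", "12th"] * 250,   # 250 students per grade
})

# Simple random sampling: every student has an equal chance of selection
simple_sample = students.sample(n=100, random_state=0)

# Stratified sampling: take 10% from each grade so all grades are represented
stratified_sample = students.groupby("grade").sample(frac=0.10, random_state=0)

print(simple_sample["grade"].value_counts())       # proportions vary by chance
print(stratified_sample["grade"].value_counts())   # exactly 25 per grade
```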
Representative, Non-Representative Sampling Techniques
Hybrid Sampling
What is Hybrid Sampling?
• Definition: Hybrid sampling is a method that combines two or more sampling techniques
(probability and non-probability) to improve data collection efficiency and accuracy.
• Purpose: It is used to balance representation and practicality, ensuring better coverage of
diverse populations while saving time and cost.

Real-Life Example:
• A healthcare study wants to analyze patient satisfaction in a city. They use:
o Stratified Sampling to divide patients into age groups.
o Convenience Sampling to collect data from hospitals where researchers have easy
access.
o Snowball Sampling to reach patients with rare diseases through referrals.

Differentiate between Descriptive and Inferential Statistics

Choosing the Right Statistical Method


Choosing the right statistical method depends on data type, objective, and analysis requirements.
Below are the key steps to determine the best approach:
1. Identify the Type of Data
• Numerical Data (Continuous or Discrete) → Use parametric tests like t-tests, ANOVA,
regression.
• Categorical Data → Use non-parametric tests like Chi-square tests.

2. Consider Data Distribution


• If Data is Normally Distributed (Bell Curve) → Use parametric tests (T-test,
ANOVA).
• If Data is Not Normally Distributed → Use non-parametric tests (Mann-Whitney U
test, Kruskal-Wallis).
3. Sample Size Matters
• Large Sample (>30) → Use parametric tests (Central Limit Theorem applies).
• Small Sample (<30) → Use non-parametric tests (less assumption-dependent).
4. Real-Life Example
• A healthcare researcher wants to compare blood pressure levels between two
groups: those who take medicine and those who don’t.
o Right method? T-test (since blood pressure is numerical and comparing two
groups).
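
A minimal SciPy sketch of this t-test example is given below, with simulated (assumed)
blood-pressure readings for the two groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
medicine = rng.normal(loc=125, scale=10, size=40)      # group taking the medicine
no_medicine = rng.normal(loc=132, scale=10, size=40)   # group not taking the medicine

t_stat, p_value = stats.ttest_ind(medicine, no_medicine)
print(t_stat, p_value)
if p_value < 0.05:
    print("Significant difference in blood pressure between the two groups")
else:
    print("No significant difference detected")
```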

Confidence Interval
Definition: A confidence interval (CI) is a range of values, derived from sample data, that is
likely to contain the true population parameter (e.g., mean or proportion) with a certain
level of confidence.
Purpose: It provides an estimate of uncertainty in statistical analysis.
Common Confidence Levels:
• 90% CI → If the sampling were repeated many times, about 90% of such intervals would
contain the true population parameter.
• 95% CI → The most commonly used level; about 95% of such intervals would contain the
true value.
• 99% CI → Gives greater confidence, but the resulting interval is wider (less precise).
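
The sketch below computes a 95% confidence interval for a mean using the t-distribution in SciPy;
the sample itself is simulated, so the numbers are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sample = rng.normal(loc=50, scale=8, size=40)   # simulated sample data

mean = sample.mean()
sem = stats.sem(sample)   # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```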

Difference Between Dependent and Independent Variables

Inferential Statistics and Hypothesis Testing


1. What is Inferential Statistics?
• Definition: Inferential statistics is the process of analyzing sample data to make
predictions, generalizations, or conclusions about a larger population.
• Purpose: It helps in making data-driven decisions based on probability and
estimation.
• Methods Used:
o Hypothesis Testing
o Confidence Intervals
o Regression Analysis
Example:
A marketing team analyzes the purchasing behavior of 1,000 customers and uses
inferential statistics to predict the behavior of 1 million customers.

2. What is Hypothesis Testing?


• Definition: Hypothesis testing is a statistical method used to determine whether
there is enough evidence in a sample to support or reject a claim about a population.
• Purpose: Helps in making scientific and business decisions based on statistical
evidence.
• Types of Hypothesis Tests:
o T-Test (Comparing two means)
o Chi-Square Test (For categorical data)
o ANOVA (Comparing more than two groups)
Example:
A pharmaceutical company tests whether a new drug is more effective than the existing
one. They conduct a hypothesis test to determine if there is a statistically significant
improvement in patients using the new drug.
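
As a small illustration, the sketch below runs a chi-square test on a hypothetical contingency
table of recovery counts for the new and existing drugs; the counts are invented for
demonstration.

```python
from scipy.stats import chi2_contingency

# Rows: new drug, existing drug; columns: recovered, not recovered (assumed counts)
observed = [[60, 40],
            [45, 55]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)
if p_value < 0.05:
    print("Reject H0: recovery rates differ between the two drugs")
else:
    print("Fail to reject H0: no significant difference detected")
```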
