02 Regression and Classification Problems

The document discusses regression and classification problems in data analysis, defining regression as supervised learning for continuous output and classification for discrete output. It explains linear regression types, evaluation metrics like Mean Squared Error and Root Mean Squared Error, and introduces logistic regression for predicting categorical outcomes. The document also provides examples of parameter estimation and application of logistic regression in predicting outcomes based on specific data.


Regression and Classification Problems

The expression multivariate analysis is used to describe analyses of data that are
multivariate in the sense that numerous observations or variables are obtained for each
individual or unit studied.
Regression Problems:
• Supervised learning problems where the output is a continuous value are called regression problems.
• The regression technique is used for predicting a continuous value.
• For example, predicting the price of a house based on its characteristics, or estimating the CO2 emission from a car's engine.

Regression Analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
Regression analysis is a predictive modelling technique. It estimates the relationship between the input variables (x) and the output variable (y). Regression is the problem of predicting the value Y (or response) given the values of the input variables x1, x2, ..., xp (or predictors).
• In linear regression, we assume that the function f(X) corresponding to the relationship Y = f(x1, x2, ..., xp) is linear.
• The task is to find coefficients for the linear model (parameter estimation).
There are two types of Linear Regression models:
Simple Linear Regression:
• When there is a single input variable (x), the method is referred to as simple
linear regression.
• Predict Co2emission using EngineSize of all cars
• Independent variable (x): EngineSize
• Dependent variable (y): Co2emission
Multiple Linear Regression:
• When there are multiple input variables, literature from statistics often refers to
the method as multiple linear regression.
• Predict Co2emission using EngineSize and Cylinders of all cars
• Independent variables (x): EngineSize, Cylinders
• Dependent variable (y): Co2emission
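To make the difference between the two model types concrete, here is a minimal Python sketch; the coefficient values (b0, b1, b2) are made up for illustration, not fitted from any real CO2 dataset:

```python
def predict_simple(engine_size, b0=125.0, b1=39.0):
    """Simple linear regression: one predictor (EngineSize)."""
    return b0 + b1 * engine_size

def predict_multiple(engine_size, cylinders, b0=110.0, b1=30.0, b2=7.5):
    """Multiple linear regression: two predictors (EngineSize, Cylinders)."""
    return b0 + b1 * engine_size + b2 * cylinders

# Estimated CO2 emission for a 2.0 L engine, alone and with cylinder count
print(predict_simple(2.0))       # 125 + 39*2.0 = 203.0
print(predict_multiple(2.0, 4))  # 110 + 30*2.0 + 7.5*4 = 200.0
```

Both models are linear in their parameters; the multiple model simply adds one term per extra predictor.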

Simple Linear Regression


• The simplest mathematical relationship between two variables x and y is a linear
relationship:
y = β0 + β1x
• x: the input, or independent, or predictor, or explanatory variable (usually
known).
• y: the output, or dependent, or response, or study variable.
• Objective: to find out the parameters.
• The points (x1, y1), …, (xn, yn) resulting from n independent observations will then be scattered about the true regression line.

• The simple linear regression model is:

y = β0 + β1x + ε

where:
β0 and β1 are called the parameters of the model,
ε is a random variable called the error term.
Evaluation Metrics in Regression Models:
Evaluation metrics are used to explain the performance of a model. As mentioned,
basically, we can compare the actual values and predicted values, to calculate the
accuracy of our regression model.
A residual is a measure of how far away a point is from the regression line. Simply, it is
the error between a predicted value and the observed actual value.
Mean Squared Error (MSE) is the mean of the squared residuals. It is more popular than mean absolute error because it focuses on large errors: squaring penalizes large errors much more heavily than small ones.

Root Mean Squared Error (RMSE) is the square root of the mean squared error. This
is one of the most popular of the evaluation metrics because root mean squared error is
interpretable in the same units as the response vector or y units, making it easy to relate
its information.
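Both metrics can be computed directly from the residuals; a small Python sketch follows, with made-up actual and predicted values for illustration:

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: mean of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: same units as the response y."""
    return math.sqrt(mse(actual, predicted))

y_true = [3.0, 5.0, 2.5, 7.0]   # observed values (illustrative)
y_pred = [2.5, 5.0, 4.0, 8.0]   # model predictions (illustrative)

print(mse(y_true, y_pred))   # 0.875
print(rmse(y_true, y_pred))  # ≈ 0.935
```

Because RMSE is the square root of MSE, it is always reported in the same units as y, which is what makes it easy to interpret.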

Estimation of Parameters in Simple Linear Regression using Ordinary Least Squares:

Ordinary Least Squares (OLS) works by minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and those predicted by the linear function.
This method finds estimators β̂0 and β̂1 for the parameters β0 and β1 that minimize the sum of squared errors ε(β0, β1) over the n observed experiments.
In other words, we minimize the function

ε(β0, β1) = Σ (yi − β0 − β1xi)², summed over i = 1, ..., n,

and find the arguments minimizing the function.


To solve the minimization problem, we can use the following theorem.
Theorem: The minimum of the function ε(β0, β1) is unique and attained when

β̂1 = Σ (xi − X̄)(yi − Ȳ) / Σ (xi − X̄)²
β̂0 = Ȳ − β̂1X̄

where X̄ is the mean of the x values, and Ȳ is the mean of the y values.
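The closed-form estimators above translate directly into code; a minimal Python implementation (the helper name `ols_fit` is illustrative) might look like:

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1

# On data lying exactly on y = 1 + 2x, OLS recovers the line:
print(ols_fit([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))  # (1.0, 2.0)
```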

Example – Dataset of patients' ages and their blood pressures

Our aim is to find the regression line:

X̄ = 491/10 = 49.1, and Ȳ = 1410/10 = 141


The slope (β1) can be calculated as: β1 = 2335 / 2048.9 ≈ 1.14
The intercept (β0) is calculated as: β0 = 141 − 1.14 × 49.1 = 85.026
• Now substitute the regression coefficients into the regression equation.
• Estimated blood pressure:
Ŷ = 85.026 + 1.14 × age
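The arithmetic in this example can be checked in Python using the summary quantities quoted above (the raw 10-patient table is not reproduced here). Note that the text rounds the slope to 1.14 before computing the intercept, which gives 85.026; keeping full precision gives β0 ≈ 85.04:

```python
sxy = 2335.0       # sum of (x − x̄)(y − ȳ), from the example
sxx = 2048.9       # sum of (x − x̄)², from the example
x_bar = 491 / 10   # mean age = 49.1
y_bar = 1410 / 10  # mean blood pressure = 141

b1 = sxy / sxx            # slope, ≈ 1.14
b0 = y_bar - b1 * x_bar   # intercept, ≈ 85.04 (85.026 if b1 is rounded first)

print(round(b1, 2), round(b0, 2))
print(b0 + b1 * 50)  # estimated blood pressure for a 50-year-old
```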

Classification Problems:
• Problems where the output is a discrete value are called classification problems.
• Classification is the process of predicting a discrete class label, or category.
• For example, whether a cell is benign or malignant, or whether an email is spam or not.
• A classification problem does not necessarily have only two outcomes, i.e., it isn't limited to two classes. For example, handwritten digit recognition (a classification problem) has ten outcomes.

Logistic Regression
Logistic regression is a classification algorithm designed to predict categorical target
labels based on historical feature data. It allows us to predict the probability of a
dependent variable given an input, and a model. Logistic regression can be used for both
binary classification and multi-class classification.
Sigmoid Function
Logistic Regression uses the sigmoid function, also known as the logistic function, to perform classification. The sigmoid function takes any real value and maps it into a value between 0 and 1. The key thing to notice is that no matter what value z you put into the logistic (sigmoid) function, you always get a value between 0 and 1. This means we can take our linear regression output and pass it through the sigmoid function:

σ(z) = 1 / (1 + e^(−z))
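A minimal Python sketch of the sigmoid and its squashing behavior:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5: the decision boundary
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```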

• Once the coefficients β0, β1, ..., βp are found, we can formulate the algorithm for predicting the class of a new object x with predictors (x1, x2, ..., xp):
1. Calculate the value z = β0 + β1x1 + β2x2 + ... + βpxp
2. Calculate the probability P = 1 / (1 + e^(−z))
3. If P ≥ 0.5, the object x falls into class 1; otherwise, class 0.
(In practice, the choice of the probability cut-off is up to the researcher.)

Let’s apply the logistic regression algorithm to specific data.


• Our data is football statistics. It has three predictors: shots on target (X1), possession (X2), and shots (X3).
• The response Y takes only two values. The value 1 corresponds to a win (class 1), and the value 0 to a loss or draw (class 0).
• The training data provides the following values of the model parameters:
β0 = −0.046, β1 = 0.541, β2 = −0.014, β3 = −0.132.
• We classify the new object z:
z = (1, 40, 3).
• It's a team that had 1 shot on target, 40 percent possession, and 3 shots.
According to the described algorithm, the probability that the team wins equals:

P = 1 / (1 + e^(−(β0 + β1x1 + β2x2 + β3x3)))
  = 1 / (1 + e^(−(−0.046 + 0.541×1 − 0.014×40 − 0.132×3)))
  ≈ 0.39

• Since P < 0.5, the team is predicted not to win (class 0).
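The calculation above can be reproduced in Python with the coefficients given in the example (the function name `win_probability` is illustrative):

```python
import math

# Model parameters from the example
b0, b1, b2, b3 = -0.046, 0.541, -0.014, -0.132

def win_probability(shots_on_target, possession, shots):
    """Probability of class 1 (win) via the logistic model."""
    z = b0 + b1 * shots_on_target + b2 * possession + b3 * shots
    return 1.0 / (1.0 + math.exp(-z))

p = win_probability(1, 40, 3)
print(round(p, 2))  # below the 0.5 cut-off, so predicted class 0 (not a win)
```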
