0% found this document useful (0 votes)
6 views

Lecture 10 Logistic Regression Part 1

The document provides an overview of logistic regression, contrasting it with linear regression, and discusses its application in predicting binary outcomes. It emphasizes the importance of using quantitative metrics for evaluating machine learning models, including accuracy, precision, and RMSE. Additionally, it explains the process of converting predicted values into probabilities and class labels for classification tasks.

Uploaded by

pateljil0247
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

Lecture 10 Logistic Regression Part 1

The document provides an overview of logistic regression, contrasting it with linear regression, and discusses its application in predicting binary outcomes. It emphasizes the importance of using quantitative metrics for evaluating machine learning models, including accuracy, precision, and RMSE. Additionally, it explains the process of converting predicted values into probabilities and class labels for classification tasks.

Uploaded by

pateljil0247
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

Introduction to Data

Analytics
ITE 5201
Lecture10-Logistic Regression
Instructor: Parisa Pouladzadeh
Email: [email protected]
Simple linear regression
Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


SBP (mm Hg)

220

SBP = 81.54 + 1.222  Age


200

180

160

140

120

100

80
20 30 40 50 60 70 80 90

Age (years)

adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Simple linear regression
Relation between 2 continuous variables (SBP and age)

y
Slope y = α + β 1x 1
x

Regression coefficient b1
◦ Measures association between y and x
◦ Amount by which y changes on average when x changes by one unit
◦ Least squares method

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Logistic regression

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Copyright © 2018 Pearson Education, Inc. All Rights Reserved.
Logistic regression
Table 2 Age and signs of coronary heart disease (CD)

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


How can we analyse these data?
Compare mean age of diseased and non-diseased

◦ Non-diseased: 38.6 years


◦ Diseased: 58.7 years (p<0.0001)

Linear regression?

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Dot-plot: Data from Table 2
Y
es
Signs of coronary disease

N
o

0 2
0 4
0 6
0 8
0 1
00
A
GE(y
ears
)

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Logistic regression
Linear regression is used to predict outputs on a continuous spectrum.
o Example: predicting revenue based on the outside air temperature.

Logistic regression is used to predict binary outputs with two possible values labeled "0" or "1"
o Logistic model output can be one of two classes: pass/fail, win/lose, healthy/sick
Hours Studying Pass/Fail
1 0
PASS/FAIL

1.5 0
2 0
LINEAR MODEL
3 1
3.25 0
4 1
5 1
HOURS OF STUDYING 6 1

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Logistic regression
• Linear regression is not suitable for classification problem.
• Linear regression is unbounded, so logistic regression will be better candidate in which
the output value ranges from 0 to 1.

LINEAR MODEL
PASS/FAIL

LOGISTIC REGRESSION
MODEL

HOURS OF STUDYING

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Logistic regression
• Logistic regression algorithm works by implementing a linear equation first with
independent predictors to predict a value.
• We then need to convert this value into a probability that could range from 0 to 1.
PASS/FAIL

LOGISTIC REGRESSION
MODEL

HOURS OF STUDYING

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Logistic regression
FROM PROBABILITY TO CLASS

• Now we need to convert from a probability to a class value which is “0” or “1”.
PASS/FAIL

LOGISTIC REGRESSION
MODEL
Class 1
0.5

Class 0 THRESHOLD

HOURS OF STUDYING

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Metrics
It is extremely important to use quantitative metrics for evaluating a
machine learning model

For classification
Accuracy/Precision/Recall/F1-score, ROC curves,…

For regression
Normalized RMSE, Normalized Mean Absolute Error (NMAE)

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Metrics

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


Metrics

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


MSE
MSE (Mean Squared Error) is the average squared error between actual
and predicted values.
Squared error, is a row-level error calculation where the difference
between the prediction and the actual is squared.
The main draw for using MSE is that it squares the error, which results in
large errors being punished or clearly highlighted.

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


RMSE
Root Mean Squared Error (RMSE) is the square root of the mean
squared error (MSE) between the predicted and actual values.

A benefit of using RMSE is that the metric it produces is in terms of the


unit being predicted. For example, using RMSE in a house price
prediction model would give the error in terms of house price, which
can help end users easily understand model performance.

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.


R squared compared to RMSE
RMSE (or MSE) is the measure of goodness of predicting the
validation/test values, while R^2 is a measure of goodness of fit in
capturing the variance in the training set.
R Square is not only a measure of Goodness-of-fit, it is also a measure of
how much the model (the set of independent variables you selected)
explain the behavior of your dependent variable.

Copyright © 2018 Pearson Education, Inc. All Rights Reserved.

You might also like