FAM Unit6

Uploaded by Ritika Darade

Unit-6

Classification and
Regression
Linear Regression-

• In Machine Learning, Linear Regression is a supervised machine learning algorithm.
• It tries to find the best linear relationship that describes the given data.
• It assumes that a linear relationship exists between a dependent variable and independent variable(s).
• The value of the dependent variable of a linear regression model is continuous, i.e. a real number.
Representing Linear Regression Model-
A linear regression model represents the linear relationship between a dependent variable and independent variable(s) via a sloped straight line.
The sloped straight line that best fits the given data is called the regression line.
Types of Linear Regression-
• Simple Linear Regression

Y = β0 + β1X
• Y is a dependent variable.
• X is an independent variable.
• β0 and β1 are the regression coefficients.
• β0 is the intercept or bias, which fixes the offset of the line.
• β1 is the slope or weight, which specifies the factor by which X has an impact on Y.
• Multiple Linear Regression

Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn


• Y is a dependent variable.
• X1, X2, …., Xn are independent variables.
• β0, β1,…, βn are the regression coefficients to
be determined through regression analysis.
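As a rough illustration of how the coefficients β0 and β1 can be estimated, the sketch below fits a simple linear regression with ordinary least squares in plain Python. The data values are made up for the example, not taken from the slides.

```python
# Hypothetical data points (assumed for illustration), roughly y = 2x + 1
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.1, 4.9, 7.2, 9.0, 10.8]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# Ordinary least squares: beta1 = cov(X, Y) / var(X), beta0 = intercept
beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) \
        / sum((x - mean_x) ** 2 for x in X)
beta0 = mean_y - beta1 * mean_x

print(beta0, beta1)  # intercept ≈ 1.15, slope ≈ 1.95
```

The fitted line Y = 1.15 + 1.95X is then used to predict Y for new values of X.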
Types of Relationship
• Positive Linear Relationship:
When the dependent variable on the y-axis increases as the independent variable on the x-axis increases:
Y = a0 + a1x
• Negative Linear Relationship:
When the dependent variable on the y-axis decreases as the independent variable on the x-axis increases:
Y = a0 – a1x
Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of a linear regression model. These assessment metrics indicate how well the model reproduces the observed outputs.

• Mean Square Error (MSE)
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Percentage Error (MAPE)
• R-squared (R2)
• Confusion Matrix
• ROC AUC Curve
• Log Loss
• Mean Squared Error (MSE) is an evaluation
metric that calculates the average of the squared
differences between the actual and predicted
values for all the data points. The difference is
squared to ensure that negative and positive
differences don’t cancel each other out.
MSE = 1/n ∑ (y – y^)²
• Here,
• n is the number of data points.
• y is the actual or observed value
• y^ is the predicted value for the i-th data point.
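The MSE formula above can be checked with a few lines of plain Python; the actual and predicted values here are assumptions made up for the example.

```python
# Assumed example values, not from the slides
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

n = len(actual)
# average of the squared differences between actual and predicted values
mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n
print(mse)  # 0.875
```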
Mean Absolute Error (MAE)
• Mean Absolute Error is an evaluation metric used to calculate the
accuracy of a regression model. MAE measures the average
absolute difference between the predicted values and actual
values.

• MAE=1/n∑∣Y–Y^∣

• Here,
• n is the number of observations
• Y represents the actual values.
• Y^ represents the predicted values

• A lower MAE value indicates better model performance. MAE is less sensitive to outliers than MSE because it uses absolute rather than squared differences.
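A minimal MAE sketch, using the same kind of made-up actual/predicted values as an assumption:

```python
# Assumed example values, not from the slides
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

n = len(actual)
# average of the absolute differences between actual and predicted values
mae = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n
print(mae)  # 0.75
```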
Root Mean Squared Error (RMSE)
• The Root Mean Squared Error is the square root of the variance of the residuals (i.e. of the MSE). It describes how well the observed data points match the expected values, or the model's absolute fit to the data.
• In mathematical notation, it can be expressed as

RMSE = √MSE
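Since RMSE is just the square root of MSE, it is one extra line on top of the MSE computation; the data values are again assumptions for illustration.

```python
import math

# Assumed example values, not from the slides
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)  # RMSE is the square root of the MSE
print(round(rmse, 4))  # 0.9354
```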
Mean Absolute Percentage Error(MAPE)

• MAPE is defined as the average absolute percentage


difference between predicted values and actual values.
• Also known as the mean absolute percentage
deviation (MAPD)

• Formula: MAPE = (1/N) Σ (|A – F| / |A|) × 100

• N is the number of fitted points;
• A is the actual value;
• F is the forecast value; and
• Σ is summation notation (the absolute value is summed for every forecasted point in time).
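A quick sketch of the MAPE formula in plain Python, with assumed actual (A) and forecast (F) values made up for the example:

```python
# Assumed actual (A) and forecast (F) values
A = [100.0, 200.0, 300.0, 400.0]
F = [110.0, 190.0, 330.0, 360.0]

N = len(A)
# mean of |A - F| / |A|, expressed as a percentage
mape = sum(abs(a - f) / abs(a) for a, f in zip(A, F)) / N * 100
print(round(mape, 2))  # 8.75
```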
R Squared

R squared measures the proportion of the variance in the dependent variable that is explained by the independent variables.

• R2 = 1 – (RSS/TSS)
• R2 represents the required R Squared value,
• RSS represents the residual sum of squares, and
• TSS represents the total sum of squares.
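The R squared formula can be sketched directly from its definition; the actual/predicted values below are assumptions for illustration.

```python
# Assumed example values, not from the slides
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mean_y = sum(actual) / len(actual)
rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))  # residual sum of squares
tss = sum((y - mean_y) ** 2 for y in actual)                        # total sum of squares
r2 = 1 - rss / tss
print(round(r2, 4))  # 0.7241
```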
Confusion Matrix
• Used in classification problems to display the TP, TN, FP and FN counts.
• It is also used to calculate various classification
metrics such as Accuracy, Precision, Recall and
F1 score.
ROC curve
• The ROC curve, used in binary classification, plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds.

• AUC measures the area under that curve.

• The higher the AUC, the better the model.
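One hedged way to compute AUC without a library is the pairwise-ranking interpretation: AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count as half). The labels and scores below are assumptions made up for the example.

```python
# Assumed true labels and predicted scores
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]

# count positive/negative pairs where the positive is ranked higher
wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
           for p in pos for q in neg)
auc = wins / (len(pos) * len(neg))
print(round(auc, 4))  # 0.8889
```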


Log Loss
• Log Loss is used in classification problems.

• It quantifies the difference between predicted probabilities and true class labels.

• Lower log loss values indicate better performance.
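A minimal sketch of binary log loss (cross-entropy); the labels and predicted probabilities are assumptions for the example.

```python
import math

# Assumed true labels and predicted probabilities
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.1, 0.8, 0.6]

n = len(y_true)
# binary cross-entropy: confident wrong predictions are penalized heavily
loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
            for y, p in zip(y_true, y_prob)) / n
print(round(loss, 4))  # 0.2362
```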
Overfitting and Underfitting
Overfitting:
• Overfitting occurs when our machine learning model tries to cover all the data points, or more than the required data points, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model.
Reasons for Overfitting:
• The model is too complex.
• The size of the training data.
• High variance and low bias.
Underfitting:
• In the case of underfitting, the model is not able to learn
enough from the training data, and hence it reduces the
accuracy and produces unreliable predictions.
Reasons for Underfitting
• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
• The size of the training dataset is not large enough.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
• Features are not scaled.
Calculate MSE using Excel
Step 1: Enter the actual values and forecasted values in two separate
columns.
Step 2: Calculate the squared error for
each row.
Step 3: Calculate the mean squared error.
Calculate the MSE by simply finding the average of the squared errors in column D:
MSE = 1/n ∑ (y – y^)²
Logistic Regression
• Logistic regression is used in binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.

• For example, suppose we have two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), it belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
• Logistic regression predicts the output of a categorical
dependent variable. Therefore, the outcome must be a
categorical or discrete value.
• It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
• In Logistic regression, instead of fitting a regression
line, we fit an “S” shaped logistic function, which
predicts two maximum values (0 or 1).
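The sigmoid function and the 0.5 threshold rule described above can be sketched in a few lines; the example inputs are arbitrary.

```python
import math

def sigmoid(z):
    # squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    # Class 1 if the predicted probability exceeds the threshold, else Class 0
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))    # 0.5
print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```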
Binary And Multiclass Classification
In binary classification, the goal is to classify the input
into one of two classes or categories.
Example – On the basis of the given health conditions
of a person, we have to determine whether the person
has a certain disease or not.

In multi-class classification, the goal is to classify the input into one of several classes or categories.
Examples of multiclass classification include: face
classification, animal species classification
Assessing a Logistic Regression Model
Confusion Matrix and Classification Report:
It consists of four components: TP, TN, FP, FN.
It is used to assess the model's accuracy, precision, recall and F1 score.

• ROC Curve and AUC Score:
The model is assessed by plotting the TPR against the FPR at different threshold values.
The AUC summarizes the overall performance of the model, with a higher AUC value indicating better performance.
• Calibration curve:
Used to assess how well the predicted
probabilities align with the Actual probabilities.

• Residual Plot:
A residual plot is used for assessing the performance of a regression model.
It displays the discrepancies between predicted and actual values.
Multiclass Classification
Binary logistic regression
• Binary logistic regression works well for binary classification problems
that have only two possible outcomes. The dependent variable can have
only two values, such as yes and no or 0 and 1.
• Even though the logistic function calculates a range of values between 0
and 1, the binary regression model rounds the answer to the closest values.
Generally, answers below 0.5 are rounded to 0, and answers above 0.5 are
rounded to 1, so that the logistic function returns a binary outcome.

Multinomial logistic regression


• Multinomial regression can analyze problems that have several possible
outcomes as long as the number of outcomes is finite. For example, it can
predict if house prices will increase by 25%, 50%, 75%, or 100% based on
population data, but it cannot predict the exact value of a house.
• Multinomial logistic regression works by mapping outcome values to
different values between 0 and 1. Since the logistic function can return a
range of continuous data, like 0.1, 0.11, 0.12, and so on, multinomial
regression also groups the output to the closest possible values
One vs One
In One-vs-One classification, for an N-class dataset we have to generate N × (N − 1) / 2 binary classifier models. Using this approach, we split the primary dataset into one dataset for each pair of classes, pitting each class against every other class.
One vs Rest
In One-vs-Rest (One-vs-All) classification, for an N-class dataset we have to generate N binary classifier models. The number of class labels in the dataset and the number of generated binary classifiers must be the same.
• For example, consider three classes: type 1 for Green, type 2 for Blue, and type 3 for Red.
• So we have to create three classifiers here, one for each class.
• Classifier 1:- [Green] vs [Red, Blue]
• Classifier 2:- [Blue] vs [Green, Red]
• Classifier 3:- [Red] vs [Blue, Green]
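The classifier counts for the two strategies can be sketched directly from the formulas above; the function names here are illustrative, not from any library.

```python
def one_vs_one_classifiers(n_classes):
    # each pair of classes gets its own binary classifier: N * (N - 1) / 2
    return n_classes * (n_classes - 1) // 2

def one_vs_rest_classifiers(n_classes):
    # one binary classifier per class (that class vs. all the others)
    return n_classes

print(one_vs_one_classifiers(3), one_vs_rest_classifiers(3))  # 3 3
print(one_vs_one_classifiers(4), one_vs_rest_classifiers(4))  # 6 4
```

For the three-class Green/Blue/Red example, both strategies happen to need three classifiers; the counts diverge as the number of classes grows.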
Sums on Confusion Matrix
• TP=86
• TN=79
• FP=12
• FN=10
Accuracy=?
Precision=?
Recall=?
F1score=?
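The four questions above can be answered directly from the standard formulas; a quick sketch:

```python
TP, TN, FP, FN = 86, 79, 12, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # fraction of correct predictions
precision = TP / (TP + FP)                    # of predicted positives, how many were right
recall    = TP / (TP + FN)                    # of actual positives, how many were found
f1_score  = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1_score, 4))
# 0.8824 0.8776 0.8958 0.8866
```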
Linear Regression vs Logistic Regression

Linear Regression:
• Used to predict a continuous dependent variable using a given set of independent variables.
• Used for solving regression problems.
• We predict the value of continuous variables.
• We find the best-fit straight line.
• The least squares estimation method is used to estimate the coefficients.
• The output must be a continuous value, such as price, age, etc.
• Requires a linear relationship between the dependent and independent variables.

Logistic Regression:
• Used to predict a categorical dependent variable using a given set of independent variables.
• Used for solving classification problems.
• We predict the values of categorical variables.
• We find an S-shaped curve.
• The maximum likelihood estimation method is used to estimate the coefficients.
• The output must be a categorical value, such as 0 or 1, Yes or No, etc.
• Does not require a linear relationship.
