
UNIT # 2

Linear Regression
• It’s a supervised learning technique.
• Regression is a technique to model the predictive relationship between one or more
independent variables and one dependent variable.
• The objective of regression is to find the best-fitting curve for the dependent variable
as a function of one or more independent variables. If there is one independent
variable, it is called one-dimensional (simple) regression analysis; if there are more
than one, it is called multidimensional (multiple) regression analysis.
• The curve could be a straight line or a non-linear curve.
• The quality of the fit of the curve is measured by calculating the coefficient of
correlation (r). Depending on the value of r, the quality of the regression is judged.
• The coefficient of correlation is the square root of the proportion of variance
explained by the curve.
Possible Cases
• For the given cases, two variables x and y are shown. In the first case, as x increases y also
increases, and a straight line can be drawn that fits the data.
• In the second case, as x increases y decreases. Here also a straight line can be drawn
that fits the data set well.
• In the third case a straight line cannot be drawn, but a curve can be drawn to fit the given
data set. This is called curvilinear regression analysis. The fourth case is also an example
of curvilinear regression analysis.
• In the last two cases neither a straight line nor a curve can be drawn, so no relationship
can be established.
Correlation
• For two numeric variables x and y, correlation can be calculated as

  r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

• where x̄ is the mean of x and ȳ is the mean of y.


• Correlation defines the kind and strength of the relationship developed between two
variables.
• It is a quantitative measure. As used here (the proportion of shared variance, r²),
it is measured in the range from 0 to 1.
• A correlation of 1 indicates a perfect relationship and a correlation of 0 indicates no
relationship.
Coefficient of correlation
• The relationship can be positive or inverse, i.e. the variables can move in the same
direction or in opposite directions.
• So a better measure is the correlation coefficient instead of the correlation. The
correlation coefficient is the (signed) square root of the correlation. It varies in the
range −1 to +1.
• +1 shows a perfect relationship in the same direction, −1 shows a perfect relationship
in the opposite direction, and 0 shows no relationship.
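As a concrete illustration, here is a minimal Python sketch of the correlation coefficient formula above (the function name and plain-list inputs are our own choices):

# Pearson correlation coefficient r for two equal-length lists
def correlation_coefficient(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: co-variation of x and y around their means
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the spreads of x and y
    den = (sum((xi - mean_x) ** 2 for xi in x)
           * sum((yi - mean_y) ** 2 for yi in y)) ** 0.5
    return num / den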
Steps of Regression Analysis

• List all the variables available for making the model.
• Establish a dependent variable of interest.
• Examine the relationships between the variables of interest.
• Find a way to predict the dependent variable using the other variables.
Linear Regression Model
Y = b0 + b1X

Multiple Regression Model
Y = b0 + b1X1 + b2X2 + … + bnXn
Example 1: Regression equation to predict glucose level

S.No.   Age (x)   Glucose Level (y)
1       43        99
2       21        65
3       25        79
4       42        75
5       57        87
6       59        81
7       55        ?
Linear Regression Equation: Yi = b0 + b1Xi,
where b0 and b1 can be computed as follows:
Step 1: Calculate the values of XY and X²

S.No.   Age (X)   Glucose Level (Y)   XY      X²
1       43        99                  4257    1849
2       21        65                  1365    441
3       25        79                  1975    625
4       42        75                  3150    1764
5       57        87                  4959    3249
6       59        81                  4779    3481
Sum     247       486                 20485   11409
Step 2: Find the values of b0 and b1

b1 = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²) = (6·20485 − 247·486) / (6·11409 − 247²) = 0.385225
b0 = (ΣY − b1·ΣX) / n = (486 − 0.385225·247) / 6 = 65.14
Step 3: Insert the values into the equation
Yi = b0 + b1Xi
Y = 65.14 + 0.385225X

Step 4: Predict y for the given value x = 55

Y = 65.14 + 0.385225 · 55 = 86.327

So the glucose level predicted for age 55 is 86.327.
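The same computation can be written as a short Python sketch (the function name and list inputs are our own illustrative choices):

# Least-squares fit of Yi = b0 + b1*Xi using the formulas from Step 2
def fit_line(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
b0, b1 = fit_line(age, glucose)     # roughly b0 = 65.14, b1 = 0.385
print(b0 + b1 * 55)                 # predicted glucose at age 55, about 86.3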
2. Find the linear regression equation for the following data set

S.No.   x    y
1       2    3
2       4    7
3       6    5
4       8    10
Sum     20   25

Extending the table with x² and xy:

S.No.   x    y    x²    xy
1       2    3    4     6
2       4    7    16    28
3       6    5    36    30
4       8    10   64    80
Sum     20   25   120   144

• b1 (slope) = (4·144 − 20·25) / (4·120 − 20·20) = 76/80 = 0.95
• b0 = (25·120 − 20·144) / (4·120 − 20·20) = 120/80 = 1.5

• y = 0.95x + 1.5
Coefficient of determination (R²)
• The determination coefficient tells us the goodness of fit of the model. The value of R²
lies in the range 0 to 1 and is computed as

  R² = 1 − Σ(y − yp)² / Σ(y − ȳ)²

  where y is the observed output, yp is the predicted output, and ȳ is the mean of the
observed output.
• In the given figure, a model with R² = 0.9 is a good fit, as the observed and predicted
values are almost equal, i.e. the errors are small.
• A model with R² = 0.2 is not a good fit, as the observed and predicted values are far from
each other, i.e. the errors are large; the model is not able to fit the data points.
For the data set given in Example 2, determine the value of R²
• y = 0.95x + 1.5 (regression model)
• ȳ = 6.25 (mean of y)

S.No.   x    y    yp    y − yp   (y − yp)²   y − ȳ    (y − ȳ)²
1       2    3    3.4   −0.4     0.16        −3.25    10.562
2       4    7    5.3   1.7      2.89        0.75     0.562
3       6    5    7.2   −2.2     4.84        −1.25    1.562
4       8    10   9.1   0.9      0.81        3.75     14.062
Sum     20   25   25             8.7                  26.748

R² = 1 − 8.7/26.748 = 0.67

As the determination coefficient is reasonably close to 1, the predicted model is a good fit.
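The same R² calculation as a short Python sketch (the function name is our own):

# R² for the fitted line y = 0.95x + 1.5 from Example 2
def r_squared(x, y, b0, b1):
    y_pred = [b0 + b1 * xi for xi in x]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred))  # Σ(y − yp)²
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)                 # Σ(y − ȳ)²
    return 1 - ss_res / ss_tot

print(r_squared([2, 4, 6, 8], [3, 7, 5, 10], 1.5, 0.95))  # ≈ 0.67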
3. The values of x and their corresponding y are given in the table

(i) Find the regression line for the given data points
(ii) Check whether it is a best-fit line or not

S.No.   x   y
1       1   3
2       2   4
3       3   2
4       4   4
5       5   5
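Using the hypothetical helpers fit_line and r_squared from the sketches above, Exercise 3 can be worked out as follows:

x = [1, 2, 3, 4, 5]
y = [3, 4, 2, 4, 5]
b0, b1 = fit_line(x, y)          # gives b0 = 2.4, b1 = 0.4, i.e. y = 0.4x + 2.4
print(r_squared(x, y, b0, b1))   # ≈ 0.31, so the line is not a particularly good fit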
Finding the best fit line:

• When working with linear regression, our main goal is to find the best fit line,
which means the error between the predicted values and the actual values should be
minimized. The best fit line has the least error.
• Different values for the weights or coefficients of the line (b0, b1) give different
regression lines, so we need to calculate the best values for b0 and b1 to find the
best fit line. To do this, we use a cost function.
Cost function
• Different values of the weights or coefficients of the line (b0, b1) give different
regression lines, and the cost function is used to estimate the values of the coefficients
for the best fit line.
• The cost function optimizes the regression coefficients or weights. It measures how well a
linear regression model is performing.
• We can use the cost function to find the accuracy of the mapping function, which
maps the input variable to the output variable. This mapping function is also known
as the hypothesis function.
• For linear regression we use the Mean Squared Error (MSE) cost function, which
is the average of the squared errors between the predicted and actual values. For the
linear equation Yi = b0 + b1Xi above, the MSE can be calculated as:

  MSE = (1/n) · Σ (yi − (b0 + b1·xi))²
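A minimal Python sketch of this cost function (the function name is our own):

# MSE cost for a candidate line y = b0 + b1*x
def mse_cost(x, y, b0, b1):
    n = len(x)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / n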
Gradient Descent

• Gradient descent is an optimization algorithm used to find the values of the
parameters of a function that minimize a cost function. It is an iterative algorithm;
we use gradient descent to update the parameters of the model. Parameters refer to
coefficients in linear regression and weights in neural networks.
• Gradient descent is used to minimize the MSE by calculating the gradient of the
cost function.
• A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
• This is done by selecting random initial values for the coefficients and then
iteratively updating them to reach the minimum of the cost function.
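A minimal sketch of this procedure for simple linear regression (the learning rate and iteration count are arbitrary illustrative choices):

# Batch gradient descent on the MSE cost for y = b0 + b1*x
def gradient_descent(x, y, lr=0.01, iters=10000):
    b0, b1 = 0.0, 0.0                 # initial coefficient values
    n = len(x)
    for _ in range(iters):
        # Partial derivatives of the MSE with respect to b0 and b1
        grad_b0 = (-2 / n) * sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
        grad_b1 = (-2 / n) * sum(xi * (yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
        b0 -= lr * grad_b0            # step opposite to the gradient
        b1 -= lr * grad_b1
    return b0, b1

print(gradient_descent([2, 4, 6, 8], [3, 7, 5, 10]))  # approaches (1.5, 0.95) from Example 2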
Multiple Linear Regression
• In the previous topic we learned about simple linear regression, where a
single independent/predictor variable (X) is used to model the response variable
(Y). But there may be cases in which the response variable is affected by
more than one predictor variable; for such cases, the Multiple Linear Regression
algorithm is used.
• Multiple Linear Regression is an extension of simple linear regression:
it uses more than one predictor variable to predict the response variable.
• For MLR, the dependent or target variable (Y) must be continuous/real, but the
predictor or independent variables may be continuous or categorical.
• Each feature variable must have a linear relationship with the dependent
variable.
• MLR tries to fit a regression line (hyperplane) through a multidimensional space of data points.
Assumptions for Multiple Linear Regression:
• A linear relationship should exist between the target and predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the
independent variables) in the data.
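A minimal sketch of fitting an MLR model with NumPy's least-squares solver (the data here is made up purely for illustration):

import numpy as np

# Hypothetical data: two predictor columns and one continuous response
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([6.0, 5.0, 12.0, 11.0, 16.0])

# Prepend a column of ones so the first coefficient acts as the intercept b0
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coef)  # [b0, b1, b2]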
Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms; it comes
under the supervised learning technique. It is used for predicting a categorical dependent
variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc.
But instead of giving an exact value of 0 or 1, it gives probabilistic values that lie
between 0 and 1.
• Logistic regression is quite similar to linear regression except in how it is used.
Linear regression is used for solving regression problems, whereas logistic regression
is used for solving classification problems.
• In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped
logistic function, which saturates at the two extreme values (0 and 1).
• The curve from the logistic function indicates the likelihood of something, such as whether
cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it can
provide probabilities and classify new data using both continuous and discrete datasets.
Representation of S (sigmoid function)
Logistic regression is a machine learning algorithm used for classification problems;
it is a predictive analysis algorithm based on the concept of probability. Here we use
the sigmoid function to map predictions to probabilities, as given below:

P = 1 / (1 + e^(−y))

where:
y = β0 + β1x (in case of univariate logistic regression)
y = β0 + β1x1 + β2x2 + … + βnxn (in case of multivariate logistic regression)

Univariate logistic regression means the output variable is predicted using only one
predictor variable, while multivariate logistic regression means the output variable is
predicted using multiple predictor variables.

The logistic function converts the values of logits (also called log-odds), which range
from −∞ to +∞, to a range between 0 and 1.
Now let us try to simplify what we said. Let P be the probability of occurrence of an event,
so the probability that the event will not occur is 1 − P.

Odds is defined as the ratio of the probability of occurrence of a particular event to the
probability of the event not occurring:

Odds = P / (1 − P)

We know that the logistic regression function gives us a probability value, so we can write:

P = 1 / (1 + e^(−y)),  and hence  1 − P = e^(−y) / (1 + e^(−y))

Now, since we mentioned log odds, let us take the natural log of both sides of
the odds equation and substitute the value of P:

ln(Odds) = ln(P / (1 − P)) = y = β0 + β1x

Thus we get a more simplified form of the logistic regression equation, and we
can say that the log odds has a linear relationship with the predictor variable x.
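A short Python sketch of this relationship (the coefficient values are made up):

import math

def sigmoid(y):
    # Maps a log-odds value y to a probability P between 0 and 1
    return 1.0 / (1.0 + math.exp(-y))

def logit(p):
    # The inverse mapping: log-odds ln(P / (1 - P))
    return math.log(p / (1.0 - p))

beta0, beta1 = -3.0, 0.5          # hypothetical coefficients
x = 4.0
p = sigmoid(beta0 + beta1 * x)    # probability, here about 0.27
print(logit(p))                   # recovers beta0 + beta1*x = -1.0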
Maximum Likelihood Estimation

In order for our model to predict the output variable as 0 or 1, we need to find the
best-fit sigmoid curve, i.e. the optimal values of the beta coefficients. That is, we
need to create an efficient boundary between the 0 and 1 values.

A cost function tells you how close your predicted values are to the actual ones. So here
we need a cost function that maximizes the likelihood of getting the desired output values.
Such a cost function is called the Maximum Likelihood Estimation (MLE) function.

For points labelled 0, we need the probabilities P1, P2 and P4 to be as low as possible, and
for points labelled 1, we need the probabilities P3, P5, P6 and P7 to be as high as possible,
for correct classification.

We can also say that (1 − P1), (1 − P2), P3, (1 − P4), P5, P6 and P7 should all be as high as possible.

The joint probability is the product of these probabilities, so the product
[(1 − P1)·(1 − P2)·P3·(1 − P4)·P5·P6·P7] should be maximum.

This joint probability function is our cost function, which should be maximized in order to
get the best-fit sigmoid curve; equivalently, the predicted values should be close to the
actual values.
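In practice one maximizes the log of this product, since sums are more numerically stable than long products. A minimal sketch with made-up probabilities P1 to P7:

import math

def log_likelihood(probs, labels):
    # Correct "1" points contribute log(p); correct "0" points contribute log(1 - p)
    return sum(math.log(p) if t == 1 else math.log(1.0 - p)
               for p, t in zip(probs, labels))

probs  = [0.2, 0.1, 0.9, 0.3, 0.8, 0.7, 0.95]  # P1..P7 (hypothetical)
labels = [0,   0,   1,   0,   1,   1,   1]
print(log_likelihood(probs, labels))  # higher (closer to 0) means a better fit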
Sigmoid Function
• The sigmoid function is a mathematical function used to map predicted values to
probabilities.
• It maps any real value to a value in the range 0 to 1.
• The output of logistic regression must be between 0 and 1 and cannot go beyond this
limit, so it forms an "S"-shaped curve. The S-form curve is called the sigmoid
function or the logistic function.
• In logistic regression we use the concept of a threshold value, which decides
between 0 and 1: values above the threshold tend to 1, and values below the
threshold tend to 0.
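A tiny sketch of this thresholding step (0.5 is the common default, assumed here):

def classify(p, threshold=0.5):
    # Map a predicted probability to a class label
    return 1 if p >= threshold else 0

print(classify(0.73))  # 1
print(classify(0.12))  # 0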
Assumptions for Logistic Regression:
• The dependent variable must be categorical in nature.
Type of Logistic Regression:
• On the basis of the categories, Logistic Regression can be classified into
three types:
• Binomial: In binomial logistic regression, there can be only two possible
types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat", "dog", or
"sheep".
• Ordinal: In ordinal logistic regression, there can be 3 or more possible
ordered types of the dependent variable, such as "low", "medium", or "high".
Advantages & Disadvantages of Logistic Regression

• The main advantage of logistic regression is that it is much easier to
set up and train than many other machine learning and AI methods.
• Another advantage is that it is one of the most efficient algorithms
when the different outcomes or classes represented by the data are linearly
separable, i.e. when a straight line (or hyperplane) can separate the classes.
• One of the biggest attractions of logistic regression for statisticians
is that it can help reveal the interrelationships between different
variables and their impact on outcomes.
Logistic regression vs. linear regression

• The main difference between logistic and linear regression is that logistic regression
provides a categorical (discrete) output, while linear regression provides a continuous output.
• In logistic regression, the outcome, or dependent variable, has only a limited number of
possible values (two, in the binary case). In linear regression, the outcome is continuous,
which means it can take any one of an infinite number of possible values.
• Logistic regression is used when the response variable is categorical, such as yes/no,
true/false and pass/fail. Linear regression is used when the response variable is
continuous, such as hours, height and weight.
• For example, given data on the time a student spent studying and that student's exam
scores, logistic regression and linear regression predict different things.
• With logistic regression, only specific values or categories are allowed as predictions;
therefore, logistic regression predicts whether the student passed or failed. Since
linear regression predictions are continuous, such as numbers in a range, it can predict
the student's test score on a scale of 0 to 100.
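A minimal sketch of this contrast with scikit-learn (the study-hours data below is made up):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

hours  = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # hours studied
scores = np.array([20, 30, 35, 45, 55, 60, 75, 85])          # exam scores
passed = (scores >= 40).astype(int)                          # pass/fail labels

lin = LinearRegression().fit(hours, scores)
log = LogisticRegression().fit(hours, passed)

print(lin.predict([[4.5]]))        # a continuous score estimate
print(log.predict([[4.5]]))        # a discrete pass/fail class
print(log.predict_proba([[4.5]]))  # probabilities for fail/pass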
