
Regression Analysis

Dr. Tanu Shree


Explanation

 Regression analysis is a statistical method for modelling the
relationship between a dependent (target) variable and one or more
independent (predictor) variables.
 It predicts continuous/real values such as temperature,
age, salary, price, etc.
Example:
Suppose a marketing company A runs various advertisements every year
and earns sales from them. The list below shows the advertising done
by the company over the last 5 years and the corresponding sales:
Regression:

 is a supervised learning technique
 is used for prediction, forecasting, time series modeling, and
determining the cause-and-effect relationship between variables.
 "Regression shows a line or curve that passes through all the
datapoints on the target-predictor graph in such a way that the vertical
distance between the datapoints and the regression line is minimum."
Terminologies Related to the
Regression Analysis:
• Dependent Variable or target variable.
• Independent Variable or predictor.
• Outliers
• Multicollinearity: If the independent variables are highly correlated
with each other, the condition is called multicollinearity. It should not be
present in the dataset, because it creates problems when ranking the most
influential variables; a quick correlation check is sketched after this list.
• Underfitting and Overfitting: If our algorithm works well with the
training dataset but not with the test dataset, the problem is called
overfitting. If our algorithm does not perform well even on the training
dataset, the problem is called underfitting.
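As a rough illustration of spotting multicollinearity, one can inspect the pairwise correlations of the predictors. The variable names, data, and threshold intuition below are assumptions for illustration only, not part of the original slides:

```python
import numpy as np
import pandas as pd

# Hypothetical predictors; x2 is almost a copy of x1, so the two are collinear
rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 0.98 + rng.normal(scale=0.05, size=100),
    "x3": rng.normal(size=100),
})

# Correlations close to +/-1 between predictors signal multicollinearity
print(df.corr().round(2))
```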
Linear Regression
 Models the relationship between a dependent variable and one or more independent variables as a linear function.
 For example, suppose that height was the only determinant of body weight. If we were to
plot body weight (the dependent or 'outcome' variable) as a function of height (the
independent or 'predictor' variable), we might see a very linear relationship.

 We could also describe this relationship with the equation for a line, Y = a + b(x), where
'a' is the Y-intercept and 'b' is the slope of the line. We could use the equation to predict
weight if we knew an individual's height. In this example, if an individual were 70 inches
tall, we would predict his weight to be:
Weight = 80 + 2 × (70) = 220 lbs.
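A minimal sketch of fitting such a line in Python with NumPy; the height/weight values are hypothetical and chosen to match the Weight = 80 + 2 × Height relationship above:

```python
import numpy as np

# Hypothetical height (inches) and weight (lbs) observations
height = np.array([60, 62, 65, 68, 70, 72])
weight = np.array([200, 204, 210, 216, 220, 224])

# Least-squares fit of weight = a + b * height
b, a = np.polyfit(height, weight, deg=1)   # returns [slope, intercept]
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")

# Predict the weight of a 70-inch-tall individual
print("predicted weight at 70 in:", a + b * 70)
```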
Question: Find the linear regression equation for the following
two sets of data:

 Solution:
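As a general sketch of how such an equation is computed, the standard least-squares formulas can be applied directly; the x and y values below are assumptions for illustration, not the data sets from the slide:

```python
import numpy as np

# Hypothetical data; replace with the two sets given in the question
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 5.0, 10.0])

x_mean, y_mean = x.mean(), y.mean()

# Least-squares estimates:
#   b = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   a = y_mean - b * x_mean
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

print(f"regression equation: y = {a:.3f} + {b:.3f} x")
```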
Multiple linear regression

 In real estate, we can predict the selling price of a house based on
various factors such as area, number of bedrooms, number of floors, and
location. This is where multiple linear regression comes into play.
Multiple Linear Regression

Y = β0 + β1X1 + β2X2 + ... + βPXP + ε

where:
• Y = dependent (response) variable
• X1, X2, ..., XP = independent (explanatory) variables
• β0 = population Y-intercept (a constant value)
• β1 = population slope of Y with variable X1, holding the effects of X2, X3, ..., XP constant
• βP = population slope of Y with variable XP, holding all other variables' effects constant
• ε = random error
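A minimal sketch of fitting such a model with scikit-learn (the library and the house-price numbers are assumptions for illustration, not part of the original slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical housing data: columns are [area (sq ft), bedrooms, floors]
X = np.array([
    [1200, 2, 1],
    [1500, 3, 1],
    [1800, 3, 2],
    [2100, 4, 2],
    [2500, 4, 2],
])
y = np.array([150_000, 190_000, 230_000, 275_000, 320_000])  # selling prices

model = LinearRegression().fit(X, y)
print("intercept (beta_0):", model.intercept_)
print("slopes (beta_1..beta_3):", model.coef_)

# Predict the price of a new 2000 sq ft, 3-bedroom, 2-floor house
print("predicted price:", model.predict([[2000, 3, 2]]))
```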
Logistic Regression
• In classification problems, the dependent variable is in a
binary or discrete format such as 0 or 1.
• The logistic regression algorithm works with categorical variables
such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm which works on the concept
of probability.
• Logistic regression is a type of regression, but it differs from
the linear regression algorithm in terms of how it is used.
 Logistic regression uses the sigmoid function (also called the logistic
function) to map the model's output to a probability. This sigmoid
function is used to model the data in logistic regression.
The function can be represented as:

f(x) = 1 / (1 + e^(-x))

• f(x) = output between the 0 and 1 value
• x = input to the function
• e = base of the natural logarithm
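A small sketch of this function in Python, showing how any real-valued input is squashed into the (0, 1) range:

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~[0.0067, 0.5, 0.9933]
```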
 There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
 Logistic regression is often used
in healthcare to estimate binary
outcomes, like whether a patient
will develop a particular disease.
For example, we could use
logistic regression to predict the
likelihood of a patient having
diabetes based on factors like
age, BMI, family history, and
blood sugar levels.
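A minimal sketch of this idea with scikit-learn's LogisticRegression; the patient features and outcome labels below are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical patients: [age, BMI, family history (0/1), blood sugar]
X = np.array([
    [25, 22.0, 0, 85],
    [40, 27.5, 1, 105],
    [52, 31.0, 1, 140],
    [61, 29.0, 0, 130],
    [33, 24.0, 0, 90],
    [58, 33.5, 1, 155],
])
y = np.array([0, 0, 1, 1, 0, 1])   # 1 = patient developed diabetes

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted probability of diabetes for a new patient
new_patient = [[45, 28.0, 1, 120]]
print("P(diabetes):", clf.predict_proba(new_patient)[0, 1])
print("predicted class:", clf.predict(new_patient)[0])
```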
Polynomial Regression
• Polynomial Regression is a type of regression which models a non-linear
dataset using a linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve
between the values of x and the corresponding conditional values of y.
• Suppose there is a dataset whose datapoints are arranged in a non-linear
fashion; in such a case, linear regression will not fit those datapoints well.
To cover such datapoints, we need polynomial regression.
• In polynomial regression, the original features are transformed into
polynomial features of a given degree and then modelled using a linear
model, which means the datapoints are best fitted using a polynomial curve.
• The equation for polynomial regression is also derived from the linear
regression equation: the linear equation Y = b0 + b1x is transformed into
the polynomial regression equation
• Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
• Here Y is the predicted/target output, b0, b1, ..., bn are the regression
coefficients, and x is our independent/input variable.
• The model is still linear because it is linear in the coefficients
b0, ..., bn; only the features x, x², x³, ... are non-linear in x.
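A minimal sketch of this feature transformation with scikit-learn, using an invented quadratic dataset (library choice and data are assumptions, not part of the original slides):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical non-linear data: y roughly follows 2 + 3x + 0.5x^2
x = np.arange(0, 10, dtype=float).reshape(-1, 1)
y = 2 + 3 * x.ravel() + 0.5 * x.ravel() ** 2 + np.random.normal(0, 1, 10)

# Transform x into [x, x^2] and fit an ordinary linear model on top
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

print("coefficients b1, b2:", model.named_steps["linearregression"].coef_)
print("intercept b0:", model.named_steps["linearregression"].intercept_)
print("prediction at x=12:", model.predict([[12.0]]))
```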
Difference between Polynomial Regression
and Multiple Linear Regression
 Let's start with an example:
 Let's take a little imaginative trip to Central Perk, the beloved coffee shop from
"Friends".
Picture this: Ross, the paleontologist, is trying to figure out how the number of
coffees he drinks and the hours he spends at the museum affect his mood. He's been
keeping track of these three variables for a month.
 Ross decides to use Multiple Linear Regression. The independent variables are
“coffees drunk” and “hours at the museum”, and the dependent variable is “mood
score”, a rating Ross gives himself at the end of the day. Ross uses his stats software
(let’s assume Python) and finds that both the number of coffees and hours at the
museum significantly predict his mood score. Surprisingly, more coffee and less
time at the museum lead to a better mood score (Ross wonders if he should consider
a career change).
 Meanwhile, Chandler has been trying to understand his
relationship with Janice.
 He decides to analyze the pattern of their breakups and
makeups over the years. Because their relationship isn’t
exactly linear (it’s more of a roller-coaster), Chandler
decides to use Polynomial Regression. He finds that the
pattern of their relationship can be modeled quite well
with a cubic function. Chandler is amused — he always
knew his relationship was a little “cubed”.
Polynomial Regression
 Polynomial Regression is a form of regression analysis in which the
relationship between the independent variable x and the dependent
variable y is modelled as an nth degree polynomial.
 The real world is complex, and the relationship between variables in
real-world data can often be non-linear. When Linear Regression
fails to accurately capture the relationship between variables due to
its linearity, Polynomial Regression can be a good alternative
because it can model more complex relationships.
 For instance, a cubic regression uses three variables, X, X², and X³
as predictors.
 Pros:
1. Can model more complex, non-linear relationships between
variables.
2. Provides a more flexible curve that can fit the data better in many
cases.
 Cons:
1. Choosing the correct polynomial degree can be challenging. Too
low and the model underfits; too high and it overfits.
2. Polynomial regression models can be sensitive to the scale of the
dataset, and sometimes require scaling of the features.
Multiple Linear Regression
 Pros:
1. Allows for the analysis of the effects of multiple predictors on the
outcome.
2. Useful for real-world scenarios where multiple factors influence the
outcome.
 Cons:
1. Assumes a linear relationship between predictors and outcome, which
might not always hold true.
2. Multicollinearity, where predictors are correlated with each other, can
be a problem and make the model interpretation difficult.
3. Like other regression models, outliers can heavily influence the model.
Support Vector Regression
Below are some keywords which are used in Support Vector Regression:
• Kernel: It is a function used to map lower-dimensional data into a
higher-dimensional space.
• Hyperplane: In a general SVM, it is a separation line between two classes, but
in SVR, it is the line which helps to predict the continuous variable and covers
most of the datapoints.
• Boundary lines: Boundary lines are the two lines drawn on either side of the
hyperplane, which create a margin for the datapoints.
• Support vectors: Support vectors are the datapoints which are nearest to the
hyperplane (in SVM, of the opposite class) and which determine its position.
 The main goal of SVR is to keep the
maximum number of datapoints within the
boundary lines, so that the hyperplane
(best-fit line) covers as many
datapoints as possible.
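A minimal sketch of support vector regression with scikit-learn's SVR, on invented one-dimensional data; kernel and parameter choices here are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical 1-D data with a non-linear trend
X = np.linspace(0, 6, 30).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 30)

# The RBF kernel maps the data into a higher-dimensional space;
# epsilon controls the width of the margin (boundary lines) around the fit
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)

print("number of support vectors:", len(svr.support_))
print("prediction at x=3:", svr.predict([[3.0]]))
```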
Decision Tree Regression
 Decision trees can solve both classification and
regression problems.
 Decision tree regression builds a tree-
like structure in which each internal
node represents a "test" on an
attribute, each branch represents an
outcome of the test, and each leaf node
represents the final decision or result.
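A minimal sketch with scikit-learn's DecisionTreeRegressor on invented data (the data and depth setting are assumptions, not part of the original slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical 1-D regression data
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0])

# max_depth limits how many "tests" can be stacked, which controls overfitting
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Each prediction is the mean target value of the leaf the sample falls into
print("prediction at x=4.5:", tree.predict([[4.5]]))
```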
Ridge Regression
 Ridge regression is a statistical regularization technique.
 It corrects for overfitting on training data in machine learning models.
 Ridge regression reduces the mean squared error by shrinking the
coefficients of a model towards zero: it accepts a small amount of bias in
the coefficient estimates in exchange for a large reduction in their
variance, in order to solve the problems of overfitting or multicollinearity
that are normally associated with ordinary least squares regression.
 Ridge regression specifically corrects for multicollinearity in
regression analysis.
• The amount of bias added to the model is known as the ridge
regression penalty. We can compute this penalty term by
multiplying lambda by the squared weight of each individual feature.
• The cost function for ridge regression will be:

Cost = Σ (yi − ŷi)² + λ Σ (bj)²

• A general linear or polynomial regression will fail if there is high
collinearity between the independent variables, so to solve such
problems, ridge regression can be used.
• Ridge regression is a regularization technique which is used to
reduce the complexity of the model. It is also called L2
regularization.
• It helps to solve problems when we have more parameters than
samples.
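A minimal sketch of ridge regression with scikit-learn on invented collinear data; in scikit-learn the alpha parameter plays the role of lambda (these names and values are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with two highly correlated (collinear) features
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)   # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=50)

# Larger alpha (lambda) shrinks the coefficients more strongly towards zero
ridge = Ridge(alpha=1.0).fit(X, y)
print("ridge coefficients:", ridge.coef_)
```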
Lasso Regression:

• Lasso regression is another regularization technique to reduce the
complexity of the model.
• It is similar to ridge regression, except that the penalty term contains
only the absolute values of the weights instead of their squares.
• Since it takes absolute values, it can shrink a coefficient exactly to 0,
whereas ridge regression can only shrink it close to 0.
• It is also called L1 regularization. The cost function for lasso
regression will be:

Cost = Σ (yi − ŷi)² + λ Σ |bj|
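A minimal sketch of lasso regression with scikit-learn on invented data, showing how the L1 penalty can zero out irrelevant coefficients (library, alpha value, and data are assumptions, not part of the original slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: only the first of three features actually matters
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=50)

# The L1 penalty can drive irrelevant coefficients exactly to zero
lasso = Lasso(alpha=0.1).fit(X, y)
print("lasso coefficients:", lasso.coef_)   # expect roughly [4, 0, 0]
```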
