Lecture 09_02.09.2024_Regression-01
Regression - 1
Significance Testing
Residual Analysis
What is “Linear”?
• Remember this:
• Y = mX + b?
• A slope of 2 means that every one-unit change in X yields a two-unit change in Y.
Assumptions
• If you know something about X, this knowledge helps you predict something about
Y. (Sound familiar?…sound like conditional probabilities?)
• The expected value of Y for a given X, E[Y | X], lies exactly on the line (the fixed component); the observed Y deviates from it by a random error with a normal distribution.
Significance Testing
• Formula for the standard error of beta. We divide by n − 2 because we lose two degrees of freedom (one each for the slope and the y-intercept).
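As a sketch of the n − 2 correction, the standard error of the slope can be computed from the residuals of a simple linear fit; the data below is synthetic, made up purely for illustration:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus normal noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Least-squares fit of y = b0 + b1 * x.
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# s^2 = SSE / (n - 2): two degrees of freedom spent on slope + intercept.
n = x.size
s2 = np.sum(residuals**2) / (n - 2)

# SE(b1) = s / sqrt(sum((x - xbar)^2))
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
print(b1, se_b1)
```

A large fitted slope relative to its standard error (a large t-statistic b1 / SE(b1)) is what significance testing checks.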
• Residual
• The residual for observation i, eᵢ = yᵢ − ŷᵢ, is the difference between its observed and predicted value
• Check the assumptions of regression by examining the residuals
• Examine for linearity assumption
• Examine for constant variance for all levels of X (homoscedasticity)
• Evaluate normal distribution assumption
• Evaluate independence assumption
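The checks above can be sketched numerically on synthetic data; the thresholds and the lag-1 autocorrelation check for independence are illustrative choices, not part of the slides:

```python
import numpy as np

# Synthetic linear data with well-behaved residuals (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# Linearity: residuals should scatter around zero with no trend.
print(resid.mean())

# Homoscedasticity: residual spread should be similar across X.
spread_lo = resid[x < 5].std()
spread_hi = resid[x >= 5].std()
print(spread_lo, spread_hi)

# Independence: lag-1 autocorrelation of residuals should be near zero.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(r1)
```

In practice these checks are usually done visually (residuals-vs-fitted plots, Q-Q plots); the numeric versions here just mirror the same ideas.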
Analysis For Linearity
Analysis For Homoscedasticity
Analysis For Independence
Overall Picture
Summary
• Modelling the relation between a scalar response (the target/dependent variable) and one or more explanatory (feature/independent) variables
• The case of one explanatory variable is called simple linear regression
• It is important to assess the quality of fit using some statistical tools that were mentioned
“ How can we solve for multilinear regression (more than one feature)?
Normal Equation
Computational Aspects
Normal Equation
Linear Regression:
$$y_i = f(x_i) = b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_m x_{im} + \epsilon_i = b_0 \cdot 1 + \sum_{j=1}^{m} x_{ij} b_j + \epsilon_i$$
Derivation
• Find the optimal coefficients that best fit the model by solving the equation:
Derivation
• Can we use SVD or any other direct method to solve the normal equations?
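Yes: a minimal sketch, comparing a direct normal-equation solve against NumPy's SVD-based least-squares solver (the data and coefficients below are made up for illustration):

```python
import numpy as np

# Synthetic design matrix: bias column of ones plus m random features.
rng = np.random.default_rng(2)
n, m = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
true_b = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_b + rng.normal(scale=0.1, size=n)

# Direct solve of the normal equations (X^T X) b = X^T y.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least squares: numerically safer when X^T X is ill-conditioned.
b_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_normal)
print(b_svd)
```

Both give the same answer here; the SVD route avoids explicitly forming X^T X, whose condition number is the square of that of X.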
Computational Complexity
• We can derive the normal equation by considering many features and many
training examples.
• Solving the normal equation is computationally expensive because it requires inverting (or factorizing) XᵀX.
• Hence iterative methods are preferred, to avoid large computational overheads.
“ How do we perform linear regression when the training data is large?
Gradient Descent
Intuition
Choosing hyperparameters
Intuition
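A minimal gradient-descent sketch for linear regression; the learning rate and iteration count below are illustrative hyperparameter choices, and the data is synthetic:

```python
import numpy as np

# Synthetic data: y = 4x + 1 plus small noise (illustrative only).
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 1, size=n)
y = 4.0 * x + 1.0 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])   # bias column + feature
b = np.zeros(2)                        # start from zero coefficients
lr = 0.5                               # learning rate (hyperparameter)

for _ in range(2000):
    grad = (2.0 / n) * X.T @ (X @ b - y)   # gradient of the MSE loss
    b -= lr * grad                         # step opposite the gradient

print(b)   # converges toward [intercept ≈ 1, slope ≈ 4]
```

Too large a learning rate makes the iteration diverge; too small a rate makes convergence slow, which is the hyperparameter trade-off the slides point to.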
R2
Mean Squared Error
• If yᵢ is the target value and fᵢ is the predicted value for a linear regression fit on n samples, the mean squared error of the fit is MSE = (1/n) Σᵢ (yᵢ − fᵢ)²
• Likewise, the mean absolute error of the fit is MAE = (1/n) Σᵢ |yᵢ − fᵢ|
• If yᵢ is the target value, the mean target value for n samples is ȳ = (1/n) Σᵢ yᵢ
• The residual sum of squares for the fit, RSS = Σᵢ (yᵢ − fᵢ)² = n · MSE, is proportional to the MSE
• R² = 1 − RSS / TSS, where TSS = Σᵢ (yᵢ − ȳ)²
• What does R² = 0 imply? Which one would you say is a better fit?
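The metrics above can be computed directly; the toy targets and predictions below are made up for illustration:

```python
import numpy as np

# Toy targets y_i and predictions f_i (illustrative values only).
y = np.array([3.0, 5.0, 7.0, 9.0])
f = np.array([2.5, 5.5, 6.5, 9.5])

mse = np.mean((y - f) ** 2)             # mean squared error
mae = np.mean(np.abs(y - f))            # mean absolute error
rss = np.sum((y - f) ** 2)              # residual sum of squares = n * MSE
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = 1.0 - rss / tss

print(mse, mae, r2)
```

R² = 0 would mean RSS = TSS, i.e., the fit predicts no better than the constant model that always outputs the mean ȳ.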
“ How can we fit non-linear data?
Non-Linear Regression
Polynomial Regression
• How to fit nonlinear data, that is, when the relation between the
target value and the features is nonlinear?
Quadratic Regression
• How about y = b₀ + b₁x + b₂x²?
• Is it still linear?
Quadratic Regression
• How about sample data with two features? How will you write the
target y as a function of features x1 and x2 for a quadratic regression
problem?
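One way to answer: expand the two features into the quadratic terms [1, x₁, x₂, x₁², x₂², x₁x₂] and fit an ordinary linear regression on them — the model stays linear in the coefficients. The data and coefficient values below are synthetic, for illustration:

```python
import numpy as np

# Synthetic quadratic surface in two features (illustrative only).
rng = np.random.default_rng(4)
n = 300
x1 = rng.uniform(-1, 1, size=n)
x2 = rng.uniform(-1, 1, size=n)
y = (1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1**2 + 3.0 * x1 * x2
     + rng.normal(scale=0.05, size=n))

# Quadratic feature expansion: [1, x1, x2, x1^2, x2^2, x1*x2].
Phi = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])

# Ordinary least squares on the expanded features.
b, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(b, 2))
```

Higher-order polynomial regression works the same way, with the feature expansion growing to include cubic and higher terms.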
Higher-Order Polynomial Regression