STAT22209 - Chapter 03-Multiple Regression - 2022
• Some data follow curves, not straight lines. In these instances, the linear trend model does not fit the data well.
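• The quadratic trend model is $\hat{Y} = a + bX + cX^2$,
• where $a$, $b$ and $c$ are numerical constants.
• We can determine the values of the numerical constants from the following three (least-squares normal) equations:
– $\sum Y = na + b\sum X + c\sum X^2$
– $\sum XY = a\sum X + b\sum X^2 + c\sum X^3$
– $\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4$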
Second-degree Polynomial – Quadratic Trend
Applications
• Fit a polynomial to the following data.
X Y
0 0
1 0
2 2
3 6
4 12
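As a worked check on this table, the sums below are computed from the five data points and substituted into the three equations, which are then solved by elimination. This assumes the quadratic form $\hat{Y} = a + bX + cX^2$ and the usual least-squares normal equations, with $n = 5$:

```latex
\begin{aligned}
n &= 5, \quad \sum X = 10, \quad \sum X^2 = 30, \quad \sum X^3 = 100, \quad \sum X^4 = 354\\
\sum Y &= 20, \quad \sum XY = 70, \quad \sum X^2Y = 254\\[4pt]
\sum Y = na + b\sum X + c\sum X^2 &\;\Rightarrow\; 20 = 5a + 10b + 30c\\
\sum XY = a\sum X + b\sum X^2 + c\sum X^3 &\;\Rightarrow\; 70 = 10a + 30b + 100c\\
\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4 &\;\Rightarrow\; 254 = 30a + 100b + 354c\\[4pt]
\text{from the first equation: } a &= 4 - 2b - 6c\\
\text{substituting into the second: } 30 &= 10b + 40c \;\Rightarrow\; b = 3 - 4c\\
\text{substituting into the third: } 134 &= 40b + 174c = 120 + 14c \;\Rightarrow\; c = 1\\
b &= 3 - 4(1) = -1, \qquad a = 4 - 2(-1) - 6(1) = 0
\end{aligned}
```

So $\hat{Y} = X^2 - X$, which reproduces every Y value in the table exactly.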
Example
• The estimated quadratic regression equation is $\hat{Y} = 0 - 1X + 1X^2 = X^2 - X$, which passes through all five data points exactly.
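The three equations can also be solved numerically as a check on the estimated equation. The sketch below assumes NumPy is available and the model form $\hat{Y} = a + bX + cX^2$. `fit_quadratic` is a hypothetical helper name, not part of the course material. It builds the 3×3 system of sums, solves it with `np.linalg.solve`, and then cross-checks the result against `np.polyfit`, which returns coefficients highest power first.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of Y = a + b*X + c*X^2 via the three normal equations.

    fit_quadratic is a hypothetical helper written for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def s(p):
        # Sum of X**p over all observations (s(0) equals n)
        return np.sum(x ** p)

    # Coefficient matrix of the normal equations in the unknowns (a, b, c)
    A = np.array([
        [s(0), s(1), s(2)],
        [s(1), s(2), s(3)],
        [s(2), s(3), s(4)],
    ])
    # Right-hand sides: sum Y, sum XY, sum X^2 Y
    rhs = np.array([np.sum(y), np.sum(x * y), np.sum(x ** 2 * y)])
    a, b, c = np.linalg.solve(A, rhs)
    return float(a), float(b), float(c)

X = [0, 1, 2, 3, 4]
Y = [0, 0, 2, 6, 12]
a, b, c = fit_quadratic(X, Y)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")

# Cross-check: np.polyfit returns [c, b, a] (highest power first)
print(np.polyfit(X, Y, 2))
```

Up to floating-point rounding, both calls should give a = 0, b = -1 and c = 1. Solving the normal equations directly mirrors the hand method. `np.polyfit` is the more convenient choice once the method is understood.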
• Examples:
– revision time (measured in hours),
– intelligence (measured using IQ score),
– exam performance (measured from 0 to 100),
– weight (measured in kg)
Assumption #2:
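• Continuous variable: can take any value within a range (e.g., time or weight).
• Discrete variable: takes only separate, countable values.
• Examples of discrete variables: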
– number of students present
– number of red marbles in a jar
– number of heads when flipping three coins
– students’ grade level
Categorical Data (Data that is not numbers): Nominal Variable
• Sometimes there is no hierarchy in categorical data.
• For example, if eye colour was coded as
– 0: “Blue”
– 1: “Green”
– 2: “Brown”
the numbers would only be labels: “Brown” (2) is not greater than “Blue” (0).
Assumption #4:
• Data must not show multicollinearity, which occurs when
you have two or more independent variables that are highly
correlated with each other.
What is Multicollinearity?
The following data were collected on 20 individuals with high blood pressure:
1. blood pressure (y = BP, in mm Hg)
• The high correlation among some of the predictors suggests that data-based multicollinearity exists.
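A simple first screen for multicollinearity is to compute the correlation matrix of the predictors and flag highly correlated pairs. The sketch below assumes NumPy. The helper name `correlated_pairs` and the 0.8 cut-off are illustrative assumptions, not fixed standards. The data are hypothetical, not the blood-pressure data above.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """List predictor pairs whose |Pearson r| exceeds `threshold`.

    X is an (n observations x k predictors) array and names labels its columns.
    correlated_pairs and the 0.8 default are hypothetical, for illustration.
    """
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # k x k matrix
    pairs = []
    k = r.shape[0]
    for i in range(k):
        for j in range(i + 1, k):  # upper triangle only, skipping the diagonal
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical predictors: x2 is roughly 2 * x1, x3 is only weakly related.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
X = np.column_stack([x1, x2, x3])
print(correlated_pairs(X, ["x1", "x2", "x3"]))
```

Only the (x1, x2) pair is flagged here. Pairwise correlations can miss a predictor that is a combination of several others, which is why variance inflation factors are often checked as well.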
Assumption #5:
• There should be no
– significant outliers,
– high leverage points, or
– highly influential points.
• These different classifications of unusual points reflect the different
impact they have on the regression line.
What are outliers in the data?
• An outlier is an observation that lies an abnormal distance from other values
in a random sample from a population.
• The box plot is a useful graphical display for describing the behavior of the
data in the middle as well as at the ends of the distributions.
• The following quantities (called fences) are needed for identifying extreme
values in the tails of the distribution, where IQR = Q3 - Q1 is the interquartile range:
– lower inner fence: Q1 - 1.5*IQR
– upper inner fence: Q3 + 1.5*IQR
– lower outer fence: Q1 - 3*IQR
– upper outer fence: Q3 + 3*IQR
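The fences can be computed directly from the quartiles. The sketch below assumes NumPy and uses `np.percentile` with its default linear interpolation. Other textbook quartile rules can shift Q1 and Q3 slightly, and so move the fences. Points between an inner and an outer fence are commonly called mild outliers, and points beyond an outer fence extreme outliers. The helper names and the sample data are hypothetical.

```python
import numpy as np

def box_plot_fences(data):
    """Inner and outer fences from the quartiles (hypothetical helper)."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1  # interquartile range
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }

def classify_outliers(data):
    """Split points into mild (between the fences) and extreme (beyond the outer fences)."""
    f = box_plot_fences(data)
    mild, extreme = [], []
    for x in data:
        if x < f["lower_outer"] or x > f["upper_outer"]:
            extreme.append(x)
        elif x < f["lower_inner"] or x > f["upper_inner"]:
            mild.append(x)
    return mild, extreme

sample = [1, 2, 3, 4, 5, 6, 7, 15, 30]  # hypothetical data
print(box_plot_fences(sample))
print(classify_outliers(sample))
```

For this sample, Q1 = 3, Q3 = 7 and IQR = 4, so the upper inner and outer fences are 13 and 19. The value 15 is therefore a mild outlier and 30 an extreme one.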
• The observations should be independent (i.e., there should be no autocorrelation in the residuals), which you can easily check using the Durbin-Watson statistic.
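The Durbin-Watson statistic is $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, computed on residuals in observation (or time) order. It lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. Formal decisions use tabulated critical values. The minimal sketch below assumes NumPy. The function name is illustrative, although statsmodels ships an equivalent `durbin_watson` in `statsmodels.stats.stattools`.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals in observation/time order."""
    e = np.asarray(residuals, dtype=float)
    # Numerator: squared differences of successive residuals (t = 2..n)
    # Denominator: residual sum of squares (t = 1..n)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

print(durbin_watson([1, -1, 1, -1]))        # alternating signs: 3.0
print(durbin_watson([0.5, 0.4, 0.6, 0.5]))  # slowly drifting residuals: close to 0
```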
Assumption #6:
• There needs to be a linear relationship between
– the dependent variable and each of your independent
variables
Assumption #7:
• Finally, you need to check that the residuals (errors) are
approximately normally distributed.
• Where is total production, is labor input, is total capital, and
the information about each factor is given below for the 15-year
period of 2001 to 2016.
Year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10 15 21 26 40 37 42 33 30 38 60 65 50 35 42
12 10 9 8 5 7 4 5 7 5 3 4 3 1 2
• Using the above data, estimate the and parameters of