MAP 716 Lecture 5 Multiple Regression

Multiple regression is an extension of simple linear regression that involves one response variable and two or more explanatory variables. It allows modeling of more complex relationships between variables by accounting for the effects of multiple explanatory variables simultaneously. The multiple regression model expresses the response variable as a linear combination of the predictor variables. Regression coefficients represent the effect of each predictor after controlling for all other variables in the model.


5/23/2023

MAP 716: BIOSTATISTICS II AND COMPUTING


Multiple Regression

Lecture 5: Multiple Regression
Dr Alice Lakati, PhD
Senior Lecturer
Amref International University

• Simple linear regression describes the linear relation between a response variable Y and a single explanatory variable X
• Multiple regression is an extension to the case of one response variable and two or more explanatory variables
• In multiple linear regression a linear model is fitted for the response variable, which is expressed as a linear combination of the predictors
• Y' = b0 + b1X1 + b2X2 + … + bkXk
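The model above can be fitted by least squares. A minimal sketch, using made-up data generated from known coefficients (b0=3.0, b1=1.5, b2=-2.0, chosen here purely for illustration) so the fitted values can be checked against the truth:

```python
import numpy as np

# Generate illustrative data from known coefficients (no noise, so the
# least squares fit recovers them exactly up to floating-point error)
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), x1, x2])

# Solve for b = (b0, b1, b2) by least squares
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers b0=3.0, b1=1.5, b2=-2.0
```

With real data the response contains error, so the recovered coefficients are estimates rather than exact values.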

MR….

• b0 is the value of Y when all of the independent variables (X1 through Xk) are equal to zero, and b1 through bk are the estimated regression coefficients.
• Each regression coefficient represents the change in Y relative to a one unit change in the respective independent variable.
• In the multiple regression situation, b1, for example, is the change in Y relative to a one unit change in X1, holding all other independent variables constant (i.e., when the remaining independent variables are held at the same value or are fixed).
• Statistical tests can be performed to assess whether each regression coefficient is significantly different from zero.

Multiple Regression…..

• Multiple regression is performed for several reasons:
• The need to control or adjust for possible effects of "nuisance" explanatory variables; it can be used to adjust for confounding variables
• More explanatory variables may have a meaningful relationship with the response variable, and these more complex relationships need to be investigated
• It is almost always better to perform one comprehensive analysis including all the relevant variables than a series of two-way comparisons
• To analyze the simultaneous effect of a number of categorical variables; it is an alternative technique to analysis of variance
• To predict a value of the outcome variable

Multiple Regression….

• Multiple regression models can take various forms
• Multiple linear regression: predictors all continuous and linearly related to the outcome variable
• Polynomial regression: quadratic or higher order terms fitted
• Analysis of covariance: both continuous and categorical variables are included in the model
• Analysis of variance: predictors all categorical

Importance of predictors

• The regression coefficient bi represents the effect of that explanatory variable after controlling for all the other predictors in the model
• The importance of each individual predictor is tested by the t-test, as for SLR
• A confidence interval will give further information for the regression parameter
• An ANOVA table can be obtained and the significance may be assessed via the F-test


MR, Explanatory variables

• A first order model means linear in both X and β
• The parameter β0 is the intercept of the response surface plane
• The parameter β1 is the change in µ per unit increase in X1 while X2 is held constant. It is the effect of X1 on the mean response.
• When the slope of a variable (say X1) does not depend on the level of the other variable, the model is additive or non-interactive
• The parameters β1 and β2 are called partial regression coefficients because they reflect the effect of one explanatory variable when the other is held constant

Example: Framingham Offspring Study

• Suppose we want to assess the association between BMI and systolic blood pressure using data collected in the seventh examination. A total of n=3,539 participants attended the exam, and their mean systolic blood pressure was 127.3 with a standard deviation of 19.0.
• The mean BMI in the sample was 28.2 with a standard deviation of 5.3.

Regression coefficients - SLR

Independent variable   Regression coefficient   T       P value
Intercept              108.28                   62.61   0.0001
BMI                    0.67                     11.06   0.0001

The regression coefficient associated with BMI is 0.67, suggesting that each one unit increase in BMI is associated with a 0.67 unit increase in systolic blood pressure.
The association between BMI and systolic blood pressure is also statistically significant (p=0.0001).

MR

Suppose we now want to assess whether age (a continuous variable, measured in years), male gender (yes/no), and treatment for hypertension (yes/no) are potential confounders, and if so, appropriately account for these using multiple regression (ANCOVA) analysis. For analytic purposes, treatment for hypertension is coded as 1=yes and 0=no. Gender is coded as 1=male and 0=female (indicator variables).

Independent variable         Regression coefficient   T value   P value
Intercept                    68.15                    26.33     0.0001
BMI                          0.58                     10.30     0.0001
Age                          0.65                     20.22     0.0001
Male gender                  0.94                     1.58      0.1133
Treatment for hypertension   6.44                     9.74      0.0001
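Since each T value is the coefficient divided by its standard error (t = b/SE), the standard errors implied by the table above can be backed out as SE = b/t. A small sketch of that check, using the reported numbers:

```python
# (coefficient, t value) pairs taken from the multiple regression table
coefficients = {
    "BMI": (0.58, 10.30),
    "Age": (0.65, 20.22),
    "Male gender": (0.94, 1.58),
    "Treatment for hypertension": (6.44, 9.74),
}

# Implied standard error for each predictor: SE = b / t
for name, (b, t) in coefficients.items():
    print(name, round(b / t, 3))
```

The large implied SE for male gender (roughly 0.6) relative to its coefficient (0.94) is why that predictor is not significant (p=0.1133).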

Indicator variables

• Independent variables can be qualitative, e.g. gender (M/F), income group.
• Indicator or dummy variables are used to quantify the effects of the levels or classes of a qualitative variable
• If a qualitative variable has k levels, then k-1 indicator variables will be created to represent that variable
• An indicator variable has the form 1 if the characteristic occurs and 0 otherwise

MR

The multiple regression model is:
SBP = 68.15 + 0.58 (BMI) + 0.65 (Age) + 0.94 (Male gender) + 6.44 (Treatment for hypertension)

• Take note that the association between BMI and systolic blood pressure is smaller (0.58 versus 0.67) after adjustment for age, gender and treatment for hypertension.
• BMI remains statistically significantly associated with systolic blood pressure (p=0.0001), but the magnitude of the association is lower after adjustment.
• The regression coefficient decreases by 13%.
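The k-1 coding rule above can be sketched in a few lines. Here a hypothetical qualitative variable with k=3 levels ("low", "medium", "high" income) is coded into 2 indicator columns, with the first level as the reference:

```python
def dummy_code(values, levels):
    """Code a qualitative variable with k levels into k-1 indicators.

    The first entry of `levels` is the reference class and gets all
    zeros; every other level gets a 1 in its own column.
    """
    reference, *kept = levels
    return [[1 if v == level else 0 for level in kept] for v in values]

groups = ["low", "medium", "high", "medium"]
coded = dummy_code(groups, ["low", "medium", "high"])
print(coded)  # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

Each regression coefficient on an indicator column is then interpreted as the difference in mean response between that level and the reference level.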


MR

• In this case the true "beginning value" was 0.58, and confounding caused it to appear to be 0.67, so the actual % change = 0.09/0.58 = 15.5%.
• Using the rule (i.e., a change in the coefficient in either direction by 10% or more), we meet the criteria for confounding.
• Thus, part of the association between BMI and systolic blood pressure is explained by age, gender and treatment for hypertension.
• It is important to keep gender in the model even though it is not significant.

Interpretation of regression coefficients

• A one unit increase in BMI is associated with a 0.58 unit increase in systolic blood pressure, holding age, gender and treatment for hypertension constant.
• Each additional year of age is associated with a 0.65 unit increase in systolic blood pressure, holding BMI, gender and treatment for hypertension constant.
• Men have higher systolic blood pressures, by approximately 0.94 units, holding BMI, age and treatment for hypertension constant, and persons on treatment for hypertension have higher systolic blood pressures, by approximately 6.44 units, holding BMI, age and gender constant.

Interpretations

• The multiple regression equation can be used to estimate systolic blood pressures as a function of a participant's BMI, age, gender and treatment for hypertension status.
• We can estimate the blood pressure of a 50 year old male, with a BMI of 25, who is not on treatment for hypertension.
• We can also estimate the blood pressure of a 50 year old female, with a BMI of 25, who is on treatment for hypertension.
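The worked arithmetic on the original slides was shown as images that did not survive extraction; the two estimates can be recomputed directly from the reported coefficients:

```python
def predict_sbp(bmi, age, male, treated):
    """Predicted systolic blood pressure from the fitted model
    SBP = 68.15 + 0.58*BMI + 0.65*Age + 0.94*Male + 6.44*Treatment."""
    return 68.15 + 0.58 * bmi + 0.65 * age + 0.94 * male + 6.44 * treated

# 50 year old male, BMI 25, not on treatment for hypertension
print(round(predict_sbp(25, 50, male=1, treated=0), 2))  # 116.09
# 50 year old female, BMI 25, on treatment for hypertension
print(round(predict_sbp(25, 50, male=0, treated=1), 2))  # 121.59
```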

ANOVA table for MR

Analysis of Variance Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square       F Ratio          p
Regression            Reg SS           k                    Reg SS/k          Reg MS/Res MS
Residual              Res SS           n-k-1                Res SS/(n-k-1)
Total                 Total SS         n-1


Fit of the Model

• R2 measures the usefulness or predictive value of the model
• R2 is interpreted as the proportion of the total variability explained by the model
• But it increases in value as each additional variable is added to the model
• Adjusted R2 (the preferred measure) takes into account the number of explanatory variables included in the model
• The overall F-test from the ANOVA table tests whether the proportion of variation explained by the model is a significant portion compared to the unexplained variation

Tutorial 1

• A researcher recruited 100 participants to perform a maximum VO2max test, and also recorded their "age", "weight", "heart rate" and "gender".
• Heart rate is the average of the last 5 minutes of a 20 minute, much easier, lower workload cycling test.
• The researcher's goal is to be able to predict VO2max based on these four attributes: age, weight, heart rate and gender.
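The two fit measures above can be sketched as short formulas. The numbers here (R2 = 0.80, Reg SS = 80, Res SS = 20, n = 100, k = 4) are illustrative values chosen to match the tutorial's sample size and predictor count, not results from real data:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 penalises R2 for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_ratio(reg_ss, res_ss, n, k):
    """Overall F statistic: (Reg SS / k) / (Res SS / (n - k - 1))."""
    return (reg_ss / k) / (res_ss / (n - k - 1))

print(round(adjusted_r2(0.80, 100, 4), 4))  # 0.7916
print(round(f_ratio(80, 20, 100, 4), 1))    # 95.0
```

Note that adjusted R2 is always at most R2, and the gap widens as more predictors are added for a fixed sample size.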

Multiple Regression Outputs using SPSS

Questions
• 1. Interpret the results from the ANOVA table (3 marks)
• 2. Explain the fitness of the model (3 marks)
• 3. Interpret the importance of each coefficient or predictor (8 marks)
• 4. State the regression equation (4 marks)
• 5. Write a summary of the results (6 marks)

Assumptions

• Random sampling
• Observations must be independent
• The relation between each of the explanatory variables and the outcome variable should be linear
• The values of the response variable Y should have a normal distribution for specified values of the explanatory variables
• The variability of Y should be the same for any set of values of the explanatory variables - Homoscedasticity

Assumptions cont'

• Such assumptions are tested by:
• Assessing normality
• Obtaining scatter plots of Y or the residuals
• Obtaining a plot of the standardized residuals against the fitted values to assess the constant variance


Interactions

• Interactions exist between 2 explanatory variables if the relationship between the mean response and one explanatory variable is dependent on the value of the other explanatory variable

Interactions: Example

• In a chemical process, the additive effects of 2 drugs are not an accurate reflection of their combined effect, since catalytic effects are often present.
• A hospital administrator used data from 15 patients to examine the relationship between the length of stay in hospital Y (in days), the age of patient X1 and previous admissions X2
• It would be appropriate to use the model Y = β0 + β1X1 + β2X2 + e

Interaction…

• To describe the LOS data, the equation E(Y) = (β0 + β1X1) + β2X2 assumes that for a fixed value of X1, the straight line relating E(Y) to X2 has a slope β2 that is independent of the fixed value of X1, i.e. for 2 different values of X1, the slopes of the straight lines relating E(Y) to X2 would be the same.
• Similarly for E(Y) = (β0 + β2X2) + β1X1
• Such a model assumes no interaction exists between X1 and X2

Interaction…

Consider the model
• Y = β0 + β1X1 + β2X2 + β3X1X2 + e
• This uses a cross product or interaction term β3X1X2
• The equation E(Y) = (β0 + β1X1) + (β2 + β3X1)X2 = (β0 + β2X2) + (β1 + β3X2)X1
• Assumes that for a fixed value of X1, the straight line relating E(Y) to X2 has a slope β2 + β3X1 that is dependent upon the fixed value of X1.
• That is, for two different fixed values of X1, the slopes of the straight lines relating E(Y) to X2 would be different.

Interactions

• Conversely, the equation assumes that for a fixed value of X2, the straight line relating E(Y) to X1 has a slope β1 + β3X2 that is dependent upon the fixed value of X2.
• That is, for 2 different values of X2, the slopes of the straight lines relating E(Y) to X1 would be different.
• To fit the interaction model, define an extra column X1X2 = age * previous admissions

Interactions example…

• The least squares estimates are β0 = -15.88, β1 = 1.734, β2 = 7.911, β3 = -0.245.
• Suppose someone is 40 years old and has had two previous admissions; determine the LOS.


Interactions example…

• The least squares estimates are β0 = -15.88, β1 = 1.734, β2 = 7.911, β3 = -0.245.
• Suppose someone is 40 years old and has had two previous admissions; determine the LOS:
• 49.702 days
• 53.09 days
• 69.057 days
• 50.103 days

Advantages of fitting MR

1. Since both types of patients assume equal slopes and the same error variance, the common slope β1 can best be estimated by pooling all the patients together
2. Comparing different levels of qualitative variables can be done by tests on the regression coefficient β2
3. Inferences on β0 and β2 can be made more precisely, since more degrees of freedom will be associated with the MSE
• Interactions can be introduced into the model
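The multiple-choice LOS question above can be checked by plugging the reported least squares estimates into the interaction model Y = β0 + β1X1 + β2X2 + β3X1X2:

```python
def predict_los(age, prev_admissions):
    """Predicted length of stay (days) from the fitted interaction model."""
    b0, b1, b2, b3 = -15.88, 1.734, 7.911, -0.245
    return b0 + b1 * age + b2 * prev_admissions + b3 * age * prev_admissions

# 40 years old with two previous admissions
print(round(predict_los(40, 2), 3))  # 49.702
```

So the first option, 49.702 days, is the correct answer. Note the negative β3: each previous admission reduces the slope of LOS on age by 0.245 days per year.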
