Module 8 Updated April 3 2018

This document provides an overview of linear regression analysis and correlation. It discusses evaluating the relationship between an outcome and one or more exposures. Key topics covered include correlation analysis, simple linear regression, multiple linear regression, and variable selection in multiple regression models. Examples are provided to demonstrate correlation analysis and interpreting linear regression results.

MPH5041/6041

Introductory Biostatistics

Tutorial
Module 8: Describing the Relationship Between
Events and Exposures.

Prepared by:
Baki Billah

Objectives

Objectives: To evaluate the association between outcome and one or more exposures.
Key Topics:
Correlation analysis.
Simple regression analysis.
Multiple regression analysis.
Selection of significant variables in multiple regression model.
Outcome variable: Continuous
Exposure variable: Any type

Review: Correlation

Correlation: Measures the association between outcome and exposure.
Notation: denoted by “r”, which lies between -1 and +1.
Key points (see page 158):
r = +1
r = -1
r = 0
r = +ve higher
r = -ve higher
Significance test: Null/Alternative hypotheses
Discussion & conclusion:

Correlation Analysis

The following graph/table shows the correlation analysis for birth weight and oestriol level of pregnant women near full term.
a) Discuss the scatter-plot,
b) Interpret the correlation co-efficient,
c) State the hypothesis,
d) Discuss the significance of relation,
e) Make a conclusion

Correlation Analysis

a) Discuss the scatter-plot,
The scatter-plot shows that birthweight and oestriol level are positively correlated (upward trend). That is, birthweight tends to increase as oestriol level increases. There is an approximately linear relationship between birthweight and oestriol level.

Correlation Analysis

b) Interpret the correlation co-efficient,
The Pearson correlation coefficient, r, is 0.617. This means that there is a moderate positive relationship between birthweight and oestriol level.

c) State the hypothesis,
Null: There is no association between birthweight and oestriol level in the population.
Alternative: There is an association between birthweight and oestriol level in the population.

Correlation Analysis

d) Discuss the significance of relation
The p-value is less than 0.001. Therefore we have enough evidence to reject the null hypothesis in favour of the alternative hypothesis: there is a statistically significant association between birthweight and oestriol level in the population.

e) Make a conclusion
Birthweight has a moderate positive correlation with oestriol level.
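
A minimal sketch of how r and its test statistic can be computed, assuming made-up data: the oestriol and birthweight values below are hypothetical and are not the tutorial data set, and the function names are illustrative. The statistic t = r * sqrt((n - 2) / (1 - r^2)) is compared with a t distribution on n - 2 degrees of freedom to obtain the p-value for the null hypothesis of no association.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: no correlation (t distribution, n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical values for illustration only (not the tutorial data).
oestriol = [7, 9, 12, 14, 16, 17, 19, 21, 24, 25]
birthweight = [25, 27, 27, 30, 32, 31, 33, 36, 35, 39]

r = pearson_r(oestriol, birthweight)
t = t_statistic(r, len(oestriol))
print(f"r = {r:.3f}, t = {t:.2f} on {len(oestriol) - 2} df")
```

A large absolute value of t relative to the t tables corresponds to a small p-value, which is the basis for rejecting the null hypothesis in part (d).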

Review: Regression

Assumptions (see page 164):
Normality.
Linearity.
Constant variability.
No outliers (check the data for the presence of outliers).

Review: Regression

Intercept “a”: constant/baseline effect.
Beta co-efficient “b”: can take any positive or negative value.
The beta coefficient determines the magnitude of the relationship between outcome and exposure.

Key points (see page 163):
b = +ve, outcome increases as the covariate increases
b = -ve, outcome decreases as the covariate increases
b = 0, covariate has no effect on the outcome
b = +ve higher, strong positive effect of the covariate on the outcome
b = -ve higher, strong negative effect of the covariate on the outcome

Model adequacy:
R-squared and adjusted R-squared,
Residual plot
NPP (normal probability plot)

Review: Regression Steps to follow

Objectives
Assumptions
Hypothesis
Statistical method(s)
Interpret beta coefficient and 95% CI
Discuss model adequacy
Discussion of Results & Conclusion

Exercise 8.1


The following tables and graphs show the linear regression analysis of husband’s SBP on wife’s SBP (obtained using SPSS) for the husband and wife SBP pair data in Table 8.1.
Note: outcome: husband, exposure: wife.
a) State the objective of the study.
b) State the assumptions related to linear regression analysis and discuss how you would evaluate them.
c) State the hypotheses for testing the relationship between husband-wife SBP.
d) Interpret the beta co-efficient and its 95% CI.
e) Discuss the results presented in the above tables and graphs and make a summary conclusion.

Exercise 8.1


a) Objectives: To evaluate the relationship between husband-wife SBP pairs.

b) Assumptions
The outcome variable follows the normal distribution (for example, the husband’s SBP follows the normal distribution). This assumption can be evaluated using a histogram, box-plot, Q-Q plot or the Shapiro-Wilk test.

The relationship between the outcome and covariate is linear.

The variability in the outcome remains the same across different values of the covariate.

c) Hypotheses
The null hypothesis to be tested is that there is no association between the husband’s and wife’s
SBPs in the population.
The alternative hypothesis is that there is an association between the husband’s SBP and wife’s
SBP.

d) Interpret the beta co-efficient and its 95% CI.
• The beta coefficient is 1.061, which means that the husband’s SBP increases by 1.061 mmHg on average for every 1 mmHg increase in wife’s SBP. It can be said with 95% confidence that for every 1 mmHg increase in wife’s SBP, husband’s SBP increases by between 0.655 and 1.467 mmHg.
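
The slope and its confidence interval can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure itself: the SBP pairs are hypothetical (not the Table 8.1 data), the function name is made up, and the critical value `t_crit` is taken from t tables by the caller (2.306 for 8 degrees of freedom at the 95% level).

```python
import math

def fit_simple_regression(x, y, t_crit):
    """Least-squares fit of y = a + b*x with a confidence interval for b.

    t_crit is the two-sided critical value of the t distribution on
    n - 2 degrees of freedom, looked up from tables by the caller.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope (beta coefficient)
    a = mean_y - b * mean_x        # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return a, b, (b - t_crit * se_b, b + t_crit * se_b)

# Hypothetical SBP pairs in mmHg (not the Table 8.1 data).
wife = [110, 118, 122, 125, 130, 134, 138, 142, 150, 155]
husband = [112, 125, 120, 133, 131, 145, 139, 150, 158, 160]

a, b, (low, high) = fit_simple_regression(wife, husband, t_crit=2.306)
print(f"b = {b:.3f} mmHg per mmHg, 95% CI {low:.3f} to {high:.3f}")
```

An interval built this way is symmetric about the estimate, which agrees with the tutorial's figures: the midpoint of 0.655 and 1.467 is 1.061.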

Exercise 8.1

e) Discussion and Conclusion:
There is a positive relationship between husband’s and wife’s SBP (b = 1.061), and the relationship is statistically significant because the p-value is < 0.001 and the 95% CI, 0.655 to 1.467 mmHg, does not include zero.

R² = 0.605. This implies that 60.5% of the variation among the observed values of husband’s SBP is explained by wife’s SBP.
Note: always use Adjusted R-squared
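
The two measures can be sketched as below. With a single covariate, R-squared equals the square of the Pearson correlation, so R-squared = 0.605 corresponds to r of about 0.78. The adjusted value depends on the sample size, which is not stated here, so `n=20` in the example is a hypothetical figure and the function names are illustrative.

```python
def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p covariates."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R-squared is the tutorial's value; n = 20 is a hypothetical sample size.
print(f"adjusted R-squared = {adjusted_r_squared(0.605, n=20, p=1):.3f}")
```

The adjustment penalises extra covariates, which is why it is the preferred figure when models with different numbers of covariates are compared.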

The data follows the normal distribution if the dots are clustered around a straight line passing through the origin at the first quadrant. The data is skewed if the dots are scattered at either tail. The normal probability plot for husband’s SBP data shows that the data is approximately normal.
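
A rough sketch of the coordinates behind such a plot, assuming it is the P-P form (observed against expected cumulative probability, both running from 0 to 1, which is what a line through the origin suggests). The SBP values are hypothetical, the function name is made up, and the plotting position (i + 0.5) / n is one common convention rather than necessarily the one SPSS uses.

```python
from statistics import NormalDist, mean, stdev

def pp_plot_points(values):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    observed = [(i + 0.5) / n for i in range(n)]   # empirical cumulative probability
    expected = [fitted.cdf(v) for v in ordered]    # probability under the fitted normal
    return list(zip(observed, expected))

# Hypothetical husband's SBP values in mmHg.
sbp = [112, 120, 125, 131, 133, 139, 145, 150, 158, 160]

points = pp_plot_points(sbp)
max_gap = max(abs(obs - exp) for obs, exp in points)
print(f"largest departure from the diagonal: {max_gap:.3f}")
```

Points lying close to the diagonal (small gaps) are consistent with approximate normality; systematic departures at either end suggest skewness.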

Exercise 8.1


The residual plot does not show any pattern or trend, so the relationship between husband’s
SBP and wife’s SBP is linear. Also, more than 95% of the residuals fall within +/-2 of the zero
line, which shows that the residual/outcome follows the normal distribution. Further, the
residual plot does not show any outliers or extreme values in the data.
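
The check on the residuals can be sketched as below, assuming the residuals are standardized by dividing by the residual standard deviation. The residual values are hypothetical, the function names are illustrative, and the cut-off of 3 for flagging possible outliers is a common rule of thumb rather than something stated in this tutorial.

```python
import math

def standardized_residuals(residuals, n_params=2):
    """Divide each residual by the residual standard deviation.

    n_params is the number of estimated coefficients (intercept and slope
    in simple regression), so the divisor uses n - n_params degrees of freedom.
    """
    n = len(residuals)
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - n_params))
    return [e / s for e in residuals]

def share_within(z, limit=2.0):
    """Proportion of standardized residuals with absolute value <= limit."""
    return sum(1 for v in z if abs(v) <= limit) / len(z)

# Hypothetical residuals in mmHg (observed minus fitted husband's SBP).
residuals = [-6.1, 4.3, -2.0, 7.5, -0.8, 3.2, -5.4, 1.9, -3.7, 1.1]

z = standardized_residuals(residuals)
flagged = [round(v, 2) for v in z if abs(v) > 3]   # candidate outliers
print(f"{share_within(z):.0%} of standardized residuals lie within +/-2")
print("possible outliers:", flagged)
```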

Exercise 8.1


Summary of part (e) above:

Discussion: The beta coefficient is 1.061. The relationship is positive and statistically significant because
the p-value < 0.001 and 95% CI is 0.655 to 1.467mmHg, and does not include zero. The residual plot does
not show any outliers in the data set. Also the normal probability plot confirms the normality of husband’s
SBP, the outcome variable. Furthermore, 60.5% of the variation among the observed values of husband’s
SBP is explained by wife’s SBP, which shows that wife’s SBP is a strong predictor for husband’s SBP.

Conclusion: Husband’s SBP increases as wife’s SBP increases.

Exercise 8.2 – In-Tutorial Activity

The following SPSS output shows the simple regression analysis for systolic blood pressure (SBP) and patients’ age in the Los Angeles blood pressure study data.

a) State the assumptions related to linear regression analysis and discuss how you would evaluate them.
b) State the hypotheses for testing the relationship between SBP and AGE.
c) Interpret the beta co-efficient and its 95% CI.
d) Discuss the results and make a summary conclusion.

Exercise 8.4


Consider the disease Y study data in Module 4. The following SPSS output shows the
regression analysis results where M2 is the outcome variable and case-control is the
exposure.

a) Interpret the beta coefficient.
b) Interpret the 95% CI of the beta coefficient.
c) State the null and alternative hypothesis that you would be evaluating.
d) Do you think that the M2 is related to disease Y? Justify your answer.

Exercise 8.5


The following table shows the multiple regression analysis of SBP on AGE, BMI, DBP, SES and CL for the Los Angeles blood pressure study.
a) Interpret the beta co-efficient and 95% CI for the variables age and SES group.
b) The R-squared value was obtained as 59.2%. Discuss the prediction performance of the model.
c) Discuss at least two methods that are useful for evaluating the normality of the outcome variable in regression model.
d) Discuss the results and make a summary conclusion about the effect of these covariates on the SBP.

Exercise 8.5

a) Interpret the beta co-efficient and 95% CI for the variables age and SES group.
The SBP increases by 0.34 mmHg for every year of AGE increase, provided BMI, DBP, SES and CL remain the same. It can be said with 95% confidence that for every year of age, the SBP will increase by between 0.191 and 0.495 mmHg if BMI, DBP, SES and CL remain the same.

For SES Group, the beta coefficient is 0.236, which means that the difference between mean SBP for exposed (very high and high SES) and unexposed (very poor to moderate SES) is 0.236 mmHg, provided AGE, BMI, DBP and CL remain the same. It can be said with 95% confidence that the difference between mean SBP for exposed and unexposed will lie between -3.390 and 3.862 mmHg, provided AGE, BMI, DBP and CL remain the same.
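
As a small worked illustration, the quoted coefficients can be rescaled and their intervals checked against zero. The figures are the ones given in the answer to part (a); the function names are illustrative, and rescaling relies on the model being linear in each covariate.

```python
def effect_for(units, beta, ci_low, ci_high):
    """Scale a per-unit coefficient and its CI to a change of `units`."""
    return beta * units, ci_low * units, ci_high * units

def ci_excludes_zero(ci_low, ci_high):
    """True when the interval lies entirely above or entirely below zero."""
    return ci_low > 0 or ci_high < 0

# Coefficients and 95% CIs quoted in the answer to part (a).
age = (0.34, 0.191, 0.495)
ses = (0.236, -3.390, 3.862)

est, low, high = effect_for(10, *age)
print(f"10 years of age: {est:.2f} mmHg (95% CI {low:.2f} to {high:.2f})")
print("AGE CI excludes zero:", ci_excludes_zero(*age[1:]))
print("SES CI excludes zero:", ci_excludes_zero(*ses[1:]))
```

A 10-year age difference therefore corresponds to about 3.4 mmHg (1.91 to 4.95 mmHg) with the other covariates held fixed. The AGE interval excludes zero while the SES interval includes it, so only AGE shows a statistically significant effect among these two.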

Exercise 8.5


b) The R-squared value was obtained as 59.2%. Discuss the prediction performance of the model.
The Model 1 adjusted R² value of 0.592 means that 59.2% of the variation in SBP is explained by BMI, SES Group, AGE, CL and DBP.

Notes: After step-wise backward elimination, only the significant covariates (AGE and DBP) remain in the model. The Model 4 adjusted R² value of 0.595 means that 59.5% of the variation in systolic blood pressure is explained by the patients’ age and diastolic blood pressure.
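
One common form of backward elimination can be sketched as below. This is not a description of the SPSS routine: the removal rule (drop the covariate with the largest p-value until all are below `alpha`) is an assumption, the function names are made up, and the p-values in `TOY_P` are hypothetical stand-ins for a real refit of the model at each step.

```python
def backward_elimination(covariates, fit_p_values, alpha=0.05):
    """Drop the least significant covariate until all p-values are below alpha.

    fit_p_values(covariates) must refit the model and return {name: p_value}.
    Returns the list of models visited, one covariate list per step.
    """
    current = list(covariates)
    models = [list(current)]
    while current:
        p_values = fit_p_values(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break                      # every remaining covariate is significant
        current.remove(worst)
        models.append(list(current))
    return models

# Stand-in for a real refit: hypothetical, fixed p-values for illustration only.
TOY_P = {"AGE": 0.001, "DBP": 0.001, "BMI": 0.40, "SES": 0.90, "CL": 0.60}

def toy_fit(covariates):
    return {name: TOY_P[name] for name in covariates}

for number, model in enumerate(backward_elimination(list(TOY_P), toy_fit), start=1):
    print(f"Model {number}: {model}")
```

Starting from five covariates and removing one per step, the fourth model is the first with two covariates, which is consistent with the notes referring to Model 4 as the model containing AGE and DBP.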

c) Discuss at least two methods that are useful for evaluating the normality of the outcome variable in
regression model.
NPP (Normal Probability Plots) or Residual Plots

Exercise 8.5


d) Discuss the results and make a summary conclusion about the effect of these covariates on the SBP.

SPSS Lab

• Combining categories using SPSS Lab 2 (see Activity 2.10-12).
