Module 8 Updated April 3 2018
Introductory Biostatistics
Tutorial
Module 8: Describing the Relationship Between Events and Exposures
Prepared by:
Baki Billah
Objectives
Review: Correlation
Correlation Analysis
e) Make a conclusion
Birthweight has a moderate positive correlation with oestriol level.
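Pearson's correlation coefficient behind a conclusion like this can be computed directly from paired data. The sketch below is a minimal illustration on hypothetical numbers (the oestriol and birthweight values are not reproduced here). The function names and the strength cut-offs of 0.3 and 0.7 are assumptions, since textbooks use different conventions for "weak", "moderate" and "strong".

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r, weak=0.3, strong=0.7):
    """Label a correlation; the cut-offs are hypothetical defaults."""
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= strong:
        strength = "strong"
    elif abs(r) >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical paired measurements (exposure, outcome)
exposure = [10, 12, 15, 17, 20, 22, 25]
outcome = [25, 31, 27, 33, 30, 36, 34]
r = pearson_r(exposure, outcome)
print(round(r, 3), describe_strength(r))
```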
Review: Regression
Review: Regression (Steps to Follow)
Objectives
Assumptions
Hypotheses
Statistical method(s)
Interpret the beta coefficient and its 95% CI
Discuss model adequacy
Discussion of Results & Conclusion
Exercise 8.1
The following tables and graphs (obtained using SPSS) show the linear regression analysis of husband’s SBP on wife’s SBP for the husband and wife pair data in Table 8.1.
Note: outcome = husband’s SBP; exposure = wife’s SBP.
a) State the objective of the study.
b) State the assumptions of linear regression analysis and discuss how you would evaluate them.
c) State the hypotheses for testing the relationship between husband and wife SBP.
d) Interpret the beta coefficient and its 95% CI.
e) Discuss the results presented in the above tables and graphs and make a summary conclusion.
Exercise 8.1
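b) Assumptions
The relationship between the outcome and the covariate is linear. This can be evaluated with a scatter plot of the outcome against the covariate or with a residual plot.
The outcome variable follows a normal distribution at each value of the covariate; equivalently, the residuals are normally distributed (for example, the husband’s SBP follows the normal distribution). This assumption can be evaluated using a histogram, a box plot, a Q-Q plot or the Shapiro-Wilk test.
The variability in the outcome remains the same across different values of the covariate (constant variance). This can be evaluated with a residual plot: the spread of the residuals should be similar across the fitted values.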
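A normal Q-Q plot pairs the sorted values (outcome or residuals) with the quantiles expected under a normal distribution; points lying close to a straight line support the normality assumption. A minimal sketch using only Python's standard library is shown below. The plotting positions (i - 0.5)/n are one common convention and an assumption here (SPSS may use a slightly different formula), and the function name `qq_points` and the residual values are hypothetical.

```python
from statistics import NormalDist


def qq_points(values):
    """Pair each sorted value with its expected standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n for i = 1..n (one common convention)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals from a fitted regression
residuals = [-4.1, 2.3, 0.5, -1.2, 3.8, -0.4, 1.1]
for z, value in qq_points(residuals):
    print(f"{z:6.2f}  {value:6.2f}")
```

Plotting the second column against the first gives the Q-Q plot; a clearly curved pattern would suggest skewness.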
c) Hypotheses
The null hypothesis is that there is no association between the husband’s and wife’s SBPs in the population (the population beta coefficient is zero).
The alternative hypothesis is that there is an association between the husband’s SBP and the wife’s SBP (the population beta coefficient is not zero).
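The same hypotheses can be written in terms of the population slope $\beta_1$ of the simple linear regression model, where $Y_i$ is the husband’s SBP and $X_i$ the wife’s SBP. The formulation below is the standard one and assumes $n$ independent husband and wife pairs, so the test statistic has $n - 2$ degrees of freedom; $\hat\beta_1$ is the estimated beta coefficient and $\operatorname{SE}(\hat\beta_1)$ its standard error.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),\\
H_0 &: \beta_1 = 0 \quad \text{(no association)},\\
H_1 &: \beta_1 \neq 0 \quad \text{(association)},\\
t &= \frac{\hat\beta_1}{\operatorname{SE}(\hat\beta_1)} \sim t_{n-2} \ \text{under } H_0,\\
95\%\ \text{CI} &= \hat\beta_1 \pm t_{n-2,\,0.975}\,\operatorname{SE}(\hat\beta_1).
\end{aligned}
```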
Exercise 8.1
The residual plot does not show any pattern or trend, which suggests that the relationship between husband’s SBP and wife’s SBP is linear. Also, more than 95% of the standardized residuals fall within ±2 of the zero line, which is consistent with the residuals (and hence the outcome) following a normal distribution. Further, the residual plot does not show any outliers or extreme values in the data.
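The "within ±2" check can be reproduced from raw data. The sketch below fits a simple least-squares line and reports the share of standardized residuals (residual divided by the residual standard error, with n - 2 degrees of freedom) that lie within ±2. It is a minimal illustration: the function names and the SBP values are hypothetical, and studentized residuals, which also account for leverage, are a common alternative.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def share_within_two(x, y):
    """Proportion of standardized residuals that fall inside [-2, 2]."""
    b0, b1 = fit_simple_ols(x, y)
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    n = len(residuals)
    # Residual standard error: two parameters estimated, so n - 2 degrees of freedom
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if s == 0:
        return 1.0  # perfect fit: every residual is zero
    return sum(abs(r / s) <= 2 for r in residuals) / n


# Hypothetical wife (x) and husband (y) SBP values in mmHg
wife = [110, 118, 125, 131, 140, 146, 152, 160]
husband = [115, 121, 131, 133, 147, 150, 158, 168]
print(fit_simple_ols(wife, husband), share_within_two(wife, husband))
```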
Exercise 8.1
Discussion: The beta coefficient is 1.061, which means that for each 1 mmHg increase in wife’s SBP, the husband’s mean SBP is estimated to increase by 1.061 mmHg. The relationship is positive and statistically significant because the p-value is < 0.001 and the 95% CI (0.655 to 1.467 mmHg) does not include zero. The residual plot does not show any outliers in the data set, and the normal probability plot supports the normality of husband’s SBP, the outcome variable. Furthermore, 60.5% of the variation in the observed values of husband’s SBP is explained by wife’s SBP, which indicates that wife’s SBP is a strong predictor of husband’s SBP.
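Because the slope is expressed per 1 mmHg of wife’s SBP, it can help to rescale it to a larger, clinically meaningful difference. The short calculation below uses only the values reported in this exercise; the 10 mmHg step is an arbitrary choice for illustration, and treating 60.5% as the (unadjusted) R² is an assumption. In simple linear regression the square root of R² equals the absolute value of Pearson’s correlation, and its sign follows the slope.

```python
import math


def scale_effect(beta, lower, upper, units):
    """Rescale a slope and its 95% CI to a change of `units` in the exposure."""
    return beta * units, lower * units, upper * units


# Exercise 8.1 estimates: per 1 mmHg increase in wife's SBP
est, lo, hi = scale_effect(1.061, 0.655, 1.467, 10)
print(f"Per 10 mmHg higher wife's SBP: {est:.2f} mmHg (95% CI {lo:.2f} to {hi:.2f})")

# Assumes 60.5% is the unadjusted R-squared of this simple regression
r = math.sqrt(0.605)  # positive because the slope is positive
print(f"Implied Pearson correlation: {r:.2f}")
```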
Exercise 8.2 – In-Tutorial Activity
The following SPSS output shows the simple linear regression analysis of systolic blood pressure (SBP) on patients’ age for the Los Angeles blood pressure study data.
Exercise 8.4
Consider the disease Y study data in Module 4. The following SPSS output shows the regression analysis results, where M2 is the outcome variable and case-control status is the exposure.
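When the exposure is a binary indicator, the regression slope has a direct interpretation: it equals the difference in mean outcome between the two groups. The sketch below checks this on hypothetical data, assuming cases are coded 1 and controls 0 (the coding used in the SPSS output may differ, which would flip the sign); the function names are hypothetical. The same reasoning underlies the interpretation of a binary covariate such as SES group in a multiple regression, where the slope becomes an adjusted difference in means.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def mean_difference(y, group):
    """Mean of y where group == 1 minus mean of y where group == 0."""
    cases = [v for v, g in zip(y, group) if g == 1]
    controls = [v for v, g in zip(y, group) if g == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Hypothetical M2 values; 1 = case, 0 = control (assumed coding)
m2 = [5.0, 7.0, 6.0, 2.0, 3.0, 4.0]
case = [1, 1, 1, 0, 0, 0]
print(ols_slope(case, m2), mean_difference(m2, case))
```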
Exercise 8.5
The following table shows the multiple regression analysis of SBP on AGE, BMI, DBP, SES and CL for the Los Angeles blood pressure study.
a) Interpret the beta coefficient
and 95% CI for the variables
age and SES group.
b) The R-squared value was
obtained as 59.2%. Discuss the
prediction performance of the
model.
c) Discuss at least two methods
that are useful for evaluating
the normality of the outcome
variable in regression model.
d) Discuss the results and make a
summary conclusion about the
effect of these covariates on
the SBP.
Exercise 8.5
For SES group, the beta coefficient is 0.236, which means that the mean SBP of the exposed group (very high and high SES) is 0.236 mmHg higher than that of the unexposed group (very poor to moderate SES), provided AGE, BMI, DBP and CL remain the same. It can be said with 95% confidence that the difference between the mean SBP of the exposed and unexposed groups lies between -3.390 and 3.862 mmHg, provided AGE, BMI, DBP and CL remain the same. Because this interval includes zero, the difference is not statistically significant at the 5% level.
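Two quick checks follow from a reported 95% CI for a beta coefficient: the point estimate sits at the midpoint of a symmetric t-based interval, and the coefficient is statistically significant at the 5% level only if the interval excludes zero. The sketch below applies both to the SES group result; the helper name `summarise_ci` is hypothetical.

```python
def summarise_ci(lower, upper):
    """Return the midpoint of a symmetric 95% CI and whether it excludes zero."""
    midpoint = (lower + upper) / 2
    excludes_zero = lower > 0 or upper < 0
    return midpoint, excludes_zero


# SES group coefficient from Exercise 8.5 (mmHg)
midpoint, significant = summarise_ci(-3.390, 3.862)
print(f"Estimate {midpoint:.3f} mmHg; significant at 5%: {significant}")
```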
Exercise 8.5
b) The R-squared value was obtained as 59.2%. Discuss the prediction performance of the model.
The Model 1 adjusted R² value of 0.592 means that approximately 59.2% of the variation in SBP is explained by BMI, SES group, AGE, CL and DBP; the remaining 40.8% is not explained by these covariates.
Notes: After completing stepwise backward elimination and keeping only the significant covariates (AGE and DBP), the Model 4 adjusted R² value of 0.595 means that approximately 59.5% of the variation in systolic blood pressure is explained by the patients’ age and diastolic blood pressure.
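Adjusted R² penalises each additional covariate, which is why Model 4 (two covariates) can have a higher adjusted R² than Model 1 (five covariates), even though the unadjusted R² can never increase when covariates are removed. A minimal sketch of the standard formula is below; the sample size of the Los Angeles study is not given here, so the numbers in the usage line are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p covariates (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than covariates plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: same R-squared, fewer covariates -> smaller penalty
print(adjusted_r2(0.60, 50, 5), adjusted_r2(0.60, 50, 2))
```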
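c) Discuss at least two methods that are useful for evaluating the normality of the outcome variable in a regression model.
Normal probability plots (NPP) of the residuals and residual plots. A histogram or box plot of the outcome, a Q-Q plot or the Shapiro-Wilk test can also be used.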
Exercise 8.5
d) Discuss the results and make a summary conclusion about the effect of these covariates on the SBP.
SPSS Lab