Regn_lect_3
We are often interested in the relationship between our response and one (or more) explanatory variables:
(a) We might wish to see whether rhknow increases with age in Stepping stones
(b) We could investigate whether the time taken for the TOP procedure depends on
gestational age in the misoprostol study.
We do this using the method of “least squares” which fits the line
y= a + bx
which passes as closely as possible to our observed data points.
If ŷi = a + bxi is the i-th fitted value, then a and b are chosen to minimize the sum of the
squared vertical deviations between the observed points and the fitted line. In terms of our
simple linear regression model
yi = α + βxi + εi
we minimize ∑εi² with respect to α and β.
It turns out that the minimizing values are
b = Sxy / Sxx
where Sxy = ∑xiyi - (∑xi)(∑yi)/n is the corrected sum of products and
Sxx = ∑xi² - (∑xi)²/n is the corrected sum of squares for x
and a = mean(y) - b × mean(x).
Our fitted line y = a + bx will always pass through the point (mean(x), mean(y)).
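For illustration, these quantities can be computed from first principles in Stata. This is just a sketch, assuming the data contain variables named x and y; the scalar and variable names below are ours:

. quietly summarize x
. scalar mx = r(mean)
. quietly summarize y
. scalar my = r(mean)
. generate double dxy = (x - mx)*(y - my)    // products of deviations from the means
. generate double dxx = (x - mx)^2           // squared deviations of x
. quietly summarize dxy
. scalar Sxy = r(sum)                        // corrected sum of products
. quietly summarize dxx
. scalar Sxx = r(sum)                        // corrected sum of squares for x
. scalar b = Sxy/Sxx                         // slope
. scalar a = my - b*mx                       // intercept
. display "b = " b "   a = " a

Note that the deviation forms used here are algebraically identical to the computational formulas given above.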
Ex: The table below shows plasma volume and body weight in eight healthy men (Kirkwood &
Sterne, page 87). Our outcome variable is plasma volume and our explanatory (exposure)
variable is body weight; we wish to see whether plasma volume is related to body weight.

Subject    Body weight (kg)    Plasma volume (litres)
   1             58.0                  2.75
   2             70.0                  2.86
   3             74.0                  3.37
   4             63.5                  2.76
   5             62.0                  2.62
   6             70.5                  3.49
   7             71.0                  3.05
   8             66.0                  3.12
The plot below shows that as body weight increases, plasma volume also tends to increase i.e.
there seems to be a relationship between plasma volume and body weight, so we can estimate
the regression equation.
[Figure: scatter plot of plasma volume (litres) against body weight (kg)]
. reg plasvol wt
------------------------------------------------------------------------------
plasvol | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
wt | .0436153 .0152684 2.86 0.029 .006255 .0809757
_cons | .0857244 1.023998 0.08 0.936 -2.419909 2.591358
------------------------------------------------------------------------------
Thus plasma volume increases on average by 0.0436 litres for every 1 kg increase in body
weight (or by 10 × 0.0436 = 0.436 litres for a 10-kg increase in weight).
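After regress, Stata's lincom command reports the estimate and confidence interval for any linear combination of the coefficients, so the 10-kg effect can be obtained directly; the multiplier 10 is our choice:

. lincom 10*wt        // estimated increase in plasma volume per 10 kg, with 95% CI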
To plot the observed points with the fitted line (as done above) we use the predict command to
save the fitted values and then the relevant graph command as shown below. The predict
command is a very powerful command that can be used to estimate or predict a number of
quantities after fitting a regression model and takes the form
predict newvar , options
where newvar is the name of the new variable which will contain the predicted value and the
option chosen specifies what this new variable will contain. If we omit the option (as below) then
newvar will contain the default which is the fitted values (or the values of y predicted from our
model).
. predict plasfit
(option xb assumed; fitted values)
. twoway (scatter plasvol wt, sort msymbol(plus)) (connected plasfit wt, sort
> msymbol(none)), xtitle("Body weight (kg)") xlabel(55(5)75) ylabel(2.5(0.25)3.5)
> ti("Regression of plasma volume on body weight")
We can test the null hypothesis that there is no linear relationship between y and x (H0: β = 0)
by constructing the analysis of variance table for the regression model.
We subdivide the total variation in our sample of y values into the variation which can be
attributed to the regression line (the regression sum of squares) and the residual variation about
the fitted line, as shown on the graph below in which we have drawn in the mean of the x-values
and the mean of the y-values.
[Figure: scatter plot of plasma volume against body weight (kg) with the fitted line, showing
mean(x) and mean(y) and the partition of each deviation into regression and residual components]
In practice the analysis of variance is obtained by calculating the total sum of squares as the
sum of the squared deviations about the mean of the y-values, i.e. Syy, and the regression sum
of squares given by
Regression ss = (Sxy)² / Sxx
We can then calculate the residual sum of squares by subtraction
Residual ss = total ss – regression ss
Note that the residual sum of squares is the minimized value of the sum of squared deviations.
The partition of variation is summarized in the analysis of variance table. To construct this we
need the degrees of freedom (number of pieces of independent information that go into the
calculation of each sum of squares). The (n-1) degrees of freedom for the total sum of squares
are divided into 1 d.f. for the slope (regression) and (n-2) for the residual – we can understand
this as having n observations from which we have estimated 2 parameters (a and b) – hence we
have (n-2) d.f. for the residual.
From each sum of squares we calculate a mean square by dividing the sum of squares by the
associated degrees of freedom. The residual mean square s², found by dividing the residual sum
of squares by (n-2), provides an estimate of the residual variation in y having "adjusted for the
effect of x".
The regression sum of squares is given by (Sxy)² / Sxx = (8.9575)² / 205.375 = 0.39068.
Thus by subtraction the residual sum of squares is 0.67795 - 0.39068 = 0.28727.
We can then construct our ANOVA table noting that since we have 8 observations our total
degrees of freedom will be (n-1) = 7 and the residual degrees of freedom will be (n-2) = 6.

Source         d.f.      SS          MS         F
Regression       1     0.39068    0.39068     8.16
Residual         6     0.28727    0.04788
Total            7     0.67795
Under H0 we would expect the regression and residual mean squares to be comparable (i.e. the
variation due to the effect of body weight should be comparable to the residual or random
variation) and thus we would expect F to be about 1; in fact under H0 the F-ratio follows an F-
distribution with 1,6 degrees of freedom. Comparing our F-ratio of 8.16 with F-tables shows that
the probability of getting such an F-value if H0 is true is less than 0.05 (in fact P=0.0289) so we
can reject the null hypothesis and conclude that there is a relationship between y and x.
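The tail probability can be checked in Stata with the Ftail function, which returns the upper-tail probability of an F distribution with the given degrees of freedom:

. display Ftail(1, 6, 8.16)    // ≈ 0.029, the P-value for the regression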
The components of the ANOVA table are given as part of the output from the regress command
(and are reproduced below).
. reg plasvol wt

      Source |       SS       df       MS              Number of obs =       8
-------------+------------------------------           F(  1,     6) =    8.16
       Model |     .39068     1     .39068             Prob > F      =  0.0289
    Residual |     .28727     6     .04788             R-squared     =  0.5763
-------------+------------------------------           Adj R-squared =  0.5057
       Total |     .67795     7     .09685             Root MSE      =  .21881

------------------------------------------------------------------------------
     plasvol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          wt |   .0436153   .0152684     2.86   0.029      .006255    .0809757
       _cons |   .0857244   1.023998     0.08   0.936    -2.419909    2.591358
------------------------------------------------------------------------------

Note:
1. The output from regress in Stata gives the standard errors for the regression estimates.
The standard errors can be used to test the hypothesis that the parameter is zero and also to
construct confidence intervals for the parameter.
In this case the t-statistic (on 6 d.f.) for testing the hypothesis that β = 0 is 2.86 and we reject
the null hypothesis (P = 0.029). Note that in the case of simple linear regression the t-test is
equivalent to the F-test (in fact t² = F, or 2.86² ≈ 8.16 to within rounding error). The 95%
confidence limits for β are 0.006255 to 0.08098, so we are 95% confident that plasma volume
increases on average by between 0.00626 and 0.08098 litres for every 1 kg increase in body
weight.
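These limits can be reproduced by hand: Stata's invttail(6, .025) returns the t critical value on 6 d.f. (about 2.447), so for example

. display .0436153 - invttail(6, .025)*.0152684    // lower 95% limit, ≈ 0.00626
. display .0436153 + invttail(6, .025)*.0152684    // upper 95% limit, ≈ 0.08098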
2. In this example the intercept is not meaningful. Literally it could be interpreted as the
estimated mean plasma volume when weight = 0 kg. The intercept can be made more
meaningful by centering the exposure variable, i.e. subtracting the mean of the exposure
variable from each observation, so that the new exposure variable has mean 0. The
intercept in a simple linear regression model with a centered exposure variable is equal to
the mean of the outcome variable (see the sketch after these notes).
3. Also note that in general the test of the null hypothesis that the intercept is zero is usually
not of interest.
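A sketch of the centering described in note 2 (wtcent is a variable name we have made up):

. quietly summarize wt
. generate wtcent = wt - r(mean)    // centered body weight, mean 0
. reg plasvol wtcent                // slope unchanged; intercept now estimates mean plasma volume

With the centered exposure, the intercept estimates the mean plasma volume (about 3.00 litres here) and its test and confidence interval become interpretable.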
Prediction
We can use our regression equation for prediction; the predicted value y* when x=x* is given by
y* = a + bx*
The standard error of prediction is given by
s.e.(y*) = s √{1 + 1/n + [x* - mean(x)]² / Sxx}
The standard error is smallest when x* is close to the mean of the x-values and increases as x*
moves away from the mean, thus prediction is most precise close to the mean of the exposure
variable.
Also note that in general we should avoid extrapolation i.e. using the regression line for
predicting values outside the range of x in the original data, as the linear relationship may not
hold true beyond the range over which it has been fitted.
We have seen how to use the predict command to find the predicted values corresponding to
each observation. We can use predict with the stdf option to find the standard errors of the
predicted values (stdf is short for the standard error of the forecast):
predict seyfit , stdf
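Continuing from the commands above (plasfit holds the fitted values and seyfit the forecast standard errors), 95% prediction limits can then be computed with the t critical value on (n-2) = 6 d.f.; lower and upper are variable names we have chosen:

. generate lower = plasfit - invttail(6, .025)*seyfit    // lower 95% prediction limit
. generate upper = plasfit + invttail(6, .025)*seyfit    // upper 95% prediction limit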
Ex: Since measuring plasma volume is time consuming, we may wish to predict it from body
weight. The predicted plasma volume for a man weighing 66 kg is
0.0857 + 0.0436 × 66 = 2.96 litres
s.e.(y*) = 0.2189 √{1 + 1/8 + [66 - 66.9]² / 205.375} = 0.23 litres
So 95% confidence limits for the plasma volume for a man of 66 kg (known as the prediction
interval) are
2.96 ± 2.45 * 0.23 or (2.40 ; 3.52).
Correlation
As well as estimating the best fitting straight line, we may wish to examine the strength of the
linear association between the outcome and exposure variable. This is measured by the
correlation coefficient r which is estimated as
r = Sxy / √{ Sxx Syy }
In our example on plasma volume
r = 8.9575 / √{205.375 × 0.67795} = 0.76
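In Stata the correlation coefficient is obtained with the correlate command (pwcorr with the sig option also reports a significance test):

. correlate plasvol wt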
The correlation coefficient always lies between -1 and 1 and equals zero if there is no linear
association between x and y. It is positive if high values of y are associated with high values of x
and low values of y are associated with low values of x; the higher its value, the stronger the
association. It is negative if high values of y are associated with low values of x and low values
of y are associated with high values of x. The correlation coefficient has the same sign as the
regression slope b.
A useful interpretation of the correlation coefficient is that it is the number of standard deviations
that the outcome y changes for each standard deviation change in the exposure x; equivalently,
the regression slope can be written b = r × (sy/sx), where sy and sx are the standard deviations
of y and x (in our example 0.7591 × 0.3112/5.4166 = 0.0436, the slope estimated above).
For larger samples (n > 100) the sampling distribution of r is approximately normal, which gives
a simple method of finding a confidence interval for the correlation coefficient.
For studies with smaller sample sizes, confidence intervals for the correlation coefficient can be
derived using Fisher’s transformation
zr = 0.5 loge {(1+r) / (1-r)}
s.e.(zr) ≈ 1 / √ (n-3)
We can then find a confidence interval for zr which can then be back-transformed to give a
confidence interval for r using the inverse of Fisher’s transformation
r = {exp(2 zr ) – 1} / {exp(2 zr ) + 1}
In our example
zr = 0.5 loge {(1.76) / (0.24)} = 0.994
s.e.(zr) ≈ 1 / √ (8-3) = 0.447
So our 95% c.i. for zr is
0.994 – 1.96*0.447 ; 0.994 + 1.96*0.447
or 0.1176 ; 1.8706
Applying the inverse of Fisher’s transformation gives a confidence interval for r of
0.1171 ; 0.9536
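These calculations can be carried out directly in Stata, since its atanh and tanh functions implement Fisher's transformation and its inverse; a sketch (the scalar names are ours):

. quietly correlate plasvol wt
. scalar zr = atanh(r(rho))               // Fisher's transformation of r
. scalar se = 1/sqrt(r(N) - 3)            // approximate standard error of zr
. display tanh(zr - 1.96*se) ", " tanh(zr + 1.96*se)    // back-transformed 95% limits for r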
Note that R², the proportion of variation in y explained by its relationship to x, is simply the
square of the correlation coefficient: here R² = 0.76² = 0.58, so body weight explains about 58%
of the variation in plasma volume.
Correlation is particularly useful when looking for association between variables when there is
no clear outcome variable and no clear exposure variable (e.g. if we look for an association
between the results of two blood chemistry variables such as sodium and potassium) and will be
important later on when we consider multi-collinearity between exposure variables in multiple
regression. For examining the relationship between an outcome variable and an exposure
variable, simple linear regression is generally preferred.