Chapter 5: Regression: 5.1 Meaning and Purpose
Chapter 5: Regression: 5.1 Meaning and Purpose
When the regression line has been determined, it can be used for predicting y-values
corresponding to given x-values, eg. if x = x0, what value of y (ie. ŷ 0) would be expected?
When the line is expressed in the form y = a + bx, x is called the independent
variable, y is called the dependent variable and the line is called the regression of y on x.
The choice of which variable to regard as independent is left to the user of the model,
depending on which variable he/she wants to predict.
33
Notice that some residuals are positive and some are negative; in fact, their sum is
usually close to zero. In order to measure how well a line fits the data points, we can
calculate the average of the squares of the residuals (ie. square each residual, add the squares
and divide the total by the number of residuals). If we then choose the line for which this
average is smallest, we will have chosen the least squares regression line.
Example 5.2.1: Which of these three lines better fits the data points than either of the
others?
Clearly Line 3 is a better fit than the other two, because the residuals are generally smaller
for Line 3.
It can be shown mathematically that the values of “a” and “b” for the least squares
regression line are:
n xy - x y
b and a y - bx
n x 2 x
2
Example 5.2.2: Calculate the least squares regression line for the data in
Example 4.2.1.
34
5271 1966
b 2.97
579 1919 and a = 66/5 – 2.97(19/5) = 1.91
In practice, one may plot any convenient points in order to draw the line; it is best to
plot at least three points, which should all be in a straight line, as a check.
35
36