Chapter 5: Regression: 5.1 Meaning and Purpose

This document discusses regression analysis and determining the best-fit regression line for correlated data. It introduces regression lines and using them to predict dependent variable (y) values from independent variable (x) values. The procedure is to determine the least squares regression line, where the average of the squared residuals between observed y-values and predicted y-values is minimized. Examples are provided to demonstrate calculating the regression line coefficients and using the line to predict values.


CHAPTER 5: REGRESSION

5.1 Meaning and purpose


If two quantitative variables (x and y) are correlated, the points in their scattergram
will tend to lie about a straight line; such a line is called a regression line. One is often
interested in determining what the line is, because it can be used to make reasonably reliable
predictions if the variables are correlated fairly strongly (say, r < -0.5 or r > 0.5).
One could, of course, fit a line to the data by eye, i.e. draw on the scattergram the line
which best seems to fit the data, but it is better to determine a model for the relationship in
the form of an equation for the line. The general form for the equation of a straight line may
be written as y = a + bx, where “a” and “b” are constants.
In order to distinguish observed values of y from the y-values of points on the
regression line, we write the regression line as ŷ = a + bx.

When the regression line has been determined, it can be used for predicting y-values
corresponding to given x-values, e.g. if x = x₀, what value of y (i.e. ŷ₀) would be expected?
When the line is expressed in the form y = a + bx, x is called the independent
variable, y is called the dependent variable and the line is called the regression of y on x.
The choice of which variable to regard as independent is left to the user of the model,
depending on which variable he or she wants to predict.

5.2 Procedure for determining a regression line


How can the best-fitting line be determined?
For every x-value in the dataset, define the residual as y − ŷ, i.e. the difference
between the corresponding y-value in the dataset and the y-value given by the regression
line. For any given line, the smaller the residuals are, the better will be the fit of the line
to the data points.

Notice that some residuals are positive and some are negative; in fact, their sum is
usually close to zero. In order to measure how well a line fits the data points, we can
calculate the average of the squares of the residuals (i.e. square each residual, add the squares
and divide the total by the number of residuals). If we then choose the line for which this
average is smallest, we will have chosen the least squares regression line.
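This comparison of candidate lines can be sketched in a short program. The data points and the two candidate lines below are invented for illustration; they do not come from the text.

```python
# Mean of squared residuals for a candidate line yhat = a + b*x.
# The data points and candidate lines are made-up illustrations.

def mean_squared_residual(a, b, xs, ys):
    """Average of (y - yhat)^2 over all data points."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

# The line with the smaller average squared residual fits better.
print(mean_squared_residual(0.0, 2.0, xs, ys))  # close to y = 2x: about 0.025
print(mean_squared_residual(1.0, 1.0, xs, ys))  # a poorer fit: about 3.375
```

The least squares regression line is, by definition, the line that makes this average as small as any line can.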
Example 5.2.1: Which of these three lines better fits the data points than either of the
others?

Clearly Line 3 is a better fit than the other two, because the residuals are generally smaller
for Line 3.
It can be shown mathematically that the values of “a” and “b” for the least squares
regression line are:

n  xy -  x  y 
b  and a  y - bx
n  x 2   x 
2

Example 5.2.2: Calculate the least squares regression line for the data in
Example 4.2.1.

Cable   No. of strands   Breaking strength    xy    x²
        in cable (x)     (tonne) (y)
A       4                15                   60    16
B       3                10                   30     9
C       2                 8                   16     4
D       5                17                   85    25
E       5                16                   80    25
Total   19               66                  271    79

5271  1966
b  2.97
579  1919 and a = 66/5 – 2.97(19/5) = 1.91

so the regression line is ŷ  1.9  3.0x
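The hand calculation can be checked with a short program that applies the formulas for b and a to the cable data. The function name `least_squares_line` is my own choice, not from the text.

```python
# Least squares coefficients via
#   b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)  and  a = ybar - b*xbar.
# Function name is hypothetical; the data are from Example 5.2.2.

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line yhat = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return a, b

xs = [4, 3, 2, 5, 5]      # number of strands in cable
ys = [15, 10, 8, 17, 16]  # breaking strength (tonne)

a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 1.91 2.97
```

The intermediate sums the function computes (Σx = 19, Σy = 66, Σxy = 271, Σx² = 79) match the table totals above.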

5.3 Plotting a regression line


The regression line is ŷ = a + bx.
When x = 0, ŷ = a; when x = x̄, ŷ = a + b x̄ = (ȳ − b x̄) + b x̄ = ȳ.
Thus the least squares regression line passes through the points (0, a) and (x̄, ȳ).
Although one can readily plot the second of these points, which is in the middle of the scatter
of points, it is often not convenient to plot the first one. Since one need not show the origin
in a scattergram, it is not necessary to show the y-intercept (i.e. a) either.
Example 5.3.1: Draw the least squares regression line derived in Example 5.2.2.
The regression line is ŷ = 1.9 + 3.0x, so the y-intercept is 1.9.
Also x̄ = 19/5 = 3.8 and ȳ = 66/5 = 13.2.

In practice, one may plot any convenient points in order to draw the line; it is best to
plot at least three points, which should all be in a straight line, as a check.
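The three-point check can be sketched as follows, using the fitted line from Example 5.2.2; the particular x-values chosen here are arbitrary illustrations.

```python
# Points on the fitted line yhat = 1.9 + 3.0*x from Example 5.2.2.
# The x-values (0, 2, 4) are an arbitrary convenient choice.
a, b = 1.9, 3.0

points = [(x, a + b * x) for x in (0, 2, 4)]
for x, y in points:
    print(x, y)

# Collinearity check: the slope between consecutive points should equal b.
(x0, y0), (x1, y1), (x2, y2) = points
assert abs((y1 - y0) / (x1 - x0) - b) < 1e-9
assert abs((y2 - y1) / (x2 - x1) - b) < 1e-9
```

If the three plotted points do not line up, an arithmetic slip has been made somewhere, which is exactly the check the text recommends.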

5.4 Use of a regression line for prediction


Given a particular x-value, one simply has to substitute it into the regression equation
in order to determine the expected y-value. Alternatively one may read the predicted value
off the graph of the regression line.
Example 5.4.1: Using the least squares regression line derived in Example 5.2.2,
predict the breaking strength of a 6-stranded cable.
The regression line is ŷ = 1.9 + 3.0x, so when x = 6, ŷ = 1.9 + 3.0 × 6 = 19.9.
Thus we predict the breaking strength of a 6-stranded cable will be 20 tonnes.
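The substitution is trivial to code. A minimal sketch using the fitted coefficients from Example 5.2.2 (the function name `predict` is my own):

```python
# Prediction with the fitted line yhat = 1.9 + 3.0*x from Example 5.2.2.
a, b = 1.9, 3.0

def predict(x):
    """Expected y-value for a given x under the fitted line."""
    return a + b * x

print(round(predict(6), 1))  # 19.9, i.e. a breaking strength of about 20 tonnes
```

Note that such predictions are only reasonable for x-values near the range of the data used to fit the line (here, 2 to 5 strands); 6 strands is a modest extrapolation.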
