Correlation and Regression
CORRELATION
Correlation describes the relationship between two quantitative variables; it does not by itself
allow causal relationships to be inferred.
Correlation is a statistical technique used to determine the degree to which two variables
are related.
Scatter diagram
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and
the second is called dependent (Y)
• Points are not joined
• No frequency table
Example
[Figure: scatter diagram of SBP (mmHg) against Wt (kg)]
[Figure: scatter diagram of Height in cm against Age in weeks]
NEGATIVE RELATIONSHIP
[Figure: Reliability plotted against Age of Car]
NO RELATION
CORRELATION COEFFICIENT
Statistic showing the degree of relation between two variables
SIMPLE CORRELATION COEFFICIENT (r)
It is also called Pearson's correlation or product moment
correlation coefficient.
It measures the nature and strength of the association between two
variables of the quantitative type.
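The sign of r denotes the nature of the association:
• If the sign is +ve, this means a direct relationship (an increase in one variable is
associated with an increase in the other).
• If the sign is -ve, this means an inverse or indirect relationship (an increase in one
variable is associated with a decrease in the other).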
The value of r ranges between -1 and +1.
The value of r denotes the strength of the association, as illustrated
by the following diagram.
If r = ±1: perfect correlation.
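How to compute the simple correlation coefficient (r)
$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$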
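The computational formula above can be sketched in Python as follows. This is only an illustration: the function name `pearson_r` is an assumption of this sketch and does not come from the slides, and the code does not guard against a variable with zero variance (which would make the denominator zero).

```python
import math


def pearson_r(x, y):
    """Pearson correlation from raw sums (hypothetical helper name)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    # Numerator: sum(xy) - (sum(x) * sum(y)) / n
    numerator = sum_xy - sum_x * sum_y / n
    # Denominator: sqrt of the product of the two corrected sums of squares
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator
```

For data lying exactly on a rising straight line, such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])`, the function returns 1 up to floating-point rounding.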
EXAMPLE:
A sample of 6 children was selected, and their ages in years and weights in kilograms
were recorded as shown in the following table. It is required to find the correlation between
age and weight.
Serial no.   Age (years) (x)   Weight (kg) (y)   xy    x²    y²
1            7                 12                84    49    144
2            6                 8                 48    36    64
3            8                 12                96    64    144
4            5                 10                50    25    100
5            6                 11                66    36    121
6            9                 13                117   81    169
Total        41                66                461   291   742
r = 0.759
strong direct correlation
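As a check on this result, the column totals of the table (∑x = 41, ∑y = 66, ∑xy = 461, ∑x² = 291, ∑y² = 742, with n = 6, all obtained by summing the tabulated values) can be substituted into the computational formula for r:

$$
r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{41^2}{6}\right)\left(742 - \dfrac{66^2}{6}\right)}}
  = \frac{461 - 451}{\sqrt{(10.833)(16)}}
  = \frac{10}{13.166} \approx 0.7596
$$

This agrees with the reported r = 0.759 up to rounding of the last digit.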
EXAMPLE: RELATIONSHIP BETWEEN ANXIETY AND TEST SCORES
Anxiety (X)   Test score (Y)   X²    Y²    XY
10            2                100   4     20
8             3                64    9     24
2             9                4     81    18
1             7                1     49    7
5             6                25    36    30
6             5                36    25    30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
CALCULATING CORRELATION COEFFICIENT
$$
r = \frac{(6)(129) - (32)(32)}{\sqrt{\left(6(230) - 32^2\right)\left(6(204) - 32^2\right)}}
  = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94
$$
r = -0.94
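The same arithmetic can be reproduced with a short Python sketch. It uses the anxiety and test-score values from the table and the n-scaled form of the formula used in this calculation; the variable and function names are illustrative assumptions, not part of the slides.

```python
import math

# Data from the anxiety / test score example
anxiety = [10, 8, 2, 1, 5, 6]
score = [2, 3, 9, 7, 6, 5]


def pearson_r_scaled(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r_scaled(anxiety, score)
print(round(r, 2))  # -0.94
```

The negative sign indicates an inverse relationship: higher anxiety goes with lower test scores in this sample.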
Regression tells us how to draw the straight line described by the correlation
REGRESSION
Calculates the “best-fit” line for a certain set of data
The regression line makes the sum of the squares of the residuals smaller than for any
other line
[Figure: scatter diagram, horizontal axis Wt (kg)]
By using the least squares method (a procedure that minimizes the vertical
deviations of plotted points surrounding a straight line) we are
able to construct a best fitting straight line to the scatter diagram points and then
formulate a regression equation in the form of:
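$$
\hat{y} = a + bX
$$
$$
\hat{y} = \bar{y} + b(x - \bar{x})
$$
$$
b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
$$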
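A minimal Python sketch of the least squares calculation is given below, applied to the age and weight data of the six children from the correlation example. The function name `fit_line` is an assumption made for illustration; the intercept is obtained as a = ȳ - b·x̄, which follows from equating the two forms of the regression equation.

```python
def fit_line(x, y):
    """Least squares slope b and intercept a (hypothetical helper name)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)

    # Slope: corrected sum of products over corrected sum of squares of x
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: the fitted line passes through the point (mean x, mean y)
    a = sum_y / n - b * sum_x / n
    return a, b


# Age (years) and weight (kg) of the six children
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

a, b = fit_line(age, weight)
print(f"y_hat = {a:.2f} + {b:.2f} x")  # y_hat = 4.69 + 0.92 x
```

The slope of about 0.92 means that, in this sample, each additional year of age is associated with roughly 0.92 kg more weight.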
REGRESSION EQUATION
Regression equation describes the regression line mathematically
• Intercept
• Slope
[Figure: SBP (mmHg) plotted against Wt (kg)]
LINEAR EQUATIONS
ŷ = a + bX
Y = bX + a
b = Slope = Change in Y / Change in X
a = Y-intercept
[Figure: line plotted on X and Y axes]
HOURS STUDYING AND GRADES
REGRESSING GRADES ON HOURS
[Figure: linear regression of final grade on study]
Final grade in course = 59.95 + 3.17 * study
R-Square = 0.88
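To illustrate how this fitted equation is read, assume that "study" is measured in hours (as the slide title suggests) and take a hypothetical student who studies 5 hours:

$$
\text{Final grade} = 59.95 + 3.17 \times 5 = 59.95 + 15.85 = 75.80
$$

The slope 3.17 is the predicted change in final grade for each additional unit of study, and the intercept 59.95 is the predicted grade when study = 0.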
$$
b = \frac{461 - \dfrac{41 \times 66}{6}}{291 - \dfrac{(41)^2}{6}} = 0.92
$$
Regression equation
∑x² = 41678, n = 20
ŷ = 112.13 + 0.4547 x
For age 25:
B.P = 112.13 + 0.4547 * 25 = 123.4975 ≈ 123.5 mmHg
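Once the intercept and slope are known, prediction is a single substitution. The sketch below wraps the fitted equation in a small Python function; the name `predict_bp` and its default arguments are illustrative assumptions that simply carry over the coefficients stated above.

```python
def predict_bp(age_years, intercept=112.13, slope=0.4547):
    """Predicted blood pressure (mmHg) from the fitted line y_hat = a + b*x."""
    return intercept + slope * age_years


print(round(predict_bp(25), 1))  # 123.5
```

Predictions of this kind are only as good as the fitted line, so they are best kept within the range of ages covered by the sample.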
MULTIPLE REGRESSION