
Chapter 2.

Multiple Regression
(Course: Econometrics)

Phuong Le

Faculty of Economic Mathematics


University of Economics and Law
Vietnam National University, Ho Chi Minh City
Content

1 Multiple regression model


Introduction
Least squares method
Multiple coefficient of determination

2 Hypothesis testing and interval estimation


Testing for significance
Interval estimation

3 Model selection
Information criteria
Wald test
Introduction

Multiple regression model


The equation that describes how the dependent variable y is related
to p independent variables x1, x2, . . . , xp and an error term ε is:

y = β0 + β1 x1 + β2 x2 + ... + βp xp + ε,

where
• β0, β1, . . . , βp are the parameters (there are k = p + 1 parameters),
• ε is a random variable called the error term.

Multiple Regression Equation


The equation that describes how the mean value of y is related to
x1 , x2 , . . . , xp is:

E(y) = β0 + β1 x1 + β2 x2 + ... + βp xp .
Introduction
Matrix representation

Y = X β + ε,
where

$$
X = \begin{pmatrix}
1 & x_{11} & x_{21} & \cdots & x_{p1} \\
1 & x_{12} & x_{22} & \cdots & x_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{pn}
\end{pmatrix},
\qquad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
$$

$$
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix},
\qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
$$

Note: (x1j , x2j , . . . , xpj , yj ) is the j-th observation for j = 1, 2, . . . , n.


Introduction
Estimated Multiple Regression Equation

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂p xp .


A simple random sample is used to compute sample statistics
β̂0 , β̂1 , . . . , β̂p that are used as the point estimators of the parameters
β0 , β1 , . . . , βp .

Matrix representation

Y = X β̂ + e,

where

$$
\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_p \end{pmatrix},
\qquad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.
$$
Some multiple regression functions

Cobb-Douglas production functions

Cobb-Douglas production functions are represented as

$$y_i = \beta_0\, x_{1i}^{\beta_1}\, x_{2i}^{\beta_2}\, e^{\varepsilon_i},$$

where yi: production, x1i: capital, x2i: labor, εi: error.

Taking logarithms, the Cobb-Douglas production function transforms to

$$\ln y_i = \ln\beta_0 + \beta_1 \ln x_{1i} + \beta_2 \ln x_{2i} + \varepsilon_i.$$

With ln β0 treated as the intercept, this is a multiple regression model for ln y, ln x1 and ln x2 (see the sketch below).

Quadratic regression function

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i.$$

This is a multiple regression model for y, x and x².
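To make the log transformation concrete, here is a minimal Python sketch (the course itself works in STATA; this is purely illustrative). It simulates Cobb-Douglas data with assumed parameters β0 = 2.0, β1 = 0.3, β2 = 0.6 and recovers them by an OLS regression of ln y on ln x1 and ln x2; all variable names are invented for the example.

```python
import numpy as np

# Simulated Cobb-Douglas data; the parameter values are assumptions
rng = np.random.default_rng(0)
n = 200
capital = rng.uniform(1.0, 10.0, n)        # x1: capital
labor = rng.uniform(1.0, 10.0, n)          # x2: labor
eps = rng.normal(0.0, 0.1, n)              # error term
y = 2.0 * capital**0.3 * labor**0.6 * np.exp(eps)

# After taking logs, the model is linear in ln x1 and ln x2
X = np.column_stack([np.ones(n), np.log(capital), np.log(labor)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(coef)  # approximately [ln 2.0, 0.3, 0.6]
```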
Least squares method
Least squares criterion

$$\sum e_i^2 = \sum (y_i - \hat y_i)^2 = \sum \big( y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i} - \dots - \hat\beta_p x_{pi} \big)^2 \to \min.$$

Solving this, we get the OLS formula

$$\hat\beta = (X^T X)^{-1} X^T Y.$$

Computation of coefficient values


• The formulas for the regression coefficients β̂0 , β̂1 , . . . , β̂p involve
the use of matrix algebra. We will rely on computer software
packages to perform the calculations.
• The emphasis will be on how to interpret the computer output
rather than on how to make the multiple regression
computations.
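As an illustration of the formula above, the following numpy sketch builds the design matrix X and computes β̂ = (XᵀX)⁻¹XᵀY on simulated data. The variable names and true coefficients mimic Example 1 below but are assumptions, not the firm's actual data.

```python
import numpy as np

# Simulated stand-in for the salary data (names and values are illustrative)
rng = np.random.default_rng(1)
n = 20
experience = rng.uniform(1.0, 10.0, n)
test_score = rng.uniform(60.0, 100.0, n)
salary = 3.0 + 1.4 * experience + 0.25 * test_score + rng.normal(0.0, 2.0, n)

# Design matrix: a leading column of ones carries the intercept beta0
X = np.column_stack([np.ones(n), experience, test_score])
Y = salary

# OLS: beta_hat = (X'X)^(-1) X'Y; solving the normal equations is
# numerically safer than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # estimates of (beta0, beta1, beta2)
```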
Least squares method

Example 1. A software firm collected data for a sample of 20


computer programmers. A suggestion was made that regression
analysis could be used to determine if salary was related to the years
of experience and the score on the firm’s programmer aptitude test.

The years of experience, score on the aptitude test, and


corresponding annual salary ($1000s) for a sample of 20
programmers is shown on the next slide.
Least squares method

(The data table for the 20 programmers is not reproduced here.)

Suppose we believe that salary (Salary) is related to the years of


experience (Experience) and the score on the programmer aptitude
test (TestScore) by the following regression model

Salary = β0 + β1 Experience + β2 TestScore + ε,

where
• Salary: annual salary ($1000s),
• Experience: years of experience,
• TestScore: score on programmer aptitude test.
Least squares method
STATA code: regress Salary Experience TestScore
Result: STATA regression output (not reproduced here). Annotations on the output:
• In the ANOVA table, MSR = RSS/p and MSE = ESS/(n − p − 1); Root MSE is the square root of MSE.
• p is the total number of independent variables, so the residual degrees of freedom are n − p − 1 and the total degrees of freedom are n − 1.
• Salary is the dependent variable; Experience and TestScore are the independent variables.

Estimated Regression Equation

Predicted Salary = 3.174 + 1.404 · Experience + 0.251 · TestScore.

(Note: predicted salary will be in thousands of dollars.)


Least squares method
Interpretation of parameters
In multiple regression analysis, we interpret each regression coefficient as follows: β̂i is an estimate of the change in y corresponding to a one-unit increase in xi, holding all other independent variables constant.

Example 2. Interpretation of the parameters in Example 1:

Predicted Salary = 3.174 + 1.404 · Experience + 0.251 · TestScore.

• Salary is expected to increase by $1,404 for each additional year of experience (holding the programmer aptitude test score constant).
• Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (holding years of experience constant).
Multiple coefficient of determination
• Total Sum of Squares:

$$TSS = \sum (y_i - \bar y)^2 = \sum y_i^2 - n\bar y^2 = Y^T Y - n\bar y^2.$$

• Sum of Squares due to Regression:

$$RSS = \sum (\hat y_i - \bar y)^2 = \hat\beta^T X^T Y - n\bar y^2.$$

• Sum of Squares due to Errors:

$$ESS = \sum (y_i - \hat y_i)^2 = e^T e = Y^T Y - \hat\beta^T X^T Y.$$

Relationship among TSS, RSS and ESS: TSS = RSS + ESS.

Multiple Coefficient of Determination:

$$R^2 = \frac{RSS}{TSS}.$$

These values can be found in the ANOVA part of the regression output.
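A short sketch computing the three sums of squares directly from their definitions and checking TSS = RSS + ESS; it reuses the simulated setup from the earlier OLS sketch, so the data are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n), rng.uniform(60, 100, n)])
Y = X @ np.array([3.0, 1.4, 0.25]) + rng.normal(0.0, 2.0, n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
RSS = np.sum((Y_hat - Y.mean()) ** 2)  # regression sum of squares
ESS = np.sum((Y - Y_hat) ** 2)         # error sum of squares
assert np.isclose(TSS, RSS + ESS)      # decomposition holds with an intercept

R2 = RSS / TSS
print(R2)
```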


Multiple coefficient of determination
Adjusted Multiple Coefficient of Determination
• Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing ESS.
• Because RSS = TSS − ESS, when ESS becomes smaller, RSS becomes larger, causing R² = RSS/TSS to increase.
• The adjusted multiple coefficient of determination compensates for the number of independent variables in the model:

$$\bar R^2 := 1 - (1 - R^2)\,\frac{n-1}{n-p-1}.$$

Example 3. From the result of Example 1:

$$R^2 = \frac{RSS}{TSS} = \frac{500.32853}{599.7855} = 0.8342,$$

$$\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} = 1 - (1 - 0.8342)\,\frac{20-1}{20-2-1} = 0.8147.$$
Testing for significance

Assumptions About the Error Term ε


• The error ε is a random variable with mean zero.
• The variance of ε, denoted by σ², is the same for all values of the independent variables.
• The values of ε are independent.
• The error ε is a normally distributed random variable reflecting
the deviation between the y value and the expected value of y
given by β0 + β1 x1 + β2 x2 + ... + βp xp .
Testing for significance
• In simple linear regression, the F and t tests provide the same
conclusion.
• In multiple regression, the F and t tests have different purposes.
Testing for Significance: F Test
• The F test is used to determine whether a significant relationship
exists between the dependent variable and the set of all the
independent variables.
• The F test is referred to as the test for overall significance.

Testing for Significance: t Test


• If the F test shows an overall significance, the t test is used to
determine whether each of the individual independent variables
is significant.
• A separate t test is conducted for each of the independent
variables in the model.
• We refer to each of these t tests as a test for individual
significance.
Testing for significance

F Test for Overall Significance


1 Hypotheses:
H0 : β1 = β2 = · · · = βp = 0,
Ha : One or more of the parameters is not equal to zero.
2 Test statistic:

$$F = \frac{RSS/p}{ESS/(n-p-1)} = \frac{(n-p-1)\,R^2}{p\,(1-R^2)}.$$

3 Rejection rule: Reject H0 if p-value < α or if F ≥ Fα(p, n − p − 1).

Notes: We usually denote MSR = RSS/p and MSE = ESS/(n − p − 1). Hence F = MSR/MSE (see the sketch below).
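The F statistic of the next example can be reproduced from the sums of squares reported on the slides; only the critical value and p-value below come from scipy's F distribution.

```python
from scipy import stats

# Sums of squares taken from Example 1 on the slides
n, p = 20, 2
RSS, ESS = 500.32853, 99.4569697

MSR = RSS / p                               # mean square due to regression
MSE = ESS / (n - p - 1)                     # mean square error
F = MSR / MSE                               # about 42.76
p_value = stats.f.sf(F, p, n - p - 1)       # upper-tail p-value
F_crit = stats.f.ppf(0.95, p, n - p - 1)    # F_0.05(2, 17), about 3.59
print(F, p_value, F > F_crit)               # reject H0 when F >= F_crit
```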
Testing for significance

Example 4. From the result of Example 1, we want to test for the overall significance of the model at the significance level 0.05.
• Hypotheses:
H0 : β1 = β2 = 0,
Ha : One or more of the parameters is not equal to zero.
• Test statistic:

$$F = \frac{RSS/p}{ESS/(n-p-1)} = \frac{500.32853/2}{99.4569697/17} = 42.76.$$

• Fα (p, n − p − 1) = F0.05 (2, 17) = 3.592.


Since F > Fα (p, n − p − 1), we reject H0 .
Testing for significance

Estimation of standard errors for coefficients

The covariance matrix of β̂:

$$\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}.$$

Let c_ii be the (i, i) entry of the matrix (XᵀX)⁻¹. Then

$$\sigma_{\hat\beta_i}^2 = \sigma^2 c_{ii} \approx s^2 c_{ii},$$

where s² = MSE = ESS/(n − p − 1). Hence we can estimate σ_β̂i by

$$se(\hat\beta_i) = s\sqrt{c_{ii}}.$$
Testing for significance
t Test for Significance of Individual Parameters
For a given number β∗:
1 Hypotheses:
H0: βi = β∗,
Ha: βi ≠ β∗.
2 Test statistic:

$$t = \frac{\hat\beta_i - \beta^*}{se(\hat\beta_i)}.$$

3 Rejection rule: Reject H0 if p-value < α or if |t| > tα/2(n − p − 1).

Notes:
• The t statistics reported by STATA and other statistical software correspond to the case β∗ = 0 (see the sketch below).
• For Ha: βi > β∗, we reject H0 if t > tα(n − p − 1).
• For Ha: βi < β∗, we reject H0 if t < −tα(n − p − 1).
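A sketch that computes se(β̂i) = s·√cii from the diagonal of s²(XᵀX)⁻¹ and then the t statistics for H0: βi = 0, on the same simulated data as before (illustrative, not the programmer data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n), rng.uniform(60, 100, n)])
Y = X @ np.array([3.0, 1.4, 0.25]) + rng.normal(0.0, 2.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
s2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p - 1)   # s^2 = MSE
se = np.sqrt(s2 * np.diag(XtX_inv))                  # se(beta_i) = s * sqrt(c_ii)

t = beta_hat / se                                    # t statistics for H0: beta_i = 0
p_values = 2 * stats.t.sf(np.abs(t), n - p - 1)      # two-sided p-values
print(np.column_stack([beta_hat, se, t, p_values]))
```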
Testing for significance

Example 5. From the result of Example 1, we want to test whether TestScore is significant, using the significance level 5%.
• Hypotheses:
H0: β2 = 0,
Ha: β2 ≠ 0.
• Test statistic:

$$t = \frac{\hat\beta_2}{se(\hat\beta_2)} = \frac{0.2508854}{0.0773541} = 3.24$$

(this t statistic has already been computed by STATA).
• tα/2(n − p − 1) = t0.025(17) = 2.11.
Since |t| > tα/2(n − p − 1), we reject H0.
Alternatively, we can compare the corresponding p-value computed by STATA (0.005) to α = 5% to conclude that H0 should be rejected.
Testing for significance

Example 6. From the result of Example 1, we want to test whether one more year of experience produces an extra $1,000 in salary, using the significance level 5%.
• Hypotheses:
H0: β1 = 1,
Ha: β1 ≠ 1.
• Test statistic:

$$t = \frac{\hat\beta_1 - 1}{se(\hat\beta_1)} = \frac{1.403902 - 1}{0.1985669} = 2.03.$$

• tα/2 (n − p − 1) = t0.025 (17) = 2.11.


Since |t| < tα/2 (n − p − 1), we cannot reject H0 .
Testing for significance
t Test for a linear combination of parameters
The same procedure can be used to test for a linear combination of
parameters
r T β := r0 β0 + r1 β1 + · · · + rp βp .
For given (r0 , r1 , . . . , rp ) and β ∗ .
1 Hypotheses:
H0 : r T β = β ∗ ,
Ha : r T β ̸= β ∗ .
2 Test statistic:

$$t = \frac{r^T\hat\beta - \beta^*}{se(r^T\hat\beta)}.$$

3 Rejection rule: Reject H0 if p-value < α or if |t| > tα/2(n − p − 1).

Notes: se(rᵀβ̂) can be computed using the variance rule

$$se(r^T\hat\beta) = \sqrt{\mathrm{Var}\Big(\sum_i r_i\hat\beta_i\Big)} = \sqrt{\sum_i r_i^2\,\mathrm{Var}(\hat\beta_i) + 2\sum_{i<j} r_i r_j\,\mathrm{Cov}(\hat\beta_i, \hat\beta_j)}.$$
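A sketch of this test for the illustrative choice r = (0, 1, 1) and β∗ = 0, i.e. H0: β1 + β2 = 0; se(rᵀβ̂) is computed as √(rᵀ·Cov(β̂)·r) with Cov(β̂) estimated by s²(XᵀX)⁻¹, which is the variance rule above in matrix form.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n), rng.uniform(60, 100, n)])
Y = X @ np.array([3.0, 1.4, 0.25]) + rng.normal(0.0, 2.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
s2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p - 1)
cov_beta = s2 * XtX_inv                 # estimated covariance matrix of beta_hat

r = np.array([0.0, 1.0, 1.0])           # (r0, r1, r2): test H0: beta1 + beta2 = 0
beta_star = 0.0
se_r = np.sqrt(r @ cov_beta @ r)        # se(r'beta_hat) via the variance rule
t = (r @ beta_hat - beta_star) / se_r
p_value = 2 * stats.t.sf(abs(t), n - p - 1)
print(t, p_value)
```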
Interval estimation
Confidence interval for βi
The interval estimate for βi with confidence level 1 − α is

$$\big(\hat\beta_i - t_{\alpha/2}(n-p-1)\,se(\hat\beta_i),\ \hat\beta_i + t_{\alpha/2}(n-p-1)\,se(\hat\beta_i)\big).$$

Example 7. From the result of Example 1, we want to find the 95% confidence interval for β1 (the coefficient of Experience).
The margin of error is

$$t_{\alpha/2}(n-p-1)\,se(\hat\beta_1) = t_{0.025}(17)\,se(\hat\beta_1) = 2.11 \cdot 0.1985669 = 0.419.$$

The 95% confidence interval for β1 is

$$(1.4039 - 0.419,\ 1.4039 + 0.419) = (0.9849,\ 1.8229).$$

Note: this confidence interval has already been computed by STATA.
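The interval of Example 7 can be re-derived in a few lines; the only inputs are β̂1 and se(β̂1) as reported on the slides, with the critical value taken from scipy's t distribution.

```python
from scipy import stats

# beta1_hat and its standard error, from Example 7 on the slides
n, p = 20, 2
beta1_hat, se_beta1 = 1.403902, 0.1985669

t_crit = stats.t.ppf(0.975, n - p - 1)   # t_0.025(17), about 2.11
margin = t_crit * se_beta1               # about 0.419
ci = (beta1_hat - margin, beta1_hat + margin)
print(ci)                                # about (0.985, 1.823)
```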


Prediction
Let

$$X_0 = \begin{pmatrix} 1 \\ x_{10} \\ x_{20} \\ \vdots \\ x_{p0} \end{pmatrix};$$

we want to predict the value y0 of y at X0.
• Point estimate of y0:

ŷ0 = β̂0 + β̂1 x10 + β̂2 x20 + · · · + β̂p xp0.

• Interval estimate of y0:

$$\big(\hat y_0 - t_{\alpha/2}(n-p-1)\,se(\hat y_0),\ \hat y_0 + t_{\alpha/2}(n-p-1)\,se(\hat y_0)\big),$$

where $se(\hat y_0) = \sqrt{\sigma^2_{\hat y_0}}$ and $\sigma^2_{\hat y_0} \approx s^2\, X_0^T (X^T X)^{-1} X_0$.
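A sketch of the point and interval estimates at an assumed new point X0 = (1, 5, 80)ᵀ, using the slide's formula for se(ŷ0); the data and X0 are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n), rng.uniform(60, 100, n)])
Y = X @ np.array([3.0, 1.4, 0.25]) + rng.normal(0.0, 2.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
s2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p - 1)

x0 = np.array([1.0, 5.0, 80.0])              # (1, x10, x20), an assumed new point
y0_hat = x0 @ beta_hat                       # point estimate of y0
se_y0 = np.sqrt(s2 * (x0 @ XtX_inv @ x0))    # se(y0_hat) from the slide's formula
t_crit = stats.t.ppf(0.975, n - p - 1)
interval = (y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0)
print(y0_hat, interval)
```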
Information criteria
• The adjusted multiple coefficient of determination

$$\bar R^2 := 1 - (1 - R^2)\,\frac{n-1}{n-p-1} = 1 - \frac{ESS/(n-p-1)}{TSS/(n-1)}$$

(higher is better).
• Akaike information criterion

$$AIC = \frac{ESS}{n}\, e^{2(p+1)/n}$$

(smaller is better).
• Schwarz information criterion (BIC/SC)

$$BIC = \frac{ESS}{n}\, n^{(p+1)/n}$$

(smaller is better).
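A small sketch of AIC and BIC in exactly the multiplicative form given above (note that STATA's estat ic reports log-likelihood-based versions instead, so the numbers differ); the ESS values below are hypothetical.

```python
import math

def aic(ess: float, n: int, p: int) -> float:
    # AIC = (ESS / n) * e^(2(p+1)/n), as on the slide
    return (ess / n) * math.exp(2 * (p + 1) / n)

def bic(ess: float, n: int, p: int) -> float:
    # BIC = (ESS / n) * n^((p+1)/n), as on the slide
    return (ess / n) * n ** ((p + 1) / n)

# Hypothetical ESS values for two nested models fit to the same n = 20 points:
# the smaller model wins if its AIC/BIC are lower despite the larger ESS
print(aic(99.46, n=20, p=2), bic(99.46, n=20, p=2))
print(aic(105.30, n=20, p=1), bic(105.30, n=20, p=1))
```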
Information criteria
Example 8. A real estate company investigates the prices of apartments for young families. They use the following regression model:

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

where
• PRICE: price of the apartment (in thousands of dollars),
• SQFT: area (in square feet),
• BEDRMS: number of bedrooms,
• BATHS: number of bathrooms.
Find the best linear model.
Information criteria

(Regression outputs for the candidate models, compared on their information criteria, are not reproduced here.)

Conclusion: we should use model 3 (PRICE = β0 + β1 SQFT ).


Wald test

The Wald test is an extension of the F test, used to test the significance of a group of j independent variables (2 ≤ j ≤ p).
1 Hypotheses:
H0: βp−j+1 = βp−j+2 = · · · = βp = 0,
Ha: One or more of βp−j+1, βp−j+2, . . . , βp are not equal to zero.
2 Test statistic:

$$F = \frac{(ESS_R - ESS_U)/j}{ESS_U/(n-p-1)} = \frac{(R_U^2 - R_R^2)/j}{(1 - R_U^2)/(n-p-1)},$$

where R is the restricted model (βp−j+1 = βp−j+2 = · · · = βp = 0)


and U is the unrestricted model.
3 Rejection Rule: Reject H0 if p-value < α or if F ≥ Fα (j, n − p − 1).
Note: If we choose all j = p independent variables, we obtain the
original F test.
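A sketch of the Wald F statistic computed from the restricted and unrestricted error sums of squares; all numbers are hypothetical, not the output of the PRICE example.

```python
from scipy import stats

def wald_F(ess_r: float, ess_u: float, j: int, n: int, p: int) -> float:
    """F statistic for H0 that j coefficients are jointly zero;
    p counts the regressors of the unrestricted model."""
    return ((ess_r - ess_u) / j) / (ess_u / (n - p - 1))

# Hypothetical sums of squares: dropping j = 2 regressors from a p = 3 model
n, p, j = 20, 3, 2
F = wald_F(ess_r=130.0, ess_u=99.5, j=j, n=n, p=p)
p_value = stats.f.sf(F, j, n - p - 1)
F_crit = stats.f.ppf(0.95, j, n - p - 1)
print(F, p_value, F > F_crit)   # reject H0 when F >= F_crit
```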
Wald test

Example 9. After fitting the model

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

we want to test for the significance of the group BEDRMS, BATHS.

We cannot reject the null hypothesis H0: β2 = β3 = 0 at the 5% significance level. This confirms that we should use the model PRICE = β0 + β1 SQFT + ε.
Wald test
We can also test a linear combination of several independent variables.

Example 10. After fitting the model

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

we want to test the hypotheses

H0: β2 + β3 = 0,
Ha: β2 + β3 ≠ 0.

We cannot reject the null hypothesis at the 5% significance level.
