
Chapter 2.

Multiple Regression
(Course: Econometrics)

Phuong Le

Faculty of Economic Mathematics


University of Economics and Law
Vietnam National University, Ho Chi Minh City
Content

1 Multiple regression model


Introduction
Least squares method
Multiple coefficient of determination

2 Hypothesis testing and interval estimation


Testing for significance
Interval estimation

3 Model selection
Information criteria
Wald test
Introduction
Multiple regression model
The equation that describes how the dependent variable y is related
to p independent variables x1 , x2 , . . . , xp and an error term ε is:

y = β0 + β1 x1 + β2 x2 + ... + βp xp + ε,

where
• β0, β1, . . . , βp are the parameters (there are k = p + 1 parameters in total),
• ε is a random variable called the error term.
The equation for i-th observation of the population:
yi = β0 + β1 x1i + β2 x2i + ... + βp xpi + εi , i = 1, 2, . . . , N.

Multiple Regression Equation


The equation that describes how the mean value of y is related to
x1 , x2 , . . . , xp is:

E(y) = β0 + β1 x1 + β2 x2 + ... + βp xp .
Introduction
Matrix representation

Y = Xβ + ε,

where X is the N × (p + 1) matrix whose j-th row is (1, x1j, x2j, . . . , xpj), and

Y = (y1, y2, . . . , yN)ᵀ,  β = (β0, β1, . . . , βp)ᵀ,  ε = (ε1, ε2, . . . , εN)ᵀ.

Note: (x1j, x2j, . . . , xpj, yj) is the j-th observation for j = 1, 2, . . . , N (N is the population size).
Introduction
Estimated Multiple Regression Equation

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂p xp .


A simple random sample is used to compute sample statistics
β̂0 , β̂1 , . . . , β̂p that are used as the point estimators of the parameters
β0 , β1 , . . . , βp .

Matrix representation

Y = X β̂ + e,

where β̂ = (β̂0, β̂1, . . . , β̂p)ᵀ and e = (e1, e2, . . . , en)ᵀ, with n being the sample size and ei = yi − ŷi.
Some multiple regression functions

Cobb-Douglas production functions

Cobb-Douglas production functions are represented as

yi = β0 · x1i^β1 · x2i^β2 · e^(εi),

where yi: production, x1i: capital, x2i: labor, εi: error.

Taking logarithms, the Cobb-Douglas production function can be transformed to

ln yi = ln β0 + β1 ln x1i + β2 ln x2i + εi.

This is a multiple regression model for ln y, ln x1 and ln x2 (with intercept ln β0).

Quadratic regression function

yi = β0 + β1 xi + β2 xi² + εi.

This is a multiple regression model for y, x and x².
Least squares method
Least squares criterion

Σ ei² = Σ (yi − ŷi)² = Σ (yi − β̂0 − β̂1 x1i − β̂2 x2i − · · · − β̂p xpi)² → min.

Solving this minimization problem, we get the OLS formula

β̂ = (XᵀX)⁻¹ XᵀY.

Computation of coefficient values


• The formulas for the regression coefficients β̂0 , β̂1 , . . . , β̂p involve
the use of matrix algebra. We will rely on computer software
packages to perform the calculations.
• The emphasis will be on how to interpret the computer output
rather than on how to make the multiple regression
computations.
Least squares method

Example 1. A software firm collected data for a sample of 20


computer programmers. A suggestion was made that regression
analysis could be used to determine if salary was related to the years
of experience and the score on the firm’s programmer aptitude test.

The years of experience, score on the aptitude test, and


corresponding annual salary ($1000s) for a sample of 20
programmers is shown on the next slide.
Least squares method

Exper.  Test   Salary     Exper.  Test   Salary
(Yrs.)  Score  ($1000s)   (Yrs.)  Score  ($1000s)
4 78 24.0 9 88 38.0
7 100 43.0 2 73 26.6
1 86 23.7 10 75 36.2
5 82 34.3 5 81 31.6
8 86 35.8 6 74 29.0
10 84 38.0 8 87 34.0
0 75 22.2 4 79 30.1
1 80 23.1 6 94 33.9
6 83 30.0 3 70 28.2
6 91 33.0 3 89 30.0
The data table on years of experience, test scores, and salary.
Least squares method

Suppose we believe that salary (Salary) is related to the years of


experience (Experience) and the score on the programmer aptitude
test (TestScore) by the following regression model

Salary = β0 + β1 Experience + β2 TestScore + ε,

where
• Salary: annual salary ($1000s),
• Experience: years of experience,
• TestScore: score on programmer aptitude test.
Least squares method
STATA code: regress Salary Experience TestScore

Result:

Estimated Regression Equation

Salary^ = 3.174 + 1.404 · Experience + 0.251 · TestScore.

(Note: Predicted salary will be in thousands of dollars.)
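For readers who want to reproduce this output, a minimal Stata session is sketched below; the data values are taken from the table above, and the variable names match the regress command already shown.

    clear
    input Experience TestScore Salary
     4  78 24.0
     7 100 43.0
     1  86 23.7
     5  82 34.3
     8  86 35.8
    10  84 38.0
     0  75 22.2
     1  80 23.1
     6  83 30.0
     6  91 33.0
     9  88 38.0
     2  73 26.6
    10  75 36.2
     5  81 31.6
     6  74 29.0
     8  87 34.0
     4  79 30.1
     6  94 33.9
     3  70 28.2
     3  89 30.0
    end
    regress Salary Experience TestScore   // reproduces the estimated equation above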


Least squares method
Interpretation of parameters
In multiple regression analysis, we interpret each regression coefficient as follows: β̂i represents an estimate of the change in y corresponding to a one-unit increase in xi when all other independent variables are held constant.

Example 2. Interpretation of parameters in example 1:

Salary^ = 3.174 + 1.404 · Experience + 0.251 · TestScore.

• Salary is expected to increase by $1,404 for each additional year of experience (when the score on the programmer aptitude test is held constant).
• Salary is expected to increase by $251 for each additional point
scored on the programmer aptitude test (when the variable years
of experience is held constant).
Multiple coefficient of determination
• Total Sum of Squares:

TSS = Σ (yi − ȳ)² = Σ yi² − n ȳ² = YᵀY − n ȳ².

• Sum of Squares due to Regression:

RSS = Σ (ŷi − ȳ)² = β̂ᵀXᵀY − n ȳ².

• Sum of Squares due to Errors:

ESS = Σ (ŷi − yi)² = YᵀY − β̂ᵀXᵀY.

Relationship among TSS, RSS and ESS: TSS = RSS + ESS.

Multiple Coefficient of Determination:

R² = RSS / TSS.

These values can be found in the ANOVA output of the regression result.


Multiple coefficient of determination
Adjusted Multiple Coefficient of Determination
• Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing ESS.
• Because RSS = TSS − ESS, when ESS becomes smaller, RSS becomes larger, causing R² = RSS/TSS to increase.
• The adjusted multiple coefficient of determination compensates for the number of independent variables in the model:

R̄² = 1 − (1 − R²) · (n − 1)/(n − p − 1).

Example 3. From the result of example 1:

R² = RSS/TSS = 500.32853/599.7855 = 0.8342,

R̄² = 1 − (1 − R²) · (n − 1)/(n − p − 1) = 1 − (1 − 0.8342) · (20 − 1)/(20 − 2 − 1) = 0.8147.
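After regress, R² and the adjusted R² are stored by Stata and can be checked directly; a quick sketch, assuming the regression of example 1 is the most recent estimation:

    display e(r2)                                // R-squared, 0.8342
    display e(r2_a)                              // adjusted R-squared, 0.8147
    display 1 - (1 - e(r2))*(e(N)-1)/(e(N)-3)    // manual check, with n - p - 1 = 20 - 2 - 1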
Testing for significance

Assumptions About the Error Term ε


• Assumption 1: The relationship between y and x1 , x2 , . . . , xp is
linear.
• Assumption 2: The errors εi are random variables with zero mean:

E(εi | x1i, . . . , xpi) = 0.

• Assumption 3: The covariance matrix of ε has the form

Var(ε | x1, . . . , xp) = σ² In,

where In is the identity matrix of size n. Equivalently, Var(εi | x1i, . . . , xpi) = σ² and Cov(εi, εj | x1i, . . . , xpi, x1j, . . . , xpj) = 0 for i ≠ j.
• Assumption 4: rank(X) = p + 1. Equivalently, XᵀX is invertible.
Testing for significance

Gauss – Markov Theorem


When Assumptions 1 – 4 are satisfied, the estimates obtained using
the OLS method are the best linear unbiased estimates of the
population regression function (OLS estimates are BLUE).

Assumptions about the errors ε


To perform hypothesis testing and interval estimation in the following
sections, we need an additional assumption (Assumption 5) as
follows:
• Assumption 5: The errors ε follow a multivariate normal
distribution
ε ∼ N(0, σ 2 In ).
In particular, εi ∼ N(0, σ 2 ).
Testing for significance
• In simple linear regression, the F and t tests provide the same
conclusion.
• In multiple regression, the F and t tests have different purposes.
Testing for Significance: F Test
• The F test is used to determine whether a significant relationship
exists between the dependent variable and the set of all the
independent variables.
• The F test is referred to as the test for overall significance.

Testing for Significance: t Test


• If the F test shows overall significance, the t test is used to determine whether each individual independent variable is significant.
• A separate t test is conducted for each of the independent
variables in the model.
• We refer to each of these t tests as a test for individual
significance.
Testing for significance

F Test for Overall Significance


1 Hypotheses:
H0: β1 = β2 = · · · = βp = 0 (R² = 0),
Ha: β1² + β2² + · · · + βp² > 0 (R² > 0).
2 Test statistic:

F = (RSS/p) / (ESS/(n − p − 1)) = ((n − p − 1) R²) / (p (1 − R²)).

3 Rejection rule: Reject H0 if p-value < α or if F ≥ Fα(p, n − p − 1).

Notes: We usually denote MSR = RSS/p and MSE = ESS/(n − p − 1). Hence F = MSR/MSE.
Testing for significance

Example 4. From the result of example 1, we want to test for overall significance of the model at the significance level 0.05.
• Hypotheses:
H0: β1 = β2 = 0,
Ha: β1 ≠ 0 or β2 ≠ 0.
• Test statistic:

F = (RSS/p) / (ESS/(n − p − 1)) = (500.32853/2) / (99.4569697/17) = 42.76.

• Fα(p, n − p − 1) = F0.05(2, 17) = 3.592.
Since F > Fα(p, n − p − 1), we reject H0.
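The critical value and p-value can be obtained with Stata's built-in F distribution functions (a quick check using the numbers above):

    display invFtail(2, 17, 0.05)   // critical value F0.05(2,17) = 3.592
    display Ftail(2, 17, 42.76)     // p-value of the test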
Testing for significance

Estimation of standard errors for coefficients

The covariance matrix of β̂ is

Var(β̂) = σ² (XᵀX)⁻¹.

Let cii be the entry in cell (i, i) of the matrix (XᵀX)⁻¹. Then

Var(β̂i) = σ² cii ≈ s² cii,

where s² = MSE = ESS/(n − p − 1). Hence we can estimate the standard deviation of β̂i by

se(β̂i) = s √cii.

Note: To obtain the covariance matrix of β̂ from STATA, after fitting


the model, type the command:
matrix list e(V)
Testing for significance

t Test for Significance of Individual Parameters


Let b be a given real number.
1 Hypotheses:
H0: βi = b,  Ha: βi ≠ b.
2 Test statistic:

t = (β̂i − b) / se(β̂i).

3 Rejection rule: Reject H0 if p-value < α or if |t| > tα/2(n − p − 1).

Notes:
• The t statistics reported by STATA and other statistical software correspond to the case b = 0.
• For Ha: βi > b, we reject H0 if t > tα(n − p − 1).
• For Ha: βi < b, we reject H0 if t < −tα(n − p − 1).
Testing for significance

Example 5. From the result of example 1, we want to test if TestScore is significant, using the significance level 5%.
• Hypotheses:
H0: β2 = 0,  Ha: β2 ≠ 0.
• Test statistic:

t = β̂2 / se(β̂2) = 0.2508854 / 0.0773541 = 3.24

(this t statistic has already been computed by STATA).
• tα/2(n − p − 1) = t0.025(17) = 2.11.
Since |t| > tα/2(n − p − 1), we reject H0.
Alternatively, we can compare the corresponding p-value computed by STATA (0.005) to α = 5% to conclude that H0 should be rejected.
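Both numbers can be reproduced with Stata's t distribution functions (a quick check, using the 17 residual degrees of freedom):

    display invttail(17, 0.025)   // critical value t0.025(17) = 2.11
    display 2*ttail(17, 3.24)     // two-sided p-value, about 0.005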
Testing for significance

Example 6. From the result of example 1, we want to test if one more year of experience produces $1,000 more in annual salary, using the significance level 5%.
• Hypotheses:
H0: β1 = 1,  Ha: β1 ≠ 1.
• Test statistic:

t = (β̂1 − 1) / se(β̂1) = (1.403902 − 1) / 0.1985669 = 2.03.

• tα/2(n − p − 1) = t0.025(17) = 2.11.
Since |t| < tα/2(n − p − 1), we do not reject H0.
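Stata can run this test in one line after the regression; the reported F statistic is the square of the t statistic above:

    test Experience = 1   // Wald test of H0: coefficient on Experience equals 1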
Testing for significance
t Test for a linear combination of parameters
The same procedure can be used to test a linear combination of parameters

rᵀβ := r0 β0 + r1 β1 + · · · + rp βp

for given (r0, r1, . . . , rp) ∈ R^(p+1) and b ∈ R.
1 Hypotheses:
H0: rᵀβ = b,  Ha: rᵀβ ≠ b.
2 Test statistic:

t = (rᵀβ̂ − b) / se(rᵀβ̂).

3 Rejection rule: Reject H0 if p-value < α or if |t| > tα/2(n − p − 1).

Notes: se(rᵀβ̂) can be computed using the variance rule

se(rᵀβ̂) = √Var(Σi ri β̂i) = √( Σi ri² Var(β̂i) + 2 Σ(i<j) ri rj Cov(β̂i, β̂j) ).
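In Stata, such tests can be run without applying the variance rule by hand, using test or lincom after the regression; the combination below is purely illustrative, not one from example 1:

    lincom Experience - 2*TestScore     // point estimate, se and CI for the combination
    test Experience - 2*TestScore = 0   // the corresponding Wald test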
Interval estimation
Confidence interval for βi
An interval estimate for βi with confidence level 1 − α is

( β̂i − tα/2(n − p − 1) se(β̂i),  β̂i + tα/2(n − p − 1) se(β̂i) ).

Example 7. From the result of example 1, we want to find the 95% confidence interval for β1 (the coefficient of Experience).
The margin of error is

tα/2(n − p − 1) se(β̂1) = t0.025(17) se(β̂1) = 2.11 · 0.1985669 = 0.419.

The 95% confidence interval for β1 is

(β̂1 − 0.419, β̂1 + 0.419) = (1.4039 − 0.419, 1.4039 + 0.419) = (0.9849, 1.8229).

Note: this confidence interval has already been computed by STATA.
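The endpoints can also be recomputed from Stata's stored coefficient and standard error (a minimal sketch, assuming the regression of example 1 is the most recent estimation):

    display _b[Experience] - invttail(e(df_r), 0.025)*_se[Experience]   // lower bound
    display _b[Experience] + invttail(e(df_r), 0.025)*_se[Experience]   // upper bound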


Prediction

Given (x10, . . . , xp0), we want to predict the value y0 of y when (x1, . . . , xp) = (x10, . . . , xp0). Let

X0 = (1, x10, x20, . . . , xp0)ᵀ.

• Point estimate for y0: ŷ0 = β̂0 + β̂1 x10 + β̂2 x20 + · · · + β̂p xp0.
• Interval estimate for the mean of y0:

( ŷ0 − tα/2(n − p − 1) se(ŷ0),  ŷ0 + tα/2(n − p − 1) se(ŷ0) ),

where se(ŷ0) = s √( X0ᵀ (XᵀX)⁻¹ X0 ).
• Interval estimate for the individual value of y0 (prediction interval):

( ŷ0 − tα/2(n − p − 1) se(y0 − ŷ0),  ŷ0 + tα/2(n − p − 1) se(y0 − ŷ0) ),

where se(y0 − ŷ0) = s √( 1 + X0ᵀ (XᵀX)⁻¹ X0 ) = √( s² + (se(ŷ0))² ).
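In Stata, both standard errors are available through predict after the regression (a sketch; stdp is the standard error of the mean prediction, stdf the standard error of the forecast):

    predict yhat, xb        // point predictions
    predict se_mean, stdp   // se(yhat0), for the confidence interval of the mean
    predict se_fcst, stdf   // se(y0 - yhat0), for the prediction interval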
Information criteria
• The adjusted multiple coefficient of determination

R̄² = 1 − (1 − R²) · (n − 1)/(n − p − 1) = 1 − (ESS/(n − p − 1)) / (TSS/(n − 1))

(higher is better).
• Akaike information criterion

AIC = (ESS/n) · e^(2(p+1)/n)

(smaller is better).
• Schwarz information criterion (BIC/SC)

BIC = (ESS/n) · n^((p+1)/n)

(smaller is better).
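Stata reports log-likelihood-based versions of AIC and BIC via estat ic after a regression; they differ from the formulas above by a monotone transformation involving only n, so on a fixed sample both versions rank models the same way:

    estat ic   // reports AIC = -2lnL + 2k and BIC = -2lnL + k*ln(n)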
Information criteria
Example 8. A real estate company investigates the prices of
apartments for young families. They use the following regression
model:

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

where
• PRICE: price of the apartment (in thousands of dollars),
• SQFT: area (in square feet),
• BEDRMS: number of bedrooms,
• BATHS: number of bathrooms.
Find the best linear model.
Information criteria

[Regression output and information criteria for the candidate models not reproduced here.]

Conclusion: we should use model 3 (PRICE = β0 + β1 SQFT + ε).


Wald test

The Wald test is an extension of the F test used to test for the significance of a group of j independent variables (2 ≤ j ≤ p).
1 Hypotheses:
H0: βp−j+1 = βp−j+2 = · · · = βp = 0,
Ha: One or more of βp−j+1, βp−j+2, . . . , βp are not equal to zero.
2 Test statistic:

F = ((ESS_R − ESS_U)/j) / (ESS_U/(n − p − 1)) = ((R²_U − R²_R)/j) / ((1 − R²_U)/(n − p − 1)),

where R denotes the restricted model (with βp−j+1 = βp−j+2 = · · · = βp = 0) and U denotes the unrestricted model.
3 Rejection rule: Reject H0 if p-value < α or if F ≥ Fα(j, n − p − 1).
Note: If we choose all j = p independent variables, we obtain the original F test.
Wald test

Example 9. After fitting the model

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

we want to test for the significance of the group BEDRMS, BATHS.

We cannot reject the null hypothesis H0: β2 = β3 = 0 at the 5% significance level. This confirms that we should use the model PRICE = β0 + β1 SQFT + ε.
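In Stata, this joint test takes one line after fitting the full model (a sketch, assuming the variables are named as in the model):

    regress PRICE SQFT BEDRMS BATHS
    test BEDRMS BATHS   // Wald test of H0: beta2 = beta3 = 0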
Wald test
We can also test a hypothesis about a linear combination of the coefficients of a group of independent variables.

Example 10. After fitting the model

PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,

we want to test for the hypothesis


H0 : β2 + β3 = 0,
Ha : β2 + β3 ̸= 0.

We cannot reject the null hypothesis at the 5% significance level.
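The corresponding Stata command after fitting the full model (a sketch):

    test BEDRMS + BATHS = 0   // Wald test of H0: beta2 + beta3 = 0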
