C2 English
Multiple Regression
(Course: Econometrics)
Phuong Le
3. Model selection
• Information criteria
• Wald test
Introduction
Multiple regression model
The equation that describes how the dependent variable y is related
to p independent variables x1 , x2 , . . . , xp and an error term ε is:
y = β0 + β1 x1 + β2 x2 + ... + βp xp + ε,
where
• β0 , β1 , . . . , βp are the parameters (there are k = p + 1 parameters),
• ε is a random variable called the error term.
The equation for the i-th observation of the population:
yi = β0 + β1 x1i + β2 x2i + ... + βp xpi + εi , i = 1, 2, . . . , N.
The expected value of y (the population regression function):
E(y) = β0 + β1 x1 + β2 x2 + ... + βp xp .
Introduction
Matrix representation
Y = X β + ε,
where
X = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{p1} \\ 1 & x_{12} & x_{22} & \cdots & x_{p2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{pn} \end{pmatrix}, \quad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
Matrix representation of the estimated model
Y = X β̂ + e,
where
\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_p \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix},
with n being the sample size and ei = yi − ŷi .
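To make the matrix representation concrete, here is a minimal numerical sketch in Python/NumPy (outside the course's STATA workflow; the data are simulated and every name is illustrative). It builds the design matrix X, computes the least-squares estimate β̂ = (XᵀX)⁻¹XᵀY, and forms the fitted values ŷ = Xβ̂ and the residuals e = Y − Xβ̂.
Python code (illustrative):
import numpy as np

rng = np.random.default_rng(0)
n = 50                               # sample size (illustrative)

# Simulated regressors and error term (assumed data, for illustration only)
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)
y = 3 + 1.5 * x1 + 0.8 * x2 + eps    # true parameters (3, 1.5, 0.8)

# Design matrix X: a column of ones, then x1 and x2
X = np.column_stack([np.ones(n), x1, x2])

# OLS estimate: beta_hat = (X'X)^{-1} X'y (the least-squares solution)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat                 # fitted values
e = y - y_hat                        # residuals e_i = y_i - yhat_i

print("beta_hat:", beta_hat)
print("sum of residuals (should be ~0):", e.sum())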
Some multiple regression functions
ln yi = β0 + β1 ln x1i + β2 ln x2i + εi .
yi = β0 + β1 xi + β2 xi² + εi .
This is a multiple regression model in y, x and x²; it is still linear in the parameters.
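Both functional forms above reduce to ordinary multiple regression once the transformed regressors are constructed. A minimal sketch of the corresponding design matrices, with simulated, purely illustrative data:
Python code (illustrative):
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(1, 10, n)            # strictly positive, so ln(x) is defined
x1, x2 = x, rng.uniform(1, 10, n)

# Log-log model: ln y = b0 + b1 ln x1 + b2 ln x2 + e  -> regress ln y on ln x1, ln x2
X_loglog = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])

# Quadratic model: y = b0 + b1 x + b2 x^2 + e  -> a multiple regression on x and x^2
X_quad = np.column_stack([np.ones(n), x, x ** 2])

print(X_loglog.shape, X_quad.shape)  # both are n x (p + 1) design matrices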
Least squares method
Least squares criterion
Σ eᵢ² = Σ (yᵢ − ŷᵢ)² = Σ (yᵢ − β̂0 − β̂1 x1i − β̂2 x2i − · · · − β̂p xpi)² → min .
Example. Consider the regression model
Salary = β0 + β1 Experience + β2 TestScore + ε,
where
• Salary: annual salary ($1000s),
• Experience: years of experience,
• TestScore: score on programmer aptitude test.
Least squares method
STATA code: regress Salary Experience TestScore
Result: [STATA regression output]
The multiple coefficient of determination:
R² = RSS / TSS .
The adjusted multiple coefficient of determination:
R²a = R̄² := 1 − (1 − R²) · (n − 1)/(n − p − 1).
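A small numerical sketch of these two measures, computed from the sums of squares on simulated (illustrative) data; RSS denotes the regression sum of squares and ESS the error sum of squares, following the notation of these slides:
Python code (illustrative):
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

TSS = np.sum((y - y.mean()) ** 2)       # total sum of squares
RSS = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
ESS = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares

R2 = RSS / TSS
R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
print(R2, R2_adj, np.isclose(TSS, RSS + ESS))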
Assume Var(ε | x1 , . . . , xp ) = σ² Iₙ . Then Var(β̂) = σ² (Xᵀ X)⁻¹ .
Let c_ii be the entry in cell (i, i) of the matrix (Xᵀ X)⁻¹ ; then σ²_β̂i = σ² c_ii . We estimate σ² by
s² = MSE = ESS/(n − p − 1),
hence we can estimate σ_β̂i by
se(β̂i) = s √c_ii .
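A sketch of the standard-error formula se(β̂i) = s √c_ii on simulated data (all names and numbers are illustrative):
Python code (illustrative):
import numpy as np

rng = np.random.default_rng(3)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, 2.0]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

s2 = (e @ e) / (n - p - 1)                 # s^2 = MSE = ESS / (n - p - 1)
c = np.diag(np.linalg.inv(X.T @ X))        # c_ii, the diagonal of (X'X)^{-1}
se_beta = np.sqrt(s2 * c)                  # se(beta_hat_i) = s * sqrt(c_ii)

print("beta_hat:", beta_hat)
print("se(beta_hat):", se_beta)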
Example. The t statistic for testing H0: β2 = 0 is
t = β̂2 / se(β̂2) = .2508854 / .0773541 = 3.24.
The t statistic for testing H0: β1 = 1 is
t = (β̂1 − 1) / se(β̂1) = (1.403902 − 1) / 0.1985669 = 2.03.
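Both statistics follow the general recipe t = (β̂j − βj*) / se(β̂j), where βj* is the hypothesized value (0 in the first test, 1 in the second). The short check below reproduces the arithmetic from the reported estimates; the degrees of freedom used for the critical value are an assumed placeholder, since the example's sample size is not restated on this slide:
Python code (illustrative):
from scipy import stats

# Estimates and standard errors as reported on the slide
beta2_hat, se_beta2 = 0.2508854, 0.0773541
beta1_hat, se_beta1 = 1.403902, 0.1985669

t2 = (beta2_hat - 0) / se_beta2            # H0: beta2 = 0
t1 = (beta1_hat - 1) / se_beta1            # H0: beta1 = 1
print(round(t2, 2), round(t1, 2))          # 3.24 and 2.03, matching the slide

# Two-sided critical value t_{alpha/2}(n - p - 1); the degrees of freedom below
# are a placeholder, since the example's sample size is not given here.
df = 17                                    # assumed value, for illustration only
print(stats.t.ppf(1 - 0.05 / 2, df))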
Prediction. Let X0 = (1, x10 , x20 , . . . , xp0 )ᵀ denote a new observation of the regressors.
• Point estimate for y0 : ŷ0 = β̂0 + β̂1 x10 + β̂2 x20 + · · · + β̂p xp0 = X0ᵀ β̂ .
• Interval estimate for the mean of y0 :
( ŷ0 − t_α/2 (n − p − 1) se(ŷ0 ), ŷ0 + t_α/2 (n − p − 1) se(ŷ0 ) ),
where se(ŷ0 ) = s √( X0ᵀ (Xᵀ X)⁻¹ X0 ).
• Interval estimate for the individual value of y0 (prediction interval):
( ŷ0 − t_α/2 (n − p − 1) se(y0 − ŷ0 ), ŷ0 + t_α/2 (n − p − 1) se(y0 − ŷ0 ) ),
where se(y0 − ŷ0 ) = s √( 1 + X0ᵀ (Xᵀ X)⁻¹ X0 ) = √( s² + (se(ŷ0 ))² ).
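A sketch of both interval formulas at a new point X0, on simulated data (all values are illustrative, not from the course example):
Python code (illustrative):
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, alpha = 40, 2, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s = np.sqrt((e @ e) / (n - p - 1))
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 5.0, 3.0])                 # new observation X0 (leading 1 included)
y0_hat = x0 @ beta_hat                         # point estimate
se_mean = s * np.sqrt(x0 @ XtX_inv @ x0)       # se(yhat0), for the mean of y0
se_pred = s * np.sqrt(1 + x0 @ XtX_inv @ x0)   # se(y0 - yhat0), for an individual y0

t_crit = stats.t.ppf(1 - alpha / 2, n - p - 1)
print("CI for E(y0):", (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean))
print("PI for y0:   ", (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred))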
Information criteria
• The adjusted multiple coefficient of determination
R²a = R̄² := 1 − (1 − R²) · (n − 1)/(n − p − 1) = 1 − [ESS/(n − p − 1)] / [TSS/(n − 1)]
(higher is better).
• Akaike information criterion
AIC = (ESS/n) · e^{2(p+1)/n}
(smaller is better).
• Schwarz information criterion (BIC/SC)
BIC = (ESS/n) · n^{(p+1)/n}
(smaller is better).
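A sketch that computes R²a, AIC and BIC exactly as defined above and uses them to compare two candidate models (simulated data; note that these are the ESS-based forms on these slides, not the log-likelihood-based definitions that some software reports):
Python code (illustrative):
import numpy as np

def model_selection_stats(X, y):
    """Adjusted R^2, AIC and BIC in the ESS-based forms defined on these slides."""
    n, k = X.shape                             # k = p + 1 parameters
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    ESS = e @ e
    TSS = np.sum((y - y.mean()) ** 2)
    R2 = 1 - ESS / TSS
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)
    AIC = (ESS / n) * np.exp(2 * k / n)
    BIC = (ESS / n) * n ** (k / n)
    return R2_adj, AIC, BIC

# Compare two candidate models on simulated data (x2 is irrelevant by construction)
rng = np.random.default_rng(5)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
print("small model:", model_selection_stats(X_small, y))
print("big model:  ", model_selection_stats(X_big, y))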
Information criteria
Example 8. A real estate company investigates the prices of
apartments for young families. They use the following regression
model:
PRICE = β0 + β1 SQFT + β2 BEDRMS + β3 BATHS + ε,
where
• PRICE: price of the apartment (in thousands of dollars),
• SQFT: area (in square feet),
• BEDRMS: number of bedrooms,
• BATHS: number of bathrooms.
Find the best linear model.