Econometrics: Multicollinearity

Multicollinearity
! We start by considering the model with k − 1 explanatory variables,

Y = Xβ + e

with E[e | X] = 0, Var[e | X] = σ² In, and

X (n×k) = [x1 x2 … xk] =
  [ 1  x21 … xk1 ]
  [ 1  x22 … xk2 ]
  [ ⋮   ⋮      ⋮  ]
  [ 1  x2n … xkn ]
Perfect Multicollinearity
! When one of the explanatory variables is an exact linear function of the other explanatory variables, we say that there is perfect multicollinearity.
! In this situation, the columns of the regressor matrix X are linearly dependent, so

rank(X) < k ⇒ rank(X'X) < k ⇒ |X'X| = 0

and it is not possible to invert the X'X matrix (that is, X'X is a singular matrix).
! In this case, it is not possible to estimate the regression coefficients using the OLS rule. In fact, there is no unique solution to the normal equations

(X'X)β̂ = X'y
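As a numerical illustration (a minimal numpy sketch with made-up data, not part of the original notes), we can verify that exact linear dependence between the columns of X makes rank(X) < k and |X'X| = 0:

```python
import numpy as np

# Made-up data in which x3 is exactly twice x2, so the columns of X
# are linearly dependent (perfect multicollinearity).
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = 2.0 * x2
X = np.column_stack([np.ones_like(x2), x2, x3])  # n x k with k = 3

print(np.linalg.matrix_rank(X))  # 2, i.e. rank(X) < k
XtX = X.T @ X
print(np.linalg.det(XtX))        # 0: X'X is singular
# np.linalg.inv(XtX) raises LinAlgError here, so the normal equations
# (X'X) beta_hat = X'y cannot be solved uniquely.
```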
Example
! Example: Consider the model

yi = β1 + β2 x2i + β3 x3i + ei

and let us assume that

x3i = λ x2i, with λ ≠ 0.

Substituting in the model above, we obtain

yi = β1 + β2 x2i + β3 (λ x2i) + ei
   = β1 + (β2 + λβ3) x2i + ei
   = β1 + α x2i + ei, with α = β2 + λβ3.
Therefore, although the OLS rule allows us to estimate α uniquely, there is no way to uniquely estimate the impact of x2 on y (β2) or the impact of x3 on y (β3).
Example
! Example: Mathematically,

α̂ = β̂2 + λβ̂3

gives us only one equation in two unknowns (λ is given), and there are infinitely many solutions to this equation for given values of α̂ and λ.

In particular, if α̂ = 0.8 and λ = 2, we have

0.8 = β̂2 + 2β̂3 ⇔ β̂2 = 0.8 − 2β̂3

so there is no unique solution for β̂2.
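A short simulation sketch of this non-uniqueness (numpy; the data are simulated and the intercept of 1 is an arbitrary assumption, while α̂ = 0.8 and λ = 2 come from the example): every pair (β̂2, β̂3) satisfying β̂2 + λβ̂3 = α̂ yields exactly the same fitted values and residuals:

```python
import numpy as np

alpha_hat, lam = 0.8, 2.0
rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
x3 = lam * x2                                   # perfect collinearity
y = 1.0 + alpha_hat * x2 + rng.normal(scale=0.1, size=100)

# Three different (beta2, beta3) pairs on the line beta2 + lam * beta3 = alpha_hat:
for b2 in (0.8, 0.0, -1.2):
    b3 = (alpha_hat - b2) / lam
    ssr = np.sum((y - (1.0 + b2 * x2 + b3 * x3)) ** 2)
    print(b2, b3, ssr)                          # the SSR is identical for all pairs
```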
Near Multicollinearity
■ When there are nearly exact linear relationships between the explanatory variables, we say that there is near or high multicollinearity.
■ In this case, none of the assumptions of the classical linear regression model is violated. In fact, X has k linearly independent columns (rank(X) = k), so the X'X matrix is nonsingular and the matrix (X'X)⁻¹ exists.
■ In the figure below we represent several degrees of
collinearity between x2 and x3. The circles represent the
variations in y (the dependent variable) and in x2 and x3
(the explanatory variables).
[Figure: three diagrams of overlapping circles representing the variation in y, x2, and x3, illustrating no collinearity, low collinearity, and high collinearity between x2 and x3.]
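A rough numpy sketch of near multicollinearity (simulated data; the noise scale 0.05, which sets the degree of collinearity, is an arbitrary choice): rank(X) = k and (X'X)⁻¹ exists, but X'X is close to singular:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)  # nearly, but not exactly, collinear
X = np.column_stack([np.ones(n), x2, x3])

print(np.linalg.matrix_rank(X))     # 3: rank(X) = k, no assumption is violated
print(np.corrcoef(x2, x3)[0, 1])    # correlation close to 1
print(np.linalg.cond(X.T @ X))      # very large condition number: (X'X)^-1
                                    # exists but is numerically unstable
```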
Consequences of Near Multicollinearity
3) The individual t statistics tend not to be statistically significant, because the coefficient estimates have low precision (large standard errors).
4) The outcome described in 3) occurs despite a possibly high R² (the model may still be globally significant).
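A hedged simulation of points 3) and 4) (all numbers are invented; OLS and the standard errors are computed directly from the formulas β̂ = (X'X)⁻¹X'y and Var[β̂] = σ̂²(X'X)⁻¹): with nearly collinear regressors R² is high, yet the slopes have large standard errors and small t ratios:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.02, size=n)         # near multicollinearity
y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])

beta = np.linalg.solve(X.T @ X, X.T @ y)         # OLS via the normal equations
resid = y - X @ beta
s2 = resid @ resid / (n - 3)                     # estimate of sigma^2
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(r2)                # high R^2: the regression fits well overall
print((beta / se)[1:])   # t ratios of the slopes: typically small (not significant)
```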
Remedies for Multicollinearity
1) Dropping a variable from the model. Notice that in dropping a variable from a model we may be committing a specification error, so a variable should be dropped only if we are confident that it is not significant (see the sketch after this list).
2) Combining cross-sectional and time-series data, that is, pooling the data.
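A sketch of remedy 1) on the same kind of simulated data as above (an illustration under assumed values, not the notes' own example): dropping x3 makes the remaining slope precise, but that slope now estimates roughly β2 + β3 rather than β2, which is exactly the specification error the note warns about:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.02, size=n)
y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)

# Remedy 1): drop x3 and regress y on x2 alone.
X_r = np.column_stack([np.ones(n), x2])
beta_r = np.linalg.solve(X_r.T @ X_r, X_r.T @ y)
resid_r = y - X_r @ beta_r
s2_r = resid_r @ resid_r / (n - 2)
se_r = np.sqrt(np.diag(s2_r * np.linalg.inv(X_r.T @ X_r)))

# The slope is now precisely estimated, but (since x3 is nearly equal to x2
# here) it is close to beta2 + beta3 = 2, not beta2 = 1: dropping a relevant
# variable trades variance for bias.
print(beta_r[1], se_r[1])
```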
Is Multicollinearity Necessarily Bad?
■ Despite the difficulties in isolating the effects of individual
variables, if the only purpose of regression analysis is
prediction or forecasting, then multicollinearity is not a
serious problem.
■ In fact, as long as the model has good explanatory power (a high R² and an F statistic indicating that the model is globally significant) and the structure of the collinear relationship remains the same in the new sample observations, accurate forecasts may still be possible.
■ If the objective of the analysis is not only prediction but also to obtain reliable estimates of the individual parameters, serious multicollinearity will be a problem because of the large variances of the OLS estimators.
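Finally, a simulation sketch of this point (all values invented): the individual coefficient estimates are imprecise, yet forecasts for new observations that keep the same collinear relationship between x2 and x3 stay accurate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.02, size=n)        # collinear structure
y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x2, x3])
beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS fit

# New observations with the SAME collinear relationship between x2 and x3:
x2_new = rng.normal(size=5)
x3_new = x2_new + rng.normal(scale=0.02, size=5)
X_new = np.column_stack([np.ones(5), x2_new, x3_new])
y_expected = 1.0 + x2_new + x3_new              # systematic part of y

print(beta)                        # individual slopes may be far from 1
print(X_new @ beta - y_expected)   # forecast errors are small
```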