Econometrics: Multicollinearity

Multicollinearity refers to a near-perfect linear relationship between two or more explanatory variables in a regression model. Near multicollinearity does not violate the assumptions of the classical linear regression model, but it does result in imprecise and unstable estimates with large standard errors. This makes it difficult to determine the individual impact of each variable and results in few or no variables being deemed statistically significant despite a good overall model fit. Polynomial regressions are also susceptible to multicollinearity between the higher-order terms. Multicollinearity can be identified through high correlation coefficients between variables and through a significant overall model fit combined with insignificant individual variables.


Econometrics

Multicollinearity

Multicollinearity
■ We start by considering the model with k − 1 explanatory variables,
Y = Xβ + e
with E[e | X] = 0, Var[e | X] = σ²Iₙ, and
X (n×k) = [x1 x2 … xk], where
x1 = (1, 1, …, 1)',  x2 = (x21, x22, …, x2n)',  …,  xk = (xk1, xk2, …, xkn)'

Econometrics Patrícia Cruz 8-2


Perfect Multicollinearity
■ Originally the term multicollinearity meant the existence
of a "perfect", or exact, linear relationship among
some of the explanatory variables of the model.

■ For example, in our model with k regressors (including the
constant term), we say that there is an exact linear relationship
between them if the following condition is satisfied:
λ1x1 + λ2x2 + … + λkxk = 0
where λ1, λ2, …, λk are constants such that at least one of
them is different from zero. That is, at least one of the
explanatory variables can be expressed as a linear
combination of the other variables (perfect multicollinearity).
Econometrics Patrícia Cruz 8-3

Perfect Multicollinearity
■ In the situation just described, we say that there is
perfect multicollinearity.
■ In this situation, the columns of the regressor matrix, X,
are linearly dependent, so
rank(X) < k ⇒ rank(X'X) < k ⇒ |X'X| = 0
and it is not possible to invert the X'X matrix (that is,
X'X is a singular matrix).
■ In this case, it is not possible to estimate the regression
coefficients using the OLS rule. In fact, there is no
unique solution to the normal equations
(X'X)β̂ = X'y
Econometrics Patrícia Cruz 8-4
Example
■ Example: Consider the model
yi = β1 + β2x2i + β3x3i + ei
and let us assume that
x3i = λx2i,  with λ ≠ 0.
Substituting in the model above, we obtain
yi = β1 + β2x2i + β3(λx2i) + ei
   = β1 + (β2 + λβ3)x2i + ei
   = β1 + αx2i + ei,  with α = β2 + λβ3.
Therefore, although the OLS rule allows us to estimate
α uniquely, there is no way to estimate the impact of x2
on y (β2) and the impact of x3 on y (β3) uniquely.
Econometrics Patrícia Cruz 8-5

Example
■ Example: Mathematically,
α̂ = β̂2 + λβ̂3
gives us only one equation in two unknowns (λ is given),
and there is an infinity of solutions to this equation for
given values of α̂ and λ.

In particular, if α̂ = 0.8 and λ = 2, we have
0.8 = β̂2 + 2β̂3 ⇔ β̂2 = 0.8 − 2β̂3
so there is no unique solution for β̂2.

Econometrics Patrícia Cruz 8-6
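The following is a minimal NumPy sketch of the example above (simulated data; λ = 2 as in the numerical example): with x3 an exact multiple of x2, X loses rank, X'X becomes singular, and the normal equations no longer have a unique solution.

```python
import numpy as np

# Perfect multicollinearity: x3 is an exact multiple of x2 (lambda = 2),
# so the columns of X are linearly dependent.
rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2                               # x3 = lambda * x2 with lambda = 2
X = np.column_stack([np.ones(n), x2, x3])
y = 1.0 + 0.8 * x2 + rng.normal(size=n)     # illustrative data-generating process

print(np.linalg.matrix_rank(X))             # 2 < k = 3
print(np.linalg.det(X.T @ X))               # |X'X| = 0 (up to floating-point rounding)

# The normal equations (X'X) b = X'y have infinitely many solutions;
# lstsq returns just one of them (the minimum-norm solution).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)                                    # beta2 and beta3 are not separately identified
```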


Near Multicollinearity
■ A different situation arises when the explanatory variables
are strongly correlated but not perfectly so, as follows:
λ1x1 + λ2x2 + … + λkxk + v = 0
where v is a vector of random errors.
■ If we assume, for example, that λ2 ≠ 0, we can write the
equation above as
x2 = −(λ1/λ2)x1 − (λ3/λ2)x3 − … − (λk/λ2)xk − (1/λ2)v
which shows that x2 is not an exact linear combination of
the other explanatory variables because it is also
determined by the error term v.
Econometrics Patrícia Cruz 8-7

Near Multicollinearity
■ When there are nearly exact linear relationships between
the explanatory variables we say that there is near or
high multicollinearity.
■ In this case none of the assumptions of the classical
linear regression model is violated. In fact, X has
k linearly independent columns (rank(X) = k), so the X'X
matrix is nonsingular and the matrix (X'X)⁻¹ exists.
■ In the figure below we represent several degrees of
collinearity between x2 and x3. The circles represent the
variations in y (the dependent variable) and in x2 and x3
(the explanatory variables).

Econometrics Patrícia Cruz 8-8
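A quick numerical check of this point (simulated data; variable names are illustrative): under near multicollinearity X keeps full column rank and X'X remains invertible, but it is close to singular.

```python
import numpy as np

# Near (not perfect) multicollinearity: x3 is strongly but not exactly
# related to x2, so rank(X) = k and X'X is still invertible,
# it is simply close to singular (very large condition number).
rng = np.random.default_rng(6)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.01 * rng.normal(size=n)          # near-exact linear relation
X = np.column_stack([np.ones(n), x2, x3])

print(np.linalg.matrix_rank(X))              # 3 = k: no assumption is violated
print(np.linalg.cond(X.T @ X))               # huge: X'X is nearly singular
XtX_inv = np.linalg.inv(X.T @ X)             # the inverse still exists
print(np.diag(XtX_inv))                      # but some diagonal elements are very large
```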


Multicollinearity

[Figure: three diagrams of overlapping circles representing the variation in y, x2 and x3, illustrating no collinearity, low collinearity and high collinearity between x2 and x3.]
Econometrics Patrícia Cruz 8-9

Statistical Consequences of Near Multicollinearity
■ Statistical consequences of near multicollinearity

1) Although BLUE, the OLS estimators have large variances and
covariances. In fact, when there are nearly exact dependencies
among the explanatory variables, some elements of (X'X)⁻¹ will be
large and so some elements of Var(β̂ | X) = σ²(X'X)⁻¹ will be large.

2) Large standard errors for the OLS estimators imply that
confidence intervals tend to be much wider and that the
information provided by the sample data about the unknown
parameters is relatively imprecise.

Econometrics Patrícia Cruz 8-10
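To illustrate consequence 1), the sketch below (simulated data, illustrative names) computes σ²(X'X)⁻¹ for two regressors with increasing correlation; the variances of β̂2 and β̂3 explode as the correlation approaches 1.

```python
import numpy as np

# As corr(x2, x3) approaches 1, the diagonal of sigma^2 (X'X)^-1 blows up,
# i.e. Var(beta2_hat) and Var(beta3_hat) become very large.
rng = np.random.default_rng(1)
n, sigma2 = 200, 1.0
x2 = rng.normal(size=n)

for rho in (0.0, 0.5, 0.9, 0.99, 0.999):
    # build x3 with (approximately) the chosen correlation with x2
    x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    V = sigma2 * np.linalg.inv(X.T @ X)      # Var(beta_hat | X) = sigma^2 (X'X)^-1
    print(f"rho={rho:6.3f}  Var(b2)={V[1, 1]:.4f}  Var(b3)={V[2, 2]:.4f}")
```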


[Margin note on consequence 3): for each coefficient we test H0: βj = 0 against Ha: βj ≠ 0 with t = β̂j / se(β̂j).]
Statistical Consequences of Near Multicollinearity
3) Because of the large estimated standard errors, it is likely
that the usual t tests will lead to the conclusion that the
parameter values are not significantly different from zero
(we fail to reject H0, so the variables appear individually
insignificant).

4) The outcome described in 3) occurs despite possibly high
R² or F values indicating that the model is globally
significant. In fact, collinear variables do not provide
enough information to estimate their separate effects, even
though economic theory, and their total effect, may
indicate their importance in the relationship.
5) The OLS estimators may be very sensitive to the addition
or deletion of a few observations, or the deletion of an
apparently insignificant variable.
Econometrics Patrícia Cruz 8-11
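A small simulation illustrating consequences 3) and 4); the data-generating process and parameter values below are purely illustrative. The regression fits very well overall, yet the individual t ratios on the collinear slopes are small.

```python
import numpy as np

# Two highly collinear regressors: high R^2 and a huge F statistic,
# but small individual t ratios for the slope coefficients.
rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)          # near-exact linear relation with x2
y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # OLS estimates
resid = y - X @ b
s2 = resid @ resid / (n - k)                 # estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))          # standard errors
t = b / se                                   # individual t ratios

tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - resid @ resid / tss
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))    # overall significance test

print("t ratios :", np.round(t, 2))          # slope t's typically below 2 in absolute value
print("R^2      :", round(r2, 3), "  F :", round(F, 1))
```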


Multicollinearity in Polynomial Regressions

■ Multicollinearity, as we have defined it, refers only to
linear relationships among the explanatory variables. It
does not rule out nonlinear relationships among them.
■ For example, in polynomial regressions the explanatory
variable(s) appear with various powers. These terms are
going to be correlated, making it difficult to estimate the
various slope coefficients with precision (although the
assumption of no multicollinearity is not violated).

Econometrics Patrícia Cruz 8-12
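A quick numerical check of this point, with simulated data: for a positive regressor, x and x² are almost perfectly correlated; centering x first, a common way to attenuate (not eliminate) the problem, reduces the correlation sharply.

```python
import numpy as np

# Powers of the same regressor are themselves highly correlated.
# Centering the variable before squaring it usually reduces this correlation.
rng = np.random.default_rng(3)
x = rng.uniform(10, 20, size=500)            # a positive regressor, e.g. age or income

print(np.corrcoef(x, x**2)[0, 1])            # close to 1: x and x^2 nearly collinear
xc = x - x.mean()
print(np.corrcoef(xc, xc**2)[0, 1])          # much smaller after centering
```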


Identifying Multicollinearity
■ Identifying multicollinearity
1) High R2 and the F statistic indicating that the model is
globally significant, but the individual t tests showing that
none or very few of the partial slope coefficients are
statistically different from zero.

2) High sample correlation coefficients between pairs of
explanatory variables. A commonly used rule is that a
correlation coefficient greater than 0.8 indicates a strong
linear association and a potentially harmful collinear
relationship.

Econometrics Patrícia Cruz 8-13


Identifying Multicollinearity
Notice, however, that high correlation coefficients are a
sufficient but not a necessary condition for the existence of
multicollinearity, which can exist even when the correlation
coefficients are relatively low (of course, if there are only
two explanatory variables this measure suffices).
3) In order to find out which explanatory variable is related to
the other explanatory variables, we can regress each xj on the
remaining explanatory variables and compute the
corresponding R², which we can designate as Rj². Each of
these regressions is called an auxiliary regression. A rule
of thumb is that multicollinearity may be a problem only if
Rj² is greater than the R² of the initial model.
Econometrics Patrícia Cruz 8-14
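The sketch below implements the auxiliary regressions of point 3) with NumPy on simulated data (variable names are illustrative); it also reports the closely related variance inflation factor, VIFj = 1 / (1 − Rj²).

```python
import numpy as np

# Auxiliary regressions: regress each explanatory variable on the others
# and record R_j^2.  VIF_j = 1 / (1 - R_j^2) is the variance inflation factor.
rng = np.random.default_rng(4)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.3 * rng.normal(size=n)    # x3 strongly related to x2
x4 = rng.normal(size=n)                      # unrelated regressor
X = np.column_stack([np.ones(n), x2, x3, x4])
names = ["const", "x2", "x3", "x4"]

for j in range(1, X.shape[1]):               # skip the constant
    xj = X[:, j]
    Z = np.delete(X, j, axis=1)              # all other columns (incl. the constant)
    coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
    resid = xj - Z @ coef
    r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
    print(f"{names[j]}: R_j^2 = {r2_j:.3f}, VIF = {1 / (1 - r2_j):.1f}")
```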
Solutions to Collinear Data

1) Using a priori information, that is, introducing
nonsample information in the form of linear
restrictions on the parameters. This a priori
information could come from previous empirical work
in which the collinearity problem is less serious,
or from economic theory.
[Margin example: for a production function ln yi = β1 + β2 ln x2i + β3 ln x3i + ui, constant returns to scale imply β2 + β3 = 1; substituting β3 = 1 − β2 gives ln(yi / x3i) = β1 + β2 ln(x2i / x3i) + ui, a regression with a new dependent variable and a single regressor.]

2) Combining cross-sectional and time-series data, or
pooling the data.
[Margin example: in the time-series model ln yt = β1 + β2 ln Pt + β3 ln It + et, the coefficient β3 can first be estimated from cross-sectional data as β̂3 and then imposed, regressing the new dependent variable ln yt* = ln yt − β̂3 ln It on ln Pt.]

3) Dropping an explanatory variable(s) from the
model. Notice that in dropping a variable from a
model we may be committing a specification error.
[Margin note: only do this if you are sure the variable is not significant.]

Econometrics Patrícia Cruz 8-15
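A minimal sketch of solution 1), assuming the constant-returns example in the margin note above: the restriction β2 + β3 = 1 is imposed by substitution, leaving a single regressor. Data and variable names are simulated and illustrative.

```python
import numpy as np

# Impose the a-priori restriction beta2 + beta3 = 1: substitute
# beta3 = 1 - beta2 and estimate (lnY - lnL) = beta1 + beta2 (lnK - lnL) + e,
# a model with one regressor and hence no collinearity problem.
rng = np.random.default_rng(5)
n = 100
lnK = rng.normal(size=n)
lnL = 0.9 * lnK + 0.2 * rng.normal(size=n)                     # capital and labour move together
lnY = 0.5 + 0.6 * lnK + 0.4 * lnL + 0.1 * rng.normal(size=n)   # true beta2 + beta3 = 1

y_star = lnY - lnL                            # new dependent variable
x_star = lnK - lnL                            # new regressor
Z = np.column_stack([np.ones(n), x_star])
(b1, b2), *_ = np.linalg.lstsq(Z, y_star, rcond=None)
b3 = 1.0 - b2                                 # recovered from the restriction
print(f"beta1 = {b1:.3f}, beta2 = {b2:.3f}, beta3 = {b3:.3f}")
```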

Solutions to Collinear Data

4) Transformation of the variables in the initial model.


5) Additional or new data. Since multicollinearity is a
sample feature, it is possible that in another sample
involving the same variables collinearity may not be
as serious as in the first sample. Sometimes simply
increasing the size of the sample may solve (or at least
attenuate) the collinearity problem.

Econometrics Patrícia Cruz 8-16
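A small numerical illustration of solution 5), with simulated data: holding the degree of collinearity fixed, a larger sample shrinks the standard errors implied by σ²(X'X)⁻¹.

```python
import numpy as np

# Even with the same degree of collinearity, more observations make the
# elements of X'X larger, so sigma^2 (X'X)^-1 and the standard errors shrink.
rng = np.random.default_rng(7)
sigma2, rho = 1.0, 0.99

for n in (50, 500, 5000):
    x2 = rng.normal(size=n)
    x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    V = sigma2 * np.linalg.inv(X.T @ X)
    print(f"n={n:5d}  se(b2)={np.sqrt(V[1, 1]):.3f}  se(b3)={np.sqrt(V[2, 2]):.3f}")
```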

[Margin example for solution 4): the model yt = β1 + β2 x2t + β3 x3t + et, in which x2 and x3 are highly correlated, is re-estimated after transforming the variables, for example by taking first differences of y, x2 and x3.]
Is Multicollinearity Necessarily Bad?
■ Despite the difficulties in isolating the effects of individual
variables, if the only purpose of regression analysis is
prediction or forecasting, then multicollinearity is not a
serious problem.
■ In fact, as long as the model has good explanatory power
(a high R² and an F statistic indicating that the model is
globally significant) and the structure of the collinear
relationship remains the same within the new sample
observations, accurate forecasts may still be possible.
■ If the objective of the analysis is not only prediction but
also to obtain reliable estimates of the parameters, serious
multicollinearity will be a problem because of the large
variances of the OLS estimators.
Econometrics Patrícia Cruz 8-17
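A simulated illustration of this last point: the individual coefficients are estimated imprecisely, yet out-of-sample predictions remain accurate as long as the new observations share the same collinear structure. All numbers below are illustrative.

```python
import numpy as np

# Forecasting with collinear regressors: b2 and b3 are imprecise individually,
# but predictions are accurate when the collinear relationship between
# x2 and x3 is the same in the new observations.
rng = np.random.default_rng(8)

def make_data(n, rng):
    x2 = rng.normal(size=n)
    x3 = x2 + 0.05 * rng.normal(size=n)          # same collinear structure everywhere
    y = 1.0 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)
    return np.column_stack([np.ones(n), x2, x3]), y

X, y = make_data(100, rng)                       # estimation sample
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # imprecise b2, b3 individually

X_new, y_new = make_data(50, rng)                # new sample, same structure
rmse = np.sqrt(np.mean((y_new - X_new @ b) ** 2))
print("coefficients:", np.round(b, 2))
print("out-of-sample RMSE:", round(rmse, 2))     # close to the error s.d. of 1
```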
