Multicollinearity occurs when regressors in a regression model are highly correlated, which can lead to unreliable coefficient estimates and inflated standard errors. It can arise from inherent relationships between variables, repeated measures, sampling issues, or mathematical derivations. Detection methods include examining correlation matrices, variance inflation factors (VIF), and auxiliary regressions, while remedies involve restructuring the model or dropping correlated regressors.


MULTICOLLINEARITY

Trương Đăng Thụy


[email protected]
MULTICOLLINEARITY
▪ Definition
▪ Sources of multicollinearity
▪ Detection
▪ Remedy
COLLINEARITY
▪ One of the assumptions of the classical linear regression model is that
there is no perfect linear relationship among the regressors.
▪ If there are one or more such relationships among the regressors, we
call it multicollinearity, or collinearity for short.
▪ Perfect collinearity: A perfect linear relationship between two variables.
▪ Imperfect collinearity: The regressors are collinear, but not perfectly.

▪ Multicollinearity refers to the case where the regressors are highly collinear.


PERFECT COLLINEARITY
▪ If two regressors are perfectly collinear, one of them must be
dropped
▪ For example: 𝑦 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 where 𝑋2 = 2𝑋1
▪ The model becomes 𝑦 = 𝛽0 + 𝛽1𝑋1 + 𝛽2(2𝑋1), or 𝑦 = 𝛽0 + (𝛽1 + 2𝛽2)𝑋1,
which can be expressed as
𝑦 = 𝛽0 + 𝛾𝑋1 where 𝛾 = 𝛽1 + 2𝛽2.
▪ Given a value of 𝛾, there are multiple combinations of 𝛽1 and 𝛽2 . For
example 𝛾 = 1:
▪ if 𝛽1 = 2, then 𝛽2 = −0.5
▪ if 𝛽1 = 3, then 𝛽2 = −1
▪…
▪ There is no unique solution and thus one of them must be dropped.
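A small numerical illustration (not from the original slides; a sketch using NumPy with made-up data) of why there is no unique solution: with 𝑋2 = 2𝑋1 the design matrix loses a rank, so 𝑋′𝑋 cannot be inverted.

# Sketch: perfect collinearity makes the design matrix rank-deficient,
# so (X'X) is singular and OLS has no unique solution.
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = 2 * X1                                    # perfectly collinear with X1
X = np.column_stack([np.ones_like(X1), X1, X2])

print(np.linalg.matrix_rank(X))                # prints 2, not 3
# np.linalg.inv(X.T @ X) would raise LinAlgError: "Singular matrix"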
SOURCES OF MULTICOLLINEARITY
▪ Inherent relationships: Some variables may naturally be correlated,
such as:
▪ education and income
▪ inputs (labor, capital) of the production function
▪ Repeated measures: Using multiple measures for the same concept,
or closely related concepts. Example: assets and income.
▪ Sampling issue: Collecting data from a population where certain
variables naturally co-vary can introduce multicollinearity.
▪ Mathematical derivation: If one variable is derived from another,
such as using a variable and its square.
CONSEQUENCES
▪ The OLS estimators are still BLUE, but one or more regression
coefficients have large standard errors relative to the values of the
coefficients, thereby making the 𝑡 ratios small.
▪ Even though some regression coefficients are statistically insignificant,
the 𝑅2 value may be very high.
▪ Therefore, one may conclude (misleadingly) that the true values of these
coefficients are not different from zero.
▪ Also, the regression coefficients may be very sensitive to small changes
in the data, especially if the sample is relatively small.
▪ In some cases, the estimated coefficients may have the wrong (unexpected) signs.
VARIANCE INFLATION FACTOR
▪ For the following regression model:
𝑦𝑖 = 𝛽0 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖 + 𝑢𝑖
▪ It can be shown that:

𝑣𝑎𝑟(𝛽̂1) = 𝜎² / [Σ𝑥1𝑖²(1 − 𝑟12²)] = (𝜎²/Σ𝑥1𝑖²) · 1/(1 − 𝑟12²) = (𝜎²/Σ𝑥1𝑖²) · 𝑉𝐼𝐹

and

𝑣𝑎𝑟(𝛽̂2) = 𝜎² / [Σ𝑥2𝑖²(1 − 𝑟12²)] = (𝜎²/Σ𝑥2𝑖²) · 1/(1 − 𝑟12²) = (𝜎²/Σ𝑥2𝑖²) · 𝑉𝐼𝐹

where 𝜎² is the variance of the error term 𝑢𝑖, 𝑥1𝑖 and 𝑥2𝑖 are the deviations of 𝑋1 and 𝑋2 from their sample means, and 𝑟12 is the coefficient of correlation between 𝑋1 and 𝑋2.
VARIANCE INFLATION FACTOR
𝑉𝐼𝐹 = 1 / (1 − 𝑟12²)

▪ 𝑽𝑰𝑭 is a measure of the degree to which the variance of the OLS estimator is inflated because of multicollinearity.
▪ For example, 𝑟12 = 0.9 gives 𝑉𝐼𝐹 = 1/(1 − 0.81) ≈ 5.3, so the variance of each slope estimator is about five times what it would be if the two regressors were uncorrelated.
DETECTING MULTICOLLINEARITY
Correlation matrix
Auxiliary regression
VIF
EXAMPLE: HOUSEHOLD EXPENDITURE
SURVEY DATA OF MARRIED COUPLES 2020 IN HCM

▪ [DEP VAR] expense: household expenditure (mil. VND/month)


▪ income: household monthly income (mil. VND/month)
▪ age_wife: age of the wife (or female partner)
▪ age_husband: age of the husband (or male partner)
▪ hhsize: Household size (members)
▪ children: % children in the household

▪ Data source: https://kinhteluong.online/esdata/iu/mcl.csv


SUMMARY STATISTICS
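The summary table on the original slide is an image and is not reproduced here. A minimal Python sketch that loads the data and produces summary statistics, assuming the CSV columns carry the variable names listed above:

# Sketch: load the example data and compute summary statistics.
import pandas as pd

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
print(df.describe())          # mean, std, min, max for each variable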
HOUSEHOLD EXPENDITURE: OLS REGRESSION
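Likewise, the regression output on this slide is not reproduced. A hedged sketch of the estimation with statsmodels (column names assumed as above):

# Sketch: OLS regression of household expenditure on the regressors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
ols = smf.ols("expense ~ income + age_wife + age_husband + hhsize + children",
              data=df).fit()
print(ols.summary())          # watch for a high R-squared with few significant t ratios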
DETECTION OF MULTICOLLINEARITY
▪ Wrong expected signs despite a high 𝑅2
▪ High 𝑅2 but few significant 𝑡 ratios
▪ High pair-wise correlations among the regressors (check the correlation matrix)
▪ Significant 𝐹 tests for the auxiliary regressions (regressions of each regressor on the remaining regressors), or an auxiliary-regression 𝑅2 higher than the 𝑅2 of the original regression of 𝑌 on the 𝑋s
▪ High variance inflation factor: 𝑉𝐼𝐹 > 5 (or 10)
CORRELATION MATRIX

▪ High pair-wise correlation coefficients (usually taken to be above ±0.8) are a sign of multicollinearity, not a confirmation.
▪ On the other hand, low correlation coefficients do not assure that there is no multicollinearity.
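A sketch of how the correlation matrix can be computed with pandas (same assumed column names as above):

# Sketch: pair-wise correlations among the regressors.
import pandas as pd

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
regressors = ["income", "age_wife", "age_husband", "hhsize", "children"]
print(df[regressors].corr().round(2))    # values near +/-0.8 deserve attention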
AUXILIARY REGRESSION
▪ A high 𝑅2 in an auxiliary regression is also a sign of multicollinearity, not a confirmation.
▪ On the other hand, a low 𝑅2 does not assure that there is no multicollinearity.
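A sketch of the auxiliary regressions, reporting each 𝑅2 and the p-value of the overall 𝐹 test:

# Sketch: regress each regressor on the remaining regressors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
regressors = ["income", "age_wife", "age_husband", "hhsize", "children"]

for x in regressors:
    others = " + ".join(c for c in regressors if c != x)
    aux = smf.ols(f"{x} ~ {others}", data=df).fit()
    print(f"{x}: R2 = {aux.rsquared:.3f}, F-test p-value = {aux.f_pvalue:.4f}")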
VARIANCE INFLATION FACTOR

▪ Recall that the VIF indicates the degree to which the regressors are collinear.
▪ As a rule of thumb, a VIF of 5 or higher is considered severe and is usually taken to confirm multicollinearity.
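A sketch of the VIF computation with statsmodels; note that the design matrix passed to variance_inflation_factor should include the constant:

# Sketch: variance inflation factor for each regressor.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
X = sm.add_constant(df[["income", "age_wife", "age_husband", "hhsize", "children"]])

for i, name in enumerate(X.columns):
    if name != "const":
        print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")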
SOLUTIONS FOR MULTICOLLINEARITY
SOLUTIONS
▪ General rule of thumb: DO NOT WORRY IF
▪ the coefficients are statistically significant
▪ the coefficients have the correct expected signs

▪ Otherwise:
▪ restructure the model (transform regressors)
▪ drop correlated regressors
RESTRUCTURING THE MODEL
▪ There may be alternative specifications or alternative
functional forms
▪ Example: production function
𝑦 = 𝐹(𝑙𝑎𝑏𝑜𝑟, 𝑙𝑎𝑛𝑑, 𝑐𝑎𝑝𝑖𝑡𝑎𝑙)
▪ Solution:
𝑦/𝑙𝑎𝑛𝑑 = 𝐹(𝑙𝑎𝑏𝑜𝑟/𝑙𝑎𝑛𝑑, 𝑙𝑎𝑛𝑑, 𝑐𝑎𝑝𝑖𝑡𝑎𝑙/𝑙𝑎𝑛𝑑)
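A hedged sketch of this transformation in Python; the data file farms.csv and the columns output, labor, land, capital are hypothetical, used only to mirror the production-function example:

# Sketch: re-express a production function in per-land terms to weaken the
# correlation between inputs that scale with farm size. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

farm = pd.read_csv("farms.csv")                 # hypothetical data set
farm["y_per_land"]       = farm["output"]  / farm["land"]
farm["labor_per_land"]   = farm["labor"]   / farm["land"]
farm["capital_per_land"] = farm["capital"] / farm["land"]

ratio_model = smf.ols("y_per_land ~ labor_per_land + land + capital_per_land",
                      data=farm).fit()
print(ratio_model.summary())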
TRANSFORMING REGRESSORS
DROPPING CORRELATED REGRESSORS
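The original slides show the re-estimated model as an image. A hedged sketch of the step itself, dropping (for example) age_husband, which is naturally highly correlated with age_wife:

# Sketch: drop one of a pair of highly correlated regressors and re-estimate.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("https://kinhteluong.online/esdata/iu/mcl.csv")
reduced = smf.ols("expense ~ income + age_wife + hhsize + children", data=df).fit()
print(reduced.summary())      # compare standard errors with the full model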
