Bdba Notes
Homoscedasticity and heteroscedasticity are terms used in regression analysis to describe the
variance of the errors (residuals) in a fitted model.
Homoscedasticity:
In a regression model, homoscedasticity means that the variance of the errors (or residuals) is
constant across all levels of the predictor variables. In simpler terms, it implies that the spread
of the residuals is consistent as you move along the regression line.
Visually, when you plot the residuals against the predicted values, a pattern of equally spread
points around the horizontal line (zero) suggests homoscedasticity. It indicates that the
variability of the residuals doesn’t change significantly across the range of predicted values.
For example, suppose we model the time taken for an ice cube to melt as a function of
temperature; here, temperature is the independent variable and melting time is the dependent
one. If the melting times scatter around the fitted line by roughly the same amount at every
temperature, the errors are homoscedastic.
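As a minimal sketch (not from the original notes), a residual-versus-fitted plot can be drawn in R using the built-in cars data set; a roughly even band of points around zero suggests homoscedasticity:
fit <- lm(dist ~ speed, data = cars)   # stopping distance modelled on speed
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)                 # even spread around this line suggests homoscedasticity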
Heteroscedasticity:
Heteroscedasticity, on the other hand, occurs when the variance of the residuals is not constant
across different levels of the predictor variables. This means that the spread of the residuals
varies along the range of predicted values.
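A common formal check is the Breusch-Pagan test from the lmtest package (a third-party package, assumed installed); a minimal sketch with the same built-in cars data:
library(lmtest)                        # install.packages("lmtest") if needed
fit <- lm(dist ~ speed, data = cars)
bptest(fit)                            # a small p-value points to heteroscedastic residuals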
What is R Software?
R is a programming language and free software developed by Ross Ihaka
and Robert Gentleman in 1993. R possesses an extensive catalog of
statistical and graphical methods. It includes machine learning algorithms,
linear regression, time series analysis, and statistical inference, to name a few. Most R
libraries are written in R, but for heavy computational tasks, C, C++ and Fortran code is
preferred.
R is not only trusted in academia; many large companies also use the R programming language,
including Uber, Google, Airbnb and Facebook.
Ways to reduce multicollinearity:
Remove highly correlated variables: Identify and remove one of the variables in a pair or set
of variables that are highly correlated. This reduces multicollinearity by eliminating
redundant information.
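As a minimal sketch (with made-up data, since the notes name no data set), the correlation matrix can be inspected with cor() and one variable of a near-duplicate pair dropped:
set.seed(42)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)          # x2 is nearly a copy of x1 (highly correlated)
x3 <- rnorm(100)
df <- data.frame(x1, x2, x3)
cor(df)                                  # inspect pairwise correlations
df_reduced <- subset(df, select = -x2)   # keep only one variable from the correlated pair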
Principal Component Analysis (PCA): PCA is a technique used to transform the original
variables into a smaller set of uncorrelated variables called principal components. It helps in
reducing the dimensionality of the data while preserving most of the variability present in the
dataset.
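A minimal R sketch of PCA with prcomp(), using a few predictors from the built-in mtcars data set purely for illustration:
pred <- mtcars[, c("disp", "hp", "wt", "qsec")]     # several correlated predictors
pca  <- prcomp(pred, center = TRUE, scale. = TRUE)  # standardise, then rotate
summary(pca)                   # proportion of variance explained by each component
pc_scores <- pca$x[, 1:2]      # first two uncorrelated components for later modelling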
Regularization Techniques:
Ridge Regression: Ridge regression adds a penalty term to the regression equation that shrinks
the coefficients of correlated variables towards zero. This helps reduce the impact of
multicollinearity by stabilizing the coefficients (a combined Ridge/Lasso sketch follows this
list).
Lasso Regression: Similar to Ridge, Lasso adds a penalty term, but Lasso has the additional
property of performing variable selection by forcing some of the coefficients to be exactly zero.
It can automatically eliminate some variables, effectively reducing multicollinearity.
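Both penalties can be fitted with the glmnet package (a third-party package, assumed installed); this sketch uses the built-in mtcars data purely for illustration, with alpha = 0 giving Ridge and alpha = 1 giving Lasso:
library(glmnet)                       # install.packages("glmnet") if needed
x <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
y <- mtcars$mpg
ridge <- cv.glmnet(x, y, alpha = 0)   # ridge: shrinks coefficients towards zero
lasso <- cv.glmnet(x, y, alpha = 1)   # lasso: can set some coefficients exactly to zero
coef(ridge, s = "lambda.min")
coef(lasso, s = "lambda.min")         # zero entries show variables the lasso eliminated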