
BDBA NOTES

Homoscedasticity and heteroscedasticity are terms used in regression analysis to describe the
variance of errors in a regression model.

Homoscedasticity:

In a regression model, homoscedasticity means that the variance of the errors (or residuals) is
constant across all levels of the predictor variables. In simpler terms, it implies that the spread
of the residuals is consistent as you move along the regression line.

Visually, when you plot the residuals against the predicted values, a pattern of equally spread
points around the horizontal line (zero) suggests homoscedasticity. It indicates that the
variability of the residuals doesn’t change significantly across the range of predicted values.

For example, consider modelling the time taken for an ice cube to melt as a function of temperature. Here, temperature is the independent variable and melting time is the dependent variable; if the scatter of melting times around the fitted line is roughly the same at every temperature, the errors are homoscedastic.

Heteroscedasticity:

Heteroscedasticity, on the other hand, occurs when the variance of the residuals is not constant
across different levels of the predictor variables. This means that the spread of the residuals
varies along the range of predicted values.

Visually, in a plot of residuals against predicted values, heteroscedasticity often appears as a funnel-like shape, where the spread of the residuals systematically widens or narrows across the range of predicted values. It implies that the model predicts some ranges of values more precisely than others, leading to unequal variability in the residuals.
Heteroscedasticity violates one of the assumptions of classical linear regression, which
assumes constant variance of errors across all levels of the predictor variables.
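
A quick way to check this in practice is to plot a fitted model's residuals against its fitted values. The R sketch below assumes a data frame named melt_data with columns temperature and melt_time (hypothetical names echoing the ice-cube example above); the Breusch-Pagan test from the lmtest package is included as one common formal check, although the notes above only describe the visual one.

# Fit a simple linear regression; melt_data, temperature and melt_time are assumed names
model <- lm(melt_time ~ temperature, data = melt_data)

# Residuals vs fitted values: an even band around zero suggests homoscedasticity,
# a funnel shape suggests heteroscedasticity
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Optional formal check (requires the lmtest package): a small p-value from the
# Breusch-Pagan test is evidence of heteroscedasticity
library(lmtest)
bptest(model)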

What is R Software?
R is a programming language and free software environment developed by Ross Ihaka and Robert Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis, and statistical inference, to name a few. Most R libraries are written in R itself, but for computationally heavy tasks, C, C++, and Fortran code is preferred.

R is trusted not only by academia; many large companies also use the R programming language, including Uber, Google, Airbnb, and Facebook.

Data analysis with R proceeds in a series of steps: programming, transforming, discovering, modeling, and communicating the results (a small sketch of this workflow follows the list below).

 Program: R is a clear and accessible programming tool
 Transform: R is made up of a collection of libraries designed specifically for data science
 Discover: Investigate the data, refine your hypotheses, and analyze them
 Model: R provides a wide array of tools to capture the right model for your data
 Communicate: Integrate code, graphs, and outputs into a report with R Markdown, or build Shiny apps to share with the world
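
A minimal sketch of that workflow in R, using the built-in mtcars dataset purely for illustration (the notes themselves do not prescribe a dataset):

# Discover: load a built-in dataset and inspect it
data(mtcars)
summary(mtcars)

# Model: regress fuel efficiency on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Communicate: a diagnostic plot that could be embedded in an R Markdown report
plot(fit, which = 1)   # residuals vs fitted values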
Q) How can we handle the impact of multicollinearity? Write any two methods.

Multicollinearity refers to high correlation between independent variables in a regression model, which can make the model coefficients difficult to interpret and can reduce the reliability of the model's predictions. Here are two common methods to handle multicollinearity:

Feature Selection or Dimensionality Reduction:

Remove highly correlated variables: Identify and remove one of the variables in a pair or set
of variables that are highly correlated. This helps in reducing multicollinearity by eliminating
redundant information.
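
A rough sketch of this in R; X (a data frame of predictors), y (the response), the 0.9 cutoff, and the variable name "redundant_variable" are all illustrative assumptions, and vif() from the car package is just one common way to quantify multicollinearity:

# Pairwise correlations among the predictors
cor_matrix <- cor(X)
print(round(cor_matrix, 2))

# Variance inflation factors (car package); values above roughly 5-10 are often
# read as a sign of problematic multicollinearity
library(car)
fit <- lm(y ~ ., data = data.frame(y = y, X))
vif(fit)

# Drop one member of a highly correlated pair (e.g. |r| > 0.9)
X_reduced <- X[, setdiff(names(X), "redundant_variable")]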

Principal Component Analysis (PCA): PCA is a technique used to transform the original
variables into a smaller set of uncorrelated variables called principal components. It helps in
reducing the dimensionality of the data while preserving most of the variability present in the
dataset.
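
A minimal PCA sketch with R's built-in prcomp function; X again stands for a data frame of numeric predictors (an assumed name, not something defined in these notes):

# PCA on standardised predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
summary(pca)

# Keep, say, the first two components as new, uncorrelated predictors
X_pca <- pca$x[, 1:2]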

Regularization Techniques:
Ridge Regression: Ridge regression adds a penalty term to the regression equation that shrinks
the coefficients of correlated variables towards zero. This helps in reducing the impact of
multicollinearity by stabilizing the coefficients.
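
One common way to fit ridge regression in R is the glmnet package (an assumption here, since the notes do not name a package); glmnet expects a numeric predictor matrix x and a response vector y, and alpha = 0 selects the ridge penalty:

library(glmnet)

x <- as.matrix(X)                        # numeric predictor matrix (X assumed as above)
cv_ridge <- cv.glmnet(x, y, alpha = 0)   # alpha = 0 gives the ridge penalty
coef(cv_ridge, s = "lambda.min")         # coefficients at the best cross-validated lambda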

Lasso Regression: Similar to Ridge, Lasso adds a penalty term, but Lasso has the additional
property of performing variable selection by forcing some of the coefficients to be exactly zero.
It can automatically eliminate some variables, effectively reducing multicollinearity.
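
With the same (assumed) glmnet setup, lasso only changes alpha to 1; coefficients shrunk exactly to zero correspond to variables the model drops:

cv_lasso <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 gives the lasso penalty
coef(cv_lasso, s = "lambda.min")         # zero coefficients = eliminated variables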
