Logistic Regression

Logistic regression is used when the dependent variable is binary. A binary outcome violates the assumptions of linear regression because the error term can then take on only two values. Weight of evidence (WoE) and information value (IV) are used to measure the strength of predictor variables, with higher values indicating a stronger relationship. Dummy variables are used to code categorical predictors.


Logistic Regression

Why do we ever need Logistic Regression?

A binary dependent variable violates the assumptions of Linear Regression!

One assumption says that the residuals should be normally distributed. The error term can take on only two values, so it is impossible for it to have a normal distribution.

It also violates the assumption of Homoscedasticity!

Homoscedasticity describes a situation in which the variance of the error term is the same across all values of the independent variables; with a binary outcome the error variance equals p(1-p), so it changes with the predicted probability.
Logistic Regression – Odds
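As a quick reference, the textbook definitions of the odds and the logistic (logit) model are:

\[ \mathrm{odds} = \frac{p}{1-p}, \qquad \operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \qquad p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} \]

For example, odds of 3 mean the event is three times as likely to occur as not to occur, i.e. p = 0.75.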
Weight of Evidence (WoE) and Information Value (IV)

Weight of Evidence
The Weight of Evidence or WoE value is a widely used measure of the “strength” of a
grouping for separating good and bad risk (default). It is computed from the basic
odds ratio:
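A commonly used formulation (the group index i and the percentage notation are this sketch's own) is:

\[ \mathrm{WoE}_i = \ln\!\left(\frac{\%\,\mathrm{Goods}_i}{\%\,\mathrm{Bads}_i}\right) \]

where %Goods_i and %Bads_i are the shares of all Goods and all Bads that fall into group i.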

Information Value (IV)


The Information Value (IV) of a predictor sums the WoE over all groups, with each group's WoE weighted by the difference between its share of Goods and its share of Bads (see below).
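In the same notation as the WoE sketch above, a standard way to write the IV is:

\[ \mathrm{IV} = \sum_i \left(\%\,\mathrm{Goods}_i - \%\,\mathrm{Bads}_i\right) \times \mathrm{WoE}_i \]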
Weight of Evidence (WoE) and Information Value (IV)

According to Siddiqi (2006), by convention the values of the IV statistic can be interpreted as follows. If the IV statistic is:
• Less than 0.02, then the predictor is not useful for modeling (separating the Goods from the Bads)
• 0.02 to 0.1, then the predictor has only a weak relationship to the Goods/Bads odds ratio
• 0.1 to 0.3, then the predictor has a medium strength relationship to the Goods/Bads odds ratio
• 0.3 or higher, then the predictor has a strong relationship to the Goods/Bads odds ratio.

An IV between 0.02 and 0.1, for example, indicates only a weak relationship to the binary dependent variable. A code sketch of the WoE/IV computation follows.
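A minimal Python sketch of this WoE/IV computation, assuming a data frame with a grouping column and a 0/1 bad flag (the column names and the small example data are assumptions of the sketch, not taken from the slides):

import numpy as np
import pandas as pd

def woe_iv(df, group_col="group", bad_col="bad"):
    # per-group counts of Bads (bad_col == 1) and totals
    stats = df.groupby(group_col)[bad_col].agg(bads="sum", total="count")
    stats["goods"] = stats["total"] - stats["bads"]
    # share of all Goods / all Bads that falls into each group
    stats["pct_goods"] = stats["goods"] / stats["goods"].sum()
    stats["pct_bads"] = stats["bads"] / stats["bads"].sum()
    stats["woe"] = np.log(stats["pct_goods"] / stats["pct_bads"])
    stats["iv"] = (stats["pct_goods"] - stats["pct_bads"]) * stats["woe"]
    return stats, stats["iv"].sum()

# illustrative data: three groups of a predictor with different default rates
data = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 30 + ["C"] * 20,
    "bad":   [0] * 45 + [1] * 5 + [0] * 24 + [1] * 6 + [0] * 12 + [1] * 8,
})
table, iv = woe_iv(data)
print(table[["woe", "iv"]])
print("IV =", round(iv, 3))   # compare against the Siddiqi thresholds above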


What are Dummy Variables, Design Variables, Boolean Indicators and Proxies?

These are all synonyms for dummy variables.

Categorical variables – Male/Female, High/Low bank balance, etc.

They are coded with 1 and 0.


Class Class_Dummy1 Class_Dummy2
1 1 0
1 1 0
1 1 0
2 0 1
2 0 1
2 0 1
3 0 0
3 0 0
3 0 0
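A minimal pandas sketch that reproduces this coding, with Class 3 as the reference level (the dummy column names are assumptions chosen to mirror the table above):

import pandas as pd

# reproduce the table above: Class 3 is the reference category (all dummies = 0)
df = pd.DataFrame({"Class": [1, 1, 1, 2, 2, 2, 3, 3, 3]})
dummies = pd.get_dummies(df["Class"], prefix="Class_Dummy").astype(int)
dummies = dummies[["Class_Dummy_1", "Class_Dummy_2"]]   # drop the Class 3 column
print(pd.concat([df, dummies], axis=1))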
Results and Interpretation

Interpreting the p-values of the independent variables – predictors with a p-value less than 0.05 (alpha) should be retained in the model; otherwise remove them from the model!

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1   -2.6516    0.6748           15.4424           <.0001
blackd       1    0.5952    0.3939            2.2827           0.1308
whitvic      1    0.2565    0.4002            0.4107           0.5216
serious      1    0.1871    0.0612            9.3342           0.0022
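This layout resembles SAS PROC LOGISTIC output. A minimal Python sketch that produces an analogous table with statsmodels follows; the outcome name 'death' and the synthetic data are assumptions of the sketch, and only the predictor names (blackd, whitvic, serious) come from the table above.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# synthetic stand-in data; the real data behind the table above is not available here
rng = np.random.default_rng(0)
n = 147
df = pd.DataFrame({
    "blackd": rng.integers(0, 2, n),    # binary predictor (1/0)
    "whitvic": rng.integers(0, 2, n),   # binary predictor (1/0)
    "serious": rng.uniform(1, 15, n),   # continuous seriousness score
})
# 'death' is a placeholder name for the binary dependent variable
logit_p = -2.65 + 0.60 * df["blackd"] + 0.26 * df["whitvic"] + 0.19 * df["serious"]
df["death"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(df[["blackd", "whitvic", "serious"]])
result = sm.Logit(df["death"], X).fit(disp=False)
print(result.summary())   # estimates, standard errors, z (Wald) statistics, p-values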
Baseline, R Square, Max-rescaled R Square and C

What is R square?
What is the R square of Logistic Regression?

It measures how much the goodness of fit improves over the baseline (intercept-only) model!!

C statistic – based on the receiver operating characteristic (ROC) curve
Ranges from 0.5 to 1; the closer to 1, the better the model
Gini = 2 × C statistic − 1
Ranges from 0 to 1; the closer to 1, the better the model
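A minimal Python sketch of the C statistic and Gini computation (the short y_true / y_prob vectors are illustrative assumptions; in practice use the model's predicted probabilities):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                  # observed events (1) / non-events (0)
y_prob = [0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.4, 0.9]  # predicted probabilities

c_stat = roc_auc_score(y_true, y_prob)   # C statistic = area under the ROC curve (0.5 to 1)
gini = 2 * c_stat - 1                    # Gini = 2 * C statistic - 1 (0 to 1)
print(f"C statistic = {c_stat:.3f}, Gini = {gini:.3f}")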
Check Multicollinearity!!

Check the VIF / Tolerance to detect multicollinearity!!
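A minimal Python sketch using statsmodels' variance_inflation_factor (the synthetic predictors x1-x3 are assumptions; x3 is deliberately made collinear with x1 so the check has something to flag):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
X["x3"] = 0.9 * X["x1"] + 0.1 * rng.normal(size=200)   # nearly a copy of x1

X_const = sm.add_constant(X)   # include the intercept, as in the fitted model
for i, col in enumerate(X_const.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(X_const.values, i)
    print(f"{col}: VIF = {vif:.2f}  Tolerance = {1 / vif:.3f}")   # VIF > 10 (Tolerance < 0.1) flags a problem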
Results and Interpretation – Classification Table

Prob     Correct            Incorrect          Percentages
Level    Event  Non-Event   Event  Non-Event   Correct  Sensitivity  Specificity  False POS  False NEG
0.05      30      47         23       0          77       100          67.1         43.4       0
0.1       30      53         17       0          83       100          75.7         36.2       0
0.15      30      55         15       0          85       100          78.6         33.3       0
0.2       30      60         10       0          90       100          85.7         25         0
0.25      29      61          9       1          90        96.7        87.1         23.7       1.6
0.3       25      62          8       5          87        83.3        88.6         24.2       7.5
0.35      23      62          8       7          85        76.7        88.6         25.8      10.1
0.4       23      63          7       7          86        76.7        90           23.3      10
0.45      23      63          7       7          86        76.7        90           23.3      10
0.5       23      63          7       7          86        76.7        90           23.3      10

Higher sensitivity and specificity indicate a better fit.
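A minimal Python sketch of how such a classification table can be built across probability cut-offs (the short y_true / y_prob vectors are illustrative assumptions):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])                                # observed outcomes
y_prob = np.array([0.10, 0.35, 0.60, 0.20, 0.80, 0.55, 0.40, 0.90, 0.15, 0.45])  # model probabilities

for cutoff in np.arange(0.05, 0.55, 0.05):
    y_hat = (y_prob >= cutoff).astype(int)                # classify as event above the cut-off
    tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)                          # share of events classified correctly
    specificity = tn / (tn + fp)                          # share of non-events classified correctly
    print(f"{cutoff:.2f}  sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")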


Results and Interpretation – Predicted Probability

Obs  CURED  INTERVENTION  DURATION  _LEVEL_  pred
1    0      0             7         1        0.42812
2    0      0             7         1        0.42812
3    0      0             6         1        0.43004
4    1      0             8         1        0.42621
5    1      1             7         1        0.71991
6    1      0             6         1        0.43004
Logistic Regression – KS Stat
KS lies between 0 and 1
The closer to 1, the better the model
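A minimal Python sketch of the KS statistic on model probabilities (the short y_true / y_prob vectors are illustrative assumptions):

import numpy as np
from scipy.stats import ks_2samp

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])                                # observed outcomes
y_prob = np.array([0.10, 0.35, 0.60, 0.20, 0.80, 0.55, 0.40, 0.90, 0.15, 0.45])  # model probabilities

# KS = maximum gap between the score distributions of events and non-events
ks_stat, _ = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0])
print(f"KS = {ks_stat:.3f}")   # between 0 and 1; the closer to 1, the better the separation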
