Add functions for calculating log-likelihood and null log-likelihood #25982
Comments
You may initiate a PR on this for review? |
There already is a |
Thank you for your comment @betatim . I want to measure the goodness of fit of a logistic regression model. However, I think the log_loss function measures the difference between the predicted probabilities and the actual outcomes. Alternatively, can we have a metric to directly calculate the Pseudo R-square? |
If I look at the code of your |
Thanks @betatim for your guidance. I'm new to the field, so that was immensely helpful. I used the following code for my case:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
# Fit the model and take the in-sample predicted probability of the positive class
model = LogisticRegression()
model.fit(X, y)
y_pred_proba = model.predict_proba(X)[:, 1]
# Null model: predict the base rate y.mean() for every sample
ll_null = log_loss(y, [y.mean()] * len(y))
ll_model = log_loss(y, y_pred_proba)
# log_loss is the mean negative log-likelihood, so the 1/n factor cancels in the ratio
pseudo_r2 = 1 - ll_model / ll_null
However, I was wondering whether there is a built-in function to compute pseudo R-squared and, if not, what the reason for not having one might be. |
I don't know the answer to that. I think there isn't a lot of "goodness of fit" tooling in scikit-learn because there are other libraries that do that. |
I think the reason may be that there are several approaches to computing pseudo R-squared. Moreover, the different pseudo R-squared measures have different interpretations and assumptions, which may not always apply in a particular context. However, since scikit-learn already has r2_score, IMO the library should also have a pseudo R-squared for logistic regression (allowing the user to choose among the indices) for consistency. I'd be more than happy to work on it. |
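To make the point about competing definitions concrete, here is a minimal sketch of three commonly cited pseudo R-squared indices (McFadden, Cox-Snell, Nagelkerke), each computed from the same two log-likelihoods. The function names are hypothetical and are not part of scikit-learn; the inputs are assumed to be total (unnormalized) log-likelihoods of the fitted and intercept-only models, and the numbers in the usage lines are made up for illustration.

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


def cox_snell_r2(ll_model, ll_null, n):
    """Cox-Snell: 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke: Cox-Snell rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_model, ll_null, n) / max_r2


# Illustrative (made-up) log-likelihoods for n = 100 samples.
ll_model, ll_null, n = -50.0, -69.3, 100
print(mcfadden_r2(ll_model, ll_null))
print(cox_snell_r2(ll_model, ll_null, n))
print(nagelkerke_r2(ll_model, ll_null, n))
```

The three indices generally give different values for the same fit, which is one reason a single built-in "pseudo R-squared" would need a parameter to select the index.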
Metrics in the @lorentzenchr What do you think of providing a pseudo R-squared to the models in |
Such a metric
Note that a "good" GOF does exactly that: it measures some kind of "distance" between observations and predictions. I think we can close then? |
@lorentzenchr Okay that makes sense. I'm closing this issue as a duplicate of #20943. |
Thanks @thomasjpfan for bringing this up and your time. I have also not seen this metric being applied to any tree based model. However I have a small concern. As @lorentzenchr suggested-
I cannot locate any literature about it (I also noticed other people facing the same issue). If you could help me a little, I could take a shot at it. |
@SinghAnkur28 There is literature and if you are still interested, I can point you to it. |
Describe the workflow you want to enable
As a user of scikit-learn, I want to be able to calculate McFadden's pseudo R-squared for a binary logistic regression model. For that, we need the log-likelihood of the fitted model and the null log-likelihood (the log-likelihood of an intercept-only model).
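For reference, the two quantities can be sketched in a few lines. This is an illustrative, dependency-free version under stated assumptions, not the functions originally proposed in this issue: the names `log_likelihood`, `null_log_likelihood`, and `mcfadden_pseudo_r2` are hypothetical, labels are assumed to be 0/1, and probabilities are clipped by a small `eps` to avoid `log(0)`.

```python
import math


def log_likelihood(y_true, p_pred, eps=1e-15):
    """Total Bernoulli log-likelihood of 0/1 labels given predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def null_log_likelihood(y_true, eps=1e-15):
    """Log-likelihood of the intercept-only model, which predicts the base rate."""
    base_rate = sum(y_true) / len(y_true)
    return log_likelihood(y_true, [base_rate] * len(y_true), eps)


def mcfadden_pseudo_r2(y_true, p_pred):
    """McFadden's pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - log_likelihood(y_true, p_pred) / null_log_likelihood(y_true)


# Toy data, made up for illustration.
y = [0, 1, 0, 1]
print(mcfadden_pseudo_r2(y, [0.1, 0.9, 0.2, 0.8]))
```

Because `sklearn.metrics.log_loss` returns the mean negative log-likelihood by default, the ratio of two `log_loss` values on the same data gives the same McFadden value as the ratio of the total log-likelihoods above.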
Describe your proposed solution
I use the following functions and I propose to add them in the library as well.
For Log Likelihood -
For Null Log Likelihood -
Describe alternatives you've considered, if relevant
One alternative to adding these functions to scikit-learn would be for users to use statsmodels.
However, adding them to scikit-learn would let users calculate the log-likelihood and null log-likelihood without leaving the scikit-learn ecosystem, and would provide a standardized implementation.
Additional context
No response