The Wayback Machine - https://web.archive.org/web/20210705010248/https://github.com/scikit-learn/scikit-learn/issues/14444

Provide an option to consider class prior in ComplementNB #14444

Open
qinhanmin2014 opened this issue Jul 23, 2019 · 11 comments

Comments

@qinhanmin2014 (Member) commented Jul 23, 2019

Currently, in ComplementNB, we estimate the class prior but do not use it. I think we can provide an option to consider the class prior in ComplementNB (like the other naive Bayes algorithms in scikit-learn). Reasons:
(1) In the original paper proposing ComplementNB, the authors actually take the class prior into consideration (see Section 3.1). When presenting the detailed implementation, the authors "use a uniform prior estimate for simplicity", because they think that the class probabilities tend to be overpowered by the class prior.
(2) This will make ComplementNB consistent with the other naive Bayes algorithms in scikit-learn, which is beneficial if we want to implement GeneralNB in the future.
(3) The current API design of ComplementNB seems awkward, i.e., we expose an unused parameter (class_prior) to users.
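Point (3) can be checked directly. A minimal sketch (not from the thread; the data and variable names are mine) using only the public scikit-learn API, showing that `class_prior` currently has no effect on multi-class predictions:

```python
# Sketch: ComplementNB accepts class_prior but, with more than one class,
# the estimated prior is never added to the joint log-likelihood.
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(0)
X = rng.randint(5, size=(60, 10))   # small synthetic count matrix
y = rng.randint(3, size=60)         # three classes

default = ComplementNB().fit(X, y)
skewed = ComplementNB(class_prior=[0.98, 0.01, 0.01]).fit(X, y)

# A wildly skewed prior changes nothing about the predictions.
print(np.array_equal(default.predict(X), skewed.predict(X)))  # True
```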

Some simple benchmarks on 20 newsgroups (fetch_20newsgroups_vectorized):

MultinomialNB: training set acc 0.8533, testing set acc 0.7159
MultinomialNB + class prior: training set acc 0.8439, testing set acc 0.7053
ComplementNB: training set acc 0.9498, testing set acc 0.8318
ComplementNB + class prior: training set acc 0.9156, testing set acc 0.8043
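The "+ class prior" variants above correspond to adding log p(c) to each per-class score. A minimal NumPy sketch of the CNB decision rule from the original paper's Section 3.1 (function and variable names are mine, not scikit-learn's; the data is synthetic):

```python
# Sketch of the Complement Naive Bayes decision rule, with and without
# the class-prior term log p(c).
import numpy as np

def cnb_log_scores(X, y, alpha=1.0, use_prior=False):
    """Per-class CNB scores for a count matrix X (higher = better)."""
    classes = np.unique(y)
    scores = np.zeros((X.shape[0], len(classes)))
    for k, c in enumerate(classes):
        comp = X[y != c]                    # documents NOT in class c
        counts = comp.sum(axis=0) + alpha   # smoothed complement counts
        w = np.log(counts / counts.sum())   # complement log-probabilities
        scores[:, k] = -X @ w               # minus sign: complement rule
        if use_prior:
            scores[:, k] += np.log((y == c).mean())  # the log p(c) term
    return scores

rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(40, 8))
y = rng.randint(2, size=40)
pred = np.argmax(cnb_log_scores(X, y, use_prior=True), axis=1)
```

With `use_prior=False` this matches scikit-learn's current multi-class behavior; `use_prior=True` is the variant discussed in this issue.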

@ghost commented Jul 26, 2019

I'll open a pull request over the weekend.

@qinhanmin2014 (Member, Author) commented Jul 26, 2019

PRs are always welcome, but I'm wondering whether there's enough consensus.

@jnothman (Member) commented Jul 26, 2019

@qinhanmin2014 (Member, Author) commented Jul 26, 2019

Actually the implementation here is simple; the issue is whether we should consider the class prior by default. If we consider the class prior by default, we'll change the behavior of the current model when n_classes > 1 (according to the paper, the performance of CNB is likely to decrease, as with MNB). If we do not, we'll change the behavior of the current model when n_classes = 1, and we'll need to deprecate the default value of fit_prior. (Currently, the default fit_prior is True, but we only use it when n_classes = 1.)
If we want this, maybe it's more reasonable to consider the class prior by default.
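For reference, "the implementation here is simple" can be illustrated by a hypothetical sketch (the class name is mine, and overriding a private method is not a proposed scikit-learn API): a ComplementNB variant whose decision rule always includes the estimated class prior, for any number of classes.

```python
# Hypothetical sketch: stock ComplementNB adds class_log_prior_ to the
# joint log-likelihood only when a single class is present; this variant
# adds it unconditionally.
import numpy as np
from sklearn.naive_bayes import ComplementNB
from sklearn.utils.extmath import safe_sparse_dot

class PriorComplementNB(ComplementNB):
    def _joint_log_likelihood(self, X):
        # Same complement-feature term as ComplementNB...
        jll = safe_sparse_dot(X, self.feature_log_prob_.T)
        # ...plus log p(c) for every class.
        return jll + self.class_log_prior_

rng = np.random.RandomState(0)
X = rng.randint(5, size=(50, 6))
y = np.array([0] * 45 + [1] * 5)   # imbalanced, so the prior matters
clf = PriorComplementNB().fit(X, y)
```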

@ghost commented Jul 26, 2019

Don't all of the other NB classifiers respect class priors by default?

@qinhanmin2014 (Member, Author) commented Jul 27, 2019

> Don't all of the other NB classifiers respect class priors by default?

Yes.

@Praveenk8051 commented Jul 24, 2020

@qinhanmin2014 Can I contribute to this? I'm a first-timer.

@cmarmo (Member) commented Aug 23, 2020

Hi @Praveenk8051, there is a stalled pull request trying to solve the issue (#14523).
If you are still interested in contributing, feel free to open a new one taking into account the comments there.

@Praveenk8051 commented Aug 23, 2020

> Hi @Praveenk8051, there is a stalled pull request trying to solve the issue (#14523).
> If you are still interested in contributing, feel free to open a new one taking into account the comments there.

Thank you, I will look into this.

@LeclercTanguy commented Dec 6, 2020

Can I contribute to this? I'm a first-timer.

@cmarmo (Member) commented Dec 6, 2020

> Can I contribute to this? I'm a first-timer.

Hi @LeclercTanguy, there is already a pull request meant to fix this issue and waiting for review (#18521). There are some other issues needing help and labeled as easy; feel free to pick one without a corresponding active pull request open. Thanks!

5 participants