Provide an option to consider class prior in ComplementNB #14444
Comments
I'll open a pull request over the weekend |
PRs are always welcome, but I'm wondering whether there's enough consensus. |
I'm happy with this. |
Actually, the implementation here is simple; the real question is whether we should consider the class prior by default. If we do, we'll change the behavior of the current model when n_classes > 1 (according to the paper, the performance of CNB, like that of MNB, is likely to decrease). If we don't, we'll change the behavior of the current model when n_classes = 1, and we'll need to deprecate the default value of fit_prior. (Currently the default fit_prior is True, but it is only used when n_classes = 1; see the sketch below.)
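For reference, a minimal paraphrase (not the verbatim scikit-learn source) of how the current joint log likelihood treats the prior; the function name and signature are simplified for illustration:

```python
from sklearn.utils.extmath import safe_sparse_dot

def joint_log_likelihood(X, feature_log_prob, class_log_prior):
    # ComplementNB scores documents with the complement feature weights...
    jll = safe_sparse_dot(X, feature_log_prob.T)
    # ...but the estimated class prior is only added in the degenerate
    # single-class case; for n_classes > 1 it is ignored entirely.
    if class_log_prior.shape[0] == 1:
        jll += class_log_prior
    return jll
``` |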
Don't all of the other NB classifiers respect class priors by default? |
yes |
@qinhanmin2014 Can I contribute to this? I'm a first-timer. |
Hi @Praveenk8051, there is a stalled pull request trying to solve the issue (#14523). |
Thank you, I will look into this. |
Can I contribute to this? I'm a first-timer. |
Hi @LeclercTanguy, there is already a pull request meant to fix this issue that is awaiting review (#18521). There are some other issues that need help and are labeled as easy; feel free to pick one that doesn't have an active pull request open. Thanks! |
Currently, in ComplementNB, we estimate the class prior but do not use it. I think we can provide an option to consider the class prior in ComplementNB (as the other naive Bayes algorithms in scikit-learn do). Reasons:
(1) In the original paper, when proposing ComplementNB, the authors actually take the class prior into consideration (see Section 3.1 and the paraphrased decision rule below). When proposing the detailed implementation, the authors "use a uniform prior estimate for simplicity", reasoning that for documents with a reasonable number of words the class prior tends to be overpowered by the word probabilities.
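For context, here is a sketch of the CNB decision rule from Section 3.1 of the paper (Rennie et al., 2003); the notation is lightly adapted, so treat this as a paraphrase rather than a verbatim quote:

```latex
% CNB decision rule (cf. Section 3.1 of Rennie et al., 2003); notation
% lightly adapted. Note the explicit class-prior term log p(\theta_c),
% which the final implementation replaces with a uniform estimate.
\[
  l(d) = \arg\max_c \left[ \log p(\theta_c)
         - \sum_i f_i \log
           \frac{N_{\tilde{c}i} + \alpha_i}{N_{\tilde{c}} + \alpha} \right]
\]
% f_i: count of word i in document d;
% N_{\tilde{c}i}: count of word i across all classes other than c;
% \alpha_i, \alpha: smoothing parameters.
```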
(2) This will make ComplementNB consistent with the other naive Bayes algorithms in scikit-learn, which will be beneficial if we want to implement a GeneralNB in the future.
(3) The current API design of ComplementNB seems awkward: we expose an effectively unused parameter (class_prior) to users.
Some simple benchmarks on 20 newsgroups (fetch_20newsgroups_vectorized); a sketch of the setup follows the table:

| Model | Training set acc | Testing set acc |
| --- | --- | --- |
| MultinomialNB | 0.8533 | 0.7159 |
| MultinomialNB + class prior | 0.8439 | 0.7053 |
| ComplementNB | 0.9498 | 0.8318 |
| ComplementNB + class prior | 0.9156 | 0.8043 |
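A sketch of how these numbers might be reproduced. The exact settings behind the "+ class prior" rows aren't stated above, so I'm assuming fit_prior=False corresponds to a uniform prior; the "ComplementNB + class prior" row requires the change proposed here and cannot be run with the stock estimator:

```python
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.naive_bayes import ComplementNB, MultinomialNB

# Load the pre-vectorized 20 newsgroups train/test splits.
train = fetch_20newsgroups_vectorized(subset="train")
test = fetch_20newsgroups_vectorized(subset="test")

models = [
    # fit_prior=False gives a uniform prior, i.e. "without class prior".
    ("MultinomialNB", MultinomialNB(fit_prior=False)),
    ("MultinomialNB + class prior", MultinomialNB(fit_prior=True)),
    # Stock ComplementNB ignores the estimated prior when n_classes > 1,
    # so the "+ class prior" variant needs the proposed option.
    ("ComplementNB", ComplementNB()),
]

for name, clf in models:
    clf.fit(train.data, train.target)
    print(
        f"{name} training set acc: {clf.score(train.data, train.target):.4f} "
        f"testing set acc: {clf.score(test.data, test.target):.4f}"
    )
```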