Provide an option to consider class prior in ComplementNB #14444
Comments
I'll open a pull request over the weekend |
PRs are always welcome, but I'm wondering whether there's enough consensus. |
I'm happy with this. |
Actually, the implementation here is simple; the real question is whether we should consider the class prior by default. If we do, we'll change the behavior of the current model when n_classes > 1 (according to the paper, the performance of CNB, like that of MNB, is likely to decrease). If we don't, we'll change the behavior of the current model when n_classes = 1, and we'll need to deprecate the default value of fit_prior. (Currently the default fit_prior is True, but it is only used when n_classes = 1; see the sketch below.)
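For reference, a minimal paraphrase (not the verbatim scikit-learn source) of how the current joint log likelihood treats the prior; the function name and signature are simplified for illustration:

```python
from sklearn.utils.extmath import safe_sparse_dot

def joint_log_likelihood(X, feature_log_prob, class_log_prior):
    # ComplementNB scores documents with the complement feature weights...
    jll = safe_sparse_dot(X, feature_log_prob.T)
    # ...but the estimated class prior is only added in the degenerate
    # single-class case; for n_classes > 1 it is ignored entirely.
    if class_log_prior.shape[0] == 1:
        jll += class_log_prior
    return jll
``` |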
Don't all of the other NB classifiers respect class priors by default? |
yes |
@qinhanmin2014 Can I contribute to this? I'm a first-timer. |
Hi @Praveenk8051, there is a stalled pull request trying to solve the issue (#14523). |
Thank you, I will look into this. |
Can I contribute to this? I'm a first-timer. |
Hi @LeclercTanguy, there is already a pull request meant to fix this issue that is awaiting review (#18521). There are some other issues that need help and are labeled as easy; feel free to pick one that doesn't have an active pull request open. Thanks! |
Currently, in ComplementNB, we estimate the class prior but do not use it. I think we can provide an option to consider the class prior in ComplementNB (as the other naive Bayes algorithms in scikit-learn do). Reasons:
(1) In the original paper, when proposing ComplementNB, the authors actually take the class prior into consideration (see Section 3.1 and the paraphrased decision rule below). When proposing the detailed implementation, the authors "use a uniform prior estimate for simplicity", reasoning that for documents with a reasonable number of words the class prior tends to be overpowered by the word probabilities.
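For context, here is a sketch of the CNB decision rule from Section 3.1 of the paper (Rennie et al., 2003); the notation is lightly adapted, so treat this as a paraphrase rather than a verbatim quote:

```latex
% CNB decision rule (cf. Section 3.1 of Rennie et al., 2003); notation
% lightly adapted. Note the explicit class-prior term log p(\theta_c),
% which the final implementation replaces with a uniform estimate.
\[
  l(d) = \arg\max_c \left[ \log p(\theta_c)
         - \sum_i f_i \log
           \frac{N_{\tilde{c}i} + \alpha_i}{N_{\tilde{c}} + \alpha} \right]
\]
% f_i: count of word i in document d;
% N_{\tilde{c}i}: count of word i across all classes other than c;
% \alpha_i, \alpha: smoothing parameters.
```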
(2) This will make ComplementNB consistent with the other naive Bayes algorithms in scikit-learn, which will be beneficial if we want to implement a GeneralNB in the future.
(3) The current API design of ComplementNB seems awkward: we expose an effectively unused parameter (class_prior) to users.
Some simple benchmarks on 20 newsgroups (fetch_20newsgroups_vectorized); a sketch of the setup follows the table:

| Model | Training set acc | Testing set acc |
| --- | --- | --- |
| MultinomialNB | 0.8533 | 0.7159 |
| MultinomialNB + class prior | 0.8439 | 0.7053 |
| ComplementNB | 0.9498 | 0.8318 |
| ComplementNB + class prior | 0.9156 | 0.8043 |
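A sketch of how these numbers might be reproduced. The exact settings behind the "+ class prior" rows aren't stated above, so I'm assuming fit_prior=False corresponds to a uniform prior; the "ComplementNB + class prior" row requires the change proposed here and cannot be run with the stock estimator:

```python
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.naive_bayes import ComplementNB, MultinomialNB

# Load the pre-vectorized 20 newsgroups train/test splits.
train = fetch_20newsgroups_vectorized(subset="train")
test = fetch_20newsgroups_vectorized(subset="test")

models = [
    # fit_prior=False gives a uniform prior, i.e. "without class prior".
    ("MultinomialNB", MultinomialNB(fit_prior=False)),
    ("MultinomialNB + class prior", MultinomialNB(fit_prior=True)),
    # Stock ComplementNB ignores the estimated prior when n_classes > 1,
    # so the "+ class prior" variant needs the proposed option.
    ("ComplementNB", ComplementNB()),
]

for name, clf in models:
    clf.fit(train.data, train.target)
    print(
        f"{name} training set acc: {clf.score(train.data, train.target):.4f} "
        f"testing set acc: {clf.score(test.data, test.target):.4f}"
    )
```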