Provide an option to consider class prior in ComplementNB #14444
Comments
I'll do a pull request over the weekend.
PRs are always welcome, but I'm wondering whether there's enough consensus.
I'm happy with this.
Actually, the implementation here is simple; the issue is whether we should consider the class prior by default. If we do, we'll change the behavior of the current model when n_classes > 1 (according to the paper, the performance of CNB is likely to decrease, as with MNB). If we don't, we'll change the behavior of the current model when n_classes = 1, and we'll need to deprecate the default value of fit_prior. (Currently, fit_prior defaults to True, but we only use it when n_classes = 1.)
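To make the trade-off concrete, here is a minimal pure-Python sketch of the scoring rule under discussion. It is not scikit-learn's code: `complement_weights`, `cnb_scores`, and the `use_prior` argument are hypothetical names made up for illustration. The "single_class_only" mode mimics the current behavior described above (prior added only when n_classes = 1), and "always" mimics the proposed option.

```python
import math


def complement_weights(class_counts, alpha=1.0):
    """Return w[c][i] = -log of the smoothed complement probability.

    class_counts[c][i] is the total count of feature i in class c.
    (Hypothetical helper; additive smoothing with a single alpha is an assumption.)
    """
    n_features = len(class_counts[0])
    # Total count of each feature over all classes.
    totals = [sum(row[i] for row in class_counts) for i in range(n_features)]
    weights = []
    for row in class_counts:
        # Counts of each feature in every class except this one, smoothed.
        comp = [totals[i] - row[i] + alpha for i in range(n_features)]
        norm = sum(comp)
        weights.append([-math.log(v / norm) for v in comp])
    return weights


def cnb_scores(x, weights, class_log_prior, use_prior="single_class_only"):
    """Score each class for the count vector x; the highest score wins.

    use_prior="single_class_only": add the log prior only when n_classes == 1.
    use_prior="always": add the log prior for any number of classes.
    Any other value: never add the prior.
    """
    n_classes = len(weights)
    scores = [sum(f * w for f, w in zip(x, row)) for row in weights]
    apply_prior = use_prior == "always" or (
        use_prior == "single_class_only" and n_classes == 1
    )
    if apply_prior:
        scores = [s + p for s, p in zip(scores, class_log_prior)]
    return scores


# Toy data: two classes, two features, strongly imbalanced prior.
w = complement_weights([[8, 2], [3, 7]])
log_prior = [math.log(0.9), math.log(0.1)]
without_prior = cnb_scores([1, 1], w, log_prior)
with_prior = cnb_scores([1, 1], w, log_prior, use_prior="always")
print(without_prior.index(max(without_prior)))
print(with_prior.index(max(with_prior)))
```

In this toy example the two modes disagree: without the prior the second class scores highest, and with the prior the first class does, which is exactly the kind of behavior change for n_classes > 1 that the default would have to account for.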
Don't all of the other NB classifiers respect class priors by default?
Yes.
@qinhanmin2014 Can I contribute to this? First-timer here.
Currently, in ComplementNB, we estimate the class prior but do not use it. I think we can provide an option to consider the class prior in ComplementNB (like the other naive Bayes algorithms in scikit-learn). Reasons:
(1) In the original paper, when proposing ComplementNB, the authors actually take the class prior into consideration (see Section 3.1). When describing the detailed implementation, the authors "use a uniform prior estimate for simplicity", because they think that the class probabilities tend to be overpowered by the combination of word probabilities.
(2) This will make ComplementNB consistent with the other naive Bayes algorithms in scikit-learn, which is beneficial if we want to implement GeneralNB in the future.
(3) The current API design of ComplementNB seems awkward, i.e., we expose an unused parameter (class_prior) to users.
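For reference, the decision rule that reason (1) alludes to can be sketched as follows. This is written from memory of the paper's Section 3.1 rather than copied from it, so treat the notation as an assumption: $f_i$ is the count of word $i$ in the document, $N_{\tilde{c}i}$ is the count of word $i$ in all classes other than $c$, $N_{\tilde{c}}$ is the total word count in those classes, and $\alpha_i$, $\alpha$ are smoothing terms.

```latex
l_{\mathrm{CNB}}(d) = \arg\max_{c} \left[ \log p(\theta_c)
  - \sum_{i} f_i \log \frac{N_{\tilde{c}i} + \alpha_i}{N_{\tilde{c}} + \alpha} \right]
```

With a uniform prior, $\log p(\theta_c)$ is the same for every class and drops out of the argmax, which corresponds to ignoring the class prior.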
Some simple benchmarks on 20 newsgroups (fetch_20newsgroups_vectorized):
MultinomialNB: training set acc: 0.8533, testing set acc: 0.7159
MultinomialNB + class prior: training set acc: 0.8439, testing set acc: 0.7053
ComplementNB: training set acc: 0.9498, testing set acc: 0.8318
ComplementNB + class prior: training set acc: 0.9156, testing set acc: 0.8043