
Introduction to Machine Learning

Classification: Naive Bayes

Learning goals

Understand the idea of Naive Bayes

Understand in which sense Naive Bayes is a special QDA model

[Figure: scatter plot of x1 vs. x2, points colored by response class (a / b)]
NAIVE BAYES CLASSIFIER
NB is a generative multiclass technique. Remember: we use Bayes' theorem and only need p(x | y = k) to compute the posterior as:

$$
\pi_k(x) = P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{P(x)} = \frac{p(x \mid y = k)\, \pi_k}{\sum_{j=1}^{g} p(x \mid y = j)\, \pi_j}
$$

NB is based on a simple conditional independence assumption: the features are conditionally independent given class y.

$$
p(x \mid y = k) = p((x_1, x_2, \ldots, x_p) \mid y = k) = \prod_{j=1}^{p} p(x_j \mid y = k)
$$

So we only need to specify and estimate the distribution p(x_j | y = k), which is considerably simpler as this is univariate.
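To make the posterior formula and the factorization above concrete, here is a minimal numerical sketch; the priors and per-feature likelihood values are invented purely for illustration.

```python
# Illustrative sketch: NB posterior for one observation, g = 2 classes, p = 3
# features. The priors and likelihood values below are made up, not estimated.
import numpy as np

priors = np.array([0.6, 0.4])                  # pi_k = P(y = k)
# likelihoods[k, j] = p(x_j | y = k) evaluated at the observed feature values
likelihoods = np.array([[0.20, 0.05, 0.30],
                        [0.10, 0.25, 0.15]])

joint = likelihoods.prod(axis=1)               # conditional independence: product over j
unnormalized = joint * priors                  # p(x | y = k) * pi_k
posterior = unnormalized / unnormalized.sum()  # pi_k(x) = P(y = k | x)
print(posterior)                               # sums to 1 over the classes
```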



NB: NUMERICAL FEATURES
We use a univariate Gaussian for p(x_j | y = k), and estimate (µ_j, σ_j²) per class in the standard manner. Because of

$$
p(x \mid y = k) = \prod_{j=1}^{p} p(x_j \mid y = k),
$$

the joint conditional density is Gaussian with diagonal but non-isotropic covariance structure, and potentially different across classes. Hence, NB is a (specific) QDA model, with a quadratic decision boundary.
[Figure: scatter plot of x1 vs. x2, points colored by response class (a / b)]
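A small sketch of this on assumed toy data: for each class k and feature j we estimate a univariate mean and variance, which is exactly QDA restricted to diagonal covariance matrices; the simulated data-generating parameters below are made up.

```python
# Illustrative sketch: Gaussian NB from scratch on simulated two-class data.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 3.0], [1.0, 0.5], size=(50, 2)),
               rng.normal([6.0, 1.0], [0.8, 1.5], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

classes = np.unique(y)
priors = np.array([np.mean(y == k) for k in classes])          # pi_k
means  = np.array([X[y == k].mean(axis=0) for k in classes])   # mu_kj
var    = np.array([X[y == k].var(axis=0)  for k in classes])   # sigma^2_kj (diagonal)

def predict(x_new):
    # log p(x | y = k) = sum_j log N(x_j; mu_kj, sigma^2_kj)
    log_lik = -0.5 * (np.log(2 * np.pi * var) + (x_new - means) ** 2 / var)
    return classes[np.argmax(log_lik.sum(axis=1) + np.log(priors))]

print(predict(np.array([2.5, 2.8])))   # a point near the first class mean
```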



NB: CATEGORICAL FEATURES
We use a categorical distribution for p(x_j | y = k) and estimate the probabilities p_kjm that, in class k, our j-th feature has value m, x_j = m, simply by counting the frequencies.

$$
p(x_j \mid y = k) = \prod_{m} p_{kjm}^{[x_j = m]}
$$

Because of the simple conditional independence structure it is also very easy to deal with mixed numerical / categorical feature spaces.
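A minimal sketch for a single assumed categorical feature, estimating p_kjm by relative frequencies; the feature values and labels are invented for illustration.

```python
# Illustrative sketch: frequency estimates p_kjm for one categorical feature.
import numpy as np

x = np.array(["red", "red", "blue", "green", "blue", "green"])  # feature j
y = np.array([0, 0, 0, 1, 1, 1])                                # class labels

values = np.unique(x)
p_kjm = {}
for k in np.unique(y):
    in_class = x[y == k]
    # count of value m within class k, divided by the class size
    p_kjm[k] = {m: float(np.mean(in_class == m)) for m in values}

print(p_kjm)
# class 0 never contains "green", so its estimate is exactly 0.0 --
# this is the situation that Laplace smoothing (next slide) corrects.
```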



LAPLACE SMOOTHING
If a given class and feature value never occur together in the training
data, then the frequency-based probability estimate will be zero.

This is problematic because it will wipe out all information in the other
probabilities when they are multiplied.

A simple numerical correction is to set these zero probabilities to a small value to regularize against this case.
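A minimal sketch of such a correction in its common additive form, where a small pseudo-count alpha is added to every frequency; the function name and the example counts are illustrative only.

```python
# Illustrative sketch: Laplace / additive smoothing of frequency estimates.
import numpy as np

def smoothed_probs(counts, alpha=1.0):
    # counts[m] = number of training samples of class k with x_j = m
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

print(smoothed_probs([3, 1, 0], alpha=0.0))  # raw frequencies: contain a zero
print(smoothed_probs([3, 1, 0], alpha=1.0))  # smoothed: no zero probabilities
```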



NAIVE BAYES: APPLICATION AS SPAM FILTER
In the late 90s, Naive Bayes became popular for e-mail spam filter programs.
Word counts were used as features to detect spam mails (e.g., "Viagra" often occurs in spam mail).
The independence assumption implies that the occurrences of two words in a mail are uncorrelated.
This seems naive ("Viagra" is more likely to occur in the context of "Buy now" than of "flower"), but it requires fewer parameters, therefore generalizes better, and often works well in practice.
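An end-to-end sketch of this idea: a made-up five-mail corpus is turned into word-count features and fed to a multinomial Naive Bayes classifier via scikit-learn; the corpus, labels, and alpha value are invented for illustration.

```python
# Illustrative sketch: word-count Naive Bayes spam filter on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

mails = ["buy viagra now", "cheap viagra buy now", "meeting at noon",
         "flowers for the garden", "project meeting tomorrow"]
labels = [1, 1, 0, 0, 0]                     # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(mails)          # sparse word-count feature matrix

clf = MultinomialNB(alpha=1.0)               # alpha = Laplace smoothing
clf.fit(X, labels)

test = vectorizer.transform(["buy cheap flowers now"])
print(clf.predict(test), clf.predict_proba(test))
```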

