Chapter 9
Ensemble Learning
supplementary slides to
Machine Learning Fundamentals
© Hui Jiang 2020
published by Cambridge University Press
August 2020
Outline
1 Formulation
2 Bagging
3 Boosting
Ensemble Learning
a regression tree partitions the input space into regions $R_l$ and predicts a constant $c_l$ inside each region:
$$F(x) = \sum_{l} c_l \, I(x \in R_l)$$
where
$$I(x \in R_l) = \begin{cases} 1 & \text{if } x \in R_l \\ 0 & \text{otherwise} \end{cases}$$
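As a concrete illustration (a minimal sketch; the regions, constants, and function names below are hypothetical, not from the book), a piecewise-constant model built from such indicators can be evaluated directly:

```python
# hypothetical 1-D model with three regions R_1 = (-inf, 0], R_2 = (0, 2], R_3 = (2, inf)
regions = [lambda x: x <= 0.0,
           lambda x: 0.0 < x <= 2.0,
           lambda x: x > 2.0]
constants = [-1.0, 0.5, 3.0]   # the constants c_l predicted inside each region

def predict(x):
    # F(x) = sum_l c_l * I(x in R_l); exactly one indicator is 1 for any input
    return sum(c * float(in_region(x)) for c, in_region in zip(constants, regions))

print(predict(1.0))   # -> 0.5, since x = 1.0 falls in R_2
```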
where
$$\mathcal{D}_l = \Big\{ (x^{(n)}, y^{(n)}) \;\Big|\; x_i^{(n)} \le t_j \Big\}, \qquad \mathcal{D}_r = \Big\{ (x^{(n)}, y^{(n)}) \;\Big|\; x_i^{(n)} > t_j \Big\},$$
and $c_l^*$ and $c_r^*$ are the centroids of $\mathcal{D}_l$ and $\mathcal{D}_r$
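A sketch of the split search implied by this rule (function and variable names are illustrative): for each candidate feature $i$ and threshold $t_j$, partition the data into $\mathcal{D}_l$ and $\mathcal{D}_r$ and score the split by the squared error around the two centroids.

```python
import numpy as np

def best_split(X, y):
    """Search feature/threshold pairs (i, t_j) for the split whose two
    centroids c_l*, c_r* give the lowest total squared error."""
    best = (None, None, np.inf)              # (feature i, threshold t_j, error)
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i])[:-1]:    # candidate thresholds, both sides nonempty
            left, right = y[X[:, i] <= t], y[X[:, i] > t]
            # the centroids of D_l and D_r are the means of their targets
            err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
            if err < best[2]:
                best = (i, t, err)
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0.1, 0.2, 0.9, 1.1])
print(best_split(X, y))   # -> feature 0, threshold 2.0: splits y into {0.1, 0.2} and {0.9, 1.1}
```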
Boosting: Outline
1 Gradient Boosting
2 AdaBoost
Boosting
consider an additive model for ensemble learning, built up one base model at a time:
$$F_m(x) = F_{m-1}(x) + w_m f_m(x)$$
where $f_m$ is the base model added at step $m$ and $w_m$ is its ensemble weight
Gradient Boosting
gradient boosting: estimate the new base model along the direction of the functional gradient of the loss at the current ensemble $F_{m-1}$:
$$\nabla l\big(F_{m-1}(x)\big) \;\triangleq\; \frac{\partial\, l\big(f(x), y\big)}{\partial f} \bigg|_{f = F_{m-1}}$$
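A minimal gradient-boosting sketch, assuming squared loss, for which the negative functional gradient at each training point is simply the residual $y - F_{m-1}(x)$; depth-1 regression trees from scikit-learn stand in for the base models (my choice of base learner, not the book's):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    F = np.zeros(len(y))            # current ensemble F_{m-1}(x) on the training set
    models = []
    for _ in range(n_rounds):
        # negative functional gradient of squared loss l(f, y) = (y - f)^2 / 2 at F_{m-1}
        neg_grad = y - F
        f_m = DecisionTreeRegressor(max_depth=1).fit(X, neg_grad)
        F += lr * f_m.predict(X)    # additive update F_m = F_{m-1} + lr * f_m
        models.append(f_m)
    return models

X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel()
models = gradient_boost(X, y)
```

The shrinkage factor lr is a standard practical addition that damps each step along the gradient direction; it is not part of the formulation above.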
AdaBoost (I)
apply gradient boosting to binary classification problems
$\mathcal{H}$: all binary functions, i.e. $\forall f \in \mathcal{H},\; f(x) \in \{-1, +1\}$
the exponential loss function: $l\big(F(x), y\big) = e^{-yF(x)}$
given a training set: $\mathcal{D} = \big\{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\big\}$, where $x_n \in \mathbb{R}^d$ and $y_n \in \{-1, +1\}$
the functional gradient:
$$\nabla l\big(F_{m-1}(x)\big) \;\triangleq\; \frac{\partial\, l\big(f(x), y\big)}{\partial f} \bigg|_{f = F_{m-1}} = -y\, e^{-y F_{m-1}(x)}$$
project into $\mathcal{H}$:
$$f_m = \arg\max_{f \in \mathcal{H}} \Big\langle f,\; -\nabla l\big(F_{m-1}(x)\big) \Big\rangle = \arg\max_{f \in \mathcal{H}} \frac{1}{N} \sum_{n=1}^{N} y_n f(x_n)\, e^{-y_n F_{m-1}(x_n)}$$
AdaBoost (II)
denote $\alpha_n^{(m)} \triangleq \exp\big(-y_n F_{m-1}(x_n)\big)$; because $y_n f(x_n) = +1$ on correctly classified samples and $-1$ on misclassified ones, the objective splits into two sums:
$$f_m = \arg\max_{f \in \mathcal{H}} \bigg[ \sum_{y_n = f(x_n)} \alpha_n^{(m)} - \sum_{y_n \ne f(x_n)} \alpha_n^{(m)} \bigg]
     = \arg\max_{f \in \mathcal{H}} \bigg[ \sum_{n=1}^{N} \alpha_n^{(m)} - 2 \sum_{y_n \ne f(x_n)} \alpha_n^{(m)} \bigg]
     = \arg\min_{f \in \mathcal{H}} \sum_{y_n \ne f(x_n)} \alpha_n^{(m)}$$
normalizing all weights as $\bar\alpha_n^{(m)} \triangleq \dfrac{\alpha_n^{(m)}}{\sum_{n'=1}^{N} \alpha_{n'}^{(m)}}$, we have
$$f_m = \arg\min_{f \in \mathcal{H}} \sum_{y_n \ne f(x_n)} \bar\alpha_n^{(m)}$$
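A small numeric check of this derivation (toy data and stump family are made up): over a set of threshold stumps, the stump minimizing the weighted error is exactly the one maximizing the projection objective, via the identity $\sum_n \bar\alpha_n^{(m)} y_n f(x_n) = 1 - 2\sum_{y_n \ne f(x_n)} \bar\alpha_n^{(m)}$ for normalized weights.

```python
import numpy as np

# toy data and uniform normalized weights (all values here are made up)
X = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])
y = np.array([-1, -1, 1, 1, -1])
alpha_bar = np.full(5, 0.2)

def stump(t, s):
    # a threshold stump in H: f(x) = s * sign(x - t), outputs in {-1, +1}
    return lambda x: s * np.where(x > t, 1, -1)

for t in (-1.5, -0.25, 1.0, 2.0):
    for s in (1, -1):
        pred = stump(t, s)(X)
        err = alpha_bar[pred != y].sum()    # weighted error: sum over y_n != f(x_n)
        obj = (alpha_bar * y * pred).sum()  # projection objective: sum alpha_bar_n y_n f(x_n)
        assert np.isclose(obj, 1.0 - 2.0 * err)   # the identity behind the derivation
        print(f"t={t:5.2f} s={s:+d}  weighted error={err:.2f}  objective={obj:+.2f}")
```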
AdaBoost (III)
estimate $f_m$ to minimize the weighted classification error:
$$\epsilon_m = \sum_{y_n \ne f_m(x_n)} \bar\alpha_n^{(m)} \qquad (0 \le \epsilon_m \le 1)$$
$$\Longrightarrow \quad w_m = \frac{1}{2} \ln \frac{\sum_{y_n = f_m(x_n)} \bar\alpha_n^{(m)}}{\sum_{y_n \ne f_m(x_n)} \bar\alpha_n^{(m)}} = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$$
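To make the weight formula concrete (numbers chosen for illustration): a base model with $\epsilon_m = 0.2$ receives $w_m = \frac{1}{2}\ln\frac{0.8}{0.2} = \frac{1}{2}\ln 4 \approx 0.69$; a near-perfect one with $\epsilon_m = 0.01$ receives $w_m = \frac{1}{2}\ln 99 \approx 2.3$; and a random-guessing one with $\epsilon_m = 0.5$ receives $w_m = 0$, i.e. no vote in the ensemble.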
AdaBoost (IV)
AdaBoost algorithm
m = 1 and $F_0(x) = 0$
initialize $\bar\alpha_n^{(1)} = \frac{1}{N}$ for all $n = 1, 2, \cdots, N$
while not converged do
  learn a binary classifier $f_m(x)$ to minimize $\epsilon_m = \sum_{y_n \ne f_m(x_n)} \bar\alpha_n^{(m)}$
  estimate ensemble weight: $w_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$
  update the ensemble: $F_m(x) = F_{m-1}(x) + w_m f_m(x)$
  update sample weights $\bar\alpha_n^{(m+1)} \propto \bar\alpha_n^{(m)} e^{-w_m y_n f_m(x_n)}$ and renormalize
  m = m + 1
end while
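The loop above as a runnable sketch, with 1-D threshold stumps standing in for the base classifiers (the stump search and the stopping test are my own choices for illustration):

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the threshold stump minimizing the weighted error under weights w."""
    best = (np.inf, None)
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i]):
            for s in (1, -1):
                pred = s * np.where(X[:, i] > t, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (i, t, s))
    return best

def adaboost(X, y, n_rounds=20):
    N = len(y)
    w = np.full(N, 1.0 / N)                    # alpha_bar^(1) = 1/N
    ensemble = []
    for _ in range(n_rounds):
        eps, (i, t, s) = fit_stump(X, y, w)
        if eps <= 0 or eps >= 0.5:             # perfect or no better than chance: stop
            break
        wm = 0.5 * np.log((1 - eps) / eps)     # ensemble weight
        pred = s * np.where(X[:, i] > t, 1, -1)
        w *= np.exp(-wm * y * pred)            # alpha_bar^(m+1) ∝ alpha_bar^(m) e^{-w_m y_n f_m(x_n)}
        w /= w.sum()                           # renormalize
        ensemble.append((wm, i, t, s))
    return ensemble

def predict(ensemble, X):
    F = sum(wm * s * np.where(X[:, i] > t, 1, -1) for wm, i, t, s in ensemble)
    return np.sign(F)                          # sign of the additive model F_m(x)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, 1, 1, -1])                   # no single stump separates this
print(predict(adaboost(X, y), X))              # expect [-1.  1.  1. -1.]
```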
AdaBoost (V)
Theorem
If AdaBoost generates $m$ base models with weighted errors $\epsilon_1, \epsilon_2, \cdots, \epsilon_m$, the error $\epsilon$ of the ensemble model $F_m(x)$ is bounded as:
$$\epsilon \;\le\; 2^m \prod_{t=1}^{m} \sqrt{\epsilon_t (1 - \epsilon_t)}$$
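Note (illustrative numbers): each factor $2\sqrt{\epsilon_t(1-\epsilon_t)}$ is strictly less than 1 whenever $\epsilon_t < \frac{1}{2}$, so the bound decays exponentially in the number of rounds; e.g. if every base model achieves $\epsilon_t \le 0.4$, each round contributes a factor of at most $2\sqrt{0.4 \times 0.6} \approx 0.98$, and after 200 rounds the bound drops below $0.98^{200} \approx 0.02$.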