

Chapter 9
Ensemble Learning

supplementary slides to
Machine Learning Fundamentals

© Hui Jiang 2020
published by Cambridge University Press

August 2020


Outline

1 Formulation of Ensemble Learning

2 Bagging

3 Boosting


Ensemble Learning

ensemble learning: combine multiple base models that are learned separately
for the same task

how to choose base models?
◦ neural networks, linear models, decision trees, etc.

how to learn base models to ensure diversity?
◦ re-sampling the training set, re-weighting training samples, etc.

how to combine base models optimally?
◦ bagging, boosting, stacking


Decision Trees (I)

a popular non-parametric model for regression or classification tasks

a tree-structured model:
◦ each non-terminal node is associated with a binary question regarding an
  input feature element x_i and a threshold t_j, e.g. x_i ≤ t_j
◦ each leaf node represents a homogeneous region R_l in the input space

each decision tree represents a particular partition of the input space

decision trees are a highly interpretable machine learning method

Decision Trees (II)


fit a simple model to all y values in each region R_l
◦ regression: use a constant c_l for each R_l
◦ classification: assign all x in each R_l to one particular class

[figure: a generic ML model maps an input x to an output y = f̄(x)]

approximate the unknown target function by a piece-wise constant function

    y = f(x) = Σ_l c_l I(x ∈ R_l)

where I(x ∈ R_l) = 1 if x ∈ R_l and 0 otherwise
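As a concrete illustration (my own sketch, not from the slides), a tiny tree data structure whose leaves store the constants c_l realizes exactly this piece-wise constant function; all class and field names below are hypothetical:

import numpy as np

# Minimal sketch: a binary decision tree whose leaves store constants c_l,
# so predict() evaluates f(x) = sum_l c_l * I(x in R_l).
class TreeNode:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, value=None):
        self.feature = feature      # index i of the feature x_i asked about
        self.threshold = threshold  # threshold t_j in the question x_i <= t_j
        self.left = left            # subtree for x_i <= t_j
        self.right = right          # subtree for x_i > t_j
        self.value = value          # constant c_l if this node is a leaf

    def predict(self, x):
        if self.value is not None:             # leaf: x has reached a region R_l
            return self.value
        if x[self.feature] <= self.threshold:  # binary question x_i <= t_j
            return self.left.predict(x)
        return self.right.predict(x)

# example: a depth-1 tree splitting on feature 0 at threshold 0.5
tree = TreeNode(feature=0, threshold=0.5,
                left=TreeNode(value=1.2), right=TreeNode(value=3.4))
print(tree.predict(np.array([0.3])))  # 1.2
print(tree.predict(np.array([0.9])))  # 3.4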

Decision Trees for Regression



a training set: D = { (x^(n), y^(n)) | n = 1, 2, · · · , N }

construct the loss functional using a loss function l(·):

    L(f; D) = (1/N) Σ_{n=1}^N l( y^(n), f(x^(n)) ) = (1/N) Σ_{n=1}^N ( y^(n) − f(x^(n)) )²

it is computationally infeasible to find the best partition to minimize the
above loss

use a greedy algorithm to recursively find one optimal split x_i* ≤ t_j* at a time:

    {x_i*, t_j*} = arg min_{x_i, t_j} [ Σ_{x^(n) ∈ D_l} ( y^(n) − c_l* )² + Σ_{x^(n) ∈ D_r} ( y^(n) − c_r* )² ]

where D_l = { (x^(n), y^(n)) | x_i^(n) ≤ t_j }, D_r = { (x^(n), y^(n)) | x_i^(n) > t_j },
and c_l* and c_r* are the centroids of D_l and D_r
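A minimal Python sketch of this greedy search (my own illustration, not the book's code): it scans every feature and every candidate threshold, using the left/right means as c_l* and c_r*:

import numpy as np

def best_split(X, y):
    """Exhaustively search for the split (i*, t*) minimizing squared error.

    X: (N, d) array of inputs, y: (N,) array of targets.
    Returns (feature index, threshold, left mean c_l*, right mean c_r*).
    """
    best = (None, None, None, None)
    best_loss = np.inf
    N, d = X.shape
    for i in range(d):                        # candidate feature x_i
        for t in np.unique(X[:, i])[:-1]:     # candidate thresholds t_j
            left = y[X[:, i] <= t]
            right = y[X[:, i] > t]
            c_l, c_r = left.mean(), right.mean()
            loss = ((left - c_l) ** 2).sum() + ((right - c_r) ** 2).sum()
            if loss < best_loss:
                best_loss, best = loss, (i, t, c_l, c_r)
    return best

# toy example: one feature with a clear jump after x = 3
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
print(best_split(X, y))   # splits at threshold 3.0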

Decision Trees for Classification


classification problem involving K classes, i.e. ω_1, ω_2, · · · , ω_K

p_lk (k = 1, 2, · · · , K): the portion of class k among all training samples
assigned to leaf node l representing R_l

    p_lk = (1/N_l) Σ_{x^(n) ∈ R_l} I( y^(n) = ω_k )

all input x in each region R_l is assigned to the majority class

    k_l* = arg max_k p_lk

the criteria for the best split {x_i*, t_j*}:
◦ misclassification error: (1/N_l) Σ_{x^(n) ∈ R_l} I( y^(n) ≠ ω_{k_l*} ) = 1 − p_{l k_l*}
◦ Gini index: 1 − Σ_{k=1}^K p_lk²
◦ entropy: − Σ_{k=1}^K p_lk log(p_lk)
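These three criteria are simple functions of the class proportions p_lk; the sketch below (my own, with a hypothetical helper name) computes them for the labels falling into one leaf:

import numpy as np

def impurities(labels):
    """Misclassification error, Gini index, and entropy for one leaf node.

    labels: 1-D array of class labels y^(n) for the samples falling in R_l.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class proportions p_lk
    misclass = 1.0 - p.max()           # 1 - p_{l k_l*}
    gini = 1.0 - np.sum(p ** 2)        # 1 - sum_k p_lk^2
    entropy = -np.sum(p * np.log(p))   # - sum_k p_lk log p_lk
    return misclass, gini, entropy

# example: a leaf holding 8 samples of class 0 and 2 of class 1
print(impurities(np.array([0] * 8 + [1] * 2)))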


Bagging and Random Forests

bagging stands for bootstrap aggregating

bootstrap (sample with replacement) the training set into M subsets
use the M bootstrap subsets to independently learn M models
combine the M models by averaging or majority voting

random forests: use decision trees as the base models in bagging
◦ row sampling
◦ column sampling
◦ sub-optimal splitting

random forests are much more powerful than single decision trees
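A compact sketch of bagging with decision trees, roughly in the spirit of a random forest (my own illustration, assuming scikit-learn and numpy-array inputs; the hyper-parameter values are arbitrary): rows are bootstrapped per tree and max_features handles column sub-sampling at each split:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, M=50, seed=0):
    """Learn M trees, each on a bootstrap (row-sampled) replica of the training set."""
    rng = np.random.default_rng(seed)
    N = len(X)
    trees = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)                    # row sampling with replacement
        tree = DecisionTreeClassifier(max_features="sqrt")  # column sub-sampling per split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    """Combine the M trees by majority voting (labels assumed to be non-negative ints)."""
    votes = np.stack([t.predict(X) for t in trees])         # shape (M, number of inputs)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])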


Boosting: Outline

1 Gradient Boosting

2 AdaBoost

3 Gradient Tree Boosting


Boosting
consider an additive model for ensemble learning

    F_m(x) = w_1 f_1(x) + w_2 f_2(x) + · · · + w_m f_m(x)

each base model f_m(x) ∈ H, then F_m(x) ∈ lin(H) ⊇ H

ensemble learning ⇐⇒ functional minimization:

    F_m(x) = arg min_{f ∈ lin(H)} Σ_{n=1}^N l( f(x_n), y_n )

boosting: a sequential learning strategy to add a new base model to improve
the current ensemble

    F_m(x) = F_{m−1}(x) + w_m f_m(x)


Gradient Boosting
gradient boosting: estimate the new base model along the direction of the
gradient at the current ensemble F_{m−1}:

    ∇l(F_{m−1}(x)) = ∂l(f(x), y) / ∂f |_{f = F_{m−1}}

project the gradient into H, using the inner product
⟨f, g⟩ ≜ (1/N) Σ_{i=1}^N f(x_i) g(x_i):

    f_m = arg max_{f ∈ H} ⟨ f, −∇l(F_{m−1}(x)) ⟩

estimate the optimal weight:

    w_m = arg min_w Σ_{n=1}^N l( F_{m−1}(x_n) + w f_m(x_n), y_n )
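A sketch of one gradient-boosting round (my own, with the squared loss as a concrete choice so the negative gradient is the residual; fitting a regression tree to it stands in for the projection onto H, and the weight is found by a simple grid line search — tree depth and grid are arbitrary):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_round(F_prev, X, y, depth=2, w_grid=np.linspace(0.0, 2.0, 201)):
    """One gradient-boosting round under the squared loss l(f, y) = (f - y)^2 / 2.

    F_prev: array of current ensemble outputs F_{m-1}(x_n) on the training set.
    Returns the fitted base model f_m and its weight w_m.
    """
    neg_grad = y - F_prev                        # negative functional gradient = residual
    f_m = DecisionTreeRegressor(max_depth=depth).fit(X, neg_grad)
    pred = f_m.predict(X)
    # line search for w_m = arg min_w sum_n l(F_{m-1}(x_n) + w f_m(x_n), y_n)
    losses = [np.sum((F_prev + w * pred - y) ** 2) / 2 for w in w_grid]
    w_m = w_grid[int(np.argmin(losses))]
    return f_m, w_m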


AdaBoost (I)
apply gradient boosting to binary classification problems

H: all binary functions, i.e. ∀f ∈ H, f(x) ∈ {−1, +1}

the exponential loss function: l(F(x), y) = e^{−y F(x)}

given a training set D = { (x_1, y_1), (x_2, y_2), · · · , (x_N, y_N) },
where x_n ∈ R^d and y_n ∈ {−1, +1}

the functional gradient:

    ∇l(F_{m−1}(x)) ≜ ∂l(f(x), y) / ∂f |_{f = F_{m−1}} = −y e^{−y F_{m−1}(x)}

project into H:

    f_m = arg max_{f ∈ H} ⟨ f, −∇l(F_{m−1}(x)) ⟩
        = arg max_{f ∈ H} (1/N) Σ_{n=1}^N y_n f(x_n) e^{−y_n F_{m−1}(x_n)}

AdaBoost (II)
denote α_n^(m) ≜ exp(−y_n F_{m−1}(x_n)):

    f_m = arg max_{f ∈ H} [ Σ_{y_n = f(x_n)} α_n^(m) − Σ_{y_n ≠ f(x_n)} α_n^(m) ]
        = arg max_{f ∈ H} [ Σ_{n=1}^N α_n^(m) − 2 Σ_{y_n ≠ f(x_n)} α_n^(m) ]
        = arg min_{f ∈ H} Σ_{y_n ≠ f(x_n)} α_n^(m)

normalizing all weights as ᾱ_n^(m) ≜ α_n^(m) / Σ_{n=1}^N α_n^(m), we have

    f_m = arg min_{f ∈ H} Σ_{y_n ≠ f(x_n)} ᾱ_n^(m)


AdaBoost (III)
estimate f_m to minimize the weighted classification error:

    ε_m = Σ_{y_n ≠ f_m(x_n)} ᾱ_n^(m)        (0 ≤ ε_m ≤ 1)

replace the 0-1 loss function with a weighted loss function, where ᾱ_n^(m) is
treated as the loss if (x_n, y_n) is misclassified

estimate the optimal weight:

    w_m = arg min_w Σ_{n=1}^N e^{−y_n (F_{m−1}(x_n) + w f_m(x_n))}

    ⟹ w_m = (1/2) ln( Σ_{y_n = f_m(x_n)} ᾱ_n^(m) / Σ_{y_n ≠ f_m(x_n)} ᾱ_n^(m) ) = (1/2) ln((1 − ε_m) / ε_m)


AdaBoost (IV)

AdaBoost algorithm

input: { (x_1, y_1), · · · , (x_N, y_N) }, where x_n ∈ R^d and y_n ∈ {−1, +1}
output: an ensemble model F_m(x)

m = 1 and F_0(x) = 0
initialize ᾱ_n^(1) = 1/N for all n = 1, 2, · · · , N
while not converged do
    learn a binary classifier f_m(x) to minimize ε_m = Σ_{y_n ≠ f_m(x_n)} ᾱ_n^(m)
    estimate ensemble weight: w_m = (1/2) ln((1 − ε_m) / ε_m)
    add to ensemble: F_m(x) = F_{m−1}(x) + w_m f_m(x)
    update ᾱ_n^(m+1) = ᾱ_n^(m) e^{−y_n w_m f_m(x_n)} / Σ_{n=1}^N ᾱ_n^(m) e^{−y_n w_m f_m(x_n)}
        for all n = 1, 2, · · · , N
    m = m + 1
end while
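A runnable sketch of this loop (my own illustration, not the book's code), using scikit-learn decision stumps as base classifiers and a fixed number of rounds plus an early stop in place of the convergence test:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, M=20):
    """Train an AdaBoost ensemble; y must take values in {-1, +1}.

    Returns lists of base classifiers f_m and ensemble weights w_m.
    """
    N = len(X)
    alpha = np.full(N, 1.0 / N)                  # normalized sample weights
    models, weights = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=alpha)     # minimize the weighted error
        pred = stump.predict(X)
        eps = alpha[pred != y].sum()             # weighted classification error
        if eps <= 0 or eps >= 0.5:               # stop if no better than chance
            break
        w = 0.5 * np.log((1 - eps) / eps)        # ensemble weight w_m
        alpha *= np.exp(-y * w * pred)           # re-weight training samples
        alpha /= alpha.sum()                     # renormalize
        models.append(stump)
        weights.append(w)
    return models, weights

def predict(models, weights, X):
    """Sign of the weighted sum F_m(x) = sum_m w_m f_m(x)."""
    F = sum(w * f.predict(X) for w, f in zip(weights, models))
    return np.sign(F)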


AdaBoost (V)

Theorem
If AdaBoost generates m base models with errors ε_1, ε_2, · · · , ε_m, the error
of the ensemble model F_m(x) is bounded as:

    ε ≤ 2^m ∏_{t=1}^m √( ε_t (1 − ε_t) )

AdaBoost combines many weak classifiers into a strong classifier, i.e.
ε → 0 as m → ∞ if all ε_t ≠ 1/2 (better than random guessing)

AdaBoost generalizes well to unseen samples since it improves the margin
distribution of the training samples


Gradient Tree Boosting (I)


apply gradient boosting to regression problems

H: all decision trees

use the squared error as the loss function: l(f(x), y) = (1/2)(f(x) − y)²

the functional gradient: ∇l(F_{m−1}(x)) = F_{m−1}(x) − y

given a training set D = { (x_1, y_1), (x_2, y_2), · · · , (x_N, y_N) },
project it to minimize:

    f_m = arg min_{f ∈ H} ‖ f + ∇l(F_{m−1}(x)) ‖²
        = arg min_{f ∈ H} Σ_{n=1}^N ( f(x_n) − (y_n − F_{m−1}(x_n)) )²

where y_n − F_{m−1}(x_n) is the residual of the current ensemble at x_n


Gradient Tree Boosting (II)


gradient tree boosting: build a decision tree f_m to approximate the
residuals, i.e. y_n − F_{m−1}(x_n), for all n

    y = f_m(x) = Σ_l c_ml I(x ∈ R_ml)

where c_ml is the mean of all residuals in the region R_ml

a.k.a. gradient boosting machine (GBM), gradient boosted regression tree (GBRT)

use a pre-set "shrinkage" parameter ν as the weight:

    F_m(x) = F_{m−1}(x) + ν f_m(x)

also applicable to multi-class classification problems



Gradient Tree Boosting (III)


Gradient Tree Boosting algorithm

input: { (x_1, y_1), (x_2, y_2), · · · , (x_N, y_N) }
output: an ensemble model F_m(x)

fit a regression tree f_0(x) to { (x_1, y_1), (x_2, y_2), · · · , (x_N, y_N) }
F_0(x) = ν f_0(x)
m = 1
while not converged do
    compute the negative gradients as pseudo outputs:
        ỹ_n = −∇l(F_{m−1}(x_n)) for all n = 1, 2, · · · , N
    fit a regression tree f_m(x) to { (x_1, ỹ_1), · · · , (x_N, ỹ_N) }
    F_m(x) = F_{m−1}(x) + ν f_m(x)
    m = m + 1
end while
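A compact sketch of this loop (my own, assuming scikit-learn and the squared loss, so the pseudo outputs are just the residuals; tree depth, number of rounds, and the shrinkage value are illustrative choices):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_tree_boosting(X, y, M=100, nu=0.1, max_depth=3):
    """Fit an additive model F_M(x) = nu * sum_m f_m(x) of regression trees."""
    trees = []
    F = np.zeros(len(y))                  # current ensemble outputs F_{m-1}(x_n)
    for _ in range(M + 1):                # fits f_0, f_1, ..., f_M
        residual = y - F                  # pseudo outputs: negative gradients
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += nu * tree.predict(X)         # F_m = F_{m-1} + nu * f_m
        trees.append(tree)
    return trees

def predict(trees, X, nu=0.1):
    """Evaluate the ensemble on new inputs."""
    return nu * sum(t.predict(X) for t in trees)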