Lab4 RBM DBN Extra Slides
Lab 4 support
DD2437
Pawel Herman
CST/EECS/KTH
Restricted Boltzmann machine (RBM)
Visible and hidden units are conditionally independent given one another:
$p(\mathbf{h} \mid \mathbf{v}) = \prod_i p(h_i \mid \mathbf{v}), \qquad p(\mathbf{v} \mid \mathbf{h}) = \prod_j p(v_j \mid \mathbf{h})$
Following the same principle of maximising the log likelihood by means of gradient ascent, one obtains:
$\Delta w_{ji} \propto \frac{\partial L(\mathbf{W})}{\partial w_{ji}} = \langle v_j h_i \rangle_{\text{data}} - \langle v_j h_i \rangle_{\text{model}}$
The conditional probabilities are logistic (sigmoid) functions of the input from the other layer:
$P(h_i = 1 \mid \mathbf{v}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{h_i} - \mathbf{v}^{T}\mathbf{W}_{:,i}\right)}$
$P(v_j = 1 \mid \mathbf{h}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{v_j} - \mathbf{W}_{j,:}\mathbf{h}\right)}$
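For the lab, these conditionals are straightforward to code. A minimal NumPy sketch (the names sigmoid, p_h_given_v, p_v_given_h, bias_h, bias_v are illustrative assumptions, not fixed by the slides):

import numpy as np

def sigmoid(x):
    # logistic function used for both conditionals
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, bias_h):
    # P(h_i = 1 | v) for all hidden units i (factorises over i)
    # v: (n_visible,) or (N, n_visible), W: (n_visible, n_hidden)
    return sigmoid(bias_h + v @ W)

def p_v_given_h(h, W, bias_v):
    # P(v_j = 1 | h) for all visible units j (factorises over j)
    return sigmoid(bias_v + h @ W.T)

def sample_bernoulli(p, rng):
    # draw binary states from element-wise Bernoulli probabilities
    return (rng.random(p.shape) < p).astype(float)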
RBM learning with Contrastive Divergence (CD)
Gibbs sampling:
$P(h_i = 1 \mid \mathbf{v}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{h_i} - \mathbf{v}^{T}\mathbf{W}_{:,i}\right)}$
$P(v_j = 1 \mid \mathbf{h}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{v_j} - \mathbf{W}_{j,:}\mathbf{h}\right)}$
Increase energy "elsewhere", especially in areas of low energy for the observed data (Hinton, 2003).
GOOD TO KNOW: Contrastive Divergence does not optimise the likelihood, but it works effectively!
CDk recipe for training RBM
Objective: estimate $\langle v_j h_i \rangle_{\text{data}}$ and $\langle v_j h_i \rangle_{\text{model}}$ by Gibbs sampling.
1) Set (clamp) the visible units with an input vector and update the hidden units (binary states):
$P(h_i = 1 \mid \mathbf{v}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{h_i} - \mathbf{v}^{T}\mathbf{W}_{:,i}\right)}$
2) Update all the visible units in parallel to get a reconstruction (probabilities can be used):
$P(v_j = 1 \mid \mathbf{h}) = \frac{1}{1 + \exp\left(-\mathrm{bias}_{v_j} - \mathbf{W}_{j,:}\mathbf{h}\right)}$
3) Collect the statistics for the correlations after the k-th step using mini-batches (N samples) and update the weights:
$\Delta w_{j,i} \propto \frac{1}{N} \sum_{n=1}^{N} \left( v_j^{(n)} h_i^{(n)} - \hat{v}_j^{(n)} \hat{h}_i^{(n)} \right)$
The final update of the hidden units should use the probability.
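A possible NumPy sketch of one CD-k mini-batch update following the recipe above; cd_k_update, lr and all variable names are assumptions made for illustration, not part of the lab specification:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, bias_h, bias_v, k=1, lr=0.01, rng=None):
    # v0: mini-batch of N data vectors, shape (N, n_visible)
    rng = np.random.default_rng() if rng is None else rng
    N = v0.shape[0]

    # 1) clamp the visible units to the data and update hidden units (binary states)
    ph0 = sigmoid(bias_h + v0 @ W)
    h = (rng.random(ph0.shape) < ph0).astype(float)

    # 2) k steps of alternating Gibbs sampling to get the reconstruction
    for _ in range(k):
        pv = sigmoid(bias_v + h @ W.T)              # reconstruction probabilities
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(bias_h + v @ W)                # final hidden update uses the probability
        h = (rng.random(ph.shape) < ph).astype(float)

    # 3) collect the correlation statistics over the mini-batch and update the weights
    # (positive phase uses the hidden probabilities here; binary states would also work)
    dW = (v0.T @ ph0 - v.T @ ph) / N
    W += lr * dW
    bias_h += lr * (ph0 - ph).mean(axis=0)
    bias_v += lr * (v0 - v).mean(axis=0)
    return W, bias_h, bias_v

Using the hidden probabilities rather than binary samples when collecting the statistics reduces sampling noise, which is what the note about the final hidden update refers to.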
CD1 case
For k = 1 the update becomes:
$\Delta w_{j,i} \propto \langle v_j^{(0)} h_i^{(0)} \rangle - \langle \hat{v}_j^{(1)} \hat{h}_i^{(1)} \rangle \approx \frac{1}{N} \sum_{n=1}^{N} \left( v_j^{(0,n)} h_i^{(0,n)} - \hat{v}_j^{(1,n)} \hat{h}_i^{(1,n)} \right)$
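With the hypothetical cd_k_update sketch above, CD1 is simply the special case k = 1; the 784/500 sizes below match the 28x28-image example later in these slides:

import numpy as np

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 500))               # 28x28 images -> 500 hidden units
bias_v = np.zeros(784)
bias_h = np.zeros(500)
v_batch = (rng.random((20, 784)) < 0.5).astype(float)    # placeholder mini-batch of binarised "images"
W, bias_h, bias_v = cd_k_update(v_batch, W, bias_h, bias_v, k=1)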
Deep belief nets
Greedy layer‐wise training approach
with the use of RBMs
[Figure: stack of RBMs — visible layer v, then hidden layers h(1), h(2), h(3) connected by weights W(1), W(2), W(3)] (Salakhutdinov, 2015)
[Figure: in the resulting DBN, the top two layers h(2)–h(3) form the undirected part of the network (bipartite graph of an RBM), while h(2)–h(1)–v form the directed part of the network (sigmoid belief network)] (Salakhutdinov, 2015)
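A schematic of the greedy layer-wise procedure, reusing the hypothetical cd_k_update and sigmoid helpers from the CD-k sketch: each RBM is trained on the hidden-unit probabilities produced by the layer below, and the resulting weights are stacked. All sizes, epoch counts and batch sizes are placeholder assumptions:

import numpy as np

def pretrain_dbn(data, layer_sizes, epochs=10, k=1, lr=0.01, batch_size=100, rng=None):
    # data: (N, n_visible); layer_sizes: e.g. [500, 500] for the lower part of the stack
    rng = np.random.default_rng() if rng is None else rng
    rbms = []
    x = data
    n_vis = x.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        bias_v = np.zeros(n_vis)
        bias_h = np.zeros(n_hid)
        for _ in range(epochs):
            for batch in np.array_split(x, max(1, len(x) // batch_size)):
                W, bias_h, bias_v = cd_k_update(batch, W, bias_h, bias_v, k=k, lr=lr, rng=rng)
        rbms.append((W, bias_h, bias_v))
        x = sigmoid(bias_h + x @ W)     # propagate the data upwards as P(h = 1 | x)
        n_vis = n_hid
    return rbms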
Deep belief nets
Approach 1: bottom-up pass by stochastically activating higher layers in time.
Hinton et al.'s (2006) architecture
Building the stack of RBMs
[Figure: RBM1 — 28x28 pixel image ↔ 500 units; RBM2 — 500 units ↔ 500 units; RBM3 — 500 units + 10 label units (soft-max) ↔ 2000 top-level units]
Hinton et al.'s (2006) architecture
The network is used to model the joint distribution of digit images and labels.
Once the top layer is added, the connections between the layers below (now hidden) get decoupled and become unidirectional.
Hinton et al.'s (2006) architecture
Pretraining with labels once the stack of RBMs has been built
[Figure: the 10 label units (soft-max) together with the 500-unit layer form the visible layer of the top RBM with 2000 top-level units]
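One hedged way to realise "pretraining with labels": concatenate a one-hot label vector with the activity of the 500-unit layer and use the result as the visible vector of the top RBM. The helper below (top_rbm_visible) is an illustrative assumption, not code from Hinton et al.:

import numpy as np

def top_rbm_visible(h_penultimate, labels, n_classes=10):
    # h_penultimate: (N, 500) activity of the layer below the top RBM
    # labels: (N,) integer class labels, turned into the 10 label (soft-max) units
    one_hot = np.eye(n_classes)[labels]
    return np.concatenate([one_hot, h_penultimate], axis=1)   # (N, 10 + 500)

# the top RBM (visible size 510, hidden size 2000) can then be trained with CD
# on these joint vectors, e.g. with the cd_k_update sketch from earlier.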
Approximate sampling from DBN
Gibbs sampling chain in the RBM part (undirected part of the graph), followed by single-run sampling through the directed part of the graph.
Hinton et al.'s (2006) architecture
Generating samples
[Figure: Gibbs sampling in the top RBM (2000 top-level units ↔ 500 units + 10 label units), followed by a single top-down pass through the directed layers down to the 28x28 pixel image]
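A rough sketch of this generation procedure, under the assumptions that the directed layers use the (tied) transposed RBM weights top-down and that the chosen label is clamped on the 10 label units during the Gibbs chain; generate_sample and all parameter names are illustrative:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_sample(directed_rbms, W_top, bh_top, bv_top, label,
                    n_gibbs=200, n_classes=10, rng=None):
    # directed_rbms: list of (W, bias_h, bias_v) for the lower, now-directed layers
    # W_top: (n_classes + n_pen, 2000) weights of the top-level RBM
    rng = np.random.default_rng() if rng is None else rng
    n_pen = bv_top.shape[0] - n_classes           # size of the penultimate (500-unit) layer
    one_hot = np.zeros(n_classes)
    one_hot[label] = 1.0
    v = np.concatenate([one_hot, (rng.random(n_pen) < 0.5).astype(float)])

    # Gibbs sampling chain in the top RBM, keeping the label units clamped
    for _ in range(n_gibbs):
        ph = sigmoid(bh_top + v @ W_top)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(bv_top + h @ W_top.T)
        v = (rng.random(pv.shape) < pv).astype(float)
        v[:n_classes] = one_hot

    # single top-down pass through the directed part of the graph
    x = v[n_classes:]
    for W, bias_h, bias_v in reversed(directed_rbms):
        x = sigmoid(bias_v + x @ W.T)             # top-down (generative) prediction
    return x                                      # pixel probabilities of the 28x28 image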
Hinton et al.'s (2006) architecture
Fine-tuning with a contrastive wake-sleep algorithm
Bottom-up wake phase
1. Drive the network bottom-up by providing input digit images and, using the recognition weights, propagate binary samples from the 28x28 pixel image up to the visible layer of the top RBM.
Hinton et al.'s (2006) architecture
Fine-tuning with a contrastive wake-sleep algorithm
Learning that results from the wake phase (based on network activities sampled during the wake phase):
$\Delta w_{ji} \propto x_i \left( y_j - \hat{y}_j \right)$
where $\hat{y}_j$ is the prediction (probability) of the sampled state $y_j$ given $x_i$.
Hinton et al.'s (2006) architecture
Fine-tuning with a contrastive wake-sleep algorithm
CDk learning of the top RBM (labels are not clamped here; soft-max is used):
$\Delta w_{j,i} \propto \langle v_j^{(0)} h_i^{(0)} \rangle - \langle v_j^{(k)} h_i^{(k)} \rangle$
computed with binary states (probabilities could be used too).
Hinton et al.'s (2006) architecture
Fine-tuning with a contrastive wake-sleep algorithm
Learning that results from the sleep phase (based on network activities sampled during the sleep phase):
$\Delta w_{ij} \propto x_j \left( y_i - \hat{y}_i \right)$
where $\hat{y}_i$ is the prediction (probability) of the sampled state $y_i$ given $x_j$.
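Both delta rules have the same form: the state of the unit driving the prediction times the difference between the sampled state and the predicted probability of the unit being predicted. A compact, hypothetical sketch (delta_rule_update is not a name from the slides):

import numpy as np

def delta_rule_update(W, pre, post_sampled, post_pred, lr=0.01):
    # Generic wake/sleep delta rule: delta w proportional to pre * (post_sampled - post_pred)
    # Wake phase:  pre = x (layer above), post = y (layer below), W = generative weights
    # Sleep phase: pre = x (layer below), post = y (layer above), W = recognition weights
    W += lr * np.outer(pre, post_sampled - post_pred)
    return W

In this scheme the wake phase adapts the generative weights while the recognition weights produce the samples, and the sleep phase does the reverse.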