
Machine Learning
Chapter 4. Artificial Neural Networks

Tom M. Mitchell
Artificial Neural Networks
• Threshold units
• Gradient descent
• Multilayer networks
• Backpropagation
• Hidden layer representations
• Example: Face Recognition
• Advanced topics
2
Connectionist Models (1/2)
Consider humans:
• Neuron switching time ~ .001 second
• Number of neurons ~ 10^10
• Connections per neuron ~ 10^4 to 10^5
• Scene recognition time ~ .1 second
• 100 inference steps doesn't seem like enough
→ much parallel computation

3
Connectionist Models (2/2)
Properties of artificial neural nets (ANNs):
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed process
• Emphasis on tuning weights automatically

4
When to Consider Neural Networks
• Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of result is unimportant
Examples:
• Speech phoneme recognition [Waibel]
• Image classification [Kanade, Baluja, Rowley]
• Financial prediction

5
ALVINN drives 70 mph on highways

6
Perceptron

o(x1, …, xn) = 1 if w0 + w1 x1 + ⋯ + wn xn > 0, and −1 otherwise

Sometimes we'll use simpler vector notation:

o(x) = sgn(w · x), with x0 = 1, where sgn(y) = 1 if y > 0 and −1 otherwise
7
Decision Surface of a Perceptron

Represents some useful functions
• What weights represent g(x1, x2) = AND(x1, x2)?

But some functions are not representable
• e.g., those not linearly separable
• Therefore, we'll want networks of these...
8
Perceptron training rule
wi ← wi + Δwi
where Δwi = η (t − o) xi

Where:
• t = c(x) is target value
• o is perceptron output
• η is small constant (e.g., .1) called learning rate

Can prove it will converge
• If training data is linearly separable
• and η sufficiently small
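A minimal sketch of this rule in Python/NumPy (not from the slides; the function name, the x0 = 1 bias handling, and the toy AND data are illustrative):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=100):
    """Perceptron training rule: wi <- wi + eta * (t - o) * xi."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1 so w[0] acts as the threshold weight
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if w @ x > 0 else -1            # thresholded output in {-1, +1}
            w += eta * (target - o) * x           # no change when o == target
    return w

# Toy example: AND of two inputs, targets in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
w = train_perceptron(X, t)
```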
9
Gradient Descent (1/4)
• To understand, consider a simpler linear unit, where
  o = w0 + w1x1 + ··· + wnxn
• Let's learn the wi's that minimize the squared error
  E[w] ≡ ½ Σd∈D (td − od)²
• where D is the set of training examples

10
Gradient Descent (2/4)

Gradient:
∇E[w] ≡ [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]

Training rule:
Δw = −η ∇E[w]

i.e.,
Δwi = −η ∂E/∂wi

11
Gradient Descent (3/4)

∂E/∂wi = ∂/∂wi ½ Σd (td − od)²
       = Σd (td − od) · ∂/∂wi (td − w · xd)
       = −Σd (td − od) xi,d
12
Gradient Descent (4/4)
• Initialize each wi to some small random value
• Until the termination condition is met, Do
  – Initialize each Δwi to zero
  – For each <x, t> in training_examples, Do
    * Input the instance x to the unit and compute the output o
    * For each linear unit weight wi, Do
      Δwi ← Δwi + η (t − o) xi
  – For each linear unit weight wi, Do
    wi ← wi + Δwi
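A hedged Python/NumPy sketch of the batch procedure above; the variable names and the small-random-weight range are illustrative assumptions:

```python
import numpy as np

def train_linear_unit(X, t, eta=0.01, epochs=500):
    """Batch gradient descent (delta rule) for a linear unit o = w . x."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x0 = 1 for the bias weight w0
    w = np.random.uniform(-0.05, 0.05, X.shape[1])  # small random initial values
    for _ in range(epochs):
        delta_w = np.zeros_like(w)
        for x, target in zip(X, t):
            o = w @ x                               # linear unit output
            delta_w += eta * (target - o) * x       # accumulate eta * (t - o) * xi over D
        w += delta_w                                # one update per pass through the data
    return w
```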
13
Summary
Perceptron training rule guaranteed to succeed if
• Training examples are linearly separable
• Sufficiently small learning rate η

Linear unit training rule uses gradient descent
• Guaranteed to converge to hypothesis with minimum squared error
• Given sufficiently small learning rate η
• Even when training data contains noise
• Even when training data not separable by H
14
Incremental (Stochastic) Gradient Descent (1/2)

Batch mode Gradient Descent:
Do until satisfied
1. Compute the gradient ∇ED[w]
2. w ← w − η ∇ED[w]

Incremental mode Gradient Descent:
Do until satisfied
• For each training example d in D
  1. Compute the gradient ∇Ed[w]
  2. w ← w − η ∇Ed[w]

where ED[w] ≡ ½ Σd∈D (td − od)² and Ed[w] ≡ ½ (td − od)²
15
Incremental (Stochastic) Gradient Descent (2/2)

Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if η is made small enough
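For comparison, a sketch of the incremental (stochastic) variant, where the weights move after every example rather than once per pass over D (again illustrative, not from the slides):

```python
import numpy as np

def train_linear_unit_sgd(X, t, eta=0.01, epochs=500):
    """Incremental (stochastic) gradient descent for a linear unit."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x0 = 1 for the bias weight
    w = np.random.uniform(-0.05, 0.05, X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = w @ x
            w += eta * (target - o) * x   # step along -grad Ed[w] for this single example
    return w
```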

16
Multilayer Networks of Sigmoid Units

17
Sigmoid Unit

σ(x) is the sigmoid function: σ(x) = 1 / (1 + e^−x)

Nice property: dσ(x)/dx = σ(x) (1 − σ(x))
We can derive gradient descent rules to train
• One sigmoid unit
• Multilayer networks of sigmoid units → Backpropagation
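A short Python sketch of the sigmoid and its "nice property" (helper names are illustrative):

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(output):
    """The 'nice property': d sigma(x)/dx = sigma(x) * (1 - sigma(x)),
    written in terms of the unit's output sigma(x)."""
    return output * (1.0 - output)
```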
18
Error Gradient for a Sigmoid Unit

∂E/∂wi = ∂/∂wi ½ Σd (td − od)² = Σd (td − od) (−∂od/∂wi)

But we know:
∂od/∂netd = od (1 − od)   and   ∂netd/∂wi = xi,d

So:
∂E/∂wi = −Σd (td − od) od (1 − od) xi,d
19
Backpropagation Algorithm
Initialize all weights to small random numbers. Until satisfied, Do
• For each training example, Do
  1. Input the training example to the network and compute the network outputs
  2. For each output unit k:
     δk ← ok (1 − ok) (tk − ok)
  3. For each hidden unit h:
     δh ← oh (1 − oh) Σk∈outputs wh,k δk
  4. Update each network weight wi,j:
     wi,j ← wi,j + Δwi,j,  where Δwi,j = η δj xi,j

20
More on Backpropagation
• Gradient descent over entire network weight vector
• Easily generalized to arbitrary directed graphs
• Will find a local, not necessarily global, error minimum
  – In practice, often works well (can run multiple times)
• Often include weight momentum α:
  Δwi,j(n) = η δj xi,j + α Δwi,j(n − 1)
• Minimizes error over training examples
  – Will it generalize well to subsequent examples?
• Training can take thousands of iterations → slow!
• Using network after training is very fast
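A small illustrative helper for the momentum update Δw(n) = η δj xi,j + α Δw(n − 1); the function name and the default α are assumptions, not from the slides:

```python
import numpy as np

def momentum_step(w, grad_step, prev_delta, alpha=0.9):
    """Weight momentum: Delta_w(n) = grad_step + alpha * Delta_w(n-1),
    where grad_step plays the role of eta * delta_j * x_ij."""
    delta = grad_step + alpha * prev_delta
    return w + delta, delta          # return the new delta to reuse at step n+1

# Usage inside a training loop: start with prev_delta = np.zeros_like(w), then
#   w, prev_delta = momentum_step(w, eta * d * x, prev_delta)
```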

21
Learning Hidden Layer Representations (1/2)

A target function (the 8-3-8 identity mapping: each one-hot input should be reproduced at the output):

10000000 → 10000000
01000000 → 01000000
00100000 → 00100000
00010000 → 00010000
00001000 → 00001000
00000100 → 00000100
00000010 → 00000010
00000001 → 00000001

Can this be learned??
22
Learning Hidden Layer Representations (2/2)

A network with 8 inputs, 3 hidden units, and 8 outputs learns the mapping; the learned hidden layer representation assigns each of the eight inputs a distinct, roughly binary 3-bit code.
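As a usage sketch (assuming the backprop function from the earlier sketch is in scope), the eight one-hot patterns can be fed as both inputs and targets; with three hidden units the network is pushed toward a compact, roughly binary hidden code:

```python
import numpy as np

X = np.eye(8)        # the eight one-hot input patterns
T = X.copy()         # identity target function: reproduce the input at the output
W_h, W_o = backprop(X, T, n_hidden=3, eta=0.3, epochs=5000)

# Inspect the hidden code learned for each input pattern
def hidden_code(x):
    return 1.0 / (1.0 + np.exp(-(W_h @ np.append(x, 1.0))))

for x in X:
    print(np.round(hidden_code(x), 2))   # tends toward eight distinct, roughly binary codes
```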

23
Training (1/3)

24
Training (2/3)

25
Training (3/3)

26
Convergence of Backpropagation
Gradient descent to some local minimum
• Perhaps not global minimum...
• Add momentum
• Stochastic gradient descent
• Train multiple nets with different initial weights

Nature of convergence
• Initialize weights near zero
• Therefore, initial networks near-linear
• Increasingly non-linear functions possible as training progresses
27
Expressive Capabilities of ANNs
Boolean functions:
• Every boolean function can be represented by a network with a single hidden layer
• but it might require a number of hidden units exponential in the number of inputs
Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]

28
Overfitting in ANNs (1/2)

29
Overfitting in ANNs (2/2)

30
Neural Nets for Face Recognition

• 90% accurate learning head pose, and recognizing 1-of-20 faces
31
Learned Hidden Unit Weights

http://www.cs.cmu.edu/tom/faces.html

32
Alternative Error Functions
Penalize large weights (weight decay):
E(w) ≡ ½ Σd∈D Σk∈outputs (tkd − okd)² + γ Σi,j wji²

Train on target slopes as well as values:

Tie together weights:
• e.g., in phoneme recognition network
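A small illustrative helper showing how the γ Σ w² penalty changes a gradient step (it adds −2 η γ w to each weight update); the function name and constants are assumptions:

```python
import numpy as np

def weight_decay_step(W, grad_step, eta=0.05, gamma=1e-4):
    """Gradient step when the error includes a gamma * sum(w^2) penalty:
    the penalty contributes an extra -2 * eta * gamma * w to each update."""
    return W + grad_step - 2.0 * eta * gamma * W
```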

33
Recurrent Networks
(a) Feedforward network
(b) Recurrent network
(c) Recurrent network unfolded in time

34
