
CS445: Neural Networks and Deep Learning

Lecture 4: Backpropagation and Gradient Descent
Professor Chen - Fall 2024

I. Understanding Backpropagation
Today's lecture focused on the mathematics behind neural network training. The
backpropagation algorithm is fundamental to how neural networks learn from data.

Key Concepts:
1. Forward Propagation

- Input signals flow through the network
- Each neuron computes: output = activation_function(weighted_sum + bias)
- Final layer produces prediction (see the sketch below)
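As a rough illustration, a forward pass for one dense layer might look like the following sketch; the layer sizes, the ReLU choice, and the function names are illustrative assumptions, not code from the lecture.

import numpy as np

def relu(z):
    return np.maximum(0, z)

def dense_forward(x, W, b):
    # weighted sum plus bias, then the activation function
    z = W @ x + b
    return relu(z)

# toy example: 3 inputs feeding 4 hidden units
x = np.array([0.5, -1.2, 0.3])
W = np.random.randn(4, 3) * 0.1
b = np.zeros(4)
print(dense_forward(x, W, b))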

2. Computing the Loss

- Measures difference between prediction and actual target
- Common loss functions (see the sketch below):
● Mean Squared Error (MSE): L = 1/n Σ(y - ŷ)²
● Cross-Entropy: L = -Σ y log(ŷ)
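A minimal sketch of both loss functions in NumPy, assuming one-hot targets and predicted probabilities; the eps term guarding against log(0) is an added assumption, not from the notes.

import numpy as np

def mse(y, y_hat):
    # Mean Squared Error: average of squared differences
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Cross-entropy for one-hot targets; eps avoids log(0)
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0, 1, 0])            # one-hot target
y_hat = np.array([0.2, 0.7, 0.1])  # predicted probabilities
print(mse(y, y_hat), cross_entropy(y, y_hat))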

II. The Chain Rule in Neural Networks

The chain rule is crucial for computing gradients through multiple layers:

∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

Where:

- L is the loss
- a is the activation
- z is the weighted sum
- w is the weight
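As a small worked example (a single sigmoid neuron with a squared-error loss, chosen here purely for illustration), each factor of the chain rule can be computed explicitly:

import numpy as np

# One neuron: z = w*x + b, a = sigmoid(z), L = (a - y)^2
x, y = 1.5, 0.0
w, b = 0.8, 0.1

z = w * x + b
a = 1 / (1 + np.exp(-z))       # activation
L = (a - y) ** 2               # loss

dL_da = 2 * (a - y)            # ∂L/∂a
da_dz = a * (1 - a)            # ∂a/∂z (sigmoid derivative)
dz_dw = x                      # ∂z/∂w

dL_dw = dL_da * da_dz * dz_dw  # ∂L/∂w by the chain rule
print(dL_dw)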

III. Gradient Descent Implementation

def backward_pass(network, loss, learning_rate=0.01):
    # Walk the layers from output to input
    for layer in reversed(network.layers):
        # Compute gradients for this layer
        layer.gradients = compute_gradients(layer)

        # Update weights and biases with one gradient descent step
        layer.weights -= learning_rate * layer.gradients['weights']
        layer.biases -= learning_rate * layer.gradients['biases']

Types of Gradient Descent:

1. Batch Gradient Descent
- Uses entire dataset for each update
- Very stable but slow
- High memory requirements

2. Stochastic Gradient Descent (SGD)
- Uses single sample for each update
- Faster but noisier
- Lower memory requirements

3. Mini-batch Gradient Descent
- Best of both worlds
- Typically 32-256 samples per batch
- Most commonly used in practice (see the training-loop sketch below)
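A minimal mini-batch training-loop sketch; grad_fn, the params dictionary, and the default batch size are assumptions made for illustration, not part of the lecture code.

import numpy as np

def minibatch_sgd(X, y, params, grad_fn, lr=0.01, batch_size=64, epochs=10):
    # grad_fn(X_batch, y_batch, params) must return gradients shaped like params
    n = X.shape[0]
    for epoch in range(epochs):
        idx = np.random.permutation(n)            # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grads = grad_fn(X[batch], y[batch], params)
            for k in params:                      # one gradient descent step per batch
                params[k] -= lr * grads[k]
    return params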

IV. Activation Functions

We covered several activation functions and their derivatives:

1. Sigmoid
- σ(x) = 1/(1 + e^(-x))
- Derivative: σ(x)(1 - σ(x))
- Issues with vanishing gradients

2. ReLU
- f(x) = max(0, x)
- Derivative: 1 if x > 0, 0 otherwise
- Most commonly used today

3. Tanh
- Range: [-1, 1]
- Often better than sigmoid
- Still has vanishing gradient issues
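These activations and their derivatives translate directly into NumPy; the sketch below was written for these notes, and the final print is only meant to show that the sigmoid gradient peaks at 0.25 while tanh's peaks at 1.0, which is why both can vanish in deep stacks.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)            # σ(x)(1 - σ(x))

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 if x > 0, 0 otherwise

def tanh_grad(x):
    return 1 - np.tanh(x) ** 2    # derivative of tanh

x = np.linspace(-3, 3, 7)
print(sigmoid_grad(x).max(), tanh_grad(x).max())  # 0.25 vs 1.0 at x = 0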

V. Common Challenges and Solutions

1. Vanishing Gradients
Solutions:
- Use ReLU activation
- Implement residual connections
- Proper initialization

2. Exploding Gradients
Solutions:
- Gradient clipping (see the sketch below)
- Batch normalization
- L2 regularization
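A short sketch of one common form of gradient clipping (clipping by global L2 norm); the max_norm value and the list-of-arrays layout are illustrative assumptions.

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradients if their combined L2 norm exceeds max_norm
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = 13
print(clip_by_global_norm(grads, max_norm=5.0))   # rescaled to norm 5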

VI. Practical Implementation Tips

1. Weight Initialization:

import numpy as np

# He initialization for ReLU networks; shape is e.g. (n_inputs, n_units)
weights = np.random.randn(*shape) * np.sqrt(2 / n_inputs)

# Xavier initialization for tanh networks
weights = np.random.randn(*shape) * np.sqrt(1 / n_inputs)

2. Learning Rate Selection:
- Start with 0.01
- Use learning rate schedules (see the sketch below)
- Consider adaptive methods (Adam, RMSprop)
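One simple kind of learning rate schedule is step decay; the drop factor and interval below are illustrative choices, not values given in the lecture.

def step_decay(lr0=0.01, drop=0.5, every=10):
    # Halve the learning rate every `every` epochs, starting from lr0
    def schedule(epoch):
        return lr0 * (drop ** (epoch // every))
    return schedule

lr_at = step_decay()
print([lr_at(e) for e in (0, 9, 10, 25)])  # 0.01, 0.01, 0.005, 0.0025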

VII. Today's Lab Exercise

Implement a simple neural network with:

1. One hidden layer (64 units)
2. ReLU activation
3. Softmax output layer
4. Cross-entropy loss
5. Mini-batch gradient descent

Homework Assignment
Due next Tuesday:

1. Implement backpropagation from scratch
2. Train a network on the MNIST dataset
3. Experiment with different:
- Learning rates
- Batch sizes
- Network architectures

Important Formulas to Remember

1. Softmax: σ(z)ᵢ = e^zᵢ / Σⱼ e^zⱼ

2. Cross-Entropy Loss: L = -Σᵢ yᵢ log(ŷᵢ)

3. Weight Update Rule: w = w - α∇L
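The softmax formula written as code; subtracting max(z) before exponentiating is a standard numerical-stability trick added here, beyond what the formula itself states.

import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow and leaves the result unchanged
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
print(softmax(z), softmax(z).sum())  # probabilities that sum to 1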

Additional reading: "Deep Learning" by Goodfellow, Bengio, and Courville - Chapter 6.5

Next Week's Preview

- Convolutional Neural Networks
- Feature Maps
- Pooling Layers
- CNN Architectures

Recommended Resources

- TensorFlow Documentation
- PyTorch Tutorials
- Stanford CS231n Course Notes
- Andrew Ng's Deep Learning Specialization

Note: Office hours this week are Wednesday 2-4pm and Thursday 3-5pm in Room 405.
