
CS445: Neural Networks and Deep Learning

Lecture 4: Backpropagation and Gradient Descent
Professor Chen - Fall 2024

I. Understanding Backpropagation
Today's lecture focused on the mathematics behind neural network training. The
backpropagation algorithm is fundamental to how neural networks learn from data.

Key Concepts:
1. Forward Propagation

- Input signals flow through the network
- Each neuron computes: output = activation_function(weighted_sum + bias)
- Final layer produces prediction (see the sketch below)
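As a rough illustration, a forward pass for one dense layer might look like the following sketch; the layer sizes, the ReLU choice, and the function names are illustrative assumptions, not code from the lecture.

import numpy as np

def relu(z):
    return np.maximum(0, z)

def dense_forward(x, W, b):
    # weighted sum plus bias, then the activation function
    z = W @ x + b
    return relu(z)

# toy example: 3 inputs feeding 4 hidden units
x = np.array([0.5, -1.2, 0.3])
W = np.random.randn(4, 3) * 0.1
b = np.zeros(4)
print(dense_forward(x, W, b))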

2. Computing the Loss

- Measures difference between prediction and actual target
- Common loss functions (see the sketch below):
● Mean Squared Error (MSE): L = 1/n Σ(y - ŷ)²
● Cross-Entropy: L = -Σ y log(ŷ)
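A minimal sketch of both loss functions in NumPy, assuming one-hot targets and predicted probabilities; the eps term guarding against log(0) is an added assumption, not from the notes.

import numpy as np

def mse(y, y_hat):
    # Mean Squared Error: average of squared differences
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Cross-entropy for one-hot targets; eps avoids log(0)
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0, 1, 0])            # one-hot target
y_hat = np.array([0.2, 0.7, 0.1])  # predicted probabilities
print(mse(y, y_hat), cross_entropy(y, y_hat))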

II. The Chain Rule in Neural Networks

The chain rule is crucial for computing gradients through multiple layers:

∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

Where:

- L is the loss
- a is the activation
- z is the weighted sum
- w is the weight
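As a small worked example (a single sigmoid neuron with a squared-error loss, chosen here purely for illustration), each factor of the chain rule can be computed explicitly:

import numpy as np

# One neuron: z = w*x + b, a = sigmoid(z), L = (a - y)^2
x, y = 1.5, 0.0
w, b = 0.8, 0.1

z = w * x + b
a = 1 / (1 + np.exp(-z))       # activation
L = (a - y) ** 2               # loss

dL_da = 2 * (a - y)            # ∂L/∂a
da_dz = a * (1 - a)            # ∂a/∂z (sigmoid derivative)
dz_dw = x                      # ∂z/∂w

dL_dw = dL_da * da_dz * dz_dw  # ∂L/∂w by the chain rule
print(dL_dw)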

III. Gradient Descent Implementation

def backward_pass(network, loss, learning_rate=0.01):
    # Walk the layers from output to input
    for layer in reversed(network.layers):
        # Compute gradients for this layer
        layer.gradients = compute_gradients(layer)

        # Update weights and biases with one gradient descent step
        layer.weights -= learning_rate * layer.gradients['weights']
        layer.biases -= learning_rate * layer.gradients['biases']

Types of Gradient Descent:

1. Batch Gradient Descent
- Uses entire dataset for each update
- Very stable but slow
- High memory requirements

2. Stochastic Gradient Descent (SGD)
- Uses single sample for each update
- Faster but noisier
- Lower memory requirements

3. Mini-batch Gradient Descent
- Best of both worlds
- Typically 32-256 samples per batch
- Most commonly used in practice (see the training-loop sketch below)
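A minimal mini-batch training-loop sketch; grad_fn, the params dictionary, and the default batch size are assumptions made for illustration, not part of the lecture code.

import numpy as np

def minibatch_sgd(X, y, params, grad_fn, lr=0.01, batch_size=64, epochs=10):
    # grad_fn(X_batch, y_batch, params) must return gradients shaped like params
    n = X.shape[0]
    for epoch in range(epochs):
        idx = np.random.permutation(n)            # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grads = grad_fn(X[batch], y[batch], params)
            for k in params:                      # one gradient descent step per batch
                params[k] -= lr * grads[k]
    return params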

IV. Activation Functions

We covered several activation functions and their derivatives:

1. Sigmoid
- σ(x) = 1/(1 + e^(-x))
- Derivative: σ(x)(1 - σ(x))
- Issues with vanishing gradients

2. ReLU
- f(x) = max(0, x)
- Derivative: 1 if x > 0, 0 otherwise
- Most commonly used today

3. Tanh
- Range: [-1, 1]
- Often better than sigmoid
- Still has vanishing gradient issues
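These activations and their derivatives translate directly into NumPy; the sketch below was written for these notes, and the final print is only meant to show that the sigmoid gradient peaks at 0.25 while tanh's peaks at 1.0, which is why both can vanish in deep stacks.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)            # σ(x)(1 - σ(x))

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 if x > 0, 0 otherwise

def tanh_grad(x):
    return 1 - np.tanh(x) ** 2    # derivative of tanh

x = np.linspace(-3, 3, 7)
print(sigmoid_grad(x).max(), tanh_grad(x).max())  # 0.25 vs 1.0 at x = 0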

V. Common Challenges and Solutions

1. Vanishing Gradients
Solutions:
- Use ReLU activation
- Implement residual connections
- Proper initialization

2. Exploding Gradients
Solutions:
- Gradient clipping (see the sketch below)
- Batch normalization
- L2 regularization
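A short sketch of one common form of gradient clipping (clipping by global L2 norm); the max_norm value and the list-of-arrays layout are illustrative assumptions.

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradients if their combined L2 norm exceeds max_norm
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = 13
print(clip_by_global_norm(grads, max_norm=5.0))   # rescaled to norm 5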

VI. Practical Implementation Tips

1. Weight Initialization:

import numpy as np

# He initialization for ReLU networks; shape is e.g. (n_inputs, n_units)
weights = np.random.randn(*shape) * np.sqrt(2 / n_inputs)

# Xavier initialization for tanh networks
weights = np.random.randn(*shape) * np.sqrt(1 / n_inputs)

2. Learning Rate Selection:
- Start with 0.01
- Use learning rate schedules (see the sketch below)
- Consider adaptive methods (Adam, RMSprop)
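One simple kind of learning rate schedule is step decay; the drop factor and interval below are illustrative choices, not values given in the lecture.

def step_decay(lr0=0.01, drop=0.5, every=10):
    # Halve the learning rate every `every` epochs, starting from lr0
    def schedule(epoch):
        return lr0 * (drop ** (epoch // every))
    return schedule

lr_at = step_decay()
print([lr_at(e) for e in (0, 9, 10, 25)])  # 0.01, 0.01, 0.005, 0.0025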

VII. Today's Lab Exercise

Implement a simple neural network with:

1. One hidden layer (64 units)
2. ReLU activation
3. Softmax output layer
4. Cross-entropy loss
5. Mini-batch gradient descent

Homework Assignment
Due next Tuesday:

1. Implement backpropagation from scratch
2. Train a network on the MNIST dataset
3. Experiment with different:
- Learning rates
- Batch sizes
- Network architectures

Important Formulas to Remember

1. Softmax: σ(z)ᵢ = e^zᵢ / Σⱼ e^zⱼ

2. Cross-Entropy Loss: L = -Σᵢ yᵢ log(ŷᵢ)

3. Weight Update Rule: w = w - α∇L
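The softmax formula written as code; subtracting max(z) before exponentiating is a standard numerical-stability trick added here, beyond what the formula itself states.

import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow and leaves the result unchanged
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
print(softmax(z), softmax(z).sum())  # probabilities that sum to 1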

Additional reading: "Deep Learning" by Goodfellow, Bengio, and Courville - Chapter 6.5

Next Week's Preview

- Convolutional Neural Networks
- Feature Maps
- Pooling Layers
- CNN Architectures

Recommended Resources

- TensorFlow Documentation
- PyTorch Tutorials
- Stanford CS231n Course Notes
- Andrew Ng's Deep Learning Specialization

Note: Office hours this week are Wednesday 2-4pm and Thursday 3-5pm in Room 405.
