Vanishing and exploding gradients arise from the repeated multiplication of derivatives required by the chain rule in backpropagation. The vanishing gradient problem occurs when gradients become extremely small during backpropagation, making learning very slow; the exploding gradient problem is the opposite, where gradients grow extremely large, making training highly unstable. Both issues can be addressed with techniques such as gradient clipping and proper weight initialization.
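A minimal PyTorch sketch (not from the slides; the model and data here are placeholders) of the two fixes this summary mentions, gradient clipping and careful weight initialization:

import torch
import torch.nn as nn

# Placeholder model: two linear layers with a tanh in between.
model = nn.Sequential(nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

# Xavier/Glorot initialization keeps activation variance roughly constant
# across layers, which helps keep gradients from vanishing or exploding.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randn(32, 1)   # dummy batch

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
# Gradient clipping: rescale gradients so their global norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()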


Gradient Descent: Vanishing & Exploding
By LOGESHWARI P (CB.EN.P2BME23009)
GRADIENT DESCENT

Gradient descent is an optimization algorithm commonly used to train machine learning models and neural networks.
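As a quick illustration (a hypothetical one-parameter example, not from the slides), each gradient descent step moves a parameter against the gradient of the loss:

# Sketch of the update rule w <- w - lr * dL/dw,
# minimizing L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0        # initial parameter (hypothetical starting point)
lr = 0.1       # learning rate
for step in range(50):
    grad = 2 * (w - 3.0)   # dL/dw
    w = w - lr * grad      # move against the gradient
print(round(w, 4))          # approaches the minimizer w = 3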
• Chain rule in backpropagation
• Vanishing gradient
• Exploding gradient
(a small numerical sketch of the last two follows below)
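Below is a small numerical sketch (assumed values, not from the slides) of how the chain rule causes both problems: backpropagation multiplies one local-derivative factor per layer, so factors below 1 shrink the gradient exponentially while factors above 1 blow it up.

# The chain rule contributes one (weight * activation derivative) factor
# per layer. With sigmoid activations, sigmoid'(x) <= 0.25, so repeated
# multiplication either drives the gradient toward 0 or blows it up.
depth = 50
for w, label in [(1.5, "vanishing"), (6.0, "exploding")]:
    grad = 1.0                 # gradient at the output layer
    for _ in range(depth):
        grad *= w * 0.25       # one chain-rule factor per layer
    print(f"{label}: {grad:.3e}")
# Prints roughly 5e-22 for w = 1.5 and 6e+08 for w = 6.0.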
• Softmax and sigmoid are both activation functions commonly used in machine learning for different purposes. Let's compare them in terms of their characteristics and use cases:

• 1. **Function Form:**
• - **Softmax:** It is used for multi-class classification problems. The softmax function takes a vector of arbitrary real-valued scores and squashes them into a probability distribution over multiple classes. The output is a vector of probabilities that sum to 1.
• - **Sigmoid:** It is used for binary classification problems. The sigmoid function takes a real-valued input and squashes it to the range [0, 1]. It's commonly used to produce the probability of belonging to a particular class.

• 2. **Output Range:**
• - **Softmax:** Produces a probability distribution over multiple classes, with each element in the range (0, 1). The sum of all elements in the output vector is 1.
• - **Sigmoid:** Produces an output in the range (0, 1) and is suitable for binary classification problems. It can be interpreted as the probability of belonging to the positive class.

• 3. **Application:**
• - **Softmax:** Typically used in the output layer of a neural network for multi-class classification problems. It's especially useful when there are more than two classes.
• - **Sigmoid:** Commonly used in binary classification problems. It's also used in the hidden layers of neural networks to model non-linear relationships in the data.

• 4. **Independence:**
• - **Softmax:** The probabilities sum to 1, and the output for one class is dependent on the scores of the other classes.
• - **Sigmoid:** Each sigmoid output is independent of the others. It's applied element-wise to each output node.

• 5. **Numerical Stability:**
• - **Softmax:** The softmax function involves exponentiation, and in practice it can be sensitive to large input values, potentially leading to numerical instability issues.
• - **Sigmoid:** Generally more numerically stable compared to softmax.

• 6. **Derivative:**
• - **Softmax:** The derivative of the softmax function involves multiple terms, and it's often used in conjunction with the cross-entropy loss during backpropagation in classification tasks.
• - **Sigmoid:** The derivative of the sigmoid function has a simple and interpretable form, making it computationally efficient during backpropagation.

• In summary, softmax is suitable for multi-class classification tasks, while sigmoid is commonly used in binary classification problems. The choice between them depends on the nature of the task and the number of classes involved (a small code sketch follows below).
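A minimal NumPy sketch (an assumed implementation, not from the slides) illustrating points 2, 4, 5, and 6: softmax outputs are coupled and sum to 1, sigmoid is applied element-wise, and subtracting the maximum score before exponentiating is the usual guard against the overflow noted under numerical stability.

import numpy as np

def softmax(z):
    # Subtracting the max does not change the result but prevents
    # overflow in exp() for large scores (numerical stability).
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    # Applied element-wise; each output is independent of the others.
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
p = softmax(scores)
print(p, p.sum())          # probability distribution over 3 classes, sums to 1.0
print(sigmoid(scores))     # independent per-element probabilities

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)): the simple derivative
# form mentioned under point 6.
s = sigmoid(scores)
print(s * (1 - s))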
