Skip to content

Backpropagation in Neural Network

Backpropagation in Neural Network

Last Updated : 25 May, 2025

Back Propagation is also known as "Backward Propagation of Errors" is a method used to train neural network . Its goal is to reduce the difference between the model’s predicted output and the actual output by adjusting the weights and biases in the network.

It works iteratively to adjust weights and bias to minimize the cost function. In each epoch the model adapts these parameters by reducing loss by following the error gradient. It often uses optimization algorithms like gradient descent or stochastic gradient descent. The algorithm computes the gradient using the chain rule from calculus allowing it to effectively navigate complex layers in the neural network to minimize the cost function.

Frame-13 — Fig(a) A simple illustration of how the backpropagation works by adjustments of weights

Back Propagation plays a critical role in how neural networks improve over time. Here's why:

Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule making it possible to update weights efficiently.
Scalability: The Back Propagation algorithm scales well to networks with multiple layers and complex architectures making deep learning feasible.
Automated Learning: With Back Propagation the learning process becomes automated and the model can adjust itself to optimize its performance.

Working of Back Propagation Algorithm

The Back Propagation algorithm involves two main steps: the Forward Pass and the Backward Pass.

1. Forward Pass Work

In forward pass the input data is fed into the input layer. These inputs combined with their respective weights are passed to hidden layers. For example in a network with two hidden layers (h1 and h2) the output from h1 serves as the input to h2. Before applying an activation function, a bias is added to the weighted inputs.

Each hidden layer computes the weighted sum (`a`) of the inputs then applies an activation function like ReLU (Rectified Linear Unit) to obtain the output (`o`). The output is passed to the next layer where an activation function such as softmax converts the weighted outputs into probabilities for classification.

forwards-pass — The forward pass using weights and biases

2. Backward Pass

In the backward pass the error (the difference between the predicted and actual output) is propagated back through the network to adjust the weights and biases. One common method for error calculation is the Mean Squared Error (MSE) given by:

\text{MSE} = (\text{Predicted Output} - \text{Actual Output})^2

Once the error is calculated the network adjusts weights using gradients which are computed with the chain rule. These gradients indicate how much each weight and bias should be adjusted to minimize the error in the next iteration. The backward pass continues layer by layer ensuring that the network learns and improves its performance. The activation function through its derivative plays a crucial role in computing these gradients during Back Propagation.

Example of Back Propagation in Machine Learning

Let’s walk through an example of Back Propagation in machine learning. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5 and the learning rate is 1.

example-1 — Example (1) of backpropagation sum

Forward Propagation

1. Initial Calculation

The weighted sum at each node is calculated using:

a j =∑(w i ,j∗x i )

Where,

a_j is the weighted sum of all the inputs and weights at each node
w_{i,j} represents the weights between the i^{th}input and the j^{th} neuron
x_i represents the value of the i^{th} input

o (output): After applying the activation function to a, we get the output of the neuron:

o_j = activation function(a_j )

2. Sigmoid Function

The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model.

y_j = \frac{1}{1+e^{-a_j}}

example-2 — To find the outputs of y3, y4 and y5

3. Computing Outputs

At h1 node

\begin {aligned}a_1 &= (w_{1,1} x_1) + (w_{2,1} x_2) \\& = (0.2 * 0.35) + (0.2* 0.7)\\&= 0.21\end {aligned}

Once we calculated the a₁ value, we can now proceed to find the y₃ value:

y_j= F(a_j) = \frac 1 {1+e^{-a_1}}

y_3 = F(0.21) = \frac 1 {1+e^{-0.21}}

y_3 = 0.56

Similarly find the values of y₄ at h₂and y₅at O₃

a_2 = (w_{1,2} * x_1) + (w_{2,2} * x_2) = (0.3*0.35)+(0.3*0.7)=0.315

y_4 = F(0.315) = \frac 1{1+e^{-0.315}}

a3 = (w_{1,3}*y_3)+(w_{2,3}*y_4) =(0.3*0.57)+(0.9*0.59) =0.702

y_5 = F(0.702) = \frac 1 {1+e^{-0.702} } = 0.67

example-3 — Values of y3, y4 and y5

4. Error Calculation

Our actual output is 0.5 but we obtained 0.67. To calculate the error we can use the below formula:

Error_j= y_{target} - y_5

\text{=>} \space 0.5 - 0.67 = -0.17

Using this error value we will be backpropagating.

Back Propagation

1. Calculating Gradients

The change in each weight is calculated as:

\Delta w_{ij} = \eta \times \delta_j \times O_j

Where:

\delta_j is the error term for each unit,
\eta is the learning rate.

2. Output Unit Error

For O3:

\delta_5 = y_5(1-y_5) (y_{target} - y_5)

= 0.67(1-0.67)(-0.17) = -0.0376

3. Hidden Unit Error

For h1:

\delta_3 = y_3 (1-y_3)(w_{1,3} \times \delta_5)

= 0.56(1-0.56)(0.3 \times -0.0376) = -0.0027

For h2:

\delta_4 = y_4(1-y_4)(w_{2,3} \times \delta_5)

=0.59 (1-0.59)(0.9 \times -0.0376) = -0.0819

4. Weight Updates

For the weights from hidden to output layer:

\Delta w_{2,3} = 1 \times (-0.0376) \times 0.59 = -0.022184

New weight:

w_{2,3}(\text{new}) = -0.022184 + 0.9 = 0.877816

For weights from input to hidden layer:

\Delta w_{1,1} = 1 \times (-0.0027) \times 0.35 = 0.000945

New weight:

w_{1,1}(\text{new}) = 0.000945 + 0.2 = 0.200945

Similarly other weights are updated:

w_{1,2}(\text{new}) = 0.273225
w_{1,3}(\text{new}) = 0.086615
w_{2,1}(\text{new}) = 0.269445
w_{2,2}(\text{new}) = 0.18534

The updated weights are illustrated below

example-4-(1) — Through backward pass the weights are updated

After updating the weights the forward pass is repeated yielding:

y_3 = 0.57
y_4 = 0.56
y_5 = 0.61

Since y_5 = 0.61 is still not the target output the process of calculating the error and backpropagating continues until the desired output is reached.

This process demonstrates how Back Propagation iteratively updates weights by minimizing errors until the network accurately predicts the output.

Error = y_{target} - y_5

= 0.5 - 0.61 = -0.11

This process is said to be continued until the actual output is gained by the neural network.

Back Propagation Implementation in Python for XOR Problem

This code demonstrates how Back Propagation is used in a neural network to solve the XOR problem. The neural network consists of:

1. Defining Neural Network

We define a neural network as Input layer with 2 inputs, Hidden layer with 4 neurons, Output layer with 1 output neuron and use Sigmoid function as activation function.

def __init__(self, input_size, hidden_size, output_size): constructor to initialize the neural network
self.input_size = input_size: stores the size of the input layer
self.hidden_size = hidden_size: stores the size of the hidden layer
self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size): initializes weights for input to hidden layer
self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size): initializes weights for hidden to output layer
self.bias_hidden = np.zeros((1, self.hidden_size)): initializes bias for hidden layer
self.bias_output = np.zeros((1, self.output_size)): initializes bias for output layer

Python

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)

        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

2. Defining Feed Forward Network

In Forward pass inputs are passed through the network activating the hidden and output layers using the sigmoid function.

self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden: calculates activation for hidden layer
self.hidden_output= self.sigmoid(self.hidden_activation): applies activation function to hidden layer
self.output_activation= np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output: calculates activation for output layer
self.predicted_output = self.sigmoid(self.output_activation): applies activation function to output layer

Python

    def feedforward(self, X):
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)

        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)

        return self.predicted_output

3. Defining Backward Network

In Backward pass or Back Propagation the errors between the predicted and actual outputs are computed. The gradients are calculated using the derivative of the sigmoid function and weights and biases are updated accordingly.

output_error = y - self.predicted_output: calculates the error at the output layer
output_delta = output_error * self.sigmoid_derivative(self.predicted_output): calculates the delta for the output layer
hidden_error = np.dot(output_delta, self.weights_hidden_output.T): calculates the error at the hidden layer
hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output): calculates the delta for the hidden layer
self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate: updates weights between hidden and output layers
self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate: updates weights between input and hidden layers

Python

    def backward(self, X, y, learning_rate):
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)

        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

4. Training Network

The network is trained over 10,000 epochs using the Back Propagation algorithm with a learning rate of 0.1 progressively reducing the error.

output = self.feedforward(X): computes the output for the current inputs
self.backward(X, y, learning_rate): updates weights and biases using Back Propagation
loss = np.mean(np.square(y - output)): calculates the mean squared error (MSE) loss

Python

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss:{loss}")

5. Testing Neural Network

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]): defines the input data
y = np.array([[0], [1], [1], [0]]): defines the target values
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1): initializes the neural network
nn.train(X, y, epochs=10000, learning_rate=0.1): trains the network
output = nn.feedforward(X): gets the final predictions after training

Python

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

output = nn.feedforward(X)
print("Predictions after training:")
print(output)

Output:

Screenshot-2025-03-07-130223 — Trained ModelFreFF

The output shows the training progress of a neural network over 10,000 epochs. Initially the loss was high (0.2713) but it gradually decreased as the network learned reaching a low value of 0.0066 by epoch 8000.
The final predictions are close to the expected XOR outputs: approximately 0 for [0, 0] and [1, 1] and approximately 1 for [0, 1] and [1, 0] indicating that the network successfully learned to approximate the XOR function.

Advantages of Back Propagation for Neural Network Training

The key benefits of using the Back Propagation algorithm are:

Ease of Implementation: Back Propagation is beginner-friendly requiring no prior neural network knowledge and simplifies programming by adjusting weights with error derivatives.
Simplicity and Flexibility: Its straightforward design suits a range of tasks from basic feedforward to complex convolutional or recurrent networks.
Efficiency: Back Propagation accelerates learning by directly updating weights based on error especially in deep networks.
Generalization: It helps models generalize well to new data improving prediction accuracy on unseen examples.
Scalability: The algorithm scales efficiently with larger datasets and more complex networks making it ideal for large-scale tasks.

Challenges with Back Propagation

While Back Propagation is useful it does face some challenges:

Vanishing Gradient Problem: In deep networks the gradients can become very small during Back Propagation making it difficult for the network to learn. This is common when using activation functions like sigmoid or tanh.
Exploding Gradients: The gradients can also become excessively large causing the network to diverge during training.
Overfitting: If the network is too complex it might memorize the training data instead of learning general patterns.

Backpropagation in Neural Network

tejashreeganesan

Improve

Article Tags :

Practice Tags :

Machine Learning

Similar Reads

Machine Learning Tutorial

Machine learning is a branch of Artificial Intelligence that focuses on developing models and algorithms that let computers learn from data without being explicitly programmed for every task. In simple words, ML teaches the systems to think and understand like humans by learning from the data.It can

Non-linear Components

In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co

Linear Regression in Machine learning

Linear regression is a type of supervised machine-learning algorithm that learns from the labelled datasets and maps the data points with most optimized linear functions which can be used for prediction on new datasets. It assumes that there is a linear relationship between the input and output, mea

Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It tries to find the best boundary known as hyperplane that separates different classes in the data. It is useful when you want to do binary classification like spam vs. not spam or

Spring Boot Tutorial

Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance

Class Diagram | Unified Modeling Language (UML)

A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâ€”like developers and designersâ€”understand how the system is organized and how its components interact

Logistic Regression in Machine Learning

Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear regression which predicts continuous values it predicts the probability that an input belongs to a specific class. It is used for binary classification where the output can be one of two po

K means Clustering â€“ Introduction

K-Means Clustering is an Unsupervised Machine Learning algorithm which groups unlabeled dataset into different clusters. It is used to organize data into groups based on their similarity. Understanding K-means ClusteringFor example online store uses K-Means to group customers based on purchase frequ

K-Nearest Neighbor(KNN) Algorithm

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification but can also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and makesa predictions based on the majority class (for classification) or th

Steady State Response

In this article, we are going to discuss the steady-state response. We will see what is steady state response in Time domain analysis. We will then discuss some of the standard test signals used in finding the response of a response. We also discuss the first-order response for different signals. We

'); // $('.spinner-loading-overlay').show(); let script = document.createElement('script'); script.src = 'https://assets.geeksforgeeks.org/v2/editor-prod/static/js/bundle.min.js'; script.defer = true document.head.appendChild(script); script.onload = function() { suggestionModalEditor() //to add editor in suggestion modal if(loginData && loginData.premiumConsent){ personalNoteEditor() //to load editor in personal note } } script.onerror = function() { if($('.editorError').length){ $('.editorError').remove(); } var messageDiv = $('

').text('Editor not loaded due to some issues'); $('#suggestion-section-textarea').append(messageDiv); $('.suggest-bottom-btn').hide(); $('.suggestion-section').hide(); editorLoaded = false; } }); //suggestion modal editor function suggestionModalEditor(){ // editor params const params = { data: undefined, plugins: ["BOLD", "ITALIC", "UNDERLINE", "PREBLOCK"], } // loading editor try { suggestEditorInstance = new GFGEditorWrapper("suggestion-section-textarea", params, { appNode: true }) suggestEditorInstance._createEditor("") $('.spinner-loading-overlay:eq(0)').remove(); editorLoaded = true; } catch (error) { $('.spinner-loading-overlay:eq(0)').remove(); editorLoaded = false; } } //personal note editor function personalNoteEditor(){ // editor params const params = { data: undefined, plugins: ["UNDO", "REDO", "BOLD", "ITALIC", "NUMBERED_LIST", "BULLET_LIST", "TEXTALIGNMENTDROPDOWN"], placeholderText: "Description to be......", } // loading editor try { let notesEditorInstance = new GFGEditorWrapper("pn-editor", params, { appNode: true }) notesEditorInstance._createEditor(loginData&&loginData.user_personal_note?loginData.user_personal_note:"") $('.spinner-loading-overlay:eq(0)').remove(); editorLoaded = true; } catch (error) { $('.spinner-loading-overlay:eq(0)').remove(); editorLoaded = false; } } var lockedCasesHtml = `You can suggest the changes for now and it will be under 'My Suggestions' Tab on Write.

You will be notified via email once the article is available for improvement. Thank you for your valuable feedback!`; var badgesRequiredHtml = `It seems that you do not meet the eligibility criteria to create improvements for this article, as only users who have earned specific badges are permitted to do so.

However, you can still create improvements through the Pick for Improvement section.`; jQuery('.improve-header-sec-child').on('click', function(){ jQuery('.improve-modal--overlay').hide(); $('.improve-modal--suggestion').hide(); jQuery('#suggestion-modal-alert').hide(); }); $('.suggest-change_wrapper, .locked-status--impove-modal .improve-bottom-btn').on('click',function(){ // when suggest changes option is clicked $('.ContentEditable__root').text(""); $('.suggest-bottom-btn').html("Suggest changes"); $('.thank-you-message').css("display","none"); $('.improve-modal--improvement').hide(); $('.improve-modal--suggestion').show(); $('#suggestion-section-textarea').show(); jQuery('#suggestion-modal-alert').hide(); if(suggestEditorInstance !== null){ suggestEditorInstance.setEditorValue(""); } $('.suggestion-section').css('display', 'block'); jQuery('.suggest-bottom-btn').css("display","block"); }); $('.create-improvement_wrapper').on('click',function(){ // when create improvement option clicked then improvement reason will be shown if(loginData && loginData.isLoggedIn) { $('body').append('

'); $('.spinner-loading-overlay').show(); jQuery.ajax({ url: writeApiUrl + 'create-improvement-post/?v=1', type: "POST", contentType: 'application/json; charset=utf-8', dataType: 'json', xhrFields: { withCredentials: true }, data: JSON.stringify({ gfg_id: post_id }), success:function(result) { $('.spinner-loading-overlay:eq(0)').remove(); $('.improve-modal--overlay').hide(); $('.unlocked-status--improve-modal-content').css("display","none"); $('.create-improvement-redirection-to-write').attr('href',writeUrl + 'improve-post/' + `${result.id}` + '/', '_blank'); $('.create-improvement-redirection-to-write')[0].click(); }, error:function(e) { showErrorMessage(e.responseJSON,e.status) }, }); } else { if(loginData && !loginData.isLoggedIn) { $('.improve-modal--overlay').hide(); if ($('.header-main__wrapper').find('.header-main__signup.login-modal-btn').length) { $('.header-main__wrapper').find('.header-main__signup.login-modal-btn').click(); } return; } } }); $('.left-arrow-icon_wrapper').on('click',function(){ if($('.improve-modal--suggestion').is(":visible")) $('.improve-modal--suggestion').hide(); else{ } $('.improve-modal--improvement').show(); }); const showErrorMessage = (result,statusCode) => { if(!result) return; $('.spinner-loading-overlay:eq(0)').remove(); if(statusCode == 403) { $('.improve-modal--improve-content.error-message').html(result.message); jQuery('.improve-modal--overlay').show(); jQuery('.improve-modal--improvement').show(); $('.locked-status--impove-modal').css("display","block"); $('.unlocked-status--improve-modal-content').css("display","none"); $('.improve-modal--improvement').attr("status","locked"); return; } } function suggestionCall() { var editorValue = suggestEditorInstance.getValue(); var suggest_val = $(".ContentEditable__root").find("[data-lexical-text='true']").map(function() { return $(this).text().trim(); }).get().join(' '); suggest_val = suggest_val.replace(/\s+/g, ' ').trim(); var array_String= suggest_val.split(" ") //array of words var gCaptchaToken = $("#g-recaptcha-response-suggestion-form").val(); var error_msg = false; if(suggest_val != "" && array_String.length >=4){ if(editorValue.length { jQuery('.ContentEditable__root').focus(); jQuery('#suggestion-modal-alert').hide(); }, 3000); } } document.querySelector('.suggest-bottom-btn').addEventListener('click', function(){ jQuery('body').append('

'); jQuery('.spinner-loading-overlay').show(); if(loginData && loginData.isLoggedIn) { suggestionCall(); return; } // script for grecaptcha loaded in loginmodal.html and call function to set the token setGoogleRecaptcha(); }); $('.improvement-bottom-btn.create-improvement-btn').click(function() { //create improvement button is clicked $('body').append('

'); $('.spinner-loading-overlay').show(); // send this option via create-improvement-post api jQuery.ajax({ url: writeApiUrl + 'create-improvement-post/?v=1', type: "POST", contentType: 'application/json; charset=utf-8', dataType: 'json', xhrFields: { withCredentials: true }, data: JSON.stringify({ gfg_id: post_id }), success:function(result) { $('.spinner-loading-overlay:eq(0)').remove(); $('.improve-modal--overlay').hide(); $('.create-improvement-redirection-to-write').attr('href',writeUrl + 'improve-post/' + `${result.id}` + '/', '_blank'); $('.create-improvement-redirection-to-write')[0].click(); }, error:function(e) { showErrorMessage(e.responseJSON,e.status); }, }); });

Continue without supporting

`; $('body').append(adBlockerModal); $('body').addClass('body-for-ad-blocker'); const modal = document.getElementById("adBlockerModal"); modal.style.display = "block"; } function handleAdBlockerClick(type){ if(type == 'disabled'){ window.location.reload(); } else if(type == 'info'){ document.getElementById("ad-blocker-div").style.display = "none"; document.getElementById("ad-blocker-info-div").style.display = "flex"; handleAdBlockerIconClick(0); } } var lastSelected= null; //Mapping of name and video URL with the index. const adBlockerVideoMap = [ ['Ad Block Plus','https://media.geeksforgeeks.org/auth-dashboard-uploads/abp-blocker-min.mp4'], ['Ad Block','https://media.geeksforgeeks.org/auth-dashboard-uploads/Ad-block-min.mp4'], ['uBlock Origin','https://media.geeksforgeeks.org/auth-dashboard-uploads/ub-blocke-min.mp4'], ['uBlock','https://media.geeksforgeeks.org/auth-dashboard-uploads/U-blocker-min.mp4'], ] function handleAdBlockerIconClick(currSelected){ const videocontainer = document.getElementById('ad-blocker-info-div-gif'); const videosource = document.getElementById('ad-blocker-info-div-gif-src'); if(lastSelected != null){ document.getElementById("ad-blocker-info-div-icons-"+lastSelected).style.backgroundColor = "white"; document.getElementById("ad-blocker-info-div-icons-"+lastSelected).style.borderColor = "#D6D6D6"; } document.getElementById("ad-blocker-info-div-icons-"+currSelected).style.backgroundColor = "#D9D9D9"; document.getElementById("ad-blocker-info-div-icons-"+currSelected).style.borderColor = "#848484"; document.getElementById('ad-blocker-info-div-name-span').innerHTML = adBlockerVideoMap[currSelected][0] videocontainer.pause(); videosource.setAttribute('src', adBlockerVideoMap[currSelected][1]); videocontainer.load(); videocontainer.play(); lastSelected = currSelected; }