
2021

Feed Forward Neural Network

KHOWAJA ASHFAQUE - 科瓦吉


Student No: 214718003
Machine Learning & Data Mining
12/24/2021

Given By: Prof. Dr. Kelvin KL Wong

Contents
Exercise ......................................................................................................................................... 1
§1 Data Preparation ........................................................................................................... 1
§2 Network Design .............................................................................................................. 1
§3 Network Training .......................................................................................................... 2
§3.1 Training Function .............................................................................................. 2
§3.2 Early Stopping ..................................................................................................... 4
§4 Network Testing ............................................................................................................. 4
§5 Conclusion ......................................................................................................................... 5
Appendix ........................................................................................................................... 6

Exercise:
This exercise deals with the approximation of functions by neural networks. The so-called
function approximation (regression) problem is to find a mapping f' satisfying ||f'(x) - f(x)|| < e,
where e is the tolerance and ||·|| can be any error measure. In general, a single layer of
nonlinear neurons in a neural network is enough to approximate a nonlinear function. The goal of
this exercise is therefore to build a feed-forward neural network that approximates the following
function:

f(x, y) = cos(x + 6·0.35·y) + 2·0.35·x·y,    x, y ∈ [−1, 1]

Fig. 1: Target function surface and contour

1. Data Preparation
Three data sets are prepared: a training set, a validation set and a test set. The training set
contains the input–target pairs from which the network learns the target mapping. The validation
set is associated with the early-stopping technique: during the training phase, the validation
error is monitored to prevent the network from overfitting the training data. A test set is
normally used to assess the network performance afterwards; in this problem, however, the root
mean square error (RMSE) on the test set is used as the stopping criterion of the network training.
In the current problem, training and test data are taken from regular grids (10x10 pairs of
training values, 9x9 pairs of test values), as shown in Fig. 1. The function output already lies
within the interval [-1, 1], so it is not necessary to scale the target function. The validation
data are taken randomly from the function surface.

2. Network Design
The network for this problem is a two-layer feed-forward neural network: one hidden layer of
nonlinear neurons and one output layer of linear neurons. As defined above, the target function
has two inputs (x, y) and one output. As shown in Fig. 2, the network therefore has two inputs,
a hidden layer of 8 neurons with the tansig (tan-sigmoid) transfer function, and one output
neuron with the purelin (linear) transfer function. The tansig function is used in the hidden
layer because it is a smooth nonlinear squashing function whose outputs lie in (-1, 1), which
covers the range of the target values, while the purelin function computes the output layer's
response as a linear function of its net input.
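As a quick illustration of this choice (a small sketch, assuming the standard toolbox definitions), tansig squashes its net input into (-1, 1) while purelin is simply the identity:

n = -3:0.5:3;                         % sample net inputs
a_hidden = 2 ./ (1 + exp(-2*n)) - 1;  % tansig(n): values in (-1, 1)
a_output = n;                         % purelin(n) = n: unbounded linear response
% max(abs(a_hidden - tansig(n)))      % ~0, i.e. matches the built-in tansig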

Fig. 2: Network Architecture

3. Network Training
In general, a network can be trained in two styles: batch training or incremental training. In
batch training, the weights and biases of the network are updated only after all of the inputs
have been presented to the network, while in incremental (on-line) training the network
parameters are updated each time an input is presented.
In this problem batch training is applied, because it is faster and more reliable.
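The distinction can be sketched with the classic Neural Network Toolbox functions (a standalone example with made-up sample data, not part of the assignment code): train performs batch updates, while adapt, when given a sequence, updates the weights after each presented sample.

P = 2*rand(2,20) - 1;                                   % 20 random 2-D inputs in [-1, 1]
T = cos(P(1,:) + 2.1*P(2,:)) + 0.7*P(1,:).*P(2,:);      % corresponding targets
net = newff([-1 1; -1 1], [8 1], {'tansig','purelin'});
net.trainParam.epochs = 100;
net = train(net, P, T);                                 % batch: weights updated after each full pass
% [net, Y] = adapt(net, num2cell(P,1), num2cell(T,1));  % incremental (on-line) alternative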

3.1. Training Functions


Many batch training algorithms can be used to train a network. In this case, 5 algorithms have
been used to train the network, with different results. Table 1 shows the results of the
different algorithms; a short sketch of how such a comparison can be scripted is given after the
algorithm descriptions below.

Training function    Epochs    Time per epoch (sec)    Total time (sec)    Correlation
trainbfg                 28               0.0357                 0.9996          0.988
traingdm               1000               0.006                  6               0.588
traingd                1000               0.011                 11               0.973
trainlm                   5               0                      0               0.998
traingda                119               0.008                  0.952           0.966

Table 1: Training performance of the 5 training functions

• Trainbfg: The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for
solving nonlinear optimization problems. BFGS determines the descent direction from the gradient
combined with curvature information, which it obtains by gradually improving an approximation of
the Hessian matrix of the loss function. Its computational complexity per iteration is only
O(n²), compared with O(n³) for Newton's method. Because the Hessian approximation must be stored
in memory, the algorithm is not a good choice for large networks; for small networks, however,
trainbfg is still an efficient training function.

• Traingd: This is the basic Gradient Descent (GD) algorithm, which updates the weight and bias
values along the negative gradient of the performance function. Its main benefit is that it
follows the steepest-descent path directly towards a minimum. Its major disadvantages are that
convergence can be slow with a small learning rate, it can get stuck in local minima or at saddle
points, and every update requires going through all the observations.

• Traingdm: This function can train any network as long as its net input, weight and transfer
functions have derivatives. Traingdm is faster than traingd because it augments gradient descent
with a momentum term.

• Trainlm: The Levenberg–Marquardt (LM) algorithm is used to solve nonlinear least-squares
problems; it is designed so that the performance function is reduced at each iteration of the
algorithm. Because of this feature, the trainlm training function is typically the fastest option
for moderately sized networks.

• Traingda: The name stands for Gradient Descent with Adaptive learning rate. In traingda the
learning rate is adapted during training: it is increased as long as the error keeps decreasing
and reduced when a step would increase the error, which improves performance compared with a
fixed learning rate.
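The comparison in Table 1 can be scripted roughly as follows (a hedged sketch that assumes train_input and train_target from generate_data.m in the appendix; exact timings and correlations will vary between runs):

fcns = {'trainbfg','traingdm','traingd','trainlm','traingda'};
for k = 1:numel(fcns)
    net = newff([-1 1; -1 1], [8 1], {'tansig','purelin'});
    net.trainFcn = fcns{k};                      % select the training algorithm
    net.trainParam.epochs = 1000;
    net.trainParam.goal   = 0.02;                % mean-squared error goal
    tic; net = train(net, train_input, train_target); t = toc;
    out = sim(net, train_input);
    [M, B, R] = postreg(out, train_target);      % R corresponds to the correlation column
    fprintf('%-9s  total time %.3f s  R = %.3f\n', fcns{k}, t, R);
end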

Fig. 3: Linear Regression Analysis


For the problem above, these training functions were applied to the network to find the best
result. The maximum number of epochs was set to 5000 (the default in the training script) and the
learning rate to 0.02. As shown in Table 1, trainbfg, trainlm and traingda achieve the
performance goal, while traingd and traingdm fail. Traingda needs less total time than trainbfg,
but trainlm reaches an optimal result in only 5 epochs with negligible training time and produces
the highest correlation between outputs and targets. Thus, trainlm is the best option for this
problem (see Fig. 3).

3.2. Early stopping


If the size of the network is too large, it runs the risk of overfitting the training set and
losing its generalization ability for unseen data. One method for improving network
generalization is to use a network that is just large enough to provide an adequate fit to the
target function, but it is often hard to know beforehand how large a network should be for a
specific application. A commonly used technique for improving network generalization is early
stopping. This technique monitors the error on a subset of the data (the validation data) that
does not actually take part in the training.

In order to examine early stopping in this training, a randomly generated validation set is used
during the trainlm training (maximum validation failures = 10, Erms = 0.02 for the test set). The
early-stopping mechanism is not triggered during the training. The results indicate that 8 hidden
neurons are a suitable choice for this problem.
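A minimal sketch of how the early-stopping run is wired up (assumed variable names, following train_network.m in the appendix; the validation set only monitors the error and does not drive the weight updates):

val.P  = val_input;   val.T  = val_target;    % validation set (monitored only)
test.P = test_input;  test.T = test_target;   % test set (monitored only)
net.trainParam.max_fail = 10;                 % maximum validation failures
net = train(net, train_input, train_target, [], [], val, test);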

Fig. 4: Evolution of MSE

4. Network Testing
After the training phase, the network is tested by comparing its output with the test target
data. Fig. 5 shows the correlation and compares both sets of values: the blue line shows the
network output while the red line indicates the test target. From the regression plot, the
correlation between the network output and the target is estimated as 0.99925.
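The testing step amounts to simulating the trained network on the test inputs and regressing its output against the target, as in plot_result.m in the appendix (a sketch with the same variable names; R close to 1 indicates a good fit):

network_output = sim(net, test_input);              % network response on the 9x9 test grid
[M, B, R] = postreg(network_output, test_target);   % slope, intercept and R-value of the regression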

Fig. 5: Correlation between network output and target

5. Conclusion
A two-layer network with two inputs, eight tansig hidden units and one purelin output unit is
built for the approximation problem described above. The network is trained with trainlm; no
early stopping is needed during the training. The maximum number of epochs and the learning rate
are set to 5000 and 0.02 respectively. Fig. 6 shows the parametric surfaces of the original and
the approximated functions.

Fig. 6: Surface of Target function and Approximated function


Appendix

Matlab Code
main.m

fprintf ('\t-------------------------------------\n');
fprintf ('\t- Problem 1: Function Approximation -\n');
fprintf ('\t-------------------------------------\n\n');
% >>>>> STEP 1: Generate training and test data <<<<<
fprintf ('Step 1: Generate training and test data...\n');
fprintf ('===========================================\n');
[train_input,train_target,test_input,test_target,val_input,val_target] = generate_data;
fprintf ('Data generation is finished ! \n\n');
% >>>>> STEP 2: Create a two layer feedforward network <<<<<
fprintf ('Step 2: Create a two layer feedforward network...\n');
fprintf ('=================================================\n');
net = create_network;
fprintf ('Network creation is finished ! \n\n');
% >>>>> STEP 3: Train the network for Erms=0.02 for the test set <<<<<
fprintf ('Step 3: Train the network...\n');
fprintf ('============================\n');
[error,network_output]=train_network( net,train_input,train_target,test_input,test_target,val_input,val_target);
fprintf ('Network training is finished ! \n\n');
% >>>>>FINAL step: Plot the result... <<<<<
fprintf ('FINAL step: Plot the result...\n');
fprintf ('==============================\n');
plot_result(net,test_input,test_target,network_output,error);
fprintf ('Hope the training result is good : )');

generate_data.m

function [train_input,train_target,test_input,test_target,val_input,val_target] = generate_data()

train_x =(-1:2/9:1); % training data [-1 1]


train_y = train_x; % training data
test_x = (-1+1/9):2/9:(1-1/9); % test data [-1 1]
test_y = test_x; % test data
val_x= premnmx(rand(1,50)); % validation data[-1 1]
val_y= val_x; % validation data
[train_X, train_Y] = meshgrid(train_x, train_y);
[test_X, test_Y] = meshgrid(test_x, test_y);
[val_X,val_Y]= meshgrid(val_x,val_y);

% Student card number: s0105853, coefficient a = 35/100


a = 35/100;

% function output is within [-0.8 0.8], so no need to scale the function


train_Z = cos(train_X + 6*a*train_Y) + 2.0*a*train_X.*train_Y; % training target
test_Z = cos(test_X + 6*a*test_Y) + 2.0*a*test_X.*test_Y; % test target
val_Z = cos(val_X + 6*a*val_Y) + 2.0*a*val_X.*val_Y; % validation target

% plot the function


[X,Y] = meshgrid(-1:.2:1,-1:.2:1);

Z = cos(X + 6*a*Y) + 2.0*a*X.*Y;
figure,
subplot(1,2,1);
surfc(X,Y,Z);

% plot parametric surface


xlabel('X');
ylabel('Y');
zlabel('Z');
title('Target Function Surface');
subplot(1,2,2);
[C,h] = contour(X, Y, Z);

% plot level curve


set(h,'LineWidth',2);
title('Contour Function Surface');
clabel(C,h);
xlabel('X');
ylabel('Y');

% Return inputs in [-1 1] and outputs in [-0.8 0.8]


train_input = [train_X(:)'; train_Y(:)'];
train_target= train_Z(:)';
test_input = [test_X(:)'; test_Y(:)'];
test_target =test_Z(:)';
val_input = [val_X(:)'; val_Y(:)'];
val_target= val_Z(:)';

create_network.m

function net = create_network()

% ask the user for the network parameters

num_h = input('Size of the hidden layer[8] -> ');


transFcn_h = input('Transfer function of the hidden layer[tansig]-> ','s');
transFcn_o = input('Transfer function of the output layer[purelin]-> ','s');

% create the network based on the user's choice


net = newff([-1 1; -1 1], [num_h 1], {transFcn_h, transFcn_o});

train_network.m

function [error,network_output] = ...
    train_network(net,train_input,train_target,test_input,test_target,val_input,val_target)
val.P = val_input;
val.T = val_target;
test.P = test_input;
test.T = test_target;
% ask the user for the training parameters
epoch = round( input('Maximum number of epochs to train [5000]: ')); % maximum number of epochs to train
Lr = input('Learning rate [.02]: '); % learning rate
trainFcn= input('Training function [trainlm]-> ','s'); % training function (Automated Regularization (trainbr))
net.trainFcn = trainFcn;

net.trainParam.lr = Lr;
net.trainParam.epochs = epoch;
net.trainParam.show = 40;% Epochs between displays
net.trainParam.goal = 0.02;% Mean-squared error goal
stop_crit = input('Use early stopping? y/n [n]:','s');
erms = 1;

% Training...
if isempty(stop_crit) || strcmpi(stop_crit,'n') % no early stopping (default)
tic, % start a stopwatch timer.
while erms> 0.02
net = train(net,train_input,train_target,[],[],[],test);
network_output = sim(net,test_input);
error = test_target - network_output;
erms = sqrt(mse(error)); % root mean-square error
net.trainParam.goal = net.trainParam.goal*0.5;
end
toc; % prints the elapsed time since tic was used
else % use early stopping
tic,
net.trainParam.max_fail = input('Maximum validation failures [10]:');
while erms> 0.02
net = train(net,train_input,train_target,[],[],val,test);
network_output = sim(net,test_input);
error = test_target - network_output;
erms = sqrt(mse(error)) % root mean-square error
net.trainParam.goal = net.trainParam.goal*0.5;
end
toc;
end

plot_result.m

function plot_result(net,input,target,network_output,error)
X = reshape(input(1,:),9,9);
Y = reshape(input(2,:),9,9);
Z = reshape(target,9,9);
No = reshape(network_output,9,9);
E = reshape(error,9,9);

% plot function surface


figure,
subplot(1,2,1);
surfc(X,Y,Z);xlabel('X');
ylabel('Y'); zlabel('Z');
title('Target Function Surface');
subplot(1,2,2);
surfc(X,Y,No);
xlabel('X');
ylabel('Y');
zlabel('Z');
title('Approximated Function Surface');

% plot level curves...


% create level curves of error

figure,
[C,h] = contour(X, Y, E);
clabel(C,h);
xlabel('x');
ylabel('y');
title('level curve of the error')
figure,
[C,h1] = contour(X, Y, Z,'k'); % create level curve of target
set(h1,'LineWidth',2);
% clabel(C,h);
hold on
[C,h2] = contour(X, Y, No,'m');
% create level curve of approximation
set(h2,'LineWidth',2);
hold off
legend([h1(1);h2(1)],'target','approximation');
xlabel('x');
ylabel('y');
title('level curves of the target and approximation functions')

% M - Slope of the best linear regression. M=1 means perfect fit.
% B - Y intercept of the best linear regression. B=0 means perfect fit.
% R - Regression R-value. R=1 means perfect correlation.

% create a new figure for displaying the performance
figure,
[M,B,R] = postreg(network_output,target); % check the quality of the network training
fprintf('\n\tThe slope of the best linear regression [1]: %6.5f\n',M);
fprintf('\tThe Y intercept of the best linear regression [0]: %6.5f\n',B);
fprintf('\tThe correlation between the network output and the target [1]: %6.5f\n',R);
