Inception Net
BCSE-332L
Module 3:
Convolutional Neural Network
Dr . Saurabh Agrawal
Faculty Id: 20165
School of Computer Science and Engineering
VIT, Vellore-632014
Tamil Nadu, India
4-Sep-24 Dr. Saurabh Agrawal, SCOPE, DATABASE SYSTEMS, VIT, VELLORE 1
Outline
Foundations of Convolutional Neural Networks
CNN Operations
Architecture
Simple Convolution Network
Deep Convolutional Models
ResNet
AlexNet
InceptionNet
Others
In the image above, a 6x6 feature map is max pooled with a 2x2 window (filter size 2) and a stride of 2.
This operation reduces the size of the data while preserving the most essential features.
The output obtained after the pooling layer is called the pooled feature map.
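The pooling step described above can be sketched in plain NumPy. This is a minimal illustration, not a library implementation: a 2x2 window with stride 2 slides over a 6x6 feature map and keeps the maximum of each window, producing a 3x3 pooled feature map.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2-D feature map (no padding)."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Take the maximum inside each size x size window
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# A 6x6 feature map pooled with a 2x2 window and stride 2 gives a 3x3 map.
fmap = np.arange(36).reshape(6, 6)
pooled = max_pool2d(fmap)
print(pooled.shape)  # (3, 3)
```

Each output cell summarizes one non-overlapping 2x2 region, which is why the spatial size halves while the strongest activations survive.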
CNN Operations
Padding: CNNs have produced many promising results, but two significant problems arise when applying
convolution layers:
1. When we apply the convolution operation, the resultant image shrinks according to the sizes of the
image and the filter, following this rule:
Let the image size be: n x n
Let the filter size be: m x m
Then the resultant image size is: (n-m+1) x (n-m+1)
2. Another problem concerns the pixel values at the edges of the image matrix. The values on the edges of
the matrix do not contribute as much as the values in the middle. For example, a pixel value in the
corner, i.e. at position (0,0), is covered only once by the convolution operation, while values in
the middle are covered multiple times.
Padding overcomes both of these problems: we add layer(s) of values, typically zeros, around
the feature matrix, as shown in the image below.
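Both effects can be checked numerically. The sketch below (illustrative only; `conv2d_valid` is a naive hand-rolled convolution, not a library routine) confirms the (n-m+1) shrinkage rule and shows that one layer of zero padding restores the original size, since the rule becomes (n+2p-m+1) with padding p.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' convolution: output shrinks to (n-m+1) x (n-m+1)."""
    n, m = x.shape[0], k.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(x[i:i + m, j:j + m] * k)
    return out

img = np.ones((6, 6))           # n = 6
kern = np.ones((3, 3))          # m = 3

print(conv2d_valid(img, kern).shape)     # (4, 4): n - m + 1 = 4
padded = np.pad(img, 1)                  # one layer of zeros, p = 1
print(conv2d_valid(padded, kern).shape)  # (6, 6): n + 2p - m + 1 = 6
```

With p = (m-1)/2 (here 1 for a 3x3 filter), the output matches the input size, which is exactly the "same" padding used in practice.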
In the image above, the output after flattening is fed to the fully connected layer, and this network helps
us classify the image as either a cat or a dog.
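The flatten-then-classify step can be sketched as follows. This is a toy illustration with made-up shapes and random, untrained weights (the 3x3x8 pooled map, the single-output layer, and the cat/dog labels are assumptions for demonstration), showing only the data flow: flatten the pooled feature map into a vector, apply one fully connected layer, and squash the score into a probability.

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((3, 3, 8))           # hypothetical pooled feature map: 3x3 spatial, 8 channels
flat = pooled.reshape(-1)                # flatten to a 72-dimensional vector

W = rng.standard_normal((1, flat.size))  # fully connected weights (illustrative, untrained)
b = np.zeros(1)
score = W @ flat + b
prob_cat = 1.0 / (1.0 + np.exp(-score))  # sigmoid: P("cat"); 1 - P("cat") is P("dog")
print("P(cat) =", float(prob_cat))
```

In a real network W and b would be learned by backpropagation; here they only demonstrate the shapes involved.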
This will yield an output volume of size Wout x Wout x D. In all cases, pooling provides some
translation invariance, meaning that an object remains recognizable regardless of where it
appears in the frame.
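Assuming Wout here follows the standard pooling size formula Wout = (Win - F)/S + 1 for a Win x Win x D input, an F x F window, and stride S (the depth D passes through unchanged), it can be computed directly:

```python
def pool_out_size(w_in, f, s):
    """Spatial size after pooling a W_in x W_in x D volume with an
    F x F window and stride S; depth D is unchanged."""
    assert (w_in - f) % s == 0, "window must tile the input evenly"
    return (w_in - f) // s + 1

print(pool_out_size(6, 2, 2))    # 3   (the 6x6 example above)
print(pool_out_size(224, 2, 2))  # 112 (a typical 224x224 input halved)
```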
It was later discovered that a slight modification to the originally proposed unit offers better
performance by allowing gradients to propagate through the network more efficiently during training.
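One way to picture this modification is the "pre-activation" ordering of the residual unit. The sketch below is a simplified scalar illustration, not a real network: `f` stands in for the convolutional branch, and the two functions contrast where the activation sits relative to the skip connection.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Original ordering: y = ReLU(x + F(x)). The activation after the addition
# sits on the skip path, so gradients must pass through it.
def residual_original(x, f):
    return relu(x + f(x))

# Modified (pre-activation) ordering: y = x + F(ReLU(x)). The skip
# connection is a clean identity, so gradients flow through unimpeded.
def residual_preact(x, f):
    return x + f(relu(x))

x = np.array([-1.0, 0.5, 2.0])
f = lambda z: 0.1 * z  # stand-in for the learned residual branch
print(residual_original(x, f))
print(residual_preact(x, f))
```

Note how the pre-activation form can pass negative identity values straight through, whereas the original form clips them at the output.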