NB4-06 PT I Using CNN
NOTE: We will train a network model for 30 epochs to check that the whole algorithm
works. Then we will load a better network model, trained for 100 epochs, and validate
with it. This 100-epoch model must be loaded from our local disk, so you must first
download the NB4-06 folder from GitHub to your local disk.
1. Introduction
PyTorch is built on the Torch library, which was initially developed in Lua. PyTorch,
however, is used from Python (its performance-critical core is written in C++), which
has contributed to its rapid adoption given Python's widespread use in the data
science community.
Tensor Computation: PyTorch's tensors are similar to NumPy arrays but with
added capabilities, such as GPU acceleration, which makes PyTorch efficient for
large-scale numerical computation.
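A minimal sketch of the NumPy-like tensor API, assuming only that PyTorch is installed. The tensor is moved to the GPU only when torch.cuda.is_available() reports one; otherwise it stays on the CPU.

```python
import numpy as np
import torch

# Tensors can be created from NumPy arrays and behave much like them
a = torch.from_numpy(np.arange(6, dtype=np.float32).reshape(2, 3))
b = torch.ones(2, 3)

# Element-wise arithmetic and reductions use NumPy-style syntax
c = a * 2 + b
total = c.sum().item()

# GPU acceleration: move the tensor to CUDA only if a GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
c_dev = c.to(device)
print(total, c_dev.device)
```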
Applications of PyTorch
PyTorch is versatile and can be used for a variety of machine learning tasks:
Computer Vision: PyTorch is widely used in the field of computer vision for
tasks such as image classification, object detection, and Generative Adversarial
Networks (GANs). Libraries like Torchvision, which is built on top of PyTorch,
provide tools for image transformations, datasets, and pre-trained models.
Natural Language Processing (NLP): PyTorch has extensive support for NLP
tasks, such as sentiment analysis, machine translation, and text generation.
Libraries like Torchtext provide utilities to handle text data and build models.
PyTorch is one of several popular deep learning frameworks. Here’s how it compares
to others:
Keras: Keras is a high-level API that runs on top of TensorFlow. It's designed to
be user-friendly and is ideal for beginners. PyTorch, while slightly more complex,
offers greater control and flexibility, which makes it a common choice among more
advanced users.
Flexibility: The dynamic nature of PyTorch allows for easier debugging and
experimentation, which is crucial in research settings.
First, we check whether a GPU is connected. In Google Colab this is immediate and
usually unnecessary, but in other programming environments (for example, local
programming) this check may be necessary.
We can use:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
but we don't want to use TensorFlow in this notebook. One alternative is to use a
system command such as nvidia-smi.
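A hedged sketch of both options without TensorFlow: PyTorch's own torch.cuda.is_available() check, and the nvidia-smi system command run through subprocess. It assumes nvidia-smi may or may not be installed, so both helpers (hypothetical names) return None instead of failing when no GPU tooling is found. In a notebook cell, !nvidia-smi is the usual shortcut.

```python
import shutil
import subprocess

import torch


def gpu_report():
    """Return the name of the first GPU seen by PyTorch, or None if there is none."""
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0)


def nvidia_smi_output():
    """Run the nvidia-smi system command if it is installed; otherwise return None."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout


print("PyTorch GPU:", gpu_report())
smi = nvidia_smi_output()
print(smi if smi is not None else "nvidia-smi not found")
```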
2. GPU Information:
o GPU: ID number.
3. Running Processes:
o Lists the running processes that are utilizing the GPU, if any.
In this particular case, the "Processes" section shows that no processes are currently
using the GPU. The GPU is idle, with a temperature of 56°C and a power draw of 10W
out of a maximum of 70W.
For example, to enable GPU persistence mode on NVIDIA GPUs, you can use the
nvidia-smi command with the --persistence-mode option (this usually requires
administrator privileges). Here are the steps to enable GPU persistence:
Persistence mode keeps the GPU driver loaded even when no applications are
running, which can reduce the time to start new applications.
We save the root directory of our workspace, '/content', as 'HOME', since we will be
navigating through directories to keep multiple projects under the same HOME.
Additionally, all datasets will be kept in the 'datasets' directory, so they are easily
accessible from any project.
Next, the code imports the drive module from the google.colab library, which provides
functionality for mounting Google Drive in Google Colab.
Additionally, Google Drive is mounted in Google Colab and made available at the
path /content/drive. The user will be prompted to authorize access to Google Drive.
Once authorized, the content of Google Drive will be accessible from that point
onwards in the Colab notebook.
Create the datasets directory (only if it doesn't exist), where we will save the dataset
used to train our CNN.
Next, the code checks whether the file named in the variable file exists in the current
directory. If it does not, the code inside the conditional runs and downloads the file
from the specified URL. Then it extracts the contents of exp0.zip into the current
directory quietly, overwriting any existing files if necessary.
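The directory setup and the conditional download could be sketched as follows. This is a sketch under stated assumptions: the dataset URL is not given here, so the url argument is a placeholder you must supply, the helper name fetch_and_extract is hypothetical, and Python's zipfile replaces the quiet unzip command. Unlike the original, it extracts into dest_dir instead of the current directory.

```python
import os
import urllib.request
import zipfile

HOME = os.getcwd()                          # '/content' in a fresh Google Colab session
DATASETS = os.path.join(HOME, "datasets")   # shared datasets directory


def fetch_and_extract(url, zip_name, dest_dir):
    """Download zip_name from url only if it is missing, then extract it into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)          # create the directory only if needed
    zip_path = os.path.join(dest_dir, zip_name)
    if not os.path.exists(zip_path):              # skip the download if already present
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)                   # overwrites files with the same name
    return zip_path
```

Usage would be fetch_and_extract(DATA_URL, "exp0.zip", DATASETS), where DATA_URL is a hypothetical variable holding the dataset's download link.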
Now we will use the matplotlib library to display multiple images in a 2x4 grid layout.
The next code imports the necessary modules: matplotlib.pyplot for plotting, glob for
file matching, and matplotlib.image for image handling.
It specifies the directory containing the images and retrieves the paths of the first
8 .jpg images in that directory using glob.glob().
Then it creates a figure with subplots arranged in a 2x4 grid and iterates through the
image paths, displaying each image in a subplot with imshow(). The title of each
subplot indicates the image index, and the axis labels are turned off.
After displaying all images, it adjusts the layout to prevent overlapping and shows the
figure.
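The steps just described can be sketched as a small helper. The function name show_image_grid and its parameters are hypothetical; the paths are sorted (the original only calls glob.glob()) so the selection is reproducible, and the figure is returned so you can call plt.show() yourself.

```python
import glob
import os

import matplotlib.image as mpimg
import matplotlib.pyplot as plt


def show_image_grid(image_dir, pattern="*.jpg", rows=2, cols=4):
    """Display the first rows*cols images matching pattern in a rows x cols grid."""
    paths = sorted(glob.glob(os.path.join(image_dir, pattern)))[: rows * cols]
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")                     # hide ticks and axis labels
        if i < len(paths):
            ax.imshow(mpimg.imread(paths[i]))
            ax.set_title(f"Image {i}")     # title shows the image index
    fig.tight_layout()                     # prevent overlapping subplots
    return fig, paths
```

In the notebook, fig, _ = show_image_grid(image_dir) followed by plt.show() would display the grid, with image_dir pointing at your image folder.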
A. Using glob
Setting a Dataloader
Next, we set up a data loader using PyTorch and torchvision for handling datasets.
Here's a summary of what each library does:
In machine learning, it is common to divide the dataset into three main parts: training
set, validation set and test set. Here I explain each of them:
Train Set: This dataset is used to train the model. That is, the model learns
from this data by adjusting its parameters to minimise the loss function. The
model is iteratively fitted to this data set during training, using optimisation
techniques such as gradient descent. Generally, the training set is the largest,
as an adequate amount of data is required for the model to learn meaningful
patterns.
Validation Set: After training the model with the training set, the validation set
is used to adjust the hyperparameters of the model and evaluate its
performance. The val set is used to select the best model among several
possible configurations, avoiding overfitting to the test set. This dataset is used
to adjust the model architecture, learning rate or other hyperparameters, in
order to obtain a generalisable model.
Test Set: This data set is used to check the final performance of the model
after it has been trained and evaluated. The test set is essentially a stand-alone
data set that the model has not seen during training or evaluation. It provides
an objective estimate of the model's performance on unseen data and helps
assess its ability to generalise to new samples.
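One way to obtain the three sets from a single dataset is torch.utils.data.random_split. The sketch below assumes a hypothetical toy dataset of 100 random 3x32x32 "images" and a 70/15/15 split; the notebook's real dataset and proportions may differ.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 100 samples of 3x32x32 "images" with 10 classes
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# 70% train, 15% val, 15% test; a fixed seed makes the split reproducible
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)
print(len(train_set), len(val_set), len(test_set))
```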
It is good practice to normalize both the training set and the val set in the same
way. This ensures that the data are on the same scale and distribution, which can help
the model converge more quickly during training and make more consistent
predictions during evaluation.
It is important to remember that, when normalizing the data, you need to calculate the
mean and standard deviation only on the training set and then apply those same
statistics to the val set. This is because the validation set should simulate "new" or
"unknown" data for the model, so it should not be used to calculate any normalization
statistics.
Therefore, after calculating the per-channel mean μ and standard deviation σ on the
training set, you normalize the training set with them and then normalize the val set
using the same statistics calculated on the training set:
$$x_{\text{norm}} = \frac{x - \mu}{\sigma}$$
where x is a pixel value of a given channel.
After normalization, the pixel values will be centered around zero, which facilitates the
training of many models.
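A minimal sketch of computing the statistics on the training set only and reusing them for the val set. It uses plain PyTorch on hypothetical random images of shape (N, C, H, W); with torchvision you would pass mean.tolist() and std.tolist() to transforms.Normalize instead of calling the hypothetical normalize helper.

```python
import torch


def channel_stats(train_images):
    """Per-channel mean and std computed on the training images only (N, C, H, W)."""
    mean = train_images.mean(dim=(0, 2, 3))
    std = train_images.std(dim=(0, 2, 3))
    return mean, std


def normalize(images, mean, std):
    """Apply x_norm = (x - mean) / std channel by channel."""
    return (images - mean[None, :, None, None]) / std[None, :, None, None]


torch.manual_seed(0)
train_images = torch.rand(50, 3, 8, 8)   # hypothetical training images in [0, 1]
val_images = torch.rand(10, 3, 8, 8)     # hypothetical validation images

mean, std = channel_stats(train_images)
train_norm = normalize(train_images, mean, std)
val_norm = normalize(val_images, mean, std)   # same statistics as the training set
```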
The size of the train set is unchanged because transform() modifies each sample as it
is loaded, but it does not augment the dataset (no new samples are added).
Let us show one example for each class, for fun. As we've transformed the image by
normalizing it, we should undo the transformation before visualizing the image.
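To revert the normalization, we apply a second normalization with a "new mean" μN
and a "new standard deviation" σN chosen so that the effect of the previous
normalization is canceled. Normalizing the already normalized image must give back
the original image:
$$\frac{\frac{x - \mu}{\sigma} - \mu_N}{\sigma_N} = x$$
which holds for every x when (you can easily check it):
$$\sigma_N = \frac{1}{\sigma}$$
and
$$\mu_N = -\frac{\mu}{\sigma}$$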
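Therefore, our new_mean (inverse mean) and new_std (inverse standard deviation) are
as follows:
Inverse Mean: To counteract the effect of subtracting the original mean, the new
mean is [-m/s for m, s in zip(mean, std)]. Subtracting this negative value adds the
original mean divided by the standard deviation back to each channel, which
reverses the earlier subtraction.
Inverse Standard Deviation: To counteract the division by the original standard
deviation, the new standard deviation is [1/s for s in std], so dividing by it
multiplies each channel by s again.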
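A small sketch that checks the inverse statistics numerically: it normalizes a random image with hypothetical mean and std values, then applies a second normalization with new_mean = -mean/std and new_std = 1/std. With torchvision, the same two lists could be passed to transforms.Normalize(new_mean, new_std).

```python
import torch


def unnormalize(image, mean, std):
    """Invert x_norm = (x - mean) / std for one (C, H, W) image by normalizing
    again with new_mean = -mean/std and new_std = 1/std."""
    new_mean = torch.tensor([-m / s for m, s in zip(mean, std)])[:, None, None]
    new_std = torch.tensor([1 / s for s in std])[:, None, None]
    return (image - new_mean) / new_std


mean, std = [0.5, 0.4, 0.3], [0.2, 0.25, 0.1]   # hypothetical statistics
original = torch.rand(3, 4, 4)
m = torch.tensor(mean)[:, None, None]
s = torch.tensor(std)[:, None, None]
normalized = (original - m) / s                  # forward normalization
restored = unnormalize(normalized, mean, std)
print(torch.allclose(restored, original, atol=1e-5))
```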
Settings Hyperparameters
Hyperparameters are parameters that are not directly learned from the training
process of the model but are set before the training process begins. They are
configurations that control the training process of the model and affect its
performance and behavior.
We are going to define some training parameters for the network, such as the batch
size, the number of epochs, and the number of classes in the dataset, because the
dataloaders and the training loop need them.
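A sketch of how these hyperparameters feed a DataLoader. The values (batch size 32, 30 epochs, 10 classes) and the random TensorDataset are hypothetical placeholders, not the notebook's actual settings or data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical values; the notebook's actual settings may differ
batch_size = 32
epochs = 30
num_classes = 10

# Hypothetical stand-in for the training dataset
train_set = TensorDataset(torch.randn(100, 3, 32, 32),
                          torch.randint(0, num_classes, (100,)))
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

# 100 samples in batches of 32 give ceil(100 / 32) = 4 batches per epoch
print(len(train_loader))
```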
4. Define a Convolutional Neural Network
CNNs have revolutionized the field of computer vision and have been widely adopted
for tasks such as image classification, object detection, image segmentation, and
more. Their ability to automatically learn hierarchical representations from raw data
makes them highly effective for a wide range of visual recognition tasks.
It's important to be careful when designing the architecture of a CNN to avoid errors
that may lead to premature loss of information in the data volumes. Some common
errors that can result in data volume loss include:
1. Using Overly Aggressive Pooling Layers: Employing pooling layers with too
large a window size or too large a stride can reduce the size of the feature
volume too quickly, resulting in the loss of relevant information in the process.
Import Libraries
Import the necessary modules from the PyTorch library for defining neural network
architectures. Here's what each part means:
import torch.nn as nn: This line imports the torch.nn module, which contains pre-
defined neural network layers, loss functions, and other utility functions for building
neural networks. By importing nn, we gain access to classes such
as Conv2d, Linear, MaxPool2d, etc., which are used to define the layers of a neural
network.
For the nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1) layer, let's assume
the input has a size of (batch_size, 32, H, W), where batch_size is the batch size, 32 is
the number of input channels (feature maps), H is the height of the image, and W is
the width of the image, with H = W = 32.
padding: 1 (padding=1)
kernel_size: 3 (kernel_size=3)
stride: 1 (stride=1)
Each spatial dimension of the output is (H + 2·padding − kernel_size)/stride + 1 =
(32 + 2 − 3)/1 + 1 = 32. Therefore, the output will have a size of (batch_size, 64, 32,
32), meaning that for each sample in the batch, there will be 64 feature maps of size
32x32.
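The size formula can be checked against PyTorch directly. This sketch uses a hypothetical helper conv_output_size (dilation 1, floor division as PyTorch does by default) and also shows how an overly aggressive pooling window shrinks a 32x32 feature map.

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a Conv2d or MaxPool2d layer (dilation 1, no ceil mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (32 + 2*1 - 3) // 1 + 1 = 32: padding=1 with a 3x3 kernel keeps the size
print(conv_output_size(32, kernel_size=3, stride=1, padding=1))

# Check against PyTorch with a hypothetical batch of 8 inputs of size 32x32
conv = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
x = torch.randn(8, 32, 32, 32)
print(conv(x).shape)

# A moderate 2x2 pool halves the map; an 8x8 pool leaves only 4x4 positions
print(conv_output_size(32, kernel_size=2, stride=2))
print(conv_output_size(32, kernel_size=8, stride=8))
```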
Functional definition
The code structure follows the same sequence of operations as in the original CNN,
but without the need for the nn.Sequential container.
Each layer's output is passed directly to the next layer's operation, just like in the
object-oriented implementation. Finally, the output of the last layer is returned as the
output of the myCNN function.
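A sketch of the functional style, where forward() calls each layer and torch.nn.functional operation in turn instead of wrapping them in nn.Sequential. The class name FunctionalCNN and its layers are hypothetical (3x32x32 inputs, 10 classes), not the notebook's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalCNN(nn.Module):
    """Hypothetical small CNN for 3x32x32 inputs; the notebook's layers may differ."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # Each layer's output is passed directly to the next operation
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 16x16 -> 8x8
        x = torch.flatten(x, 1)                      # (N, 64*8*8)
        x = F.relu(self.fc1(x))
        return self.fc2(x)                           # raw class scores (logits)
```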
Improved Model
Dropout layers (nn.Dropout2d for convolutional layers and nn.Dropout for fully
connected layers) were introduced after each activation function to prevent
overfitting.
Adjusted the input size of the first fully connected layer (nn.Linear) based on the
output size of the previous layer's Flatten operation.
The dropout rate for fully connected layers was set to 0.5, which is a common
value for dropout rates in practice.
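A sketch of the improvements listed above, built with nn.Sequential. The layer sizes, the 3x32x32 input, and the 0.25 rate for nn.Dropout2d are assumptions; only the 0.5 rate for the fully connected dropout comes from the text.

```python
import torch
import torch.nn as nn


def make_improved_cnn(num_classes=10):
    """Hypothetical CNN for 3x32x32 inputs with dropout after each activation."""
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(0.25),          # zeroes whole feature maps during training
        nn.MaxPool2d(2),             # 32x32 -> 16x16
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(0.25),
        nn.MaxPool2d(2),             # 16x16 -> 8x8
        nn.Flatten(),                # 64 * 8 * 8 = 4096 features
        nn.Linear(64 * 8 * 8, 128),  # input size must match the Flatten output
        nn.ReLU(),
        nn.Dropout(0.5),             # common rate for fully connected layers
        nn.Linear(128, num_classes),
    )
```

Dropout is only active in model.train() mode; model.eval() disables it, so validation outputs are deterministic.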
This code sets up the device (CPU or GPU) for running the neural network and creates
an instance of the myCNN model, moves it to the selected device, and optionally
utilizes data parallelism if multiple GPUs are available.
NOTE: Outside of Google Colab, it is necessary to explicitly specify the device (GPU or
CPU) and manage model parallelization if using multiple GPUs. This is crucial to
ensure the model runs on the intended GPU and leverages the available hardware
effectively. While Google Colab manages GPU allocation automatically, specifying the
device ("cuda") and using nn.DataParallel can still be beneficial for explicit control and
utilization of available resources, especially if you have specific requirements or want
to ensure optimal performance. However, for many basic use cases, simply letting
Colab manage the resources (with automatic GPU selection) will suffice.
In summary:
Understanding how to handle and optimize GPU resource usage is essential for
efficient performance of deep learning models in both scenarios.
2. Model Creation: An instance of the myCNN model is created. Then, the model
is moved to the selected device using the .to(device) method. This ensures that
all operations of the model are performed on the appropriate device (CPU or
GPU): model = myCNN().to(device)
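A minimal sketch of the device setup, model creation, and optional data parallelism. A small nn.Sequential stands in for myCNN (hypothetical) so the snippet runs on its own; nn.DataParallel is only applied when more than one GPU is visible.

```python
import torch
import torch.nn as nn

# 1. Device selection: use CUDA when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 2. Model creation; a hypothetical stand-in for myCNN moved to the device
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)

# Optional: replicate the model across GPUs when more than one is visible
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

print(device, next(model.parameters()).device)
```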
If you haven't done so already, change the Colab runtime type to a T4 GPU.
Overall, torchviz is a valuable tool for visualizing and understanding PyTorch models,
especially for developers and researchers working on deep learning projects.
To train a model, we need a loss function and an optimizer. Let's use a cross-entropy
loss for classification and SGD with momentum.
A loss function, also known as a cost function or objective function, measures the
discrepancy between the predicted output of a model and the actual target values in
the training dataset. It quantifies how well the model is performing and provides
feedback to the optimization algorithm during training.
An optimizer, on the other hand, is responsible for updating the parameters of the
model (e.g., weights and biases) based on the gradients of the loss function with
respect to those parameters. The goal of the optimizer is to minimize the loss
function, thereby improving the model's performance on the task at hand.
In PyTorch, selecting the appropriate loss function and optimizer is crucial for
effectively training a Deep Learning model: the loss function evaluates the model's
performance, and the optimizer adjusts its parameters to minimize the error and
improve accuracy.
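import torch.nn as nn
# Common alternatives; keep only the one that fits your task
criterion = nn.MSELoss()           # regression: mean squared error
criterion = nn.BCELoss()           # binary classification on probabilities (after a sigmoid)
criterion = nn.CrossEntropyLoss()  # multi-class classification on raw logits
criterion = nn.L1Loss()            # regression: mean absolute error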
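A sketch of the chosen pair, cross-entropy loss and SGD with momentum, performing one optimization step. The nn.Linear classifier, random batch, lr=0.01, and momentum=0.9 are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(20, 3)                   # hypothetical stand-in classifier
criterion = nn.CrossEntropyLoss()          # expects raw logits and class indices
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

inputs = torch.randn(8, 20)                # hypothetical batch of 8 samples
targets = torch.randint(0, 3, (8,))

weight_before = model.weight.detach().clone()
optimizer.zero_grad()                      # clear old gradients
loss = criterion(model(inputs), targets)   # forward pass and loss
loss.backward()                            # gradients via backpropagation
optimizer.step()                           # update the weights
```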
In a training loop, a training dataset is used. This dataset consists of input examples
(for example, images in an image classification problem) and their corresponding
labels or desired outputs (for example, class labels associated with each image). The
purpose of the training loop is to iterate over this dataset to update the model's
weights during training.
The training dataset is used to adjust the model's parameters, i.e., the weights of the
connections between the neurons of the neural network. During each iteration of the
training loop, the model calculates predictions for a batch of input examples,
compares those predictions with the true labels using a loss function, and then adjusts
the model's weights to minimize this loss function.
The metrics used to determine good learning depend on the specific problem you are
addressing. Some common metrics include:
2. Loss: It is a measure of how well the model is doing in its predictions. Loss
functions typically assign a numerical value to the difference between the
model's predictions and the true labels. The goal of training is to minimize this
loss.
4. Training Time: The amount of time needed to train the model can be an
important metric, especially in real-time applications.
These metrics are used to evaluate the model's performance during training and
validation, and to make decisions about the model's architecture, hyperparameters,
and other aspects of the training process.
The training loop runs two procedures, train and val, once per epoch.
Defining train
The train function is responsible for training the neural network model. It takes the
training dataloader, the model, the loss function, and the optimizer as inputs. Within
the function, it iterates over the data batches, performs the forward pass to obtain the
model predictions, calculates the loss between the predictions and the ground truth
labels, performs backpropagation to compute the gradients of the loss with respect to
the model parameters, and finally updates the model parameters using the optimizer.
During training, it also tracks the loss and accuracy of the model at each iteration and
prints them for monitoring purposes.
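A minimal sketch of the train procedure. To stay short it returns the epoch's mean loss and accuracy instead of printing at every iteration; the toy data and nn.Linear model at the bottom are hypothetical stand-ins for the notebook's dataloader and myCNN.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(dataloader, model, loss_fn, optimizer, device="cpu"):
    """One training epoch; returns the mean loss and accuracy over all samples."""
    model.train()                                  # enable dropout and similar layers
    total_loss, correct, seen = 0.0, 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)                            # forward pass
        loss = loss_fn(pred, y)                    # compare with ground truth labels
        optimizer.zero_grad()
        loss.backward()                            # backpropagation
        optimizer.step()                           # parameter update
        total_loss += loss.item() * X.size(0)
        correct += (pred.argmax(1) == y).sum().item()
        seen += X.size(0)
    return total_loss / seen, correct / seen


# Hypothetical usage on random data
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss, acc = train(loader, model, nn.CrossEntropyLoss(), opt)
```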
Defining val
The val function evaluates the model's performance on the val dataset. It computes
the loss and accuracy over the entire val dataset using batches and prints the
accuracy and val loss. Finally, it returns the val loss and accuracy.
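A matching sketch of the val procedure, assuming the same (X, y) batch format. It switches the model to evaluation mode and disables gradient tracking; the random data and nn.Linear model are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def val(dataloader, model, loss_fn, device="cpu"):
    """Evaluate on the val set; prints and returns the mean loss and accuracy."""
    model.eval()                                   # disable dropout
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():                          # no gradients needed
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            total_loss += loss_fn(pred, y).item() * X.size(0)
            correct += (pred.argmax(1) == y).sum().item()
            seen += X.size(0)
    val_loss, accuracy = total_loss / seen, correct / seen
    print(f"Val accuracy: {100 * accuracy:.1f}%, val loss: {val_loss:.4f}")
    return val_loss, accuracy


# Hypothetical usage on random data
data = TensorDataset(torch.randn(40, 10), torch.randint(0, 2, (40,)))
loss, acc = val(DataLoader(data, batch_size=16), nn.Linear(10, 2), nn.CrossEntropyLoss())
```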
This code trains the neural network model for a specified number of epochs (epochs).
It iterates over each epoch, calling the train function to train the model on the training
data and the val function to evaluate the model on the val data. It then saves the
training and val loss, as well as the training and val accuracy, for each epoch. Finally,
it saves the trained model's state dictionary to a file named "myCNN.pth" and saves
the metrics (loss and accuracy) to a CSV file named "metrics_myCNN.csv".
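The saving step could be sketched as below. The helper name save_results and the CSV column names are hypothetical; inside the epoch loop you would append one dict per epoch to history, built from the values returned by train and val.

```python
import csv

import torch
import torch.nn as nn


def save_results(model, history, model_path="myCNN.pth", csv_path="metrics_myCNN.csv"):
    """Save the model weights (state_dict) and the per-epoch metrics to disk."""
    torch.save(model.state_dict(), model_path)
    fields = ["epoch", "train_loss", "val_loss", "train_acc", "val_acc"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for epoch, row in enumerate(history, start=1):
            writer.writerow({"epoch": epoch, **row})   # one line per epoch
```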
If your training process is oscillating and not improving accuracy, there could be
several reasons for this issue. Here are some common ones and possible solutions:
1. Learning rate too high or too low: If the learning rate is too high, the
optimization process may overshoot the optimal solution, causing oscillations.
Conversely, if the learning rate is too low, training may progress very slowly or
stall in a poor region of the loss surface. Try adjusting the learning rate: you can
reduce it gradually during training (learning rate scheduling) or use adaptive
learning rate algorithms like Adam.
2. Poor initialization: The initial weights of the neural network could be poorly
chosen, leading to oscillations. Try initializing the weights using different
strategies, such as Xavier or He initialization. Here are some links that explain
the importance of proper initialization in neural networks and how different
initialization strategies, like Xavier (Glorot) and He initialization, can help
mitigate issues like oscillations:
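A sketch of both remedies: He (Kaiming) initialization applied with model.apply, and a StepLR schedule that divides the learning rate by 10 every 10 epochs. The model, learning rate, and schedule values are hypothetical; nn.init.xavier_uniform_ would be the Xavier alternative, and optim.Adam(model.parameters(), lr=1e-3) an adaptive optimizer.

```python
import torch.nn as nn
import torch.optim as optim


def init_weights(module):
    """He (Kaiming) initialization for conv/linear layers followed by ReLU."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # hypothetical model
model.apply(init_weights)            # visits every submodule recursively

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    # the epoch's train(...) and val(...) calls would go here
    optimizer.step()                 # placeholder for the epoch's parameter updates
    scheduler.step()                 # once per epoch, after the optimizer
```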
Displaying Results
The files.download method will prompt the browser to download the file to your local
computer.
When you run files.download("myCNN.pth"), a dialog box will appear in your browser,
allowing you to download the file to your local machine. If you need to save more files
or perform additional save operations, you can use the same method, adjusting the
file name as needed.
You can download these files to your local disk from here in raw format. You will need
to load them next.
metrics_myCNN(100).csv
myCNN(100).png
myCNN(100).pth
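Loading the 100-epoch weights could look like this sketch. The helper name load_weights is hypothetical, and the model you pass in must have exactly the architecture used when the file was saved. If the model was wrapped in nn.DataParallel when saving, the keys carry a "module." prefix that must match. The metrics file can be read with pandas.read_csv("metrics_myCNN(100).csv").

```python
import torch
import torch.nn as nn


def load_weights(model, path, device="cpu"):
    """Load a saved state_dict (for example myCNN(100).pth) into model."""
    state = torch.load(path, map_location=device)   # map GPU-saved tensors to device
    model.load_state_dict(state)
    model.to(device)
    model.eval()                                     # inference mode for validation
    return model
```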
Once you've trained a model with a training dataset (train set) and evaluated it with a
val dataset (val set), it can be useful to perform an additional check using a test
dataset (test set). The val set is the one used to fine-tune the model's hyperparameters
and prevent overfitting; the test set is kept aside for a final, unbiased evaluation.
Here's a brief explanation of how the validation and test sets are used:
o After training the model with the training set, its performance is evaluated
using the validation set.
o The idea is to adjust the model's hyperparameters (such as learning rate,
batch size, neural network depth, etc.) based on its performance on the
validation set. This helps prevent overfitting to the training set and
improves the model's ability to generalize to unseen data.
o After splitting your data into training and testing sets, an additional
portion of the data is reserved for the validation set.
o The size of this validation set can vary depending on the total size of your
data and your specific needs, but typically around 10% to 20% of the data
is reserved.
o You then train the model using the training set and evaluate it using the
validation set.
o Finally, after tuning and validating your model, you evaluate its final
performance using the test set to obtain an unbiased estimate of its
performance on unseen data.
Confusion matrix
In addition to the confusion matrix, several statistics can be useful for evaluating the
performance of a classification model. Here are some options you might consider:
2. Precision: The proportion of true positives out of the total positive predictions.
These statistics provide different perspectives on the model's performance and can be
useful in various contexts. You can choose the ones that best fit your evaluation
needs.
1. Precision: The ratio of true positive predictions to the total predicted positives
(i.e., the number of correctly predicted positive samples divided by the total
number of samples predicted as positive). It measures the accuracy of the
positive predictions.
4. Support: The number of actual occurrences of the class in the dataset (i.e., the
number of true instances for each label).
Additionally, the classification report typically includes averages for these metrics:
Accuracy: The ratio of the number of correct predictions to the total number of
predictions.
Macro Average: The arithmetic mean of the precision, recall, and F1-score
calculated for each class, treating all classes equally.
Weighted Average: The weighted mean of the precision, recall, and F1-score,
taking into account the support of each class, which gives more importance to
the classes with more samples.
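A sketch of the confusion matrix and classification report with scikit-learn. The labels are hypothetical; in the notebook, y_true and y_pred would be collected from the model's predictions on the val or test loader.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_true, y_pred, digits=3))
```

Here 4 of the 6 predictions are correct, so the reported accuracy is 0.667; the macro and weighted averages follow the definitions above.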
Display a ROC curve
AUC Calculation: auc computes the Area Under the Curve (AUC) from the ROC
curve.
Plotting: We use Matplotlib to plot the ROC curve. The diagonal dashed line
represents the ROC curve of a random classifier.
This example illustrates how to generate and plot a ROC curve for binary classification
using scikit-learn. Adjustments can be made for multi-class classification by
computing ROC curves for each class separately or by using techniques like one-vs-
rest.
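A binary ROC sketch with scikit-learn's roc_curve and auc, on hypothetical labels and positive-class scores. For multi-class problems, sklearn.preprocessing.label_binarize can turn labels into one column per class so the same function is applied one-vs-rest.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve


def plot_roc(y_true, scores):
    """Plot the ROC curve of a binary classifier and return the figure and its AUC."""
    fpr, tpr, _ = roc_curve(y_true, scores)   # scores: probability of the positive class
    roc_auc = auc(fpr, tpr)
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
    ax.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    return fig, roc_auc


# Hypothetical labels and scores
fig, roc_auc = plot_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```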