Deep learning notes
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.
Deep learning algorithms use multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human, such as digits, letters, or faces.
Why Deep Learning?
1. High Accuracy – Achieves state-of-the-art results in vision, NLP, and speech tasks.
2. Big Data Availability – Thrives on large datasets, which are now more accessible.
3. Powerful Hardware – GPUs and TPUs enable efficient training of deep networks.
4. Automation & Adaptability – Learns complex patterns without manual feature
engineering.
5. Real-World Applications – Used in AI assistants, self-driving cars, medical
diagnosis, and more.
Its success is driven by improved algorithms, computing power, and vast amounts of data.
When to Use
Answer:
Deep learning is a subset of machine learning that uses neural networks with multiple layers
(deep neural networks) to learn from large amounts of data.
A Perceptron is the simplest type of artificial neural network, primarily used for
binary classification. It is a fundamental building block in deep learning and serves as the
basis for more complex neural networks.
Key Points:
Types:
Limitations: A single-layer perceptron cannot solve the XOR problem, but MLPs
(with backpropagation) can.
More complex problems require Multi-Layer Perceptrons (MLPs) with hidden layers
and non-linear activations.
A Multilayer Perceptron (MLP) is a type of artificial neural network composed of multiple layers of
neurons. Unlike a simple perceptron, which has a single layer of neurons, an MLP includes input,
hidden, and output layers. Each layer is fully connected to the next one.
1. Structure of MLP:
2. How It Works:
Forward Propagation:
o The input data is passed through the network layer by layer.
o Each neuron in a layer calculates a weighted sum of its inputs and applies an
activation function.
o The result of one layer is passed as input to the next layer until the output layer
provides the final prediction.
Backward Propagation:
o After forward propagation, the loss (error) between the predicted output and
the actual target is calculated.
o The error is propagated backward through the network to compute the
gradients of the loss function with respect to the weights.
o These gradients are used to adjust the weights using an optimization algorithm
like gradient descent.
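To make the structure concrete, here is a minimal PyTorch sketch of an MLP forward pass (the layer sizes and batch size are illustrative assumptions, not taken from these notes):

```python
import torch
import torch.nn as nn

# A minimal MLP: input -> hidden -> output, fully connected layers with a non-linear activation
mlp = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> hidden layer (weighted sums)
    nn.ReLU(),             # non-linear activation
    nn.Linear(128, 10),    # hidden layer -> output layer (class scores)
)

x = torch.randn(32, 784)   # a batch of 32 illustrative input vectors
logits = mlp(x)            # forward propagation, layer by layer
print(logits.shape)        # torch.Size([32, 10])
```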
Regression loss:
MSE (Mean Squared Error) -> squares the errors, so large errors (outliers) are penalized heavily.
MAE (Mean Absolute Error) -> penalizes errors linearly, so it is more robust to outliers.
Huber loss -> combines MSE and MAE: if outliers are present in the data it behaves like MAE, and if outliers are not present it behaves like MSE.
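A small PyTorch sketch comparing these regression losses (the values are illustrative; nn.HuberLoss assumes a reasonably recent PyTorch version):

```python
import torch
import torch.nn as nn

pred   = torch.tensor([2.5, 0.0, 2.0, 8.0])
target = torch.tensor([3.0, -0.5, 2.0, 7.0])

mse   = nn.MSELoss()(pred, target)             # squares errors: sensitive to outliers
mae   = nn.L1Loss()(pred, target)              # absolute errors: robust to outliers
huber = nn.HuberLoss(delta=1.0)(pred, target)  # MSE-like for small errors, MAE-like beyond delta
print(mse.item(), mae.item(), huber.item())
```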
Classification loss:
Gradient Descent is an optimization algorithm used in deep learning to minimize the error
(loss function) by adjusting the model’s parameters (weights & biases). It helps neural
networks learn by iteratively improving predictions.
How It Works:
o Forward Pass: Compute predictions using the current
parameters.
o Loss Calculation: Measure the error between predictions and
true labels.
o Backpropagation: Compute gradients of the loss with
respect to each parameter.
o Parameter Update: Adjust parameters in the direction of the
negative gradient.
o Repeat: Iterate until the loss converges.
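A minimal sketch of this loop in PyTorch (the model, data, and learning rate here are illustrative placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # illustrative model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 10)                       # illustrative data
y = torch.randn(64, 1)

for step in range(100):
    pred = model(x)                           # forward pass
    loss = loss_fn(pred, y)                   # loss calculation
    optimizer.zero_grad()
    loss.backward()                           # backpropagation: compute gradients
    optimizer.step()                          # parameter update along the negative gradient
```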
Stochastic Gradient Descent (SGD): updates weights after each training example. Pros: faster updates, good for online learning. Cons: noisy updates can lead to instability.
Mini-Batch Gradient Descent: uses a small batch of data for each update. Pros: balances speed and stability. Cons: requires tuning the batch size.
📌 Example: SGD is widely used in deep learning frameworks like TensorFlow & PyTorch
for training deep networks efficiently.
The vanishing gradient problem occurs when deep neural networks struggle to update weights in the earlier layers because the gradients become very small during backpropagation. This slows down or even stops learning. Here are key solutions:
Overfitting
Benefits:
✔ Prevents overfitting
✔ Reduces training time
✔ Improves generalization
(The nodes in a layer are dropped only during training; at test time all nodes are available, and the weights are scaled by (1 − p).)
Dropout is a technique in deep learning that helps prevent overfitting by randomly turning off a
percentage of neurons during training. This forces the network to learn more generalized patterns
instead of relying too much on specific neurons. The dropout rate controls how many neurons are
deactivated (e.g., 0.5 means half the neurons are dropped). During testing or inference, all neurons
are active, but their outputs are scaled to maintain consistency with training.
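A minimal sketch of dropout in PyTorch (layer sizes and the dropout rate are illustrative; note that PyTorch uses inverted dropout, scaling during training rather than at test time, but the effect described above is the same):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes 50% of activations during training
    nn.Linear(128, 10),
)

net.train()                # dropout active: random neurons are dropped
out_train = net(torch.randn(4, 256))

net.eval()                 # dropout inactive: all neurons are used at inference
out_eval = net(torch.randn(4, 256))
```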
1. L1 and L2 Regularization
L1 Regularization (Lasso):
o Adds the sum of the absolute values of the weights to the loss
function.
o Encourages sparsity by driving some weights to zero,
effectively performing feature selection.
o Loss function: Loss=Original Loss + λ∑∣wi∣.
L2 Regularization (Ridge):
o Adds the sum of the squared values of the weights to the loss
function.
o Encourages small weights but does not force them to zero.
o Loss function: Loss = Original Loss + λ ∑wi².
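A hedged sketch of both penalties in PyTorch (the λ values are illustrative; weight_decay is PyTorch's built-in way to apply an L2-style penalty, while the L1 term is added to the loss by hand):

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 1)
criterion = nn.MSELoss()

# L2-style regularization via weight_decay (penalizes large weights at each update)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x, y = torch.randn(8, 20), torch.randn(8, 1)
loss = criterion(model(x), y)

# L1 regularization added manually: lambda * sum(|w|)
l1_lambda = 1e-4
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
```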
Activation Functions
1. Linear
o f(x) = ax
o Issue: Cannot handle non-linear problems, limiting learning.
2. Leaky ReLU
o f(x) = x if x>0, else 0.01x
o Fixes dead neurons issue in ReLU.
3. Sigmoid (Logistic Function)
o f(x) = 1 / (1 + e^(−x))
o Pros: Converts inputs to range (0,1), useful for probability-based tasks.
o Cons: Prone to vanishing gradients, slow learning.
4. Tanh (Hyperbolic Tangent Function)
o Pros: Outputs in range (-1,1), better than sigmoid for deep networks.
o Cons: Still suffers from vanishing gradients.
5. Softmax
o Used in multi-class classification to convert logits into probabilities.
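A quick sketch of these activations using PyTorch functions:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)

print(F.relu(x))                              # max(0, x)
print(F.leaky_relu(x, negative_slope=0.01))   # x if x > 0 else 0.01 * x
print(torch.sigmoid(x))                       # 1 / (1 + exp(-x)), range (0, 1)
print(torch.tanh(x))                          # range (-1, 1)
print(F.softmax(x, dim=0))                    # probabilities summing to 1 (multi-class output)
```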
Optimizers are algorithms that adjust the weights of a neural network to minimize the loss function
and improve model performance. They help in faster and more stable training.
Or
Optimizers are algorithms or methods used to change the attributes of a neural network, such as the weights and the learning rate, in order to reduce the loss.
Gradient Descent is essential for training deep learning models but faces
challenges like vanishing/exploding gradients, slow convergence, local
minima, learning rate, and sensitivity to hyperparameters.
Types of Optimizers
Adagrad: adjusts learning rates based on past gradients (larger updates for infrequent parameters).
Pros: Works well for sparse data (where most values are zero).
Cons: Learning rate keeps decreasing, which can slow down training.
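For reference, a sketch of how common optimizers (including Adagrad) are instantiated in PyTorch; the learning rates are illustrative defaults, not recommendations from these notes:

```python
import torch

params = [torch.nn.Parameter(torch.randn(10, 10))]

sgd     = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # simple, often needs LR tuning
adagrad = torch.optim.Adagrad(params, lr=0.01)            # per-parameter LR, good for sparse data
rmsprop = torch.optim.RMSprop(params, lr=0.001)           # counters Adagrad's decaying LR
adam    = torch.optim.Adam(params, lr=0.001)              # momentum + adaptive LR, common default
```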
What is CNN
A CNN (Convolutional Neural Network) is a type of deep learning model primarily used for image
processing, computer vision, and pattern recognition tasks
Key Layers (Components):
Applications:
Advantages:
Automatically learns spatial hierarchies of features.
Highly effective for image-related tasks.
Reduces the need for manual feature engineering
Limitations:
A Convolutional Neural Network (CNN) is a deep learning model used for image recognition
and processing. Its key layers include:
CNNs are effective in tasks like image classification, object detection, and facial recognition.
CNN Architecture
Backpropagation in CNN
Backpropagation in Convolutional Neural Networks (CNNs) is the process of updating the network’s
weights based on the error computed at the output layer. It follows the same fundamental principles
as backpropagation in traditional neural networks but accounts for the unique layers in CNNs, such
as convolutional and pooling layers.
Input image is passed through multiple convolutional, activation, pooling, and fully
connected layers.
Compute the derivative of the loss with respect to the output layer weights (Fully
Connected Layer).
Backpropagate the error through the fully connected layers using the chain rule.
Compute gradients for the convolutional layers by applying the chain rule to the activation
maps and filters.
Pooling layers (if max pooling) propagate gradients only for the max activation locations.
Update the weights of the filters and fully connected layers using gradient descent:
W=W−α∂L/∂W
The process is repeated over multiple epochs until the loss stabilizes.
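A minimal CNN sketch in PyTorch showing one forward and backward pass (the architecture and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A minimal CNN: convolution -> ReLU -> pooling -> fully connected classifier
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer (8 filters)
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 2x2 max pooling
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # fully connected classifier
)

x = torch.randn(16, 1, 28, 28)                  # e.g. a batch of 28x28 grayscale images
labels = torch.randint(0, 10, (16,))
loss = nn.CrossEntropyLoss()(cnn(x), labels)
loss.backward()                                 # backprop through FC, pooling, and conv layers
```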
Answer:
Pretrained models in Convolutional Neural Networks (CNNs) are deep learning models that
have already been trained on large datasets (such as ImageNet). These models can be used for
transfer learning, where they are fine-tuned on new tasks with smaller datasets, significantly
reducing training time and computational cost.
. Pretrained on ImageNet:
VGG16 is pretrained on the ImageNet dataset, which contains over 1 million images across 1000 classes. This makes it a powerful feature extractor for many computer vision tasks.
. Transfer Learning:
It can be fine-tuned for specific tasks, saving time and computational resources.
VGG16 Architecture
VGG16 is a deep convolutional neural network (CNN) with 16 weighted layers (13 convolutional + 3
fully connected). It uses 3×3 convolutional filters, ReLU activation, and 2×2 max-pooling layers for
feature extraction. The final layers consist of three fully connected layers, ending with a Softmax
classifier for image classification.
Applications:
What is ResNet?
ResNet (Residual Network) is a deep learning architecture designed to train very deep
neural networks effectively. It was introduced by Microsoft in 2015 and became famous for
winning the ImageNet competition due to its superior performance.
Skip Connections
. These "shortcut" connections allow information from earlier layers to bypass some layers and contribute directly to later layers, helping gradients flow.
. They prevent problems like vanishing/exploding gradients and improve regularization.
Experiments:
. ResNet has been successfully tested with 100-1000 layers on datasets like CIFAR-10,
demonstrating its scalability and effectiveness.
Y = F(X) + X
Where:
. Y is the output of the residual block,
. F(X) is the residual mapping learned by the stacked layers,
. X is the input passed unchanged through the skip connection.
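A minimal residual-block sketch in PyTorch mirroring Y = F(X) + X (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(X): two conv layers that learn the residual mapping
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)   # Y = F(X) + X (skip connection)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```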
Transfer learning
Transfer learning is a research problem in machine learning that focuses on storing
knowledge gained while solving one problem and applying it to a different but related problem.
Transfer learning is typically applied in two ways: Feature Extraction and Fine-Tuning.
Feature Extraction
✅ Freeze pre-trained model layers and use them as a fixed feature extractor.
✅ Replace the last few layers with a new classifier for the target task.
✅ Best for small datasets and when features from the pre-trained model are already useful.
✅ Fast training but may not generalize well if tasks are too different.
Fine-Tuning
✅ Unfreeze some or all pre-trained layers and retrain them on new data.
✅ Helps the model adapt its features to the new dataset.
✅ Best for larger datasets and when the new task requires more customization.
✅ Better accuracy but requires more training time and data.
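A hedged sketch of both strategies using a pretrained VGG16 from torchvision (the weights argument, the 5-class head, and which layers to unfreeze are illustrative and depend on the torchvision version and the task):

```python
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1")   # pretrained on ImageNet

# Feature extraction: freeze the pretrained layers, train only a new classifier head
for param in model.features.parameters():
    param.requires_grad = False
model.classifier[6] = nn.Linear(4096, 5)        # e.g. 5 target classes

# Fine-tuning: additionally unfreeze some (or all) pretrained layers and retrain them
for param in model.features[-5:].parameters():  # e.g. unfreeze the last conv block
    param.requires_grad = True
```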
What is RNN
RNN (Recurrent Neural Network) is a type of neural network designed for sequential data
processing. It retains memory of past inputs using hidden states, making it useful for tasks like
speech recognition, language modelling, and time series prediction. However, RNNs suffer from
vanishing/exploding gradients, which limit their ability to learn long-term dependencies. Variants
like LSTM and GRU help overcome these issues.
Why RNNs?
Architecture of RNN
. Input Layer:
-> Accepts sequential data (e.g., time series, text).
-> Data is usually represented as vectors (e.g., word embeddings for text).
. Hidden Layers with Recurrent Connections:
-> Contain neurons that not only process the current input but also receive recurrent connections carrying information from past time steps.
-> The hidden state stores information from previous time steps, allowing the network to "remember" past inputs.
. Output layer:
->Generates the output at each time step based on the hidden state.
Time Dependency:
->The network processes one input at a time step and uses the hidden state to link
information across time steps.
(Diagram: an RNN cell with an input, a hidden state h, and an output, plus a feedback connection from the hidden state back to itself; the cell is unfolded across time steps.)
This unfolding process is key to understanding how RNNs handle sequential data and
maintain a "memory" of past inputs.
It highlights the recurrent nature of RNNs, where the same parameters are reused across
time steps, making them efficient for tasks like time series prediction, NLP, and speech
recognition.
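A minimal sketch of a recurrent layer in PyTorch (the input size, hidden size, and sequence length are illustrative):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)        # batch of 4 sequences, 10 time steps, 8 features each
h0 = torch.zeros(1, 4, 16)       # initial hidden state

output, hn = rnn(x, h0)          # output: hidden state at every time step
print(output.shape)              # torch.Size([4, 10, 16])
print(hn.shape)                  # torch.Size([1, 4, 16]) -- final hidden state ("memory")
```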
Types of RNN
RNNs can be categorized based on how inputs and outputs are structured over time.
Key Takeaways
What is LSTM
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to handle sequential data and long-term dependencies by addressing the vanishing gradient problem.
Architecture of LSTM
1. Cell State: The memory of the LSTM, which carries important information
throughout the sequence and is updated by the gates. It is passed along from one time
step to the next.
2. Forget Gate: Decides what information from the cell state should be discarded. It
uses a sigmoid activation function to output values between 0 (completely forget) and
1 (completely retain).
3. Input Gate: Determines what new information should be added to the cell state. It
has two parts:
o A sigmoid layer, which decides which values to update.
o A tanh layer, which creates a vector of new candidate values that could be
added to the state.
4. Cell State Update: The cell state is updated by combining the old state (after being
filtered by the forget gate) with the new candidate values from the input gate.
5. Output Gate: Decides what the next hidden state should be (i.e., the output for the
current time step). It uses a sigmoid layer to filter information and a tanh layer to
ensure that the values stay between -1 and 1.
1. The forget gate filters out irrelevant information from the previous cell state.
2. The input gate updates the cell state with new relevant information.
3. The cell state is updated and passed on to the next time step.
This structure allows LSTMs to effectively capture long-term dependencies in sequential data, solving
issues that arise in traditional RNNs, such as the vanishing gradient problem
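A minimal nn.LSTM sketch in PyTorch; the forget, input, and output gates described above are computed internally by the layer (the sizes are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)                 # 4 sequences, 10 time steps, 8 features
output, (hn, cn) = lstm(x)                # hn: final hidden state, cn: final cell state
print(output.shape, hn.shape, cn.shape)   # [4, 10, 16], [1, 4, 16], [1, 4, 16]
```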
LSTM Applications:
LSTM (Long Short-Term Memory) networks are widely used in sequential data processing due to
their ability to retain long-term dependencies. Their key applications include:
1. Natural Language Processing (NLP) – Used in text generation, machine translation, speech
recognition, and chatbots.
2. Time Series Forecasting – Helps predict stock prices, weather patterns, and sales trends.
3. Speech & Audio Processing – Applied in speech recognition, music generation, and speaker
identification.
4. Healthcare & Biomedical – Used for ECG analysis, medical diagnosis, and drug discovery.
5. Computer Vision & Video Analysis – Used in action recognition, handwriting recognition,
and image captioning.
LSTMs are essential for tasks requiring memory of past sequences, making them valuable in AI
applications.
GRU vs LSTM
GRUs are often preferred when computational efficiency and simplicity are
priorities, especially for smaller datasets or tasks where long-term dependencies
are less critical. However, LSTMs may still be better for tasks requiring the
modeling of very long-term dependencies or when working with large datasets.
Seq2seq model
High-Level Overview
Decoder Problems: Exposure bias, error accumulation, slow inference, difficulty with rare
words, and alignment issues.
Solutions include attention mechanisms, transformers, beam search, pre-training, and improved
training techniques like scheduled sampling.
Attention Mechanism
The attention mechanism is a technique used in neural networks to focus on the most
relevant parts of the input when making predictions. It is widely used in sequence-to-
sequence (Seq2Seq) models, especially in tasks like machine translation, text
summarization, and speech recognition.
How It Works
Instead of encoding the entire input into a fixed-length vector (which can
lead to information loss), attention dynamically assigns different weights
to different input elements. This allows the model to focus on the most
important parts of the input at each step.
1. Calculate Attention Scores: The model computes similarity scores
between each input token and the current decoding step.
2. Assign Weights: These scores are converted into probabilities (using
Softmax), determining how much attention each input token gets.
3. Weighted Sum: The weighted sum of the input representations is
computed and used for the final prediction.
Transformer
Why Transformers?
They handle long-range dependencies, support parallel processing, and
scale efficiently.
Key Features:
Applications:
NLP: BERT, GPT, T5 for text tasks.
Vision: Vision Transformers (ViT) for image processing.
Speech: Whisper, Wav2Vec for speech-to-text.
Advantages of Transformers
Disadvantages of Transformers
[Self-attention is a mechanism that takes static embeddings as input and generates contextual embeddings, which are much more useful for any kind of NLP application.]
1. Inputs (Tokens): Represented by green blocks at the top, these could be words like
"money," "bank," etc.
2. Query, Key, and Value Vectors (Q, K, V):
o The pink blocks represent Key (K) vectors.
o The green blocks represent Query (Q) vectors.
o The blue blocks at the bottom represent Value (V) vectors.
3. Softmax Layer:
o The yellow box labeled "Softmax" computes attention scores for each token
by comparing the Query with all Keys.
o It assigns importance scores (s11, s12, s13, etc.) based on similarity.
4. Weighted Sum of Values:
o The final output (purple block) is obtained by summing the values (V)
weighted by the attention scores.
How It Works
1. Each token attends to all other tokens in the sequence, computing attention scores using
the dot product of its Query (Q) with all Keys (K).
2. Softmax normalizes the scores, ensuring they sum to 1.
3. Weighted sum of Value (V) vectors gives the final contextualized representation for each
word.
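A minimal single-head self-attention sketch in PyTorch following these steps (the dimensions and random projection matrices are illustrative):

```python
import torch
import torch.nn.functional as F

def self_attention(x, wq, wk, wv):
    # x: (seq_len, d_model) static embeddings; wq/wk/wv: learned projection matrices
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # dot-product similarity, scaled by sqrt(d_k)
    weights = F.softmax(scores, dim=-1)       # attention scores sum to 1 for each token
    return weights @ v                        # weighted sum of values = contextual embeddings

d_model = 8
x = torch.randn(5, d_model)                   # 5 tokens
wq, wk, wv = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)    # torch.Size([5, 8])
```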
Points to consider :
Multi-Head Attention
Multi-head attention is an extension of the self-attention mechanism in deep learning,
primarily used in transformers (e.g., BERT, GPT). It allows the model to attend to different
parts of the input sequence simultaneously, improving its ability to capture complex
relationships.
Key Points:
Formula:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O, where each head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V).
This allows the model to focus on different parts of the input sequence simultaneously,
capturing multiple types of relationships between tokens.
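A short sketch using PyTorch's built-in multi-head attention layer (the embedding size and head count are illustrative):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)                  # batch of 2 sequences, 10 tokens, d_model = 64
out, attn_weights = mha(x, x, x)            # self-attention: query = key = value = x
print(out.shape)                            # torch.Size([2, 10, 64])
print(attn_weights.shape)                   # torch.Size([2, 10, 10]) -- averaged over heads
```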
Benefits:
Positional Encoding
Since transformers do not have a built-in notion of sequence order (unlike RNNs or CNNs),
positional encoding (PE) is used to provide positional information about words in a
sequence.
Transformers process input in parallel rather than sequentially. Without positional encoding,
the model would treat words as a bag of words, ignoring word order. PE allows the model to
distinguish the positions of words in a sentence.
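A minimal sketch of sinusoidal positional encoding, the scheme used in the original Transformer paper (the sequence length and model dimension are illustrative):

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / (10000 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=64)
# pe is added to the token embeddings so the model can distinguish word positions
```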
What is Normalization
Normalization in deep learning refers to the process of transforming data or model outputs to have specific statistical properties, typically a mean of zero and a variance of one.
Key Benefits:
Normalization in Transformers
Normalization in Transformers stabilizes training and improves convergence. The most common
method is Layer Normalization (LN), which normalizes across features for each token independently,
making it effective for variable sequence lengths
Batch Normalization (BN) is not used in Transformers because it depends on batch statistics, which
are unstable due to variable sequence lengths and small batch sizes. Instead, Layer Normalization
(LN) is preferred as it normalizes across features per token, making it more stable for self-attention
mechanisms. BN can also interfere with attention dynamics, whereas LN works more effectively in
Transformers.
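A quick LayerNorm sketch in PyTorch showing normalization across the feature dimension for each token (the shapes are illustrative):

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(64)                  # normalizes across the feature dimension per token

x = torch.randn(2, 10, 64)             # (batch, sequence length, features)
y = ln(x)
print(y.mean(dim=-1)[0, 0], y.std(dim=-1)[0, 0])  # roughly 0 and 1 for each token
```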
Masked Self-Attention
1. What is it?
When generating text, the model should predict the next word based only on previous words. If it
could see future words during training, it would "cheat" and learn incorrect patterns.
2. How it works:
Sets future token scores to −∞ (or a very large negative number), making their softmax
probabilities zero.
Ensures the model cannot attend to tokens that come after the current position.
Masked Attention(Q, K, V) = softmax(QKᵀ / √d_k + M) V
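A minimal sketch of causal (masked) attention implementing this formula (the dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

seq_len, d_k = 5, 8
q = torch.randn(seq_len, d_k)
k = torch.randn(seq_len, d_k)
v = torch.randn(seq_len, d_k)

# Causal mask M: 0 on and below the diagonal, -inf above it (future positions)
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

scores = q @ k.T / (d_k ** 0.5) + mask       # softmax(QK^T / sqrt(d_k) + M) V
weights = F.softmax(scores, dim=-1)          # future tokens get probability 0
out = weights @ v
print(weights[0])                            # the first token can only attend to itself
```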
The Transformer is a deep learning model introduced in the paper "Attention Is All You Need" by
Vaswani et al. (2017). It has revolutionized Natural Language Processing (NLP) and is widely used in
large language models (LLMs) like GPT, BERT, and T5. Unlike recurrent networks, Transformers
process input in parallel, making them more efficient and scalable.
Transformer Encoder
Each encoder layer has two main components:
1. Multi-Head Self-Attention – Helps each word attend to all other words in the sequence.
2. Feed-Forward Neural Network (FFN) – Applies non-linearity and transformations.
Each layer also includes residual connections and layer normalization to stabilize learning.
1. Input Embedding + Positional Encoding – Converts words into vectors and adds positional
information.
2. Multi-Head Self-Attention – Computes attention scores to capture relationships between
words.
3. Add & Normalize – Uses residual connections and layer normalization.
4. Feed-Forward Neural Network (FFN) – Applies non-linear transformations.
5. Add & Normalize (Again) – Another residual connection and layer normalization.
After passing through multiple encoder layers, the final output is a contextual
representation of the input, which is used by the decoder for further processing.
Transformer Decoder
The Transformer Decoder takes the encoder’s output and a target sequence (e.g. previous
words in translation) to generate the final output sequence one token at a time during
inference.
🧱 Structure
🧠 Final Output
The decoder generates a probability distribution over the next token in the
sequence.
During inference, it generates tokens step-by-step, feeding previous outputs back in.
Answer:
BERT is different because:
1. It uses bidirectional training (considers both left and right context in a sentence).
2. Previous models like Word2Vec and GloVe created static word embeddings, but BERT
generates contextual embeddings based on the sentence.
3. It is pre-trained on large datasets and then fine-tuned for specific NLP tasks.
Answer:
BERT is based on the Transformer architecture and consists of:
Layer Normalization
Residual Connections
Answer:
BERT is pre-trained using two tasks:
1. Masked Language Modelling (MLM):
o 15% of words in the input are masked, and BERT predicts the missing words.
2. Next Sentence Prediction (NSP):
o BERT is given two sentences and learns to predict if the second sentence follows the first one or not.
Answer:
BERT uses WordPiece tokenization, which splits rare words into smaller subwords or characters.
Example: a rare word like "embeddings" may be split into "em", "##bed", "##ding", "##s".
Answer:
The model is fine-tuned using a smaller labeled dataset for specific tasks like sentiment
analysis, named entity recognition (NER), question answering, or text classification.
Answer:
✅ Advantages:
❌ Limitations:
1. BooksCorpus (800M words) – A large collection of books to help BERT learn long-range dependencies.
2. English Wikipedia (2.5B words) – Wikipedia text (excluding tables and lists) to provide factual and diverse language understanding.
BERT is trained using Masked Language Modelling (MLM) and Next Sentence
Prediction (NSP). After pre-training, it can be fine-tuned on specific NLP tasks like question
answering, sentiment analysis, and named entity recognition (NER).
“BERT is already trained on a huge amount of text, so it understands the language very well.
Instead of training a model from scratch, we can fine-tune BERT on our specific task, like sentiment
analysis or question answering. This saves time, needs less data, and gives better results because
BERT already knows how language works.”
“For example, if I want to build a model to detect spam messages, I can take BERT and fine-tune it
on labeled spam data. BERT already understands grammar and context, so it learns the spam
patterns quickly.”
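A hedged sketch of that spam example using the Hugging Face transformers library (the model name, label count, and example message are illustrative assumptions):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Win a free prize now!!!", return_tensors="pt")
outputs = model(**inputs)            # new classification head on top of pretrained BERT
print(outputs.logits.shape)          # torch.Size([1, 2]) -- e.g. spam vs. not spam
```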
How can you improve your BERT NER model?
Tune the training settings like learning rate, batch size, and number of training steps.
Handle imbalanced data so the model learns all entity types better.
Add a CRF layer on top of BERT to improve how the model predicts sequences.
Clean and preprocess the data well, like fixing label errors or token issues.
Answer:
T5 was trained on C4 (Colossal Clean Crawled Corpus), a massive dataset created from web pages.
This dataset was cleaned to remove low-quality text and was used for pretraining.
Answer:
T5 is trained using a denoising autoencoder objective. This means:
Example:
Input: "The quick brown [MASK] jumps over the lazy [MASK]."
Output: "The quick brown fox jumps over the lazy dog."
This is similar to BERT's masked language modeling, but T5 generates the full sequence
instead of just predicting individual words.
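A hedged sketch of T5's text-to-text usage with the Hugging Face transformers library (the checkpoint name and task prefix are illustrative; the tokenizer needs the sentencepiece package):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 treats every task as text-to-text; a task prefix tells it what to do
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```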
Answer:
T5 follows the Transformer Encoder-Decoder architecture:
Answer:
✅ Advantages:
❌ Limitations:
May generate incorrect or biased outputs if the training data contains biases.