Chapter 6: Deep Learning for Natural Language
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
Outline

 Introduction
 Why deep learning for NLP
 Overview of NLP
 Basic Structure of NN
 Different types of layers
 Activation function
 Types of Neural Network
 Convolutional NN
 Recurrent NN

Deep Learning for Natural Language

 Deep learning is an extended field of machine learning that has proven to be highly useful, primarily in the domains of text, image, and speech processing.
 The collection of algorithms implemented under deep learning bears similarities to the relationship between stimuli and neurons in the human brain.
 Deep learning has extensive applications in computer vision, language translation, speech recognition, image generation, and so forth.
 These algorithms can learn in both a supervised and an unsupervised fashion.

Deep Learning for Natural Language

 A majority of deep learning algorithms are based on the concept of artificial neural networks, and training such algorithms has been made easier in today's world by the availability of abundant data and sufficient computational resources.
 With additional data, the performance of deep learning models just keeps on improving.

 The term deep in deep learning refers to the depth of the artificial neural network architecture, and learning stands for learning through the artificial neural network itself.

Deep Learning for Natural Language

 The figure below illustrates the difference between a deep and a shallow network, and why the term deep learning gained currency.

Deep Learning for Natural Language

 Representation of deep and shallow networks (figure).

Deep Learning for Natural Language

 Deep neural networks are capable of discovering latent structures (feature learning) from unlabeled and unstructured data, such as images (pixel data), documents (text data), or files (audio, video data).
 What differentiates a deep neural network from an ordinary artificial neural network is the way we use backpropagation.
 In an ordinary artificial neural network, backpropagation trains later (or end) layers more efficiently than it trains initial (or earlier) layers.
 Thus, as we travel back into the network, errors become smaller and more diffused.

Deep Learning for Natural Language

 How deep is "deep"?
 A deep neural network is simply a feedforward neural network with multiple hidden layers.
 If there are many layers in the network, then we say that the network is deep.

 Neural networks are a biologically inspired paradigm that enables a computer to learn human faculties from observational data.

Deep Learning for Natural Language

 Multiple open source platforms and libraries are available for deep learning (figure).

Basic Structure of Neural Network

 The basic principle behind a neural network is a collection of basic elements, the artificial neuron or perceptron, first developed in the 1950s by Frank Rosenblatt.

 They take several binary inputs, x1, x2, ..., xN, and produce a single binary output if the sum is greater than the activation potential.
 The neuron is said to "fire" whenever the activation potential is exceeded, and it behaves as a step function.

Basic Structure of Neural Network

 Biological analogy:
 An ANN is a computational model that simulates some properties of the human brain.

Biological Neural Network vs. Artificial Neural Network (figure).

Basic Structure of Neural Network

 The neurons that fire pass the signal along to other neurons connected to their dendrites, which, in turn, will fire if the activation potential is exceeded, thus producing a cascading effect.

Basic Structure of Neural Network

 As not all inputs have the same emphasis, weights are attached to each of the inputs, xi, to allow the model to assign more importance to some inputs.

 Thus, the output is 1 if the weighted sum is greater than the activation potential (or bias), i.e., output = 1 if w1*x1 + w2*x2 + ... + wN*xN > b, and 0 otherwise.

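 To make the rule above concrete, the following is a minimal Python/NumPy sketch of a single perceptron (not taken from the slides; the inputs, weights, and threshold values are illustrative only):

import numpy as np

def perceptron(x, w, b):
    # Fire (output 1) only if the weighted sum of the inputs exceeds the threshold b.
    return 1 if np.dot(w, x) > b else 0

# Illustrative values: three binary inputs, hand-picked weights and threshold.
x = np.array([1, 0, 1])
w = np.array([0.6, 0.2, 0.3])
print(perceptron(x, w, b=0.5))   # weighted sum is 0.9 > 0.5, so the neuron fires: 1
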
Basic Structure of Neural Network

 Multilayer perceptrons (MLPs) belong to the category of feedforward neural networks and are made up of three types of layers: an input layer, one or more hidden layers, and a final output layer.
 A normal MLP has the following properties:
 Hidden layers with any number of neurons,
 An input layer using linear functions,
 Hidden layer(s) using an activation function, such as sigmoid,
 An output layer giving any number of outputs through an activation function,
 Properly established connections between the input layer, hidden layer(s), and output layer.

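 As a rough sketch of the three-layer structure just described (input layer, one hidden layer with a sigmoid activation, output layer), the forward pass below uses NumPy with randomly initialized weights; the layer sizes are arbitrary illustrative choices, not something the slides prescribe:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Arbitrary sizes: 4 input features, 8 hidden neurons, 3 output values.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # hidden layer -> output layer

def mlp_forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden layer with sigmoid activation
    return W2 @ h + b2         # output layer (raw scores)

print(mlp_forward(rng.normal(size=4)))
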
Basic Structure of Neural Network

 MLPs are also known as universal approximators, as they can find the relationship between the input values and the targets by using a sufficient number of neurons in the hidden layer.

 This does not even require a significant amount of prior information about the mapping between input and output values.
 Often, given this degree of freedom, an MLP can outperform the basic MLP network by introducing more hidden layers, with fewer neurons in each of the hidden layers, and optimum weights.

Basic Structure of Neural Network

 The following are a few of the features of network architecture that have a direct impact on its performance.
 Hidden layers: These contribute to the generalization factor of the network. In most cases, a single layer is sufficient to encompass the approximation of any desired function, supported by a sufficient number of neurons.

 Hidden neurons: The number of neurons present across the hidden layer(s), which can be selected using any suitable formulation.

Basic Structure of Neural Network

 The following are a few of the features of network architecture that have a direct impact on its performance (continued).
 Output nodes: The count of output nodes is usually equal to the number of classes into which we want to classify the target value.

 Activation functions: These are applied to the inputs of individual nodes.

Basic Structure of Neural Network

 In practice, this simple form is difficult to work with, owing to the abrupt nature of the step function.
 So, a modified form was created to behave more predictably, i.e., small changes in weights and bias cause only a small change in output. There are two main modifications.

Basic Structure of Neural Network

 The inputs can take on any value between 0 and 1, instead of being binary.
 To make the output behave more smoothly for given inputs x1, x2, …, xN, weights w1, w2, …, wN, and bias b, use the following sigmoid function: output = 1 / (1 + exp(-(w1*x1 + w2*x2 + ... + wN*xN + b))).

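 A minimal sketch of such a sigmoid neuron, assuming the weighted-sum-plus-bias form given above (the input, weight, and bias values are illustrative only):

import numpy as np

def sigmoid_neuron(x, w, b):
    # Smooth version of the perceptron: the output varies continuously in (0, 1).
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.25, 0.9, 0.4])        # inputs may take any value, not just 0 or 1
w = np.array([0.6, -0.3, 0.8])
print(sigmoid_neuron(x, w, b=0.1))    # small changes in w or b change this output only slightly
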
Basic Structure of Neural Network

 Other activation functions can be better choices than sigmoid for deep networks.
 Hyperbolic tangent function (pronounced "tanch"): tanh(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z)), which squashes values into the range (-1, 1).

Basic Structure of Neural Network

 In addition to the usual sigmoid function, other nonlinearities that are more frequently used include the following.
 ReLU: Rectified linear unit. This keeps the activation clamped at zero for negative inputs. It is computed as f(x) = max(0, x).
 The graph of the ReLU function has the value 0 for all x <= 0, and a linear slope of 1 for all x > 0.

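 The short NumPy sketch below compares the tanh and ReLU nonlinearities described above (the sample inputs are illustrative only):

import numpy as np

def tanh(z):
    # Hyperbolic tangent: squashes values into the range (-1, 1).
    return np.tanh(z)

def relu(z):
    # ReLU: 0 for z <= 0, identity (slope 1) for z > 0.
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh(z))   # approximately [-0.96 -0.46  0.    0.46  0.96]
print(relu(z))   # [0.  0.  0.  0.5 2. ]
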
Basic Structure of Neural Network

 ReLUs quite often face the issue of dying, especially when the learning rate is set to a higher value, as this triggers weight updates that prevent the activation of specific neurons, thereby making the gradient of those neurons forever zero.
 Another risk with ReLU is explosion of the activation, as for positive inputs the input value, xj, is itself the output and is unbounded.

Basic Structure of Neural Network

 ReLU offers other benefits as well, such as the introduction of sparsity in cases where xj is below 0, leading to sparse representations; and because the gradient it returns is constant where ReLU is active, it results in faster learning, accompanied by a reduced likelihood of the gradient vanishing.

 LReLUs (Leaky ReLUs): These mitigate the issue of dying ReLUs by introducing a small slope (~0.01) for values of x less than 0.
 LReLUs do offer successful results in some scenarios, although not always.

Basic Structure of Neural Network

 ELU (Exponential Linear Unit): These offer negative values that push the mean unit activations closer to zero, thereby speeding up the learning process by bringing the gradient closer to the unit natural gradient.
 Softmax: Also referred to as the normalized exponential function, this transforms a set of given real values into the range (0, 1), such that their combined sum is 1.

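 A sketch of these remaining nonlinearities (leaky ReLU with the ~0.01 slope mentioned above, ELU, and softmax); the alpha values are common defaults assumed here, not values prescribed by the slides:

import numpy as np

def leaky_relu(z, alpha=0.01):
    # A small slope alpha for z < 0 mitigates "dying" ReLUs.
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # Saturating negative values push mean activations toward zero.
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def softmax(z):
    # Normalized exponential: outputs lie in (0, 1) and sum to 1.
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])
print(leaky_relu(z))
print(elu(z))
print(softmax(z))   # roughly [0.71, 0.26, 0.04], summing to 1
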
Basic Structure of Neural Network

 As in the mammalian brain, individual neurons are organized in layers, with connections within a layer and to the next layer, creating an artificial neural network (ANN), or multilayer perceptron (MLP).

 The layers between input and output are referred to as hidden layers, and the density and type of the connections between layers is the configuration.
 For example, a fully connected configuration has all the neurons of layer L connected to those of layer L + 1.

Basic Structure of Neural Network

 An illustration of two hidden layers with dense connections.

Types of Neural Network

 Feedforward neural networks constitute the basic units of the neural network family.

 Data movement in any feedforward neural network is from the input layer to the output layer, via the hidden layers present, without any kind of loops.
 Output from one layer serves as input to the next layer, with restrictions on any kind of loops in the network architecture.

Types of Neural Network

 Worked example (figure, shown step by step over several slides): a small feedforward network takes the independent variables Age = 34, Gender = 2, and Stage = 4 as inputs, passes them through a weighted hidden layer, and produces a single output of 0.6, interpreted as the "Probability of being Alive" (the dependent variable, i.e., the prediction).

Types of Neural Network

 So far (feedforward, backpropagation), the structure of our neural network treats all inputs interchangeably.
 No relationships between the individual inputs.
 Just an ordered set of variables.

 We want to incorporate domain knowledge into the architecture of a neural network.

Types of Neural Network

 Image data has important structures, such as:
 "Topology" of pixels,
 Issues of lighting and contrast,
 Knowledge of the human visual system,
 Nearby pixels tend to have similar values,
 Edges and shapes,
 Scale invariance – objects may appear at different sizes in the image.
 From a dimensionality standpoint, taking advantage of these structures means far fewer parameters (see the parameter-count sketch below), which is what the convolutional version, the Convolutional Neural Network (CNN), exploits.

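 A quick back-of-the-envelope calculation of the parameter savings (the image and filter sizes below are illustrative assumptions, not from the slides):

# Fully connected: every pixel of a 28x28 grayscale image connects to 100 hidden units.
fc_params = 28 * 28 * 100 + 100        # weights + biases = 78,500

# Convolutional: 32 filters of size 3x3, shared across all pixel positions.
conv_params = 32 * (3 * 3 * 1) + 32    # weights + biases = 320

print(fc_params, conv_params)          # 78500 vs. 320
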
Convolutional Neural Network

 Convolutional neural networks are well adapted for image recognition and handwriting recognition.
 Their structure is based on sampling a window or portion of an image, detecting its features, and then using the features to build a representation.

 This leads to the use of several layers; thus, these models were the first deep learning models.

Convolutional Neural Network

 A CNN is a neural network with some convolutional layers (and some other layers).
 A convolutional layer has a number of filters that perform the convolution operation, as sketched below.

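 A minimal sketch of the convolution operation that a single filter performs, written directly in NumPy (a real convolutional layer would apply many such filters followed by a nonlinearity; the sizes below are illustrative):

import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over the image and take a dot product at each position
    # (valid padding, stride 1).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((6, 6))   # toy 6x6 "image"
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])              # a simple vertical-edge detector
print(convolve2d(image, edge_filter).shape)       # (4, 4) feature map
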
Convolutional Neural Network

 From a memory and capacity standpoint, the CNN is not much bigger than a regular two-layer network.

 At runtime, the convolution operations are computationally expensive and take up about 67% of the time.
 CNNs are about 3x slower than their fully connected equivalents (size-wise).

Motivation of Sequential Model

Recurrent Neural Network

 Recurrent neural networks (RNNs) are used when a data pattern changes over time.
 RNNs can be thought of as being unrolled over time.
 An RNN applies the same layer to the input at each time step, using the output (i.e., the state) of previous time steps as input.

 RNNs have feedback loops in which the output from the previous firing, at time index T, is fed as one of the inputs at time index T + 1.
 There might be cases in which the output of the neuron is fed to itself as input.

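 The sketch below shows the feedback just described: the same weights are reused at every time step, and the previous hidden state is fed back in alongside the current input (dimensions and initialization are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 5, 8                    # arbitrary illustrative sizes
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # One recurrent step: the new state depends on the current input and the previous state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                         # initial state
for x_t in rng.normal(size=(10, input_size)):     # a toy sequence of 10 time steps
    h = rnn_step(x_t, h)                          # the same layer is applied at every step
print(h.shape)
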
Recurrent Neural Network

 These are well suited for applications involving sequences; they are widely used in problems related to video, which is a time sequence of images, and for translation purposes, where understanding the next word is based on the context of the previous text.
 Following are various types of RNNs:
 Encoding recurrent neural networks: This set of RNNs enables the network to take an input in sequence form.

Recurrent Neural Network

 Following are various types of RNNs:
 Generating recurrent neural networks: Such networks basically output a sequence of numbers or values, like the words in a sentence.

Recurrent Neural Network

 Following are various types of RNNs:
 General recurrent neural networks: These networks are a combination of the preceding two types of RNNs.
 General RNNs are used to generate sequences and, thus, are widely used in NLG (natural language generation) tasks.

Recurrent Neural Network

 With images, we forced them into a specific input dimension.
 It is not obvious how to do this with text.
 We will use a new network structure called a "Recurrent Neural Network".
 Issue: variable-length sequences of words.

 We want to do better than "bag of words" implementations.
 Ideally, each word is processed or understood in the appropriate context.
 We need to have some notion of "context".

Recurrent Neural Network

 We have focused on text/words as the RNN application.
 But RNNs can be used for other sequential data:
 Time-series data,
 Speech recognition,
 Sensor data,
 Genome sequences.

 The nature of the state transition means it is hard to keep information from the distant past in current memory without reinforcement.

Recurrent Neural Network

 Issue: Standard RNNs have poor memory.
 The transition matrix necessarily weakens the signal.
 We need a structure that can leave some dimensions unchanged over many steps.
 This is the problem addressed by so-called Long Short-Term Memory RNNs (LSTMs).

 LSTMs define a more complicated update mechanism for changing the internal state, sketched below.
 By default, LSTMs remember the information from the last step.

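 A compact sketch of the gated update an LSTM cell uses to leave parts of its state unchanged across many steps (these are the standard gate equations; sizes and initialization are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6                                         # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))  # weights for the four gates, stacked
b = np.zeros(4 * n_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = W @ np.concatenate([x_t, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, and output gates
    g = np.tanh(g)                                # candidate values
    c = f * c_prev + i * g                        # cell state: passes through unchanged when f ~ 1 and i ~ 0
    h = o * np.tanh(c)                            # hidden state exposed to the next step/layer
    return h, c

h = c = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):            # a toy sequence
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)
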
Recurrent Neural Network

 There are many different "flavors" of LSTM:
 Gated Recurrent Unit (GRU),
 Depth-Gated RNN.

 LSTMs have considerably more parameters than plain RNNs.

 Most of the big performance improvements in NLP have come from LSTMs, not plain RNNs.

Question & Answer

Thank You !!!

Individual Assignment - Four

 Write a short note on the following topics (not more than 8 pages):
 RNNs,
 LSTM,
 Gated Recurrent Unit (GRU),
 Depth-Gated RNN,
 Bidirectional Recurrent Neural Network.

Individual Assignment - Four

 Explain the following terms by writing a few paragraphs for each of them (not more than 8 pages):
 GPT-2 and GPT-3,
 BERT,
 Hugging Face,
 Transformers,
 Attention Models,
 Word Embeddings,
 Word2vec,
 Gensim.
