01-Transformer Based NLP Applications
January 2025
Outline
1. Introduction to NLP
2. Word representation
3. Language modeling
4. LMs + RNNs
5. Better RNNs
[Content adapted from CS224N: Natural Language Processing with Deep Learning,
The Stanford NLP Group, Stanford]
NLP progress
● 1940–1969
● Early Explorations
● 1970–1992
● Hand-built systems, of increasing formalization
● 1993–2012
● Statistical or Probabilistic NLP
● then, more general Supervised ML
● 2013–now
● Deep Artificial Neural Networks
● Unsupervised
● Self-supervised
● Reinforcement Learning
Machine Translation (1950s)
● The origin of NLP/Computational Linguistics
Machine Translation (today)
Question-Answering
● BASEBALL (Green et al., 1961)
● handled questions about a database of baseball games and statistics
● LUNAR (Woods 1973)
● answered questions about the rocks brought back from
the moon by the Apollo program.
● ChatGPT
Main NLP applications
● Text Classification
● Sentiment Analysis
● Named Entity Recognition (NER)
● Topic Modeling
● Text Summarization
● Text Generation
● Speech Recognition (ASR / Speech-to-Text, STT)
● Text-to-Speech (TTS)
How do machines understand us?
● Machines need to understand (at least partially) the
ambiguous, messy languages that humans use.
Word meaning
● Definition: meaning (Webster dictionary)
● the idea that is represented by a word, phrase, etc.
● the idea that a person wants to express by using words,
signs, etc.
● the idea that is expressed in a work of writing, art, etc.
Word ⇔ Idea
Hi, Hello, Howdy, Greetings ⇔ Greeting
Word representation
Word representation as discrete symbols
● Traditional NLP used one-hot vectors
Sentence: “AI is the future”   (vocabulary of size 4: AI, is, the, future)
The    → [0 0 1 0]
future → [0 0 0 1]
is     → [0 1 0 0]
AI     → [1 0 0 0]
Each vector has one dimension per word in the vocabulary (the vocabulary size).
Word representation as discrete symbols
● Is “Hello” similar to “Hi” with one-hot vectors?
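A minimal numpy sketch of this limitation (the toy vocabulary and its ordering are assumptions for illustration): one-hot vectors of two different words always have a dot product of 0, so “Hi” looks no more similar to “Hello” than to any other word.

import numpy as np

# Toy vocabulary; the ordering is arbitrary and only for illustration
vocab = ["Hi", "Hello", "AI", "future"]

def one_hot(word):
    # Vector of length |vocab| with a single 1 at the word's index
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

hi, hello = one_hot("Hi"), one_hot("Hello")
print(np.dot(hi, hello))  # 0.0: distinct one-hot vectors are always orthogonal
print(np.dot(hi, hi))     # 1.0: each word is only "similar" to itself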
Word Embedding
● Represent a word by considering the contexts in
which it is found.
● Construct a dense vector for each word, chosen to be similar to the vectors of words that appear in comparable contexts.
● Similarity is assessed via the dot (scalar) product of the vectors.
Hi    = [ 0.186, 0.685, −0.209, −0.107, 0.117, −0.352]
Hello = [ 0.228, 0.815, −0.106, −0.201, 0.205, −0.288]
dot(Hi, Hello) ≈ 0.77
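The same similarity check can be reproduced with a short numpy sketch using the vectors above:

import numpy as np

hi    = np.array([0.186, 0.685, -0.209, -0.107, 0.117, -0.352])
hello = np.array([0.228, 0.815, -0.106, -0.201, 0.205, -0.288])

# Dense vectors of words used in similar contexts point in similar directions
print(round(float(np.dot(hi, hello)), 2))  # 0.77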
Word Embedding
● word2vec [Mikolov et al., 2013]
● GloVe [Pennington et al., 2014]
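A hedged sketch of querying pretrained embeddings with the gensim library; the model name below is an assumption, and any pretrained KeyedVectors model can be substituted.

import gensim.downloader as api

# Downloads a small pretrained GloVe model on first use (model name assumed for illustration)
model = api.load("glove-wiki-gigaword-50")

print(model.similarity("hello", "hi"))      # cosine similarity between the two embeddings
print(model.most_similar("hello", topn=3))  # nearest neighbours in the embedding space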
Language Modeling
Language Modeling
● The task of predicting what word comes next.
The students came ____ early/late/sleepy/motivated
● P(w_{n+1} | w_n, …, w_1), where w_{n+1} is a word from the vocabulary V.
Language Modeling
● Word embedding focuses on representing individual
words as vectors based on their contextual usage.
Language Modeling
● A Language Model (LM) assigns a probability to a
sequence of words.
● An LM assigns the following probability to the sequence of words w_1, …, w_N:
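In its standard chain-rule form:

P(w_1, \dots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1})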
n-gram Language Models
● For a sequence of tokens (e.g., words or subwords)
S=[w1, …, wN ], an n-gram is a subsequence of n
consecutive elements from S.
● Unigram (n=1): single items
■ "I love NLP" ⇒ "I","love","NLP"
● Bigram (n=2): pairs of consecutive items
■ "I love NLP" ⇒ "I love","love NLP"
● Trigram (n=3): triplets of consecutive items
■ "I love NLP" ⇒ "I love NLP"
n-gram Language Models
● Idea:
● collect statistics about the occurrences of different
n-grams
● and use these statistics to predict the next word.
● Example: learn a 4-gram LM
● The book that the teacher recommended was _____
● P(w | teacher recommended was) = count(teacher recommended was w) / count(teacher recommended was)
● Problems
■ Zero occurrences → zero probability (zero numerator) or undefined probability (zero denominator)
– sparsity problem
■ Increasing n
– worsens the sparsity problem
– increases the model size
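A minimal count-based sketch of the 4-gram estimate above (the toy corpus and names are illustrative); note how an unseen continuation gives probability 0 and an unseen context leaves the probability undefined, which is exactly the sparsity problem:

from collections import Counter

corpus = "the book that the teacher recommended was great".split()

# Counts of 4-grams and of their 3-word contexts
four_grams = Counter(tuple(corpus[i:i + 4]) for i in range(len(corpus) - 3))
contexts   = Counter(tuple(corpus[i:i + 3]) for i in range(len(corpus) - 2))

def p_next(word, context):
    # P(word | context) = count(context + word) / count(context)
    if contexts[context] == 0:
        return None  # context never seen: probability undefined
    return four_grams[context + (word,)] / contexts[context]

print(p_next("great", ("teacher", "recommended", "was")))   # 1.0 in this tiny corpus
print(p_next("boring", ("teacher", "recommended", "was")))  # 0.0: sparsity problem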
Neural LMs & RNNs
Fixed-window Neural LM
Fixed-window Neural LM
● Improvements over n-gram LM
● No sparsity problem
● Don’t need to store all observed n-grams
[Figure: fixed-window neural LM predicting likely next words such as “awesome” and “amazing” from inputs x(1) … x(4)]
Fixed-window Neural LM
● Remaining problems
● Fixed window is too small
● Enlarging window enlarges 𝑊
● Window can never be large enough
● x(1) and x(2) are multiplied by completely different weights in 𝑊.
→ No symmetry in how the inputs are processed.
[Figure: fixed-window neural LM over inputs x(1) … x(4)]
RNN LM
Core idea: Apply the same weights 𝑊 recurrently
[Figure: RNN LM applying the same weights 𝑊 at every timestep over inputs x(1) … x(4)]
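A minimal numpy sketch of the recurrence, assuming the common formulation h(t) = tanh(W h(t−1) + U x(t) + b); the point is that one set of weights is reused at every timestep, so the model size does not depend on the sequence length.

import numpy as np

d_hidden, d_embed, steps = 8, 4, 5
rng = np.random.default_rng(0)

# One set of weights, shared across all timesteps
W = 0.1 * rng.normal(size=(d_hidden, d_hidden))
U = 0.1 * rng.normal(size=(d_hidden, d_embed))
b = np.zeros(d_hidden)

h = np.zeros(d_hidden)                # initial hidden state h(0)
for t in range(steps):
    x_t = rng.normal(size=d_embed)    # stand-in for the embedding of the word at step t
    h = np.tanh(W @ h + U @ x_t + b)  # the same W is applied recurrently

print(h.shape)  # (8,): the hidden state size does not grow with the input length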
RNN LM
● RNN Advantages
● Can process any length input
● Computation for step t can (in theory) use information
from many steps back
● Model size doesn’t increase for longer input context
● Same weights applied on every timestep, so there is
symmetry in how inputs are processed.
● RNN Disadvantages
● Recurrent computation is slow
● In practice, difficult to access information from many
steps back
How to train an RNN LM?
● Build a large corpus of text (sequences of words)
● For every step t:
● input: feed the sequence of words into the RNN LM
● output: compute ŷ(t), the probability distribution over every word in the vocabulary
How to train an RNN LM?
● Loss at step t: the negative log-probability the model assigns to the true next word, e.g. −log(prob(“teacher”))
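In symbols, this is the standard cross-entropy loss:

J^{(t)}(\theta) = -\log \hat{y}^{(t)}_{x_{t+1}}, \qquad J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta)

where \hat{y}^{(t)}_{x_{t+1}} is the probability the model assigns to the true next word x_{t+1}.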
How to train an RNN LM?
● Computing loss and gradients across entire corpus
x(1), x(2), … , x(T) at once is too expensive
● In practice,
● x(1), x(2), … , x(T) is a sentence (or a document)
● Use Stochastic Gradient Descent to compute loss and
gradients for small chunk of data, and update.
● Compute loss J(𝛉) for a batch of sentences, compute
gradients and update weights.
■ Repeat on a new batch of sentences.
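A hedged PyTorch sketch of one such update (the vocabulary size, dimensions, and toy batch are assumptions; a real setup adds tokenization, padding, and iteration over batches):

import torch
import torch.nn as nn

vocab_size, d_embed, d_hidden = 1000, 64, 128

class RNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)
        self.rnn = nn.RNN(d_embed, d_hidden, batch_first=True)
        self.out = nn.Linear(d_hidden, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, d_hidden)
        return self.out(h)                   # logits over the vocabulary at every step

model = RNNLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab_size, (32, 20))  # toy batch of token ids
inputs, targets = batch[:, :-1], batch[:, 1:]   # predict each next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))  # average -log P(next word)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(loss))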
How to generate text with an RNN LM?
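The usual procedure: start from a begin-of-sequence token, sample the next word from ŷ(t), feed the sample back in as the next input, and repeat. A minimal, self-contained sketch (the toy vocabulary and dimensions are assumptions, and the model here is untrained, so its output is gibberish):

import torch
import torch.nn as nn

vocab = ["<s>", "the", "students", "came", "late", "."]
embed = nn.Embedding(len(vocab), 16)
rnn = nn.RNN(16, 32, batch_first=True)
out = nn.Linear(32, len(vocab))

token = torch.tensor([[0]])   # start-of-sequence id
h = None
generated = []
with torch.no_grad():
    for _ in range(5):
        o, h = rnn(embed(token), h)               # one step, carrying the hidden state forward
        probs = torch.softmax(out(o[:, -1]), -1)  # distribution over the next word
        token = torch.multinomial(probs, 1)       # sample, then feed it back in
        generated.append(vocab[token.item()])
print(" ".join(generated))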
Evaluating Language Models
● The standard evaluation metric for LMs is perplexity.
● The inverse probability of the corpus, according to the LM
● Normalized by the number of words (exponent 1/T)
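In its standard form:

\text{perplexity} = \prod_{t=1}^{T} \left( \frac{1}{P_{\text{LM}}\big(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}\big)} \right)^{1/T} = \exp\big(J(\theta)\big)

i.e. the exponential of the average cross-entropy loss; lower perplexity is better.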
RNNs: Vanishing Gradient
When the derivatives are small, the gradient signal gets smaller and smaller as it backpropagates further.
RNNs: Vanishing Gradient
Gradient signal from far away is lost because it’s much smaller than gradient
signal from close-by.
⇒ Model weights are updated only with respect to near effects, not long-term
effects.
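A standard sketch of why this happens: by the chain rule, the gradient of the loss at step t with respect to an early hidden state is a product of many Jacobians,

\frac{\partial J^{(t)}}{\partial \boldsymbol{h}^{(1)}} = \frac{\partial J^{(t)}}{\partial \boldsymbol{h}^{(t)}} \prod_{k=2}^{t} \frac{\partial \boldsymbol{h}^{(k)}}{\partial \boldsymbol{h}^{(k-1)}}

and when each factor has norm smaller than 1, the product shrinks roughly exponentially with the distance t.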
How to fix Vanishing Gradient problem?
● The main problem
● RNNs are unable to preserve information over many
timesteps.
● Memorize
● Add separate memory: LSTM
● And then
● Create more direct, linear pass-through connections in the model
■ Attention, residual connections, etc.
Recap
● Word representation
● Focus
■ Individual words and their semantic relationships.
● Applications
■ Similarity measurement (e.g., finding synonyms).
■ Feature inputs for downstream NLP tasks like sentiment
analysis or text classification.
● Language modeling
● Focus
■ Understanding and predicting sequences of words or sentences.
● Applications
■ Text generation (e.g., chatbots, auto-completion).
■ Machine translation.
■ Speech recognition.
Recap
● Language Model
● A system that predicts the next word
Better RNNs
Long Short-Term Memory RNNs (LSTMs)
● On step t, there is a hidden state 𝒉(t) and a cell state 𝒄(t)
● Both are vectors of length n
● The cell stores long-term information
● LSTM can read, erase, and write information from the cell
■ The cell is like RAM in a computer
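For reference, the usual gate equations (stated here in their standard form):

f^{(t)} = \sigma(W_f h^{(t-1)} + U_f x^{(t)} + b_f)                 (forget gate: what to keep in the cell)
i^{(t)} = \sigma(W_i h^{(t-1)} + U_i x^{(t)} + b_i)                 (input gate: what to write)
o^{(t)} = \sigma(W_o h^{(t-1)} + U_o x^{(t)} + b_o)                 (output gate: what to read out)
\tilde{c}^{(t)} = \tanh(W_c h^{(t-1)} + U_c x^{(t)} + b_c)          (new cell content)
c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}   (erase/keep, then write)
h^{(t)} = o^{(t)} \odot \tanh(c^{(t)})                              (read from the cell)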
LSTMs
● The LSTM architecture addresses the problem of preserving information over many timesteps (long distances) within an RNN
● e.g., to preserve the information contained in a cell indefinitely
■ set the forget gate to 1
■ and set the input gate to 0
● In practice
■ an LSTM preserves information over roughly 100 timesteps
■ a vanilla RNN preserves information over only about 7 timesteps
Sentiment classification with RNNs
Sentiment classification with RNNs
● The hidden state is a
contextual representation of
the word “terribly”.
● It contains information about
the left context
● “the movie was”
● Sentiment
● “terribly” on its own suggests negative
● “terribly exciting” is positive, so the right context matters too
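A hedged PyTorch sketch of this setup (dimensions, class count, and the toy input are assumptions): run an RNN over the sentence and classify from the final hidden state, which summarizes the left context.

import torch
import torch.nn as nn

vocab_size, d_embed, d_hidden, n_classes = 1000, 64, 128, 2  # 2 classes: neg / pos

embed = nn.Embedding(vocab_size, d_embed)
rnn = nn.RNN(d_embed, d_hidden, batch_first=True)
classifier = nn.Linear(d_hidden, n_classes)

sentence = torch.randint(0, vocab_size, (1, 6))  # toy ids standing in for "the movie was terribly exciting !"
_, h_n = rnn(embed(sentence))                    # h_n: final hidden state
logits = classifier(h_n[-1])                     # sentiment scores from the last hidden state
print(torch.softmax(logits, dim=-1))             # P(neg), P(pos)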
Bidirectional RNNs
This contextual representation of “terribly” has both left and right context.
Bidirectional RNNs
● Bidirectional RNNs are only applicable if access to the
entire input sequence is available
● They are not applicable to Language Modeling, because
in LM only left context is available.
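In frameworks such as PyTorch, bidirectionality is typically a flag on the recurrent layer; a minimal sketch (dimensions and the toy input are assumptions). Each position's output concatenates the forward state (left context) and the backward state (right context).

import torch
import torch.nn as nn

embed = nn.Embedding(1000, 64)
bi_rnn = nn.LSTM(64, 128, batch_first=True, bidirectional=True)

sentence = torch.randint(0, 1000, (1, 5))  # toy token ids
output, _ = bi_rnn(embed(sentence))
print(output.shape)  # (1, 5, 256): forward and backward hidden states concatenated per word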
Multi-layer RNNs
● RNNs are already “deep” on one dimension (they unroll
over many timesteps)
Multi-layer RNNs
Multi-layer RNNs
● RNNs usually perform better with more layers
● For Neural Machine Translation
● 2 to 4 layers is best for the encoder RNN
● and, 4 layers is best for the decoder RNN
● Britz et al. (2017). Massive Exploration of Neural Machine Translation Architectures.
Statistical Machine Translation (SMT)
● Core idea: Learn a probabilistic model from data
● Example: French → English
● Find best English sentence y, given French sentence x
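The objective, in its usual form (with the Bayes-rule decomposition used by classical SMT):

\hat{y} = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x \mid y)\, P(y)

where P(x | y) is a translation model learned from parallel data and P(y) is a language model of English.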
Neural Machine Translation
● Since 2014
Neural Machine Translation (NMT)
Sequence-to-sequence is versatile
● The general notion here is an encoder-decoder model
● One NN takes input and produces a neural representation
● Another NN produces output based on that neural
representation
● If the input and output are sequences, we call it a
seq2seq model
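A hedged PyTorch skeleton of the encoder-decoder idea (all names, dimensions, and toy inputs are assumptions; real systems add attention, beam search, etc.): the encoder compresses the source into a vector, and the decoder is a language model conditioned on that vector.

import torch
import torch.nn as nn

src_vocab, tgt_vocab, d = 1000, 1200, 128

src_embed, tgt_embed = nn.Embedding(src_vocab, d), nn.Embedding(tgt_vocab, d)
encoder = nn.GRU(d, d, batch_first=True)
decoder = nn.GRU(d, d, batch_first=True)
out = nn.Linear(d, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 7))      # toy source sentence (e.g. French token ids)
tgt_in = torch.randint(0, tgt_vocab, (1, 6))   # toy target prefix (e.g. English token ids)

_, h = encoder(src_embed(src))                 # h: the encoder's final state, the "neural representation"
dec_states, _ = decoder(tgt_embed(tgt_in), h)  # decoder conditioned on the source via h
logits = out(dec_states)                       # next-word scores at each target position
print(logits.shape)                            # (1, 6, tgt_vocab)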
Neural Machine Translation (NMT)
● The seq2seq model is an example of a Conditional Language Model
● Language Model because the decoder is predicting the
next word of the target sentence y
● Conditional because its predictions are also conditioned
on the source sentence x
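In formula form, the decoder factorizes the conditional probability as:

P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)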
Multi-layer deep encoder-decoder MT Net
[Figure: multi-layer encoder-decoder MT network; the encoder’s final hidden state is the conditioning bottleneck]
Evaluate Machine Translation
The most common metric: BLEU (Bilingual Evaluation Understudy)
Papineni et al. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. ACL 2002: 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318.
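For reference, the score in its usual form: BLEU combines modified n-gram precisions p_n (typically up to n = 4) with a brevity penalty BP that penalizes outputs shorter than the reference,

\text{BLEU} = \text{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right), \qquad \text{BP} = \min\!\left(1,\ \exp\!\left(1 - \frac{\text{reference length}}{\text{output length}}\right)\right)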