Word2Vec

Word2vec is a group of related models that are used to produce word embeddings by mapping words or phrases to vectors of real numbers. Specifically, the document discusses Word2vec's skip-gram and continuous bag-of-words (CBOW) models which are trained to efficiently learn high-quality word vectors from large datasets in less than a day. The models predict probabilities of words appearing in the same context to produce word embeddings that capture syntactic and semantic word similarities.



Word embeddings

▪ Map words or phrases to vectors of real numbers

Adapted from: Recent Developments of Content-Based RecSys (de Gemmis, Lops, Musto, Narducci and Semeraro, 2017)
Word2vec - word embeddings

▪ Efficient Estimation of Word Representations in Vector Space (2013)
  Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
▪ Proposes two models for the efficient computation of vector representations of words from large datasets
▪ The quality of these representations is measured in a word similarity task
▪ "We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set."
▪ "Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities."
First, an introduction to Autoencoders

▪ Artificial neural network used for dimensionality reduction

(Figures over three slides: a neuron unit and autoencoder architectures. Taken from https://www.jeremyjordan.me/autoencoders/ and https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html)
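As a concrete illustration of the idea (not taken from the slides), the sketch below trains a tiny linear autoencoder with plain gradient descent; the toy data, layer sizes and learning rate are all invented for the example.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                 # toy data: 100 samples, 6 features

W_enc = rng.normal(scale=0.1, size=(6, 2))    # encoder weights (6 -> 2)
W_dec = rng.normal(scale=0.1, size=(2, 6))    # decoder weights (2 -> 6)
lr = 0.01

for epoch in range(500):
    H = X @ W_enc                  # hidden code: the low-dimensional representation
    X_hat = H @ W_dec              # reconstruction of the input
    err = X_hat - X                # reconstruction error
    # gradient descent on the mean squared reconstruction error
    grad_dec = (H.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

codes = X @ W_enc                  # 2-dimensional embedding of every sample
print(codes.shape)                 # (100, 2)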
Word2vec training - predict the context of a word

▪ Skip-gram
  – Given a word, predict the probability that other words appear in its context

Word2vec is a group of related models that are used to produce word embeddings.

▪ CBOW (continuous bag of words)
  – Given the words in a context window, predict the probability of the word in the middle of that context

Word2vec is a group of related models that are used to produce word embeddings.
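A small sketch of what skip-gram training pairs look like for the example sentence (the window size and helper name are illustrative, not from the slides); CBOW simply reverses the pairs, using the context words as input and the centre word as target.

sentence = ("word2vec is a group of related models that are used "
            "to produce word embeddings").split()

def skipgram_pairs(tokens, window=2):
    """(input word, context word to predict) pairs for a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(sentence)[:4])
# [('word2vec', 'is'), ('word2vec', 'a'), ('is', 'word2vec'), ('is', 'a')]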

One-hot encoding

▪ Representation of a vocabulary with one-hot encoding (each row is the one-hot vector of one vocabulary word)

         a   able  about  ...  zebra  zinc  zoo
a        1    0     0     ...    0     0     0
able     0    1     0     ...    0     0     0
about    0    0     1     ...    0     0     0
...
zebra    0    0     0     ...    1     0     0
zinc     0    0     0     ...    0     1     0
zoo      0    0     0     ...    0     0     1
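A minimal sketch of this encoding for the six vocabulary words shown above (the vocabulary is of course only an illustration):

import numpy as np

vocab = ["a", "able", "about", "zebra", "zinc", "zoo"]
index = {w: i for i, w in enumerate(vocab)}        # word -> position

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0                           # a single 1 at the word's position
    return v

print(one_hot("about"))   # [0. 0. 1. 0. 0. 0.]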

-9-
Skip-gram model

▪ Training input
  – One-hot encoding representation of word wi
▪ Training output
  – C one-hot encoding representations of the words within a window of size C around wi (its context)

Word2vec is a group of related models that are used to produce word embeddings.

(Figure: the one-hot vector of the input word "models" is fed to the network, which is trained to output the one-hot vectors of its context words, e.g. "related" and "that".)
CBOW model

▪ Training input
  – C one-hot encoding representations of the words within a window of size C around wi (its context)
▪ Training output
  – One-hot encoding representation of word wi

Word2vec is a group of related models that are used to produce word embeddings.

(Figure: the one-hot vectors of the context words, e.g. "related" and "that", are fed to the network, which is trained to output the one-hot vector of the centre word "models".)
Word2Vec

▪ After training

(Figure: the trained network from the previous slides, with the input word "models" and the context words "related" and "that".)
Word2Vec

▪ After training (skip-gram)

(Figure, repeated over two slides: the input-to-hidden weight matrix after training, with rows of values such as 0.39, 0.74, 0.46; the row selected by the one-hot vector of the input word is that word's learned vector. A single neuron unit is highlighted.)
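A sketch of why the trained input-to-hidden weights are the word vectors: multiplying a one-hot input by the weight matrix simply selects one row. The matrix values are read from the slide's figure (decimal commas written as dots); which row corresponds to "models" is an assumption.

import numpy as np

# input-to-hidden weights after training (one 3-dimensional vector per word)
W1 = np.array([
    [0.39, 0.74, 0.46],
    [0.71, 0.32, 0.87],
    [0.23, 0.80, 0.85],
    [0.42, 0.38, 0.72],
    [0.94, 0.64, 0.68],
    [0.76, 0.24, 0.83],
    [0.41, 0.99, 0.12],
])

one_hot = np.zeros(7)
one_hot[2] = 1.0                  # assumed position of the input word, e.g. "models"
embedding = one_hot @ W1          # identical to W1[2]
print(embedding)                  # [0.23 0.8  0.85]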
Efficient training of word2vec

▪ Problem 1:
  – Words that are very frequent in the corpus are presented to the training phase too often, and other words too rarely
  – Common words appear in many contexts alongside words that are not semantically similar to them
▪ Solution: Subsampling
  – P(wi): probability of keeping the word wi
  – z(wi): fraction of the total words in the corpus that are that word

McCormick, C. (2017, January 11). Word2Vec Tutorial Part 2 - Negative Sampling. Retrieved from http://www.mccormickml.com
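The keep-probability formula itself is not reproduced on the slide; the sketch below uses the formula from the word2vec C implementation as described in the cited McCormick tutorial, with its default threshold of 0.001.

import math

def keep_probability(z_wi, threshold=1e-3):
    """P(wi): probability of keeping a word whose corpus frequency is z_wi."""
    return (math.sqrt(z_wi / threshold) + 1) * (threshold / z_wi)

print(keep_probability(0.0003))   # ~5.2  -> a rare word is effectively always kept (capped at 1)
print(keep_probability(0.01))     # ~0.42 -> a very frequent word is often dropped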
Efficient training of word2vec

▪ Problem 2: Backpropagation is slow

(Figure: the output layer produces a value for every vocabulary word, while the one-hot targets are 1 only at the context words "related" and "that".)
Efficient training of word2vec

▪ Problem 2: Backpropagation is slow

(Figure: the output error is the difference between each output value and its 0/1 target, e.g. 0.26 - 1 = -0.74 for "related"; normally every output weight is updated from these errors.)
Efficient training of word2vec

▪ Problem 2: Backpropagation is slow
  – Solution: Negative sampling: only update the weights of the positive (context) words and of 5 to 20 sampled negative examples
  – More frequent words are more likely to be selected as negative examples

(Figure: with negative sampling most output errors are zeroed out, so only a few rows of the output weight matrix are updated per training example.)
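A sketch of how negative examples could be drawn; the word counts are invented, and the unigram-count-raised-to-3/4 sampling distribution is the one used in the word2vec paper, so frequent words are more likely to be picked.

import numpy as np

counts = {"the": 5000, "of": 3000, "models": 40, "related": 35, "zebra": 2}
words = list(counts)
weights = np.array([counts[w] for w in words], dtype=float) ** 0.75
probs = weights / weights.sum()          # frequent words get higher probability

rng = np.random.default_rng(0)

def sample_negatives(positive_word, k=5):
    """Draw k negative words, skipping the actual context (positive) word."""
    negatives = []
    while len(negatives) < k:
        w = rng.choice(words, p=probs)
        if w != positive_word:
            negatives.append(str(w))
    return negatives

print(sample_negatives("related"))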
Word2vec properties

(Figures over several slides, from "NIPS 2013 - Tomas Mikolov - Google": examples illustrating properties of the learned vectors.)
Using pretrained word2vec models

▪ Gensim
▪ http://vectors.nlpl.eu/repository/
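A minimal usage sketch with Gensim, assuming the "word2vec-google-news-300" vectors available through gensim's downloader (a model downloaded from http://vectors.nlpl.eu/repository/ could instead be loaded with KeyedVectors.load_word2vec_format):

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")        # returns a KeyedVectors object

print(wv.most_similar("car", topn=3))            # nearest neighbours of a word
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
print(wv.similarity("car", "automobile"))        # cosine similarity of two words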

Other approaches

▪ doc2vec (Le and Mikolov, 2014)
▪ GloVe – used in spaCy
▪ FastText – Facebook (2017)
  – Skip-gram over character n-gram tokens
Current approaches

▪ RNNs, Transformers
  – BERT (Devlin et al., 2018)
▪ Attention
  – GPT-3 (Brown et al., 2020)
Use in recommendation

▪ Sentiment analysis classifier
▪ Semantic relatedness between bags of words or documents
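One simple, illustrative way to score semantic relatedness between two bags of words: average their word vectors and compare with cosine similarity. The sketch assumes a KeyedVectors object wv like the one loaded in the Gensim example above.

import numpy as np

def doc_vector(tokens, wv):
    vecs = [wv[t] for t in tokens if t in wv]    # ignore out-of-vocabulary tokens
    return np.mean(vecs, axis=0)

def relatedness(doc_a, doc_b, wv):
    a, b = doc_vector(doc_a, wv), doc_vector(doc_b, wv)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(relatedness(["great", "movie", "plot"], ["awful", "film", "story"], wv))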
