Word2Vec
Word embeddings
First, an introduction to autoencoders
Figure: neuron unit (taken from https://www.jeremyjordan.me/autoencoders/)
Figure taken from https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html
Word2vec training - predict the context of a word
▪ Skip-gram
– Given a word, predict the probability that other words appear in its context
Word2vec is a group of related models that are used to produce word embeddings.
▪ CBOW
– Continuous bag of words: given the words in a context window, predict the probability of the center word (both objectives are sketched in code below)
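Both objectives can be tried directly with gensim; a minimal sketch, assuming gensim 4.x and a made-up toy corpus (not part of the slides):

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
corpus = [
    ["word2vec", "is", "a", "group", "of", "related", "models"],
    ["the", "models", "produce", "word", "embeddings"],
]

# sg=1 selects the skip-gram objective (predict context words from the center word)
skipgram = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

# sg=0 selects CBOW (predict the center word from its context)
cbow = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=0)

print(skipgram.wv["models"][:5])  # first dimensions of the learned vector for "models"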
One-hot encoding
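A minimal sketch of one-hot encoding, assuming a small toy vocabulary (the words mirror the example used in the following slides):

import numpy as np

vocab = ["word2vec", "is", "a", "group", "of", "related", "models", "that"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A |V|-dimensional vector with a single 1 at the word's index
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

print(one_hot("models"))  # [0. 0. 0. 0. 0. 0. 1. 0.]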
Skip-gram model
▪ Training input
– One-hot encoding of word wi
▪ Training output
– C one-hot encodings of the words within a window of size C around wi (its context); a pair-generation sketch follows the figure below
Figure: skip-gram on a toy example; the one-hot vector of the center word wi is the network input, and the one-hot vectors of context words such as "related", "models", and "that" are the outputs.
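A sketch of how (input, output) pairs could be generated for skip-gram with a context window of size C; the helper name and toy sentence are assumptions for illustration, and in training each word would be replaced by its one-hot vector:

def skipgram_pairs(tokens, C=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - C), min(len(tokens), i + C + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))  # (w_i, one word from its context)
    return pairs

sentence = ["word2vec", "is", "a", "group", "of", "related", "models"]
print(skipgram_pairs(sentence, C=2)[:4])
# [('word2vec', 'is'), ('word2vec', 'a'), ('is', 'word2vec'), ('is', 'a')]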
CBOW model
▪ Training input
– C one-hot encodings of the words within a window of size C around wi (its context)
▪ Training output
– One-hot encoding of word wi; example generation is sketched after the figure below
Figure: CBOW on the same toy example; the one-hot vectors of the context words (e.g. "related", "models", "that") are the network inputs, and the one-hot vector of the center word is the output.
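The same idea for CBOW, where each training example pairs the C context words with the center word (again a hypothetical helper for illustration):

def cbow_examples(tokens, C=2):
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - C), min(len(tokens), i + C + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))  # (context words, w_i)
    return examples

sentence = ["word2vec", "is", "a", "group", "of", "related", "models"]
print(cbow_examples(sentence, C=2)[2])
# (['word2vec', 'is', 'group', 'of'], 'a')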
Word2Vec
▪ After training
Figure: the trained network, shown with the same one-hot example ("models", "related", "that").
Word2Vec
Figure: the one-hot vector of "models" multiplied by the trained weight matrix (toy values such as 0.39 0.74 0.46 / 0.71 0.32 0.87 / 0.23 0.80 0.85 / ...) selects a single row of that matrix, which is used as the word's embedding; a numpy sketch follows.
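A minimal numpy sketch of that lookup; the matrix values and the index assumed for "models" are only loosely based on the toy numbers in the figure:

import numpy as np

# Trained |V| x d input weight matrix (toy values)
W = np.array([[0.39, 0.74, 0.46],
              [0.71, 0.32, 0.87],
              [0.23, 0.80, 0.85],
              [0.42, 0.38, 0.72],
              [0.94, 0.64, 0.68],
              [0.76, 0.24, 0.83],
              [0.41, 0.99, 0.12]])

one_hot_models = np.array([0, 0, 1, 0, 0, 0, 0])  # "models" assumed at index 2

# Multiplying a one-hot vector by W simply selects one row of W:
embedding = one_hot_models @ W
print(embedding)  # [0.23 0.8  0.85] -> used as the vector for "models"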
Efficient training of word2vec
▪ Problem 1:
– Words that are very frequent in the corpus are presented to training too often, and other words too rarely
– Common words appear in many contexts next to words that are not semantically similar
▪ Solution: subsampling
– P(wi): probability of keeping word wi (a sketch of the formula follows below)
– z(wi): fraction of all tokens in the corpus that are word wi
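The slide gives the formula only as a figure; a sketch of the keep probability in the form used by the original word2vec implementation (default threshold t = 0.001):

import math

def keep_probability(z, t=0.001):
    # z: fraction of all corpus tokens that are this word; t: subsampling threshold
    # P(wi) = (sqrt(z/t) + 1) * (t / z), capped at 1 in practice
    return min(1.0, (math.sqrt(z / t) + 1) * (t / z))

print(keep_probability(0.001))  # 1.0 -> words at or below the threshold are always kept
print(keep_probability(0.01))   # ~0.42 -> very frequent words are often discarded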
Word2vec properties
Figures taken from the NIPS 2013 presentation by Tomas Mikolov (Google).
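One widely reported property is that analogies correspond to roughly linear relations between vectors; a minimal sketch with gensim and a pretrained model from gensim's data catalog (the model name is just one example, and it is a large download on first use):

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# vector("king") - vector("man") + vector("woman") is closest to vector("queen")
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))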
Using pretrained word2vec models
▪ Gensim
▪ http://vectors.nlpl.eu/repository/
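A minimal loading sketch with gensim; the file name is a placeholder for whatever model file is extracted from an archive downloaded from the repository above (the NLPL models are distributed in word2vec format):

from gensim.models import KeyedVectors

# Placeholder path: point it at the extracted model file
wv = KeyedVectors.load_word2vec_format("model.bin", binary=True)

print(wv.most_similar("computer", topn=3))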
Other approaches
▪ GloVe – used by spaCy
▪ FastText – Facebook (2017)
– Skip-gram trained over character n-grams, so vectors can be composed for out-of-vocabulary words (see the sketch below)
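FastText's character n-gram idea can also be tried through gensim; a minimal sketch with an assumed toy corpus:

from gensim.models import FastText

corpus = [["word2vec", "produces", "word", "embeddings"],
          ["fasttext", "adds", "character", "ngrams"]]

# sg=1 -> skip-gram objective; min_n/max_n set the character n-gram lengths
model = FastText(sentences=corpus, vector_size=50, window=2, min_count=1,
                 sg=1, min_n=3, max_n=6)

# Because vectors are composed from character n-grams, even an unseen word gets a vector
print(model.wv["embedding"][:5])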
Current approach
▪ RNNs, Transformers
– BERT (Devlin et al., 2018)
▪ Attention
– GPT-3 (Brown et al., 2020)
Use in recommendation systems