19CSE453 - Natural Language Processing: Part of Speech Tagging
By
Ms. Kavitha C.R.
Dept. of Computer Science and Engineering
Parts of Speech: How many?
POS Examples
POS Tagging: Choosing a tag set
UPenn Treebank POS Tag set
Using the UPenn tag set
Why is POS tagging hard?
For example, the same word form can occur as an adjective, a singular noun, or an adverb, depending on context.
Ambiguous word types in the Brown Corpus
How bad is the ambiguity problem?
Deciding the correct POS
RP – particle
VB – verb, base form
IN – preposition
Relevant Knowledge for POS tagging
POS Tagging: Two Approaches
TBL Tagger – Transformation-Based Learning Tagger
MD – modal
VBD – verb, past tense
NN – noun, singular
VBN – verb, past participle
Probabilistic Tagging: Two different families of models
Generative vs Conditional Models
Generative vs Discriminative Models
Probabilistic Tagging by Bayesian Inference
Further Simplification
Computing The Probability Values
Disambiguating “race”
What is this model?
Hidden Markov Model (HMM)
Markov Chain = First-Order Markov Model
Markov Chain Transition Table (states: Sunny, Rainy, Foggy)
Using Markov Chain
Hidden Markov Model (HMM)
Example
Say that there are only three kinds of weather conditions, namely
• Rainy
• Sunny
• Cloudy
• How do we make a prediction of the weather for today based on what the weather has been for the past N days?
• Say we have a sequence, something like this:
• Sunny, Rainy, Cloudy, Cloudy, Sunny, Sunny, Sunny, Rainy
• So, the weather for any given day can be in any of the three states.
Example
• Let's say we decide to use a Markov chain model to solve this problem. Using the data that we have, we can construct the following state diagram with the labelled probabilities.
The Markov property says that the distribution of a random variable in the future depends only on its distribution in the current state; none of the previous states have any impact on the future states.
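As a quick illustration, here is a minimal Python sketch of such a Markov chain for the weather example. The transition probabilities below are illustrative placeholders, not the values from the slide's state diagram.

```python
# A minimal Markov chain for the weather example.
# NOTE: the transition probabilities are illustrative placeholders,
# not the values shown in the slide's state diagram.
import random

states = ["Sunny", "Rainy", "Cloudy"]

# transition[s1][s2] = P(tomorrow = s2 | today = s1); each row sums to 1
transition = {
    "Sunny":  {"Sunny": 0.6, "Rainy": 0.1, "Cloudy": 0.3},
    "Rainy":  {"Sunny": 0.2, "Rainy": 0.5, "Cloudy": 0.3},
    "Cloudy": {"Sunny": 0.3, "Rainy": 0.3, "Cloudy": 0.4},
}

def next_state(current: str) -> str:
    """Sample tomorrow's weather given only today's weather (Markov property)."""
    choices, probs = zip(*transition[current].items())
    return random.choices(choices, weights=probs, k=1)[0]

# Generate a short weather sequence starting from a sunny day.
day, sequence = "Sunny", ["Sunny"]
for _ in range(7):
    day = next_state(day)
    sequence.append(day)
print(sequence)
```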
Hidden Markov Model
• We usually observe longer stretches of the child being awake or asleep. If Peter is awake now, the probability of him staying awake is higher than that of him going to sleep; hence the 0.6 and 0.4 in the diagram: P(awake | awake) = 0.6 and P(asleep | awake) = 0.4.
• These values form the transition probability matrix (a small sketch of it follows below).
• Before actually trying to solve the problem at hand using HMMs, let's relate this model to the task of Part of Speech Tagging.
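A small code sketch of what that transition probability matrix could look like. Only P(awake | awake) = 0.6 and P(asleep | awake) = 0.4 come from the text above; the 'asleep' row is an assumed placeholder.

```python
# Transition probability matrix for the awake/asleep example.
# P(awake | awake) = 0.6 and P(asleep | awake) = 0.4 come from the slide;
# the 'asleep' row is NOT given above and holds placeholder values.
transition = {
    "awake":  {"awake": 0.6, "asleep": 0.4},
    "asleep": {"awake": 0.3, "asleep": 0.7},  # assumed values, for illustration only
}

# Each row is a probability distribution over the next state, so it must sum to 1.
for state, row in transition.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, state
```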
HMMs for Part of Speech Tagging
• We know that to model any problem using a Hidden Markov Model we need a set
of observations and a set of possible states. The states in an HMM are hidden.
• In the part of speech tagging problem, the observations are the words themselves
in the given sequence.
• As for the states, which are hidden, these would be the POS tags for the words.
• The transition probabilities would be something like P(VP | NP), that is, the probability that the current word has the tag Verb Phrase given that the previous tag was a Noun Phrase.
• Emission probabilities would be something like P(john | NP) or P(will | VP), that is, the probability that the word is, say, John given that the tag is a Noun Phrase.
• Note that this is just an informal modelling of the problem to provide a very basic understanding of how the Part of Speech tagging problem can be modelled using an HMM.
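A minimal sketch of the two probability tables this informal model needs. The tag names follow the slide's usage (NP, VP) and every numeric value is hypothetical.

```python
# Informal sketch of the two probability tables used by an HMM POS tagger.
# Tag names (NP, VP) follow the slide's informal usage; all numbers are
# hypothetical and only illustrate the shape of the tables.

# Transition probabilities: P(current tag | previous tag)
transition = {
    ("NP", "VP"): 0.4,   # P(VP | NP): a verb tag often follows a noun tag
    ("NP", "NP"): 0.2,   # P(NP | NP)
}

# Emission probabilities: P(word | tag)
emission = {
    ("NP", "john"): 0.01,  # P(john | NP)
    ("VP", "will"): 0.02,  # P(will | VP)
}

print(transition[("NP", "VP")], emission[("NP", "john")])
```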
Why do we need an HMM for a POS Tagger?
• If you look closely, the words in a sentence are the observable states (given to us in the data), while their POS tags are the hidden states; hence we use an HMM for estimating POS tags.
Note: we call the observable states 'observations' and the hidden states simply 'states'.
• A Hidden Markov Model has the following components:
• Q = q1, q2, …, qN: a set of N hidden states (the POS tags in our case)
• A: a transition probability matrix, where each entry gives P(current state | previous state)
• O = o1, o2, …, oT: a sequence of T observations (the words)
• B: emission probabilities, each entry giving P(observation | state)
• π: an initial probability distribution over the states
HMM components example
A sample HMM with both the 'A' and 'B' matrices will look like this:
Here, the solid black arrows represent values of the transition matrix 'A', while the dotted black arrows represent the emission matrix 'B', for a system with Q = {MD, VB, NN}.
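A rough code sketch of what the 'A' and 'B' matrices for Q = {MD, VB, NN} could look like. All probability values (and the example words) are placeholders standing in for the numbers shown in the figure.

```python
# Sketch of the 'A' (transition) and 'B' (emission) matrices for a tiny
# system with Q = {MD, VB, NN}. All values are placeholders, not the
# numbers from the slide's figure.
Q = ["MD", "VB", "NN"]

A = {  # A[prev][cur] = P(current tag | previous tag)
    "MD": {"MD": 0.05, "VB": 0.80, "NN": 0.15},
    "VB": {"MD": 0.10, "VB": 0.10, "NN": 0.80},
    "NN": {"MD": 0.30, "VB": 0.40, "NN": 0.30},
}

B = {  # B[tag][word] = P(word | tag)
    "MD": {"will": 0.8, "can": 0.2},
    "VB": {"back": 0.6, "will": 0.1, "bill": 0.3},
    "NN": {"bill": 0.5, "back": 0.3, "will": 0.2},
}
```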
Decoding using HMM
• Given an HMM as input (a transition matrix and an emission matrix) and a sequence of observations O = o1, o2, …, oT (the words in a sentence of a corpus), find the most probable sequence of states Q = q1 q2 q3 … qT (the POS tags, in our case).
The two major assumptions followed while decoding a tag sequence using an HMM are:
• The probability of a word appearing depends only on its own tag and is independent of neighbouring words and tags.
• The probability of a tag depends only on the previous tag (bigram HMM) rather than on the entire previous tag sequence, i.e. it shows the Markov property (though we can be flexible with this). Together, these two assumptions give the decoding objective sketched below.
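Under these two assumptions, the decoding objective takes the standard bigram-HMM form (a sketch in LaTeX notation):

```latex
% Most probable tag sequence for words w_1 ... w_n, given the two assumptions above:
% each word depends only on its own tag, each tag only on the previous tag.
\hat{t}_1^{\,n} = \operatorname*{argmax}_{t_1^{\,n}}
  \prod_{i=1}^{n}
  \underbrace{P(w_i \mid t_i)}_{\text{emission}}\;
  \underbrace{P(t_i \mid t_{i-1})}_{\text{transition}}
```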
• So the problem is to find out which sequence of tags would be used for this sentence, given the state transition graph that already exists.
• To simplify the approach, first find out all the possible part-of-speech tags a word can take.
• As already discussed, some words can take multiple part-of-speech tags, but no word goes beyond 7 part-of-speech tags, and most ambiguous words have only 2 or 3 possible tags.
• First of all, we need to set up a probability matrix called the lattice, where the columns are our observables (the words of the sentence, in the same order as in the sentence) and the rows are the hidden states (all possible POS tags, which are known); a small sketch of such a lattice follows below.
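A minimal Python sketch of such a lattice, using the later example phrase "the light book" and a small hypothetical tag set just to show its shape:

```python
# A minimal sketch of the lattice (trellis): rows are possible POS tags,
# columns are the words of the sentence in order. Each cell will hold the
# best path probability for ending in that tag at that position.
import numpy as np

words = ["the", "light", "book"]        # columns (observations)
tags = ["Det", "Noun", "Adj", "Verb"]   # rows (hidden states)

lattice = np.zeros((len(tags), len(words)))  # to be filled by the Viterbi algorithm
backpointer = np.zeros((len(tags), len(words)), dtype=int)
print(lattice.shape)  # (4, 3)
```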
• There are many possibilities: each word can take any of its possible tags.
• For this sentence, that gives 32 possible part-of-speech tag sequences.
• Among these, we have to find out which is the most likely sequence of part-of-speech tags being used.
• One option is to enumerate all the possibilities and compute the probability of each one separately; for 32 possibilities, compute 32 different probabilities.
• That is one possibility, but it is not very efficient: think of sentences that might have, say, 15-20 words. The number of sequences grows exponentially with the number of words.
So this is not a good solution.
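A quick check of why enumeration blows up: if each of the n words has roughly k candidate tags, there are k^n candidate tag sequences to score (k = 2 is assumed below purely for illustration).

```python
# Number of candidate tag sequences grows exponentially with sentence length.
# Assuming ~2 candidate tags per word, purely for illustration:
for n in [5, 10, 15, 20]:
    print(n, 2 ** n)   # 32, 1024, 32768, 1048576
```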
• So we need a solution that does not grow exponentially with the length of the sentence. What would be a good algorithm for doing that?
• Ideally, we want to come up with the actual part-of-speech tag sequence, which here would be VBD TO VB DT NN, out of all the possibilities.
Viterbi Decoding for HMM
• The decoding algorithm used for HMMs is called the Viterbi algorithm, named after Andrew Viterbi, a co-founder of Qualcomm, an American multinational company.
Finding the best path: Viterbi Algorithm
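A compact Python sketch of Viterbi decoding for a bigram HMM tagger, following the assumptions above. The inputs pi (initial tag probabilities), A (transition probabilities) and B (emission probabilities) are dictionaries; the names are illustrative, not taken from the slides.

```python
# Viterbi decoding for a bigram HMM tagger (sketch).
# pi[t]   = P(t starts the sentence)
# A[p][t] = P(tag t | previous tag p)
# B[t][w] = P(word w | tag t)
def viterbi(words, tags, pi, A, B):
    # V[i][t] = best probability of any tag path for words[:i+1] ending in tag t
    V = [{t: pi.get(t, 0.0) * B.get(t, {}).get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            # pick the previous tag that maximises path probability * transition probability
            best_prev = max(tags, key=lambda p: V[i - 1][p] * A.get(p, {}).get(t, 0.0))
            V[i][t] = (V[i - 1][best_prev]
                       * A.get(best_prev, {}).get(t, 0.0)
                       * B.get(t, {}).get(words[i], 0.0))
            back[i][t] = best_prev
    # backtrack from the best final tag
    best_last = max(tags, key=lambda t: V[-1][t])
    path = [best_last]
    for i in range(len(words) - 1, 0, -1):
        path.insert(0, back[i][path[0]])
    return path, V[-1][best_last]
```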
Example
Suppose you want to use an HMM tagger to tag the phrase "the light book",
where we have the following probabilities:
P(the|Det) = 0.3, P(the|Noun) = 0.1, P(light|Noun) = 0.003, P(light|Adj) = 0.2,
P(light|Verb) = 0.06, P(book|Noun) = 0.03, P(book|Verb) = 0.001
• The most suitable POS tags for the given phrase are Determiner, Adjective and Noun, respectively.
• P(tj | ti) = C(ti, tj) / C(ti), the maximum likelihood estimate (MLE)
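For instance, with made-up counts from a tagged training corpus, the estimate is just a ratio of counts:

```python
# MLE estimate of a transition probability from hypothetical corpus counts:
# P(Noun | Det) = C(Det, Noun) / C(Det)
count_det = 1000        # hypothetical C(Det): how often the tag Det occurs
count_det_noun = 700    # hypothetical C(Det, Noun): Det immediately followed by Noun
print(count_det_noun / count_det)   # 0.7
```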
Exercise
• Suppose you want to use an HMM tagger to tag the phrase "the beautiful lady", where
we have the following probabilities:
• P(the|Det) = 0.3, P(the|Noun) = 0.1, P(beautiful|Noun) = 0.01, P(beautiful|Adj) =0.07,
P(beautiful|Verb) = 0.001, P(lady|Noun) = 0.08, P(lady|Verb) = 0.01 ,
• P(Verb|Det) = 0.00001, P(Noun|Det) = 0.5, P(Adj|Det) = 0.4,
• P(Noun|Noun) =0.2, P(Adj|Noun) = 0.002, P(Noun|Adj) = 0.2,
• P(Noun|Verb) = 0.3, P(Verb|Noun) = 0.3, P(Verb|Adj) = 0.001,
• P(Verb|Verb) = 0.1
• Work out in detail the steps of the Viterbi algorithm. Assume all other conditional
probabilities not mentioned are zero. Also assume that all tags have the same
probability of appearing at the beginning of a sentence.
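One way to check the hand-worked steps is to reuse the viterbi() sketch from the Viterbi section above with the exercise's numbers. Probabilities not listed default to zero, and the initial distribution is uniform, as the exercise states.

```python
# Check of the exercise using the viterbi() sketch defined earlier.
# All probabilities below are taken from the exercise statement; anything
# not mentioned there is left out and therefore treated as zero.
words = ["the", "beautiful", "lady"]
tags = ["Det", "Noun", "Adj", "Verb"]
pi = {t: 0.25 for t in tags}                      # uniform start, as stated
A = {"Det":  {"Verb": 0.00001, "Noun": 0.5, "Adj": 0.4},
     "Noun": {"Noun": 0.2, "Adj": 0.002, "Verb": 0.3},
     "Adj":  {"Noun": 0.2, "Verb": 0.001},
     "Verb": {"Noun": 0.3, "Verb": 0.1}}
B = {"Det":  {"the": 0.3},
     "Noun": {"the": 0.1, "beautiful": 0.01, "lady": 0.08},
     "Adj":  {"beautiful": 0.07},
     "Verb": {"beautiful": 0.001, "lady": 0.01}}

path, prob = viterbi(words, tags, pi, A, B)
print(path, prob)   # should agree with your hand computation
```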
Thank You