
19CSE453 – Natural Language Processing

Part of Speech tagging

By
Ms. Kavitha C.R.
Dept. of Computer Science and Engineering

Amrita School of Engineering, Bengaluru


Amrita Vishwa Vidyapeetham
Part-of-Speech (POS) tagging

Parts of Speech: How many?

POS Examples

POS Tagging: Choosing a tag set

UPenn Treebank POS Tag set

Using the UPenn tag set

DT – determiner, JJ – adjective, VBD – verb (past tense), IN – preposition, NN – noun (singular or mass), NNS – noun (plural)
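The slide's tagged example sentence is not preserved in this extract; as an illustration only (an assumed example, not necessarily the one on the slide), a sentence annotated with exactly these tags could read:

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS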

Why is POS tagging hard?

Example from the slide's figure: the same word can be tagged as an adjective, a noun in singular, an adverb, or a verb in base form, depending on context.

Ambiguous word types in the Brown Corpus

How bad is the ambiguity problem?

Deciding the correct POS

NNP – proper noun, singular
PRP – personal pronoun
RB – adverb
TO – to
VBD – verb, past tense
VBG – verb, gerund or present participle
VBN – verb, past participle
VBZ – verb, 3rd person singular present
CD – cardinal number
NN – noun, singular or mass
RP – particle
VB – verb, base form
IN – preposition

Relevant Knowledge for POS tagging

POS Tagging: Two Approaches

Transformation Based Taggers

TBL Tagger - Transformation Based Taggers

MD – modal
VBD – verb, past tense
NN – noun, singular or mass
VBN – verb, past participle

Probabilistic Tagging: Two different families of models

Generative vs Conditional Models

Generative vs Discriminative Models

Probabilistic Tagging

By Bayesian Inference

Further Simplification

Computing The Probability Values

NN – noun, singular or mass
VBZ – verb, 3rd person singular present

Disambiguating “race”

Disambiguating “race”

What is this model?

Hidden Markov Model (HMM)

Markov Chain = First–Order Markov Model

Markov Chain Transition Table

States: Sunny, Rainy, Foggy

Using Markov Chain

Hidden Markov Model (HMM)

• The next level of complexity that can be introduced into a stochastic tagger combines the previous two approaches, using both tag sequence probabilities and word frequency measurements. This is known as the Hidden Markov Model (HMM).
• Before proceeding to what a Hidden Markov Model is, let us first look at what a Markov Model is. That will help clarify the meaning of the term "Hidden" in HMMs.

Example

Say that there are only three kinds of weather conditions, namely
• Rainy
• Sunny
• Cloudy
• How do we predict today's weather based on what the weather has been for the past N days?
• Say we have a sequence, something like this:
• Sunny, Rainy, Cloudy, Cloudy, Sunny, Sunny, Sunny, Rainy
• So, the weather on any given day can be in any one of the three states.

Example

• Let's say we decide to use a Markov Chain Model to solve this problem. Now, using the data that we have, we can construct the following state diagram with the labelled probabilities.
• In order to compute the probability of today's weather given N previous observations, we will use the Markovian property.
A Markov Chain is essentially the simplest Markov model, in that it obeys the Markov property.

The Markov property says that the distribution of a random variable in the future depends solely on its distribution in the current state; none of the previous states has any impact on the future states.
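As a small illustration of the Markov property, here is a minimal Python sketch that scores a weather sequence by multiplying first-order transition probabilities. The transition values are made-up placeholders, not the numbers from the slide's state diagram.

# Markov chain sketch: score a weather sequence using only first-order
# transitions P(next | current). The numbers below are illustrative
# placeholders, not the values from the slide's figure.
transition = {
    "Sunny":  {"Sunny": 0.7, "Rainy": 0.2, "Cloudy": 0.1},
    "Rainy":  {"Sunny": 0.3, "Rainy": 0.5, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.4, "Rainy": 0.3, "Cloudy": 0.3},
}

def sequence_probability(states, start_prob=1.0):
    # P(s1, ..., sn) = P(s1) * product of P(s_t | s_t-1), by the Markov property
    prob = start_prob
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev][curr]
    return prob

print(sequence_probability(["Sunny", "Rainy", "Cloudy", "Cloudy", "Sunny"]))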

Hidden Markov Model

There are two kinds of probabilities in the state diagram.
• One is the emission probabilities, which represent the probabilities of making certain observations given a particular state. For example, we have P(noise | awake) = 0.5. This is an emission probability.
• The other is the transition probabilities, which represent the probability of transitioning to another state given a particular state. For example, we have P(asleep | awake) = 0.4. This is a transition probability.

• We usually observe longer stretches of the child being awake and being asleep. If Peter is awake now, the probability of him staying awake is higher than that of him going to sleep. Hence the 0.6 and 0.4 in the diagram: P(awake | awake) = 0.6 and P(asleep | awake) = 0.4.
• The transition probabilities matrix
• The emission probabilities matrix (a small code sketch of both matrices follows below)

• Before actually trying to solve the problem at hand using HMMs, let’s relate this
model to the task of Part of Speech Tagging
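A minimal sketch of the two matrices for this example in Python. Only P(awake | awake) = 0.6, P(asleep | awake) = 0.4 and P(noise | awake) = 0.5 are stated above; the remaining entries are assumed placeholder values, chosen only so that each row sums to 1.

# Transition probabilities P(next state | current state)
transition_probs = {
    "awake":  {"awake": 0.6, "asleep": 0.4},   # values stated on the slide
    "asleep": {"awake": 0.2, "asleep": 0.8},   # assumed placeholders
}

# Emission probabilities P(observation | state)
emission_probs = {
    "awake":  {"noise": 0.5, "quiet": 0.5},    # P(noise | awake) = 0.5 from the slide
    "asleep": {"noise": 0.1, "quiet": 0.9},    # assumed placeholders
}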

HMMs for Part of Speech Tagging

• We know that to model any problem using a Hidden Markov Model, we need a set of observations and a set of possible states. The states in an HMM are hidden.
• In the part-of-speech tagging problem, the observations are the words themselves in the given sequence.
• As for the states, which are hidden, these would be the POS tags for the words.
• The transition probabilities would be somewhat like P(VP | NP), that is, the probability of the current word having the tag Verb Phrase given that the previous tag was Noun Phrase.
• Emission probabilities would be P(john | NP) or P(will | VP), that is, the probability that the word is, say, John given that the tag is Noun Phrase.
• Note that this is just an informal modeling of the problem, to provide a very basic understanding of how the part-of-speech tagging problem can be modeled using an HMM.

Why do we need an HMM for a POS Tagger?

• If you look closely, the words in a sentence are the observable states (given to us in the data), while their POS tags are the hidden states, and hence we use an HMM for estimating POS tags.
Note: we refer to the observable states as 'Observations' and to the hidden states simply as 'States'.
• A Hidden Markov Model has the following components:

HMM components example

• Q: the set of possible tags.
• A: the A matrix contains the tag transition probabilities P(ti|ti−1), which represent the probability of a tag occurring given the previous tag.
  Example: calculating A[Verb][Noun]:
  P(Noun|Verb) = Count(Verb, Noun) / Count(Verb)
• O: the sequence of observations (the words in the sentence).
• B: the B emission probabilities, P(wi|ti), represent the probability, given a tag (say Verb), that it will be associated with a given word (say Playing).
  The emission probability B[Verb][Playing] is calculated using:
  P(Playing | Verb) = Count(Verb, Playing) / Count(Verb)
• It must be noted that we get all these Count() values from the training corpus itself (a small counting sketch is given below).
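A minimal Python sketch of how these counts could be collected from a tagged corpus and turned into the A and B probabilities by maximum-likelihood estimation. The tiny corpus below is an assumed toy example, not data from the slides.

from collections import defaultdict

# Toy tagged corpus: each sentence is a list of (word, tag) pairs (assumed example data).
corpus = [
    [("john", "Noun"), ("is", "Verb"), ("playing", "Verb")],
    [("mary", "Noun"), ("likes", "Verb"), ("playing", "Verb")],
]

tag_count = defaultdict(int)         # Count(tag)
transition_count = defaultdict(int)  # Count(previous tag, tag)
emission_count = defaultdict(int)    # Count(tag, word)

for sentence in corpus:
    prev_tag = "<s>"                 # sentence-start pseudo-tag
    for word, tag in sentence:
        tag_count[tag] += 1
        transition_count[(prev_tag, tag)] += 1
        emission_count[(tag, word)] += 1
        prev_tag = tag

def transition_prob(prev_tag, tag):
    # A[prev_tag][tag] = Count(prev_tag, tag) / Count(prev_tag)
    return transition_count[(prev_tag, tag)] / tag_count[prev_tag] if tag_count[prev_tag] else 0.0

def emission_prob(tag, word):
    # B[tag][word] = Count(tag, word) / Count(tag)
    return emission_count[(tag, word)] / tag_count[tag] if tag_count[tag] else 0.0

print(transition_prob("Verb", "Verb"))   # P(Verb | Verb) on the toy data: 0.5
print(emission_prob("Verb", "playing"))  # P(playing | Verb) on the toy data: 0.5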

A sample HMM with both the 'A' and 'B' matrices will look like this:

MD – modal (e.g., can, should)
VB – verb, base form
NN – noun, singular or mass

Here, the black continuous arrows represent values of the transition matrix 'A', while the dotted black arrows represent the emission matrix 'B', for a system with Q: {MD, VB, NN}.
Decoding using HMM
• Given as input an HMM (transition matrix, emission matrix) and a sequence of observations O = o1, o2, …, oT (the words in the sentences of a corpus), find the most probable sequence of states Q = q1 q2 q3 … qT (the POS tags, in our case).

The two major assumptions followed while decoding a tag sequence using an HMM (combined into a single scoring formula below):
• The probability of a word appearing depends only on its own tag and is independent of neighboring words and tags.
• The probability of a tag depends only on the previous tag (bigram HMM) rather than on the entire previous tag sequence, i.e. it shows the Markov property. (We can be flexible with this and condition on a longer tag history.)
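Put together, these two assumptions reduce the decoding objective to the standard bigram-HMM form, written here in the same notation as above:

best tag sequence t1 … tn = argmax over t1 … tn of P(t1 … tn | w1 … wn)
                          ≈ argmax over t1 … tn of ∏ (i = 1 … n) P(wi|ti) · P(ti|ti−1)

where P(wi|ti) are the emission probabilities (the B matrix) and P(ti|ti−1) are the tag transition probabilities (the A matrix).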

• So the problem is to find out which sequence of tags should be used for this sentence.
• What is the approach to this problem, given the state transition graph that already exists?
• To simplify, first find out all the possible part-of-speech tags each word can take.
• As already discussed, some words can take multiple part-of-speech tags, but no word goes beyond about 7 tags, and most ambiguous words have only 2 or 3 part-of-speech tags.
• First of all, we need to set up a probability matrix called a lattice, where the columns are our observables (the words of the sentence, in the same order as in the sentence) and the rows are the hidden states (all the possible POS tags, which are known).

• There are many possibilities, and each word can take any of its candidate tags.
• So there are 32 possible part-of-speech tag sequences here (for example, five words with two candidate tags each already gives 2^5 = 32 sequences).
• Among these, we have to find out which is the most likely sequence of part-of-speech tags.
• One option is to enumerate all the possibilities and compute the probability of each one separately: for 32 possibilities, compute 32 different probabilities.
• That is one possibility, but it is not very efficient; think of sentences with, say, 15–20 words. The number of sequences grows exponentially with the number of words.
So this is not a good solution.

• So we need a solution whose cost does not grow exponentially with the length of the sentence. What would be a good algorithm for doing that?
• Ideally, we want to come up with the actual part-of-speech tag sequence, which here is VBD TO VB DT NN, out of all the possibilities.

• This is exactly what the Viterbi algorithm does.

Viterbi Decoding for HMM
• The decoding algorithm used for HMMs is called the Viterbi algorithm, devised by Andrew Viterbi, a co-founder of Qualcomm, an American MNC.
• The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMMs).
• Next, let us see how to compute these probabilities using the HMM.
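Concretely, for the bigram HMM above, the dynamic program fills a lattice of values v_t(j), the probability of the best tag sequence that ends in tag tj at word position t (the standard Viterbi recurrence, stated in the slides' notation):

v_1(j) = P(start tag = tj) · P(w1|tj)
v_t(j) = max over i of [ v_(t−1)(i) · P(tj|ti) · P(wt|tj) ], for t = 2 … T

The best final tag is argmax over j of v_T(j), and the full tag sequence is recovered by following back-pointers from it. A code sketch of this recurrence, applied to the example that follows, is given after the example's probability tables.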

Finding the best path: Viterbi Algorithm

Example
Suppose you want to use an HMM tagger to tag the phrase "the light book", where we have the following probabilities:
P(the|Det) = 0.3, P(the|Noun) = 0.1, P(light|Noun) = 0.003, P(light|Adj) = 0.2, P(light|Verb) = 0.06, P(book|Noun) = 0.03, P(book|Verb) = 0.001
P(Verb|Det) = 0.00001, P(Noun|Det) = 0.5, P(Adj|Det) = 0.3, P(Noun|Noun) = 0.2, P(Adj|Noun) = 0.002, P(Noun|Adj) = 0.2, P(Noun|Verb) = 0.3, P(Verb|Noun) = 0.3, P(Verb|Adj) = 0.001, P(Verb|Verb) = 0.1
Work out in detail the steps of the Viterbi algorithm. You can use a table to show the steps. Assume all other conditional probabilities, not mentioned above, to be zero. Also, assume that all tags have the same probability of appearing at the beginning of a sentence.
Example

Tag transition probability matrix P(ti|ti−1) (row = previous tag, column = current tag):

          Det    Noun   Adj    Verb
  Det     0      0.5    0.3    0.00001
  Noun    0      0.2    0.002  0.3
  Adj     0      0.2    0      0.001
  Verb    0      0.3    0      0.1

Word emission probability matrix P(wi|ti):

          the    light  book
  Det     0.3    0      0
  Noun    0.1    0.003  0.03
  Adj     0      0.2    0
  Verb    0      0.06   0.001
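A short Python sketch (one possible implementation, not the exact table layout worked out on the slides) that runs the Viterbi recurrence above on "the light book" with these numbers. With a uniform start probability of 0.25 for every tag, it recovers the path Det, Adj, Noun.

# Viterbi decoding for the "the light book" example. Probabilities not listed
# in the tables above are taken as zero; every tag gets the same start probability.
tags = ["Det", "Noun", "Adj", "Verb"]
words = ["the", "light", "book"]

trans = {   # trans[prev][curr] = P(curr | prev)
    "Det":  {"Noun": 0.5, "Adj": 0.3, "Verb": 0.00001},
    "Noun": {"Noun": 0.2, "Adj": 0.002, "Verb": 0.3},
    "Adj":  {"Noun": 0.2, "Verb": 0.001},
    "Verb": {"Noun": 0.3, "Verb": 0.1},
}
emit = {    # emit[tag][word] = P(word | tag)
    "Det":  {"the": 0.3},
    "Noun": {"the": 0.1, "light": 0.003, "book": 0.03},
    "Adj":  {"light": 0.2},
    "Verb": {"light": 0.06, "book": 0.001},
}

def viterbi(words, tags, trans, emit, start_prob=0.25):
    # v[t][tag] = probability of the best tag sequence ending in `tag` at word position t
    v = [{t: start_prob * emit[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            # choose the best previous tag for reaching tag t at word i
            best_prev, best_score = None, 0.0
            for p in tags:
                score = v[i - 1][p] * trans[p].get(t, 0.0)
                if score > best_score:
                    best_prev, best_score = p, score
            v[i][t] = best_score * emit[t].get(words[i], 0.0)
            back[i][t] = best_prev
    # follow back-pointers from the best final tag
    last = max(v[-1], key=v[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path)), v[-1][last]

path, prob = viterbi(words, tags, trans, emit)
print(path, prob)   # ['Det', 'Adj', 'Noun'] with probability 2.7e-05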

• The most suitable POS tagging for the given phrase is Determiner, Adjective and Noun, respectively.

P(tj|ti) = C(ti, tj) / C(ti)   (maximum likelihood estimate, MLE)
Exercise

• Suppose you want to use an HMM tagger to tag the phrase "the beautiful lady", where we have the following probabilities:
• P(the|Det) = 0.3, P(the|Noun) = 0.1, P(beautiful|Noun) = 0.01, P(beautiful|Adj) = 0.07, P(beautiful|Verb) = 0.001, P(lady|Noun) = 0.08, P(lady|Verb) = 0.01
• P(Verb|Det) = 0.00001, P(Noun|Det) = 0.5, P(Adj|Det) = 0.4
• P(Noun|Noun) = 0.2, P(Adj|Noun) = 0.002, P(Noun|Adj) = 0.2
• P(Noun|Verb) = 0.3, P(Verb|Noun) = 0.3, P(Verb|Adj) = 0.001
• P(Verb|Verb) = 0.1
• Work out in detail the steps of the Viterbi algorithm. Assume all other conditional probabilities, not mentioned above, to be zero. Also, assume that all tags have the same probability of appearing at the beginning of a sentence.

Thank You
