
Markov Chains

◮ Let Xn , n = 0, 1, · · · be a sequence of discrete random
  variables taking values in S.
  Note that S must be countable
◮ We say it is a Markov chain if, ∀n and all states xi ,

  P [Xn+1 = xn+1 |Xn = xn , Xn−1 = xn−1 , · · · , X0 = x0 ] = P [Xn+1 = xn+1 |Xn = xn ]

◮ We can write it as

fXn+1 |Xn Xn−1 ···X1 = fXn+1 |Xn


◮ Conditioned on Xn , Xn+1 is independent of
Xn−1 , Xn−2 , · · ·
◮ We think of Xn as the state at time n
◮ For a Markov chain, given the current state, the future
evolution is independent of the history of how you
reached the current state
P S Sastry, IISc, E1 222, Lecture 19, 2020
Example

◮ Let Xi be iid discrete random variables taking integer values.

  Let Y0 = 0 and Yn = X1 + · · · + Xn

◮ Yn , n = 0, 1, · · · is a Markov chain with state space as


integers
◮ Note that Yn+1 = Yn + Xn+1 and Xn+1 is independent of
Y0 , · · · , Yn .

P [Yn+1 = y|Yn = x, Yn−1 , · · · ] = P [Xn+1 = y − x]

◮ Thus, Yn+1 is conditionally independent of Yn−1 , · · ·


conditioned on Yn
◮ Sum of iid random variables is a Markov chain
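The construction can be made concrete with a quick simulation; this is a minimal sketch, assuming increments Xi uniform on {−1, 0, 1} (the increment distribution is an arbitrary illustrative choice):

```python
import random

# Sketch: build the chain Y_n = X_1 + ... + X_n from iid increments.
# The increment set {-1, 0, 1} is an illustrative assumption.
random.seed(0)

def simulate_Y(n_steps):
    y, path = 0, [0]                     # Y_0 = 0
    for _ in range(n_steps):
        y += random.choice([-1, 0, 1])   # Y_{n+1} = Y_n + X_{n+1}
        path.append(y)
    return path

path = simulate_Y(10)
# each step changes the state by one of the allowed increments
assert all(abs(b - a) <= 1 for a, b in zip(path, path[1:]))
assert path[0] == 0
```

The Markov property is visible in the update line: the next state is computed from the current state and a fresh, independent increment only.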



◮ In this example, we can think of Xn as the number of
people or things arriving at a facility in the nth time
interval.
◮ Then Yn would be total arrivals till end of nth time
interval.
◮ The number of packets coming into a network switch, or the
  number of people joining the queue in a bank, can be
  modeled as a Markov chain.
  In such applications, it is called a queueing chain.
◮ Markov chain is a useful model for many dynamic systems
or processes



◮ The Markov property is: given current state, the future
evolution is independent of the history of how we came to
current state.
◮ It essentially means the current state contains all needed
information about history
◮ We are considering the case where states as well as time
are discrete.
◮ It can be more general



Transition Probabilities
◮ Let {Xn , n = 0, 1, · · · } be a Markov Chain with
(countable) state space S

P r[Xn+1 = xn+1 |Xn = xn , Xn−1 = xn−1 , · · · , X0 = x0 ] = P r[Xn+1 = xn+1 |Xn = xn ], ∀xn+1 , xn , · · · , x0

(Note the change of notation: we write P r[A] for the probability of A)


◮ Define function P : S × S → [0, 1] by

P (x, y) = P r[Xn+1 = y|Xn = x]

◮ P is called the state transition probability function. It
  satisfies
  ◮ P (x, y) ≥ 0, ∀x, y ∈ S
  ◮ Σ_{y∈S} P (x, y) = 1, ∀x ∈ S

◮ If S is finite then P can be represented as a matrix


◮ The state transition probability function is given by

P (x, y) = P r[Xn+1 = y|Xn = x]

◮ In general, this can depend on n though our notation


does not show it
◮ If the value of that probability does not depend on n then
the chain is called homogeneous
◮ For a homogeneous chain we have

P r[Xn+1 = y|Xn = x] = P r[X1 = y|X0 = x], ∀n

◮ In this course we will consider only homogeneous chains



Initial State Probabilities

◮ Let {Xn } be a Markov Chain with state space S


◮ Define function π0 : S → [0, 1] by

π0 (x) = P r[X0 = x]

◮ It is the pmf of the rv X0


◮ Hence it satisfies
  ◮ π0 (x) ≥ 0, ∀x ∈ S
  ◮ Σ_{x∈S} π0 (x) = 1

◮ From now on, without loss of generality, we take


S = {0, 1, · · · }



◮ Let Xn be a (homogeneous) Markov chain
◮ Then we have

P r[X0 = x0 , X1 = x1 ] = P r[X1 = x1 |X0 = x0 ] P r[X0 = x0 ], ∀x0 , x1


= P (x0 , x1 )π0 (x0 ) = π0 (x0 )P (x0 , x1 )

◮ Now we can extend this as

P r[X0 = x0 , X1 = x1 , X2 = x2 ] = P r[X2 = x2 |X1 = x1 , X0 = x0 ] P r[X0 = x0 , X1 = x1 ]
                                  = P r[X2 = x2 |X1 = x1 ] P r[X0 = x0 , X1 = x1 ]
                                  = P (x1 , x2 ) P (x0 , x1 )π0 (x0 )
                                  = π0 (x0 )P (x0 , x1 )P (x1 , x2 )



◮ This calculation is easily generalized to any number of
time steps

P r[X0 = x0 , · · · , Xn = xn ]
    = P r[Xn = xn |Xn−1 = xn−1 , · · · , X0 = x0 ] P r[Xn−1 = xn−1 , · · · , X0 = x0 ]
    = P r[Xn = xn |Xn−1 = xn−1 ] P r[Xn−1 = xn−1 , · · · , X0 = x0 ]
    = P (xn−1 , xn ) P r[Xn−1 = xn−1 , · · · , X0 = x0 ]
    = P (xn−1 , xn ) P r[Xn−1 = xn−1 |Xn−2 = xn−2 ] P r[Xn−2 = xn−2 , · · · , X0 = x0 ]
    ...
    = π0 (x0 )P (x0 , x1 ) · · · P (xn−1 , xn )



◮ We showed

P r[X0 = x0 , · · · Xn = xn ] = π0 (x0 )P (x0 , x1 ) · · · P (xn−1 , xn )

◮ This shows that the transition probabilities, P , and initial


state probabilities, π0 , completely specify the chain.
◮ They give us the joint distribution of any finite
subcollection of the rv’s
◮ Suppose you want joint distribution of Xi1 , · · · Xik
◮ Let m = max(i1 , · · · , ik )
◮ We know how to get joint distribution of X0 , · · · , Xm .
◮ The joint distribution of Xi1 , · · · Xik is now calculated as
a marginal distribution from the joint distribution of
X0 , · · · , Xm
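The product formula for path probabilities can be checked numerically; a minimal sketch, assuming a 2-state chain with illustrative values for π0 and P:

```python
import numpy as np

# Sketch: Pr[X_0=x_0,...,X_n=x_n] = pi0(x0) P(x0,x1) ... P(x_{n-1},x_n).
# The transition matrix and initial distribution below are illustrative.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi0 = np.array([0.5, 0.5])

def path_prob(path):
    """Probability of observing the given state sequence."""
    p = pi0[path[0]]
    for x, y in zip(path, path[1:]):
        p *= P[x, y]
    return p

# brute-force check: probabilities of all length-3 paths sum to 1
total = sum(path_prob((a, b, c))
            for a in range(2) for b in range(2) for c in range(2))
assert abs(total - 1.0) < 1e-12
```

Summing `path_prob` over a subset of paths is exactly how marginal distributions of any sub-collection of the Xi are obtained.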



Example: 2-state chain
◮ Let S = {0, 1}.
◮ We can write the transition probabilities as a matrix

      P = [ P (0, 0)  P (0, 1) ] = [ 1−p    p  ]
          [ P (1, 0)  P (1, 1) ]   [  q    1−q ]

◮ Now we can calculate the joint distribution, e.g., of
  X1 , X2 as

  P r[X1 = 0, X2 = 1] = Σ_{x=0}^{1} P r[X0 = x, X1 = 0, X2 = 1]
                      = Σ_{x=0}^{1} π0 (x)P (x, 0)P (0, 1)
                      = π0 (0)(1 − p)p + π0 (1)qp



◮ We can similarly calculate probabilities of any events
involving these random variables

P r[X2 ≠ X0 ] = P r[X2 = 0, X0 = 1] + P r[X2 = 1, X0 = 0]
             = Σ_{x=0}^{1} ( π0 (1)P (1, x)P (x, 0) + π0 (0)P (0, x)P (x, 1) )

◮ We have the formula

P r[X0 = x0 , · · · Xn = xn ] = π0 (x0 )P (x0 , x1 ) · · · P (xn−1 , xn )

◮ This can easily be seen through a graphical notation.



◮ Consider the 2-state chain with S = {0, 1} and

      P = [ 1−p    p  ]
          [  q    1−q ]

◮ We can represent the chain through a graph as shown
  below

  [Figure: two-state transition graph — nodes 0 and 1; edge 0 → 1
  labelled p, edge 1 → 0 labelled q, self-loops labelled 1−p and 1−q]

◮ The nodes represent states. The edges show possible


transitions and the probabilities. We can easily see
P r[X0 = 0, X1 = 1, X2 = 1, X3 = 0] = π0 (0)p(1 − q)q



◮ The two state chain can model the ‘working/down’ state
of a machine
◮ We can ask: what is the probability the machine would be
working for the next two days given it is working today

  P r[X1 = 1, X2 = 1|X0 = 1] = P r[X0 = 1, X1 = 1, X2 = 1] / P r[X0 = 1]
                             = π0 (1)P (1, 1)P (1, 1) / π0 (1)
                             = P (1, 1)²
◮ We may want to know about limn→∞ P r[Xn = 1]



Another example

◮ A man has 4 umbrellas. He carries them between home and
  office as needed. The probability of rain in the morning
  and in the evening is the same, namely p.
◮ What should be the state?



Another example
◮ A man has 4 umbrellas. He carries them between home and
  office as needed. The probability of rain in the morning
  and in the evening is the same, namely p.
◮ What should be the state?
◮ Take the state to be the number of umbrellas at the current
  location. S = {0, 1, · · · , 4}. The transition probabilities are

           0      1      2      3      4
      0 [  0      0      0      0      1   ]
      1 [  0      0      0     1−p     p   ]
  P = 2 [  0      0     1−p     p      0   ]
      3 [  0     1−p     p      0      0   ]
      4 [ 1−p     p      0      0      0   ]
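The umbrella matrix is easy to build and sanity-check in code; a sketch using the illustrative value p = 0.3:

```python
import numpy as np

# Sketch of the umbrella-chain transition matrix.
# p is the rain probability; 0.3 is an arbitrary illustrative value.
p = 0.3
P = np.array([
    [0,     0,     0,     0,     1  ],
    [0,     0,     0,     1 - p, p  ],
    [0,     0,     1 - p, p,     0  ],
    [0,     1 - p, p,     0,     0  ],
    [1 - p, p,     0,     0,     0  ],
])

# every row of a transition probability matrix must sum to 1
assert np.allclose(P.sum(axis=1), 1.0)
```

Row x lists P(x, y): with x umbrellas at the current location, either it rains (probability p, one umbrella moves along) or it does not (probability 1−p).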



Birth-Death chain
◮ The following Markov chain is known as a birth-death
chain
  [Figure: birth-death chain — states · · · , i−1, i, i+1, · · · ;
  each state moves right with probability p and left with
  probability q]

  e.g., Random walk: Xi ∈ {−1, +1}, iid, Sn = X1 + · · · + Xn

◮ The queueing chain would be a birth-death chain if at most
  one person can join and at most one person can leave the
  queue in any time step.
◮ In general, the transition probabilities can be different for
different states.
◮ Birth-death chains can also have self-loops on states



◮ We can have birth-death chains with finite state space
also
  [Figure: finite birth-death chain on states 0, 1, · · · , N with
  right-probability p, left-probability q, a self-loop of
  probability 1−p at state 0 and of probability 1−q at state N]

◮ We can say it has ‘reflecting boundaries’


◮ This chain keeps visiting all the states again and again.



Gambler’s Ruin chain

◮ The following chain is called Gambler’s ruin chain


  [Figure: Gambler's ruin chain on states 0, 1, · · · , N with
  right-probability p, left-probability q, and absorbing
  states 0 and N (self-loop probability 1)]

◮ Here, the chain is ultimately absorbed either in 0 or in N


◮ Here state can be the current funds that the gambler has



◮ The transition probabilities we defined earlier are also
called one step transition probabilities

P (x, y) = P r[Xn+1 = y|Xn = x] = P r[X1 = y|X0 = x]

◮ We can define transition probabilities for multiple steps,


that is, P r[Xn = y|X0 = x]
◮ We first look at one consequence of the Markov property
◮ The Markov property implies that it is the most recent
  past that matters. For example

P r[Xn+m = y|Xn = x, X0 ] = P r[Xn+m = y|Xn = x]



◮ We consider a simple case

  P r[X3 = y|X1 = x, X0 = z] = P r[X3 = y, X1 = x, X0 = z] / P r[X1 = x, X0 = z]
                             = Σ_{w} π0 (z)P (z, x)P (x, w)P (w, y) / ( π0 (z)P (z, x) )
                             = Σ_{w} P (x, w)P (w, y)

◮ We also have

  P r[X3 = y|X1 = x] = P r[X2 = y|X0 = x]
                     = Σ_{w} π0 (x)P (x, w)P (w, y) / π0 (x)
                     = Σ_{w} P (x, w)P (w, y)

◮ Thus we get

  P r[X3 = y|X1 = x, X0 = z] = P r[X3 = y|X1 = x]
◮ Using similar algebra, we can show that

  P r[Xm+n = y|Xm = x, Xm−1 , · · · ] = P r[Xm+n = y|Xm = x]
                                      = P r[Xn = y|X0 = x]

◮ Or, in general,

  fXm+n |Xm ,··· ,X0 (y|x, · · · ) = fXm+n |Xm (y|x)

◮ Using the same algebra, we can show

  P r[Xm+n = y|Xm = x, Xm−k ∈ Ak , k = 1, · · · , m] = P r[Xm+n = y|Xm = x]

  P r[Xm+n+r ∈ Br , r = 0, · · · , s | Xm = x, Xm−k ∈ Ak , k = 1, · · · , m]
      = P r[Xm+n+r ∈ Br , r = 0, · · · , s | Xm = x]



◮ Now we get

  P r[Xm+n = y|X0 = x] = Σ_{z} P r[Xm+n = y, Xm = z|X0 = x]
                       = Σ_{z} P r[Xm+n = y|Xm = z, X0 = x] P r[Xm = z|X0 = x]
                       = Σ_{z} P r[Xm+n = y|Xm = z] P r[Xm = z|X0 = x]
                       = Σ_{z} P r[Xn = y|X0 = z] P r[Xm = z|X0 = x]



Chapman-Kolmogorov Equations

◮ Define: P n (x, y) = P r[Xn = y|X0 = x]


◮ These are called n-step transition probabilities.
◮ From what we showed,

  P m+n (x, y) = P r[Xm+n = y|X0 = x]
              = Σ_{z} P r[Xn = y|X0 = z] P r[Xm = z|X0 = x]
              = Σ_{z} P m (x, z)P n (z, y)

◮ These are known as Chapman-Kolmogorov equations


◮ This relationship is intuitively clear



◮ Specifically, using Chapman-Kolmogorov equations,

  P 2 (x, y) = Σ_{z} P (x, z)P (z, y)

◮ For a finite chain, P is a matrix
◮ Thus P 2 (x, y) is the (x, y)th element of the matrix P × P
◮ That is why we use P n for n-step transition probabilities
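The matrix form of the Chapman-Kolmogorov equations can be verified directly with NumPy; a sketch on an arbitrary 3-state chain (the values are illustrative only):

```python
import numpy as np

# Sketch: Chapman-Kolmogorov says the (m+n)-step transition matrix
# is the product of the m-step and n-step matrices.
P = np.array([[0.75, 0.25, 0.0],
              [0.5,  0.0,  0.5],
              [0.0,  0.75, 0.25]])

P2 = P @ P                          # two-step transition probabilities
P5 = np.linalg.matrix_power(P, 5)   # five-step transition probabilities

# P^{2+5} = P^2 P^5, entry by entry
assert np.allclose(np.linalg.matrix_power(P, 7), P2 @ P5)
```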



◮ Define: πn (x) = P r[Xn = x].
◮ Then we get

  πn (y) = Σ_{x} P r[Xn = y|X0 = x] P r[X0 = x] = Σ_{x} π0 (x)P n (x, y)

◮ In particular

  πn+1 (y) = Σ_{x} P r[Xn+1 = y|Xn = x] P r[Xn = x] = Σ_{x} πn (x)P (x, y)
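The one-step update of the state distribution can be sketched as follows, using the row-vector convention and an assumed 3-state chain (values illustrative):

```python
import numpy as np

# Sketch: propagate the state distribution with pi_{n+1} = pi_n P.
P = np.array([[0.75, 0.25, 0.0],
              [0.5,  0.0,  0.5],
              [0.0,  0.75, 0.25]])
pi = np.array([1.0, 0.0, 0.0])   # start deterministically in state 0

for _ in range(3):
    pi = pi @ P                  # one-step update

# same answer via the 3-step matrix: pi_3 = pi_0 P^3
pi3 = np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, 3)
assert np.allclose(pi, pi3)
assert abs(pi.sum() - 1.0) < 1e-12
```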



Hitting times
◮ Let y be a state.
◮ We define hitting time for y as the random variable

Ty = min{n > 0 : Xn = y}

◮ Ty is the first time that the chain is in state y (after t = 0


when the chain is initiated).
◮ It is easy to see that P r[Ty = 1|X0 = x] = P (x, y).
◮ Notation: Pz (A) = P r[A|X0 = z]
◮ Using this notation, Px (Ty = 1) = P (x, y)
◮ Note that

  Px [Ty = n] = P r[Ty = n|X0 = x] ≠ P n (x, y)



  Ty = min{n > 0 : Xn = y}

◮ We can now get

  Px (Ty = 2) = Σ_{z≠y} P (x, z) P (z, y) = Σ_{z≠y} P (x, z) Pz (Ty = 1)

  Px (Ty = m) = P r[Ty = m|X0 = x]
              = Σ_{z≠y} P r[Ty = m|X1 = z, X0 = x] P r[X1 = z|X0 = x]
              = Σ_{z≠y} P (x, z)P r[Ty = m|X1 = z]
              = Σ_{z≠y} P (x, z)Pz (Ty = m − 1)
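The hitting-time recursion translates directly into code; a sketch for an assumed irreducible 3-state chain with target state y = 2 (matrix values illustrative):

```python
import numpy as np

# Sketch: Px(Ty = 1) = P(x, y),
#         Px(Ty = m) = sum_{z != y} P(x, z) Pz(Ty = m - 1).
P = np.array([[0.75, 0.25, 0.0],
              [0.5,  0.0,  0.5],
              [0.0,  0.75, 0.25]])
y = 2

def hit_prob(m):
    """Vector v with v[x] = Px(Ty = m)."""
    v = P[:, y].copy()               # the m = 1 case
    for _ in range(m - 1):
        w = np.zeros_like(v)
        for x in range(P.shape[0]):
            # sum over intermediate states z that are not y
            w[x] = sum(P[x, z] * v[z]
                       for z in range(P.shape[0]) if z != y)
        v = w
    return v

# the chain is finite and irreducible, so sum_m Px(Ty = m) -> 1
total = sum(hit_prob(m) for m in range(1, 200))
assert np.allclose(total, 1.0, atol=1e-6)
```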



◮ Similarly we can get:

  P n (x, y) = Σ_{m=1}^{n} Px (Ty = m)P n−m (y, y)

◮ We can derive this as

  P n (x, y) = P r[Xn = y|X0 = x]
            = Σ_{m=1}^{n} P r[Ty = m, Xn = y|X0 = x]
            = Σ_{m=1}^{n} P r[Xn = y|Ty = m, X0 = x] P r[Ty = m|X0 = x]
            = Σ_{m=1}^{n} P r[Xn = y|Xm = y] P r[Ty = m|X0 = x]
            = Σ_{m=1}^{n} P n−m (y, y) Px (Ty = m)



Transient and recurrent states
◮ Define ρxy = Px (Ty < ∞).
◮ It is the probability that, starting in x, you will visit y
◮ Note that

  ρxy = lim_{n→∞} Px (Ty < n) = lim_{n→∞} Σ_{m=1}^{n−1} Px (Ty = m) = Σ_{m=1}^{∞} Px (Ty = m)

◮ Definition: A state y is called transient if ρyy < 1; it is
  called recurrent if ρyy = 1.
◮ Intuitively, all transient states would be visited only
  finitely many times while recurrent states are visited
  infinitely often.
◮ For any state y define

  Iy (Xn ) = 1 if Xn = y, and 0 otherwise



◮ Now, the total number of visits to y is given by

  Ny = Σ_{n=1}^{∞} Iy (Xn )

◮ We can get the distribution of Ny as

  Px (Ny ≥ 1) = Px (Ty < ∞) = ρxy
  Px (Ny ≥ 2) = Σ_{m} Px (Ty = m)Py (Ty < ∞) = ρyy Σ_{m} Px (Ty = m) = ρyy ρxy
  Px (Ny ≥ m) = ρyy^{m−1} ρxy
  Px (Ny = m) = Px (Ny ≥ m) − Px (Ny ≥ m + 1)
              = ρyy^{m−1} ρxy − ρyy^{m} ρxy = ρxy ρyy^{m−1} (1 − ρyy )
  Px (Ny = 0) = 1 − Px (Ny ≥ 1) = 1 − ρxy



◮ Notation: Ex [Z] = E[Z|X0 = x]
◮ Define

  G(x, y) ≜ Ex [Ny ] = Ex [ Σ_{n=1}^{∞} Iy (Xn ) ]
          = Σ_{n=1}^{∞} Ex [Iy (Xn )]
          = Σ_{n=1}^{∞} P n (x, y)

◮ G(x, y) is the expected number of visits to y for a chain
  that is started in x.



Theorem:
(i) Let y be transient. Then

  Px (Ny < ∞) = 1, ∀x   and   G(x, y) = ρxy / (1 − ρyy ) < ∞, ∀x

(ii) Let y be recurrent. Then

  Py [Ny = ∞] = 1, and G(y, y) = Ey [Ny ] = ∞

  Px [Ny = ∞] = ρxy , and G(x, y) = 0 if ρxy = 0, G(x, y) = ∞ if ρxy > 0



Proof of (i): y is transient; ρyy < 1

  G(x, y) = Ex [N (y)] = Σ_{m} m Px [N (y) = m]
          = Σ_{m} m ρxy ρyy^{m−1} (1 − ρyy )
          = ρxy Σ_{m=1}^{∞} m ρyy^{m−1} (1 − ρyy )
          = ρxy · 1/(1 − ρyy ) < ∞, because ρyy < 1

  ⇒ Px [N (y) < ∞] = 1



Proof of (ii):
y recurrent ⇒ ρyy = 1. Hence

  Py [N (y) ≥ m] = ρyy^{m} = 1, ∀m
  ⇒ Py [N (y) = ∞] = lim_{m→∞} Py [N (y) ≥ m] = 1
  ⇒ G(y, y) = Ey [N (y)] = ∞

  Px [N (y) ≥ m] = ρxy ρyy^{m−1} = ρxy , ∀m
  Hence Px [N (y) = ∞] = ρxy

  ρxy = 0 ⇒ Px [N (y) ≥ m] = 0, ∀m > 0 ⇒ G(x, y) = 0
  ρxy > 0 ⇒ Px [N (y) = ∞] > 0 ⇒ G(x, y) = ∞



◮ Transient states are visited only finitely many times while
recurrent states are visited infinitely often
◮ If S is finite, it should have at least one recurrent state
◮ If y is transient, then, for all x,

  G(x, y) = Σ_{n=1}^{∞} P n (x, y) < ∞  ⇒  lim_{n→∞} P n (x, y) = 0

  However, Σ_{y} P n (x, y) = 1, ∀n, ∀x

◮ If S is finite and all y ∈ S are transient, then we get a
  contradiction

  1 = lim_{n→∞} Σ_{y∈S} P n (x, y) = Σ_{y∈S} lim_{n→∞} P n (x, y) = 0

◮ A finite chain has to have at least one recurrent state


◮ It is possible to have an infinite chain where all states are
transient
◮ We say, x leads to y if ρxy > 0
Theorem: If x is recurrent and x leads to y then y is
recurrent and ρxy = ρyx = 1.
Proof:
◮ Take x ≠ y, wlog. Since ρxy > 0, ∃n s.t. P n (x, y) > 0
◮ Take least such n. Then we have states y1 , · · · , yn−1 ,
none of which is x (or y) such that
P (x, y1 ) P (y1 , y2 ) · · · P (yn−1 , y) > 0
◮ Now suppose, ρyx < 1. Then
P (x, y1 ) P (y1 , y2 ) · · · P (yn−1 , y)(1 − ρyx ) > 0
is the probability of starting in x but not returning to x.
◮ But this cannot be because x is recurrent and hence
ρxx = 1
◮ Hence, if x is recurrent and x leads to y then ρyx = 1
◮ Now, ∃n0 , n1 s.t. P n0 (x, y) > 0, P n1 (y, x) > 0.

  P n1 +n+n0 (y, y) = Py [Xn1 +n+n0 = y]
                   ≥ Py [Xn1 = x, Xn1 +n = x, Xn1 +n+n0 = y]
                   = P n1 (y, x)P n (x, x)P n0 (x, y), ∀n

  We know G(x, x) = Σ_{m=1}^{∞} P m (x, x) = ∞

  Σ_{m=1}^{∞} P m (y, y) ≥ Σ_{m=n0 +n1 +1}^{∞} P m (y, y) = Σ_{n=1}^{∞} P n1 +n+n0 (y, y)
                        ≥ Σ_{n=1}^{∞} P n1 (y, x)P n (x, x)P n0 (x, y)
                        = ∞, because x is recurrent
  ⇒ y is recurrent


◮ What we showed so far is: if x leads to y and x is
recurrent, then ρyx = 1 and y is recurrent.
◮ Now, y is recurrent and y leads to x and hence ρxy = 1.
◮ This completes proof of the theorem



equivalence relation

◮ Let R be a relation on a set A. Note R ⊂ A × A


◮ R is called an equivalence relation if it is
1. reflexive, i.e., (x, x) ∈ R, ∀x ∈ A
2. symmetric, i.e., (x, y) ∈ R ⇒ (y, x) ∈ R
3. transitive, i.e., (x, y), (y, z) ∈ R ⇒ (x, z) ∈ R



example

◮ Let A = { m/n | m, n are integers, n ≠ 0 }
◮ Define the relation R by

  ( m/n , p/q ) ∈ R if mq = np

◮ This is the usual equality of fractions
◮ It is easy to check that this is an equivalence relation.



Equivalence classes

◮ Let R be an equivalence relation on A.


◮ Then, A can be partitioned as

  A = C1 + C2 + · · ·

  where the Ci satisfy
  ◮ x, y ∈ Ci ⇒ (x, y) ∈ R, ∀i
  ◮ x ∈ Ci , y ∈ Cj , i ≠ j ⇒ (x, y) ∉ R
◮ In our example, each equivalence class corresponds to a
  rational number.
◮ Here, Ci contains all fractions that are equal to that
  rational number



◮ The state space of any Markov chain can be partitioned
into the transient and recurrent states: S = ST + SR :

ST = {y ∈ S : ρyy < 1} SR = {y ∈ S : ρyy = 1}

◮ On SR , consider the relation: ‘x leads to y’ (i.e., x is


related to y if ρxy > 0)
◮ This is an equivalence relation
◮ ρxx > 0, ∀x ∈ SR
◮ ρxy > 0 ⇒ ρyx > 0, ∀x, y ∈ SR
◮ ρxy > 0, ρyz > 0 ⇒ ρxz > 0
◮ Hence we get a partition: SR = C1 + C2 + · · · where Ci
are equivalence classes.



◮ On SR , “x leads to y” is an equivalence relation.
◮ This gives rise to the partition SR = C1 + C2 + · · ·
◮ Since Ci are equivalence classes, they satisfy:
  ◮ x, y ∈ Ci ⇒ x leads to y
  ◮ x ∈ Ci , y ∈ Cj , i ≠ j ⇒ ρxy = 0
◮ All states in any Ci lead to each other, or communicate
  with each other
◮ If i ≠ j and x ∈ Ci and y ∈ Cj , then ρxy = ρyx = 0; x
  and y do not communicate with each other.



◮ A set of states, C ⊂ S is said to be irreducible if x leads
to y for all x, y ∈ C
◮ An irreducible set is also called a communicating class
◮ A set of states, C ⊂ S, is said to be closed if
  x ∈ C, y ∉ C implies ρxy = 0.
◮ Once the chain visits a state in a closed set, it cannot
leave that set.
◮ We get a partition of recurrent states

SR = C 1 + C 2 + · · ·

where each Ci is a closed and irreducible set of states.


◮ If S is irreducible then the chain is said to be irreducible.
(Note that S is trivially closed)



◮ In an irreducible set of states, if one state is recurrent,
then, all states are recurrent.
◮ We saw that a finite chain has to have at least one
recurrent state.
◮ Thus, a finite irreducible chain is recurrent.
◮ For example, in the umbrellas problem, the chain is
irreducible and hence all states are recurrent.
◮ An infinite irreducible chain may be wholly transient
◮ Here is a trivial example of a non-irreducible transient chain:

  [Figure: states 0 → 1 → 2 → 3 → · · · , each transition taken
  with probability 1]



◮ The state space of any Markov chain can be partitioned
into transient and recurrent states.
◮ We need not calculate ρxx to do this partition.
◮ By looking at the structure of transition probability
matrix we can get this partition



Example

  [Figure: transition graph on states 0, 1, 2, 3, 4, 5]

  The structure of P (+ marks a positive entry, − a zero entry):

           0   1   2   3   4   5
      0 [  +   −   −   −   −   −  ]
      1 [  +   +   +   −   −   −  ]
      2 [  −   +   +   +   −   +  ]
      3 [  −   −   −   +   +   −  ]
      4 [  −   −   −   −   +   +  ]
      5 [  −   −   −   +   −   +  ]

◮ State 0 is called an absorbing state. {0} is a closed
  irreducible set.
◮ 1, 2 are transient states.
◮ We get: ST = {1, 2} and SR = {0} + {3, 4, 5}
◮ If you start the chain in a recurrent state it will stay in
the corresponding closed irreducible set
◮ If you start in one of the transient states, it would
eventually get ‘absorbed’ in one of the closed irreducible
sets of recurrent states.
◮ We want to know the probabilities of ending up in
different sets.
◮ We want to know how long you stay in transient states
◮ We want to know what the ‘steady state’ is



◮ Let C be a closed irreducible set of recurrent states
◮ TC is the hitting time for C:

  TC = min{n > 0 : Xn ∈ C}

  It is the first time instant when the chain is in C
◮ Define ρC (x) = Px [TC < ∞]

  If x is recurrent, ρC (x) = 1 if x ∈ C, and 0 if x ∉ C,
  because each recurrent x is in a closed irreducible set

◮ Suppose x is transient. Then

  ρC (x) = Σ_{y∈C} P (x, y) + Σ_{y∈ST} P (x, y) ρC (y)

◮ By solving this set of linear equations we can get
  ρC (x), x ∈ ST
Example: Absorption probabilities

  [Figure: transition graph on states 0, · · · , 5 with
  P (1, 0) = 0.25, P (1, 1) = 0.5, P (1, 2) = 0.25,
  P (2, 1) = 0.2, P (2, 2) = 0.4, the remaining mass from
  state 2 going into {3, 4, 5}, and state 0 absorbing]

◮ ST = {1, 2} and C1 = {0}, C2 = {3, 4, 5}

  ρC (x) = Σ_{y∈C} P (x, y) + Σ_{y∈ST} P (x, y) ρC (y)

  ρC1 (1) = P (1, 0) + P (1, 1)ρC1 (1) + P (1, 2)ρC1 (2)
          = 0.25 + 0.5ρC1 (1) + 0.25ρC1 (2)
  ρC1 (2) = 0 + 0.2ρC1 (1) + 0.4ρC1 (2)

◮ Solving these, we get ρC1 (1) = 0.6, ρC1 (2) = 0.2
◮ What would be ρC2 (1)?
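The absorption equations form a small linear system; a sketch solving the example's system with NumPy:

```python
import numpy as np

# Sketch: rho_C(x) = sum_{y in C} P(x,y) + sum_{y in S_T} P(x,y) rho_C(y),
# restricted to the transient states {1, 2} with C1 = {0}.
PT = np.array([[0.5, 0.25],    # P restricted to transient states 1, 2
               [0.2, 0.4 ]])
b = np.array([0.25, 0.0])      # one-step probability of entering C1 = {0}

# (I - PT) rho = b
rho = np.linalg.solve(np.eye(2) - PT, b)
assert np.allclose(rho, [0.6, 0.2])
```

The same solve with b replaced by the one-step probabilities of entering C2 would give ρC2(1) and ρC2(2).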
Expected time in transient states
◮ We consider a simple method to get the time spent in
  transient states for finite chains
◮ Let states 1, 2, · · · , t be the transient states
◮ Let bij be the expected number of time instants spent in
  state j when started in i.
◮ Then we get

  bij = δij + Σ_{k=1}^{t} P (i, k)bkj

  where δij = 1 if i = j and is zero otherwise
◮ Let B be the t × t matrix of bij , I be the t × t identity
  matrix and PT be the submatrix (corresponding to the
  transient states) of P .
◮ Then the above in matrix notation is

  B = I + PT B   or   B = (I − PT )−1
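A sketch computing B = (I − PT)⁻¹ for the transient submatrix of the absorption example above:

```python
import numpy as np

# Sketch: expected time in transient states via B = (I - P_T)^{-1},
# using the transient submatrix from the absorption example.
PT = np.array([[0.5, 0.25],
               [0.2, 0.4 ]])
B = np.linalg.inv(np.eye(2) - PT)

# b_ij satisfies the fixed-point equation b_ij = delta_ij + sum_k P(i,k) b_kj
assert np.allclose(B, np.eye(2) + PT @ B)
# numeric values for this example
assert np.allclose(B, [[2.4, 1.0], [0.8, 2.0]])
```

Row i of B sums to the expected total number of time instants the chain spends among the transient states when started in transient state i.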
Stationary Distributions
◮ π : S → [0, 1] is a probability distribution (mass function)
  over S if π(x) ≥ 0, ∀x and Σ_{x∈S} π(x) = 1
◮ A probability distribution, π, over S, is said to be a
  stationary distribution for the Markov chain with
  transition probabilities P if

  π(y) = Σ_{x∈S} π(x)P (x, y), ∀y ∈ S

◮ Suppose S is finite.
  Then π can be represented by a column vector.
◮ Then π is stationary if

  π T = π T P   or   P T π = π

  (The superscript T denotes transpose of a matrix)


P S Sastry, IISc, E1 222, Lecture 19 2020 54/136
◮ π is a stationary distribution if

  π(y) = Σ_{x∈S} π(x)P (x, y), ∀y ∈ S

◮ Recall that πn (x) ≜ P r[Xn = x] satisfies

  πn+1 (y) = Σ_{x∈S} P r[Xn+1 = y|Xn = x]P r[Xn = x] = Σ_{x∈S} πn (x)P (x, y)

◮ Hence, if π0 = π then π1 = π, and hence πn = π, ∀n
◮ Hence the name, stationary distribution.
◮ It is also called the invariant distribution or the invariant
  measure



◮ If the chain is started in stationary distribution then the
distribution of Xn is not a function of time, as we saw.
◮ Suppose for a chain, distribution of Xn is not dependent
on n. Then the chain must be in a stationary distribution.
◮ Suppose π = π0 = π1 = · · · = πn = · · · . Then

  π(y) = π1 (y) = Σ_{x∈S} π0 (x)P (x, y) = Σ_{x∈S} π(x)P (x, y)

  which shows π is a stationary distribution



◮ Suppose S is finite.
  Then π can be represented by a vector
◮ Then π is a stationary distribution if

  P T π = π   or   (P T − I) π = 0

◮ Note that each column of P T sums to 1.
  Hence, P T − I is singular
  (1 is always an eigenvalue of a column-stochastic matrix)
◮ A stationary distribution always exists for a finite chain.
◮ But it may or may not be unique.
◮ What about infinite chains?



Example

  [Figure: 3-state chain on S = {0, 1, 2} with P (0, 0) = 0.75,
  P (0, 1) = 0.25, P (1, 0) = 0.5, P (1, 2) = 0.5,
  P (2, 1) = 0.75, P (2, 2) = 0.25]

◮ The stationary distribution has to satisfy

  π(y) = Σ_{x∈S} π(x)P (x, y), ∀y ∈ S

◮ Thus we get the following linear equations

  0.75π(0) + 0.5π(1) = π(0)
  0.25π(0) + 0.75π(2) = π(1)
  0.5π(1) + 0.25π(2) = π(2)

  in addition, π(0) + π(1) + π(2) = 1



◮ We can also write the equations for π as

  [ π(0) π(1) π(2) ] [ 0.75  0.25  0    ]  =  [ π(0) π(1) π(2) ]
                     [ 0.5   0     0.5  ]
                     [ 0     0.75  0.25 ]

  0.75π(0) + 0.5π(1) = π(0)
  0.25π(0) + 0.75π(2) = π(1)
  0.5π(1) + 0.25π(2) = π(2)

◮ We have to solve these along with π(0) + π(1) + π(2) = 1
◮ Solving the balance equations:

  0.75π(0) + 0.5π(1) = π(0)  ⇒  π(1) = (1/2)π(0)
  0.25π(0) + 0.75π(2) = π(1)  ⇒  π(2) = (1/3)π(0)
  0.5π(1) + 0.25π(2) = π(2)

  π(0) + π(1) + π(2) = 1  ⇒  π(0)(1 + 1/2 + 1/3) = 1  ⇒  π(0) = 6/11

◮ We get a unique solution: ( 6/11, 3/11, 2/11 )
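The same stationary distribution can be found numerically by replacing one redundant balance equation with the normalization constraint; a sketch:

```python
import numpy as np

# Sketch: solve (P^T - I) pi = 0 together with sum(pi) = 1
# for the 3-state example chain.
P = np.array([[0.75, 0.25, 0.0],
              [0.5,  0.0,  0.5],
              [0.0,  0.75, 0.25]])

# the balance equations are linearly dependent, so drop one of them
# and append the normalization row of ones
A = np.vstack([(P.T - np.eye(3))[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])

pi = np.linalg.solve(A, b)
assert np.allclose(pi, [6/11, 3/11, 2/11])
```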



Example 2

  [Figure: 3-state chain with absorbing states 0 and 2;
  P (0, 0) = 1, P (1, 0) = 0.5, P (1, 2) = 0.5, P (2, 2) = 1]

◮ The stationary distribution has to satisfy

  [ π(0) π(1) π(2) ] [ 1.0  0  0   ]  =  [ π(0) π(1) π(2) ]
                     [ 0.5  0  0.5 ]
                     [ 0    0  1.0 ]

◮ We also have to add the equation π(0) + π(1) + π(2) = 1
◮ We now do not have a unique stationary distribution



Example 2 (continued)

  π(y) = Σ_{x∈S} π(x)P (x, y), ∀y ∈ S

◮ We get the following linear equations

  π(0) + 0.5π(1) = π(0)  ⇒  π(1) = 0
  0.5π(1) + π(2) = π(2)  ⇒  π(1) = 0
  π(0) + π(1) + π(2) = 1  ⇒  π(0) = 1 − π(2)

◮ Now there are infinitely many solutions.
◮ Any distribution [a 0 1 − a] with 0 ≤ a ≤ 1 is a
  stationary distribution
◮ This chain is not irreducible; the previous one is
  irreducible
◮ We now explore conditions for existence and uniqueness
of stationary distributions
◮ For finite chains stationary distribution always exists.
◮ For finite irreducible chains it is unique.
◮ But for infinite chains, it is possible that stationary
distribution does not exist.
◮ The stationary distribution, when it exists, is related to
expected fraction of time spent in different states.
◮ When the stationary distribution is unique, we also want
to know if the chain converges to that distribution
starting with any π0 .



◮ Let Iy (Xn ) be the indicator of [Xn = y]

  Number of visits to y till n:  Nn (y) = Σ_{m=1}^{n} Iy (Xm )

◮ The expected number of visits to y till n is

  Gn (x, y) ≜ Ex [Nn (y)] = Σ_{m=1}^{n} Ex [Iy (Xm )] = Σ_{m=1}^{n} P m (x, y)

◮ The expected fraction of time spent in y till n is

  Gn (x, y)/n = (1/n) Σ_{m=1}^{n} P m (x, y)

◮ We will first establish a limit for the above as n → ∞



◮ Suppose y is transient. Then we have

  lim_{n→∞} Nn (y) = N (y), with P r[N (y) < ∞] = 1, and
  lim_{n→∞} Gn (x, y) = G(x, y) < ∞

  ⇒ lim_{n→∞} Nn (y)/n = 0 (w.p.1)  and  lim_{n→∞} Gn (x, y)/n = 0

◮ The expected fraction of time spent in a transient state is
  zero.
◮ This is intuitively obvious


◮ Now, let y be recurrent
◮ Then, Py [Ty < ∞] = 1
◮ Define my = Ey [Ty ]
◮ my is the mean return time to y
◮ We will show that Nn (y)/n converges to 1/my if the chain
  starts in y.
◮ Convergence would be with probability one.


◮ Consider a chain started in y
◮ Let Tyr be the time of the rth visit to y, r ≥ 1

  Tyr = min{n ≥ 1 : Nn (y) = r}

◮ Define Wy1 = Ty1 = Ty and Wyr = Tyr − Tyr−1 , r > 1
◮ Note that Ey [Wy1 ] = Ey [Ty ] = my
◮ Also, Tyr = Wy1 + · · · + Wyr
◮ The Wyr are the “waiting times”
◮ By the Markov property we should expect them to be iid
◮ We will prove this.
◮ Then Tyr /r converges to my by the law of large numbers


◮ We have

  P r[Wy3 = k3 |Wy2 = k2 , Wy1 = k1 ]
      = P r[Xk1 +k2 +j ≠ y, 1 ≤ j ≤ k3 − 1, Xk1 +k2 +k3 = y | B]
  where B = [Xk1 +k2 = y, Xk1 = y, Xj ≠ y, j < k1 + k2 , j ≠ k1 ]

◮ Using the Markov property, we get

  P r[Wy3 = k3 |Wy2 = k2 , Wy1 = k1 ]
      = P r[Xk1 +k2 +j ≠ y, 1 ≤ j ≤ k3 − 1, Xk1 +k2 +k3 = y | Xk1 +k2 = y]
      = P r[Xj ≠ y, 1 ≤ j ≤ k3 − 1, Xk3 = y | X0 = y]
      = Py [Wy1 = k3 ]

◮ In general, we get

  P r[Wyr = kr | Wyr−1 = kr−1 , · · · , Wy1 = k1 ] = Py [Wy1 = kr ]



◮ This shows the waiting times are iid

  Py [Wy2 = k2 ] = Σ_{k1} Py [Wy2 = k2 | Wy1 = k1 ] Py [Wy1 = k1 ]
               = Σ_{k1} Py [Wy1 = k2 ] Py [Wy1 = k1 ]
               = Py [Wy1 = k2 ]
  ⇒ identically distributed

  Py [Wy2 = k2 , Wy1 = k1 ] = Py [Wy2 = k2 | Wy1 = k1 ] Py [Wy1 = k1 ]
                           = Py [Wy1 = k2 ] Py [Wy1 = k1 ]
                           = Py [Wy2 = k2 ] Py [Wy1 = k1 ]
  ⇒ independent



◮ We have shown Wyr , r = 1, 2, · · · are iid
◮ Since E[Wy1 ] = my , by the strong law of large numbers,

  lim_{k→∞} Tyk /k = lim_{k→∞} (1/k) Σ_{r=1}^{k} Wyr = my , (w.p.1)

◮ Note that this is true even if my = ∞



◮ For all n such that Nn (y) ≥ 1, we have

  Ty^{Nn (y)} ≤ n < Ty^{Nn (y)+1}

◮ Nn (y) is the number of visits to y till time step n
◮ Suppose N50 (y) = 8: the chain visited y 8 times till time 50.
◮ So, the 8th visit occurred at or before time 50.
◮ The 9th visit has not occurred till 50.
◮ So, the time of the 9th visit is beyond 50.



  Ty^{Nn (y)} ≤ n < Ty^{Nn (y)+1}

◮ Now we have

  Ty^{Nn (y)} / Nn (y)  ≤  n / Nn (y)  <  Ty^{Nn (y)+1} / Nn (y)

◮ We know that
  ◮ As n → ∞, Nn (y) → ∞, w.p.1
  ◮ As n → ∞, Ty^{n} / n → my , w.p.1
◮ Hence we get

  lim_{n→∞} n / Nn (y) = my , w.p.1   or   lim_{n→∞} Nn (y) / n = 1/my , w.p.1
◮ All this is true if the chain started in y.
◮ That means it is true if the chain visits y once.
◮ So, we get

  lim_{n→∞} Nn (y) / n = I[Ty <∞] / my , w.p.1

◮ Since 0 ≤ Nn (y)/n ≤ 1, almost sure convergence implies
  convergence in mean

  lim_{n→∞} Gn (x, y) / n = lim_{n→∞} Ex [ Nn (y) / n ] = Px [Ty < ∞] / my = ρxy / my

◮ The fraction of time spent in each recurrent state is
  inversely proportional to the mean recurrence time


◮ Thus we have proved the following theorem
◮ Theorem: Let y be recurrent. Then

  1. lim_{n→∞} Nn (y) / n = I[Ty <∞] / my , w.p.1

  2. lim_{n→∞} Gn (x, y) / n = ρxy / my

◮ Note that

  1/my = lim_{n→∞} Gn (y, y) / n = lim_{n→∞} (1/n) Σ_{m=1}^{n} P m (y, y)
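The theorem can be checked by simulation; a sketch for the two-state chain with p = q = 0.5 (an illustrative choice for which, by symmetry, the mean return time to state 1 is m₁ = 2, so the fraction of time in state 1 should approach 1/2):

```python
import random

# Sketch: estimate the long-run fraction of time in state 1 of the
# two-state chain; p = q = 0.5 is an illustrative choice.
random.seed(1)
p = q = 0.5
x, visits, n = 1, 0, 200_000
for _ in range(n):
    if x == 0:
        x = 1 if random.random() < p else 0   # 0 -> 1 with prob p
    else:
        x = 0 if random.random() < q else 1   # 1 -> 0 with prob q
    visits += (x == 1)

# N_n(1)/n should be close to 1/m_1 = 1/2
assert abs(visits / n - 0.5) < 0.01
```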



◮ The limiting fraction of time spent in a state is inversely
  proportional to my , the mean return time.
◮ Intuitively, the stationary probability of a state could be
  the limiting fraction of time spent in that state.
◮ Thus π(y) = 1/my is a good candidate for the stationary
  distribution.
◮ We first note that we can have my = ∞.
  Though Py [Ty < ∞] = 1, we can have Ey [Ty ] = ∞.
◮ What if my = ∞, ∀y?
◮ That does not seem reasonable for a finite chain.
◮ But for infinite chains??
◮ Let us characterize the y for which my = ∞


◮ A recurrent state y is called null recurrent if my = ∞.
◮ y is called positive recurrent if my < ∞
◮ We earlier saw that the fraction of time spent in a
transient state is zero.
◮ Suppose y is null recurrent. Then

  lim_{n→∞} Nn (y) / n = 1/my = 0
◮ Thus the limiting fraction of time spent by the chain in
transient and null recurrent states is zero.



◮ Theorem: Let x be positive recurrent and let x lead to
  y. Then y is positive recurrent.
  Proof
◮ Since x is recurrent and x leads to y, we know ∃n0 , n1 s.t.
  P n0 (x, y) > 0, P n1 (y, x) > 0 and

  P n1 +m+n0 (y, y) ≥ P n1 (y, x)P m (x, x)P n0 (x, y), ∀m

  Summing the above for m = 1, 2, · · · , n and dividing by n:

  (1/n) Σ_{m=1}^{n} P n1 +m+n0 (y, y) ≥ P n1 (y, x) [ (1/n) Σ_{m=1}^{n} P m (x, x) ] P n0 (x, y), ∀n

  If we now let n → ∞, the RHS goes to
  P n1 (y, x) (1/mx ) P n0 (x, y) > 0.
P S Sastry, IISc, E1 222, Lecture 19 2020 77/136


        (1/n) Σ_{m=1}^{n} P^{n_1+m+n_0}(y,y) ≥ P^{n_1}(y,x) [ (1/n) Σ_{m=1}^{n} P^m(x,x) ] P^{n_0}(x,y),  ∀n

◮ We can write the LHS of the above as

        (1/n) Σ_{m=1}^{n} P^{n_1+m+n_0}(y,y)
            = (1/n) Σ_{m=1}^{n_1+n+n_0} P^m(y,y) − (1/n) Σ_{m=1}^{n_1+n_0} P^m(y,y)
            = ((n_1+n+n_0)/n) · (1/(n_1+n+n_0)) Σ_{m=1}^{n_1+n+n_0} P^m(y,y) − (1/n) Σ_{m=1}^{n_1+n_0} P^m(y,y)

        ⇒ lim_{n→∞} (1/n) Σ_{m=1}^{n} P^{n_1+m+n_0}(y,y) = 1/m_y

        ⇒ 1/m_y ≥ P^{n_1}(y,x) (1/m_x) P^{n_0}(x,y) > 0

  which implies y is positive recurrent
◮ Thus, in a closed irreducible set of recurrent states, if one
state is positive recurrent then all are positive recurrent.
◮ Hence, in the partition: SR = C1 + C2 + · · · , each Ci is
either wholly positive recurrent or wholly null recurrent.
◮ We next show that a finite chain cannot have any null
recurrent states.



◮ Let C be a finite closed set of recurrent states.
◮ Suppose all states in C are null recurrent. Then

        lim_{n→∞} (1/n) Σ_{m=1}^{n} P^m(x,y) = 0,  ∀x, y ∈ C

◮ Since C is closed, Σ_{y∈C} P^m(x,y) = 1, ∀m, ∀x ∈ C.
◮ Thus we get

        1 = (1/n) Σ_{m=1}^{n} Σ_{y∈C} P^m(x,y) = Σ_{y∈C} (1/n) Σ_{m=1}^{n} P^m(x,y),  ∀n

        ⇒ 1 = lim_{n→∞} Σ_{y∈C} (1/n) Σ_{m=1}^{n} P^m(x,y) = Σ_{y∈C} lim_{n→∞} (1/n) Σ_{m=1}^{n} P^m(x,y) = 0

  where we could take the limit inside the sum because C is
  finite.
◮ If C is a finite closed set of recurrent states, then not all
  states in it can be null recurrent.
◮ Actually what we showed is that any closed finite set
must have at least one positive recurrent state.
◮ We also showed that in an irreducible set of recurrent
states, since each state leads to any other, if one state is
positive recurrent then all states have to be positive
recurrent.
◮ Hence, in a finite chain, every closed irreducible set of
recurrent states contains only positive recurrent states.
◮ Hence, a finite chain cannot have a null recurrent state.



Example of null recurrent chain

◮ Consider the chain with state space {0, 1, · · · } given by

  [Diagram: from state 0 the chain jumps to state k w.p. q(k), k = 1, 2, · · · ;
   from every state i ≥ 1 it moves down to i − 1 w.p. 1]

  Here, q(k) ≥ 0, ∀k, and Σ_{k=1}^{∞} q(k) = 1.
◮ The chain is irreducible.
◮ So, we want to know whether it is transient or recurrent.
◮ We can calculate ρ_00 to test this.


Example of null recurrent chain

◮ For this chain we have

        P_0[T_0 = j + 1] = q(j),  j = 1, 2, · · ·

  (Note that P_0[T_0 = 1] = 0)

        P_0[T_0 < ∞] = Σ_{j=2}^{∞} P_0[T_0 = j] = Σ_{j=1}^{∞} q(j) = 1

◮ So, the chain is recurrent.
◮ We want to know whether it is positive recurrent or null
  recurrent.
Example of null recurrent chain

◮ For this chain,

        m_0 = Σ_{j=2}^{∞} j P_0[T_0 = j] = Σ_{j=2}^{∞} j q(j − 1) = Σ_{j=1}^{∞} (j + 1) q(j)

◮ So, m_0 = ∞ if the q(·) distribution has infinite
  expectation. For example, if q(k) = c/k²
◮ Then state 0 is null recurrent, which implies the chain is
  null recurrent
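◮ The divergence of m_0 is easy to see numerically. A small sketch for the slide's example q(k) = c/k² (here c = 6/π² so the q(k) sum to 1): the partial sums of Σ (j+1)q(j) contain the harmonic series, so they grow without bound, like c·log N.

```python
import math

# q(k) = c/k^2 with c = 6/pi^2, so that sum_k q(k) = 1.
# Partial sums of m_0 = sum_j (j+1) q(j) keep growing: m_0 = infinity.
c = 6 / math.pi**2

def partial_m0(N):
    return sum((j + 1) * c / j**2 for j in range(1, N + 1))

s2, s4, s6 = partial_m0(10**2), partial_m0(10**4), partial_m0(10**6)
print(round(s2, 2), round(s4, 2), round(s6, 2))  # grows roughly like c*log(N)
```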
◮ Suppose π is a stationary distribution.
◮ Then π(y) = 0 if y is transient or null recurrent
◮ We prove this as follows

        π(y) = Σ_{x∈S} π(x) P^m(x,y),  ∀m

        ⇒ π(y) = (1/n) Σ_{m=1}^{n} Σ_x π(x) P^m(x,y) = Σ_x π(x) (1/n) Σ_{m=1}^{n} P^m(x,y)

        ⇒ π(y) = lim_{n→∞} Σ_{x∈S} π(x) (1/n) Σ_{m=1}^{n} P^m(x,y)

◮ The proof is complete if we can take the limit inside the
  sum. (Remember that S may be infinite)


Bounded Convergence Theorem

◮ Bounded Convergence Theorem:
  1. Suppose a(x) ≥ 0, ∀x ∈ S, and Σ_x a(x) < ∞.
  2. Let b_n(x), x ∈ S, be such that |b_n(x)| ≤ K, ∀x, n, and
     lim_{n→∞} b_n(x) = b(x), ∀x ∈ S.
  Then

        lim_{n→∞} Σ_{x∈S} a(x) b_n(x) = Σ_{x∈S} a(x) lim_{n→∞} b_n(x) = Σ_{x∈S} a(x) b(x)


◮ Bounded Convergence Theorem: Suppose a(x) ≥ 0, ∀x ∈ S,
  and Σ_x a(x) < ∞. Let b_n(x), x ∈ S, be such that
  |b_n(x)| ≤ K, ∀x, n, and suppose lim_{n→∞} b_n(x) = b(x), ∀x ∈ S.
  Then

        lim_{n→∞} Σ_{x∈S} a(x) b_n(x) = Σ_{x∈S} a(x) lim_{n→∞} b_n(x) = Σ_{x∈S} a(x) b(x)

◮ We had

        π(y) = lim_{n→∞} Σ_x π(x) (1/n) Σ_{m=1}^{n} P^m(x,y)

◮ We have

        π(x) ≥ 0;   Σ_x π(x) = 1;   0 ≤ (1/n) Σ_{m=1}^{n} P^m(x,y) ≤ 1, ∀x

◮ Hence, if y is transient or null recurrent, then

        π(y) = Σ_x π(x) lim_{n→∞} (1/n) Σ_{m=1}^{n} P^m(x,y) = 0
◮ In any stationary distribution π, we would have π(y) = 0 if
  y is transient or null recurrent.
◮ Hence an irreducible transient or null recurrent chain
would not have a stationary distribution.
◮ The null recurrent chain we considered earlier is an
example of a Markov chain that does not have a
stationary distribution.



◮ Theorem: An irreducible positive recurrent chain has a
  unique stationary distribution, given by

        π(y) = 1/m_y,  ∀y ∈ S

◮ Suppose π is such that π(y) = Σ_x π(x)P(x, y). Then

        π(y) = Σ_x π(x) P^m(x,y),  ∀m

        ⇒ π(y) = Σ_x π(x) (1/n) Σ_{m=1}^{n} P^m(x,y),  ∀n

        ⇒ π(y) = lim_{n→∞} Σ_x π(x) (1/n) Σ_{m=1}^{n} P^m(x,y)

        ⇒ π(y) = Σ_x π(x) lim_{n→∞} (1/n) Σ_{m=1}^{n} P^m(x,y)
               = Σ_x π(x) (1/m_y)
               = 1/m_y
◮ To complete the proof, we need to show Σ_y 1/m_y = 1.
◮ We skip this step in the proof.
◮ The theorem shows that an irreducible positive recurrent
chain has a unique stationary distribution
◮ Corollary: An irreducible finite chain has a unique
stationary distribution
◮ Corollary: An irreducible chain has a stationary
distribution if and only if it is positive recurrent



◮ If π 1 and π 2 are stationary distributions, then so is
απ 1 + (1 − α)π 2 (easily verified)
◮ Let C be a closed irreducible set of positive recurrent
states.
Then there is a unique stationary distribution π that
satisfies π(y) = 0, ∀y ∉ C.
◮ Any other stationary distribution of the chain is a convex
combination of the stationary distributions concentrated
on each of the closed irreducible sets of positive recurrent
states.
◮ This answers all questions about existence and uniqueness
of stationary distributions



◮ Consider an irreducible positive recurrent chain.
◮ It has a unique stationary distribution and
  (1/n) Σ_{m=1}^{n} P^m(x,y) converges to π(y).
◮ The next question is convergence of π_n:

        lim_{n→∞} π_n(y) = lim_{n→∞} Σ_x π_0(x) P^n(x,y) = ?

◮ If P^n(x,y) converges to g(y), then that would be the
  stationary distribution and π_n converges to it.
  But (1/n) Σ_{m=1}^{n} a_m may have a limit though lim_{n→∞} a_n
  may not exist.
  For example, a_n = (−1)^n


◮ Consider a chain with transition probabilities

        P = [  0    1    0    0
              1/3   0   2/3   0
               0   2/3   0   1/3
               0    0    1    0  ]

  One can show π^T = ( 1/8  3/8  3/8  1/8 )
◮ However, P^n goes to different limits based on whether n
  is even or odd


◮ The chain is the following

  [Diagram: states 0–1–2–3 in a line; 0 → 1 w.p. 1, 1 → 0 w.p. 1/3,
   1 → 2 w.p. 2/3, 2 → 1 w.p. 2/3, 2 → 3 w.p. 1/3, 3 → 2 w.p. 1]

        P = [  0    1    0    0  ]       P² = [ 1/3   0   2/3   0  ]
            [ 1/3   0   2/3   0  ]            [  0   7/9   0   2/9 ]
            [  0   2/3   0   1/3 ]            [ 2/9   0   7/9   0  ]
            [  0    0    1    0  ]            [  0   2/3   0   1/3 ]

◮ We can return to a state only after an even number of time
  steps
◮ That is why P^n does not go to a limit
◮ Such a chain is called a periodic chain
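◮ This can be checked numerically; a small sketch with NumPy (the matrices are exactly the P and π of this slide): large even and odd powers of P settle to two different limits, while the Cesàro averages (1/n) Σ P^m still converge to the stationary distribution.

```python
import numpy as np

# The periodic 4-state chain from the slide.
P = np.array([[0, 1, 0, 0],
              [1/3, 0, 2/3, 0],
              [0, 2/3, 0, 1/3],
              [0, 0, 1, 0]])
pi = np.array([1/8, 3/8, 3/8, 1/8])

P_even = np.linalg.matrix_power(P, 1000)  # settles to the "even" limit
P_odd = np.linalg.matrix_power(P, 1001)   # settles to a different "odd" limit

# Cesaro average (1/n) sum_{m=1}^{n} P^m converges despite the oscillation.
n, Pm, avg = 2000, np.eye(4), np.zeros((4, 4))
for _ in range(n):
    Pm = Pm @ P
    avg += Pm
avg /= n
```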


◮ We define the period of a state x as

        d_x = gcd{ n ≥ 1 : P^n(x,x) > 0 }

◮ If P(x,x) > 0 then d_x = 1
◮ If x leads to y and y leads to x, then d_x = d_y
◮ Let P^{n_1}(x,y) > 0, P^{n_2}(y,x) > 0.
  Then P^{n_1+n_2}(x,x) > 0 ⇒ d_x divides n_1 + n_2.
◮ For any n s.t. P^n(y,y) > 0, we get P^{n_1+n+n_2}(x,x) > 0
◮ Hence, d_x divides n for all n s.t. P^n(y,y) > 0 ⇒ d_x ≤ d_y
◮ Similarly, d_y ≤ d_x and hence d_y = d_x
◮ All states in an irreducible chain have the same period.
◮ If the period is 1 then the chain is called aperiodic


◮ The extra condition we need for convergence of πn is
aperiodicity
◮ For an aperiodic, irreducible, positive recurrent chain,
there is a unique stationary distribution and πn converges
to it irrespective of what π0 is.
◮ An aperiodic, irreducible, positive recurrent chain is called
an ergodic chain



Example

◮ Consider the umbrella problem

               0     1     2     3     4
        0  [   0     0     0     0     1   ]
        1  [   0     0     0    1−p    p   ]
    P = 2  [   0     0    1−p    p     0   ]
        3  [   0    1−p    p     0     0   ]
        4  [  1−p    p     0     0     0   ]

◮ This is an irreducible, aperiodic, positive recurrent chain
◮ We want to calculate the probability of getting caught in
a rain without an umbrella.
◮ This would be the steady state probability of state 0
multiplied by p
◮ We are using the fact that this chain converges to the
stationary distribution starting with any initial
probabilities.



 
               0     1     2     3     4
        0  [   0     0     0     0     1   ]
        1  [   0     0     0    1−p    p   ]
    P = 2  [   0     0    1−p    p     0   ]
        3  [   0    1−p    p     0     0   ]
        4  [  1−p    p     0     0     0   ]

  The stationary distribution satisfies π^T P = π^T:

        π(0) = (1 − p)π(4)
        π(1) = (1 − p)π(3) + pπ(4)   ⇒  π(3) = π(1)
        π(2) = (1 − p)π(2) + pπ(3)
        π(3) = (1 − p)π(1) + pπ(2)   ⇒  π(2) = π(1)
        π(4) = π(0) + pπ(1)          ⇒  π(4) = π(1)

  This gives 4π(1) + (1 − p)π(1) = 1 and hence

        π(i) = 1/(5 − p),  i = 1, 2, 3, 4,   and   π(0) = (1 − p)/(5 − p)
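◮ The closed form can be cross-checked numerically; a sketch with NumPy (p = 0.3 is an arbitrary illustrative value): solve π^T P = π^T together with Σ π = 1 and compare with π(i) = 1/(5−p), π(0) = (1−p)/(5−p).

```python
import numpy as np

p = 0.3  # illustrative probability of rain on any trip
P = np.array([[0,   0,   0,   0,   1  ],
              [0,   0,   0,   1-p, p  ],
              [0,   0,   1-p, p,   0  ],
              [0,   1-p, p,   0,   0  ],
              [1-p, p,   0,   0,   0  ]])

# Solve pi^T P = pi^T together with sum(pi) = 1 (least squares, exact here).
A = np.vstack([P.T - np.eye(5), np.ones(5)])
b = np.concatenate([np.zeros(5), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

wet = p * pi[0]  # long-run probability of a wet walk: p*(1-p)/(5-p)
```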
Birth-Death chains

◮ The following is a finite birth-death chain

  [Diagram: states 0, 1, . . . , N in a line; an interior state i moves to
   i+1 w.p. p_i, to i−1 w.p. q_i, and stays put w.p. r_i = 1 − p_i − q_i;
   state 0 moves up w.p. p_0 (stays w.p. 1 − p_0) and state N moves down
   w.p. q_N (stays w.p. 1 − q_N)]

◮ We assume 0 < p_i, q_i < 1, ∀i.
◮ Then the chain is irreducible, positive recurrent
◮ It is also aperiodic
◮ We can derive a general form for its stationary
  probabilities


birth-death chains – stationary distribution

  [Diagram: the same finite birth-death chain as before]

        π(y) = Σ_x π(x) P(x, y)

        π(0) = π(0)(1 − p_0) + π(1)q_1
                 ⇒  π(1)q_1 − π(0)p_0 = 0
        π(1) = π(0)p_0 + π(1)(1 − p_1 − q_1) + π(2)q_2
                 ⇒  π(1)q_1 − π(0)p_0 = π(2)q_2 − π(1)p_1
                 ⇒  π(2)q_2 − π(1)p_1 = 0
        π(2) = π(1)p_1 + π(2)(1 − p_2 − q_2) + π(3)q_3
                 ⇒  π(2)q_2 − π(1)p_1 = π(3)q_3 − π(2)p_2 = 0


◮ For any x, the relevant part of the chain is

  [Diagram: states x−1, x, x+1 with up-probabilities p_{x−1}, p_x, p_{x+1},
   down-probabilities q_{x−1}, q_x, q_{x+1} and stay-probabilities
   r_{x−1}, r_x, r_{x+1}]

◮ We get

        π(x) = π(x − 1)p_{x−1} + π(x)(1 − p_x − q_x) + π(x + 1)q_{x+1}

◮ This gives us the general recurrence

        π(x + 1)q_{x+1} − π(x)p_x = π(x)q_x − π(x − 1)p_{x−1} = 0


◮ Thus we get

        π(1)q_1 − π(0)p_0 = 0  ⇒  π(1) = (p_0/q_1) π(0)
        π(2)q_2 − π(1)p_1 = 0  ⇒  π(2) = (p_1/q_2) π(1) = (p_0 p_1)/(q_1 q_2) π(0)

◮ Iterating like this, we get

        π(n) = η_n π(0),  where  η_n = (p_0 p_1 · · · p_{n−1}) / (q_1 q_2 · · · q_n),  n = 1, 2, · · · , N

◮ With η_0 = 1, we get π(0) Σ_{j=0}^{N} η_j = 1 and hence

        π(0) = 1 / Σ_{j=0}^{N} η_j   and   π(n) = η_n π(0),  n = 1, · · · , N

◮ Note that this process is applicable even for infinite chains
  with state space {0, 1, 2, · · · } (but there may not be a
  solution)
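◮ The product formula is easy to check numerically; a sketch (the p_i, q_i values below are arbitrary illustrations): build π from the η_n products, rebuild P with stay probabilities r_x = 1 − p_x − q_x, and verify π^T P = π^T.

```python
import numpy as np

# Finite birth-death chain on {0,...,N}: p[x] is the up-probability of
# state x (x = 0..N-1), q[x-1] is the down-probability of state x (x = 1..N).
N = 5
p = [0.6, 0.5, 0.4, 0.5, 0.3]   # p_0 .. p_{N-1}
q = [0.2, 0.3, 0.3, 0.4, 0.5]   # q_1 .. q_N

# eta_n = (p_0...p_{n-1}) / (q_1...q_n), with eta_0 = 1.
eta = [1.0]
for n in range(1, N + 1):
    eta.append(eta[-1] * p[n - 1] / q[n - 1])
pi = np.array(eta) / sum(eta)

# Rebuild the transition matrix and check stationarity.
P = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    up = p[x] if x < N else 0.0
    down = q[x - 1] if x > 0 else 0.0
    P[x, x] = 1.0 - up - down            # stay probability r_x
    if x < N:
        P[x, x + 1] = up
    if x > 0:
        P[x, x - 1] = down
```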
◮ Consider a birth-death chain

  [Diagram: states x−1, x, x+1 as before, with up-probabilities p,
   down-probabilities q and stay-probabilities r]

◮ The chain may be infinite or finite
◮ Let a, b ∈ S with a < b. Assume p_x, q_x > 0, a < x < b.
◮ Define

        U(x) = P_x[T_a < T_b],  a < x < b,   U(a) = 1,  U(b) = 0

◮ We want to derive a formula for U(x)
◮ This can be useful, e.g., in the gambler’s ruin chain


  [Diagram: the same birth-death chain]

        U(x) = P_x[T_a < T_b] = Pr[T_a < T_b | X_0 = x]
             = Σ_{y=x−1}^{x+1} Pr[T_a < T_b | X_1 = y] Pr[X_1 = y | X_0 = x]
             = U(x − 1)q_x + U(x)r_x + U(x + 1)p_x
             = U(x − 1)q_x + U(x)(1 − p_x − q_x) + U(x + 1)p_x

        ⇒ q_x [U(x) − U(x − 1)] = p_x [U(x + 1) − U(x)]

        ⇒ U(x + 1) − U(x) = (q_x/p_x) [U(x) − U(x − 1)]


        U(x + 1) − U(x) = (q_x/p_x) [U(x) − U(x − 1)]
                        = (q_x q_{x−1})/(p_x p_{x−1}) [U(x − 1) − U(x − 2)]
                        = (q_x q_{x−1} · · · q_{a+1})/(p_x p_{x−1} · · · p_{a+1}) [U(a + 1) − U(a)]

  Let

        γ_y = (q_y q_{y−1} · · · q_{a+1})/(p_y p_{y−1} · · · p_{a+1}),  a < y < b,   γ_a = 1

  Now we get

        U(x + 1) − U(x) = (γ_x/γ_a) [U(a + 1) − U(a)]


        U(x + 1) − U(x) = (γ_x/γ_a) [U(a + 1) − U(a)]

◮ By taking x = b − 1, b − 2, · · · , a + 1, a,

        U(b) − U(b − 1)     = (γ_{b−1}/γ_a) [U(a + 1) − U(a)]
        U(b − 1) − U(b − 2) = (γ_{b−2}/γ_a) [U(a + 1) − U(a)]
          ...
        U(a + 1) − U(a)     = (γ_a/γ_a) [U(a + 1) − U(a)]

◮ Adding all these we get

        0 − 1 = U(b) − U(a) = (1/γ_a) [U(a + 1) − U(a)] Σ_{x=a}^{b−1} γ_x

        ⇒ U(a) − U(a + 1) = γ_a / Σ_{x=a}^{b−1} γ_x
◮ Using these, we get

        U(x) − U(x + 1) = (γ_x/γ_a) [U(a) − U(a + 1)]
                        = (γ_x/γ_a) · γ_a / Σ_{x=a}^{b−1} γ_x = γ_x / Σ_{x=a}^{b−1} γ_x

◮ Putting x = b − 1, b − 2, · · · , y in the above,

        U(b − 1) − U(b)     = γ_{b−1} / Σ_{x=a}^{b−1} γ_x
        U(b − 2) − U(b − 1) = γ_{b−2} / Σ_{x=a}^{b−1} γ_x
          ...
        U(y) − U(y + 1)     = γ_y / Σ_{x=a}^{b−1} γ_x

◮ Adding these we get

        U(y) − U(b) = U(y) = Σ_{x=y}^{b−1} γ_x / Σ_{x=a}^{b−1} γ_x,   a < y < b
◮ We are considering birth-death chains

  [Diagram: the same birth-death chain]

◮ We have derived, for a < y < b,

        U(y) = P_y[T_a < T_b] = Σ_{x=y}^{b−1} γ_x / Σ_{x=a}^{b−1} γ_x,
        where  γ_x = (q_x q_{x−1} · · · q_{a+1})/(p_x p_{x−1} · · · p_{a+1})

◮ Hence we also get

        P_y[T_b < T_a] = Σ_{x=a}^{y−1} γ_x / Σ_{x=a}^{b−1} γ_x


◮ Suppose this is a Gambler’s ruin chain:
  p_x = p, q_x = q, ∀x

  [Diagram: states 0, 1, · · · , N; 0 and N are absorbing; each interior
   state i moves up w.p. p and down w.p. q = 1 − p]

◮ Then, γ_x = (q/p)^x
◮ Hence, for a Gambler’s ruin chain we get, e.g.,

        P_i[T_N < T_0] = Σ_{x=0}^{i−1} γ_x / Σ_{x=0}^{N−1} γ_x = ((q/p)^i − 1) / ((q/p)^N − 1)

◮ This is the probability of the gambler being successful
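◮ The closed form can be checked against a direct linear solve; a sketch (N = 10 and p = 0.45 are illustrative): first-step analysis gives W(x) = p W(x+1) + q W(x−1) with W(0) = 0, W(N) = 1, and its solution should match ((q/p)^i − 1)/((q/p)^N − 1) (valid for p ≠ q).

```python
import numpy as np

N, p = 10, 0.45
q = 1 - p

def closed_form(i):
    # P_i[T_N < T_0] for the gambler's ruin chain, p != q.
    r = q / p
    return (r**i - 1) / (r**N - 1)

# First-step analysis: W(x) - p W(x+1) - q W(x-1) = 0 for 0 < x < N,
# with boundary values W(0) = 0 and W(N) = 1 moved to the RHS.
A = np.zeros((N - 1, N - 1))
b = np.zeros(N - 1)
for k, x in enumerate(range(1, N)):
    A[k, k] = 1.0
    if x + 1 < N:
        A[k, k + 1] = -p
    else:
        b[k] += p          # boundary W(N) = 1
    if x - 1 > 0:
        A[k, k - 1] = -q   # boundary W(0) = 0 contributes nothing
W = np.linalg.solve(A, b)  # W[i-1] = P_i[T_N < T_0]
```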


◮ Consider the following chain over {0, 1, · · · }

  [Diagram: birth-death chain on {0, 1, 2, · · · }; state 0 moves up w.p. p_0
   and stays w.p. 1 − p_0; an interior state i moves up w.p. p_i, down
   w.p. q_i and stays w.p. r_i]

◮ This is an infinite irreducible birth-death chain
◮ We want to know whether the chain is transient or
  recurrent etc.
◮ We can use the earlier analysis for this too.

        P_1[T_0 < T_n] = Σ_{x=1}^{n−1} γ_x / Σ_{x=0}^{n−1} γ_x,  ∀n > 1
                       = (Σ_{x=0}^{n−1} γ_x − γ_0) / Σ_{x=0}^{n−1} γ_x = 1 − 1/Σ_{x=0}^{n−1} γ_x




◮ Consider this chain started in state 1.

        [T_0 < T_n] ⊂ [T_0 < T_{n+1}],  n = 2, 3, · · ·

  since the chain cannot hit n + 1 without hitting n.
◮ Also, 1 ≤ T_2 < T_3 < · · · < T_n
◮ Hence [T_0 < ∞] is the same as [T_0 < T_n for some n]

        P_1[T_0 < T_n for some n] = P_1( ∪_{n>1} [T_0 < T_n] )
                                  = P_1( lim_{n→∞} [T_0 < T_n] )
                                  = lim_{n→∞} P_1([T_0 < T_n])
                                  = lim_{n→∞} ( 1 − 1/Σ_{x=0}^{n−1} γ_x )

        ⇒ P_1[T_0 < ∞] = 1 − 1/Σ_{x=0}^{∞} γ_x


Theorem: The chain is recurrent iff Σ_{x=0}^{∞} γ_x = ∞

Proof
◮ Suppose the chain is recurrent. Since it is irreducible,

        P_1[T_0 < ∞] = 1  ⇒  Σ_{x=0}^{∞} γ_x = ∞

◮ Suppose Σ_{x=0}^{∞} γ_x = ∞ ⇒ P_1[T_0 < ∞] = 1. Then

        P_0[T_0 < ∞] = P(0, 0) + P(0, 1) P_1[T_0 < ∞]
                     = P(0, 0) + P(0, 1) = 1

◮ This implies state 0 is recurrent and hence the chain is
  recurrent because it is irreducible.
◮ Note that we have used the fact that the chain is infinite
  only to the right.
◮ The chain is transient if Σ_{x=0}^{∞} γ_x < ∞
◮ Let p_x = p, q_x = q  ⇒  γ_x = (q/p)^x

        Transient if  Σ_{x=0}^{∞} (q/p)^x < ∞  ⇔  q < p

        Recurrent if  Σ_{x=0}^{∞} (q/p)^x = ∞  ⇔  q ≥ p

◮ Intuitively clear
◮ This chain with q < p is an example of an irreducible
  chain that is wholly transient


◮ We know the chain is recurrent if Σ_{x=0}^{∞} (q/p)^x = ∞
◮ When will this chain be positive recurrent?
◮ We know that an irreducible chain is positive recurrent if
and only if it has a stationary distribution.
◮ We can check if it has a stationary distribution
◮ The equations that we derived earlier hold for this infinite
case also.



◮ We derived earlier the equations that a stationary
  distribution of this chain (if it exists) has to satisfy:

        π(n) = η_n π(0),  where  η_n = (p_0 p_1 · · · p_{n−1})/(q_1 q_2 · · · q_n),  n = 1, 2, · · ·

  Setting η_0 = 1, we get π(0) Σ_{j=0}^{∞} η_j = 1
  Hence a stationary distribution exists iff Σ_{j=0}^{∞} η_j < ∞
◮ Let p_x = p, q_x = q. Then

        Σ_{j=0}^{∞} η_j = Σ_{j=0}^{∞} (p/q)^j < ∞  ⇔  p < q

◮ Thus in this special case, the chain is
  ◮ transient if p > q; recurrent if p ≤ q
  ◮ positive recurrent if p < q
  ◮ null recurrent if p = q
◮ This analysis can handle chains which are infinite in one
  direction
◮ Consider the following random walk chain

  [Diagram: states · · · , −1, 0, +1, · · · ; every state moves right w.p. p
   and left w.p. 1 − p]

◮ The state space here is {· · · , −1, 0, +1, · · · }
◮ The chain is irreducible and periodic with period 2
◮ P^{2n+1}(0, 0) = 0 and P^{2n}(0, 0) = C(2n, n) p^n (1 − p)^n.
  State 0 is recurrent iff Σ_n P^{2n}(0, 0) = ∞
◮ We can show that the chain is transient if p ≠ 0.5 and is
  recurrent if p = 0.5.


  [Diagram: the same random walk]

◮ We use the Stirling approximation: n! ∼ n^{n+0.5} e^{−n} √(2π).

        P^{2n}(0, 0) = C(2n, n) p^n (1 − p)^n = ((2n)!/(n! n!)) (p(1 − p))^n
                     ∼ ((2n)^{2n+0.5} e^{−2n} √(2π)) / (n^{n+0.5} n^{n+0.5} e^{−n} e^{−n} (2π)) · (p(1 − p))^n
                     = (2^{2n} / √(πn)) (p(1 − p))^n
                     = (4p(1 − p))^n / √(πn)


◮ We got

        P^{2n}(0, 0) ∼ (4p(1 − p))^n / √(πn)

◮ When a_n ∼ b_n, we have Σ_n a_n = ∞ iff Σ_n b_n = ∞
◮ So, for state 0 to be recurrent, we need
  Σ_n (4p(1 − p))^n / √(πn) = ∞.
◮ If p ≠ 0.5, then 4p(1 − p) < 1, and Σ_n α^n/√n < ∞ if α < 1.
◮ Hence, if p ≠ 0.5, then Σ_n P^{2n}(0, 0) < ∞
◮ If p = 0.5, then 4p(1 − p) = 1 and hence Σ_n P^{2n}(0, 0) = ∞.
◮ The chain is recurrent only when p = 0.5
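◮ The asymptotic is easy to check numerically; a sketch: compare the exact P^{2n}(0,0) = C(2n,n) p^n (1−p)^n with (4p(1−p))^n/√(πn). The ratio of the two does not depend on p and tends to 1 as n grows.

```python
import math

# Exact 2n-step return probability vs. the Stirling asymptotic.
def exact(n, p):
    return math.comb(2 * n, n) * (p * (1 - p))**n

def approx(n, p):
    return (4 * p * (1 - p))**n / math.sqrt(math.pi * n)

ratio = exact(200, 0.5) / approx(200, 0.5)
print(ratio)  # close to 1; the Stirling error is O(1/n)
```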


◮ The one-dimensional random walk chain is

  [Diagram: states · · · , −1, 0, +1, · · · ; every state moves right w.p. p
   and left w.p. 1 − p]

◮ This chain is transient if p ≠ 0.5 and is recurrent if
  p = 0.5.


◮ Let {X_n, n ≥ 0} be an irreducible Markov chain on a
  finite state space S with stationary distribution π.
◮ Let r : S → ℜ be a bounded function.
◮ Suppose we want E[r(X)] with respect to the stationary
  distribution π  (E[r(X)] = Σ_{j∈S} r(j)π(j))
◮ Let N_n(j) be as earlier. Then

        (1/n) Σ_{m=1}^{n} r(X_m) = (1/n) Σ_{j∈S} N_n(j) r(j)

        ⇒ lim_{n→∞} (1/n) Σ_{m=1}^{n} r(X_m) = Σ_{j∈S} r(j) lim_{n→∞} N_n(j)/n = Σ_{j∈S} r(j)π(j)

◮ This is known as the ergodic theorem for Markov chains
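◮ The ergodic theorem is easy to see in simulation; a sketch (the three-state chain and the function r are arbitrary illustrations; this P has stationary distribution π = (0.3, 0.3, 0.4), which can be verified from π^T P = π^T): the time average of r(X_m) approaches Σ_j r(j)π(j) = 1.7.

```python
import random

# Illustrative three-state chain; its stationary distribution is (0.3, 0.3, 0.4).
P = [[0.2, 0.5, 0.3],
     [0.4, 0.1, 0.5],
     [0.3, 0.3, 0.4]]
r = [1.0, -2.0, 5.0]  # a bounded function r : S -> R

def step(rng, x):
    # Sample the next state from row x of P by inverting the CDF.
    u, c = rng.random(), 0.0
    for y, pxy in enumerate(P[x]):
        c += pxy
        if u < c:
            return y
    return len(P) - 1

rng = random.Random(1)
x, total, n = 0, 0.0, 300_000
for _ in range(n):
    total += r[x]
    x = step(rng, x)
time_avg = total / n
print(time_avg)  # should approach 0.3*1 + 0.3*(-2) + 0.4*5 = 1.7
```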


MCMC Sampling

◮ Consider a distribution over a (finite) S:  π(x) = b(x)/Z
◮ Since this is a distribution, Z = Σ_{x∈S} b(x)
◮ We assume we can efficiently calculate b(x) for any x,
  but computation of Z is intractable or computationally
  expensive.
  E.g., the Boltzmann distribution: b(x) = e^{−E(x)/KT}
◮ We want E[g(X)] w.r.t. distribution π (for any g)

        E[g(X)] = Σ_x g(x) π(x) ≈ (1/n) Σ_{i=1}^{n} g(X_i),   X_1, · · · , X_n ∼ π

  This is the Monte Carlo method for expectations.
◮ One way to generate samples is to design an ergodic
  Markov chain with stationary distribution π
  – Markov Chain Monte Carlo sampling
◮ Suppose {X_n} is an irreducible, aperiodic, positive
  recurrent Markov chain with stationary dist π(x) = b(x)/Z
◮ Then we have

        lim_{n→∞} (1/n) Σ_{m=1}^{n} g(X_m) = Σ_x g(x)π(x)

◮ Hence, if we can design a Markov chain with a given
  stationary distribution, we can use that to calculate the
  expectation.


◮ We can use the chain to generate samples from the
  distribution π.
  We can approximate the expectation as

        Σ_x g(x)π(x) ≈ (1/n) Σ_{i=1}^{n} g(X_{M+i})

  with M large (so that the chain is in steady state)
  When we take a sample mean, (1/n) Σ_{i=1}^{n} Z_i, we want
  the Z_i to be uncorrelated
◮ We can, for example, use

        Σ_x g(x)π(x) ≈ (1/n) Σ_{i=1}^{n} g(X_{M+Ki})

◮ For all these, we need to design a Markov chain with π as
  its stationary distribution
◮ Let Q = [q(i, j)] be the transition probability matrix of an
  irreducible Markov chain over S.
◮ Q is called the proposal distribution
◮ We start with arbitrary X_0 and generate
  X_{n+1}, n = 0, 1, 2, · · · , iteratively as follows
◮ If X_n = i, we generate Y with Pr[Y = k] = q(i, k)
◮ Let the generated value for Y be j. Set

        X_{n+1} = j    with probability α(i, j)
        X_{n+1} = X_n  with probability 1 − α(i, j)

◮ α(i, j) is called the acceptance probability
◮ We want to choose α(i, j) to make {X_n} an ergodic
  Markov chain with stationary probabilities π


◮ The stationary distribution π satisfies (with transition
  probabilities P)

        π(y) = Σ_x π(x) P(x, y),  ∀y ∈ S

◮ Suppose there is a distribution g(·) that satisfies

        g(y) P(y, x) = g(x) P(x, y),  ∀x, y ∈ S

  This is called detailed balance
◮ Summing both sides above over x gives

        g(y) = Σ_x g(y) P(y, x) = Σ_x g(x) P(x, y),  ∀y

◮ Thus if g(·) satisfies detailed balance, then it must be the
  stationary distribution
◮ Note that it is not necessary for a stationary distribution
  to satisfy detailed balance
◮ Any stationary distribution has to satisfy

        π(y) = Σ_x π(x) P(x, y),  ∀y ∈ S

◮ If I can find a π that satisfies

        π(x)P(x, y) = π(y)P(y, x),  ∀x, y ∈ S, x ≠ y

  then that would be the stationary distribution
◮ This is called detailed balance


◮ Recall our algorithm for generating X_n, n = 0, 1, · · ·
◮ Start with arbitrary X_0 and generate X_{n+1} from X_n
◮ If X_n = i, we generate Y with Pr[Y = k] = q(i, k)
◮ Let the generated value for Y be j. Set

        X_{n+1} = j    with probability α(i, j)
        X_{n+1} = X_n  with probability 1 − α(i, j)

◮ Hence the transition probabilities for {X_n} are

        P(i, j) = q(i, j) α(i, j),  i ≠ j
        P(i, i) = q(i, i) + Σ_{j≠i} q(i, j) (1 − α(i, j))

◮ π(i) = b(i)/Z is the desired stationary distribution
◮ So, we can try to satisfy

        π(i) P(i, j) = π(j) P(j, i),  ∀i, j, i ≠ j

  that is,  b(i)q(i, j) α(i, j) = b(j)q(j, i) α(j, i)


◮ We want to satisfy

        b(i)q(i, j) α(i, j) = b(j)q(j, i) α(j, i)

◮ Choose

        α(i, j) = min{ π(j)q(j, i) / (π(i)q(i, j)), 1 } = min{ b(j)q(j, i) / (b(i)q(i, j)), 1 }

◮ Note that one of α(i, j), α(j, i) is 1.
  Suppose α(i, j) = π(j)q(j, i)/(π(i)q(i, j)) < 1; then α(j, i) = 1 and

        π(i) q(i, j) α(i, j) = π(j) q(j, i)
                             = π(j) q(j, i) α(j, i)

◮ Note that π(i) above can be replaced by b(i)


Metropolis-Hastings Algorithm

◮ Start with arbitrary X_0 and generate X_{n+1} from X_n
◮ If X_n = i, we generate Y with Pr[Y = k] = q(i, k)
◮ Let the generated value for Y be j. Set

        X_{n+1} = j    with probability α(i, j)
        X_{n+1} = X_n  with probability 1 − α(i, j)

  where Q = [q(i, j)] is the transition probability matrix of
  an irreducible chain and

        α(i, j) = min{ π(j)q(j, i) / (π(i)q(i, j)), 1 }

◮ Then {X_n} would be an irreducible, aperiodic chain with
  stationary distribution π.
◮ Q is called the proposal chain and the α(i, j) are called
  acceptance probabilities
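◮ A minimal sketch of the algorithm (the energies, the temperature and the uniform proposal are illustrative choices; with a symmetric proposal, α(i, j) reduces to min{b(j)/b(i), 1}, and Z is never needed by the sampler): the empirical state frequencies should approach b(x)/Z.

```python
import math
import random

# Metropolis-Hastings with a uniform proposal, targeting the Boltzmann
# distribution pi(x) = e^{-E(x)/T} / Z on a small finite state space.
E = [0.0, 1.0, 2.0, 0.5, 1.5]   # illustrative energies
T = 1.0
S = len(E)

def b(x):
    return math.exp(-E[x] / T)  # unnormalized target

rng = random.Random(7)
x = 0
counts = [0] * S
n = 400_000
for _ in range(n):
    y = rng.randrange(S)                      # uniform proposal q(i, j) = 1/S
    if rng.random() < min(1.0, b(y) / b(x)):  # acceptance alpha(i, j)
        x = y
    counts[x] += 1

Z = sum(b(x) for x in range(S))               # used only to check the result
target = [b(x) / Z for x in range(S)]
freq = [c / n for c in counts]
```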
◮ Consider the Boltzmann distribution: b(x) = e^{−E(x)/KT}
◮ Take the proposal to be uniform: from any state, we go to
  all other states with equal probabilities
◮ Then,

        α(x, y) = min{ b(y)/b(x), 1 } = min{ e^{−(E(y)−E(x))/KT}, 1 }

◮ In state x you generate a random new state y.
  If E(y) ≤ E(x) you always go there;
  if E(y) > E(x), you accept with probability e^{−(E(y)−E(x))/KT}
◮ An interesting way to simulate the Boltzmann distribution
◮ We could have chosen Q to be ‘uniform over neighbours’


◮ Suppose E : S → ℜ is some function.
◮ We want to find x ∈ S where E is globally minimized.
◮ A gradient descent type method tries to find a locally
minimizing direction and hence gives only a ‘local’
minimum.
◮ The Metropolis-Hastings algorithm gives another viewpoint
  on how such optimization problems can be handled.
◮ We can think of E as the energy function in a Boltzmann
distribution



◮ Let b(x) = e^{−E(x)/T} where T is a parameter called
  ‘temperature’
◮ Let {X_n} be a Markov chain with stationary dist π(x) = b(x)/Z
◮ We can find the relative occupation of different states by
  the chain by collecting statistics during steady state
◮ We know

        π(x_1)/π(x_2) = b(x_1)/b(x_2) = e^{−(E(x_1)−E(x_2))/T}

◮ We spend more time in the global minimum.
  We can increase the relative fraction of time spent in the
  global minimum by decreasing T (there is a price to pay!)
◮ Gives rise to an interesting optimization technique called
  simulated annealing


◮ In most applications of MCMC, x ∈ S is a vector.
◮ One normally changes one component at a time. That is
  how neighbours can be defined
◮ A special case of proposal distribution is the conditional
  distribution.
◮ Suppose X = (X_1, · · · , X_N). To propose a value for X_i,
  we use f_{X_i | X_{−i}}  (= f_{X_i | X_1, ··· , X_{i−1}, X_{i+1}, ··· , X_N})
◮ Here the conditional distribution is calculated using the
  target π as the joint distribution.
◮ With such a proposal distribution, one can show that
  α(i, j) is always 1
◮ This is known as Gibbs sampling
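◮ A minimal Gibbs-sampling sketch (the 3×3 table b is an arbitrary illustrative target): each coordinate is resampled from its exact conditional under π, computed from the unnormalized joint, and the empirical cell frequencies approach b(i, j)/Z without any accept/reject step.

```python
import random

# Gibbs sampling on a small 2-D discrete target pi(x1, x2) = b[x1][x2] / Z.
# Each coordinate is resampled from its exact conditional given the other.
b = [[1.0, 2.0, 1.0],
     [2.0, 4.0, 2.0],
     [1.0, 2.0, 3.0]]

def sample_index(rng, weights):
    # Draw an index proportionally to (unnormalized) weights.
    u, c = rng.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        c += w
        if u < c:
            return i
    return len(weights) - 1

rng = random.Random(3)
x1, x2 = 0, 0
counts = [[0] * 3 for _ in range(3)]
n = 200_000
for _ in range(n):
    x1 = sample_index(rng, [b[i][x2] for i in range(3)])  # from pi(x1 | x2)
    x2 = sample_index(rng, [b[x1][j] for j in range(3)])  # from pi(x2 | x1)
    counts[x1][x2] += 1

Z = sum(sum(row) for row in b)  # only needed to check the empirical result
```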
