Markov Chains
Andreas Klappenecker
1 / 58
Stochastic Processes
2 / 58
Markov Chains
Definition
A discrete-time process X = {X_0, X_1, X_2, X_3, ...} is called a Markov chain if and
only if the state at time t depends only on the state at time t - 1. More
precisely, the transition probabilities satisfy

Pr[X_t = a_t | X_{t-1} = a_{t-1}, ..., X_0 = a_0] = Pr[X_t = a_t | X_{t-1} = a_{t-1}]

for all states a_0, a_1, ..., a_t.

In other words, Markov chains are "memoryless" discrete-time processes. This
means that the current state (at time t - 1) is sufficient to determine the
probability of the next state (at time t). All knowledge of the past states is
subsumed in the current state.
3 / 58
Homogeneous Markov Chains
Definition
A Markov chain is called homogeneous if and only if the transition
probabilities are independent of the time t, that is, there exist
constants P_{i,j} such that

P_{i,j} = Pr[X_t = j | X_{t-1} = i]

holds for all times t.
4 / 58
Discrete State Space
Definition
We say that a Markov chain has a discrete state space if and
only if the set of values of the random variables is countably infinite,
say {v_0, v_1, v_2, ...}. For simplicity, we will identify such a state
space with {0, 1, 2, ...}.
5 / 58
Finite State Space
Definition
We say that a Markov chain is finite if and only if the set of values
of the random variables is a finite set {v_0, v_1, v_2, ..., v_{n-1}}.
6 / 58
Transition Probabilities
The m-step transition probability P^m_{i,j} = Pr[X_{t+m} = j | X_t = i] is the
probability that the chain moves from state i to state j in exactly m steps.
8 / 58
Small Example
Example
P = \begin{pmatrix}
0 & 1/4 & 0 & 3/4 \\
1/2 & 0 & 1/3 & 1/6 \\
0 & 0 & 1 & 0 \\
0 & 1/2 & 1/4 & 1/4
\end{pmatrix}
\qquad
P^{20} = \begin{pmatrix}
0.00172635 & 0.00268246 & 0.992286 & 0.00330525 \\
0.00139476 & 0.00216748 & 0.993767 & 0.00267057 \\
0 & 0 & 1 & 0 \\
0.00132339 & 0.00205646 & 0.994086 & 0.00253401
\end{pmatrix}
9 / 58
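The powers of P are easy to check numerically. A minimal sketch with NumPy (assuming the matrix above; `matrix_power` computes the repeated matrix product):

```python
import numpy as np

# Transition matrix from the example; state 2 is absorbing.
P = np.array([
    [0.0, 1/4, 0.0, 3/4],
    [1/2, 0.0, 1/3, 1/6],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1/2, 1/4, 1/4],
])

# Every row of a transition matrix sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# (P^20)[i, j] is the probability of being in state j after
# 20 steps when the chain starts in state i.
P20 = np.linalg.matrix_power(P, 20)
print(np.round(P20, 6))
```

After 20 steps almost all probability mass has been absorbed into state 2, matching the numbers on the slide.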
Graphical Representation
A Markov chain with state space V and transition matrix P can be represented
by a labeled directed graph G = (V, E), where the edges are the transitions
with nonzero probability, E = {(u, v) | P_{u,v} > 0}, and each edge (u, v) is
labeled with P_{u,v}.

Self-loops are allowed in these directed graphs, since we might have P_{u,u} > 0.
10 / 58
Example of a Graphical Representation
[Figure: the directed graph on states 0, 1, 2, 3 whose edges are labeled with the transition probabilities below; states 2 and 3 have self-loops.]

P = \begin{pmatrix}
0 & 1/4 & 0 & 3/4 \\
1/2 & 0 & 1/3 & 1/6 \\
0 & 0 & 1 & 0 \\
0 & 1/2 & 1/4 & 1/4
\end{pmatrix}
11 / 58
Irreducible Markov Chains
12 / 58
Accessible States
We say that a state j is accessible from a state i if and only if
P^m_{i,j} > 0 for some m ≥ 0. If two states i and j are accessible from
each other, then we say that they communicate and we write i ↔ j.
13 / 58
Irreducible Markov Chains
Proposition
The communication relation is an equivalence relation.
14 / 58
Irreducible Markov Chains
Definition
A Markov chain is called irreducible if and only if all of its states
communicate with each other.

Proposition
A finite Markov chain is irreducible if and only if its graph
representation is a strongly connected graph.
15 / 58
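Strong connectivity, and hence irreducibility, can be tested mechanically: every entry of (I + P)^{n-1} is positive if and only if every state is reachable from every other state. A sketch in Python with NumPy; the two matrices are the ones used in the exercises that follow:

```python
import numpy as np

def is_irreducible(P: np.ndarray) -> bool:
    """Check irreducibility of the finite chain with transition matrix P.

    The chain is irreducible iff its graph is strongly connected,
    i.e. iff every entry of (I + P)^(n-1) is positive.
    """
    n = P.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + P, n - 1)
    return bool((M > 0).all())

# State 2 is absorbing here, so this chain is NOT irreducible.
P1 = np.array([[0.0, 1/4, 0.0, 3/4],
               [1/2, 0.0, 1/3, 1/6],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 1/2, 1/4, 1/4]])

# Here state 2 moves to state 3, and the chain IS irreducible.
P2 = np.array([[0.0, 1/4, 0.0, 3/4],
               [1/2, 0.0, 1/3, 1/6],
               [0.0, 0.0, 0.0, 1.0],
               [0.0, 1/2, 1/4, 1/4]])
```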
Exercise
P = \begin{pmatrix}
0 & 1/4 & 0 & 3/4 \\
1/2 & 0 & 1/3 & 1/6 \\
0 & 0 & 1 & 0 \\
0 & 1/2 & 1/4 & 1/4
\end{pmatrix}

[Figure: the graph of this chain.]

Question
Is this Markov chain irreducible?

Answer
No, since no other state can be reached from state 2.
16 / 58
Exercise
P = \begin{pmatrix}
0 & 1/4 & 0 & 3/4 \\
1/2 & 0 & 1/3 & 1/6 \\
0 & 0 & 0 & 1 \\
0 & 1/2 & 1/4 & 1/4
\end{pmatrix}

[Figure: the graph of this chain.]

Question
Is this Markov chain irreducible?

Answer
Yes.
17 / 58
Periodic and Aperiodic Markov Chains
18 / 58
Period
Definition
The period d(k) of a state k of a homogeneous Markov chain with
transition matrix P is given by

d(k) = \gcd\{\, m \ge 1 : P^m_{k,k} > 0 \,\}.
19 / 58
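A sketch of computing d(k) numerically from the return times up to an arbitrary cutoff (adequate for small examples; a cutoff that is too small could miss return times and overestimate the gcd):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(P: np.ndarray, k: int, max_steps: int = 50) -> int:
    """gcd of the return times m <= max_steps with (P^m)[k, k] > 0.

    Returns 0 if state k is never revisited within the cutoff.
    """
    returns = []
    M = np.eye(P.shape[0])
    for m in range(1, max_steps + 1):
        M = M @ P
        if M[k, k] > 0:
            returns.append(m)
    return reduce(gcd, returns, 0)

# Deterministic cycle 0 -> 1 -> 2 -> 3 -> 0: every state has period 4.
C = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

# The same cycle with a self-loop at state 3: every period drops to 1.
D = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [3/4, 0.0, 0.0, 1/4]])
```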
Exercise
P = \begin{pmatrix}
0 & 1/4 & 0 & 3/4 \\
1/2 & 0 & 1/3 & 1/6 \\
0 & 0 & 0 & 1 \\
0 & 1/2 & 1/4 & 1/4
\end{pmatrix}

[Figure: the graph of this chain.]

Question
What is the period of each state?

Answer
d(0) = d(1) = d(2) = d(3) = 1.
21 / 58
Exercise
P = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
3/4 & 0 & 0 & 1/4
\end{pmatrix}

[Figure: the cycle 0 → 1 → 2 → 3 → 0, with a self-loop of probability 1/4 at state 3.]

Question
What is the smallest number N_i such that P^m_{i,i} > 0 for all m ≥ N_i, for
each i ∈ {0, 1, 2, 3}?

Answer
N_0 = N_1 = N_2 = 4 and N_3 = 1, since state 3 has a self-loop and can
therefore return to itself in any positive number of steps.
22 / 58
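These values can be checked by brute force over the powers of P (a sketch; the horizon 60 is an arbitrary but ample cutoff). Note that the self-loop at state 3 gives P^m_{3,3} > 0 already for every m ≥ 1, so the brute force yields N_3 = 1:

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [3/4, 0.0, 0.0, 1/4]])

max_m = 60
M = np.eye(4)
last_zero = [0, 0, 0, 0]      # last m with (P^m)[i, i] == 0
for m in range(1, max_m + 1):
    M = M @ P
    for i in range(4):
        if M[i, i] == 0:
            last_zero[i] = m

# N_i is one past the last power with a zero diagonal entry.
N = [z + 1 for z in last_zero]
print(N)  # [4, 4, 4, 1]
```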
Aperiodic Markov Chains
Definition
A Markov chain is called aperiodic if and only if every state k has
period d(k) = 1.

Proposition
If a finite Markov chain is aperiodic, then there exists a positive
integer N such that P^m_{i,i} > 0 for all states i and all m ≥ N.

Lemma
If A is a set of positive integers that is closed under addition and
satisfies gcd A = 1, then A contains all but finitely many positive
integers.
23 / 58
Proof.
The proof of the proposition relies on the number-theoretic lemma, which
we prove first.
24 / 58
Proof of the Lemma.
Suppose that A = {a_1, a_2, a_3, ...}. Since gcd A = 1, there must exist some
positive integer k such that

gcd(a_1, a_2, ..., a_k) = 1.

By Bézout's identity, there exist integers n_1, n_2, ..., n_k such that

n_1 a_1 + n_2 a_2 + ... + n_k a_k = 1.

We can split this sum into a positive part P and a negative part N such that
P - N = 1. Since A is closed under addition, both P and N are elements of A
(each is a sum of elements of A).
25 / 58
Proof of the Lemma (Continued)
Suppose that n is a positive integer such that n ≥ N(N - 1). We can express n
in the form

n = aN + r

for some integer a and a nonnegative integer r in the range 0 ≤ r ≤ N - 1.

We must have a ≥ N - 1. Indeed, if a were less than N - 1, then we would have
n = aN + r < N(N - 1), contradicting our choice of n.

Since P - N = 1, we can express n in the form

n = aN + r = aN + r(P - N) = (a - r)N + rP.

As a - r ≥ 0, this expresses n as a sum of elements of A, so n ∈ A. Hence A
contains every integer n ≥ N(N - 1).
26 / 58
Proof of the Proposition.
For each state i, consider the set A_i of possible return times

A_i = {m ≥ 1 | P^m_{i,i} > 0}.

Since the Markov chain is aperiodic, gcd A_i = 1. If m and m' are elements of
A_i, then

P^{m+m'}_{i,i} ≥ P^m_{i,i} · P^{m'}_{i,i} > 0.

So m + m' is an element of A_i. Therefore, A_i + A_i ⊆ A_i, that is, A_i is
closed under addition.

By the lemma, A_i contains all but finitely many positive integers, so there
exists an N_i with P^m_{i,i} > 0 for all m ≥ N_i. Taking N to be the maximum
of the N_i over the finitely many states completes the proof. □
27 / 58
Aperiodic Irreducible Markov Chains
Proposition
Let X be an irreducible and aperiodic Markov chain with finite
state space and transition matrix P. Then there exists an M < ∞
such that (P^m)_{i,j} > 0 for all states i and j and all m ≥ M.
28 / 58
Proof.
Since the Markov chain is aperiodic, there exists a positive integer N such
that (P^n)_{i,i} > 0 for all states i and all n ≥ N.

Since P is irreducible, for each pair of states i and j there exists a positive
integer n_{i,j} such that P^{n_{i,j}}_{i,j} > 0. After m ≥ N + n_{i,j} steps,
we have

Pr[X_m = j | X_0 = i] ≥ Pr[X_m = j | X_{m-n_{i,j}} = i] · Pr[X_{m-n_{i,j}} = i | X_0 = i]
= P^{n_{i,j}}_{i,j} · P^{m-n_{i,j}}_{i,i} > 0,

since the first factor is positive by the choice of n_{i,j}, and the second is
positive because m - n_{i,j} ≥ N. In other words, we have P^m_{i,j} > 0, as
claimed; taking M = N + max_{i,j} n_{i,j} works for all pairs of states.
29 / 58
Stationary Distributions
30 / 58
Stationary Distribution
Definition
Suppose that X is a finite Markov chain with transition matrix P.
A row vector v = (p_0, p_1, ..., p_{n-1}) is called a stationary
distribution for P if and only if

1. the p_k are nonnegative real numbers such that \sum_{k=0}^{n-1} p_k = 1, and

2. vP = v.
31 / 58
Examples
Example
Every probability distribution on the states is a stationary
probability distribution when P is the identity matrix.
Example
If P = \begin{pmatrix} 1/2 & 1/2 \\ 1/10 & 9/10 \end{pmatrix}, then
v = (1/6, 5/6) satisfies vP = v.
32 / 58
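A stationary distribution can be computed as a left eigenvector of P for eigenvalue 1. A sketch with NumPy (left eigenvectors of P are right eigenvectors of P transposed):

```python
import numpy as np

P = np.array([[1/2,  1/2 ],
              [1/10, 9/10]])

# Right eigenvectors of P.T are left eigenvectors of P.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1))   # eigenvalue closest to 1
v = np.real(eigvecs[:, idx])
v = v / v.sum()                        # normalize to a distribution

assert np.allclose(v @ P, v)           # v is stationary
print(v)                               # approximately [1/6, 5/6]
```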
Existence and Uniqueness of Stationary Distribution
Proposition
Any aperiodic and irreducible finite Markov chain has precisely one
stationary distribution.
33 / 58
Total Variation Distance
Definition
If p = (p_0, p_1, ..., p_{n-1}) and q = (q_0, q_1, ..., q_{n-1}) are probability
distributions on a finite state space, then

d_{TV}(p, q) = \frac{1}{2} \sum_{k=0}^{n-1} |p_k - q_k|

is called the total variation distance between p and q.
34 / 58
Convergence in Total Variation
Definition
If p^{(m)} = (p_0^{(m)}, p_1^{(m)}, ..., p_{n-1}^{(m)}) is a probability distribution for each
m ≥ 1 and p = (p_0, p_1, ..., p_{n-1}) is a probability distribution, then
we say that p^{(m)} converges to p in total variation if and only if

\lim_{m \to \infty} d_{TV}(p^{(m)}, p) = 0.
35 / 58
Convergence Theorem
Proposition
Let X be a finite irreducible aperiodic Markov chain with transition
matrix P. If p^{(0)} is some initial probability distribution on the states
and p is a stationary distribution, then p^{(m)} = p^{(0)} P^m converges in
total variation to the stationary distribution:

\lim_{m \to \infty} d_{TV}(p^{(m)}, p) = 0.
36 / 58
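The convergence can be observed numerically with the 2 × 2 chain from the earlier example (a sketch; the deterministic start in state 0 is an arbitrary choice):

```python
import numpy as np

def d_tv(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

P = np.array([[1/2,  1/2 ],
              [1/10, 9/10]])
pi = np.array([1/6, 5/6])      # stationary distribution of P

p = np.array([1.0, 0.0])       # start deterministically in state 0
dists = []
for _ in range(20):
    dists.append(d_tv(p, pi))
    p = p @ P

# The distance decays geometrically, governed by the second
# eigenvalue of P (0.4 for this chain).
print(dists[0], dists[5], dists[19])
```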
Reversible Markov Chains
37 / 58
Definition
Suppose that X is a Markov chain with finite state space S and
transition matrix P. A probability distribution π on S is called
reversible for the chain if and only if

π_i P_{i,j} = π_j P_{j,i}

holds for all states i, j ∈ S.
38 / 58
Reversible Markov Chains
Proposition
Suppose that X is a Markov chain with finite state space and
transition matrix P. If π is a reversible distribution for the Markov
chain, then it is also a stationary distribution for it.
39 / 58
Proof.
We need to show that πP = π. In other words, we need to show that

π_j = \sum_{k ∈ S} π_k P_{k,j}

holds for all states j.

This is straightforward, since

π_j = \sum_{k ∈ S} π_j P_{j,k} = \sum_{k ∈ S} π_k P_{k,j},

where the first equality holds because \sum_{k ∈ S} P_{j,k} = 1 and the
second follows from reversibility.
40 / 58
Random Walks
41 / 58
Random Walks
Definition
A random walk on an undirected graph G = (V, E) is given by
the transition matrix P with

P_{u,v} = 1/d(u) if {u, v} ∈ E, and P_{u,v} = 0 otherwise,

where d(u) denotes the degree of the vertex u.
42 / 58
Properties
Proposition
For a random walk on an undirected graph with transition matrix P,
we have
1 P is irreducible if and only if G is connected,
2 P is aperiodic if and only if G is not bipartite.
Proof.
If P is irreducible, then the graphical representation is a strongly
connected directed graph, so the underlying undirected graph is
connected. The converse is clear.
43 / 58
Proof. (Continued)
The Markov chain corresponding to a random walk on an
undirected graph has either period 1 or 2. It has period 2 if and
only if G is bipartite. In other words, P is aperiodic if and only if G
is not bipartite.
44 / 58
Reversible Markov Chain
Proposition
A random walk on a graph with vertex set V “ tv1 , v2 , . . . , vn u is a
Markov chain with reversible distribution
π = \left( \frac{d(v_1)}{d}, \frac{d(v_2)}{d}, ..., \frac{d(v_n)}{d} \right),

where d = \sum_{v ∈ V} d(v) = 2|E| is the total degree of the graph.
45 / 58
Proof.
Suppose that u and v are adjacent vertices. Then

π_u P_{u,v} = \frac{d(u)}{d} \cdot \frac{1}{d(u)} = \frac{1}{d} = \frac{d(v)}{d} \cdot \frac{1}{d(v)} = π_v P_{v,u}.

If u and v are non-adjacent vertices, then

π_u P_{u,v} = 0 = π_v P_{v,u},

since P_{u,v} = 0 = P_{v,u}.
46 / 58
Example
[Figure: a graph on vertices v_1, ..., v_8.]

|V| = 8 and |E| = 12

\sum_{k=1}^{8} d(v_k) = 2|E| = 24

π = \left( \frac{2}{24}, \frac{3}{24}, \frac{5}{24}, \frac{3}{24}, \frac{2}{24}, \frac{3}{24}, \frac{3}{24}, \frac{3}{24} \right)
47 / 58
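The formula π_u = d(u)/d can be checked in code for any small graph. A sketch (the adjacency list below is an arbitrary example, not the graph from the figure):

```python
import numpy as np

# An arbitrary small undirected graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
n = len(adj)

# Random-walk transition matrix: P[u, v] = 1/d(u) for each neighbor v.
P = np.zeros((n, n))
for u, nbrs in adj.items():
    for v in nbrs:
        P[u, v] = 1 / len(nbrs)

deg = np.array([len(adj[u]) for u in range(n)])
d = deg.sum()        # total degree, equal to 2|E|
pi = deg / d         # pi_u = d(u)/d

# pi is stationary for the random walk.
assert np.allclose(pi @ P, pi)
```

Detailed balance π_u P_{u,v} = π_v P_{v,u} holds by construction, which is exactly why πP = π.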
Markov Chain Monte Carlo Algorithms
48 / 58
Markov Chain Monte Carlo Algorithms
The Idea
Given a probability distribution π on a set S, we want to be able to
sample from this probability distribution.
In MCMC, we define a Markov chain that has π as a stationary
distribution. We run the chain for some iterations and then sample
from it.
Why?
Sometimes it is easier to construct such a Markov chain than to
sample from the probability distribution π directly.
49 / 58
Hardcore Model
Definition
Let G “ pV , E q be a graph. The hardcore model of G randomly
assigns either 0 or 1 to each vertex such that no neighboring
vertices both have the value 1.
Assignments of the values 0 or 1 to the vertices are called
configurations. So a configuration is a map in {0, 1}^V.
A configuration is called feasible if and only if no adjacent vertices
have the value 1.
In the hardcore model, the feasible configurations are chosen
uniformly at random.
50 / 58
Hardcore Model
Question
For a given graph G , how can you directly choose a feasible
configuration uniformly at random?
51 / 58
Grid Graph Example
Observation
In an n × n grid graph, there are 2^{n^2} configurations.

Observation
There are at least 2^{n^2/2} feasible configurations in the grid graph.

Indeed, label the vertices by {(x, y) | 0 ≤ x < n, 0 ≤ y < n} and set all
vertices with x + y ≡ 0 (mod 2) to 0. The values of the remaining (roughly)
n^2/2 vertices can then be chosen arbitrarily, giving at least 2^{n^2/2}
feasible configurations.
52 / 58
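For small n, the number of feasible configurations can be counted by brute force, which confirms the lower bound (a sketch; for n = 2 the grid is a 4-cycle, whose count is 7, compared with the bound 2^{4/2} = 4):

```python
from itertools import product

def grid_edges(n):
    """Edges of the n x n grid graph on vertices (x, y)."""
    edges = []
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                edges.append(((x, y), (x + 1, y)))
            if y + 1 < n:
                edges.append(((x, y), (x, y + 1)))
    return edges

def count_feasible(n):
    """Count 0/1 configurations in which no edge has two 1s."""
    vertices = [(x, y) for x in range(n) for y in range(n)]
    edges = grid_edges(n)
    count = 0
    for values in product([0, 1], repeat=len(vertices)):
        config = dict(zip(vertices, values))
        if all(config[u] + config[v] < 2 for u, v in edges):
            count += 1
    return count

print(count_feasible(2), count_feasible(3))  # 7 63
```

This enumeration is exponential in n^2, which is exactly why sampling via a Markov chain is attractive for larger grids.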
Hardcore Model Markov Chain
The chain has the feasible configurations as its states. In each step, the
next configuration is obtained as follows (where k = |V|):

1. Pick a vertex v ∈ V uniformly at random, and toss a fair coin.

2. If the coin shows heads and all neighbors of v have the value 0, then
set the value of v to 1; otherwise, set the value of v to 0.
53 / 58
Hardcore Model
Proposition
The hardcore model Markov chain is irreducible.
Proof.
Given an arbitrary feasible configuration with m ones, it is possible
to reach the all-zeros configuration in m steps with positive
probability, by setting one 1 to 0 at a time.
Similarly, it is possible to go from the zero configuration to an
arbitrary feasible configuration with positive probability in a finite
number of steps.
Therefore, it is possible to go from an arbitrary feasible
configuration to another in a finite number of steps with positive
probability.
54 / 58
Hardcore Model
Proposition
The hardcore model Markov chain is aperiodic.
Proof.
For each state, there is a small but nonzero probability that the
Markov chain stays in the same state. Thus, each state is
aperiodic. Therefore, the Markov chain is aperiodic.
55 / 58
Hardcore Model
Proposition
Let π denote the uniform distribution on the set of feasible
configurations F, and let P denote the transition matrix of the
hardcore model Markov chain. Then

π_f P_{f,g} = π_g P_{g,f}

holds for all f, g ∈ F; that is, π is reversible for the chain.
56 / 58
Proof.
Since π_f = π_g = 1/|F|, it suffices to show that P_{f,g} = P_{g,f}.

1. This is trivial if f = g.

2. If f and g differ in more than one vertex, then P_{f,g} = 0 = P_{g,f}.

3. If f and g differ only at a single vertex v and G has k vertices, then

P_{f,g} = \frac{1}{2} \cdot \frac{1}{k} = P_{g,f}.
57 / 58
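The chain is easy to simulate. A sketch on a small graph (the update rule below, pick a uniform vertex, toss a fair coin, and set the vertex to 1 only when the coin shows heads and all its neighbors are 0, is consistent with the transition probability (1/2)(1/k) above; the 4-cycle is an arbitrary example):

```python
import random

def hardcore_step(config, adj, rng):
    """One step of the hardcore chain; config maps vertex -> 0 or 1."""
    v = rng.choice(sorted(adj))          # uniform vertex: probability 1/k
    if rng.random() < 0.5 and all(config[u] == 0 for u in adj[v]):
        config[v] = 1                    # heads, and no neighbor is 1
    else:
        config[v] = 0
    return config

# A 4-cycle; start from the (feasible) all-zeros configuration.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
config = {v: 0 for v in adj}

rng = random.Random(1)
seen = set()
for _ in range(10_000):
    config = hardcore_step(config, adj, rng)
    # Feasibility is preserved at every step.
    assert all(config[u] + config[v] < 2 for u in adj for v in adj[u])
    seen.add(tuple(config[v] for v in sorted(adj)))
```

On the 4-cycle there are 7 feasible configurations, and a long run visits all of them; by the corollary, the chain samples them asymptotically uniformly.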
Hardcore Model
Corollary
The stationary distribution of the hardcore model Markov chain is
the uniform distribution on the set of feasible configurations.
58 / 58