Monte Carlo Methods
in Statistical Physics
M. E. J. NEWMAN
Santa Fe Institute
and
G. T. BARKEMA
Institute for Theoretical Physics
Utrecht University
This book is intended for those who are interested in the use of Monte Carlo
simulations in classical statistical mechanics. It would be suitable for use in
a course on simulation methods or on statistical physics. It would also be
a good choice for those who wish to teach themselves about Monte Carlo
methods, or for experienced researchers who want to learn more about some
of the sophisticated new simulation techniques which have appeared in the
last decade or so.
The primary goal of the book is to explain how to perform Monte Carlo
simulations efficiently. For many people, Monte Carlo simulation just means
applying the Metropolis algorithm to the problem in hand. Although this
famous algorithm is very easy to program, it is rarely the most efficient way
to perform a simulation. The Metropolis algorithm is certainly important
and we do discuss it in some detail (Chapter 3 is devoted to it), but we also
show that for most problems a little work with a pencil and paper can usu-
ally turn up a better algorithm, in some cases thousands or millions of times
faster. In recent years there has been quite a flurry of interesting new Monte
Carlo algorithms described in the literature, many of which are specifically
designed to accelerate the simulation of particular classes of problems in
statistical physics. Amongst others, we describe cluster algorithms, multi-
grid methods, non-local algorithms for conserved-order-parameter models,
entropic sampling, simulated tempering and continuous time Monte Carlo.
The book is divided into parts covering equilibrium and non-equilibrium
simulations, and throughout we give pointers to how the algorithms can be
most efficiently implemented. At the end of the book we include a number
of chapters on general implementation issues for Monte Carlo simulations.
We also cover data analysis methods in some detail, including generic meth-
ods for estimating observable quantities, equilibration and correlation times,
correlation functions, and standard errors, as well as a number of techniques
which are specific to Monte Carlo simulation, such as the single and multiple
histogram methods, finite-size scaling and the Monte Carlo renormalization
group.
The modus operandi of this book is teaching by example.
II Out-of-equilibrium simulations
9 Out-of-equilibrium Monte Carlo simulations 263
9.1 Dynamics 264
9.1.1 Choosing the dynamics 266
III Implementation
13 Lattices and data structures 331
13.1 Representing lattices on a computer 332
13.1.1 Square and cubic lattices 332
13.1.2 Triangular, honeycomb and Kagome lattices 335
13.1.3 Fcc, bcc and diamond lattices 340
13.1.4 General lattices 342
13.2 Data structures 343
13.2.1 Variables 343
13.2.2 Arrays 345
13.2.3 Linked lists 345
13.2.4 Trees 348
13.2.5 Buffers 352
Problems 355
References 410
Appendices
A Answers to problems 417
B Sample programs 433
B.1 Algorithms for the Ising model 433
B.1.1 Metropolis algorithm 433
B.1.2 Multispin-coded Metropolis algorithm 435
B.1.3 Wolff algorithm 437
B.2 Algorithms for the COP Ising model 438
B.2.1 Non-local algorithm 438
B.2.2 Continuous time algorithm 441
B.3 Algorithms for Potts models 445
B.4 Algorithms for ice models 448
B.5 Random number generators 451
B.5.1 Linear congruential generator 451
B.5.2 Shuffled linear congruential generator 452
B.5.3 Lagged Fibonacci generator 452
Index 455
Part I
A litre of, say, oxygen at standard temperature and pressure consists of about
3 × 10²² oxygen molecules, all moving around and colliding with one another
and the walls of the container. One litre of air under the same conditions
contains the same number of molecules, but they are now a mixture of oxy-
gen, nitrogen, carbon dioxide and a few other things. The atmosphere of the
Earth contains 4 × 10²¹ litres of air, or about 10⁴⁴ molecules, all moving
around and colliding with each other and the ground and trees and houses
and people. These are large systems. It is not feasible to solve Hamilton's
equations for these systems because there are simply too many equations,
and yet when we look at the macroscopic properties of the gas, they are very
well-behaved and predictable. Clearly, there is something special about the
behaviour of the solutions of these many equations that "averages out" to
give us a predictable behaviour for the entire system. For example, the pres-
sure and temperature of the gas obey quite simple laws although both are
measures of rather gross average properties of the gas. Statistical mechanics
attempts to side-step the problem of solving the equations of motion and cut
straight to the business of calculating these gross properties of large systems
by treating them in a probabilistic fashion. Instead of looking for exact so-
lutions, we deal with the probabilities of the system being in one state or
another, having this value of the pressure or that—hence the name statisti-
cal mechanics. Such probabilistic statements turn out to be extremely useful,
because we usually find that for large systems the range of behaviours of the
system that are anything more than phenomenally unlikely is very small;
all the reasonably probable behaviours fall into a narrow range, allowing us
to state with extremely high confidence that the real system will display
behaviour within that range. Let us look at how statistical mechanics treats
these systems and demonstrates these conclusions.
The typical paradigm for the systems we will be studying in this book is
one of a system governed by a Hamiltonian function H which gives us the
total energy of the system in any particular state. Most of the examples
we will be looking at have discrete sets of states each with its own energy,
ranging from the lowest, or ground state, energy E₀ upwards, E₁, E₂, E₃, ...,
possibly without limit. Statistical mechanics, and the Monte Carlo methods
we will be introducing, are also applicable to systems with continuous energy
spectra, and we will be giving some examples of such applications.
If our Hamiltonian system were all we had, life would be dull. Being
a Hamiltonian system, energy would be conserved, which means that the
system would stay in the same energy state all the time (or if there were
a number of degenerate states with the same energy, maybe it would make
transitions between those, but that's as far as it would get).¹ However,
¹ For a classical system which has a continuum of energy states there can be a continuous
set of degenerate states through which the system passes, and an average over those states
can sometimes give a good answer for certain properties of the system. Such sets of
degenerate states are said to form a microcanonical ensemble. The more general case
we consider here, in which there is a thermal reservoir causing the energy of the system
to fluctuate, is known as a canonical ensemble.
there's another component to our paradigm, and that is the thermal reser-
voir. This is an external system which acts as a source and sink of heat,
constantly exchanging energy with our Hamiltonian system in such a way
as always to push the temperature of the system—defined as in classical
thermodynamics—towards the temperature of the reservoir. In effect the
reservoir is a weak perturbation on the Hamiltonian, which we ignore in our
calculation of the energy levels of our system, but which pushes the system
frequently from one energy level to another. We can incorporate the effects
of the reservoir in our calculations by giving the system a dynamics, a rule
whereby the system changes periodically from one state to another. The ex-
act nature of the dynamics is dictated by the form of the perturbation that
the reservoir produces in the Hamiltonian. We will discuss many different
possible types of dynamics in the later chapters of this book. However, there
are a number of general conclusions that we can reach without specifying
the exact form of the dynamics, and we will examine these first.
Suppose our system is in a state u. Let us define R(u → v) dt to be the
probability that it is in state v a time dt later. R(u → v) is the transi-
tion rate for the transition from u to v. The transition rate is normally
assumed to be time-independent and we will make that assumption here.
We can define a transition rate like this for every possible state v that the
system can reach. These transition rates are usually all we know about the
dynamics, which means that even if we know the state u that the system
starts off in, we need only wait a short interval of time and it could be in
any one of a very large number of other possible states. This is where our
probabilistic treatment of the problem comes in. We define a set of weights
w_u(t) which represent the probability that the system will be in state u at
time t. Statistical mechanics deals with these weights, and they represent
our entire knowledge about the state of the system. We can write a master
equation for the evolution of w_u(t) in terms of the rates R(u → v) thus:²

  dw_u/dt = Σ_v [ w_v(t) R(v → u) − w_u(t) R(u → v) ].   (1.1)
The first term on the right-hand side of this equation represents the rate at
which the system is undergoing transitions into state u; the second term is
the rate at which it is undergoing transitions out of u into other states. The
probabilities wu(t) must also obey the sum rule
² The master equation is really a set of equations, one for each state u, although people
always call it the master equation, as if there were only one equation here.
for all t, since the system must always be in some state. The solution of
Equation (1.1), subject to the constraint (1.2), tells us how the weights w_u
vary over time.
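As a concrete illustration, the following short Python sketch (ours, not one of the book's program listings) integrates the master equation with simple Euler steps for a hypothetical two-state system; the transition rates are arbitrary illustrative values.

```python
# Numerical sketch: integrate the master equation
#   dw_u/dt = sum_v [ w_v R(v->u) - w_u R(u->v) ]
# for a two-state system, using Euler steps.  The rates are arbitrary.

def evolve(w, R, dt, steps):
    """Euler-integrate the master equation; w[u] is the weight of state u
    and R[u][v] is the transition rate from state u to state v."""
    n = len(w)
    for _ in range(steps):
        dw = [sum(w[v] * R[v][u] - w[u] * R[u][v] for v in range(n))
              for u in range(n)]
        w = [w[u] + dt * dw[u] for u in range(n)]
    return w

R = [[0.0, 2.0],   # rate for the transition 0 -> 1
     [1.0, 0.0]]   # rate for the transition 1 -> 0
w = evolve([1.0, 0.0], R, dt=0.01, steps=10000)
# The weights settle where the gain and loss terms cancel,
# w0*R(0->1) = w1*R(1->0), i.e. w = [1/3, 2/3], and they still sum to one.
print(w)
```

Whatever the (time-independent) rates, the weights relax towards the stationary values at which the two terms on the right-hand side of the master equation balance, which is the equilibrium discussed in Section 1.2.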
And how are the weights w_u related to the macroscopic properties of the
system which we want to know about? Well, if we are interested in some
quantity Q, which takes the value Q_u in state u, then we can define the
expectation of Q at time t for our system as

  ⟨Q⟩ = Σ_u Q_u w_u(t).   (1.3)
Clearly this quantity contains important information about the real value of
Q that we might expect to measure in an experiment. For example, if our
system is definitely in one state τ then ⟨Q⟩ will take the corresponding value
Q_τ. And if the system is equally likely to be in any of perhaps three states,
and has zero probability of being in any other state, then ⟨Q⟩ is equal to the
mean of the values of Q in those three states, and so forth. However, the
precise relation of ⟨Q⟩ to the observed value of Q is perhaps not very clear.
There are really two ways to look at it. The first, and more rigorous, is to
imagine having a large number of copies of our system all interacting with
their own thermal reservoirs and whizzing between one state and another
all the time. ⟨Q⟩ is then a good estimate of the number we would get if
we were to measure the instantaneous value of the quantity Q in each of
these systems and then take the mean of all of them. People who worry
about the conceptual foundations of statistical mechanics like to take this
"many systems" approach to defining the expectation of a quantity.3 The
trouble with it however is that it's not very much like what happens in a
real experiment. In a real experiment we normally only have one system and
we make all our measurements of Q on that system, though we probably
don't just make a single instantaneous measurement, but rather integrate
our results over some period of time. There is another way of looking at
the expectation value which is similar to this experimental picture, though
it is less rigorous than the many systems approach. This is to envisage the
expectation as a time average of the quantity Q. Imagine recording the value
of Q every second for a thousand seconds and taking the average of those one
thousand values. This will correspond roughly to the quantity calculated in
Equation (1.3) as long as the system passes through a representative selection
of the states in the probability distribution wu in those thousand seconds.
And if we make ten thousand measurements of Q instead of one thousand,
³ In fact the word ensemble, as in the "canonical ensemble" which was mentioned in a
previous footnote, was originally introduced by Gibbs to describe an ensemble of systems
like this, and not an ensemble of, say, molecules, or any other kind of ensemble. These
days however, use of this word no longer implies that the writer is necessarily thinking of
a many systems formulation of statistical mechanics.
1.2 Equilibrium
Consider the master equation (1.1) again. If our system ever reaches a state
in which the two terms on the right-hand side exactly cancel one another for
all u, then the rates of change dw_u/dt will all vanish and the weights will
all take constant values for the rest of time. This is an equilibrium state.
Since the master equation is first order with real parameters, and since the
variables w_u are constrained to lie between zero and one (which effectively
prohibits exponentially growing solutions to the equations) we can see that
all systems governed by these equations must come to equilibrium in the end.
A large part of this book will be concerned with Monte Carlo techniques for
simulating equilibrium systems and in this section we develop some of the
important statistical mechanical concepts that apply to these systems.
The transition rates R(u → v) appearing in the master equation (1.1)
do not just take any values. They take particular values which arise out of
the thermal nature of the interaction between the system and the thermal
reservoir. In the later chapters of this book we will have to choose val-
ues for these rates when we simulate thermal systems in our Monte Carlo
calculations, and it is crucial that we choose them so that they mimic the
interactions with the thermal reservoir correctly. The important point is
that we know a priori what the equilibrium values of the weights w_u are for
our system. We call these equilibrium values the equilibrium occupation
probabilities and denote them by

  p_u = lim_{t→∞} w_u(t).   (1.4)
It was Gibbs (1902) who showed that for a system in thermal equilibrium
with a reservoir at temperature T, the equilibrium occupation probabilities
are

  p_u = (1/Z) e^{−E_u/kT} = (1/Z) e^{−βE_u},   (1.5)

where k is Boltzmann's constant, β = 1/kT, and Z is a normalizing
coefficient given by

  Z = Σ_u e^{−βE_u}.   (1.6)
Z is also known as the partition function, and it figures a lot more heavily
in the mathematical development of statistical mechanics than a mere nor-
malizing constant might be expected to. It turns out in fact that a knowledge
of the variation of Z with temperature and any other parameters affecting
the system (like the volume of the box enclosing a sample of gas, or the mag-
netic field applied to a magnet) can tell us virtually everything we might want
to know about the macroscopic behaviour of the system. The probability
distribution (1.5) is known as the Boltzmann distribution, after Ludwig
Boltzmann, one of the pioneers of statistical mechanics. For a discussion of
the origins of the Boltzmann distribution and the arguments that lead to it,
the reader is referred to the exposition by Walter Grandy in his excellent
book Foundations of Statistical Mechanics (1987). In our treatment we will
take Equation (1.5) as our starting point for further developments.
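To put some numbers to Equation (1.5), here is a small Python sketch (ours) which evaluates the Boltzmann occupation probabilities for a hypothetical three-level system; the energies are arbitrary illustrative values, in units where Boltzmann's constant k = 1.

```python
from math import exp

# Sketch: Boltzmann occupation probabilities, Eq. (1.5), for a hypothetical
# three-level system.  Energies are arbitrary illustrative values, in units
# where Boltzmann's constant k = 1.

def boltzmann_weights(energies, T):
    """Return p_u = exp(-E_u/kT)/Z for each state u."""
    beta = 1.0 / T
    factors = [exp(-beta * E) for E in energies]
    Z = sum(factors)                 # partition function, Eq. (1.6)
    return [f / Z for f in factors]

E = [0.0, 1.0, 2.0]                  # ground state plus two excited states
for T in [0.5, 1.0, 10.0]:
    print(T, boltzmann_weights(E, T))
# At low T the ground state dominates; at high T the three states
# approach equal probability.
```

The qualitative behaviour is the important point: as T → 0 all the weight collects in the ground state, while as T → ∞ every state becomes equally likely.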
From Equations (1.3), (1.4) and (1.5) the expectation of a quantity Q for
a system in equilibrium is

  ⟨Q⟩ = Σ_u Q_u p_u = (1/Z) Σ_u Q_u e^{−βE_u}.   (1.7)
For example, the expectation value of the energy ⟨E⟩, which is also the
quantity we know from thermodynamics as the internal energy U, is given
by

  U = ⟨E⟩ = (1/Z) Σ_u E_u e^{−βE_u}.   (1.8)
From Equation (1.6) we can see that this can also be written in terms of a
derivative of the partition function:

  U = −(1/Z) ∂Z/∂β = −∂ log Z/∂β.   (1.9)
The specific heat follows by differentiating with respect to temperature:

  C = ∂U/∂T = kβ² ∂²log Z/∂β².   (1.10)

However, from thermodynamics we know that the specific heat is also related
to the entropy:

  C = T ∂S/∂T = −β ∂S/∂β,   (1.11)

and, equating these two expressions for C and integrating with respect to β,
we find the following expression for the entropy:

  S = −kβ ∂log Z/∂β + k log Z.   (1.12)

Combining Equations (1.9) and (1.12), the free energy is then

  F = U − TS = −kT log Z.   (1.13)
Thus, if we can calculate the free energy using Equation (1.13), then we can
calculate the effects of parameter variations too.
In performing Monte Carlo calculations of the properties of equilibrium
systems, it is sometimes appropriate to calculate the partition function and
then evaluate other quantities from it. More often it is better to calculate
the quantities of interest directly, but many times in considering the theory
behind our simulations we will return to the idea of the partition function,
because in principle the entire range of thermodynamic properties of a system
can be deduced from this function, and any numerical method that can make
a good estimate of the partition function is at heart a sound method.
So

  ⟨E²⟩ − ⟨E⟩² = (1/Z) ∂²Z/∂β² − [(1/Z) ∂Z/∂β]² = ∂²log Z/∂β².   (1.18)

Using Equation (1.10) to eliminate the second derivative, we can also write
this as

  ⟨E²⟩ − ⟨E⟩² = C/(kβ²) = kT²C.   (1.19)
And the standard deviation of E, the RMS fluctuation in the internal energy,
is just the square root of this expression.
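The identity ⟨E²⟩ − ⟨E⟩² = kT²C can be checked numerically. The sketch below (ours, with an arbitrary five-level spectrum and units in which k = 1) evaluates both sides, obtaining C from a centred numerical derivative of the internal energy U(T).

```python
from math import exp

# Numerical check of <E^2> - <E>^2 = k T^2 C for a hypothetical five-level
# system (units with k = 1).  C is obtained by a centred numerical
# derivative of the internal energy U(T) = <E>.

energies = [0.0, 0.7, 1.3, 2.1, 3.4]   # arbitrary illustrative spectrum

def moments(T):
    """Return <E> and <E^2> in the Boltzmann distribution at temperature T."""
    beta = 1.0 / T
    w = [exp(-beta * E) for E in energies]
    Z = sum(w)
    U = sum(E * x for E, x in zip(energies, w)) / Z
    U2 = sum(E * E * x for E, x in zip(energies, w)) / Z
    return U, U2

T = 1.5
dT = 1e-5
U_plus, _ = moments(T + dT)
U_minus, _ = moments(T - dT)
C = (U_plus - U_minus) / (2 * dT)      # specific heat C = dU/dT
U, U2 = moments(T)
print(U2 - U * U, T * T * C)           # the two numbers agree
```

The agreement of the two printed numbers is just Equation (1.19) in action: the fluctuations are fixed by a purely thermodynamic response function.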
This result is interesting for a number of reasons. First, it gives us the
magnitude of the fluctuations in terms of the specific heat C or alternatively
in terms of log Z = −βF. In other words we can calculate the fluctuations
entirely from quantities that are available within classical thermodynamics.
However, this result could never have been derived within the framework
of thermodynamics, since it depends on microscopic details that thermody-
namics has no access to. Second, let us look at what sort of numbers we
get out for the size of the energy fluctuations of a typical system. Let us go
back to our litre of gas in a box. A typical specific heat for such a system is
1 JK⁻¹ at room temperature and atmospheric pressure, giving RMS energy
fluctuations of about 10⁻⁹ J. The internal energy itself on the other hand
will be around 10² J, so the fluctuations are only about one part in 10¹¹.
This lends some credence to our earlier contention that statistical treatments
can often give a very accurate estimate of the expected behaviour of a sys-
tem. We see that in the case of the internal energy at least, the variation of
the actual value of U around the expectation value (E) is tiny by compari-
son with the kind of energies we are considering for the whole system, and
probably not within the resolution of our measuring equipment. So quoting
the expectation value gives a very good guide to what we should expect to
see in an experiment. Furthermore, note that, since the specific heat C is
an extensive quantity, the RMS energy fluctuations, which are the square
root of Equation (1.19), scale like √V with the volume V of the system. The
internal energy itself on the other hand scales like V, so that the relative size
of the fluctuations compared to the internal energy decreases as 1/√V as the
system becomes large. In the limit of a very large system, therefore, we can
ignore the fluctuations altogether. For this reason, the limit of a large system
is called the thermodynamic limit. Most of the questions we would like
to answer about condensed matter systems are questions about behaviour
in the thermodynamic limit. Unfortunately, in Monte Carlo simulations it
is often not feasible to simulate a system large enough that its behaviour
is a good approximation to a large system. Much of the effort we put into
designing algorithms will be aimed at making them efficient enough that we
can simulate the largest systems possible in the available computer time, in
since E_u now contains the term −X_u Y which the derivative acts on. Here
X_u is the value of the quantity X in the state u. We can then write this in
terms of the free energy thus:

  ⟨X⟩ = −∂F/∂Y.
⁴ We use lower-case x to denote an intensive variable. X, by contrast, is extensive,
i.e., its value scales with the size of the system. We will use this convention to distinguish
intensive and extensive variables throughout much of this book.
  G_c^(2)(i,j) = ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩.

The quantity G_c^(2)(i,j) is called the two-point connected correlation
function of x between sites i and j, or just the connected correlation, for
short. The superscript (2) is to distinguish this function from higher order
correlation functions, which are discussed below. As its name suggests, this
function is a measure of the correlation between the values of the variable x
on the two sites; it takes a positive value if the values of x on those two sites
fluctuate in the same direction together, and a negative one if they fluctuate
in opposite directions. If their fluctuations are completely unrelated, then its
value will be zero. To see why it behaves this way consider first the simpler
disconnected correlation function G^(2)(i,j), which is defined to be

  G^(2)(i,j) = ⟨x_i x_j⟩.
If the variables x_i and x_j are fluctuating roughly together, around zero, both
becoming positive at once and then both becoming negative, at least most of
the time, then all or most of the values of the product x_i x_j that we average
will be positive, and this function will take a positive value. Conversely, if
they fluctuate in opposite directions, then it will take a negative value. If
they sometimes fluctuate in the same direction as one another and sometimes
in the opposite direction, then the values of x_i x_j will take a mixture of
positive and negative values, and the correlation function will average out
close to zero. This function therefore has pretty much the properties we
desire of a correlation function, and it can tell us a lot of useful things about
the behaviour of our system. However, it is not perfect, because we must
also consider what happens if we apply our field Y to the system. This
can have the effect that the mean value of x at a site, ⟨x_i⟩, can be non-
zero. The same thing can happen even in the absence of an external field if
our system undergoes a phase transition to a spontaneously symmetry-
broken state where a variable such as x spontaneously develops a non-zero
expectation value. (The Ising model of Section 1.2.2, for instance, does this.)
In cases like these, the disconnected correlation function above can have a
large positive value simply because the values of the variables x_i and x_j are
always either both positive or both negative, even though this has nothing to
do with them being correlated to one another. The fluctuations of x_i and x_j
can be completely unrelated and still the disconnected correlation function
takes a non-zero value. To obviate this problem we define the connected
correlation function as above:

  G_c^(2)(i,j) = ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩.
When the expectations ⟨x_i⟩ and ⟨x_j⟩ are zero and x_i and x_j are just fluctuat-
ing around zero, this function is exactly equal to the disconnected correlation
function. But when the expectations are non-zero, the connected correlation
these spins assume the simplest form possible, which is not particularly re-
alistic, of scalar variables s_i which can take only two values ±1, representing
up-pointing or down-pointing dipoles of unit magnitude. In a real magnetic
material the spins interact, for example through exchange interactions or
RKKY interactions (see, for instance, Ashcroft and Mermin 1976), and the
Ising model mimics this by including terms in the Hamiltonian proportional
to products s_i s_j of the spins. In the simplest case, the interactions are all of
the same strength, denoted by J, which has the dimensions of an energy, and
are only between spins on sites which are nearest neighbours on the lattice.
We can also introduce an external magnetic field B coupling to the spins.
The Hamiltonian then takes the form

  H = −J Σ_{⟨ij⟩} s_i s_j − B Σ_i s_i,
where the notation (ij) indicates that the sites i and j appearing in the sum
are nearest neighbours.5 The minus signs here are conventional. They merely
dictate the choice of sign for the interaction parameter J and the external
field B. With the signs as they are here, a positive value of J makes the
spins want to line up with one another—a ferromagnetic model as opposed
to an anti-ferromagnetic one which is what we get if J is negative—and the
spins also want to line up in the same direction as the external field—they
want to be positive if B > 0 and negative if B < 0.
The states of the Ising system are the different sets of values that the
spins can take. Since each spin can take two values, there are a total of 2^N
states for a lattice with N spins on it. The partition function of the model
is the sum

  Z = Σ_{s_1=±1} Σ_{s_2=±1} ... Σ_{s_N=±1} e^{−βH}.
Often, in fact, we are more interested in the mean magnetization per spin
⟨m⟩, which is just

  ⟨m⟩ = (1/N) ⟨Σ_i s_i⟩ = ⟨M⟩/N.
(In the later chapters of this book, we frequently use the letter m alone
to denote the average magnetization per spin, and omit the brackets ⟨...⟩
around it indicating the average. This is also the common practice of many
other authors. In almost all cases it is clear from the context when an average
over states is to be understood.)
We can calculate fluctuations in the magnetization or the internal energy
by calculating derivatives of the partition function. Or, as we mentioned in
Section 1.2.1, if we have some way of calculating the size of the fluctuations
in the magnetization, we can use those to evaluate the magnetic suscep-
tibility

  χ = βN (⟨m²⟩ − ⟨m⟩²).

(Note the leading factor of N here, which is easily overlooked when calculat-
ing χ from Monte Carlo data.) Similarly we can calculate the specific heat
per spin c from the energy fluctuations thus:

  c = (kβ²/N) (⟨E²⟩ − ⟨E⟩²).   (1.34)
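In practice these fluctuation formulas are applied to a list of measurements taken during a simulation. The following Python sketch (ours) estimates χ = βN(⟨m²⟩ − ⟨m⟩²) from a series of magnetization-per-spin values; the sample values below are synthetic stand-ins for real Monte Carlo data.

```python
# Sketch: estimating the magnetic susceptibility from fluctuations in the
# magnetization per spin, chi = beta * N * (<m^2> - <m>^2), units with k = 1.
# The sample values of m are synthetic stand-ins for Monte Carlo data.

def susceptibility(m_samples, N, T):
    """Estimate chi from a list of magnetization-per-spin measurements."""
    beta = 1.0 / T
    mean_m = sum(m_samples) / len(m_samples)
    mean_m2 = sum(m * m for m in m_samples) / len(m_samples)
    return beta * N * (mean_m2 - mean_m * mean_m)

m_samples = [0.61, 0.58, 0.64, 0.60, 0.57, 0.63, 0.59, 0.62]
print(susceptibility(m_samples, N=25, T=2.0))
# Note the factor of N: omitting it is exactly the error warned about above.
```

The specific heat per spin is computed the same way from a list of energy measurements, with βN replaced by kβ²/N.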
and the model has the added pedagogical advantage that its behaviour has
been solved exactly, so we can compare our numerical calculations with the
exact solution. Let's take a smallish system to start with, of 25 spins on
a square lattice in a 5 x 5 arrangement. By convention we apply periodic
boundary conditions, so that there are interactions between spins on the
border of the array and the opposing spins on the other side. We will also
set the external magnetic field B to zero, to make things simpler still.
With each spin taking two possible states, represented by ±1, our 25-spin
system has a total of 2²⁵ = 33 554 432 possible states. However, we can save
ourselves from summing over half of these, because the system has up/down
symmetry, which means that for every state there is another one in which
every spin is simply flipped upside down, which has exactly the same energy
in zero magnetic field. So we can simplify the calculation of the partition
function by just taking one out of every pair of such states, for a total of
16 777 216 states, and summing up the corresponding terms in the partition
function, Equation (1.6), and then doubling the sum.⁶
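Such a direct summation is easy to sketch in code. The Python fragment below (ours, not the book's program; it uses a 4 × 4 lattice rather than 5 × 5 purely to keep the run short, and no symmetry tricks) enumerates all 2^N states of the zero-field Ising model with periodic boundary conditions and J = 1, in units where k = 1.

```python
from itertools import product
from math import exp

# Sketch: direct evaluation of the Ising partition function by summing over
# all 2^N states of a small L x L square lattice with periodic boundary
# conditions, J = 1, B = 0 (units with k = 1).  A 4 x 4 lattice (65 536
# states) keeps the run short; the 5 x 5 case works identically but has
# 512 times as many states.

L = 4
N = L * L

def energy(s):
    """H = -J * sum over nearest-neighbour pairs s_i s_j (each pair once)."""
    E = 0
    for i in range(L):
        for j in range(L):
            sij = s[i * L + j]
            E -= sij * s[i * L + (j + 1) % L]    # right neighbour (periodic)
            E -= sij * s[((i + 1) % L) * L + j]  # down neighbour (periodic)
    return E

def observables(T):
    """Return Z and <|m|> by brute-force enumeration of all states."""
    beta = 1.0 / T
    Z = 0.0
    mag = 0.0
    for s in product((-1, 1), repeat=N):
        w = exp(-beta * energy(s))
        Z += w
        mag += abs(sum(s)) * w   # <|M|>; <M> itself vanishes by symmetry
    return Z, mag / (Z * N)

Z, m = observables(T=1.0)
print(Z, m)
```

Note that we accumulate ⟨|m|⟩ rather than ⟨m⟩, since the latter is exactly zero in zero field by the up/down symmetry just discussed. Even at this size the cost of enumeration is obvious, which is precisely why direct evaluation of the partition function is not a promising method for solving models.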
In Figure 1.1 we show the mean magnetization per spin and the spe-
cific heat per spin for this 5 x 5 system, calculated from Equations (1.10)
and (1.34). On the same axes we show the exact solutions for these quanti-
ties on an infinite lattice, as calculated by Onsager. The differences between
the two are clear, and this is precisely the difference between our small
finite-sized system and the infinite thermodynamic-limit system which we
discussed in Section 1.2.1. Notice in particular that the exact solution has
a non-analytic point at about kT = 2.3J which is not reproduced even
moderately accurately by our small numerical calculation. This point is the
so-called "critical temperature" at which the length-scale £ of the fluctu-
ations in the magnetization, also called the "correlation length", diverges.
(This point is discussed in more detail in Section 3.7.1.) Because of this
divergence of the length-scale, it is never possible to get good results for the
behaviour of the system at the critical temperature out of any calculation
performed on a finite lattice—the lattice is never large enough to include all
of the important physics of the critical point. Does this mean that calcula-
tions on finite lattices are useless? No, it certainly does not. To start with,
at temperatures well away from the critical point the problems are much less
severe, and the numerical calculation and the exact solution agree better,
⁶ If we were really serious about this, we could save ourselves further time by making
use of other symmetries too. For example the square system we are investigating here also
has a reflection symmetry and a four-fold rotational symmetry (the symmetry group is
C4), meaning that the states actually group into sets of 16 states (including the up-down
symmetry pairs), all of which have the same energy. This would reduce the number of
terms we have to evaluate to 2 105 872. (The reader may like to ponder why this number is
not exactly 2²⁵/16, as one might expect.) However, such efforts are not really worthwhile,
since, as we will see very shortly, this direct evaluation of the partition function is not a
promising method for solving models.
a slide-rule. As first envisaged, Monte Carlo was not a method for solving
problems in physics, but a method for estimating integrals which could not
be performed by other means. Integrals over poorly-behaved functions and
integrals in high-dimensional spaces are two areas in which the method has
traditionally proved profitable, and indeed it is still an important technique
for problems of these types. To give an example, consider the function

  f(x) = sin²(1/x),
which is pictured in Figure 1.2. The values of this function lie entirely
between zero and one, but it is increasingly rapidly varying in the neigh-
bourhood of x = 0. Clearly the integral

  I(x) = ∫₀ˣ f(t) dt,

which is the area under this curve between 0 and x, takes a finite value
somewhere in the range 0 < I(x) < x, but it is not simple to calculate
this value exactly because of the pathologies of the function near the origin.
However, we can make an estimate of it by the following method. If we
choose a random real number h, uniformly distributed between zero and x,
and another v between zero and one and plot on Figure 1.2 the point for
which these are the horizontal and vertical coordinates, the probability that
this point will be below the line of f(x) is just I(x)/x. It is easy to determine
whether the point is in fact below the line: it is below it if v < f(h). Thus if
we simply pick a large number N of these random points and count up the
number M which fall below the line, we can estimate I(x) from

  I(x) ≈ (M/N) x.
You can get an answer accurate to one figure by taking a thousand points,
which would be about the limit of what one could have reasonably done in
the days before computers. Nowadays, even a cheap desktop computer can
comfortably run through a million points in a few seconds, giving an answer
accurate to about three figures. In Figure 1.3 we have plotted the results of
such a calculation for a range of values of x. The errors in this calculation
are smaller than the width of the line in the figure.⁷
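The whole procedure fits in a few lines of Python. The sketch below (ours) applies the hit-or-miss method just described, using f(t) = sin²(1/t) as an example of an integrand whose values lie between zero and one and which varies increasingly rapidly near the origin.

```python
import random
from math import sin

# Sketch: "hit or miss" Monte Carlo estimate of I(x), the area under a
# curve f between 0 and x, for a function with values between zero and one.
# Here f(t) = sin^2(1/t) serves as a badly behaved example integrand.

def f(t):
    return sin(1.0 / t) ** 2

def hit_or_miss(x, n, seed=1):
    """Estimate I(x) = integral of f from 0 to x using n random points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        h = rng.uniform(0.0, x)    # horizontal coordinate, 0 <= h <= x
        v = rng.uniform(0.0, 1.0)  # vertical coordinate, 0 <= v <= 1
        if v < f(h):               # the point falls below the curve
            hits += 1
    return x * hits / n            # I(x) ~ M x / N

print(hit_or_miss(2.0, 1_000_000))
```

The statistical error falls off as 1/√N, which is why a million points buys only about three significant figures.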
A famous early example of this type of calculation is the experiment
known as "Buffon's needle" (Dorrie 1965), in which the mathematical con-
stant tt is determined by repeatedly dropping a needle onto a sheet of paper
ruled with evenly spaced lines. The experiment is named after Georges-Louis
Leclerc, Comte de Buffon who in 1777 was the first to show that if we throw
a needle of length l completely at random onto a sheet of paper ruled with
lines a distance d apart, then the chances that the needle will fall so as to
⁷ In fact there exist a number of more sophisticated Monte Carlo integration techniques
which give more accurate answers than the simple "hit or miss" method we have described
here. A discussion can be found in the book by Kalos and Whitlock (1986).
intersect one of the lines is 2l/(πd), provided that d > l. It was Laplace in
1820 who then pointed out that if the needle is thrown down N times and
is observed to land on a line M of those times, we can make an estimate of
π from

  π ≈ 2lN/(dM).   (1.44)
(Perhaps the connection between this and the Monte Carlo evaluation of
integrals is not immediately apparent, but it will certainly become clear
if you try to derive Equation (1.44) for yourself, or if you follow Dorrie's
derivation.) A number of investigators made use of this method over the
years to calculate approximate values for tt. The most famous of these is
Mario Lazzarini, who in 1901 announced that he had calculated a value of
3.1415929 for π from an experiment in which a 2.5 cm needle was dropped
3408 times onto a sheet of paper ruled with lines 3 cm apart. This value,
accurate to better than three parts in ten million, would be an impressive
example of the power of the statistical sampling method were it not for
the fact that it is almost certainly faked. Badger (1994) has demonstrated
extremely convincingly that, even supposing Lazzarini had the technology
at his disposal to measure the length of his needle and the spaces between
his lines to a few parts in 10^7 (a step necessary to ensure the accuracy of
Equation (1.44)), still the chances of his finding the results he did were
poorer than three in a million; Lazzarini was imprudent enough to publish
details of the progress of the experiment through the 3408 castings of the
needle, and it turns out that the statistical "fluctuations" in the numbers of
intersections of the needle with the ruled lines are much smaller than one
would expect in a real experiment. All indications are that Lazzarini forged
his results. However, other, less well known attempts at the experiment were
certainly genuine, and yielded reasonable figures for π: 3.1596 (Wolf 1850),
3.1553 (Smith 1855). Apparently, performing the Buffon's needle experiment
was for a while quite a sophisticated pastime amongst Europe's intellectual
gentry.
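The needle-dropping experiment is also simple to simulate. The sketch below is our own Python illustration (all names and parameter values are invented for the example): it uses Laplace's estimate π ≈ 2lN/(Md), and picks the needle's orientation by a rejection trick so that the simulation does not itself assume a value for π.

```python
import math
import random

def buffon_pi(length, spacing, drops, seed=1):
    """Estimate pi by Buffon's needle: a needle of the given length is
    dropped on lines a distance `spacing` apart (length <= spacing).
    If M of N drops cross a line, pi ~ 2 * length * N / (spacing * M)."""
    assert length <= spacing
    rng = random.Random(seed)
    crossings = 0
    for _ in range(drops):
        # Distance from the needle's centre to the nearest line.
        x = rng.uniform(0.0, spacing / 2.0)
        # Random orientation chosen without using pi: pick a point
        # uniformly in the quarter disc by rejection, which gives a
        # uniformly distributed angle between needle and lines.
        while True:
            a, b = rng.random(), rng.random()
            r2 = a * a + b * b
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = b / math.sqrt(r2)
        # The needle crosses a line if its half-length projected
        # perpendicular to the lines exceeds the distance x.
        if x <= (length / 2.0) * sin_theta:
            crossings += 1
    return 2.0 * length * drops / (spacing * crossings)

pi_estimate = buffon_pi(0.8, 1.0, 200_000)
```

Even with two hundred thousand "castings" the estimate is only good to two or three figures, which gives some feeling for how implausible Lazzarini's seven-figure result is.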
With the advent of mechanical calculating machines at the end of the
nineteenth century, numerical methods took a large step forward. These
machines increased enormously the number and reliability of the arithmetic
operations that could be performed in a numerical "experiment", and made
the application of statistical sampling techniques to research problems in
physics a realistic possibility for the first time. An early example of what
was effectively a Monte Carlo calculation of the motion and collision of the
molecules in a gas was described by William Thomson (later Lord Kelvin)
in 1901. Thomson's calculations were aimed at demonstrating the truth
of the equipartition theorem for the internal energy of a classical system.
However, after the fashion of the time, he did not perform the laborious
analysis himself, and a lot of the credit for the results must go to Thomson's
26 Chapter 1: Introduction
in New Mexico, where Nick Metropolis, Stanislaw Ulam and John von Neu-
mann gathered in the last months of the Second World War shortly after the
epochal bomb test at Alamogordo, to collaborate on numerical calculations
to be performed on the new ENIAC electronic computer, a mammoth, room-
filling machine containing some 18 000 triode valves, whose construction was
nearing completion at the University of Pennsylvania. Metropolis (1980) has
remarked that the technology that went into the ENIAC existed well before
1941, but that it took the pressure of America's entry into the war to spur
the construction of the machine.
It seems to have been Stan Ulam who was responsible for reinventing
Fermi's statistical sampling methods. He tells of how the idea of calculat-
ing the average effect of a frequently repeated physical process by simply
simulating the process over and over again on a digital computer came to
him whilst huddled over a pack of cards, playing patience8 one day. The
game he was playing was "Canfield" patience, which is one of those forms
of patience where the goal is simply to turn up every card in the pack, and
he wondered how often on average one could actually expect to win the
game. After abandoning the hopelessly complex combinatorics involved in
answering this question analytically, it occurred to him that you could get
an approximate answer simply by playing a very large number of games and
seeing how often you win. With his mind never far from the exciting new
prospect of the ENIAC computer, the thought immediately crossed his mind
that he might be able to get the machine to play these games for him far
faster than he ever could himself, and it was only a short conceptual leap to
applying the same idea to some of the problems of the physics of the hydro-
gen bomb that were filling his work hours at Los Alamos. He later described
his idea to John von Neumann who was very enthusiastic about it, and the
two of them began making plans to perform actual calculations. Though
Ulam's idea may appear simple and obvious to us today, there are actually
many subtle questions involved in this idea that a physical problem with
an exact answer can be approximately solved by studying a suitably chosen
random process. It is a tribute to the ingenuity of the early Los Alamos
workers that, rather than plunging headlong into the computer calculations,
they considered most of these subtleties right from the start.
The war ended before the first Monte Carlo calculations were performed
on the ENIAC. There was some uncertainty about whether the Los Alamos
laboratory would continue to exist in peacetime, and Edward Teller, who
was leading the project to develop the hydrogen bomb, was keen to apply
the power of the computer to the problems of building the new bomb, in
order to show that significant work was still going on at Los Alamos. Von
Neumann developed a detailed plan of how the Monte Carlo method could be
8 Also called "solitaire" in the USA.
refers to as a "hastily built first try". It was faster and contained a larger
memory (40 kilobits, or 5 kilobytes in modern terms). It was built under
the direction of Metropolis, who had been lured back to Los Alamos after
a brief stint on the faculty at Chicago by the prospect of the new machine.
The design was based on ideas put forward by John von Neumann and in-
corporated a number of technical refinements proposed by Jim Richardson,
an engineer working on the project. A still more sophisticated computer, the
MANIAC 2, was built at Los Alamos two years later, and both machines
remained in service until the late fifties, producing a stream of results, many
of which have proved to be seminal contributions to the field of Monte Carlo
simulation. Of particular note to us is the publication in 1953 of the paper by
Nick Metropolis, Marshall and Arianna Rosenbluth, and Edward and Mici
Teller, in which they describe for the first time the Monte Carlo technique
that has come to be known as the Metropolis algorithm. This algorithm was
the first example of a thermal "importance sampling" method, and it is to
this day easily the most widely used such method. We will be discussing it
in some detail in Chapter 3. Also of interest are the Monte Carlo studies
of nuclear cascades performed by Antony Turkevich and Nick Metropolis,
and Edward Teller's work on phase changes in interacting hard-sphere gases
using the Metropolis algorithm.
The exponential growth in computer power since those early days is by
now a familiar story to us all, and with this increase in computational re-
sources Monte Carlo techniques have looked deeper and deeper into the
subject of statistical physics. Monte Carlo simulations have also become
more accurate as a result of the invention of new algorithms. Particularly in
the last twenty years, many new ideas have been put forward, of which we
describe a good number in the rest of this book.
Problems
are allowed to move back and forward between the boxes under the influence
of thermal excitations from a reservoir at temperature T. Find the partition
function for this system and then use this result to calculate the internal
energy.
1.4 Solve the Ising model, whose Hamiltonian is given in Equation (1.30),
in one dimension for the case where B = 0 as follows. Define a new set
of variables σ_i which take values 0 and 1 according to σ_i = (1 − s_i s_{i+1})/2
and rewrite the Hamiltonian in terms of these variables for a system of N
spins with periodic boundary conditions. Show that the resulting system
is equivalent to the one studied in Problem 1.3 in the limit of large N and
hence calculate the internal energy as a function of temperature.
2
The principles of equilibrium
thermal Monte Carlo simulation
is only tractable in the very smallest of systems. In larger systems, the best
we can do is average over some subset of the states, though this necessarily
introduces some inaccuracy into the calculation. Monte Carlo techniques
work by choosing a subset of states at random from some probability distri-
bution pu which we specify. Suppose we choose M such states {u1 ... uM}.
32 Chapter 2: Equilibrium thermal Monte Carlo simulations
It turns out, however, that this is usually a rather poor choice to make.
In most numerical calculations it is only possible to sample a very small
fraction of the total number of states. Consider, for example, the Ising
model of Section 1.2.2 again. A small three-dimensional cubic system of
10 × 10 × 10 Ising spins would have 2^1000 ≈ 10^300 states, and a typical
numerical calculation could only hope to sample up to about 10^8 of those in
a few hours on a good computer, which would mean we were only sampling
one in every 10^292 states of the system, a very small fraction indeed. The
estimator given above is normally a poor guide to the value of (Q) under
these circumstances. The reason is that one or both of the sums appearing
in Equation (2.1) may be dominated by a small number of states, with all
the other states, the vast majority, contributing a negligible amount even
when we add them all together. This effect is often especially obvious at
low temperatures, where these sums may be dominated by a hundred states,
or ten states, or even one state, because at low temperatures there is not
enough thermal energy to lift the system into the higher excited states, and
so it spends almost all of its time sitting in the ground state, or one of the
lowest of the excited states. In the example described above, the chances of
one of the 10^8 random states we sample in our simulation being the ground
state are one in 10^292, which means there is essentially no chance of our
picking it, which makes QM a very inaccurate estimate of (Q) if the sums
are dominated by the contribution from this state.
On the other hand, if we had some way of knowing which states made
the important contributions to the sums in Equation (2.1) and if we could
pick our sample of M states from just those states and ignore all the others,
we could get a very good estimate of (Q) with only a small number of terms.
This is the essence of the idea behind thermal Monte Carlo methods. The
2.2 Importance sampling 33
technique for picking out the important states from amongst the very large
number of possibilities is called importance sampling.
energy of the system—the ratio was about 10^-20 in the case of our litre of gas,
for instance. Similar arguments can be used to show that systems sample
very narrow ranges of other quantities as well. The reason for this, as we
saw, is that the system is not sampling all states with equal probability, but
instead sampling them according to the Boltzmann probability distribution,
Equation (1.5). If we can mimic this effect in our simulations, we can exploit
these narrow ranges of energy and other quantities to make our estimates
of such quantities very accurate. For this reason, we normally try to take a
sample of the states of the system in which the likelihood of any particular
one appearing is proportional to its Boltzmann weight. This is the most
common form of importance sampling, and most of the algorithms in this
book make use of this idea in one form or another.
Our strategy then is this: instead of picking our M states in such a way
that every state of the system is as likely to get chosen as every other, we
pick them so that the probability that a particular state u gets chosen is
p_u = Z^-1 e^(-βE_u). Then our estimator for (Q), Equation (2.2), becomes just

Q_M = (1/M) Σ_{i=1}^{M} Q_{u_i}.     (2.4)
Notice that the Boltzmann factors have now cancelled out of the estimator,
top and bottom, leaving a particularly simple expression. This definition of
QM works much better than (2.3), especially when the system is spending
the majority of its time in a small number of states (such as, for example,
the lowest-lying ones when we are at low temperatures), since these will be
precisely the states that we pick most often, and the relative frequency with
which we pick them will exactly correspond to the amount of time the real
system would spend in those states.
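A toy example makes the cancellation concrete. In the following Python sketch (an illustration of ours, with made-up energy levels), states are drawn with their Boltzmann probabilities, and a plain average of the energy over the sample reproduces the exact thermal average.

```python
import math
import random

# Energies of a toy system with a handful of states (illustrative values).
energies = [0.0, 1.0, 2.0, 5.0]
beta = 1.0   # inverse temperature 1/kT

# Exact Boltzmann average of the energy, for comparison.
Z = sum(math.exp(-beta * E) for E in energies)
exact = sum(E * math.exp(-beta * E) for E in energies) / Z

# Importance sampling: draw states with probability p_u = exp(-beta*E_u)/Z.
# The Boltzmann factors then cancel from the estimator, which reduces to
# a simple average of E over the sampled states.
rng = random.Random(2)
weights = [math.exp(-beta * E) for E in energies]
M = 200_000
samples = rng.choices(energies, weights=weights, k=M)
Q_M = sum(samples) / M
```

Of course, for a four-state system we could have summed over all states directly; the point of the exercise is only that the sample average converges to the thermal average when the states appear with their Boltzmann probabilities.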
The only remaining question is how exactly we pick our states so that
each one appears with its correct Boltzmann probability. This is by no means
a simple task. In the remainder of this chapter we describe the standard
solution to the problem, which makes use of a "Markov process".
since the Markov process must generate some state v when handed a system
in the state u. Note, however, that the transition probability P(u → u),
which is the probability that the new state generated will be the same as
the old one, need not be zero. This amounts to saying there may be a finite
probability that the Markov process will just stay in state u.
In a Monte Carlo simulation we use a Markov process repeatedly to gen-
erate a Markov chain of states. Starting with a state u, we use the process
to generate a new one v, and then we feed that state into the process to
generate another, and so on. The Markov process is chosen specially so
that when it is run for long enough starting from any state of the system it
will eventually produce a succession of states which appear with probabili-
ties given by the Boltzmann distribution. (We call the process of reaching
the Boltzmann distribution "coming to equilibrium", since it is exactly the
process that a real system goes through with its "analogue computer" as it
reaches equilibrium at the ambient temperature.) In order to achieve this,
we place two further conditions on our Markov process, in addition to the
ones specified above: the conditions of "ergodicity" and "detailed balance".
2.2.2 Ergodicity
The condition of ergodicity is the requirement that it should be possible
for our Markov process to reach any state of the system from any other state,
if we run it for long enough. This is necessary to achieve our stated goal of
generating states with their correct Boltzmann probabilities. Every state v
appears with some non-zero probability pv in the Boltzmann distribution,
and if that state were inaccessible from another state u no matter how long
we continue our process for, then our goal is thwarted if we start in state u:
the probability of finding v in our Markov chain of states will be zero, and
Making use of the sum rule, Equation (2.5), we can simplify this to

Σ_v p_v P(v → u) = p_u.     (2.7)
For any set of transition probabilities satisfying this equation, the probability
distribution pu will be an equilibrium of the dynamics of the Markov process.
Unfortunately, however, simply satisfying this equation is not sufficient to
guarantee that the probability distribution will tend to pu from any state of
the system if we run the process for long enough. We can demonstrate this
as follows.
The transition probabilities P(u —> v) can be thought of as the elements
of a matrix P. This matrix is called the Markov matrix or the stochastic
matrix for the Markov process. Let us return to the notation of Section 1.1,
in which we denoted by wu(t) the probability that our system is in a state
u at time t. If we measure time in steps along our Markov chain, then the
3 This equation is essentially just a discrete-time version of the one we would get if we
were to set the derivative in the master equation, Equation (1.1), to zero.
where w(t) is the vector whose elements are the weights wu(t). If the Markov
process reaches a simple equilibrium state w(∞) as t → ∞, then that state
satisfies

w(∞) = P · w(∞).     (2.10)

It is also possible, however, for the process to get stuck in a dynamical limit
cycle, rotating in sequence around a set of states, in which case

w(∞) = P^n · w(∞),     (2.11)

where n is the length of the limit cycle. If we choose our transition prob-
abilities (or equivalently our Markov matrix) to satisfy Equation (2.7) we
guarantee that the Markov chain will have a simple equilibrium probabil-
ity distribution pu, but it may also have any number of limit cycles of the
form (2.11). This means that there is no guarantee that the actual states
generated will have anything like the desired probability distribution.
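Both behaviours are easy to exhibit numerically. In this Python sketch (our own; the matrices are invented examples), a symmetric three-state Markov matrix relaxes to the uniform distribution, while a deterministic rotation, which also has the uniform distribution as an equilibrium, never settles down from a point mass.

```python
def step(P, w):
    """One step of the Markov chain: w_v(t+1) = sum_u P(u -> v) w_u(t).
    Here P[u][v] holds the transition probability P(u -> v)."""
    n = len(w)
    return [sum(P[u][v] * w[u] for u in range(n)) for v in range(n)]

# A mixing chain: symmetric transition probabilities, so the uniform
# distribution satisfies detailed balance and the chain converges to it.
P_mix = [[0.5, 0.25, 0.25],
         [0.25, 0.5, 0.25],
         [0.25, 0.25, 0.5]]

# A limit cycle: deterministic rotation 0 -> 1 -> 2 -> 0.  The uniform
# distribution is still an equilibrium of the dynamics, but detailed
# balance is violated, and a point mass just rotates forever.
P_cyc = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]

w = [1.0, 0.0, 0.0]
for _ in range(50):
    w = step(P_mix, w)
mixed = w            # close to [1/3, 1/3, 1/3]

w = [1.0, 0.0, 0.0]
for _ in range(50):
    w = step(P_cyc, w)
cycled = w           # still a point mass, now sitting on state 2
```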
We get around this problem by applying an additional condition to our
transition probabilities thus:

p_u P(u → v) = p_v P(v → u).     (2.12)
This is the condition of detailed balance. It is clear that any set of transition
probabilities which satisfy this condition also satisfy Equation (2.6). (To
prove it, simply sum both sides of Equation (2.12) over v.) We can also
show that this condition eliminates limit cycles. To see this, look first at the
left-hand side of the equation, which is the probability of being in a state
u multiplied by the probability of making a transition from that state to
another state v. In other words, it is the overall rate at which transitions
from u to v happen in our system. The right-hand side is the overall rate
for the reverse transition. The condition of detailed balance tells us that on
average the system should go from u to v just as often as it goes from v to
u. In a limit cycle, in which the probability of occupation of some or all of
the states changes in a cyclic fashion, there must be states for which this
4 This equation is also closely related to Equation (1.1). The reader may like to work
out how the one can be transformed into the other.
condition is violated on any particular step of the Markov chain; in order for
the probability of occupation of a particular state to increase, for instance,
there must be more transitions into that state than out of it, on average.
The condition of detailed balance forbids dynamics of this kind and hence
forbids limit cycles.
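One can also check detailed balance directly for a concrete set of transition probabilities. The sketch below is our own illustration, with arbitrary energies; it uses acceptance probabilities of the Metropolis kind discussed in Chapter 3, and verifies that p_u P(u → v) = p_v P(v → u) for every pair of states.

```python
import math

beta = 0.7
E = [0.0, 1.0, 2.5]          # energies of three states (illustrative)
n = len(E)

def P(u, v):
    """Transition probability P(u -> v): propose any other state with
    probability 1/(n-1) and accept with min(1, exp(-beta * dE)); the
    stay-at-home probability makes each row sum to one."""
    if u == v:
        return 1.0 - sum(P(u, w) for w in range(n) if w != u)
    return (1.0 / (n - 1)) * min(1.0, math.exp(-beta * (E[v] - E[u])))

# The Boltzmann distribution these rates are meant to preserve.
Z = sum(math.exp(-beta * e) for e in E)
p = [math.exp(-beta * e) / Z for e in E]

# Detailed balance: the rate of u -> v transitions equals that of v -> u.
max_violation = max(abs(p[u] * P(u, v) - p[v] * P(v, u))
                    for u in range(n) for v in range(n))
```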
Once we remove the limit cycles in this way, it is straightforward to
show that the system will always tend to the probability distribution pu as
t → ∞. As t → ∞, w(t) will tend exponentially towards the eigenvector
corresponding to the largest eigenvalue of P. This may be obvious to you if
you are familiar with stochastic matrices. If not, we prove it in Section 3.3.2.
For the moment, let us take it as given. Looking at Equation (2.10) we
see that the largest eigenvalue of the Markov matrix must in fact be one.
If limit cycles of the form (2.11) were present, then we could also have
eigenvalues which are complex roots of one, but the condition of detailed
balance prevents this from happening. Now look back at Equation (2.7)
again. We can express this equation in matrix notation as
This equation and Equation (2.5) are the constraints on our choice of tran-
sition probabilities P(u —> v). If we satisfy these, as well as the condition of
ergodicity, then the equilibrium distribution of states in our Markov process
will be the Boltzmann distribution. Given a suitable set of transition prob-
abilities, our plan is then to write a computer program which implements
the Markov process corresponding to these transition probabilities so as to
generate a chain of states. After waiting a suitable length of time7 to allow
the probability distribution of states wu(t) to get sufficiently close to the
Boltzmann distribution, we average the observable Q that we are interested
in over M states and we have calculated the estimator QM defined in Equa-
tion (2.4). A number of refinements on this outline are possible and we will
discuss some of those in the remainder of this chapter and in later chapters
of the book, but this is the basic principle on which virtually all modern
equilibrium Monte Carlo calculations are based.
Our constraints still leave us a good deal of freedom over how we choose
the transition probabilities. There are many ways in which to satisfy them.
One simple choice for example is
although as we will show in Section 3.1 this choice is not a very good one.
There are some other choices which are known to work well in many cases,
such as the "Metropolis algorithm" proposed by Metropolis and co-workers
in 1953, and we will discuss the most important of these in the coming chap-
ters. However, it must be stressed—and this is one of the most important
6 Occasionally, in fact, we want to generate equilibrium distributions other than the
Boltzmann distribution. An example is the entropic sampling algorithm of Section 6.3.
In this case the arguments here still apply. We simply feed our required distribution into
the condition of detailed balance.
7Exactly how long we have to wait can be a difficult thing to decide. A number of
possible criteria are discussed in Section 3.2.
things this book has to say—that the standard algorithms are very rarely
the best ones for solving new problems with. In most cases they will work,
and in some cases they will even give quite good answers, but you can almost
always do a better job by giving a little extra thought to choosing the best
set of transition probabilities to construct an algorithm that will answer
the particular questions that you are interested in. A purpose-built algo-
rithm can often give a much faster simulation than an equivalent standard
algorithm, and the improvement in efficiency can easily make the difference
between finding an answer to a problem and not finding one.
The quantity g(u —> v) is the selection probability, which is the proba-
bility, given an initial state u, that our algorithm will generate a new target
state v, and A(u → v) is the acceptance ratio (sometimes also called the
"acceptance probability"). The acceptance ratio says that if we start off in a
state u and our algorithm generates a new state v from it, we should accept
that state and change our system to the new state v a fraction of the time
A(u → v). The rest of the time we should just stay in the state u. We are
free to choose the acceptance ratio to be any number we like between zero
and one; choosing it to be zero for all transitions is equivalent to choosing
P(u → u) = 1, which is the largest value it can take, and means that we
will never leave the state u. (Not a very desirable situation. We would never
choose an acceptance ratio of zero for an actual calculation.)
This gives us complete freedom about how we choose the selection prob-
abilities g(u → v), since the constraint (2.14) only fixes the ratio

g(u → v) A(u → v) / [g(v → u) A(v → u)] = e^(-β(E_v − E_u)).     (2.17)
The ratio A(u —> v)/A(v —> u) can take any value we choose between zero
and infinity, which means that both g(u —> v) and g(v —> u) can take any
values we like.
Our other constraint, the sum rule of Equation (2.5), is still satisfied,
since the system must end up in some state after each step in the Markov
chain, even if that state is just the state we started in.
So, in order to create our Monte Carlo algorithm what we actually do
is think up an algorithm which generates random new states v given old
ones u, with some set of probabilities g(u → v), and then we accept or
reject those states with acceptance ratios A(u → v) which we choose to
satisfy Equation (2.17). This will then satisfy all the requirements for the
transition probabilities, and so produce a string of states which, when the
algorithm reaches equilibrium, will each appear with their correct Boltzmann
probability.
This all seems delightful, but there is a catch which we must always
bear in mind, and which is one of the most important considerations in the
design of Monte Carlo algorithms. If the acceptance ratios for our moves
are low, then the algorithm will on most time steps simply stay in the state
that it is in, and not go anywhere. The step on which it actually accepts a
change to a new state will be rare, and this is wasteful of time. We want an
algorithm that moves nimbly about state space and samples a wide selection
of different states. We don't want to take a million time steps and find that
our algorithm has only sampled a dozen states. The solution to this problem
is to make the acceptance ratio as close to unity as possible. One way to do
this is to note that Equation (2.17) fixes only the ratio A(u → v)/A(v → u)
of the acceptance ratios for the transitions in either direction between any
two states. Thus we are free to multiply both A(u —> v) and A(v —> u) by
the same factor, and the equation will still be obeyed. The only constraint is
that both acceptance ratios should remain between zero and one. In practice
then, what we do is to set the larger of the two acceptance ratios to one,
and have the other one take whatever value is necessary for the ratio of the
two to satisfy (2.17). This ensures that the acceptance ratios will be as large
as they can be while still satisfying the relevant conditions, and indeed that
the ratio in one direction will be unity, which means that in that direction
at least, moves will always be accepted.
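This prescription amounts to a one-line rule. For symmetric selection probabilities, where detailed balance requires A(u → v)/A(v → u) = e^(-β(E_v − E_u)), the sketch below (our own illustration) sets the larger member of each pair to one.

```python
import math

def acceptance_pair(E_u, E_v, beta):
    """Choose the pair of acceptance ratios for moves between u and v
    so that the larger of the two is one, while their ratio remains
    exp(-beta * (E_v - E_u)), as detailed balance demands when the
    selection probabilities are symmetric."""
    r = math.exp(-beta * (E_v - E_u))   # required A(u->v)/A(v->u)
    if r >= 1.0:
        return 1.0, 1.0 / r             # downhill move: always accept
    return r, 1.0                       # uphill move: accept a fraction r

# Example pair of states with E_v - E_u = 2 at beta = 1: the uphill
# acceptance ratio is exp(-2), the downhill one is exactly one.
A_uv, A_vu = acceptance_pair(0.0, 2.0, beta=1.0)
```

This is precisely the choice made by the Metropolis algorithm of Chapter 3.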
However, the best thing we can do to keep the acceptance ratios large
is to try to embody in the selection probabilities g(u → v) as much as
we can of the dependence of P(u —> v) on the characteristics of the states
u and v, and put as little as we can in the acceptance ratio. The ideal
algorithm is one in which the new states are selected with exactly the correct
transition probabilities all the time, and the acceptance ratio is always one.
A good algorithm is one in which the acceptance probability is usually close
to one. Much of the effort invested in the algorithms described in this book
is directed at making the acceptance ratios large.
lation in the ground state, then move up to the first excited state for one
time-step and then relax back to the ground state again. Such behaviour is
not unreasonable for a cold system but we waste a lot of computer time sim-
ulating it. Time-step after time-step our algorithm selects a possible move
to some excited state, but the acceptance ratio is very low and virtually all
of these possible moves are rejected, and the system just ends up spending
most of its time in the ground state.
Well, what if we were to accept that this is the case, and take a look at the
acceptance ratio for a move from the ground state to the first excited state,
and say to ourselves, "Judging by this acceptance ratio, this system is going
to spend a hundred time-steps in the ground state before it accepts a move
to the first excited state". Then we could jump the gun by assuming that
the system will do this, miss out the calculations involved in the intervening
useless one hundred time-steps, and progress straight to the one time-step in
which something interesting happens. This is the essence of the idea behind
the continuous time method. In this technique, we have a time-step which
corresponds to a varying length of time, depending on how long we expect
the system to remain in its present state before moving to a new one. Then
when we come to take the average of our observable Q over many states, we
weight the states in which the system spends longest the most heavily—the
calculation of the estimator of Q is no more than a time average, so each
value Qu for Q in state u should be weighted by how long the system spends
in that state.
How can we adapt our previous ideas concerning the transition proba-
bilities for our Markov process to take this new idea into account? Well,
assuming that the system is in some state u, we can calculate how long a
time Δt (measured in steps of the simulation) it will stay there for before a
move to another state is accepted by considering the "stay-at-home" prob-
ability P(u → u). The probability that it is still in this same state u after t
time-steps is just [P(u → u)]^t.
So, if we can calculate this quantity Δt, then rather than wait this many
time-steps for a Monte Carlo move to get accepted, we can simply pretend
that we have done the waiting and go right ahead and change the state of
the system to a new state v ≠ u. Which state should we choose for v? We
should choose one at random, but in proportion to P(u → v). Thus our
continuous time Monte Carlo algorithm consists of the following steps:
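In outline, and as a rough Python sketch (our own illustration, not the book's program; the state labels and rate values are invented), each move draws the waiting time from the geometric distribution implied by the stay-at-home probability P(u → u), and then selects the new state in proportion to P(u → v).

```python
import math
import random

def continuous_time_step(p_stay, p_move, rng):
    """One rejection-free move.  p_stay is the stay-at-home probability
    P(u -> u); p_move maps each other state v to P(u -> v).  The chance
    of still being in u after t steps is p_stay**t, so the waiting time
    is geometrically distributed; we sample it by inversion, then pick
    the new state in proportion to P(u -> v)."""
    r = 1.0 - rng.random()                       # uniform in (0, 1]
    t = 1 + int(math.log(r) / math.log(p_stay))  # geometric waiting time
    v = rng.choices(list(p_move), weights=list(p_move.values()), k=1)[0]
    return t, v

# Illustrative numbers (invented): stay with probability 0.99 per step,
# otherwise move to state "a" or "b" with relative rates 3 : 1.
rng = random.Random(3)
total_wait, moves = 0, {"a": 0, "b": 0}
trials = 20_000
for _ in range(trials):
    t, v = continuous_time_step(0.99, {"a": 0.0075, "b": 0.0025}, rng)
    total_wait += t
    moves[v] += 1
mean_wait = total_wait / trials   # close to 1/(1 - 0.99) = 100 steps
```

When forming averages, each state visited must then be weighted by the time Δt spent in it, as described above.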
Problems
2.1 Derive Equation (2.8) from Equation (1.1).
2.2 Consider a system which has just three energy states, with energies
E0 < E1 < E2. Suppose that the only allowed transitions are ones of
the form u → v, where v = (u + 1) mod 3. Such a system cannot satisfy
detailed balance. Show nonetheless that it is possible to choose the transition
probabilities P(u —> v) so that the Boltzmann distribution is an equilibrium
of the dynamics.
3
In Section 1.2.2 we introduced the Ising model, which is one of the simplest
and best studied of statistical mechanical models. In this chapter and the
next we look in detail at the Monte Carlo methods that have been used
to investigate the properties of this model. As well as demonstrating the
application of the basic principles described in the last chapter, the study
of the Ising model provides an excellent introduction to the most important
Monte Carlo algorithms in use today. Along the way we will also look at some
of the tricks used for implementing Monte Carlo algorithms in computer
programs and at some of the standard techniques used to analyse the data
those programs generate.
To recap briefly, the Ising model is a simple model of a magnet, in which
dipoles or "spins" si are placed on the sites i of a lattice. Each spin can take
either of two values: +1 and −1. If there are N sites on the lattice, then the
system can be in 2N states, and the energy of any particular state is given
by the Ising Hamiltonian:
The very first Monte Carlo algorithm we introduce in this book is the most
famous and widely used algorithm of them all, the Metropolis algorithm,
which was introduced by Nicholas Metropolis and his co-workers in a 1953
paper on simulations of hard-sphere gases (Metropolis et al. 1953). We
will use this algorithm to illustrate many of the general concepts involved
in a real Monte Carlo calculation, including equilibration, measurement of
expectation values, and the calculation of errors. First however, let us see
how the algorithm is arrived at, and how one might go about implementing
it on a computer.
The derivation of the Metropolis algorithm follows exactly the plan we
outlined in Section 2.3. We choose a set of selection probabilities g(u → v),
one for each possible transition from one state to another, u —> v, and
then we choose a set of acceptance probabilities A(u —> v) such that Equa-
tion (2.17) satisfies the condition of detailed balance, Equation (2.14). The
algorithm works by repeatedly choosing a new state v, and then accepting
or rejecting it at random with our chosen acceptance probability. If the state
is accepted, the computer changes the system to the new state v. If not, it
just leaves it as it is. And then the process is repeated again and again.
The selection probabilities g(u —> v) should be chosen so that the condi-
tion of ergodicity—the requirement that every state be accessible from every
other in a finite number of steps—is fulfilled (see Section 2.2.2). This still
leaves us a good deal of latitude about how they are chosen; given an initial
state u we can generate any number of candidate states v simply by flipping
different subsets of the spins on the lattice. However, as we demonstrated
in Section 1.2.1, the energies of systems in thermal equilibrium stay within
a very narrow range—the energy fluctuations are small by comparison with
the energy of the entire system. In other words, the real system spends
most of its time in a subset of states with a narrow range of energies and
rarely makes transitions that change the energy of the system dramatically.
This tells us that we probably don't want to spend much time in our sim-
ulation considering transitions to states whose energy is very different from
the energy of the present state. The simplest way of achieving this in the
Ising model is to consider only those states which differ from the present one
by the flip of a single spin. An algorithm which does this is said to have
single-spin-flip dynamics. The algorithm we describe in this chapter has
single-spin-flip dynamics, although this is not what makes it the Metropolis
algorithm. (As discussed below, it is the particular choice of acceptance ratio
that characterizes the Metropolis algorithm. Our algorithm would still be a
Metropolis algorithm even if it flipped many spins at once.)
Using single-spin-flip dynamics guarantees that the new state v will have
an energy Ev differing from the current energy Eu by at most 2J for each
3.1 The Metropolis algorithm 47
bond between the spin we flip and its neighbours. For example, on a square
lattice in two dimensions each spin has four neighbours, so the maximum
difference in energy would be 8J. The general expression is 2zJ, where z
is the lattice coordination number, i.e., the number of neighbours that
each site on the lattice has. Using single-spin-flip dynamics also ensures
that our algorithm obeys ergodicity, since it is clear that we can get from
any state to any other on a finite lattice by flipping one by one each of the
spins by which the two states differ.
In the Metropolis algorithm the selection probabilities g(u → v) for each
of the possible states v are all chosen to be equal. The selection probabilities
of all other states are set to zero. Suppose there are N spins in the system
we are simulating. With single-spin-flip dynamics there are then N different
spins that we could flip, and hence N possible states v which we can reach
from a given state u. Thus there are N selection probabilities g(u → v)
which are non-zero, and each of them takes the value

    g(u → v) = 1/N.    (3.5)
Now we have to choose the acceptance ratios A(u → v) to satisfy this equa-
tion. As we pointed out in Section 2.2.3, one possibility is to choose

    A(u → v) = e^(−β(Ev − Eu + 2zJ)/2),    (3.6)

which is equal to 1 for the most negative possible energy difference
Ev − Eu = −2zJ, and whose ratio A(u → v)/A(v → u) = e^(−β(Ev − Eu))
satisfies the detailed balance condition.
FIGURE 3.1 Plot of the acceptance ratio given in Equation (3.6) (solid
line). This acceptance ratio gives rise to an algorithm which samples
the Boltzmann distribution correctly, but is very inefficient, since it
rejects the vast majority of the moves it selects for consideration. The
Metropolis acceptance ratio (dashed line) is much more efficient.
This is not the Metropolis algorithm (we are coming to that), but using
this acceptance probability we can perform a Monte Carlo simulation of
the Ising model, and it will correctly sample the Boltzmann distribution.
However, the simulation will be very inefficient, because the acceptance ratio,
Equation (3.6), is very small for almost all moves. Figure 3.1 shows the
acceptance ratio (solid line) as a function of the energy difference ΔE =
Ev − Eu over the allowed range of values for a simulation with β = J = 1
and a lattice coordination number z = 4, as on a square lattice for example.
As we can see, although A(u → v) starts off at 1 for ΔE = −8, it quickly
falls to only about 0.13 at ΔE = −4, and to only 0.02 when ΔE = 0. The
chances of making any move for which ΔE > 0 are pitifully small, and in
practice this means that an algorithm making use of this acceptance ratio
would be tremendously slow, spending most of its time rejecting moves and
not flipping any spins at all. The solution to this problem is as follows.
In Equation (3.4) we have assumed a particular functional form for the ac-
ceptance ratio, but the condition of detailed balance, Equation (3.3), doesn't
actually require that it take this form. Equation (3.3) only specifies the ratio
of pairs of acceptance probabilities, which still leaves us quite a lot of room
to manoeuvre. In fact, as we pointed out in Section 2.3, when given a con-
straint like (3.3) the way to maximize the acceptance ratios (and therefore
produce the most efficient algorithm) is always to give the larger of the two
ratios the largest value possible—namely 1—and then adjust the other to
satisfy the constraint. To see how that works out in this case, suppose that
of the two states u and v we are considering here, u has the lower energy
and v the higher: Eu < Ev. Then the larger of the two acceptance ratios is
A(v → u), so we set that equal to one. In order to satisfy Equation (3.3),
A(u → v) must then take the value e^(−β(Ev − Eu)). Thus the optimal algorithm
is one in which

    A(u → v) = e^(−β(Ev − Eu))   if Ev − Eu > 0
    A(u → v) = 1                 otherwise.    (3.7)
In other words, if we select a new state which has an energy lower than or
equal to the present one, we should always accept the transition to that state.
If it has a higher energy then we maybe accept it, with the probability given
above. This is the Metropolis algorithm for the Ising model with single-spin-
flip dynamics. It is Equation (3.7) which makes it the Metropolis algorithm.
This is the part that was pioneered by Metropolis and co-workers in their
paper on hard-sphere gases, and any algorithm, applied to any model, which
chooses acceptance probabilities according to a rule like (3.7) can be said to
be a Metropolis algorithm. At first, this rule may seem a little strange,
especially the part about how we always accept a move that will lower the
energy of the system. The first algorithm we suggested, Equation (3.6),
seems much more natural in this respect, since it sometimes rejects moves to
lower energy. However, as we have shown, the Metropolis algorithm satisfies
detailed balance, and is by far the more efficient algorithm, so, natural or
not, it has become the algorithm of choice in the overwhelming majority of
Monte Carlo studies of simple statistical mechanical models in the last forty
years. We have also plotted Equation (3.7) in Figure 3.1 (dashed line) for
comparison between the two algorithms.
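The acceptance rule of Equation (3.7) takes only a few lines of code. Here is a minimal Python sketch (the function name is our own, not from the text):

```python
import math

def metropolis_acceptance(delta_E, beta):
    """Metropolis acceptance ratio, Equation (3.7): unity if the move
    lowers (or preserves) the energy, exp(-beta * delta_E) otherwise."""
    if delta_E <= 0:
        return 1.0
    return math.exp(-beta * delta_E)
```

Note that a move which lowers the energy is always accepted, while an uphill move is accepted with a probability that falls off exponentially in the energy cost.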
In the second line the sum is over only those spins i which are nearest
neighbours of the flipped spin k, and we have made use of the fact that all
of these spins do not themselves flip, so that s_i^v = s_i^u. Now if s_k^u = +1, then
after spin k has been flipped we must have s_k^v = −1, so that s_k^v − s_k^u = −2.
On the other hand, if s_k^u = −1 then s_k^v − s_k^u = +2. Thus we can write

    s_k^v − s_k^u = −2 s_k^u,    (3.9)
52 Chapter 3: The Ising model and the Metropolis algorithm
and so

    Ev − Eu = 2J s_k^u Σ_i s_i^u,    (3.10)

where the sum runs over the nearest neighbours i of spin k.
This expression only involves summing over z terms, rather than ½Nz, and
it doesn't require us to perform any multiplications for the terms in the sum,
so it is much more efficient than evaluating the change in energy directly.
What's more, it involves only the values of the spins in state u, so we can
evaluate it before we actually flip the spin k.
The algorithm thus involves calculating Ev — Eu from Equation (3.10)
and then following the rule given in Equation (3.7): if Ev — Eu < 0 we
definitely accept the move and flip the spin s_k → −s_k. If Ev − Eu > 0 we
still may want to flip the spin. The Metropolis algorithm tells us to flip it
with probability A(u → v) = e^(−β(Ev − Eu)). We can do this as follows. We
evaluate the acceptance ratio A(u —> v) using our value of Ev — Eu from
Equation (3.10), and then we choose a random number r between zero and
one. (Strictly the number can be equal to zero, but it must be less than one:
0 ≤ r < 1.) If that number is less than our acceptance ratio, r < A(u → v),
then we flip the spin. If it isn't, we leave the spin alone.
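Putting the pieces together, a single move of the algorithm might look like the following Python sketch (the lattice representation and function name are our own illustrative choices, assuming a square lattice with periodic boundary conditions):

```python
import math
import random

def metropolis_step(s, L, beta, J=1.0):
    """One single-spin-flip Metropolis move on an L x L Ising lattice
    with periodic boundary conditions. s is a list of lists of +/-1
    spins. Returns the realized energy change (0.0 if rejected)."""
    # Choose a spin uniformly at random: selection probability 1/N.
    i, j = random.randrange(L), random.randrange(L)
    # Sum of the z = 4 nearest neighbours, as in Equation (3.10):
    nn = (s[(i + 1) % L][j] + s[(i - 1) % L][j]
          + s[i][(j + 1) % L] + s[i][(j - 1) % L])
    delta_E = 2.0 * J * s[i][j] * nn
    # Metropolis acceptance rule, Equation (3.7):
    if delta_E <= 0 or random.random() < math.exp(-beta * delta_E):
        s[i][j] = -s[i][j]
        return delta_E
    return 0.0
```

Repeating this step over and over, with measurements in between, constitutes the whole simulation.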
And that is our complete algorithm. Now we just keep on repeating
the same calculations over and over again, choosing a spin, calculating the
energy change we would get if we flipped it, and then deciding whether to flip
it according to Equation (3.7). Actually, there is one other trick that we can
pull that makes our algorithm a bit faster still. (In fact, on most computers
it will make it a lot faster.) One of the slowest parts of the algorithm as
we have described it is the calculation of the exponential, which we have to
perform if the energy of the new state we choose is greater than that of the
current state. Calculating exponentials on a computer is usually done using
a polynomial approximation which involves performing a number of floating-
point multiplications and additions, and can take a considerable amount of
time. We can save ourselves this effort, and thereby speed up our simulation,
if we notice that the quantity, Equation (3.10), which we are calculating the
exponential of, can only take a rather small number of values. Each of the
terms in the sum can only take the values +1 and —1. So the entire sum,
which has z terms, can only take the values −z, −z + 2, −z + 4 ... and so
on up to +z—a total of z + 1 possible values. And we only actually need
to calculate the exponential when Ev − Eu is positive (see Equation (3.7)
again), so in fact there are only ½z values of Ev − Eu for which we ever
need to calculate exponentials. Thus, it makes good sense to calculate the
values of these ½z exponentials before we start the calculation proper, and
store them in the computer's memory (usually in an array), where we can
simply look them up when we need them during the simulation. We pay
the one-time cost of evaluating them at the beginning, and save a great deal
more by never having to evaluate any exponentials again during the rest of
the simulation. Not only does this save us the effort of evaluating all those
exponentials, it also means that we hardly have to perform any floating-point
arithmetic during the simulation. The only floating-point calculations will
be in the generation of the random number r. (We discuss techniques for
doing this in Chapter 16.) All the other calculations involve only integers,
which on most computers are much quicker to deal with than real numbers.
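The lookup-table trick can be sketched as follows (a Python illustration with our own function name; on a square lattice with z = 4 the table has just ½z = 2 entries):

```python
import math

def build_boltzmann_table(beta, J=1.0, z=4):
    """Precompute exp(-beta * delta_E) for the 1/2 z positive energy
    differences that can occur with single-spin-flip dynamics.
    delta_E = 2*J*m where m = s_k * (sum over neighbours) takes the
    values 2, 4, ..., z when delta_E is positive (z even)."""
    return {2 * J * m: math.exp(-2 * beta * J * m)
            for m in range(2, z + 1, 2)}
```

During the simulation one then looks up the acceptance probability in this dictionary (or an array indexed by the neighbour sum) instead of calling the exponential function.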
3.2 Equilibration
So what do we do with our Monte Carlo program for the Ising model, once
we have written it? Well, we probably want to know the answer to some
questions like "What is the magnetization at such-and-such a temperature?",
or "How does the internal energy behave with temperature over such-and-
such a range?" To answer these questions we have to do two things. First we
have to run our simulation for a suitably long period of time until it has come
to equilibrium at the temperature we are interested in—this period is called
the equilibration time τ_eq—and then we have to measure the quantity we
are interested in over another suitably long period of time and average it,
to evaluate the estimator of that quantity (see Equation (2.4)). This leads
us to several other questions. What exactly do we mean by "allowing the
system to come to equilibrium" ? And how long is a "suitably long" time for
it to happen? How do we go about measuring our quantity of interest, and
how long do we have to average over to get a result of a desired degree of
accuracy? These are very general questions which we need to consider every
time we do a Monte Carlo calculation. Although we will be discussing them
here using our Ising model simulation as an example, the conclusions we
will draw in this and the following sections are applicable to all equilibrium
Monte Carlo calculations. These sections are some of the most important in
this book.
As we discussed in Section 1.2, "equilibrium" means that the average
probability of finding our system in any particular state u is proportional
to the Boltzmann weight e^(−βEu) of that state. If we start our system off in
a state such as the T = 0 or T = ∞ states described in the last section
and we want to perform a simulation at some finite non-zero temperature,
it will take a little time before we reach equilibrium. To see this, recall that,
as we demonstrated in Section 1.2.1, a system at equilibrium spends the
overwhelming majority of its time in a small subset of states in which its
internal energy and other properties take a narrow range of values. In order
to get a good estimate of the equilibrium value of any property of the system
therefore, we need to wait until it has found its way to one of the states that
fall in this narrow range. Then, we assume, the Monte Carlo algorithm we
have designed will ensure that it stays roughly within that range for the rest
of the simulation—it should do since we designed the algorithm specifically
to simulate the behaviour of the system at equilibrium. But it may take
some time to find a state that lies within the correct range. In the version
of the Metropolis algorithm which we have described here, we can only flip
one spin at a time, and since we are choosing the spins we flip at random,
it could take quite a while before we hit on the correct sequence of spins to
flip in order to get us to one of the states we want to be in. At the very
least we can expect it to take about N Monte Carlo steps to reach a state
in the appropriate energy range, where N is the number of spins on the
lattice, since we need to allow every spin the chance to flip at least once. In
Figure 3.2 we show a succession of states of a two-dimensional Ising model on
a square lattice of 100 x 100 spins with J = 1, as it is "warmed" up to a
temperature T = 2.4 from an initial T = 0 state in which all the spins are
aligned. In these pictures the +1 and —1 spins are depicted as black and
white squares. By the time we reach the last frame out of nine, the system
has equilibrated. The whole process takes on the order of 10^7 steps in this
case.
However, looking at pictures of the lattice is not a reliable way of gauging
when the system has come to equilibrium. A better way, which takes very
little extra effort, is to plot a graph of some quantity of interest, like the
magnetization per spin m of the system or the energy of the system E, as
a function of time from the start of the simulation. We have done this in
Figure 3.3. (We will discuss the best ways of measuring these quantities
in the next section, but for the moment let's just assume that we calculate
them directly. For example, the energy of a given state can be calculated by
feeding all the values of the spins Si into the Hamiltonian, Equation (3.1).)
It is not hard to guess simply by looking at this graph that the system came
to equilibrium at around time t ≈ 6000. Up until this point the energy
and the magnetization are changing, but after this point they just fluctuate
around a steady average value.
The horizontal axis in Figure 3.3 measures time in Monte Carlo steps per
lattice site, which is the normal practice for simulations of this kind. The
reason is that if time is measured in this way, then the average frequency
with which any particular spin is selected for flipping is independent of the
total number of spins N on the lattice. This average frequency is called
the "attempt frequency" for the spin. In the simulation we are considering
here the attempt frequency has the value 1. It is natural that we should
arrange for the attempt frequency to be independent of the lattice size; in
an experimental system, the rate at which spins or atoms or molecules change
from one state to another does not depend on how many there are in the
whole system. An atom in a tiny sample will change state as often as one in
a sample the size of a house. Attempt frequencies are discussed further in
Section 11.1.1.
When we perform N Monte Carlo steps—one for each spin in the system,
on average—we say we have completed one sweep of the lattice. We could
therefore also say that the time axis of Figure 3.3 was calibrated in sweeps.
Judging the equilibration of a system by eye from a plot such as Figure 3.3
is a reasonable method, provided we know that the system will come to
equilibrium in a smooth and predictable fashion as it does in this case.
The trouble is that we usually know no such thing. In many cases it is
possible for the system to get stuck in some metastable region of its state
space for a while, giving roughly constant values for all the quantities we
are observing and so appearing to have reached equilibrium. In statistical
mechanical terms, there can be a local energy minimum in which the
system can remain temporarily, and we may mistake this for the global
energy minimum, which is the region of state space that the equilibrium
system is most likely to inhabit. (These ideas are discussed in more detail
in the first few sections of Chapter 6.) To avoid this potential pitfall, we
commonly adopt a different strategy for determining the equilibration time,
in which we perform two different simulations of the same system, starting
them in different initial states. In the case of the Ising model we might, for
example, start one in the T = 0 state with all spins aligned, and one in the
T = oo state with random spins. Or we could choose two different T = oo
random-spin states. We should also run the two simulations with different
"seeds" for the random number generator (see Section 16.1.2), to ensure
that they take different paths to equilibrium. Then we watch the value of
the magnetization or energy or other quantity in the two systems and when
we see them reach the same approximately constant value, we deduce that
both systems have reached equilibrium. We have done this for two 100 x 100
Ising systems in Figure 3.4. Again, we clearly see that it takes about 6000
Monte Carlo steps for the two systems to reach a consensus about the value
of the magnetization. This technique avoids the problem mentioned above,
since if one of the systems finds itself in some metastable region, and the
other reaches equilibrium or gets stuck in another metastable region, this will
be apparent from the graph, because the magnetization (or other quantity)
will take different values for the two systems. Only in the unlikely event
that the two systems coincidentally become trapped in the same metastable
region (for example, if we choose two initial states that are too similar to
one another) will we be misled into thinking they have reached equilibrium
when they haven't. If we are worried about this possibility, we can run three
different simulations from different starting points, or four, or five. Usually,
however, two is sufficient.
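The two-run test is easy to automate. The following Python sketch is our own illustrative criterion (the window length and tolerance are arbitrary choices, not prescriptions from the text): it reports the first time at which two magnetization series agree on average over a trailing window.

```python
def equilibrated(m1, m2, window=100, tol=0.05):
    """Given magnetization-per-spin time series from two runs started
    in different initial states, return the first time at which the
    trailing-window averages of the two runs agree to within tol.
    Returns None if they never converge."""
    n = min(len(m1), len(m2))
    for t in range(window, n + 1):
        avg1 = sum(m1[t - window:t]) / window
        avg2 = sum(m2[t - window:t]) / window
        if abs(avg1 - avg2) < tol:
            return t
    return None
```

If the two runs never reach a common value, one of them has presumably become trapped in a metastable region and the run should be extended or restarted.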
3.3 Measurement
Once we are sure the system has reached equilibrium, we need to measure
whatever quantity it is that we are interested in. The most likely candidates
for the Ising model are the energy and the magnetization of the system. As
we pointed out above, the energy Eu of the current state u of the system
can be evaluated directly from the Hamiltonian by substituting in the values
of the spins si from our array of integers. However, this is not an especially
efficient way of doing it, and there is a much better way. As part of our
implementation of the Metropolis algorithm, you will recall we calculated
the energy difference ΔE = Ev − Eu in going from state u to state v (see
Equation (3.10)). So, if we know the energy of the current state u, we can
calculate the new energy when we flip a spin, using only a single addition:

    Ev = Eu + ΔE.    (3.11)

So the clever thing to do is to calculate the energy of the system from the
Hamiltonian at the very start of the simulation, and then every time we flip
a spin calculate the new energy from Equation (3.11) using the value of ΔE,
which we have to calculate anyway.
Calculating the magnetization is even easier. The total magnetization
Mu of the whole system in state u (as opposed to the magnetization per
spin—we'll calculate that in a moment), is given by the sum

    Mu = Σ_i s_i^u.    (3.12)

When we flip a single spin k, the magnetization changes by

    Mv − Mu = s_k^v − s_k^u = 2 s_k^v,    (3.13)

where the last equality follows from Equation (3.9). Thus, the clever way to
evaluate the magnetization is to calculate its value at the beginning of the
simulation, and then make use of the formula

    Mv = Mu + 2 s_k^v.    (3.14)
Given the energy and the magnetization of our Ising system at a selection
of times during the simulation, we can average them to find the estimators of
the internal energy and average magnetization. Then dividing these figures
by the number of sites N gives us the internal energy and average magneti-
zation per site.
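In code, the one-time direct evaluation and the subsequent incremental updates might look like this (a Python sketch with our own naming conventions, for a square lattice with periodic boundaries):

```python
def total_energy(s, L, J=1.0):
    """Evaluate the Hamiltonian directly -- done once, at the start.
    Each site contributes its bonds to the right and downward
    neighbour, so every bond is counted exactly once."""
    E = 0.0
    for i in range(L):
        for j in range(L):
            E -= J * s[i][j] * (s[(i + 1) % L][j] + s[i][(j + 1) % L])
    return E

# Thereafter, when a flip of spin (i, j) with energy change delta_E
# is accepted, we update running totals instead of re-summing:
#     E += delta_E            # Equation (3.11)
#     M += 2 * s[i][j]        # Equation (3.14); s[i][j] is the NEW value
```

The direct sum is thus performed only once; every subsequent measurement costs a single addition per accepted move.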
We can also average the squares of the energy and magnetization to find
quantities like the specific heat and the magnetic susceptibility:

    c = (β²/N) (⟨E²⟩ − ⟨E⟩²),    (3.15)
    χ = βN (⟨m²⟩ − ⟨m⟩²).    (3.16)

(See Equations (1.36) and (1.37). Note that we have set k = 1 again.)
In order to average quantities like E and M, we need to know how long a
run we have to average them over to get a good estimate of their expectation
values. One simple solution would again be just to look at a graph like
Figure 3.3 and guess how long we need to wait. However (as you might
imagine) this is not a very satisfactory solution. What we really need is
a measure of the correlation time τ of the simulation. The correlation
time is a measure of how long it takes the system to get from one state to
another one which is significantly different from the first, i.e., a state in which
the number of spins which are the same as in the initial state is no more
than what you would expect to find just by chance. (We will give a more
rigorous definition in a moment.) There are a number of ways to estimate
the correlation time. One that is sometimes used is just to assume that it
is equal to the equilibration time. This is usually a fairly safe assumption:3
usually the equilibration time is considerably longer than the correlation
time, τ_eq > τ, because two states close to equilibrium are qualitatively more
similar to one another than are a state far from equilibrium (like the T = 0
or T = ∞ states we suggested for starting this simulation with) and one
close to equilibrium.
However, this is again a rather unrigorous supposition, and there are more
watertight ways to estimate r. The most direct of these is to calculate the
"time-displaced autocorrelation function" of some property of the model.
3. In particular, it works fine for the Metropolis simulation of the Ising model which we
are considering here.
(In the next section we show why it should take this form.) With this
definition, we see that in fact there is still a significant correlation between
two samples taken a correlation time apart: at time t = τ the autocorrelation
function, which is a measure of the similarity of the two states, is only
a factor of 1/e down from its maximum value at t = 0. If we want truly
independent samples then, we may want to draw them at intervals of greater
than one correlation time. In fact, the most natural definition of statistical
independence turns out to be samples drawn at intervals of 2τ. We discuss
this point further in Section 3.4.1.
for a certain amount of time, and we want to be sure of having at least one
measurement every two correlation times. Another reason is that we want
to be able to calculate the autocorrelation function for times less than a
correlation time, so that we can use it to make an accurate estimate of the
correlation time. If we only had one measurement every 2τ, we wouldn't be
able to calculate τ with any accuracy at all.
If we want a more reliable figure for τ, we can replot our autocorrelation
function on semi-logarithmic axes as we have done in Figure 3.6, so that
the slope of the line gives us the correlation time. Then we can estimate τ
by fitting the straight-line portion of the plot using a least-squares method.
The dotted line in the figure is just such a fit and its slope gives us a figure
of τ = 95 ± 5 for the correlation time in this case.
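The least-squares fit on semi-logarithmic axes is easily coded. Here is a Python sketch (the function name is our own, assuming a pure exponential decay χ(t) ~ e^(−t/τ), so that log χ(t) against t is a straight line of slope −1/τ):

```python
import math

def correlation_time(chi, t_fit):
    """Estimate tau by a least-squares straight-line fit to
    log chi(t) over 0 <= t < t_fit. The slope is -1/tau."""
    ts = [t for t in range(t_fit) if chi[t] > 0]
    ys = [math.log(chi[t]) for t in ts]
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    slope = (sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
             / sum((t - tbar) ** 2 for t in ts))
    return -1.0 / slope
```

In practice one restricts t_fit to the straight-line portion of the plot, before the noisy tail sets in.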
An alternative is to calculate the integrated correlation time.4 If we
assume that Equation (3.18) is accurate for all times t then

    τ = ∫_0^∞ χ(t)/χ(0) dt.    (3.20)
This form has a number of advantages. First, it is often easier to apply Equa-
tion (3.20) than it is to perform the exponential fit to the autocorrelation
4. This is a rather poor name for this quantity, since it is not the correlation time that
is integrated but the autocorrelation function. However, it is the name in common use so
we use it here too.
Notice how we have evaluated the mean magnetization m in the second term
using the same subsets of the data that we used in the first term. This is
not strictly speaking necessary, but it makes χ(t) a little better behaved. In
Figure 3.5 we have also normalized χ(t) by dividing throughout by χ(0), but
this is optional. We've just done it for neatness.
Note that one should be careful about using Equation (3.21) to evaluate
χ(t) at long times. When t gets close to t_max, the upper limit of the sums
becomes small and we end up integrating over a rather small time interval
to get our answer. This means that the statistical errors in χ(t) due to
the random nature of the fluctuations in m(t) may become large. A really
satisfactory simulation would always run for many correlation times, in which
case we will probably not be interested in the very tails of χ(t), since the
correlations will have died away by then, by definition. However, it is not
5. In fact, this formula differs from (3.17) by a multiplicative constant, but this makes
no difference as far as the calculation of the correlation time is concerned.
been written very carefully by people who understand the exact workings of
the computer and are designed to be as fast as possible at performing this
particular calculation, so it usually saves us much time and effort to make
use of the programs in these packages. Having calculated χ(ω), we can then
invert the Fourier transform, again using a streamlined inverse FFT routine
which also runs in time proportional to n log n, and so recover the function
χ(t).
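Using a standard FFT library, the whole procedure takes only a few lines. The following Python sketch uses NumPy's FFT routines; the zero-padding to length 2n is an implementation detail (our own choice) needed to prevent the circular transform from wrapping the data around on itself:

```python
import numpy as np

def autocorrelation_fft(m):
    """Compute the time-displaced autocorrelation of the series m(t)
    in O(n log n) time: subtract the mean, Fourier transform, take
    the squared magnitude, and transform back."""
    m = np.asarray(m, dtype=float)
    n = len(m)
    dm = m - m.mean()
    # Zero-pad to 2n so the circular correlation equals the linear one.
    f = np.fft.rfft(dm, 2 * n)
    chi = np.fft.irfft(f * np.conj(f))[:n]
    return chi
```

The returned array is the unnormalized autocorrelation; dividing through by chi[0] gives the normalized form plotted in Figure 3.5.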
We might imagine that, if we wanted to calculate the integrated correla-
tion time, Equation (3.20), we could avoid inverting the Fourier transform,
since
and thus
where w is the vector whose elements are the probabilities wu (see Equa-
tion (2.9)). By iterating this equation from time t = 0 we can then show
that
where the quantities ai are coefficients whose values depend on the configu-
ration of the system at t = 0. Then
or
where q is the vector whose elements are the values Qu of the quantity in
the various states of the system. Substituting Equation (3.29) into (3.31) we
then get
8. P is in general not symmetric, so its right and left eigenvectors are not the same.
The quantities τ_i are the correlation times for the system, as we can demonstrate
by showing that they also govern the decay of the autocorrelation
function. Noting that the long-time limit Q(∞) of the expectation of Q is
none other than the equilibrium expectation ⟨Q⟩, we can write the autocorrelation
of Q as the correlation between the expectations at zero time and
some later time t thus:
with
9. Monte Carlo simulations are in many ways rather similar to experiments. It often
helps to regard them as "computer experiments", and analyse the results in the same way
as we would analyse the results of a laboratory experiment.
3.4 Calculation of errors 69
and our best estimate of the standard deviation on the mean is given by

    σ = sqrt[ (1 + 2τ/Δt) (⟨m²⟩ − ⟨m⟩²) / (n − 1) ],    (3.38)

where τ is the correlation time and Δt is the time interval at which the
samples were taken. Clearly this becomes equal to Equation (3.37) when
Δt ≫ τ, but more often we have Δt ≪ τ. In this case, we can ignore the 1
in the numerator of Equation (3.38). Noting that for a run of length t_max
(after equilibration) the interval Δt is related to the total number of samples
by n = t_max/Δt, we then find

    σ = sqrt[ (2τ/t_max) (⟨m²⟩ − ⟨m⟩²) ],    (3.40)

which is the same result as we would get by simply using Equation (3.19)
for n in Equation (3.37). This in fact was the basis for our assertion in
Section 3.3.1 that the appropriate sampling interval for getting independent
samples was twice the correlation time. Note that the value of σ in Equa-
tion (3.40) is independent of the value of Δt, which means we are free to
choose Δt in whatever way is most convenient.
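These formulas combine into a small error routine. The following Python sketch (our own naming; the guard against fewer than one independent sample is also our own addition) uses the effective number of independent samples n = t_max/2τ from Equation (3.19) in place of the raw sample count:

```python
import math

def error_on_mean(samples, tau, dt):
    """Standard error on the mean of correlated samples taken at
    interval dt, for a run of length t_max = n * dt, using
    n_indep = t_max / (2 * tau) independent measurements."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    t_max = n * dt
    n_indep = max(t_max / (2.0 * tau), 1.0)
    return math.sqrt(var / n_indep)
```

As the text notes, the result is independent of the sampling interval dt, since t_max = n dt appears only through the ratio 2τ/t_max.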
section. This happens when the result we want is not merely the average of
some measurement repeated many times over the course of the simulation, as
the magnetization is, but is instead derived in some more complex way from
measurements we make during the run. An example is the specific heat c,
Equation (3.15), which is inherently an average macroscopic quantity. Unlike
the magnetization, the specific heat is not defined at a single time step in the
simulation. It is only defined in terms of averages of many measurements
of E and E2 over a longer period of time. We might imagine that we could
calculate the error on (E) and the error on (E2) using the techniques we
employed for the magnetization, and then combine them in some fashion
to give an estimate of the error in c. But this is not as straightforward as
it seems at first, since the errors in these two quantities are correlated—
when (E) goes up, so does (E2). It is possible to do the analysis necessary
to calculate the error on c in this fashion. However, it is not particularly
simple, and there are other more general methods of error estimation which
lend themselves to this problem. As the quantities we want to measure
become more complex, these methods—"blocking", the "bootstrap" and the
"jackknife"—will save us a great deal of effort in estimating errors. We
illustrate these methods here for the case of the specific heat, though it
should be clear that they are applicable to almost any quantity that can be
measured in a Monte Carlo simulation.
The simplest of our general-purpose error estimation methods is the
blocking method. Applied to the specific heat, the idea is that we take
the measurements of E that we made during the simulation and divide them
into several groups, or blocks. We then calculate c separately for each block,
and the spread of values from one block to another gives us an estimate of
the error. To see how this works, suppose we make 200 measurements of the
energy during our Ising model simulation, and then split those into 10 groups
of 20 measurements. We can evaluate the specific heat from Equation (3.15)
for each group and then find the mean of those 10 results exactly as we
did for the magnetization above. The error on the mean is given again by
Equation (3.37), except that n is now replaced by the number nb of blocks,
which would be 10 in our example. This method is intuitive, and will give
a reasonable estimate of the order of magnitude of the error in a quantity
such as c. However, the estimates it gives vary depending on the number
of different blocks you divide your data up into, with the smallest being as-
sociated with large numbers of blocks, and the largest with small numbers
of blocks, so it is clearly not a very rigorous method. A related but more
reliable method, which can be used for error estimation in a wide variety of
different circumstances, is the bootstrap method, which we now describe.
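The blocking procedure described above can be sketched in a few lines of Python (our own function; the estimator is passed in as a function, so that the same code works for the specific heat or any other derived quantity):

```python
def blocking_error(data, n_blocks, estimator):
    """Blocking estimate of the error on a derived quantity: split
    the measurements into n_blocks blocks, apply the estimator to
    each block, and use the spread of the block values (Equation
    (3.37) with n replaced by the number of blocks)."""
    size = len(data) // n_blocks
    vals = [estimator(data[b * size:(b + 1) * size])
            for b in range(n_blocks)]
    mean = sum(vals) / n_blocks
    var = sum((v - mean) ** 2 for v in vals) / n_blocks
    return mean, (var / (n_blocks - 1)) ** 0.5
```

For the specific heat one would pass in an estimator computing Equation (3.15) from the block's energy measurements.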
Notice that there is no extra factor of 1/(n − 1) here as there was in Equation
(3.37). (It is clear that the latter would not give a correct result, since
it would imply that our estimate of the error could be reduced by simply
resampling our data more times.)
As we mentioned, it is not necessary for the working of the bootstrap
method that all the measurements made be independent in the sense of Sec-
tion 3.3.1 (i.e., one every two correlation times or more). As we pointed
out earlier, it is more common in a Monte Carlo simulation to make mea-
surements at comfortably short intervals throughout the simulation so as to
be sure of making at least one every correlation time or so and then calcu-
late the number of independent measurements made using Equation (3.19).
Thus the number of samples taken usually exceeds the number which ac-
tually constitute independent measurements. One of the nice things about
the bootstrap method is that it is not necessary to compensate for this dif-
ference in applying the method. You get fundamentally the same estimate
of the error if you simply resample your n measurements from the entire
set of measurements that were made. In this case, still about 63% of the
samples will be duplicates of one another, but many others will effectively
be duplicates as well because they will be measurements taken at times less
than a correlation time apart. Nonetheless, the resulting estimate of σ is the same.

72 Chapter 3: The Ising model and the Metropolis algorithm
The bootstrap method is a good general method for estimating errors
in quantities measured by Monte Carlo simulation. Although the method
initially met with some opposition from mathematicians who were not con-
vinced of its statistical validity, it is now widely accepted as giving good
estimates of errors (Efron 1979).
In the jackknife method we instead make n separate estimates ci of the specific heat, the ith being computed from all the data except the ith measurement, and take as our error estimate the square root of the sum of the squared deviations of the ci from c, where c is our estimate of the specific heat using all the data.11
Both the jackknife and the bootstrap give good estimates of errors for
large data sets, and as the size of the data set becomes infinite they give
exact estimates. Which one we choose in a particular case usually depends
on how much work is involved applying them. In order to get a decent error
estimate from the bootstrap method we usually need to take at least 100
resampled sets of data, and 1000 would not be excessive. (100 would give
the error to a bit better than 10% accuracy.) With the jackknife we have to
recalculate the quantity we are interested in exactly n times to get the error
estimate. So, if n is much larger than 100 or so, the bootstrap is probably
the more efficient. Otherwise, we should use the jackknife.12
11
In fact, it is possible to use the jackknife method with samples taken at intervals Δt less than 2τ. In this case we just reduce the sum inside the square root by a factor of Δt/2τ to get an estimate of σ which is independent of the sampling interval.
12
For a more detailed discussion of these two methods, we refer the interested reader to
the review article by Efron (1979).
3.5 Measuring the entropy 73
so that if we can calculate one, we can easily find the other using the known
value of the total internal energy U. Normally, we calculate the entropy,
which we do by integrating the specific heat over temperature as follows.
We can calculate the specific heat of our system from the fluctuations in
the internal energy as described in Section 3.3. Moreover, we know that the
specific heat C is equal to
calculating this correlation function for the Ising model. The most straight-
forward way is to evaluate it directly from the definition, Equation (1.26),
using the values of the spins si from our simulation:
independent of the value of ri. This means that we can improve our estimate
of the correlation function G(2) (r) by averaging its value over the whole
lattice for all pairs of spins separated by a displacement r:
If, as with our Ising model simulation, the system we are simulating has
periodic boundary conditions (see Section 3.1), then G(2)(r) will not die
away for very large values of r. Instead, it will be periodic, dying away for
values of r up to half the width of the lattice, and then building up again to
another maximum when we have gone all the way across the lattice and got
back to the spin we started with.
In order to evaluate G(2) (r) using Equation (3.50) we have to record the
value of every single spin on the lattice at intervals during the simulation.
This is not usually a big problem given the generous amounts of storage space
provided by modern computers. However, if we want to calculate G(2) (r) for
every value of r on the lattice, this kind of direct calculation does take an
amount of time which scales with the number N of spins on the lattice as N^2. As with the calculation of the autocorrelation function in Section 3.3.1,
it actually turns out to be quicker to calculate the Fourier transform of the
correlation function instead.
The spatial Fourier transform G(2) (k) is defined by
Note that s_i and s'_i differ only by the average magnetization m, which is a constant. As a result, the Fourier transforms of the two correlation functions, one defined with the raw spins s_i and one with the fluctuations s'_i, are in fact identical except at k = 0. For this reason, it is often simpler to calculate the connected function G(2)(k) by first calculating the transform of the raw correlation function and then just setting the k = 0 component to zero.
describe in detail how these and other results are arrived at, and discuss
what conclusions we can draw from them.
The first step in performing any Monte Carlo calculation, once we have
decided on the algorithm we are going to use, is to write the computer
program to implement that algorithm. The code used for our Metropolis
simulation of the Ising model is given in Appendix B. It is written in the
computer language C.
As a test of whether the program is correct, we have first used it to
simulate a small 5 x 5 Ising model for a variety of temperatures between
T = 0 and T = 5.0 with J set equal to 1. For such a small system our
program runs very fast, and the entire simulation only took about a second
at each temperature. In Section 1.3 we performed an exact calculation of
the magnetization and specific heat for the 5 x 5 Ising system by directly
evaluating the partition function from a sum over all the states of the system.
This gives us something to compare our Monte Carlo results with, so that
we can tell if our program is doing the right thing. At this stage, we are
not interested in doing a very rigorous calculation, only in performing a
quick check of the program, so we have not made much effort to ensure the
equilibration of the system or to measure the correlation time. Instead, we
simply ran our program for 20 000 Monte Carlo steps per site (i.e., 20 000 x
25 = 500 000 steps in all), and averaged over the last 18 000 of these to
measure the magnetization and the energy. Then we calculated m from
Equation (3.12) and c from Equation (3.15). If the results do not agree
with our exact calculation then it could mean either that there is a problem
with the program, or that we have not waited long enough in either the
equilibration or the measurement sections of the simulation. However, as
shown in Figure 3.7, the numerical results agree rather well with the exact
ones. Even though we have not calculated the statistical errors on our data in
order to determine the degree of agreement, these results still give us enough
confidence in our program to proceed with a more thorough calculation on
a larger system.
For our large-scale simulation, we have chosen to examine a system of
100 x 100 spins on a square lattice. We started the program with randomly
chosen values of all the spins—the T = ∞ state of Section 3.1.1—and ran
the simulations at a variety of temperatures from T = 0.2 to T = 5.0 in steps
of 0.2, for a total of 25 simulations in all.15 Again we ran our simulations
15
Note that it is not possible to perform a simulation at T = 0 because the acceptance
ratio, Equation (3.7), for spin flips which increase the energy of the system becomes zero
in this limit. This means that it is not possible to guarantee that the system will come
to equilibrium, because the requirement of ergodicity is violated; there are some states
which it is not possible to get to in a finite number of moves. It is true in general of thermal Monte Carlo methods that they break down at T = 0, and often they become very slow close to T = 0. The continuous time Monte Carlo method of Section 2.4 can sometimes be used to overcome this problem in cases where we are particularly interested
for 20 000 Monte Carlo steps per lattice site. This is a fairly generous first
run, and is only possible because we are looking at quite a small system still.
In the case of larger or more complex models, one might well first perform
a shorter run to get a rough measure of the equilibration and correlation
times for the system, before deciding how long a simulation to perform. A
still more sophisticated approach is to perform a short run and then store
the configuration of the spins on the lattice before the program ends. Then,
after deciding how long the entire calculation should last on the basis of
the measurements during that short run, we can pick up exactly where we
left off using the stored configuration, thus saving ourselves the effort of
equilibrating the system twice.
Taking the data from our 25 simulations at different temperatures, we
first estimate the equilibration times τeq at each temperature using the methods described in Section 3.2. In this case we found that all the equilibration
times were less than about 1000 Monte Carlo steps per site, except for the
simulations performed at T = 2.0 and T = 2.2, which both had equilibration
times on the order of 6000 steps per site. (The reason for this anomaly is
explained in the next section.) Allowing ourselves a margin for error in these
estimates, we therefore took the data from time 2000 onwards as our equilibrium measurements for all the temperatures except the two slower ones, for which we took the data for times 10 000 onwards.

FIGURE 3.8 The correlation time for the 100 x 100 Ising model simulated using the Metropolis algorithm. The correlation time is measured in Monte Carlo steps per lattice site (i.e., in multiples of 10 000 Monte Carlo steps in this case). The straight lines joining the points are just to guide the eye.
Next we need to estimate how many independent measurements these
data constitute, which means estimating the correlation time. To do this,
we calculate the magnetization autocorrelation function at each temperature
from Equation (3.21), for times t up to 1000. (We must be careful only to use
our equilibrium data for this calculation since the autocorrelation function is
an equilibrium quantity. That is, we should not use the data from the early
part of the simulation during which the system was coming to equilibrium.)
Performing a fit to these functions as in Figure 3.6, we make an estimate of the correlation time τ at each temperature. The results are shown in Figure 3.8. Note the peak in the correlation time around T = 2.2. This effect is called "critical slowing down", and we will discuss it in more detail in Section 3.7.2. Given the length of the simulation tmax and our estimates of τeq and τ for each temperature, we can calculate the number n of independent measurements to which our simulations correspond using Equation (3.19).
Using these figures we can now calculate the equilibrium properties of
the 100 x 100 Ising model in two dimensions. As an example, we have cal-
culated the magnetization and the specific heat again. Our estimate of the
magnetization is calculated by averaging over the magnetization measure-
ments from the simulation, again excluding the data from the early portion
of the run where the system was not equilibrated. The results are shown
in Figure 3.9, along with the known exact solution for the infinite system.
Calculating the errors on the magnetization from Equation (3.37), we find
that the errors are so small that the error bars would be completely covered
by the points themselves, so we have not bothered to put them in the fig-
ure. The agreement between the numerical calculation and the exact one
for the infinite lattice is much better than it was for the smaller system in
Figure 1.1, although there is still some discrepancy between the two. This
discrepancy arises because the quantity plotted in the figure is in fact the
average (|m|) of the magnitude of the magnetization, and not the average
magnetization itself; we discuss our reasons for doing this in Section 3.7.1
when we examine the spontaneous magnetization of the Ising model.
The figure clearly shows the benefits of the Monte Carlo method. The
calculation on the 5 x 5 system which we performed in Section 1.3 was exact, whereas the Monte Carlo calculation on the 100 x 100 system is not.
However, the Monte Carlo calculation still gives a better estimate of the
magnetization of the infinite system. The errors due to statistical fluctua-
tions in our measurements of the magnetization are much smaller than the
inaccuracy of working with a tiny 5 x 5 system.
Using the energy measurements from our simulation, we have also calcu-
lated the specific heat for our Ising model from Equation (3.15). To calculate
the errors on the resulting numbers we could use the blocking method of Section 3.4.2 to get a rough estimate.

3.7 An actual calculation 81

FIGURE 3.10 The specific heat per spin of the two-dimensional Ising model calculated by Monte Carlo simulation (points with error bars) and the exact solution for the same quantity (solid line). Note how the error bars get bigger close to the peak in the specific heat. This phenomenon is discussed in detail in the next section.

Here we are interested in doing a more
accurate calculation, and for that we should use either the bootstrap or the
jackknife method (see Sections 3.4.3 and 3.4.4). The number of independent
samples n is for most temperatures considerably greater than 100, so by the
criterion given in Section 3.4.4, the bootstrap method is the more efficient
one to use. In Figure 3.10 we show our results for the specific heat with
error bars calculated from 200 bootstrap resamplings of the data (giving er-
rors accurate to about 5%). The agreement here with the known exact result
for the specific heat is excellent—better than for the magnetization—though
the errors are larger, especially in the region close to the peak. If we were particularly interested to know the value of c in this region it would make sense to go back and do a longer run of our program there to get more accurate data. For example, if we wanted to calculate the entropy difference from one side of the peak to the other using Equation (3.45), then large error bars near the peak would make the integral inaccurate, and we might well benefit from expending a little more effort there to get more accurate results.
16
This also provides an explanation of why the agreement between the analytic solution
and the Monte Carlo calculation was better for the specific heat, Figure 3.10, than it
was for the magnetization. The process of taking the mean of the magnitude |m| of
the magnetization means that we consistently overestimate the magnetization above the
critical temperature, and in fact this problem extends to temperatures a little below Tc
as well (see Figure 3.9). No such adjustments are necessary when calculating the specific
heat, and as a result our simulation agrees much better with the known values for c, even
though the error bars are larger in this case.
17
A number of different mathematical definitions of a cluster are possible. Some of
them require that all spins in the cluster point in the same direction whilst others are
less strict. We discuss these definitions in detail in the next chapter, particularly in
Sections 4.2 and 4.4.2.
comes in. As we saw in Figure 3.8, the correlation time τ of the simulation is also large in the region around Tc. In fact, like the susceptibility and the specific heat, the correlation time actually diverges at Tc in the thermodynamic limit. For the finite-sized systems of our Monte Carlo simulations τ does not diverge, but it can still become very large in the critical region,
and a large correlation time means that the number of independent mea-
surements we can extract from a simulation of a certain length is small (see
Equation (3.19)). This effect on its own would increase the size of the errors
on measurements from our simulation, even without the large critical fluctu-
ations. The combination of both effects is particularly unfortunate, because
it means that in order to increase the number of independent measurements
we make during our simulation, we have to perform a much longer run of
the program; the computer time necessary to reduce the error bars to a size
comparable with those away from Tc increases very rapidly as we approach
the phase transition.
The critical fluctuations which increase the size of our error bars are
an innate physical feature of the Ising model. Any Monte Carlo algorithm
which correctly samples the Boltzmann distribution will also give critical
fluctuations. There is nothing we can do to change our algorithm which will
reduce this source of error. However, the same is not true of the increase in
correlation time. This effect, known as critical slowing down, is a property
of the Monte Carlo algorithm we have used to perform the simulation, but
not of the Ising model in general. Different algorithms can have different
values of the correlation time at any given temperature, and the degree to
which the correlation time grows as we approach Tc, if it grows at all, depends
on the precise details of the algorithm. Therefore, if we are particularly
interested in the behaviour of a model in the critical region, it may be possible
to construct an algorithm which suffers less from critical slowing down than
does the Metropolis algorithm, or even eliminates it completely, allowing us
to achieve much greater accuracy for our measurements. In the next chapter
we look at a number of other algorithms which do just this and which allow
us to study the critical region of the Ising model more accurately.
In the last chapter we saw that the Metropolis algorithm with single-spin-
flip dynamics is an excellent Monte Carlo algorithm for simulating the Ising
model when we are interested in temperatures well away from the critical
temperature Tc. However, as we approach the critical temperature, the
combination of large critical fluctuations and long correlation time makes
the errors on measured quantities grow enormously. As we pointed out,
there is little to be done about the critical fluctuations, since these are an
intrinsic property of the model near its phase transition (and are, what's
more, precisely the kind of interesting physical effect that we want to study
with our simulations). On the other hand, the increase in the correlation
time close to the phase transition is a function of the particular algorithm
we are using—the Metropolis algorithm in this case—and it turns out that
by using different algorithms we can greatly reduce this undesirable effect.
In the first part of this chapter we will study one of the most widely used
and successful such algorithms, the Wolff algorithm. Before introducing the
algorithm however, we need to define a few terms.
where τ is measured in Monte Carlo steps per lattice site. The exponent z
is often called the dynamic exponent. (It should not be confused with the
lattice coordination number introduced in the last chapter, which was also
1
We could have different values for ν above and below the transition temperature, but as it turns out we don't. Although it is beyond the scope of this book, it is relatively straightforward to show using renormalization group arguments that this and the other critical exponents defined here must take the same values above and below Tc (see Binney et al. 1992). Note however that the constant of proportionality in Equation (4.2) need not be the same above and below the transition.
4.1 Critical exponents and their measurement 89
denoted z.) Its definition differs from that of the other exponents by the inclusion of the ν, which is the same ν as appeared in Equation (4.2); the exponent is really zν, and not just z. We're not quite sure why it is defined this way, but it's the way everyone does it, so we just have to get used to it.
It is convenient in one respect, as we will see in a moment, because of the
way in which z is measured.
The dynamic exponent gives us a way to quantify the critical slowing
down effect. As we mentioned, the amount of critical slowing down we
see in our simulations depends on what algorithm we use. This gives rise to
different values of z for different algorithms—z is not a universal exponent in the way that ν, α and γ are.2 A large value of z means that τ becomes large
very quickly as we approach the phase transition, making our simulation
much slower and hence less accurate. A small value of z implies relatively
little critical slowing down and a faster algorithm near Tc. If z = 0, then
there is no critical slowing down at all, and the algorithm can be used right
up to the critical temperature without τ becoming large.3
The measurement of critical exponents is far from being a trivial exercise.
Quite a lot of effort has in fact been devoted to their measurement, since
their values have intrinsic interest to those concerned with the physics of
phase transitions. At present we are interested in the value of the dynamic
exponent z primarily as a measure of the critical slowing down in our algo-
rithms. In Chapter 8 we will discuss in some detail the techniques which
have been developed for measuring critical exponents, but for the moment we will just run briefly through one simple technique—a type of "finite size scaling"—to give an idea of how we gauge the efficiency of these algorithms.
Combining Equations (4.2) and (4.5), we can write
This equation tells us how the correlation time gets longer as the correlation
length diverges near the critical point. However, in a system of finite size,
which includes all the systems in our Monte Carlo simulations, the corre-
lation length can never really diverge. Recall that the correlation length is
the typical dimension of the clusters of correlated spins in the system. Once
this size reaches the dimension, call it L, of the system being simulated it
can get no bigger. A volume of Ld, where d is the dimensionality of our
system, is the largest cluster of spins we can have. This means that in fact
2
The exponent z is still independent of the shape of the lattice, the spin-spin interaction
J and so forth. Only changes in the dynamics affect its value. (Hence the name "dynamic
exponent".)
3
In fact, it is conventional to denote a logarithmic divergence τ ~ −log |t| of the correlation time by z = 0, so a zero dynamic exponent does not necessarily mean that there is no critical slowing down. However, a logarithmic divergence is far less severe than a power-law one, so z = 0 is still a good thing.
90 Chapter 4: Other algorithms for the Ising model
the divergence of the correlation length, and as a result that of the correlation time, is "cut off" in the region for which ξ > L. This effect is pictured schematically in Figure 4.1. Thus, for all temperatures sufficiently close to
the critical one, and particularly when we are at the critical temperature,
Equation (4.6) becomes
Now suppose we know what the critical temperature Tc of our model is,
perhaps because, as in the 2D Ising model, an exact solution is available.
(More generally, we don't know the critical temperature with any accuracy,
in which case this method won't work, and we have to resort to some of the
more sophisticated ones described in Chapter 8.) In that case, we can use
Equation (4.7) to measure z. We simply perform a sequence of simulations at T = Tc for systems of a variety of different sizes L, and then plot τ against L on logarithmic scales. The slope of the resulting plot should be the
exponent z. We have done exactly this in Figure 4.2 for the 2D Ising model
simulated using the Metropolis algorithm. The line is the best fit through
the points, given the errors, and its slope gives us a figure of z = 2.09 ± 0.06
for the dynamic exponent. This calculation was rather a rough one, done
on one of the authors' home computers. Much more thorough Monte Carlo
calculations of z have been performed using this and a number of other
methods by a variety of people. At the time of the writing of this book the
best figure available for this exponent was z = 2.1665 ± 0.0012, obtained by Nightingale and Blöte (1996).

4.2 The Wolff algorithm 91

FIGURE 4.2 The correlation time τ for the 2D Ising model simulated using the Metropolis algorithm. The measurements were made at the critical temperature Tc = 2.269J for systems of a variety of different sizes L x L, and then plotted on logarithmic scales against L. The slope z = 2.09 ± 0.06 is an estimate of the dynamic exponent.
The value z = 2.17 is a fairly high one amongst Monte Carlo algorithms
for the 2D Ising model, and indicates that the Metropolis algorithm is by no
means the best algorithm for investigating the behaviour of the model near
its phase transition. In the next section we will introduce a new algorithm
which has a much lower dynamic exponent, albeit at the expense of some
increase in complexity.
which for the 2D Ising model is about L^4. This makes measurements on
large systems extremely difficult in the critical region. (The simulations
which went into Figure 4.2, for instance, took more than two weeks of CPU
time, with the largest system size (L = 100) requiring 150 billion Monte Carlo steps to get a reasonably accurate value of τ.)
The fundamental reason for the large value of z in the Metropolis algo-
rithm is the divergence of the correlation length and the critical fluctuations
present near the phase transition. When the correlation length becomes
large close to Tc, large regions form in which all the spins are pointing in the
same direction. These regions are often called domains.4 It is quite difficult
for the algorithm to flip over one of these large domains because it has to do
it spin by spin, and each move has quite a high probability of being rejected
because of the ferromagnetic interactions between neighbouring spins. The
chances of flipping over a spin in the middle of a domain are particularly
low, because it is surrounded by four others pointing in the same direction.
In two dimensions, flipping such a spin costs us 8J in energy and, using the value of Tc ≈ 2.269J given in Equation (3.53), the probability of accepting such a move, when we are close to the critical temperature, is

e^(−8J/kTc) = e^(−8/2.269) ≈ 0.03,

or about three per cent, regardless of the value of J. The chance of flipping
a spin on the edge of a domain is higher because it has a lower energy cost,
and in fact it turns out that this process is the dominant one when it comes
to flipping over entire domains. It can be very slow however, especially with
the big domains that form near the critical temperature.
A solution to these problems was proposed in 1989 by Ulli Wolff, based on
previous work by Swendsen and Wang (1987). The algorithm he suggested
is now known as the Wolff algorithm. The basic idea is to look for clusters
of similarly oriented spins and then flip them in their entirety all in one go,
rather than trying to turn them over spin by painful spin. Algorithms of
this type are referred to as cluster-flipping algorithms, or sometimes just
cluster algorithms, and in recent years they have become popular for all
sorts of problems, since it turns out that, at least in the case of the Ising
model, they almost entirely remove the problem of critical slowing down.
How then do we go about finding these clusters of spins which we are
going to flip? The simplest strategy which suggests itself, and the one which
the Wolff algorithm uses, is just to pick a spin at random from the lattice
and then look at its neighbours to see if any of them are pointing in the same
4
Note that these domains are not quite the same thing as the clusters discussed in
Section 3.7.1. By convention, the word "domain" refers to a group of adjacent spins
which are pointing in the same direction, whereas the word "cluster" refers to a group of
spins whose values are correlated with one another. We will shortly give a more precise
definition of a cluster.
the flipping of a single cluster of similarly oriented spins. The crucial thing
to notice is the way the spins are oriented around the edge of the cluster
(which is indicated by the line in the figure). Notice that in each of the two
states, some of the spins just outside the cluster are pointing the same way
as the spins in the cluster. The bonds between these spins and the ones in
the cluster have to be "broken" when the cluster is flipped. Inevitably, the
bonds which are not broken in going from u to v must be broken if we flip
back again from v to u.
Consider now a move which will take us from u to v. There are in fact
many such moves—we could choose any of the spins in the cluster as our
seed spin, and then we could add the rest of the spins to it in a variety of
orders. For the moment, however, let us just consider one particular move,
starting with a particular seed spin and then adding the others to it in a
particular order. Consider also the reverse move, which takes us back to u
from v, starting with exactly the same seed spin, and adding the others to it
in exactly the same way as in the forward move. The probability of choosing
the seed is exactly the same in the two directions, as is the probability of
adding each spin to the cluster. The only thing that changes between the
two is the probability of "breaking" bonds around the edge of the cluster,
because the bonds which have to be broken are different in the two cases.
Suppose that, for the forward move, there are m bonds which have to be
broken in order to flip the cluster. These broken bonds represent pairs of
similarly oriented spins which were not added to the cluster by the algorithm.
The probability of not adding such a spin is 1 − Padd. Thus the probability of not adding all of them, which is proportional to the selection probability g(u → v) for the forward move, is (1 − Padd)^m. If there are n bonds which need to be broken in the reverse move then the probability of doing it will be (1 − Padd)^n. The condition of detailed balance, Equation (2.14), along with Equation (2.17), then tells us that

g(u → v)A(u → v) / [g(v → u)A(v → u)] = (1 − Padd)^(m−n) [A(u → v)/A(v → u)] = e^(−β(Ev − Eu)),
where A(u → v) and A(v → u) are the acceptance ratios for the moves in the two directions. The change in energy Ev − Eu between the two states also depends on the bonds which are broken. For each of the m bonds which are broken in going from u to v, the energy changes by +2J. For each of the n bonds which are made, the energy changes by −2J. Thus

Ev − Eu = 2J(m − n),

and the acceptance ratios must therefore satisfy A(u → v)/A(v → u) = [e^(2βJ)(1 − Padd)]^(n−m).
as the temperature gets lower. The flipping of these clusters is a much less laborious task than in the case of the Metropolis algorithm (as we shall see, it takes a time proportional to the size of the cluster to grow it and then turn it over), so we have every hope that the algorithm will have a lower dynamic exponent and less critical slowing down. In Section 4.3.1 we show that this is indeed the case. First we look in a little more detail at the way the Wolff algorithm works.
FIGURE 4.5 The average cluster size in the Wolff algorithm as a fraction of the size of the lattice, measured as a function of temperature. The error bars on the measurements are not shown, because they are smaller than the points. The lines are just a guide to the eye.
is not difficult to make out which cluster flipped at each step. Clearly the
algorithm is doing its job, flipping large areas of spins when we are in the
critical region. In the T > Tc case, it is much harder to make out the
changes between one frame and the next. The reason for this is that in this
temperature region Padd is quite small (see Equation (4.13)) and this in turn
makes the clusters small, so it is hard to see when they flip over. Of course,
this is exactly what the algorithm is supposed to do, since the correlation
length here is small, and we don't expect to get large regions of spins flipping
over together. In Figure 4.5, we have plotted the mean size of the clusters
flipped by the Wolff algorithm over a range of temperatures, and, as we can
see, they do indeed become small at large temperatures.
When the temperature gets sufficiently high (around T = 10J), the mean
size of the clusters becomes hardly greater than one. In other words, the
single seed spin for each cluster is being flipped over with probability one at
each step, but none of its neighbours are. This is exactly what the Metropo-
lis algorithm does in this temperature regime also. When T is large, the
Metropolis acceptance ratio, Equation (3.7), is 1, or very close to it, for any
transition between two states u and v. Thus in the limit of high tempera-
tures, the Wolff algorithm and the Metropolis algorithm become the same
thing. But notice that the Wolff algorithm will actually be the slower of the
two in this case, because for each seed spin it has to go through the business
of testing each of the neighbours for possible inclusion in the cluster, whereas
98 Chapter 4: Other algorithms for the Ising model
the Metropolis algorithm only has to decide whether to flip a single spin or
not, a comparatively simple computational task. (To convince yourself of
this, compare the programs for the two algorithms given in Appendix B.
The Wolff algorithm is longer and more complex and will take more com-
puter time per step than the Metropolis algorithm, even for a cluster with
only one spin.) Thus, even if the Wolff algorithm is a good thing near the
phase transition (and we will show that it is), there comes a point as the
temperature increases where the Metropolis algorithm becomes better.
Now let us turn to the simulation at low temperature, the bottom row
in Figure 4.4. The action of the algorithm is dramatically obvious in this
case—almost every spin on the lattice is being flipped at each step. The
reason for this is clear. When we are well below the critical temperature the
Ising model develops a finite magnetization in one direction or the other, and
the majority of the spins on the lattice line up in this direction, forming a
"backbone" of similarly oriented spins which spans the entire lattice. When
we choose our seed spin for the Wolff algorithm, it is likely that we will land
on one of the spins comprising this backbone. Furthermore, the probability
Padd, Equation (4.13), is large when T is small, so the neighbours of the seed
spin are not only very likely to be aligned with it, but are also very likely to be
added to the growing cluster. The result is that the cluster grows (usually)
to fill almost the entire backbone of spontaneously magnetized spins, and
then they are all flipped over in one step. Such a lattice-filling cluster is said
to be a percolating cluster.
On the face of it, this seems like a very inefficient way to generate states
for the Ising model. After all, we know that what should really be happening
in the Ising model at low temperature is that most of the spins should be lined
up with one another, except for a few "excitation" spins, which are pointing
the other way (see the Figure 4.4). Every so often, one of these excitation
spins flips back over to join the majority pointing the other way, or perhaps
one of the backbone spins gets flipped by a particularly enthusiastic thermal
excitation and becomes a new excitation spin. This of course is exactly
what the Metropolis algorithm does in this regime. The Wolff algorithm
on the other hand is removing the single-spin excitations by the seemingly
extravagant measure of flipping all the other spins on the entire lattice to
point the same way as the single lonely excitation. It's like working out
which string of your guitar is out of tune and then tuning the other five to
that one. In fact, however, it's actually not such a stupid thing to do. First,
let us point out that, since the Wolff algorithm never flips spins which are
pointing in the opposite direction to the seed spin, all the excitation spins
on the entire lattice end up pointing the same way as the backbone when
the algorithm flips over a percolating cluster. Therefore, the Wolff algorithm
gets rid of all the excitation spins on the lattice in a single step. Second,
since we know that the Wolff algorithm generates states with the correct
4.3 Properties of the Wolff algorithm 99
by a slim margin) at both high and low temperatures, that leaves only the
intermediate regime close to Tc in which the Wolff algorithm might be worth-
while. This of course is the regime in which we designed the algorithm to
work well, so we have every hope that it will beat the Metropolis algorithm
there, and indeed it does, very handily, as we can show by comparing the
correlation times of the two algorithms.
5 This is fairly easy to see: for each spin we have to look at its neighbours to see if they
should be included in the cluster, and then when the cluster is complete we have to flip
all the spins. Since the same operations are performed for each spin in the cluster, the
time taken to do one Monte Carlo step should scale with cluster size.
FIGURE 4.6 The correlation time τ for the 2D Ising model simulated
using the Wolff algorithm. The measurements deviate from a straight
line for small system sizes L, but a fit to the larger sizes, indicated by
the dashed line, gives a reasonable figure of z = 0.25 ± 0.02 for the
dynamic exponent of the algorithm.
Using this definition, we can compare the performance of the two algorithms
in the region close to the critical temperature, and we find that the Wolff
algorithm does indeed dramatically outperform the Metropolis algorithm.
For instance, in a 100 x 100 two-dimensional system right at the critical
temperature, we measure the correlation time of the Wolff algorithm to be
τ = 2.80 ± 0.03 spin-flips per site. The Metropolis algorithm by contrast has
τ = 2570 ± 330. A factor of a thousand certainly outweighs any difference
in the relative complexity of the two algorithms. It is this impressive per-
formance on the part of the Wolff algorithm which makes it a worthwhile
algorithm to use if we want to study the behaviour of the model close to Tc.
In Figure 4.6 we have plotted on logarithmic scales the correlation time
of the Wolff algorithm for the two-dimensional Ising model at the critical
temperature, over a range of different system sizes, just as we did for the
Metropolis algorithm in Figure 4.2. Again, the slope of the line gives us an
estimate of the dynamic exponent. Our best fit, given the errors on the data
points, is z = 0.25 ± 0.02. Again, this was something of a rough calculation,
although our result is competitive with other more thorough ones. The
best available figure at the time of writing was that of Coddington and
Baillie (1992) who measured z = 0.25 ± 0.01. This figure is clearly much lower
than the z = 2.17 of the Metropolis algorithm, and gives us a quantitative
measure of how much better the Wolff algorithm really is.
We can measure this exponent in exactly the same way as we did before,
and it turns out that it is related to the real dynamic exponent z for the
algorithm by

    z = z_steps + γ/ν − d,                                          (4.16)

where γ and ν are the critical exponents governing the divergences of the
magnetic susceptibility and the correlation length (see Equations (4.2) and
(4.3)) and d is the dimensionality of the model (which is 2 in the cases we
have been looking at). If we know the values of ν and γ, as we do in the
case of the 2D Ising model, then we can use Equation (4.16) to calculate
z without ever having to measure the mean cluster size in the algorithm,
which eliminates one source of error in the measurement, thereby making
the value of z more accurate.6
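The arithmetic of this relation is easily checked. The following snippet encodes our reading of Equation (4.16), z = z_steps + γ/ν − d; with the exactly known 2D Ising values γ = 7/4 and ν = 1, and the z_steps = 0.50 measured in Figure 4.8, it reproduces the directly measured z = 0.25:

```c
/* Relation between the dynamic exponent z (correlation time measured
   in sweeps) and z_steps (measured in cluster flips):
   z = z_steps + gamma/nu - d.  Function name is ours. */
double z_from_zsteps(double z_steps, double gamma, double nu, int d)
{
    return z_steps + gamma / nu - d;
}
```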
The first step in demonstrating Equation (4.16) is to prove another useful
result, about the magnetic susceptibility χ. It turns out that, for tempera-
tures T > Tc, the susceptibility is related to the mean size ⟨n⟩ of the clusters
flipped by the Wolff algorithm thus:

    χ = β⟨n⟩.                                                       (4.17)
In many simulations of the Ising model using the Wolff algorithm, the sus-
ceptibility is measured using the mean cluster size in this way.
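In code, such a measurement is a one-liner over the recorded cluster sizes. A minimal sketch, with a function name of our own choosing:

```c
/* Improved estimator for the susceptibility at T > Tc: chi = beta*<n>,
   where <n> is the mean size of the clusters flipped by the Wolff
   algorithm.  The sizes array would be recorded during a run. */
double chi_from_clusters(const int *sizes, int nsamples, double beta)
{
    double mean = 0.0;
    for (int i = 0; i < nsamples; i++)
        mean += sizes[i];
    mean /= nsamples;
    return beta * mean;
}
```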
The demonstration of Equation (4.17) goes like this. Instead of carrying
out the Wolff algorithm in the way described in this chapter, imagine instead
doing it a slightly different way. Imagine that at each step we look at the
whole lattice and for every pair of neighbouring spins which are pointing
in the same direction, we make a "link" between them with probability
Padd = 1 − e^(−2βJ). When we are done, we will have divided the whole
lattice into many different clusters of spins, as shown in Figure 4.7, each of
which will be a correct Wolff cluster (since we have used the correct Wolff
probability Padd to make the links). Now we choose a single seed spin from
the lattice at random, and flip the cluster to which it belongs. Then we
throw away all the links we have made and start again. The only difference
6 In cases such as the 3D Ising model, for which we don't know the exact values of the
critical exponents, it may still be better to make use of Equation (4.14) and measure z
directly, as we did in Figure 4.6.
    M = Σ_i n_i S_i.

Here i labels the different clusters, n_i is their size (a positive integer), and
S_i = ±1 depending on whether the ith cluster is pointing up or down, thereby
making either a positive or negative contribution to M. In order to calculate
the mean square magnetization we now take the square of this expression
and average it over a large number of spin configurations:

    ⟨M²⟩ = ⟨Σ_{i≠j} n_i n_j S_i S_j⟩ + ⟨Σ_i n_i² S_i²⟩.
Now, since the two values ±1 which the variables S_i can take are equally
likely on average, the first term in this expression is an average over a large
number of quantities which are randomly either positive or negative and
which will therefore tend to average out to zero. The second term on the
other hand is an average over only positive quantities, since S_i² = +1 for all
Now consider the average ⟨n⟩ of the size of the clusters which get flipped
in the Wolff algorithm. This is not quite the same thing as the average
size of the clusters over the entire lattice, because when we choose our seed
spin for the Wolff algorithm it is chosen at random from the entire lattice,
which means that the probability p_i of it falling in a particular cluster i is
proportional to the size of that cluster:

    p_i = n_i/N.

The average cluster size in the Wolff algorithm is then given by the average
of the probability of a cluster being chosen times the size of that cluster:

    ⟨n⟩ = ⟨Σ_i p_i n_i⟩ = (1/N)⟨Σ_i n_i²⟩.

Now if we employ Equation (1.36), and recall that ⟨m⟩ = 0 for T > Tc, we
get

    χ = βN⟨m²⟩ = (β/N)⟨Σ_i n_i²⟩ = β⟨n⟩.
7 Strictly, the second term scales like the number of spin configurations in the average
and the first scales like its square root, so the second term dominates for a large number
of configurations.
8 In fact, measuring the susceptibility in this way is not only simpler than a direct
measurement, it is also superior, giving, as it turns out, smaller statistical errors. For this
reason the expression (4.24) is sometimes referred to as an improved estimator for the
susceptibility (Sweeny 1983, Wolff 1989).
FIGURE 4.8 The correlation time τ_steps of the 2D Ising model sim-
ulated using the Wolff algorithm, measured in units of Monte Carlo
steps (i.e., cluster flips). The fit gives us a value of z_steps = 0.50 ± 0.01
for the corresponding dynamic exponent.
Now if we use Equations (4.2), (4.3), (4.6) and (4.15), this implies that
slightly above Tc we must have

    ⟨n⟩ ~ ξ^(γ/ν).

Then, as we did in Section 4.1, we note that, close to Tc, the correlation
length is just equal to the linear dimension L of the system, so we replace ξ
everywhere by L, and we get

    ⟨n⟩ ~ L^(γ/ν).
with probability 1/2 whether to flip it or not. We notice the following facts
about this algorithm:
1. The algorithm satisfies the requirement of ergodicity (Section 2.2.2).
To see this, we note that there is always a finite chance that no links
will be made on the lattice at all (in fact this is what usually happens
when T becomes very large), in which case the subsequent randomizing
of the clusters is just equivalent to choosing a new state at random for
the entire lattice. Thus, it is in theory possible to go from any state
to any other in one step.
2. The algorithm also satisfies the condition of detailed balance. The
proof of this fact is exactly the same as it was for the Wolff algorithm.
If the number of links broken and made in performing a move are
m and n respectively (and the reverse for the reverse move), then
the energy change for the move is 2J(m − n) (or 2J(n − m) for the
reverse move). The selection probabilities for choosing a particular set
of links differ between forward and reverse moves only at the places
where bonds are made or broken, so the ratio of the two selection
probabilities is (1 − Padd)^(m−n), just as it was before. By choosing
Padd = 1 − e^(−2βJ), we then ensure, just as before, that the acceptance
probability is independent of m and n and everything else, so any
choice which makes it the same in each direction, such as flipping all
clusters with probability 1/2, will make the algorithm correct. Notice
however that many other choices would also work. It doesn't matter
how we choose to flip the clusters, though the choice made here is good
because it minimizes the correlation between the direction of a cluster
before and after a move, the new direction being chosen completely at
random, regardless of the old one.
3. The algorithm updates the entire lattice on each move. In measuring
correlation times for this algorithm, one should therefore measure them
simply in numbers of Monte Carlo steps, and not steps per site as with
the Metropolis algorithm. (In fact, on average, only half the spins get
flipped on each move, but the number flipped scales like the size of the
system, which is the important point.)
4. The Swendsen-Wang algorithm is essentially the same as the Wolff
algorithm for low temperatures. Well below Tc, one of the clusters
chosen by the algorithm will be the big percolating cluster, and the rest
will correspond to the "excitations" discussed in Section 4.3, which will
be very small. Ignoring these very small clusters then, the Swendsen-
Wang algorithm will tend to turn over the percolating backbone of the
lattice on average every two steps (rather than every step as in the
Wolff algorithm—see Figure 4.4), but otherwise the two will behave
almost identically. Thus, as with the Wolff algorithm, we can expect
The combination of the last two points here implies that the only regime
in which the Swendsen-Wang algorithm can be expected to outperform the
Metropolis algorithm is the one close to the critical temperature. The best
measurement of the dynamic exponent of the algorithm is that of Coddington
and Baillie (1992), who found z = 0.25 ± 0.01 in two dimensions, which is
clearly much better than the Metropolis algorithm, and is in fact exactly the
same as the result for the Wolff algorithm. So the Swendsen-Wang algorithm
is a pretty good algorithm for investigating the 2D Ising model close to its
critical point. However, the Wolff algorithm is always at least a factor of two
faster, since in the Wolff algorithm every spin in every cluster generated is
flipped, whereas the spins in only half of the clusters generated are flipped in
the Swendsen-Wang algorithm. In addition, as Table 4.1 shows, for higher
dimensions the Swendsen-Wang algorithm has a significantly higher dynamic
exponent than the Wolff algorithm, making it slower close to Tc. The reason
for this is that close to Tc the properties of the Ising model are dominated by
the fluctuations of large clusters of spins. As the arguments of Section 4.3.2
showed, the Wolff algorithm preferentially flips larger clusters because the
chance of the seed spin belonging to any particular cluster is proportional to
the size of that cluster. The Swendsen-Wang algorithm on the other hand
treats all clusters equally, regardless of their size, and therefore wastes a
considerable amount of effort on small clusters which make vanishingly little
contribution to the macroscopic properties of the system for large system
sizes. This, coupled with the fact that the Swendsen-Wang algorithm is
slightly more complicated to program than the Wolff algorithm, makes the
Wolff algorithm the algorithm of choice for most people.9
9 There is one important exception to this rule; as discussed in Section 14.2.2, the
Swendsen-Wang algorithm can be implemented more efficiently on a parallel computer
than can the Wolff algorithm.
4.4 Further algorithms for the Ising model 109
He then wrote the probability for making a link between two neighbouring
spins as a function of this energy, Padd(Eij). In the Ising model Eij can
only take the two values ±J, so the function Padd(Eij) only needs to be defined
at these points, but for some of the more general models Niedermayer
considered it needs to be defined elsewhere as well. Clearly, if for the Ising
model we make Padd(−J) = 1 − e^(−2βJ) and Padd(J) = 0, then we recover the
Wolff algorithm or the Swendsen-Wang algorithm, depending on whether
we flip only a single cluster on each move, or many clusters over the entire
lattice; Niedermayer's formalism is applicable in either case.
To be concrete about things, let us look at the case of the single-cluster,
Wolff-type version of Niedermayer's algorithm. First, it is clear that for
any choice of Padd (except the very stupid choice Padd(E) = 1 for all E),
the algorithm will satisfy the condition of ergodicity. Just as in the Wolff
algorithm, there is a finite probability that any spin on the lattice will find
itself the sole member of a cluster of one. Flipping a succession of such
clusters will clearly get us from any state to any other in a finite number
of moves. Second, let us apply the condition of detailed balance to the
algorithm. Consider, as we did in the case of the Wolff algorithm, two states
of our system which differ by the flipping of a single cluster. (You can
look again at Figure 4.3 if you like, but bear in mind that, since we are now
allowing links between anti-parallel spins, not all the spins in the cluster need
be pointing in the same direction.) As before, the probability of forming the
cluster itself is exactly the same in the forward and reverse directions, except
for the contributions which come from the borders. At the borders, there
are some pairs of spins which are parallel and some which are anti-parallel.
Suppose that in the forward direction there are m pairs of parallel spins at
the border—bonds which will be broken in flipping the cluster—and n pairs
which are anti-parallel—bonds which will be made. By definition, no links
are made between the spins of any of these border pairs, and the probability
of that happening is [1 − Padd(−J)]^m [1 − Padd(J)]^n. In the reverse direction
the corresponding probability is [1 − Padd(−J)]^n [1 − Padd(J)]^m. Just as in
the Wolff case, the energy cost of flipping the cluster from state u to state
v is

    E_v − E_u = 2J(m − n).                                          (4.30)

The condition of detailed balance then requires that the acceptance ratios
satisfy

    A(u → v)/A(v → u)
        = [1 − Padd(J)]^(m−n) [1 − Padd(−J)]^(n−m) e^(−2βJ(m−n)).   (4.31)

Any choice of acceptance ratios A(u → v) and A(v → u) which satisfies this
relation will satisfy detailed balance. For the Wolff choice of Padd we get
acceptance ratios which are always unity, but Niedermayer pointed out that
there are other ways to achieve this. In fact, all we need to do is choose Padd
to satisfy

    [1 − Padd(E1)]/[1 − Padd(E2)] = e^(β(E1−E2))                    (4.32)
and we will get acceptance ratios which are always one. Niedermayer's solu-
tion to this equation was Padd(E) = 1 − exp[β(E − E0)], where E0 is a free
parameter whose value we can choose as we like. Notice however that since
Padd(E) is supposed to be a probability, it is not allowed to be less than
zero. Thus the best expression we can write for the probability Padd(Eij) of
adding a link between sites i and j is

    Padd(Eij) = 1 − exp[β(Eij − E0)]   if Eij < E0,
    Padd(Eij) = 0                      otherwise.                   (4.33)
And this defines Niedermayer's algorithm. Notice the following things about
this algorithm:
1. As long as all Eij on the lattice are less than or equal to E0, the right-
hand side of Equation (4.31) is always unity for the choice of Padd
given in (4.33), so the two acceptance ratios can be chosen to be one
for every move. Since we are at liberty to choose E0 however we like,
we can always satisfy this condition by making it greater than or equal
to the largest value that Eij can take, which is J in the case of the
Ising model. This gives us a whole spectrum of Wolff-type algorithms
for various values E0 > J, which all have acceptance ratios of one.
The Wolff algorithm itself is equivalent to E0 = J. If we increase E0
above this value, the probabilities Padd(Eij) tend closer and closer to
one, making the clusters formed larger and larger. This gives us a way
of controlling the sizes of the clusters formed in our algorithm, all the
way up to clusters which encompass (almost) every spin on the lattice
at every move.
2. If we choose E0 to be less than the largest possible value of Eij then
the right-hand side of Equation (4.31) is no longer equal to one, and
we can no longer choose the acceptance ratios to be unity. For the
Ising model, if we choose −J < E0 < J then we have
up into clusters of one or two, joined by bonds which may or may not have
interactions associated with them. Then we treat those clusters as single
spins, and we carry out the Metropolis algorithm on them, for a few sweeps
of the lattice.
But this is not the end. Now we do the whole procedure again, treating
the clusters as spins, and joining them into bigger clusters of either one or
two elements each, using exactly the same rules as before. (Note that the
lattice of clusters is not a regular lattice, as the original system was, but
this does not stop us from carrying out the procedure just as before.) Then
we do a few Metropolis sweeps of this coarser lattice too. And we keep
repeating the whole thing until the size of the blocks reaches the size of the
whole lattice. In this way, we get to flip blocks of spins of all sizes from
single spins right up to the size of the entire system. Then we start taking
the blocks apart again into the blocks that made them up, and so forth until
we get back to the lattice of single spins. In fact, Kandel and co-workers
used a scheme where at each level in the blocking procedure they either
went towards bigger blocks ("coarsening") or smaller ones ("uncoarsening")
according to the following rule. At any particular level of the procedure we
look back and see what we did the previous times we got to this level. If we
coarsened the lattice the previous two times we got to this point, then on
the third time only, we uncoarsen. This choice has the effect of biasing the
algorithm towards working more at the long length-scales (bigger, coarser
blocks).
Well, perhaps you can see why the complexity of this algorithm has put
people off using it. The proof that the algorithm satisfies detailed balance is
quite involved, and, since you're probably not dying to hear about it right
now, we'll refer you to the original paper for the details. In the same paper it
is demonstrated that the dynamic exponent for the algorithm is in the region
of 0.2 for the two-dimensional Ising model—a value similar to, though not
markedly better than, the Wolff algorithm. Lest you dismiss the multigrid
method out of hand, however, let us just point out that the simulations
do indicate that its performance is superior to cluster algorithms for large
systems. These days, with increasing computer power, people are pushing
simulations towards larger and larger lattices, and there may well come a
point at which using a multigrid method could win us a factor of ten or more
in the speed of our simulation.
single pair of neighbouring spins on the lattice are aligned, and are
therefore candidates for linking with probability Padd. In this case, it
is known that for the square lattice in two dimensions for instance,
percolation will set in when the probability of making a link between
two spins is Padd = 1/2. Solving for the temperature, this is equivalent
to βJ = (1/2) ln 2 = 0.346..., so this is the temperature at which the
first step should be conducted.
3. Once the links are made, we flip each cluster separately with probabil-
ity 1/2, just as we do in the normal Swendsen-Wang algorithm. Then
the whole procedure is repeated from step 2 again.
So what's the point here? Why does the algorithm work? Well, consider
what happens if the system is below the critical temperature. In that case,
the spins will be more likely to be aligned with their neighbours than they
are at the critical temperature, and therefore there will be more neighbour-
ing pairs of aligned spins which are candidates for making links. Thus, in
order to make enough links to get one cluster to percolate across the entire
lattice we don't need as high a linking probability Padd as we would at Tc.
But a lower value of Padd corresponds to a higher value T > Tc of the corre-
sponding temperature. In other words, when the system is below the critical
temperature, the algorithm automatically chooses a temperature T > Tc for
its Swendsen-Wang procedure. Conversely, if the system is at a tempera-
ture above the critical point, then neighbouring spins will be less likely to
be aligned with one another than they are at Tc. As a result, there will be
fewer places that we can put links on the lattice, and Padd will need to be
higher than it would be at Tc in order to make one of the clusters percolate
and fill the entire lattice.10 The higher value of Padd corresponds to a lower
value of the temperature T < Tc, and so, for a state of the system above the
critical temperature, the algorithm will automatically choose a temperature
T < Tc for its Swendsen-Wang procedure.
The algorithm therefore has a kind of negative feedback built into it,
which always drives the system towards the critical point, and when it finally
reaches Tc it will just stay there, performing Swendsen-Wang Monte Carlo
steps at the critical temperature for the rest of the simulation. We do not
need to know what the critical temperature is for the algorithm to work. It
finds Tc all on its own, and for that reason the algorithm is a good way of
measuring Tc. Notice also that when the temperature of the system is lower
than the critical temperature, the algorithm performs Monte Carlo steps
with T > Tc, and vice versa. Thus, it seems plausible that the algorithm
would drive itself towards the critical point quicker than simply performing
10 In fact, for sufficiently high temperatures it may not be possible to produce a
percolating cluster at all. In this case the algorithm will perform a Monte Carlo step at
T = 0.
13 This behaviour depends on the criterion used to judge when percolation takes place.
For some criteria the estimate of Tc approaches the infinite system result from below
rather than above.
4.5 Other spin models 119
the invaded cluster algorithm does not sample the Boltzmann distribution
exactly. In particular, the fluctuations in quantities measured using the algo-
rithm are different from those you would get in the Boltzmann distribution.
To see this, consider what happens once the algorithm has equilibrated at
the critical temperature. At this point, as we argued before, it should stop
changing the temperature and just become equivalent to the Swendsen-
Wang algorithm at Tc, which certainly samples the Boltzmann distribution
correctly. However, in actual fact, because the lattice is finite, the innate
randomness of the Monte Carlo method will give rise to variations in the
temperature T of successive steps in the simulation. The negative feedback
effect that we described above will ensure that T always remains close to Tc,
but the size of fluctuations is very sensitive to small changes in temperature
near to Tc and as a result the measured fluctuations are not a good approx-
imation to those of the true Boltzmann distribution.14 Thus the invaded
cluster algorithm is not well suited to estimating, for instance, the specific
heat c of the Ising model at Tc, which is determined by measuring fluctu-
ations. On the other hand, one could use the algorithm to determine the
value of Tc, and then use the normal Wolff or Swendsen-Wang algorithm
to perform a simulation at that temperature to measure c. Determining Tc
in this way could also be a useful preliminary to determining the dynamic
exponent of another algorithm, as we did earlier in the chapter for both the
Metropolis and Wolff algorithms. Those calculations demanded a knowledge
of the value of Tc, which is known exactly for the 2D Ising model, but not
for many other models (including the 3D Ising model).
    H = −J Σ_⟨ij⟩ δ_{s_i s_j},                                      (4.37)

where δ_ij is the Kronecker δ-symbol, which is 1 when i = j and zero other-
wise.
For the case q = 2, the Potts model is equivalent to the Ising model,
up to an additive constant in the Hamiltonian. To see this we note that
Equation (4.37) can be rewritten as

    H = −(J/2) Σ_⟨ij⟩ 2(δ_{s_i s_j} − 1/2) − Σ_⟨ij⟩ J/2.            (4.38)

But 2(δ_{s_i s_j} − 1/2) is +1 when s_i = s_j and −1 when they are different, so
this expression is indeed equivalent to the Ising Hamiltonian, except for the
constant term −Σ_⟨ij⟩ J/2. (Note though that the interaction energy is changed
by a factor of two, J → J/2, from Equation (3.1).)
For higher values of q the Potts model behaves similarly in some ways
to the Ising model. For J > 0 (the ferromagnetic case) it has q equivalent
ground states in which all the spins have the same value, and as the temper-
ature is increased it undergoes a phase transition to a state in which each
of the q spin states occurs with equal frequency across the lattice. There
are differences with the Ising model however. In particular, the entropy of a
Potts model with q > 2 is higher than that of the Ising model at an equiva-
lent temperature, because the density of states of the system as a function of
energy is higher. For example, at temperatures just above T = 0, almost all
of the spins will be in one state, say s = 1, but there will be a few isolated
excitations on the lattice, just as there were in the Ising model. Unlike the
Ising model however, each excitation can take any of the possible spin values
other than 1, which may be very many if q is large, and this gives rise to
many more low-lying excitation states than in the Ising case.
Monte Carlo simulation of Potts models is quite similar to the simulation
of the Ising model. The simplest thing one can do is apply the single-spin-flip
Metropolis algorithm, which would go like this. First we pick a spin i from
the lattice at random. It will have some value s_i. We choose at random a
new value s'_i ≠ s_i from the q − 1 available possibilities. Then we calculate
the change in energy ΔE that would result if we were to make this change to
this spin, and either accept or reject the move, with acceptance probability

    A = e^(−βΔE)  if ΔE > 0,  and  A = 1  otherwise.                (4.39)

(See Equation (3.7).) This algorithm satisfies the conditions of both ergod-
icity and detailed balance, for the same reasons that it did in the case of the
Ising model. However, even for a single-spin-flip algorithm, the Metropolis
algorithm is not a very good one to use for Potts models, especially when
q becomes large. To see this, let us consider an extreme case: the q = 100
Potts model on a square lattice in two dimensions.
At high temperatures, the acceptance ratio (4.39) is always either 1 or
close to it because B is small, so the algorithm is reasonably efficient. How-
ever, at lower temperatures this ceases to be the case. As the temperature
decreases, more and more spins will take on the same values as their neigh-
bours, forming ferromagnetic domains, just as in the Ising model. Consider
the case of a spin which has four neighbours which all have different values.
The four states of this spin in which it is aligned with one of its neighbours
have lower energy, and therefore higher Boltzmann weight, than the other
96 possible states. If our spin is in one of the 96 higher energy states, how
long will it take to find one of the four aligned states? Well, until we find
one of them, all the states that the system passes through will have the same
energy, so the Metropolis acceptance ratio will be 1. On average therefore,
it will take about 100/4 = 25 steps to find one of the four desirable states.
As q increases this number will get larger; it will take longer and longer
for the system to find the low-lying energy states, despite the fact that the
acceptance ratio for all moves is 1, simply because there are so many states
to sort through.
Conversely, if the spin is in one of the four low-lying states, then there is
an energy cost associated with excitation to one of the other 96, which means
that the acceptance ratio will be less than one, and possibly very small (if
the temperature is low). Thus it could be that nearly 96 out of every 100
attempted moves is rejected, giving an overall acceptance ratio little better
than 4%.
One way to get around these problems is to use the heat-bath algorithm.
122 Chapter 4: Other algorithms for the Ising model
The first sum here is the same for all q possible values of s_k, and therefore
cancels out of the expression (4.40). The second sum has only z terms,
where z is the lattice coordination number (which is four in the square
4.5 Other spin models 123
lattice case we have been considering). Just calculating these terms, rather
than evaluating the entire Hamiltonian, makes evaluating pn much quicker.
Second, we notice that there are at most z states of the spin s_k in which
it is the same as at least one of its neighbours. These are the only states
in which the spin makes a contribution to the Hamiltonian. In all the other
states it makes no contribution to the Hamiltonian. Thus, evaluating the
Boltzmann factors e^{−βE_n} appearing in Equation (4.40) requires us to cal-
culate the values of, at most, z exponentials, all the other terms being just
equal to 1.
Finally, we note that there is only a small spectrum of possible values for
each Boltzmann factor, since the second term in Equation (4.42) can only
take values which are multiples of −J, from zero up to −zJ. Thus we can,
as we did for the Ising model, calculate the values of all the exponentials
we are going to need at the very beginning of our simulation, and thereafter
simply look up those values whenever we need to know one of them. The
calculation of exponentials being a costly process in terms of CPU time, this
trick considerably improves the efficiency of the algorithm.
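Putting these three observations together, a heat-bath move for the Potts model might be sketched as follows (our own sketch, with the z + 1 possible Boltzmann factors precomputed in a lookup table at the start):

```python
import random
import math

def make_boltzmann_table(beta, J=1.0, z=4):
    # The only Boltzmann factors that can occur are e^{beta*J*m} for
    # m = 0..z neighbours agreeing with a candidate value, so we
    # compute them once at the beginning of the simulation.
    return [math.exp(beta * J * m) for m in range(z + 1)]

def potts_heatbath_step(s, L, q, table):
    """Draw a new value for one randomly chosen spin directly from its
    conditional Boltzmann distribution (every move is accepted)."""
    i, j = random.randrange(L), random.randrange(L)
    nbrs = [s[(i+1) % L][j], s[(i-1) % L][j],
            s[i][(j+1) % L], s[i][(j-1) % L]]
    # Weight of value n is e^{-beta*E_n} with E_n = -J * (number of
    # neighbours equal to n); only neighbour values give weights != 1.
    weights = [table[sum(nb == n for nb in nbrs)] for n in range(q)]
    r = random.random() * sum(weights)
    for n in range(q):
        r -= weights[n]
        if r < 0.0:
            break
    s[i][j] = n

q, L, beta = 10, 8, 1.5
table = make_boltzmann_table(beta)
s = [[random.randrange(q) for _ in range(L)] for _ in range(L)]
for _ in range(200 * L * L):     # a few hundred sweeps at low temperature
    potts_heatbath_step(s, L, q, table)
```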
In Figure 4.10 we compare the internal energy of a q = 10 Potts model on
a 20 x 20 square lattice at low temperature simulated using the heat-bath and
Metropolis algorithms. In this case, the heat-bath algorithm takes about 200
sweeps of the lattice to come to equilibrium, by contrast with the Metropolis
algorithm which takes about 20 000 sweeps. This is an impressive gain in
speed, although as usual we need to be careful about our claims. The heat-
Now suppose that the initial state of the spin is down. What acceptance
ratio A do these equations represent for the move that flips the spin to the
up state? Well, clearly the acceptance ratio is simply A = p_up. If we multiply
numerator and denominator by e^{β(E_up+E_down)/2}, we can write it in terms of
the change in energy ΔE = E_up − E_down:

A = e^{−βΔE/2} / (e^{βΔE/2} + e^{−βΔE/2}) = 1 / (1 + e^{βΔE}).    (4.44)
It is easy to show that the same expression applies for the move which flips
the spin from up to down. Thus for the Ising model, the heat-bath algorithm
can be regarded as a single-spin-flip algorithm with an acceptance ratio
which is a function of the energy change ΔE, just as it is in the Metropolis
algorithm, although the exact functional dependence of A on ΔE is different
from that of the Metropolis algorithm. In Figure 4.11 we show A as a
function of ΔE for the two algorithms. For all values of ΔE, the heat-bath
acceptance ratio is lower than the corresponding value for the Metropolis
algorithm, Equation (3.7), so it is always more efficient to use the Metropolis
algorithm. This should come as no surprise, since, as we pointed out in the
last chapter, the Metropolis algorithm is the most efficient possible single-
spin-flip algorithm for simulating the Ising model. However, it is worth being
familiar with Equation (4.44), because, for reasons which are not completely
clear to the authors, some people still use it for simulating the Ising model,
despite its relative inefficiency.
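The comparison is easy to make explicit. This sketch of ours encodes the Metropolis rule of Equation (3.7) alongside the heat-bath ratio written in the equivalent form A = 1/(1 + e^{βΔE}):

```python
import math

def metropolis_acceptance(dE, beta):
    # Equation (3.7): accept with probability 1 if dE <= 0,
    # and with probability e^{-beta dE} otherwise.
    return 1.0 if dE <= 0 else math.exp(-beta * dE)

def heatbath_acceptance(dE, beta):
    # The heat-bath ratio in the form A = 1 / (1 + e^{beta dE}).
    return 1.0 / (1.0 + math.exp(beta * dE))

# The heat-bath ratio is strictly lower for every energy difference:
for dE in (-8.0, -4.0, 0.0, 4.0, 8.0):
    assert heatbath_acceptance(dE, 0.5) < metropolis_acceptance(dE, 0.5)
```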
We have shown then that if we are going to simulate a Potts model with
a single-spin-flip algorithm, the algorithm that we choose should depend
FIGURE 4.11 The acceptance ratio of the heat-bath algorithm for the
Ising model (solid line) as a function of the energy difference between
states ΔE. For comparison, the acceptance ratio of the Metropolis
algorithm is also shown (dashed line).
on the value of q. For small q (for example, the Ising model, which is
equivalent to q = 2) the Metropolis algorithm is the most efficient algorithm,
but as q gets larger there comes a point at which it is advantageous to
switch over to the heat-bath algorithm. Where exactly the cross-over occurs
is unfortunately dependent on what temperature we are interested in. At
the low temperatures used for the simulations in Figure 4.10, clearly the
cross-over was well below q = 10. However, at high temperatures where
the probabilities of occurrence of the states of any particular spin become
increasingly independent of the states of the spin's neighbours, there is less
and less difference between the performance of the two algorithms, and we
have to go to higher values of q to make the extra complexity of the heat-bath
algorithm worthwhile. In practice, for any serious study of a Potts model,
one should conduct short investigatory simulations with both algorithms to
decide which is more suited to the problem in hand.
appear gets longer and longer as we approach Tc. As with the Ising model,
we can define a dynamic exponent z, and show that the correlation time τ
at the critical point, measured in sweeps of the lattice, increases as τ ~ L^z
with the size L of the system simulated. If z takes a large value, this can
place severe limitations on the size of the system we can work with. As it
turns out, z is indeed quite large. For the case of the q = 3 Potts model, for
example, Schülke and Zheng (1995) have measured a value of 2.198 ± 0.008
for the dynamic exponent of the Metropolis algorithm in two dimensions,
which is comparable to the value for the Ising model, and it is believed that
z has similar values for larger q also (Landau et al. 1988). The solution to
this problem is exactly the same as it was in the Ising model case: to try
and flip the large correlated clusters all in one go using a cluster-flipping
algorithm. In fact, the Wolff algorithm which we described in Section 4.2
generalizes to the Potts case in a very straightforward manner. The correct
generalization is as follows.
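In outline, the standard generalization grows the cluster among like-valued spins with P_add = 1 − e^{−βJ} (for the convention in which each aligned nearest-neighbour pair contributes −J to the energy) and then assigns the whole cluster a new value chosen uniformly from the other q − 1 possibilities. A sketch of ours:

```python
import random
import math

def wolff_potts_move(s, L, q, beta, J=1.0):
    """One Wolff cluster move for the q-state Potts model on an L x L
    periodic square lattice."""
    p_add = 1.0 - math.exp(-beta * J)
    si, sj = random.randrange(L), random.randrange(L)   # seed spin
    old = s[si][sj]
    new = random.randrange(q - 1)        # new value for the cluster,
    if new >= old:                       # uniform over the other q - 1
        new += 1
    cluster = {(si, sj)}
    frontier = [(si, sj)]
    while frontier:
        i, j = frontier.pop()
        for nb in [((i+1) % L, j), ((i-1) % L, j),
                   (i, (j+1) % L), (i, (j-1) % L)]:
            # Link like-valued neighbours with probability p_add.
            if nb not in cluster and s[nb[0]][nb[1]] == old \
                    and random.random() < p_add:
                cluster.add(nb)
                frontier.append(nb)
    for i, j in cluster:
        s[i][j] = new
    return len(cluster)

q, L, beta = 3, 10, 1.0
s = [[random.randrange(q) for _ in range(L)] for _ in range(L)]
size = wolff_potts_move(s, L, q, beta)
```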
The proof that this algorithm satisfies detailed balance is exactly the
same as it was for the Ising model. If we consider two states u and v of
the system which differ by the changing of just one cluster, then the ratio
g(u → v)/g(v → u) of the selection probabilities for the moves between
these states depends only on the number of bonds broken m and the number
made n around the edges of the cluster. This gives us an equation of detailed
just as in the Ising case. The change in energy is also given by the same
expression as before, except for a factor of two:
and so the ratio of the acceptance ratios A(u → v) and A(v → u) for the
two moves is
For the choice of P_add given above, this is just equal to 1. This equation is
satisfied by making the acceptance ratios also equal to 1, and so, with this
choice, the algorithm satisfies detailed balance.
The algorithm also satisfies the condition of ergodicity for the same rea-
son that it did for the Ising model. Any spin can find itself the sole member
of a cluster of one, and flipping the appropriate succession of such single-spin
clusters can take us from any state to any other on a finite lattice in a finite
number of steps.
As in the case of the Ising model, the Wolff algorithm gives an impressive
improvement in performance near to the critical temperature. Coddington
and Baillie (1991) have measured a dynamic exponent of z = 0.60 ± 0.02
for the algorithm in the q = 3 case in two dimensions—a considerable im-
provement over the z = 2.2 of the Metropolis algorithm. As before, the
single-spin-flip algorithms come into their own well away from the critical
point because critical slowing down ceases to be a problem and the relative
simplicity of these algorithms over the Wolff algorithm tends to give them
the edge. One's choice of algorithm should therefore (as always) depend on
exactly which properties of the model one wants to investigate, but the Wolff
algorithm is definitely a good choice for examining critical properties.
The Swendsen-Wang algorithm can also be generalized for use with Potts
models in a straightforward fashion, as can all of the other algorithms de-
scribed in Section 4.4. In general however, the three algorithms we have
described here—Metropolis, heat-bath and Wolff—should be adequate for
most purposes.
model. In the XY model the spins are two-component vectors of unit length,
which can point in any direction on a two-dimensional plane. Thus the spins
can be represented by their components s_x, s_y, with the constraint that
s_x^2 + s_y^2 = 1, or they can be represented by a single angle variable θ,
which records the direction that the spin points in. Note that although the
spins are two-dimensional there is no reason why the lattice need be two-
dimensional. For example, we can put XY spins on a cubic lattice in three
dimensions if we want to.
The Heisenberg model follows the same idea, but the spins are three-
dimensional unit vectors. (Again, the dimensionality of the spins and that
of the lattice are independent. We can put Heisenberg spins on a two-
dimensional lattice for instance.) Heisenberg spins can be represented either
as three-component vectors with s_x^2 + s_y^2 + s_z^2 = 1, or by two angle
variables, such as the θ and φ of spherical coordinates.
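In practice one also needs a way of choosing a random direction for such a spin. The book defers this question to Section 16.2.1; one standard trick (our sketch, not the book's prescription) is to normalize a vector of independent Gaussian components, whose direction is automatically spherically symmetric:

```python
import random
import math

def random_heisenberg_spin():
    """A unit vector uniformly distributed on the sphere: the direction
    of a vector of three independent Gaussians is spherically symmetric,
    so normalizing it gives a uniformly random spin direction."""
    while True:
        x, y, z = (random.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 1e-12:        # guard against the (very unlikely) zero vector
            return (x / r, y / r, z / r)

sx, sy, sz = random_heisenberg_spin()
# s_x^2 + s_y^2 + s_z^2 = 1, as required
```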
The Hamiltonian for each of these models is the obvious generalization
of the Ising Hamiltonian:

E = −J Σ_⟨ij⟩ s_i · s_j,

where s_i and s_j are the vectors representing the spins on sites i and j. When
J > 0 the spins can lower their energy by lining up with one another and the
model is ferromagnetic, just as in the Ising case. When J < 0, the model is
anti-ferromagnetic.
We can also define similar models with higher-dimensional spins—unit
vectors in four or more dimensions—and there are many other generaliza-
tions of the idea to continuous-valued spins which have symmetries other
than the simple spherical symmetry of these models. In this section we con-
centrate on the XY/Heisenberg class of models as examples of how Monte
Carlo algorithms can be constructed for continuous spin models.
The first thing we notice about these continuous spin models is that they
have a continuous spectrum of possible states, with energies which can take
any real value between the ground state energy and the energy of the highest-
lying state. The appropriate generalization of the Boltzmann distribution
to such a model is that the probability of finding the system in a state with
energy between E and E + dE is

p(E) dE = (1/Z) ρ(E) e^{−βE} dE,

where ρ(E) is the density of states, defined such that ρ(E) dE is the num-
ber of states in the interval E to E + dE. You may well ask what exactly
it means to talk about the number of states in an energy interval when we
have a continuous spectrum of states. This was one of the toughest prob-
lems which plagued the nineteenth century study of statistical mechanics,
and in fact it cannot be answered within the realm of the classical systems
we are studying here. Only by studying quantum mechanical systems and
then taking their classical limit can the question be answered properly. How-
ever, for the purposes of Monte Carlo simulation, we don't actually need to
know what the density of states in the system is, just as we didn't need
to know the complete spectrum of energies in the discrete case—the Monte
Carlo method ensures that, provided we wait long enough for the system to
equilibrate, all states will appear with their correct equilibrium probabilities
in the simulation.15
The partition function Z for a system with a continuous energy spectrum
is the obvious generalization of the discrete case:

Z = ∫ ρ(E) e^{−βE} dE.
This algorithm now satisfies ergodicity, since we have a finite chance of ac-
cepting any move, and can therefore in theory get from any state of the
lattice to any other in N moves or fewer, where N is the number of spins
on the lattice. It also satisfies detailed balance, since the selection probabil-
ities g(u → v) and g(v → u) for a move to go from state u to state v or
back again are the same, and the ratio of the acceptance probabilities is just
exp(−βΔE), as it should be.
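A sketch of ours of this move, storing the spins as angles and proposing the new direction uniformly so that the selection probabilities cancel:

```python
import random
import math

def xy_metropolis_step(theta, L, beta, J=1.0):
    """One Metropolis move for the XY model on an L x L periodic square
    lattice; spins are stored as angles theta[i][j].  The proposed new
    direction is uniform on [0, 2*pi), so g(u->v) = g(v->u)."""
    i, j = random.randrange(L), random.randrange(L)
    old = theta[i][j]
    new = random.random() * 2.0 * math.pi
    nbrs = [theta[(i+1) % L][j], theta[(i-1) % L][j],
            theta[i][(j+1) % L], theta[i][(j-1) % L]]
    # E = -J sum of cos(theta_i - theta_j) over nearest-neighbour pairs.
    dE = -J * sum(math.cos(new - t) - math.cos(old - t) for t in nbrs)
    if dE <= 0 or random.random() < math.exp(-beta * dE):
        theta[i][j] = new
        return True
    return False

L, beta = 8, 1.0
theta = [[random.random() * 2.0 * math.pi for _ in range(L)]
         for _ in range(L)]
for _ in range(10 * L * L):          # ten sweeps
    xy_metropolis_step(theta, L, beta)
```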
This form of the Metropolis algorithm is the most widely used single-
spin-flip algorithm for the XY model. However, we should point out that it
is not the only form the algorithm can take, nor is it known whether this is
the most efficient form. The thing is, there is quite a lot of flexibility about
the choice of the new state for the spin we are proposing to change. The only
restriction that the algorithm places on it is that the selection probabilities
g(u → v) and g(v → u) for the forward and reverse moves be equal. This
in turn only requires that the probability of picking a new direction θ' for
the spin be a function solely of the angle |θ' − θ| between the two states.
Any function will do however, not just the uniform one we used above. If
we choose a function which is biased towards small changes in the direction
of our spin, then we will increase the acceptance ratio of the algorithm
because the energy cost of any move will be small. However, the change of
state will be small too, so the algorithm may take a lot of moves to reach
equilibrium. (For example, many small changes in the direction of single
spins would be necessary to cool the system from an initial T = ∞ state in
which the directions of the spins are all random to a low-temperature state
in which they all point in approximately the same direction.) Conversely,
an algorithm in which we preferentially propose larger changes in direction
|θ' − θ| for our Monte Carlo moves would take fewer accepted moves to get
from one state to another on average, but would tend to suffer from a lower
mean acceptance ratio, because with larger changes in spin direction moves
can have a larger energy cost.
As we mentioned, no one has, to our knowledge, investigated in any
detail the question of how best to compromise between these two cases—all
the large Metropolis studies of models like the XY and Heisenberg models
have been carried out with the uniform choice of directions described above.
It seems likely that the algorithm would benefit from a choice of smaller
a new direction at random isn't quite so simple. The problem of choosing a random
spherically symmetric direction is discussed in Section 16.2.1.
angles at low temperatures, where the acceptance ratio for higher energy
excitations becomes extremely small, and larger angles at high temperatures,
where the important point is to cover the space of possible states quickly,
the acceptance ratio being close to unity regardless of the moves proposed.
A detailed investigation of this point would make an interesting study.17
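To make the idea concrete, a temperature-dependent proposal window might look like the following sketch (entirely our own illustration, not a prescription from the text; the √T scaling echoes the footnoted suggestion for quadratic energy minima, and the constant c is an arbitrary tuning parameter):

```python
import random
import math

def propose_angle(old, T, c=1.0):
    """Propose a new XY spin direction within a symmetric window of
    half-width c * sqrt(T) around the old direction.  Because the window
    is symmetric, g(u->v) = g(v->u) and the Metropolis acceptance
    probability is unchanged."""
    delta = c * math.sqrt(T)
    return (old + random.uniform(-delta, delta)) % (2.0 * math.pi)
```

At low temperature the window shrinks, keeping the acceptance ratio up; at high temperature it widens, covering the state space more quickly.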
Many continuous spin models exhibit a phase transition similar to that of
the Ising model between a ferromagnetic and a paramagnetic phase, with the
accompanying problem of critical slowing down. As with the Ising model,
cluster algorithms can reduce these problems. A version of the Wolff al-
gorithm which is suitable for the XY and Heisenberg models was given by
Wolff in his original paper in 1989. The basic idea is that we choose at
random both a seed spin from which to start building our cluster and a
random direction, which we will denote by a unit vector n. Then we treat
the components n · s_i of the spins in that direction roughly in the same way
as we did the spins in the Ising model. A neighbour of the seed spin whose
component in this direction has the same sign as that of the seed spin can
be added to the cluster by making a link between it and the seed spin with
some probability P_add. If the components point in opposite directions then
the spin is not added to the cluster. When the complete cluster has been
built, it is "flipped" by reflecting all the spins in the plane perpendicular to
n.
The only complicating factor is that, in order to satisfy detailed balance,
the expression for P_add has to depend on the exact values of the spins which
are joined by links, thus:

P_add = 1 − e^{−2βJ (n·s_i)(n·s_j)}.
Readers might like to demonstrate for themselves that, with this choice,
the ratio of the selection probabilities g(u → v) and g(v → u) is equal
to e^{−βΔE}, where ΔE is the change in energy in going from a state u to a
state v by flipping a single cluster. Thus, detailed balance is obeyed, as in
the Ising case, by an algorithm for which the acceptance probability for the
cluster flip is 1. This algorithm has been used, for example, by Gottlob and
Hasenbusch (1993) to perform extensive studies of the critical properties of
the XY model in three dimensions.
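A sketch of ours of the reflection move for two-dimensional XY spins, linking neighbours whose components along n share a sign with probability P_add = 1 − e^{−2βJ(n·s_i)(n·s_j)}:

```python
import random
import math

def reflect(v, n):
    # Reflect the 2-vector v in the plane perpendicular to the unit
    # vector n:  v -> v - 2 (n . v) n
    d = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2.0 * d * n[0], v[1] - 2.0 * d * n[1])

def wolff_xy_move(s, L, beta, J=1.0):
    """One Wolff reflection move for the XY model on an L x L periodic
    square lattice; spins are unit 2-vectors s[i][j]."""
    phi = random.random() * 2.0 * math.pi
    n = (math.cos(phi), math.sin(phi))       # random reflection axis
    seed = (random.randrange(L), random.randrange(L))
    cluster = {seed}
    frontier = [seed]
    while frontier:
        i, j = frontier.pop()
        ci = s[i][j][0] * n[0] + s[i][j][1] * n[1]
        for nb in [((i+1) % L, j), ((i-1) % L, j),
                   (i, (j+1) % L), (i, (j-1) % L)]:
            if nb in cluster:
                continue
            cj = s[nb[0]][nb[1]][0] * n[0] + s[nb[0]][nb[1]][1] * n[1]
            # Only spins whose components along n share a sign are linked.
            if ci * cj > 0.0 and \
                    random.random() < 1.0 - math.exp(-2.0 * beta * J * ci * cj):
                cluster.add(nb)
                frontier.append(nb)
    for i, j in cluster:
        s[i][j] = reflect(s[i][j], n)
    return len(cluster)

L, beta = 8, 1.1
s = [[(math.cos(a), math.sin(a))
      for a in (random.random() * 2.0 * math.pi for _ in range(L))]
     for _ in range(L)]
size = wolff_xy_move(s, L, beta)
```

Reflection is an orthogonal transformation, so the spins remain unit vectors after the cluster flip.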
Similar generalizations to continuous spins are possible for all the algo-
rithms we discussed in Section 4.4. However, the Metropolis algorithm and
the Wolff algorithm are probably adequate to cover most situations.
17 In a system where the energy varies quadratically about its minimum as we change the
values of the spins, one can achieve a roughly constant acceptance ratio by making changes
of direction whose typical size varies as √T. This might also be a reasonable approach for
systems such as the XY and Heisenberg models when we are at low temperatures because,
to leading order, the energies of these systems are quadratic about their minima.
Problems
In the last two chapters we have looked in detail at the Ising model, which
is primarily of interest as a model of a ferromagnet. In this chapter we
look at a variation on the Ising model idea, the conserved-order-parameter
Ising model. Although mathematically similar to the normal Ising model,
the conserved-order-parameter Ising model is used to model very different
systems. In particular, it can be used to study the properties of lattice
gases.
A lattice gas is a simple model of a gas in which a large number of particles
(representing atoms or molecules) move around on the vertices of a lattice.
Confining the particles to a lattice in this way makes the model considerably
simpler to deal with than a true gas model in which particles can take any
position in space, although it also makes the model less realistic. However,
lattice gases can give a good deal of insight into the general behaviour of
real gases without necessarily being quantitatively accurate representations.
Many different types of lattice gases have been studied, including ones
in which the particles possess inertia, or do not, ones in which particular
types of particle collisions occur, and ones with a variety of different types
of particle-particle interactions. In this chapter we will look at one of the
simplest types which, as we will show, is equivalent to an Ising model. In our
lattice gas the particles have no inertia and simply walk at random around
the lattice under the influence of thermal excitations. The model is defined
as follows. We start off with a lattice possessing N sites, of which a fraction
ρ are occupied by particles. The particles satisfy the following rules:
2. A lattice site can be occupied by at most one particle at any time. This
site exclusion rule has a similar effect to the hard-sphere repulsion
seen in real systems, which is the result primarily of Pauli exclusion.
3. If two particles occupy nearest-neighbour sites on the lattice, they feel
an attraction with a fixed energy ε. This rule mimics the effect of the
attractive forces between molecules.
Rules 2 and 3 are a fairly poor approximation to the forces between real
molecules. For one thing, the model assumes that the repulsive and attrac-
tive forces between molecules act on the same length-scale, whereas in reality
the attractive forces are normally of longer range. Nonetheless the model
proves useful. This is in part because, as we mentioned above, we are often
interested more in the qualitative behaviour of the system than in simulating
it faithfully, but also because of the model's critical properties. As we will
see, the model defined above possesses a phase transition, a critical point at
which it switches from a homogeneous gas phase to a solid/vapour coexis-
tence phase. In the vicinity of this phase transition some properties of the
model become independent of the exact nature of the interactions between
molecules and, simple though it is, our lattice gas is in this region capable of
making quantitative predictions about real systems. This is another aspect
of the phenomenon of universality which was introduced in Section 4.1.
The first step in studying our lattice gas model is to derive a Hamiltonian
for it. To represent the state of the gas we define a set of variables σ_i, one
on each lattice site, such that σ_i is 1 if site i is occupied by a particle and
0 if the site is vacant. We use the notation σ_i to distinguish these variables
from the spins s_i of the previous chapters, which took values of ±1.
Already, this notation ensures that Rule 2 above is obeyed. A site can
only be occupied or not; we cannot have two or more particles on the same
site. In order to ensure that Rule 1 is obeyed, we require that the total
number of ones on the lattice, which is the number of occupied sites, should
be a constant. We can express this mathematically as

Σ_i σ_i = ρN.    (5.1)

Rule 3 gives the Hamiltonian of the lattice gas as

E = −ε Σ_⟨ij⟩ σ_i σ_j,    (5.2)

where ⟨ij⟩ denotes pairs of sites i and j which are nearest neighbours, just
as in previous chapters. Now let us define a new set of variables

s_i = 2σ_i − 1.    (5.3)
Chapter 5: The conserved-order-parameter Ising model 135
These new variables are precisely the Ising spins of previous chapters. The
variable s_i takes the value +1 on a site occupied by a particle, and −1 on
an unoccupied site. Inverting (5.3) to get σ_i = (s_i + 1)/2 and substituting
into Equation (5.2) we then find
The second term is constant, since z, N and ρ all are. If we now define
J = ε/4 our Hamiltonian becomes
(See Equation (1.33).) Since both ρ and N are constant quantities, so also is
M, and for this reason we call this model the conserved-order-parameter
Ising model, sometimes abbreviated to COP Ising model. (Recall that
the magnetization M is also called the order parameter of the Ising model—
see Section 3.7.1.) Throughout much of this chapter we will treat our lattice
gas using this Ising representation. This will allow us to make use of many
of the ideas we have developed in previous chapters to create algorithms for
simulating the model.
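As a numerical check on this equivalence, the sketch below (ours) compares the lattice-gas energy E = −ε Σ σ_iσ_j with the Ising energy at J = ε/4 for random configurations at fixed particle number; the two differ only by a configuration-independent constant:

```python
import random

def lattice_gas_energy(sigma, L, eps=1.0):
    # E = -eps * sum of sigma_i sigma_j over nearest-neighbour pairs,
    # counting each bond once via the right and down neighbours.
    return -eps * sum(sigma[i][j] * (sigma[(i+1) % L][j] + sigma[i][(j+1) % L])
                      for i in range(L) for j in range(L))

def ising_energy(s, L, J):
    return -J * sum(s[i][j] * (s[(i+1) % L][j] + s[i][(j+1) % L])
                    for i in range(L) for j in range(L))

L, eps = 6, 1.0
flat = [1] * (L * L // 2) + [0] * (L * L - L * L // 2)  # fixed particle number
diffs = set()
for _ in range(5):
    random.shuffle(flat)
    sigma = [flat[i * L:(i + 1) * L] for i in range(L)]
    s = [[2 * sigma[i][j] - 1 for j in range(L)] for i in range(L)]
    diffs.add(round(lattice_gas_energy(sigma, L, eps)
                    - ising_energy(s, L, eps / 4.0), 9))
# diffs contains a single value: the two energies agree up to a constant
```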
Unlike the Ising model studied in previous chapters, not all states of
the spins in the COP Ising model are allowed; only those for which the
constraint (5.5) is satisfied are allowed. This means that the sum over states
involved in calculating an expectation value (Equation (1.7)) is a sum over
only these allowed states. When we come to design a Monte Carlo algorithm
for this model we will need to take this constraint into account. A related
point is that the COP Ising model has two free parameters we can specify.
In the normal Ising model in zero magnetic field the only parameter we can
vary is the temperature. In the COP Ising model on the other hand, we
have to specify both the temperature and the average density of particles ρ,
which in the language of the Ising model is the density of up-pointing spins
or equivalently the magnetization.
From our investigations in Chapter 3 we know that in the ferromagnetic
case (J > 0), the normal Ising model has an average magnetization which is
zero above the critical temperature Tc but non-zero below (see Figure 3.9, for
instance). Rearranging Equation (5.8) for ρ, this magnetization corresponds
to a preferred density of

ρ = (1 + m)/2,    (5.9)

where m is the magnetization per spin. In fact, m has two equilibrium values
below Tc, one positive and one negative, so there are really two preferred
densities for the model in this regime:

ρ± = (1 ± |m|)/2.    (5.10)
Suppose now that we fix ρ for the COP model somewhere in the range
between these two preferred densities: ρ− < ρ < ρ+. The system can then
arrange for the local density at every point on the lattice to take one or
other of the preferred values by phase separating into domains of these
two densities, and as we will see in Section 5.1 the COP model does exactly
this. When ρ is outside the range between ρ− and ρ+, it is still possible for
it to reach one of the preferred densities in some regions. For example, if
ρ < ρ− then the system has fewer up-pointing spins than it needs to reach
the preferred density, but it can still reach it in some portion of the lattice
by concentrating all its up-spins in local domains. This however would leave
the rest of the system starved of up-spins and even further from the preferred
density than it was before. It turns out that the energetic cost of doing this
is greater than the gain derived from forming the domains, so the system
prefers to be homogeneous.
Thus, when J > 0, our COP Ising model has two phases below the
critical temperature, one in which ρ− < ρ < ρ+ and there is separation
into domains of high and low density, and one in which ρ lies outside this
range and the system is homogeneous. The absolute magnetization m of
FIGURE 5.1 Phase diagram of the COP Ising model on a square lat-
tice in two dimensions. Depending on the values of the temperature T
and density ρ, the system will either be homogeneous or will separate
into coexisting domains of the two preferred densities ρ+ and ρ−.
the normal Ising model gets smaller with increasing temperature, so that
the range of ρ within which there is phase separation in the COP model,
given by Equation (5.10), also gets smaller. Above Tc the range shrinks
to zero and there is no phase separation. This behaviour is depicted in
Figure 5.1, which shows the phase diagram for the two-dimensional COP
Ising model.1 Real fluid systems show a very similar phenomenon when
they phase separate on cooling. H2O at atmospheric pressure for example is
a gas at high temperatures, but passes on cooling into a coexistence region
in which it separates into distinct fractions of water vapour and liquid water.
This lends support to our earlier contention that the COP Ising model shows
behaviour qualitatively similar to that of real fluids.2
In this chapter we will show how the equilibrium properties of the COP
Ising model can be efficiently simulated on a computer and discuss two ap-
1 The line separating the two regimes, which is called the binodal, can be calculated
exactly from the known magnetization m = [1 − cosech^4 2βJ]^{1/8} of the normal Ising
model in two dimensions, which was postulated by Onsager in 1949, though not proven
until Yang tackled the problem three years later (Yang 1952).
2 Note that the high-density phase of the COP Ising model is more like a solid than a
liquid however, since the "molecules" in the condensed phase of the model are arranged
on the regular grid imposed by the lattice.
As we argued in Section 3.1, the most efficient way to satisfy this condition
is the choice given in Equation (5.11).
Implementing the Kawasaki algorithm is not complicated, but there are
one or two tricks worth noting. First, if we pick two spins which are pointing
in the same direction then clearly there is no point in exchanging their values,
since this would make no change to the state of the lattice. So it is worth
checking, before we calculate ΔE, whether the spins are indeed aligned with
one another. If they are, we can save ourselves some work by forgetting them
and going straight on to the next Monte Carlo step.
It is tempting, in fact, to think that we might improve the speed of
the algorithm by ignoring pairs of spins which are aligned and choosing
only amongst those pairs which are anti-aligned. This however would be a
mistake, because the number of such pairs can change when we make a move,
which means that the selection probabilities g(u → v) and g(v → u) for
moves in opposite directions would not be equal any more, and so would not
cancel out of Equation (5.12). Choosing our spins in this fashion would result
in an algorithm which does not generate states with the correct Boltzmann
probabilities.4
In theory calculating the energy difference ΔE means calculating the
energies Eu and Ev of the system before and after an exchange and taking
their difference. As with the Metropolis algorithm however, most of the
spins on the lattice are the same in states u and v, and we can use this
fact to derive a formula for ΔE which does not involve summing over the
whole lattice. The derivation follows exactly the same lines as that for the
Metropolis algorithm in Section 3.1.1. We won't go through it in detail,5
but the end result is that
4 Actually, it is possible to create a correct algorithm in this way, but one has to be
quite crafty about it. And for equilibrium simulations there are, as we will shortly see,
better ways than this to improve the efficiency of our algorithm. For out-of-equilibrium
simulations, on the other hand, an approach which favours anti-parallel spin pairs can
prove useful. This point is discussed in Section 10.3.1. (See also Problem 5.1.)
5 The derivation appears at the end of the chapter as Problem 5.2.
Here k and k' are the two spins which we propose to exchange and the two
sums are carried out over all the nearest neighbours of each spin except the
two spins k and k' themselves. Notice that, like Equation (3.10) for the
Metropolis algorithm, this expression only involves the values of spins in the
initial state u and not in the final state v, which allows us to calculate the
change in energy without actually having to carry out the spin exchange.
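In code, the local calculation might look like the following sketch of ours (the algebraic form used, ΔE = J (s_k − s_k')(Σ − Σ'), with the two neighbour sums excluding the pair of spins themselves, is our own rendering of the result quoted in the text, so it is verified here against a brute-force energy difference):

```python
import random

def kawasaki_delta_E(s, L, i, j, i2, j2, J=1.0):
    """Energy change for exchanging the values of spins (i, j) and
    (i2, j2) on an L x L periodic Ising lattice, using only local
    information.  The two neighbour sums exclude the pair themselves,
    so the bond between them (which is unchanged) never enters."""
    def nbr_sum(a, b, skip):
        return sum(s[na][nb] for na, nb in [((a+1) % L, b), ((a-1) % L, b),
                                            (a, (b+1) % L), (a, (b-1) % L)]
                   if (na, nb) != skip)
    return J * (s[i][j] - s[i2][j2]) * (nbr_sum(i, j, (i2, j2))
                                        - nbr_sum(i2, j2, (i, j)))

# Sanity check against a brute-force energy difference:
random.seed(0)
L = 4
s = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]

def total_E(s, J=1.0):
    return -J * sum(s[i][j] * (s[(i+1) % L][j] + s[i][(j+1) % L])
                    for i in range(L) for j in range(L))

dE = kawasaki_delta_E(s, L, 0, 0, 0, 1)
E0 = total_E(s)
s[0][0], s[0][1] = s[0][1], s[0][0]     # perform the exchange
assert abs(total_E(s) - E0 - dE) < 1e-12
```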
which has a minimum value of 1/2 when ρ = 1/2. Thus we are always wasting
at least 50% of our moves doing nothing, and perhaps more, depending on
the particle density ρ. If instead of picking pairs at random we pick one
spin from the set of up-pointing spins and one from the set of down-pointing
ones and exchange their values (or alternatively, flip them both over, which
comes to the same thing) with a probability given by Equation (5.11), then
no time is wasted on aligned pairs. This gives us an immediate improvement
in efficiency by a factor of 1/[2ρ(1 − ρ)], which is at least 2 and can become
arbitrarily large for ρ close to zero or one. The proof of ergodicity for this
algorithm follows exactly the same lines as for the Kawasaki algorithm, as
does the proof of detailed balance once we note that the selection probability
for every spin pair is the same, so that the factors g(u → v) and g(v → u)
cancel out of Equation (5.12) again.6
Implementation is a little more complex for this algorithm than it was for
the Kawasaki algorithm, since in order to pick an up-spin and a down-spin
we need to know where all the up- and down-spins are. Simply picking spins
at random and throwing them away if they don't have the desired direction
would not work—the resulting algorithm would be no more efficient than
the original one which we proposed. In order to achieve the full performance
gain of our new algorithm we must keep a list of all the up- and down-spins
and choose our pairs of spins at random from these lists. Luckily this is
quite simple to do. Since the number of up- and down-pointing spins does
not change during the simulation, two arrays of fixed length can store the
two lists, and when the values of two spins are exchanged, we also exchange
their entries in these lists. The computational overhead incurred in doing
this is relatively small,7 and the gain in efficiency of the algorithm certainly
justifies the extra work involved.
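As an illustration, the bookkeeping just described might be sketched as follows. This is a minimal sketch in Python rather than the program of Appendix B, and the class and method names are our own:

```python
import random

class SpinLists:
    """Bookkeeping for the non-local spin-exchange algorithm (illustrative sketch).

    Because the numbers of up- and down-spins are conserved, two arrays of
    fixed length hold the site indices of the up- and down-spins; when two
    spins exchange values we simply exchange their entries in these lists.
    """

    def __init__(self, spins):
        # spins: sequence of +1 (particle) and -1 (vacancy), one per site
        self.spins = list(spins)
        self.up = [i for i, s in enumerate(self.spins) if s == +1]
        self.down = [i for i, s in enumerate(self.spins) if s == -1]

    def propose(self):
        """Pick one up-spin and one down-spin uniformly at random.

        Returns positions within the two lists, not site indices."""
        return random.randrange(len(self.up)), random.randrange(len(self.down))

    def exchange(self, a, b):
        """Exchange the values of the chosen pair and update both lists."""
        i, j = self.up[a], self.down[b]
        self.spins[i], self.spins[j] = -1, +1
        self.up[a], self.down[b] = j, i  # swap the list entries too
```

The acceptance decision of Equation (5.11) would be applied between the call to propose and the call to exchange; we omit it here for brevity.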
This non-local spin-exchange algorithm is already a huge improvement
on the Kawasaki algorithm. For example, we find that the equilibration of
6 Note that this algorithm obeys the condition of detailed balance perfectly well, even
though, as we pointed out in Section 5.1, the equivalent local algorithm, in which we
choose at random amongst only those pairs of nearest-neighbour spins which are anti-
aligned, does not.
7 See the sample program given in Appendix B.
an interface simulation like the ones depicted in Figure 5.2 requires about
1/40 as many Monte Carlo steps with the new algorithm as it does with the
Kawasaki one. One reason for this is the non-diffusive nature of the particle
motion. Another is that the non-local update move always picks pairs of
spins which are anti-aligned. The Kawasaki algorithm, as we have pointed
out, sometimes picks pairs which are aligned and thereby wastes time. As it
turns out, in fact, the Kawasaki algorithm picks such aligned pairs almost
all of the time when we are in the phase coexistence regime, which makes
it quite inefficient. A glance once more at Figure 5.2 reveals the reason for
this: the Kawasaki algorithm only selects adjacent pairs of spins, but most
adjacent pairs of spins are pointing in the same direction when we are in
the coexistence regime. To be fair however, the non-local algorithm is more
complicated to program and in our implementation we find that a single step
requires about twice as much CPU time as in the Kawasaki case. Even in
terms of CPU time however, the new algorithm is still a factor of 20 faster
than the Kawasaki algorithm.
Our non-local algorithm is still far from perfect however. The main prob-
lem with it is the perennial one of the acceptance ratio. If the acceptance
ratio is low, we are wasting a large portion of our CPU time selecting pairs
and then failing to exchange their values. In the simulations of interfaces
described above, for example, we find that at T = 0.8Tc the average acceptance
ratio for the exchange of a pair of anti-aligned spins is only about 3%.
It would be satisfying if we could find an algorithm which only considered
the exchange of anti-aligned spin pairs but had a higher acceptance ratio
than this, perhaps even an acceptance ratio of one. It turns out that we
can achieve this goal by using the continuous time Monte Carlo method
described in Section 2.4.
The spin coordination number n_i may be written n_i = Σ_j δ_{s_i s_j}, where
the sum runs over the nearest neighbours j of site i and δ_ij is the Kronecker
δ-symbol, which is 1 when i = j and zero otherwise. Thus n_i is the number of
nearest neighbours of site i whose spins point in the same direction as s_i,
i.e., the number of satisfied bonds between the spin on site i and its
neighbours. The change in the total energy when we update a spin at site i
is then just −2J times the change in the number of satisfied bonds, or
−2J(n_i^v − n_i^u), where n_i^u and n_i^v are the values of n_i in the
states u and v. The energy change on
exchanging two spins i and j is
This choice is particularly elegant because it depends only on the initial state
of the lattice u and not on the final one, and also because it factorizes into
a product of two terms, one for each of the two spins involved. As we will
see in a moment, this simplifies the implementation of our algorithm.
Using this choice, our continuous time Monte Carlo algorithm is now as
follows. We choose an up-pointing spin i at random from all the possible
choices with probability proportional to e^{−2βJn_i^u} and a down-pointing spin j
with probability proportional to e^{−2βJn_j^u}. The joint probability of choosing
the two is then given by Equation (5.19). Then we simply exchange the
values of the two spins. In order to ensure that detailed balance is preserved
we must also add an amount
to our time variable for each move we perform (see Equation (2.19)). Av-
erages over the measured values of observable quantities then become time
averages, with states weighted in proportion to the time Δt spent in them.
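The spin-selection step of this algorithm can be sketched as follows. This is an illustrative Python fragment of our own: it performs a simple linear scan over a list of sites, whereas a production implementation would use the faster bookkeeping described in Section 2.4:

```python
import math
import random

def choose_weighted(sites, n_sat, beta, J=1.0):
    """Pick one site from `sites` with probability proportional to
    exp(-2*beta*J*n_i), where n_sat[i] is the number of satisfied
    bonds of site i.  A simple linear scan; faster schemes exist."""
    weights = [math.exp(-2.0 * beta * J * n_sat[i]) for i in sites]
    r = random.uniform(0.0, sum(weights))
    for site, w in zip(sites, weights):
        r -= w
        if r <= 0.0:
            return site
    return sites[-1]  # guard against floating-point round-off
```

Drawing an up-spin i and a down-spin j independently with this routine and then exchanging their values realizes the factorized selection probability, with an acceptance ratio of one.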
the preferred state of the model in the phase coexistence regime is one in
which most of the particles in the system coalesce into a single domain or
droplet which is usually approximately circular in shape. The reason is that
the interface between domains of up- and down-pointing spins is energetically
costly—the total energy of the system increases by 2J for every pair
of anti-aligned nearest-neighbour spins along the interface. In effect, there
is a surface energy or surface tension at the boundary of our droplet.9
The system can therefore reduce its energy by reducing the length of the
boundary. The lowest surface to area ratio in two dimensions is achieved
when the droplet is a circle,10 although, as we will see, it never actually
takes this shape in practice because of the effects of the lattice.
Crystals display similar behaviour as they crystallize, choosing a shape
which reduces their surface energy. In the limit of a large crystal this pre-
ferred shape is called the equilibrium crystal shape or ECS. In 1901
Wulff invented a geometrical construction for calculating equilibrium crystal
shapes from a knowledge of the way in which the surface tension varies with
the orientation of a surface. His construction shows that the ECS is indeed
a circle (or a sphere in three dimensions) if the surface tension is isotropic.
Because of the underlying crystal lattice, however, the surface tension is not
isotropic and for real crystals the ECS is at best only approximately circu-
lar. Unfortunately, the orientation dependence of the surface tension is only
known exactly in a very few cases, so that it is usually not possible to apply
the Wulff construction to calculate the ECS. Instead we turn to Monte Carlo
methods.
In order to perform a truly accurate calculation of an equilibrium crys-
tal shape we would need to simulate in detail the interactions between the
molecules in a solidifying solid. This however is not an easy thing to do.
So instead many people have turned to the conserved-order-parameter Ising
model as a simple model of the processes taking place. Like the real physi-
cal system, the COP Ising model shows "crystallization" in which particles
coalesce out of the "melt" onto a regular crystalline lattice, and it possesses
a surface tension which is orientation-dependent, giving rise to non-trivial
equilibrium crystal shapes. It is true that the details of the interactions be-
tween the particles in our Ising model are very different from those in the
true crystal, so we do not expect the model to give an accurate quantitative
prediction of a real ECS, but, as with our earlier simulations of interfaces,
the study of the COP Ising model can nonetheless shed light on the sort of
processes taking place when a crystal solidifies.
9 For simplicity we will use the three-dimensional term "surface tension" to refer to this
phenomenon in both two and three dimensions, although in two dimensions it is strictly
more correct to call it "line tension".
10 Strictly, this is only true if the density p of particles is sufficiently low or sufficiently
high. See Problem 5.3 for an exploration of this point.
11 We illustrate the method using a two-dimensional system for clarity. At the end
of this chapter we also give some results for three-dimensional systems. However, the
two-dimensional results are not purely academic. In fact, the two-dimensional COP Ising
model is quite a good approximation to the behaviour of adsorbed atoms on the surfaces
of metals, and in certain temperature regimes one does indeed find droplets or islands of
adatoms which look very similar to the ones depicted in Figure 5.4. This point is discussed
further in Chapter 11.
12 In fact, our system has four-fold rotational symmetry and inversion symmetry on the
volves taking the mean of the particle density over all our measurements as
a function of position. At the centre of mass itself this mean takes a value
close to p+ (see Equation (5.10)) and it remains roughly constant until we
approach the droplet interface, at which point it falls quickly towards p-.
We define the edge of the droplet to be the point at which the density is
equal to 1/2.
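As a concrete illustration, the edge criterion just described might be implemented like this. This is a Python sketch of our own; in particular the linear interpolation between measured radii is our choice, not the book's:

```python
def droplet_edge_radius(profile):
    """Return the radius at which a measured density profile first falls
    through 1/2, interpolating linearly between the measured radii.

    `profile[r]` is the mean particle density at integer distance r from
    the droplet's centre of mass, averaged over many measurements."""
    for r in range(1, len(profile)):
        if profile[r] <= 0.5 <= profile[r - 1]:
            drop = profile[r - 1] - profile[r]
            frac = 0.0 if drop == 0.0 else (profile[r - 1] - 0.5) / drop
            return (r - 1) + frac
    return None  # the profile never crosses 1/2
```

Repeating this along many directions from the centre of mass traces out the average droplet shape.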
As an example of this kind of calculation we show in Figure 5.5 measure-
ments of the average ECS of droplets on square and triangular lattices in
two dimensions. In these calculations we used the continuous time Monte
Carlo algorithm of Section 5.2.1, which again proves significantly faster than
either the Kawasaki algorithm or the non-local algorithm we introduced in
Section 5.2.
We can also use the Monte Carlo method to study the ECS of three-
dimensional systems. (Real crystals are after all three-dimensional.) The
behaviour of the ECS is richer in three dimensions than in two. In two di-
mensions the sides of the droplet are flat at zero temperature, but become
rounded for all finite temperatures. In three dimensions the story is more
complicated. On a cubic lattice for instance, the ECS is a cube at T = 0,
just as we would expect. As we raise the temperature above zero, the corners
and edges of the cube become rounded, but the droplet still has flat facets.
square lattice (the symmetry group is C4v), so we can reduce the errors on our average
by using each droplet configuration eight times over in each of its symmetry equivalent
orientations.
Problems
5.1 As we pointed out in Section 5.1, the variant of the Kawasaki algorithm
in which we choose only among the anti-aligned nearest-neighbour spin pairs
does not sample the Boltzmann distribution correctly. There is however a
correct generalization of the Kawasaki algorithm which favours anti-aligned
pairs and is more efficient than the standard algorithm when the density
of particles p is far from 1/2. In this variant, we choose at random a spin
of the minority type (up or down, particle or vacancy) and then choose
at random one of its neighbours, and exchange the values of the pair with
the Metropolis acceptance ratio, Equation (5.11). Prove that this algorithm
satisfies the conditions of ergodicity and detailed balance. Why is it more
efficient than the standard Kawasaki algorithm?
5.2 Derive Equation (5.13).
5.3 Consider a COP Ising system on an L x L square lattice in two dimen-
sions with a density of particles p. Calculate the total length of interface
in the system (in terms of number of unsatisfied bonds) if (a) the parti-
cles coalesce into a single droplet and (b) they form a band with straight
interfaces which stretches all the way across the system and wraps around
the boundary conditions. Hence show that the droplet is not the preferred
configuration of the system below Tc when 1/4 < p < 3/4.
6
Disordered spin models
In the previous few chapters of this book we have studied a variety of dif-
ferent Monte Carlo algorithms for simulating the Ising model under various
conditions. In Chapter 3 we looked at the Metropolis algorithm, as it ap-
plies to the Ising model, and in Chapter 4 we looked at a number of other
algorithms, most of which are designed to simulate the model more effi-
ciently in the critical region, the region near the phase transition between
paramagnetism and ferromagnetism. In the last chapter, we looked at a
slight variation on the Ising model theme, the conserved-order-parameter
Ising model, which can be used as a model of a gas, or of atoms diffusing on
a surface. As we have seen, for different regimes of temperature or different
variations of the model, different algorithms are appropriate. The trick is
to tailor the algorithm to the type of problem we wish to tackle. However,
all the algorithms we have seen so far eventually boil down to one thing:
they generate a series of states, a Markov chain, in which each state is de-
rived from the previous one in the chain, and attempt to contrive that each
should appear with a probability proportional to its Boltzmann weight. In
this chapter, for the first time, we will come across systems for which an
approach of this kind doesn't work. The problem, as we will see, is that for
some systems, the so-called glassy systems, algorithms of this kind tend to
get stuck in small regions of the state space from which they cannot escape.
The reason for this is two-fold. First, the fact that each state is generated
from the previous one in the chain means that there is a sense in which
states are "close together". There are states which it is easy to get to from
the present one in a small number of moves, such as states which differ by
the flipping of just a few spins for a single-spin-flip algorithm, and there are
other states which you can only get to by making a large number of moves.
Even for algorithms such as the Swendsen-Wang algorithm of Section 4.4.1,
which can in principle get from any state to any other in a single move, some
states are much more likely to be generated from a given starting state than
others, even though the states may have the same energy. The result is that,
for almost all algorithms, there are pairs of states between which the only
probable (or possible) routes are via a large number of intermediates. This
in itself would not be a problem—all of the algorithms we have described
for the Ising model have this property, and yet all of them work acceptably
well, at least in some temperature regimes. For the glassy systems we will
be interested in in this chapter however, it is a problem because the state
spaces of these systems contain metastable "basins" in which all the states
have relatively low energy, surrounded by states with much higher energy.
In order to escape from such a basin, our Monte Carlo algorithm must pass
through one of these states of higher energy. But if the algorithm samples
states according to their Boltzmann weights, then the chances that it will
ever make a move to one of these high-energy states is exponentially low, and
so, at least at low temperatures, the algorithm gets stranded in the basin.
We will have to wait a very long time indeed before it finds its way across
the "energy barrier" into the world outside.
In fact, the existence of metastable energy basins is not unique to glassy
models. For example, the normal Ising model of Chapters 3 and 4 also
possesses such basins. Below the critical temperature, the Ising model has a
spontaneous magnetization (see Section 3.7.1) so that a significant majority
of its spins point in one direction or the other. In order to invert this
spontaneous magnetization from (say) up to down, we must create a domain
of down-spins amongst our up-pointing majority and grow it until it fills the
entire system. It is not difficult to see that this implies that at some point we
must have at least two domain walls between up- and down-spins which are
of dimension at least the size L of the system. The cost of such a pair of walls
is at least 4JL^{d−1}, where d is the dimensionality of the system. This then is
the energy barrier which must be crossed in order to invert the magnetization
of the system. Since the probability of sampling a state with this energy is
a factor of exp(−4βJL^{d−1}) smaller than the chance of sampling a low-lying
state of the system, the time to cross this barrier becomes exponentially long
as the size of the system grows. We see an exactly similar exponential slowing
down with system size in glassy models as well, and in fact this behaviour
is often taken as definitive of a glassy system. What then is it that tells us
that the Ising model is not a glassy system? The answer is that the two
energy basins in the Ising model are symmetry equivalent—each state in
one of them corresponds to a state in the other which has exactly the same
spin configuration, except that each spin is inverted. This means that it
does not matter which basin we are in below the critical temperature—we
will get exactly the same answer for all measured properties of the system,
except for a possible symmetry operation (such as the change of the sign
of the average magnetization). By contrast, the basins in a glassy model
move is so long that, for all practical purposes (such as building houses),
we can consider it to be solid. The temperature at which the glass becomes
solid is known as the glass temperature, usually denoted Tg. In fact, it
is still a matter of debate whether this transition is a sharp one, like the
phase transitions we saw in the last few chapters, or whether it occurs over
a range of temperature even when the system is cooled infinitely slowly. For
our purposes however, it is enough to note that there are temperatures suf-
ficiently low that the dynamics becomes problematic, and others which are
high enough that it doesn't.
In this chapter we will not be considering real glass. Real glass is a
fantastically difficult system to study, and even after many decades of work
its properties are not at all well understood. So instead, in order to get a
handle on the basics of the problem, a number of simpler model systems
have been developed which show many of the same features but which are
mathematically easier to cope with. Here we will describe briefly two such,
both of them variations on the Ising model of the previous chapters.
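The first of the two is the random-field Ising model. Its Hamiltonian, written here in the standard form consistent with the definitions that follow, is

```latex
H = -J \sum_{\langle ij \rangle} s_i s_j - \sum_i h_i s_i ,
```

with s_i = ±1 as usual.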
Here, just as before, the notation ⟨ij⟩ means that the sum runs over pairs
of spins i, j which are nearest neighbours, and the variables hi are random
fields, local magnetic fields which each act on just one spin on the lattice,
and whose values are chosen at random. There are many ways in which the
random fields could be chosen. The simplest case, and the only one which
has been studied in any detail, is the case in which their values have mean
zero and in which the values on different sites are uncorrelated—they are
independent random variables. Variously they are chosen to have a Gaussian
distribution with some finite width σ, or to have values randomly ±h where
h is a constant, or any of a variety of other possible alternatives. To a large
extent the interesting properties of the model are believed to be independent
of the exact choice of distribution (another consequence of the phenomenon
of universality discussed in Section 4.1). However, to be definite about it,
let us suppose that the fields hi are chosen from a Gaussian distribution:
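Explicitly, with mean zero and width σ, this distribution reads

```latex
P(h) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{h^2}{2\sigma^2}\right).
```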
The width σ of the Gaussian is often called the "randomness", since it measures
how big the typical random fields are.
Strictly speaking the random-field Ising model is not a true glassy system.
As discussed below, its slow equilibration arises as a direct result of the
onset of ferromagnetism below Tc, so that it doesn't really have a glass
temperature, only a critical temperature.1 However, for the purposes of
Monte Carlo simulation we may as well consider it in the same category as
the true glasses, since it presents all the same problems that true glasses
do. The model also has the nice property that it is particularly easy to
see how the large energy barriers arise which make simulation hard at low
temperatures.
Consider first the ordinary Ising model, familiar to us from Chapters 3
and 4 of this book. As we know, in two or more dimensions this model pos-
sesses a critical temperature Tc above which it is paramagnetic and below
which it is ferromagnetic, possessing some spontaneous, non-zero magnetiza-
tion, either up or down. Imagine doing a Metropolis Monte Carlo simulation
with single-spin-flip dynamics in which we cool the Ising model down quickly
from some high-temperature state (all spins random, for instance) to a tem-
perature below Tc. What we see is depicted in the frames of Figure 6.1.
Quite quickly there form domains of moderate size in which all the spins are
either up or down and as time progresses the smallest of these domains shrink
and vanish, closely followed by the next smallest, until eventually most of
the spins on the lattice are pointing in the same direction. The reason for
this behaviour is that the domains of spins possess a surface energy—there
is an energy cost to having a domain wall which increases with the length
of the wall, because the spins on either side of the wall are pointing in op-
posite directions. The system can therefore lower its energy by flipping the
spins around the edge of a domain to make it smaller (and its wall shorter).
Thus, the domains "evaporate", leaving us with most of the spins either up
or down.2
However, the story is different when we come to the random-field Ising
model. First of all, the random-field Ising model doesn't actually possess
a phase transition in two dimensions, so Figure 6.1 is not really valid in
this case. If we go to three dimensions it does have a transition and it is
ferromagnetic below its critical temperature Tc, but the physics of the model
is nevertheless very different from that of the normal Ising model. In the
random-field model domains still form in the ferromagnetic regime, and there
is still a surface energy associated with the domain walls, but it is no longer
always possible to shrink the domains to reduce this energy. For each spin
in a random-field Ising model, the random field acting on it means that the
1 Also unlike true glassy systems, it usually has a unique ground state and doesn't
possess a residual entropy in the limit T —> 0. However, this is all rather beside the point
for our purposes.
2 These and other processes involved in the equilibration of the normal Ising model are
discussed in greater detail in Chapter 10.
FIGURE 6.1 When the normal Ising model is cooled from an initial
high-temperature state to a low-temperature one (T = 1.2J in this
case), we see domains of aligned spins forming which one by one
evaporate, leaving the lattice with most of its spins pointing in the
same direction.
flip, say, clusters of eight spins (in the three-dimensional case), we could get
around the problem because if one of the local fields in the cluster had a large
value in one direction, there is good chance that others would have opposite
values and cancel it out, so that the overall cost in energy of flipping the
spins in the cluster would be small. Unfortunately, this is not the case. The
overall energy change when we flip the spins in a cluster depends on the sum
Σ h_i of the local fields in the cluster, which is also a random variable.3 We
can repeat the argument we made above for this new variable to show that
if we flip groups of eight spins, the domain walls can still become pinned
at places where the sum of eight local fields is unusually large, and so the
acceptance ratio will still become exponentially small.
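To make the argument concrete, here is a small Python function of our own illustrating the field contribution to the energy change when a whole cluster is flipped; with a field term −Σ h_i s_i in the Hamiltonian, flipping spin s_i costs 2h_i s_i:

```python
def cluster_field_cost(spins, fields, cluster):
    """Field part of the energy change on flipping every spin in `cluster`.

    With the field energy -sum_i h_i * s_i, flipping s_i -> -s_i changes the
    energy by +2*h_i*s_i, so the total cost is twice the sum of the local
    fields weighted by the current spin values -- itself a random variable."""
    return 2.0 * sum(fields[i] * spins[i] for i in cluster)
```

By the central limit theorem the returned value is Gaussianly distributed over realizations of the fields, with a width growing as the square root of the cluster size, which is why pinning persists at every fixed block size.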
One possible way around this problem is to consider flipping clusters on
all different length-scales. If a certain region of spins is pinned because the
sum of the local fields is of unusually large magnitude, it may be that when
we consider that region as part of a larger one, the local fields cancel out
and the acceptance ratio becomes reasonable again. Or conversely, perhaps
it might help to break the pinned region up into smaller ones, some of which
could be quite easy to flip, even though the whole region is not. Newman and
Barkema (1996) came up with a cluster-flipping algorithm of this kind for
the random-field Ising model, based loosely on the Niedermayer algorithm
of Section 4.4.2. However, their algorithm is tailored very much to the
particular problems of the random-field model and does not work well for
other glassy spin models. In this chapter we want to concentrate instead on
universal algorithms which are good for a wide variety of systems.
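The second model is the Ising spin glass, in which each bond carries its own randomly chosen coupling. In the standard form consistent with the definitions that follow, its Hamiltonian is

```latex
H = -\sum_{\langle ij \rangle} J_{ij} s_i s_j ,
```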
where the variables Jij are the random bond strengths. This model is
one particular instance of the general class of models known as Edwards—
3 In fact, the central limit theorem tells us that it will be a Gaussianly distributed
random variable, regardless of the distribution of the underlying random fields. Coupled
with renormalization group arguments it is this fact which tells us that the properties of
the model should not depend on the distribution we choose for the random fields.
Anderson spin glasses (Edwards and Anderson 1975). Various choices for
J have been investigated. A common one is to choose J_ij = ±1 randomly
for each bond on the lattice. Another is for the bond strengths to be un-
correlated Gaussian random variables with mean zero, just as our random
fields were in the last section.
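Either realization of the randomness is easy to generate; here is a Python sketch of both common choices (the function names are ours):

```python
import random

def random_bonds_pm1(bonds, seed=None):
    """J_ij = +1 or -1 with equal probability, independently for each bond."""
    rng = random.Random(seed)
    return {bond: rng.choice((-1.0, +1.0)) for bond in bonds}

def random_bonds_gaussian(bonds, sigma=1.0, seed=None):
    """Uncorrelated Gaussian bond strengths with mean zero and width sigma."""
    rng = random.Random(seed)
    return {bond: rng.gauss(0.0, sigma) for bond in bonds}
```

Passing a seed makes the realization reproducible, which is useful when the same sample must be revisited with different algorithms or starting configurations.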
It is less clear in the case of this spin glass than it was in the case of
the random-field model why there are energy barriers in the system—why
getting from one state to another may require you to go through intermediate
states with much higher energies. However, we can see that there can be
states with very different spin configurations which have similar low energies,
since we know that we can get from any state of the lattice to any other by
flipping one or more clusters of nearest-neighbour spins. Except under very
unusual circumstances, it should be possible to find a set of such clusters
for which the sum Σ J_ij of all the interactions around their surfaces adds
up to a number less than or close to zero. In this case, we can get from our
low-energy state to another with comparable or lower energy just by flipping
this set of clusters. However, if we now try to get from one of these low-
energy states to the other using, for example, the single-spin-flip Metropolis
algorithm, we get exactly the same problem as we did in the random-field
case. Getting from the first state to the second involves flipping in turn each
spin by which they differ and it is very likely that one or more of the spins
will be unwilling to flip, because the bonds joining it to its neighbours are
unusually strong. In effect, there is an energy barrier between the two states
and the time-scale for our Monte Carlo simulation to get from one side of the
barrier to the other will be exponentially long, which makes the simulation
of large, or even moderate-sized spin glass systems prohibitively slow. As
we pointed out before, we can always overcome the barriers in the system if
we make the temperature high enough. But when kT falls below the height
of the largest barriers in the system we lose ergodicity and things become
difficult. This is the glass transition which we spoke of in Section 6.1, which
occurs around the glass temperature Tg, and we will assume in this chapter
that it is in this low-temperature regime that we are interested in simulating
the model.
Before we get into the details of algorithms for the Monte Carlo simulation
of glassy systems, we need to consider exactly what it is we want to do
with our simulation. Real spin glasses are presumably passing continually
through large numbers of states, each of which appears with its appropriate
Boltzmann weight. If however, as we have argued above, these systems
require exponentially long times to get over the energy barriers posed by the
randomness in their Hamiltonians, then presumably this phenomenon will be
visible in their measured properties. The effect we are talking about is called
ergodicity breaking. As discussed in the introduction to the chapter, the
system fails to sample all states possessing the same energy with the same
probability, because it simply can't get to some of them during the course
of an experiment. There are energy barriers in the system so high that the
system is not likely to cross them in a minute, or a year, or a century. We can
observe the system in our laboratory for as long as we like and still find it in
the same basin of low energy states, never getting beyond the surrounding
energy barriers to sample other portions of its state space.
Thus we should not necessarily count it a failing of a Monte Carlo algo-
rithm if in simulating a glassy system it too gets stuck in an energy basin
and does not sample the whole of the state space. Perhaps this is exactly
what we want it to do in order to mimic the real system. However, there
are a number of problems with this interpretation. The first is that both
the nature of the energy barriers and the regions that a Monte Carlo algo-
rithm will sample depend on the particular dynamics of the algorithm. As
we pointed out in Section 6.1, cluster algorithms for the random-field Ising
model may be able to get over energy barriers which the Metropolis algo-
rithm cannot, for example. Thus the subset of states which get sampled, and
therefore the observed value of any particular quantity (like the magnetiza-
tion or the specific heat of the model), depend on the particular algorithm
we use. This is clearly unsatisfactory: there is no way the different answers
given by different algorithms can all agree with the results of our laboratory
experiments, and how are we to know which algorithm is right and which
is wrong? It is much better to find some observable property of our model
which is independent of the algorithm we use, so that we are free to choose
any algorithm we think will be efficient, without worrying about whether it
is the "right one".
Another problem concerns the particular values of the random interac-
tions in the Hamiltonian—the realization of the randomness, as it is com-
monly called. The measured properties of any finite system are clearly going
to depend on how we choose these values. For some models this variation
averages out as the size of the system becomes larger, although for some it
does not. (The system is then said to be non-self-averaging.) However,
even in the cases where it does, it proves so hard to simulate glassy systems
that we rarely get around to systems large enough for this to be a factor.
As well as these problems, there is an additional one that a glassy system
is history dependent, which means that even for a given algorithm and a
given realization of the randomness the simulation can give different results
just because we start it in a different spin configuration, or with a different
random number seed (see Section 16.1.2). Depending on how we start the
simulation, the system can find its way into different energy basins, and
so give entirely different measurements of the magnetization or any other
parameter.
In order to avoid these various problems, calculations and simulations
of glassy systems tend to concentrate on average properties of the systems.
In particular we are usually interested in the values of quantities averaged
over many different realizations of the randomness, and in thermal averages
over the whole of phase space (to get around the problems of ergodicity
breaking and history dependence). In some sense this is rather unrealistic.
Real glassy systems are single realizations of the randomness which certainly
display both ergodicity breaking and history dependence. However, we can
cope with this by performing experiments on many different samples of a
spin glass (or one large sample, for systems which are self-averaging), or by
heating and cooling our samples to coax them into different basins of low
energy, and the advantages of having well-defined physical properties which
are independent of dynamics and starting conditions (and which therefore
allow us to compare simulation and experiment) are great enough to make
it worthwhile.4 As far as the simulation of glassy spin systems is concerned
then, what we want to find is an algorithm which can sample over the whole
of the state space without getting stuck in the metastable regions formed
by local basins of low energy. We will also want to perform our simulation
many times with different realizations of the randomness and average over
the results. This second step is trivial (although it can be time-consuming),
but the first—creation of the algorithm—is a problem which has puzzled
physicists for decades.
4
There are some glassy systems for which this is not the case. A good example is the
folding of proteins. Although at first it may not be obvious, the dynamics of this problem
is glassy in nature. But in proteins the realization of the randomness (which depends
on the sequence of amino acids), the starting conditions (which depend on the protein
translation mechanism, as well as the possible presence of "helper enzymes" which aid
the initial folding of the protein), and the kinetics of the folding process—the "algorithm"
which the protein uses to fold—are all very important in determining the conformation of
the protein. It would be quite wrong in this case to concentrate on properties which are
independent of these factors, since the measurement of interest is the final (folded) state
of the protein, rather than any thermally averaged quantity like an internal energy or a
specific heat.
6.3 The entropic sampling method 161
tells us that we can in fact sample states from any probability distribution
p_μ and still get an estimate Q_M of the observable Q which we are interested
in, provided we divide out the sampling distribution and then multiply by
the Boltzmann weights e^{−βE_μ}. In the algorithms we have seen so far in
this book, we chose the probabilities p_μ to be the Boltzmann weights, which
meant that everything cancelled out nicely in Equation (6.4) and we were
just left taking the average over our measurements Q_{μ_i} to get an estimate
of Q. This is a very convenient route to take in most cases but, as we have
argued above, it is doomed to failure in the case of glassy systems. So, as an
alternative, Lee proposed that we sample from a new distribution defined as
follows. Let ρ(E) dE be the number of states of our system in the interval
between E and E + dE. The quantity ρ(E) is the density of states for the
system, which we encountered previously in Section 4.5.3. In the entropic
sampling method, instead of sampling states with probability proportional
to e^{−βE}, we sample with probability proportional to [ρ(E)]^{−1}, the reciprocal
of the density of states. In other words, states in ranges of E where there
are many states, so that the density of states is high, are sampled with
lower probability than those in ranges where there are few states. (As discussed
below, the big problem with the method is knowing what the density of
states actually is for a given energy. However, for the moment let us assume
that, by some magical means, we have this information.)
Why do we do it this way? Well, consider now the probability of sampling
a state in a given small region of energy in any particular Monte Carlo step.
The number of states in our region is just ρ(E) dE, and we sample each of
them with a probability proportional to [ρ(E)]^{−1}, so that the total probability
of sampling a state somewhere in the region is proportional to
ρ(E) dE × [ρ(E)]^{−1} = dE, which is independent of E. In other words, the
algorithm samples all energies with equal probability.
There's bad news and good news about this equation. The bad news is
that for a given number of samples it doesn't give as accurate an answer
as normal importance sampling using Boltzmann weights. The reason is
that, since the entropic sampling method samples states of all energies, it
will certainly sample some states whose energies are high enough that the
factor e^{−βE_{μ_i}} in this equation is tiny. These states will make a negligible
contribution to Q_M and therefore we are, in a sense, wasting CPU time
putting them in. On the other hand, these states were the very reason we
resorted to entropic sampling in the first place; they, or some of them at least,
are precisely the high-energy states at the top of the energy barriers which
we need to be able to reach in order to cross from one basin to another in
the state space. Thus there is really no escaping them—we have them there
for a good reason, even though they don't contribute any accuracy to our
final estimate of the observable Q. There can also be states for which ρ(E)
is very small, and these also make a small contribution to Q_M. Whether
this is a problem or not depends on the form of the density of states.
And how about the good news? Well, to understand this take another
look at Equations (6.5) and (6.9). Notice that the sampling probability p_μ is
independent of temperature. This means that the sample of states which the
algorithm chooses does not depend on what temperature we are interested
in. The temperature only enters the calculation in Equation (6.9), where we
reweight our samples to get an answer. However, we can apply this equation
after our simulation is finished and all the samples have been taken. In
other words, simply by entering different values of β into Equation (6.9), we
can calculate the value of Q for any temperature we please from the same
set of samples. We only have to run the simulation once, and we get data
for the entire temperature range from zero to infinity. This makes up for
the decreased accuracy of each estimate, because we are usually willing to
perform a longer simulation to recover the lost accuracy if we know that
we will only have to perform one such simulation, rather than one for every
temperature we are interested in.
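To make the reweighting step concrete, here is a minimal sketch of the calculation, our own illustration rather than code from the book; it weights each stored sample by ρ(E_i) e^{−βE_i}, following the prescription described above, and the function and variable names are invented for the example.

```python
import math

def reweight(energies, q_values, log_rho, beta):
    """Estimate <Q> at inverse temperature beta from one entropic sampling
    run, weighting each sample by rho(E_i) * exp(-beta * E_i).

    energies -- energy E_i of each sampled state
    q_values -- measurement Q_i taken in that state
    log_rho  -- function giving ln rho(E), the log density of states
    """
    # Work with logarithms throughout: rho(E) can span hundreds of orders
    # of magnitude, so exponentiating directly would overflow.
    log_w = [log_rho(E) - beta * E for E in energies]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * qi for wi, qi in zip(w, q_values)) / sum(w)
```

Calling this once for each temperature of interest, `[reweight(E, Q, log_rho, b) for b in betas]`, turns the single stored set of samples into results for a whole temperature sweep.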
There are a number of reasons why in practice this expression gives a poor
result for the density of states (see below), but in most cases it is at least
a better estimate than the one we started off with. So we can use this new
estimate, which we call ρ_2(E), as the starting point for another simulation,
and repeat the process over and over until we are satisfied with the results.
In general, the density of states at the (n+1)th step is related to that at the
nth by ρ_{n+1}(E) = ρ_n(E) h_n(E), where h_n(E) is the histogram of the energies
of the states sampled during the nth iteration. Equivalently, writing
S(E) = log ρ(E), we have S_{n+1}(E) = S_n(E) + log h_n(E). One problem with
this prescription is that some bins of the histogram may be empty, h(E) = 0, which makes
the term log h(E) infinite. A simple way around this is simply to leave the
value of S(E) untouched in this case. In other words, the actual algorithm
updates S(E) only for those energies whose histogram bins are non-empty.
A more serious problem with the entropic sampling method is its rate of
convergence, which in many cases is extremely poor indeed. It is clear that
if, as we have claimed, the density of states does have a very large dynamic
range, and if we start off by approximating it with a flat distribution, then
there are inevitably going to be parts of the function which are overestimated
by many orders of magnitude, relative to other parts. Bearing in mind
that the magnitude of the function h(E) varies between its most and least
populated bins by at most a factor of the number of samples taken in one
iteration of the calculation, we can see that it could take many such iterations
to arrive at the true functional form for p(E). There are sometimes a few
tricks one can play to get around this problem, but by and large the method
simply is slow to converge, and this is its primary shortcoming.
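To make the scheme concrete, the following sketch (our own construction, not code from the book) performs iterations of entropic sampling on a small one-dimensional Ising model: single-spin-flip moves are accepted with probability min[1, ρ(E_μ)/ρ(E_ν)], which samples states with probability proportional to 1/ρ(E), a histogram h(E) is accumulated, and log h(E) is folded into the running estimate S(E) = log ρ(E) for the non-empty bins only.

```python
import math, random

def energy(s):
    """Energy of a 1D Ising chain with periodic boundaries, J = 1."""
    N = len(s)
    return -sum(s[i] * s[(i + 1) % N] for i in range(N))

def entropic_iteration(s, S, steps, rng):
    """One iteration of entropic sampling: sample states with probability
    proportional to 1/rho_est(E) = exp(-S[E]), then fold the visit
    histogram h(E) into the running estimate S(E) = ln rho_est(E)."""
    E = energy(s)
    h = {}
    for _ in range(steps):
        i = rng.randrange(len(s))
        # energy change from flipping spin i (s[i-1] wraps via Python indexing)
        dE = 2 * s[i] * (s[i - 1] + s[(i + 1) % len(s)])
        Enew = E + dE
        # accept with min(1, rho(E)/rho(Enew)); unvisited energies get S = 0
        if rng.random() < math.exp(min(0.0, S.get(E, 0.0) - S.get(Enew, 0.0))):
            s[i] = -s[i]
            E = Enew
        h[E] = h.get(E, 0) + 1
    for e, n in h.items():          # empty bins are simply left untouched
        S[e] = S.get(e, 0.0) + math.log(n)
    return S
```

Repeating `entropic_iteration` many times drives the histogram flat, at which point differences S(E) − S(E′) converge to the true log density-of-states ratios (the overall additive constant is never determined, but it cancels out of every average).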
We note that states μ and ν with similar energies are likely to have similar
values of ρ(E), which makes it possible for both of the acceptance ratios in
this equation to be close to one. In other words, it is a good idea to choose
moves which take us to states with energies close to the energy of the current
state. As with the algorithms of Chapter 3, a simple way to achieve this is
to use single-spin-flip dynamics. (Other choices are possible, but this one
will do fine for now.) In other words, our new state ν is generated by taking
the present state μ and flipping just one spin, chosen at random. Each of
the N possible such states is generated with equal probability 1/N, and all
FIGURE 6.2 The logarithm S(E) of the density of states of the three-
dimensional random-field Ising model on an 8 × 8 × 8 cubic lattice with
random fields ±h with h = 2.0. The curves show the best estimate of
S(E) at intervals of about 20 iterations, with the top curve being the
final converged result. The entire calculation took about 150 iterations.
the other states of the lattice are generated with probability zero. Then the
selection probabilities g(μ → ν) and g(ν → μ) in Equation (6.18) drop out
and we are left with the acceptance ratio

A(μ → ν) = min[1, ρ(E_μ)/ρ(E_ν)].
This then defines the algorithm. We take our first estimate of the den-
sity of states, start flipping spins and accepting the flips according to this
acceptance ratio. When we have accumulated enough samples to make a
reasonable histogram we stop, recalculate ρ(E) from Equation (6.14) (or al-
ternatively Equation (6.17)) and then repeat the whole process. In practice
a good criterion for telling when to finish a particular iteration is to count
the average number of samples in each non-empty bin. (It is normal for
many of them to be empty, especially in the early stages of the calculation.)
When this number reaches a certain predetermined level, the iteration ends
and you recalculate ρ(E). In practice, quite small numbers of samples give
perfectly adequate results—ten or so per bin is a reasonable figure. The
reason is that it is more important to get through a large number of iterations
168 Chapter 6: Disordered spin models
random fields. Note that this averaging will also contribute an error to our
value for the magnetization (or any other quantity). Ideally we should divide
the available CPU time between our many simulations in such a way that the
statistical errors introduced by each individual run are of the same order of
magnitude as those introduced by averaging over the randomness. It would
be a waste of CPU time to perform a simulation long enough to reduce the
sampling error to a level where it was swamped by variations due to the
randomness. And conversely it would be a waste to perform so many runs
that the average over the realizations of the randomness was known much
more accurately than the statistical error in the individual measurements.
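One simple way to check this balance in practice, a sketch of our own devising rather than a recipe from the text, is to decompose the observed spread of the per-realization averages into its two contributions: the statistical error of the individual runs and the genuine variation between realizations of the randomness.

```python
def error_budget(run_means, run_stderrs):
    """Split the variance of per-realization averages into a sampling part
    (statistical error of each run) and a disorder part (genuine variation
    between realizations of the randomness).

    run_means   -- the average of the observable from each realization
    run_stderrs -- the statistical standard error of each of those averages
    """
    n = len(run_means)
    mean = sum(run_means) / n
    # observed variance of the per-run means across realizations
    total_var = sum((m - mean) ** 2 for m in run_means) / (n - 1)
    # average sampling variance contributed by the finite run lengths
    stat_var = sum(e * e for e in run_stderrs) / n
    # what remains is attributed to the disorder itself
    disorder_var = max(0.0, total_var - stat_var)
    return mean, stat_var, disorder_var
```

If `stat_var` comes out much larger than `disorder_var`, the individual runs are too short; in the opposite case the CPU time would have been better spent simulating more realizations of the randomness.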
in each of the two simulations to those in the other. The result is that the
higher-temperature simulation helps the lower-temperature one across the
energy barriers in the system. The higher-temperature one crosses barriers
with relative ease, and when it gets to the other side, we swap the states
of the two models, thereby "carrying" the lower-temperature model across
a barrier which it otherwise would not have been able to cross. We now
describe the algorithm in detail, for the simplest case in which we do just
two simulations of the system at the same time.
Consider a glassy system, the random-field Ising model of the last section,
for example, or the Ising spin glass described by the Hamiltonian (6.3). The
system has a temperature—Tc in the case of the random-field model, or the
glass temperature Tg in the spin glass case—below which glassy behaviour
sets in and simulation becomes difficult. If we are interested in measuring
properties of the system at some temperature T_low below this transition,
we can do so by performing, simultaneously, two simulations, one at T_low
and another at a higher temperature T_high, which is above the transition.
By definition, the higher-temperature simulation will not show ergodicity
breaking, and we assume therefore that it can sample uniformly over the
state space of the model with reasonable ease. The lower-temperature one
on the other hand will probably get stuck in some basin of low energies and
be unable to get out. Our algorithm then is as follows. At each time step we
do one of two things. On the majority of time steps, we simply do one step in
the simulation of each of the two systems. For example, with our spin models
we could do one single-spin-flip Metropolis Monte Carlo step, exactly as in
Chapter 3. However, every so often (exactly how often is discussed below)
instead of doing this we calculate the difference in energies of the current
states of the two simulations, ΔE = E_high − E_low, and we swap the values
of every spin in the two with an acceptance probability

A = min[1, e^{(β_high − β_low)ΔE}],

where β_low and β_high are the inverse temperatures of the two simulations.
The first step in understanding why this algorithm works is to prove that it
satisfies ergodicity and detailed balance.
Proving ergodicity is straightforward. We already know that if we simu-
late a single system using the Metropolis algorithm we achieve ergodicity. If
we therefore follow the state of one of our two systems until a swap occurs,
and then follow the other and so on, it is clear that we can reach any state
in a finite number of moves. However, if one system can reach any state in
a finite number of moves, then so by symmetry can the other, and we are
guaranteed ergodicity. The same argument also applies if we use an algo-
rithm other than the Metropolis algorithm for the individual steps of the
two simulations.
Detailed balance is a little more tricky. To prove it, let us consider the
6.4 Simulated tempering 171
joint probability p_{μν} that the low-temperature system is in state μ and the
high-temperature one in state ν. We want the equilibrium value of this
probability to reflect the desired Boltzmann distribution of states in both
systems. In other words, we would like to have

p_{μν} = (e^{−β_low E_μ}/Z_low) (e^{−β_high E_ν}/Z_high),

where Z_low and Z_high are the partition functions of the two systems. The
condition of detailed balance, Equation (2.12), tells us that we can achieve
this if we ensure that the transition probabilities P(μν → μ′ν′) and P(μ′ν′ →
μν) satisfy

P(μν → μ′ν′)/P(μ′ν′ → μν) = p_{μ′ν′}/p_{μν}.

For an ordinary Monte Carlo step on the low-temperature system alone we
have ν′ = ν, and this condition reduces to

P(μ → μ′)/P(μ′ → μ) = e^{−β_low(E_{μ′} − E_μ)},
which is just the normal detailed balance condition for a single simulation.
Thus any Monte Carlo move which satisfies this condition, such as the single-
spin-flip Metropolis one, will correctly preserve detailed balance. The proof
that ordinary Monte Carlo steps on the high-temperature system also satisfy
Equation (6.23) is identical.
The swap moves are only a little more complicated. In these moves we
are swapping the states of the two systems, so μ′ = ν and ν′ = μ. Thus our
equation becomes

P(μν → νμ)/P(νμ → μν) = p_{νμ}/p_{μν} = e^{(β_high − β_low)(E_ν − E_μ)}.
It takes only a moment to verify that the acceptance ratio given in Equa-
tion (6.21) satisfies this condition.
Thus each of the three types of move in the algorithm satisfies detailed
balance, and so, therefore, does the entire algorithm. All the proofs of Chap-
ter 2 then apply to the equilibration of the algorithm and at equilibrium we
know that any joint state uv of the combined systems will appear with the
probability given in Equation (6.22). Thus, each state in each of the simu-
lations appears with exactly its correct Boltzmann weight, so we can make
measurements in tempering simulations in the normal fashion—we sample
the observable of interest at intervals throughout the simulation (after wait-
ing a suitable time for equilibration) and take the mean of those samples.
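The two-temperature algorithm just described can be sketched as follows, here for a small random-field Ising chain. This is our own illustrative implementation with invented names, not code from the book: ordinary Metropolis sweeps are performed on both replicas, and every so often a swap of the two spin configurations is proposed and accepted with a Metropolis-type probability that follows from the detailed balance condition for the joint distribution.

```python
import math, random

def rfim_energy(s, h):
    """Energy of a periodic random-field Ising chain: E = -sum s_i s_{i+1} - sum h_i s_i."""
    N = len(s)
    return (-sum(s[i] * s[(i + 1) % N] for i in range(N))
            - sum(hi * si for hi, si in zip(h, s)))

def metropolis_sweep(s, h, beta, rng):
    """One sweep of single-spin-flip Metropolis dynamics at inverse temperature beta."""
    N = len(s)
    for _ in range(N):
        i = rng.randrange(N)
        dE = 2 * s[i] * (s[i - 1] + s[(i + 1) % N] + h[i])
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            s[i] = -s[i]

def tempering(h, beta_low, beta_high, sweeps, swap_every, rng):
    """Simulated tempering with two replicas; returns the two final
    configurations and the number of accepted swap moves."""
    N = len(h)
    low = [rng.choice([-1, 1]) for _ in range(N)]
    high = [rng.choice([-1, 1]) for _ in range(N)]
    accepted = 0
    for t in range(1, sweeps + 1):
        metropolis_sweep(low, h, beta_low, rng)
        metropolis_sweep(high, h, beta_high, rng)
        if t % swap_every == 0:
            dE = rfim_energy(high, h) - rfim_energy(low, h)
            # swap accepted with probability min(1, exp[(beta_high - beta_low) dE])
            if rng.random() < math.exp(min(0.0, (beta_high - beta_low) * dE)):
                low, high = high, low   # exchange the configurations
                accepted += 1
    return low, high, accepted
```

Note that the swap itself is just an exchange of references to the two configurations, the cheap pointer-style swap mentioned in the text; nothing is copied spin by spin.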
The reason why the method gets us over energy barriers is intuitively
clear. Since the high-temperature simulation is performed at a temperature
above the glass transition of the system, it shows no ergodicity breaking, and
can find its way about the state space of the system on a time-scale similar to
that of a simulation of a normal, non-glassy system. If we start off both our
simulations in a deep energy basin then the low-temperature one will remain
stuck there, while the high-temperature one moves over the surrounding en-
ergy barriers and into other regions of state space. Now suppose we attempt
to swap the states of the two simulations. Equation (6.21) says that if the
swap would increase the energy of the low-temperature simulation by a great
deal then it is unlikely to be accepted. On the other hand, if, as it will from
time to time, the high-temperature system finds its way into a region of low
energy—another basin—then the move will quite likely be accepted, and we
will perform the swap. Thus the low-temperature system is transported in
one move to another energy basin, and the high-temperature one finds itself
back in the basin that it started in. Now the process repeats, and over a long
simulation the low-temperature simulation is moved repeatedly to new en-
ergy basins. Thus the simulated tempering algorithm effectively overcomes
the problems of barrier crossing which make simulation of glassy systems so
hard and allows us to sample a significant fraction of our state space, while
still sampling with the correct Boltzmann weights for a temperature below
the glass transition.
One question which remains to be settled is how often we should perform
the swap moves. It is clearly not a good idea to perform them too often: if
we swap the states of our two simulations, and then immediately swap them
back again, it is very likely that the low-temperature one will be dumped
back in the basin that it only just escaped from, which defeats the point of
the exercise. On the other hand, we want to perform swaps as often as is
practical, since otherwise we waste time doing the high-temperature simu-
lation and not making use of the results. (The only reason we perform the
high-temperature simulation is to help us in performing the low-temperature
one; if we were actually interested in simulating the model above the glass
interchanging the spin states and leaving the temperatures unchanged, and
is much faster.8
6.4.2 Variations
A number of variations are possible on the basic scheme described above.
One common one is to perform the basic Metropolis or other Monte Carlo
steps on the two systems at different rates. We have to wait about one corre-
lation time of the high-temperature system between swap moves, but there
is no reason why it should be optimal to perform the same number of moves
on the low-temperature system during this time. In particular, the basins
which the low-temperature system finds itself in may be very small, which
means that it does not take very many moves to sample them thoroughly,
and performing any more is a waste of computer resources. In this case it
is desirable to perform a smaller number of moves in the low-temperature
system than in the high. For example we might perform one Monte Carlo
step on the low-temperature system for every two in the high-temperature
one. An extreme example of this problem is when the low-temperature sys-
tem is at T = 0, or close to it. In this case, the low-temperature system will
simply sink to the lowest energy state in the local basin and then stay there,
unmoving, until a swap move occurs and moves it to another basin. Clearly,
once the local minimum of energy has been found there is no point in ex-
pending any more time on the low-temperature system, since we know that
all moves are going to be rejected. So we can save time by simply forgetting
the low-temperature system and simulating the high-temperature one until
one correlation time has passed, and then attempting a swap. This is not a
very likely scenario. We are not often interested in the behaviour of glassy
systems at absolute zero. But it serves to illustrate the point.
Another common variation on the algorithm is to simulate at more than
just two temperatures. There is in many cases a distinct advantage to adding
extra simulations at temperatures intermediate between the two discussed so
far. The reason is that, as pointed out in Section 1.2.1, the range of energies
sampled by a system at a given temperature is usually quite small, and
decreases with increasing system size as 1/√N. However, as Equation (6.21)
indicates, for the swap moves in the simulated tempering algorithm to work,
the two states to be swapped must be of comparable energies—the state of
the high-temperature system must be a viable state at the lower temperature
in order to be accepted. In effect, this means that the ranges of energy
sampled by the simulations at the two different temperatures must overlap.
As system sizes get larger and the ranges get smaller, this becomes less and
8
An alternative in programming languages such as Pascal and C which provide pointers
is to interchange the values of pointers to the spin arrays, leaving the arrays themselves
unchanged.
less likely, unless the temperatures of the two are very close together. We
can get around this problem by adding one or more extra simulations at
temperatures in between the first two. As Figure 6.4 shows, a new middle
simulation can sample states over a range of energies which permits it to
overlap with the ranges of both the other simulations and so to swap states
with them easily. The middle simulation (or simulations) effectively acts as
a "conveyor" of states between the simulations at the highest and lowest
temperatures. Generalization of the algorithm to more than two simulations
is very straightforward. Each simulation attempts to swap states with the
one at the next highest temperature at regular intervals determined by the
correlation time of the hotter of the two. The acceptance ratios are exactly
as in Equation (6.21). Of course, the more simulations you do at once, the
more computer time the method demands, which again places limits on the
size of the system you can study. However, the method correctly samples
the Boltzmann distribution at all of the temperatures simulated, and so
gives results at as many temperatures as there are simulations, so the extra
computer time is by no means wasted. It does not give results for the entire
continuum of temperatures in the range simulated, as the entropic sampling
method does. However, it is possible to interpolate between temperatures
using the "multiple histogram method" described in Section 8.2, which gives
results of accuracy comparable to entropic sampling.9
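A practical way to check whether two neighbouring temperatures in the ladder are close enough to swap, again a sketch of our own rather than a recipe from the book, is to measure how much the energy histograms of the two simulations overlap:

```python
from collections import Counter

def histogram_overlap(energies1, energies2, bin_width=1.0):
    """Shared probability mass of two energy samples, between 0 (disjoint)
    and 1 (identical); adjacent tempering temperatures should keep this
    well away from zero, or swap moves will almost never be accepted."""
    def hist(es):
        counts = Counter(round(e / bin_width) for e in es)
        n = len(es)
        return {k: v / n for k, v in counts.items()}
    h1, h2 = hist(energies1), hist(energies2)
    # sum of the smaller of the two normalized frequencies in each bin
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))
```

Running short plain Metropolis simulations at each proposed temperature and computing this overlap for each adjacent pair gives a quick diagnostic for spacing the ladder before committing to a long tempering run.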
Another slant on the method is to implement it on a parallel computer.
The algorithm parallelizes very nicely, since the individual simulations can
9
In fact, as we will see in Section 8.2.1, the histogram interpolation method gives accu-
rate interpolation only between simulations which sample overlapping sets of states. Since
this is also precisely the requirement for the simulated tempering method, the histogram
method almost always works well with results from simulated tempering calculations.
10
This explains why Monte Carlo moves were carried out on all systems at the same
rate, since the multispin coding method requires it. It is possible that an algorithm which
updated different systems at different rates could make more efficient use of CPU time,
but the speed advantage of using multispin coding is a significant one, so one should not
assume this.
Problems
6.1 Consider a simple system with only three states, such that one state
forms an energy barrier between the other two thus:
fits this description, proposed by Mattis (1976), is to let J_ij = ε_i ε_j, where
ε_i = ±1 are random constants defined on the sites of the lattice. Show that
this choice of J_ij does not give a glassy model. Show further, however, that
if we just choose J_ij = ±1 at random, the chances of coincidentally making
the Mattis choice go down exponentially with increasing system size.
6.3 Calculate the density of states for a one-dimensional Ising model with
N sites and periodic boundary conditions. (Hint: you may want to refer
to the solution of the model in Problem 1.4.) Use this to calculate the ap-
propriate acceptance ratio for a single-spin-flip entropic sampling algorithm
for the model. (If you feel like it, you could write a program to implement
this algorithm and check that it does indeed sample all energies with equal
probability.)
6.4 In Section 6.4.2 we explained that in order for the parallel tempering
algorithm to work there needs to be significant overlap of the energy ranges
sampled by the different simulations. How does the number of simulations
necessary to satisfy this requirement vary with the size of the system being
simulated?
6.5 Write a program to apply the simulated tempering algorithm to the
random-field Ising model in three dimensions. (Hint: the simplest approach
is probably just to use a single-spin-flip Metropolis algorithm for the individ-
ual simulations. Try the calculation on a relatively small system at first—say
4 x 4 x 4—so that you don't have to perform too many simulations at once.)
7
Ice models
Most of the models we have studied in the preceding chapters of this book
have been variations of the Ising model. Although Ising models are the most
important and best-studied class of models in statistical mechanics, it is
useful for us to study other classes as well, since there are many concepts
in the design of Monte Carlo algorithms which only arise when we look at
other models. In this chapter we introduce the so-called ice models, which,
as the name suggests, model some of the thermodynamic properties of ice,
though they do so only in a rather primitive fashion. Ice models are similar
to Ising models in that the microscopic variables making up the model are
discrete quantities having just two possible values, positioned on a lattice.
However, unlike the models of the previous chapters, these variables are now
located on the bonds of the lattice, rather than the sites, and, as we shall
see, this, in combination with some other features of ice models, makes for
some interesting challenges in the design of Monte Carlo algorithms.
FIGURE 7.1 The structure of the oxygen atoms and the hydrogen
bonds between them in hexagonal ice Ih (left) and cubic ice Ic (right).
structure of ice, and in the early years of the twentieth century Bragg and
others started experimenting with X-ray diffraction to try to determine this
structure. The crucial breakthrough was made by Barnes (1929), who was
the first to map the structure now known as hexagonal ice or Ih, which
is the normal form of ice at atmospheric pressure. The arrangement of the
oxygen atoms in hexagonal ice is shown on the left-hand side of Figure 7.1.
Since X-rays are scattered primarily by the electron distribution in a crystal,
the experimental signal measured in scattering studies is mostly due to the
oxygen atoms, and it is difficult to detect the small contribution from the
electron cloud around the hydrogen atoms (protons). However, since the X-
ray results show no evidence of any superlattice lines, we can conclude that
the protons are not arranged in a unit cell which is larger than that of the
oxygens. Given that the unit cell of the oxygens contains only four oxygen
atoms, this does not leave room for a very large number of different proton
arrangements, and Barnes suggested the most probable arrangement to be
one in which each proton is located exactly half-way along the line joining
the centres of two adjacent oxygens.
Barnes' conjecture has not stood the test of time however. Later and
more sensitive scattering experiments indicated clearly that protons are not
shared equally between two oxygen atoms, but that each proton stays close to
one oxygen atom only. Furthermore, two and only two protons are associated
with each oxygen, so that the integrity of the water molecules as structural
units is preserved. Building on these experimental results, Bernal and Fowler
in 1933 proposed a number of alternative proton arrangements. They were
however unable to find any periodic arrangement which had a repeat size as
small as the measured unit cell of the oxygen structure; all their proposed
structures had a larger unit cell, in direct opposition to earlier experimen-
tal findings. This apparent contradiction ultimately led Bernal, Fowler and
7.1 Real ice and ice models 181
FIGURE 7.2 The phase diagram of ice. Dotted lines indicate first-
order phase boundaries dividing pairs of phases each of which can
exist metastably in the region of stability of the other.
others to conclude that the protons in ice cannot in fact be arranged peri-
odically at all, but must instead be disordered. In their paper, Bernal and
Fowler wrote: "It is quite conceivable and even likely that at temperatures
just below the melting point the molecular arrangement is still partially or
even largely irregular, though preserving at every point tetrahedral coordi-
nation and balanced dipoles. In that case ice would be crystalline only in
the positions of its molecules but glass-like in their orientation."
Since then, it has been shown that ice can assume a large number of other
stable structures at different temperatures and pressures. On the right-hand
side of Figure 7.1, for example, we show the oxygen structure of cubic ice
(Ic), which is a metastable structure which exists in approximately the same
temperature and pressure regimes as hexagonal ice. The many different
ice structures all have one important feature in common: they obtain their
strength from the existence of hydrogen bonds between the oxygen atoms—
four hydrogen bonds per oxygen. A phase diagram of ice is sketched in
Figure 7.2. This figure may well be incomplete. It includes the most widely
accepted phases known at the time of the writing of this book, but a number
of others have been tentatively identified in the literature, and some of these
may well have undergone more thorough experimental verification by the
time you read this. For example, it has recently been suggested that at
very low temperatures ice Ih becomes a metastable phase, and that given
enough time it will transform into another phase, called ice XI. Also at
low temperatures it has been suggested that ice VI, which is tetragonal,
transforms into an orthorhombic phase called ice VI'. Most transitions that
occur under a lowering of the temperature involve only an ordering in the
proton arrangement, while the oxygen structure stays unaltered. This is true
for the transition from ice VII to ice VIII, from ice II to ice IX, and from ice Ih
and VI to ice XI and VI′ respectively, if the latter exist. These transitions
are very difficult to observe experimentally, since the proton configuration
becomes frozen below about 100 K and some trickery has to be employed if
equilibrium is ever to be reached.1
temperature. The second ice rule is violated if either one or three protons
are located near an oxygen atom, a situation which we call an ionic defect.
Ionic defects cost even more energy than Bjerrum defects and are therefore
even more scarce.
In this chapter we study the class of models known as ice models, which
are models which obey the ice rules exactly. The low frequency with which
the ice rules are violated makes these models rather a good description of
the behaviour of the protons in real ice. As we will see however, although
models based on the rules are easily constructed, it is by no means straight-
forward to determine their properties. It took over thirty years from the first
formulation of the ice rules before an exact solution to even the simplest ice
model was found, and most ice models are not solved to this date. Instead,
therefore, researchers have turned to computer simulation, and in this chap-
ter we study the problem of designing an efficient Monte Carlo algorithm
for an ice model, and give a selection of results to demonstrate the kind of
insight we can gain from our simulations.
is 6^N, where N is the number of oxygen atoms. This is not a good estimate
of the number of ground states of the system however; ignoring the first ice
rule introduces a big error, since only a small fraction of these 6^N states
do not violate the first ice rule. This is easy to see, since in any one of the
states, the probability for any hydrogen bond to be occupied by exactly one
proton is only 1/2, and if any single bond is either doubly occupied, or not
occupied at all, that state should be disallowed as a possible ground state
of the system. However, this error should be easy to correct for. First note
that there are 2N hydrogen bonds on any ice lattice, since each oxygen has
four bonds and each bond is shared between two oxygens. Therefore, if the
probability of any one bond being correctly occupied by a single proton is
1/2, then the probability of all of them being so occupied is 2^{−2N}. Thus, we
estimate the number of states obeying both ice rules to be 6^N/2^{2N} = (3/2)^N
and so the residual entropy is

S = k log(3/2)^N = Nk log 3/2 ≈ 0.405 Nk.
This value agrees well with experimental values, and for a long time it was
believed to be an exact answer. As we now show however, this is not the
case.
To investigate Pauling's argument in more detail, we will look at the
simplest model system obeying the ice rules, square ice.2 In square ice,
the oxygens are arranged on the vertices of a square grid—the simplest
lattice with the required coordination number of four—and each oxygen
has a hydrogen bond with its neighbours to the left, right, up and down,
represented by the lines of the grid. Commonly, an arrow is drawn on each
bond to represent the state of the corresponding proton: the arrow points
2 The name "square ice" is used specifically to refer to the ice model on a square
lattice in which all configurations are assigned the same energy. In Section 7.6 we will
study a number of other ice models, such as the KDP and F models, in which different
configurations are assigned different energies. Although these models are also ice models
and are also defined on a square lattice, they are not also called square ice. This is
somewhat confusing, but since the terminology is widely used we will follow convention
and use it here too.
7.1 Real ice and ice models 185
FIGURE 7.4 The 82 different ways in which the six types of vertices
can be combined on a small 2 x 2 segment of a larger lattice. All of
the 82 configurations can be transformed into one of the four shown
here by rotation or reflection, or a combination of the two.
towards the end of the bond nearest the proton. The second ice rule then
requires that each vertex have exactly two arrows pointing towards it, and
two pointing away. This allows for exactly six types of vertices, as we pointed
out above, and has led to the alternative name six-vertex models being
applied to ice models, a name which you may well run across in the literature
from time to time. The six vertices are illustrated in Figure 7.3.
Let us now zoom in on a small piece of the lattice, containing only four
sites arranged in a square (see Figure 7.4). The arrows around any one
of these four vertices can take any of the six allowed configurations, giving
a total of 6^4 = 1296 possible states overall. However, four of the bonds in this
small piece of lattice are shared between two sites, so that choosing the
directions of the arrows around one vertex affects the possible directions at
the neighbouring vertices. In the language of Pauling's argument, we can
correctly assign two protons to each of the four vertices, which is equivalent
to drawing two arrows pointing in towards each one and two pointing out,
but that does not guarantee that there will be only one arrow on each bond.
In fact there is only a 50% chance that any given bond will have exactly
one arrow on it, and thus only a fraction 2^{-4} of the possible arrangements
of the arrows correctly obey the first ice rule. Thus we expect to find a
total of 6^4/2^4 = 81 ways to arrange the arrows. However, if we count
the possible configurations explicitly we find that there are in fact 82 of
them, as Figure 7.4 demonstrates. This is a strong indication that Pauling's
reasoning is not correct. In fact there is a flaw in his argument, which is
that it assumes—incorrectly—that the probabilities for different hydrogen
bonds to be singly occupied are independent of each other.
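The counting for the 2 x 2 segment is small enough to check by brute force. The following sketch (our own illustration, not from the text) enumerates all 2^12 arrow configurations on the twelve bonds of the segment and keeps those which satisfy the ice rule at all four vertices:

```python
from itertools import product

# The 2x2 segment has four vertices and twelve bonds: four internal bonds
# shared between two vertices, and eight external bonds with a free end.
def count_2x2_ice_states():
    # vertices indexed (x, y) with x, y in {0, 1}
    # ('h', x, y) runs from vertex (x, y) to (x + 1, y), x = -1, 0, 1
    # ('v', x, y) runs from vertex (x, y) to (x, y + 1), y = -1, 0, 1
    bonds = [('h', x, y) for x in (-1, 0, 1) for y in (0, 1)] \
          + [('v', x, y) for x in (0, 1) for y in (-1, 0, 1)]
    count = 0
    for arrows in product((+1, -1), repeat=len(bonds)):
        conf = dict(zip(bonds, arrows))   # +1: arrow points in +x or +y
        ok = True
        for vx, vy in product((0, 1), repeat=2):
            # number of arrows pointing into vertex (vx, vy)
            inward = (conf[('h', vx - 1, vy)] == +1) + (conf[('h', vx, vy)] == -1) \
                   + (conf[('v', vx, vy - 1)] == +1) + (conf[('v', vx, vy)] == -1)
            if inward != 2:
                ok = False
                break
        if ok:
            count += 1
    return count

print(count_2x2_ice_states())   # 82, not the 81 of Pauling's estimate
```

The enumeration confirms the count of 82 quoted in Figure 7.4.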
Although this argument demonstrates that Pauling's formula for the en-
tropy is only approximately correct, it does not tell us how to put the formula
right, and in fact a general answer to this problem still escapes us. However,
in 1967, Lieb gave the solution for square ice, employing a rather lengthy
argument involving transfer matrices. For the details of Lieb's derivation
the interested reader is referred to his paper (Lieb 1967). Here we just quote
186 Chapter 7: Ice models
his result:

    W = (4/3)^{3N/2}, or equivalently S = (3/2) Nk log(4/3).
Structures other than square ice are expected to have entropies differing
slightly from this figure. Although we don't have an exact solution for any
three-dimensional case, some very accurate estimates have been made using
series expansions. For instance, Nagle (1966) obtained the figure

    W^{1/N} = 1.50685 ± 0.00015

for the entropy of hexagonal ice Ih. When converted into the units used in
the experiments, this comes to S = 0.8154 ± 0.0002 cal K^{-1} mole^{-1}, which
is in excellent agreement with the experimental result of Giauque and Stout
quoted earlier. Note also that Pauling's estimate is only a few tenths of a
per cent different. At the time of the writing of this book the accuracy of
even the most recent experiments is insufficient to prove him wrong.
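As a quick numerical check (our own sketch, using the modern value of the gas constant), Pauling's estimate and Lieb's exact figure for square ice work out as follows:

```python
import math

R = 1.9872                      # gas constant, cal K^-1 mole^-1

# Pauling: W = 6^N / 2^(2N) = (3/2)^N, so S = R log(3/2) per mole
S_pauling = R * math.log(3 / 2)

# Lieb's exact result for square ice: W^(1/N) = (4/3)^(3/2)
w_lieb = (4 / 3) ** 1.5
S_lieb = R * math.log(w_lieb)

print(round(S_pauling, 4))      # 0.8057
print(round(w_lieb, 4))         # 1.5396
```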
FIGURE 7.5 A three-colouring of a square lattice and the corresponding configuration of the arrows in square ice.
modulo three. The only way to get back to the colour we started with when
we have gone all the way around is if we increase twice and decrease twice.
This means that the vertex we walk around must have two ingoing and
two outgoing arrows, exactly as we desire. Thus each configuration of the
three-colour model corresponds to a unique correct configuration of square
ice.
We can also reverse the process, transforming an ice model configuration
into a three-colouring. We are free to choose the colour of one square on
the lattice as we wish, but once that one is fixed, the arrows on the bonds
separating that square from each of its neighbours uniquely determine the
colour of the neighbouring squares, and, by repeated application of the rule
given above, the colour of all the rest of the squares in the lattice. Thus,
the number of ways in which the squares of the lattice can be coloured is
exactly the number of configurations of the ice model on the same lattice,
except for a factor of three.
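The correspondence between colourings and arrow configurations can be checked mechanically. In this sketch (the plaquette indexing and arrow sign conventions are our own choices; any consistent convention works) we convert a proper three-colouring into bond arrows and verify the ice rule at every vertex:

```python
def colouring_to_arrows(p, L):
    """Convert a proper three-colouring p[x][y] of the plaquettes
    (plaquette (x, y) has lower-left corner at vertex (x, y)) into arrows:
    h[x][y] = +1 if the bond from vertex (x, y) to (x+1, y) points right,
    v[x][y] = +1 if the bond from vertex (x, y) to (x, y+1) points up."""
    h = [[0] * L for _ in range(L)]
    v = [[0] * L for _ in range(L)]
    for x in range(L):
        for y in range(L):
            above, below = p[x][y], p[x][(y - 1) % L]
            h[x][y] = +1 if (above - below) % 3 == 1 else -1
            left, right = p[(x - 1) % L][y], p[x][y]
            v[x][y] = +1 if (left - right) % 3 == 1 else -1
    return h, v

def ice_rule_ok(h, v, L):
    # every vertex must have exactly two arrows pointing in
    for x in range(L):
        for y in range(L):
            inward = (h[(x - 1) % L][y] == +1) + (h[x][y] == -1) \
                   + (v[x][(y - 1) % L] == +1) + (v[x][y] == -1)
            if inward != 2:
                return False
    return True

L = 6   # a multiple of 3, so the simple colourings below wrap consistently
for colour in (lambda x, y: (x + 2 * y) % 3, lambda x, y: (2 * x + y) % 3):
    p = [[colour(x, y) for y in range(L)] for x in range(L)]
    h, v = colouring_to_arrows(p, L)
    assert ice_rule_ok(h, v, L)
```

Because the four colour differences around any vertex are each ±1 modulo three and must sum to zero modulo three, exactly two of them are +1 and two are -1, which is precisely the two-in, two-out condition.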
the two-dimensional Ising model, for which we also have an exact solution.
In the later sections of this chapter we will develop algorithms for more
complicated ice model variations.
As with the Ising model, the first step in designing a Monte Carlo algo-
rithm is to choose a set of moves which will take us from one configuration of
our ice model to another. In the case of the ordinary Ising model the obvious
Monte Carlo move was just to flip a single spin. Unfortunately, there is no
such obvious move for an ice model. We cannot simply reverse an arrow, or
change the configuration of arrows around a vertex in an ice model, since
that would affect the configuration of arrows at all the neighbouring vertices
and we would surely end up in a state that violated the ice rules. So what
elementary move can we devise which will take us from one configuration
that obeys the ice rules to another? As we will see, there are a number
of candidates, and of course it is part of our goal here to explore all the
possibilities to see how we can make algorithms that are as efficient as pos-
sible. We begin, however, as we did with the Ising model, by describing the
standard algorithm for this problem, due to Rahman and Stillinger (1972),
which involves reversing the arrows around loops on the lattice.
FIGURE 7.6 Flipping arrows one by one along a line across the lattice
allows us to change the configuration and still satisfy the ice rules. The
only problems are at the ends of the line, but this is fixed if the two
ends eventually meet one another forming a closed loop.
wandering defect fixes the arrows around both. The net result is that we
have reversed all of the arrows lying around a closed loop on the lattice, and
the final configuration will obey the ice rules as long as the initial one did.
In the figure we have illustrated the case of the smallest possible loop,
which on the square lattice involves the reversal of just four arrows. However,
loops generated by the algorithm above can be much longer than this. By
making a random choice at each vertex between the two possible arrows that
we could reverse, we generate a species of random walk across the lattice,
and even on a finite lattice this walk can take an arbitrarily long time to
return to its starting point. For this reason we will refer to this algorithm
as the "long loop algorithm". Long loops are not necessarily a bad thing; on
the finite lattices we use in our Monte Carlo simulations we are guaranteed
that the walk will always return eventually, and although longer loops take
longer to generate they also flip a larger number of arrows, which allows the
system to decorrelate quicker.
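Concretely, the long loop move can be sketched as follows (a minimal implementation of our own; the lattice representation and names are ours, not from the text). Arrows sit on the bonds of an L x L periodic lattice, and a move reverses arrows along a random walk of the wandering defect until the walk returns to its starting vertex:

```python
import random

def make_square_ice(L):
    # a simple ice-rule-obeying start: all horizontal arrows point right,
    # all vertical arrows point up (every vertex: two in, two out)
    h = [[+1] * L for _ in range(L)]  # h[x][y]: bond (x,y)->(x+1,y), +1 = right
    v = [[+1] * L for _ in range(L)]  # v[x][y]: bond (x,y)->(x,y+1), +1 = up
    return h, v

def outgoing_arrows(h, v, x, y, L):
    # arrows pointing out of vertex (x, y), as (kind, bx, by, neighbour)
    out = []
    if h[x][y] == +1:            out.append(('h', x, y, ((x + 1) % L, y)))
    if h[(x - 1) % L][y] == -1:  out.append(('h', (x - 1) % L, y, ((x - 1) % L, y)))
    if v[x][y] == +1:            out.append(('v', x, y, (x, (y + 1) % L)))
    if v[x][(y - 1) % L] == -1:  out.append(('v', x, (y - 1) % L, (x, (y - 1) % L)))
    return out

def long_loop_move(h, v, L, rng=random):
    # reverse arrows along the defect's random walk until it returns home
    start = (rng.randrange(L), rng.randrange(L))
    cur, came = start, None
    while True:
        choices = [a for a in outgoing_arrows(h, v, *cur, L)
                   if a[:3] != came]        # never retrace the last arrow
        kind, bx, by, nxt = rng.choice(choices)
        (h if kind == 'h' else v)[bx][by] *= -1
        if nxt == start:
            return
        cur, came = nxt, (kind, bx, by)

rng = random.Random(1)
h, v = make_square_ice(4)
for _ in range(20):
    long_loop_move(h, v, 4, rng)
# every move maps an ice state to an ice state: two outgoing arrows per vertex
assert all(len(outgoing_arrows(h, v, x, y, 4)) == 2
           for x in range(4) for y in range(4))
```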
7.2.2 Ergodicity
We have now specified a move that will take us from one correct configuration
of the arrows to another, and our proposed Monte Carlo algorithm for square
190 Chapter 7: Ice models
ice is simply to carry out a large number of such moves, one after another.
However, we still need to demonstrate that the algorithm satisfies the criteria
of ergodicity and detailed balance.
First, consider ergodicity, whose proof is illustrated in Figure 7.7. The
figure shows how the difference between two configurations of the model
on a finite lattice can be decomposed into the flips of arrows around a finite
number of loops. We can show that this is possible for any two configurations
by the following argument. Each of the vertices in Figure 7.3 differs from each
of the others by the reversal of an even number of arrows. (You can convince
yourself of this simply by looking at the figure, although it follows directly
from the ice rules as well.) Thus, if we take two different configurations of
the model on a particular lattice and imagine drawing lines along the bonds
on which the arrows differ, we are guaranteed that there will be an even
number of such lines meeting at each vertex. Thus these lines form a set of
(possibly intersecting) loops covering a subset of the vertices on the lattice.
It is not difficult to show that these loops can be chosen so that the arrows
around each one all point in the same direction. Since the reversals of the
arrows around these loops are precisely our Monte Carlo moves, and since
there are a finite number of such loops, it follows that we can get from any
configuration to any other in a finite number of steps, and thus that our
algorithm is ergodic. Note that it is important to allow the loops to pass
through the periodic boundary conditions for this to work.4
4 It is not too hard to show that the loops which wrap around the periodic boundary
conditions change the polarization, Equation (7.14), of the system, whereas the ones which
don't wrap conserve it. Thus, if we didn't allow the loops to wrap around in this way
the polarization would never change and ergodicity would not be satisfied.
7.3 An alternative algorithm 191
In other words, the rate for the move from μ to ν should be the same as that
from ν to μ. In the long loop algorithm, our Monte Carlo move consists of
choosing a starting site S_0 and reversing a loop of arrows starting at that
site and ending, m steps later, at the same site S_m = S_0. The probability
of selecting any particular site as the starting site is 1/N, where N is the
number of sites on the lattice. The probability of making a particular choice
from the two possible outgoing arrows at each step around the loop is 1/2
for each step, so that the probability that we chose a certain sequence of
steps is equal to 2^{-m}, and the probability of performing the entire move is
2^{-m}/N. For the reverse move, in which the same loop of arrows is flipped
back again to take us from state ν back to state μ, the exact same arguments
apply, again giving us a probability of 2^{-m}/N for making the move. (The
only difference is that for the reverse move we follow the loop in the reverse
direction.) Thus the forward and backward rates are indeed the same and
detailed balance is observed. This, in combination with the demonstration
of ergodicity above, ensures that our algorithm will sample all states of the
model with the correct (equal) probabilities.
each other again, we now continue only until the wandering defect encoun-
ters any site which it has encountered before in its path around the lattice.
If we call this site S_m, then S_m = S_l with 0 ≤ l < m. From this point, we
retrace our steps backwards down the old path of the defect, until we reach
S_0 again, reversing all the arrows along the way. The process is illustrated
in Figure 7.8. The net result is that we reverse all the arrows along the path
from site S_0 to S_l twice (which means that they are the same before and
after the move), and all the arrows in the loop from S_l to S_m once. Since the
wandering defect only needs to find any one of the sites on its previous path
in order to close the loop, we are guaranteed that the length of its walk will
never exceed N steps, where N is the number of vertices on the lattice, and
in practice the typical length is much shorter than this. (In fact, the number
of steps tends to a finite limit as the lattice becomes large; see Section 7.5.)
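Concretely, the short loop move might be implemented like this (our own self-contained sketch; arrows sit on the bonds of an L x L periodic lattice): we record the defect's path, stop at the first revisited site, and then un-reverse the arrows on the portion of the path outside the loop.

```python
import random

def make_square_ice(L):
    h = [[+1] * L for _ in range(L)]  # h[x][y]: bond (x,y)->(x+1,y), +1 = right
    v = [[+1] * L for _ in range(L)]  # v[x][y]: bond (x,y)->(x,y+1), +1 = up
    return h, v

def outgoing_arrows(h, v, x, y, L):
    # arrows pointing out of vertex (x, y), as (kind, bx, by, neighbour)
    out = []
    if h[x][y] == +1:            out.append(('h', x, y, ((x + 1) % L, y)))
    if h[(x - 1) % L][y] == -1:  out.append(('h', (x - 1) % L, y, ((x - 1) % L, y)))
    if v[x][y] == +1:            out.append(('v', x, y, (x, (y + 1) % L)))
    if v[x][(y - 1) % L] == -1:  out.append(('v', x, (y - 1) % L, (x, (y - 1) % L)))
    return out

def short_loop_move(h, v, L, rng=random):
    # walk and reverse arrows until we hit ANY previously visited site S_l,
    # then un-reverse the arrows on the path from S_0 to S_l
    start = (rng.randrange(L), rng.randrange(L))
    sites = [start]            # S_0 ... S_{m-1}, all distinct
    path = []                  # the bonds reversed so far, in order
    came = None
    while True:
        x, y = sites[-1]
        choices = [a for a in outgoing_arrows(h, v, x, y, L) if a[:3] != came]
        kind, bx, by, nxt = rng.choice(choices)
        (h if kind == 'h' else v)[bx][by] *= -1
        path.append((kind, bx, by))
        if nxt in sites:                      # loop closed at S_l = nxt
            l = sites.index(nxt)
            for k, px, py in path[:l]:        # un-reverse S_0 .. S_l
                (h if k == 'h' else v)[px][py] *= -1
            return len(path) - l              # number of arrows left flipped
        sites.append(nxt)
        came = (kind, bx, by)

rng = random.Random(2)
L = 8
h, v = make_square_ice(L)
for _ in range(100):
    short_loop_move(h, v, L, rng)
assert all(len(outgoing_arrows(h, v, x, y, L)) == 2
           for x in range(L) for y in range(L))
```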
The proof of ergodicity for the short loop algorithm is identical to that
for the long loop case: the difference between any two states on a finite
lattice can be reduced to the reversal of the arrows around a finite number of
loops. Since the algorithm has a finite chance of reversing each such loop, it
can connect any two states in a finite number of moves.
The proof of detailed balance is also similar to that for the long loop
algorithm. Consider again a move which takes us from state μ to state ν.
The move consists of choosing a starting site S_0 at random, then a path
P = {S_0 ... S_l} in which the arrows are left untouched, followed by a loop
L = {S_l ... S_m} in which we reverse the arrows. (Remember that the last
site in the loop S_m is necessarily the same as the first S_l.) The probability
that we choose S_0 as the starting point is 1/N, where N is the number of
sites on the lattice. After that we have a choice of two directions at each step
along the starting path and around the loop, so that the probability that we
end up taking the path P is equal to 2^{-l} and the probability that we follow
the loop L is 2^{-(m-l)}. After the loop reaches site S_m = S_l, we do not have
any more free choices. The probability that we move from a configuration μ
to a configuration ν is thus

    P(μ → ν) = (1/N) 2^{-l} 2^{-(m-l)} = 2^{-m}/N.
7.4 Algorithms for the three-colour model 193
For the reverse move, the probability of starting at S_0 is again 1/N, and
the probability of following the same path P as before to site S_l is 2^{-l} again.
However, we cannot now follow the same loop L from S_l to S_m as we did
before, since the arrows along the loop are reversed from what they were in
state μ. On the other hand, we can follow the loop in the reverse direction,
and this again has probability 2^{-(m-l)}. Thus we have

    P(ν → μ) = (1/N) 2^{-l} 2^{-(m-l)} = P(μ → ν),

exactly as before. This demonstrates detailed balance for the algorithm and,
in combination with the demonstration of ergodicity, ensures that all possible
states will be sampled with equal probability.
single-plaquet moves of this kind cannot reach these states and so do not
lead to an ergodic dynamics. Again then, we must resort to more complex
moves. One possibility is to look for clusters of nearest-neighbour plaquets
of only two colours, call them A and B, entirely surrounded by plaquets
of the third colour C. A move which exchanges the two colours A and B
in such a cluster but leaves the rest of the lattice untouched will result in
a new configuration which still has an allowed arrangement of colours, and
this suggests the following cluster-type algorithm for square ice:
1. We choose a plaquet at random from the lattice as the seed square for
the cluster. Suppose this plaquet has colour A.
2. We choose another colour B ≠ A at random from the two other possi-
bilities.
3. Starting from our seed square, we form a cluster by adding all nearest-
neighbour squares which have either colour A or colour B. We keep
doing this until no more such nearest neighbours exist.
4. The colours A and B of all sites in the cluster are exchanged.
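The four steps above can be sketched directly in code (our own illustration; we start from the regular colouring (x + 2y) mod 3, which is proper whenever L is a multiple of 3):

```python
import random
from collections import deque

def cluster_move(p, L, rng=random):
    """One single-cluster move for the three-colouring of the plaquettes.
    p[x][y] in {0, 1, 2}; neighbouring plaquettes always differ in colour."""
    # 1. random seed plaquette, of colour A
    sx, sy = rng.randrange(L), rng.randrange(L)
    A = p[sx][sy]
    # 2. random second colour B != A
    B = rng.choice([c for c in range(3) if c != A])
    # 3. grow the cluster of A- and B-coloured nearest neighbours
    cluster, queue = {(sx, sy)}, deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nx % L, ny % L
            if (nx, ny) not in cluster and p[nx][ny] in (A, B):
                cluster.add((nx, ny))
                queue.append((nx, ny))
    # 4. exchange the colours A and B everywhere in the cluster
    for x, y in cluster:
        p[x][y] = B if p[x][y] == A else A

def proper(p, L):
    # no two neighbouring plaquettes share a colour
    return all(p[x][y] != p[(x + 1) % L][y] and p[x][y] != p[x][(y + 1) % L]
               for x in range(L) for y in range(L))

rng = random.Random(3)
L = 6
p = [[(x + 2 * y) % 3 for y in range(L)] for x in range(L)]
for _ in range(100):
    cluster_move(p, L, rng)
assert proper(p, L)   # every move preserves a proper colouring
```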
There are a couple of points to notice about this algorithm. First, the
cluster possesses no nearest neighbours of either colour A or colour B and
therefore all its nearest neighbours must be of the third colour, C. In the
simplest case, the seed square has no neighbours of colour B at all, in which
case the cluster consists of only the one plaquet. It is crucial to the working of
the algorithm that such moves should be possible. If we had chosen instead
to seed our cluster by picking two neighbouring plaquets and forming a
cluster with their colours, single-plaquet moves would not be possible and
we would find that the algorithm satisfied neither ergodicity nor detailed
balance. Second, notice also that within the boundary of colour C, the
cluster of As and Bs must form a checkerboard pattern, since no two As or
Bs can be neighbours.
We now want to show that the algorithm above satisfies the conditions
of ergodicity and detailed balance. In this case it turns out that detailed
balance is the easier to prove. Consider once more a Monte Carlo move
which takes us from a state μ to a state ν, and suppose that this move
involves a cluster of m squares. The probability of choosing our seed square
in this cluster is m/N, where N is the total number of plaquets on the
lattice. The probability that we then choose a particular colour B as the
other colour for the cluster is 1/2, and after that there are no more choices:
the algorithm specifies exactly how the cluster should be grown from here
on. Thus the total probability for the move from μ to ν is m/(2N). Exactly
the same argument applies for the reverse move from ν to μ with the same
values of m and N, and hence the rates for forward and reverse moves are
the same. Thus detailed balance is obeyed.
The proof of ergodicity is a little trickier. It involves two steps. First,
we show that from any configuration we can evolve via a finite sequence of
reversible moves to a checkerboard colouring (a configuration in which one of
the three colours is absent). Then we show that all checkerboard colourings
are connected through reversible moves.
Any configuration of the lattice can be broken down into a number of
checkerboard regions consisting of only two colours, surrounded by plaquets
of the third colour. This is always true since any plaquet which doesn't
belong to such a region can be regarded as being the sole member of a
checkerboard of one. Under the dynamics of our proposed Monte Carlo
algorithm these checkerboard domains can grow or shrink. Consider for ex-
ample a domain of colours A and B, which is necessarily entirely surrounded
by plaquets of the third colour C. We can form a cluster at the boundary of
the domain out of plaquets of colour C and one of the other two colours, and
by swapping the colours in this cluster we can increase the size of our AB
domain. (In order to make this work it may be necessary first to exchange
the colours A and B in the AB domain.) By repeating this process we can
take a single cluster of one checkerboard pattern and grow it until it covers
the entire lattice, leaving the lattice in a checkerboard state. This process is
illustrated in Figure 7.9. (As the figure shows, it is possible for moves of this
type to leave the occasional odd plaquet in the middle of the checkerboard
region. However, these can easily be removed by a Monte Carlo step which
takes this single plaquet as its cluster and changes its colour. If such single-
plaquet moves were not allowed, then the algorithm would not be ergodic,
as we maintained above.)
There are six possible checkerboard colourings in all,6 and from any one
6 Bear in mind that the checkerboard with colour A on the even sites and colour B on
the odd sites is distinct from the one in which the colours are reversed.
of them the others can easily be reached, since on a checkerboard the colour
of any square can be changed on its own without changing any other squares.
Thus for example we can get from a checkerboard of colours A and B to one
of A and C by changing all the Bs to Cs one by one. All other combinations
can be reached by a similar process.
Since we can get from any state to a checkerboard colouring and from
any checkerboard to any other, all via reversible moves, it follows that our
algorithm can get from any state to any other in a finite number of moves.
In other words, it is ergodic.
The algorithm presented above, a single-cluster algorithm, resembles in
spirit the Wolff algorithm for the Ising model which we studied in Section 4.2.
It is also possible to construct a multi-cluster algorithm for the three-colour
model, similar to the Swendsen-Wang algorithm of Section 4.4.1. In this
algorithm we start by choosing at random a pair of colours A and B. Then
we construct all clusters of nearest-neighbour spins made out of these two
colours, and for each cluster we decide at random with 50% probability
whether to exchange the two colours or not. This algorithm satisfies ergod-
icity for the same reason the single-cluster algorithm did—each move in our
single-cluster algorithm is also a valid move in the multi-cluster version, so
we can apply exactly the same arguments about checkerboard domains to
prove ergodicity. The algorithm also satisfies detailed balance: the proba-
bility of selecting a particular two out of the three colours for a move is 1/3,
and the probability of exchanging the colours in a particular set of clusters
is 2^{-n}, where n is the number of clusters. The probability for the reverse
move is exactly the same, and hence detailed balance is upheld.
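A sketch of this multi-cluster version (our own code; plaquette colours are stored in an L x L array):

```python
import random
from collections import deque

def multi_cluster_move(p, L, rng=random):
    """One sweep of the multi-cluster algorithm: choose two colours A, B,
    find every A/B cluster, and independently exchange the colours in each
    cluster with probability 1/2."""
    A, B = rng.sample(range(3), 2)
    seen = set()
    for sx in range(L):
        for sy in range(L):
            if (sx, sy) in seen or p[sx][sy] not in (A, B):
                continue
            # grow one A/B cluster by breadth-first search
            cluster, queue = [(sx, sy)], deque([(sx, sy)])
            seen.add((sx, sy))
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    nx, ny = nx % L, ny % L
                    if (nx, ny) not in seen and p[nx][ny] in (A, B):
                        seen.add((nx, ny))
                        cluster.append((nx, ny))
                        queue.append((nx, ny))
            if rng.random() < 0.5:          # flip this cluster or not
                for x, y in cluster:
                    p[x][y] = B if p[x][y] == A else A

rng = random.Random(4)
L = 6
p = [[(x + 2 * y) % 3 for y in range(L)] for x in range(L)]
for _ in range(50):
    multi_cluster_move(p, L, rng)
assert all(p[x][y] != p[(x + 1) % L][y] and p[x][y] != p[x][(y + 1) % L]
           for x in range(L) for y in range(L))
```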
7.5 Comparison of algorithms for square ice 197
FIGURE 7.10 The mean length (m) of loops in the long loop algorithm as a function of system size L. We find that (m) ~ L^{1.665±0.002}.
see Figure 7.10.7 The amount of CPU time required per step in our algorithm
increases linearly with the size of the loop, and hence we expect the CPU
time per Monte Carlo step also to increase with system size as L^{1.67}. This is
not necessarily a problem; since longer loops reverse more arrows as well as
taking more CPU time it is unclear whether longer is better in this case, or
worse. To answer this question we need to consider the correlation time T of
the algorithm. This, however, presents a new problem. In order to calculate
T we first need to calculate an autocorrelation (see Section 3.3.1), but what
quantity do we calculate the autocorrelation for? In the case of the Ising
model in Chapter 3 we calculated the autocorrelation of the magnetization,
but the square ice model does not have a magnetization. We might consider
calculating the autocorrelation of the internal energy, but all states of the
square ice model have the same energy, so the energy autocorrelation would
always be zero.
In this case, we have chosen to measure a quantity psym which we define
to be the density of the symmetric vertices, numbers 5 and 6 in Figure 7.3.
We calculate the autocorrelation of this quantity using Equation (3.21) and
then fit the resulting function to an exponential to find the correlation time,
as in Figure 3.6. The results are shown in Figure 7.11. As we can see, when
we measure time in Monte Carlo steps we find a correlation time which
7 In fact, it is known that the loop size scales exactly as L^{5/3}. This result follows from
calculations using the so-called Coulomb gas representation of the square ice model. See
Saleur (1991).
FIGURE 7.11 The correlation time in Monte Carlo steps of the long loop algorithm as a function of system size L. The best fit straight line gives τ_steps ~ L^{0.68±0.03}.
FIGURE 7.12 The correlation time τ_steps of the short loop algorithm measured in Monte Carlo steps as a function of system size. The best fit straight line gives τ_steps ~ L^{2.00±0.01}.
The short loop algorithm of Section 7.3 also involves creating a pair of
defects and having one of them diffuse around. Recall however that in this
case the wandering defect only has to find any of the sites which it has
previously visited in order to close the loop and finish the Monte Carlo step.
If the diffusion were a normal random walk then this process would generate
loops of finite average length. Although the diffusion of defects in square ice
is not a true random walk, it turns out once more that the behaviour of the
system is qualitatively the same as if it were. From simulations using the
short loop algorithm we find that the average number of steps per move is
(m) ≈ 13.1, independent of the lattice size, for a sufficiently large lattice.
The correlation time measured in Monte Carlo steps τ_steps, for the same
observable psym as above, increases as L^2, as shown in Figure 7.12. Since
the mean number of steps in a loop is independent of L, the correlation time
per unit area therefore goes as

    τ ~ (m) τ_steps / L^2 ~ L^{2.00±0.01} / L^2 ~ L^{0.00±0.01}.
Thus the correlation time of the short loop algorithm does not increase at
Compare the way in which the correlation time of the Ising
model varied with system size at the critical temperature in the algorithms studied in
Chapter 4. Yet the square ice model has no temperature parameter and therefore pre-
sumably cannot have a phase transition. In fact, it turns out that, in a certain technical
sense, the square ice model is at a phase transition, and the exponent 0.35 measured here
is precisely the dynamic exponent z of our algorithm at this transition (see Section 4.1).
A discussion of this point is given by Baxter (1982).
all with system size. This is the best behaviour we could hope for in our
algorithm, and so the short loop algorithm is in a sense optimal.
Our third algorithm was the single-cluster three-colour algorithm which
we developed in Section 7.4. For this algorithm each Monte Carlo step
corresponds to (n)/L^2 sweeps of the lattice, where (n) is the average cluster
size. Like the average loop length in the long loop algorithm, (n) scales up
with increasing lattice size, and from our simulations we find that
The large value of the exponent appearing here indicates that the single-
cluster algorithm would be a poor algorithm for studying square ice on large
lattices.
Our last algorithm, the full-lattice three-colour algorithm, also described
in Section 7.4, generates clusters in a way similar to the single-cluster
algorithm, but rather than generating only one cluster per Monte Carlo
step, it covers the whole lattice with them. For this algorithm we find
numerically that the correlation time τ_steps measured in Monte Carlo steps is
approximately constant as a function of lattice size. Since each Monte Carlo
step updates sites over the entire lattice, one step corresponds to a constant
number of sweeps of the lattice on average, and hence the correlation time
in moves per site goes as

    τ ~ L^0.
Thus, like the short loop algorithm, this algorithm possesses optimal scaling
as lattice size increases.
Comparing the four algorithms, clearly the most efficient ones for large
systems are the short loop algorithm and the full-lattice three-colour algo-
rithm. In both other algorithms, the computer time required per site to
generate an independent configuration of the lattice increases with system
size. The larger impact of the larger loops in the long loop algorithm for
instance does not compensate for the extra effort invested generating them.
Between the short loop algorithm and the full-lattice three-colour algorithm
it is harder to decide the winner since both have the same scaling of CPU
requirements with system size. The best thing to do in this case is simply
try them both out. As far as we can tell, the loop algorithm is slightly faster
(maybe 10% or 20%), but on the other hand the three-colour algorithm is
considerably more straightforward to program. It's a close race.
7.6 Energetic ice models 201
where the vector u_i is a unit vector in the direction of the ith arrow. In
an infinite system the polarization is zero above the critical temperature
of the phase transition T_c, and non-zero below it, with a direction either
upwards and to the right, or downwards and to the left. The factor of 1/2
in Equation (7.14) is included so that the magnitude of P approaches unity
as T → 0.
Another widely studied energetic ice model is the so-called F model
introduced by Rys (1963). In this model the symmetric vertices, numbers 5
and 6 in Figure 7.3, are favoured by giving them a lower energy -ε, while
all the others are given energy zero. This model has a ground state in which
vertices 5 and 6 alternate in a checkerboard pattern across the lattice. There
are again two possible such ground states, depending in this case on which
type of vertex falls on the even sites of the lattice and which on the odd, and
as we lower the temperature there is a phase transition to one or other of
these states from a high-temperature phase in which vertices 5 and 6 fall on
even and odd sites with equal probability. Since neither symmetric vertex
where v_i is the type of the vertex at site i, represented using the numbering
scheme from Figure 7.3. Interest in the F model has primarily focused on
its behaviour close to the phase transition, which as Lieb (1967) has shown
takes place at T_c = ε/log 2. Most of the developments here will be directed
at finding an algorithm which performs well in this region, which means
both finding an efficient way of sampling states of the system with their
correct Boltzmann probabilities, and also tackling the critical slowing down
problems which we typically run into in the vicinity of a phase transition
(see Section 3.7.2).
The simplest way to create a correct algorithm for the F model is to
generate possible moves, such as the loop or cluster moves of the previous
sections, and then employ a Metropolis-type scheme in which instead of
accepting every move generated by the algorithm, we accept them with a
probability A(μ → ν) which depends on the energy difference ΔE = E_ν - E_μ
FIGURE 7.13 (a) Symmetric vertices become non-symmetric if a loop passes through them. (b) Non-symmetric vertices stay non-symmetric if the loop goes straight through, but (c) become symmetric if the loop makes a turn.
between the states μ and ν of the system before and after the move, thus:

    A(μ → ν) = e^{-βΔE}  if ΔE > 0,
    A(μ → ν) = 1         otherwise.
FIGURE 7.14 The correlation time τ_steps of the short loop algorithm for the F model measured in Monte Carlo steps, as a function of system size. The best fit straight line gives τ_steps ~ L^{2.00±0.09}.
each such vertex adds an amount ε to ΔE, it is clear that the energy cost
of making a move will increase with the length of the loop, especially at low
temperatures. Large values of ΔE mean low acceptance ratios, which are
wasteful of computer time, and this implies that the short loop version of
our F model algorithm should be more efficient than the long loop one.
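A hedged sketch of the acceptance step and of the F model energy (our own helper names; we take vertices 5 and 6 to be the two zero-polarization vertices whose incoming arrows are anti-parallel, following the description of the symmetric vertices above):

```python
import math
import random

def metropolis_accept(delta_E, beta, rng=random):
    # Metropolis rule: always accept downhill moves,
    # accept uphill moves with probability e^{-beta * delta_E}
    return delta_E <= 0 or rng.random() < math.exp(-beta * delta_E)

def n_symmetric(h, v, L):
    # count vertices whose two incoming arrows point in opposite directions
    # (the symmetric vertices, numbers 5 and 6 in Figure 7.3)
    n = 0
    for x in range(L):
        for y in range(L):
            horizontal_in = (h[(x - 1) % L][y] == +1) + (h[x][y] == -1)
            if horizontal_in != 1:   # 2: both horizontal in; 0: both vertical in
                n += 1
    return n

def f_model_energy(h, v, L, eps):
    # F model: symmetric vertices have energy -eps, all others zero
    return -eps * n_symmetric(h, v, L)

# the state with all horizontal arrows right and all vertical arrows up
# contains no symmetric vertices at all
h = [[+1] * 4 for _ in range(4)]
v = [[+1] * 4 for _ in range(4)]
print(n_symmetric(h, v, 4))   # 0
```

In a loop-move simulation the test for a proposed move is then `metropolis_accept(E_new - E_old, beta)`, with the energies computed from the vertex counts before and after the trial reversal.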
In Figure 7.14 we show results from a simulation of the F model using
the short loop algorithm at the critical temperature. The figure shows the
correlation time rsteps of the algorithm measured in Monte Carlo steps, and
the best fit to these data gives us

    τ_steps ~ L^{2.00±0.09}.
As with square ice, the number of sites updated by a single Monte Carlo
step tends to a constant for large lattices, so that the correlation time in
steps per site is

    τ ~ L^{2.00±0.09} / L^2 ~ L^{0.00±0.09}.
As with the Ising model calculations of Chapter 4, the scaling of r with the
system size gives us a measure of the dynamic exponent of our algorithm
(see Section 4.2). In the present case, all indications are that our short loop
algorithm has a zero dynamic exponent, at least to the accuracy with which
we can measure it. This is the best value we could hope for. It implies
that the CPU time taken by the algorithm goes up only as fast as the size
of the lattice we are simulating, and no faster. However, it turns out that
the algorithm is still quite inefficient because it has a low acceptance ratio,
which means that much of the CPU time is being wasted proposing moves
which are never carried out. For example, at Tc the acceptance ratio of the
algorithm is 36%, so that nearly two thirds of the computational effort is
wasted.
How can we increase the acceptance ratio of our Monte Carlo algorithm?
One way is to modify the algorithm to generate moves that are less likely
to cost energy and therefore more likely to be accepted. For example, if we
could encourage the loop to make turns in non-symmetric vertices, we would
on average end up with a lower final energy, since the reversal of the arrows
around the loop will create more symmetric vertices. Unfortunately, it turns
out to be rather complicated to formulate a correct algorithm along these
lines, and the expression for the acceptance ratio becomes quite tedious.
There is however an elegant alternative, which is to employ a three-colour
algorithm of the type discussed in Section 7.4.
3. Starting from our seed square, we form a cluster by adding all nearest-
neighbour squares which have either colour A or colour B, and in
206 Chapter 7: Ice models
addition we now also add to the cluster the squares which are next-
nearest neighbours of some square i which is already in the cluster,
provided they have the same colour as square i. However, we make this
latter addition with a temperature-dependent probability Padd < 1,
whose value we calculate below in order to satisfy the condition of
detailed balance. We go on adding squares to the cluster in this way
until every possible addition has been considered.
4. The colours A and B of all sites in the cluster are exchanged.
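Steps 3 and 4 above can be sketched in code. This is an illustrative sketch only: the lattice of plaquette colours is stored as a dictionary, and the addition probability padd is left as an input parameter, since its temperature-dependent detailed-balance value is derived in the text.

```python
import random

def grow_cluster(colour, L, seed, colour_b, padd):
    """Sketch of the F model cluster growth (step 3).  `colour` maps a
    square (x, y) -> colour on an L x L periodic lattice of plaquettes;
    `colour_b` is the randomly chosen second colour B; `padd` is the
    temperature-dependent addition probability derived in the text."""
    a, b = colour[seed], colour_b
    cluster = {seed}
    frontier = [seed]
    while frontier:
        x, y = frontier.pop()
        # nearest-neighbour squares of colour A or B are always added
        for j in (((x + 1) % L, y), ((x - 1) % L, y),
                  (x, (y + 1) % L), (x, (y - 1) % L)):
            if j not in cluster and colour[j] in (a, b):
                cluster.add(j)
                frontier.append(j)
        # next-nearest (diagonal) neighbours with the same colour as the
        # current square are added only with probability padd
        ci = colour[(x, y)]
        for j in (((x + 1) % L, (y + 1) % L), ((x + 1) % L, (y - 1) % L),
                  ((x - 1) % L, (y + 1) % L), ((x - 1) % L, (y - 1) % L)):
            if j not in cluster and colour[j] == ci and random.random() < padd:
                cluster.add(j)
                frontier.append(j)
    return cluster

def flip_cluster(colour, cluster, a, b):
    """Step 4: exchange colours A and B on every square in the cluster."""
    for j in cluster:
        colour[j] = b if colour[j] == a else a
```

With padd = 0 the growth reduces to the square ice cluster move, which is how the ergodicity argument below proceeds.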
It is straightforward to prove ergodicity for this algorithm. Since our
three-colour algorithm for square ice was ergodic (see Section 7.4), and since
each move in the square ice algorithm is also a possible move in our F model
algorithm (as long as Padd < 1), the result follows immediately.
Detailed balance is a little more tricky. As before, we consider two states
μ and ν which differ by the exchange of colours in a single cluster of m
squares. The probability of choosing the seed square to be in this cluster
is m/N and the probability of choosing a particular colour B as the other
colour for the cluster is 1/2, just as in the square ice case. However, we now
also have a factor of Padd for every square which we add to the cluster which
is only a next-nearest neighbour of another and not a nearest neighbour.
And we have a factor of 1 − Padd for every such site which we could have
added but didn't. Thus the overall probability of making the move from μ
to ν is
The expression for log P(ν → μ) is identical except for the exchange of the
labels μ and ν. Thus the logarithm of the ratio of the probabilities for the
forward and reverse moves is
(see Equation (7.20)). The only contribution in this sum comes from next-
nearest-neighbour pairs i, j such that i belongs to the cluster and j does
not, since all other pairs contribute the same amount to the Hamiltonian in
state μ as in state ν. Thus
9 Notice the similarity between this result and Equation (4.13) for the Wolff algorithm.
Our cluster algorithm for the three-colour model and the Wolff algorithm for the Ising
model have more in common than just a philosophical similarity.
FIGURE 7.16 The average area covered by the largest cluster of sym-
metric vertices in the F model as a function of inverse temperature β
for a variety of different system sizes.
method above. The proofs of ergodicity and detailed balance for the full-
lattice version of the algorithm follow from the single-cluster version just as
in the case of the square ice model.
So how do these cluster algorithms perform? In Figure 7.15 we show some
results from simulations of the F model with ε = 1 and varying temperatures
using the full-lattice version of the algorithm described above. In this figure
we have coloured areas of the two low-energy domains (checkerboards of
symmetric vertices) in black and white—type 5 vertices on even lattice sites
and type 6 vertices on odd lattice sites are black, while type 5 vertices on
odd lattice sites and type 6 vertices on even lattice sites are white. All other
vertices are in grey. The phase transition is clearly visible in the figure as a
change from a state in which black and white appear with equal frequency
to one in which one or the other dominates.
One observable in which we can detect the change in behaviour at the
phase transition is the fraction of the system covered by the largest cluster of
symmetric vertices (the black and white areas in Figure 7.15). In Figure 7.16
we have plotted this fraction as a function of the inverse temperature β for
a number of different lattice sizes L with ε = 1. For large β (i.e., low
temperature) the largest cluster percolates (see Section 4.3) and covers a
sizeable fraction of the system. For smaller values the cluster only covers
a small fraction of the system and its size is independent of the size of the
lattice, so that the fraction of the total system size which it covers gets
smaller as L increases. The transition between these two regimes can be
seen in Figure 7.16, although it is only really clear for the larger system sizes
(L = 400 and above). The critical temperature appears to be somewhere in
the range between β = 0.65 and 0.75, which agrees with Lieb's exact figure
of βc = log 2 = 0.693. In Section 8.3 we will show how the technique of
finite size scaling can be used to make more accurate estimates of critical
temperatures from Monte Carlo data.
More extensive simulations (Barkema and Newman 1998) indicate that
the single-cluster three-colour algorithm for the F model is actually rather
poor at simulating the model near Tc. The measured dynamic exponent is
about z = 1.3, making it the worst of the algorithms we have looked at in
this chapter. The full-lattice algorithm on the other hand is much better;
there is no measurable increase in the correlation time in number of lattice
sweeps with system size at Tc. The best estimate of the dynamic exponent
is z = 0.005 ± 0.022. Since this algorithm has an acceptance ratio of unity
and is in addition relatively straightforward to program, it would clearly be
the algorithm of choice for studying the critical properties of the F model.
Problems
7.1 We can extend the square ice model to three dimensions by placing
arrows on the bonds of a cubic lattice and requiring that exactly three arrows
enter and leave each vertex. Using Pauling's argument (Section 7.1.2), make
an estimate of the residual entropy per site of this model.
7.2 In the ground states of the q = 3 Potts model with anti-ferromagnetic
(J < 0) nearest-neighbour interactions, no two adjacent spins may take the
same value. This means that the set of ground states is the same as the set
of states of the three-colour model described in Section 7.1.3, or equivalently
the square ice model. What kind of Potts model is equivalent to the F model
of Section 7.6?
7.3 Design a Monte Carlo algorithm to simulate the eight-vertex model
described at the beginning of Section 7.6, in the version where all vertices
have the same energy.
8
Analysing Monte Carlo data
In the preceding chapters of this book we have looked at Monte Carlo al-
gorithms for measuring the equilibrium properties of a variety of different
models. The next few chapters are devoted to an examination of Monte
Carlo methods for the study of out-of-equilibrium systems. But before we
embark on that particular endeavour, we are going to take a look at some
techniques for the analysis of data from equilibrium simulations. There are
no new Monte Carlo algorithms in this chapter, but it does describe sev-
eral methods of data analysis which can significantly improve the quality of
the results we extract from the raw data spit out by the algorithms of the
previous chapters.
In Chapter 3 we discussed a number of basic techniques for extracting
estimates of quantities of interest from Monte Carlo data, and for calculating
the errors on these estimates. We covered such topics as the measurement
of the equilibration and correlation times, and the calculation of the values
of directly measurable quantities such as internal energy, as well as ones
which can only be measured indirectly, such as specific heat or entropy. We
looked at error estimation techniques like blocking and bootstrapping, and
at the calculation of correlation functions and autocorrelations. Most of the
methods we discussed are essentially direct applications of ideas common
to all data analysis; we deal with our Monte Carlo data very much as an
experimentalist would deal with data from an experiment on a real physical
system. In this chapter, by contrast, we will examine a number of data anal-
ysis techniques which are peculiar to Monte Carlo methods. In Sections 8.1
and 8.2 we look at the "histogram method" of Ferrenberg and Swendsen, in
both its single and multiple histogram incarnations. The single histogram
method allows us to take a single Monte Carlo simulation performed at some
particular temperature and extrapolate the results to give predictions of ob-
servable quantities (internal energy, for instance) at other temperatures. The
8.1 The single histogram method 211
In most of the algorithms we have seen so far, the probabilities pμ with
which the individual states are sampled once the simulation reaches equilib-
rium were chosen to be the Boltzmann weights for the temperature we are
interested in.1 Suppose however, that the pμ are instead the Boltzmann
probabilities for another temperature β0 = 1/kT0—one close to, but not
exactly equal to the temperature we are interested in:
1 An exception is the entropic sampling method of Section 6.3, which samples states in
inverse proportion to the density of states.
212 Chapter 8: Analysing Monte Carlo data
energy (i.e., the Hamiltonian) of the system. We can rewrite the equation
in the form
where the sum now runs over all possible energies E of the states of our
32 x 32 Ising model (i.e., E = -2048J, -2044J, . . . 2048J) and N(E) is the
number of times that states of energy E occurred during the Monte Carlo
run. N(E) is thus a histogram of the energies of the states sampled, and in
fact in their original exposition Ferrenberg and Swendsen wrote the entire
development of their method in terms of histograms like this, which is where
the name "histogram method" comes from. However, although we will use
the histogram here to show what the limits of the extrapolation should be, it
is not necessary to use it to perform calculations using the histogram method;
Equation (8.3) is all that is necessary. The main reason why one might
want to use a histogram is because the direct application of Equation (8.3)
normally requires us to store all the measurements Qμ made during the
simulation. For a long simulation this means storing a large amount of data,
whereas the histogram by contrast is fairly small, and furthermore does not
grow in size with the length of the simulation. However, these days (in
FIGURE 8.2 The weight function W(E) as a function of E for the two-
dimensional Ising model on a 32 x 32 lattice. The original simulation
was performed at the critical temperature Tc = 2.269 with J = 1 and
the curves shown are (left to right) for T = Tc, 2.3, 2.4 and 2.6.
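In raw-measurement form, the extrapolation of Equation (8.3) amounts to reweighting each sample taken at β0 by exp[−(β − β0)Eμ]. A minimal sketch of our own (with the mean-energy shift described in the implementation notes below built in, to keep the exponentials in range):

```python
import math

def reweight(E, Q, beta0, beta):
    """Single-histogram extrapolation: estimate <Q> at inverse temperature
    beta from samples (E[t], Q[t]) recorded at beta0.  The energies are
    shifted by their mean first, which does not change the result but keeps
    the exponentials within the machine's range."""
    e0 = sum(E) / len(E)                        # shift origin of energy
    w = [math.exp(-(beta - beta0) * (e - e0)) for e in E]
    return sum(q * wt for q, wt in zip(Q, w)) / sum(w)
```

At beta == beta0 the weights are all unity and the function reduces to a plain average over the measurements, as it should.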
8.1.1 Implementation
The single histogram method is usually very simple to implement, as the ex-
ample given in the last section demonstrates. However, there is one problem
which crops up, particularly with simulations of large systems, which can
make direct implementation of the method problematic. Recall that the en-
ergies appearing in Equation (8.3) are total energies of the system simulated,
which means that they increase as the volume of the simulated system. For
larger systems therefore, it is not unusual for the exponentials to overflow
or underflow the range of reals which the computer can represent (which is
about 10±300 on most modern computers). There are a couple of tricks we
can use in this situation.
Since the properties of our system are independent of where we set the
origin of measured energies, we can add or subtract any constant E0 from
all energies Eμ and still get the same answer for the expectation of any
observable quantity. (Quantities which depend directly on the energy will
of course be shifted by an amount corresponding to E0, but this is easy to
correct for.) Thus we can reduce the probability of the exponentials causing
overflows or underflows by subtracting the mean energy E0 = (E) from all
energies before performing the histogram calculation. The range of energies
sampled by the calculation increases only as √N, where N is the size of
the system, so by using this trick we should be able to simulate significantly
larger systems before overflow occurs. Often this is sufficient to remove the
problem for all system sizes of interest. For very large systems however, it
may not be enough, in which case we can use another trick, similar to the
one we used for the density of states in the entropic sampling method (see
Section 6.3.3). Instead of calculating the terms in Equation (8.3) directly,
we calculate their logarithms, which have a much smaller dynamic range
and will never overflow the limits of the computer's accuracy. Evaluating
the equation then requires us to calculate a sum of quantities for which we
know only the logarithms. Moreover, we want to do this without taking the
exponential of these logarithms, since we are using the logarithms precisely
because the exponentials are expected to overflow the numerical limits of
our computer. We can get around this problem using the following trick.
Suppose l1 and l2 are the logarithms of two numbers x1 and x2, and that
l1 > l2. Then the logarithm of the sum of x1 and x2 is
The exponential inside the logarithm is, by hypothesis, less than one, so the
expression can be safely evaluated without risk of overflow. In the case where
l1 < l2 we can use the same expression but with l1 and l2 swapped around.
Most computers provide a library function which will evaluate log(1 + x) ac-
curately, even for very small x (which is often the case with Equation (8.12)),
and if such a function is available we advise you to use it. (In C, for example,
it is usually called log1p().)
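In code, assuming natural logarithms throughout, the trick of Equation (8.12) looks like this:

```python
import math

def log_add(l1, l2):
    """Return log(x1 + x2) given l1 = log(x1) and l2 = log(x2), without
    ever exponentiating the (possibly huge) numbers themselves.  log1p
    keeps full accuracy when the correction term is tiny."""
    if l1 < l2:
        l1, l2 = l2, l1          # ensure the larger logarithm comes first
    return l1 + math.log1p(math.exp(l2 - l1))
```

The argument of math.exp is never positive, so the call cannot overflow no matter how large the underlying numbers are.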
Using Equation (8.12) requires more computational effort than perform-
ing the simple sums involved in Equation (8.3) and it can slow down the
calculation considerably. However, the time taken is usually still small com-
pared with the time invested in performing the Monte Carlo simulation itself,
so this is not normally an issue of any importance.
Another slight problem is that some observables, such as energy or mag-
netization, may have negative values, which means that we cannot take the
logarithms of the terms in the denominator of (8.3). Normally, however,
there is a way around this problem. In the case of the energy for instance,
since we are at liberty to shift its value by any constant amount E0, we can
simply shift the energies by an amount big enough to make them all positive.
In the case of the magnetization, we normally work with the absolute value
anyway, so sign problems do not arise. If there is no other alternative, we
can always represent the absolute magnitude of a quantity using a logarithm
and store the sign of the quantity in a separate variable.
one feels that it ought to be possible to combine the estimates from the two
simulations to give a better estimate of (Q). Indeed, since every simulation
gives an estimate (however poor) of the value of (Q) at every temperature,
we should be able to combine all of these estimates in some fashion (presum-
ably giving greater weight to ones which are more accurate) to give the best
possible figure for (Q) given our several simulations. This in essence is the
idea behind the multiple histogram method, first proposed by Ferren-
berg and Swendsen in 1989. The multiple histogram method is in fact not
particularly closely related to the single histogram method of Section 8.1,
and it is probably better to regard it as an entirely new technique, rather
than an extension of the ideas of the last section.
There are a variety of ways in which we might reasonably combine esti-
mates of an observable derived from several different Monte Carlo calcula-
tions. The most obvious approach is to take some kind of weighted average
of the single histogram extrapolations from each of our simulations. It turns
out, however, that this approach is rather error-prone and does not give good
results except when the quality of our Monte Carlo data is very high. Here
we describe an alternative prescription which concentrates instead on the
evaluation of a best estimate for the density of states. From this estimate,
we can then derive the value of any other observables we are interested in.
As we did in the last section, we will start off by considering the observable
E—the internal energy—which is the simplest case. In Section 8.2.2 we give
the appropriate generalization to other quantities.
For a Monte Carlo simulation with normal Boltzmann importance sam-
pling, the probability p(E) of generating a state with energy E on any one
time-step is
where the histogram N(E) is the number of times out of n that the system
was measured to have energy E. Equation (8.20) then tells us that the
2 We are assuming here that the system has discrete energies, as the Ising model for
example does. If it has a continuous spectrum of energies, as in the Heisenberg model
for example, then ρ(E) is replaced by ρ(E) dE, the number of states in an interval dE,
and the following equations are modified appropriately. As it turns out however, the final
equations for the multiple histogram interpolation in the continuous case are exactly the
same as for the discrete case considered here.
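For reference, the standard relations underlying this passage, reconstructed from the surrounding definitions rather than quoted from the original equations, are

```latex
p(E) = \frac{\rho(E)\,\mathrm{e}^{-\beta E}}{Z},
\qquad
\tilde\rho_i(E) = \frac{N_i(E)}{n_i}\, Z_i\, \mathrm{e}^{\beta_i E},
```

where ρ(E) is the density of states and the second relation gives the estimate of ρ(E) provided by simulation i, performed at inverse temperature βi with ni measurements.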
8.2 The multiple histogram method 221
Since ρ(E) is a single function which depends only on the system we are
studying, and not on the temperature, each of these estimates ρi(E) is an
estimate of the same function. Now we ask ourselves, "What is the best
estimate we can make of the true density of states, given these many differ-
ent estimates from our different simulations?" To make the point clearer,
look at Figure 8.3, in which we have plotted the histograms Ni(E) of ener-
gies measured in five simulations of a two-dimensional Ising model at five
different temperatures. As we can see, the simulations sample different but
overlapping ranges of energy. Clearly Equation (8.23) will give a fairly good
measure of the density of states in a region where the corresponding histo-
gram Ni(E) has many samples. But in other regions where it has few (or
none at all) it will give a very poor estimate. What we want to do is form
a weighted average over the estimates ρi(E) to get a good estimate over
the whole range of energies covered by our histograms. This average should
give more weight to individual estimates in regions where the corresponding
histograms have more samples.
The standard way to perform such a weighted average is as follows. If we
have a number of measurements xi of a quantity x, each of which has some
associated standard error σi, then the best estimate of x is3
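In standard form (a reconstruction consistent with the footnote's weights wi = 1/σi²):

```latex
\bar{x} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}.
```

This is the usual inverse-variance weighted mean: measurements with smaller errors contribute more heavily to the estimate.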
In fact, and this turns out to be an important point, the true error
is a function of the average histogram N̄i(E) at inverse temperature βi,
taken over many runs.5
3 This assumes that the errors are normally distributed. As we will see, the errors on our
histograms are Poissonian, but it is a reasonable assumption to approximate Poissonian
errors as Gaussian in this case.
4 It is in fact not too difficult to prove this formula. Writing x as a weighted average
over the measurements, with weights wi, thus:
we can then calculate the difference between this estimate and the true value x, which
we don't know. Then we square this quantity and minimize, so as to make x as close
as possible to the true value. Making the assumption that the errors σi in the different
measurements are all independent random variables, we can then show that the best
estimate is achieved when wi = 1/σi².
5 Similarly, we can make an estimate of the standard deviation of a quantity x by
making a number of measurements xi and calculating the standard deviation of the
sample. However, the true standard deviation is only given by taking an infinite number
of such measurements.
To clarify what we mean here, imagine making a very large number of sim-
ulations all at inverse temperature βi, taking ni measurements of E in each,
forming a histogram of each run, and then averaging the histograms bin by
bin. The square root of these average bins would then give us the correct
estimate of the error ΔNi(E) on any one histogram. The reason that this is
important is that N̄i(E) is related to ρ(E) thus (see Equation (8.23)):
where we have used Equation (8.27). Recalling (Equation (8.24)) that the
weights in our weighted average over the ρi(E) are 1/σi², this shows, as
we contended above, that the best estimate of the true ρ(E) is arrived at
by weighting each ρi(E) in proportion to the number of samples in the
corresponding histogram at that energy.6 Performing the weighted average,
our best estimate of ρ(E) is thus
where we have used Equation (8.27) again to get rid of the quantities Ni(E),
whose values we don't know.
Unfortunately however, we are not finished yet. This expression is still of
little use to us, because it contains the partition functions Zj of the system
at each of the simulated temperatures βj on the right-hand side. These are
6 Strictly, it's the ideal histogram N̄i(E), not the measured one.
also quantities which we don't know. We get around this by noting that the
partition function itself is given by
Combining this with Equation (8.30) we can then get an estimate of, for
instance, the internal energy:
Note that, just as with the single histogram method, the energies E appear-
ing in all of these formulae are total energies, not energies per spin. If we
use energies per spin the method will give erroneous results.
The multiple histogram method is undoubtedly quite a complicated tech-
nique, but in practice it gives excellent results. As an example, consider
Figure 8.4. Here we have taken the five Ising simulations which produced
the histograms in Figure 8.3 and calculated the internal energy for each one
by a direct average over the measured energies. These five measurements are
represented by the five large dots. Then we have used the five histograms
to evaluate the partition function at each of the five temperatures by iter-
ating Equation (8.31) until it converges to a stable fixed point. Then we
have calculated the internal energy over the entire range of temperatures
using Equations (8.32) and (8.33). The results are shown as the solid line in
the figure. As we can see, the line passes neatly through each of the large
dots, and interpolates smoothly between them. Finally, we have performed
a number of further simulations at temperatures intermediate between the
first simulations, as a check of the histogram interpolation. From these we
have made direct measurements of the internal energy which we show as the
open circles on the figure.
We can also calculate the errors on our interpolated results by applying
any of the methods described in Section 3.4. For example, we could repeat
the entire histogram procedure several times over with bootstrap resampled
data sets (Section 3.4.3) drawn from the data produced by the simulations,
and calculate the variance of the interpolated results over all repetitions. In
fact, in the case of the calculation shown in Figure 8.4, the error on U is
smaller than the width of the line, so there isn't much point in our including
it on the graph.
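The bootstrap procedure just described might be sketched like this, with the full histogram analysis abstracted into an `estimator` function for brevity:

```python
import random

def bootstrap_error(data, estimator, n_resample=200):
    """Bootstrap error estimate (Section 3.4.3): resample the measurements
    with replacement, repeat the whole analysis on each resampled set, and
    take the spread of the results as the error bar on the estimate."""
    n = len(data)
    results = []
    for _ in range(n_resample):
        resample = [random.choice(data) for _ in range(n)]
        results.append(estimator(resample))
    mean = sum(results) / n_resample
    variance = sum((r - mean) ** 2 for r in results) / n_resample
    return variance ** 0.5
```

In the histogram case the `estimator` would re-run the entire interpolation on the resampled data, which is cheap compared with the original simulations.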
As the figure shows, the histogram interpolation approximates extremely
well the true value of the internal energy in the regions between the five initial
simulations. This is the power of the multiple histogram method: it allows
us to perform a small number of simulations at different temperatures and
interpolate between them to get a good estimate of an observable over the
entire range,7 thereby saving us the effort of performing more simulations at
intermediate temperatures. The amount of CPU time involved in performing
7 Actually, the method also allows us to extrapolate a short distance outside the range
sampled by the simulations—it extrapolates well over the same range as the single his-
togram method (see Equation (8.10)).
the iteration of Equation (8.31) is small compared with the amount needed
to perform most Monte Carlo simulations, especially on large systems, so the
method can save us a lot of time when we want to measure some quantity
over a large range of temperatures.
8.2.1 Implementation
Before moving on to other topics, we should say a few words on how you
actually implement the multiple histogram method. First, how widely spaced
should our Monte Carlo simulations be to allow accurate interpolation using
the multiple histogram method? Well, the range over which any one of
our estimates pi(E) of the density of states is valid is the range over which
the corresponding histogram Ni(E) is significantly greater than one. Thus,
the linear combination used to estimate p(E) will only give good results over
the entire temperature range if the histograms from the different simulations
overlap, just as we suggested they should earlier. As in the case of the single
histogram method, we can write a criterion for this overlap in terms of the
standard deviations of the samples in the histograms, or in terms of the
specific heat. A simple version, which works perfectly well in most cases, is
just to double the temperature range allowed by Equation (8.10) and make
this the temperature separation of adjacent simulations.
The complicated part of the histogram calculation is the iteration of
Equation (8.31) to calculate the partition functions Zk. There are a number
of tricks which are useful here. The main problem is that the values of
the partition functions can become very large or very small—recall that the
partition function is exponentially dependent on the total energy E of the
states of the system. We can alleviate this problem somewhat by normalizing
the partition functions. Notice that Equation (8.31) remains unchanged if
we multiply all the Zk by the same normalization factor A. Furthermore,
any such normalization will cancel out of Equation (8.33) when we come to
calculate U. It makes sense to normalize the partition functions by some
factor which reduces the chances of their either over- or under-flowing the
numerical range of the computer (about 10±300). A simple way to achieve
this is to set A equal to one over the geometric mean of the largest and
smallest partition functions at each iteration:
This makes the normalized values of Zlarge and Zsmall reciprocals of one
another, and therefore equally far away from the limits of precision of the
machine.
It is not unusual however, for the ratio Zlarge/Zsmall to exceed even the
600 orders of magnitude over which a typical computer can represent real
numbers accurately, in which case the trick above will not be sufficient and
at least one of our partition functions will pass outside the allowed range,
causing the histogram method to fail. This is particularly common with
large system sizes, since the value of the partition function increases expo-
nentially with the volume of the system simulated. In this case, we can use
the same trick as we did in the single histogram method and work with the
logarithms of the partition functions. (The partition function is always pos-
itive, so there is no danger of our trying to take the logarithm of a negative
number.) To evaluate the sums in Equations (8.31), (8.32) and (8.33) we
use Equation (8.12).
Another issue is the question of what the starting values of the Zk should
be, and how many iterations we need to perform to calculate their final
value. The starting values are in fact not particularly important. The only
crucial point is that they should be greater than zero. In the calculations
for Figure 8.4 we used starting values of Zk = 1 for all k. If we were
performing the calculation using the logarithms of the partition functions,
then the equivalent starting values of the logarithms would be zero. As
for the number of iterations we should use, we can estimate this from the
amount by which the partition functions change on each iteration. In the
calculations presented here, we gauged this by calculating the quantity
where Zk(m) is the value of Zk at the mth iteration. When this quantity falls
below some predefined target value ε² we know that the fractional change
per iteration in the most quickly changing of the Zk is less than ε. In
our calculations we used a value of ε = 10−7. (Strictly, this gives us no
guarantees about the accuracy of our results, since we have no information
about the speed with which the iteration converges. In practice however,
it converges very fast—exponentially in fact—and the criterion above gives
perfectly good results.)
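The iteration loop itself can be sketched as follows. This is an illustrative implementation of our own, assuming the histograms and partition functions fit within the machine's floating-point range (otherwise, work with logarithms as described above); the convergence test follows the fractional-change criterion just described.

```python
import math

def iterate_partition_functions(hists, ns, betas, eps=1e-7, max_iter=10000):
    """Self-consistent iteration for the Z_k of the multiple histogram
    method (a sketch of Equation (8.31)).  hists[i] maps energy E to the
    histogram count N_i(E) from run i at inverse temperature betas[i];
    ns[i] is the number of measurements in run i."""
    energies = sorted({E for h in hists for E in h})
    NE = {E: sum(h.get(E, 0.0) for h in hists) for E in energies}
    Z = [1.0] * len(betas)                     # starting values Z_k = 1
    for _ in range(max_iter):
        Znew = []
        for bk in betas:
            total = 0.0
            for E in energies:
                denom = sum(n * math.exp(-b * E) / z
                            for n, z, b in zip(ns, Z, betas))
                total += NE[E] * math.exp(-bk * E) / denom
            Znew.append(total)
        # normalize by the geometric mean of the largest and smallest Z_k
        A = math.sqrt(max(Znew) * min(Znew))
        Znew = [z / A for z in Znew]
        # fractional-change convergence test
        change = max((zn - zo) ** 2 / zn ** 2 for zn, zo in zip(Znew, Z))
        Z = Znew
        if change < eps ** 2:
            break
    return Z
```

Only ratios of the Zk are physically meaningful here; the overall normalization cancels out of the final averages, as noted in the text.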
A further point worth noting is that, as with the single histogram method,
it is not necessary to express all the formulae in terms of actual histograms
N(E). In fact, it is often more convenient to work with the raw data them-
selves. We can rewrite the fundamental iterative relation, Equation (8.31),
as follows:
where the sum over s is over all states sampled during the ith simulation,
and Eis is the total energy of such a state. Equation (8.32) can similarly be
rewritten
As with the single histogram method, these forms of the equations are also
better if we are studying systems with continuous energy spectra, since in
that case the construction of a histogram necessarily entails throwing away
some of the information from our Monte Carlo simulation. The equations
above by contrast make use of all the available information and so will in
general give a more accurate answer for U.
chapter is devoted. In this section we study the most widely used technique
for extracting the values of exponents, finite size scaling. In Section 8.4
we study "Monte Carlo renormalization group" techniques, which are more
complex but can in some cases give better answers.
then in the thermodynamic limit (i.e., infinite system size) the behaviour of
the correlation length ξ in the critical region is given by
This equation is only useful below the critical temperature however, since m
is zero for T > Tc.
How should we go about measuring these critical exponents? The most
obvious approach is the direct one, simply to fit the data from our Monte
Carlo simulations to the suggested asymptotic forms. For example, in Fig-
ure 8.5 we show a number of measurements of the magnetic susceptibility of
a 1000 × 1000 two-dimensional Ising model below its critical temperature,
made using the Swendsen-Wang cluster algorithm of Section 4.4.1. We have
plotted them on logarithmic scales against the values of the reduced temper-
ature t and, as the figure shows, they tend to what appears to be a straight
line as we approach the critical temperature at t = 0. The slope of this line
should give us the value of the critical exponent γ defined in Equation (8.42).
8.3 Finite size scaling 231
Fitting a straight line to the last twenty or so points, as shown in the fig-
ure, gives us a value of γ = 1.76 ± 0.02, which compares very well with the
known exact result γ = 7/4. We could probably improve on this value still
further by using the multiple histogram method of Section 8.2 to interpolate
between the points in the figure. However, there is in fact little point in
doing this, because this direct method of measuring exponents turns out to
have a number of problems which make it unsuitable for general use. One
problem is already evident in Figure 8.5. The simple power-law form of
the susceptibility is only followed closely when we are very near the critical
temperature. Away from Tc the line in the figure is no longer straight. In
order to perform a fit to the data, we need to estimate where the critical
region ends, and the value we get for the exponent γ varies depending on
this estimate. This makes the error on γ difficult to estimate. Furthermore,
some systems, such as the random-field Ising model of Section 6.1.1, show
two different values of their critical exponents. The real one is only seen
if you get sufficiently close to the critical temperature. Further away the
system appears to behave according to Equation (8.42), or the equivalent
for the exponent of interest, but with the wrong value of the exponent. Such
"cross-over" effects are very difficult to allow for if we don't know where the
cross-over actually takes place.
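To make the direct method concrete, here is a minimal Python sketch of the log-log fit described above. The data are synthetic: a power law χ ∝ t^(−7/4) with a hypothetical correction term and 1% noise standing in for real Monte Carlo measurements, so the numbers illustrate the procedure rather than reproduce Figure 8.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic susceptibility data: chi ~ t^(-7/4) with a hypothetical
# correction term (1 + 0.5 t) that spoils the power law away from Tc,
# plus 1% statistical noise.
t = np.logspace(-3.0, -0.5, 30)              # reduced temperatures
chi = t**(-1.75) * (1.0 + 0.5 * t)
chi *= rng.normal(1.0, 0.01, size=t.size)

# Fit a straight line to log(chi) vs log(t) over the points nearest
# the critical temperature, where the asymptotic form holds best.
near = slice(0, 20)                          # the twenty smallest t
slope, _ = np.polyfit(np.log(t[near]), np.log(chi[near]), 1)
gamma_est = -slope
print(gamma_est)
```

Moving the window `near` further from t = 0 pulls the estimate away from 7/4, which is precisely the window-dependence problem described above.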
Another problem is that in order to calculate the reduced temperature t
we need to know the value of the critical temperature (see Equation (8.40)).
232 Chapter 8: Analysing Monte Carlo data
In general we don't know this value, which makes it difficult to perform the
fit. It is possible to guess Tc and then vary the guess to make the line in
Figure 8.5 as close to straight as possible. However, this process is highly
susceptible to error. The curvature which is already present in the line as
we move away from Tc can be particularly misleading when we are trying
to gauge the straightness of the line, and it also turns out that rather small
miscalculations in the value of Tc can lead to large errors in the measured
critical exponent.
There are also other reasons for avoiding the direct method of measuring
exponents. It requires simulations of rather high accuracy to give reasonable
results, and it requires us to work with large systems to avoid problems with
"finite size effects" (see below). In short, the method is to be avoided. In its
place another technique has been developed which circumvents all of these
problems: finite size scaling.
χ = ξ^(γ/ν) χ₀(L/ξ). (8.46)

The precise way in which the susceptibility gets cut off close to Tc is contained
in the functional form of χ₀. It is this function which we will measure in our
Monte Carlo simulations.
Equation (8.46) in fact contains all the information we need about the
behaviour of our system with varying system size. However, it is not in a very
useful form, since it still contains the variable ξ, the correlation length at
temperature t in the infinite system, which we don't know. For this reason
it is both conventional and convenient to reorganize the equation a little.
Defining a new dimensionless function χ̃ thus:

χ̃(x) = x^(−γ) χ₀(x^ν),
In fact, to be strictly correct, we should have two equations such as this, one
for positive and one for negative values of t with different functions χ̃, since
the behaviour of χ is not symmetric on the two sides of the phase transition.
However, we can easily combine these two equations into one by extending
the definition of χ̃(x) to negative values of x. Then we can write

χ = L^(γ/ν) χ̃(L^(1/ν) t). (8.51)
This is the basic equation for the finite size behaviour of the magnetic sus-
ceptibility. It tells us how the susceptibility should vary with system size L
for finite systems close to the critical temperature. Note that we have de-
rived this equation for the susceptibility per spin as defined in Section 1.2.2.
If we were to use the extensive susceptibility, the leading power of L would
be L^(γ/ν+d) instead of just L^(γ/ν), where d is the dimensionality of the
system. It is very important to recognize this distinction if you want to get the
correct answers for exponents using the finite size scaling method. All the
equations given in this section are correct for intensive quantities but need
to be modified if you are going to use extensive ones.8
Equation (8.51) contains the unknown function χ̃(x), which we call the
scaling function for the susceptibility. Although the scaling function is
unknown, there are certain things we do know about it. Equation (8.48)
tells us that χ̃(x) tends to a finite constant as x → 0. In other words, χ̃
is finite at the origin, which in this case means close to the critical
temperature. Another important point is that, by design, all the L-dependence
of χ is displayed explicitly in Equation (8.51); the scaling function does not
contain any extra hidden dependence on L which is not accounted for. In
other words, if we measure χ̃(x) we should get the same result regardless of
the size of the system. It is this last fact that allows us to use
Equation (8.51) to calculate the exponents γ and ν and the value of
the critical temperature.
Suppose we perform a set of Monte Carlo calculations of the system of
interest for a variety of different system sizes L over a range of temperatures
close to where we believe the critical temperature to be. (With this method
we do not need to be exactly at the critical temperature, only in the rough
vicinity. We can estimate where this region is by, for example, looking for
the tell-tale peak in the magnetic susceptibility or the specific heat—see
Figure 3.10.) For each system size, we measure the magnetic susceptibility
χ_L(t) at a set of temperatures t. We can now rearrange Equation (8.51) thus:

χ̃(L^(1/ν) t) = L^(−γ/ν) χ_L(t), (8.53)

to get an estimate of the scaling function χ̃ for several different values of the
scaling variable

x = L^(1/ν) t (8.54)
for each system size. Since the scaling function is supposed to be the same
for all system sizes, these estimates should coincide with one another—they
should all fall on the same curve if we plot them together on one graph.
However—and this is the crucial point—this will only happen if we use the
correct values of the exponents γ and ν in Equation (8.53). Also, although
it's not immediately obvious from the equation, we must use the correct
value of the critical temperature Tc, which enters in the calculation of the
reduced temperature t through Equation (8.40). The idea behind finite size
scaling therefore is to calculate χ̃(x) for each of our different system sizes,

8 The reader might be interested to work out where in the preceding derivation the
extra powers of L would come in if we were to use the extensive susceptibility.
FIGURE 8.6 Data collapse of magnetic susceptibility data for the two
dimensional Ising model. The points are taken from Monte Carlo mea-
surements of the susceptibility for five different sizes of system as indicated.
From this collapse we find γ = 1.76, ν = 1.00 and Tc = 2.27J.
Notice that the collapse fails once we get sufficiently far away from the
critical temperature (t = 0).
and then vary the exponents γ and ν and the critical temperature until the
resulting curves all fall or collapse on top of one another. An example of
such a calculation is shown for the two-dimensional Ising model in Figure 8.6.
Here we performed simulations around the critical temperature for square
systems of size L = 10, 20, 50, 100 and 200. The best data collapse is
obtained when γ = 1.76 ± 0.01, ν = 1.00 ± 0.05 and Tc/J = 2.27 ± 0.01.
These values are in good agreement with the exact known values of γ = 7/4,
ν = 1 and Tc = 2.269J. (The error bars are quite rough here, calculated
by estimating the region over which the collapse appears optimal. Error
estimation is discussed in more detail in the following section.)
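The collapse itself can be checked in a few lines of Python. Here we fabricate susceptibility data that obey the scaling form exactly, using a hypothetical scaling function f, and confirm that rescaling by L^(−γ/ν) and plotting against L^(1/ν)t makes all system sizes coincide; with real Monte Carlo data the curves would coincide only for the correct γ, ν and Tc.

```python
import numpy as np

gamma, nu = 1.75, 1.0
f = lambda x: 1.0 / (1.0 + x**2)          # hypothetical scaling function

sizes = [10, 20, 50, 100, 200]
t = np.linspace(-0.1, 0.1, 41)            # reduced temperatures
curves = {}
for L in sizes:
    chi_L = L**(gamma / nu) * f(L**(1.0 / nu) * t)   # fabricated data
    x = L**(1.0 / nu) * t                 # scaling variable
    curves[L] = (x, chi_L * L**(-gamma / nu))        # rescaled curve

# With the correct exponents every rescaled curve is the same function:
for L, (x, y) in curves.items():
    assert np.allclose(y, f(x))
print("collapse OK")
```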
This method can easily be extended to quantities other than the suscep-
tibility. For the Ising model for example, we can derive scaling equations
similar to (8.51) for the specific heat and the magnetization by arguments
closely similar to the ones given above. The results are:

c = L^(α/ν) c̃(L^(1/ν) t), (8.55)
m = L^(−β/ν) m̃(L^(1/ν) t). (8.56)

Performing data collapses using these equations yields values for α and β,
as well as values for ν and Tc again. (If we perform collapses for a number
of different quantities, we can use the several values of ν and Tc which
This however is a little difficult to estimate, since the points at which the
scaling function is evaluated are different for each system size. What we need
is some way of interpolating between these points, and the perfect technique
is provided by the multiple histogram method of Section 8.2. If we use this
method, then we can directly evaluate (8.57) by using for example a simple
trapezium rule integration (or any other form of numerical integration that
we happen to favour) and then minimize it to give an estimate of the critical
exponents and Tc. Bootstrapping then gives an estimate of our statistical
errors.
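A sketch of such an objective function in Python. The interpolation here uses simple linear interpolation (np.interp) on a common grid as a stand-in for the multiple histogram method, and collapse_residual is our hypothetical name, not a standard routine; minimizing it over γ, ν and Tc is the data-collapse fit described above.

```python
import numpy as np

def collapse_residual(data, gamma, nu, tc):
    """Integrated spread between the rescaled curves; data maps a
    system size L to arrays (T, chi). Smaller means a better collapse."""
    rescaled = []
    for L, (T, chi) in data.items():
        t = (T - tc) / tc                    # reduced temperature
        x = L**(1.0 / nu) * t                # scaling variable
        y = chi * L**(-gamma / nu)           # scaling-function estimate
        rescaled.append((x, y))
    lo = max(x.min() for x, _ in rescaled)   # overlap region only
    hi = min(x.max() for x, _ in rescaled)
    grid = np.linspace(lo, hi, 201)
    ys = np.array([np.interp(grid, x, y) for x, y in rescaled])
    v = ys.var(axis=0)                       # spread across system sizes
    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * dx))  # trapezium rule

# Fabricated data obeying the scaling form exactly:
gamma_true, nu_true, tc_true = 1.75, 1.0, 2.269
f = lambda x: 1.0 / (1.0 + x**2)             # hypothetical scaling function
data = {}
for L in (10, 20, 50):
    T = np.linspace(2.0, 2.6, 61)
    t = (T - tc_true) / tc_true
    data[L] = (T, L**(gamma_true / nu_true) * f(L**(1.0 / nu_true) * t))

good = collapse_residual(data, 1.75, 1.0, 2.269)
bad = collapse_residual(data, 1.40, 1.0, 2.269)
print(good, bad)
```

With the correct exponents the residual is essentially zero; a wrong γ makes it strictly larger, which is what the minimization exploits.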
There is still a problem however, which is evident in Figure 8.6: if we
stray too far from the critical temperature, our scaling equation (8.51) is
no longer correct, simply because we are no longer in the critical region.
the scaling function is the same for each system size, the values x0 of the
scaling variable at which we find these maxima should be the same for all
system sizes. From Equations (8.40) and (8.54) we see that the temperature
T0 corresponding to x0 is given by

T0 = Tc (1 + x0 L^(−1/ν)). (8.58)
Thus, if we plot T0 against L^(−1/ν), the resulting points should lie on a
straight line, provided we use the correct value of ν. And the intercept of
this line with the vertical axis gives us an estimate of Tc. So, for example,
we can vary ν, fitting our data points to a straight line by least squares, and
choose the value of ν which minimizes the variance of the fit. The result is
an estimate of both ν and the critical temperature. As before, we can use any of our
standard methods of error estimation to calculate the errors on these values.
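A Python sketch of this fit. The peak temperatures T0(L) below are fabricated to lie exactly on a straight line in L^(−1/ν) with intercept Tc, standing in for measured peak positions; we scan ν, keep the value that makes the least-squares fit straightest, and read off Tc from the intercept.

```python
import numpy as np

# Hypothetical peak temperatures T0(L), fabricated to lie on a straight
# line in L^(-1/nu) with Tc = 2.269 and nu = 1.
sizes = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
tc_true, nu_true, x0 = 2.269, 1.0, 1.8
T0 = tc_true * (1.0 + x0 * sizes**(-1.0 / nu_true))

def fit(nu):
    """Least-squares line through T0 vs L^(-1/nu); returns the sum of
    squared residuals and the fitted (slope, intercept)."""
    X = sizes**(-1.0 / nu)
    coeffs, residuals, *_ = np.polyfit(X, T0, 1, full=True)
    return (residuals[0] if residuals.size else 0.0), coeffs

# Scan nu and keep the value that gives the straightest line; the
# intercept of that line is our estimate of Tc.
nus = np.linspace(0.5, 1.5, 101)
best_nu = min(nus, key=lambda nu: fit(nu)[0])
slope, tc_est = fit(best_nu)[1]
print(best_nu, tc_est)
```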
Once we have estimated ν and Tc in this fashion, we can use them to make
estimates of the values of the other exponents. Since the scaling function is
the same for all system sizes, its value χ̃(x0) at x0 should be independent of
L. The value of the susceptibility at its maximum should therefore take the
particularly simple form

χ_max = L^(γ/ν) χ̃(x0), (8.59)

with χ̃(x0) being the constant of proportionality. Thus if we plot the maximum
value of χ as a function of L on logarithmic scales we should again get
a straight line. The slope of this line gives us a measure of γ/ν, and hence
γ, since we already know ν. We can calculate an error on our figure just as
before.
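The corresponding log-log fit for γ/ν is equally short; here the peak susceptibilities are again fabricated to scale exactly as L^(γ/ν):

```python
import numpy as np

# Hypothetical peak susceptibilities scaling exactly as L^(gamma/nu).
sizes = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
chi_max = 0.3 * sizes**1.75

# Slope of the log-log plot gives gamma/nu; with nu known from the fit
# to the peak positions we recover gamma itself.
gamma_over_nu, _ = np.polyfit(np.log(sizes), np.log(chi_max), 1)
nu = 1.0                                 # assumed known from the T0 fit
gamma_est = gamma_over_nu * nu
print(gamma_est)
```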
In Figure 8.8 we show an example of the application of this method,
again to the random-field Ising model, taken from Rieger (1995). The main
figure shows the calculation of ν and Tc from Equation (8.58) and the inset
shows the calculation of γ. The results for the exponents, ν = 1.1 ± 0.2
and γ = 1.7 ± 0.2, are in respectable agreement with those from the other
method.
A third problem with the finite size scaling method as we have described it
here is that scaling equations such as Equation (8.51) are only approximate,
even close to the critical point. In particular, the arguments leading to the
scaling equations are only valid for sufficiently large system sizes. If the
system size L becomes small—how small depends on the particular model
being studied—there are correction terms in the equations which become
important. These terms can lead to systematic errors in the values of our
exponents if they are not taken into account in the analysis. In fact, except
for very high-resolution studies, such corrections are usually not important.
Furthermore, a detailed discussion of them would take us some way away
from the Monte Carlo simulations which are the principal topic of this book.
For more information on corrections to the scaling forms discussed here, we
therefore refer the reader to other sources. For the case of the normal Ising
model a good discussion has been given by Ferrenberg and Landau (1991).
Binder and Heermann (1992) also cover the subject in some detail.
there were more ups or downs in the original block of four spins. If there
were exactly two ups and two downs then we choose the value of the new
block spin at random. As we can see, the blocked system preserves the gross
features of the spin configuration of the original system, but only has one
quarter as many spins. This is made clearer in the remaining two frames
of the figure, in which we have performed the same procedure on a larger
200 x 200 Ising model, reducing it by blocking to a 100 x 100 one. It is
clear from these frames that the blocked system does indeed preserve the
large-scale features of the configuration of the system.
The blocking procedure used here shrinks the size L of the system (mea-
sured in lattice spacings) by a factor of 2 in each direction. This rescaling
factor is normally denoted b, and the number of spins in the system decreases
on blocking by a factor of b^d, where d is the dimensionality of the system. The
choice b = 2 is a common one for square or cubic lattices. For other lat-
tices other choices are appropriate. For a triangular lattice, for example, the
most common blocking scheme groups the spins in blocks of three, giving a
rescaling factor of b = √3.
The "majority rule" described here is not the only way of performing
the blocking. Many others have been suggested, with different ones being
appropriate for different systems. For Ising models another common choice
is the decimation procedure, in which we discard all but one spin from
each block and set the new block spin equal to that one. In the example of
Figure 8.9 for instance, we might discard all but the top left spin of each
block of four and use the value of this one spin for our new block spin. The
decimation method also works well with Potts models (see Section 4.5.1).
For models with a continuum of spin values a simple additive rule may be
appropriate, in which the renormalized spin is a (suitably normalized) sum of
the spins in the original block. For the moment, however, let us concentrate
on the simple case of the Ising model and the majority rule.
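A majority-rule blocking step with b = 2 is easy to implement. The sketch below works on a square lattice of ±1 Ising spins with an even number of sites on a side; ties in a block of four (two up, two down) are broken at random, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_majority(spins, rng):
    """One blocking step with b = 2: each 2 x 2 block of +-1 spins is
    replaced by a single spin pointing the way of the majority, with
    ties broken at random."""
    L = spins.shape[0]
    # Sum the four spins in each 2 x 2 block.
    sums = spins.reshape(L // 2, 2, L // 2, 2).sum(axis=(1, 3))
    blocked = np.sign(sums)                   # -1, 0 or +1
    ties = blocked == 0                       # two up, two down
    blocked[ties] = rng.choice([-1, 1], size=int(ties.sum()))
    return blocked

spins = rng.choice([-1, 1], size=(200, 200))  # a random 200 x 200 state
blocked = block_majority(spins, rng)          # reduced to 100 x 100
print(blocked.shape)
```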
We now come to the crucial assumption of the renormalization group
method. We assume that the blocked spin configuration is a typical spin
configuration of another Ising model on a lattice of dimension L' = L/b. In
other words, imagine sampling a set of states of our original Ising system,
each with the correct Boltzmann probability for some temperature T, for
example by performing a Monte Carlo simulation. For each state we per-
form the blocking procedure described above and generate a blocked state
of the smaller system. Our assumption is that the series of blocked states
we generate also appear with their correct Boltzmann probabilities, if we
calculate their energies using the same Ising Hamiltonian as we used for the
original system. In fact, as we might well imagine, this assumption is not
normally correct and this is the primary source of the uncontrolled errors in
the method, to which we alluded earlier. However, for the moment let us
accept this assumption and see where it leads us.
It is clear that the blocked states cannot normally appear with the Boltz-
mann probabilities appropriate to the same temperature T as the original set
of states which we started with. To see this, consider the correlation length
£ of the system. (See Section 3.7.1 for a definition of the correlation length.)
When we block the system, the average correlation between two spins which
are far apart must remain approximately the same, since, as we pointed out,
the large-scale features of the system do not change on blocking. This means
that the correlation length of the system should also stay the same, except
that the number of spins in the system decreases by a factor of b in each
direction. This in turn means that, when measured in terms of the lattice
spacing, the correlation length of the blocked system is

ξ' = ξ/b. (8.60)
(In general, we will use primed variables to denote the values of quantities in
the blocked system.) In the case of the systems illustrated in Figure 8.9 we
8.4 Monte Carlo renormalization group
have b = 2, so that the rescaled system has a correlation length a half that of
the original system, when measured in terms of the lattice parameter. As we
know, the correlation length varies with temperature, so that the blocked
states of the system must be typical of a different temperature from the
original ones. We assume then that the blocked states are typical of an Ising
model at a temperature T' for which the correlation length is a half that of
the original system.
Now suppose that we calculate some measurable property of our system,
such as, for example, the internal energy per spin u, and average it over
our complete sequence of states. Since internal energy is also a function
of temperature, we will presumably get two different answers for the aver-
age internal energies u and u' of the original and rescaled systems. To the
extent that the blocked states appear with the correct Boltzmann weights
for a system at temperature T', the internal energy u' of the rescaled sys-
tem is presumably the appropriate internal energy for the system at that
temperature.
Here, then, is the clever bit: there is one temperature at which the corre-
lation length of the system is the same for the original and rescaled systems
and that is the critical temperature. At this temperature, as discussed in
Section 3.7.1, the correlation length becomes infinite, which means that the
rescaled correlation length, Equation (8.60), is also infinite. At this one
point, therefore, our original and rescaled systems have the same correlation
length and hence the same temperature T = T' = Tc. They also therefore
have the same values of all other (intensive) quantities such as the internal
energy per spin. So here is our scheme for calculating the critical tempera-
ture:
1. We perform a Monte Carlo simulation at temperature T and calculate
the internal energy u.
2. We take each of the states generated by our simulation and block them,
using for example the majority rule blocking scheme described above.
3. We calculate the internal energy u' for the blocked system, averaged
over all of these blocked states.
4. Now we vary the temperature T until we find the point at which u' = u.
This is the critical temperature of the system.
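Step 4 amounts to locating the crossing of the two curves u(T) and u'(T). The sketch below uses smooth hypothetical stand-ins for the measured curves (in a real calculation both would come from the simulation, reweighted with the histogram method) and finds the crossing by linear interpolation between temperature grid points.

```python
import numpy as np

# Hypothetical smooth stand-ins for the measured internal energies of
# the original system, u(T), and the blocked system, u'(T), chosen
# here to cross at T = 2.269.
T = np.linspace(2.0, 2.6, 61)
tc = 2.269
u = -1.2 + 0.8 * (T - tc)                # original system (fake data)
up = -1.2 + 1.5 * (T - tc)               # blocked system (fake data)

# Locate the sign change of u' - u and interpolate linearly inside
# that temperature interval to estimate the crossing point.
diff = up - u
i = np.flatnonzero(np.sign(diff[:-1]) != np.sign(diff[1:]))[0]
t_cross = T[i] - diff[i] * (T[i + 1] - T[i]) / (diff[i + 1] - diff[i])
print(t_cross)
```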
As a practical consideration, the part about varying the temperature can
be done most efficiently using the single or multiple histogram techniques of
Sections 8.1 and 8.2, rather than performing a separate simulation for every
temperature we are interested in. In order to extrapolate the value of the
internal energy of the rescaled system, we treat it as an observable quantity
in the original system (albeit one with a slightly unusual definition), and
reweight with the Boltzmann factors appropriate to the original system.
(Notice the arrangement of the primes on the variables here—it might not
be exactly what you expect, but it is correct.) T' is thus given in terms of
T by

T' = ξ^(−1)(ξ(T)/b). (8.62)
many blocks in which three or four spins are pointing up, and the majority
rule then transforms these all into upward-pointing blocked spins. In other
words, a minority of downward-pointing spins tends to get "washed out" in
the blocking process, leaving a larger majority of up-spins in the rescaled
system, a configuration typical of a lower temperature T' < T.
So how do we use our knowledge of the renormalization group transformation
to calculate a critical exponent? Consider the exponent ν, defined
in Equation (4.2), which we repeat here for convenience:

ξ = |t|^(−ν). (8.63)

Recall that the variable t is the reduced temperature, which measures our
distance from the critical point:

t = (T − Tc)/Tc. (8.64)

In our blocked system, the rescaled correlation length ξ' is also given by
Equation (8.63), except that we must substitute in the transformed temperature
T', giving

ξ' = |t'|^(−ν), where t' = (T' − Tc)/Tc.
Substituting into Equation (8.66) using (8.64), and rearranging, we then get

ν = ln b / ln(dT'/dT), (8.68)

with the derivative evaluated at T = Tc.
(We use the magnetization per site here again, although if, as we recommend,
you compare the magnetization of the blocked system with a separate simu-
lation of a system of the same size, it makes no difference whether you use the
total magnetization or the magnetization per site.) Using Equation (8.63),
we can rewrite this as

m = ξ^(−β/ν),    m' = ξ'^(−β/ν).

Dividing these two equations one by the other, and using Equation (8.60),
we get

m'/m = b^(β/ν),

and hence

β = ν ln(m'/m) / ln b. (8.73)
The trouble with this equation is that it is strictly only true in an infinite
system, since Equation (8.69) is only true in an infinite system. On the other
hand, if we did know the magnetization for the infinite system, we wouldn't
be able to calculate m'/m at Tc, because both m and m' are zero at this
point. Instead however we can use l'Hôpital's rule, which says that as m and
m' go to zero at Tc, the limiting value of their ratio is

lim m'/m = dm'/dm = (dm'/dT)/(dm/dT). (8.74)

And thus

β = ν ln(dm'/dm) / ln b. (8.75)
The clever thing about this equation is that the derivative dm'/dm does not
vary much as we vary the size of the system simulated, so that the equation
gives good results for finite systems and not just infinite ones. This makes
Equation (8.75) a better way to calculate β in a Monte Carlo simulation
than Equation (8.73). Normally, in a finite system we should interpret
m as meaning the absolute value of the magnetization, as we discussed in
Section 3.7.1.
Note the similarity of form between Equation (8.75) and Equation (8.68)
for the exponent ν. We can also, by a very similar line of argument, derive
equivalent equations for other exponents. For example, the exponents α and
γ are given by

α = −ν ln(dc'/dc) / ln b,    γ = −ν ln(dχ'/dχ) / ln b.

Note the minus signs, which arise from the slight difference between the
definitions of the exponents α and γ, and that of β.
In the simplest case, we can evaluate the derivatives numerically, as we
did for the calculation of ν, although we will shortly see that there is a
better way of evaluating these exponents which avoids the calculation of a
derivative altogether.11
To calculate this exponent we use a method analogous to that for the
calculation of β. First we look at the way in which the correlation length diverges
11 In fact, we can in this case do better by noting that dm'/dT and dm/dT can both
be calculated by analytically differentiating the single histogram formula, Equation (8.3),
with respect to T and then using Equation (8.74). We can do a similar thing for the
derivatives in Equations (8.76) and (8.77). Unfortunately, there is no equivalent trick for
the derivative in Equation (8.68).
transformation will work well for a particular model. As we saw, the major-
ity rule transformation of Section 8.4.1 worked well for the two-dimensional
Ising model, but other transformations might not have been so successful,
and the majority rule might not work so well with other models.
There is however a way to improve the accuracy of our renormalization
group transformation, regardless of the blocking scheme we use, although
some schemes will still give better results than others. The trick is to intro-
duce longer-range interactions into our Hamiltonian.
Consider once more the example of the two-dimensional Ising model.
For this model we perform a simulation at temperature T of some system
of size L x L, and block each state generated to give a sequence of states
of a system a factor of b smaller along each side. These states will to some
approximation be typical of an Ising model at another temperature T', but
this is only an approximation. We can make this approximation better by
regarding the blocked states instead as the states of a system which has extra
interaction parameters in its Hamiltonian. For example, we might introduce
a next-nearest-neighbour interaction, or longer interactions still. If we are
working with zero magnetic field, then symmetry dictates that all of these
interactions involve an even number of spins on the lattice. Thus three-spin
interactions are not allowed, but we might include perhaps the interactions of
groups of four spins in a square. It is clear that if we introduce enough such
interactions, we can approximate the behaviour of the blocked system to
any desired degree of accuracy, since the more parameters we add, the more
freedom we have to tune the Hamiltonian to take the desired value in any
particular state. It turns out in practice, however, that quite good results
can be achieved by including only a small number of interactions—usually
no more than five or six—if they are carefully chosen.
Let us write a general expression for the Hamiltonian of this generalized
Ising model thus:

H = − Σi Ki Si. (8.83)
Here we have absorbed the factor of the inverse temperature β which appears
in the Boltzmann weight into the coupling constants Ki. The quantities Si
represent sums over all sets of similar groups of spins on the lattice. For
example, one possible set is the set of all nearest neighbours:

Snn = Σ<ij> si sj.
Other possibilities are the set of next-nearest neighbours, and the set of all
groups of four spins in a square, as described above.
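For a concrete example, the sums Si for the three couplings just mentioned can be evaluated for a configuration of ±1 spins as follows; periodic boundary conditions are assumed, and the function names are ours.

```python
import numpy as np

def s_nn(s):
    """Nearest-neighbour sum over all bonds <ij> of s_i * s_j."""
    return (s * np.roll(s, 1, axis=0)).sum() + (s * np.roll(s, 1, axis=1)).sum()

def s_nnn(s):
    """Next-nearest-neighbour (diagonal) sum."""
    return ((s * np.roll(np.roll(s, 1, axis=0), 1, axis=1)).sum()
            + (s * np.roll(np.roll(s, 1, axis=0), -1, axis=1)).sum())

def s_square(s):
    """Sum over elementary squares of the product of their four spins."""
    return (s * np.roll(s, 1, axis=0) * np.roll(s, 1, axis=1)
            * np.roll(np.roll(s, 1, axis=0), 1, axis=1)).sum()

rng = np.random.default_rng(2)
spins = rng.choice([-1, 1], size=(16, 16))
print(s_nn(spins), s_nnn(spins), s_square(spins))
```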
Our normal Ising model with nearest-neighbour interactions only is defined
by setting all the Ki to zero except for the one which multiplies Snn.
However, even if we start off simulating this model, the blocked states gener-
ated in the simulation will usually be best represented using a model in which
some of the longer-range interactions are non-zero (although the nearest-
neighbour interaction is probably still the largest of the parameters Ki). In
other words, the renormalization group flows (see Section 8.4.2) move away
from the line in K-space representing the normal Ising model into the larger
space of the generalized model. However, and this is a crucial result, the
critical exponents of this generalized model are the same as those of the
original model which did not possess any longer-range interactions. We will
not actually prove this result here. It is another consequence of the phe-
nomenon of universality which we discussed in Section 4.1. It is proved in,
for example, Binney et al. (1992). Here, we will just make use of the result.
If we can find the values of the coupling constants K'i for the blocked sys-
tem near the fixed point of the renormalization group transformation in our
new higher-dimensional space of parameters Ki, then we can calculate the
critical exponents of the generalized model, which are the same as those for
the normal Ising model. However, the renormalization group transformation
for the generalized model is more accurate than the one we found for the
normal Ising model, since we have more parameters to play with, and this
improves the accuracy of our results. By introducing longer and longer range
interactions we can, in a moderately systematic way, improve the accuracy
of the calculation, as well as making a rough estimate of the error introduced
by truncating the Hamiltonian at interactions of some finite range.
We calculate the critical exponents for our generalized model as follows.
Denoting by K and K' two vectors whose elements are the couplings Ki
and K'i for the original and rescaled systems, we consider K' to be a func-
tion of K via a renormalization group transformation, just as we did in the
simpler one-parameter case of Section 8.4.2. Then, by analogy with
Equation (8.67), we linearize this transformation about the critical fixed
point, which is traditionally denoted K*. This gives

K'i − K*i = Σj Mij (Kj − K*j),    where Mij = ∂K'i/∂Kj evaluated at K*.
Now we change variables to a new set {xi} which are linear combinations
of the components Ki − K*i:

xi = Σj Qij (Kj − K*j),

so that

x'i = Σj [QMQ^(−1)]ij xj.

We choose the matrix Q to make QMQ^(−1) diagonal, and thus end up with

x'i = λi xi, (8.90)

where the λi are the eigenvalues of M. Given that the correlation length
diverges at the critical fixed point, we can define an exponent νi governing
that divergence along the direction of any of the eigenvectors of M thus:

ξ ~ |xi|^(−νi).

Now, using the same arguments which led us to Equation (8.68), we can
show that

νi = ln b / ln λi. (8.92)
Note that this gives a negative value of νi for any eigenvalue λi which is
less than one. There are some standard terms used to describe this
situation. Relevant variables are the variables xi which grow under the
renormalization group flows defined by the matrix M because the corresponding
eigenvalue in Equation (8.90) is greater than one. Irrelevant variables are
those which get smaller under the renormalization group flows because the
corresponding eigenvalue is less than one. In the special case where λi = 1
the corresponding variable xi is said to be marginally relevant. Only the
relevant variables contribute to the power-law divergence in the correlation
length as we approach the critical fixed point. Marginally relevant ones
can produce logarithmic divergences, though we have to examine each case
separately to determine whether they in fact do.
The number of relevant variables is also the number of independent critical
exponents in the model. It turns out that all the other exponents studied
earlier, such as α, β, γ and δ in the Ising model, can be calculated from the
values of the exponents νi defined in Equation (8.92). In the case of the Ising
model, it turns out that there are just two independent critical exponents,
the exponents ν and β encountered earlier. We should emphasize that it is
by no means obvious that there should only be two. There are many models
which have only one independent exponent, or more than two, and many
more for which we do not yet know how many there are. The number of
where d is the dimensionality of the system. We will not prove these relations
here; we merely state them for completeness. The interested reader can find
a proof in any good book on critical phenomena.
where the quantities S'i are the same as those appearing in Equation (8.83)
except that they are evaluated using the blocked spins, rather than the
original spins. Let us denote by (Si) and (S'i) our best estimates of the
values of Si and S'i averaged over the states sampled in our Monte Carlo
simulation. Now we write

Aij = ∂(S'i)/∂Kj = (S'i Sj) − (S'i)(Sj), (8.100)
Bij = ∂(S'i)/∂K'j = (S'i S'j) − (S'i)(S'j), (8.101)
where we have made use of the linear response theorem, Equation (1.22),
to calculate the expressions on the right-hand side. (The factor of β in
Equation (1.22) has disappeared because we absorbed a factor of β into our
definition of the quantities Ki and K'i in Equations (8.83) and (8.97).)
Our calculation then proceeds as follows: we evaluate the two matrices
A and B by calculating the correlation functions, Equations (8.100)
and (8.101), and from these we calculate M = B^(−1)A. Then we diagonalize
this matrix to find the eigenvalues λi, and for each λi > 1 we calculate
one independent exponent using Equation (8.92). From these, all
the other exponents can then be calculated, using scaling relations such as
Equations (8.93-8.96).
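The linear-algebra part of this procedure is compact in Python. The relation νi = ln b / ln λi used below is our reading of Equation (8.92); in a real calculation the matrices A and B would come from the measured correlation functions, so here we use a hypothetical 2 x 2 example constructed to have exactly one relevant eigenvalue.

```python
import numpy as np

def rg_exponents(A, B, b):
    """Form M = B^-1 A from the two measured matrices, diagonalize it,
    and return one exponent nu_i = ln b / ln lambda_i for each relevant
    eigenvalue (lambda_i > 1)."""
    M = np.linalg.solve(B, A)                # B^-1 A without inverting B
    eigvals = np.linalg.eigvals(M)
    return [np.log(b) / np.log(lam.real) for lam in eigvals if lam.real > 1.0]

# Hypothetical example: build A and B so that M has eigenvalues 2 and
# 0.5, i.e. exactly one relevant variable.
b = 2.0
M_true = np.diag([2.0, 0.5])
B = np.array([[1.0, 0.2], [0.0, 1.0]])
A = B @ M_true
nus = rg_exponents(A, B, b)
print(nus)                                   # one exponent, close to 1.0
```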
The only remaining issue which we have not addressed is the question
of how we find the position of the critical fixed point itself. In fact, as we
saw in Figure 8.12, the values of the critical exponents are not very sensi-
tive to the position at which we evaluate our eigenvalues, which is one of
the nice things about this method. However, we still need to make a rough
estimate of the position of the fixed point. In the case where we explicitly cal-
culated the renormalization group transformation for the temperature (see
Equation (8.62)), this simply meant finding the fixed point of that trans-
formation. In the present case however, where we don't explicitly calculate
the transformation, it is not so obvious what we should do. One solution
has been given by Swendsen (1979). In his calculations, he performed two
consecutive iterations of the blocking procedure, creating systems smaller
than the one simulated by factors of b and b². He then calculated the
exponent ν for each of these systems using the methods outlined above.
Exactly at the critical point, these calculations should give the same answer,
apart from finite size effects, but away from the critical point, where the
parameters Ki drift with the renormalization group flows, the two will give
slightly different answers. By minimizing the difference between the two
values for ν, Swendsen was able to make an estimate of the position of the
critical fixed point with sufficient accuracy to give good measurements of the
elements of the matrix M.
As an example of this technique, Blöte and Swendsen (1979) calculated
critical exponents for the three-dimensional Ising model on a cubic lattice. In
their calculation they included in the Hamiltonian all two-spin interactions
up to fourth-nearest neighbours, and also all four-spins-in-a-square interac-
tions. Since they only included interactions which are even in the number
of spins12 their calculation only gives information about the exponent ν and
12
In order to study β we need to introduce an external magnetic field, which couples to
an odd number of spins. The up-down symmetry of the Ising model then ensures that
258 Chapter 8: Analysing Monte Carlo data
Problems
8.1 Equation (8.10) gives a criterion for estimating the temperature range
over which the single histogram extrapolation is reliable. For a simulation
performed at the critical temperature, how does this scale with system size?
(Hint: notice that Equation (8.10) involves the total specific heat, not the
specific heat per site.)
8.2 The criterion given in Equation (8.10) is only a rough one. In partic-
ular, it does not, as we pointed out, take into account the improvement in
the extrapolation range on increasing the length of our Monte Carlo simu-
lation. Assuming that the distribution of energies sampled by a simulation
is roughly Gaussian, show that the extrapolation range increases as log n,
where n is the number of independent energy measurements made during
the simulation.
8.3 Suppose we want to use the multiple histogram method to calculate the
value of a quantity over a certain temperature range. How does the number
of different simulations needed vary with the size of the system?
the Ki for the even and odd interactions transform separately. In a calculation such as
that of Blöte and Swendsen, in which no odd interactions were introduced to begin with,
there can thus never be any odd ones appearing.
8.4 Why does the peak of the scaling function in Figure 8.6 fall slightly
above the true critical point t = 0?
8.5 Derive the equivalent of Equation (8.51) for the scaling of the correlation
time τ.
8.6 Write a program to perform a Metropolis Monte Carlo simulation of
the four-dimensional Ising model, and use it to estimate the missing number
in Table 4.1, the dynamic exponent of the Metropolis algorithm in four
dimensions, by finite size scaling using the formula derived in the previous
problem. (Hint: you might want to take the Metropolis program given in
Appendix B as a starting point. Further hint: we don't know the answer to
this problem. As we said in Chapter 4, this calculation has never been done,
as far as we know. You probably shouldn't attempt this problem unless you
have a lot of spare time.)
This page intentionally left blank
Part II
Out-of-equilibrium
simulations
This page intentionally left blank
9
The principles of
out-of-equilibrium Monte Carlo
simulation
9.1 Dynamics
The Wolff algorithm of Section 4.2, for example, is particularly good for
simulations near the critical temperature because it has a dynamic exponent
close to zero. If we are interested in the way in which the Ising model comes
to equilibrium under a single-spin-flip dynamics such as the heat-bath algo-
rithm, then under no circumstances can we perform a simulation using the
Wolff algorithm to study the problem. This is a great shame, since it means
that many of the cleverest ideas which we saw in Part I of this book are in-
applicable to simulations of out-of-equilibrium systems. Cluster algorithms,
non-local algorithms for conserved-order-parameter models, simulated tem-
pering and entropic sampling algorithms for glassy systems, and any number
of other ingenious methods cannot be applied because they will give com-
pletely erroneous answers. What then can we do to improve the efficiency
of these calculations?
There are a number of tricks which can be used to speed up our calcu-
lations without altering the dynamics. One of the most important we have
seen already. The continuous time Monte Carlo method, which was intro-
duced in Section 2.4 and which we applied to the conserved-order-parameter
Ising model in Section 5.2.1, is a way of improving the efficiency of a Monte
Carlo algorithm, particularly at low temperatures, without altering the ef-
fective dynamics of the algorithm. In Section 10.3.2 we apply the continuous
time method to the problem of domain growth in an equilibrating conserved-
order-parameter model, and show that one can achieve a speed-up of better
than a factor of 10^8 over the normal Metropolis method under certain
circumstances.
Another way of speeding up out-of-equilibrium Monte Carlo simulations
is to look for mappings which turn a model of interest into another model
which may be simulated more efficiently. In Chapter 12 we study the "rep-
ton model" of DNA gel electrophoresis. This is a polymer model which
is normally defined in two or three dimensions. We show how the model
can be mapped onto a much simpler model in one dimension, which can
then be simulated efficiently using a relatively straightforward Monte Carlo
algorithm.
A third method for speeding up our calculations, and perhaps the one
which has been exploited most thoroughly, is to use any of a variety of pro-
gramming tricks to improve the efficiency of our code. Some of these tricks
we have seen before, such as the calculation of look-up tables of quantities
needed by the algorithm. Others include using parallel computers, which
we discuss in Chapter 14, and "multispin coding", which is discussed in
Chapter 15.
rather than jumping in at the deep end with a realistic simulation of a real
material.
In other cases, where we are concerned with simulating a real material,
it may be possible to use our understanding of the physics of that material
to work out what dynamics we should use in our Monte Carlo algorithm.
In Chapter 11, for instance, we look at the simulation of adatoms on metal
surfaces. In this case quite accurate methods are available for studying
the dynamics of individual surface atoms, including a variety of semiclassi-
cal techniques for calculating the interaction energies of the atoms. These
methods allow us to make an estimate of the correct dynamics, so as to make
our Monte Carlo algorithm mimic the true system as closely as possible.
The simulation of adatom diffusion however highlights another problem—
the existence of energy barriers between states. When we are interested only
in equilibrium properties of a system then the condition of detailed balance,
Equation (2.2.3), tells us that we only need to know the energies of the states
themselves in order to construct a correct algorithm. If we are interested
in the out-of-equilibrium behaviour however, we need to know the way in
which the energy varies "between" the states of the simulation. In going
from one state μ to another ν (for example, when an adatom moves from
one hollow on a metal surface to another), the system must pass through
a succession of intermediate states with higher energy, and the rate for the
transition μ → ν depends on the energies of these intermediates. In effect,
there is an energy barrier between the two states, and we expect the rate
for the system to cross this barrier to go exponentially with its height (the
so-called Arrhenius law—see Section 11.1.1).
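As a minimal numerical illustration of the Arrhenius law (with a hypothetical attempt frequency and barrier height, not values from the text):

```python
import math

def arrhenius_rate(nu0, barrier, T, kB):
    """Rate for crossing an energy barrier: an attempt frequency nu0
    times the Boltzmann factor of the barrier height."""
    return nu0 * math.exp(-barrier / (kB * T))

kB_eV = 8.617e-5                                      # Boltzmann constant in eV/K
rate_300 = arrhenius_rate(1e12, 0.5, 300.0, kB_eV)    # hypothetical 0.5 eV barrier
rate_600 = arrhenius_rate(1e12, 0.5, 600.0, kB_eV)
# Doubling the temperature changes the rate by many orders of magnitude,
# which is why barrier crossings dominate the time-scales of these systems.
```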
There is no reason why any system we study should not have energy
barriers between states. In the Ising model, for instance, we can introduce
energy barriers for the flipping of individual spins by putting in extra inter-
mediate states between up and down spin states which have higher energy.
Such barriers can have a profound effect on the dynamics of the model, and
in fact a number of authors have looked into this very problem in some detail.
So our focus in the next few chapters will be to look at the simulation
of out-of-equilibrium systems, giving a number of examples of algorithms
for particular models which illustrate some of the techniques in use in this
area. First, we start with a chapter on the Ising model, which is simple and
familiar, but nonetheless illustrates a number of important points.
10
Non-equilibrium simulations of
the Ising model
temperature, one above the critical point, to one below it and watching the
formation of these domains and the way in which they grow. The formation
and growth of domains of this kind is often called spinodal decomposi-
tion. (The spinodal is the line on the phase diagram between the regions
in which phase separation takes place and the regions in which it does not.1)
You might well ask what use it could be to simulate the Ising model
when, as we pointed out in Chapter 5, its properties are only rather loosely
related to the properties of real fluids. Previously we got around this prob-
lem by noticing that the COP Ising model actually gives a good estimate of
some equilibrium properties of a real gas because the two systems fall in the
same universality class. This means that quantities like critical exponents
(see Section 4.1) take the same values for both systems, despite the obvi-
ous differences in the physics involved. As discussed in Section 9.1 however,
this argument does not apply to the out-of-equilibrium properties of the two
models. Even models in the same universality class can give completely dif-
ferent results when they are not in equilibrium. So what makes us think that
the Ising model can tell us anything about the non-equilibrium properties of
real fluids?
The answer is that certain properties of domain growth in spinodal de-
composition fall into dynamic universality classes of the type discussed in
Section 9.1. In particular, the average dimension R of the domains grows as
a power law R ~ t^a with time t, where the value of a depends only on the
coarse properties of the model and the dynamics used to simulate it, but not
on the details, in a way reminiscent of equilibrium universality. This result
was first demonstrated by Lifshitz and Slyozov (1961), and independently
also by Wagner (1961). The argument goes as follows.
When a system phase separates, the domains in the system coalesce and
grow because by doing so they reduce the total area of domain walls in the
system; domain walls cost energy, so reducing them lowers the energy. The
energy cost of having a domain wall is characterized by a surface tension σ,
i.e., the energy per unit area of the wall. The surface energy of a domain thus
scales as σR^(d-1), where d is the dimensionality of the system. The volume
of a domain scales as R^d. So the energy density u—the cost of the domain
walls per unit volume of the system—goes down as R increases according
to u ~ σ/R, and the gradient of the energy density scales according to
du/dR ~ σ/R^2. If we assume that the domain walls move diffusively,2 then
the wall velocity scales as dR/dt ~ du/dR ~ σ/R^2, which integrates to give
R ~ (σt)^(1/3),
1
In practice, the point at which separation takes place does not coincide precisely
with the phase boundary depicted in Figure 5.1. There is a metastable region below the
boundary in which the system remains mixed on laboratory time-scales, even though the
demixed state has lower energy. For this reason the location of the spinodal is not well
defined; it occurs closer to the phase boundary the longer one is prepared to wait for the
nucleation of phase separated domains.
2
We have argued previously that the value z = 2 of the dynamic exponent for single-
which is of the form postulated above, with a = 1/3. Note that this result
is independent of the dimensionality d and so should apply for systems in
either two or three dimensions, both of which we look at in this chapter.
The crucial assumption in this argument is that the movement of the
domain walls obeys a diffusion equation—their velocity is proportional to
the energy gradient they are moving across. For some systems this is clearly
not true. However, it does turn out to be correct for systems in which
(a) the motion of the domain walls is driven by thermal fluctuations, as is
the case for normal fluids and for our Ising models, and (b) the dynamics
by which the molecules or spins in the system rearrange is a local one. In
the language of our spin models for example, a dynamics like that of the
Kawasaki algorithm of Section 5.1 in which up- and down-pointing spins on
adjacent sites swap values is fine. However, a dynamics like that of the non-
local algorithm of Section 5.2 would not be acceptable. For the non-local
algorithm the argument given above breaks down and a different formula for
R applies. However, as long as we stick to Monte Carlo algorithms with local
dynamics, both (a) and (b) above are satisfied. They are also satisfied for
real fluid systems, and so we expect both the real system and our simulations
to display the same growth law for domains.
A similar scaling argument can be made for the growth of domains in
the ordinary Ising model in which the order parameter is not conserved. In
this case we find that the domain size increases somewhat faster than in the
COP case, as

R ~ t^(1/2).    (10.3)
Again this result only applies if the dynamics is local, as in the single-spin-
flip Metropolis and heat-bath algorithms, for example. Non-local algorithms
such as the Wolff cluster algorithm of Section 4.2 show completely different
behaviour.
Beyond scaling arguments of this sort there is no reliable analytical theory
of phase separation to date, and most of what is known comes from computer
simulations, usually of very simple models such as the ordinary Ising model
or the COP Ising model. The problem with these simulations is that we are
restricted to using only local dynamics, such as the Metropolis algorithm.
spin-flip algorithms indicates that the domain walls do indeed move diffusively (see Prob-
lem 4.1).
10.1 Phase separation and the Ising model 271
This prevents us from using most of the clever methods we developed in the
first part of this book to speed up our simulations. However, as we will see,
there are still some tricks we can play to make things faster. In the next
two sections we will illustrate a number of techniques for simulating these
kinds of systems by developing algorithms to test our scaling hypotheses for
domain growth, Equations (10.2) and (10.3). We will look at the problem
first for the ordinary (non-conserved-order-parameter) Ising model, which is
the simpler example, and then for the conserved-order-parameter version.
fluids preserve the volume occupied by each fluid in the system, and in this
case the conserved-order-parameter Ising model is the correct model to use.
As with the ordinary Ising model there are a variety of possible choices
for the dynamics of a COP simulation, always with the constraint that the
moves in the algorithm should be local. The simplest and commonest choice
is the Kawasaki algorithm of Section 5.1 in which randomly chosen pairs of
nearest-neighbour spins are exchanged with an acceptance probability which
depends on the resultant change ΔE in the energy of the system according
to

A(μ → ν) = e^(-βΔE) if ΔE > 0, and 1 otherwise.    (10.4)

(See Equation 5.1.) This algorithm ensures that the number of up- and
down-pointing spins is preserved at every step of the simulation.
In Figure 10.2 we show the results of a simulation of a two-dimensional
COP Ising model on a 500 x 500 square lattice using the Kawasaki algo-
rithm. The frames in this figure are taken at times t = 10, 100, 1000 and
10 000 sweeps of the lattice—ten times as long as the ones in Figure 10.1.
Even so, the domains of up- and down-spins are much smaller than in the
non-conserved-order-parameter case. This we might have anticipated. The
scaling form for the growth of domains in the COP model, Equation (10.2),
goes as t^(1/3), which increases more slowly than the t^(1/2) of the normal Ising
model. In the next section we show how to estimate the typical domain size
R so that we can check these hypotheses quantitatively.
which were for longer times and made use of a more sophisticated algorithm
which we describe in Section 10.3.1.)
There is one important difference between the forms of the correlation
functions for the two different models. In the case of the ordinary Ising
model the correlation function is, apart from small statistical fluctuations,
always positive, whereas in the COP case it is oscillatory, falling quickly to
zero and becoming negative before rising again. The reason for this is that
in order to form a domain of up-pointing spins in the COP model we have to
take up-pointing spins from surrounding regions. Thus the area surrounding
such a domain tends to be depleted of up-pointing spins, giving rise to a
negative value for the correlation function.
We can now use the correlation function to extract an estimate of the
typical domain size. In the case of the ordinary Ising model, the conventional
choice is to define the domain size R to be the distance over which the
correlation function falls from 1 to a value of 1/e ≈ 0.368. In the COP case
it is defined as the distance to the point at which the correlation function
first becomes negative. With these definitions R is roughly the distance from
the centre of a domain to the surrounding domain wall; the domain diameter
is 2R.
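These two definitions translate directly into code. Here is a minimal sketch, assuming the correlation function has already been measured and is supplied as a list C of values at integer separations r = 0, 1, 2, ..., normalized so that C[0] = 1:

```python
import math

def domain_size_ising(C):
    """Ordinary Ising model: distance at which C(r) first falls to 1/e,
    with linear interpolation between the bracketing points."""
    target = 1.0 / math.e
    for r in range(1, len(C)):
        if C[r] <= target:
            return (r - 1) + (C[r - 1] - target) / (C[r - 1] - C[r])
    return float(len(C))   # correlations extend beyond the measured range

def domain_size_cop(C):
    """COP Ising model: distance at which C(r) first becomes negative."""
    for r in range(1, len(C)):
        if C[r] < 0.0:
            return (r - 1) + C[r - 1] / (C[r - 1] - C[r])
    return float(len(C))
```

Either estimate differs from the other definitions discussed here only by a multiplicative constant, so the choice does not affect the scaling laws.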
Clearly these definitions are arbitrary to a degree, as they must neces-
sarily be since a domain is not a very well-defined object in these models.
However, if we look at Figures 10.3 and 10.4 again, we notice an important
point. As time passes by, the correlations of spins in the system become
longer in range, but the functional form of the correlation function stays
the same; the curve moves to the right on our figures—corresponding to
multiplication by a constant on the logarithmic scales we have used—but
its shape remains roughly constant. This means that, as far as the study of
domain growth goes, it does not matter precisely what definition we choose
for the domain size R. Different definitions will give answers which differ by
a multiplicative constant, but universal scaling laws like (10.2) and (10.3)
will not be affected.
Using the definitions of domain size above, we have extracted values
of R from the correlation functions shown in Figures 10.3 and 10.4 and
plotted the results in Figure 10.5. We also show the conjectured scaling
laws, Equations (10.2) and (10.3), which appear as straight lines on the
logarithmic scales used in the figure. As we can see, the results for the
ordinary Ising model (the circles in the figure) follow the t^(1/2) scaling quite
convincingly, even for the rather short simulation we have performed here.
The results for the conserved-order-parameter model (the squares) are not
such a good fit, but for longer times the match between the simulation results
and the theory is reasonable.
10.2 Measuring domain size 277
FIGURE 10.5 The typical domain size R as a function of time for our
simulations of the ordinary Ising model (circles) and the conserved-
order-parameter Ising model (squares). The solid lines show the scaling
laws for these quantities, Equations (10.2) and (10.3).
3
This name comes from the theory of X-ray scattering, in which the structure factor
is the quantity directly measured by the scattering experiment.
In fact, this method does not give exactly the same estimate of R as the one
discussed in the last section, but again the two differ only by a multiplicative
constant so that either is fine for investigating the scaling behaviour of R
with time.
The structure factor contains more information about the structure of
domains than simply their average size. As k → ∞ we normally find that
the structure factor tends to zero as k^(-(d+1)), where d is the dimensionality
of the system, a behaviour referred to as Porod's law. In the language
of scattering experiments, Porod's law is the result of scattering from the
domain walls and only persists up to values of k corresponding to the typical
width of such a wall (Shinozaki and Oono 1991). The behaviour of the
structure factor in the limit of small k is not so well understood, and is still
a matter of some debate.
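As an aside on implementation, the structure factor of a finite lattice can be measured directly with a fast Fourier transform. The sketch below is our illustration, not code from the text; it assumes a two-dimensional square spin configuration stored as a numpy array, and it bins |k| only up to π, discarding the corner modes of the Brillouin zone.

```python
import numpy as np

def structure_factor(spins):
    """Circularly averaged structure factor S(|k|) of a two-dimensional
    spin configuration, computed with a fast Fourier transform."""
    L = spins.shape[0]
    sk = np.abs(np.fft.fft2(spins))**2 / spins.size
    k1d = np.fft.fftfreq(L) * 2.0 * np.pi
    kmag = np.sqrt(k1d[:, None]**2 + k1d[None, :]**2)
    # Average S over shells of constant |k|; corner modes with |k| > pi
    # fall outside the last bin and are simply dropped.
    bins = np.linspace(0.0, np.pi, L // 2)
    shell = np.digitize(kmag.ravel(), bins)
    S = np.array([sk.ravel()[shell == i].mean() if np.any(shell == i) else 0.0
                  for i in range(1, len(bins))])
    return 0.5 * (bins[:-1] + bins[1:]), S
```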
phase separation experiments, and also because it is the harder of the two
models to get good results from. If we can find an efficient way of simulat-
ing the conserved-order-parameter model in three dimensions, then we can
surely do the easier non-conserved-order-parameter model as well.
that used in continuous time Monte Carlo algorithms: we make our Monte
Carlo steps correspond to varying amounts of real time. In detail here is
how the algorithm works.
First we make a list of all the anti-aligned pairs of nearest-neighbour
spins for the initial configuration of the system. Then at each step of the
algorithm we perform the following operations.
1. We select a pair of spins at random from our list and calculate the
change in energy which would result if their values were swapped over.
We then perform the swap (or not) with the Kawasaki acceptance
probability, Equation (10.4).
2. We add an amount Δt to the time, where Δt = 1/m, with m the current
number of anti-aligned pairs in the list.
3. If the values of the spins were in fact exchanged, we must update our
list of spin pairs to reflect the change. The pair that we exchanged can
just stay where they are in the list, since they are still anti-aligned after
the move. All the neighbouring pairs will have to be changed however.
All the ones which previously were anti-aligned are now aligned and
should be removed from the list and all the ones which previously were
aligned are now anti-aligned and should be added.
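The three steps above can be sketched in a few dozen lines of code. The following is an illustrative Python implementation, not the book's program: spins live in a dictionary keyed by lattice coordinates, the coupling is J = 1, beta is the inverse temperature, and the lattice is an L x L square with periodic boundaries (L >= 3 assumed, so that a site's four neighbours are distinct).

```python
import math
import random

def neighbours(site, L):
    """Nearest neighbours of a site on an L x L periodic square lattice."""
    i, j = site
    return [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]

def anti_aligned_pairs(spins, L):
    """All nearest-neighbour pairs with opposite spins, each stored once
    under a canonical (sorted) key."""
    pairs = set()
    for i in range(L):
        for j in range(L):
            for n in (((i + 1) % L, j), (i, (j + 1) % L)):
                if spins[(i, j)] != spins[n]:
                    pairs.add((min((i, j), n), max((i, j), n)))
    return pairs

def kawasaki_step(spins, pairs, L, beta):
    """One step of the pair-list Kawasaki algorithm.  Returns the time
    increment dt = 1/m, with m the number of anti-aligned pairs."""
    m = len(pairs)
    a, b = random.choice(tuple(pairs))
    # Energy change of swapping the pair (J = 1); the a-b bond itself is
    # unchanged by the swap, so only bonds to the other neighbours count.
    dE = 0.0
    for site, other in ((a, b), (b, a)):
        for n in neighbours(site, L):
            if n != other:
                dE += 2.0 * spins[site] * spins[n]
    if dE <= 0.0 or random.random() < math.exp(-beta * dE):
        spins[a], spins[b] = spins[b], spins[a]
        # Update the list: only bonds touching the swapped pair change.
        for site in (a, b):
            for n in neighbours(site, L):
                key = (min(site, n), max(site, n))
                if spins[site] != spins[n]:
                    pairs.add(key)
                else:
                    pairs.discard(key)
    return 1.0 / m
```

Note that the time increment is added whether or not the swap is accepted, exactly as in step 2 above.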
To see that this algorithm satisfies detailed balance, notice that the prob-
ability per Monte Carlo step of any anti-aligned spin pair being chosen is
just 1/m, and hence that the probability per unit simulated time of it being
chosen is 1/(mΔt), which equals 1, independent of m. Given the form of the
acceptance probability for the move, this then ensures detailed balance. Er-
godicity is also obeyed, for exactly the same reasons that it is in the normal
Kawasaki algorithm.
Using this algorithm (and with a bit more patience) we have been able
to simulate the two-dimensional COP Ising model for two decades of time
longer than we previously could with the simple Kawasaki algorithm. The
results are shown in Figure 10.6.
performed. In Section 5.2.1 we saw how to get around this problem in the
case of an equilibrium simulation by using a continuous time Monte Carlo
method. We can use exactly the same trick with the current non-equilibrium
problem to produce an algorithm in which no moves are ever rejected. This
algorithm is a very straightforward implementation of the continuous time
idea as set out in Section 2.4. It goes like this.
First we take the initial T = oo configuration of the lattice and construct
separate lists of anti-aligned nearest-neighbour spin pairs, grouped according
to the change in energy ΔE which would result if their values were to be
swapped over. On a cubic lattice in three dimensions, for example, the value
of ΔE ranges from -20J to +20J in steps of 4J, so there would be 11 lists
in all. Suppose that the number of entries in the ith list is mi. Each Monte
Carlo step then consists of performing the following operations:
1. We choose one of our lists at random, with the probability of choosing
the ith list being proportional to miAi, where Ai is the value of the
acceptance ratio, Equation (10.4), for the moves in list i.
2. We select one of the pairs of spins in the chosen list uniformly at
random.
3. We exchange the values of the two spins in the selected pair.
4. We add an amount Δt to the time, equal to the reciprocal of the sum
of the rates for all possible moves, which in this case means

Δt = 1 / Σi miAi.
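The class selection and the time increment above can be sketched as follows. This is an illustrative fragment, not the book's code: `lists` maps each possible value of ΔE (in units where β stands for the inverse temperature times J) to the current list of candidate spin pairs, and the acceptance ratio Ai is taken to be the usual Metropolis form min(1, e^(-βΔE)), consistent with Equation (10.4).

```python
import math
import random

def choose_move(lists, beta):
    """Rejection-free selection: pick a dE class with probability
    proportional to m_i * A_i, then a pair uniformly within that class.
    Returns the class, the pair, and the time increment dt = 1 / sum_i m_i A_i."""
    weights = {dE: len(pairs) * min(1.0, math.exp(-beta * dE))
               for dE, pairs in lists.items() if pairs}
    total_rate = sum(weights.values())
    dt = 1.0 / total_rate
    r = random.random() * total_rate
    for dE, w in weights.items():
        r -= w
        if r <= 0.0:
            return dE, random.choice(lists[dE]), dt
    return dE, random.choice(lists[dE]), dt   # guard against round-off
```

Every call results in a move, which is what makes the algorithm rejection-free; the cost, as discussed below, is the bookkeeping needed to keep the lists up to date.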
In fact, it is this last step which takes all the time in this algorithm. The
business of updating all the lists after every move is quite involved and can
take a lot of work. For this reason, it turns out that it only really pays to use
a continuous time algorithm at very low temperatures. At the temperature
T = 2J which we used in our Ising model simulations earlier in this chapter,
for instance, the lowest values of the Kawasaki acceptance ratio occurred
towards the end of our simulations and averaged about 34%. This means
that our continuous time algorithm, which has an effective acceptance ratio
of 100%, should be about three times as fast. However, the extra work
necessary to keep the lists of spin pairs up to date means that each Monte
Carlo step of the continuous time algorithm takes a good deal more than
three times as much CPU time as a step of the simpler algorithm, and so
the simpler algorithm is more efficient in the long run. As the simulation
temperature is lowered, the acceptance ratio in Equation (10.4) becomes
exponentially small, so there will always be a temperature low enough that
using the continuous time algorithm will pay off. For our purposes in this
chapter however, the continuous time method does not help.4
So what are we to do if we want to probe longer times still? In the
next section we investigate a completely different approach to speeding up
our simulation. Instead of changing the Monte Carlo algorithm, we look at
changing the actual physics of the system simulated.
4
We didn't know that this was going to be the case before we tried it. Often finding
the best Monte Carlo algorithm for a problem is just a question of trial and error.
10.4 An alternative dynamics 283
whole domains around the lattice and in this way cause domain growth by
joining pairs of smaller domains into larger ones.
Surface diffusion takes place much more easily than bulk diffusion because
there is no activation energy involved. It costs energy to break a spin off
from a domain, but it costs nothing to simply slide one along the surface.
For this reason surface diffusion tends to dominate the dynamics of domain
growth at early times. As time draws on, however, it becomes less and less
important, because it can only take place on the surfaces of the domains. As
we argued in Section 10.1, the ratio of domain surface to volume decreases
as 1/R, so the opportunity for surface diffusion dwindles as R grows. Thus,
it is only the bulk diffusion which contributes to the long-time scaling of
domain sizes which we see in our simulations.
In fact, surface diffusion turns out not only to be irrelevant to the results
we are looking for in our simulations, it is also the main reason why our
simulation is slow. Using scaling arguments similar to the ones we gave in
Section 10.1 we can show that the growth of domains as a result of surface
diffusion should go as t^(1/4). Given that we expect surface diffusion to
dominate at early times, this implies that we should see R growing slowly as
t^(1/4) initially, and then switching over to a t^(1/3) growth law later on. This
is precisely the behaviour which we remarked on earlier in Figure 10.5. So
if we could find an algorithm in which surface diffusion was suppressed in
favour of bulk diffusion, it should show the t^(1/3) behaviour much earlier in
the simulation. We now show how to create precisely such an algorithm.
where z is the lattice coordination number, i.e., the total number of nearest
neighbours of each site on the lattice.
Our proposed Monte Carlo algorithm can also be expressed in terms
of the spin coordination numbers. The acceptance ratio for making the
transition from a state μ to a state ν by exchanging the values of two adjacent
spins is
where ni and nj are the coordination numbers in the initial state μ. As we
will shortly demonstrate, this choice does indeed suppress surface diffusion
relative to bulk diffusion, and it is also very simple to implement. First
however, we have to prove that it satisfies the conditions of ergodicity and
detailed balance.
Ergodicity is easy. The individual moves in this algorithm are the same
exchanges of adjacent spin values as for the Kawasaki algorithm and so the
proof of ergodicity which we gave in Section 5.1 applies here as well. To prove
detailed balance the crucial step lies in noticing that the spin coordination
numbers of sites in the final state ν are related to those in the initial state
μ by
as required.
So how does this algorithm help us? Consider again the two scenarios
shown in Figure 10.7. With the Kawasaki acceptance ratio, Equation (10.4),
the bulk diffusion process on the left is slow because the move which splits the
spin off from its domain has an acceptance ratio which is less than one and
becomes exponentially small as temperature falls. For the surface diffusion
process on the right, by contrast, the spin never splits off from the domain
and all moves have an acceptance ratio of one. In the new algorithm, the
bulk diffusion process still has an acceptance ratio which is less than one.
So, however, does the surface diffusion process, since the spin coordination
numbers ni and nj of a pair of spins on the surface are both greater than
zero. This has the effect that fewer surface moves will be accepted relative
to bulk ones in the new algorithm. As a result the t^(1/4) domain growth
produced by the surface diffusion should be less strong and we should see
the late-time t^(1/3) behaviour setting in much earlier.
The acceptance ratio, Equation (10.11), can be used either with a simple
accept/reject algorithm of the type described in Section 10.3.1, which would
Problems
10.1 What is the state of a phase-separating COP Ising model on an L x L
square lattice in the limit t —> oo when the numbers of up- and down-pointing
spins are equal?
10.2 The implementation of the bulk diffusion algorithm described in Sec-
tion 10.4.2 as a continuous time algorithm turns out to be charmingly simple.
We maintain lists of spins, as we did for the equilibrium algorithm of Sec-
tion 5.2.1, divided according to their spin coordination numbers ni. Then
we choose a spin i at random from one of these lists with probability
proportional to (z - ni) exp(-4βni) and exchange its value with that of one
of its anti-aligned neighbours, chosen at random. Show that this algorithm
does indeed result in a transition probability proportional to the acceptance
ratio of Equation (10.11).
11
Monte Carlo simulations in
surface science
In the last chapter we used the results of simulations of the Ising model to
study the growth of domains under spinodal decomposition. In this chapter
we look at another use of the Ising model in quite a different field: surface
science.
Out-of-equilibrium Monte Carlo calculations have become an important
tool in surface science. They provide an efficient way of simulating the move-
ment of atoms on crystalline surfaces. Most often we are interested in metals,
the majority of which have a face-centred cubic (fcc) crystalline structure.
The surfaces of such a crystal provide a natural lattice for our Monte Carlo
simulation. For example, the atoms which make up the (001) facet of an
fcc crystal form a square lattice, as shown in Figure 11.1. The atoms of a
(111) facet form a triangular lattice.
Now consider what happens if we add a few extra atoms to our surface.
These adatoms might be atoms of the same element as the crystal, or of a
different element. In either case, there will be attractive forces between the
adatoms and the atoms of the surface which mean that it is energetically
favourable for the adatoms to be next to as many surface atoms as possi-
ble. On a (001) surface, for instance, they will prefer to sit in the so-called
four-fold hollow sites where they are in contact with four surface atoms
simultaneously, as shown in Figure 11.1. Such four-fold hollow sites also
form a square lattice on the surface.
On a (111) surface the best an adatom can do is to sit in a three-fold
hollow site, in contact with three surface atoms. The three-fold hollow
sites divide into two distinct types depending on their position relative to
the atoms in lower layers of the metal (see Figure 11.1 again). An hcp
site, named after the hexagonal-close-packed lattice to which it belongs, is
a three-fold hollow site directly over an atom in the layer immediately below
the surface. An fee site is one which lies directly over one of the atoms two
layers below the surface. Both the set of hcp sites and the set of fee sites
form triangular lattices on the surface of the crystal. The combined set of
all three-fold hollow sites forms a honeycomb lattice. (See Section 13.1.2 for
a description of the honeycomb lattice.)
Many different types of crystalline surfaces are of interest in surface sci-
ence. In this chapter we will illustrate our Monte Carlo techniques using the
square lattice of a (001) surface, since the square lattice is the easiest to work
with. All the ideas and algorithms we will describe, however, are equally ap-
plicable to other lattices. Techniques for performing simulations on lattices
such as triangular and honeycomb lattices are discussed in Chapter 13.
Consider then a collection of adatoms in the four-fold hollow sites of the
(001) surface of an fcc metal. (Actually, it could just as well be a body-
centred cubic (bcc) metal. For the (001) surface it makes no difference.)
Each adatom is bound to the metal surface with some binding energy Es,
which is typically a few electron volts (eV). This binding energy is the same
for all sites. However, two adatoms can lower their energy further by being in
adjacent sites; there is an extra binding energy Eb between adjacent adatoms
which is typically on the order of a few tenths of an eV. This extends to
adatoms which are adjacent to more than one other adatom: we gain an
extra Eb in energy for every pair of adjacent adatoms on the surface. For
most metals it is a pretty good approximation to neglect binding energies
between adatoms which are any further apart than adjacent four-fold hollow
sites. Calculations for copper surfaces using semi-empirical potentials, for
The first term −nEs here is a constant as long as the number n of adatoms
does not change. Only the second term depends on the configuration of the
atoms. Let us represent this configuration as we did in Chapter 5 using a
set of variables {σi} such that σi is 1 if site i is occupied by an adatom and
0 if the site is empty. In terms of these variables we can write the number
of nearest-neighbour pairs of atoms as

    m = Σ⟨ij⟩ σi σj ,

where ⟨ij⟩ indicates that sites i and j are nearest neighbours. Making a
change of variables to the Ising spins

    si = 2σi − 1,

this becomes

    E = −(Eb/4) Σ⟨ij⟩ si sj + constant,
where we have put all the constant terms together. Comparing this ex-
pression with Equation (5.7), we see that the total energy of our collection
of adatoms has exactly the same form as the Hamiltonian of the conserved-
order-parameter Ising model with the spin-spin coupling J set equal to Eb/4.
For the typical values of Eb found in metals, the critical temperature of
this system is significantly higher than room temperature. To see this, recall
that the critical temperature of the Ising model (see Equation (3.53)) is

    kTc = 2J / ln(1 + √2) ≃ 2.269 J .            (11.7)
Typically binding energies between adatoms are on the order of about 0.3 eV,
which is equivalent to about 5 × 10⁻²⁰ joules. Substituting this figure
into (11.7) we get a critical temperature of about Tc ≈ 2000 K, which is
well in excess of most laboratory temperatures (room temperature is about
300 K), and indeed in excess of the melting point of most metals. Thus the
adatoms on a metal surface will under most conditions be in the condensed
phase of the COP Ising model, in which they tend to cluster together into
"islands" rather than being dispersed like the atoms in a gas. We can make
use of many of the results of Chapter 5 concerning the COP Ising model
to tell us about the properties of the adatoms in this condensed phase. In
this chapter however, we are more concerned, as most surface scientists are,
with the out-of-equilibrium properties of the system, such as the way in
which adatom islands form and grow. It is on these properties that we will
concentrate for the remainder of the chapter.
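The arithmetic behind this critical-temperature estimate is easily checked. The short script below is our own illustration (not from the original text); it assumes Eb = 0.3 eV and the equivalent Ising coupling J = Eb/4, and plugs them into the exact square-lattice result quoted above:

```python
import math

k_B = 8.617e-5        # Boltzmann's constant in eV/K
E_b = 0.3             # typical adatom-adatom binding energy in eV
J = E_b / 4           # equivalent Ising coupling, J = Eb/4

# Onsager's exact square-lattice result: kB*Tc = 2J / ln(1 + sqrt(2))
T_c = 2 * J / (k_B * math.log(1 + math.sqrt(2)))
print(round(T_c))     # about 1975 K, i.e. roughly 2000 K as claimed
```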
We consider first the simple case of a single adatom on the surface;
then we consider how this algorithm needs to be modified when more than
one adatom is present.
FIGURE 11.2 The energy at the highest point of the trajectory be-
tween sites i and j is Emax = Ei + Bij = Ej + Bji. Rearranging
we then get Bij − Bji = Ej − Ei. The condition of detailed balance,
Equation (11.10), follows as a result.
and assume that successive hops are uncorrelated and obey the Arrhenius
law

    rij = ν exp(−βBij) ,                  (11.8)

where ν is an attempt frequency and Bij is the energy barrier for a hop from
site i to site j.
Given these assumptions it is not hard to show that the hopping process
satisfies the normal condition of detailed balance, Equation (2.14). As Fig-
ure 11.2 shows, the energy barrier Bij which an atom has to climb in order
to hop from site i to site j is related to the barrier Bji for the reverse hop
by

    Bij − Bji = Ej − Ei ,                  (11.9)
where Ei and Ej are the binding energies at the two sites. (For a single
adatom on a uniform surface these would be the same, but the proof works
without making this assumption. In the next section we consider systems of
many adatoms for which the binding energies vary, so it is worth our while
to prove the more general result.) Using Equation (11.8) we can then show
that the rates for hopping in the two directions satisfy detailed balance:

    rij / rji = exp(−β(Bij − Bji)) = exp(−β(Ej − Ei)) .        (11.10)
The simplest algorithm is then to choose one of the four directions of hop
at random and accept the move with acceptance ratio A = exp(−βBij),
where Bij is the energy barrier for that move. Under this algorithm, each
Monte Carlo step would be equivalent to an interval of real time

    Δt = 1/(4ν) .
The factor of four comes from the four directions in which the adatom can
hop.
We can do considerably better than this however. Notice that if all the
energy barriers are high compared with kT then all moves will get rejected
most of the time with this algorithm, which is a waste of CPU time. To get
around this problem we can use a trick similar to the one we used for the
Metropolis algorithm in Section 3.1 and multiply all our acceptance ratios
by a constant chosen such that the largest acceptance ratio becomes equal
to one, which is the highest value it is allowed to have. We can do this by
setting

    A = exp(−β(Bij − Bmin)) ,              (11.13)
where Bmin is the lowest energy barrier in the system. (Again, we are avoid-
ing making the assumption that all the barriers are the same, even though
this will be the case in most single adatom systems.)
Making this change does not affect the dynamics of the system, but it
does improve the efficiency of our simulation by a factor of exp(βBmin). It
also changes the time-scale by the same factor. With this algorithm, each
Monte Carlo step now corresponds to a time interval of

    Δt = exp(βBmin)/(4ν) .

We can do better still by using a continuous time algorithm, in which at
each step we choose one of the possible moves with probability proportional
to its hopping rate and always carry it out. Each Monte Carlo step then
corresponds to a time interval of

    Δt = 1 / Σj rij .                      (11.15)
The sum here is over all sites j to which the adatom can hop and the rij
are calculated from Equation (11.8). Notice that this time interval can vary
from one Monte Carlo step to another.
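A single step of this continuous time scheme for one adatom might be sketched as follows. This is an illustration of ours, not code from the book; the attempt frequency ν, the inverse temperature and the barrier values are arbitrary assumed numbers:

```python
import math
import random

beta = 1 / 0.0259    # inverse temperature in 1/eV (room temperature)
nu = 1.0e12          # attempt frequency in Hz (typical order of magnitude)

def continuous_time_hop(barriers):
    """Choose a hop with probability proportional to its Arrhenius rate.
    `barriers` lists the energy barriers (in eV) of the moves currently
    open to the adatom; returns (index of chosen move, time increment)."""
    rates = [nu * math.exp(-beta * B) for B in barriers]
    R = sum(rates)
    x = random.uniform(0, R)
    j = 0
    for j, r in enumerate(rates):
        if x < r:
            break
        x -= r
    return j, 1.0 / R          # dt = 1 / sum of rates; varies step to step

move, dt = continuous_time_hop([0.4, 0.2, 0.2, 0.4])
```

Every step results in a hop, and slow environments simply advance the clock by larger amounts.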
296 Chapter 11: Monte Carlo simulations in surface science
Kawasaki-type energy barriers: The simplest choice is to set the energy
Emax of the top of the barrier (see Figure 11.2) equal to the greater of the
binding energies of the atom in the two sites i and j between which it is
hopping, plus a constant B0. In other words, the barrier Bij for hopping
from site i to site j is given by

    Bij = max(Ei, Ej) − Ei + B0 .          (11.16)
Notice that this choice satisfies Equation (11.9), so that detailed balance will
be obeyed once the system has come to equilibrium.
We can now feed these barrier heights into any of the algorithms proposed
in the previous section to create a Monte Carlo algorithm for the many
adatom system. The easiest way to go is to use the second algorithm we
proposed. In this case, we would choose an atom at random and consider
moving it in one of the four possible directions, also chosen at random. If the
site we want to move it to is already occupied by an atom, then we cannot
make the move. If it is empty, then we decide whether to make the move
based on the height of the energy barrier for the move. The height of the
lowest barrier is Bmin = B0, so the acceptance ratio, Equation (11.13), for
the move becomes

    A = exp(−β(Bij − B0)) = exp(−β max(0, Ej − Ei)) .
This algorithm is very similar to the Kawasaki algorithm for the conserved-
order-parameter Ising model (Section 5.1), and for this reason we refer to
the energy barriers in Equation (11.16) as Kawasaki-type energy barriers.
The only subtlety with this algorithm is that a single Monte Carlo step
of this algorithm now corresponds to an interval of real time

    Δt = exp(βB0)/(4nν) ,
where n is the number of adatoms. (We need this factor of n, so that the
attempt frequency of each atom for hopping in each direction is correctly ν
per unit of real time.)
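As an illustration of this scheme, here is a minimal sketch in Python (ours, not from the book): atoms hop on a periodic square lattice with Kawasaki-type barriers, the energy difference Ej − Ei is approximated by bond counting, and the clock advances by exp(βB0)/(4nν) per attempted move. The parameter values (Eb, B0, ν, the temperature, the lattice size) are illustrative assumptions:

```python
import math
import random

L = 16                  # lattice size (illustrative)
beta = 1 / 0.0259       # inverse temperature in 1/eV (room temperature)
E_b = 0.3               # adatom-adatom bond energy in eV (typical value)
B0 = 0.1                # constant part of the barrier in eV (assumed)
nu = 1.0e12             # attempt frequency in Hz (typical order of magnitude)

occ = [[False] * L for _ in range(L)]
atoms = [(x, y) for x in range(4, 8) for y in range(4, 8)]   # a small island
for x, y in atoms:
    occ[x][y] = True

def neighbours(x, y):
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]

def bonds(x, y):
    return sum(occ[i][j] for i, j in neighbours(x, y))

def step(t):
    """One Monte Carlo step with Kawasaki-type barriers."""
    k = random.randrange(len(atoms))
    x, y = atoms[k]
    tx, ty = random.choice(neighbours(x, y))
    if not occ[tx][ty]:
        n_i = bonds(x, y)                # bonds broken at the old site
        occ[x][y] = False                # don't count the atom itself...
        n_j = bonds(tx, ty)              # ...when counting bonds at the new one
        occ[x][y] = True
        dE = E_b * (n_i - n_j)           # Ej - Ei by bond counting
        # Acceptance ratio exp(-beta*(Bij - B0)) = min(1, exp(-beta*dE))
        if dE <= 0 or random.random() < math.exp(-beta * dE):
            occ[x][y] = False
            occ[tx][ty] = True
            atoms[k] = (tx, ty)
    return t + math.exp(beta * B0) / (4 * len(atoms) * nu)   # time per step

t = 0.0
for _ in range(1000):
    t = step(t)
```

Note that the clock advances whether or not the move is accepted, exactly as in the acceptance-ratio formulation above.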
We can also use Kawasaki-type energy barriers with the continuous time
algorithm proposed in Section 11.1.1. Combining Equation (11.16) with the
Arrhenius law, Equation (11.8), we get

    rij = ν exp(−βB0) exp(−β max(0, Ej − Ei)) .
We can then employ this result in our continuous time algorithm. In practice
this is the better way of simulating the system, although the continuous time
algorithm is more complicated to implement. As discussed above, it requires
a certain amount of careful programming to avoid having to recalculate lists
of possible moves and their rates at every time step. In Section 11.2 we
discuss implementation in more depth.
Both of these algorithms can be simplified if we approximate the total
energies Ei and Ej of the system before and after the hop using Equa-
tion (11.1). (As we mentioned, this is usually a pretty good approximation.)
In this case the energy change Ej − Ei can be written as

    Ej − Ei = (ni − nj)Eb ,

where ni is the number of occupied nearest neighbours of the atom's current
site i and nj the number of occupied nearest neighbours of the target site j
after the hop.
The barrier height for the hop is then taken to be

    Bij = ni Eb + B0 ,
where B0 is again a constant. This defines the barrier heights in the bond-
counting approach. Combining this result with the Arrhenius law, Equa-
tion (11.8), we obtain

    rij = ν exp(−βB0) exp(−βniEb) .
As with the Kawasaki barriers, we can then use these rates to create a
continuous time Monte Carlo algorithm for simulating the system.
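In code, the bond-counting rates take a particularly simple form, since the rate for a hop depends only on the number ni of bonds the atom must break. A sketch of ours, with illustrative assumed values for B0 and ν:

```python
import math

beta = 1 / 0.0259    # inverse temperature in 1/eV (room temperature)
E_b = 0.3            # bond energy in eV
B0 = 0.1             # constant part of the barrier in eV (assumed)
nu = 1.0e12          # attempt frequency in Hz (typical order of magnitude)

def hop_rate(n_i):
    """Arrhenius rate for a bond-counting hop that breaks n_i bonds."""
    return nu * math.exp(-beta * (B0 + n_i * E_b))

# Every extra bond to be broken suppresses the rate by a factor
# exp(-beta*E_b), around 10^-5 at room temperature for E_b = 0.3 eV.
r0, r1 = hop_rate(0), hop_rate(1)
```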
method (Breeman et al. 1996). In the first situation we find an energy barrier
of about 0.4 eV. In the second we find a barrier of about 0.2 eV. Note that
the isolated atom has the higher barrier for hopping, which is the opposite
of the prediction of the bond-counting approach.
To highlight the differences between the three cases further, let us cal-
culate the ratio of the hopping rates for the two moves considered. For the
Kawasaki-type barriers, the barriers in the two cases are the same, so the
ratio of hopping rates is 1. For the bond-counting approach, Equation (11.8)
tells us that the ratio of the rates will be exp(−βEb). For copper, Eb is about
0.3 eV, which means that at room temperature the hopping of the isolated
atom is more than 100 000 times faster than the hopping of the atom on the
edge of the island.
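The factor of 100 000 quoted here is simple to verify (our own check, not part of the original text):

```python
import math

kT = 8.617e-5 * 300     # kB*T at room temperature, in eV
E_b = 0.3               # adatom-adatom bond energy for copper, in eV

# Bond counting says the edge atom must break one extra bond, so its rate
# is lower by exp(-E_b/kT); equivalently the isolated atom is faster by:
ratio = math.exp(E_b / kT)
print(ratio > 1e5)      # True: a little over 100 000
```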
Finally, plugging our atom-embedding results into Equation (11.8), we
get hopping rates of 3 × 10⁻⁴ν and 10⁻⁷ν for the two situations in Figure 11.4.
Thus the atom on the edge of the island has the higher hopping rate (by
contrast with the bond-counting approach, in which it has a lower rate) and
hops about 3000 times faster than the isolated atom.
Thus we see that there is a factor of 3000 between the values of this
hopping ratio in the atom-embedding calculations (which are the most accu-
rate) and the Kawasaki-type approximation, and a factor of 3 × 10⁸ when we
go to the bond-counting method. Clearly such large factors have important
implications for our simulation results. In Section 11.3 we give an example
of the application of the methods described here to the problem of simu-
lating molecular beam epitaxial growth and show that they do indeed give
qualitatively different pictures of adatom dynamics. First, however, we look
at how these types of simulations are implemented.
11.2 Implementation
As discussed in the last section, there are a couple of different ways to
formulate our Monte Carlo simulation of adatom hopping. The simpler way
is to choose atoms at random and move them in a randomly chosen direction
with an acceptance probability given by Equation (11.13), provided always
that the site we wish to move them to is empty. The trouble with this
approach is that most of the time the site we wish to move them to will
not be empty. At typical laboratory temperatures, adatoms tend to phase
separate and form islands, which means that our Monte Carlo algorithm
wastes a lot of time picking atoms in the middle of islands and then not
moving them because all the surrounding sites already have atoms in them.
The solution, as we have already pointed out, is to use a continuous time
algorithm. As it turns out, this is much easier to do for the Kawasaki and
bond-counting methods discussed in the last section than it is if we wish to
use energy barriers stored in a lookup table. This in fact is the main reason
for the continued popularity of these simpler methods of calculating barriers
despite their rather poor representation of the properties of real surfaces.
Here we will discuss Kawasaki and bond-counting algorithms first. Lookup
table algorithms are discussed in Section 11.2.2.
For both these approaches, the barrier height for any possible move can
be written in the form

    Bij = mEb + B0 ,                      (11.25)
where m is an integer which takes values in the range 0... z, with z be-
ing the lattice coordination number of the surface (four in the case of the
(001) surface we have been considering). In fact, for both Kawasaki and
bond-counting approaches, m can only equal z if the spin coordination num-
ber ni at site i also equals z, in other words if all the nearest-neighbour sites
of site i are occupied, in which case no moves are possible. So in practice
we only need to consider values of m up to z − 1. (Note that in general
the value of m will be different for any given move under the Kawasaki and
bond-counting approximations.)
This suggests the following scheme for implementing our continuous time
algorithm. For every possible move on the lattice, the barrier height, Equa-
tion (11.25), takes one of z possible values corresponding to m = 0 . . . z − 1.
Suppose that at the start of our simulation we find all the possible moves
and divide them into z lists, one for each value of m. Since the hopping rate,
Equation (11.8), is entirely determined by the barrier height, all the moves
in a given list occur at the same rate. A single Monte Carlo step of the
continuous time algorithm then consists of the following:

1. We choose one of the lists at random, with probability proportional to
the number of moves it contains times the hopping rate for moves in that
list.

2. We choose a move at random from that list, each with equal probability,
and carry it out.

3. We increment the time by the inverse of the sum of the rates for all
possible moves, as in Equation (11.15).
4. We update our lists of moves to take the move we have made into
account.
The last step is the only tricky part of the algorithm. We don't want to
recalculate all our lists of moves from scratch for the entire lattice after
every move, since most sites on a large lattice will be far enough away not to
be affected by the move. However, if our Monte Carlo step moves an adatom
from site i to site j, then the hopping rates for all moves to or from nearest-
neighbour sites of either i or j will be affected and the corresponding entries
will have to be found in our lists and moved to different lists. In addition,
some new moves will have become possible (ones into the site vacated by the
atom which just moved) and others will no longer be allowed (ones into the
site it now occupies). These moves will have to be added to or subtracted
from the lists. As discussed in Section 5.2.1, all this bookkeeping can slow
the algorithm down by as much as a factor of 40. On the other hand, no
Monte Carlo step is wasted in a continuous time algorithm, and the resultant
improvements in the efficiency of the simulation usually more than make up
for the extra complexity of the algorithm. Depending on the temperature
used for the simulation the continuous time algorithm can be a factor of
100 or more faster, on balance, than the simple algorithms described in
Section 11.1.1.
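The list bookkeeping is easier to see in code. Below is a sketch of ours (the move lists are filled with dummy labels purely for illustration) of the selection step: a list m is chosen with probability proportional to the number of moves it holds times the rate for barrier B0 + mEb, and the clock advances by the inverse of the total rate. The parameter values are assumptions:

```python
import math
import random

beta, E_b, B0, nu = 1 / 0.0259, 0.3, 0.1, 1.0e12   # illustrative parameters
z = 4                                              # coordination number

# moves[m] holds every currently possible move with barrier B0 + m*E_b;
# here we just fill the lists with dummy labels to show the selection step.
moves = [[("move", m, k) for k in range(10 - 3 * m)] for m in range(z)]
rate = [nu * math.exp(-beta * (B0 + m * E_b)) for m in range(z)]

def choose_move():
    """Rejection-free selection: pick list m with probability proportional
    to len(moves[m])*rate[m], then a move uniformly from that list."""
    weights = [len(moves[m]) * rate[m] for m in range(z)]
    R = sum(weights)
    x = random.uniform(0, R)
    for m in range(z):
        if x < weights[m]:
            return random.choice(moves[m]), 1.0 / R
        x -= weights[m]
    return random.choice(moves[z - 1]), 1.0 / R    # guard against rounding

mv, dt = choose_move()
```

After the chosen move is performed, the affected entries must of course be shuffled between lists, which is the bookkeeping step discussed above.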
heights becomes quite large, maybe as large as 1024 (see Figure 11.3). In
order to use the algorithm of the last section in this case we would have to
maintain 1024 different lists of possible moves. Unfortunately, the amount of
time taken to perform one Monte Carlo step increases approximately linearly
with the number of lists, and this makes the algorithm impractically slow.
A better way of performing the simulation is to store the moves in a binary
tree. (If you are unfamiliar with tree structures, you will find them discussed
in detail in Section 13.2.4.) The idea is as follows.
We create a binary tree, each leaf of which contains one of the possible
moves of an adatom. (The leaves of the tree are the nodes at the ends of the
branches, the ones which have no child nodes.) We also calculate the hopping
rate for each move and store it in the corresponding leaf. Let us denote by ri
the rate for the ith possible move, calculated using Equation (11.8). In the
parent node of any two leaves i and j we store the sum ri + rj of the hopping
rates for the two children and in the grandparent the sum of those sums, and
so forth up the tree until we reach the root node, which contains the sum
R = Σi ri of all the rates. To choose a move at random in proportion to its
hopping rate, we first generate a random real number in the range from 0 to
R. If this number is less than the sum of rates R1 stored in the first child of
the root node then we proceed to that child. Otherwise we subtract R1 from
our random number and proceed to the second child. We repeat this process
all the way down the tree until we reach a leaf. It is not hard to show that
the probability of ending up in any particular leaf is then proportional to
the hopping rate for the move stored in that leaf.
To complete the Monte Carlo step, we now perform the chosen move and
then add an amount Δt to the time, which is given by the inverse of the
sum of the rates for all possible moves, Equation (11.15). For this algorithm
Δt is particularly easy to calculate, since the sum of the rates is just the
number R stored in the root node of the tree:

    Δt = 1/R .
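A compact way to realize this structure is as an implicit binary tree stored in an array, with node i's children at positions 2i and 2i + 1. The sketch below is our own (tree structures are discussed in detail in Section 13.2.4); it supports the three operations the algorithm needs, namely reading the total rate R, updating a single rate, and choosing a move in proportion to its rate:

```python
import random

class RateTree:
    """Binary sum tree over hopping rates, stored implicitly in an array.
    tree[1] is the root and holds R, the sum of all rates; the leaves
    occupy tree[n:2n]."""

    def __init__(self, rates):
        n = 1
        while n < len(rates):
            n *= 2
        self.n = n
        self.tree = [0.0] * (2 * n)
        for i, r in enumerate(rates):
            self.tree[n + i] = r
        for i in range(n - 1, 0, -1):          # parents hold sums of children
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]                    # R, the sum of all the rates

    def update(self, i, r):
        """Change rate i and repair the sums on the path to the root."""
        i += self.n
        self.tree[i] = r
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def choose(self):
        """Walk down from the root: go left if the random number is below
        the left child's sum, otherwise subtract it and go right."""
        x = random.uniform(0, self.total())
        i = 1
        while i < self.n:
            if x < self.tree[2 * i]:
                i = 2 * i
            else:
                x -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.n                      # index of the chosen move

tree = RateTree([3.0, 1.0, 0.5, 0.5])
move = tree.choose()           # move 0 is chosen with probability 3/5
dt = 1.0 / tree.total()        # time increment, dt = 1/R
```

Both `update` and `choose` take a time logarithmic in the number of moves, which is what makes the tree preferable to maintaining many separate lists.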
Problems
11.1 The adatom hopping process discussed in this chapter is only one
mechanism by which atoms diffuse on metal surfaces. Another common one
is the exchange mechanism where an adatom pushes one of the atoms in
the bulk of the metal up onto the surface and then takes its place in the
bulk. On a (001) surface, the ejected atom usually ends up in a site which
is a next-nearest-neighbour of the site occupied by the original adatom, so
that the overall effect of the process is to move an adatom by a distance √2
times the lattice parameter. For some metals, such as copper, the attempt
frequency for the exchange mechanism is actually higher than that for the
hopping one, but it also has a higher energy barrier. Suppose that the
attempt frequency of the exchange mechanism is a factor of ten greater and
its barrier is ΔB = 0.2 eV higher. At what temperature does exchange
become the dominant transport mechanism?
11.2 Suppose that we perform an MBE simulation using the algorithm
described in Section 11.2.2, which stores all the possible moves in a binary
tree. If a given run on a 100 x 100 square lattice takes one hour of CPU
time to reach a certain adatom density with a given deposition rate, about
how long will it take to reach the same density on a 250 x 250 lattice with
the same deposition rate?
12
The repton model
12.1 Electrophoresis
Electrophoresis is an experimental technique for separating mixtures of
long-chain polymers according to their length. It has particular importance
in the rapidly growing fields of molecular genetics and genetic engineering for
refining DNA, which is just such a polymer. The model we will be discussing
in this chapter is quite a good approximation to the behaviour of DNA. (It
would not be so good, for instance, as a model of RNA—see Section 12.2.)
The basic idea behind electrophoresis is to separate charged particles by
pulling them through a stationary gel with an electric field. If we start with
a mixture of different types of particles, each of which drifts with a different
elled furthest down the lane, and the mixture of DNA will have separated
into a number of bands, each containing DNA fragments with one specific
length. The positions of these bands are a measure of the length of the frag-
ments making them up, and the experiment also serves to separate the DNA
by length; at the end of the experiment, we can cut one of the bands out
of the gel, thus refining fragments of one particular length from the initial
mixture of many lengths.
Figure 12.2 shows results from a typical DNA electrophoresis experiment,
performed according to the procedure outlined above. In this experiment five
different lanes were run in the same gel box, with the same mixture of DNA
fragments being injected into successive lanes at regular intervals of two
hours, starting with the rightmost lane. The electric field was turned off two
hours after the last injection, so that, reading the figure from left to right,
the lanes correspond to five individual electrophoresis experiments of length
2, 4, 6, 8 and 10 hours. If each fragment of DNA drifts with a constant
velocity, a band in the four-hour lane will have drifted twice as far at the
end of the experiment as a band of identical DNA in the two-hour lane. If
we were to connect the middle points of all bands containing the same DNA
we should get a straight line, and the slope of this line would tell us the drift
velocity for that particular DNA fragment.
However, determining the drift velocity of a particular group of DNA
fragments doesn't actually tell us what their length is—the experiment needs
to be calibrated before we can calculate this. The customary calibration
procedure involves injecting a standardized DNA ladder into an empty
lane in the gel box. This ladder is a mixture of DNA fragments of known
lengths, usually multiples of 100 or 1000 bp. During electrophoresis this
ladder separates to form a set of bands which are then compared with the
bands formed by the fragments in other lanes, to determine the lengths of
those fragments. This procedure is advantageous because it provides an ap-
proximate estimate of fragment length which is independent of variations in
the experimental conditions. If a band of unknown DNA fragments drifts
about as fast as the standard 7 kb band in the DNA ladder, then it consists
of fragments around 7 kb in length, regardless of changes in temperature,
voltage or buffer concentration during the experiment or between one exper-
iment and another. The DNA mixture that we used to produce Figure 12.2
was just such a DNA ladder.
FIGURE 12.3 A typical configuration of the repton model (left). If the reptons
are numbered along the chain, starting with the end in the top-left corner, we
can characterize this configuration by plotting the x-position as a function of the
repton number. This is done in the right side of the figure. This is the projected
model described in the text. Allowed moves are denoted by arrows.
fragment length and drift velocity, so as to put the calibration of the gel and
the use of DNA ladders on a firmer scientific footing. In this chapter we
discuss the mechanisms at work in electrophoresis and study a simple model
of the process called the "repton model".
First it is important to have some idea of how DNA diffuses through
a gel in the absence of an electric field. The fragments of DNA which we
inject into the gel are usually quite long (between 1 and 10 kb) and only
moderately flexible. One can think of the molecule as a piece of stiff rope
sliding through the web-like gel. Sideways movement of the rope is blocked
by the agarose strands, and its main mode of diffusion is motion along its
own length. The end of the molecule makes random thermal movements
around the gel, and the rest of the molecule follows behind, like a snake.
This mechanism was first proposed by de Gennes (1971), who christened it
reptation.
The repton model was introduced by Rubinstein (1987) as a simple
model of polymer reptation. The model is illustrated on the left-hand side
of Figure 12.3. The gel is represented as a square lattice; the plaquettes on
the lattice correspond to the pores in the gel. (The lattice is shown as two-
dimensional in Figure 12.3, although the real system is three-dimensional.
However, as discussed below, the properties of the model are independent of
the number of dimensions of the lattice, so we may as well stick with two for
clarity.) We represent the DNA or other polymer as a chain of N polymer
segments or reptons, the black dots in the figure. (The reptons are not the
same thing as base pairs. In fact, each repton corresponds to a string of about
150 base pairs, as discussed below in Section 12.2.2.) Adjacent reptons in
the chain can only occupy the same or nearest-neighbour pores on the lattice
and reptons move from pore to adjacent pore diagonally according to the
following rules:
(i) A repton in the interior of the chain can move to one of the adjacent
squares on the lattice, provided that one of its immediate neighbours
in the chain is already in the square to which it is moving, and the
other is in the square which it leaves. This rule reproduces de Gennes'
reptation motion; reptation is the only mechanism for diffusion in the
repton model. Note that if three or more neighbouring reptons in the
chain find themselves all in one pore, only the two with connections to
other reptons outside this pore are allowed to move.
(ii) The two reptons at the ends of the chain may move in any direction
to an adjacent square, provided that such a move does not take them
more than one lattice spacing away from their neighbouring repton on
the chain.
In real polymers there is the additional effect that the polymer has a finite
width, which limits how much of the polymer can fit into a certain space.
This self-avoidance property is assumed to have a small effect on the dynam-
ics of real polymers and is not included in the repton model.
Since we are interested in the rate at which polymers drift through the
gel, we also need to define the time-scale on which moves take place. To do
this, we make the assumption that all the reptons are continually driven by
thermal fluctuations to attempt moves to the adjacent squares on the grid.
We assume that every such move is as likely as every other to be attempted
at any particular time, and we choose the time-scale such that each move
will be attempted once, on average, per unit time. (Not all of these moves
will actually take place; many of them will be rejected for violating one
of the rules above. However, the moves which are allowed each take place
with equal probability per unit time. There are no energies associated with
the different states, and so there are no Boltzmann weights making one
move more likely than another. The dynamics of the repton model is purely
entropic.)
The model described so far is, as we mentioned, just a model of polymer
diffusion. All the motions are random thermal ones and there is no applied
electric field. To make a model for electrophoresis, we first of all assign
to each repton the same negative electric charge, mimicking the charging
effect of the buffer solution in the experiment. Then we apply a uniform
electric field to the model along the x-axis (the horizontal direction in the
figure), breaking the spatial symmetry. As a result, instead of unit rates
for the allowed moves in the positive and negative x-direction, the rates
become exp(−E/2) in the positive direction and exp(E/2) in the negative
one, where E is a new parameter which is proportional to the applied field
(see Section 12.2.2). The resulting model describes the qualitative behaviour
of DNA electrophoresis reasonably well. Many details of the real system
are still missing however, such as mechanical properties of the polymer, the
effects of counterions, the inhomogeneity of the gel and the gel concentration,
and to be quantitatively accurate the model would have to include these
features as well. However, we can still extract a lot of useful information
from the simple model we have here.
To make the notion of polymer stiffness precise, one can define a correlation
function between the directions of the molecule at two base pairs i and j,

    Gij = ⟨ŝi · ŝj⟩ ,

in which ŝi is a unit vector pointing along the backbone at base pair i,
where ⟨· · ·⟩ represents an average over the states the molecule passes through over a long
period of time. When the base pairs i and j are very close together, their corresponding
direction vectors will be pointing in almost the same direction, so Gij will be close to
unity. When they are far apart there will be no correlation between the directions, and
the product of the two vectors is as likely to be negative as positive, so that after averaging
over many states of the system, the correlation function will be zero. As we approach this
long length-scale limit, the correlation function will decay exponentially:

    Gij ~ exp(−|i − j|/ℓ) .

The length-scale ℓ for this decay is the persistence length (measured here in multiples of
the spacing between monomers).
The dimensionless field parameter E introduced above is related to the real
experimental parameters by

    E = √2 a q Ef / kT ,
where a is the lattice parameter (i.e., the pore size), q is the charge per
repton (i.e., per persistence length) and Ef is the applied electric field. The
numerator here represents the energy needed to move a repton a distance
√2a (the x-distance between two nearest neighbours of the same lattice site)
against the electric force qEf acting on it.
The charge q is equal to one electron charge e per base pair, or about
150 × 1.6 × 10⁻¹⁹ C per repton. If we take a typical room-temperature value
of 300 K for T and a value of about 2000 Å for the pore size a, we find that
To begin with, let us consider the case of the repton model in zero electric
field E. In this case, we attempt each possible move of a repton once in each
unit of simulated time. We will work with the projected model, in which case
these moves consist of shifting the reptons up or down in Figure 12.3, for a
total of 2N possible moves (some of which, as we have said, will be rejected,
because they don't satisfy the rules set out in Section 12.2.1). We should
attempt each of the possible moves with equal probability, and we should
attempt them at a total rate of 2N attempted moves per unit of simulated
time.
The simplest Monte Carlo simulation of the repton model then goes like
this. In each Monte Carlo step we randomly select one of the N reptons and
one of the two directions for it to move in (up or down). If the move is allowed
according to the rules given in Section 12.2, it is accepted. Otherwise, it is
rejected. The probability for each move to be selected in one Monte Carlo
step equals 1/(2N), and after 2N moves each possibility will have been tried
once on average, corresponding to one unit of time. Instead of making 2N
Monte Carlo moves and then adjusting the time t on our clock by one, it is
more elegant to add Δt = 1/(2N) to t after each step.
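The scheme just described can be sketched in a few lines of Python. This is our own illustration of the projected model (N and the run length are arbitrary choices); the `allowed` function encodes the rules of Section 12.2 in terms of the projected x-coordinates:

```python
import random

N = 20                   # number of reptons (arbitrary)
x = [0] * N              # projected x-coordinates; |x[i+1]-x[i]| <= 1 always

def allowed(i, d):
    """Is shifting repton i by d = +1 or -1 a legal reptation move?"""
    if i == 0:                                    # end repton
        return abs(x[0] + d - x[1]) <= 1
    if i == N - 1:                                # other end repton
        return abs(x[N - 1] + d - x[N - 2]) <= 1
    # Interior repton: one neighbour must share its current pore and the
    # other must already be in the pore it is moving to (rule (i)).
    new = x[i] + d
    return (x[i - 1] == x[i] and x[i + 1] == new) or \
           (x[i + 1] == x[i] and x[i - 1] == new)

t = 0.0
for _ in range(10000):
    i = random.randrange(N)                # a random repton...
    d = random.choice((-1, 1))             # ...and a random direction
    if allowed(i, d):                      # rejected moves still advance t
        x[i] += d
    t += 1.0 / (2 * N)                     # dt = 1/(2N) per attempted move
```

Note that the constraint |x[i+1] − x[i]| ≤ 1 is preserved automatically by the move rules, so the chain can never come apart.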
To simulate electrophoresis we have to include the effects of an electric
field. We can achieve this if we keep the probability for attempting each
upward or downward move of a repton the same, but now make the up-
ward moves a factor of exp(E/2) more likely to be accepted and the down-
ward moves exp(−E/2) less likely. As with the Metropolis algorithm of
Section 3.1, we can get the ratio of these acceptance probabilities right, and
at the same time maximize the overall rate at which moves are accepted, if
we always accept upward moves, so long as they do not violate the rules of
the dynamics, but only accept downward moves with probability exp(−E).
Of course, moves which violate the rules will still be rejected outright.
This algorithm makes the ratio between the upward and downward moves
right, but it doesn't get the rates exactly correct. As it stands the algorithm
accepts allowed upward moves exactly as often as in the zero-field case,
whereas we'd like them to be accepted a factor of exp(E/2) more often. To
achieve this, we simply add

    Δt = exp(−E/2)/(2N)                    (12.3)

to the time t at each step. This also makes the downward moves exp(−E/2)
less likely (per unit time) than in the zero-field case (instead of exp(-E) less
likely) which is exactly what we want.
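The field-driven dynamics described here can be sketched as follows. This is again our own illustration of the projected model (E, N and the run length are arbitrary assumed values): upward moves are always accepted, downward ones are throttled by exp(−E), and each step adds exp(−E/2)/(2N) to the clock:

```python
import math
import random

N, E = 20, 0.5             # chain length and dimensionless field (illustrative)
x = [0] * N                # projected repton positions

def allowed(i, d):
    """Legality of shifting repton i by d, exactly as at zero field."""
    if i == 0:
        return abs(x[0] + d - x[1]) <= 1
    if i == N - 1:
        return abs(x[N - 1] + d - x[N - 2]) <= 1
    new = x[i] + d
    return (x[i - 1] == x[i] and x[i + 1] == new) or \
           (x[i + 1] == x[i] and x[i - 1] == new)

p_down = math.exp(-E)           # downward moves accepted with prob exp(-E)
dt = math.exp(-E / 2) / (2 * N) # time added per step, Equation (12.3)

t = 0.0
for _ in range(10000):
    i = random.randrange(N)
    d = random.choice((-1, 1))
    if allowed(i, d) and (d == 1 or random.random() < p_down):
        x[i] += d               # upward always accepted, downward throttled
    t += dt
```

With the smaller time increment, the accepted upward moves occur exp(E/2) more often per unit of simulated time and the downward ones exp(−E/2) less often, as required.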
or
How much exactly does this speed our algorithm up by? Well, let us
assume first of all that the time taken to perform each Monte Carlo step is
the same for this new algorithm as it was for our original one. This seems
a reasonable assumption; the only extra work we have to do in the new
algorithm that we didn't have to do in the old one is the calculation of the
quantities α and Δt from the formulae above. However, we only have to
calculate each of these once for the whole simulation, so they don't add any
effort to the individual Monte Carlo steps.
The new algorithm is more efficient than the old one because some of
the steps that were previously rejected are now accepted. How much of a
difference does this make? This is measured by the quantity Δt in Equation
(12.7) above, which is the amount of time which is simulated in one
Monte Carlo step. If the steps in both algorithms take the same amount of
CPU time, then Δt is also a measure of the amount of simulated time you
can get through with a given amount of CPU time. The corresponding quan-
tity in the original algorithm was exp(-E/2)/(2N) (see Equation (12.3)), so
the factor by which we have speeded up our algorithm is
The only questions we need to answer are, what is the probability for
choosing each of the lists, and how big is the increment Δt that we have to
add to the time at each step of the simulation? The probability of choosing
one list or the other is obviously related to the probability α in the previous
version of the algorithm, but it's not quite equal to α, because if it were, the
probability of choosing any one particular allowed move would go up as the
number of possibilities on the corresponding list went down, which is not
how we want things to work at all.
Suppose the numbers of allowed upward and downward moves are n_u
and n_d respectively. At each step of the simulation we either decide, with
probability γ, to take a move from the downward list, or we decide, with
probability 1 - γ, to take one from the upward list. The probability of
selecting any particular downward move at a given time is then γ/n_d, and
the probability of selecting a particular upward one is (1 - γ)/n_u. We want
the ratio of these probabilities to be exp(-E), which when you work it out
means that

γ = n_d exp(-E/2) / [n_u exp(E/2) + n_d exp(-E/2)].
Note that this value of γ is not a constant; it has to be adjusted each time
the number of upward or downward moves changes.
To calculate Δt, we note that the rate γ/n_d at which a particular downward
move is made should be exp(-E/2) per unit time, which means that
we have to make n_u exp(E/2) + n_d exp(-E/2) Monte Carlo steps to simulate
one unit of time. Or alternatively, we increment the time-scale by

Δt = 1 / [n_u exp(E/2) + n_d exp(-E/2)]

at each step. (Since we have already fixed the probability ratio between up
and down moves at exp(E) by our choice of 7, we are ensured that the rate
for upward moves will now be exp(E/2), as we want it to be.)
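A sketch of this bookkeeping in Python, with γ and Δt computed from the expressions just derived (the move lists are plain Python lists here, purely for illustration):

```python
import math
import random

def gamma_and_dt(n_u, n_d, E):
    """Probability gamma of drawing from the downward list, and the time
    increment per step, chosen so that each allowed downward move occurs
    at rate exp(-E/2) per unit time and each upward move at exp(E/2)."""
    denom = n_u * math.exp(E / 2) + n_d * math.exp(-E / 2)
    gamma = n_d * math.exp(-E / 2) / denom
    return gamma, 1.0 / denom

def pick_move(up_list, down_list, E, rng=random):
    """Select one allowed move: the downward list with probability gamma,
    the upward list otherwise, then a uniformly random entry of that list.
    Returns the chosen move and the time increment for this step."""
    gamma, dt = gamma_and_dt(len(up_list), len(down_list), E)
    moves = down_list if rng.random() < gamma else up_list
    return rng.choice(moves), dt
```

With E = 0 and equal list lengths this reduces to γ = 1/2 and Δt = 1/(n_u + n_d), as it should.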
There is one more point we have to consider with this algorithm. The
process of going through the entire chain and making a list of all the possible
moves could be quite a lengthy one, and there is little point in construct-
ing this elaborate algorithm to save CPU time making moves if we simply
squander all the time saved constructing lists instead. In fact, if we had to
reconstruct the lists after every move, this would not be an efficient algo-
rithm at all. However, just as with the similar algorithms used in Chapters 5
and 10, it turns out that we do not have to reconstruct the lists completely
at every Monte Carlo step. When we make one move, most of the repton
chain stays exactly as it was and the possible moves are just the same as
they were. The possible moves of only three reptons are affected at each
step (or two if it's one of the reptons at the end of the chain that moves). So
the efficient thing to do is just update the entries for those reptons and keep
320 Chapter 12: The repton model
the two particle types are trying to go in opposite directions but cannot pass
one another. Furthermore, these traffic jams will be unstable to fluctuations.
If one side of the jam becomes larger than the other—if there are more
particles of type A for example than there are of type B—then the larger
half will "push" the smaller half back, and, ultimately, off the end of the
chain. When this happens, we can end up with a configuration of the chain
dominated by one type of particle or the other. What does this situation
correspond to if we project it back onto the original repton model? It turns
out that it corresponds to a polymer that has got itself lined up along the
electric field, so that the reptation motion along the line of the molecule is
in exactly the direction the field is trying to push it. And in fact, this is a
situation that frequently occurs in electrophoresis experiments.
We can use the particle form of the repton model to calculate the ratio of
the number 2N of attempted moves to the number of moves that are actually
possible according to the rules governing the dynamics. When there is no
applied electric field, there is no physical process to distinguish particles of
the two types, and the equilibrium densities of A- and B-particles will be
identical. They will also be independent of position along the chain, since
the rates at which they enter and leave the chain are in equilibrium. Let us
define ρ_A and ρ_B to be the equilibrium densities of the two types of particles,
with ρ_A = ρ_B when E = 0. Both types of particles can enter at either end
of the chain if the corresponding end site is empty. Summed over both ends,
the rate at which particles enter the chain is thus equal to the probability
1 - ρ_A - ρ_B of an end site being unoccupied, times 2 for the number of ends,
and times another 2 for the two different types of particles. The rate at
which particles leave the chain summed over both ends equals ρ_A + ρ_B for
the probability that there will be a particle in one end site, times 2 for the
number of ends. In equilibrium these rates are equal, which implies that
ρ_A + ρ_B = 2/3. In zero electric field ρ_A = ρ_B, so we should find that the
densities of A-particles, of B-particles and of vacancies are all equal to 1/3.
The probability that a particle can move from a site i to a site i + 1 is
given by the probability that there is a particle at site i times the probability
that there is a vacancy at site i + 1. In zero field this is just 2/3 × 1/3 = 2/9. Thus
the ratio of the total number of attempted moves and the total number of
allowed moves will be 9/2 = 4.5 on average. We used this property in our
estimation of the efficiency of our algorithm in the previous section.
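This little piece of arithmetic is easily checked with exact fractions:

```python
from fractions import Fraction

# In zero field the densities of A-particles, B-particles and vacancies
# are each 1/3.  A move from site i to i+1 needs a particle (of either
# type) at site i and a vacancy at site i+1:
rho = Fraction(1, 3)                 # density of each particle species
p_allowed = (rho + rho) * rho        # (2/3) * (1/3) = 2/9
ratio = 1 / p_allowed                # attempted moves per allowed move

assert p_allowed == Fraction(2, 9)
assert ratio == Fraction(9, 2)       # = 4.5, as quoted in the text
```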
which we can calculate and then plug into Equation (12.11). The results
of simulations to determine the zero-field diffusion constant in this way are
shown in Figure 12.5 for values of N up to 250. In this figure DN² - 1/3 is
plotted as a function of N on logarithmic scales. On the same axes we have
also plotted the exact results for N = 3…20. From this figure it is clear
that the diffusion constant for large polymers approaches D = (1/3)N^-2, as
suggested above. The slope of the line through the data is -2/3, which suggests
that for smaller N there is a correction to de Gennes' scaling form which
goes like N^-2/3. This is actually rather surprising, since similar corrections
in other particle models usually go like N^-1. The line in Figure 12.5 is given
by DN² = (1/3)(1 + 5N^-2/3).
FIGURE 12.5 This figure shows how the diffusion constant scales with
the length N of the repton chain. The circles indicate the known exact
results for chain lengths up to N = 20 and the squares are the Monte
Carlo results. The straight line is given by DN² = (1/3)(1 + 5N^-2/3).
After Newman and Barkema (1997).
will still have some random diffusive behaviour as in the zero-field case,
but if we average over a long enough time we can get an accurate measure
of v.) The electric field exerts a force on each repton proportional to the
parameter E, and the total force on the polymer is thus proportional to NE.
The Nernst-Einstein relation tells us that, if we exert a small force F on a
particle with diffusion constant D, its drift velocity will be v = FD. In our
model, the diffusion constant is given by D ~ N^-2 to leading order, so for a
small electric field, the drift velocity should go like v = NED ~ E/N. In this
regime then, the drift velocity depends on the length of the polymer, which
is exactly what we want, since it makes the separation of DNA fragments
of different lengths possible. It is known from experiments however that
for strong fields the drift velocity becomes independent of the length of the
fragment, making such fields useless for separating DNA. Note also that, if
the direction of the electric field is reversed, the polymers will drift with an
equal velocity in the opposite direction. For this reason it has been argued
that the velocity should be expressed as a sum of odd powers of E, so that
when E changes sign, so will v. Combining this idea with the results above,
the form

v = c_1 E/N + c_3 E³                                            (12.14)

has been suggested for the drift velocity, where c_1 and c_3 are constants.
12.4 Results of Monte Carlo simulations 325
This expression contains only odd powers of E, has the right form when E
is small, and becomes independent of N for large E. For many years it was
believed that this expression was essentially correct.
In simulations of the repton model using the algorithms described in
this chapter (and also in simulations of other related models), it is indeed
observed that in a sufficiently strong electric field, the drift velocity is
independent of N. However, the velocity appears to scale as v ~ E² instead
of the proposed E³. Recent more careful examination of the experimental
results has shown that in DNA electrophoresis the velocity for strong
electric fields actually does go like v ~ E². The combination of these two
observations, plus some supporting scaling arguments (Barkema et al. 1994)
indicate that the received wisdom on this matter is, after all, wrong.⁵
So what is the appropriate formula linking the drift velocity to N and E?
Well, there are a number of plausible possibilities, but the one that seems
to fit the Monte Carlo data best is

vN² = β[NE/α + (NE/α)²],                                        (12.16)

where α = c_2/c_4 and β = c_2²/c_4. This equation expresses the product vN²
as a function of just one quantity NE. If our formula is correct then, we
should be able to take the Monte Carlo data and plot the values of vN²
against NE and they should all fall on the line given by this equation (or
"collapse" onto it, as the jargon goes). This is precisely what we have done
in Figure 12.6, where the circles represent our data and the solid line is
Equation (12.16). The best fit of the line to the simulation data is given by
α = 5/6, β = 5/18, or equivalently c_2 = 1/3, c_4 = 2/5. On the left-hand side of the
figure where the electric field is small, vN² increases linearly with NE. This
corresponds to the expected behaviour v = (1/3)E/N for small fields. On the
right-hand side, vN² goes quadratically with NE: we have v = (2/5)E² for large
fields, independent of the value of N, in accordance with the experimental
⁵ How does this fit with the argument about reversing the electric field and the odd
powers of E? Well, we simply write the large-E drift velocity in the form v ~ E|E|, which
goes like E² for positive E and like -E² for negative E. If we do this, v is no longer
analytic, but there is no requirement that it should be. This is the root of the error in
the old argument.
observations. The cross-over between the two regimes occurs when the two
terms on the right-hand side of Equation (12.16) are approximately equal,
or in other words around NE = 5/6. Thus for longer and longer chains, we
have to work at smaller and smaller fields in order to be sure of falling in the
regime in which v is N-dependent and hence separation by length occurs.
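For readers who want to experiment with the collapse themselves, here is a minimal sketch of a crossover function with the two limiting behaviours above. The parametrization in terms of α and β, and the fitted values α = 5/6 and β = 5/18, follow the discussion in the text; any other function with the same two limits would collapse the data equally well at small and large NE:

```python
def vN2(NE, alpha=5/6, beta=5/18):
    """Crossover form for vN^2 as a function of the single variable NE:
    linear (v ~ E/N) for small NE, quadratic (v ~ E^2, independent of N)
    for large NE.  The two terms are equal at NE = alpha, which marks
    the crossover between the regimes."""
    x = NE / alpha
    return beta * (x + x * x)
```

With the fitted values, the small-NE slope is β/α = 1/3 and the large-NE coefficient is β/α² = 2/5, matching v = (1/3)E/N and v = (2/5)E² respectively.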
So what does this tell us about real experiments? Well, having obtained
a convincing collapse for the data from our simulations, we should perhaps
try to apply the same idea to the experimental data. A difficulty that arises
is that experiments have an additional parameter, the concentration p of
the gel. However, it turns out that we can still get a collapse of the data
if we use the same collapse formula that we used for the simulation data
modified to include variation in p too. In Figure 12.7 we show experimental
values of the product p^5/2 vN² as a function of pNE for experimental DNA
electrophoresis data published by Heller et al. (1994). As in the repton
model, the left-hand side of the figure shows linear behaviour, while the
right-hand side is quadratic. The solid line running through the points is
inspired by the collapse function that we obtained from the Monte Carlo
simulations and is given by
(These powers of p were just found by trial and error; unlike the powers of
N and E, our Monte Carlo simulation does not give us any hints as to what
the correct forms are for the variation with gel concentration.)
The fact that we can obtain a good collapse of the experimental data us-
ing essentially the same method which worked for the repton model suggests
that the important processes which drive the dynamics of DNA electrophore-
sis are the same ones which went into the repton model, namely entangle-
ment of the polymer in the gel and reptation as the dominant mechanism of
movement.
Problems
12.1 We obtained a collapse of the data from our repton model simula-
tions using the scaling form given in Equation (12.16). What would be the
appropriate scaling form for a system which obeyed Equation (12.14)?
12.2 De Gennes (1971) proposed a model of reptation slightly different from
the one discussed in this chapter. In his model the polymer chain is once
again represented as a line of particles or reptons on a lattice. Unlike the
repton model however, successive reptons in the chain are prohibited from
occupying the same square on the lattice; they can only occupy adjacent
squares. The possible moves of the chain are that (i) either of the two
reptons at the ends of the chain can move to any other square, as long as
the resulting configuration is an allowed one, and (ii) any two adjacent links
in the chain which are anti-parallel can move together to another position,
provided the resulting configuration is lawful. (a) Verify that the chain
does indeed move by reptation. (b) The most straightforward Monte Carlo
algorithm for this model is to choose at each step one of the two end links
of the chain or a pair of links in the middle, and move them at random to a
new allowed position. What increment of time should be attributed to one
such elementary move for the two-dimensional version of the model? (c) In
the two-dimensional case what is the average probability that any proposed
move will be accepted?
Part III
Implementation
13
Lattices and data structures
proximate reasonably well to sites drawn from the interior of a large system
(see Figure 13.1).
For many simulations on square and cubic lattices a representation of this
type is completely adequate. However, for some simulations where speed is
critical it can be inefficient. The problem is that the memory of a computer
is only one-dimensional. Each memory location is labelled with a single
number. To represent an L x L array in memory therefore, the computer lays
out the elements of the array one after another in memory and the value of a
spin with lattice coordinates i and j is found at the variable iL + j from the
beginning of the array.2 This means that every time you retrieve the value
of a particular spin, the computer has to perform a multiplication to work
out where that value is. And multiplications are slow. Often this is only the
tiniest contribution to the total time taken by the simulation. Sometimes,
however, as in the Metropolis algorithm for the Ising model (Section 3.1), it
can be quite a significant portion of the total time because the rest of the
algorithm is very simple and can be programmed with great efficiency. In
cases like this, it often pays to use helical boundary conditions.
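To make the address arithmetic concrete, here is the row-major calculation sketched in Python, laying out an L × L array flat just as a C compiler would:

```python
def site_index(i, j, L):
    """Row-major address of lattice coordinates (i, j) in a flat array,
    as laid out by a C compiler for an L x L array: one multiplication
    per lookup."""
    return i * L + j

# A 4x4 lattice stored flat: spin (2, 3) lives at position 2*4 + 3 = 11.
L = 4
spins = [0] * (L * L)
spins[site_index(2, 3, L)] = 1
assert spins[11] == 1
```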
Helical boundary conditions are a variation on the idea of periodic bound-
ary conditions in which each lattice site is indexed by just a single coordinate
i. For the case of a two-dimensional L x L lattice for instance, i would run
from 0 to L² - 1 in C, or from 1 to L² in FORTRAN. (Henceforth, we will
use the C-style convention of indices starting at zero, since the formulae
work out a little simpler in this case. The FORTRAN version can easily be
² FORTRAN uses so-called "column major" ordering of arrays (as opposed to the "row
major" ordering used in C) and stores the element at i + jL instead.
334 Chapter 13: Lattices and data structures
derived from the C one.) On the square lattice i increases along one row
of the lattice and then along the next and so on until the entire lattice is
filled. On a cubic lattice each plane is filled in this fashion and then the
next plane, and so on down the z-axis. The idea can also be generalized to
higher dimensions still. In Figure 13.1 we contrast helical boundary condi-
tions with ordinary periodic ones for the case of the square lattice. As the
figure shows, in both systems each site is equivalent to all the others, and
each site approximates to a site drawn from the interior of a large system.
The advantage of using helical boundary conditions is that no multiplica-
tion is necessary to find the value of a spin or other variable at a particular
site. The spins on the lattice are now represented by a one-dimensional
array, regardless of the actual dimensionality of the lattice, and the value
of spin i is given simply by s[i] or S(I). Since there is no multiplication
involved (and in addition, only one index i to deal with) most modern com-
puters can retrieve this value from memory about twice as fast as in the
case of periodic boundary conditions. The wrapping around of the lattice is
handled in a way very similar to the periodic boundary condition case: the
four neighbours of the spin at site i are (i ± 1) mod L² and (i ± L) mod L².
In practice, it is often advantageous to work with a lattice where the total
number of sites is a power of 2. In this case, the modulo operation can be
performed very simply by setting the highest order bits to zero, which we
can do with a bitwise AND operation, which is usually much faster than the
modulo function.
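As a sketch in Python, the two ways of computing the neighbours look like this; when N = L² is a power of two the two functions give identical results, but the second replaces the modulo with a bitwise AND:

```python
def neighbours_helical(i, L):
    """The four neighbours of site i on an L x L square lattice with
    helical boundary conditions: (i +/- 1) mod L^2 and (i +/- L) mod L^2."""
    N = L * L
    return [(i + 1) % N, (i - 1) % N, (i + L) % N, (i - L) % N]

def neighbours_masked(i, L):
    """The same thing when N = L*L is a power of two: the modulo becomes
    a bitwise AND with N-1, which simply clears the high-order bits."""
    mask = L * L - 1
    return [(i + 1) & mask, (i - 1) & mask, (i + L) & mask, (i - L) & mask]
```

(In C the same trick works for unsigned indices; Python's `%` and `&` both wrap negative values the way we need here.)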
Although they may seem a little asymmetric at first, helical boundary
13.1 Representing lattices on a computer 335
produce a lattice which has the same topology as the regular triangular
lattice; a simple shear will take the lattice into a triangular one. Since it
is only the topology of the lattice that we are interested in, we can thus
represent a triangular lattice in exactly the same way as we did the square
lattice, using a square array. The only thing which changes is which sites
are nearest neighbours of which others. If we use a two-dimensional array
to store the lattice, the neighbours of site (i, j) are (i ± 1, j) and (i, j ± 1) as
before, but there are also two new neighbours (i + 1, j - 1) and (i - 1, j + 1).
If we use periodic boundary conditions then all these coordinates need to be
calculated modulo L, where L is the dimension of the lattice. If we store
our lattice in a one-dimensional array and use helical boundary conditions,
which is usually more efficient (see Section 13.1.1), then the neighbours of
spin i are i ± 1, i ± L and i ± (L - 1), all modulo N, where N is the total
number of sites on the lattice (which is usually L², though it need not be).
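In Python the six neighbours of a site on the helically wrapped triangular lattice can be sketched as:

```python
def neighbours_triangular(i, L, N=None):
    """Six neighbours of site i on a triangular lattice stored as a flat
    array with helical boundary conditions: i +/- 1, i +/- L and
    i +/- (L - 1), all modulo the number of sites N (usually L*L)."""
    if N is None:
        N = L * L
    return [(i + d) % N for d in (1, -1, L, -L, L - 1, -(L - 1))]
```

A quick sanity check is that the neighbour relation is symmetric: every site appears in the neighbour lists of its own six neighbours.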
One small issue with this representation of the triangular lattice is that
the overall shape of the lattice in Figure 13.3 is that of a rhombus. This is a
little unsatisfactory because the rhombus does not have the same symmetries
as the lattice itself. An infinite triangular lattice has a six-fold rotational
symmetry, whereas the rhombus only has a two-fold one. For large systems
this is not usually a problem, but for smaller ones it can give rise to unwanted
finite-size effects, particularly if we are calculating directional quantities for
which the symmetries are important, such as correlation functions. A useful
trick for getting around this problem is illustrated in Figure 13.4. As the
figure shows, if instead of an L x L system we use one which is longer on one
side than the other, and we stagger the boundary conditions, we can arrange
for the shape of the repeated cell in the lattice to be a hexagon, which has
the same six-fold symmetry as the infinite lattice. The figure shows how to
do this for a system with helical boundary conditions, although the same
FIGURE 13.5 The solid circles and lines depict the honeycomb (left)
and Kagome (right) lattices. As the empty circles indicate, both these
lattices are subsets of the sites in a triangular lattice.
For a Kagome lattice with periodic boundary conditions, the missed out
sites are those for which i and j are both odd (or both even, if you prefer).
Helical boundary conditions will not work for the Kagome lattice, and so we
are restricted to using periodic ones. (Readers might like to take another
look at Figure 13.5 to convince themselves of this.) Below we describe an
alternative representation of the Kagome lattice which does not suffer from
this drawback.
These representations of the honeycomb and Kagome lattices are simple
and relatively straightforward to implement. They are, however, wasteful.
They require us to define arrays to hold our lattices which are bigger than
the number of sites on the lattice, the extra elements of the array simply
not getting used. In the case of the honeycomb lattice for example, this
increases the space needed to store all the spins or other variables on the
lattice by 50%. For smaller simulations this is unlikely to bother us, but for
large lattice sizes we might run into problems with the memory capacity of
our computer. As it turns out there are other ways of representing these
lattices which are more economical with memory and no slower than the
simple methods given above. What's more, studying these representations
will give us a good idea of how to represent more complex lattices still, such
as the diamond lattice of Section 13.1.3.
The fact that we can represent a triangular lattice in the simple fashion
described at the beginning of this section is a special consequence of a more
general property of lattices. The triangular lattice is one of the five Bravais
lattices in two dimensions. Bravais lattices (in two dimensions) are ones for
which the coordinates of every site can be written in the form r = i a_1 + j a_2,
where i and j are integers. The vectors a_1 and a_2 are linearly independent
and their magnitudes are the lattice parameters a_1, a_2 of the lattice. The
simplest two-dimensional Bravais lattice is the square lattice, for which a_1
and a_2 are the perpendicular unit axis vectors x and y. In the triangular
case a_1 and a_2 are again unit vectors, but this time set at a 60° angle to one
another, as shown in Figure 13.3. The other two-dimensional Bravais lattices
are the rectangular, centred rectangular and oblique lattices, none of which
you are ever likely to use in a Monte Carlo simulation. However, we can
make use of the Bravais lattice idea to construct more complicated periodic
lattices such as the honeycomb and Kagome lattices, which are useful for
Monte Carlo work.
A periodic lattice is any infinite lattice which maps exactly onto itself
when shifted along a correctly chosen translation vector. Periodic lattices
account for almost all the lattices used in Monte Carlo simulations. Rare
exceptions are the random lattices used in some quantum simulations, and
quasiperiodic lattices, which are of interest in the study of quasicrystals. We
discuss the representation of lattices like these in Section 13.1.4. Periodic
lattices can be represented in terms of Bravais lattices (of which they are
where L now represents the linear dimension of the lattice in unit cells.
For the Kagome lattice, the unit cell consists of three sites arranged in
an equilateral triangle. The underlying Bravais lattice is again a triangular
one. In Figure 13.7 we have used this construction to create a numbering
scheme for the Kagome lattice, appropriate for use with helical boundary
conditions. In this case the nearest neighbours of site i are as follows:
is the usual lattice structure for the majority of crystalline metals, body-
centred cubic (bcc) which is the lattice structure for iron and the alkali
metals, and the diamond lattice which, as its name indicates, is the lattice
adopted by the carbon atoms in diamond, amongst other examples. If you
are not familiar with these lattices, a good description can be found in, for
example, Ashcroft and Mermin (1976).
The representation of these lattices can be tackled in two different ways,
exactly akin to those used for the honeycomb and triangular lattices in Sec-
tion 13.1.2. The simpler though more squanderous way is to notice that the
sites in all of these lattices are subsets of the sites in a cubic lattice. So we
can represent them as three-dimensional cubic arrays in which some of the
elements are not used. The appropriate rules are as follows:
• A point (h, k, l) on the cubic lattice⁵ is a member of the fcc lattice if
h + k + l is even.
• A point (h, k, l) is a member of the bcc lattice if h, k and l are either
all even or all odd.
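These rules translate directly into a pair of membership tests, sketched here in Python:

```python
def in_fcc(h, k, l):
    """A cubic-lattice point (h, k, l) belongs to the fcc lattice if
    h + k + l is even."""
    return (h + k + l) % 2 == 0

def in_bcc(h, k, l):
    """A point belongs to the bcc lattice if h, k and l are all even or
    all odd, i.e. if all three have the same parity."""
    return h % 2 == k % 2 == l % 2
```

As a check on the wastefulness of this representation: in a block of cubic-lattice sites, exactly half satisfy the fcc rule and a quarter satisfy the bcc rule, so the cubic array is two and four times larger than it needs to be, respectively.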
any other observable defined on the lattice: we simply run through the sites
on the lattice adding terms for the interactions of each site with each of its
neighbours. If we have three-point or higher interactions in the Hamilto-
nian we don't need to store any further lists; we can calculate all orders of
interactions from just the topological information contained in our lists of
nearest neighbours.
13.2.1 Variables
All high-level computer languages provide a variety of different variable types
which can be used to represent the variables in our physical models. The
most common are integer variables and real or more correctly floating-
point variables. We assume that the reader is already familiar with these
types of variables and their implementation in the language of his or her
choice. Some languages, notably FORTRAN, also provide complex vari-
ables. For languages in which there is no complex variable type we can
usually create one using a structure or class (see below). Other variable
types which occur in some languages include Boolean (true/false) variables,
character (single letter) variables and strings (rows of letters such as words or
sentences). All of these may find occasional use in Monte Carlo calculations.
Many computer languages provide a selection of different precision ranges
for variables. Integers, for instance, may be stored in 16, 32 or 64 bits and
may be signed or unsigned. Floating-point numbers typically come in single
precision (32-bit) or double precision (64-bit) incarnations, while newer
computers often provide 80-bit or even longer representations. At the time of
the writing of this book, almost all new computers come with special-purpose
hardware for performing fast calculations with double precision floating-
point numbers, which makes these calculations just as fast as their single
precision counterparts. Unless memory space is an important consideration
therefore, there is usually no reason not to use double precision variables in
your simulations.
Two common extensions of the basic idea of a program variable are
pointers and references. Pointers, which occur in C and Pascal and deriva-
tive languages, are variables which store not the value of a quantity, but the
location of that value in the memory of the computer. Pointers are useful,
amongst other things, for efficient implementation of the linked list and tree
data structures which are discussed in following sections. References are in
some ways similar to pointers. A reference is a variable for which we can
specify where in the memory of the computer its value is stored. (Often we
are only allowed to do this indirectly by specifying that the memory location
should be the same as that of some other variable we already have.) Ref-
erences allow us to do pretty much the same things as pointers do, but are
sometimes more convenient to work with. References occur most notably in
C++, but are also found in a more limited form in some older languages,
such as FORTRAN and Pascal.
Two other useful features found in many modern computer languages
are structures⁶ and classes. A structure is a collection of variables which
we refer to by one name. For example, in a program which used complex
variables, we could define a structure consisting of two floating-point num-
bers to represent the real and imaginary parts of a complex number. This
is merely a convenience: there is nothing that can be done with structures
which cannot also be done with ordinary variables. However, one should not
underestimate the importance of such conveniences. By making programs
simpler and easier to read, structures can greatly reduce the number of bugs
introduced into a program when it is written, and help us to spot more easily
those that we do introduce.
Classes are found in object-oriented languages and are an extension of
the idea of structures. Like a structure, a class contains a number of vari-
ables which can all be referred to by one name, but a class can also contain
functions which perform operations on those variables. This again is just a
convenience, but a very useful one. A properly written class of complex vari-
ables, for instance, would allow us to perform straightforward arithmetic on
our complex numbers without ever worrying about the real and imaginary
parts; we could simply add or multiply our variables together and every-
⁶ The word "structure" here refers to a specific construction in a high-level computer
language and should not be confused with our more general use of the phrase "data
structure" throughout this chapter to refer to ways of storing the data in our Monte Carlo
simulations.
13.2 Data structures 345
thing would be taken care of automatically. Again, this can greatly improve
the readability of our programs and help us to avoid introducing too many
bugs.⁷
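As an illustration in Python, a language with operator overloading, a bare-bones class of this kind might look like the following. (Python actually has a built-in complex type, so this is purely to show the idea.)

```python
class Complex:
    """A minimal complex-number class of the kind described in the text:
    two floating-point parts bundled together, plus arithmetic operators
    so that callers never touch the parts by hand."""
    def __init__(self, re, im=0.0):
        self.re, self.im = re, im
    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)
    def __mul__(self, other):
        # (a+bi)(c+di) = (ac - bd) + (ad + bc)i
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)
    def __eq__(self, other):
        return self.re == other.re and self.im == other.im

# (1 + 2i) * (3 + 4i) = -5 + 10i, without ever handling the parts directly.
```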
In the rest of this chapter we will look at a variety of data structures for
storing large numbers of variables in efficient ways. The variables in these
data structures can be any of the types described above. The techniques
are the same whether you are using integers or real numbers or complex
numbers or any other type of data.
13.2.2 Arrays
The most common data structure is the array, which is a collection of
variables stored in a block of memory on the computer and indexed by one
or more integer labels. Arrays are so important that all high-level computer
languages provide them as a feature of the language. They also provide the
perfect way of representing the variables on a periodic lattice, as described
in the first part of this chapter. We will assume that our readers are already
familiar with the idea of arrays and how they are used.
13.2.3 Linked lists
A linked list is a set of data items, stored in any order in the memory of
the computer, with each of which is associated a pointer which points to the
memory location of the next item in the list. As long as we know the location
of the first item, we can then easily run through all of them by starting at
the first, reading its value, and then following its pointer to the second item
and reading that, and so on. We also need some way of indicating when an
item is the last one in a list. Typically this might be done by setting the
pointer for that item to a special value such as 0 or −1.
FIGURE 13.8 A linked list used, in this case, for the alphabetical
storage of names.
The main disadvantage of the linked list is that it is not possible to read
any particular element in the list without first going through each of the
preceding elements in turn. This makes linked lists good for algorithms
which only require sequential access to items in a list (such as sparse matrix
multiplications), but poor for algorithms where "random access" is required.
(For cases in which random access is required, the tree structures described
in Section 13.2.4 are often a good solution.)
Figure 13.8 is a schematic illustration of a linked list of names. In this
example, the names are stored in alphabetical order. If we want to add a new
name to the list, it should be inserted in the correct place in the alphabet.
This is actually very simple to do, as Figure 13.9 shows. To insert a new
element after an existing element e_n, we create a variable containing the new
element, set the pointer of element e_n to point to it, and set its pointer to
point to e_{n+1}. There is no need to shift any data around the memory of
the computer, making the list much quicker to change than an equivalent
array. Sometimes it may also be necessary to insert a new element before an
existing one e_n. This is a little more complicated. What we would like to do
is create a variable containing the new element and set the pointer of element
e_{n-1} to point to it. Unfortunately, we can't easily do this because we can't
go backwards from element e_n to get to element e_{n-1}; the pointers only point
one way along the list. Instead we use the trick illustrated in Figure 13.10.
We create a new variable and copy both the value and the pointer from
element e_n into it. Then we set the pointer of element e_n to point to this
new element, and we store our new value in e_n. This only involves looking
ahead along the list and not backwards, and achieves the desired result of
inserting the new value into the list ahead of the nth element.
We use a similar trick to remove an element from the list. If we wish to
remove element e_n, then what we would really like to do is set the pointer
of element e_{n-1} to point to e_{n+1} and just throw e_n away. This again would
involve moving backwards in the list, which we cannot easily do. So instead
we set the value of e_n equal to the value of e_{n+1}, set the pointer of e_n to
point to e_{n+2}, and throw away element e_{n+1}. This achieves the desired result
while only looking ahead along the list.
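The insert and remove tricks above translate directly into C. The struct layout and function names below are our own illustration, not code from the text (Figures 13.9 and 13.10 show the same operations pictorially):

```c
#include <stdlib.h>

/* A singly linked list element.  NULL marks the end of the list. */
typedef struct element {
    int value;
    struct element *next;
} element;

/* Insert a new value AFTER element e: no data needs to be moved. */
element *insert_after(element *e, int value)
{
    element *fresh = malloc(sizeof(element));
    fresh->value = value;
    fresh->next = e->next;
    e->next = fresh;
    return fresh;
}

/* Insert a new value BEFORE element e using the copy trick: copy e into
   a fresh element placed after it, then overwrite e with the new value.
   This never requires a backwards step through the list. */
element *insert_before(element *e, int value)
{
    element *copy = malloc(sizeof(element));
    *copy = *e;                /* copy both the value and the pointer of e */
    e->next = copy;
    e->value = value;
    return e;
}

/* Remove element e (which must not be the last one) by the same trick:
   overwrite e with its successor, then free the successor. */
void remove_element(element *e)
{
    element *succ = e->next;
    e->value = succ->value;
    e->next = succ->next;
    free(succ);
}
```

Note that `remove_element` cannot delete the final element of a list this way, since there is no successor to copy from; a real program would handle that case separately.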
Sometimes, there will be cases where we really do need to be able to
move backwards as well as forwards through a list. In this case we can use
a doubly linked list in which each element includes pointers to both the
next and previous elements of the list. Making changes to such a list takes
longer than in the case of the singly linked list, since two pointers have to be
maintained for each element. For this reason we should use a singly linked
list wherever possible, reserving the doubly linked one for those cases where
the need to move backwards is unavoidable.
These methods, as we have described them, rely on two crucial features
of the computer language used: it should have pointers and it should allow
dynamic memory allocation. At the time of the writing of this book, this
restricts us to using C or C-like languages. In other languages a more prim-
itive implementation of the linked list is possible, using an array and storing
array indices instead of pointers to elements. Such structures have most of
the advantages of linked lists but their size is limited by the size of the array
used to store them, and so cannot expand arbitrarily, as a true linked list
can. C itself, of course, has pointers, but the same index-based structure is
sometimes useful in C too, for instance when a list must be saved to disk.
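A minimal sketch of such an index-based list, with an array of values and a parallel array of "next" indices in which −1 plays the role of the null pointer. The names, the fixed size MAXLEN, and the lack of a free-slot pool for deleted elements are our own simplifications:

```c
#define MAXLEN 100

int value_arr[MAXLEN];   /* the stored values */
int next_arr[MAXLEN];    /* index of the next element, or -1 at the end */
int nfree = 0;           /* next unused slot in the arrays */

/* Insert a value after position i (or start a new list if i < 0).
   Returns the index of the new element, or -1 if the array is full,
   which is the size limitation mentioned in the text. */
int list_insert_after(int i, int value)
{
    if (nfree >= MAXLEN) return -1;
    int j = nfree++;
    value_arr[j] = value;
    if (i < 0) {
        next_arr[j] = -1;
    } else {
        next_arr[j] = next_arr[i];   /* splice the new element in, */
        next_arr[i] = j;             /* exactly as with pointers   */
    }
    return j;
}
```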
13.2.4 Trees
Both arrays and linked lists are good for certain types of task, but both have
their problems. As we saw in Section 13.2.3, a linked list allows us to add
or remove elements efficiently at any point in the list, and the list can grow
or shrink arbitrarily as the number of elements we wish to store changes.
Arrays do neither of these things, having a structure and size which cannot
be changed after they are first declared. On the other hand, arrays are good
for some other things which linked lists are not, such as allowing us to read
a specified value quickly without having to run through all the preceding
elements of the array.
The obvious question to ask then is whether there is some other data
structure which combines the good features of both arrays and linked lists.
to the current node. To one of them we assign the old name that used
to be stored in the current node, and to the other we assign the new
name.
This procedure also takes an amount of time which scales as the logarithm
of the number of names or other values stored in the tree. This is much
better than the time taken to add a new name to an alphabetically ordered
array, which goes linearly with the number of names. However, it is not as
good as the case of the linked list, for which the time taken to add a new
name is independent of the length of the list. For this reason you should use
linked lists where possible, reserving the binary tree for cases in which it is
important to be able to search for a particular value quickly.
Finally, to remove a name from the tree, we start once more at the root
node and perform the following steps:
1. We find the leaf which contains the name we want to remove, using
the search method described above. As we move through the tree, we
record the location of each node we pass through. These locations are
needed in the next two steps.
2. From the leaf containing our name, we retrace our tracks to the grand-
parent node of that leaf and alter it so that it points to the brother of
the node we are removing.
3. We check whether the alphabetic range recorded by the grandparent
node still correctly describes its children. If it does, we are finished.
If not, we correct it, and then move up to the great-grandparent, and
so on, repeating this step until we reach a node for which no changes
need to be made, or we reach the root node.
This process, like the addition of a name, takes a time which increases loga-
rithmically with the number of names or other values stored in the tree.
In fact, it is not always true that all these operations take an amount
of time going logarithmically with the size of the tree. This is the typical
result, but it is possible for trees to get into a pathological state in which it
takes much longer to perform searches or updates. If the number of leaves
which you can get to by choosing either path from a node is roughly the
same, then the logarithmic rule applies. Such trees are said to be balanced.
If we happen to add names to the tree in the wrong order however, it is
quite possible that the tree will become unbalanced and the efficiency of our
algorithms will fall dramatically. In the worst case it can take an amount of
time which scales linearly with the size of the tree to both search and update
it, which is worse than either an array or a linked list. There are a variety
of strategies for preventing this situation. Often we have some flexibility in
the way in which we add values to the tree. In the example given above,
352 Chapter 13: Lattices and data structures
for instance, we will sometimes add a new name which falls between the
alphabetic ranges covered by the two children of a node. In this case we
are free to choose the branch which we add the new name to. If we always
add the name to the branch which leads to fewer leaves this will help to
keep the tree balanced. There are times, however, when strategies like this
are insufficient to maintain balance. In these cases we can employ one of a
number of algorithms which have been developed for rebalancing trees. The
working of rebalancing algorithms is rather technical in nature however, and
we won't go into it here. A good discussion is given by Sedgewick (1988).
An example of the use of a tree data structure in a Monte Carlo algorithm
can be found in Section 11.2.2, where we describe an efficient algorithm for
simulating the diffusion of adatoms on a crystal surface.
13.2.5 Buffers
A buffer is a data structure used to store the values of variables temporarily,
and retrieve them later. Buffers come in a number of different varieties. Here
we describe the two which occur most commonly in Monte Carlo programs,
the last in/first out (LIFO) buffer or stack, and the first in/first out
(FIFO) buffer or queue. Both of these can be created easily using arrays.
The descriptions here assume arrays whose indices start at zero as in C,
rather than at one as in FORTRAN. However, it is very straightforward to
adapt the same ideas for use in FORTRAN programs.
A last in/first out buffer operates exactly as its name suggests. We
"push" the values of as many variables as we like into the buffer, and at
any point we can "pop" a value out again. The values come out in the
opposite order to the way we put them in, so that the last one which went
in is the first one out. To create a LIFO buffer in a computer program we
make a one-dimensional array of the appropriate variable type with enough
elements to store the maximum number of values which the buffer will have
to contain—let us call this number L—and we also create an integer variable,
m say, to store the number of values in the buffer. Initially m = 0. When we
push a value onto the stack, we place the new value in the mth element of
the array and then increase the value of m by one. When we want to pop a
value out again we decrease m by one and read the value in the mth element
of the array. At any time the number of values in the buffer is equal to the
value of m. If m ever reaches L, then the buffer is full and no more values
can be added. If m = 0 then the buffer is empty and no more values can be
removed.
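The whole scheme fits in a few lines of C. The packaging into a struct and the overflow/underflow return codes are our own additions to the description above:

```c
#define L 1000   /* maximum number of values the buffer can hold */

/* A LIFO buffer (stack) built from an array, as described in the text. */
typedef struct {
    double buffer[L];
    int m;               /* number of values currently in the buffer */
} stack;

/* Push a value; returns 0 on success, -1 if the buffer is full. */
int push(stack *s, double value)
{
    if (s->m >= L) return -1;          /* overflow: m has reached L */
    s->buffer[s->m++] = value;
    return 0;
}

/* Pop a value into *value; returns 0 on success, -1 if empty. */
int pop(stack *s, double *value)
{
    if (s->m == 0) return -1;          /* underflow: buffer is empty */
    *value = s->buffer[--s->m];
    return 0;
}
```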
A first in/first out buffer is one in which the values stored pop out in the
same order as they were put in, so that the first one in is also the first one
out, just as the name says. One simple way to implement a FIFO buffer in a
computer program would be to create an array of L elements and an integer
variable m to store the number of values in the buffer, just as in the LIFO
case, with m = 0 initially. Just as before, when we add a value to the buffer
we place the new value in the mth element of the array and then increase
m by one. Now however, when we want to pop a value out of the buffer,
we read the first element of the array (which would have index zero in C),
decrease m by one, and then move all the other elements down one place,
so that a new one becomes the first element, ready to be popped out in its
turn.
Simple though it is, this is not a good way to implement our buffer. It
will work, but it will be slow. Every time we pop a value out of the buffer
we have to reassign the values of m elements in the array. This takes an
amount of time proportional to m and if m is very large it could be a very
slow process. Luckily, there is a much better way to make a FIFO buffer,
using circular buffering. Circular buffering also makes use of a single
array of size L to store the contents of the buffer, and in addition it uses
two integers m and n which point to the first and last values in the buffer.
Initially m and n are both set to zero. When we want to add a value to the
buffer we place the new value in the mth element of the array and increase
m by one, modulo L. When we want to remove a value we read the nth
element of the array and then increase n by one modulo L. The number of
values in the buffer at any time is equal to m − n (possibly plus L because
of the modulo operations).
This way of implementing the FIFO buffer is much more efficient than
the first one we proposed. Each push or pop operation requires us to read
or write the value of only one variable, and can be completed in an amount
of time which remains constant as the number of values in the buffer grows.
There are a couple of things we need to be careful about however. First,
just as in the case of the stack, a FIFO buffer will overflow if we try to put
more than L values into it. To guard against this we should compare the
values of m and n after each value is added to the buffer. If m = n after a
value has been added, then the buffer is full and adding any further values
will probably cause our program to malfunction. Second, the buffer can
underflow if we try to remove a value from it when there are none there. To
guard against this we should compare the values of m and n after removing
each value. If m = n after removing a value then there are no more values
left in the buffer.
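A sketch of the circular buffer in C. We depart slightly from the two-variable scheme in the text by keeping an explicit count of stored values, which makes the full and empty cases (both of which have m = n) easy to tell apart in code:

```c
#define L 1000   /* size of the array holding the buffer */

/* A circular FIFO buffer (queue).  m and n both wrap around modulo L. */
typedef struct {
    double buffer[L];
    int m;        /* where the next value will be written */
    int n;        /* where the next value will be read */
    int count;    /* number of values currently stored */
} queue;

/* Add a value; returns 0 on success, -1 if the buffer is full. */
int enqueue(queue *q, double value)
{
    if (q->count == L) return -1;        /* overflow */
    q->buffer[q->m] = value;
    q->m = (q->m + 1) % L;               /* increase m, modulo L */
    q->count++;
    return 0;
}

/* Remove the oldest value; returns 0 on success, -1 if empty. */
int dequeue(queue *q, double *value)
{
    if (q->count == 0) return -1;        /* underflow */
    *value = q->buffer[q->n];
    q->n = (q->n + 1) % L;               /* increase n, modulo L */
    q->count--;
    return 0;
}
```

Each operation touches only one array element, so the push and pop times stay constant however many values are stored, as the text explains.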
As an example of the use of buffers in a Monte Carlo algorithm, consider
the Wolff cluster algorithm for the Ising model, which was discussed in Sec-
tion 4.2. We summarized a step of the algorithm as follows in Section 4.2.1:
1. Choose a seed spin at random from the lattice.
2. Look in turn at each of the neighbours of that spin. If they are pointing
in the same direction as the seed spin, add them to the cluster with
probability P_add = 1 − e^(−2βJ).
3. For each spin that was added in step 2, examine each of its neighbours
to find the ones which are pointing in the same direction, and add
each of them to the cluster with the same probability P_add. This step
is repeated as many times as necessary until there are no spins left in
the cluster whose neighbours have not been considered for inclusion in
the cluster.
4. Flip the cluster.
The simplest way for a computer program to carry out this procedure
is to store the spins of the current cluster in a buffer as follows. Starting
off with an empty buffer, we first choose one of the spins on the lattice at
random to be our seed. We look at each of the neighbours of this spin, to
see if it points in the same direction as the seed itself. If it does, we generate
a random real number r between zero and one, and if r < P_add, we add
that spin to the cluster and we add its coordinates to the buffer. When we
have exhausted the neighbours of the seed spin we are looking at, we pop
a spin out of the buffer, and we start checking its neighbours one by one.
(If some of its neighbours have already been added to the cluster then we
should miss those out, otherwise the algorithm will never stop.) Any new
spins added to the cluster are again also added to the buffer, and we go on
popping spins out of the buffer and looking at their neighbours in this way
until the buffer is empty. This algorithm guarantees that we consider the
neighbours of every spin added to the cluster, as we should.
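This procedure can be sketched compactly in C using the LIFO buffer of the previous section. The lattice representation (spins ±1 in a flat array with periodic boundaries), the use of the standard rand() generator for brevity, and the trick of flipping each spin at the moment it is added (which makes already-added spins fail the alignment test automatically, implementing the "miss those out" check) are our own choices for illustration, not the book's reference implementation:

```c
#include <stdlib.h>

#define LDIM 16                  /* linear dimension of the square lattice */
#define NSITES (LDIM*LDIM)

double padd;                     /* P_add = 1 - exp(-2*beta*J), set by caller */

static double drandom(void)      /* uniform deviate in [0,1) */
{
    return rand() / (RAND_MAX + 1.0);
}

/* One Wolff step: grow a cluster from a random seed and flip it. */
void wolff_step(int s[NSITES])
{
    int buffer[NSITES];          /* the LIFO buffer of spin coordinates */
    int m = 0;                   /* number of values in the buffer */
    int seed = (int)(NSITES * drandom());
    int oldspin = s[seed];
    s[seed] = -oldspin;          /* flip each spin as it joins the cluster */
    buffer[m++] = seed;

    while (m > 0) {              /* go on until the buffer is empty */
        int i = buffer[--m];     /* pop a spin and check its neighbours */
        int x = i % LDIM, y = i / LDIM;
        int nbr[4] = { y*LDIM + (x+1)%LDIM,      y*LDIM + (x+LDIM-1)%LDIM,
                       ((y+1)%LDIM)*LDIM + x,    ((y+LDIM-1)%LDIM)*LDIM + x };
        for (int k = 0; k < 4; k++) {
            int j = nbr[k];
            /* spins already in the cluster have been flipped, so they
               fail this alignment test and are never considered twice */
            if (s[j] == oldspin && drandom() < padd) {
                s[j] = -oldspin;
                buffer[m++] = j;
            }
        }
    }
}
```

Because each site enters the buffer at most once, a buffer of NSITES entries can never overflow, and the loop is guaranteed to terminate.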
This method will work equally well with either a FIFO or a LIFO buffer
and, statistically speaking at least, we should get the same answers whichever
we use. Furthermore neither buffer type gives an algorithm significantly more
efficient than the other. However, it is worth noting that there are differences
between the two in the way in which the clusters grow. Figure 13.12 shows
how the same cluster grows using each type of buffer. With the FIFO buffer,
the cluster grows in a spiral fashion, remaining, on average, roughly isotropic
throughout its growth. With the LIFO buffer, the cluster first grows along a
line in one direction, and then backs up along its tracks and begins to grow
sideways. Because of these differences in cluster growth patterns, these two
versions of the Wolff algorithm can provide useful checks against potential
problems in the program, such as bugs or deficiencies in the random number
generator used. (The Wolff algorithm is known to be unusually sensitive to
imperfections in random number generators (Ferrenberg et al. 1992).) In
addition, there are some variations on the basic Wolff algorithm for which one
or other buffer type may be more efficient. For example, if we wish to place
constraints on the maximum size of a cluster, as is done in the limited
cluster flip algorithm (Barkema and Marko 1993), then an implementa-
tion making use of a FIFO buffer is usually faster.
FIGURE 13.12 The numbers indicate the order in which sites are
added to a cluster in the Wolff algorithm. If a FIFO buffer is used
(left) the sites are added in a spiral around the initial seed spin. If a
LIFO one is used (right) the cluster tends to grow first in one direction,
and only later spreads out.
Problems
13.1 If we pack a two-dimensional space with identical circular disks, what
fraction of the space can we fill if the disks are packed with their centres
lying on (a) a square lattice and (b) a triangular lattice?
13.2 As described in Section 13.1.3, the set of points with integer coordi-
nates (h, k, l) where h, k and l are either all even or all odd lie on a bcc
lattice. If we stretch this lattice in the z-direction by a factor of √2, we get
another periodic lattice. What lattice is this?
13.3 A common problem that comes up in many Monte Carlo algorithms
is choosing a single value at random from a list. If the values are stored
in consecutive elements of an array this is easy, but it is harder if they are
stored in a binary tree of the type described in Section 13.2.4. Suppose we
have a number n of values stored in the leaves of such a tree. Devise an
algorithm which selects one of them uniformly at random, given a random
integer 0 < r < n. (Hint: you will need to add to the data stored in the
tree.)
13.4 The time taken to find a particular element stored in a balanced bi-
nary tree varies as the number of levels of the tree which we have to search
through, which is log2 N, where N is the number of elements stored. If a
binary tree is unbalanced in such a way that one child of each node has
roughly twice as many descendants as the other, the average search time
still scales as log2 N but with a larger multiplicative prefactor. How much
larger is this prefactor?
14
Monte Carlo simulations on
parallel computers
the surrounding spins (see Equation (3.10)), and for the spins on the edge
of a domain the values of one or more of the surrounding spins are stored in
the memory of another processor and are not available. One solution to this
problem might be for every processor to keep a record of the values of all
the spins just outside the border of the domain which it covers. This would
require each processor to inform its neighbour whenever it changed the value
of a spin on its border, so that the neighbour can keep its records up-to-date.
This approach has its problems however, the main one being that it requires
all the processors to run in perfect synchrony. Since this rarely occurs in real
computers, one would have to ensure synchronization by having processors
wait after each Monte Carlo step until the slowest amongst them had caught
up with the rest. This is wasteful of CPU time and besides, there is a much
better way of doing it.
The efficient way to perform the simulation is simply not to flip the spins
on the border of a domain. We flip all other spins as usual, but not the
spins on the borders (see Figure 14.1 again). This doesn't affect detailed
balance, since the forward and backward probabilities for flipping a border
spin are both zero, so that Equation (2.12) is still obeyed. It does however
destroy the ergodicity of our algorithm, since there are some states (the ones
14.2 More sophisticated parallel algorithms 361
in which the border spins have been flipped) which can never be reached.
To get around this problem we move the domain borders periodically, so
that all spins get a chance to be in the interior of a domain. When we move
a border, some spins pass out of the domain covered by one processor and
into that covered by another, which means that we have to send the current
values of those spins from the old processor to the new. This of course
requires us to send some inter-processor messages, which, as we pointed out
at the beginning of the chapter, is a comparatively slow process. On the
other hand, we don't have to send such messages very often. The ideal time
interval between successive moves of the domain boundaries turns out in
fact to be the correlation time of a single simulation the size of one of our
domains. Under any circumstances this time is at least one Monte Carlo step
per site, which means we still get to do quite a large chunk of simulation in
between one reorganization of the domains and another.
Problems
14.1 (a) The lattice in Figure 14.1 is divided into six regions of equal size
by fixing 126 spins. Is it possible to divide it into six such domains but fix
fewer spins? (b) What is the biggest lattice that can be divided into eight
domains of equal size with 1000 fixed spins?
14.2 In Figure 14.1 a 33 x 22 lattice is divided into six square regions.
The Monte Carlo algorithm of Section 14.2.1 for the Ising model will not be
ergodic if the boundaries are not moved, since the boundary spins are never
changed. Can we make the algorithm ergodic by alternating between two
different positions of the boundaries which are shifted with respect to one
another?
14.3 In the Ising model algorithm described in Section 14.2.1, we shift the
boundaries between regions of the lattice about once every τ Monte Carlo
steps per site, where τ is the correlation time of a simulation performed
on a lattice the size of the region covered by a single processor. Calculate
how the fraction of time spent on communication scales with the number of
processors for fixed system size, in terms of the dimension d of the system
and the dynamic exponent z. Now perform the same calculation for the
other algorithm suggested in Section 14.2.1, in which the values of the spins
on the borders of each region are transmitted to neighbouring processors
every time they are changed.
15
Multispin coding
simulations simultaneously and for this reason are suitable for the same types
of problems as the "trivially parallel" Monte Carlo simulations discussed
in Section 14.1, i.e., ones for which performing many separate simulations
will give us results as good as performing one long simulation. As with our
trivially parallel simulations, if the equilibration time of our system is a large
fraction of the total simulation time, or if we want results for particularly
large systems or long times, then this type of multispin coding will probably
not work. Moreover, don't even think about using multispin coding if your
Monte Carlo algorithm is a complex one. If the number of lines in the core
part of the algorithm is larger than about a hundred, then multispin coding
is not the way to go. Multispin-coded programs are hard to write, hard to
debug, and hard to alter if you decide to change the way the algorithm works.
Getting such a program to work for a complicated Monte Carlo algorithm
would be difficult and painful, and there's a good chance that you would
never get it to work at all.
However, if you have a simple Monte Carlo algorithm which you wish
to implement—the Metropolis algorithm is a good example—and if your
problem is simply that you can't run your simulation for long enough to get
good statistics for the quantity you want to measure, then multispin coding
could be the answer to your difficulties.
then we have performed one step of our Monte Carlo algorithm. We call this
expression the master expression for our multispin-coded Monte Carlo
algorithm. This particular master expression is quite a simple one. As we
will see in later sections of this chapter, deriving a master expression for a
more complicated model is often quite hard work.
The point of all this is that we can now put many bits, typically 32 or 64
of them, representing many different spins, into one word on our computer,
and use Equation (15.1) to perform our Monte Carlo step on all of them
simultaneously. Notice that we can just as well apply (15.1) to an entire
word full of bits as to a single bit at a time; the expression applies the
same operation to each bit in the word independently and the bits do not
interfere with one another.
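This bit-independence property holds for any expression built from bitwise operators, and can be demonstrated for an arbitrary update rule. The rule f below is an invented example of ours, not the master expression of Equation (15.1), but the check works the same way for any such rule:

```c
#include <stdint.h>

/* An arbitrary bitwise update rule: flip bit i of s wherever the
   corresponding bit of a is 1 and the corresponding bit of b is 0. */
uint32_t f(uint32_t s, uint32_t a, uint32_t b)
{
    return s ^ (a & ~b);
}

/* Apply the same rule one bit at a time and repack the results.
   Because bitwise operators act on each bit independently, this must
   give exactly the same word as applying f to the whole word at once. */
uint32_t f_bit_by_bit(uint32_t s, uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t si = (s >> i) & 1, ai = (a >> i) & 1, bi = (b >> i) & 1;
        out |= f(si, ai, bi) << i;
    }
    return out;
}
```

This is exactly why 32 (or 64) independent systems can be packed into the bits of one word: one word-wide operation performs the same Monte Carlo update on all of them simultaneously.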
In practice, the simplest way to perform the simulation is to simulate
many systems at once asynchronously, as described above. Let us assume
that the computer we are using employs 32-bit words. Then we would sim-
ulate 32 different systems simultaneously by storing bit i of each of the 32
systems as one of the bits in the ith word of a lattice made up of such words,
and applying Equation (15.1) directly to these words. Each step of such a
Monte Carlo algorithm takes only slightly longer than an equivalent step of
the normal Metropolis algorithm, but generates 32 times as many measure-
ments of the magnetization or energy, or whatever quantity it is that we
are interested in. In our own experiments with the one-dimensional Ising
model we have measured an effective increase in the speed of the simulation
by a factor of 28 on a 32-bit computer. Using a 64-bit computer we have
measured an increase of a factor of 56.
One point which we haven't dealt with is that in order to use the algo-
rithm described, we need to be able to generate a random word of 32 or 64
independent bits in which each bit is 1 with a given probability exp(−4βJ).
We discuss how such words can be generated in Section 16.3, after we have
described how normal random numbers are generated.
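Pending that discussion, a naive way to generate such a word is simply to draw one uniform deviate per bit. This is correct but slow (32 random numbers per word, which defeats much of the speed advantage), and rand() is used here purely for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

/* Build a 32-bit word whose bits are each, independently, 1 with
   probability p.  A naive sketch; Section 16.3 describes faster methods. */
uint32_t random_word(double p)
{
    uint32_t w = 0;
    for (int i = 0; i < 32; i++)
        if (rand() / (RAND_MAX + 1.0) < p)   /* true with probability p */
            w |= (uint32_t)1 << i;
    return w;
}
```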
Here R0 and R1 are logical expressions which are 1 if zero or one of the
spin's nearest neighbours are anti-aligned with it respectively, and R≥2 is
1 if two or more are anti-aligned. The variables r0 and r1 are random bits
which are 1 with probability exp(−8βJ) and exp(−4βJ) respectively.
This equation will work fine, but it is slightly unsatisfactory, mainly
because the expression for R1 turns out to be quite complicated. An alternative
and equivalent expression can be written in which we redefine our random
bit variables. In fact, r'0 is just the same as r0; it is a random bit which
is 1 with probability exp(−8βJ). However, r'1 is not the same as r1. Notice
that if R≥1 = 1, then necessarily R≥0 = 1 also,
since if there are one or more anti-aligned spins, then there must logically
also be zero or more anti-aligned spins. In this case the probability of the
expression inside the square brackets being 1 is the probability that either
one or both of r'0 and r'1 is 1, and we would like this probability to be equal
to exp(−4βJ). If we write the individual probabilities of r'0 and r'1 being 1
as p0 and p1, then the probability of either of them being 1 is^1
p = 1 − (1 − p0)(1 − p1).
Equation (15.3) is easier to work with than Equation (15.2) because R≥0
and R≥1 are relatively simple to evaluate. In fact, since we always have zero
or more anti-aligned neighbours of any spin we pick we can immediately see
that R≥0 = 1, so the master expression simplifies accordingly. To evaluate
the remaining terms we define variables, one for each of the spin's neighbours,
which are 1 if and only if the corresponding pair of spins are pointing in
opposite directions. In terms of these variables we can then write R≥1 as
the OR of all four of them.
^1 To prove this we just note that the probability that neither of them is 1 is given by
1 − p = (1 − p0)(1 − p1).
^2 In order to demonstrate this, recall that the logical AND and OR operators satisfy
the same rules of distribution and association as multiplication and addition in ordinary
algebra.
15.2 Implementing multispin-coded algorithms 369
Combining all of these expressions and feeding them into Equation (15.6)
we now have our complete algorithm. As in the one-dimensional case, the
simplest way to proceed is to simulate many different systems separately
with the corresponding spins of all systems stored in the bits of one word
on the computer. An example program to perform a simulation using this
algorithm is given in Appendix B.
Since the logical expressions we need to evaluate at each Monte Carlo
step are relatively complex, the gain in efficiency we get from the multispin
coding is not as great in the two-dimensional case as it was in the simpler
one-dimensional one. Running on a computer with a 32-bit CPU we have
measured an effective increase in the speed of the simulation by a factor of
about 21. On a 64-bit computer we measure a factor of 38.
operation     symbol   in C
AND           ∧        &
OR            ∨        |
XOR           ⊕        ^
NOT           ¬        ~
shift left    ≪        <<
shift right   ≫        >>
master expression should take and how to construct it out of the available
variables such as spins or random bits. For more complicated models how-
ever, it can often be quite difficult to just write down the equations in this
way, and so a number of analytic tools have been developed to help us. The
most important of these are truth tables and Karnaugh maps.
A truth table is a list of the values of one or more Boolean or single-
bit variables which are functions of other such variables. If we wish to find
a compact logical expression for a certain quantity, such as the quantity
R>2 defined in Equation (15.2), then a good first step is to write down in
a truth table the values we would like it to take for all possible values of
the independent variables. In Table 15.2 we give an example of such a truth
table for a fictitious quantity A which is a function of four other quantities
B1... B4. As you will see, the left part of the table lists every possible
combination of values of the B variables, and the right part lists the values,
0 or 1, which we would like A to take for each combination of inputs. If
there are values of the Bs for which we don't care about the value of A then
we mark these with an X. Often this occurs because we know that for one
reason or another that combination of Bs will never arise in our algorithm.
Once we have a truth table for the quantity we want, we can immediately
write down a logical expression which will calculate that quantity for us. To
do this we use the disjunctive normal form of the expression. Let us
illustrate what this means with the example of the quantity A above. There
are nine 1s in the rightmost column of Table 15.2. For each of these we can
write an expression involving the independent variables B1... B4 which is 1
only for the values in that row, and 0 for every other row. For example, the
first 1 in the truth table occurs in row two. The expression ¬B1 ∧ ¬B2 ∧ ¬B3 ∧ B4
is 1 only for the values of the Bs in this row and not for any other. The
corresponding expression for the fourth row is ¬B1 ∧ ¬B2 ∧ B3 ∧ B4, and so on.
Combining nine such expressions, one for each of the nine 1s, we derive the
15.3 Truth tables and Karnaugh maps 371
TABLE 15.2 Truth table for the example quantity A discussed in the text.
B1 B2 B3 B4 | A
 0  0  0  0 | X
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 1
 0  1  0  1 | 1
 0  1  1  0 | 0
 0  1  1  1 | 1
 1  0  0  0 | 1
 1  0  0  1 | 1
 1  0  1  0 | 0
 1  0  1  1 | X
 1  1  0  0 | 1
 1  1  0  1 | 1
 1  1  1  0 | 0
 1  1  1  1 | 0
expression
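Although Equation (15.11) is long, it can be reconstructed mechanically from the table: one AND clause per row containing a 1, all ORed together. Written with C's operators (a transcription of ours, with the Bs taking the values 0 or 1):

```c
/* The disjunctive normal form of A from Table 15.2: nine AND clauses,
   one for each row with a 1, combined with OR. */
int A_dnf(int B1, int B2, int B3, int B4)
{
    return (!B1 & !B2 & !B3 &  B4)   /* row 0001 */
         | (!B1 & !B2 &  B3 &  B4)   /* row 0011 */
         | (!B1 &  B2 & !B3 & !B4)   /* row 0100 */
         | (!B1 &  B2 & !B3 &  B4)   /* row 0101 */
         | (!B1 &  B2 &  B3 &  B4)   /* row 0111 */
         | ( B1 & !B2 & !B3 & !B4)   /* row 1000 */
         | ( B1 & !B2 & !B3 &  B4)   /* row 1001 */
         | ( B1 &  B2 & !B3 & !B4)   /* row 1100 */
         | ( B1 &  B2 & !B3 &  B4);  /* row 1101 */
}
```

Any simplification produced by a Karnaugh map can be checked against this function over all sixteen input combinations, ignoring the two "don't care" rows.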
The trouble with this approach, as you will no doubt agree, is that Equa-
tion (15.11) is rather complicated. This makes it slow to evaluate on our
computer and prone to the introduction of errors. To get around these
problems we would like to simplify the equation to give a more compact
expression for A which we can evaluate more quickly. One way to do this is
just to make use of the rules of Boolean algebra to find a good simplifica-
tion. This however is a somewhat hit-or-miss operation, and it too is prone
to error. A quicker and more reliable tool for simplifying logical expressions
is the Karnaugh map.
Figure 15.1 depicts the Karnaugh map corresponding to Table 15.2. Each
of the sixteen squares in this map corresponds to one of the sixteen possible
sets of values of the four independent variables B1 ... B4 in our truth table;
the four black bars along the edges of the map denote the rows and columns
372 Chapter 15: Multispin coding
As you can demonstrate for yourself by referring to the original truth table,
this expression is indeed correct, and it is clearly much more satisfactory
than Equation (15.11).
Karnaugh maps can be used for problems with other numbers of indepen-
dent variables as well. For a function of three variables we would use a 2 x 4
map. For a function of five variables we need to use a three-dimensional map
with two layers each comprised of a 4 x 4 map of the kind we have used here.
A function of six variables requires a 4 x 4 x 4 map and larger numbers of vari-
ables than this require maps with four or more dimensions. High-dimensional
maps are quite hard for humans to visualize, which makes finding blocks by
hand a difficult task when the number of variables is large. Computers, how-
ever, have no such problems, and algorithms have been developed to allow a
computer to find the best choice of blocks on a high-dimensional Karnaugh
map. An example is the Quine-McCluskey algorithm, which is described by
Booth (1971).
u a1 b1 | a'1 b'1
0  0  0 |  X   X
0  0  1 |  1   1
0  1  0 |  1   0
0  1  1 |  1   0
1  0  0 |  X   X
1  0  1 |  0   1
1  1  0 |  1   1
1  1  1 |  0   1
TABLE 15.3 Truth table for moves of the leftmost repton in the chain.
FIGURE 15.2 Karnaugh maps for moves of the leftmost repton in the
chain. Using these maps we find: a'1 = ū ∨ b̄1 (left map), and b'1 = u ∨ ā1
(right map).
For the rightmost repton on the chain the calculation follows similar lines.
In this case acceptance of the move depends on the direction u in which we
propose to move the repton and the values of the bits aN-1 and bN-1 which
represent the relative position of the last two reptons. The truth table and
Karnaugh maps are very similar to the ones for the leftmost repton and the
end result is that
Readers may wish to check this result themselves. It doesn't take long.
The rules for updating the rest of the reptons in the chain, all the ones
in the middle, are a little more complex, but deriving them involves exactly
the same steps. Suppose we choose a repton somewhere in the middle of
the chain as the one we are going to move. Let us call this repton i, where
1 < i < N. As before, we also generate a random bit u to represent the
direction in which we propose to move the repton. To decide whether or not
the proposed move is an allowed one we need to take into account the value
of u and the relative position of the two reptons to either side of repton i,
which are represented by the bits ai-1, bi-1, ai and bi. In Table 15.4 we
show what the correct values a'i-1, b'i-1, a'i and b'i of these four variables
should be after the move has been made or rejected, and in Figure 15.3 we
show the corresponding Karnaugh maps. The resulting expressions are:

TABLE 15.4 Truth table for moves of a repton i in the middle of the
chain.

15.4 A multispin-coded algorithm for the repton model 377
FIGURE 15.3 The four Karnaugh maps for moves of reptons in the
middle of the chain. Note that each map is a three-dimensional 2 × 4 × 4
one. We have not shaded the blocks of 1s in this case, since this is
rather hard to do clearly in three dimensions.
15.5 Synchronous update algorithms 379
ter 12. Certainly there is some overhead arising from the complexity of
expressions like Equations (15.17-15.20). But in practice we find that the
algorithm performs very well. For large chain lengths (N > 100 or so), we
have compared our multispin algorithm with the standard algorithm of Sec-
tion 12.3 and find that the multispin algorithm outperforms the standard
one by a factor of about 27 on a 32-bit computer, and by a factor of about
48 on a 64-bit one.
Problems
15.1 Consider the "asymmetric exclusion process", which is defined as fol-
lows. We have m particles located on a one-dimensional lattice of N sites
with periodic boundary conditions. No two particles can occupy the same
site. Particles attempt to hop to the site immediately to their left with av-
erage rate p per unit time, and to the site immediately to their right with
average rate q < p. Hops are only allowed if the target site is unoccupied.
An ordinary Monte Carlo algorithm to simulate this model might work as
follows. We select at random a pair of adjacent sites i and j = i + 1 (mod-
ulo N) and attempt to exchange their values. If site i is empty and site j
is occupied, we accept the move with probability 1. If i is occupied and j
is empty we accept it with probability q/p. Otherwise we reject the move.
(a) Suppose we represent the state of the lattice by variables si which are 1
for occupied sites and 0 for unoccupied ones. Write down the truth tables
for the variables si and sj giving their values after their contents have been
exchanged. Use these to derive the master expression for the case p = q.
(b) If r is a random variable which is 1 with probability q/p and 0 other-
wise, write down the master expression for the simulation in the general case
where q < p.
15.2 Consider de Gennes' model for polymer diffusion in two dimensions as
described in Problem 12.2. Suppose we represent the ith link in the chain
by two bits ai and bi with (ai,bi) = (0,0), (1,1), (0,1) and (1,0) signifying
up, down, left and right respectively. (a) Write down the truth table for the
variable h that equals 1 if and only if links i and j = i + 1 are anti-parallel.
Hence obtain a logical expression for h. There is another simpler way of
expressing h which involves the XOR operation. Can you find it? (b) Using
the previous result, write down a master expression for moves in the interior
of the chain. (Hint: you will need to use some random bits for this.)
16
Random numbers
The fundamental constant which links all Monte Carlo methods together,
the common factor which makes them all Monte Carlo methods, is that
they contain an element of randomness. Things happen in a probabilistic
fashion in a Monte Carlo calculation, and it is only the average over many
unpredictable events that gives us an (approximate) answer to the question
we are interested in. To generate such unpredictable events, all Monte Carlo
methods require a source of random numbers. There are many ways of
generating random numbers. In this chapter we examine some of the most
common of them.
1
The floor function ⌊x⌋ is defined as the largest integer not greater than x. Basically,
it's just what you get when you chop the digits after the decimal point off a number.
One must be careful when dealing with negative numbers however. The floor function
always rounds downwards, which means that negative real numbers get rounded away
from zero. In C the floor function is implemented by the function floor(); the integer
cast (int) gives the same result for positive numbers, but truncates towards zero for
negative ones. The FORTRAN INT function likewise truncates towards zero.

16.1 Generating uniformly distributed random numbers 383
384 Chapter 16: Random numbers
2
An alternative technique was, and still is, used by the now ancient British computer
ERNIE, which picks the numbers of winning premium bonds (essentially a form of lottery)
using the pattern of gas bubbles passing through a liquid.
Generators of this form are also ultimately guaranteed to fall into a repeating
cycle of values, for essentially the same reasons as before, although for a
generator which uses k previous values of the sequence to generate the next,
the cycle can be up to mk steps long, which could be a very large number
indeed for quite modest values of k.
It is not our intention here to go into the mathematics of random number
generators in detail, since this is a book about Monte Carlo methods and
not about random numbers. In this chapter we merely describe some of the
most widely used and best-behaved functions for generating pseudo-random
sequences, without attempting to prove their properties with any rigour.
The information we give should be enough for you to choose a suitable
random number generator for your Monte Carlo simulations and implement
it on a computer in the language of your choice. For the reader who is
interested in investigating the subject more thoroughly, we recommend the
discussion by Knuth in Volume 2 of his classic work The Art of Computer
Programming (1981).
The linear congruential generator produces a sequence of integer values i_n
via the formula

    i_{n+1} = (a i_n + c) mod m,    (16.5)

where p mod q represents the modulo operation, which returns the remainder
after p is divided by q. (In C it is written as p%q, in FORTRAN as MOD(P, Q).)
The highest random number that this equation can produce is m − 1, and
so the longest sequence of numbers that can be produced has m elements,
all of them different. Thus in order to make the sequence long we need to
make m large. The maximum possible value of m is equal to the largest
value that your computer can store in an integer variable, plus one. Let us
call this number w. On modern computers integers are normally stored in
32 bits, which means that w = 2^32. Choosing m = w is very convenient,
because we can then perform arithmetic modulo w simply by adding and
multiplying integers in variables of one word each and ignoring any overflow
that arises during the operations. Unfortunately, however, it turns out that
the random sequences produced by Equation (16.5) when m = w are rather
poor in one respect: the low-order bits in the binary representation of the
numbers will not be very random. An extreme example is the lowest-order
bit, which for m = w will, depending on the values of a and c, either be
a constant or will alternate between 1 and 0 for successive numbers in
the series. For applications which make use of the values of the bits in our
random numbers (such as the multispin coding methods of Chapter 15) this
ing our 32-bit variables. This allows us, without too much computational
overhead, to implement a linear congruential generator with the maximum
useful value of m. The trick requires us to use signed integer arith-
metic however, so we have to take m = 2^31 − 1. As it turns out, one of the
most thoroughly tested and widely used random number generators happens
to use this value of m. (It's no coincidence in fact, given the prevalence of
32-bit integer arithmetic on computers.) This generator was first given by
Lewis et al. in 1969, and makes use of the values a = 16 807 and c = 0. For
technical reasons Knuth recommends the choice c = 0 for all generators with
m = 2^n ± 1, and in this case it is also quite convenient, since it speeds up
the arithmetic.
Schrage's trick works like this. Given a and m, we can define two numbers q
and r thus:

    q = ⌊m/a⌋,    r = m mod a,    (16.6)

so that

    m = aq + r.    (16.7)

For the values of a and m given above, we have q = 127 773 and r = 2836.
It is important for the trick to work that r < q, but that is indeed the case
here, so we are safe. Now it is clear that

    a i_n mod m = [a i_n − ⌊i_n/q⌋ m] mod m.

This is true regardless of the value of q, since we can add or subtract any
integer multiple of m from a i_n without changing the value of the remainder
when we divide it by m. But now, using (16.7), we can write this as

    a i_n mod m = [a(i_n mod q) − ⌊i_n/q⌋ r] mod m.

The term ⌊i_n/q⌋ r ≤ i_n (r/q), and since r < q, this number is less than i_n,
which in turn is less than m. Furthermore, given that the number i_n mod q
lies between zero and q − 1, the term a(i_n mod q) < aq. The definition of
q, Equation (16.6), then ensures that this term is also less than m. Thus
both of these terms fit into our 32-bit signed integers, and, since neither
of them can be negative, so does their difference. The difference can be
negative however, which is why we need to use signed arithmetic to evaluate
it. Taking the result modulo m now becomes easy: if the difference of the
two terms is positive, we don't have to do anything; if it is negative we just
need to add m. In other words

    i_{n+1} = a(i_n mod q) − ⌊i_n/q⌋ r        if this is non-negative,
    i_{n+1} = a(i_n mod q) − ⌊i_n/q⌋ r + m    otherwise.
With the given values of q and r we can use this equation to implement
the random number generator of Lewis et al. An example, written in C,
is given in Appendix B. The code is a little more complex than that for
the straightforward linear congruential generator, but it has the significant
advantage of a longer period (about 2 billion numbers, which is adequate for
many smaller Monte Carlo simulations). It has another advantage too. All
linear congruential generators can produce only a certain number of differ-
ent possible integer values. If we convert these into real numbers between
zero and one, we can only get a certain number of different possible reals.
Of course, the available number of different reals between zero and one is
always finite, limited as it is by the precision with which the computer rep-
resents real numbers. However, with the values for a, c and m we suggested
for the simple 32-bit generator, for example, the number of different val-
ues generated by the algorithm is about 1.7 million, which is far less than
the number allowed by the machine precision on most modern computers.
(Double precision floating-point numbers typically have at least 48 bits of
mantissa, which gives about 10^14 different values in each decade, and
significantly more than this between zero and one.) For many applications, this
would not matter a great deal, but we can imagine cases in which it would.
One of the authors for example, has performed extensive Monte Carlo sim-
ulations on one-dimensional systems with several million lattice sites. If one
were not careful, it would be very easy to ignore the fact that not all the
sites in such a system can be reached by choosing a site number using a
single random number r and Equation (16.2), if r has only a million or so
possible values. The result would be a Monte Carlo algorithm which violates
the condition of ergodicity in a very bad way.
The generator described above still suffers from this drawback, but the
situation is a lot less severe than in the case of the straightforward linear
congruential generator. By using Schrage's trick, we have created a generator
that can generate m − 1 = 2 147 483 646 different numbers, which is enough
for most Monte Carlo applications. The method does have one slight problem
(as indeed do all generators which use c = 0) that if we seed the generator
with the number i_0 = 0, we will just generate zeros forever afterwards,
since a i_n will always be zero. Thus we should avoid using zero as the seed.
The algorithm also never produces zero once seeded with any other number,3
though for the particular choice of a and m given here, it does produce every
other integer between 1 and m − 1. (This is why the number of different
values produced is m − 1 and not m.) If we are worried about an algorithm
which never produces zero, we can always subtract 1 from each number
generated.
A different solution to the problem of writing a good linear congruential
3
It can only produce zero if a and m have a common factor, which in this case they
don't. Actually, m doesn't have any factors; 2^31 − 1 is a prime number.
In C, for example, the left- and right-shift operations are performed by the
operators << and >> and the exclusive-OR operation by ^. All of these
are fairly quick operations, so the generator is usually quite fast even when
implemented in a high-level language. Furthermore, there is no problem
using all 32 bits of an integer with this algorithm, or however many bits
are available on your computer, since Equations (16.11) and (16.12) never
produce an overflow.
Since shift-register random number generators generate each number
from only the single preceding number in the sequence, the arguments given
in Section 16.1.2 immediately tell us that the longest sequence they can gen-
erate has length w, i.e., one greater than the maximum integer which your
computer can store. (In fact, as with the linear congruential generator, the
longest sequence actually has length w — 1, since we cannot use any sequence
which includes the number zero, because zero is mapped onto itself by the
equations above.) As we pointed out in Section 16.1.4 this is inadequate for
some longer Monte Carlo simulations. In this case, we can again employ the
Bays-Durham shuffling scheme, as we did for the linear congruential gen-
erator, to increase the period and at the same time improve the quality of
random numbers generated.
The only remaining question is what value the two parameters s and t
should take. As with the linear congruential generators of Section 16.1.3,
the choice of these numbers involves a certain amount of black magic. Here
we merely state that for a 32-bit generator the values s = 15 and t = 17 are
the most widely used. They also generate a sequence of numbers with the
maximum possible period of 232 — 1. For a 31-bit generator, s = 18, t = 13
and s = 28, t = 3 are the commonest choices. For most purposes these
choices will give adequate random numbers. A more thorough discussion of
the possible choices is given by Marsaglia (1992).
    i_n = (i_{n−r} ∘ i_{n−s}) mod m,    (16.13)

where m, r and s are constants and ∘ can be any of the operations + (plus),
− (minus), × (multiplication) or ⊕ (exclusive-OR). As before, the choice
of the constants is crucial to producing random numbers of good quality.
Knuth gives a thorough discussion of possible values. We will merely list
some common ones.
Many people use the lagged Fibonacci generator with an exclusive-OR
operation (Lewis and Payne 1973). (These generators are sometimes referred
to as "generalized feedback shift register generators", for reasons which are
quite obscure.) This has the advantage that the modulo operation in Equa-
tion (16.13) is not necessary, since the exclusive-OR of two n-bit integers
can never itself exceed n bits. It turns out however that these generators
produce poor random numbers as well as possessing rather short periods,
unless the constants r and s are very large. A common choice for example is
r = 418, s = 1279, which requires us always to store the last 1279 numbers
generated. Considerably superior results can be obtained with much smaller
values of r and s if we use other operators instead, and we recommend us-
ing (16.13) with either addition or multiplication. Addition usually gives a
faster generator, but multiplication appears to give somewhat better random
numbers.
Unlike the exclusive-OR case, both additive and multiplicative lagged
Fibonacci generators require the use of the modulo operation in Equa-
tion (16.13). However, it turns out that the quality of random numbers
generated is not highly sensitive to the value of the modulus m, the only
restriction being that it should be an even number. Thus, it makes sense to
choose m equal to the number w (i.e., one larger than the largest number
which can be stored in an integer variable) so that we don't have to perform
the modulo operation explicitly—we can just let the addition or multiplica-
tion operation overflow our integer variables and ignore the overflow.4 As
4
This happens automatically in machine languages, and also in the C programming
language. Unfortunately, most implementations of FORTRAN and Pascal will not allow
integers to overflow—the overflow will cause a run-time error which will halt the program.
If you program in one of these languages it may be possible to disable this feature in your
compiler, or you may be able to use a generator written in C while writing the rest of
your program in the language of your choice.
for the other constants, the most common choice is r = 24, s = 55, first sug-
gested by Mitchell and Moore in 1958 (unpublished). This choice appears to
give excellent results for both additive and multiplicative generators. Knuth
also suggests a number of other possibilities which you may like to play with.
These generators possess very long periods. In the additive case, for
instance, it can be proved that the period is at least 2^55 − 1, which should
be adequate for almost any purpose. They do have the disadvantage that
we must store the last 55 numbers generated at all times. This however
adds only a slight extra complexity to the algorithm and so is not a serious
objection. The numbers can be stored in an integer array organized as a
circular FIFO buffer of the kind described in Section 13.2.5. The values of
the elements of this array need to be initialized before the first call to the
generator—effectively the generator needs 55 seeds. These seeds could be
generated, for example, using a linear congruential generator.
Both the additive and multiplicative generators produce random num-
bers of very high quality, although the multiplicative generator appears to
be slightly better—it passes some extremely stringent tests of randomness
for the numbers it generates (Marsaglia 1985) and we recommend it for cal-
culations in which random numbers of high quality are important. It does
have a couple of problems though. First, it is usually slower than the addi-
tive generator, since multiplication normally takes longer than addition. On
modern RISC processors, however, this is not always true; many such pro-
cessors can perform integer multiplication just as fast as addition. Second,
it can only produce either even or odd integers, but not both. To see this,
consider what happens if even a single one of our 55 seed integers is even.
The product of this number with any other will produce an even result and,
since 24 and 55 are coprime, this means that our buffer will, after some time,
be entirely full of even numbers, and thereafter all numbers generated will
be even. Only if all the seeds are chosen odd can the algorithm go on gen-
erating odd numbers for an arbitrary length of time, but in this case it will
only generate odd numbers and no even ones. In fact, the normal practice
is to take the latter course and seed the generator with 55 odd integers so
that all numbers generated are odd. This reduces the number of different
values which can be generated to w/2, rather than w. This is not normally
a problem, but under certain special circumstances it could be, so it is as
well to be aware of it.5
Note that with the additive generator we should be careful to ensure that
not all the seeds are even, otherwise this generator will only ever produce
even integers. In both cases, the seeds should be uniformly distributed over
the entire range 0 to w — 1. If, for example, all the seeds were set equal to 1,
then the first few numbers produced by either algorithm would not be good
5
Note that it is only the number of different values which is reduced. The period of
the generator is still long enough that you should not usually have to worry about it.
random numbers.
Although the theoretical foundations on which the lagged Fibonacci gen-
erators are built are less solid than those of the other methods we have
discussed, they have produced excellent results in many tests. In combi-
nation with another generator they can prove very useful when speed and
quality of random number generation are of the essence.
And this is the fundamental random quantity which we use in Monte Carlo
simulation. Sample programs in C implementing a number of the generators
described here can be found in Appendix B.
like this. Suppose we want to produce real random numbers x which lie
between the limits x_min and x_max and are distributed according to some
function f(x). That is, the probability of producing a number in the inter-
val between x and x + dx should be f(x) dx provided x_min < x < x_max,
or zero otherwise. The function f(x) is a properly normalized probability
distribution such that its integral is unity:

    ∫_{x_min}^{x_max} f(x) dx = 1.
The same fraction of our uniformly distributed random numbers lies in the
interval 0 < r < F(x). What we want to do is map numbers from our uni-
form random number generator which fall into this region onto the numbers
between xmin and x in our new distribution. As it turns out, it's very easy
to arrange this. All we have to do is observe that the largest numbers in
each case should correspond to one another. In other words the number x
in our non-uniform distribution should be generated when the number
    f(x) = (1/π) Γ/(x² + Γ²).    (16.18)

Here Γ is a free parameter, the "width" of the Lorentzian, and the factor of
π⁻¹ and the Γ on top of the fraction are required so that the distribution
is normalized to 1 over the interval −∞ to +∞. Now Equations (16.16)
and (16.17) tell us that the Lorentzianly distributed random number x should
be generated every time our uniform random number generator produces the
number r, where

    r = 1/2 + (1/π) tan⁻¹(x/Γ),

which we can invert to give

    x = Γ tan[π(r − 1/2)].    (16.20)
We know that the element of solid angle in these coordinates is sin θ dθ dφ.
We want to generate random values of θ and φ such that an equal number
of the vectors they describe fall in equal divisions of solid angle. In other
words we want to generate uniformly distributed values of φ between 0 and
2π (which is easy) and we want to generate values of θ between 0 and π
distributed according to the frequency function

    f(θ) = (1/2) sin θ,

where the factor of 1/2 is required to ensure that f(θ) integrates to unity.
Again employing Equations (16.16) and (16.17), this implies that the value
θ should be generated every time our uniform random number generator
produces the number r, where

    r = (1 − cos θ)/2.

So, in order to generate our random, spherically symmetric unit vectors, all
we need do is produce two uniformly distributed random numbers between
zero and one, feed one of them into this equation to get θ, and the other into
Equation (16.1) to produce a value for φ between 0 and 2π, and then feed
these angles into Equation (16.21) in order to get x, y and z.
and (16.17) tell us that the Gaussianly distributed random number x should
be generated every time our uniform random number generator produces the
number r, where

    r = (1/2)[1 + erf(x/(√2 σ))],

where erf(x) is the error function, which is essentially just the definite inte-
gral of a Gaussian. Unfortunately, there is no known closed-form expression
for the error function, which makes it impossible to invert this equation.6
Generating Gaussian random numbers is extremely important for many
applications however, so other methods have been developed to tackle the
problem. The standard way of doing it is a two-dimensional variation of the
transformation method. Imagine we have two independent random numbers
x and y, both drawn from a Gaussian distribution with the same standard
deviation σ. The probability that the point (x, y) falls in some small element
dx dy of the xy plane is then

    P(x, y) dx dy = (1/(2πσ²)) exp[−(x² + y²)/(2σ²)] dx dy.
6
Some computer languages, including C, provide a library function which can evaluate
erf(x) using an asymptotic series approximation. However, no such library functions exist
for evaluating the inverse of the function.
16.2 Generating non-uniform random numbers 401
The transformation method then says that we should produce a value r for
this coordinate every time our uniform random number generator produces
a number p (we use p this time to avoid confusion between variables called
r) such that

    p = 1 − exp(−r²/(2σ²)),    i.e.,    r = σ √(−2 ln(1 − p)).

With this value for r and our random value for θ, the two numbers
x = r cos θ and y = r sin θ are a pair of independent Gaussian random
numbers.
can't use the transformation method described in Section 16.2.1. In this
section we describe another method for generating random numbers, the
rejection method, which is simple to implement and can generate random
numbers according to any distribution, whether it is integrable or not. The
method does have some drawbacks, however, which make it inferior to the
transformation method for integrable functions:
1. It is considerably less efficient than the transformation method. It
requires us to generate at least two and often more than two random
numbers uniformly distributed between zero and one for every non-
uniform number returned.
2. It only works for distributions defined over a finite range. In other
words we can't have x_min = −∞ or x_max = +∞ as we did in some of
the examples in the last two sections.
These drawbacks are offset by the generality of the method, and certainly
there are plenty of situations where the rejection method is the method of
choice.
In its simplest form, the rejection method works like this. We want to
generate random numbers x in the interval from x_min to x_max distributed
according to some function f(x). Let f_max be the maximum value which the
function attains in the interval. We generate a random number x uniformly
in the interval between x_min and x_max using Equation (16.1). (This only
works if x_min and x_max are finite, which is the reason for condition 2 above.)
Now we generate another random number r between zero and one and we
keep the random number x if

    r < f(x)/f_max.    (16.33)
Otherwise, we reject x and generate another number between x_min and x_max.
The process continues until we accept one of our numbers, and that is the
number which our random number generator returns. The factor of f_max
on the bottom of Equation (16.33) ensures that the acceptance probabil-
ity f(x)/f_max has a maximum value of one, which makes the algorithm as
efficient as possible.
Why does it work? Well, the probability that Equation (16.1) generates
a number in some small range x to x + dx is
And the probability that we return a value of x in this interval is then the
product of these two probabilities:
then be written
and the total number of calls per Gaussian random number generated is
In other words we will have to make about ten calls on average for each
number we generate, ten times as many as for the method of Section 16.2.2.
The rejection method is therefore a poor way of generating Gaussian random
numbers. However it may still be useful for distributions for which no other
method exists.
however, the better g(x) approximates to f(x) the more efficient the algo-
rithm will become. Note that f(x) does not have to be normalized to unity
in order for the method to work. In fact, it cannot be normalized to unity,
since we know that g(x) is normalized, and f(x) < g(x) for all x. Thus the
integral of (16.42) between x_min and x_max must be less than one, implying
that the method has less than perfect efficiency. However, it can still be a
lot more efficient than the simple rejection method. The hybrid method also
does not require that the limits x_min and x_max be finite.
As an example, let's take Gaussian random numbers again. Suppose we
choose for g(x) the Lorentzian function, Equation (16.18), with Γ = 1, which
has a bell shape similar to that of the Gaussian, but decays to zero more
slowly in the tails. In Figure 16.2 we show the Lorentzian, normalized to
unity, with the Gaussian f(x) scaled so that it lies below it for all values of
x. In this example, we found by simple trial and error that multiplying the
normalized Gaussian by 0.65 brought us very close to touching the Lorentzian.
Now we can apply Equation (16.20) to generate random numbers be-
tween plus and minus infinity with a Lorentzian distribution, and then
Equation (16.41) to decide whether to accept them or not. The integral
over Equation (16.42) then gives us simply 0.65 for the fraction of proposed
numbers which are accepted (since this is the factor by which we scaled the
normalized Gaussian curve to get f(x)). Given that we have to generate
two random numbers between zero and one for each proposed number, that
means that on average about three random numbers are generated for each
Gaussian number returned, which is much better than the ten numbers we
had to generate with the simple rejection method.
AND of the two we get a new word W_{p'} = r ∧ W_p in which each bit is one
with probability

    p' = p/2.
These relations tell us that by the judicious combination using AND and
OR operations of the appropriate number of independent random words in
which each bit is one with probability 1/2, we can produce independent bits
whose probability p of being one is any rational fraction with denominator
a power of two.
In practice, given a desired value of p, the best way to find the appropriate
sequence of operations to generate it is to work backwards from the result
we want until we get p = 1/2. As an example, here is how we would generate
a word W_{19/32} using, in this case, five different random words r1 ... r5 with
p = 1/2: working backwards, 19/32 = (1 + 3/16)/2, which is an OR;
3/16 = (3/8)/2 and 3/8 = (3/4)/2, which are ANDs; and 3/4 = (1 + 1/2)/2,
which is another OR. Putting these all together, we can write W_{19/32} as
example, using the techniques described above. Now let us choose between
these two random words at random, with probability q that we choose W_{p1}
and 1 − q that we choose W_{p2}. Then the probability that any bit in the
chosen word will be one is qp1 + (1 − q)p2. Setting this equal to the desired
probability p and solving for q we get

    q = (p − p2)/(p1 − p2).
In practice, the most efficient way to use this method is to decide first
with probability q whether we are going to choose WP1 or WP2 and then
to generate the appropriate word. This is obviously more efficient than
generating both words first and throwing one away. Nonetheless, we still
have to generate one such random word as well as generating a random
floating-point number in order to make the choice. There are other, more
subtle problems as well. In particular, this method tends to generate random
words in which there are correlations between the bits. Any particular bit
generated does indeed have exactly the desired probability of being a one.
But if one such bit is a one, then there is a statistical bias in favour of the
other bits in the same word being ones also. This is most obvious in the case
where we choose the two words W_p1 and W_p2 to be composed of all zeros
and all ones respectively. This choice satisfies the conditions we specified for
the two words regardless of the value of p, since p1 = 0 and p2 = 1. However,
it would be rather a silly choice, since the correlations between the bits are
extremely bad. If any bit in a word generated in this way is a one, then all
the others will be ones as well. And if any bit is a zero, all the others will
be zeros.
In practice, we get the best results if we make the values of p1 and p2
as close as possible to the desired value of p, but there is clearly a compro-
mise to be struck here between the time taken to do this and the amount
of correlation between the bits which we are willing to tolerate. In fact, in
the case of the "asynchronous update" multispin algorithms we studied in
Chapter 15 in which variables from several different systems are packed into
the bits of one word, some correlation between random bits is acceptable
and quite widely spaced values of p1 and p2 such as, say, 0 and 1/2, usually
work fine. (The example program for the two-dimensional Ising model given
in Appendix B uses this choice, and gives perfectly satisfactory results.)
The correlation between the bits may give rise to some correlation between
the values of observable quantities measured in the different systems simu-
lated, and we should be careful that this does not lead us to underestimate
the statistical errors on such quantities. Other than this, however, a little
correlation between bits will do no harm.
16.3 Generating random bits 409
Problems
16.1 For a linear congruential generator, Equation (16.5), with c = 0 derive
a "skip formula" which gives the value of i_{n+k} in terms of i_n for any k,
without our having to actually carry out k iterations of the generator.
16.2 An alternative to the Bays-Durham shuffling scheme of Section 16.1.4
is as follows. We store an array of N integers {j_n} just as before and each
time we wish to generate a new random number we produce a random integer
i using our linear congruential generator. We use this integer to calculate
an index k = ⌊iN/m⌋. We return j_k as our new random number and put i
in its place in the array. Unfortunately, this scheme would be a very poor
one. Why?
16.3 How would you generate floating-point numbers x > 0 according to
the distribution p(x) ∝ e^(-x)?
References
Breeman, M., Barkema, G. T. and Boerma, D. O. 1995 Surf. Sci. 323, 71.
Breeman, M., Barkema, G. T. and Boerma, D. O. 1996 Thin Solid Films
272, 195.
Coddington, P. D. and Baillie, C. F. 1991 Phys. Rev. B 43, 10617.
Coddington, P. D. and Baillie, C. F. 1992 Phys. Rev. Lett. 68, 962.
Coniglio, A. and Klein, W. 1980 J. Phys. A 13, 2775.
Cooley, J. W. and Tukey, J. W. 1965 Math. Computation 19, 297.
Dorrie, H. 1965 One Hundred Great Problems of Elementary Mathematics,
Dover, New York.
Eckert, J. P., Jr. 1980 in A History of Computing in the Twentieth Century,
N. Metropolis, J. Howlett and G.-C. Rota (eds.), Academic Press, New
York.
Edwards, S. F. and Anderson, P. W. 1975 J. Phys. F 5, 965.
Efron, B. 1979 SIAM Review 21, 460.
Ferrenberg, A. M. and Landau, D. P. 1991 Phys. Rev. B 44, 5081.
Ferrenberg, A. M., Landau, D. P. and Wong, Y. J. 1992 Phys. Rev. Lett.
69, 3382.
Ferrenberg, A. M. and Swendsen, R. H. 1988 Phys. Rev. Lett. 61, 2635.
Ferrenberg, A. M. and Swendsen, R. H. 1989 Phys. Rev. Lett. 63, 1195.
Feynman, R. P. 1985 Surely You're Joking, Mr Feynman, Norton, New
York.
Fischer, K. H. and Hertz, J. A. 1991 Spin Glasses, Cambridge University
Press.
Fisher, M. E. 1974 Rev. Mod. Phys. 46, 597.
Fortuin, C. M. and Kasteleyn, P. W. 1972 Physica 57, 536.
Futrelle, R. P. and McGinty, D. J. 1971 Chem. Phys. Lett. 12, 285.
de Gennes, P. G. 1971 J. Chem. Phys. 55, 572.
Giauque, W. F. and Stout, J. W. 1936 J. Am. Chem. Soc. 58, 1144.
Gibbs, J. W. 1902 Elementary Principles in Statistical Mechanics. Reprinted
1981, Ox Bow Press, Woodbridge.
Gottlob, A. P. and Hasenbusch, M. 1993 Physica A 201, 593.
Grandy, W. T., Jr. 1987 Foundations of Statistical Mechanics, Reidel, Dor-
drecht.
Heller, C., Duke, T. A. J. and Viovy, J.-L. 1994 Biopolymers 25, 431.
Hukushima, K. and Nemoto, K. 1996 J. Phys. Soc. Japan 65, 1604.
Jayaprakash, C. and Saam, W. F. 1984 Phys. Rev. B 30, 3916.
Kalos, M. H. and Whitlock, P. A. 1986 Monte Carlo Methods, Volume 1:
Basics, Wiley, New York.
Chapter 1
1.1 False. The probability of being in any particular state with energy E is
indeed proportional to e^(-βE), as we stated in Section 1.2. However, if there
are several states with energy E then the system is more likely to have that
energy than if there are few such states. Often this is expressed in terms of a
"density of states" ρ(E) which is the number of states with energy E. Then
the probability of having a certain energy varies in proportion to ρ(E) e^(-βE).
The density of states is discussed in more detail in Sections 4.5.3 and 6.3.
1.2 For this simple two-state system, the sum rule (1.2) tells us that w1 =
1 - w0. Using this relation we can write the master equation in the form
Since E0 and E1 are constant, we can directly integrate this equation to give
w0, with the integration constant being set by the initial conditions:
Taking the limit t → ∞, the exponential in the numerator vanishes, and by
rearranging we can show that the probability p0 = w0(∞) is
as in Equation (1.5). The solutions for w1 and p1 follow from the sum rule.
1.3 If we have n out of our N particles in state 1 then the total energy
of the system is H = E1 n. The number of states with this energy is the
number of ways of picking the n particles out of N, which is N!/[n!(N - n)!].
418 Appendix A: Answers to problems
1.4 In one dimension, the Hamiltonian for the Ising model, Equation (1.30),
can be written as H = -J Σ_i s_i s_{i+1}. Making the proposed change of
variables this becomes
Chapter 2
2.1 The appropriate generalization of (1.1) to the case of a discrete time
variable is
Using Equation (2.5) we can show that the second term is just
Chapter 3
3.1 Using the full Ising Hamiltonian, Equation (3.1), the generalization of
Equation (3.8) to the case of finite B is
The first sum gives exactly the same thing as in the B = 0 case. The second
two give -B(s_k^ν - s_k^μ), since all the other spins are unchanged and cancel
out. Using Equation (3.9) we then get
3.2 The expression for our estimate of the error on the mean square of a
set of numbers is a generalization of Equation (3.37). Our best estimate of
the mean square is
Applying this to the twelve numbers given in the problem, we get a value
of σ = 5.05. Directly applying the jackknife method to the same set of
numbers, we get a value of σ = 5.28.
3.3 There are a number of ways we might estimate the partition function.
Perhaps the most direct is to measure the internal energy and make use of
Equation (1.9) to derive the following formula:
int s[N];
double beta, J;

void move()
{
  int i;
  int n1, n2, delta;

  i = N*drandom();
  if ((n1=i+1)>=N) n1 -= N;
  if ((n2=i-1)<0) n2 += N;
  delta = s[i]*(s[n1]+s[n2]);
  if (delta<=0) {
    s[i] = -s[i];
  } else if (drandom()<exp(-4*beta*J)) {
    s[i] = -s[i];
  }
}
This is not the most efficient implementation of this algorithm. It could be
made slightly faster by storing the value of e^(-4βJ) in a separate variable so
that we don't have to recalculate it at every step.
Chapter 4
4.1 In order to produce a configuration of the system which is statistically
independent of the current one, the domain walls have to diffuse an average
distance on the order of the correlation length ξ. If the domain walls perform
a random walk across the system, then the mean square distance moved
increases linearly with time, ⟨x²⟩ ~ t. Thus the time τ taken to diffuse a
distance ξ—the correlation time—goes as τ ~ ξ². Comparing this result
with Equation (4.6) we get z = 2. In fact, the measured values of z for the
two- and three-dimensional Ising models, Table 4.1, are close to 2 for the
Metropolis algorithm, as indeed they are for all single-spin-flip algorithms,
indicating that the domain walls do approximately perform a random walk
for such algorithms. For the cluster algorithms however, the domain walls
are more mobile, and this is the fundamental reason why these algorithms
achieve lower values of z.
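The ⟨x²⟩ ~ t behaviour invoked in this argument can be checked directly with a few lines of code. This sketch is ours, not the book's (the C library rand() is an assumed source of randomness):

```c
#include <assert.h>
#include <stdlib.h>

/* Mean square displacement of a 1D random walker after t steps,
 * averaged over n independent walks. If domain walls diffuse like
 * this, the time to move a distance xi scales as xi^2, giving z = 2. */
double msd(int t, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        int x = 0;
        for (int s = 0; s < t; s++)
            x += (rand() / (RAND_MAX + 1.0) < 0.5) ? 1 : -1;
        sum += (double) x * x;
    }
    return sum / n;
}
```

For t = 100 the average comes out close to 100, and quadrupling t roughly quadruples the answer, the linear growth assumed in the answer above.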
4.2 Consider as we did in Section 4.2.1 two states separated by the flipping
of a single cluster, and suppose that that cluster contains k spins. The ap-
propriate generalization of Equation (4.11) to the case of non-zero magnetic
field is
where S = ±1 is the value of the spins in the cluster before they are flipped
over. Substituting this into Equation (4.10) and rearranging, we find that
Equation (4.12) is now modified thus:
is 1. Then the ratio of the acceptance probabilities for forward and backward
moves is e^(-2βSBk). The most efficient choice of acceptance ratio to satisfy
this constraint is
Notice that this reduces to the usual Wolff result A(μ → ν) = 1 when B = 0.
4.3 The Swendsen-Wang algorithm is just the same as the Wolff algorithm
except that it covers the entire lattice with clusters, rather than just creating
one. In fact, we already worked out the equivalent of Equation (4.24) for
this case in Section 4.3.2 when we showed (Equation (4.21)) that
Chapter 5
5.1 The algorithm satisfies ergodicity for the same reason that the Kawasaki
algorithm does; all exchanges of up- and down-spins in the Kawasaki algo-
rithm are also possible moves in this algorithm, so that if one of them satisfies
ergodicity then the other does as well. The proof of detailed balance is also
the same as for the Kawasaki algorithm except that we need to show that
the selection probabilities g(u —> v) and g(v —> u) are the same in either
direction. This however is simple since the probability of picking a partic-
ular minority spin is always the same because the number of such spins is
conserved, and the probability of picking a particular one of its neighbours is
constant at 1/z, where z is the lattice coordination number. To see why the
algorithm is more efficient than the Kawasaki one, consider the case p < 1/2.
In this case the particles (rather than the vacancies) are in the minority.
Then there are two ways to pick any particular particle-particle pair, one
way of picking a particle-vacancy pair, and no ways to pick a vacancy-
vacancy pair. When p = 1/2, particle-particle and vacancy-vacancy pairs are
equally common on average, which means that the Kawasaki algorithm and
this new one are equally efficient. But when p < 1/2 the number of particle-
particle pairs goes down relative to the number of vacancy-vacancy ones,
and hence the new algorithm does better than the Kawasaki one. However,
the algorithm is also more complicated to program, since it requires us to
maintain an up-to-date list of where all the particles in the system are. This
makes the program slower than in the Kawasaki case. Thus in practice we
have to have a density p which is substantially less than 1/2 in order for the
algorithm to pay dividends. A similar argument can be made to show that
we get an efficiency gain when p is significantly greater than 1/2 also.
5.2 The energy difference between the two states can be written
Only the spins k and k' which are exchanged change their values. All others
stay the same, so the only contributions which don't cancel out are the ones
involving these spins. Noting that the interaction between the two spins
themselves doesn't change when they are exchanged, we then get
In the case where Sk = Sk', which is the only case we are really interested
in, we can simplify this using Equation (3.9) to give
Chapter 6
6.1 The condition of detailed balance, Equation (2.14), tells us that the
transition probabilities for transitions between state 1 and the state B at the
top of the barrier must satisfy
Since P(B → 1) is a probability, the largest value it can have is 1, which
means that the largest value which P(1 → B) can have is exp[-β(E_B -
E_1)] = exp[-βB_1], as claimed in the problem. (This of course is just the
Metropolis algorithm for this transition.) The exponential increase in the
time taken to cross a barrier with barrier height is called the Arrhenius law,
and is discussed in more detail in Chapter 11.
6.2 The Hamiltonian of the Mattis model can be written as
Defining a new set of variables t_i = ε_i s_i, which take the values ±1, this
becomes
The most efficient choice of acceptance ratio is then given by Equation (6.20)
with this expression substituted in.
6.4 As explained in Section 1.2.1, the range of energies which the system
passes through, measured in terms of energy per spin, decreases as 1/√N
with the size of the system (or as L^(-d/2) with the linear dimension). The
Chapter 7
7.1 The number of possible configurations of arrows at a vertex is the num-
ber of ways of choosing the three outgoing arrows out of the six possible
directions, which is 20. If we ignore the first ice rule, this gives us a total
of 20N possible states of the lattice, where N is the number of vertices. By
Pauling's argument however, there is a 50% probability that any bond in
one of these configurations will be doubly occupied, or not occupied at all.
There are a total of 3N bonds, and hence when we reinstate the first ice rule
we reduce the number of states by a factor of 2^(3N). Thus the entropy per
vertex is approximately N^(-1) log(20^N/2^(3N)) = log(5/2) ≈ 0.916.
7.2 If we again take a q = 3 Potts model, we can ensure that nearest-
neighbour spins have different values if we set J = —oo. In addition, from
Equation (7.20) we see that the energy of the F model is proportional to
the number of next-nearest-neighbour spin pairs which have the same value,
and we can incorporate this by giving our Potts model finite ferromagnetic
next-nearest-neighbour interactions. This makes the Potts model equivalent
to the F model, except for an (infinite) additive shift in the energy.
7.3 The simplest way to do this is to create a loop algorithm similar to that
for the six-vertex model, except that when the loop passes through a vertex
it can leave along any bond with equal probability, regardless of the state of
the arrows at the vertex. It is not hard to show that this will produce only
configurations containing the eight allowed vertex types. Both short and
long loop versions of this algorithm are possible, and both obey ergodicity
and detailed balance for the same reasons that they did in the six-vertex
model.
Chapter 8
8.1 For a finite-sized system the specific heat per site at the critical tem-
perature scales with the size L of the system as c ~ L^(α/ν) (see Equa-
tion (8.43)). This means that the specific heat for the whole system scales
as C ~ L^(d+α/ν). Thus the temperature range ΔT varies with system size as
ΔT ~ L^(-d/2-α/(2ν)). This expression can be simplified using Equation (8.93)
to give ΔT ~ L^(-1/ν). Interestingly, as Equation (8.54) shows, the range of
this becomes
which is the appropriate scaling equation for the finite size behaviour of the
correlation time.
Chapter 10
10.1 The answer to this problem comes from Problem 5.3, where we showed
that the lowest energy state of such a system is one in which the up- and
down-pointing spins form two bands stretching across the system, with two
straight domain walls dividing them. For a phase-separating system, the
final state after running the simulation for a long time will be just such a
state. At finite temperature the domain walls will usually not be straight
and will fluctuate in position, but the overall geometry of the state will be
as described in Problem 5.3.
10.2 The probability of choosing spin i as our first spin is proportional
to (z - n_i) exp(-4βn_i). Spin i has z - n_i anti-aligned neighbours, so the
probability of choosing a particular neighbour j out of these is 1/(z - n_i). The
total probability of picking the pair i, j is thus proportional to exp(-4βn_i).
The same pair of spins could also have been picked in the opposite order,
starting with spin j. The probability of this happening is proportional to
exp(-4βn_j). Thus the total probability of picking the pair i, j is proportional
to exp(-4βn_i) + exp(-4βn_j), as desired.
Chapter 11
11.1 The ratio of the rates for the exchange and hopping processes is
10 e^(-βΔB). Since the exchange process moves atoms √2 times as far as the
hopping process, the diffusion constants for the two processes will be equal
when this ratio is 1/2 (because the diffusion constant is proportional to the
mean square distance moved by an atom in a given time). Plugging in the
value of AB, this means that the exchange process dominates for tempera-
tures above about 770 K. This is below the melting point of copper, but still
sufficiently far above typical laboratory temperatures that we don't usually
have to worry about it.
11.2 As pointed out in Section 11.2.2, the time taken per Monte Carlo
step increases logarithmically with the number of possible moves. Since the
number of moves scales on average as the number of sites on the lattice, each
move will take about a factor log 250²/log 100² = 1.20 longer on the larger
system. The total number of Monte Carlo steps we have to perform in order
to simulate a given interval of real time increases linearly with the number of
lattice sites, so that we have to perform about a factor of 2502/1002 = 6.25
more steps in the larger simulation. Overall, therefore, the simulation will
take about 7.5 times as much CPU time on the larger lattice, or 7.5 hours
in this case.
Chapter 12
12.1 The simplest expression which gives v in terms of only one combination
of the independent variables N and E is
12.2 (a) Since the chain is not allowed to move transversely to its own length
its only mode of translation is the longitudinal reptation mode, (b) For a
chain of N reptons there are N - 2 adjacent pairs of links and two end
links giving a total of N overall. In two dimensions each one can move
to three other positions, so the total number of possible moves is 3N. As
in the repton model, each move should be attempted once per unit time,
so the appropriate time increment is Δt = 1/(3N) per move. (c) The six
possible moves of the first and last links on the chain are always allowed.
The probability of two adjacent links in the interior of the chain being anti-
parallel is 1/4, and each such anti-parallel pair has three possible moves. Thus
the average number of allowed moves in the chain is 6 + (3/4)(N - 2) = 9/2 + (3/4)N.
The average probability of a move being allowed is the ratio of allowed moves
to total moves, or (9/2 + (3/4)N)/(3N) = (N + 6)/(4N).
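The counting in part (c) can be restated as a one-line function (our sketch):

```c
#include <assert.h>

/* Average fraction of proposed moves that are allowed for the chain of
 * Problem 12.2(c): the 6 end-link moves are always allowed, and each
 * of the N-2 interior pairs is anti-parallel with probability 1/4,
 * contributing 3 moves, out of 3N possible moves in total. */
double allowed_fraction(int n)
{
    double allowed = 6.0 + 0.75 * (n - 2);   /* = 9/2 + 3N/4 */
    return allowed / (3.0 * n);              /* = (N + 6)/(4N) */
}
```

For N = 10, for example, this gives 16/40 = 0.4, in agreement with the closed form (N + 6)/(4N).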
Chapter 13
13.1 (a) On the square lattice the disks cover π/4 ≈ 78.5% of the space.
(b) On the triangular lattice they cover π/(2√3) ≈ 90.7%. The triangular
packing is the densest packing of circles in two dimensions, which is the main
reason why the triangular lattice crops up so often in physical systems.
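The two packing fractions can be computed directly (our sketch; the function names are ours):

```c
#include <assert.h>
#include <math.h>

/* Packing fraction for disks of diameter equal to the lattice spacing
 * on the square lattice: one disk of area pi/4 per unit cell of area 1. */
double square_packing(void)
{
    return acos(-1.0) / 4.0;                 /* pi/4, about 0.785 */
}

/* On the triangular lattice: one disk per unit cell of area sqrt(3)/2. */
double triangular_packing(void)
{
    return acos(-1.0) / (2.0 * sqrt(3.0));   /* pi/(2 sqrt 3), about 0.907 */
}
```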
13.2 Before we stretch the lattice, the neighbours of the point (h, k, l) are
(h ± 1, k ± 1, l ± 1), all located at a distance of √3. After stretching, these
neighbours are located at a distance of 2. In addition the sites (h ± 2, k, l)
and (h, k ± 2, l) are now also located at the same distance from our point,
so that the lattice has become 12-fold coordinated. Since it is still a Bravais
lattice, it must be the fcc lattice, which is the only 12-fold coordinated such
lattice.
Chapter 14
14.1 (a) Yes, it is possible. If we tilt the lines separating the domains 30°
away from the vertical, fixing the spins (1/2i, i mod 22) for all 0 < i < 66,
the lattice forms a single strip which wraps three times around the peri-
odic boundary conditions. Cutting this strip into six equal pieces, each cut
requiring us to fix an additional six spins, we achieve our goal. The total
number of fixed spins is 66 + 36 = 102. (b) The most economical way of
dividing up the lattice is to fix spins along lines set at 45° to the vertical. If
we fix the set of spins (i, i) and (i, L — i — 1) for all 0 < i < L, and another set
exactly the same but translated by L/2 horizontally (with periodic boundary
conditions), then we can divide an L x L lattice into eight equal domains.
Each line fixes L spins, but 8 spins are shared between lines, so the total
number of fixed spins is 4L — 8. For L = 252 this gives us our 1000 fixed
spins. Thus we can divide a 252 x 252 lattice into eight domains by fixing
1000 spins. (If you can find a larger lattice which can be divided with this
many fixed spins, we want to hear about it.)
14.2 No. The spins located at the points where the two sets of boundaries
cross belong to both sets and hence will never get changed. We must use at
least three different sets of boundaries to achieve ergodicity.
14.3 The volume V of the region covered by each processor scales as n^(-1),
where n is the number of processors used. On average the values of about half
of the spins on the lattice have to be transmitted every time the boundaries
are shifted, so each processor will have to send and receive a number of values
which also scales as n^(-1). The amount of CPU time spent simulating each
region for one correlation time scales as L^(d+z) (see Equation (4.8)) where
L ~ V^(1/d) is the linear dimension of the region. Thus the CPU time scales
with the number of processors as n^(-1-z/d). The ratio of communication
time to calculation time therefore scales as n^(-1)/n^(-1-z/d) = n^(z/d). For the
alternative algorithm we have to transmit a message every time we change
a spin on the boundary of a region. The boundaries cover a fraction of the
system which scales as L^(-1). Using the relations given above this means that
the fraction of time spent on communication goes as n^(1/d). This implies that
the second algorithm will, at least in theory, have superior performance if we
use a sufficiently large number of processors provided the dynamic exponent
z is greater than one. Since z ~ 2 for the Metropolis algorithm, does this
mean that we should be using the second algorithm? No, we should not,
for a number of reasons. First there are the practical reasons to do with
the synchronization of the different processors given at the beginning of
Section 14.2.1. In addition however, the calculations above do not really give
the whole picture, since they assume that the time spent on communication
is proportional to the amount of data to be communicated. In practice, the
time spent usually does increase linearly with the amount of data, but it
also has an offset—an extra time penalty associated with transmitting any
message, no matter how small—whose size is independent of the amount of
data. Depending on the type of parallel computer used for the computation,
this offset can range from the negligible to being much greater than the linear
term. For some computers therefore, the scaling arguments above may be
completely irrelevant (although for some they are not).
Chapter 15
15.1 (a) Writing the occupation variables of the two sites after the move as
s'_i and s'_j, the truth table is

  s_i  s_j  |  s'_i  s'_j
   0    0   |   0     0
   0    1   |   1     0
   1    0   |   0     1
   1    1   |   1     1

Thus the master expressions are s'_i = s_j and s'_j = s_i.
(b) The values of s'_i and s'_j now depend on r as well as s_i and s_j. The
full truth table is

  s_i  s_j  r  |  s'_i  s'_j
   0    0   0  |   0     0
   0    0   1  |   0     0
   0    1   0  |   1     0
   0    1   1  |   1     0
   1    0   0  |   1     0
   1    0   1  |   0     1
   1    1   0  |   1     1
   1    1   1  |   1     1

Suitable master expressions are
A much simpler expression for the same quantity which makes use of the
XOR operation is
Note however, that there is no easy way of finding this simplification us-
ing a Karnaugh map. Karnaugh maps never generate Boolean expressions
containing the XOR operation.
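The truth table above pins the outputs down completely, so one can write down a candidate pair of master expressions and verify them exhaustively. The expressions below are our reconstruction; they need not match the form the book derives, though they must agree row by row with the table:

```c
#include <assert.h>
#include <stdint.h>

/* One pair of master expressions consistent with the truth table of
 * Problem 15.1(b) (our reconstruction, not necessarily the book's):
 *   s_i' = s_j | (s_i & ~r)
 *   s_j' = s_i & (s_j | r)
 * Applied to whole words, they perform the move on many systems,
 * one per bit, in a single operation. */
void move_b(uint32_t si, uint32_t sj, uint32_t r,
            uint32_t *sip, uint32_t *sjp)
{
    *sip = sj | (si & ~r);
    *sjp = si & (sj | r);
}

/* Exhaustively check the expressions against the tabulated rows. */
int table_ok(void)
{
    /* rows: s_i, s_j, r, s_i', s_j' */
    static const uint32_t row[8][5] = {
        {0,0,0,0,0}, {0,0,1,0,0}, {0,1,0,1,0}, {0,1,1,1,0},
        {1,0,0,1,0}, {1,0,1,0,1}, {1,1,0,1,1}, {1,1,1,1,1},
    };
    for (int k = 0; k < 8; k++) {
        uint32_t sip, sjp;
        move_b(row[k][0], row[k][1], row[k][2], &sip, &sjp);
        if ((sip & 1) != row[k][3] || (sjp & 1) != row[k][4]) return 0;
    }
    return 1;
}
```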
(b) Appropriate master expressions for the interior moves are
where r0 and r1 are random bits which are one with probability 1/2.
Chapter 16
16.1 It is simple to show that i_{n+2} = a(a i_n mod m) mod m = a² i_n mod m.
Iterating the same argument we can then derive the skip formula

  i_{n+k} = (a^k mod m) i_n mod m.
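The skip formula is cheap to apply because a^k mod m can be computed by repeated squaring in O(log k) multiplications. A sketch (ours, not the book's code), using the Park-Miller constants as an example:

```c
#include <assert.h>
#include <stdint.h>

#define A 16807ULL
#define M 2147483647ULL

/* Modular exponentiation by repeated squaring. */
static uint64_t powmod(uint64_t a, uint64_t k, uint64_t m)
{
    uint64_t r = 1;
    a %= m;
    while (k) {
        if (k & 1) r = r * a % m;
        a = a * a % m;
        k >>= 1;
    }
    return r;
}

/* Jump k steps ahead in one go: i_{n+k} = (a^k mod m) i_n mod m. */
uint64_t skip(uint64_t i0, uint64_t k)
{
    return powmod(A, k, M) * i0 % M;
}

/* Step the generator one iteration at a time, for comparison. */
uint64_t iterate(uint64_t i0, uint64_t k)
{
    while (k--) i0 = A * i0 % M;
    return i0;
}
```

The two routines agree for any starting value and any k, but skip() needs only about log2(k) multiplications.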
16.2 The position in the array at which we store the number i depends only
on the value of i itself, so that each time a particular value is generated it is
stored in the same position. Thus, after the generator has been running for
a while, the number pulled out of the array for a particular value of i will
also be the same every time we generate that value of i. Hence the period
of this generator will be the same as that of the original linear congruential
generator, which makes this a poor shuffling scheme.
16.3 Using the transformation method, the number x = -ln(1 - u) should be
generated every time our uniform random number generator produces a number
u between zero and one.
#include <math.h>

#define N (L*L)
#define XNN 1
#define YNN L

int s[N];
double prob[5];
double beta;

void initialize()
{
  int i;

  for (i=2; i<5; i+=2) prob[i] = exp(-2*beta*i);
}
void sweep()
{
  int i,k;
  int nn,sum,delta;

  for (k=0; k<N; k++) {

    /* Choose a site */

    i = N*drandom();

    /* Calculate the sum of the site's neighbouring spins */

    if ((nn=i+XNN)>=N) nn -= N;
    sum = s[nn];
    if ((nn=i-XNN)<0) nn += N;
    sum += s[nn];
    if ((nn=i+YNN)>=N) nn -= N;
    sum += s[nn];
    if ((nn=i-YNN)<0) nn += N;
    sum += s[nn];

    /* Calculate the change in energy and decide whether to flip */

    delta = sum*s[i];
    if (delta<=0) {
      s[i] = -s[i];
    } else if (drandom()<prob[delta]) {
      s[i] = -s[i];
    }
  }
}

B.1 Algorithms for the Ising model 435
436 Appendix B: Sample programs

#define N (L*L)
#define XNN 1
#define YNN L
#define ZERO 0x00000000

int s[N];
double p0,p1;

sweep()
{
  int i,k;
  int n1,n2,n3,n4;
  int spin;
  int a1,a2,a3,a4;
  int R1,R2,r0,r1;

  for (k=0; k<N; k++) {

    /* Choose a site */

    i = N*drandom();
    spin = s[i];

    /* Find the four neighbouring sites */

    if ((n1=i+XNN)>=N) n1 -= N;
    if ((n2=i-XNN)<0) n2 += N;
    if ((n3=i+YNN)>=N) n3 -= N;
    if ((n4=i-YNN)<0) n4 += N;

    /* a1...a4 have a one bit for each system in which the
     * corresponding neighbour is anti-aligned with the spin */

    a1 = spin^s[n1];
    a2 = spin^s[n2];
    a3 = spin^s[n3];
    a4 = spin^s[n4];

    /* R1 has a one bit where at least one neighbour is anti-aligned,
     * R2 where at least two are */

    R1 = a1|a2|a3|a4;
    R2 = ((a1|a2)&(a3|a4))|((a1&a2)|(a3&a4));

    /* Generate the random words r0 and r1 */

    if (drandom()<2*p0) r0 = lrandom();
    else r0 = ZERO;
    if (drandom()<2*p1) r1 = lrandom();
    else r1 = ZERO;

    /* Flip the spins */

    s[i] ^= R2|(R1&r1)|r0;
  }
}
/* padd = 1 - exp(-2*beta*J)
* s[] = lattice of spins with helical boundary conditions
* L = constant edge length of lattice
*/
#define N (L*L)
#define XNN 1
#define YNN L
int s[N];
double padd;
void step()
{
int i;
int sp;
int oldspin,newspin;
int current, nn;
int stack[N];
i = N*drandom();
stack[0] = i;
sp = 1;
oldspin = s[i];
newspin = -s[i];
s[i] = newspin;

while (sp) {
  current = stack[--sp];
  if ((nn=current+XNN)>=N) nn -= N;
  if (s[nn]==oldspin)
    if (drandom()<padd) {
      stack[sp++] = nn;
      s[nn] = newspin;
    }

  if ((nn=current-XNN)<0) nn += N;
  if (s[nn]==oldspin)
    if (drandom()<padd) {
      stack[sp++] = nn;
      s[nn] = newspin;
    }

  if ((nn=current+YNN)>=N) nn -= N;
  if (s[nn]==oldspin)
    if (drandom()<padd) {
      stack[sp++] = nn;
      s[nn] = newspin;
    }

  if ((nn=current-YNN)<0) nn += N;
  if (s[nn]==oldspin)
    if (drandom()<padd) {
      stack[sp++] = nn;
      s[nn] = newspin;
    }
}
}
#include <math.h>
#define N (L*L)
#define XNN 1
#define YNN L
int s[N];
int up[N],down[N];
int nup,ndown;
double prob[17];
double beta;
void initialize()
{
int i;
for (i=0; i<17; i++) prob[i] = exp(-beta*i);
}
void sweep()
{
  int i;
  int delta;
  int iup,idown;
  int xup,xdown;
  int upnn1,upnn2,upnn3,upnn4;
  int downnn1,downnn2,downnn3,downnn4;
  int term1,term2;

  for (i=0; i<N; i++) {

    /* Choose an up-spin and a down-spin at random */

    iup = nup*drandom();
    idown = ndown*drandom();
xup = up[iup];
xdown = down[idown];
if ((upnn1=xup+XNN)>=N) upnn1 -= N;
if ((upnn2=xup-XNN)<0) upnn2 += N;
if ((upnn3=xup+YNN)>=N) upnn3 -= N;
if ((upnn4=xup-YNN)<0) upnn4 += N;
if ((downnn1=xdown+XNN)>=N) downnn1 -= N;
if ((downnn2=xdown-XNN)<0) downnn2 += N;
if ((downnn3=xdown+YNN)>=N) downnn3 -= N;
if ((downnn4=xdown-YNN)<0) downnn4 += N;
term1 = s[upnn1]+s[upnn2]+s[upnn3]+s[upnn4];
term2 = -s[downnn1]-s[downnn2]-s[downnn3]-s[downnn4];
s[xup] = -1;
s[xdown] = +1;
term1 -= -s[upnn1]-s[upnn2]-s[upnn3]-s[upnn4];
term2 -= s[downnn1]+s[downnn2]+s[downnn3]+s[downnn4];

/* Calculate the change in energy */

delta = term1 + term2;
if (delta<=0) {
up[iup] = xdown;
down[idown] = xup;
} else if (drandom()<prob[delta]) {
up[iup] = xdown;
down[idown] = xup;
} else {
s[xup] = +1;
s[xdown] = -1;
}
}
}
#define N (L*L)
#define XNN 1
#define YNN L
int s[N];
int coord[N];
int loc[N];
int nup[5],ndown[5];
int up[5][N],down[5][N];
double time;
double prob[5];
void sweep()
{
  int i,j;
  int c1,c2;
  int l1,l2;
  int x1,x2;
  int nn;
  double rd,sum;
  double sumup,sumdown;

  for (j=0; j<N; j++) {

    /* Calculate the total rates for the up- and down-spins */

    for (c1=0,sumup=0.0; c1<5; c1++) sumup += nup[c1]*prob[c1];
    for (c1=0,sumdown=0.0; c1<5; c1++) sumdown += ndown[c1]*prob[c1];

    time += 1.0/(sumup*sumdown);

    /* Choose an up-spin with probability proportional to its rate */

    rd = sumup*drandom();
    for (c1=0,sum=0.0; c1<5; c1++) {
      sum += nup[c1]*prob[c1];
      if (sum>rd) break;
    }
    l1 = nup[c1]*drandom();
    x1 = up[c1][l1];

    /* Choose a down-spin in the same fashion */

    rd = sumdown*drandom();
B.2 Algorithms for the COP Ising model 443
    for (c2=0,sum=0.0; c2<5; c2++) {
      sum += ndown[c2]*prob[c2];
      if (sum>rd) break;
    }
    l2 = ndown[c2]*drandom();
    x2 = down[c2][l2];

    /* Flip the chosen spins */

    s[x1] = -1;
    s[x2] = +1;

    /* Remove x1 from the up-spin lists and add it to the down-spin
     * lists for its new coordination class */

    up[c1][l1] = up[c1][--nup[c1]];
    loc[up[c1][l1]] = l1;
    coord[x1] = c1 = spincoord(x1);
    loc[x1] = ndown[c1];
    down[c1][ndown[c1]++] = x1;

    /* Remove x2 from the down-spin lists and add it to the up-spin
     * lists in the same fashion */

    down[c2][l2] = down[c2][--ndown[c2]];
    loc[down[c2][l2]] = l2;
    coord[x2] = c2 = spincoord(x2);
    loc[x2] = nup[c2];
    up[c2][nup[c2]++] = x2;
if ((nn=x1+XNN)>=N) nn -= N;
update(nn);
if ((nn=x1-XNN)<0) nn += N;
update(nn);
if ((nn=x1+YNN)>=N) nn -= N;
update(nn);
if ((nn=x1-YNN)<0) nn += N;
update(nn);
if ((nn=x2+XNN)>=N) nn -= N;
update(nn);
if ((nn=x2-XNN)<0) nn += N;
update(nn);
if ((nn=x2+YNN)>=N) nn -= N;
update(nn);
if ((nn=x2-YNN)<0) nn += N;
update(nn);
}
}

/* Function to calculate the coordination class of a spin, equal to the
 * number of its neighbours with which it is aligned */

int spincoord(int spin)
{
  int nn1,nn2,nn3,nn4;
if ((nn1=spin+XNN)>=N) nn1 -= N;
if ((nn2=spin-XNN)<0) nn2 += N;
if ((nn3=spin+YNN)>=N) nn3 -= N;
if ((nn4=spin-YNN)<0) nn4 += N;
return (s[spin]*(s[nn1]+s[nn2]+s[nn3]+s[nn4])+4)/2;
}

/* Function to update the lists when the coordination class of the spin
 * at the given site has changed */

void update(int spin)
{
  int c,l;
  c = coord[spin];
  l = loc[spin];
  if (s[spin]==+1) {
    up[c][l] = up[c][--nup[c]];
    loc[up[c][l]] = l;
    coord[spin] = c = spincoord(spin);
    loc[spin] = nup[c];
    up[c][nup[c]++] = spin;
  } else {
    down[c][l] = down[c][--ndown[c]];
    loc[down[c][l]] = l;
    coord[spin] = c = spincoord(spin);
    loc[spin] = ndown[c];
    down[c][ndown[c]++] = spin;
  }
}
B.3 Algorithms for Potts models 445
#define N (L*L)
#define XNN 1
#define YNN L
int s[N];
double prob[5];
void sweep()
{
  int i,n;
  int states;
  int nright,nleft,nup,ndown;
  int sright,sleft,sup,sdown;
  int zright,zleft,zup,zdown;
  double bright,bleft,bup,bdown;
  double sum,rn;

  for (n=0; n<N; n++) {

    /* Choose a site */

    i = N*drandom();
if ((nright=i+XNN)>=N) nright -= N;
if ((nleft=i-XNN)<0) nleft += N;
if ((nup=i+YNN)>=N) nup -= N;
if ((ndown=i-YNN)<0) ndown += N;
sright = s[nright];
sleft = s[nleft];
sup = s[nup];
sdown = s[ndown];

/* For each neighbouring spin, count how many of the four neighbours
 * share its state (reconstructed step; the original listing omits it) */

zright = 1;
if (sright==sleft) zright++;
if (sright==sup) zright++;
if (sright==sdown) zright++;
zleft = 1;
if (sleft==sright) zleft++;
if (sleft==sup) zleft++;
if (sleft==sdown) zleft++;
zup = 1;
if (sup==sright) zup++;
if (sup==sleft) zup++;
if (sup==sdown) zup++;
zdown = 1;
if (sdown==sright) zdown++;
if (sdown==sleft) zdown++;
if (sdown==sup) zdown++;
states = ((zup+zdown)*zright*zleft +
(zright+zleft)*zup*zdown)/
(zup*zdown*zleft*zright);
bright = prob[zright]/zright;
bleft = prob[zleft]/zleft;
bup = prob[zup]/zup;
bdown = prob[zdown]/zdown;
/* Choose the new state for the spin using the heat-bath
* algorithm
*/

rn = drandom()*(q - states + bright + bleft + bup + bdown);  /* reconstructed */
if (rn<bright) {
s[i] = sright;
} else {
rn -= bright;
if (rn<bleft) {
s [i] = sleft;
} else {
rn -= bleft;
if (rn<bup) {
s[i] = sup;
} else {
rn -= bup;
if (rn<bdown) {
s[i] = sdown;
} else {
rn -= bdown;
do {
if (sright<=rn) {
rn += 1.0/zright;
sright = q;
continue;
}
if (sleft<=rn) {
rn += 1.0/zleft;
sleft = q;
continue;
}
if (sup<=rn) {
rn += 1.0/zup;
sup = q;
continue;
}
if (sdown<=rn) {
rn += 1.0/zdown;
sdown = q;
continue;
}
break;
} while (TRUE);
s[i] = rn;
}
}
}
}
}
}
#define N (L*(L+1))
#define XNN 1
#define YNN L
#define LU 2
#define RD (-LU)
#define RU 3
#define LD (-RU)
move()
{
  int i;
  int first,curr;
  int currarrow;
  int nc,cand;
  int clist[2];
  int len=0,noloop=1;

  first = N*drandom();
  lasttime[first] = iter;
  step[len++] = curr = first;
  currarrow = arrow[curr];
do {
nc = 0;
switch (currarrow) {
case LU:
if ((cand=curr-XNN)<0) cand += N;
if (arrow[cand]==LD) clist[nc++] = cand;
if ((cand=curr-XNN-YNN)<0) cand += N;
if (arrow[cand]==LU) clist[nc++] = cand;
if ((cand=curr-YNN)<0) cand += N;
if (arrow[cand]==RU) clist[nc++] = cand;
break;
case LD:
if ((cand=curr-XNN)<0) cand += N;
if (arrow[cand]==LU) clist[nc++] = cand;
if ((cand=curr-XNN+YNN)>=N) cand -= N;
if (arrow[cand]==LD) clist[nc++] = cand;
if ((cand=curr+YNN)>=N) cand -= N;
if (arrow[cand]==RD) clist[nc++] = cand;
break;
450 Appendix B: Sample programs
case RU:
if ((cand=curr+XNN)>=N) cand -= N;
if (arrow[cand]==RD) clist[nc++] = cand;
if ((cand=curr+XNN-YNN)<0) cand += N;
if (arrow[cand]==RU) clist[nc++] = cand;
if ((cand=curr-YNN)<0) cand += N;
if (arrow [cand]==LU) clist [nc++] = cand;
break;
case RD:
if ((cand=curr+XNN)>=N) cand -= N;
if (arrow[cand]==RU) clist [nc++] = cand;
if ((cand=curr+XNN+YNN)>=N) cand -= N;
if (arrow [cand]==RD) clist [nc++] = cand;
if ((cand=curr+YNN)>=N) cand -= N;
if (arrow [cand] ==LD) clist [nc++] = cand;
break;
if (lasttime[clist[0]]==iter) {
step[len++] = clist[0];
noloop = 0;
} else if (lasttime[clist[l]]==iter) {
step[len++] = clist[l];
noloop = 0;
} else {
} while (noloop) ;
B.5 Random number generators 451
/* Follow the path up to the loop and then reverse all the
* arrows in it
*/
#define a 16807
#define m 2147483647
#define q 127773
#define r 2836
#define conv (1.0/(m-1))
long i;
double drandom()
{
  long l;

  l = i/q;
  i = a*(i-q*l) - r*l;
  if (i<0) i += m;
  return conv*(i-1);
}
#include <math.h>
#define a 16807
#define m 2147483647
#define q 127773
#define r 2836
#define conv (1.0/(m-1))
#define N 64
long i;
long y;
long j [N];
double drandom()
{
  long l;
  long k;

  l = i/q;
  i = a*(i-q*l) - r*l;
  if (i<0) i += m;
  k = floor((double) y*N/m);
  y = j[k];
  j[k] = i;
  return conv*(y-1);
}
and discussed in Section 16.1.6. The first function, seed(), takes a single
unsigned long integer as a seed and seeds the array used by the generator
with a simple linear congruential generator. The second, drandom(), is the
generator itself. As you can see, the code for the generator is very brief,
which makes it a particularly fast way of generating random numbers. The
third function, lrandom(), is a version of the same generator which generates
random 32-bit integers and could be used, for example, in multispin
coding applications, such as the Ising model algorithm given in Section B.1.2. In
fact, the exact same generator could equally well be used to generate 64-bit
integers on a computer which had 64-bit words. Only the conversion factor
conv2 would have to be changed and the seeding routine would have to be
modified to initialize the array ia[] with 64-bit integers.
#define a 2416
#define c 374441
#define m 1771875
#define conv1 2423.9674
#define conv2 (1/4294967296.0)

int p,pp;
unsigned long ia[55];

void seed(unsigned long i)
{
  int n;

  /* Fill the lag table using a simple linear congruential
   * generator, scaling its output to the full 32-bit range */
  for (n=0; n<55; n++) {
    i = (a*i+c)%m;
    ia[n] = conv1*i;
  }

  /* Set the two pointers a lag of 55 and 24 places back */
  p = 0;
  pp = 24;
}

double drandom()
{
  if (--p<0) p = 54;
  if (--pp<0) pp = 54;
  return conv2*(ia[p]+=ia[pp]);
}

long lrandom()
{
  if (--p<0) p = 54;
  if (--pp<0) pp = 54;
  return ia[p]+=ia[pp];
}
Index