Preface ii
List of Symbols vi
Contents ix
I Fundamentals 1
1 Introduction 2
1.1 Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Proving Algorithm Correctness . . . . . . . . . . . . . . . . 7
1.4 Algorithm Analysis . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 A Case Study: Maximum Subsequence Sum . . . . . . . . . 12
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.9 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Analyzing Algorithms 56
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Big-O Notation . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Big-Ω and Big-Θ . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Operations on Sets . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Smooth Functions and Summations . . . . . . . . . . . . . . 67
3.6 Analyzing while Loops . . . . . . . . . . . . . . . . . . . . . 71
3.7 Analyzing Recursion . . . . . . . . . . . . . . . . . . . . . . 72
3.8 Analyzing Space Usage . . . . . . . . . . . . . . . . . . . . . 80
3.9 Multiple Variables . . . . . . . . . . . . . . . . . . . . . . . . 81
3.10 Little-o and Little-ω . . . . . . . . . . . . . . . . . . . . . . . 88
3.11 * Use of Limits in Asymptotic Analysis . . . . . . . . . . . . 91
3.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.14 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 103
9 Graphs 309
9.1 Universal Sink Detection . . . . . . . . . . . . . . . . . . . . 311
9.2 Topological Sort . . . . . . . . . . . . . . . . . . . . . . . . . 313
9.3 Adjacency Matrix Implementation . . . . . . . . . . . . . . . 316
9.4 Adjacency List Implementation . . . . . . . . . . . . . . . . 319
9.5 Multigraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
9.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
9.8 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 330
16 NP-Completeness 508
16.1 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . . . 508
16.2 The Set P . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
16.3 The Set NP . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
16.4 Restricted Satisfiability Problems . . . . . . . . . . . . . . . 517
16.5 Vertex Cover and Independent Set . . . . . . . . . . . . . . . 523
16.6 3-Dimensional Matching . . . . . . . . . . . . . . . . . . . . 527
16.7 Partitioning and Strong NP-Completeness . . . . . . . . . . 532
16.8 Proof of Cook’s Theorem . . . . . . . . . . . . . . . . . . . . 539
16.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
16.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
16.11 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 559
Bibliography 588
Index 597
Part I
Fundamentals
Chapter 1
Introduction
1.1 Specifications
Before we can design or analyze any software component — including an
algorithm or a data structure — we must first know what it is supposed
to accomplish. A formal statement of what a software component is meant
to accomplish is called a specification. Here, we will discuss specifically the
specification of an algorithm. The specification of a data structure is similar,
but a bit more involved. For this reason, we will wait until Chapter 4 to
discuss the specification of data structures in detail.
Suppose, for example, that we wish to find the kth smallest element
of an array of n numbers. Thus, if k = 1, we are looking for the smallest
element in the array, or if k = n, we are looking for the largest. We will refer
to this problem as the selection problem. Even for such a seemingly simple
problem, there is a potential for ambiguity if we are not careful to state the specification precisely.
1.2 Algorithms
Once we have a specification, we need to produce an algorithm to implement
that specification. This algorithm is a precise statement of the computa-
tional steps taken to produce the results required by the specification. An
algorithm differs from a program in that it is usually not specified in a pro-
gramming language. In this book, we describe algorithms using a notation
that is precise enough to be implemented as a programming language, but
which is designed to be read by humans.
A straightforward approach to solving the selection problem is as follows: sort the array into nondecreasing order; the kth smallest element is then the kth element of the sorted array.
If we already know how to sort, then we have solved our problem; otherwise,
we must come up with a sorting algorithm. By using sorting to solve the
selection problem, we say that we have reduced the selection problem to the
sorting problem.
Solving a problem by reducing it to one or more simpler problems is the
essence of the top-down approach to designing algorithms. One advantage
to this approach is that it allows us to abstract away certain details so that
we can focus on the main steps of the algorithm. In this case, we have a
selection algorithm, but our algorithm requires a sorting algorithm before
it can be fully implemented. This abstraction facilitates understanding of
the algorithm at a high level. Specifically, if we know what is accomplished
by sorting — but not necessarily how it is accomplished — then because
the selection algorithm consists of very little else, we can readily understand
what it does.
When we reduce the selection problem to the sorting problem, we need
a specification for sorting as well. For this problem, the precondition will be
that A[1..n] is an array of Numbers, where n ∈ N. Our postcondition will be
that A[1..n] is a permutation of its initial values such that for 1 ≤ i < j ≤ n,
A[i] ≤ A[j] — i.e., that A[1..n] contains its initial values in nondecreasing
order. Our selection algorithm is given in Figure 1.2. Note that Sort is
only specified — its algorithm is not provided.
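To make the reduction concrete, here is a minimal sketch in Python (used in these asides purely for illustration; it is not the notation of this book, and Python lists are 0-based where the pseudocode uses 1-based arrays). It simply sorts a copy of the input and indexes into it, exactly as SimpleSelect does with Sort.

def simple_select(a, k):
    # Return the kth smallest element of a (k counted from 1), by reducing
    # selection to sorting: sort a copy, then index into it.
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(a)[k - 1]

# Example: the 3rd smallest element of [7, 2, 9, 4, 1] is 4.
print(simple_select([7, 2, 9, 4, 1], 3))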
Let us now refine the SimpleSelect algorithm of Figure 1.2 by design-
ing a sorting algorithm. We will reduce the sorting problem to the problem
of inserting an element into a sorted array. In order to complete the re-
duction, we need to have a sorted array in which to insert. We have thus
returned to our original problem. We can break this circularity, however, because the array to be sorted before the insertion, A[1..n − 1], is a strictly smaller instance of the sorting problem; an array with at most one element is already sorted, so the reduction eventually terminates.
SimpleSelect(A[1..n], k)
  Sort(A[1..n])
  return A[k]

InsertSort(A[1..n])
  if n > 1
    InsertSort(A[1..n − 1])
    Insert(A[1..n])

RecursiveInsert(A[1..n])
  if n > 1 and A[n] < A[n − 1]
    A[n] ↔ A[n − 1]
    RecursiveInsert(A[1..n − 1])
TailRecursiveAlgorithm(p1, . . . , pk)
  if ⟨BooleanCondition⟩
    ⟨BaseCase⟩
  else
    ⟨RecursiveCaseComputation⟩
    TailRecursiveAlgorithm(q1, . . . , qk)

IterativeAlgorithm(p1, . . . , pk)
  while not ⟨BooleanCondition⟩
    ⟨RecursiveCaseComputation⟩
    p1 ← q1; . . . ; pk ← qk
  ⟨BaseCase⟩
IterativeInsert(A[1..n])
  j ← n
  // Invariant: A[1..n] is a permutation of its original values such that
  // for 1 ≤ k < k′ ≤ n, if k′ ≠ j, then A[k] ≤ A[k′].
  while j > 1 and A[j] < A[j − 1]
    A[j] ↔ A[j − 1]; j ← j − 1

InsertionSort(A[1..n])
  // Invariant: A[1..n] is a permutation of its original values
  // such that A[1..i − 1] is in nondecreasing order.
  for i ← 1 to n
    j ← i
    // Invariant: A[1..n] is a permutation of its original values
    // such that for 1 ≤ k < k′ ≤ i, if k′ ≠ j, then A[k] ≤ A[k′].
    while j > 1 and A[j] < A[j − 1]
      A[j] ↔ A[j − 1]; j ← j − 1
Because the meaning of an assignment such as “A[1..j] ← A[1..j − 1]” is not clear, we instead simulated the recursion by the statement “j ← j − 1”.
Prior to the loop in IterativeInsert, we have given a loop invariant, which is a statement that must be true at the beginning and end of each iteration of the loop. (We consider each test of a loop exit condition to mark the beginning/end of an iteration.) Loop invariants will be an essential part of the correctness proof techniques that we will introduce in the next chapter. For now, however, we will use them to help us to keep track of what the loop is doing.
The resulting sorting algorithm, commonly known as insertion sort, is
shown in Figure 1.7. Besides removing the two recursive calls, we have also
combined the two functions into one. Also, we have started the for loop at
1, rather than at 2, as our earlier discussion suggested. As far as correctness
goes, there is no difference which starting point we use, as the inner loop
will not iterate when i = 1; however, it turns out that the correctness proof,
which we will present in the next chapter, is simpler if we begin the loop at
1. Furthermore, the impact on performance is minimal.
While analysis techniques can be applied to analyze stack usage, a far
more common application of these techniques is to analyze running time. For
example, we can use these techniques to show that while InsertionSort is
not, in general, a very efficient sorting algorithm, there are important cases
in which it is a very good choice. In Section 1.6, we will present a case study
that demonstrates the practical utility of running time analysis.
Figure 1.8 The subsequence with maximum sum may begin and end anywhere in the array, but must be contiguous
Recall that the maximum subsequence sum of A[0..n − 1] is defined to be
$$\max\left\{\,\sum_{k=i}^{j-1} A[k] \;\middle|\; 0 \le i \le j \le n\right\}.$$
Note that when i = j, the sum has a beginning index of i and an ending index of i − 1. By convention, we always write summations so that the index (k in this case) increases from its initial value (i) to its final value (j − 1). As a result of this convention, whenever the final value is less than the initial value, the summation contains no elements. Again by convention, such an empty summation is defined to have a value of 0. (Similar conventions hold for products, except that an empty product is assumed to have a value of 1.) Thus, in the above definition, we are including the empty sequence and assuming its sum is 0. The specification for this problem is given in Figure 1.9. Note that according to this specification, the values in A[0..n − 1] may not be modified.
Example 1.1 Suppose A[0..5] = ⟨−1, 3, −2, 7, −9, 7⟩. Then the subsequence A[1..3] = ⟨3, −2, 7⟩ has a sum of 8. By exhaustively checking all other contiguous subsequences, we can verify that this is, in fact, the maximum. For example, the subsequence A[1..5] has a sum of 6.
MaxSumIter(A[0..n − 1])
  m ← 0
  for i ← 0 to n
    for j ← i to n
      sum ← 0
      for k ← i to j − 1
        sum ← sum + A[k]
      m ← Max(m, sum)
  return m
Example 1.2 Suppose A[0..3] = ⟨−3, −4, −1, −5⟩. Then all nonempty
subsequences have negative sums. However, any empty subsequence (e.g.,
A[0..−1]) by definition has a sum of 0. The maximum subsequence sum of
this array is therefore 0.
We can easily obtain an algorithm for this problem by translating the
definition of a maximum subsequence sum directly into an iterative solution.
The result is shown in Figure 1.10. By applying the analysis techniques
of Chapter 3, it can be shown that the running time of this algorithm is
proportional to n³, where n is the size of the array.
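For readers who wish to experiment, here is a direct Python translation of this brute-force approach (illustrative only; 0-based indexing, and not the notation of this book). The three nested loops mirror the three loops of MaxSumIter.

def max_sum_iter(a):
    # Try every pair (i, j) with 0 <= i <= j <= n and sum a[i..j-1];
    # the work is proportional to n**3.
    n = len(a)
    m = 0
    for i in range(n + 1):
        for j in range(i, n + 1):
            s = 0
            for k in range(i, j):
                s += a[k]
            m = max(m, s)
    return m

print(max_sum_iter([-1, 3, -2, 7, -9, 7]))  # 8, from the subsequence 3, -2, 7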
In order to illustrate the practical ramifications of this analysis, we implemented this algorithm in the Java™ programming language and ran it on a personal computer using randomly generated data sets of size 2ᵏ for various values of k. On a data set of size 2¹⁰ = 1024, MaxSumIter required less than half a second, which seems reasonably fast. However, as the size of the array increased, the running time degraded quickly:
MaxSumOpt(A[0..n − 1])
  m ← 0
  for i ← 0 to n − 1
    sum ← 0
    for k ← i to n − 1
      sum ← sum + A[k]
      m ← Max(m, sum)
  return m
Notice that as the size of the array doubles, the running time increases by roughly a factor of 8. This is not surprising if we realize that the running time should be cn³ for some c, and c(2n)³ = 8cn³. We can therefore estimate that for a data set of size 2¹⁷ = 131,072, the running time should be 8³ = 512 times as long as for 2¹⁴. This running time is over a week!
If we want to solve this problem on large data sets, we clearly would
like to improve the running time. A careful examination of Figure 1.10
reveals some redundant computation. Specifically, the inner loop computes
sums of successively longer subsequences. Much work can be saved if we
compute sums of successive sequences by simply adding the next element
to the preceding sum. Furthermore, a small optimization can be made by
running the outer loop to n − 1, as the inner loop would not execute on this
iteration. The result of this optimization is shown in Figure 1.11.
It turns out that this algorithm has a running time proportional to n², which is an improvement over n³. To show how significant this improvement is, we again coded this algorithm and timed it for arrays of various sizes. The difference was dramatic — for an input of size 2¹⁷, which we estimated would require a week for MaxSumIter to process, the running time of MaxSumOpt was only 33 seconds. However, because c(2n)² = 4cn², we would expect the running time to increase by a factor of 4 each time the array size is doubled. This behavior proved to be true, as a data set of size 2²⁰ = 1,048,576 required almost 40 minutes. Extrapolating this behavior, we would expect a data set of size 2²⁴ = 16,777,216 to require over a week. (MaxSumIter would require over 43,000 years to process a data set of this size.)
Figure 1.12 The suffix with maximum sum may begin anywhere in the array, but must end at the end of the array
While MaxSumOpt gives a dramatic speedup over MaxSumIter, we
would like further improvement if we wish to solve very large problem in-
stances. Note that neither MaxSumIter nor MaxSumOpt was designed
using the top-down approach. Let us therefore consider how we might solve
the problem in a top-down way. For ease of presentation let us refer to the
maximum subsequence sum of A[0..n−1] as sn . Suppose we can obtain sn−1
(i.e., the maximum subsequence sum of A[0..n − 2]) for n > 0. Then in order
to compute the overall maximum subsequence sum we need the maximum
of sn−1 and all of the sums of subsequences A[i..n − 1] for 0 ≤ i ≤ n. Thus,
we need to solve another problem, that of finding the maximum suffix sum
(see Figure 1.12), which we define to be
$$\max\left\{\sum_{k=i}^{n-1} A[k] \;\middle|\; 0 \le i \le n\right\}.$$
In other words, the maximum suffix sum is the maximum sum that we
can obtain by starting at any index i, where 0 ≤ i ≤ n, and adding together
all elements from index i up to index n − 1. (Note that by taking i = n, we
include the empty sequence in this maximum.) We then have a top-down
solution for computing the maximum subsequence sum:
$$s_n = \begin{cases} 0 & \text{if } n = 0\\ \max(s_{n-1},\, t_n) & \text{if } n > 0, \end{cases} \qquad (1.1)$$
MaxSumTD(A[0..n − 1])
  if n = 0
    return 0
  else
    return Max(MaxSumTD(A[0..n − 2]), MaxSuffixTD(A[0..n − 1]))
Let us consider how to compute the maximum suffix sum tn using the
top-down approach. Observe that every suffix of A[0..n − 1] — except the
empty suffix — ends with A[n − 1]. If we remove A[n − 1] from all of these
suffixes, we obtain all of the suffixes of A[0..n−2]. Thus, tn−1 +A[n−1] = tn
unless t_n = 0. We therefore have
$$t_n = \begin{cases} 0 & \text{if } n = 0\\ \max(0,\, t_{n-1} + A[n-1]) & \text{if } n > 0. \end{cases} \qquad (1.2)$$
Using (1.1) and (1.2), we obtain the recursive solution given in Figure 1.13.
Note that we have combined the algorithm for MaxSuffixTD with its
specification.
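A Python sketch of this top-down solution (illustrative only; 0-based indexing, and the function names are not the book's) translates recurrences (1.1) and (1.2) directly into code.

def max_suffix_td(a, n):
    # Maximum suffix sum of a[0..n-1], following recurrence (1.2).
    if n == 0:
        return 0
    return max(0, max_suffix_td(a, n - 1) + a[n - 1])

def max_sum_td(a, n=None):
    # Maximum subsequence sum of a[0..n-1], following recurrence (1.1).
    if n is None:
        n = len(a)
    if n == 0:
        return 0
    return max(max_sum_td(a, n - 1), max_suffix_td(a, n))

print(max_sum_td([-1, 3, -2, 7, -9, 7]))  # 8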
Unfortunately, an analysis of this algorithm shows that it also has a run-
ning time proportional to n². What is worse, however, is that an analysis of its stack usage reveals that it is linear in n. Indeed, the program implementing this algorithm threw a StackOverflowError on an input of size 2¹².
While these results are disappointing, we at least have some techniques
for improving the stack usage. Note that in both MaxSumTD and Max-
SuffixTD, the recursive calls don’t depend on any of the rest of the compu-
tation; hence, we should be able to implement both algorithms in a bottom-up fashion.
MaxSumBU(A[0..n − 1])
  m ← 0; msuf ← 0
  // Invariant: m is the maximum subsequence sum of A[0..i − 1],
  // msuf is the maximum suffix sum for A[0..i − 1]
  for i ← 0 to n − 1
    msuf ← Max(0, msuf + A[i])
    m ← Max(m, msuf)
  return m
[Figure: running times, in seconds, of MaxSumIter, MaxSumOpt, MaxSumTD, and MaxSumBU, plotted against array size on logarithmic scales.]
MaxSumBU, whose running time is proportional to n, scales much better. Furthermore, because it makes only one pass through the input, it does not require that
the data be stored in an array. Rather, it can simply process the data as it
reads each element. As a result, it can be used for very large data sets that
might not fit into main memory.
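The following Python sketch (illustrative, not the book's code) highlights this property: because the algorithm consumes its input one element at a time, it can be applied to any iterable, such as a stream of values read from disk.

def max_sum_bu(values):
    # One pass over the input; m is the best sum seen so far and msuf is
    # the best suffix sum of the elements seen so far.
    m = msuf = 0
    for x in values:
        msuf = max(0, msuf + x)
        m = max(m, msuf)
    return m

print(max_sum_bu([-1, 3, -2, 7, -9, 7]))    # 8
print(max_sum_bu(iter([-3, -4, -1, -5])))   # 0 (the empty subsequence)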
Thus, we can see the importance of efficiency in the design of algorithms.
Furthermore, we don’t have to code the algorithm and test it to see how effi-
cient it is. Instead, we can get a fairly good idea of its efficiency by analyzing
it using the techniques presented in Chapter 3. Finally, understanding these
analysis techniques will help us to know where to look to improve algorithm
efficiency.
1.7 Summary
The study of algorithms encompasses several facets. First, before an algo-
rithm or data structure can be considered, a specification of the requirements
must be made. Having a specification, we can then design the algorithm or
data structure with a proof of correctness in mind. Once we have convinced
ourselves that our solution is correct, we can then apply mathematical tech-
niques to analyze its resource usage. Such an analysis gives us insight into
how useful our solution might be, including cases in which it may or may
not be useful. This analysis may also point to shortcomings upon which we
might try to improve.
The top-down approach is a useful framework for designing correct, effi-
cient algorithms. Furthermore, algorithms presented in a top-down fashion
can be more easily understood. Together with the top-down approach, tech-
niques such as bottom-up implementation and elimination of tail recursion
— along with others that we will present later — give us a rich collection of
tools for algorithm design. We can think of these techniques as algorithmic
design patterns, as we use each of them in the design of a wide variety of
algorithms.
In Chapters 2 and 3, we will provide the foundations for proving al-
gorithm correctness and analyzing algorithms, respectively. In Part II, we
will examine several of the most commonly-used data structures, including
those that are frequently used by efficient algorithms. In Part III, we ex-
amine the most common approaches to algorithm design. In Part IV, we
will study several specific algorithms to which many other problems can be
reduced. Finally, in Part V, we will consider a class of problems believed
to be computationally intractable and introduce some techniques for coping
with them.
1.8 Exercises
Exercise 1.1 We wish to design an algorithm that takes an array A[0..n−1]
of numbers in nondecreasing order and a number x, and returns the location
of the first occurrence of x in A[0..n − 1], or the location at which x could
be inserted without violating the ordering if x does not occur in the array.
Give a formal specification for this problem. The algorithm shown in Figure
1.16 should meet your specification.
Exercise 1.2 Give an iterative algorithm that results from removing the
tail recursion from the algorithm shown in Figure 1.16. Your algorithm
should meet the specification described in Exercise 1.1.
Exercise 1.3 Figure 1.17 gives a recursive algorithm for computing the dot
product of two vectors, represented as arrays. Give a bottom-up implemen-
tation of this algorithm.
Find(A[0..n − 1], x)
  if n = 0 or A[n − 1] < x
    return n
  else
    return Find(A[0..n − 2], x)

DotProduct(A[1..n], B[1..n])
  if n = 0
    return 0
  else
    return DotProduct(A[1..n − 1], B[1..n − 1]) + A[n]B[n]
c. The specification of Copy does not prohibit the two arrays from sharing elements, for example, Copy(A[1..n − 1], A[2..n]). Modify your algorithm from part a to handle any two arrays of the same size. Specifically, you cannot assume that the recursive call does not change A[1..n]. Your algorithm should contain exactly one recursive call and no loops.
Precondition: n is a Nat.
Postcondition: For 1 ≤ i ≤ n, B[i] is modified to equal A[i].
Copy(A[1..n], B[1..n])
Chapter 2
Proving Algorithm Correctness
P : N → {true, false};
Though this technique is also valid, the version given by Theorem 2.3 is
more appropriate for the study of algorithms. To see why, consider how we
are using the top-down approach. We associate with each input a natural
number that in some way describes the size of the input. We then recursively
apply the algorithm to one or more inputs of strictly smaller size. Theorem
2.3 tells us that in order to prove that this algorithm is correct for inputs of
all sizes, we may assume that for arbitrary n, the algorithm is correct for all
inputs of size less than n. Thus, we may reason about a recursive algorithm
in the same way we reason about an algorithm that calls other algorithms,
provided the size of the parameters is smaller for the recursive calls.
Now let us turn to the proof of Theorem 2.3.
step”, for such cases. It is stylistically better to separate these cases into
one or more base cases. Hence, even though a base case is not required, we
usually include one (or more).
Now we will illustrate the principle of induction by proving the correct-
ness of InsertSort.
Proof: By induction on n.
Base: n ≤ 1. In this case the algorithm does nothing, but its postcondition
is vacuously satisfied (i.e., there are no i, j such that 1 ≤ i < j ≤ n).
Induction Hypothesis: Assume that for some n > 1, for every k < n,
InsertSort(A[1..k]) satisfies its specification.
Induction Step: We first assume that initially, the precondition for Insert-
Sort(A[1..n]) is satisfied. Then the precondition for InsertSort(A[1..n −
1]) is also initially satisfied. By the Induction Hypothesis, we conclude
that InsertSort(A[1..n − 1]) satisfies its specification; hence, its postcon-
dition holds when it finishes. Let A′′ denote the value of A after Insert-
Sort(A[1..n − 1]) finishes. Then A′′ [1..n − 1] is a permutation of A[1..n − 1]
in nondecreasing order, and A′′ [n] = A[n]. Thus, A′′ satisfies the precon-
dition of Insert. Let A′ denote the value of A after Insert(A[1..n]) is
called. By the postcondition of Insert, A′ [1..n] is a permutation of A[1..n]
in nondecreasing order. InsertSort therefore satisfies its specification.
4. Correctness: Whenever the loop invariant and the loop exit condi-
tion both hold, then P must hold.
Proof: Suppose the precondition holds initially. We will show that when
the loop finishes, m contains the maximum subsequence sum of A[0..n − 1],
so that the postcondition is satisfied. Note that the loop invariant states
that m is the maximum subsequence sum of A[0..i − 1].
Initialization: Before the loop iterates the first time, i has a value of 0.
The maximum subsequence sum of A[0..−1] is defined to be 0. m is initially
assigned this value. Likewise, the maximum suffix sum of A[0..−1] is defined
to be 0, and msuf is initially assigned this value. Therefore, the invariant
initially holds.
• i′ = i + 1.
$$\begin{aligned}
m' &= \max(m, \mathit{msuf}')\\
   &= \max(s_i, t_{i'})\\
   &= \max\left(\max\left\{\sum_{k=l}^{h-1} A[k] \;\middle|\; 0 \le l \le h \le i\right\},\; \max\left\{\sum_{k=l}^{i'-1} A[k] \;\middle|\; 0 \le l \le i'\right\}\right)\\
   &= \max\left\{\sum_{k=l}^{h-1} A[k] \;\middle|\; 0 \le l \le h \le i'\right\}\\
   &= s_{i'}.
\end{aligned}$$
Termination: Because the loop is a for loop, it clearly terminates. (In this textbook, a for loop always contains a single index variable, which either is incremented by a fixed positive amount each iteration until it exceeds a fixed value, or is decremented by a fixed positive amount each iteration until it is less than a fixed value. The index cannot be changed otherwise. Such loops will always terminate.)

Correctness: The loop exits when i = n. Thus, from the invariant, m is the maximum subsequence sum of A[0..n − 1] when the loop terminates.

As can be seen from the above proof, initialization and maintenance can be shown using techniques we have already developed. Furthermore, the correctness step is simply logical inference. In the case of Theorem 2.5, termination is trivial, because for loops always terminate. Note, however, that in order for such a proof to be completed, it is essential that a proper loop invariant be chosen. Specifically, the invariant must be chosen so that:

• it is true every time the loop condition is tested;

• it is possible to prove that if it is true at the beginning of an arbitrary iteration, it must also be true at the end of that iteration; and

• when coupled with the loop exit condition, it is strong enough to prove the desired correctness property.
Thus, if we choose an invariant that is too strong, it may not be true each
time the loop condition is tested. On the other hand, if we choose an invari-
ant that is too weak, we may not be able to prove the correctness property.
Furthermore, even if the invariant is true on each iteration and is strong
enough to prove the correctness property, it may still be impossible to prove
the maintenance step. We will discuss this issue in more detail shortly.
while n > 1
  if n mod 2 = 0
    n ← n/2
  else
    n ← 3n + 1
Proof: We first observe that each iteration of the while loop decreases j
by 1. Thus, if the loop continues to iterate, eventually j ≤ 1, and the loop
then terminates.
Proving termination of a while loop can be much more difficult than the
proof of the above theorem. For example, consider the while loop shown
in Figure 2.1. The mod operation, when applied to positive integers, gives
the remainder obtained when an integer division is performed; thus, the
if statement tests whether n is even. Though many people have studied
this computation over a number of years, as of this writing, it is unknown
whether this loop terminates for all initial integer values of n. This question
is known as the Collatz problem.
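The loop of Figure 2.1 is easy to reproduce for experimentation. The Python sketch below (illustrative only) counts its iterations for a given starting value; whether it terminates for every positive integer remains, as noted above, an open question.

def collatz_steps(n):
    # Repeat the loop of Figure 2.1, counting iterations until n reaches 1.
    steps = 0
    while n > 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))  # 111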
On the other hand, when algorithms are designed using the top-down
approach, proving termination of any resulting while loops becomes much
easier. Even if an examination of the while loop condition does not help
us to find a proof, we should be able to derive a proof from the reduction
we used to solve the problem. Specifically, a loop results from the reduction
of larger instances of a problem to smaller instances of the same problem,
where the size of the instance is a natural number. We should therefore
be able to prove that the expression denoting the size of the instance is a
natural number that is decreased by every iteration. Termination will then
follow.
For example, consider again the algorithm IterativeInsert. In the
design of this algorithm (see Section 1.4), we reduced larger instances to
smaller instances, where the size of an instance was n, the number of array
elements. In removing the tail recursion from the algorithm, we replaced n
by j. j should therefore decrease as the size decreases. We therefore base
our correctness proof on this fact.
Let us now consider an algorithm with nested loops, such as Insertion-
Sort, shown in Figure 1.7 on page 11. When loops are nested, we apply
the same technique to each loop as we encounter it. Specifically, in order
to prove maintenance for the outer loop, we need to prove that the inner
loop satisfies some correctness property, which should in turn be sufficient to
complete the proof of maintenance for the outer loop. Thus, nested within
the maintenance step of the outer loop is a complete proof (i.e., initialization,
maintenance, termination and correctness) for the inner loop.
When we prove initialization for the inner loop, we are not simply rea-
soning about the code leading to the first execution of that loop. Rather, we
are reasoning about the code that initializes the loop on any iteration of the
outer loop. For this reason, we cannot consider the initialization code for the
outer loop when proving the initialization step for the inner loop. Instead,
because the proof for the inner loop is actually a part of the maintenance
proof for the outer loop, we can use any facts available for use in the proof
of maintenance for the outer loop. Specifically, we can use the assumption
that the invariant holds at the beginning of the outer loop iteration, and
we can reason about any code executed prior to the inner loop during this
iteration. We must then show that the invariant of the inner loop is satisfied
upon executing this code.
We will now illustrate this technique by giving a complete proof that
InsertionSort meets its specification.
Proof: We must show that when the for loop finishes, A[1..n] is a permutation of its original values in nondecreasing order.
Initialization: (Outer loop) When the loop begins, i = 1 and the contents
of A[1..n] have not been changed. Because A[1..i − 1] is an empty array, it
is in nondecreasing order.
Initialization: (Inner loop) Because A[1..n] has not been changed since
the beginning of the current iteration of the outer loop, from the outer loop
invariant, A[1..n] is a permutation of its original values. From the outer
loop invariant, A[1..i − 1] is in nondecreasing order; hence, because j = i,
we have for 1 ≤ k < k′ ≤ i, where k′ ≠ j, A[k] ≤ A[k′].
ii. A′ [j − 1] = A[j];
iv. j ′ = j − 1.
A′ [k] = A[k]; hence, from the invariant, A[k] ≤ A[j − 1]. In either case, we
conclude that A′ [k] ≤ A′ [k ′ ].
Correctness: (Inner loop) Let A′ [1..n] denote the contents of A[1..n] when
the while loop terminates, and let i and j denote their values at this point.
From the invariant, A′ [1..n] is a permutation of its original values. We must
show that A′ [1..i] is in nondecreasing order. Let 1 ≤ k < k ′ ≤ i. We consider
two cases.
This completes the proof for the inner loop, and hence the proof of
maintenance for the outer loop.
Termination: (Outer loop) Because the loop is a for loop, it must termi-
nate.
Correctness: (Outer loop) Let A′ [1..n] denote its final contents. From the
invariant, A′ [1..n] is a permutation of its original values. From the loop exit
condition (i = n + 1) and the invariant, A′ [1..n] is in nondecreasing order.
Therefore, the postcondition is satisfied.
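Although runtime checks are no substitute for a proof, it can be instructive to test a loop invariant empirically. The Python sketch below (illustrative only, 0-based indexing) asserts the outer loop invariant of InsertionSort at the start of each iteration, checking the permutation property crudely by comparing sorted copies.

def insertion_sort_checked(a):
    # Insertion sort with the outer loop invariant checked at run time.
    original = sorted(a)  # reference copy for the permutation check
    for i in range(len(a)):
        # Invariant: a is a permutation of its original values and
        # a[0..i-1] is in nondecreasing order.
        assert sorted(a) == original
        assert all(a[k] <= a[k + 1] for k in range(i - 1))
        j = i
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return a

print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]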
Note that after we have solved the Dutch national flag problem, all
elements less than p appear first in the array, followed by all elements equal
to p, followed by all elements greater than p. Furthermore, because steps 3
and 5 apply to portions of the array that do not contain p, these steps solve
strictly smaller problem instances.
In what follows, we will develop a solution to the Dutch national flag
problem. We will then combine that solution with the above reduction to
obtain a solution to the selection problem (we will simply use the speci-
fication for Median). We will then prove that the resulting algorithm is
correct.
In order to conserve resources, we will constrain our solution to the
Dutch national flag problem to rearrange items by swapping them. We will
reduce a large instance of the problem to a smaller instance. We begin by
examining the last item. If it is blue, then we can simply ignore it and solve
what is left. If it is red, we can swap it with the first item and again ignore
it and solve what is left. If it is white, we need to find out where it belongs;
hence, we temporarily ignore it and solve the remaining problem. We then
swap it with the first blue item, or if there are no blue items, we can leave
it where it is. This algorithm is shown in Figure 2.4.
If we were to implement this solution, or to analyze it using the tech-
niques of Chapter 3, we would soon discover that its stack usage is too high.
Furthermore, none of the recursive calls occur at either the beginning or the
end of the computation; hence, the recursion is not tail recursion, and we
cannot implement it bottom-up.
We can, however, use a technique called generalization that will allow
us to solve the problem using a transformation. We first observe that the only reason we must wait until after the recursive calls to increment the appropriate element of N is that the recursive call is responsible for constructing and initializing N. If instead, we could provide initial values for N[1..3] to the recursive calls, we could then incorporate the color of the last element into these initial values. We therefore generalize the problem by requiring as input initial values for the number of items of each color. The returned array will then contain values representing the number of items of the corresponding color, plus the corresponding initial value from the input. By using 0 for all three initial values, we obtain the number of each color in the entire array; hence, we have defined a more general problem.
DutchFlagTD(A[lo..hi], p)
  if hi < lo
    N ← new Array[1..3]; N[1] ← 0; N[2] ← 0; N[3] ← 0
  else if A[hi] < p
    A[lo] ↔ A[hi]; N ← DutchFlagTD(A[lo + 1..hi], p)
    N[1] ← N[1] + 1
  else if A[hi] = p
    N ← DutchFlagTD(A[lo..hi − 1], p)
    A[hi] ↔ A[lo + N[1] + N[2]]; N[2] ← N[2] + 1
  else
    N ← DutchFlagTD(A[lo..hi − 1], p); N[3] ← N[3] + 1
  return N
We can use this generalization to make two of the calls tail recursion.
In order to be able to handle a white item, though, we need to modify our
generalization slightly. Specifically, we need to know in advance where to
put a white item. In order to be able to do this, let us specify that if w is
given as the initial value for the number of white items, then the last w items
in the array are white. Note that this variation is still a generalization of
the original problem, because if w = 0, no additional constraints are placed
on the input array.
Suppose we have an instance of this more general problem. If the initial
value for the number of white items is equal to the number of elements in the
array, then we can copy the initial values into N [1..3] and return. Otherwise,
we examine the item preceding the first known white item (see Figure 2.5).
If it is red, we swap it with the first item and solve the smaller problem
obtained by ignoring the first item. If it is white, we solve the problem that
results from incrementing the initial number of white items. If it is blue, we
swap it with the last element, and solve the smaller problem obtained by
ignoring the last item. A recursive implementation of this strategy is shown
in Figure 2.6.
The way we handle the case in which an item is white is suspicious in
that the reduced instance is an array with the same number of elements.
However, note that in each case, the number of elements of unknown color
is decreased by the reduction. Thus, if we choose our definition of “size”
to be the number of elements of unknown color, then our reduction does
decrease the size of the problem in each case. Recall that our notion of
size is any natural number which decreases in all “smaller” instances. Our
reduction is therefore valid.
Figure 2.7 shows the result of eliminating the tail recursion from Dutch-
FlagTailRec, incorporating it into the selection algorithm described ear-
lier in this section, and making some minor modifications. First, lo and hi
have been replaced by 1 and n, respectively. Second, the array N has been
removed, and r, w, b are used directly instead. Finally, referring to Figure
2.6, note that when a recursive call is made, lo is incremented exactly when
r is incremented, and hi is decremented exactly when b is incremented. Because of this correspondence, r and b can play the roles of lo and hi.
SelectByMedian(A[1..n], k)
  p ← Median(A[1..n]); r ← 0; w ← 0; b ← 0
  // Invariant: r, w, b ∈ N, r + w + b ≤ n, and A[i] < p for 1 ≤ i ≤ r,
  // A[i] = p for n − b − w < i ≤ n − b, and A[i] > p for n − b < i ≤ n.
  while r + w + b < n
    j ← n − b − w
    if A[j] < p
      r ← r + 1; A[j] ↔ A[r]
    else if A[j] = p
      w ← w + 1
    else
      A[j] ↔ A[n − b]; b ← b + 1
  if r ≥ k
    return SelectByMedian(A[1..r], k)
  else if r + w ≥ k
    return p
  else
    return SelectByMedian(A[1 + r + w..n], k − (r + w))
the invariant. However, we must also take into account that the iterations
do not actually change the size of the problem instance; hence, the invari-
ant must also include a characterization of what has been done outside of
A[r + 1..n − b]. The portion to the left is where red items have been placed,
and the portion to the right is where blue items have been placed. We need
to include these constraints in our invariant.
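The partitioning loop itself is straightforward to express in a conventional language. The Python sketch below (illustrative only; 0-based indexing, not the book's code) mirrors the while loop of SelectByMedian: it rearranges a list around a given pivot value p and returns the three counts r, w, and b.

def dutch_flag_partition(a, p):
    # Rearrange list a so that elements less than p come first, then those
    # equal to p, then those greater than p; return the counts (r, w, b).
    r = w = b = 0
    n = len(a)
    while r + w + b < n:
        j = n - b - w - 1              # last element not yet classified
        if a[j] < p:
            a[j], a[r] = a[r], a[j]    # move it to the end of the red region
            r += 1
        elif a[j] == p:
            w += 1                     # it is already in place
        else:
            a[j], a[n - b - 1] = a[n - b - 1], a[j]  # move it next to the blue region
            b += 1
    return r, w, b

xs = [3, 1, 4, 1, 5, 9, 2, 6]
print(dutch_flag_partition(xs, 4), xs)  # elements < 4, then 4, then > 4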
Note that in Figure 2.7, the last line of SelectByMedian contains a
recursive call in which the first parameter is A[1 + r + w..n]. However, the
specification given in Figure 1.1 (page 4) states that the first parameter
must be of the form A[1..n]. To accommodate such a mismatch, we adopt a
convention that allows for automatic re-indexing of arrays when the specifi-
cation requires a parameter to be an array whose beginning index is a fixed
value. Specifically, we think of the sub-array A[1 + r + w..n] as an array
B[1..n − (r + w)]. B is then renamed to A when it is used as the actual parameter.
Proof: By induction on n.
call returns the kth smallest element of A[1..r], which is the kth smallest of
A[1..n].
Case 2: r < k ≤ r + w. In this case, there are fewer than k elements less
than p and at least k elements less than or equal to p. p is therefore the kth
smallest element.
Case 3: r + w < k. In this case, there are fewer than k elements less than
or equal to p. The kth smallest must therefore be greater than p. It must
therefore be in A[r + w + 1..n]. Because every element in A[1..r + w] is less
than the kth smallest, the kth smallest must be the (k − (r + w))th smallest
element in A[r + w + 1..n]. Because p is an element of A[1..n] that is not in
A[r+w+1..n], r+w+1 > 1, so that the number of elements in A[r+w+1..n]
is less than n. Let us refer to A[r + w + 1..n] as B[1..n − (r + w)]. Then
because r + w < k, 1 ≤ k − (r + w), and because k ≤ n, k − (r + w) ≤
n − (r + w). Therefore, the precondition for Select is satisfied by the
recursive call SelectByMedian(B[1..n − (r + w)], k − (r + w)). By the
Induction Hypothesis, this recursive call returns the (k − (r + w))th smallest
element of B[1..n − (r + w)] = A[r + w + 1..n]. This element is the kth
smallest of A[1..n].
In some cases, a recursive call might occur inside a loop. For such cases,
we would need to use the induction hypothesis when reasoning about the
loop. As a result, it would be impossible to separate the proof into a lemma
dealing with the loop and a theorem whose proof uses induction and the
lemma. We would instead need to prove initialization, maintenance, termi-
nation, and correctness of the loop within the induction step of the induction
proof.
MedianBySelect(A[1..n])
  return Select(A[1..n], ⌈n/2⌉)
However,
$$t_{i'} = \max\left\{\sum_{k=l}^{i'-1} A[k] \;\middle|\; 0 \le l \le i'\right\}.$$
Note that the set on the right-hand side of this last equality has one
more element than does the set on the right-hand side of the preceding
equality. This element is generated by l = i′ , which results in an empty sum
having a value of 0. All of the remaining elements are derived from values
l ≤ i′ − 1, which result in nonempty sums of elements from A[0..i]. Thus, if
A[0..i] contains only negative values, msuf ′ < ti′ . It is therefore impossible
to prove that these values are equal.
A failure to come up with a proof of correctness does not necessarily
mean the algorithm is incorrect. It may be that we have not been clever
enough to find the proof. Alternatively, it may be that an invariant has
not been stated properly, as discussed in Section 2.3. Such a failure always
reveals, however, that we do not yet understand the algorithm well enough
to prove that it is correct.
2.7 Summary
We have introduced two main techniques for proving algorithm correctness,
depending on whether the algorithm uses recursion or iteration:
• The correctness of a recursive algorithm should be shown using induc-
tion.
2.8 Exercises
Exercise 2.1 Induction can be used to prove solutions for summations. Use
induction to prove each of the following:
a. The arithmetic series:
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \qquad (2.1)$$
b. For every n ∈ N,
$$F_n = \frac{\varphi^n - (-\varphi)^{-n}}{\sqrt{5}}, \qquad (2.6)$$
where φ is the golden ratio:
$$\varphi = \frac{1 + \sqrt{5}}{2}.$$
Exercise 2.6 Prove that DotProduct, shown in Figure 1.17 on page 21,
meets its specification.
Exercise 2.7 Prove that Factorial, shown in Figure 2.9, meets its spec-
ification. n! (pronounced, “n factorial”) denotes the product 1 · 2 · · · n (0! is
defined to be 1).
Precondition: n is a Nat.
Postcondition: Returns n!.
Factorial(n)
  p ← 1
  // Invariant: p = (i − 1)!
  for i ← 1 to n
    p ← ip
  return p
MaxSumOpt2(A[0..n − 1])
  m ← 0
  // Invariant: m is the maximum of 0 and all sums of sequences A[l..h − 1]
  // such that 0 ≤ l < i and l ≤ h ≤ n.
  for i ← 0 to n − 1
    sum ← 0; p ← 0
    // Invariant: sum is the sum of the sequence A[i..k − 1], and p is
    // the maximum prefix sum of A[i..k − 1].
    for k ← i to n − 1
      sum ← sum + A[k]
      p ← Max(p, sum)
    m ← Max(m, p)
  return m
MaxSumIter2(A[0..n − 1])
  m ← 0
  // Invariant: m is the maximum of 0 and all sums of sequences A[l..h − 1]
  // such that 0 ≤ l < i and l ≤ h ≤ n.
  for i ← 0 to n
    p ← 0
    // Invariant: p is the maximum prefix sum of A[i..j − 1].
    for j ← i to n − 1
      sum ← 0
      // Invariant: sum is the sum of the sequence A[i..k − 1].
      for k ← i to j
        sum ← sum + A[k]
      p ← Max(p, sum)
    m ← Max(p, m)
  return m
Exercise 2.10 Prove that DutchFlagTD, given in Figure 2.4 (page 40),
meets its specification, given in Figure 2.3 (page 39).
* Exercise 2.14 Prove that Permutations, shown in Figure 2.13, meets its specification. Use the specifications of Copy, AppendToAll, and Factorial from Figures 1.18, 1.20, and 2.9, respectively.
InsertionSort2(A[1..n])
  for i ← 2 to n
    j ← i; t ← A[j]
    while j > 1 and A[j − 1] > t
      A[j] ← A[j − 1]; j ← j − 1
    A[j] ← t
Exercise 2.15 Prove that SwapColors, shown in Figure 2.14, meets its
specification. Note that the second conjunct in the while condition is com-
paring two boolean values; thus, it is true whenever exactly one of A[i] and
A[j] equals p.
Exercise 2.16 Figure 2.15 contains an algorithm for reducing the Dutch
national flag problem to the problem solved in Figure 2.14. However, the
algorithm contains several errors. Work through a proof that this algorithm
meets its specification (given in Figure 2.3 on page 39), pointing out each
place at which the proof fails. At each of these places, suggest a small change
that could be made to correct the error. In some cases, the error might be
in the invariant, not the algorithm itself.
* Exercise 2.17 Reduce the sorting problem to the Dutch national flag
problem and one or more smaller instances of itself.
DutchFlagFiveBands(A[lo..hi], p)
  i ← lo; j ← lo; k ← hi; l ← hi
  // Invariant: lo ≤ i ≤ j ≤ hi, lo ≤ k ≤ l ≤ hi, A[lo..i − 1] all equal p,
  // A[i..j − 1] are all less than p, A[k..l] are all greater than p, and
  // A[l + 1..hi] all equal p.
  while j < k
    if A[j] < p
      j ← j + 1
    else if A[j] = p
      A[j] ↔ A[i]; i ← i + 1; j ← j + 1
    else if A[k] = p
      A[k] ↔ A[l]; k ← k − 1; l ← l − 1
    else if A[k] > p
      k ← k − 1
    else
      A[j] ↔ A[k]; j ← j + 1; k ← k − 1
  N ← new Array[1..3]
  N[1] ← j − i; N[2] ← i − lo + hi − l; N[3] ← l − k
  SwapColors(A[lo..j], p)
  SwapColors(A[k..hi], p)
  return N[1..3]
Chapter 3
Analyzing Algorithms
In Chapter 1, we saw that different algorithms for the same problem can have
dramatically different performance. In this chapter, we will introduce tech-
niques for mathematically analyzing the performance of algorithms. These
analyses will enable us to predict, to a certain extent, the performance of
programs using these algorithms.
3.1 Motivation
Perhaps the most common performance measure of a program is its running
time. The running time of a program depends not only on the algorithms it
uses, but also on such factors as the speed of the processor(s), the amount
of main memory available, the speeds of devices accessed, and the impact
of other software utilizing the same resources. Furthermore, the same algo-
rithm can perform differently when coded in different languages, even when
all other factors remain unchanged. When analyzing the performance of an
algorithm, we would like to learn something about the running time of any
of its implementations, regardless of the impact of these other factors.
Suppose we divide an execution of an algorithm into a sequence of steps,
each of which does some fixed amount of work. For example, a step could be
comparing two values or performing a single arithmetic operation. Assuming
the values used are small enough to fit into a single machine word, we could
reasonably expect that any processor could execute each step in a bounded
amount of time. Some of these steps might be faster than others, but for
any given processor, we should be able to identify both a lower bound l > 0
and an upper bound u ≥ l on the amount of time required for any single
execution step, assuming no other programs are being executed by that processor.
growth rates. Typically, f will be fairly simple, e.g., f(n) = n². In this way, we will be able to describe the growth rates of complicated — or even unknown — functions using well-understood functions like n².
The above definition formally defines big-O notation. Let us now dissect
this definition to see what it means. We start with some specific function f
which maps natural numbers to nonnegative real numbers. O(f (n)) is then
defined to be a set whose elements are all functions. Each of the functions in
O(f (n)) maps natural numbers to nonnegative real numbers. Furthermore,
if we consider any function g(n) in O(f (n)), then for every sufficiently large
n (i.e., n ≥ n0 ), g(n) cannot exceed f (n) by more than some fixed constant
factor (i.e., g(n) ≤ cf (n)). Thus, all of the functions in O(f (n)) grow no
faster than some constant multiple of f as n becomes sufficiently large. Note
that the constants n0 and c may differ for different f and g, but are the same
for all n.
Notice that big-O notation is defined solely in terms of mathematical
functions — not in terms of algorithms. Presently, we will show how it
can be used to analyze algorithms. First, however, we will give a series of
examples illustrating some of its mathematical properties.
Example 3.2 Let f(n) = n², and let g(n) = 2n². Then g(n) ∈ O(f(n)) because g(n) ≤ 2f(n) for every n ≥ 0. Here, the constant n0 is 0, and the constant c is 2.
Example 3.3 Let f(n) = n², and let g(n) = 3n + 10. We wish to show that g(n) ∈ O(f(n)). Hence, we need to find a positive real number c and a natural number n0 such that 3n + 10 ≤ cn² whenever n ≥ n0. If n > 0, we can divide both sides of this inequality by n, obtaining an equivalent inequality, 3 + 10/n ≤ cn. The left-hand side of this inequality is maximized when n is minimized. Because we have assumed n > 0, 1 is the minimum value of n. Thus, if we can satisfy cn ≥ 13, the original inequality will be satisfied. This inequality can be satisfied by choosing c = 13 and n ≥ 1. Therefore, g(n) ∈ O(f(n)).
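Constants such as these are easy to sanity-check numerically. The following Python snippet (a spot check over a finite range, not a proof) verifies the inequality using the constants found above.

# Spot check (not a proof): 3n + 10 <= 13 * n**2 for n = 1, 2, ..., 999,
# using the constants c = 13 and n0 = 1 found in Example 3.3.
for n in range(1, 1000):
    assert 3 * n + 10 <= 13 * n ** 2
print("inequality holds for n = 1, ..., 999")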
Example 3.5 1000 ∈ O(1). Here, 1000 and 1 denote constant functions
— functions whose values are the same for all n. Thus, for every n ≥ 0,
1000 ≤ 1000(1).
Example 3.6 O(n) ⊆ O(n²); i.e., every function in O(n) is also in O(n²). To see this, note that for any function f(n) ∈ O(n), there exist a positive real number c and a natural number n0 such that f(n) ≤ cn whenever n ≥ n0. Furthermore, n ≤ n² for all n ∈ N. Therefore, f(n) ≤ cn² whenever n ≥ n0.
Example 3.7 O(n²) = O(4n² + 7n); i.e., the sets O(n²) and O(4n² + 7n) contain exactly the same functions. It is easily seen that O(n²) ⊆ O(4n² + 7n) using an argument similar to that of Example 3.6. Consider any function f(n) ∈ O(4n² + 7n). There exist a positive real number c and a natural number n0 such that f(n) ≤ c(4n² + 7n) whenever n ≥ n0. Furthermore, 4n² + 7n ≤ 11n² for all n ∈ N. Letting c′ = 11c, we therefore have f(n) ≤ c′n² whenever n ≥ n0. Therefore, f(n) ∈ O(n²). Note that although O(n²) and O(4n² + 7n) denote the same set of functions, the preferred notation is O(n²) because it is simpler.
Let us now illustrate the use of big-O notation by analyzing the run-
ning time of MaxSumBU from Figure 1.14 on page 18. The initialization
statements prior to the loop, including the initialization of the loop index i,
require a fixed number of steps. Their running time is therefore bounded by
some constant a. Likewise, the number of steps required by any single iter-
ation of the loop (including the loop test and the increment of i) is bounded
by some constant b. Because the loop iterates n times, the total number of
steps required by the loop is at most bn. Finally, the last loop condition
test and the return statement require a number of steps bounded by some
constant c. The running time of the entire algorithm is therefore bounded
by a + bn + c, where a, b, and c are fixed positive constants. The running
time of MaxSumBU is in O(n), because a + bn + c ≤ (a + b + c)n for all
n ≥ 1.
We can simplify the above analysis somewhat using the following theo-
rem.
Theorem 3.8 Suppose f1 (n) ∈ O(g1 (n)) and f2 (n) ∈ O(g2 (n)). Then
Proof: Because f1(n) ∈ O(g1(n)) and f2(n) ∈ O(g2(n)), there exist positive real numbers c1 and c2 and natural numbers n1 and n2 such that
f1(n) ≤ c1 g1(n) whenever n ≥ n1, (3.1)
and
f2(n) ≤ c2 g2(n) whenever n ≥ n2. (3.2)
Because both of the above inequalities involve only nonnegative numbers,
we may multiply the inequalities, obtaining f1(n)f2(n) ≤ c1 c2 g1(n)g2(n) whenever n ≥ max(n1, n2).
The statements before and after the loop are each in O(1). The total running time is then the sum of the
running times of these segments and that of the loop. By applying Theorem
3.8 part 2 twice, we see that the running time of the algorithm is in O(n)
(because max(1, n) ≤ n whenever n ≥ 1).
Recall that the actual running time of the program implementing Max-
SumOpt (Figure 1.11 on page 15) was much slower than that of Max-
SumBU. Let us now analyze MaxSumOpt to see why this is the case.
We will begin with the inner loop. It is easily seen that each iteration
runs in O(1) time. The number of iterations of this loop varies from 1 to n.
Because the number of iterations is in O(n), we can conclude that this loop
runs in O(n) time. It is then easily seen that a single iteration of the outer
loop runs in O(n) time. Because the outer loop iterates n times, this loop,
and hence the entire algorithm, runs in O(n²) time.
It is tempting to conclude that this analysis explains the difference in
running times of the implementations of the algorithms; i.e., because n²
grows much more rapidly than does n, MaxSumOpt is therefore much
slower than MaxSumBU. However, this conclusion is not yet warranted,
because we have only shown upper bounds on the running times of the two
algorithms. In particular, it is perfectly valid to conclude that the running
time of MaxSumBU is in O(n²), because O(n) ⊆ O(n²). Conversely, we
have not shown that the running time of MaxSumOpt is not in O(n).
In general, big-O notation is useful for expressing upper bounds on the
growth rates of functions. In order to get a complete analysis, however, we
need additional notation for expressing lower bounds.
g(n) ∈ Ω(f (n)). We therefore need to find a positive real number c and a
natural number n0 such that n² ≥ c(3n + 10) for every n ≥ n0 . We have
already found such values in Example 3.3: c = 1/13 and n0 = 1.
The above example illustrates a duality between O and Ω, namely, that
for any positive real number c, g(n) ≤ cf (n) iff f (n) ≥ g(n)/c. The following
theorem summarizes this duality.
Theorem 3.11 Let f : N → R≥0 and g : N → R≥0 . Then g(n) ∈ O(f (n))
iff f (n) ∈ Ω(g(n)).
By applying Theorem 3.11 to Examples 3.2, 3.4, 3.6, and 3.7, we can see
that n² ∈ Ω(2n²), n² ∉ Ω(n³), Ω(n²) ⊆ Ω(n), and Ω(n²) = Ω(4n² + 7n).
When we analyze the growth rate of a function g, we would ideally like
to find a simple function f such that g(n) ∈ O(f (n)) and g(n) ∈ Ω(f (n)).
Doing so would tell us that the growth rate of g(n) is the same as that of
f (n), within a constant factor in either direction. We therefore have another
notation for expressing such results.
In other words, Θ(f (n)) is the set of all functions belonging to both
O(f (n)) and Ω(f (n)) (see Figure 3.1). We can restate this definition by
the following theorem, which characterizes Θ(f (n)) in terms similar to the
definitions of O and Ω.
Theorem 3.13 g(n) ∈ Θ(f(n)) iff there exist positive constants c1 and c2 and a natural number n0 such that
c1 f(n) ≤ g(n) ≤ c2 f(n)
whenever n ≥ n0.
⇒: Suppose g(n) ∈ Θ(f (n)). Then g(n) ∈ O(f (n)) and g(n) ∈ Ω(f (n)).
By the definition of Ω, there exist a positive real number c1 and a natural
number n1 such that c1 f (n) ≤ g(n) whenever n ≥ n1 . By the definition of
O, there exist a positive real number c2 and a natural number n2 such that g(n) ≤ c2 f(n) whenever n ≥ n2.
Figure 3.1 Venn diagram depicting the relationships between the sets O(f(n)), Ω(f(n)), and Θ(f(n))
Corollary 3.14 Let f : N → R≥0 and g : N → R≥0 . Then g(n) ∈ Θ(f (n))
iff f (n) ∈ Θ(g(n)).
Let us now use these definitions to continue the analysis of MaxSumBU.
The analysis follows the same outline as the upper bound analysis; hence,
we need the following theorem, whose proof is left as an exercise.
Theorem 3.15 Suppose f1 (n) ∈ Ω(g1 (n)) and f2 (n) ∈ Ω(g2 (n)). Then
Corollary 3.16 Suppose f1 (n) ∈ Θ(g1 (n)) and f2 (n) ∈ Θ(g2 (n)). Then
We can now use (2.1) from page 49 to conclude that the number of steps
taken by the outer loop is
$$b + \frac{an(n+1)}{2} \in \Theta(n^2).$$
Therefore, the running time of the algorithm is in Θ(n²).
This is a rather tedious analysis for such a simple algorithm. Fortunately,
there are techniques for simplifying analyses. In the next two sections, we
will present some of these techniques.
• f ◦ A = {f ◦ g | g ∈ A};
• A ◦ f = {g ◦ f | g ∈ A}; and
• A ◦ B = {g ◦ h | g ∈ A, h ∈ B}.
Example 3.18 n² + Θ(n³) is the set of all functions that can be written n² + g(n) for some g(n) ∈ Θ(n³). This set includes such functions as:
• n² + 3n³;
• (n³ + 1)/2, which can be written n² + ((n³ + 1)/2 − n²) (note that (n³ + 1)/2 − n² ≥ 0 for all natural numbers n); and
• n³ + 2n, which can be written n² + (n³ + 2n − n²).
Because all functions in this set belong to Θ(n³), n² + Θ(n³) ⊆ Θ(n³).
Example 3.19 O(n²) + O(n³) is the set of functions that can be written f(n) + g(n), where f(n) ∈ O(n²) and g(n) ∈ O(n³). Functions in this set include:
• 2n² + 3n³;
• 2n, which can be written as n + n; and
• 2n³, which can be written as 0 + 2n³.
Because all functions in this set belong to O(n³), O(n²) + O(n³) ⊆ O(n³).
Example 3.21
$$\sum_{i=1}^{n} \Theta(i^2)$$
is the set of all functions of the form
$$\sum_{i=1}^{n} f(i)$$
for some h(n) ∈ Θ(n). Note that because h(0) may have any nonnegative
value, so may f (0).
We can use the above definitions to simplify our analysis of the lower
bound for MaxSumOpt. Instead of introducing the constant a to represent
the running time of a single iteration of the inner loop, we can simply use
Ω(1) to represent the lower bound for this running time. We can therefore
conclude that the total running time of the inner loop is in Ω(n − i). Using
Definition 3.20, we can then express the running time of the outer loop, and
hence, of the entire algorithm, as being in
$$\sum_{i=0}^{n-1} \Omega(n-i).$$
for some smooth function f . The following theorem, whose proof is outlined
in Exercise 3.10, can then be applied.
• f (n) + g(n);
• f (n)g(n);
• f^c(n); and
$$2^{\lg x} = x \qquad (3.4)$$
Again, this summation does not immediately fit the form of Theorem
3.28, as the starting value of the summation index j is i, not 1. Furthermore,
j − i is not a function of j. Notice that the expression j − i takes on the
values 0, 1, . . . , n − i. We can therefore rewrite this sum as
$$\sum_{j=i}^{n} \Theta(j-i) = \sum_{j=1}^{n-i+1} \Theta(j-1).$$
However, we must be careful, because we have not shown that the while
loop runs in Ω(i) time for every iteration of the for loop; hence the running
time of the for loop might not be in
∑_{i=1}^{n} Θ(i).
We must show that there are inputs of size n, for every sufficiently large
n, such that the while loop iterates i − 1 times for each iteration of the for
loop. It is not hard to show that an array of distinct elements in decreasing
order will produce the desired behavior. Therefore, the algorithm indeed
operates in Θ(n2 ) time.
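To make the worst-case argument concrete, here is a small Python sketch (an insertion-sort-style loop, chosen as a stand-in for the algorithm discussed above; the names are mine). On an array of distinct elements in decreasing order, the inner while loop runs i times for each value of i, so the total count is n(n − 1)/2, which is in Θ(n²).

def insertion_sort_iterations(a):
    """Sort a in place; return the total number of while-loop iterations."""
    count = 0
    for i in range(1, len(a)):           # outer for loop
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:    # runs i times on a decreasing array
            a[j] = a[j - 1]
            j -= 1
            count += 1
        a[j] = x
    return count

n = 8
print(insertion_sort_iterations(list(range(n, 0, -1))))   # n(n-1)/2 = 28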
where g(n) ∈ Θ(1) is the worst-case running time of the body of the function,
excluding the recursive call. Note that f (n − 1) has already been defined to
be the worst-case running time of MaxSuffixTD on an array of size n − 1;
hence, f (n − 1) gives the worst-case running time of the recursive call.
The solution of arbitrary recurrences is beyond the scope of this book.
However, asymptotic solutions are often much simpler to obtain than are
exact solutions. First, we observe that (3.5) can be simplified using set
operations:
f (n) ∈ f (n − 1) + Θ(1) (3.6)
for n > 0.
It turns out that most of the recurrences that we derive when analyzing
algorithms fit into a few general forms. With asymptotic solutions to these
general forms, we can analyze recursive algorithms without using a great
deal of detailed mathematics. (3.6) fits one of the most basic of these forms.
The following theorem, whose proof is outlined in Exercise 3.20, gives the
asymptotic solution to this form.
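As an illustration of this form of recurrence, the following Python sketch computes a maximum suffix sum top-down, in the spirit of MaxSuffixTD (this is my own rendering, not the book's pseudocode). Each call performs constant work plus one recursive call on a problem of size n − 1, so its worst-case running time f satisfies f(n) ∈ f(n − 1) + Θ(1), and the theorem gives f(n) ∈ Θ(n).

def max_suffix_sum(a, n):
    """Maximum sum of a suffix of a[0..n-1]; the empty suffix counts as 0."""
    if n == 0:
        return 0
    # Constant work plus one recursive call on size n - 1:
    # f(n) is in f(n - 1) + Theta(1), so f(n) is in Theta(n).
    return max(0, max_suffix_sum(a, n - 1) + a[n - 1])

print(max_suffix_sum([-2, 1, -3, 4, -1, 2], 6))   # 5, from the suffix [4, -1, 2]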
resides in neither half. However, notice that such a sequence that resides in
neither half can be expressed as a suffix of the first half followed by a prefix
of the last half; e.g., ⟨3, −2, 7⟩ can be expressed as ⟨3, −2⟩ followed by ⟨7⟩.
Let us define the maximum prefix sum analogously to the maximum
suffix sum as follows:
max{ ∑_{k=0}^{i−1} A[k] | 0 ≤ i ≤ n }.
It is not hard to see that the maximum sum of any sequence crossing the
boundary is simply the maximum suffix sum of the first half plus the max-
imum prefix sum of the second half. For example, returning to Example
1.1, the maximum suffix sum of the first half is 1, obtained from the suffix
⟨3, −2⟩. Likewise, the maximum prefix sum of the second half is 7, obtained from the prefix ⟨7⟩. The sum of these two values gives us 8, the maximum
subsequence sum.
Note that when we create smaller instances by splitting the array in half,
one of the two smaller instances — the upper half — does not begin with
index 0. For this reason, let us describe the input array more generally, as
A[lo..hi]. We can then modify the definitions of maximum subsequence sum,
maximum suffix sum, and maximum prefix sum by replacing 0 with lo and
n − 1 with hi. We will discuss the ranges of lo and hi shortly.
We must be careful that each recursive call is of a strictly smaller size.
We wish to divide the array in half, as nearly as possible. We begin by
finding the midpoint between lo and hi; i.e.,
mid = ⌊(lo + hi)/2⌋.
Note that if hi > lo, then lo ≤ mid < hi. In this case, we can split
A[lo..hi] into A[lo..mid] and A[mid + 1..hi], and both sub-arrays are smaller
than the original. However, a problem occurs when lo = hi — i.e., when the
array contains only one element — because in this case mid = hi. In fact,
it is impossible to divide an array of size 1 into two subarrays, each smaller
than the original. Fortunately, it is easy to solve a one-element instance
directly. Furthermore, it now makes sense to consider an empty array as a
special case, because it can only occur when we begin with an empty array,
and not as a result of dividing a nonempty array in half. We will therefore
require in our precondition that lo ≤ hi, and that both are natural numbers.
We can compute the maximum suffix sum as in MaxSumBU (see Figure
1.14 on page 18), and the maximum prefix sum in a similar way. The entire
algorithm is shown in Figure 3.3. Note that the specification has been
changed from the one given in Figure 1.9. However, it is a trivial matter to
give an algorithm that takes as input A[0..n−1] and calls MaxSumDC if n >
0, or returns 0 if n = 0. Such an algorithm would satisfy the specification
given in Figure 1.9.
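A Python sketch of this divide-and-conquer strategy is given below (the function name, the inline suffix and prefix computations, and the sample array are mine, not those of Figure 3.3). It solves each half recursively and combines the results using the maximum suffix sum of the first half plus the maximum prefix sum of the second half.

def max_sum_dc(a, lo, hi):
    """Maximum subsequence sum of a[lo..hi] (inclusive), assuming lo <= hi."""
    if lo == hi:
        return max(0, a[lo])
    mid = (lo + hi) // 2
    left = max_sum_dc(a, lo, mid)
    right = max_sum_dc(a, mid + 1, hi)
    # Maximum suffix sum of a[lo..mid], computed right to left.
    suffix = best_suffix = 0
    for i in range(mid, lo - 1, -1):
        suffix += a[i]
        best_suffix = max(best_suffix, suffix)
    # Maximum prefix sum of a[mid+1..hi], computed left to right.
    prefix = best_prefix = 0
    for i in range(mid + 1, hi + 1):
        prefix += a[i]
        best_prefix = max(best_prefix, prefix)
    return max(left, right, best_suffix + best_prefix)

print(max_sum_dc([-1, 3, -2, 7, -9, 7, -3, -2], 0, 7))   # 8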
This algorithm contains two recursive calls on arrays of size ⌊n/2⌋ and
⌈n/2⌉, respectively. In addition, it calls MaxSuffixBU on an array of size
⌊n/2⌋ and MaxPrefixBU on an array of size ⌈n/2⌉. These two algorithms
are easily seen to have running times in Θ(n); hence, if f (n) denotes the
worst-case running time of MaxSumDC on an array of size n, we have
f(n) ∈ f(⌊n/2⌋) + f(⌈n/2⌉) + Θ(n)   (3.7)
for n > 1.
This equation does not fit the form of Theorem 3.31. However, suppose we focus only on those values of n that are powers of 2; i.e., let n = 2^k for some k > 0, and let g(k) = f(2^k) = f(n). Then
g(k) = f(2^k) ∈ 2f(2^(k−1)) + Θ(2^k) = 2g(k − 1) + Θ(2^k)   (3.8)
for k > 0. Theorem 3.31 applies to (3.8), yielding g(k) ∈ Θ(k 2^k). Because n = 2^k, we have k = lg n, so that
f(n) ≤ c2(k + 1)2^(k+1) ≤ c2 d k 2^k ≤ c2 d n lg n ∈ O(n lg n).
Likewise,
f(n) ≥ c1 k 2^k ≥ c1(k + 1)2^(k+1)/d ≥ c1 n lg n/d ∈ Ω(n lg n).
Let us first see that (3.7) fits the form of Theorem 3.32. As we have
already observed, f is eventually nondecreasing (this requirement is typically
met by recurrences obtained in the analysis of algorithms). When n = 2k ,
(3.7) simplifies to
f (n) ∈ 2f (n/2) + Θ(n).
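A quick numerical check of this form is sometimes reassuring. The sketch below (assuming, arbitrarily, that the Θ(n) term is exactly n and that f(1) = 1) evaluates the recurrence for powers of 2 and divides by n lg n; the ratios settle toward a constant, consistent with the Θ(n lg n) solution.

import math

def f(n):
    """Evaluate f(n) = 2 f(n/2) + n with f(1) = 1, for n a power of 2."""
    return 1 if n == 1 else 2 * f(n // 2) + n

for k in range(1, 21, 5):
    n = 2 ** k
    print(n, f(n) / (n * math.log2(n)))   # ratios approach 1 from above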
g′(n) = g(2^n) = lg 2^n = n
g′(n) = g(5 · 3^n) = lg²(5 · 3^n) = (lg 5 + n lg 3)²
Precondition: A[1..m, 1..n] and B[1..m, 1..n] are arrays of Numbers, and
m and n are positive Nats.
Postcondition: Returns the sum of A[1..m, 1..n] and B[1..m, 1..n]; i.e.,
returns the array C[1..m, 1..n] such that for 1 ≤ i ≤ m and 1 ≤ j ≤ n,
C[i, j] = A[i, j] + B[i, j].
AddMatrices(A[1..m, 1..n], B[1..m, 1..n])
C ← new Array[1..m, 1..n]
for i ← 1 to m
for j ← 1 to n
C[i, j] ← A[i, j] + B[i, j]
return C[1..m, 1..n]
for n > 0, so that f (n) ∈ Θ(n). The total space used is therefore in Θ(n) +
Θ(n) = Θ(n).
Now let’s consider MaxSumDC. MaxSuffixBU and MaxPrefixBU
each use Θ(1) space. Because the two recursive calls can reuse the same
space, the total space usage is given by
f(n) ∈ f(⌈n/2⌉) + Θ(1)
for n > 1. Applying Theorem 3.32, we see that f(n) ∈ Θ(lg n). Because
lg n is such a slow-growing function (e.g., lg 106 < 20), we can see that
MaxSumDC is a much more space-efficient algorithm than MaxSumTD.
Because the space used by both algorithms is almost entirely from the run-
time stack, MaxSumDC will not have the stack problems that MaxSumTD
has.
algorithm because the first inner loop will always execute once, assuming m
is a Nat.
Thus, we can see that if we want to retain the properties of asymptotic
notation on a single variable, we must extend it to multiple variables in
a way that is not straightforward. Unfortunately, the situation is worse
than this — it can be shown that it is impossible to extend the notation to
multiple variables in a way that retains the properties of asymptotic notation
on a single variable. What we can do, however, is to extend it so that
these properties are retained whenever the function inside the asymptotic
notation is strictly nondecreasing. (We say that a function f : N × N → R≥0 is strictly nondecreasing if, for every m ∈ N and n ∈ N, f(m, n) ≤ f(m + 1, n) and f(m, n) ≤ f(m, n + 1).) Note that restricting the functions in this way does not avoid the problems discussed above, as the functions inside the asymptotic notation in this discussion are all strictly nondecreasing. We therefore must use some less straightforward extension.
The definition we propose for O(f(m, n)) considers all values of a function g(m, n), rather than ignoring values when m and/or n are small. However, it allows even infinitely many values of g(m, n) to be large in comparison to f(m, n), provided that they are not too large in comparison to the overall growth rate of f. In order to accomplish these goals, we first give the following definition.
and
ĝ(m, n) ≥ c f̂(m, n)
whenever m ≥ n0 and n ≥ n0.
1. O(f(m, n)) is the set of all functions g : N × N → R≥0 such that there exist c ∈ R>0 and n0 ∈ N such that
ĝ(m, n) ≤ c f̂(m, n)
whenever m ≥ n0 and n ≥ n0.
2. Ω(f(m, n)) is the set of all functions g : N × N → R≥0 such that there exist c ∈ R>0 and n0 ∈ N such that
g(m, n) ≥ c f(m, n)
whenever m ≥ n0 and n ≥ n0.
Proof: From the definitions, for any function g(m, n) in O(f (m, n)) or in
Ω(f (m, n)), respectively, there are a c ∈ R>0 and an n0 ∈ N such that when-
ever m ≥ n0 and n ≥ n0 , the corresponding inequality above is satisfied.
We therefore only need to show that if there are c ∈ R>0 and n0 ∈ N such
f̂(m, n) = f(m, n).
g(m, n) ≤ ĝ(m, n) ≤ c f̂(m, n) = c f(m, n).
ĝ(m, n) ≥ g(m, n) ≥ c f(m, n) = c f̂(m, n).
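The hat functions used in these definitions are easy to compute directly from the maximum over all smaller argument pairs. The following Python sketch is purely illustrative (f is assumed to be given as an ordinary Python function):

def hat(f, m, n):
    """Return f-hat(m, n) = max{ f(i, j) : 0 <= i <= m, 0 <= j <= n }."""
    return max(f(i, j) for i in range(m + 1) for j in range(n + 1))

# A function that is not nondecreasing in m; its hat smooths out the dips.
f = lambda m, n: (m % 3) * n
print(f(3, 5), hat(f, 3, 5))   # prints: 0 10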
Theorem 3.41 Suppose f1 (m, n) ∈ O(g1 (m, n)) and f2 (m, n) ∈ O(g2 (m, n)),
where g1 and g2 are strictly nondecreasing. Then
Proof: We will only show part 1; part 2 will be left as an exercise. Because
f1 (m, n) ∈ O(g1 (m, n)) and f2 (m, n) ∈ O(g2 (m, n)), there exist positive
real numbers c1 and c2 and natural numbers n1 and n2 such that whenever
m ≥ n1 and n ≥ n1 ,
f̂1(m, n) ≤ c1 ĝ1(m, n),
and whenever m ≥ n2 and n ≥ n2,
f̂2(m, n) ≤ c2 ĝ2(m, n).
By definition,
(f1 f2)^(m, n) = max{f1(i, j)f2(i, j) | 0 ≤ i ≤ m, 0 ≤ j ≤ n},
so that
(f1 f2)^(m, n) ≤ f̂1(m, n) f̂2(m, n).
Therefore, whenever m and n are both at least max(n1, n2),
(f1 f2)^(m, n) ≤ f̂1(m, n) f̂2(m, n)
≤ c1 ĝ1(m, n) c2 ĝ2(m, n)
= c1 c2 ĝ1(m, n) ĝ2(m, n)
= c1 c2 (g1 g2)^(m, n),
Theorem 3.42 Suppose f1 (m, n) ∈ Ω(g1 (m, n)) and f2 (m, n) ∈ Ω(g2 (m, n)),
where g1 and g2 are strictly nondecreasing. Then
1. f1 (m, n)f2 (m, n) ∈ Ω(g1 (m, n)g2 (m, n)); and
Corollary 3.43 Suppose f1 (m, n) ∈ Θ(g1 (m, n)) and f2 (m, n) ∈ Θ(g2 (m, n)),
where g1 and g2 are strictly nondecreasing. Then
Before we can extend Theorem 3.28 to more than one variable, we must
first extend the definition of smoothness. In order to do this, we must first
extend the definitions of eventually nondecreasing and eventually positive.
Having the above theorems, we can now complete the analysis of Add-
Matrices. Because we are analyzing the algorithm with respect to two
parameters, we view n as the 2-variable function f (m, n) = n, and we view
m as the 2-variable function g(m, n) = m. We can then apply Corollary 3.43
to Θ(m)Θ(n) to obtain a running time in Θ(mn). Alternatively, because n
is smooth, we could apply Theorem 3.47 to obtain
∑_{i=1}^{m} Θ(n) ⊆ Θ(mn).
The results from this section give us the tools we need to analyze it-
erative algorithms with two natural parameters. Furthermore, all of these
results can be easily extended to more than two parameters. Recursive
algorithms, however, present a greater challenge. In order to analyze recur-
sive algorithms using more than one natural parameter, we need to be able
to handle asymptotic recurrences in more than one variable. This topic is
beyond the scope of this book.
Definition 3.48 Let f : N → R≥0. o(f(n)) (pronounced “little-oh of f of n”) is the set of all functions g : N → R≥0 such that for every positive real number c, there is a natural number n0 such that g(n) < cf(n) whenever n ≥ n0.
Definition 3.49 Let f : N → R≥0. ω(f(n)) (pronounced “little-omega of f of n”) is the set of all functions g : N → R≥0 such that for every positive real number c, there is a natural number n0 such that g(n) > cf(n) whenever n ≥ n0.
Figure 3.6 Venn diagram depicting the relationships between the sets
O(f (n)), Ω(f (n)), Θ(f (n)), o(f (n)), and ω(f (n))
Proof: We will only prove part 1; the proof of part 2 is symmetric. Let
g(n) ∈ o(f (n)), and let c be any positive real number. Then there is a natural
number n0 such that g(n) < cf (n) whenever n ≥ n0 . Hence, g(n) ∈ O(f (n)).
Furthermore, because the choice of c is arbitrary, we can conclude that g(n) ∉ Ω(f(n)); hence, g(n) ∉ Θ(f(n)).
It may seem at this point that the above theorem could be strengthened
to say that o(f (n)) = O(f (n)) \ Θ(f (n)) and ω(f (n)) = Ω(f (n)) \ Θ(f (n)).
Indeed, for functions f and g that we typically encounter in the analysis
of algorithms, it will be the case that if g(n) ∈ O(f (n)) \ Θ(f (n)) then
g(n) ∈ o(f (n)). However, there are exceptions. For example, let f (n) = n,
and let g(n) = 2^(2^⌊lg lg n⌋). Then g(n) ∈ O(f(n)) because g(n) ≤ f(n) for all n ∈ N. Furthermore, when n = 2^(2^k) − 1 for k > 0, g(n) = 2^(2^(k−1)) = √(n + 1); hence, g(n) ∉ Θ(f(n)). Finally, when n = 2^(2^k), g(n) = n, so g(n) ∉ o(f(n)).
Note that we have the same duality between o and ω as between O and
Ω. We therefore have the following theorem.
Theorem 3.52 Let f : N → R≥0 and g : N → R≥0 . Then g(n) ∈ o(f (n))
iff f (n) ∈ ω(g(n)).
The following theorems express relationships between common functions
using o-notation.
Theorem 3.53 Let p, q ∈ R≥0 such that p < q, and suppose f(n) ∈ O(n^p) and g(n) ∈ Ω(n^q). Then f(n) ∈ o(g(n)).
Proof: Because f(n) ∈ O(n^p), there exist a positive real number c1 and a natural number n1 such that
f(n) ≤ c1 n^p   (3.11)
whenever n ≥ n1. Because g(n) ∈ Ω(n^q), there exist a positive real number c2 and a natural number n2 such that
g(n) ≥ c2 n^q   (3.12)
whenever n ≥ n2. Combining (3.11) and (3.12), we have
f(n) ≤ c1 g(n)/(c2 n^(q−p))
whenever n ≥ max(n1, n2). Let c be an arbitrary positive real number. Let n0 = max(n1, n2, ⌈(c1/(c2 c))^(1/(q−p))⌉) + 1. Then when n ≥ n0, n^(q−p) > c1/(c2 c) because q > p. We therefore have
f(n) ≤ c1 g(n)/(c2 n^(q−p)) < c g(n).
Therefore, f (n) ∈ o(g(n)).
limn→∞ f(n) = u
if for every positive real number c, there is a natural number n0 such that |f(n) − u| < c whenever n ≥ n0. Likewise, for a function g : R≥0 → R, we say that
limx→∞ g(x) = u
if for every positive real number c, there is a real number x0 such that |g(x) − u| < c whenever x ≥ x0.
Note that for f : N → R and g : R≥0 → R, if f (n) = g(n) for every
n ∈ N, it follows immediately from the above definition that
limn→∞ f(n) = limx→∞ g(x)
whenever the latter limit exists. It is also possible to define infinite limits,
but for our purposes we only need finite limits as defined above. Given this
definition, we can now formally relate limits to asymptotic notation.
Note that part 1 is an “if and only if”, whereas part 2 is an “if”. The
reason for this is that there are four possibilities, given arbitrary f and g:
1. limn→∞ g(n)/f (n) = 0. In this case g(n) ∈ o(f (n)) and f (n) ∈
ω(g(n)).
2. limn→∞ f(n)/g(n) = 0. In this case, f(n) ∈ o(g(n)) and g(n) ∈ ω(f(n)).
3. limn→∞ g(n)/f(n) = x > 0. In this case, g(n) ∈ Θ(f(n)) and f(n) ∈ Θ(g(n)). (Note that limn→∞ f(n)/g(n) = 1/x > 0.)
4. Neither limn→∞ g(n)/f(n) nor limn→∞ f(n)/g(n) exists. In this case, we can only conclude that g(n) ∉ o(f(n)) and f(n) ∉ o(g(n)); we do not have enough information to determine whether g(n) ∈ Θ(f(n)).
2. Suppose limn→∞ g(n)/f (n) = x > 0. Then for every positive real
number c, there is a natural number n0 such that
|g(n)/f(n) − x| < c, i.e., (x − c)f(n) < g(n) < (x + c)f(n),
whenever n ≥ n0. Because these inequalities hold for every positive real number c, and
because x > 0, we may choose c = x/2, so that both x − c and x + c
are positive. Therefore, g(n) ∈ Θ(f (n)).
A powerful tool for evaluating limits of the form given in Theorem 3.56
is L’Hôpital’s rule, which we present without proof in the following theorem.
We first note that because both lg x and x^(q/p) are nondecreasing and unbounded (because q and p are both positive), limx→∞ 1/lg x = 0 and limx→∞ 1/x^(q/p) = 0. In order to compute the derivative of lg x, we first observe that lg x ln 2 = ln x, where ln denotes the natural logarithm, or base-e logarithm, where e ≈ 2.718. Thus, the derivative of lg x is 1/(x ln 2). The derivative of x^(q/p) is (q/p)x^(q/p − 1).
3.12 Summary
Asymptotic notation can be used to express the growth rates of functions
in a way that ignores constant factors and focuses on the behavior as the
function argument increases. We can therefore use asymptotic notation to
analyze performance of algorithms in terms of such measures as worst-case
running time or space usage. O and Ω are used to express upper and lower
bounds, respectively, while Θ is used to express the fact that the upper and
lower bounds are tight. o gives us the ability to abstract away low-order
terms when we don’t want to ignore constant factors. ω provides a dual for
o.
Analysis of iterative algorithms typically involves summations. Theorem
3.28 gives us a powerful tool for obtaining asymptotic solutions for summa-
tions. Analysis of recursive algorithms, on the other hand, typically involves
recurrence relations. Theorems 3.31 and 3.32 provide asymptotic solutions
for the most common forms of recurrences.
The analyses of the various algorithms for the maximum subsequence
sum problem illustrate the utility of asymptotic analysis. We saw that the
five algorithms have worst-case running times shown in Figure 3.7. These
results correlate well with the actual running times shown in Figure 1.15.
The results of asymptotic analyses can also be used to predict perfor-
mance degradation. If an algorithm’s running time is in Θ(f (n)), then as n
increases, the running time of an implementation must lie between cf (n) and
df (n) for some positive real numbers c and d. In fact, for most algorithms,
this running time will approach cf (n) for a single positive real number c.
Assuming that this convergence occurs, if we run the algorithm on suffi-
ciently large input, we can approximate c by dividing the actual running
time by f (n), where n is the size of the input.
For example, our implementation of MaxSumIter took 1283 seconds to process an input of size 2^14 = 16,384. Dividing 1283 by (16,384)³, we obtain a value of c = 2.92 × 10^−10. Evaluating cn³ for n = 2^13, we obtain a value of 161 seconds. This is very close to the actual running time of 160 seconds on an input of size 2^13. Thus, the running time does appear to be converging to cn³ for sufficiently large n. (The results of floating-point computations in this discussion are all rounded to three significant digits.)
Figure 3.8 shows a plot of the functions estimating the running times significant digits.
of the various maximum subsequence sum implementations, along with the
measured running times from Figure 1.15. The functions were derived via
the technique outlined above using the timing information from Figure 1.15,
taking the largest data set tested for each algorithm. We have extended
both axes to show how these functions compare as n grows as large as
2^30 = 1,073,741,824.
For example, consider the functions estimating the running times of MaxSumIter and MaxSumBU. As we have already shown, the function estimating the running time of MaxSumIter is f(n) = (2.92 × 10^−10)n³. The function we obtained for MaxSumBU is g(n) = (1.11 × 10^−8)n. Let us now use these functions to estimate the time these implementations would require to process an array of 2^30 elements. g(2^30) = 11.9 seconds, whereas f(2^30) = 3.61 × 10^17 seconds, or over 11 billion years! Even if we could speed up the processor by a factor of one million, this implementation would still require over 11,000 years.
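The arithmetic behind these estimates is easy to reproduce. A small Python sketch, using the 1283-second measurement at n = 2^14 quoted earlier:

# Estimate the constant c for MaxSumIter from the quoted measurement,
# then extrapolate as described above.
c = 1283 / (2 ** 14) ** 3        # about 2.92e-10
f = lambda n: c * n ** 3         # estimated MaxSumIter running time
g = lambda n: 1.11e-8 * n        # estimated MaxSumBU running time

print(f(2 ** 13))                         # about 160 seconds
print(g(2 ** 30))                         # about 11.9 seconds
print(f(2 ** 30) / (3600 * 24 * 365))     # about 1.1e10 years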
Though this example clearly illustrates the utility of asymptotic analysis,
a word of caution is in order. Asymptotic notation allows us to focus on
growth rates while ignoring constant factors. However, constant factors
[Figure 3.9: plot of lg^16 n and √n, for n ranging up to 2^500.]
can be relevant. For example, two linear-time algorithms will not yield
comparable performance if the hidden constants are very different.
For a more subtle example, consider the functions lg^16 n and √n, shown in Figure 3.9. From Theorem 3.54, O(lg^16 n) ⊆ o(√n), so that as n increases, lg^16 n grows much more slowly than does √n. However, consider n = 2^32 = 4,294,967,296. For this value, √n = 2^16 = 65,536, whereas lg^16 n = 32^16 = 2^80, which is larger by many orders of magnitude.
3.13 Exercises
Exercise 3.1 Prove that if g(n) ∈ O(f (n)), then O(g(n)) ⊆ O(f (n)).
Exercise 3.2 Prove that for any f : N → R≥0 , f (n) ∈ Θ(f (n)).
Exercise 3.3 Prove that if f (n) ∈ O(g(n)) and g(n) ∈ O(h(n)), then
f (n) ∈ O(h(n)).
Exercise 3.5 For each of the following, give functions f (n) ∈ Θ(n) and
g(n) ∈ Θ(n) that satisfy the given property.
Exercise 3.6 Suppose that g1 (n) ∈ Θ(f1 (n)) and g2 (n) ∈ Θ(f2 (n)), where
g2 and f2 are eventually positive. Prove that g1 (n)/g2 (n) ∈ Θ(f1 (n)/f2 (n)).
Exercise 3.7 Show that the result in Exercise 3.6 does not necessarily hold
if we replace Θ by O.
b. Show that f(n) is not smooth; i.e., show that for every c ∈ R>0 and every n0 ∈ N, there is some n ≥ n0 such that f(2n) > cf(n). [Hint: Consider a sufficiently large value of n having the form 2^(2^k) − 1.]
* Exercise 3.10 The goal of this exercise is to prove Theorem 3.28. Let f :
N → R≥0 be a smooth function, g : N → N be an eventually nondecreasing
and unbounded function, and h : N → R≥0 .
a. Show that if h(n) ∈ O(f (n)), then there exist natural numbers n0 and
n1 , a positive real number c, and a nonnegative real number d such
that for every n ≥ n1 ,
∑_{i=1}^{g(n)} h(i) ≤ d + c ∑_{i=n0}^{g(n)} f(i).
c. Show that if h(n) ∈ Ω(f (n)), then there exist natural numbers n0 and
n1 and positive real numbers c and d such that for every n ≥ n0 ,
f (n) ≥ f (2n)/d,
and
g(n) ≥ 2n0
hold.
* Exercise 3.11 Prove that for every smooth function f : N → R≥0 and
every eventually nondecreasing and unbounded function g : N → N, and
every X ∈ {O, Ω, Θ},
∑_{i=1}^{g(n)} X(f(i)) ≠ X(g(n)f(g(n))).
[Hint: First identify a property that every function in the set on the left-
hand side must satisfy, but which functions in the set on the right-hand side
need not satisfy.]
Exercise 3.13 Analyze the worst-case running time of the following code
fragments, assuming that n represents the problem size. Express your result
as simply as possible using Θ-notation.
a. for i ← 0 to 2n
      for j ← 0 to 3n
         k ← k + i + j
b. for i ← 1 to n²
      for j ← i to i³
         k ← k + 1
* c. i ← n
     while i > 0
        for j ← 1 to i²
           x ← (x + j)/2
        i ← ⌊i/2⌋
b.
f (n) ∈ f (n − 1) + Ω(n lg n)
for n > 0.
c.
f(n) ∈ 4f(n/2) + O(lg² n)
whenever n = 3 · 2k for a positive integer k.
d.
f (n) ∈ 5f (n/3) + Θ(n2 )
whenever n = 3k for a positive integer k.
e.
f (n) ∈ 3f (n/2) + O(n)
whenever n = 8 · 2k for a positive integer k.
Exercise 3.16 Analyze the worst-case running time of the following func-
tions. Express your result as simply as possible using Θ-notation.
a. SlowSort(A[1..n])
if n = 2 and A[1] > A[2]
A[1] ↔ A[2]
else if n > 2
SlowSort(A[1..n − 1])
SlowSort(A[2..n])
SlowSort(A[1..n − 1])
b. FindMax(A[1..n])
if n = 0
error
else if n = 1
return A[1]
else
return Max(FindMax(A[1..⌊n/2⌋]), FindMax(A[⌊n/2⌋+1..n]))
c. FindMin(A[1..n])
if n = 0
error
else if n = 1
return A[1]
else
B ← new Array[1..⌈n/2⌉]
for i ← 1 to ⌊n/2⌋
B[i] ← Min(A[2i − 1], A[2i])
if n mod 2 = 1
B[⌈n/2⌉] ← A[n]
return FindMin(B[1..⌈n/2⌉])
Exercise 3.17 Analyze the worst-case space usage of each of the functions
given in Exercise 3.16. Express your result as simply as possible using Θ-
notation.
* Exercise 3.18 Prove that if f : N → R≥0 is smooth and g(n) ∈ Θ(n), then f(g(n)) ∈ Θ(f(n)).
* Exercise 3.19 Prove that for any smooth function g : N → R≥0 , there is
a natural number k such that g(n) ∈ O(nk ).
* Exercise 3.20 The goal of this exercise is to prove Theorem 3.31. Let
f(n) ∈ a f(n − 1) + X(b^n g(n))
for n > n0, where n0 ∈ N, a ≥ 1 and b ≥ 1 are real numbers, g(n) is a smooth function, and X is either O, Ω, or Θ. In what follows, let n1 be any natural number such that n1 ≥ n0 and whenever n ≥ n1, 0 < g(n) ≤ g(n + 1).
c. Use parts a and b, together with Equation (2.2), to show that if a < b, then f(n) ∈ X(b^n g(n)).
[Hint: Use the result of Exercise 3.19 and Theorem 3.54 to show that for sufficiently large i, g(i) ≤ r^i; then apply Equation (2.2).]
Exercise 3.23 Show that Copy, specified in Figure 1.18 on page 22, can be
implemented to run in Θ(n) time, Θ(n) space, and Θ(1) stack space, where
n is the size of both of the arrays. Note that function calls use space from the
stack, but constructed arrays do not. Also recall that the parameters A[1..n]
and B[1..n] should not be included in the analysis of space usage. Your
algorithm should work correctly even for calls like Copy(A[1..n−1], A[2..n])
(see Exercise 1.4).
* Exercise 3.27 Prove Theorem 3.47. [Hint: First work Exercise 3.10, but note that not all parts of that exercise extend directly to multiple variables.]
Exercise 3.29 Prove that if g(n) ∈ o(f (n)), then O(g(n)) ⊆ o(f (n)).
Exercise 3.31 Prove that for any real numbers a > 1 and b > 1,
O(n2 ) = 2n2 + 7n − 4.
Thus, the “=” symbol was used to denote not equality, but a relation that
is not even symmetric.
Over the years, many have observed that a set-based definition, as we
have given here, is more sound mathematically. In fact, Brassard [17] claims
that as long ago as 1962, a set-based treatment was taught consistently
Part II
Data Structures
Chapter 4
Basic Techniques
4.1 Stacks
One of the strengths of both top-down design and object-oriented design is
their use of abstraction to express high-level solutions to problems. In fact,
we can apply abstraction to the problems themselves to obtain high-level
solutions to many similar problems. Such high-level solutions are known
as design patterns. For example, consider the “undo” operation in a word
processor. We have some object that is undergoing a series of modifications.
An application of the “undo” operation restores the object to its state prior
to the last modification. Subsequent applications of “undo” restore the
object to successively earlier states in its history.
We have captured the essence of the “undo” operation without specifying
any of the details of the object being modified or the functionality of the
document formatter in which it will appear. In fact, our description is
general enough that it can apply to other applications, such as a spreadsheet
or the search tree viewer on this book’s web site. We have therefore specified
a design pattern for one aspect of functionality of an application.
Precondition: true.
Postcondition: Constructs an empty Stack.
Stack()
Precondition: true.
Postcondition: a is added to the end of the represented sequence.
Stack.Push(a)
Precondition: The stack is nonempty.
Postcondition: The last element of the represented sequence is removed
and returned.
Stack.Pop()
Precondition: true.
Postcondition: Returns true iff the represented sequence is empty.
Stack.IsEmpty()
Precondition: op is an EditOp.
Postcondition: Applies op to this Editable.
Editable.Apply(op)
Precondition: op is an EditOp.
Postcondition: Applies the inverse operation of op to this Editable.
Editable.ApplyInverse(op)
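As a sketch of how the undo pattern might be assembled from these interfaces, the following hypothetical Python code keeps the applied operations on a stack; the class and method names are mine, modeled loosely on the Stack and Editable specifications above.

class UndoManager:
    """Tracks applied operations so they can be undone in reverse order."""
    def __init__(self, editable):
        self.editable = editable
        self.history = []             # used as a stack: append = Push, pop = Pop

    def apply(self, op):
        self.editable.apply(op)       # corresponds to Editable.Apply(op)
        self.history.append(op)

    def undo(self):
        if self.history:              # corresponds to checking Stack.IsEmpty()
            op = self.history.pop()
            self.editable.apply_inverse(op)   # corresponds to Editable.ApplyInverse(op)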
approach that does not quite meet the specification. We will then consider
two full implementations of the Stack ADT.
• a Nat size.
The values of the representation variables, together with all values used
by the interpretation and the structural invariant, comprise the state of the
data structure. Thus, the state of our stack implementation consists of the
value of size, the array elements, and the values stored in elements[1..size].
(We will clarify shortly the distinction between the array and the values
stored in the array.)
We can now complete our implementation by giving algorithms for the
SimpleStack constructor and operations. These algorithms are shown in
Figure 4.4.
Note that the preconditions and postconditions for the constructor and
operations are stated in terms of the definition of a stack, not in terms
of our chosen representation. For example, the precondition for the Push
operation could have been stated as,
Figure 4.4 The data type SimpleStack, which does not quite implement
the Stack ADT
Precondition: true.
Postcondition: Returns the length of the represented sequence.
SimpleStack.Size()
return size
the time to construct a new array is in Θ(1), and the constructor operates
in Θ(1) time.
Proving correctness of operations on a data structure is similar to proving
correctness of ordinary algorithms. There are five parts:
3. Security: If the structural invariant holds, then the state can only be
modified by invoking one of this structure’s operations.
model includes no inheritance. Our algorithms will not always specify the
type of a data item if its type is irrelevant to the essence of the algorithm.
For example, we have not specified the type of the parameter a for the
Stack.Push operation in Figure 4.4 because we do not care what kind of
data is stored in the stack.
When the data type of a parameter is important, we can specify it in the
precondition, as we have done for the constructor in Figure 4.4. Unless we
explicitly state otherwise, when we state in a precondition that a variable
refers to an item of some particular type, we mean that this variable must
be non-nil. Note, however, that a precondition does not affect the execution
of the code. When it is important that the type actually be checked (e.g.,
for maintaining a structural invariant), we will attach a type declaration in
the parameter list, as in the two-argument constructor in Figure 4.10 (page
127). A type declaration applies to a single parameter only, so that in this
example, L is of type ConsList, but a is untyped. We interpret a type
declaration as generating an error if the value passed to that parameter is
not nil and does not refer to an instance of the declared type.
As we have already suggested, the elements of a particular data type may
have operations associated with them. Thus, each instance of the Stack
type has a Push operation and a Pop operation. For the sake of consistency,
we will consider that when a constructor is invoked, it belongs to the data
item that it is constructing. In addition, the elements of a data type may
have internal functions associated with them. Internal functions are just
like operations, but with restricted access, as described below.
In order to control the way in which a data structure can be changed, we
place the following restrictions on how representation variables and internal
functions can be accessed:
• Write access to a representation variable of an instance of data type
A is given only to the operations, constructors, and internal functions
of that instance.
• Read access to a representation variable of an instance of data type
A is given only to operations, constructors, and internal functions of
instances of type A.
• Access to an internal function of an instance of a data type A is given
only to operations, constructors, and internal functions of that in-
stance.
These restrictions are severe enough that we will often need to relax
them. In order to relax either of the first two restrictions, we can provide
Explicitly allowing write access does not technically violate security, be-
cause any changes are made by invoking operations of the data structure.
What can be problematic is allowing read access. For example, suppose we
were to allow read access to the variable elements in the representation of
a stack. Using this reference, a user’s code could change the contents of
that array. Because this array’s contents belong to the state of the data
structure, security would then be violated. We must therefore check for the
following conditions, each of which might compromise security:
• An operation causes the data item to which one of its parameters refers
to be a part of the state of the structure. Under this condition, the
code that invokes the operation has a copy of the parameter, and hence
has access to the state of the structure. If the data item in question
can be changed, security is violated.
[Figure: three objects x, y, and z. x has operations Op1() and Op2(), where Op1() invokes y.Op3(); y has operation Op3(), which invokes z.Op4(); z has operation Op4(), which invokes x.Op2().]
ExpandableArrayStack()
ExpandableArrayStack(10)
ExpandableArrayStack.Push(a)
if size = SizeOf(elements)
el ← new Array[1..2 · size]
for i ← 1 to size
el[i] ← elements[i]
elements ← el
size ← size + 1; elements[size] ← a
ExpandableArrayStack.Pop()
if size > 0
size ← size − 1
return elements[size + 1]
else
error
ExpandableArrayStack.IsEmpty()
return size = 0
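The same doubling strategy can be expressed in Python; the sketch below mirrors the pseudocode above (a Python list is used only as a fixed-size array here, so the expansion is done explicitly).

class ExpandableArrayStack:
    def __init__(self, capacity=10):
        self.elements = [None] * capacity    # fixed-size array
        self.size = 0

    def push(self, a):
        if self.size == len(self.elements):
            bigger = [None] * (2 * self.size)        # double the array
            bigger[:self.size] = self.elements
            self.elements = bigger
        self.elements[self.size] = a
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty stack")
        self.size -= 1
        return self.elements[self.size]

    def is_empty(self):
        return self.size == 0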
2^i k + 1 ≤ n
2^i ≤ (n − 1)/k
i ≤ lg(n − 1) − lg k.
Because each loop iteration requires Θ(1) time, the time required for all loop
iterations is in O(n). Combining this result with the earlier analysis that
ignored the loop iterations, we see that the entire sequence runs in Θ(n)
time.
Now to complete the amortized analysis, we must average the total run-
ning time over the n operations in the sequence. By Exercise 3.6 on page 97,
if f (n) ∈ Θ(n), then f (n)/n ∈ Θ(1). Therefore, the worst-case amortized
time for the stack operations is in Θ(1). We conclude that, although an
individual Push operation may be expensive, the expandable array yields
a stack that performs well on any sequence of operations starting from an
initially empty stack.
Precondition: true
Postcondition: Constructs a ConsList representing an empty sequence.
ConsList()
Precondition: L is a ConsList ⟨a1, . . . , an⟩.
Postcondition: Constructs a ConsList representing the sequence ⟨a, a1, . . . , an⟩.
ConsList(a, L)
Precondition: true.
Postcondition: Returns a Bool that is true iff the represented sequence
is empty.
ConsList.IsEmpty()
Precondition: The represented sequence is nonempty.
Postcondition: Returns the first element of the sequence.
ConsList.Head()
Precondition: The represented sequence ⟨a1, . . . , an⟩ is nonempty.
Postcondition: Returns a ConsList representing the sequence ⟨a2, . . . , an⟩.
ConsList.Tail()
ConsListStack()
elements ← new ConsList()
ConsListStack.Push(a)
elements ← new ConsList(a, elements)
ConsListStack.Pop()
if elements.IsEmpty()
error
else
top ← elements.Head(); elements ← elements.Tail()
return top
ConsListStack.IsEmpty()
return elements.IsEmpty()
ConsList()
isEmpty ← true
ConsList(a, L : ConsList)
if L = nil
error
else
isEmpty ← false; head ← a; tail ← L
accessors for the three representation variables. Note that our specification says
nothing about the contents of head and tail when isEmpty is true; hence,
if these accessors are called for an empty list, arbitrary values may be re-
turned. The implementation is shown in Figure 4.10. Because we will only
present a single implementation of ConsList, we use the same name for
the implementation as for the interface.
It is easily seen that each constructor and operation runs in Θ(1) time.
We will now prove that the implementation meets its specification.
Proof:
Correctness: The operations only provide read access, and so are trivially correct.
[Figure: sharing of ConsList cells between two stacks S and T. (a) Stack S after 3 pushes. (b) After cloning and popping twice. (c) After pushing a4 onto T, where S.elements and T.elements share cells of the list a3, a2, a1.]
Nevertheless, if we are careful, we can use this idea as a building block for several
more advanced data structures. We will therefore refer to this idea as the
linked list design pattern.
Figure 4.12 Example of actual, potential, and amortized costs for gasoline
[Bar chart: for each of Day 1 through Day 6, the actual cost, the potential cost to fill the tank, and the amortized cost (the actual cost plus the change in potential), on a scale from $0 to $30.]
potential cost. Because the potential cost can never be negative (the tank
can’t be “overfull”), the sum of the amortized costs will be at least the sum
of the actual costs.
Let us now consider how we might apply this technique to the amortized
analysis of a data structure such as an ExpandableArrayStack. The
potential gasoline cost is essentially a measure of how “bad” the state of
the gas tank is. In a similar way, we could measure how “bad” the state
of an ExpandableArrayStack is by considering how full the array is —
the closer the array is to being filled, the closer we are to an expensive
operation. We can formalize this measure by defining a potential function
Φ, which maps states of a data structure into the nonnegative real numbers,
much like the potential gasoline cost maps “states” of the gas tank into
Theorem 4.5 Let Φ be a valid potential function for some data structure;
i.e., if σ0 is the initial state of the structure, then Φ(σ0 ) = 0, and if σ is any
state of the structure, then Φ(σ) ≥ 0. Then for any sequence of operations
from the initial state σ0 , the sum of the amortized costs of the operations
relative to Φ is at least the sum of the actual costs of the operations.
Precondition: true.
Postcondition: Constructs a counter with value 1.
BinaryCounter()
Precondition: true.
Postcondition: Increases the counter value by 1.
BinaryCounter.Increment()
Precondition: true.
Postcondition: Returns a ConsList of 0s and 1s ending with a 1 and giving the binary representation of the counter value, with the least significant bit first; i.e., if the sequence is ⟨a0, . . . , an⟩, the value represented is ∑_{i=0}^{n} ai 2^i.
BinaryCounter.Value()
IterBinCounter()
value ← new ConsList(1, new ConsList())
IterBinCounter.Increment()
k ← 0; c ← value
// Invariant: value contains k 1s, followed by c.
while not c.IsEmpty() and c.Head() = 1
k ← k + 1; c ← c.Tail()
if c.IsEmpty()
c ← new ConsList(1, c)
else
c ← new ConsList(1, c.Tail())
// Invariant: c contains i − 1 0s and a 1, followed by the ConsList
// obtained by removing all initial 1s and the first 0 (if any) from value.
for i ← 1 to k
c ← new ConsList(0, c)
value ← c
the number of iterations of the while loop, plus some constant. We can
therefore use the number of iterations of the while loop as the actual cost
of this operation. Note that the actual cost varies from 0 to n, depending
on the current value represented.
The next step is to define an appropriate potential function. This step is
usually the most challenging part of this technique. While finding a suitable
potential function requires some creativity, there are several guidelines we
can apply.
First, we can categorize operations of a data structure according to two
criteria relevant to amortized analysis:
Using the above criteria, we can divide operations into four categories:
2. Operations that cost little but degrade future performance. The In-
crement operation when the head of value is 0 is an example of this
type. It performs no loop iterations, but causes value to have at least
one leading 1, so that the next Increment will perform at least one
iteration.
3. Operations that cost much but improve future performance. The In-
crement operation when value has many leading 1s is an example of
this type. It performs a loop iteration for each leading 1, but replaces
these leading 1s with 0s. Thus, the next Increment will not perform
any iterations. In fact, a number of Increment operations will be
required before we encounter another expensive one.
k + (1 − k) = 1.
We can therefore conclude that the amortized running time of the IterBin-
Counter operations is in O(1).
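This bookkeeping is easy to check experimentally. In the following Python sketch (a simplified model of the counter that stores the bits least significant first in a list rather than a ConsList), the potential Φ is the number of 1 bits, the actual cost is the number of while-loop iterations, and every increment has actual cost plus change in potential equal to exactly 1.

def increment(bits):
    """bits holds the counter, least significant bit first; returns the loop count."""
    k = 0
    while k < len(bits) and bits[k] == 1:    # the while loop counted as the actual cost
        bits[k] = 0
        k += 1
    if k == len(bits):
        bits.append(1)
    else:
        bits[k] = 1
    return k

bits = [1]                                   # the counter is constructed with value 1
phi = sum(bits)                              # potential: the number of 1 bits
for _ in range(1000):
    cost = increment(bits)
    new_phi = sum(bits)
    assert cost + (new_phi - phi) == 1       # amortized cost is exactly 1
    phi = new_phi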
Let us now use this technique to analyze the amortized performance of
ExpandableArrayStack. We first observe that operations which result
in an error run in Θ(1) time and do not change the state of the structure;
hence, we can ignore these operations. As we did in Section 4.3, we will
again amortize the number of loop iterations; i.e., the actual cost of an
operation will be the number of loop iterations performed by that operation.
An operation that does not require expanding the array performs no loop
because k must be strictly larger than n, and both are integers. Because no
loop iterations are performed in this case, the actual cost is 0; hence, the
amortized cost is less than 4.
In order to complete the analysis, we must consider the Pop operation.
Because n is initially positive and decreases by 1, and because k remains
the same, the change in potential is
4.6 Summary
We have shown how the top-down design paradigm can be applied to the
design of data structures. In many cases, we can reduce the implementation
of an ADT to the implementation of simpler or lower-level ADTs. In other
4.7 Exercises
Exercise 4.1 Complete the proof of Theorem 4.2 by giving proofs of main-
tenance and correctness for the two missing cases.
Exercise 4.5 Let f (n) denote the number of 1s in value after n calls to
Increment on a new IterBinCounter. Prove by induction on n that the
n − f (n) + 1.
a. Prove that the implementation that uses this algorithm meets its spec-
ification.
4n3
3k 2
as the potential function, where n is the number of elements in the stack
and k is the size of the array.
Exercise 4.8 Let c > 1 be a fixed real number. Suppose we modify Figure
4.6 so that the new array is of size ⌈c · size⌉. Using the potential function
approach, show that the amortized running time of the stack operations is
in O(1).
RecBinCounter.Increment()
value ← Inc(value)
can happen if a large number of elements are pushed onto the stack, then
most are removed. One solution is to modify the Pop operation so that if
the number of elements drops below half the size of the array, then we copy
the elements to a new array of half the size. Give a convincing argument
that this solution would not result in O(1) amortized running time.
* b. Using the technique of Section 4.3, show that the stack operations
have an amortized running time in O(1) when this scheme is used.
You may assume that the array is initially of size 4.
Exercise 4.11 A queue is similar to a stack, but it provides first in first out
(FIFO) access to the data items. Instead of the operations Push and Pop,
it has operations Enqueue and Dequeue — Enqueue adds an item to the
end of the sequence, and Dequeue removes the item from the beginning of
the sequence.
Exercise 4.12 A certain data structure contains operations that each con-
sists of a sequence of zero or more Pops from a stack, followed by a single
Push. The stack is initially empty, and no Pop is attempted when the stack
is empty.
bits[SizeOf(bits) − 1] = 1.
Note that the least significant bit has the lowest index; hence, it might
be helpful to think of the array with index 0 at the far right, and indices
increasing from right to left.
BigNum(A[0..n − 1])
Precondition: v refers to a BigNum.
Postcondition: Returns 1 if the value of this BigNum is greater than v,
0 if it is equal to v, or −1 if it is less than v.
BigNum.CompareTo(v)
Precondition: v refers to a BigNum.
Postcondition: Returns a BigNum representing the sum of the value of
this BigNum and v.
BigNum.Add(v)
Precondition: v refers to a BigNum no greater than the value of this
BigNum.
Postcondition: Returns a BigNum representing the value of this BigNum
minus v.
BigNum.Subtract(v)
Precondition: i is an integer.
Postcondition: Returns the floor of the BigNum obtained by multiplying this BigNum by 2^i.
BigNum.Shift(i)
Precondition: true.
Postcondition: Returns the number of bits in the binary representation
of this BigNum with no leading zeros.
BigNum.NumBits()
Precondition: start and len are natural numbers.
Postcondition: Returns an array A[0..len − 1] containing the values of bit
positions start through start + len − 1; zeros are assigned to the high-order
positions if necessary.
GetBits(start, len)
Chapter 5
Priority Queues
Precondition: true.
Postcondition: Constructs an empty PriorityQueue.
PriorityQueue()
Precondition: p is a Number.
Postcondition: Adds x to the set with priority p.
PriorityQueue.Put(x, p)
Precondition: The represented set is not empty.
Postcondition: Returns the maximum priority of any item in the set.
PriorityQueue.MaxPriority()
Precondition: The represented set is not empty.
Postcondition: An item with maximum priority is removed from the set
and returned.
PriorityQueue.RemoveMax()
Precondition: true.
Postcondition: Returns the number of items in the set.
PriorityQueue.Size()
In order to implement Put(x, p), we must find the correct place to insert
x so that the order of the priorities is maintained. Let us therefore reduce
the Put operation to the problem of finding the correct location to insert a
given priority p. This location is the index i, 0 ≤ i ≤ size, such that
SortedArrayPriorityQueue.Put(x, p : Number)
i ← Find(p)
if size = SizeOf(elements)
elements ← Expand(elements)
for j ← size − 1 to i by −1
elements[j + 1] ← elements[j]
elements[i] ← new Keyed(x, p); size ← size + 1
Section 4.3. The remainder of the implementation and its correctness proof
are left as an exercise.
Let us now analyze the running time of Find. Clearly, each iteration of
the while loop runs in Θ(1) time, as does the code outside the loop. We
therefore only need to count the number of iterations of the loop.
Let f (n) denote the number of iterations, where n = hi − lo gives the
number of elements in the search range. One iteration reduces the number
of elements in the range to either ⌊n/2⌋ or ⌈n/2⌉ − 1. The former value
occurs whenever the key examined is greater than or equal to p. The worst
case therefore occurs whenever we are looking for a key smaller than any
key in the set. In the worst case, the number of iterations is therefore given
by the following recurrence:
f (n) = f (⌊n/2⌋) + 1
for n > 1. From Theorem 3.32, f (n) ∈ Θ(lg n). Therefore, Find runs in
Θ(lg n) time.
Let us now analyze the running time of Put. Let n be the value of
size. The first statement requires Θ(lg n) time, and based on our analysis
in Section 4.3, the Expand function should take O(n) time in the worst
case. Because we can amortize the time for Expand, let us ignore it for
now. Clearly, everything else outside the for loop and a single iteration of
the loop run in Θ(1) time. Furthermore, in the worst case (which occurs
when the new key has a value less than all other keys in the set), the loop
iterates n times. Thus, the entire algorithm runs in Θ(n) time in the worst
case, regardless of whether we count the time for Expand.
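A compact Python sketch of this implementation strategy (names mine; the keyed items are kept sorted by priority, a binary search locates the insertion point in Θ(lg n) time, and the insertion itself may shift Θ(n) elements):

import bisect

class SortedArrayPriorityQueue:
    """Items kept in parallel lists, sorted by nondecreasing priority."""
    def __init__(self):
        self.priorities = []    # sorted list of priorities
        self.items = []         # items[i] has priority priorities[i]

    def put(self, x, p):
        i = bisect.bisect_left(self.priorities, p)    # binary search, Theta(lg n)
        self.priorities.insert(i, p)                  # may shift Theta(n) elements
        self.items.insert(i, x)

    def max_priority(self):
        return self.priorities[-1]      # maximum is at the end (precondition: nonempty)

    def remove_max(self):
        self.priorities.pop()
        return self.items.pop()

    def size(self):
        return len(self.items)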
5.2 Heaps
The SortedArrayPriorityQueue has very efficient MaxPriority and
RemoveMax operations, but a rather slow Put operation. We could speed
up the Put operation considerably by dropping our requirement that the
array be sorted. In this case, we could simply add an element at the end of
the array, expanding it if necessary. This operation is essentially the same
as the ExpandableArrayStack.Push operation, which has an amortized
running time in Θ(1). However, we would no longer be able to take ad-
vantage of the ordering of the array in finding the maximum priority. As a
result, we would need to search the entire array. The running times for the
MaxPriority and RemoveMax operations would therefore be in Θ(n)
time, where n is the number of elements in the priority queue.
Figure 5.3 A heap — each priority is no smaller than any of its children
[Heap diagram with keys, level by level: 89; 53, 32; 48, 53, 27; 24, 32, 13; 17.]
example, in Figure 5.3, the subtree whose root contains 24 has two children,
but the first is empty. This practice can lead to ambiguity; for example, it is
not clear whether the subtree rooted at 13 contains any children, or if they
might all be empty. For this and other reasons, we often consider restricted
classes of rooted trees. Here, we wish to define a binary tree as a rooted tree
in which each nonempty subtree has exactly two children, either (or both)
of which may be empty. In a binary tree, the first child is called the left
child, and the other is called the right child. If we then state that the rooted
tree in Figure 5.3 is a binary tree, it is clear that the subtree rooted at 13,
because it is nonempty, has two empty children.
It is rather difficult to define an ADT for either trees or binary trees
in such a way that it can be implemented efficiently. The difficulty is in
enforcing as a structural invariant the fact that no two children have nodes
in common. In order for an operation to maintain this invariant when adding
a new node, it would apparently need to examine the entire structure to see
if the new node is already in the tree. As we will see, maintaining this
invariant becomes much easier for specific applications of trees. It therefore
seems best to think of a rooted tree as a mathematical object, and to mimic
its structure in defining a heap implementation of PriorityQueue.
In order to build a heap, we need to be able to implement a single
node. For this purpose, we will define a data type BinaryTreeNode. Its
representation will contain three variables:
We will provide read/write access to all three of these variables, and our
structural invariant is simply true. The only constructor is shown in Figure
5.4, and no additional operations are included. Clearly, BinaryTreeNode
meets its specification (there is very little specified), and each operation and
constructor runs in Θ(1) time.
We can now formally define a heap as a binary tree containing Keyed
elements such that if the tree is nonempty, then
• the item stored at the root has the maximum key in the tree; and
Precondition: true.
Postcondition: Constructs a BinaryTreeNode with all three variables
nil.
BinaryTreeNode()
root ← nil; leftChild ← nil; rightChild ← nil
• S.
We can form these two children by recursively merging two of these three
heaps.
A simple implementation, which we call SimpleHeap, is shown in Figure
5.5. Note that we can maintain the structural invariant because we can
ensure that the precondition to Merge is always met (the details are left
as an exercise). Note also that the above discussion leaves some flexibility
in the implementation of Merge. In fact, we will see shortly that this
particular implementation performs rather poorly. As a result, we will need
to find a better way of choosing the two heaps to merge in the recursive call,
and/or a better way to decide which child the resulting heap will be.
Let us now analyze the running time of Merge. Suppose h1 and h2
together have n nodes. Clearly, the running time excluding the recursive
call is in Θ(1). In the recursive call, L.RightChild() has at least one fewer
node than does L; hence the total number of nodes in the two heaps in the
recursive call is no more than n − 1. The total running time is therefore
bounded above by
f (n) ∈ f (n − 1) + O(1)
⊆ O(n)
by Theorem 3.31.
At first it might seem that the bound of n − 1 on the number of nodes in
the two heaps in the recursive call is overly pessimistic. However, upon close
examination of the algorithm, we see that not only does this describe the
worst case, it actually describes every case. To see this, notice that nowhere
in the algorithm is the left child of a node changed after that node is created.
Because each left child is initially empty, no node ever has a nonempty left
child. Thus, each heap is a single path of nodes going to the right.
The SimpleHeap implementation therefore amounts to a linked list in
which the keys are kept in nonincreasing order. The Put operation will
therefore require Θ(n) time in the worst case, which occurs when we add a
node whose key is smaller than any in the heap. In the remainder of this
chapter, we will examine various ways of taking advantage of the branching
potential of a heap in order to improve the performance.
SimpleHeap()
elements ← nil; size ← 0
SimpleHeap.Put(x, p : Number)
h ← new BinaryTreeNode(); h.SetRoot(new Keyed(x, p))
elements ← Merge(elements, h); size ← size + 1
SimpleHeap.MaxPriority()
return elements.Root().Key()
SimpleHeap.RemoveMax()
x ← elements.Root().Data(); size ← size − 1
elements ← Merge(elements.LeftChild(), elements.RightChild())
return x
Theorem 5.1 For any binary tree T with n nodes, the null path length of
T is at most lg(n + 1).
The proof of this theorem is typical of many proofs of properties of trees.
It proceeds by induction on n using the following general strategy:
• For the base case, prove that the property holds when n = 0 — i.e.,
for an empty tree.
• For the induction step, apply the induction hypothesis to one or more
of the children of a nonempty tree.
Induction Hypothesis: Assume for some n > 0 that for 0 ≤ i < n, the
null path length of any tree with i nodes is at most lg(i + 1).
Induction Step: Let T be a binary tree with n nodes. Then because the
two children together contain n − 1 nodes, they cannot both contain more
than (n − 1)/2 nodes; hence, one of the two children has no more than
⌊(n − 1)/2⌋ nodes. By the induction hypothesis, this child has a null path length of at most lg(⌊(n − 1)/2⌋ + 1). The null path length of T is therefore at most
1 + lg(⌊(n − 1)/2⌋ + 1) ≤ 1 + lg((n + 1)/2)
= lg(n + 1).
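The bound is easy to check on small trees. In the following Python sketch (trees are represented as nested (left, right) pairs, with None for an empty tree, which is my own convention), a perfect binary tree achieves the bound with equality.

import math

def npl(t):
    """Null path length: t is None (empty) or a pair (left, right)."""
    if t is None:
        return 0
    return 1 + min(npl(t[0]), npl(t[1]))

def count(t):
    return 0 if t is None else 1 + count(t[0]) + count(t[1])

def perfect(h):
    """A perfect binary tree of height h, with 2**h - 1 nodes."""
    return None if h == 0 else (perfect(h - 1), perfect(h - 1))

for h in range(6):
    t = perfect(h)
    assert npl(t) <= math.log2(count(t) + 1) + 1e-9   # Theorem 5.1, with equality here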
By the above theorem, if we can always choose the child with smaller
null path length for the recursive call, then the merge will operate in O(lg n)
time, where n is the number of nodes in the larger of the two heaps. We
can develop slightly simpler algorithms if we build our heaps so that the
right-hand child always has the smaller null path length, as in Figure 5.6(a).
We therefore define a leftist tree to be a binary tree which, if nonempty, has two leftist trees as children, with the right-hand child having a null path length no larger than that of the left-hand child. A leftist heap is then a leftist tree that is also a heap. (The term “leftist” refers to the tendency of these structures to be heavier on the left.)
In order to implement a leftist heap, we will use an implementation of
a leftist tree. The leftist tree implementation will take care of maintaining
the proper shape of the tree. Because we will want to combine leftist trees
to form larger leftist trees, we must be able to handle the case in which
two given leftist trees have nodes in common. The simplest way to handle
this situation is to define the implementation to be an immutable structure.
Because no changes can be made to the structure, we can treat all nodes
as distinct, even if they are represented by the same storage (in which case
they are the roots of identical trees).
In order to facilitate fast computation of null path lengths, we will record
the null path length of a leftist tree in one of its representation variables.
Thus, when forming a new leftist tree from a root and two existing leftist
trees, we can simply compare the null path lengths to decide which tree
should be used as the right child. Furthermore, we can compute the null
path length of the new leftist tree by adding 1 to the null path length of its
right child.
For our representation of LeftistTree, we will therefore use four vari-
ables:
• root: a Keyed item;
• leftChild: a LeftistTree;
• rightChild: a LeftistTree; and
[Figure 5.6: (a) The original heap. (b) Remove the root (20) and merge the smaller of its children (13) with the right child of the larger of its children (7). (c) Make 13 the root of the subtree and merge the tree rooted at 7 with the empty right child of 13. (d) Because 13 has a larger null path length than 10, swap them.]
• nullPathLength: a Nat.
We will allow read access to all variables. Our structural invariant will be
that this structure is a leftist tree such that
• nullPathLength gives its null path length; and
Example 5.2 Consider the leftist heap shown in Figure 5.6(a). Suppose
we were to perform a RemoveMax on this heap. To obtain the resulting
heap, we must merge the two children of the root. The larger of the two
keys is 15; hence, it becomes the new root. We must then merge its right
child with the original right child of 20 (see Figure 5.6(b)). The larger of the
two roots is 13, so it becomes the root of this subtree. The subtree rooted at
7 is then merged with the empty right child of 13. Figure 5.6(c) shows the
result without considering the null path lengths. We must therefore make
sure that in each subtree that we’ve formed, the null path length of the right
child is no greater than the null path length of the left child. This is the
case for the subtree rooted at 13, but not for the subtree rooted at 15. We
therefore must swap the children of 15, yielding the final result shown in
Figure 5.6(d).
LeftistHeap()
  elements ← new LeftistTree(); size ← 0

LeftistHeap.Put(x, p : Number)
  elements ← Merge(elements, new LeftistTree(new Keyed(x, p)))
  size ← size + 1

LeftistHeap.MaxPriority()
  return elements.Root().Key()

LeftistHeap.RemoveMax()
  x ← elements.Root().Data()
  elements ← Merge(elements.LeftChild(), elements.RightChild())
  size ← size − 1
  return x
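The Merge function itself is not reproduced in this excerpt, but the strategy described above is easy to sketch. The following is a minimal Python sketch, not the book's LeftistTree implementation (hypothetical names; plain keys stand in for Keyed items, and each merge builds new nodes rather than reusing an immutable structure): the tree with the larger root key keeps its left child, the other tree is merged into its right child, and the children are swapped if needed so that the right child's null path length never exceeds the left child's.

class Node:
    """One node of a leftist (max-)heap: a key, two children, and a null path length."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Null path length: 1 plus the null path length of the right child
        # (an empty child has null path length 0).
        self.npl = 1 + (right.npl if right else 0)

def npl(t):
    return t.npl if t else 0

def merge(a, b):
    """Merge two leftist heaps and return the root of the result."""
    if a is None:
        return b
    if b is None:
        return a
    if a.key < b.key:           # make a the tree with the larger root key
        a, b = b, a
    merged = merge(a.right, b)  # recurse along the right spine only
    left, right = a.left, merged
    if npl(right) > npl(left):  # restore the leftist property if necessary
        left, right = right, left
    return Node(a.key, left, right)

def put(heap, key):
    return merge(heap, Node(key))

def remove_max(heap):
    return heap.key, merge(heap.left, heap.right)

# Example usage of the sketch:
h = None
for k in (10, 3, 7, 15, 5):
    h = put(h, k)
top, h = remove_max(h)   # top == 15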
Example 5.3 Consider again the heap shown in Figure 5.6(a), and suppose
it is a skew heap. Performing a RemoveMax on this heap proceeds as
shown in Figure 5.6 through part (c). At this point, however, for each node
at which a recursive Merge was performed, the children of this node are
swapped. These nodes are 13 and 15. The resulting heap is shown in Figure
5.10.
In order to understand why such a simple modification might be advan-
tageous, observe that in Merge, when S is merged with L.RightChild(),
we might expect the resulting heap to have a tendency to be larger than
L.LeftChild(). As we noted at the end of the previous section, good
worst-case behavior can be obtained by ensuring that the left child of each
node has at least as many nodes as the right child. Intuitively, we might
be able to approximate this behavior by swapping the children after ev-
ery recursive call. However, this swapping does not always avoid expensive
operations.
Suppose, for example, that we start with an empty skew heap, then
insert the sequence of keys 2, 1, 4, 3, . . . , 2i, 2i − 1, 0, for some i ≥ 1. Figure
5.11 shows this sequence of insertions for i = 3. Note that each time an
even key is inserted, because it is the largest in the heap, it becomes the
new root and the original heap becomes its left child. Then when the next
key is inserted, because it is smaller than the root, it is merged with the
empty right child, then swapped with the other child. Thus, after each odd
key is inserted, the heap will contain all the even keys in the rightmost path
(i.e., the path beginning at the root and going to the right until it reaches
an empty subtree), and for i ≥ 1, key 2i will have key 2i − 1 as its left child.
Finally, when key 0 is inserted, because it is the smallest key in the heap,
it will successively be merged with each right child until it is merged with
the empty subtree at the far right. Each of the subtrees on this path to the
right is then swapped with its sibling.
[Figure 5.10: The skew heap resulting from the RemoveMax of Example 5.3.]
Clearly, this last insertion requires
Θ(i) running time, and i is proportional to the number of nodes in the heap.
The bad behavior described above results because a long rightmost path
is constructed. Note, however, that 2i Put operations were needed to con-
struct this path. Each of these operations required only Θ(1) time. Fur-
thermore, after the Θ(i) operation, no long rightmost paths exist from any
node in the heap (see Figure 5.11). This suggests that a skew heap might
have good amortized running time.
A good measure of the actual cost of the SkewHeap operations is the
number of calls to Merge, including recursive calls. In order to derive a
bound on the amortized cost, let us try to find a good potential function.
Based upon the above discussion, let us say that a node is good if its left
child has at least as many nodes as its right child; otherwise, it is bad. We
now make two key observations, whose proofs are left as exercises:
• In any binary tree with n nodes, the number of good nodes in the
rightmost path is no more than lg(n + 1).
• In the Merge operation, any bad node at which a recursive Merge is
performed is a good node in the resulting heap.
[Figure 5.11: The skew heaps resulting from inserting the sequence 2, 1, 4, 3, 6, 5, 0 (the case i = 3).]
If we take the potential function to be the number of bad nodes in the heaps, then
because the number of good nodes in the two rightmost paths is logarithmic,
the potential function can increase
by only a logarithmic amount on any Merge. Furthermore, because any
bad node encountered becomes good, the resulting change in potential will
cancel the actual cost associated with this call, leaving only a logarithmic
number of calls whose actual costs are not canceled. As a result, we should
expect the amortized costs of the SkewHeap operations to be in O(lg n),
where n is the number of elements in the priority queue (the details of the
analysis are left as an exercise). Thus, a SkewHeap provides a simple, yet
efficient, implementation of PriorityQueue.
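As a rough illustration of how little bookkeeping a skew heap needs, the following is a small Python sketch along the lines just analyzed, not the book's SkewHeap (hypothetical names; plain keys stand in for Keyed items): the merge is the same top-down merge as for leftist heaps, except that no null path lengths are stored and the children of every node at which a recursive Merge is performed are swapped unconditionally.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def merge(a, b):
    """Skew-heap merge: swap the children of every node the merge visits."""
    if a is None:
        return b
    if b is None:
        return a
    if a.key < b.key:
        a, b = b, a
    # Merge b into a's right child, then swap a's children unconditionally.
    a.left, a.right = merge(a.right, b), a.left
    return a

def put(heap, key):
    return merge(heap, Node(key))

def remove_max(heap):
    return heap.key, merge(heap.left, heap.right)

# The insertion sequence described above for the case i = 3:
h = None
for k in (2, 1, 4, 3, 6, 5, 0):
    h = put(h, k)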
results of any prior calls. This function can typically be implemented using
a built-in random number generator. Most platforms provide a function
returning random values uniformly distributed over the range of signed in-
tegers on that platform. In a standard signed integer representation, the
negative values comprise exactly half the range. The FlipCoin function
can therefore generate a random integer and return heads iff that integer is
negative.
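For instance, FlipCoin might be sketched in Python as follows (a sketch using a random bit from the standard library rather than the sign of a platform-specific random integer):

import random

def flip_coin():
    """Return 'heads' or 'tails' with equal probability.

    This mimics testing the sign of a uniformly distributed random signed
    integer, here using a single random bit from Python's generator instead.
    """
    return "heads" if random.getrandbits(1) == 1 else "tails"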
It usually makes no sense to analyze the worst-case running time for a
randomized algorithm, because the running time usually depends on random
events. For example, if a given heap consists of a single path with n nodes,
the algorithm could follow exactly that path. However, this could only
happen for one particular sequence of n coin flips. If any of the flips differ
from this sequence, the algorithm reaches a base case and terminates at that
point. Because the probability of flipping this exact sequence is very small
for large n, a worst-case analysis seems inappropriate. Perhaps more to the
point, a worst-case analysis would ignore the effect of randomization, and
so does not seem appropriate for a randomized algorithm.
Instead, we can analyze the expected running time of a randomized al-
gorithm. The goal of expected-case analysis is to bound the average perfor-
mance over all possible executions on a worst-case input. For an ordinary
deterministic algorithm, there is only one possible execution on any given
input, but for randomized algorithms, there can be many possible executions
depending on the random choices made.
Expected-case analysis is based on the expected values of random variables
over discrete probability spaces. A discrete probability space is a countable
set of elementary events, each having a probability. (A set is said to be
countable if each element can be labeled with a unique natural number.)
For an elementary event e in a discrete probability space S, we denote the
probability of e by P(e). For any discrete probability space S, we require
that 0 ≤ P(e) ≤ 1 and that

    ∑_{e∈S} P(e) = 1.
As a simple example, consider the flipping of a fair coin. The probability
space is {heads, tails}, and each of these two elementary events has probabil-
ity 1/2. For a more involved example, let T be a binary tree, and consider
the probability space PathT consisting of paths from the root of T to empty
subtrees. We leave as an exercise to show that if T has n nodes, then it
has n + 1 empty subtrees; hence PathT has n + 1 elements. In order that
it be a probability space, we need to assign a probability to each path. The
probability of a given path of length k should be the same as the probability
of the sequence of k coin flips that yields this path in the Merge algorithm, namely 2^{−k}.
Thus, by multiplying the value of the random variable for each elementary
event by the probability of that elementary event, we obtain an average
value for that variable. Note that it is possible for an expected value to
be infinite. If the summation converges, however, it converges to a unique
value, because all terms are nonnegative.
Example 5.4 Let T be a binary tree with n nodes, such that all paths from
the root to empty subtrees have the same length. Because the probability of
each path is determined solely by its length, all paths must have the same
probability. Because there are n + 1 paths and the sum of their probabilities
is 1, each path must have probability 1/(n + 1). In this case, E[lenT ] is
simply the arithmetic mean, or simple average, of all of the lengths:
    E[len_T] = ∑_{e∈Path_T} len_T(e)·P(e)
             = (1/(n + 1)) ∑_{e∈Path_T} len_T(e).
Furthermore, because the lengths of all of the paths are the same, E[lenT ]
must be this length, which we will denote by k.
We have defined the probability of a path of length k to be 2−k . Fur-
thermore, we have seen that all probabilities are 1/(n + 1). We therefore
have
2−k = 1/(n + 1).
Solving for k, we have
k = lg(n + 1).
Thus, E[lenT ] = lg(n + 1).
Note that because the sum of the probabilities of all elementary events in
a discrete probability space is 1, the probability of an event is never more
than 1.
The following theorem gives a technique for computing expected values
of discrete random variables that range over the natural numbers. It uses
predicates like “f = i” to describe events; e.g., the predicate “f = i” defines
the event in which f has the value i, and P (f = i) is the probability of this
event.
In the above sum, the negative portion iP (f ≥ i + 1) of the ith term cancels
most of the positive portion (i + 1)P (f ≥ i + 1) of the (i + 1)st term.
The result of this cancellation is the desired sum. However, in order for
this reasoning to be valid, it must be the case that the “leftover” term,
−iP (f ≥ i + 1), converges to 0 as i approaches infinity if E[f ] is finite. We
leave the details as an exercise.
Example 5.6 Let T be a binary tree in which each of the n nodes has
an empty left child; i.e., the nodes form a single path going to the right.
Again, the size of PathT is n + 1, but now the probabilities are not all the
same. The length of the path to the rightmost empty subtree is n; hence,
its probability is 2−n . For 1 ≤ i ≤ n, there is exactly one path that goes
right i − 1 times and left once. The probabilities for these paths are given
by 2−i . We therefore have
    E[len_T] = ∑_{e∈Path_T} len_T(e)·P(e)
             = n·2^{−n} + ∑_{i=1}^{n} i·2^{−i}.
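As a quick numeric check of this expression (a small Python computation, not part of the text's argument), the expected length stays below 2 for every n, which is consistent with the bound of lg(n + 1) proved below:

def expected_len_right_chain(n):
    """E[len_T] for a tree whose n nodes form a single path to the right."""
    return n * 2**-n + sum(i * 2**-i for i in range(1, n + 1))

for n in (1, 2, 4, 16, 64):
    print(n, expected_len_right_chain(n))   # 1.0, 1.5, 1.875, ... approaching 2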
Theorem 5.7 states that for any positive real numbers x and y,

    lg x + lg y ≤ 2 lg(x + y) − 2.

To see this, note that 2 lg(x + y) − 2 = lg((x + y)^2/4). In order to isolate lg xy,
let us now subtract xy from the fraction inside the logarithm and add it back. This yields

    2 lg(x + y) − 2 = lg((x^2 + 2xy + y^2)/4)
                    = lg(xy + (x^2 − 2xy + y^2)/4)
                    = lg(xy + (x − y)^2/4)
                    ≥ lg xy,

because (x − y)^2 ≥ 0 and lg is nondecreasing; the claim follows because lg xy = lg x + lg y.
We can now show that lg(n + 1) is an upper bound for E[lenT ] when T
is a binary tree with n nodes.
Theorem 5.8 Let T be any binary tree with size n, where n ∈ N. Then
E[lenT ] ≤ lg(n + 1).
Proof: By induction on n.
Base: n = 0. Then only one path to an empty tree exists, and its length is
0. Hence, E[lenT ] = 0 = lg 1.
because the probability of any path from the root of a child of T to any
empty subtree is twice the probability of the path from the root of T to the
same empty subtree, and its length is one less.
Because the two sums in (5.1) are similar, we will simplify just the first
one. Thus,
    ∑_{e∈Path_L} (len_L(e) + 1)·P(e)/2
        = (1/2) ( ∑_{e∈Path_L} len_L(e)·P(e) + ∑_{e∈Path_L} P(e) )
        = (1/2) ( ∑_{e∈Path_L} len_L(e)·P(e) + 1 ),
The fact that the expected length of a randomly chosen path in a binary
tree of size n is never more than lg(n + 1) gives us reason to believe that the
expected running time of RandomizedHeap.Merge is in O(lg n). How-
ever, Merge operates on two binary trees. We therefore need a bound on
the expected sum of the lengths of two randomly chosen paths, one from
each of two binary trees. Hence, we will combine two probability spaces
PathS and PathT to form a new discrete probability space Paths S,T . The
elementary events of this space will be pairs consisting of an elementary
event from PathS and an elementary event from PathT .
We need to assign probabilities to the elementary events in Paths S,T . In
so doing, we need to reflect the fact that the lengths of any two paths from
S and T are independent of each other; i.e., knowing the length of one path
tells us nothing about the length of the other path. Let e1 and e2 be events
over a discrete probability space S. We say that e1 and e2 are independent
if P (e1 ∩ e2 ) = P (e1 )P (e2 ).
Suppose we were to define a new discrete probability space Se2 including
only those elementary events in the event e2 . The sum of the probabilities
of these elementary events is P (e2 ). If we were to scale all of these proba-
bilities by dividing by P (e2 ), we would achieve a total probability of 1 while
preserving the ratio of any two probabilities. The probability of event e1
within Se2 would be given by
    P(e1 | e2) = P(e1 ∩ e2)/P(e2),    (5.2)
where the probabilities on the right-hand side are with respect to S. We call
P (e1 | e2 ) the conditional probability of e1 given e2 . Note that if P (e2 ) ≠
0, independence of e1 and e2 is equivalent to P (e1 ) = P (e1 | e2 ). Thus,
two events are independent if knowledge of one event does not affect the
probability of the other.
The definition of independence tells us how to assign the probabilities
in Paths S,T . Let e1 be the event such that the path in S is s, and let e2
be the event such that the path in T is t. Then e1 ∩ e2 is the elementary
event consisting of paths s and t. We need P (e1 ∩ e2 ) = P (e1 )P (e2 ) in
order to achieve independence. However, P (e1 ) should be the probability
of s in PathS , and P (e2 ) should be the probability of t in PathT . Thus the
probability of an elementary event in Paths S,T must be the product of the
probabilities of the constituent elementary events from PathS and PathT .
It is then not hard to verify that P (e1 ) and P (e2 ) are the probabilities of s
in PathS and of t in PathT , respectively.
We now extend the discrete random variables lenS and lenT to the space
Paths S,T so that lenS gives the length of the path in S and lenT gives
the length of the path in T . Because neither the lengths of the paths nor
their probabilities change when we make this extension, it is clear that their
expected values do not change either.
The running time of RandomizedHeap.Merge is clearly proportional
to the lengths of the paths followed in the two heaps S and T . These paths
may or may not go all the way to an empty subtree, but if not, we can extend
them to obtain elementary events s and t in PathS and PathT , respectively.
The running time is then bounded above by c(lenS (s) + lenT (t)), where
c is some fixed positive constant. The expected running time of Merge
is therefore bounded above by E[c(lenS + lenT )]. In order to bound this
expression, we need the following theorem.
and

    E[ ∑_{i=0}^{∞} h_i ] = ∑_{i=0}^{∞} E[h_i].
By this linearity property and Theorem 5.8, E[c(len_S + len_T)] = c(E[len_S] + E[len_T]) ≤
c(lg(|S| + 1) + lg(|T| + 1)), where |S| and |T| denote the sizes of S and T, respectively. Thus, the
expected running time of Merge is in O(lg n), where n is the total number
of nodes in the two heaps. It follows that the expected running times of
Put and RemoveMax are also in O(lg n).
A close examination of Example 5.4 reveals that the bound of lg(n + 1)
on E[lenT ] is reached when n + 1 is a power of 2. Using the fact that lg is
smooth, we can then show that the expected running time of Merge is in
Ω(lg n); the details are left as an exercise. Thus, the expected running times
of Put and RemoveMax are in Θ(lg n).
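The RandomizedHeap.Merge operation analyzed above (the book's Figure 5.12) is not reproduced in this excerpt. The following Python sketch captures the idea assumed by this analysis (hypothetical names; plain keys stand in for Keyed items): the larger root is kept, and a fair coin flip decides which of its children the recursive merge descends into, so the nodes visited follow a randomly chosen path in each heap.

import random

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def merge(a, b):
    """Randomized meldable-heap merge: descend into a randomly chosen child."""
    if a is None:
        return b
    if b is None:
        return a
    if a.key < b.key:
        a, b = b, a
    if random.getrandbits(1):          # heads: recurse into the left child
        a.left = merge(a.left, b)
    else:                              # tails: recurse into the right child
        a.right = merge(a.right, b)
    return a

def put(heap, key):
    return merge(heap, Node(key))

def remove_max(heap):
    return heap.key, merge(heap.left, heap.right)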
[Figure: An array whose initial segment is an unsorted part used as a priority queue and whose final segment is sorted: (a) the initial arrangement; (b) the maximum, 55, is removed from the priority queue; (c) 55 is placed at the front of the sorted part.]
[Figure: A heap whose 10 nodes are numbered 1 through 10 level by level, left to right; the node numbered i has its children numbered 2i and 2i + 1 and its parent numbered ⌊i/2⌋.]
or exactly twice the index of its parent. Likewise, if x has a right child, its
index is 1 greater than that of y.
As a result of these relationships, we can use simple calculations to find
either child or the parent of a node at a given location. Specifically, the left
and right children of the element at location i are the elements at locations
2i and 2i + 1, respectively, provided they exist. Furthermore, the parent of
the element at location i > 1 is at location ⌊i/2⌋.
Let us consider how we can implement a binary heap as a data structure.
We will use two representation variables:
• elements: an array of Keyed items; and
• size: a Nat.
We allow read access to size. For reasons that will become clear shortly,
elements[0] will act as a sentinel element, and will have as its key the maximum
allowable value. (A sentinel element is an extra element added to a data
structure in order to indicate when a traversal of that structure has reached
the end.) For convenience, we will use a constant sentinel to represent such
a data item. Note that because ⌊1/2⌋ = 0, we can treat elements[0] as if it
were the parent of elements[1].
The structural invariant will be:
• size ≤ SizeOf(elements);
• elements[0] = sentinel; and
• elements[1..size] forms a heap.
Unfortunately, the algorithms for merging heaps don’t work for binary
heaps because they don’t maintain the balanced shape. Therefore, let us
consider how to insert an element x into a binary heap. If size is 0, then we
can simply make x the root. Otherwise, we need to compare x.Key() with
the key of the root. The larger of the two will be the new root, and we can
then insert the other into one of the children. We select which child based
on where we need the new leaf.
In this insertion algorithm, unless the tree is empty, there will always be
a recursive call. This recursive call will always be on the child in the path
that leads to the location at which we want to add the new node. Note
that the keys along this path from the root to the leaf are in nonincreasing
order. As long as the key to be inserted is smaller than the key to which it
is compared, it will be the inserted element in the recursive call. When it is
compared with a smaller key, that smaller key is used in the recursive call.
When this happens, the key passed to the recursive call will always be at
least as large as the root of the subtree in which it is being inserted; thus,
it will become the new root, and the old root will be used in the recursive
call. Thus, the entire process results in inserting the new key at the proper
point in the path from the root to the desired insertion location.
For example, suppose we wish to insert the priority 35 into the binary
heap shown in Figure 5.15(a). We first find the path to the next insertion
point. This path is h89, 32, 17i. The proper position of 35 in this path
is between 89 and 32. We insert 35 at this point, pushing the following
priorities downward. The result is shown in Figure 5.15(b).
Because we can easily find the parent of a node in a BinaryHeap, we
can implement this algorithm bottom-up by starting at the location of the
new leaf and shifting elements downward one level until we reach a location
where the new element will fit. This is where having a sentinel element is
convenient — we know we will eventually find some element whose key is at
least as large as that of x. The resulting algorithm is shown in Figure 5.16.
We assume that Expand(A) returns an array of twice the size of A, with
the elements of A copied to the first half of the returned array.
The RemoveMax operation is a bit more difficult. We need to remove
the root because it contains the element with maximum priority, but in or-
der to preserve the proper shape of the heap, we need to remove a specific
leaf. We therefore first save the value of the root, then remove the proper
leaf. We need to form a new heap by replacing the root with the removed
leaf. In order to accomplish this, we use the MakeHeap algorithm shown
in Figure 5.17. For ease of presentation, we assume t is formed with Bi-
naryTreeNodes, rather than with an array. If the key of x is at least as
[Figure 5.15: (a) a binary heap; (b) the result of inserting the priority 35, which takes the place of 32 on the insertion path while 32 and 17 are pushed down one level.]
BinaryHeap.Put(x, p : Number)
  size ← size + 1
  if size > SizeOf(elements)
    elements ← Expand(elements)
  i ← size; elements[i] ← elements[⌊i/2⌋]
  // Invariant: 1 ≤ i ≤ size, elements[1..size] forms a heap,
  // elements[0..size] contains the elements originally in
  // elements[0..size − 1], with elements[i] and elements[⌊i/2⌋]
  // being duplicates, and p > elements[j].Key() for
  // 2i ≤ j ≤ min(2i + 1, size).
  while p > elements[i].Key()
    i ← ⌊i/2⌋; elements[i] ← elements[⌊i/2⌋]
  elements[i] ← new Keyed(x, p)
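A Python rendering of this bottom-up insertion might look as follows (a sketch, not the book's BinaryHeap: the sentinel at index 0 is simulated with float('inf'), plain numeric priorities stand in for Keyed items, and Python's list growth replaces Expand):

import math

class MaxHeap:
    def __init__(self):
        self.elements = [math.inf]   # index 0 holds the sentinel
        self.size = 0

    def put(self, key):
        """Insert key by shifting smaller ancestors down one level each."""
        self.size += 1
        self.elements.append(None)       # the list grows as needed
        i = self.size
        while key > self.elements[i // 2]:
            self.elements[i] = self.elements[i // 2]
            i //= 2
        self.elements[i] = key

# Example usage of the sketch:
h = MaxHeap()
for p in (89, 32, 53, 65):
    h.put(p)
print(h.elements[1])   # 89, the maximum priority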
large as the keys of the roots of all children of t, we can simply replace the
root of t with x, and we are finished. Otherwise, we need to move the root
of the child with larger key to the root of t and make a heap from this child
and x. This is just a smaller instance of the original problem.
We can simplify MakeHeap somewhat when we use it with a binary
heap. First, we observe that once we have determined that at least one child
is nonempty, we can conclude that the left child must be nonempty. We also
observe that the reduction is a transformation to a smaller instance; i.e.,
MakeHeap is tail recursive. We can therefore implement it using a loop.
In order to simplify the statement of the loop invariant, we make use of the
fact that the entire tree is initially a heap, so that the precondition of Make-
Heap could be strengthened to specify that t is a heap. (Later we will use
BinaryHeap.RemoveMax()
  if size = 0
    error
  else
    m ← elements[1].Data(); size ← size − 1; i ← 1
    // Invariant: elements[1..size] forms a heap; 1 ≤ i ≤ size + 1;
    // elements[1..i − 1], elements[i + 1..size + 1], and m are
    // the elements in the original set;
    // elements[size + 1].Key() ≤ elements[⌊i/2⌋].Key();
    // and m has maximum key.
    while elements[i] ≠ elements[size + 1]
      j ← 2i
      if j > size
        elements[i] ← elements[size + 1]
      else
        if j < size and elements[j].Key() < elements[j + 1].Key()
          j ← j + 1
        if elements[j].Key() ≤ elements[size + 1].Key()
          elements[i] ← elements[size + 1]
        else
          elements[i] ← elements[j]; i ← j
    return m
[Figure: A binary heap before and after RemoveMax; the root 89 is removed, the last leaf 41 takes its place, and the heap is restored by promoting 65 and then 48.]
in which the parent of A[i] is A[⌊i/2⌋] for i > 1. With this view in mind, the
natural approach seems to be to make the children into heaps first, then use
MakeHeap to make the entire tree into a heap. The resulting algorithm is
easiest to analyze when the tree is completely balanced — i.e., when n + 1
is a power of 2. Let N = n + 1, and let f (N ) give the worst-case running
time for this algorithm. When N is a power of 2, we have
f (N ) ∈ 2f (N/2) + Θ(lg N ).
5.7 Summary
A heap provides a clean framework for implementing a priority queue. Al-
though LeftistHeaps yield Θ(lg n) worst-case performance for the opera-
tions Put and RemoveMax, the simpler SkewHeaps and Randomized-
Heaps yield O(lg n) amortized and Θ(lg n) expected costs, respectively, for
these operations. BinaryHeaps, while providing no asymptotic improve-
ments over LeftistHeaps, nevertheless tend to be more efficient in practice
because they require less dynamic memory allocation. They also provide the
basis for HeapSort, a Θ(n lg n) in-place sorting algorithm. A summary of
the running times of the PriorityQueue operations for the various imple-
mentations is shown in Figure 5.21.
HeapSort(A[1..n])
  // Invariant: A[1..n] is a permutation of its original elements such
  // that for 2(i + 1) ≤ j ≤ n, A[⌊j/2⌋] ≥ A[j].
  for i ← ⌊n/2⌋ to 1 by −1
    MakeHeap(A[1..n], i, A[i])
  // Invariant: A[1..n] is a permutation of its original elements such
  // that for 2 ≤ j ≤ i, A[⌊j/2⌋] ≥ A[j], and
  // A[1] ≤ A[i + 1] ≤ A[i + 2] ≤ · · · ≤ A[n].
  for i ← n to 2 by −1
    t ← A[i]; A[i] ← A[1]; MakeHeap(A[1..i − 1], 1, t)
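A runnable Python version of the same two-phase algorithm might look like this (a sketch using 0-based indexing, so the children of position i are 2i + 1 and 2i + 2; make_heap below plays the role of MakeHeap, sifting a value down into the subtree rooted at position i):

def make_heap(a, i, n):
    """Sift x = a[i] down so the subtree rooted at i within a[0..n-1] is a max-heap."""
    x = a[i]
    while 2 * i + 1 < n:
        j = 2 * i + 1                      # left child
        if j + 1 < n and a[j + 1] > a[j]:  # pick the larger child
            j += 1
        if x >= a[j]:
            break
        a[i] = a[j]
        i = j
    a[i] = x

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # phase 1: build the heap
        make_heap(a, i, n)
    for i in range(n - 1, 0, -1):          # phase 2: repeatedly remove the maximum
        a[0], a[i] = a[i], a[0]
        make_heap(a, 0, i)
    return a

print(heap_sort([55, 48, 52, 37, 41, 50, 70]))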
Figure 5.21 Running times for the PriorityQueue operations for various
implementations.
                              Put                   RemoveMax
SortedArrayPriorityQueue      Θ(n)                  Θ(1)
SimpleHeap                    Θ(n)                  Θ(1)
LeftistHeap                   Θ(lg n)               Θ(lg n)
SkewHeap                      O(lg n) amortized     O(lg n) amortized
RandomizedHeap                Θ(lg n) expected      Θ(lg n) expected
BinaryHeap                    Θ(lg n) amortized     Θ(lg n)
Notes:
• The constructor and the MaxPriority and Size operations all run
in Θ(1) worst-case time for all implementations.
5.8 Exercises
Exercise 5.1 Complete the implementation of SortedArrayPriority-
Queue shown in Figure 5.2 by adding a constructor and implementations
of the MaxPriority and RemoveMax operations. Prove that your im-
plementation meets its specification.
Exercise 5.2 Prove that SimpleHeap, shown in Figure 5.5, meets its spec-
ification.
Exercise 5.3 Show the result of first inserting the sequence of priorities
below into a leftist heap, then executing one RemoveMax.
Exercise 5.4 Prove that LeftistTree, shown in Figure 5.7, meets its
specification.
Exercise 5.5 Prove that LeftistHeap, shown in Figure 5.8, meets its
specification.
Exercise 5.7 Instead of keeping track of the null path lengths of each node,
a variation on LeftistTree keeps track of the number of nodes in each
subtree, and ensures that the left child has at least as many nodes as the right child.
We call this variation a LeftHeavyTree.
Precondition: true.
Postcondition: Returns a number representing a priority.
HasPriority.Priority()
Exercise 5.8 Repeat Exercise 5.3 using a skew heap instead of a leftist
heap.
Exercise 5.11 The goal of this exercise is to complete the analysis of the
amortized running times of the SkewHeap operations.
a. Prove by induction on n that in any binary tree T with n nodes, the
number of good nodes on its rightmost path is no more than lg(n + 1),
where the definition of a good node is as in Section 5.4.
b. Prove that in the SkewHeap.Merge operation (shown in Figure 5.9
on page 166) if L is initially a bad node, then it is a good node in the
resulting heap.
c. Given two skew heaps to be merged, let us define the potential of each
node to be 0 if the node is good, or 1 if the node is bad. Using the
results from parts a and b above, prove that the actual cost of the
Merge operation, plus the sum of the potentials of the nodes in the
resulting heap, minus the sum of potentials of the nodes in the two
original heaps, is in O(lg n) where n is the number of keys in the two
heaps together.
d. Using the result of part c, prove that the amortized running times of
the SkewHeap operations are in O(lg n), where n is the number of
nodes in the heap.
Exercise 5.13 Prove by induction on n that any binary tree with n nodes
has exactly n + 1 empty subtrees.
Exercise 5.15 The goal of this exercise is to prove Theorem 5.5. Let f :
S → N be a discrete random variable.
a. Prove by induction on n that

    ∑_{i=0}^{n} i·P(f = i) = ∑_{i=1}^{n} P(f ≥ i) − n·P(f ≥ n + 1).
Exercise 5.17 Let S be the set of all sequences of four flips of a fair coin,
where each sequence has probability 1/16. Let h be the discrete random
variable giving the number of heads in the sequence.
a. Compute E[h].
Exercise 5.18 Use Example 5.4 to show that the expected running time of
RandomizedHeap.Merge, shown in Figure 5.12, is in Ω(lg n) in the worst
case, where n is the number of elements in the two heaps combined.
Exercise 5.20 Repeat Exercise 5.3 using a binary heap instead of a leftist
heap. Show the result as both a tree and an array.
Exercise 5.21 Prove that HeapSort, shown in Figure 5.20, meets its spec-
ification.
Exercise 5.22 Prove that the first loop in HeapSort runs in Θ(n) time
in the worst case.
Exercise 5.23 Prove that HeapSort runs in Θ(n lg n) time in the worst
case.
Exercise 5.24 We can easily modify the Sort specification (Figure 1.2 on
page 6) so that instead of sorting numbers, we are sorting Keyed items in
nondecreasing order of their keys. HeapSort can be trivially modified to
meet this specification. Any sorting algorithm meeting this specification is
said to be stable if the resulting sorted array always has elements with equal
keys in the same order as they were initially. Show that HeapSort, when
modified to sort Keyed items, is not stable.
• if the job has already been executed for a < ei time units, then t +
ei − a ≤ di (i.e., the job can meet its deadline).
Note that this scheduling strategy may preempt jobs, and that it will discard
jobs that have been delayed so long that they can no longer meet their
deadlines. Give an algorithm to produce such a schedule, when given a
sequence of jobs ordered by ready time. Your algorithm should store the
ready jobs in an InvertedPriorityQueue. (You do not need to give an
implementation of InvertedPriorityQueue.) Show that your algorithm
operates in O(k lg n) time, where k is the length of the schedule and n is the
number of jobs. You may assume that k ≥ n and that Put and RemoveMin
both operate in Θ(lg n) time in the worst case.
Exercise 5.26 The game of craps consists of a sequence of rolls of two six-
sided dice with faces numbered 1 through 6. The first roll is known as the
come-out roll. If the come-out roll is a 7 or 11 (the sum of the top faces
of the two dice), the shooter wins. If the come-out roll is a 2, 3, or 12,
the shooter loses. Otherwise, the result is known as the point. The shooter
continues to roll until the result is either the point (in which case the shooter
wins) or a 7 (in which case the shooter loses).
a. For each of the values 2 through 12, compute the probability that any
single roll is that value.
b. A field bet can be made on any roll. For each dollar bet, the payout
is determined by the roll as follows:
• 2 or 12: $3 (i.e., the bettor pays $1 and receives $3, netting $2);
• 3, 4, 9, 10 or 11: $2;
• 5, 6, 7, or 8: 0.
c. A pass-line bet is a bet, placed prior to the come-out roll, that the
shooter will win. For each dollar bet, the payout for a win is $2,
whereas the payout for a loss is 0. Compute the expected payout for a
pass-line bet. [Hint: The problem is much easier if you define a finite
probability space, ignoring those rolls that don’t affect the outcome.
In order to do this you will need to use conditional probabilities (e.g.,
given that the roll is either a 5 or a 7, the probability that it is a 5).]
Chapter 6

Storage/Retrieval I: Ordered Keys
Precondition: true.
Postcondition: Completes without an error. May modify the state of x.
Visitor.Visit(x)
Precondition: true.
Postcondition: Constructs an empty Dictionary.
Dictionary()
Precondition: k is a Key.
Postcondition: Returns the element with key k, or nil if no item with key
k is contained in the set.
Dictionary.Get(k)
Precondition: x 6= nil, and k is a Key that is not associated with any
item in the set.
Postcondition: Adds x to the set with key k.
Dictionary.Put(x, k)
Precondition: k is a Key.
Postcondition: If there is an item with key k in the set, this item is
removed.
Dictionary.Remove(k)
Precondition: true.
Postcondition: Returns the number of items in the set.
Dictionary.Size()
Precondition: v is a Visitor.
Postcondition: Applies v.Visit(x) to every item x in the set in order of
their keys.
OrderedDictionary.VisitInOrder(v)
Example 6.1 Suppose we would like to print all of the data items in an in-
stance of OrderedDictionary in order of keys. We can accomplish this by
implementing Visitor so that its Visit operation prints its argument. Such
an implementation, Printer, is shown in Figure 6.4. Note that we have
strengthened the postcondition over what is specified in Figure 6.1. This is
allowable because an implementation with a stronger postcondition and/or
a weaker precondition is still consistent with the specification. Having de-
fined Printer, we can print the contents of the OrderedDictionary d
with the statement,
d.VisitInOrder(new Printer())
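The same idea can be sketched in Python (hypothetical names; the function below stands in for VisitInOrder and simply assumes the items are supplied in key order):

class Printer:
    """A visitor whose Visit operation simply prints its argument."""
    def visit(self, x):
        print(x)

def visit_in_order(items_in_key_order, visitor):
    # Stand-in for OrderedDictionary.VisitInOrder: the items are assumed
    # to be supplied already ordered by their keys.
    for item in items_in_key_order:
        visitor.visit(item)

visit_in_order(["apple", "berry", "cherry"], Printer())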
In order to implement OrderedDictionary, it is possible to store the
data items in a sorted array, as we did with SortedArrayPriorityQueue
in Section 5.1. Such an implementation has similar advantages and disad-
vantages to those of SortedArrayPriorityQueue. Using binary search,
we can find an arbitrary data item in Θ(lg n) time, where n is the number of
items in the dictionary. Thus, the Get operation can be implemented to run
in Θ(lg n) time. However, to add or remove an item requires Θ(n) time in
the worst case. Thus, Put and Remove are inefficient using such an imple-
mentation. The fact that the elements of an array are located contiguously
[Figure 6.5: A binary search tree containing the keys 13, 17, 23, 28, 35, 41, 54, 64, 71, 77, and 92, with 54 at the root.]
allows random access, which, together with the fact that the elements are
sorted, facilitates the fast binary search algorithm. However, it is exactly
this contiguity that causes updates to be slow, because in order to maintain
sorted order, elements must be moved to make room for new elements or to
take the place of those removed.
If we were to use a linked list instead of an array, we would be able
to change the structure without moving elements around, but we would no
longer be able to use binary search. The “shape” of a linked list demands
a sequential search; hence, look-ups will be slow. In order to provide fast
updates and retrievals, we need a linked structure on which we can approx-
imate a binary search.
If the key we are looking for is smaller than the key at the current node, we look in the left child, and if it is larger, we look
in the right child. Because this type of search approximates a binary search,
this structure is called a binary search tree.
More formally we define a binary search tree (BST) to be a binary tree
satisfying the following properties:
• If the BST is nonempty, then both of its children are BSTs, and the
key in its root node is greater than every key in its left child and less
than every key in its right child.
• size, which is an integer giving the number of data items in the set.
BSTDictionary()
  elements ← new BinaryTreeNode(); size ← 0

BSTDictionary.Get(k)
  return Find(k, elements).Root().Data()

BSTDictionary.Put(x, k)
  if x = nil
    error
  else
    t ← Find(k, elements)
    if t.Root() = nil
      size ← size + 1
      t.SetRoot(new Keyed(x, k))
      t.SetLeftChild(new BinaryTreeNode())
      t.SetRightChild(new BinaryTreeNode())

BSTDictionary.Remove(k)
  t ← Find(k, elements)
  if t.Root() ≠ nil
    size ← size − 1
    if t.LeftChild().Root() = nil
      Copy(t.RightChild(), t)
    else if t.RightChild().Root() = nil
      Copy(t.LeftChild(), t)
    else
      m ← t.RightChild()
      // Invariant: The smallest key in the right child of t is in the
      // subtree rooted at m.
      while m.LeftChild().Root() ≠ nil
        m ← m.LeftChild()
      t.SetRoot(m.Root()); Copy(m.RightChild(), m)

BSTDictionary.VisitInOrder(v)
  TraverseInOrder(elements, v)
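A compact Python sketch of the same kind of dictionary may make the structure easier to see (hypothetical names; it uses explicit child references rather than BinaryTreeNode and Find, ignores duplicate keys, and omits Remove):

class BSTNode:
    def __init__(self, key, data):
        self.key, self.data = key, data
        self.left = None
        self.right = None

class BSTDict:
    def __init__(self):
        self.root = None
        self.size = 0

    def get(self, key):
        t = self.root
        while t is not None:
            if key == t.key:
                return t.data
            t = t.left if key < t.key else t.right
        return None

    def put(self, data, key):
        def insert(t):
            if t is None:
                self.size += 1
                return BSTNode(key, data)
            if key < t.key:
                t.left = insert(t.left)
            elif key > t.key:
                t.right = insert(t.right)
            return t
        self.root = insert(self.root)

    def visit_in_order(self, visit):
        def traverse(t):
            if t is not None:
                traverse(t.left)
                visit(t.data)
                traverse(t.right)
        traverse(self.root)

# Example usage of the sketch:
d = BSTDict()
for key in (54, 23, 77, 35, 64):
    d.put("item" + str(key), key)
d.visit_in_order(print)   # prints the items in increasing key order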
Figure 6.8 The result of deleting 54 from the BST shown in Figure 6.5 — 54 is replaced by 64, which in turn is replaced by 71.
times. This can happen, for example, if all left children are empty, so that
elements refers to a BST that consists of a single chain of nodes going to
the right (see Figure 6.9). The worst-case running time is therefore in Θ(n).
[Figure 6.9: A BST consisting of a single chain of n nodes going to the right.]
the remaining code runs in Θ(1) time. However, setting up a recurrence de-
scribing the worst-case running time, including the recursion but excluding
calls to v.Visit, is not easy. We must make two recursive calls, but all we
know about the sizes of the trees in these calls is that their sum is one less
than the size of the entire tree.
Let us therefore take a different approach to the analysis. As we have
already argued, v.Visit is called exactly once for each data item. Further-
more, it is easily seen that, excluding the calls made on empty trees, v.Visit
is called exactly once in each call to TraverseInOrder. A total of exactly
n calls are therefore made on nonempty trees. The calls made on empty trees
make no further recursive calls. We can therefore obtain the total number
of recursive calls (excluding the initial call made from VisitInOrder) by
counting the recursive calls made by each of the calls on nonempty trees.
Because each of these calls makes two recursive calls, the total number of
recursive calls is exactly 2n. Including the initial call, the total number of
calls made to TraverseInOrder is 2n + 1. Because each of these calls
runs in Θ(1) time (excluding the time taken by v.Visit), the total time is
in Θ(n). Note that we cannot hope to do any better than this because the
specification requires that v.Visit be called n times.
[Figure 6.10: An AVL tree containing the keys 12, 25, 31, 34, 42, 47, 53, 55, 61, 74, 79, and 86, with 55 at the root.]
height 3 with even more nodes. Nevertheless, the children of each nonempty
subtree have heights differing by at most 1, so it is an AVL tree.
Before we begin designing an AVL tree implementation of Ordered-
Dictionary, let us first derive an upper bound on the height of an AVL
tree with n nodes. We will not derive this bound directly. Instead, we will
first derive a lower bound on the number of nodes in an AVL tree of height
h. We will then transform this lower bound into our desired upper bound.
Consider an AVL tree with height h having a minimum number of nodes.
By definition, both children of a nonempty AVL tree must also be AVL trees.
By definition of the height of a tree, at least one child must have height h−1.
By definition of an AVL tree, the other child must have height at least h − 2.
In order to minimize the number of nodes in this child, its height must be
exactly h − 2, provided h ≥ 1. Thus, the two children are AVL trees of
heights h − 1 and h − 2, each having a minimum number of nodes.
The above discussion suggests a recurrence giving the minimum number
of nodes in an AVL tree of height h. Let g(h) give this number. For h ≥ 1,
the numbers of nodes in the two children are g(h − 1) and g(h − 2), so that
g(h) = g(h − 1) + g(h − 2) + 1, (6.1)
where g(−1) = 0 (the number of nodes in an empty tree) and g(0) = 1 (the
number of nodes in a tree of height 0).
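A few values of g are easy to tabulate directly (a small Python check of recurrence (6.1), not part of the derivation); the printed values are consistent with the logarithmic height bound obtained below:

import math

def g(h):
    """Minimum number of nodes in an AVL tree of height h, with g(-1) = 0 and g(0) = 1."""
    a, b = 0, 1            # g(h-2), g(h-1)
    for _ in range(h):
        a, b = b, a + b + 1
    return b

for h in range(0, 21, 5):
    n = g(h)
    print(h, n, round(math.log2(n + 1), 2))   # h grows roughly like 1.44 lg n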
g2 (h) = g1 (2h)
= 2g1 (2h − 2) + 1
= 2g1 (2(h − 1)) + 1
= 2g2 (h − 1) + 1.
g2 then fits the form of Theorem 3.31. Applying this theorem, we obtain
g2(h) ∈ Θ(2^h). Thus, for sufficiently large h, there is a positive real number
c1 such that

    g1(2h) = g2(h) ≥ c1·2^h;

hence, for sufficiently large even h,

    g1(h) ≥ c1·2^{h/2}.

For odd h,

    g1(h) ≥ g1(h − 1)
          ≥ c1·2^{(h−1)/2}    (because h − 1 is even)
          = (c1/√2)·2^{h/2},
so that for some positive real number c2 and all sufficiently large h,

    c2·2^{h/2} ≤ g(h),

and therefore

    h ≤ 2(lg g(h) − lg c2) ∈ O(lg g(h)).
Theorem 6.4 The worst-case height of an AVL tree is in Θ(lg n), where n
is the number of nodes.
By Theorem 6.4, if we can design operations that run in time linear in
the height of an AVL tree, these operations will run in time logarithmic in
the size of the data set. Certainly, adding or deleting a node will change the
heights of some of the subtrees in an AVL tree; hence, these operations must
re-establish balance. Computing the height of a binary tree involves finding
the longest path, which apparently requires examining the entire tree. How-
ever, we can avoid recomputing heights from scratch if we record the height
of each subtree. If the heights of both children are known, computing the
height of the tree is straightforward.
We therefore define the data type AVLNode, which is just like Binary-
TreeNode, except that it has an additional representation variable, height.
This variable is used to record the height of the tree as an integer. As for the
other three variables, we allow read/write access to height. The constructor
[Figure 6.11: A single rotate right; node b is promoted to the root of the subtree formerly rooted at d.]
single nodes, and triangles denote arbitrary subtrees, which in some cases
may be empty. All nodes and subtrees are labeled in a way that corresponds
to the order of nodes in a BST (e.g., subtree c is to the right of node b and
to the left of node d). The rotation shown is known as a single rotate
right. It is accomplished by promoting node b to the root, then filling in the
remaining pieces in the only way that maintains the ordering of keys in the
BST. Suppose that in the “before” picture, the right child (e) has height h
and the left child has height h + 2. Because the left child is an AVL tree,
one of its two children has a height of h + 1 and the other has a height of
either h or h + 1. Suppose subtree a has a height of h + 1. Then it is easily
seen that this rotation results in an AVL tree.
The rotation shown in Figure 6.11 does not restore balance, however, if
subtree a has height h. Because the left child in the “before” picture has
height h+2, subtree c must have height h+1 in this case. After the rotation,
the left child has height h, but the right child has height h + 2. To take care
of this case, we need another kind of rotation called a double rotate right,
shown in Figure 6.12. It is accomplished by promoting node d to the root
and again filling in the remaining pieces in the only way that maintains the
ordering of keys. Suppose that subtrees a and g have height h and that the
subtree rooted at d in the “before” picture has height h + 1. This is then the
case for which a single rotate fails to restore balance. Subtrees c and e may
have heights of either h or h − 1 (though at least one must have height h).
It is therefore easily seen that following the rotation, balance is restored.
These two rotations handle the cases in which the left child has height
2 greater than the right child. When the right child has height 2 greater
than the left child a single rotate left or a double rotate left may be applied.
These rotations are simply mirror images of the rotations shown in Figures 6.11 and 6.12.
[Figure 6.12: A double rotate right; node d is promoted to the root of the subtree formerly rooted at f.]
Example 6.5 Suppose we were to insert the key 39 into the AVL tree shown
in Figure 6.14(a). Using the ordinary BST insertion algorithm, 39 should
be made the right child of 35, as shown in Figure 6.14(b). To complete
the insertion, we must check the balance along the path to 39, starting at
the bottom. Both 35 and 23 satisfy the balance criterion; however, the left
child of 42 has height 2, whereas the right child has height 0. We therefore
perform a double rotate right at 42, yielding the AVL tree shown in Figure 6.14(c).
[Figure 6.14: (a) the original tree; (b) 39 is inserted as in an ordinary BST; (c) a double rotate right is done at 42.]
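The rotations themselves involve only a few reference updates followed by height updates. The following Python sketch (hypothetical names; heights stored in each node as described for AVLNode) shows a single rotate right, its mirror image, and a double rotate right built from the two; the short example at the end repairs the subtree rooted at 42 from Example 6.5.

class AVLNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 0
        self.update_height()

    def update_height(self):
        def h(t):
            return t.height if t else -1   # an empty subtree has height -1
        self.height = 1 + max(h(self.left), h(self.right))

def rotate_right(t):
    """Promote t's left child; returns the new root of the subtree."""
    b = t.left
    t.left = b.right
    b.right = t
    t.update_height()
    b.update_height()
    return b

def rotate_left(t):
    """Mirror image of rotate_right."""
    b = t.right
    t.right = b.left
    b.left = t
    t.update_height()
    b.update_height()
    return b

def double_rotate_right(t):
    """Rotate the left child left, then rotate t right."""
    t.left = rotate_left(t.left)
    return rotate_right(t)

# Rebuild the subtree rooted at 42 from Figure 6.14(b) and repair it as in Example 6.5:
t = AVLNode(42, AVLNode(23, AVLNode(11), AVLNode(35, right=AVLNode(39))), AVLNode(50))
t = double_rotate_right(t)     # 35 becomes the root of this subtree, as in Figure 6.14(c)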
f (h) ∈ f (h − 1) + Θ(1)
[Figure 6.15: A zig-zig right; node a is promoted to the root, and every descendant of a moves at least one level closer to the root.]
ture. Notice that in Figure 6.11, nodes in subtree c do not get any closer
to the root as a result of the rotation. As a result, we need a new kind of
double rotation that can be applied when the node to be promoted is not a
“zig-zag” from its grandparent. So that we might distinguish between the
various double rotations, we will refer to the rotation of Figure 6.12 as a
zig-zag right, and to its mirror image as a zig-zag left. A zig-zig right is
shown in Figure 6.15. Note that by this rotation, the distance between the
root and any descendant of a is decreased by at least 1.
Our representation, interpretation, and structural invariant will be the
same as for BSTDictionary. The only differences will occur in the actual
implementations of the operations. In fact, the implementation of VisitIn-
Order will also be the same as for BSTDictionary.
Let us consider how we can implement a Find function. First, we observe
that no value needs to be returned, because if the key we are looking for
exists, we will bring it to the root of the tree. Hence, after invoking the
Find function, the Get operation only needs to look in the root to see if
the desired key is there. Second, we don’t want to bring a node representing
an empty subtree to the root. For this reason, we will need to verify that a
node is nonempty at some point before rotating it to the root. It therefore
seems reasonable to include as part of the precondition that the tree is
nonempty.
We therefore begin by comparing the given key k to the key at the root
of the given tree t. If the keys don’t match, we will need to look in the
appropriate child, after verifying that it is nonempty. However, we want to
Figure 6.16 The Find internal function for the SplayDictionary imple-
mentation of OrderedDictionary.
Figure 6.18 The splay tree after the calls to Find and FindMin in Remove
SplayDictionary.Remove(k)
  if elements.Root() ≠ nil
    Find(k, elements)
    if elements.Root().Key() = k
      size ← size − 1
      l ← elements.LeftChild()
      r ← elements.RightChild()
      if r.Root() = nil
        elements ← l
      else
        FindMin(r); r.SetLeftChild(l); elements ← r
[Figure: A zig-zag rotation; node b is promoted to the root. T_a, T_b, T_c denote the subtrees rooted at a, b, c before the rotation, and T_a′, T_b′, T_c′ the corresponding subtrees afterward.]
[Figure: A zig-zig rotation; node a is promoted to the root. T_a, T_b, T_c and T_a′, T_b′, T_c′ label the subtrees before and after the rotation.]
cost (i.e., 1) plus the change in the potential function Φ. Noting that |Tb′ | =
|Tc |, we conclude that the change in Φ is
[Figure 6.22: A single rotation; node a is promoted to the root. T_a, T_b and T_a′, T_b′ label the subtrees before and after the rotation.]
In order to get a tight bound for this expression in terms of lg |Tc | − lg |Ta |,
we need to be a bit more clever. We would again like to use Theorem 5.7.
Note that |Ta | + |Tc′ | ≤ |Tc |; however, lg |Ta | + lg |Tc′ | does not occur in (6.6).
Let us therefore both add and subtract lg |Ta | to (6.6). Adding in the actual
cost, applying Theorem 5.7, and simplifying, we obtain the following bound
on the amortized cost of a zig-zig rotation:
lg |Tb′ | + lg |Tc′ | + lg |Ta | − 2 lg |Ta | − lg |Tb | + 1
≤ lg |Tb′ | + 2 lg |Tc | − 2 − 2 lg |Ta | − lg |Tb | + 1
≤ 3 lg |Tc | − 3 lg |Ta | − 1
= 3(lg |Tc | − lg |Ta |) − 1. (6.7)
Finally, let us analyze the amortized cost of a single rotate. We refer to
Figure 6.22 for this analysis. Clearly, the amortized cost is bounded by
lg |Tb′ | − lg |Ta | + 1 ≤ lg |Tb | − lg |Ta | + 1. (6.8)
Because each operation will do at most two single rotations (recall that a
deletion can do a single rotation in both the Find and the FindMin), the
“+ 1” in this bound will not cause problems.
We can now analyze the amortized cost of a Find. We first combine
bounds (6.5), (6.7), and (6.8) into a single recurrence defining a function
f (k, t) bounding the amortized cost of Find(k, t). Suppose Find(k, t) makes
a recursive call on a subtree s and performs a double rotation. We can then
combine (6.5) and (6.7) to define:
f (k, t) = 3(lg |t| − lg |s|) + f (k, s).
For the base of the recurrence, suppose that either no rotation or a single
rotate is done. Using (6.8), we can define

    f(k, t) = lg |t| − lg |s| + 1,

where s is the subtree whose root is rotated to the root of t. Clearly, f (k, t) ∈
O(lg n), where n is the number of nodes in t. The amortized cost of Find,
and hence of Get, is therefore in O(lg n).
The analysis of Put is identical to the analysis of Find, except that we
must also account for the change in Φ when the new node is added to the
tree. When the new node is added, prior to any subsequent rotations, it is
a leaf. Let s denote the empty subtree into which the new leaf is inserted.
The insertion causes each of the ancestors of s, including s itself, to increase
in size by 1. Let t be one of these ancestors other than the root, and let t′ be
the same subtree after the new node is inserted. Note that t′ has no more
nodes than does the parent of t. If we think of the insertion as replacing the
parent of t by t′ , then this replacement causes no increase in Φ. The only
node for which this argument does not apply is the root. Therefore, the
increase in Φ is no more than lg(n + 1), where n is the number of nodes in
the tree prior to the insertion. The entire amortized cost of Put is therefore
in O(lg n).
Finally, let us consider the Remove operation. The Find has an amor-
tized cost in O(lg n). Furthermore, the amortized analysis of Find also
applies to FindMin, so that it is also in O(lg n). Finally, it is easily seen
that the actual removal of the node does not increase Φ. The amortized cost
of Remove is therefore in O(lg n) as well.
[Figure 6.23: A skip list containing the keys 12, 26, 35, 48, 63, 66, and 73, with sentinel keys −∞ and ∞; higher-level links skip over portions of the list.]
additional references to skip over portions of the list (see Figure 6.23). Using
these additional references, a binary search can be approximated.
The main building block for a skip list is the data type SkipListNode,
which represents a data item, its key, a level n ≥ 1, and a sequence of n val-
ues, each of which is either a SkipListNode or empty. The representation
consists of three variables:
• key: a Key;
• data: a data item; and
• links: an array of SkipListNodes (with nil entries representing empty).
We interpret data as the represented data item, key as its associated key,
SizeOf(links) as the level of the SkipListNode, and links[i] as the ith
element of the sequence, where empty is represented by nil. We allow read
access to key and data. The complete implementation is shown in Figure
6.24.
We represent the OrderedDictionary with four variables:
• start: a SkipListNode;
• end: a SkipListNode;
• maxLevel: a Nat; and
• size: a Nat.
Precondition: true.
Postcondition: Returns the level.
SkipListNode.Level()
return SizeOf(links)
We interpret the represented set to be the data items in the linked list
beginning with start and ending with end, using the variables links[1] to
obtain the next element in the list; the data items in start and end are
excluded from the set.
Our structural invariant is:
• Both start and end have a level of M ≥ maxLevel.
At this point, let us observe that a worst-case input must have x ≠ nil and k
as a new key, not already in the set. On any such input, the overall running
time is the sum of the running times of the above five parts. By the linearity
SkipListDictionary()
  start ← new SkipListNode(nil, minKey, 100)
  end ← new SkipListNode(nil, maxKey, 100)
  for i ← 1 to 100
    start.SetLink(i, end)
  size ← 0; maxLevel ← 1

SkipListDictionary.Put(x, k)
  if x = nil
    error
  else
    l ← 1
    while FlipCoin() = heads
      l ← l + 1
    p ← Find(k, l)
    if p[1].Link(1).Key() ≠ k
      maxLevel ← Max(maxLevel, l); q ← new SkipListNode(x, k, l)
      for i ← 1 to l
        q.SetLink(i, p[i].Link(i)); p[i].SetLink(i, q)
      size ← size + 1
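The same insertion can be sketched in ordinary Python (a simplification of the book's SkipListDictionary: a fixed maximum level of 32, sentinel keys float('-inf') and float('inf'), and the predecessor array computed inline rather than by a separate Find):

import random

MAX_LEVEL = 32

class SkipListNode:
    def __init__(self, data, key, level):
        self.data, self.key = data, key
        self.links = [None] * level       # links[i] is the successor at level i + 1

class SkipList:
    def __init__(self):
        self.start = SkipListNode(None, float('-inf'), MAX_LEVEL)
        self.end = SkipListNode(None, float('inf'), MAX_LEVEL)
        for i in range(MAX_LEVEL):
            self.start.links[i] = self.end
        self.size = 0
        self.max_level = 1

    def put(self, data, key):
        level = 1
        while random.getrandbits(1):      # flip coins to choose the level
            level += 1
        level = min(level, MAX_LEVEL)
        # Find the predecessor of key at each of the first `level` levels.
        pred = [None] * level
        p = self.start
        for i in range(max(self.max_level, level) - 1, -1, -1):
            while p.links[i].key < key:
                p = p.links[i]
            if i < level:
                pred[i] = p
        if pred[0].links[0].key != key:   # key not already present
            self.max_level = max(self.max_level, level)
            q = SkipListNode(data, key, level)
            for i in range(level):
                q.links[i] = pred[i].links[i]
                pred[i].links[i] = q
            self.size += 1

# Example usage of the sketch:
sl = SkipList()
for k in (26, 12, 73, 48, 63, 35, 66):
    sl.put(str(k), k)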
Theorem 6.7 For any real numbers a and c such that c > 1,
    ∑_{i=0}^{∞} c^{a−i} = c^{a+1}/(c − 1).
Proof:
    ∑_{i=0}^{∞} c^{a−i} = lim_{n→∞} ∑_{i=0}^{n} c^{a−i}
                        = c^a · lim_{n→∞} ∑_{i=0}^{n} (1/c)^i
                        = c^a · lim_{n→∞} ((1/c)^{n+1} − 1)/((1/c) − 1)    from (2.2)
                        = c^a · lim_{n→∞} (c − (1/c)^n)/(c − 1)
                        = c^{a+1}/(c − 1),

because 1/c < 1.
We now define the discrete random variable len over Seq such that len(e)
is the length of the sequence of flips. Note that E[len] gives us the expected
number of times the while loop condition is tested, as well as the expected
final value of l. As a result, it also gives us the expected number of iterations
of the for loop, provided k is not already in the set.
If S is a set of events such that, for every subset T of S,

    P(⋂_{e∈T} e) = ∏_{e∈T} P(e),

where ∏_{e∈T} P(e) denotes the product of the probabilities P(e) for all events e in T,
then we say the events in S are mutually independent. In other words, the
probability that all of the events in T occur is the product of the probabilities
for each of these events. We leave as an exercise to show that pairwise
independence does not necessarily imply mutual independence, even for
3-element sets of events.
Returning to Seq_n, let len_ij denote the event that component i has length
j, for 1 ≤ i ≤ n, j > 1. In any set {len_{i1 j1}, . . . , len_{im jm}} with 2 ≤ m ≤ n
and all the i_k s different, the events should be mutually independent. Furthermore,
in order to be consistent with Seq, we want P(len_ij) = 2^{−j} for
1 ≤ i ≤ n and j > 1. We can satisfy these constraints by setting the probability
of elementary event ⟨e_1, . . . , e_n⟩ to the product of the probabilities in
Seq of e_1, . . . , e_n; i.e.,

    P(⟨e_1, . . . , e_n⟩) = ∏_{i=1}^{n} 2^{−len(e_i)}.
• greater than the largest key less than k at any level j > i (or −∞ if
there is no such key).
Example 6.8 Let e represent the skip list shown in Figure 6.23 on page
224. Then
• tail 1 (e) = 0 because there are no level-1 nodes following the last node
with level greater than 1;
• tail 2 (e) = 2 because there are 2 level-2 nodes following the last node
with level greater than 2; and
• tail 3 (e) = 1 because there is 1 level-3 node, and there are no nodes
with level greater than 3.
Suppose e describes some skip list with n elements, and suppose this
skip list’s Find function is called with a key larger than any in the list. The
running time of Find is then proportional to the number of times the while
loop condition is tested. On iteration i of the for loop, the while loop will
iterate exactly tail i (e) times, but will be tested tail i (e) + 1 times, including
the test that causes the loop to terminate. The expected running time of
Find on a worst-case input is therefore proportional to:
    E[ ∑_{i=1}^{max(maxLevel,l)} (tail_i + 1) ]
        = E[ ∑_{i=1}^{max(maxLevel,l)} tail_i + max(maxLevel, l) ]
        = E[ ∑_{i=1}^{max(maxLevel,l)} tail_i ] + E[max(maxLevel, l)].    (6.9)
Let us first consider the first term in (6.9). It is tempting to apply lin-
earity of expectation to this term; however, note that maxLevel is a random
variable, as its value depends on the levels of the nodes in the skip list. The-
orem 5.9 therefore does not apply to this term. In particular, note that for
any positive n and i, there is a non-zero probability that there is at least one
node at level i; hence, there is a non-zero probability that tail i is positive.
The proper way to handle this kind of a summation, therefore, is to
convert it to an infinite sum. The term inside the summation should be
equal to tail i when i ≤ max(maxLevel, l), but should be 0 for all larger i. In
this case, it is easy to derive such a term, as tail i = 0 when i > maxLevel.
We therefore have:
    E[ ∑_{i=1}^{max(maxLevel,l)} tail_i ] = E[ ∑_{i=1}^{∞} tail_i ]
                                          = ∑_{i=1}^{∞} E[tail_i].    (6.10)
By Theorem 5.5,

    E[tail_i] = ∑_{j=1}^{∞} P(tail_i ≥ j).
Suppose that there are at least j components with length at least i. In order
for tail i ≥ j, the ith coin flip in each of the last j of these components must
be tails. The probability that j independent coin flips are all tails is 2−j .
However, this is not the probability that tail i ≥ j, but rather the conditional
probability given that there are at least j components with length at least
i. Let numi denote the number of components whose length is at least i.
We then have
P (tail i ≥ j | numi ≥ j) = 2−j .
Fortunately, this conditional probability is closely related to P (tail i ≥ j).
Specifically, in order for tail i ≥ j, it must be the case that numi ≥ j. Thus,
the event tail i ≥ j is a subset of the event numi ≥ j. Therefore, from (5.2)
we have
For an event e, let I(e) denote the indicator random variable for e: I(e) has the value 1 for elementary events in e, and the value 0 for all other elementary events.
Theorem 6.9 Let e be any event in a discrete probability space. Then
E[I(e)] = P (e).
Applying the above theorem, we obtain

    E[num_i] = E[ ∑_{j=1}^{n} I(len ≥ i) ]
             = E[n·I(len ≥ i)]
             = n·E[I(len ≥ i)]
             = n·P(len ≥ i)
             = n·2^{1−i}.
Clearly, E[numi ] > 1 iff i < 1+lg n. This suggests that maxLevel should
typically be about lg n (however, this is not a proof of the expected value
of maxLevel). Because we already know that the while loop is expected
to iterate no more than once for each level, this suggests that the overall
running time is logarithmic in n (assuming l is sufficiently small). While we
don’t have quite enough yet to show this, we can now show a logarithmic
bound on the first term in (6.9):
    ∑_{i=1}^{∞} E[tail_i] ≤ ∑_{i=1}^{∞} min(1, E[num_i])
                          = ∑_{i=1}^{⌈lg n⌉} 1 + ∑_{i=⌈lg n⌉+1}^{∞} n·2^{1−i}
                          = ⌈lg n⌉ + n·∑_{i=0}^{∞} 2^{−⌈lg n⌉−i}
                          = ⌈lg n⌉ + n·2^{1−⌈lg n⌉}
                          ≤ ⌈lg n⌉ + n·2^{1−lg n}
                          = ⌈lg n⌉ + 2.    (6.11)
For the case in which Find is called from Put, we know that E[l] = 2.
We therefore need to evaluate E[maxLevel]. Note that maxLevel is the
number of nonempty levels. We can therefore use another indicator random
variable to express maxLevel — specifically,
    maxLevel = ∑_{i=1}^{∞} I(num_i > 0).
We therefore have
    E[maxLevel] = E[ ∑_{i=1}^{∞} I(num_i > 0) ]
                = ∑_{i=1}^{∞} E[I(num_i > 0)].    (6.13)
Clearly, I(numi > 0)(e) ≤ 1 for all e ∈ Seqn , so that E[I(numi > 0)] ≤
1. Furthermore, I(numi > 0)(e) ≤ numi (e), so that E[I(numi > 0)] ≤
E[numi ]. We therefore have E[I(numi > 0)] ≤ min(1, E[numi ]), which
is the same upper bound we showed for E[tail i ]. Therefore, following the
derivation of (6.11), we have
Now combining (6.9), (6.10), (6.11), (6.12), and (6.14), it follows that
the expected number of tests of the while loop condition is no more than
2(⌈lg n⌉ + 2) + 2 ∈ O(lg n)
for a worst-case input when Find is called by Put. The expected running
time of Find in this context is therefore in O(lg n).
A matching lower bound for the expected running time of Find can also
be shown — the details are outlined in Exercise 6.18. We can therefore
conclude that the expected running time of Find when called from Put on
a worst-case input is in Θ(lg n).
We can now complete the analysis of Put. We have shown that the
expected running times for both loops and the constructor for SkipList-
Node are all in Θ(1). The expected running time of Find(k, l) is in Θ(lg n).
The remainder of the algorithm clearly runs in Θ(1) time. The total time
is therefore expected to be in Θ(lg n) for a worst-case input. We leave as
exercises to design Get and Remove to run in Θ(lg n) expected time, as
well.
Earlier, we suggested that for all practical purposes, fixed-sized arrays
could be used for both start.elements and end.elements. We can now justify
that claim by observing that the probability that any single element receives
a level greater than 100 is 2^{−100}. Thus, the probability that some element
has a level strictly greater than 100 is at most n·2^{−100}. Because 2^{−20} < 10^{−6},
this means that for n ≤ 2^{80} ≈ 10^{24},
the probability that a level higher than 100 is reached is less than one in a
million. Such a small probability of error can safely be considered negligible.
6.5 Summary
A summary of the running times of the operations for the various imple-
mentations of OrderedDictionary is given in Figure 6.26. Θ(lg n)-time
implementations of the Get, Put, and Remove operations for the Or-
deredDictionary interface can be achieved in three ways:
• A balanced binary search tree, such as an AVL tree, guarantees Θ(lg n) performance in the worst case.
• A splay tree gives Θ(lg n) amortized performance.
• A skip list gives Θ(lg n) expected performance, even for a worst-case input.
In each of these implementations, the constructor and the Size operation run in Θ(1) worst-case time.
Section 6.4 introduced the use of indicator random variables for ana-
lyzing randomized algorithms. The application of this technique involves
converting the expected value of a random variable to the expected values
of indicator random variables and ultimately to probabilities. Theorems 5.5,
5.9, and 6.9 are useful in performing this conversion. The probabilities are
then computed using the probabilities of the elementary events and the laws
of probability theory. Because we are only interested in asymptotic bounds,
probabilities which are difficult to compute exactly can often be bounded
by probabilities that are easier to compute.
6.6 Exercises
Exercise 6.1 Prove the correctness of BSTDictionary.TraverseInOr-
der, shown in Figure 6.7.
Exercise 6.2 Draw the result of inserting the following keys in the order
given into an initially empty binary search tree:
Exercise 6.3 Draw the result of deleting each of the following keys from
the tree shown in Figure 6.10, assuming that it is an ordinary binary search
tree. The deletions are not cumulative; i.e., each deletion operates on the
original tree.
a. 55
b. 74
c. 34
Exercise 6.5 Repeat Exercise 6.3 assuming the tree is an AVL tree.
Exercise 6.7 Repeat Exercise 6.3 assuming the tree is a splay tree.
Exercise 6.9 The depth of a node in a tree is its distance from the root;
specifically the root has depth 0 and the depth of any other node is 1 plus
the depth of its parent. Prove by induction on the height h of any AVL tree
that every leaf has depth at least h/2.
* Exercise 6.10 Prove that when a node is inserted into an AVL tree, at
most one rotation is performed.
** Exercise 6.11 Prove that if 2m − 1 keys are inserted into an AVL tree
in increasing order, the result is a perfectly balanced tree. [Hint: You will
need to describe the shape of the tree after n insertions for arbitrary n, and
prove this by induction on n.]
Exercise 6.12 A red-black tree is a binary search tree whose nodes are
colored either red or black such that
• if a node is red, then the roots of its nonempty children are black; and
• from any given node, every path to any empty subtree has the same
number of black nodes.
We call the number of black nodes on a path from a node to an empty subtree the black-height of that node. In calculating the black-height
of a node, we consider that the node itself is on the path to the empty
subtree.
a. Prove by induction on the height of a red-black tree that if the black-height of the root is b, then the tree has at least 2^b − 1 black nodes.
b. Prove that if a red-black tree has height h, then it has at least 2^{h/2} − 1 nodes.
c. Prove that if a red-black tree has n nodes, then its height is at most
2 lg(n + 1).
E[max(I(heads), I(tails))].
* Exercise 6.18 The goal of this exercise is to show a lower bound on the
expected running time of SkipListDictionary.Find.
a. Prove that P(num_i > 0) = 1 − (1 − 2^{1−i})^n. [Hint: First compute P(num_i = 0).]
b. Prove the binomial theorem, namely, for any real a, b, and natural
number n,
    (a + b)^n = Σ_{j=0}^{n} (n choose j) a^{n−j} b^j,                   (6.15)
where
    (n choose j) = n! / (j!(n − j)!)
are the binomial coefficients for 0 ≤ j ≤ n. [Hint: Use induction on n.]
d. Using the result of part c, Exercise 6.17, and (6.13), prove that
and hence, the expected running time of Find is in Ω(lg n).
Exercise 6.19 Give algorithms for SkipListDictionary.Get and Skip-
ListDictionary.Remove. Prove that they meet their specifications and
run in expected Θ(lg n) time for worst-case input. Note that in both cases,
you will need to modify the analysis of SkipListDictionary.Find to use
the appropriate value for E[l]. You may use the result of Exercise 6.18 for
the lower bounds.
Show that the three events are pairwise independent, but not mutually in-
dependent.
* Exercise 6.21 Let len be as defined in Section 6.4. For each of the
following, either find the expected value or show that it diverges (i.e., that
it is infinite).
a. E[2^{len}].
b. E[√(2^{len})].
All of the above trees can be manipulated by the tree viewer on this
textbook’s web site. The implementations of these trees within this package
are all immutable.
Another important balanced search tree scheme is the B-tree, introduced
by Bayer and McCreight [9]. A B-tree is a data structure designed for
accessing keyed data from an external storage device such as a disk drive.
B-trees therefore have a high branching factor in order to minimize the number of disk accesses needed.
Skip lists were introduced by Pugh [93].
Chapter 7
Storage/Retrieval II:
Unordered Keys
In the last chapter, we considered the problem of storage and retrieval, as-
suming that we also need to be able to access keys in a predefined order.
In this chapter, we drop this assumption; i.e., we will be considering imple-
mentations of Dictionary (see Figure 6.2, p. 196) rather than Ordered-
Dictionary. The structures we defined in the last chapter all utilized the
ordering on the keys to guide the searches. Hence, it might seem that there
is nothing to be gained by neglecting to keep the keys in order. However,
we will see that disorder can actually be more beneficial when it comes to
locating keys quickly.
proach using keys as array indices, provided we are willing to make some
assumptions. First, we assume that each key is a natural number (or equiv-
alently, each key can be treated as a natural number). Second, we assume
that there is a known upper bound on the values of all of the keys. Even
with these assumptions, it can still be the case that the range of the keys
is much larger than the number of keys. For example, suppose our data
set consists of 5,000 items keyed by 9-digit natural numbers (e.g., Social
Security Numbers). An array of 1 billion elements is required to store these
5,000 items. Initializing such an array would be very expensive.
Note, however, that once an array is initialized, storage and retrieval can
both be done in Θ(1) time in the worst case. What we need is a technique for
initializing an array in Θ(1) time while maintaining constant-time accesses
to elements. We will now present such a technique, known as virtual initial-
ization. This technique involves keeping track of which array elements have
been initialized in a way that facilitates making this determination quickly.
We assume that the environment provides a facility for allocating an array
in Θ(1) time without initializing its locations. We will call the resulting data
structure a VArray.
In addition to an array elements[0..n − 1] to store the data, we also
need an array used[0..n − 1] of Nats to keep track of which locations of
elements are used to store data. We use a Nat num to keep track of how
many locations of elements store data items. Thus, used[0..num − 1] will
be indices at which data items are stored in elements. Finally, in order
to facilitate a quick determination of whether elements[i] contains a data
element, we use a third array loc[0..n − 1] such that loc[i] stores the index in
used at which i is stored, if indeed i is in used[0..num − 1]. The structural
invariant is that 0 ≤ num ≤ n, and for 0 ≤ i < num, loc[used[i]] = i. We
interpret elements[i] as giving the data item at location i if 0 ≤ loc[i] < num
and used[loc[i]] = i; otherwise, we interpret the value stored at location i as
nil.
For example, Figure 7.1 shows a VArray with 10 locations, storing 35
at location 4, 17 at location 7, and nil at all other locations. Note that for
i = 4 or i = 7, 0 ≤ loc[i] < num and used[loc[i]] = i. For other values of i,
it is possible that loc[i] stores a natural number less than num; however, if
this is the case, then used[loc[i]] is either 4 or 7, so that used[loc[i]] ≠ i.
To initialize all locations of the VArray to nil, we simply set num to 0.
In this way, there is no possible value of loc[i] such that 0 ≤ loc[i] < num,
so we interpret all locations as being nil. To retrieve the value at location
i, we first determine whether 0 ≤ loc[i] < num and used[loc[i]] = i. Note,
however, that loc[i] may not yet have been initialized, so that it may not
index:      0   1   2   3   4   5   6   7   8   9
elements:   ?   ?   ?   ?   35  ?   ?   17  ?   ?
used:       7   4   ?   ?   ?   ?   ?   ?   ?   ?
loc:        ?   ?   ?   ?   1   ?   ?   0   ?   ?
num = 2
Precondition: true.
Postcondition: Sets all locations to nil.
VArray.Clear()
num ← 0
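To make the technique concrete, here is a minimal Python sketch of a virtually initialized array along the lines just described. The names are illustrative, and because Python lists are always initialized on allocation, the constant-time allocation is only simulated here.

class VArray:
    """Virtually initialized array: Clear, Get, and Put each take O(1) time."""

    def __init__(self, n):
        # A real implementation would allocate these without initializing them.
        self.elements = [None] * n
        self.used = [0] * n
        self.loc = [0] * n
        self.num = 0                      # how many locations hold data

    def clear(self):
        self.num = 0                      # all locations become nil again

    def _holds_data(self, i):
        return 0 <= self.loc[i] < self.num and self.used[self.loc[i]] == i

    def get(self, i):
        return self.elements[i] if self._holds_data(i) else None

    def put(self, i, x):
        if not self._holds_data(i):
            self.loc[i] = self.num
            self.used[self.num] = i
            self.num += 1
        self.elements[i] = x

# v = VArray(10); v.put(4, 35); v.put(7, 17)
# v.get(4) == 35, while v.get(3) is None because used[loc[3]] is not 3.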
length of the key, even if the keys are not natural numbers and have an
unbounded range. Consequently, if the keys do have a fixed range, the
amortized expected access time is in Θ(1).
7.2 Hashing
The technique we will develop over the remainder of this chapter is known
as hashing. The basic idea behind hashing is to convert each key k to an
index h(k) using a hash function h, so that for all k, 0 ≤ h(k) < m for some
positive integer m. h(k) is then used as an index into a hash table, which is
an array T [0..m − 1]. We then store the data item at that index.
Typically, the universe of keys is much larger than m, the size of the hash
table. By choosing our array size m to be close to the number of elements
we need to store, we eliminate the space usage problem discussed in Section
7.1. However, because the number of possible keys will now be greater than
m, we must deal with the problem that h must map more than one potential
key to the same index. When two actual keys map to the same index, it is
known as a collision.
The potential for collisions is not just a theoretical issue unlikely to
occur in practice. Suppose, for example, that we were to randomly and
independently assign indices to n keys, so that for any given key k and
index i, 0 ≤ i < m, the probability that k is assigned i is 1/m. We can
model this scenario with a discrete probability space consisting of the mn
n-tuples of natural numbers less than m. Each tuple is equally likely, and
so has probability m−n . We can then define the random variable coll as
the number of collisions; i.e., coll(hi1 , . . . , in i) is the number of ordered pairs
(ij , ik ) such that ij = ik and j < k.
coll can be expressed as the sum of indicator random variables as follows:
    coll(⟨i_1, . . . , i_n⟩) = Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} I(i_j = i_k).
Therefore,
    E[coll] = E[Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} I(i_j = i_k)]
            = Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} E[I(i_j = i_k)]
            = Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} P(i_j = i_k).
For each choice of j, k, and i_j, the value i_k can take on m possible values, one of which is i_j. Because the probabilities of all elementary events are equal, it is easily seen that P(i_j = i_k) = 1/m for j < k. Hence,
    E[coll] = Σ_{j=1}^{n−1} Σ_{k=j+1}^{n} 1/m
            = (1/m) Σ_{j=1}^{n−1} (n − j)
            = (1/m) Σ_{j=1}^{n−1} j        (reversing the sum)
            = n(n − 1)/(2m)
by (2.1).
For example, if our hash table has 500,000 locations and we have more
than a thousand data elements, we should expect at least one collision, on
average. In general, it requires too much space to make the table large
enough so that we can reasonably expect to have no collisions.
Several solutions to the collision problem exist, but the most common is
to use a linked list to store all data elements that are mapped to the same
location. The approach we take here is similar, but we will use a ConsList
instead of a linked list. Using a ConsList results in somewhat simpler code,
and likely would not result in any significant performance degradation. This
approach is illustrated in Figure 7.3.
In the remainder of this section, we will ignore the details of specific hash
functions and instead focus on the other implementation details of a hash
table. In order to approach the use of hash functions in a general way, we
(Figure 7.3: a chained hash table in which, for example, location 0 stores the key 14, location 1 stores 8 and 29, and location 4 stores 53, 11, and 32.)
use the HashFunction ADT, shown in Figure 7.4. Note that because there
are no operations to change the hash function, the HashFunction ADT
specifies an immutable data type. In remaining sections of this chapter,
we will consider various ways of implementing a HashFunction. As we
will see in the next section, not all hash table sizes are appropriate for
every HashFunction implementation. For this reason, we allow the user
to select an approximate table size, but leave it up to the HashFunction
to determine the exact table size.
Our HashTable representation of Dictionary then consists of three
variables:
Precondition: n ≥ 1 is an Int.
Postcondition: Constructs a HashFunction for some table size that is
at least n and strictly less than 3n.
HashFunction(n)
Precondition: k refers to a Key.
Postcondition: Returns the index i associated with k by this HashFunc-
tion. i is a Nat strictly less than the table size.
HashFunction.Index(k)
Precondition: true.
Postcondition: Returns the table size for this HashFunction.
HashFunction.Size()
Theorem 7.1 Let T be a hash table with m locations, and suppose the
universe U of possible keys contains more than m(n − 1) elements. Then for
any function h mapping U to natural numbers less than m, there is some
natural number i < m such that h maps at least n keys in U to i.
The proof of the above theorem is simply the observation that if it were
HashTable()
size ← 0; hash ← new HashFunction(100)
table ← new Array[0..hash.Size() − 1]
for i ← 0 to SizeOf(table) − 1
table[i] ← new ConsList()
HashTable.Get(k)
i ← hash.Index(k); L ← table[i]
while not L.IsEmpty()
if L.Head().Key() = k
return L.Head().Data()
L ← L.Tail()
return nil
not true — i.e., if h maps at most n − 1 elements to each i — then the size
of U could be at most m(n − 1). Though this result looks bad, what it tells
us is that we really want h to produce a random distribution of the keys so
that the list lengths are more evenly distributed throughout the table.
For the remainder of this section, therefore, we will assume that the
key distribution is modeled by a discrete probability space hashDist. The
elementary events in hashDist are the same as those in the probability dis-
tribution defined above: all n-tuples of natural numbers less than m. Again,
the n positions in the tuple correspond to n keys, and their values give their
indices in the hash table. Regarding probabilities, however, we will make
a weaker assumption, namely, the probability that any two given distinct
positions are equal is at most ǫ, where 0 < ǫ < 1. Our earlier probability
space satisfies this property for ǫ = 1/m, but we will see in Sections 7.4 and
7.5 that other spaces do as well.
In what follows, we will analyze the expected length of the ConsList searched. Expressing this length as 1 plus a sum of indicator random variables, one for each of the other n − 1 keys, its expected value is at most
    1 + ǫ(n − 1).
The above value is the expected length of the ConsList searched when
the key is found in a table containing n keys. If the key is not in the table,
n − 1 gives the number of keys in the table, and E[len] is one greater than
the expected length of the ConsList. Thus, if we let n denote the number
of keys in the table, the length of the ConsList searched is expected to be
nǫ.
In either case, the length of the ConsList is linear in n if ǫ is a fixed
constant. However, ǫ may depend upon m. Thus, if ǫ ≤ c/m for some
positive constant c and we use an expandable array for the table, we can
keep the expected length bounded by a constant. Let λ = n/m be known
as the load factor of the hash table. Using the expandable array design
pattern, we can ensure that λ ≤ d, where d is a fixed positive real number
of our choosing. Thus, the expected list length is bounded by
1 + ǫn ≤ 1 + cn/m
= 1 + cλ
≤ 1 + cd
∈ O(1).
HashTable.Put(x, k)
i ← hash.Index(k); L ← table[i]
while not L.IsEmpty()
if L.Head().Key() = k
error
L ← L.Tail()
table[i] ← new ConsList(new Keyed(x, k), table[i])
size ← size + 1
if size/SizeOf(table) > λ
hash ← new HashFunction(2 · SizeOf(table))
t ← new Array[0..hash.Size() − 1]
for j ← 0 to SizeOf(t) − 1
t[j] ← new ConsList()
for j ← 0 to SizeOf(table) − 1
L ← table[j]
while not L.IsEmpty()
y ← L.Head(); i ← hash.Index(y.Key())
t[i] ← new ConsList(y, t[i]); L ← L.Tail()
table ← t
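The following Python sketch mirrors the structure of the pseudocode above under simplifying assumptions: Python's built-in hash combined with a modulus stands in for the HashFunction ADT, ordinary lists play the role of the ConsLists, and the maximum load factor is a constant of our choosing.

class ChainedHashTable:
    MAX_LOAD = 1.0                              # the bound d on the load factor

    def __init__(self, table_size=100):
        self.table = [[] for _ in range(table_size)]
        self.size = 0

    def _index(self, key):
        return hash(key) % len(self.table)      # stand-in for hash.Index(k)

    def get(self, key):
        for k, x in self.table[self._index(key)]:
            if k == key:
                return x
        return None

    def put(self, x, key):
        bucket = self.table[self._index(key)]
        if any(k == key for k, _ in bucket):
            raise KeyError("key already present")
        bucket.append((key, x))
        self.size += 1
        if self.size / len(self.table) > self.MAX_LOAD:
            self._rehash(2 * len(self.table))

    def _rehash(self, new_size):
        old = self.table
        self.table = [[] for _ in range(new_size)]
        for bucket in old:
            for k, x in bucket:
                self.table[self._index(k)].append((k, x))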
construct a deterministic hash function for which such cases are very unlikely
to occur in practice.
We will assume that our keys are represented as natural numbers. This
assumption does not result in any loss of generality, because all data types
can be viewed as sequences of bytes, or more generally, as w-bit compo-
nents. We can view each component as a natural number less than 2w . The
sequence ⟨k_1, . . . , k_l⟩ then represents the natural number
    Σ_{i=1}^{l} k_i 2^{w(l−i)};
DivisionMethod.Index(k)
components[1..l] ← ToArray(k, w); h ← 0
for i ← 1 to l
h ← (h · 2w + components[i]) mod size
return h
k, we define
h(k) = k mod m,
where m is the number of array locations in the hash table. Thus, 0 ≤
k mod m < m. The table shown in Figure 7.3 uses the division method.
It is not hard to show that
We can therefore compute h(k) bottom-up by starting with the first com-
ponent of k and repeatedly multiplying by 2w , adding the next component,
and taking the result mod m.
The division method is illustrated in Figure 7.7, where an implementa-
tion of HashFunction is presented. The representation of HashFunction
is a Nat size, and the structural invariant is size > 0. We assume the exis-
tence of a function ToArray(x, w), which returns an array of Nats, each
strictly less than 2w , and which together give a representation of x. It is
easily seen that Index runs in time linear in the length of the key.
One advantage of the division method is that it can be applied quickly.
Because the multiplication is by a power of 2, it can be implemented by
shifting to the left by w bits. The addition then adds a w-bit number to
a number whose binary representation ends in w zeros; hence, the addition
can be accomplished via a bitwise or. Otherwise, there is only an application
of the mod operator for each word of the key.
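As a rough Python illustration of this bottom-up computation (the function name and parameters are not from the text), a key given as a sequence of w-bit components can be reduced modulo the table size one component at a time:

def division_hash(components, w, m):
    """Compute (sum of k_i * 2^(w*(l-i))) mod m without building the full key."""
    h = 0
    for k_i in components:
        # Shift in the next w-bit component, then reduce modulo the table size.
        h = ((h << w) | k_i) % m
    return h

# division_hash(b"hash me", 8, 13) treats each byte as an 8-bit component.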
The effectiveness of the division method as a hash function depends on
the value chosen for the table size. Knowing that we cannot prevent bad
cases from ever occurring, the best we can do is to try to avoid bad behavior
on cases which may be likely to occur. If data were random, our job would
be much simpler, because we could take advantage of this randomness to
generate a random distribution in the table. Real data sets, however, tend to
contain patterns. We need our hash function to perform well in the presence
of these patterns.
Suppose, for example, that the table size m = 255, and that each byte of
the key is a character encoded in ASCII. From the binomial theorem ((6.15)
on page 241), we can write the key as
    Σ_{i=1}^{l} 256^{l−i} k_i = Σ_{i=1}^{l} (255 + 1)^{l−i} k_i
                              = Σ_{i=1}^{l} Σ_{j=0}^{l−i} (l−i choose j) 255^j k_i.
Each term of the inner sum such that j > 0 is divisible by 255; hence,
computing the key mod 255 yields:
    (Σ_{i=1}^{l} 256^{l−i} k_i) mod 255 = (Σ_{i=1}^{l} k_i) mod 255.
This gives us a top-down solution that can be applied bottom-up in the same
way as we applied the division method directly to large keys. Specifically,
we start with k1 and repeatedly multiply by r and add the next ki . This
procedure requires one multiplication and one addition for each component
of the key. Furthermore, all computation can be done with single-word
arithmetic.
In order for this method to work well, r must be chosen properly. We first
note that 256 is a poor choice, because 256i mod 2w = 0 for all i ≥ w/8;
thus only the first w/8 components of the key are used in computing the hash
value. More generally, r should never be even, because (c2j )i mod 2w = 0
for j > 0 and i ≥ w/j. Furthermore, not all odd values work well. For
example, r = 1 yields ri = 1 for all i, so that the result is simply the
sum of the components, mod 2w . This has the disadvantage of causing all
permutations of a key to collide.
More generally, if r is odd, ri mod 2w will repeat its values in a cyclic
fashion. In other words, for every odd r there is a natural number n such
that rn+i mod 2w = ri for all i ∈ N. Fortunately, there are only a few values
of r (like 1) that have short cycles. In order to avoid these short cycles, we
would like to choose r so that this cycle length is as large as possible. It
is beyond the scope of this book to explain why, but it turns out that this
cycle length is maximized whenever r mod 8 is either 3 or 5.
We can run into other problems if r is small and the component size
is smaller than w. Suppose, for example that r = 3, w = 32, and each
component is one byte. For any key containing fewer than 15 components,
the polynomial-hash value will be less than 231 . We have therefore reduced
the range of possible results by more than half — much more for shorter
keys. As a result, more collisions than necessary are introduced. A similar
phenomenon occurs if r is very close to 2w .
If we avoid these problems, polynomial hashing usually works very well
as a compression map. To summarize, we should choose r so that r mod 8
is either 3 or 5, and not too close to either 0 or 2w . This last condition can
typically be satisfied if we choose an r with 5-9 bits (i.e., between 16 and
512). The division method can then be used to obtain an index into the
table. Because it will be applied to a single word, its computation consists
of a single mod operation.
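A small Python sketch of the scheme just summarized, using an illustrative radix r = 37 (37 mod 8 = 5, and 37 has 6 bits) and a word size of w bits; the division method then reduces the single-word result to a table index:

def polynomial_hash(components, m, r=37, w=32):
    """Polynomial hashing of a key's components, followed by the division method."""
    mask = (1 << w) - 1                 # keep only the bits that fit in one word
    h = 0
    for k_i in components:
        h = (h * r + k_i) & mask
    return h % m                        # a single mod to obtain the table index

# polynomial_hash(b"example key", 1009) hashes a byte string into a table of
# size 1009 (an illustrative table size).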
than m. Let us also suppose that each element of H has some probability, so
that H is a discrete probability space. Two distinct keys k1 and k2 collide for
h ∈ H iff h(k1 ) = h(k2 ). Taking h(k1 ) and h(k2 ) as random variables over
H, we see that the probability that these keys collide is P (h(k1 ) = h(k2 )). If
two values from M are chosen independently with uniform probability, then
the probability that they are the same is 1/m. We therefore say that H is
a universal family of hash functions if for any two keys in U , the probability
that they collide is no more than 1/m. As we showed in Section 7.2, this
probability bound implies that for any hash table access, the expected length
of the list searched is in Θ(1).
Several universal families of hash functions have been defined, but most
of them require some number theory in order to prove that they are universal
families. In what follows, we present a universal family that is easier to
understand at the cost of requiring a bit more computational overhead.
Then in the next section, we will show how number theory can be utilized
to define universal families whose hash functions can be computed more
efficiently.
Suppose each key k ∈ U is encoded by l bits. Ideally, we would like to
generate each function mapping U into M with equal probability. However,
doing so is too expensive. There are 2l keys in U , and m possible values to
which each could be mapped. The total number of possible hash functions is therefore m^{2^l}. Uniquely identifying one of these functions therefore requires at least lg m^{2^l} = 2^l lg m bits. If, for example, each key is 32 bits and our hash table size is 256, four gigabytes of storage would be needed just to identify the hash function.
Instead, we will randomly generate a table location for each of the l bit
positions. Let these locations be t1 , . . . , tl . We will assume that m is a power
of 2 so that each of these locations is encoded using lg m bits. A given key
k will select the subsequence of ht1 , . . . , tl i such that ti is included iff the ith
bit of k is a 1. Thus, each key selects a unique subsequence of locations.
The hash table location of k is then given by the bitwise exclusive-or of the
locations in the subsequence; in other words, the binary encoding of the hash
location has a 1 in position j iff the number of selected locations having a 1
in position j is odd.
Example 7.2 Suppose our keys contain 4 bits, and we want to use a hash
table with 8 locations. We then randomly generate 4 table locations, one
for each of the 4 bit positions in the keys:
• t1 = 3, or 011 in binary;
• t2 = 6, or 110 in binary;
• t3 = 0, or 000 in binary;
• t4 = 3, or 011 in binary.
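For instance, with these choices, a key whose bits are 1011 selects t_1, t_3, and t_4, and therefore hashes to 011 ⊗ 000 ⊗ 011 = 000, or location 0; a key whose bits are 0100 selects only t_2 and hashes to location 6. (These particular keys are chosen only for illustration.)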
we must first define a discrete probability space that will represent the set
of hash functions. Let Sl,m be the set of all l-tuples of bit strings of length
lg m, where l is a positive integer and m is a power of 2. Each of these
l-tuples will represent a hash function. Note that Sl,m has ml elements. We
therefore assign each element of Sl,m a probability of m−l ; hence, Sl,m is
a discrete probability space in which each elementary event has the same
probability.
We can now formally define a hash function corresponding to each ele-
ment in Sl,m . Let select be the function that takes a sequence s = ht1 , . . . , tn i
of bit strings all having the same length, together with a bit string k1 · · · kn ,
and returns the subsequence of s such that ti is included iff ki = 1. Fur-
thermore, let X be the function that takes a sequence of bit strings each
having the same length and returns their bitwise exclusive-or. Given s =
⟨t_1, . . . , t_l⟩ ∈ S_{l,m}, let h_s : U → M be such that
    h_s(k) = X(select(s, k)).
We now define
    H^1_{l,m} = {h_s | s ∈ S_{l,m}}.
Each element h ∈ H^1_{l,m} corresponds to the event consisting of all sequences s ∈ S_{l,m} such that h = h_s. We leave it as an exercise to show that for each h ∈ H^1_{l,m}, there is exactly one such s; hence, there is a one-to-one correspondence between elementary events in S_{l,m} and hash functions in H^1_{l,m}. We will now show that for every distinct k, k′ ∈ U, P(h(k) = h(k′)) = 1/m, so that H^1_{l,m} is a universal family of hash functions. In the proof and the implementation that follows, we use ⊗ to denote bitwise exclusive-or.
Proof: Suppose k and k ′ are two keys differing in bit position i. Without
loss of generality, suppose the ith bit of k is 0 and the ith bit of k ′ is 1.
Let k ′′ be the key obtained from k ′ by changing the ith bit to 0. Let tj
be the discrete random variable giving the value of the jth component of s
for s ∈ Sl,m , and let h(x) be the random variable giving the hash value of
x ∈ U . Then h(k ′ ) = h(k ′′ ) ⊗ ti . Thus, h(k) = h(k ′ ) iff h(k) = h(k ′′ ) ⊗ ti .
Because the ith bits of both k and k ′′ are 0, we can evaluate h(k) and
h(k ′′ ) knowing only t1 , . . . , ti−1 , ti+1 , . . . tl . For each choice of these values,
there is exactly one value of t_i for which h(k) = h(k′′) ⊗ t_i, namely t_i = h(k) ⊗ h(k′′).
UniversalHash1(n)
size ← 2^{⌈lg n⌉}; indices ← new Array[1..l]
for i ← 1 to l
indices[i] ← Random(size)
UniversalHash1.Index(k)
bits[1..l] ← ToArray(k, 1); h ← 0
for i ← 1 to l
if bits[i] = 1
h ← h ⊗ indices[i]
return h
There are then m^{l−1} hash functions for which k and k′ collide. Because each hash function occurs with probability m^{−l},
    P(h(k) = h(k′)) = m^{l−1} · m^{−l} = 1/m.
In many applications, the key lengths may vary, and we may not know
the maximum length in advance. Such situations can be handled easily,
provided we may pad keys with zeros without producing other valid keys.
This padding may be done safely if the length of the key is encoded within
the key, or if each key is terminated by some specific value. We can therefore
consider each key as having infinite length, but containing only finitely many
1s. We can ensure that we have bit strings for indices[1..i] for some i. If we
encounter a key with a 1 in bit position j > i, we can generate bit strings for
positions i + 1 through j at that time. Note that neither of these strategies
adds any significant overhead — they simply delay the generation of the bit
strings. We leave the implementation details as an exercise.
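A rough Python rendering of this family, assuming the table size m is a power of 2 and generating the per-bit locations lazily so that keys of unbounded length are handled as just described (names are illustrative):

import random

class XorUniversalHash:
    def __init__(self, m):
        assert m & (m - 1) == 0, "table size must be a power of 2"
        self.m = m
        self.indices = []                 # one random location per bit position

    def index(self, key):
        """Hash a nonnegative integer key by XORing the locations its 1 bits select."""
        h, i = 0, 0
        while key:
            if i == len(self.indices):    # generate t_{i+1} only when needed
                self.indices.append(random.randrange(self.m))
            if key & 1:
                h ^= self.indices[i]
            key >>= 1
            i += 1
        return h

# h = XorUniversalHash(256); h.index(0xDEADBEEF) is a value in 0..255.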
Theorem 7.4 Let a, b, and m be natural numbers such that 0 < a < m
and b < m. Then the equation
ai mod m = b
has a unique solution in the range 0 ≤ i < m iff a and m are relatively prime
(i.e., 1 is the greatest common divisor of a and m).
Proof: Because we will only need to use this theorem in one direction, we
will only prove one implication and leave the other as an exercise.
ai − q1 m = aj − q2 m
a(i − j) = (q1 − q2 )m,
For our next universal family, we will interpret the keys as natural num-
bers and assume that there is some maximum value for a key. Let p be a
prime number strictly larger than this maximum key value. Our hash func-
tions will consist of two steps. The first step will map each key to a unique
natural number less than p. We will design this part so that, depending
on which hash function is used, a distinct pair of keys will be mapped with
uniform probability to any of the pairs of distinct natural numbers less than
p. The second step will apply the division method to scale the value to an
appropriate range.
For the first step, let
    h_{p,a,b}(k) = (ak + b) mod p
for a and b strictly less than p. Consider distinct keys k and k′. We then have
    j = a(k − k′) mod p
      = (h_{p,a,b}(k) − h_{p,a,b}(k′)) mod p,
Lemma 7.5 Let p be a prime number, and let k and k ′ be distinct natural
numbers strictly less than p. If a and b are chosen independently and uni-
formly such that 1 ≤ a < p and 0 ≤ b < p, then hp,a,b (k) and hp,a,b (k ′ ) are
any pair of distinct natural numbers less than p with uniform probability.
To apply the second step of the hash function, let
fm (i) = i mod m,
Proof: Let k and k ′ be two distinct keys. As we argued above, hp,a,b (k)
and hp,a,b (k ′ ) are distinct natural numbers less than p, and each possible
pair of distinct values can be obtained by exactly one pair of values for a
and b. fm (hp,a,b (k)) = fm (hp,a,b (k ′ )) iff hp,a,b (k) mod m = hp,a,b (k ′ ) mod m
iff hp,a,b (k)−hp,a,b (k ′ ) is divisible by m. For any natural number i < p, there
are strictly fewer than p/m natural numbers j < p (other than i) such that
i − j is divisible by m. Because the number of these values of j is an integer,
it is at most (p − 1)/m. Because there are p possible values of hp,a,b (k) and
p(p − 1) possible pairs of values for hp,a,b (k) and hp,a,b (k ′ ), each of which is
equally likely, the probability that fm (hp,a,b (k)) = fm (hp,a,b (k ′ )) is at most
    (p · (p − 1)/m) / (p(p − 1)) = 1/m.
Note that by the above theorem, H^2_{p,m} is universal for any positive m.
As a result, the size of the hash table does not need to be a particular kind of
number, such as a prime number or a power of 2, in order for this strategy
to yield good expected performance. However, the restriction that p is a
prime number larger than the value of the largest possible key places some
limitations on the effectiveness of this approach. Specifically, if there is no
upper bound on the length of a key, we cannot choose a p that is guaranteed
to work. Furthermore, even if an upper bound is known, unless it is rather
small, the sizes of p, a, and b would make the cost of computing the hash
function too expensive.
Let us therefore treat keys as sequences of natural numbers strictly
smaller than some value p, which we presume to be not too large (e.g.,
small enough to fit in a single machine word). Furthermore, let us choose
p to be a prime number. Let hk1 , . . . , kl i be a key, and let s = ha1 , . . . , al i
be a sequence of natural numbers, each of which is strictly less than p. We
then define
    h_{p,s}(⟨k_1, . . . , k_l⟩) = (Σ_{i=1}^{l} a_i k_i) mod p.
We first observe that we cannot guarantee that h_{p,s}(k) ≠ h_{p,s}(k′) for each distinct pair of keys k and k′. The reason for this is that there are potentially more keys than there are values of h_{p,s}. However, suppose k and k′ are distinct keys, and let k_i ≠ k′_i, where 1 ≤ i ≤ l. Let us arbitrarily fix the values of all a_j such that j ≠ i, and let
    c = (Σ_{j=1}^{i−1} a_j k′_j + Σ_{j=i+1}^{l} a_j k′_j − Σ_{j=1}^{i−1} a_j k_j − Σ_{j=i+1}^{l} a_j k_j) mod p.
Then
    (h_{p,s}(k) − h_{p,s}(k′)) mod p = (Σ_{j=1}^{l} a_j k_j − Σ_{j=1}^{l} a_j k′_j) mod p
apply the mod operation after each addition, we are always working with
values having no more than roughly twice the number of bits as p; hence,
we can compute this hash function reasonably quickly for each key. Fur-
thermore, even if we don’t know the maximum key length, we can generate
the multipliers ai as we need them.
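The following Python sketch captures this family under the stated assumptions: keys are sequences of components smaller than a prime p, and the multipliers a_i are generated on demand (the prime and names are illustrative):

import random

class DotProductHash:
    """h_{p,s}: (sum of a_i * k_i) mod p, with multipliers drawn as needed."""

    def __init__(self, p=2_147_483_647):      # an illustrative prime (2^31 - 1)
        self.p = p
        self.multipliers = []

    def index(self, components):
        h = 0
        for i, k_i in enumerate(components):
            if i == len(self.multipliers):
                self.multipliers.append(random.randrange(self.p))
            h = (h + self.multipliers[i] * k_i) % self.p
        return h

# DotProductHash().index(b"a key") treats each byte as one component.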
However, if we don’t know in advance the approximate size of the data
set, we may need to use rehashing. For the sake of efficiency, we would like to
avoid the need to apply a new hash function to the entire key. Furthermore,
as we will see in the next section, it would be useful to have a universal family
that is appropriate for large keys and for which the table size is unrestricted.
A straightforward attempt to achieve these goals is to combine H^3_{p,l} with H^2_{p,m}. Specifically, we define
    H^4_{p,l,m} = {h_1 ∘ h_2 | h_1 ∈ H^2_{p,m}, h_2 ∈ H^3_{p,l}};
thus, each function in this family has the form
    h(k) = f_m(h_{p,a,b}(h_{p,s}(k))),
where a, b, and each a_i are natural numbers, and a ≠ 0. We define the probability of each element of H^4_{p,l,m} by selecting a, b, and each a_i independently with uniform probability. (We leave it as an exercise to show that the same probability distribution for H^4_{p,l,m} can be achieved by setting a = 1 and selecting b and each a_i independently with uniform probability.)
Because H^2_{p,m} is a universal family, it causes any pair of distinct keys to collide with probability at most 1/m. However, H^3_{p,l} also causes distinct keys to collide with probability 1/p. When the function from H^2_{p,m} is applied to equal values, it yields equal values. We must therefore be careful in analyzing the probability of collisions for H^4_{p,l,m}.
Let us first consider the case in which two distinct keys k and k′ are mapped to distinct values by h_{p,s} ∈ H^3_{p,l}. From Lemma 7.7, the probability
Now consider the case in which h_{p,s}(k) = h_{p,s}(k′). From Lemma 7.7, this case occurs with probability 1/p. For any value of a, 1 ≤ a < p, and any value of i, 0 ≤ i < p, there is exactly one value of b such that 0 ≤ b < p and
    h_{p,a,b}(h_{p,s}(k)) = i.
Thus, each value of i is reached with probability 1/p. Therefore, for each natural number i < p, the probability that h_{p,a,b}(h_{p,s}(k)) = h_{p,a,b}(h_{p,s}(k′)) = i is 1/p².
Thus, for a hash function h chosen from H^4_{p,l,m}, h(k) = i mod m and h(k′) = j mod m, where i and j are natural numbers less than p chosen so that each of the p² possible pairs of values is equally likely.
Of the pairs in which the two values do not both lie among the last p mod m values (those from p − (p mod m) to p − 1), exactly
    (p² − (p mod m)²)/m
result in collisions. Of the remaining (p mod m)² pairs, only those
in which i = j result in collisions. There are exactly p mod m such pairs.
or equivalently, to minimize
There are several ways to find the minimum value of a quadratic, but
one way that does not involve calculus is by the technique of completing
the square. A quadratic of the form (ax − b)2 is clearly nonnegative for all
values of a, x, and b. Furthermore, it reaches a value of 0 (its minimum) at
x = b/a. We can therefore minimize f (m) by finding a value d such that
f (m) − d is of the form
    1 + (p²/8)/p² = 9/8.
We therefore have the following theorem.
Theorem 7.9 For any prime number p and positive integers l and m such that 1 < m < p, H^4_{p,l,m} is 9/8-universal.
The upper bound of 9/8 can be reached when m = 3p/4; however, in
order for this equality to be satisfied, p must be a multiple of 4, and hence
cannot be prime. We can, however, come arbitrarily close to this bound by
using a sufficiently large prime number p and setting m to either ⌊3p/4⌋ or
⌈3p/4⌉. Practically speaking, though, such values for m are much too large.
In practice, m would be much smaller than p, and as a result, the actual
probability of a collision would be much closer to 1/m.
By choosing p to be of an appropriate size, we can choose a single h of
the form
    h(k) = (a (Σ_{i=1}^{l} a_i k_i) + b) mod p,
(Figure 7.11: a perfect hash table; each nonempty location of the primary hash table refers to a secondary hash table in which the keys hashing to that location are stored without collisions.)
(see Figure 7.11). Instead of using a ConsList to store all of the elements
that hash to a certain location, we use a secondary hash table with its own
hash function. The secondary hash tables that store more than one element
are much larger than the number of elements they store. As a result, we
will be able to find a hash function for each secondary hash table such that
no collisions occur. Furthermore, we will see that the sizes of the secondary
hash tables can be chosen so that the total number of locations in all of the
hash tables combined is linear in the number of elements stored.
Let us first determine an appropriate size m for a secondary hash table in
which we need to store n distinct keys. We saw in Section 7.2 that in order
for the expected number of collisions to be less than 1, if the probability
that two keys collide is 1/m, then m must be nearly n². We will therefore assume that m ≥ n².
Let Hm be a c-universal family of hash functions. We wish to determine
an upper bound on the number of hash functions we would need to select
from Hm before we can expect to find one that produces no collisions among
the given keys. Let coll be the discrete random variable giving the total
number of collisions, as defined in Section 7.2, produced by a hash function
h ∈ Hm on distinct keys k1 , . . . , kn . As we showed in Section 7.2,
    E[coll] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(h(k_i) = h(k_j)).
Because the probability that any two distinct keys collide is no more than c/m ≤ c/n², we have
    E[coll] ≤ Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} c/n²
            = (c/n²) Σ_{i=1}^{n−1} (n − i)
            = (c/n²) Σ_{i=1}^{n−1} i        (reversing the sum)
            = cn(n − 1)/(2n²)               (by (2.1))
            < c/2.
From Markov’s Inequality (5.3) on page 194, the probability that there is at
least one collision is therefore less than c/2.
Suppose, for example, that c = 1, as for a universal hash family. Then the probability that a randomly chosen hash function results in no collisions is greater than 1/2. If c = 9/8, as for H^4_{p,l,m}, then the probability is greater
than 7/16. Suppose we repeatedly select hash functions and try storing
the keys in the table. Because the probability that there are no collisions
is positive whenever c < 2, we will eventually find a hash function that
produces no collisions.
Let us now determine how many hash functions we would expect to
try before finding one that results in no collisions. Let reps be the discrete
random variable giving this number. For a given positive integer i, P(reps ≥ i) is the probability that i − 1 successive hash functions fail; i.e.,
    P(reps ≥ i) ≤ (c/2)^{i−1}.
Suppose c < 2. Then we can re-index the sum to begin at 0 and apply Theorem 6.7, yielding
    E[reps] < Σ_{i=0}^{∞} (2/c)^{−i}
            = (2/c)/((2/c) − 1)
            = 2/(2 − c).
Note that the above value is a fixed constant for fixed c < 2. Thus,
the expected number of attempts at finding an appropriate secondary hash
function is bounded by a fixed constant. For example, with c = 1, the value
of this constant is less than 2, or with c = 9/8, the value is less than 16/7.
As a result, we would expect that the number of times a secondary hash function is applied to any key during the process of placing keys in secondary hash tables is bounded by a constant.
We must now ensure that the total space used by the primary and sec-
ondary hash tables (and hence the time needed to initialize them) is linear
in n, the total number of keys. Suppose the primary hash table has m loca-
tions. Further suppose that n_i keys are mapped to index i in the primary hash table. We will then construct a HashFunction by passing n_i² to the constructor of an implementation providing a c-universal hash family. Due to the specification of the HashFunction constructor, the actual HashFunction constructed may contain up to 3n_i² − 1 locations when n_i > 0. The size of the table constructed is therefore linear in n_i². The actual space used by all of the secondary hash tables is therefore linear in
    Σ_{i=0}^{m−1} n_i².
Let sumsq be a discrete random variable denoting the above sum. The
expected space usage of the secondary hash tables is then linear in E[sumsq].
    E[sumsq] = 2E[coll] + n.
    E[coll] ≤ cn(n − 1)/(2m).
Hence,
    E[sumsq] = 2E[coll] + n ≤ cn(n − 1)/m + n.                           (7.6)
Thus, if m ∈ Θ(n), the expected number of locations in the primary
hash table and all of the secondary hash tables is in Θ(n). In particular, if
m ≥ n, then E[sumsq] ≤ (c + 1)n. It turns out that the value of m that
minimizes
    cn(n − 1)/m + n + m
is roughly n (see Exercise 7.17); hence, we will construct our primary hash function by passing n to the constructor for an appropriate implementation of HashFunction.
Of course, we could be unlucky in selecting a primary hash function, so
that the number of secondary locations is much larger than what we expect.
For example, if it happens that all keys hash to the same location, then a
single secondary hash table with at least n² locations will be used. In order
to guarantee linear space usage in the worst case, we therefore need to select
primary hash functions repeatedly until we get one that yields a reasonable
total space usage. Because the space usage is linear in sumsq, we don’t need
to construct the actual secondary hash tables in order to determine whether
the space usage is reasonable — we can instead simply compute sumsq. We
should therefore determine some maximum acceptable value for sumsq.
In order to ensure a reasonable probability of success, we don’t want
this maximum value to be too small. From Markov’s Inequality (5.3), the
probability that a discrete random variable is at least twice its expected
value is at most 1/2, provided its expected value is strictly positive. Based
on (7.6) above, because we will be using a primary table size of at least
n, it makes sense to use 2(c + 1)n as the maximum allowable value for
sumsq. Furthermore, our derivations have assumed that c < 2; hence, we
can simplify the maximum allowable value to 6n. By using this maximum,
we would expect to select no more than 2 primary hash functions, on average,
and still guarantee linear space usage.
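The construction can be sketched in Python as follows. The universal family is simulated here by random multiply-add hashing modulo a fixed prime (an assumption, not the text's HashFunction ADT), keys are assumed to be distinct integers, and the bound 6n plays the role described above.

import random

P = 2_147_483_647                       # illustrative prime, 2^31 - 1

def random_hash(m):
    """A randomly chosen function from integer keys to 0..m-1."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build_perfect(keys):
    n = max(1, len(keys))
    # Retry primary functions until the total secondary size is acceptable.
    while True:
        primary = random_hash(n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[primary(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 6 * n:
            break
    # For each bucket, retry secondary functions of size n_i^2 until collision-free.
    functions, tables = [], []
    for b in buckets:
        m_i = max(1, len(b) ** 2)
        while True:
            h = random_hash(m_i)
            slots = [None] * m_i
            ok = True
            for k in b:
                if slots[h(k)] is not None:
                    ok = False
                    break
                slots[h(k)] = k
            if ok:
                break
        functions.append(h)
        tables.append(slots)
    return primary, functions, tables

def contains(structure, k):
    primary, functions, tables = structure
    i = primary(k)
    return tables[i][functions[i](k)] == k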
We represent an ImmutableDictionary with the following variables:
• hash: a HashFunction;
• if table[i] ≠ nil, then the array stored there is indexed 0..s − 1, where
s is the size of functions[i];
PerfectHash.Get(k)
h ← hash.Index(k)
if table[h] = nil
return nil
else
return table[h][functions[h].Index(k)]
Finally, we observe that because E[sumsq] < (c + 1)n, the expected total
number of array locations is no more than
• 6n for H^1_{l,m};
• 3n for H^2_{p,m}; or
• 25n/8 for H^4_{p,l,m}.
These last bounds hold regardless of whether we change the bound on the
first repeat loop.
The Get operation is shown in Figure 7.13. It clearly runs in Θ(f (l))
time, where f (l) is the time needed to compute the hash function on a key
of length l.
7.7 Summary
If keys are natural numbers, we can implement Dictionary using a VAr-
ray and thus achieve constant-time accesses in the worst case. However,
the space usage of a VArray makes it impractical. For this reason, hash
tables are the preferred implementation in practice. Furthermore, hashing
can be done for arbitrary types of keys.
Deterministic hashing yields data accesses that, in practice, run in amor-
tized time proportional to the length of the key, independent of the number
of data items in the set. This compares very well to the structures pre-
sented in Chapter 6, which give Θ(lg n) access times, where n is the number
of data items in the set. In our analyses in Chapter 6, we did not consider
the key length. Our analyses thus implicitly assumed that keys could be
compared in constant time. Each of the structures in Chapter 6 requires
Θ(lg n) comparisons in either the worst, amortized, or expected case, de-
pending on the structure. In the worst case, each of these comparisons
requires a time proportional to the length of the key. As a result, the per-
formance of deterministic hashing is usually significantly better in practice
than those structures given in Chapter 6. The trade-off is that hash tables
do not permit fast access to all of the keys in a predetermined order.
The division method, which computes the value of the key mod the ta-
ble size, is the most common type of hash function. In order for this method
to work well, the table size should be a prime number that is not too close to
a power of 2. The division method is often combined with polynomial hash-
ing in order to produce a single-word index, which can then be converted
to locations in tables of different sizes. Polynomial hashing involves multi-
plying each component of the key by a radix raised to successively higher
powers, retaining only those bits that will fit in a single machine word. The
radix r should use 5 to 9 bits, and should be such that r mod 8 is either 3
or 5.
Though it works very well in practice, in the worst case, deterministic
hashing results in accesses having a running time in Θ(n). We can achieve
better theoretical results using universal hashing, in which a hash function
is selected at random from a universal family of hash functions. When
universal hashing is used, data accesses have an expected amortized running
time proportional to the key length.
Perfect hashing is an application of universal hashing which produces
an ImmutableDictionary. Using the inherent randomization in universal
hashing, we can construct an ImmutableDictionary in expected time
linear in the sum of the key lengths. Retrievals can then be performed in worst-case time proportional to the length of the key.
7.8 Exercises
Exercise 7.1 Prove that VArray, shown in Figure 7.2, meets its specifi-
cation.
Exercise 7.5 Prove the following for all integers x and y and all positive
integers m:
a. (x + (y mod m)) mod m = (x + y) mod m.
b. (x(y mod m)) mod m = (xy) mod m.
c. (−(x mod m)) mod m = (−x) mod m.
Exercise 7.6 Show the hash table that results from inserting the following
keys in the order listed, assuming the division method is used with a table
of size 13:
27, 36, 14, 40, 42, 15, 25, 2.
You may assume that no rehashing is done. How does the number of colli-
sions, as defined by the random variable coll in Section 7.2, compare with
the expected number, assuming that distinct keys collide with probability
1/13?
* Exercise 7.9 Prove that for each h ∈ H^1_{l,m} there is exactly one s ∈ S_{l,m} such that h_s = h. [Hint: Prove that if s ≠ s′, then h_s ≠ h_{s′}. In order to do this, it is sufficient to find a k such that h_s(k) ≠ h_{s′}(k).]
sume the variable p contains a prime number larger than any key. You may
also assume that all values will fit into integer variables.
sume the variable p contains a prime number larger than w bits, where w
is another variable. You may also assume that if a, b, and c are all natural
numbers less than p, then ab + c will fit in an integer variable; however, you
may not assume that arbitrarily many of these values added together will
fit.
that for each ai , 1 ≤ ai < p. Show that for every l ≥ 2 and prime number
p, the resulting family of hash functions is not universal. Specifically, show
that there are two distinct keys that collide with probability strictly greater
than 1/p. [Hint: First consider l = 2, then generalize.]
Exercise 7.15 Implement HashFunction to provide H^4_{p,l,m} using the same assumptions as for Exercise 7.13.
Exercise 7.18 Prove that the constructor for PerfectHash runs in Θ(nl) expected time if H^1_{l,m} is used as the universal hash family, where m is a power of 2.
Chapter 8

Disjoint Sets
In order to motivate the topic of this chapter, let us consider the following
problem. We want to design an algorithm to schedule a set of jobs on a
single server. Each job requires one unit of execution time and has its own
deadline. We must assign a job with deadline d to some time slot t, where
1 ≤ t ≤ d. Furthermore, no two jobs can be assigned to the same time slot.
If we can’t find a time slot for some jobs, we simply won’t schedule them.
One way to construct such a schedule is to assign each job in turn to the
latest available time slot prior to its deadline, provided there is such a time
slot. The challenge here is to find an efficient way of locating the latest
available time slot prior to the deadline.
One way to think about this problem is to partition the time slots into
disjoint sets — i.e., a collection of sets such that no two sets have any
element in common. In this case, each set will contain a nonempty range of
time slots such that the first has not been assigned to a job, but all the rest
have been assigned to jobs. In order to be able to handle the case in which
time slot 1 has been assigned a job, we will also include a time slot 0, which
we will consider to be always available.
Suppose, for example, that we have scheduled jobs in time slots 1, 2,
5, 7, and 8. Each set must have a single available time slot, which must
be the smallest time slot in that set; thus, the elements 0, 3, 4, 6, and
all elements greater than 8 must be in different sets and must each be the
smallest element of its set. If 10 is the latest deadline, our disjoint sets will
therefore be {0, 1, 2}, {3}, {4, 5}, {6, 7, 8}, {9}, and {10}. If we then wish
to schedule a job with deadline 8, we need to find the latest available time
slot prior to 8. This is simply the first time slot in the set containing 8
— namely, 6. Thus, in order to find this time slot, we need to be able to
determine which set contains the deadline 8, and what is the first time slot
in that set. When we then schedule the job at time slot 6, the set {6, 7, 8}
no longer contains an available time slot. We therefore need to merge the
set {6, 7, 8} with the set containing 5, namely, {4, 5}.
The operations of finding the set containing a given element and merging
two sets are typical of many algorithms that manipulate disjoint sets. The
operation of finding the smallest element of a given set is not as commonly
needed, so we will ignore this operation for now; however, as we will see
shortly, it is not hard to use an array to keep track of this information.
Furthermore, we often need to manipulate objects other than Nats; how-
ever, we can always store these objects in an array and use their indices as
the elements of the disjoint sets. For this reason, we will simplify matters
by assuming that the elements of the disjoint sets are the Nats 0..n − 1.
In general, the individual sets will be allowed to contain non-consecutive
integers.
The DisjointSets ADT, shown in Figure 8.1, specifies the data struc-
ture we need. Each of the sets contains an element that is distinguished
as its representative. The Find operation simply returns that represen-
tative. Thus, if two calls to Find return the same result, we know that
both elements belong to the same set. The Merge operation takes two
representatives, combines the sets identified by these elements, and returns
the resulting set’s representative. In this chapter, we will consider how the
DisjointSets ADT can be implemented efficiently. Before we do this, how-
ever, let us take a closer look at how the DisjointSets ADT can be used
to implement the scheduling algorithm outlined above.
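A compact Python sketch of this use of disjoint sets follows; a simple union-find structure stands in for the DisjointSets ADT, and an auxiliary array keeps each set's smallest (available) slot, with slot 0 playing the role of the always-available sentinel. Names and details are illustrative.

def schedule(deadlines, max_deadline):
    """Assign each unit-time job to the latest free slot no later than its
    deadline (slots 1..max_deadline); jobs that cannot fit are skipped."""
    parent = list(range(max_deadline + 1))   # a simple union-find over slots
    first = list(range(max_deadline + 1))    # smallest (free) slot in each set

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    assignment = {}
    for job, d in enumerate(deadlines):
        root = find(d)
        slot = first[root]                   # latest available slot <= d
        if slot == 0:
            continue                         # nothing available: skip this job
        assignment[job] = slot
        # Slot is now used: merge its set with the set containing slot - 1.
        lower = find(slot - 1)
        parent[root] = lower
        first[lower] = min(first[lower], first[root])
    return assignment

# schedule([2, 2, 1, 3], 3) assigns job 0 to slot 2, job 1 to slot 1,
# skips job 2, and assigns job 3 to slot 3.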
Note that this invariant implies that the values in the universe are grouped
into trees, where the root k of a tree is denoted by parent[k] = k.
The Merge and Find operations are now straightforward. Merge sim-
ply makes one tree a child of the root of the other, and Find follows the
parent references until the root is reached. The full implementation is shown
in Figure 8.4.
Clearly, the constructor operates in Θ(n) time, and Merge operates in
Θ(1) time. The number of iterations of the while loop in Find is the depth
of k, which in the worst case is the height of the tree. Clearly, the height is
at most n − 1. Unfortunately, this height can be achieved by the sequence
TreeDisjointSets(n)
parent ← new Array[0..n − 1]
// Invariant: For 0 ≤ j < i, parent[j] = j.
for i ← 0 to n − 1
parent[i] ← i
TreeDisjointSets.Find(k)
i←k
// Invariant: i is an ancestor of k.
while parent[i] ≠ i
i ← parent[i]
return i
TreeDisjointSets.Merge(i, j)
if i = j or i ≠ parent[i] or j ≠ parent[j]
error
else
parent[i] ← j
return j
    f(h) = 2f(h − 1),
so that f(h) = 2^h. A tree containing k nodes therefore has height
    h = lg f(h)
      ≤ lg k.
parent to the root and returns the root. It can therefore complete its task
by making k a child of the root and returning the root. The resulting Find
algorithm is shown in Figure 8.6.
The rest of the implementation is the same as for ShortDisjointSets;
however, because path compression can decrease the height of a node i
without updating height[i], this value is no longer guaranteed to give the
height of node i. For this reason, we will change the name of this array
ShortDisjointSets(n)
parent ← new Array[0..n − 1]; height ← new Array[0..n − 1]
for i ← 0 to n − 1
parent[i] ← i; height[i] ← 0
ShortDisjointSets.Merge(i, j)
if i = j or i ≠ parent[i] or j ≠ parent[j]
error
else if height[i] ≤ height[j]
parent[i] ← j
if height[i] = height[j]
height[j] ← height[j] + 1
return j
else
parent[j] ← i
return i
to rank and weaken the structural invariant so that rank[i] is at least the
height of i. The precise meaning of rank[i] is now rather elusive, other than
the fact that it gives an upper bound on the height of i.
Clearly, the running time of CompressedDisjointSets.Find is in O(h),
where h is the height of the tree. Furthermore, the height of a tree in
this implementation can certainly be no larger than it would have been
if no path compression had been done; hence, the upper bound of O(lg n)
shown for ShortDisjointSets.Find also holds for CompressedDisjoint-
Sets.Find. Because we can still construct a tree with height in Θ(lg n) with
a sequence of Merges, we conclude that CompressedDisjointSets.Find
runs in Θ(lg n) time in the worst case. Clearly, CompressedDisjoint-
Sets.Merge runs in Θ(1) time, so that in the worst case, the asymptotic
performance of CompressedDisjointSets is identical to that of Short-
DisjointSets.
On the other hand, because each path compression has the tendency
CompressedDisjointSets.Find(k)
return Compress(k)
// Internal function
Precondition: k is a Nat less than the size of the universe of elements,
and the structural invariant holds.
Postcondition: Returns the representative of the partition containing k,
and makes each other element that was an ancestor of k a child of the
representative. The structural invariant holds.
CompressedDisjointSets.Compress(k)
if parent[k] ≠ k
parent[k] ← Compress(parent[k])
return parent[k]
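For reference, here is a compact Python version combining the union-by-rank Merge shown earlier with the recursive path compression above; as in the pseudocode, Merge expects its arguments to be representatives.

class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, k):
        # Path compression: every ancestor of k becomes a child of the root.
        if self.parent[k] != k:
            self.parent[k] = self.find(self.parent[k])
        return self.parent[k]

    def merge(self, i, j):
        if i == j or self.parent[i] != i or self.parent[j] != j:
            raise ValueError("arguments must be distinct representatives")
        if self.rank[i] <= self.rank[j]:
            self.parent[i] = j
            if self.rank[i] == self.rank[j]:
                self.rank[j] += 1
            return j
        self.parent[j] = i
        return i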
Let rank_s and parent_s denote the values of the rank and parent arrays in state s. We will define our potential function based on these values. Let ǫ denote the initial state. Thus, for 0 ≤ i < n, rank_ǫ[i] = 0. In order for Φ to be a valid potential function, we need Φ(ǫ) = 0. To accomplish this, we let φ_s(i) = 0 if rank_s[i] = 0, for 0 ≤ i < n and any state s. Note that a node can only obtain a nonzero rank when a Merge makes it the parent of another node; thus, rank_s[i] = 0 iff i is a leaf.
We have two operations we need to consider as we define φs (i) for non-
leaf i. Merge is a cheap operation, having an actual cost of 2, whereas
Find is more expensive in the worst case. We therefore need to amortize
the cost of an expensive Find over preceding Merges. This means that
we need the potential function to increase by some amount — say α(n),
where α is some appropriate function — for at least some of the Merges.
The Merge operation only operates on roots, so let us focus our attention
there. When a Merge is performed, the rank of one node — a root — may
increase by 1, but otherwise, no ranks increase (see Figure 8.5, and recall
that the rank array replaces the height array in this algorithm). Therefore,
let us define φ_s(i) to be α(n)·rank_s[i] if i is a root (i.e., if parent_s[i] = i).
Note that the above definitions are consistent with each other: if i is both
a leaf and a root, its rank is 0, and hence its potential is 0 by either of the
above definitions. Furthermore, if we can ensure that a Merge causes the
potential of no node other than the root of the resulting tree to increase, we
will have a bound of α(n) + 2 on the amortized cost of Merge with respect
to Φ. We still need to define the potentials for nodes that are neither leaves
nor roots in such a way that an expensive Find causes Φ to decrease enough
to offset much of the actual cost.
Consider the effect of Find(j) on a node i that is neither a leaf nor a root.
i’s rank doesn’t change, but its parent may. In particular, it may receive a
parent with different rank than its original parent. It is not hard to show
as a structural invariant that if parent[i] 6= i, then rank[parent[i]] > rank[i].
We would therefore like the potential of i to decrease, generally speaking,
as rank[parent[i]] increases. Furthermore, in order that its potential does
not increase when a Merge changes it from a root to a non-root, we should
have φs (i) ≤ α(n)rank s [i] for all states s and nodes i.
from j to the root, other than leaves or the root, that do not have a proper ancestor i′ with f(s, i′) = f(s, i). Thus, if we can decrease the potential of each i having a proper ancestor i′ with f(s, i′) = f(s, i), without increasing any potentials, then we will have a bound of α(n) + 2 on the amortized cost of a Find with respect to Φ.

The behavior described above gives us some insight into how we might define f, g, and each A_k. First, we would like g to give the maximum number of times A_{f(s,i)} can be applied to the rank of i without exceeding the rank of i's parent. Thus, if i has a proper ancestor i′ with f(s, i′) = f(s, i), and if f(s′, i) = f(s, i), then g(s′, i) > g(s, i). As a result, the potential of i decreases. In order to keep g(s, i) within the proper range, we should define f(s, i) and A_k so that if we apply A_{f(s,i)} more than rank_s[i] times to rank_s[i], we must attain a value of at least A_{f(s,i)+1}(rank_s[i]). Then if we define f(s, i) to be the maximum k such that rank_s[parent_s[i]] ≥ A_k(rank_s[i]), we can never have rank_s[parent_s[i]] ≥ A_{f(s,i)+1}(rank_s[i]).
We still need to define the functions A_k. In order to facilitate this definition, we first define the iteration operator for functions. Let F : N → N. We then define

F^(0)(n) = n
F^(k)(n) = F(F^(k−1)(n)) for k > 0.

For example, if F(n) = 2n, then F^(2)(n) = 4n and F^(3)(n) = 8n; more generally, F^(k)(n) = 2^k·n.
We now define

A_k(n) = n + 1                   if k = 0
A_k(n) = A_{k−1}^(n+1)(n)        if k ≥ 1.
We can then define, for each node i that is neither a leaf nor a root,

f(s, i) = max{k | rank_s[parent_s[i]] ≥ A_k(rank_s[i])}

and

g(s, i) = max{k | rank_s[parent_s[i]] ≥ A_{f(s,i)}^(k)(rank_s[i])}.

Finally, we need f(s, i) < α(n) whenever i is neither a leaf nor a root. Thus, we need to ensure that whenever i is neither a leaf nor a root, we have

rank_s[parent_s[i]] < A_{α(n)}(rank_s[i]).
We have shown that without path compression, the height of a tree never
exceeds lg n; hence, with path compression, the rank of a node never exceeds
lg n. It therefore suffices to define

α(n) = min{k | A_k(1) > lg n}.
As the subscript k increases, A_k(1) increases very rapidly. We leave it as an exercise to show that

A_4(1) ≥ 2^(2^(···^2)),

where there are 2051 2s on the right-hand side. It is hard to comprehend just how large this value is, for if the right-hand side contained only six 2s, the number of bits required to store it would be 2^65536 + 1. By contrast, the number of elementary particles in the universe is currently estimated to be no more than about 2^300. Hence, there is not nearly enough matter in the universe to store A_4(1) in binary. Because α(n) ≤ 4 for all n < 2^{A_4(1)}, we can see that α grows very slowly.
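The following Python sketch (my own illustration, not code from the text) shows one way to evaluate α(n) = min{k | A_k(1) > lg n} directly. Since A_k(n) can be astronomically large, the helper cuts the computation off as soon as the value exceeds a bound, which is all that α needs.

import math

def a(k, n, bound):
    """Return A_k(n), or bound + 1 if the true value exceeds bound."""
    if k == 0:
        return n + 1
    result = n
    for _ in range(n + 1):              # apply A_{k-1} exactly n + 1 times
        result = a(k - 1, result, bound)
        if result > bound:
            return bound + 1            # already too large; stop early
    return result

def alpha(n):
    lg_n = math.log2(n) if n > 1 else 0
    k = 0
    while a(k, 1, math.ceil(lg_n)) <= lg_n:
        k += 1
    return k

# Example: alpha(10**6) returns 3.  Indeed, alpha(n) is at most 3 for every
# n < 2**2047, since A_3(1) = 2047.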
To summarize, we define our potential function Φ so that

Φ(s) = Σ_{i=0}^{n−1} φ_s(i),

where

φ_s(i) = 0                                       if rank_s[i] = 0
φ_s(i) = α(n)·rank_s[i]                          if parent_s[i] = i
φ_s(i) = (α(n) − f(s, i))·rank_s[i] − g(s, i)    otherwise.
Proof: First, because the rank of the parent of i is strictly larger than that of i, we have

rank_s[parent_s[i]] ≥ rank_s[i] + 1 = A_0(rank_s[i]),
We are now ready to show that the amortized costs of Merge and Find
are in O(α(n)).
hence, the potentials for all other nodes remain unchanged. The change in potential for node i is given by

φ_{s′}(i) − φ_s(i) = (α(n) − f(s′, i))·rank_{s′}[i] − g(s′, i) − α(n)·rank_s[i]
                   < α(n)·(rank_{s′}[i] − rank_s[i])
                   = 0.
The above theorems show that the amortized running times of Merge
and Find are in O(α(n)). However, α appears to be a somewhat contrived
function. We have argued intuitively that α increases very slowly, but we
have not formally compared it with any better-known slow-growing function
like lg or lg lg. We address this issue more formally in the Exercises. For
now, we will simply state that the collection of functions A_k forms a variation of Ackermann's function, and that α is one way of defining its inverse. Several different 2- and 3-variable functions have been called Ackermann's function, and all of them grow at roughly the same rapid rate.
8.5 Summary
Tree-based implementations of disjoint sets provide very efficient Merge
and Find operations, particularly when path compression is used. The
worst-case running times for these operations are in Θ(1) and Θ(lg n), re-
spectively, for both ShortDisjointSets and CompressedDisjointSets.
Worst-case running time of Find:

TreeDisjointSets: Θ(n)
ShortDisjointSets: Θ(lg n)
CompressedDisjointSets: O(α(n)) amortized

Notes:
• The Merge operation runs in Θ(1) worst-case time for each implementation.
8.6 Exercises
Exercise 8.1 Draw the trees that result from the following sequence of operations:
t ← new TreeDisjointSets(8)
t.Merge(0, 1)
t.Merge(t.Find(1), 2)
t.Merge(3, 4)
t.Merge(5, 6)
t.Merge(t.Find(3), t.Find(6))
t.Merge(t.Find(3), t.Find(0))
Exercise 8.4 Prove that Schedule, shown in Figure 8.2, meets its speci-
fication.
Exercise 8.17 Using the results of Exercises 8.15 and 8.16, prove by induction on i that

A_2^(i)(n) ≥ 2^(2^(···^(2^n))),

where the right-hand side has i 2s.
Exercise 8.19 Using the result of Exercises 8.17 and 8.18, show that

A_4(1) ≥ 2^(2^(···^2)),

where the right-hand side has 2051 2s.
Exercise 8.20 For the following, you may use the results of Exercises 8.15 and 8.16.

b. Prove by induction on k that for k ≥ 4, A_k(1) ≥ A_2^(k)(k).

c. Using the results of parts a and b, prove that for each i ∈ N, there is an n_i ∈ N such that whenever n ≥ n_i, α(n) ≤ lg^(i)(n).

d. Using the result of part c, prove that for each i ∈ N, α(n) ∈ o(lg^(i) n).
Chapter 9

Graphs
indicate the directions of the edges. Conventionally, we draw the edge (u, v)
as an arrow from u to v (see Figure 9.2). For a directed edge (u, v) we say
that v is adjacent to u, but not vice versa (unless (v, u) is also an edge in
the graph).
We usually want to associate some additional information with the ver-
tices and/or the edges. For example, if the graph is used to represent dis-
tances between points on a map, we would want to associate a distance
with each edge. In addition, we might want to associate the name of a city
with each vertex. In order to simplify our presentation, we will focus our
attention on the edges of a graph and any information associated with them.
Specifically, as we did for disjoint sets in the previous chapter, we will adopt
the convention that the vertices of a graph will be designated by natural numbers 0, . . . , n − 1.
Precondition: n is a Nat.
Postcondition: Constructs a Graph with vertices 0, . . . , n − 1 and no
edges.
Graph(n)
Precondition: true.
Postcondition: Returns the number of vertices in the graph.
Graph.Size()
Precondition: i and j are Nats less than the number of vertices, i ≠ j,
and x refers to a non-nil item.
Postcondition: Associates x with the edge (i, j), adding this edge if nec-
essary.
Graph.Put(i, j, x)
Precondition: i and j are Nats less than the number of vertices.
Postcondition: Returns the data item associated with edge (i, j), or nil if
(i, j) is not in the graph.
Graph.Get(i, j)
Precondition: i is a Nat less than the number of vertices.
Postcondition: Returns a ConsList of Edges representing the edges
proceeding from vertex i, where an edge (i, j) with data x is represented by
an Edge with source = i, dest = j, and data = x.
Graph.AllFrom(i)
otherwise. We will then reduce the universal sink detection problem to this
variant. Suppose we consider any two distinct vertices, u and v (if there is
only one vertex, clearly it is a universal sink). If (u, v) ∈ E, then u cannot
be a sink. Otherwise, v cannot be a universal sink. Let G′ be the graph
obtained by removing from G one vertex x that is not a universal sink,
along with all edges incident on x. If G has a universal sink w, then w must
also be a universal sink in G′ . We have therefore transformed this problem
to a smaller instance. Because this reduction is a transformation, we can
implement it using a loop.
In order to implement this algorithm using the Graph ADT, we need to
generalize the problem to a subgraph of G comprised of the vertices i, . . . , j
and all edges between them. If j > i, we can then eliminate either i or j
from the range of vertices, depending on whether (i, j) is an edge. Note
that by generalizing the problem in this way, we do not need to modify
the graph when we eliminate vertices — we simply keep track of i and j,
the endpoints of the range of vertices we are considering. If there is an edge
(i, j), we eliminate vertex i by incrementing i; otherwise, we eliminate vertex
j by decrementing j. When all vertices but one have been eliminated (i.e.,
when i = j), the remaining vertex must be the universal sink if there is one.
We can therefore solve the original universal sink detection problem for a
nonempty graph by first finding a candidate vertex i as described above. We
know that if there is a universal sink, it must be i. We then check whether
i is a universal sink by verifying that for every j ≠ i, (j, i) is an edge but
(i, j) is not. The resulting algorithm is shown in Figure 9.4.
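A rough Python sketch of this strategy appears below. It assumes a graph object offering the Graph ADT's Get and Size operations under the hypothetical names g.get(i, j) (returning None when (i, j) is not an edge) and g.size(); these names are mine, not the text's.

def universal_sink(g):
    """Return the universal sink of g, or None if there is none."""
    n = g.size()
    if n == 0:
        return None
    # Narrow the range of candidate vertices [i, j] to a single candidate.
    i, j = 0, n - 1
    while i < j:
        if g.get(i, j) is not None:
            i += 1          # (i, j) is an edge, so i cannot be a sink
        else:
            j -= 1          # (i, j) is not an edge, so j cannot be universal
    # Verify the remaining candidate i.
    for j in range(n):
        if j == i:
            continue
        if g.get(j, i) is None or g.get(i, j) is not None:
            return None
    return i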
Lemma 9.1 Every nonempty directed acyclic graph has at least one vertex
with no incoming edges.
the graph; instead, we can simply decrement incount[j] for all j adjacent to i.
To initialize incount, we can simply examine each edge (i, j) and increment
incount[j].
In order to speed up finding the next vertex in the topological sort, let
us keep track of all vertices i for which incount[i] = 0. We can use a Stack
for this purpose. After we initialize incount, we can traverse it once and
push each i such that incount[i] = 0 onto the stack. Thereafter, when we
decrement an entry incount[i], we need to see if it reaches 0, and if so, push
it onto the stack. The algorithm is shown in Figure 9.6.
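As a rough illustration, here is a Python sketch of the same strategy, assuming the graph is given simply as adjacency lists (adj[i] lists the vertices j with an edge (i, j)); the names are mine, not the text's.

def top_sort(adj):
    n = len(adj)
    incount = [0] * n
    for i in range(n):
        for j in adj[i]:
            incount[j] += 1
    stack = [i for i in range(n) if incount[i] == 0]
    order = []
    while stack:
        i = stack.pop()
        order.append(i)
        for j in adj[i]:
            incount[j] -= 1
            if incount[j] == 0:
                stack.append(j)
    if len(order) < n:
        raise ValueError("graph contains a cycle")
    return order

# Example: top_sort([[1, 2], [2], []]) returns a valid order such as [0, 1, 2].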
MatrixGraph(n)
edges ← new Array[0..n − 1, 0..n − 1]
for i ← 0 to n − 1
for j ← 0 to n − 1
edges[i, j] ← nil
MatrixGraph.Size()
return SizeOf(edges[0]) // Number of columns
MatrixGraph.Put(i, j, x)
if i = j
error
else
edges[i, j] ← x
MatrixGraph.Get(i, j)
return edges[i, j]
MatrixGraph.AllFrom(i)
L ← new ConsList()
for j ← 0 to n − 1
if edges[i, j] ≠ nil
L ← new ConsList(new Edge(i, j, edges[i, j]), L)
return L
n − 1 times. Its running time is therefore in O(n). The body of the second for loop therefore runs in Θ(n) time. Because it iterates n times, its running time is in Θ(n²).

The first and third for loops clearly run in Θ(n) time. Furthermore, the analysis of the fourth for loop is similar to that of the second. Therefore, the entire algorithm runs in Θ(n²) time.
Note that the second and fourth for loops in TopSort each contain a
nested while loop. Each iteration of this while loop processes one of the
edges. Furthermore, each edge is processed at most once by each while
loop. The total number of iterations of each of the while loops is therefore
the number of edges in the graph. While this number can be as large as n(n − 1) ∈ Θ(n²), it can also be much smaller.
The number of edges does not affect the asymptotic running time, how-
ever, because MatrixGraph.AllFrom runs in Θ(n) time, regardless of
how many edges it retrieves. If we can make this operation more efficient,
we might be able to improve the running time for TopSort on graphs with
few edges. In the next section, we will examine an alternative implementa-
tion that accomplishes this.
the edges in the graph, along with the information associated with each
edge.
A partial implementation of ListGraph is shown in Figure 9.8. In addition, the Size operation returns the size of elements, and AllFrom(i) returns elements[i].
It is easily seen that the Size and AllFrom operations run in Θ(1)
time, and that the constructor runs in Θ(n) time. Each iteration of the
while loop in the Get operation reduces the size of L by 1. The length of
L is initially the number of vertices adjacent to i. Because each iteration
runs in Θ(1) time, the entire operation runs in Θ(m) time in the worst case,
where m is the number of vertices adjacent to i. The worst case for Get
occurs when vertex j is not adjacent to i. Similarly, it can be seen that the
Put operation runs in Θ(m) time. Note that Θ(m) ⊆ O(n). The space
usage of ListGraph is easily seen to be in Θ(n + a), where a is the number
of edges in the graph.
Let us now revisit the analysis of the running time of TopSort (Figure
9.6), this time assuming that G is a ListGraph. Consider the second for
loop. Note that the running time of the nested while loop does not depend on the implementation of G; hence, we can still conclude that it runs in O(n) time. We can therefore conclude that the running time of the second for loop is in O(n²). However, because we have reduced the running time of AllFrom from Θ(n) to Θ(1), it is no longer clear that the running time of this loop is in Ω(n²). Indeed, if there are no edges in the graph, then the
nested while loop will not iterate. In this case, the running time is in Θ(n).
We therefore need to analyze the running time of the nested while loop
more carefully. Notice that over the course of the for loop, each edge is
processed by the inner while loop exactly once. Therefore, the body of the
inner loop is executed exactly a times over the course of the entire outer
loop, where a is the number of edges in G. Because the remainder of the
outer loop is executed exactly n times, the running time of the outer loop
is in Θ(n + a).
We now observe that the fourth loop can be analyzed in exactly the same
way as the second loop; hence, the fourth loop also runs in Θ(n + a) time.
In fact, because the structure of these two loops is quite common for graph
algorithms, this method of calculating the running time is often needed for
analyzing algorithms that operate on ListGraphs.
To complete the analysis of TopSort, we observe that the first and
third loops do not depend on how G is implemented; hence, they both run
in Θ(n) time. The total running time of TopSort is therefore in Θ(n + a).
For graphs in which a ∈ o(n²), this is an improvement over the Θ(n²) running time obtained using a MatrixGraph.
ListGraph(n)
elements ← new Array[0..n − 1]
for i ← 0 to n − 1
elements[i] ← new ConsList()
ListGraph.Put(i, j, x)
if j ≥ SizeOf(elements) or j < 0 or j = i or x = nil
error
else
elements[i] ← AddEdge(new Edge(i, j, x), elements[i])
ListGraph.Get(i, j)
L ← elements[i]
while not L.IsEmpty()
if L.Head().Dest() = j
return L.Head().Data()
else
L ← L.Tail()
return nil
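For comparison, the following Python sketch renders the same adjacency-list idea using ordinary lists of (destination, data) pairs in place of ConsLists of Edges; the internal AddEdge function of Figure 9.8 is approximated by a simple scan-and-replace. Names are mine, not the text's.

class ListGraph:
    def __init__(self, n):
        self.elements = [[] for _ in range(n)]   # elements[i]: list of (j, data)

    def size(self):
        return len(self.elements)

    def put(self, i, j, x):
        # Theta(m) worst case, where m is the number of vertices adjacent to i.
        if not (0 <= j < len(self.elements)) or j == i or x is None:
            raise ValueError("invalid edge")
        for k, (dest, _) in enumerate(self.elements[i]):
            if dest == j:
                self.elements[i][k] = (j, x)     # edge already present: replace data
                return
        self.elements[i].append((j, x))

    def get(self, i, j):
        # Theta(m) worst case; None plays the role of nil.
        for dest, data in self.elements[i]:
            if dest == j:
                return data
        return None

    def all_from(self, i):
        # Theta(1): simply return the adjacency list of i.
        return self.elements[i]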
9.5 Multigraphs
Let us briefly consider the building of a ListGraph. We must first con-
struct a graph with no edges, then add edges one by one using the Put
operation. The constructor runs in Θ(n) time. The Put operation runs
in Θ(m) time, where m is the number of vertices adjacent to the source
of the edge. It is easily seen that the time required to build the graph is
in O(n + a·min(n, a)), where a is the number of edges. It is not hard to
match this upper bound using graphs in which the number of vertices with
outgoing edges is minimized for the given number of edges. An example of
a sparse graph (specifically, with a ≤ n) that gives this behavior is a graph
whose edge set is
{(0, j) | 1 ≤ j < a}.
A dense graph giving this behavior is a complete graph — a graph in which every possible edge is present. Note that in terms of the number of vertices, the running time for building a complete graph is in Θ(n³).
Building a ListGraph is expensive because the Put operation must
check each edge to see if it is already in the graph. We could speed this
activity considerably if we could avoid this check. However, the check is
necessary not only to satisfy the operation’s specification, but also to main-
tain the structural invariant. If we can modify the specification of Put and
weaken the structural invariant so that parallel edges (i.e., multiple edges
from a vertex i to a vertex j) are not prohibited, then we can build the
graph more quickly.
We therefore extend the definitions of undirected and directed graphs to
allow parallel edges from one vertex to another. We call such a structure a
multigraph. We can then define the Multigraph ADT by modifying the
following postconditions in the specification of Graph:
• Put(i, j, x): Adds the edge (i, j) and associates x with it.
• Get(i, j): Returns the data item associated with an edge (i, j), or nil
if (i, j) is not in the graph.
We could also specify additional operations for retrieving all edges (i, j) or
modifying the data associated with an edge, but this specification is sufficient
for our purposes.
We will represent a Multigraph using adjacency lists in the same way
as in the ListGraph implementation. The structural invariant will be mod-
ified to allow parallel edges (i, j) for the same i and j. The implementation
of the operations remains the same except for Put, whose implementation is
ListMultigraph.Put(i, j, x)
if j ≥ SizeOf(elements) or j < 0 or x = nil
error
else
elements[i] ← new ConsList(new Edge(i, j, x), elements[i])
shown in Figure 9.9. It is easily seen that a ListMultigraph can be built in Θ(n + a) time, where n is the number of vertices and a is the number of edges.
Because it is more efficient to build a ListMultigraph than to build
a ListGraph, it may be advantageous to represent a graph using a List-
Multigraph. If we are careful never to add parallel edges, we can maintain
an invariant that the ListMultigraph represents a graph. This transfers
the burden of maintaining a valid graph structure from the Put operation
to the code that invokes the Put operation.
Although we can use a ListMultigraph to represent a graph, an inter-
esting problem is how to construct a ListGraph from a given ListMulti-
graph. Specifically, suppose we wish to define a ListGraph constructor
that takes a ListMultigraph as input and produces a ListGraph with
the same vertices and edges, assuming the ListMultigraph has no par-
allel edges. We may be able to do this more efficiently than simply calling
ListGraph.Put repeatedly.
One approach would be to convert the ListMultigraph to a Matrix-
Graph, then convert the MatrixGraph to a ListGraph. In fact, we
do not really need to build a MatrixGraph — we could simply use a
two-dimensional array as a temporary representation of the graph. As we
examine each edge of the ListMultigraph, we can check to see if it has
been added to the array, and if not, add it to the appropriate adjacency
list. If we ever find parallel edges, we can immediately terminate with an
error. Each edge can therefore be processed in Θ(1) time, for a total of Θ(a)
time to process the edges. Unfortunately, Θ(n²) time is required to initialize the array, and the space usage of Θ(n²) is rather high, especially for sparse graphs. However, the resulting time of Θ(n²) is still an improvement (in
most cases) over the Θ(a min(n, a)) worst-case time for repeatedly calling
ListGraph.Put.
In the above solution, the most natural way of processing the edges of
the ListMultigraph is to consider each vertex i in turn, and for each
i, to process all edges proceeding from i. As we are processing the edges
from vertex i, we only need row i of the array. We could therefore save a
significant amount of space by replacing the two-dimensional array with a
singly-dimensioned array A[0..n − 1]. Before we consider any vertex i, we
initialize A. For each edge (i, j), we check to see if it has been recorded in
A[j]. If so, we have found a parallel edge; otherwise, we record this edge in
A[j]. Thus, we have reduced our space usage to Θ(n). However, because A
must be initialized each time we consider a new vertex, the overall running time is still in Θ(n²).
Note that if we ignore the time for initializing A, this last solution runs
in Θ(n + a) time in the worst case. We can therefore use the virtual ini-
tialization technique of Section 7.1 to reduce the overall running time to
Θ(n + a). However, the technique of virtual initialization was inspired by
the need to avoid initializing large arrays, not to avoid initializing small ar-
rays many times. If we are careful about the way we use the array, a single initialization should be sufficient. Thus, if we can find a way to avoid re-
peated initializations of A, we will be able to achieve Θ(n + a) running time
without using virtual initialization.
Consider what happens if we simply omit every initialization of A except
the first. If, when processing edge (i, j), we find edge (i′ , j), for some i′ <
i, recorded in A[j], then we know that no other edge (i, j) has yet been
processed. We can then simply record (i, j) in A[j], as if no edge had been
recorded there. If, on the other hand, we find that (i, j) has already been
recorded in A[j], we know that we have found a parallel edge.
As a final simplification to this algorithm, we note that there is really
no reason to store Edges in the array. Specifically, the only information we
need to record in A[j] is the most recent (i.e., the largest) vertex i for which
an edge (i, j) has been found. Thus, A can be an array of integers. We
can initialize A to contain only negative values, such as −1, to indicate that
no such edge has yet been found. The resulting ListGraph constructor is
shown in Figure 9.10. It is easily seen to run in Θ(n + a) time and use Θ(n)
temporary space in the worst case, where n and a are the number of vertices
and unique edges, respectively, in the given ListMultigraph.
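A rough Python sketch of this construction is given below, assuming the multigraph is supplied as adjacency lists of (destination, data) pairs; the names are mine, not those of Figure 9.10.

def graph_from_multigraph(mg):
    n = len(mg)
    A = [-1] * n                      # initialized once, to an impossible source
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j, data in mg[i]:
            if A[j] == i:             # a second edge (i, j): parallel edges
                raise ValueError("parallel edge (%d, %d)" % (i, j))
            A[j] = i                  # record the most recent source seen for j
            adj[i].append((j, data))
    return adj                        # Theta(n + a) time, Theta(n) extra space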
9.6 Summary
Graphs are useful for representing relationships between data items. Various
algorithms can then be designed for manipulating graphs. As a result, we
can often use the same algorithm in a variety of different applications.
Graphs may be either directed or undirected, but we can treat undirected
graphs as directed graphs in which for every edge (u, v), there is a reverse
edge (v, u). We then have two implementations of graphs. The adjacency
matrix implementation has Get and Put operations that run in Θ(1) time,
but its AllFrom operation runs in Θ(n) time, where n is the number of
vertices in the graph. Its space usage is in Θ(n²). On the other hand, the
adjacency list implementation has an AllFrom operation that runs in Θ(1)
time, but its Get and Put operations run in Θ(m) time in the worst case,
where m is the number of vertices adjacent to the given source vertex. Its
space usage is in Θ(n + a) where n is the number of vertices and a is the
number of edges.
In order to improve the running time of the Put operation — and hence
of building a graph — when using an adjacency list, we can relax our defi-
nition to allow parallel edges. The resulting structure is known as a multi-
graph. We can always use a multigraph whenever a graph is required, though
it might be useful to maintain an invariant that no parallel edges exist. Fur-
thermore, we can construct a ListGraph from a ListMultigraph with
no parallel edges in Θ(n + a) time and Θ(n) space, where n is the number
of vertices and a is the number of edges. Figure 9.11 shows a summary of
the running times of these operations for each of the implementations of
Graph, as well as for ListMultigraph.
9.7 Exercises
Exercise 9.1 Prove that UniversalSink, shown in Figure 9.4, meets its
specification.
Exercise 9.2 Prove that TopSort, shown in Figure 9.6, meets its specifi-
cation.
Thus, G′ contains the same edges as does G, except that they are reversed
in G′ . Express the running time of your algorithm as simply as possible
using Θ-notation in terms of the number of vertices n and the number of
edges a, assuming the graphs are implemented using
a. MatrixGraph
b. ListGraph
c. ListMultigraph.
Exercise 9.7 Prove that MatrixGraph, shown in Figure 9.7, meets its
specification.
Exercise 9.8 Prove that the ListGraph constructor shown in Figure 9.10
meets its specification.
Exercise 9.9 Give an algorithm that takes as input a graph G and returns
true iff G is undirected; i.e., if for every edge (u, v), (v, u) is also an edge.
Give the best upper bound you can on the running time, expressed as simply
as possible using big-O notation in terms of the number of vertices n and
the number of edges a, assuming the graph is implemented using Matrix-
Graph.
Chapter 10

Divide and Conquer
In Part I of this text, we introduced several techniques for applying the top-
down approach to algorithm design. We will now take a closer look at some
of these techniques. In this chapter, we will look at the divide-and-conquer
technique.
As we stated in Chapter 3, the divide-and-conquer technique involves
reducing a large instance of a problem to one or more instances having
a fixed fraction of the size of the original instance. For example, recall
that the algorithm MaxSumDC, shown in Figure 3.3 on page 77, reduces
large instances of the maximum subsequence sum problem to two smaller
instances of roughly half the size.
Though we can sometimes convert divide-and-conquer algorithms to it-
erative algorithms, it is usually better to implement them using recursion.
One reason is that typical divide-and-conquer algorithms implemented using
recursion require very little stack space to support the recursion. If we di-
vide an instance of size n into instances of size n/b whenever n is divisible by
b, we can express the total stack usage due to recursion with the recurrence
f (n) ∈ f (n/b) + Θ(1).
Applying Theorem 3.32 to this recurrence, we see that f (n) ∈ Θ(lg n). The
other reason for retaining the recursion is that when a large instance is
reduced to more than one smaller instance, removing the recursion can be
difficult and usually requires the use of a stack to simulate at least one
recursive call.
Because divide-and-conquer algorithms are typically expressed using re-
cursion, the analysis of their running times usually involves the asymptotic
solution of a recurrence. Theorem 3.32 almost always applies to this recur-
rence. Not only does this give us a tool for analyzing running times, it also
can give us some insight into what must be done to make an algorithm more
efficient. We will explain this concept further as we illustrate the technique
by applying it to several problems.
If we set m = ⌊n/2⌋, then each of the smaller polynomials has roughly n/2
terms.
The product polynomial is now

pq(x) = p0(x)q0(x) + x^m·(p0(x)q1(x) + p1(x)q0(x)) + x^{2m}·p1(x)q1(x).    (10.1)
To obtain the coefficients of pq, we can first compute the four products of
the smaller polynomials. We can then obtain any given coefficient of pq
by performing at most two additions. We can therefore obtain all 2n −
1 coefficients in Θ(n) time after the four smaller products are computed.
Setting m = ⌊n/2⌋, we can describe the running time of this divide-and-conquer algorithm with the recurrence

f(n) ∈ 4f(⌈n/2⌉) + Θ(n),

which by Theorem 3.32 gives a running time in Θ(n²), no better than the straightforward algorithm. To do better, observe that

(p0(x) + p1(x))(q0(x) + q1(x)) = p0(x)q0(x) + p0(x)q1(x) + p1(x)q0(x) + p1(x)q1(x).

Note that all four of the terms on the right-hand side above appear in the
product pq (see (10.1) above). In order to make this fact useful, however, we
need to be able to separate out the first and last terms. We can do this by
computing the products p0(x)q0(x) and p1(x)q1(x), then subtracting. Thus, we can compute the product pq using the following three products:

p0(x)q0(x),
p1(x)q1(x), and
(p0(x) + p1(x))(q0(x) + q1(x)).

We can then compute any given coefficient of pq with at most two subtractions and one addition.
The algorithm is shown in Figure 10.1. This implementation uses the Copy function specified in Figure 1.18 on page 22.
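As a rough illustration of the three-product strategy (a simplified stand-in for the approach of Figure 10.1, not a transcription of it), here is a Python sketch operating on coefficient lists, where p[i] is the coefficient of x^i; the function names are mine.

def poly_mult(p, q):
    n = max(len(p), len(q))
    if n <= 1:
        return [p[0] * q[0]] if p and q else []
    m = n // 2
    p0, p1 = p[:m], p[m:]
    q0, q1 = q[:m], q[m:]
    low = poly_mult(p0, q0)                     # p0 * q0
    high = poly_mult(p1, q1)                    # p1 * q1
    mid = poly_mult(add(p0, p1), add(q0, q1))   # (p0 + p1)(q0 + q1)
    mid = sub(sub(mid, low), high)              # p0*q1 + p1*q0
    result = [0] * (len(p) + len(q) - 1)
    for i, c in enumerate(low):
        result[i] += c
    for i, c in enumerate(mid):
        result[i + m] += c
    for i, c in enumerate(high):
        result[i + 2 * m] += c
    return result

def add(a, b):
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(max(len(a), len(b)))]

def sub(a, b):
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
            for i in range(max(len(a), len(b)))]

# Example: poly_mult([1, 2], [3, 4]) returns [3, 10, 8], i.e. 3 + 10x + 8x^2.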
• If two elements in the same input array have equal keys, they remain
in the same order in the output array.
• If an element x from the first input array has a key equal to some
element y in the second input array, then x must precede y in the
output array.
Suppose we are given two sorted arrays. If either is empty, we can simply
use the other. Otherwise, the element with minimum key in the two arrays
needs to be first in the sorted result. The element with minimum key in each
array is the first element in the array. We can therefore determine the overall
minimum by comparing the keys of the first elements of the two arrays. If
the keys are equal, in order to ensure stability, we must take the element
from the first array. To obtain the remainder of the result, we merge the
remainder of the two input arrays. We have therefore transformed a large
instance of merging to a smaller instance.
Putting it all together, we have the MergeSort algorithm shown in
Figure 10.2. Note that in the Merge function, when the loop terminates,
either i > m or j > n. Hence, either A[i..m] or B[j..n] is empty. As a
result, only one of the two calls to Copy (see Figure 1.18 on page 22 for its
specification) will have any effect.
It is easily seen that Merge runs in Θ(m + n) time. Therefore, the time
required for MergeSort excluding the recursive calls is in Θ(n). For n > 1
a power of 2, the following recurrence gives the worst-case running time of
MergeSort:
f (n) ∈ 2f (n/2) + Θ(n).
From Theorem 3.32, f (n) ∈ Θ(n lg n).
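As a rough illustration, here is a Python sketch of merge sort; the ties-to-the-left comparison in the merge is what provides stability. The names are mine, and the sketch returns a new list rather than following the array-based interface of Figure 10.2.

def merge_sort(a):
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # ties go to the first half: stability
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])            # at most one of these is nonempty
    result.extend(right[j:])
    return result

# Example: merge_sort([5, 2, 9, 1]) returns [1, 2, 5, 9].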
By solving the resulting Dutch national flag problem, we will have parti-
tioned the array into three sections:
• The first section consists of all elements with keys less than p.Key().
• The second section consists of all elements with keys equal to p.Key().
• The third section consists of all elements with keys greater than p.Key().
By sorting the first and third sections, we will have sorted the array. This
general strategy is known as quick sort.
Following the divide-and-conquer paradigm, we would like to select the
pivot so that after the array has been partitioned, the first and third sections
have roughly the same number of elements. Thus, the median element would
be a good choice for p. As we will show in the next section, it is possible to
find the median in Θ(n) time in the worst case. We saw in Section 3.6 that
the Dutch national flag problem can be solved in Θ(n) time. Because each
of the two subproblems is at most half the size of the original problem, we can bound the running time of this sorting algorithm with the recurrence

f(n) ∈ 2f(⌊n/2⌋) + Θ(n),

so that by Theorem 3.32, it runs in Θ(n lg n) time. A simpler strategy is to use the first element of the array as the pivot. On an already-sorted array of distinct keys, however, this choice puts all but one of the elements into a single subproblem, so the worst-case running time is bounded below by the recurrence

f(n) ∈ f(n − 1) + Θ(n).
From Theorem 3.31, f(n) ∈ Θ(n²), so that the running time for this algorithm is in Ω(n²) in the worst case. Observing that each element is chosen as a pivot at most once, we can easily see that O(n²) is an upper bound on the running time, so that the algorithm runs in Θ(n²) time in the worst case.
It turns out that the versions of quick sort used most frequently do, in fact, run in Θ(n²) time in the worst case. However, choosing the first element (or the last element) as the pivot is a bad idea, because an already-sorted array yields the worst-case performance. Furthermore, the performance is nearly as bad on a nearly-sorted array. To make matters worse, it is not hard to see that when the running time is in Θ(n²), the stack usage is in Θ(n). Because we often need to sort a nearly-sorted array, we don't want an algorithm that performs badly in such cases.
The above analyses illustrate that it is better for the pivot element to be chosen near the median than near the smallest (or equivalently, the largest) element. More generally, they illustrate why divide-and-conquer is often an effective algorithm design strategy: when a problem is reduced to multiple subproblems, it is best if these subproblems are roughly the same size.
For quick sort, we need a way to choose the pivot element quickly in such a
way that it tends to be near the median.
One way to accomplish this is to choose the pivot element randomly.
This algorithm is shown in Figure 10.3. In order to make the presentation
easier to follow, we have specified the algorithm so that the array is indexed
with arbitrary endpoints.
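The following Python sketch (my own, not a transcription of Figure 10.3) combines a randomly chosen pivot with a three-way, Dutch-national-flag style partition, sorting the list in place.

import random

def quick_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = a[random.randint(lo, hi)]          # random pivot value
    lt, i, gt = lo, lo, hi                 # a[lo..lt-1] < p, a[lt..i-1] == p, a[gt+1..hi] > p
    while i <= gt:
        if a[i] < p:
            a[lt], a[i] = a[i], a[lt]; lt += 1; i += 1
        elif a[i] > p:
            a[i], a[gt] = a[gt], a[i]; gt -= 1
        else:
            i += 1
    quick_sort(a, lo, lt - 1)              # elements less than the pivot
    quick_sort(a, gt + 1, hi)              # elements greater than the pivot

# Example:
# data = [3, 1, 4, 1, 5, 9, 2, 6]
# quick_sort(data)                         # data is now sorted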
Let us now analyze the expected running time of QuickSort on an
array of size n. We first observe that for any call in which lo < hi, the loop
will execute at least once. Furthermore, by an easy induction on n, we can
show that at most n + 1 calls have lo ≥ hi. Because each of these calls
requires Θ(1) time, a total of at most O(n) time is used in processing the
base cases. Otherwise, the running time is proportional to the number of
times the loop executes over the course of the algorithm.
Each iteration of the loop involves comparing one pair of elements. For a
given call to QuickSort, the pivot is compared to all elements currently in
the array, then is excluded from the subsequent recursive calls. Thus, once a
pair of elements is compared, they are never compared again on subsequent
loop iterations (though they may be compared twice in the same iteration —
once in each if statement). The total running time is therefore proportional
to the number of pairs of elements that are compared. We will only concern
ourselves with pairs of distinct elements, as this will only exclude O(n) pairs.
Let F[1..n] be the final sorted array, and let comp be a discrete random variable giving the number of pairs (i, j) such that 1 ≤ i < j ≤ n and F[i] is compared with F[j]. We wish to compute E[comp]. Let c_ij denote the event that F[i] is compared with F[j].
We observe that F [i] is compared with F [j] iff one of them is in the
subarray being sorted when the other is chosen as the pivot. Furthermore,
two elements F [i] and F [j] are in the same subarray as long as no element
k such that
F [i].Key() ≤ F [k].Key() ≤ F [j].Key()
is chosen as the pivot. Thus, the probability that F [i] and F [j] are compared
is the probability that one of them is chosen as pivot before any other F [k]
satisfying the above inequality. Because there are at least j − i + 1 elements
F[k] satisfying the above inequality when j > i,

P(c_ij) ≤ 2/(j − i + 1).
We therefore have

E[comp] = Σ_{i=1}^{n} Σ_{j=i+1}^{n} P(c_ij)
        ≤ Σ_{i=1}^{n} Σ_{j=i+1}^{n} 2/(j − i + 1)
        = 2 Σ_{i=1}^{n} Σ_{j=2}^{n−i+1} 1/j.    (10.3)
Tight bounds for the harmonic numbers H_n = Σ_{j=1}^{n} 1/j are given by the following theorem, whose proof is left as an exercise:

ln(n + 1) ≤ H_n ≤ 1 + ln n.

Applying the upper bound to (10.3) gives E[comp] ≤ 2nH_n ∈ O(n lg n); hence, the expected running time of QuickSort is in O(n lg n).
10.4 Selection
In Section 1.1, we introduced the selection problem. Recall that this problem
is to find the kth smallest element of an array of n elements. We showed
that it can be reduced to sorting. Using either heap sort or merge sort,
we therefore have an algorithm for this problem with a running time in
Θ(n lg n). In this section, we will improve upon this running time.
Section 2.4 shows that the selection problem can be reduced to the Dutch
National Flag problem and a smaller instance of itself. This reduction is
very similar to the reduction upon which quick sort is based. Specifically,
we choose a pivot element p and solve the resulting Dutch national flag
problem as we did for the quick sort reduction. Let r and w denote the
numbers of red items and white items, respectively. We then have three
cases:

• If k ≤ r, the kth smallest element is less than p, so we recursively select the kth smallest element of the first section.
• If r < k ≤ r + w, the kth smallest element equals p, so we return p.
• Otherwise, we recursively select the (k − r − w)th smallest element of the third section.
Due to the similarity of this algorithm to quick sort, some of the same
problems arise in choosing the pivot element appropriately. For example, if
we always use the first element as the pivot, then selecting the nth smallest
element in a sorted array of n distinct elements always results in a recursive
call with all but one of the original elements. As we saw in Section 10.3, this yields a running time in Θ(n²). On the other hand, it is possible to show
that selecting the pivot at random yields an expected running time in Θ(n)
— the details are left as an exercise.
Our goal is to construct a deterministic algorithm with worst-case run-
ning time in O(n). As we saw in Section 10.3, quick sort achieves a better
asymptotic running time if the median is chosen as the pivot. It stands to
reason that such a choice might be best for the selection algorithm. We
mentioned in Section 10.3 that it is possible to find the median in O(n)
time. The way to do this is to use our linear-time selection algorithm to find
the ⌈n/2⌉nd smallest element. However, this doesn’t help us in designing
the linear-time selection algorithm because the reduction is not to a smaller
instance.
Instead, we need a way to approximate the median well enough so that
the resulting algorithm runs in O(n) time. Consider the following strategy
for approximating the median. First, we arrange the n elements into an
M × ⌊n/M ⌋ array, where M is some fixed odd number. If n is not evenly
divisible by M , we will have up to M − 1 elements that will not fit in the
array — we will ignore these elements for now. Suppose we sort each column
in nondecreasing order. Further suppose that we order the columns (keeping
each column intact) so that the middle row is in nondecreasing order. We
then select an element in the center of the array as the pivot p (see Figure
10.4).
smaller selection problem after the Dutch national flag algorithm has been
applied. This latter recursive call has no more than about 3n/4 elements.
The recurrence describing the running time is therefore of the form
as 1 − c > 0.
Base: We must still show the claim for 0 < n < n_2. In other words, we need b ≥ f(n)/n for 0 < n < n_2. We can satisfy this constraint and the one above if b = max{a/(1 − c), f(n)/n | 0 < n < n_2} (note that because this set is finite and nonempty, it must have a maximum element).
LinearSelect(A[1..n], k)
if n ≤ 4
Sort(A[1..n])
return A[k]
else
m ← ⌊n/5⌋; T ← new Array[1..m]
for i ← 1 to m
Sort(A[5i − 4..5i]); T [i] ← A[5i − 2]
p ← LinearSelect(T [1..m], ⌈m/2⌉); r ← 0; w ← 0; b ← 0
// Invariant: r, w, b ∈ N, r + w + b ≤ n, and A[i] < p for 1 ≤ i ≤ r,
// A[i] = p for n − b − w < i ≤ n − b, and A[i] > p for n − b < i ≤ n.
while r + w + b < n
j ←n−b−w
if A[j] < p
r ← r + 1; A[j] ↔ A[r]
else if A[j] = p
w ←w+1
else
A[j] ↔ A[n − b]; b ← b + 1
if r ≥ k
return LinearSelect(A[1..r], k)
else if r + w ≥ k
return p
else
return LinearSelect(A[r + w + 1..n], k − r − w)
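As a rough Python rendering of the same strategy (not a transcription of LinearSelect), the sketch below selects the kth smallest element, with k counted from 1 as in the text; the leftover elements of the last, partial group of five are simply excluded from the median-of-medians computation, much as the text ignores the partial column when choosing the pivot. Names are mine.

def linear_select(a, k):
    if len(a) <= 4:
        return sorted(a)[k - 1]
    # Medians of the complete groups of five.
    medians = [sorted(a[i:i + 5])[2] for i in range(0, len(a) - len(a) % 5, 5)]
    p = linear_select(medians, (len(medians) + 1) // 2)   # median of medians
    less = [x for x in a if x < p]
    equal = [x for x in a if x == p]
    if k <= len(less):
        return linear_select(less, k)
    elif k <= len(less) + len(equal):
        return p
    else:
        greater = [x for x in a if x > p]
        return linear_select(greater, k - len(less) - len(equal))

# Example: linear_select([7, 2, 9, 4, 1], 2) returns 2.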
Figure 10.6 Multiplication and division functions for use with the BigNum
ADT defined in Figure 4.18 on page 146
The largest such n is therefore 10 + 4 = 14. Then for all n ≥ 15, recurrence
(10.5) gives an upper bound on the running time of LinearSelect. From
Theorem 10.2, the running time is in O(n), and hence in Θ(n).
Various performance improvements can be made to LinearSelect. For
example, if n = 5, there is no reason to apply the Dutch national flag
algorithm after sorting the array — we can simply return A[k]. In other
words, it would be better if the base case included n = 5, and perhaps some
larger values as well. Furthermore, sorting is not the most efficient way to
solve the selection problem for small n. We explore some alternatives in the
exercises.
Even with these performance improvements, however, LinearSelect
does not perform nearly as well as the randomized algorithm outlined at
the beginning of this section. Better still is using a quick approximation
of the median, such as finding the median of the first, middle, and last
elements, as the value of p. This approach yields an algorithm whose worst-case running time is in Θ(n²), but which typically performs better than even the randomized algorithm.
DivideDC(u, v)
m ← u.NumBits(); n ← v.NumBits()
if n = 1
return u
else
if n mod 2 = 1
u ← u.Shift(1); m ← m + 1
v ← v.Shift(1); n ← n + 1
digLen ← n/2; numDig ← ⌈m/digLen⌉
qLen ← digLen × (numDig − 1)
qBits ← new Array[0..qLen − 1]
vFirst ← new BigNum(v.GetBits(digLen, digLen))
rem ← new BigNum(u.GetBits(qLen, digLen))
// Invariant: rem < v and
// v × new BigNum(qBits[i + digLen..qLen − 1]) + rem =
// new BigNum(u.GetBits(i + digLen, m − i − digLen)).
for i ← qLen − digLen to 0 by −digLen
next ← new BigNum(u.GetBits(i, digLen))
rem ← rem.Shift(digLen).Add(next)
if rem.CompareTo(v) < 0
qDig ← zero.GetBits(0, digLen)
else
approx ← DivideDC(rem.Shift(−digLen), vFirst)
prod ← Multiply(v, approx)
// Invariant: prod = v × approx, approx ≥ ⌊rem/v⌋, and the
// invariant for the outer loop.
while prod.CompareTo(rem) > 0
approx ← approx.Subtract(one)
prod ← prod.Subtract(v)
qDig ← approx.GetBits(0, digLen)
Copy(qDig[0..digLen − 1], qBits[i..i + digLen − 1])
return new BigNum(qBits)
because the n/2 low-order bits of the numerator do not affect the value of
the expression. Furthermore, the right-hand side above is no larger than
⌊rem/v⌋. Thus,
approx − ⌊rem/v⌋ ≤ ⌊r/vFirst⌋ − r/(vFirst + 1)
                 ≤ r/vFirst − (r − vFirst)/(vFirst + 1)
                 = (r(vFirst + 1) − vFirst(r − vFirst)) / (vFirst(vFirst + 1))
                 = (r + vFirst²) / (vFirst(vFirst + 1)).
Now because rem < v·2^{n/2}, it follows that r < vFirst·2^{n/2}. We therefore have

approx − ⌊rem/v⌋ ≤ (r + vFirst²) / (vFirst(vFirst + 1))
                 < (vFirst·2^{n/2} + vFirst²) / (vFirst(vFirst + 1))
                 = (2^{n/2} + vFirst) / (vFirst + 1)
                 = 2^{n/2}/(vFirst + 1) + vFirst/(vFirst + 1).
Because vFirst contains n/2 significant bits, its value must be at least 2^{n/2−1}. The value of the first term on the right-hand side above is therefore
strictly less than 2. Clearly, the value of the second term is strictly less than
1, so that the right-hand side is strictly less than 3. Because the left-hand
side is an integer, its value must therefore be at most 2. It follows from
the while loop invariant that the loop terminates when approx = ⌊rem/v⌋.
Because this loop decrements approx by 1 each iteration, it must iterate at
most twice. Its running time is therefore in Θ(n).
It is now easily seen that, excluding the recursive call, the running time
of the body of the main loop is dominated by the running time of the
multiplication. Because the result of the multiplication contains at most 3n/2 + 1 significant bits, this multiplication can be done in Θ(n^{lg 3}) time using the multiplication algorithm suggested at the beginning of this section.
numDig − 1 = ⌈m/digLen⌉ − 1
           = ⌈m/(n/2)⌉ − 1
           = ⌈2m/n⌉ − 1.
Thus, the running time of the main loop, excluding the recursive call, is in
We now observe that for even n, there are in the worst case ⌈2m/n⌉ − 1
recursive calls. For odd n, the worst-case number of recursive calls is
⌈2(m + 1)/(n + 1)⌉ − 1. The resulting recurrence is therefore quite com-
plicated. However, consider the parameters of the recursive call. We have
already shown that rem < v2n/2 , and vFirst = ⌊v2−n/2 ⌋. This recursive call
therefore divides a value strictly less than v by ⌊v2−n/2 ⌋. Thus, in any of
these calls, the dividend is less than the divisor plus 1, multiplied by 2n ,
where n is the number of bits in the divisor. In addition, it is easily seen
that the dividend is never less than the divisor. Furthermore, if these rela-
tionships initially hold for odd n, they hold for the recursive call in this case
as well. We therefore will first restrict our attention to this special case.
Let n, the number of significant bits in v, be even. If v ≤ u < (v + 1)·2^n, then u < 2^{2n}, so that m ≤ 2n. Therefore,

⌈2m/n⌉ − 1 ≤ ⌈4n/n⌉ − 1 = 3.
Because each iteration may contain a recursive call, this suggests that there
are a total of at most 3 recursive calls. However, note that whenever a
recursive call is made, the dividend is no less than the divisor, so that a
nonzero digit results in the quotient. Suppose the first of the three digits
of the quotient is nonzero. Because the first n bits of u are at most v, the
only possible nonzero result for the first digit is 1. The remainder of the
quotient is then formed by dividing a value strictly less than 2^n by v, which is at least 2^{n−1}. This result is also at most 1, so that the second digit must
be 0. We conclude that no more than two recursive calls are ever made. In
each of these recursive calls, the divisor has n/2 bits.
v ≤ u < (v + 1)·2^n,
becomes
f (n) ∈ 2f (⌈n/2⌉) + O(n lg n lg lg n)
for n > 1. Applying Theorem 3.32 to this recurrence yields
DivideRecip(u, v)
if u.CompareTo(v) < 0
return zero
else
m ← u.NumBits(); n ← v.NumBits()
r ← Reciprocal(v, m − n)
q ← Multiply(u, r).Shift(1 − r.NumBits() − n);
prod ← Multiply(v, q)
if prod.CompareTo(u.Subtract(v)) ≤ 0
q ← q.Add(one)
else if prod.CompareTo(u) > 0
q ← q.Subtract(one)
return q
Applying Newton's method to f(x) = 1/x − y, whose root is x = 1/y, yields the iteration

x1 = x0 − (1/x0 − y)/(−x0^{−2})
   = x0 + x0 − y·x0²
   = 2x0 − y·x0².
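As a small numeric aside (using ordinary floating point rather than the BigNum arithmetic of this section), the sketch below applies this iteration a few times; the function name and iteration count are mine.

def reciprocal_newton(y, iterations=6):
    x = 1.5                      # initial estimate for 1/2 <= y < 1
    for _ in range(iterations):
        x = 2 * x - y * x * x    # each step roughly squares the error term
    return x

# Example: reciprocal_newton(0.7) is approximately 1/0.7 = 1.4285714...,
# and the number of correct bits roughly doubles with each iteration.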
If x0 = (1 + ε)/y for some error term ε, then

x1 = 2x0 − y·x0²
   = 2(1 + ε)/y − y·((1 + ε)/y)²
   = (2 + 2ε − 1 − 2ε − ε²)/y
   = (1 − ε²)/y.    (10.6)
Thus, each iteration squares the error term ε. For 1/2 ≤ y < 1, it is not hard to show that an initial estimate of 3/2 yields an error term with |ε| < 1/2.
We need this value to differ from 1/y by at most 2^{−k}; i.e., we need

|β/y² + 2αβ/y + α²β − yα² − γ| ≤ 2^{−k}.    (10.7)
Note that because y > 0, β ≥ 0, and γ ≥ 0, all terms except the second
are always nonnegative. In order to ensure that the inequality holds when
the value inside the absolute value bars is nonnegative, we can therefore
ignore the last two terms. We therefore need
β/y² + 2αβ/y + α²β ≤ 2^{−k}.
If we replace α by |α| in the above inequality, the left-hand side does not decrease. For fixed α and β, the resulting left-hand side is maximized when y is minimized. Setting y to its minimum possible value of 1/2, it therefore suffices to ensure that

4β + 4|α|β + α²β ≤ 2^{−k}.

In order to keep the first term sufficiently small, we need β < 2^{−k−2}. In order to leave room for the other two terms, let us take β ≤ 2^{−k−3}. In other words, we will use the first k + 3 bits of y in applying the iteration. Then as long as |α| ≤ 1/2, we have

4β + 4|α|β + α²β ≤ 2^{−k−1} + 2^{−k−2} + 2^{−k−5} ≤ 2^{−k}.
Let us now consider the case in which the value inside the absolute value bars in (10.7) is negative. We can now ignore the first and third terms. We therefore need

yα² + γ − 2αβ/y ≤ 2^{−k}.

Here, we can safely replace α by −|α|. For fixed α, β, and γ in the resulting inequality, the first term is maximized when y is maximized, but the third term is maximized when y is minimized. It therefore suffices to ensure that

α² + γ + 4|α|β ≤ 2^{−k}.

Again taking β ≤ 2^{−k−3}, we only need |α| ≤ 2^{−(k+1)/2} and γ ≤ 2^{−k−2}. We then have

α² + γ + 4|α|β ≤ 2^{−k−1} + 2^{−k−2} + 2^{−k−1−(k+1)/2}
              ≤ 2^{−k},
provided k ≥ 1.
We can satisfy the constraints on α and γ by finding an approximation within 2^{−⌈(k+1)/2⌉}, and returning k + 3 bits of the result of applying the iteration (recall that the result has one bit to the left of the radix point).
Note that if we take k as the size of the problem instance, we are reducing
the problem to an instance roughly half the original size. We therefore have
a divide-and-conquer algorithm.
In order to complete the algorithm, we need to handle the base cases.
Because ⌈(k + 1)/2⌉ < k only when k > 2, these cases occur for k ≤ 2. It
turns out that these cases are important for ensuring that the approximation
is at least 1 and strictly less than 2. From (10.6), the result of the iteration
is never more than 1/y (here y denotes the portion we are actually using
in computing the iteration). Thus, if y > 1/2, the estimate is less than 2.
Furthermore, if y = 1/2, an initial estimate less than 2 will ensure that some
error remains, so that the result is still strictly less than 2. Finally, provided
ε < 1, the result is always closer to 1/y than the initial estimate. Thus, if
we make sure that our base case gives a value that is less than 2 and no
worse an estimate than 1 would be, the approximation will always be in the
proper range.
We leave it as an exercise to show that the estimate (11 − ⌊8y⌋)/4 satisfies the specification and the requirements discussed above for k ≤ 2. ⌊8y⌋ is simply the first 3 bits of y. Because 1/2 ≤ y < 1, we have 4 ≤ ⌊8y⌋ < 8.
RecipNewton(y, k)
n ← y.NumBits(); len ← k + 3
if k ≤ 2
return eleven.Subtract(y.Shift(3 − n))
else
x0 ← RecipNewton(y, ⌈(k + 1)/2⌉)
y ← y.Shift(len − n)
t ← x0.Shift(len + x0.NumBits())
x1 ← t.Subtract(Multiply(y, Multiply(x0, x0)))
return x1.Shift(len − x1.NumBits())
is smooth, from Exercise 3.18, the time required for the two multiplications
is in O(M (k)).
Because the remainder of the operations, excluding the recursive call,
run in linear time, the total time excluding the recursive call is in O(M (k)).
The total running time is therefore given by the recurrence
f(k) ∈ f(⌈(k + 1)/2⌉) + O(M(k)).

Define f_1(k) = f(k + 1). Then

f_1(k) = f(k + 1)
       ∈ f(⌈(k + 2)/2⌉) + O(M(k + 1))
       = f(⌈k/2⌉ + 1) + O(M(k + 1))
       = f_1(⌈k/2⌉) + O(M(k + 1))
       = f_1(⌈k/2⌉) + O(M(k)),

because M is smooth.
In order to be able to apply Theorem 3.32 to f1 , we need additional
assumptions on M. We therefore assume that M(k) = k^q·g(k), where q ≥ 1 and g_1(k) = g(2^{k+2}) is smooth. (Note that the functions k^{lg 3} and k lg k lg lg k both satisfy these assumptions on M.) Then from Theorem 3.32, f_1(k) ∈
O(M (k)). Because M is smooth, f (k) = f1 (k − 1) ∈ O(M (k)).
We can now analyze the running time of DivideRecip. If m < n, the
running time is clearly in Θ(1). Suppose m ≥ n. Then the value r returned
by Reciprocal(v, m − n) has m − n + 3 bits in the worst case. Hence,
the result of the first multiplication has 2m − n + 3 bits in the worst case.
The worst-case running time of this multiplication is therefore in O(M (m)).
q then has m − n + 1 bits in the worst case. The result of the second
multiplication therefore has m + 1 bits in the worst case, and hence runs
in O(M (m)) time. Because Reciprocal runs in O(M (m − n)) time, and
the remaining operations run in O(m) time, the overall running time is in
O(M (m)). The running time for DivideRecip is therefore the same as for
multiplication, even if our multiplication algorithm runs in O(n lg n lg lg n)
time.
10.7 Summary
The divide-and-conquer technique involves reducing large instances of a
problem to one or more smaller instances, each of which is a fraction of
the size of the original problem. The running time of the resulting algo-
rithm can typically be analyzed by deriving a recurrence to which Theorem
3.32 applies. Theorem 3.32 can also suggest how to improve a divide-and-
conquer algorithm.
Some variations of the divide-and-conquer technique don’t completely fit
the above description. For example, quick sort does not necessarily produce
subproblems whose sizes are a fraction of the size of the original array. As a
result, Theorem 3.32 does not apply. However, we still consider quick sort to
be a divide-and-conquer algorithm because its goal is to partition an array
into two arrays of approximately half the size of the input array, and to
sort these arrays recursively. Likewise, in LinearSelect, the sizes of the
two recursive calls are very different, but because they are both fractions
of the original size, the analysis ends up being related to that of a more
standard divide-and-conquer algorithm. Finally, DivideDC does not divide
the problem into a bounded number of subproblems; however, all of the
recursive calls in turn yield at most two recursive calls, so we can analyze
these calls using standard divide-and-conquer techniques.
10.8 Exercises
Exercise 10.1 Prove that PolyMult, shown in Figure 10.1, meets its
specification.
Exercise 10.6 Prove that MergeSort, shown in Figure 10.2, meets its
specification.
Exercise 10.8 Prove that QuickSort, shown in Figure 10.3, meets its
specification.
Exercise 10.9 Notice that one of the recursive calls in QuickSort is tail
recursion. Taking advantage of this fact, convert one of the recursive calls
to iteration. Notice that the calls can be made in either order, and so
either may be converted to iteration. Make the proper choice so that the
resulting algorithm uses Θ(lg n) stack space in the worst case on an array
of n elements.
p ← A[RandomInteger(1, n)]
Show that the expected running time of this algorithm is in Θ(n). [Hint:
Your analysis should be similar to the analysis of QuickSort in Section
10.3.]
Exercise 10.18 Show that it is possible to find either the smallest or largest
of n elements using at most n − 1 comparisons.
* Exercise 10.19 Show that it is possible to find either the second largest
or second smallest of n elements using at most n + ⌈lg n⌉ − 2 comparisons.
* Exercise 10.20 Show that it is possible to find the median of five ele-
ments using at most six comparisons.
Exercise 10.22 Prove that DivideRecip, shown in Figure 10.8, meets its
specification as given in Figure 10.6.
a. Show that if x0 = (11 − ⌊8y⌋)/4, then |1/y − x0| ≤ 1/4.
* Exercise 10.26 Given two natural numbers u and v which are not both
0, the greatest common divisor of u and v (or gcd(u, v)) is the largest integer
that evenly divides both u and v.
a. Prove that for any positive integers u and v, gcd(u, v) = gcd(v, u mod
v).
* Exercise 10.27 Given two positive integers u and m such that u < m,
a multiplicative inverse of u mod m is any positive integer v such that 1 ≤
v < m and (uv) mod m = 1.
a. Prove that for any positive integers u and v, there exist integers a and
b such that au + bv = gcd(u, v).
b. Prove that u has a multiplicative inverse mod m iff gcd(u, m) = 1.
[Hint: See Theorem 7.4 on page 265.]
c. Prove that for 1 ≤ u < m, u has at most one multiplicative inverse
mod m.
d. Give an efficient divide-and-conquer algorithm that takes as input pos-
itive integers u and m such that u < m and returns the multiplicative
inverse of u mod m, or nil if no inverse exists. Your algorithm should
run in O(lg m) time. [Hint: Modify the algorithm for Exercise 10.26
to find a and b as described in part a.]
Your algorithm should return the minimum distance separating any two
distinct points.
* Exercise 10.32 Give a divide-and-conquer algorithm for computing ⌊√n⌋, where n is a BigNum. Your algorithm's running time should be in O(M(lg n)), where M(n) is as defined in Section 10.6.
Hoare [59]. See Bentley and McIlroy [13] for a good practical implementation
of quick sort.
Algorithm LinearSelect is due to Blum, et al. [15]. The solution to
Exercise 10.19 is due to Aigner [4].
The solution to Exercise 10.31 is due to Bentley [11]. The solution to
Exercise 10.33 is due to Strassen [101].
Chapter 11
Optimization I: Greedy
Algorithms
In this chapter and the next, we consider algorithms for optimization prob-
lems. We have already seen an example of an optimization problem — the
maximum subsequence sum problem from Chapter 1. We can characterize
optimization problems as admitting a set of candidate solutions. In the max-
imum subsequence sum problem, the candidate solutions are the contiguous
subsequences in the input array. An objective function then typically maps
these candidate solutions to numeric values. The objective function for the
maximum subsequence sum problem maps each contiguous subsequence to
its sum. The goal is to find a candidate solution that either maximizes or
minimizes, depending on the problem, the objective function. Thus, the
goal of the maximum subsequence problem is to find a candidate solution
that maximizes the objective function.
In this chapter, we will examine optimization problems which admit
greedy solutions. A greedy algorithm builds a specific candidate solution
incrementally. The aspect of a greedy algorithm that makes it “greedy” is
how it chooses from among the different ways of incrementing the current
partial solution. In general, the different choices are ordered according to
some criterion, and the best choice according to this criterion is taken. Thus,
the algorithm builds the solution by always taking the step that appears to
be most promising at that moment. Though there are many problems for
which greedy strategies do not produce optimal solutions, when they do,
they tend to be quite efficient. In the next chapter, we will examine a more
general technique for solving optimization problems when greedy strategies
fail.
been made, the values of the jobs scheduled so far have no effect on future
decisions — their values are simply added to the total value of the schedule.
As a result, all we really need to know about the schedule constructed so far
is what time slots are still available. Furthermore, maximizing the values of
jobs scheduled in the remaining slots will maximize the total value, because
the values of all scheduled jobs are simply added together.
We can therefore focus our attention on the following version of the
problem. The input consists of a set X of (unscheduled) jobs and an array
avail[1..n] of boolean values. A valid schedule either assigns a job xi into a
time slot t such that t is no more than the deadline of xi and avail[t] = true,
or it does not schedule xi . The goal is to maximize the total value of
scheduled jobs. The following theorem shows that an optimal schedule can
be constructed by selecting the job with maximum value and scheduling it
at the latest possible time, assuming it can be scheduled.
of their values. Using heap sort or merge sort, this can be done in Θ(m lg m)
time. Schedule, shown in Figure 8.2, then implements the greedy strat-
egy. Because Schedule can be implemented to run in O(n + m lg n) time,
if m ∈ Θ(n), the entire algorithm runs in Θ(n lg n) time.
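A minimal Python sketch of this greedy strategy follows, assuming each job is a (value, deadline) pair and that there are n unit-length time slots 1, . . . , n. The simple backward scan for a free slot gives only an O(mn) bound rather than the bound mentioned above; it is meant to make the strategy concrete, not to reproduce the Schedule algorithm.

def greedy_schedule(jobs, n):
    # jobs: list of (value, deadline) pairs; n: number of unit time slots.
    # Considers jobs in order of decreasing value and assigns each to the
    # latest available slot no later than its deadline, if any.
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0], reverse=True)
    avail = [True] * (n + 1)          # avail[t] for slots 1..n
    slot_of = {}                      # job index -> assigned slot
    for i in order:
        for t in range(min(jobs[i][1], n), 0, -1):
            if avail[t]:
                avail[t] = False
                slot_of[i] = t
                break
    return slot_of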
The spanning tree will initially contain only the vertex 0; hence, it is
unnecessary to include the index 0 for the arrays best and bestCost. We
can then initialize each best[k] to 0 and each bestCost[k] to the cost of edge
{0, k}, or to ∞ if there is no such edge. In order to find an edge to add
to the spanning tree we can find the minimum bestCost[k] such that k is
not in the spanning tree. If we denote this index by next, then the edge
{best[next], next} is the next edge to be added, thus connecting next to
the spanning tree. For each k that is still not in the spanning tree, we must
then update bestCost[k] by comparing it to the cost of {next, k}, and update
best[k] accordingly. The algorithm is shown in Figure 11.2.
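The following Python sketch mirrors this array-based formulation, assuming the graph is given as an n × n cost matrix with math.inf wherever there is no edge. It is a sketch of the strategy rather than the Prim algorithm of Figure 11.2 itself.

import math

def prim_mst(cost):
    # cost[i][j]: cost of edge {i, j}, or math.inf if there is no such edge.
    # Grows a spanning tree from vertex 0, as described above.
    n = len(cost)
    in_tree = [False] * n
    in_tree[0] = True
    best = [0] * n                              # best[k]: tree vertex closest to k
    best_cost = [cost[0][k] for k in range(n)]
    edges = []
    for _ in range(n - 1):
        nxt = min((k for k in range(n) if not in_tree[k]),
                  key=lambda k: best_cost[k])
        edges.append((best[nxt], nxt))
        in_tree[nxt] = True
        for k in range(n):
            if not in_tree[k] and cost[nxt][k] < best_cost[k]:
                best_cost[k] = cost[nxt][k]
                best[k] = nxt
    return edges

Each of the n − 1 iterations scans all vertices, so the sketch runs in Θ(n²) time, matching the analysis below.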
It is easily seen that if G is a MatrixGraph, the running time is in
Θ(n²). This is an improvement over Kruskal’s algorithm when a Matrix-
Graph is used. If a ListGraph is used, however, the running time is still
in Ω(n²), and can be as bad as Θ(n³) for dense graphs. Thus, Kruskal’s
algorithm is preferred when a ListGraph is used, but Prim’s algorithm is
preferred when a MatrixGraph is used. If we have the freedom of choosing
the Graph implementation, we should choose a ListGraph and Kruskal’s
algorithm for sparse graphs, but a MatrixGraph and Prim’s algorithm for
dense graphs.
of a tree rooted at u. This rooted tree can be used to represent the shortest
paths.
Let us now generalize the problem so that a tree T rooted at u and
containing a subset of the edges and vertices of the graph is provided as
additional input. Suppose that this tree is a proper subtree of a shortest
path tree; i.e., suppose that for each vertex w in the tree, the path from u to
w in the tree is a shortest path from u to w in G. We need to add a vertex
x and an edge (w, x), where w is a vertex in T , so that the path from u to
x in the resulting tree is a shortest path in G from u to x.
For each vertex w in T , let dw give the length of the path from u to w
in T . For each edge (x, y) in G, let len(x, y) give the length of (x, y). Let
(w, x) be an edge in G such that
• w is in T ;
• x is not in T ; and
• dw + len(w, x) is minimized.
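Repeating this edge selection until no such edge remains yields a complete shortest-path tree. The Python sketch below makes the process concrete, assuming the graph is given as adjacency lists of (vertex, length) pairs; the direct scan for the minimum is only illustrative, since a priority queue gives a better bound.

import math

def shortest_path_tree(adj, u):
    # adj[w]: list of (x, length) pairs for the edges leaving w.
    # d[x] holds the length of the tree path to x; parent[x] is its parent.
    n = len(adj)
    d = [math.inf] * n
    parent = [None] * n
    d[u] = 0
    in_tree = [False] * n
    for _ in range(n):
        # The tentative values d[x] record the minimum of d[w] + len(w, x)
        # over tree vertices w, so the next vertex to add is the non-tree
        # vertex with the smallest tentative value.
        w = min((v for v in range(n) if not in_tree[v] and d[v] < math.inf),
                key=lambda v: d[v], default=None)
        if w is None:
            break                     # every remaining vertex is unreachable
        in_tree[w] = True
        for x, length in adj[w]:
            if not in_tree[x] and d[w] + length < d[x]:
                d[x] = d[w] + length
                parent[x] = w
    return d, parent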
much more frequently than other characters like ‘X’. It might make sense,
then, to use a variable-width encoding, so that the more frequently occurring
characters have shorter codes. Furthermore, if the encoding were chosen for
a particular text document, we could use shorter codes overall because we
would only need to encode those characters that actually appear in the
document.
The difficulty with variable-width encodings is choosing the encoding so
that it is clear where one character ends and the next begins. For example,
if we encode ‘n’ with 11 and ‘o’ with 111, then the encoding 11111 would
be ambiguous — it could encode either “no” or “on”. To overcome this
difficulty, we arrange the characters as the leaves of a binary tree in which
each non-leaf has two nonempty children (see Figure 11.3). The encoding of
a character is determined by the path from the root to the leaf containing
that character: each left child on the path denotes a 0 in the encoding, and
each right child on the path denotes a 1 in the encoding. Thus, in Figure
11.3, ‘M’ is encoded as 100. Because no path from the root to a leaf is a
proper prefix of any other path from the root to a leaf, no ambiguity results.
Example 11.3 For example, we can use the tree in Figure 11.3 to en-
code “Mississippi” as 100011110111101011010. We parse this encoding by
traversing the tree according to the paths specified by the encoding. Start-
ing at the root, we go right-left-left, arriving at the leaf ‘M’. Starting at the
root again, we go left, arriving at the leaf ‘i’. Continuing in this manner, we
see that the bit-string decodes into “Mississippi”. Note that because there
are four distinct characters, a fixed-width encoding would require at least
two bits per character, yielding a bit string of length 22. However, the bit
string produced by the given encoding has length 21.
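The decoding procedure described in this example is easy to express directly. In the Python sketch below a leaf is a character and an internal node is a (left, right) pair; the tree literal reflects the codes the example reveals (i = 0, s = 11, M = 100, p = 101) and is only a reconstruction of Figure 11.3.

def decode(bits, tree):
    # Walk the tree, going left on 0 and right on 1; emit a character and
    # return to the root whenever a leaf is reached.
    out = []
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)

tree = ("i", (("M", "p"), "s"))
print(decode("100011110111101011010", tree))   # prints "Mississippi"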
The specific problem we wish to address in this section is that of pro-
ducing a tree that yields a minimum-length encoding for a given text. We
will not concern ourselves with the counting of the characters in the text;
rather, we will assume that a frequency table has been produced and is pro-
vided as our input. This frequency table gives the number of occurrences of
each character in the text. To simplify matters, we will assume that none
of the characters in the table has a frequency of 0. Furthermore, we will
not concern ourselves with producing the encoding from the tree; i.e., our
output consists solely of a binary tree storing the information we need in
order to extract each character’s code.
We first need to consider how we can determine the length of an encoded
string for a particular encoding tree. Note that when we decode a bit string,
we traverse exactly one edge in the tree for each bit of the encoding. One
way to determine the length of the encoding is therefore to compute the
number of times each edge would be traversed during decoding. A given
edge (u, v) is traversed once for each occurrence of each character in the
subtree rooted at v. For a subtree t, let us therefore define weight(t) to be
the total number of occurrences of all characters in t. For an encoding tree
T , we can then define cost(T ) to be the sum of the weights of all proper
subtrees of T . (Note that weight(T ) will always be the length of the given
text.) cost(T ) then gives the length of the encoding based on T . For a
given frequency table, we define a Huffman tree to be an encoding tree with
minimum cost for that table.
Let us now generalize the problem so that the input is a collection of
trees, t1 , . . . , tn , each of which encodes a portion of the frequency table.
We assume that each character in the frequency table occurs in exactly one
of the input trees, and that the frequency table has a Huffman tree that
contains all of the input trees as subtrees. Note that if all of the trees are
single nodes, this input is just the information from the frequency table. If
the input consists of more than one tree, we need to merge two of the trees
by making them the children of a new root. Furthermore, we need to be
able to do this so that the frequency table has a Huffman tree containing
all of the resulting trees as subtrees. We claim that merging two trees of
minimum weight will produce such a tree.
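A compact way to carry out this repeated merging is with a binary heap keyed on tree weight, as in the following Python sketch. This is not the HuffmanTree algorithm of Figure 11.4, only an illustration of the greedy step; a leaf is a character, an internal node is a (left, right) pair, and the counter exists solely to break ties so that trees are never compared directly.

import heapq
from itertools import count

def huffman_tree(freq):
    # freq: dict mapping each character (with nonzero frequency) to its count.
    tiebreak = count()
    heap = [(w, next(tiebreak), c) for c, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # the two trees of minimum weight
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    return heap[0][2]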
Theorem 11.4 Let T be a Huffman tree for a frequency table F , and let
t1 , . . . , tn be subtrees of T such that n > 1 and each leaf of T occurs in exactly
one of t1 , . . . , tn . Suppose weight(t1 ) ≤ weight(t2 ) ≤ · · · ≤ weight(tn ). Let
tn+1 be the binary tree formed by making t1 and t2 the left and right children,
respectively, of a new root. Then there is a Huffman tree T ′ for F containing
t3 , t4 , . . . , tn+1 as subtrees.
Case 1: The path from x to t1 is no longer than the path from x to t2 . Let
t be the sibling of t2 in T . Without loss of generality, assume t is the left
child and t2 is the right child (otherwise, we can swap them). Clearly, t can
be neither t1 nor t2 . Furthermore, it cannot be a proper subtree of any of
t1 , . . . , tn , because then t2 would also be a proper subtree of the same tree.
Finally, t cannot contain t1 as a proper subtree, because then the path from
x to t1 would be longer than the path from x to t2 . We conclude that t must
contain one or more of t3 , . . . , tn . We can therefore swap t1 with t, letting
the result be T ′ .
Because t contains one or more of t3 , . . . , tn , weight(t1 ) ≤ weight(t);
hence, weight(t) − weight(t1 ) ≥ 0. The swap then causes the weights of all
nodes except x on the path from x to the parent of t1 in T to increase by
weight(t)−weight(t1 ). Furthermore, it causes the weights of all nodes except
x on the path from x to the parent of t2 in T to decrease by weight(t) −
weight(t1 ). No other nodes change weight. Because there are at least as
many nodes on the path from x to t2 in T as on the path from x to t1 in
T , the swap cannot increase the cost of the tree. Therefore T ′ is a Huffman
tree.
11.5 Summary
Greedy algorithms provide an efficient mechanism for solving certain opti-
mization problems. The major steps involved in the construction of a greedy
algorithm are:
Priority queues are often useful in facilitating quick access to the best
extension, as determined by the selection criterion. In many cases, the
extension involves joining pieces of a partial solution in a way that can be
modeled effectively using a DisjointSets structure.
Proving that the incremental extension can be extended to an optimal
solution is essential, because it is not true for all selection criteria. In fact,
there are optimization problems for which there is no greedy solution. In
the next chapter, we will examine a more general, though typically more
expensive, technique for solving optimization problems.
11.6 Exercises
Exercise 11.1 Prove that Kruskal, shown in Figure 11.1, meets its spec-
ification.
Exercise 11.2 Prove that Prim, shown in Figure 11.2, meets its specifica-
tion.
Exercise 11.3 Instead of using the arrays best and bestCost, Prim’s algo-
rithm could use a priority queue to store all of the edges from vertices in the
spanning tree. As vertices are added to the spanning tree, all edges from
these vertices would be added to the priority queue. As edges are removed
from the priority queue, they would need to be checked to see if they con-
nect a vertex in the spanning tree with one that is not in the spanning tree.
Implement this algorithm and analyze its running time assuming the graph
is implemented as a ListGraph.
Exercise 11.5 Modify your algorithm from Exercise 11.4 to use a priority
queue as suggested in Exercise 11.3. Analyze its running time assuming the
graph is implemented as a ListGraph.
Exercise 11.7 Construct a Huffman tree for the string, “banana split”,
and give its resulting encoding in binary. Don’t forget the blank character.
Exercise 11.8 Prove that HuffmanTree, shown in Figure 11.4, meets its
specification.
Exercise 11.9 Suppose we have a set of jobs, each having a positive integer
execution time. We must schedule all of the jobs on a single server so that
at most one job occupies the server at any given time and each job occupies
the server for a length of time equal to its execution time. Our goal is to
minimize the sum of the finish times of all of the jobs. Design a greedy
algorithm to accomplish this and prove that it is optimal. Your algorithm
should run in O(n lg n) time, where n is the number of jobs.
Exercise 11.10 Extend the above exercise to k servers, so that each job is
scheduled on one of the servers.
Exercise 11.11 Suppose we are given a set of events, each having a start
time and a finish time. Each event requires a single room. We wish to assign
events to rooms using as few rooms as possible so that no two events in the
same room overlap (they may, however, be scheduled “back-to-back” with
no break in between). Give a greedy algorithm to accomplish this and prove
that it is optimal. Your algorithm should run in O(n lg n) time.
Exercise 11.12 Repeat the above exercise with the constraint that only
one room is available. The goal is to schedule as many events as possible.
Exercise 11.13 We wish to plan a trip across country in a car that can go
d miles on a full tank of gasoline. We have identified all of the gas stations
along the proposed route. We wish to plan the trip so as to make as few stops
for gasoline as possible. Design a greedy algorithm that gives an optimal
set of stops when given d and an array dist[1..n] such that dist[i] gives the
distance from the starting point to the ith gas station. Your algorithm
should operate in O(n) time.
b. Show using a specific example that this greedy algorithm does not
always give an optimal solution if we require that each ai be either 0
or 1.
c. Using techniques from Chapter 10, improve the running time of your
algorithm to O(n).
Chapter 12
Optimization II: Dynamic Programming
In the last chapter, we saw that greedy algorithms are efficient solutions to
certain optimization problems. However, there are optimization problems
for which no greedy algorithm exists. In this chapter, we will examine a more
general technique, known as dynamic programming, for solving optimization
problems.
Dynamic programming is a technique of implementing a top-down solu-
tion using bottom-up computation. We have already seen several examples
of how top-down solutions can be implemented bottom-up. Dynamic pro-
gramming extends this idea by saving the results of many subproblems in
order to solve the desired problem. As a result, dynamic programming algo-
rithms tend to be more costly, in terms of both time and space, than greedy
algorithms. On the other hand, they are often much more efficient than
straightforward recursive implementations of the top-down solution. Thus,
when greedy algorithms are not possible, dynamic programming algorithms
are often the most appropriate.
An obvious greedy strategy is to choose at each step the largest coin that
does not cause the total to exceed n. For some sets of coin denominations,
this strategy will result in the minimum number of coins for any n. However,
suppose n = 30, d1 = 1, d2 = 10, and d3 = 25. The greedy strategy first
takes 25. At this point, the only denomination that does not cause the total
to exceed n is 1. The greedy strategy therefore gives a total of six coins: one
25 and five 1s. This solution is not optimal, however, as we can produce 30
with three 10s.
Let us consider a more direct top-down solution. If k = 1, then dk = 1,
so the only solution contains n coins. Otherwise, if dk > n, we can reduce
the size of the problem by removing dk from the set of denominations, and
the solution to the resulting problem is the solution to the original problem.
Finally, suppose dk ≤ n. There are now two possibilities: the optimal
solution either contains dk or it does not. (These two possibilities are not
exclusive — there could be one optimal solution that contains dk and another
that does not.) In what follows, we consider these two cases separately.
Let us first consider the case in which the optimal solution does not
contain dk . In this case, we do not change the optimal solution if we remove
dk from the set of denominations. We therefore have reduced the original
problem to a smaller problem instance.
Now suppose the optimal solution contains dk . Suppose we remove one
dk coin from this optimal solution. What remains is an optimal solution
to the instance with the same set of denominations and a target value of
n − dk . Now working in the other direction, if we have the optimal solution
to the smaller instance, we can obtain an optimal solution to the original
instance by adding a dk coin. Again, we have reduced the original problem
to a smaller problem instance.
To summarize, when dk ≤ n, the optimal solution can be obtained from
the optimal solution to one of two smaller problem instances. We have no
way of knowing in advance which of these smaller instances is the right
one; however, if we obtain both of them, we can compare the two resulting
candidate solutions. The one with fewer coins is the optimal solution. In
fact, if we could quickly determine which of these smaller instances would
yield fewer coins, we could use this test as the selection criterion for a greedy
algorithm. Therefore, let us focus for now on the more difficult aspect of
this problem — that of determining the minimum number of coins in an
optimal solution.
Based on the above discussion, the following recurrence gives the mini-
mum number of coins needed to obtain a value of n from the denominations
d1 , . . . , dk :
C(n, k) = n                                            if k = 1
        = C(n, k − 1)                                  if k > 1 and dk > n           (12.1)
        = min(C(n, k − 1), C(n − dk , k) + 1)          if k > 1 and n ≥ dk .
n − k ≥ k² − k = k(k − 1) ≥ (k − 1)²
2. C(n, k) also requires C(n − k, k), which requires C(n − k, k − 1), which
requires C(n − k, k − 2), which requires C(n − 2k + 2, k − 2).
each C(i, j) in constant time. All (n + 1)k of these values can therefore be
computed in Θ(nk) time. Once all of these values have been computed, then
the optimal collection of coins can be constructed in a greedy fashion, as
suggested above. The algorithm is shown in Figure 12.1. This algorithm is
easily seen to use Θ(nk) time and space.
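For comparison with Figure 12.1, the following Python sketch is a direct memoized transcription of recurrence (12.1); the bottom-up table of the figure computes exactly the same values before recovering the coins greedily. The 0-indexed list d and the function names are illustrative conventions, not those of the figure.

from functools import lru_cache

def min_coins(n, d):
    # d[0] < d[1] < ... < d[k-1] with d[0] = 1; returns C(n, k) from (12.1).
    @lru_cache(maxsize=None)
    def C(n, k):
        if k == 1:
            return n
        if d[k - 1] > n:
            return C(n, k - 1)
        return min(C(n, k - 1), C(n - d[k - 1], k) + 1)
    return C(n, len(d))

print(min_coins(30, [1, 10, 25]))   # 3, not the greedy strategy's 6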
A characteristic of this problem that is essential in order for the dynamic
programming approach to work is that it is possible to decompose a large
problem instance into smaller problem instances in a way that optimal so-
lutions to the smaller instances can be used to produce an optimal solution
to the larger instance. This is, of course, one of the main principles of the
top-down approach. However, this characteristic may be stated succinctly
for optimization problems: For any optimal solution, any portion of that
solution is itself an optimal solution to a smaller instance. This principle
is known as the principle of optimality. It applies to the change-making
problem because any sub-collection of an optimal collection of coins is it-
self an optimal collection for the value it yields; otherwise, we could replace
the sub-collection with a smaller sub-collection yielding the same value, and
obtain a better solution to the original instance.
The principle of optimality usually applies to optimization problems, but
not always in a convenient way. For example, consider the problem of finding
a longest simple path in a graph from a given vertex u to a given vertex v.
(A simple path in a graph is a path in which each vertex appears at most once.)
If we take a portion of the longest path, say from x to y, this subpath is
not necessarily the longest simple path from x to y in the original graph.
However, it is guaranteed to be the longest simple path from x to y in the
subgraph consisting of only those vertices on that subpath and all edges
between them in the original graph. Thus, a subproblem consists of a start
vertex, a final vertex, and a subset of the vertices. Because a graph with
n vertices has 2ⁿ subsets of vertices, there are an exponential number of
subproblems to solve. Thus, in order for dynamic programming to be an
effective design technique, the principle of optimality must apply in a way
that yields relatively few subproblems.
One characteristic that often leads to relatively few subproblems, while
at the same time causing direct recursive implementations to be quite ex-
pensive, is that the top-down solution results in overlapping subproblems.
As we have already discussed, the top-down solution for the change-making
problem can result in two subproblems which have a subproblem in common.
This overlap results in redundant computation in the direct recursive imple-
mentation. On the other hand, it reduces the total number of subproblems,
so that the dynamic programming approach is more efficient.
Figure 12.1 Algorithm for computing the minimum number of coins needed
to achieve a given value
Precondition: d[1..k] is an array of Ints such that 1 = d[1] < d[2] < · · · <
d[k], and n is a Nat.
Postcondition: Returns an array A[1..k] such that A[i] gives the number
of coins of denomination d[i] in a minimum-sized collection of coins with
value n.
Change(d[1..k], n)
C ← new Array[0..n, 1..k]; A ← new Array[1..k]
for i ← 0 to n
C[i, 1] ← i
for i ← 0 to n
for j ← 2 to k
if i < d[j]
C[i, j] ← C[i, j − 1]
else
C[i, j] ← Min(C[i, j − 1], C[i − d[j], j] + 1)
for j ← 1 to k
A[j] ← 0
i ← n; j ← k
// Invariant: A[1]d[1] + · · · + A[k]d[k] = n − i, and there is an optimal solution
// that includes all of the coins in A[1..k], but no additional coins from
// d[j + 1..k].
while j > 1
if i < d[j] or C[i, j − 1] < C[i − d[j], j] + 1
j ←j−1
else
A[j] ← A[j] + 1; i ← i − d[j]
A[1] ← i
return A[1..k]
M1 M2 · · · Mn ,
• M1 : 2 × 3;
• M2 : 3 × 4; and
• M3 : 4 × 1.
2 · 3 · 4 + 2 · 4 · 1 = 32.
3 · 4 · 1 + 2 · 3 · 1 = 18.
Thus, the way in which the matrices are parenthesized can affect the
number of scalar multiplications performed in computing the matrix prod-
uct. This fact motivates an optimization problem: Given a sequence of
positive integer dimensions d0 , . . . , dn , determine the minimum number of
scalar multiplications needed to compute the product M1 · · · Mn , assuming
Mi is a di−1 × di matrix for 1 ≤ i ≤ n, and that the number of scalar
multiplications required to multiply two matrices is as described above.
Various greedy strategies might be applied to this problem, but none can
guarantee an optimal solution. Let us therefore look for a direct top-down
solution to the problem of finding the minimum number of scalar multipli-
cations for a product Mi · · · Mj . Let us focus on finding the last matrix
multiplication. This multiplication will involve the products Mi · · · Mk and
Mk+1 · · · Mj for some k, i ≤ k < j. The sizes of these two matrices are
di−1 × dk and dk × dj . Therefore, once these two matrices are computed, an
additional di−1 dk dj scalar multiplications must be performed. The principle
of optimality clearly holds for this problem, as a better way of computing
either sub-product results in fewer total scalar multiplications. Therefore,
the following recurrence gives the minimum number of scalar multiplications
needed to compute Mi · · · Mj :
m(i, j) = 0                                                              if i = j
        = min { m(i, k) + m(k + 1, j) + di−1 dk dj : i ≤ k < j }         if i < j.    (12.2)
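A bottom-up evaluation of recurrence (12.2), filling in the table in order of increasing chain length, can be sketched in Python as follows; the dimension list d = [d0, . . . , dn] and the function name are illustrative conventions.

def matrix_chain(d):
    # d[i-1] x d[i] is the size of Mi; returns m(1, n) from recurrence (12.2).
    n = len(d) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # chain length j - i + 1
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + d[i - 1] * d[k] * d[j]
                          for k in range(i, j))
    return m[1][n]

print(matrix_chain([2, 3, 4, 1]))   # 18, matching the example above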
fractional knapsack problem does not extend to the so-called 0-1 knapsack
problem — the variation in which the items cannot be broken. Specifically,
in this variation we are given a set of n items, each having a positive weight
wi ∈ N and a positive value vi ∈ N, and a weight bound W ∈ N. We wish
to find a subset S ⊆ {1, . . . , n} that maximizes Σ_{i∈S} vi
Let us then compute the minimum weight required to achieve each possible
value v ≤ V . The largest value v yielding a minimum weight no larger than
W is then our optimal value.
Taking this approach, we observe that item n is either in the set of items
for which value v can be achieved with minimum weight, or it isn’t. If it
is, then the minimum weight can be computed by removing item n and
finding the minimum weight needed to achieve a value of v − vn . Otherwise,
the minimum weight can be computed by removing item n. The following
recurrence therefore gives the minimum weight Wi (j) needed to achieve a
value of exactly j from the first i items, for 0 ≤ i ≤ n, 0 ≤ j ≤ V :
Wi (j) = 0                                              if j = 0
       = ∞                                              if i = 0 and j > 0
       = Wi−1 (j)                                       if i > 0 and 0 < j < vi      (12.5)
       = min(Wi−1 (j), Wi−1 (j − vi ) + wi )            otherwise.
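Filling in recurrence (12.5) bottom-up gives a Θ(nV ) algorithm; a Python sketch follows, with 0-indexed weight and value lists as an illustrative convention. The answer to the original problem is the largest value j whose minimum weight is at most W.

import math

def knapsack_by_value(w, v, W):
    # w[i], v[i]: weight and value of item i + 1; W: the weight bound.
    n = len(w)
    V = sum(v)
    weight = [[math.inf] * (V + 1) for _ in range(n + 1)]   # weight[i][j] = Wi(j)
    for i in range(n + 1):
        weight[i][0] = 0
    for i in range(1, n + 1):
        for j in range(1, V + 1):
            weight[i][j] = weight[i - 1][j]
            if j >= v[i - 1]:
                weight[i][j] = min(weight[i][j],
                                   weight[i - 1][j - v[i - 1]] + w[i - 1])
    return max(j for j in range(V + 1) if weight[n][j] <= W)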
12.5 Summary
Dynamic programming algorithms provide more power for solving optimiza-
tion problems than do greedy algorithms. Efficient dynamic programming
algorithms can be found when the following conditions apply:
• The principle of optimality can be applied to decompose the problem
into subinstances of the same problem.
12.6 Exercises
Exercise 12.1 Prove by induction on n + k that C(n, k), as defined in
recurrence (12.1), gives the minimum number of coins needed to give a
value of exactly n if the denominations are d1 < d2 < · · · < dk and d1 = 1.
Exercise 12.2 Prove that Change, shown in Figure 12.1, meets its spec-
ification. You do not need to focus on the first half of the algorithm; i.e.,
you can assume that C(i, j), as defined in recurrence (12.1), is assigned to
C[i, j]. Furthermore, you may use the result of Exercise 12.1 in your proof.
a. Prove that for denominations d1 < d2 < · · · < dk , where k > 1, if the
greedy algorithm fails for some value, then it must fail for some value
n < dk + dk−1 .
Exercise 12.6
a. Modify Floyd’s algorithm (Figure 12.3) so that it returns an array
S[0..n − 1, 0..n − 1] such that for i ≠ j, S[i, j] gives the vertex k such
that (i, k) is the first edge in a shortest path from i to j. If there is
no path from i to j, or if i = j, then S[i, j] should be −1.
b. Give an algorithm that takes the array S[0..n − 1, 0..n − 1] defined
above, along with i and j such that 0 ≤ i < n and 0 ≤ j < n, and
prints the vertices along a shortest path from i to j. The first vertex
printed should be i, followed by the vertices in order along the path,
until the last vertex j is printed. If i = j, only i should be printed. If
there is no path from i to j, a message to that effect should be printed.
Your algorithm should run in O(n) time.
Exercise 12.7 Give an algorithm for the 0-1 knapsack problem that runs
in O(nW ) time and space, where n is the number of items and W is the
weight bound. Your algorithm should use dynamic programming to compute
recurrence (12.4) for 0 ≤ i ≤ n and 0 ≤ j ≤ W , then use these values to
guide a greedy algorithm for selecting the items to put into the knapsack.
Your algorithm should return an array selected[1..n] of booleans such that
selected[i] is true iff item i is in the optimal packing.
Exercise 12.8 Repeat Exercise 12.7 using recurrence (12.5) instead of (12.4).
Your algorithm should use Θ(nV ) time and space, where n is the number
of items and V is the total value of all the items.
Exercise 12.10 Let A[1..m] and B[1..n] be two arrays. An array C[1..k]
is a common subsequence of A and B if there are two sequences of indices
⟨i1 , . . . , ik ⟩ and ⟨j1 , . . . , jk ⟩ such that
a. Give a recurrence for L(i, j), the length of the longest common subse-
quence of A[1..i] and B[1..j].
Exercise 12.11 A palindrome is a string that reads the same from right
to left as it does from left to right (“abcba”, for example). Give a dynamic
programming algorithm that takes a String (see Figure 4.17 on page 144)
s as input, and returns a longest palindrome contained as a substring within
s. Your algorithm should operate in O(n²) time, where n is the length of
s. You may use the results of Exercise 4.13 (page 143) in analyzing your
algorithm. [Hint: For each pair of indices i ≤ j, determine whether the
substring from i to j is a palindrome.]
* Exercise 12.15 A chain is a rooted tree with exactly one leaf. We are
given a chain representing a sequence of n pipelined processes. Each node i
in the chain represents a process and has a positive execution time ei ∈ N.
Each edge (i, j) has a positive communication cost cij ∈ N. For edge (i, j),
if processes i and j are executed on separate processors, the time needed to
send data from process i to process j is cij ; if the processes are executed on
the same processor, this time is 0. We wish to assign processes to processors
such that each processor has total weight no more than a given value B ∈ N.
The weight of a processor is given by the sum of the execution times of the
processes assigned to that processor, plus the sum of the communication
[Figure 12.5: an example assignment of a chain of processes to two processors with B = 20; Processor 1 has weight 8, Processor 2 has weight 17, and the communication cost of the assignment is 2.]
costs of edges between tasks on that processor and tasks on other processors
(see Figure 12.5). The communication cost of an assignment is the sum of
the communication costs of edges that connect nodes assigned to different
processors.
Give a dynamic programming algorithm that finds the minimum com-
munication cost of any assignment of processes to processors such that each
processor has weight no more than B. Note that we place no restriction on
the number of processors used. Your algorithm should run in O(n²) time.
Prove that your algorithm is correct.
Exercise 12.16 Given two strings x and y, we define the edit distance from
x to y as the minimum number of operations required to transform x into
y, where the operations are chosen from the following:
• insert a character;
• delete a character; or
• change a character.
We say that a binary search tree containing these keys is optimal if the
expected cost of a look-up in this tree is minimum over the set of all binary
search trees containing these keys.
* d. Using the above result, improve your algorithm to run in O(n²) time.
Your algorithm should run in O(n²) time and use O(n) space. Prove that
your algorithm is correct.
including a single blank character between adjacent words on the same line.
Furthermore, we wish to minimize a “sloppiness” criterion. Specifically, we
wish to minimize the following objective function:
Σ_{i=1}^{k−1} f (m − ci ),
Chapter 13
Depth-First Search
We interpret the size of the structure to be the size of num, and we interpret
the value of num[i] as the value associated with i. Clearly, the constructor
runs in Θ(n) time, and the operations all run in Θ(1) time.
The algorithm shown in Figure 13.2 combines the preorder and postorder
traversals of the tree T . We use a (directed) Graph to represent T . pre is
a VisitCounter that records the order in which nodes are visited in the
Precondition: n is a Nat.
Postcondition: Constructs a VisitCounter of size n, all of whose
values are 0.
VisitCounter(n)
count ← 0; num ← new Array[0..n − 1]
for i ← 0 to n − 1
num[i] ← 0
Precondition: true.
Postcondition: Returns the size of this VisitCounter.
VisitCounter.Size()
return SizeOf(num)
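The Visit and Num operations of VisitCounter are not shown in this excerpt; the Python sketch below assumes the natural behavior that is consistent with how pre and post are used later, namely that Visit(i) records the next visit number (1, 2, 3, . . .) in num[i] and Num(i) returns it.

class VisitCounter:
    def __init__(self, n):
        self.count = 0
        self.num = [0] * n      # 0 means "not yet visited"

    def size(self):
        return len(self.num)

    def visit(self, i):         # corresponds to Visit(i), as assumed above
        self.count += 1
        self.num[i] = self.count

    def num_of(self, i):        # corresponds to Num(i)
        return self.num[i]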
Correctness: Assume the invariant holds and that L is empty when the
loop terminates. We need to show that the postcondition holds when the
algorithm finishes. Let S denote the set of descendants of i.
Let us first consider which values in pre and post have been changed
by the algorithm. From the invariant, only pre.Num(i), pre.Num(j), and
post.Num(j), where j ∈ S \{i}, have changed by the time the loop ter-
minates. The final call to post.Visit(i) changes post.Num(i). Therefore,
the only values to have been changed by the algorithm are pre.Num(j) and
post.Num(j) such that j ∈ S.
Let j ∈ S and k ∉ S. If j ≠ i, then from the invariant, pre.Num(j) >
pre.Num(k) and post.Num(j) > post.Num(k). If j = i, then from the
invariant pre.Num(j) > pre.Num(k). Furthermore, the call to post.Visit(i)
makes post.Num(j) > post.Num(k).
Now let j, k ∈ S. We must show that j is a proper ancestor of k iff
pre.Num(j) < pre.Num(k), and post.Num(j) > post.Num(k).
Figure 13.3 Algorithm for testing ancestry for multiple pairs of nodes in a
rooted tree
runs in Θ(1) time. Furthermore, the while loop iterates exactly m times,
where m is the number of children of i. Because each iteration of the while
loop results in one recursive call, it is easily seen that the running time is
proportional to the total number of calls to PrePostTraverse. It is easily
shown by induction on n, the number of nodes in the subtree rooted at i,
that a call in which the second parameter is i results in exactly n total calls.
The running time is therefore in Θ(n).
The algorithm for testing ancestry for multiple pairs of nodes is given
in Figure 13.3. The initialization prior to the call to PrePostTraverse
clearly runs in Θ(n) time, as does the call to PrePostTraverse. The
body of the loop runs in Θ(1) time. Because the loop iterates m times, the
entire algorithm runs in Θ(n + m) time.
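The whole scheme fits in a few lines of Python, sketched below for a tree given as lists of children; j is a proper ancestor of k exactly when pre[j] < pre[k] and post[j] > post[k], which is the test applied to each query pair. The representation and function names are illustrative only.

def pre_post_numbers(children, root):
    # children[i]: list of the children of node i in a rooted tree.
    n = len(children)
    pre = [0] * n
    post = [0] * n
    counters = [0, 0]

    def traverse(i):
        counters[0] += 1
        pre[i] = counters[0]          # preorder processing of i
        for j in children[i]:
            traverse(j)
        counters[1] += 1
        post[i] = counters[1]         # postorder processing of i

    traverse(root)
    return pre, post

def is_proper_ancestor(pre, post, j, k):
    return pre[j] < pre[k] and post[j] > post[k]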
• G′ has the edge (j, k) iff a call ReachDFS(G, j, sel, pre, post) is made,
which in turn calls ReachDFS(G, k, sel, pre, post).
Precondition: n is a Nat.
Postcondition: Constructs a Selector of size n, all of whose elements
are selected.
Selector(n)
Precondition: true.
Postcondition: Selects all elements.
Selector.SelectAll()
Precondition: true.
Postcondition: Unselects all elements.
Selector.UnselectAll()
Precondition: i is a Nat less than the number of elements.
Postcondition: Selects element i.
Selector.Select(i)
Precondition: i is a Nat less than the number of elements.
Postcondition: Unselects element i.
Selector.Unselect(i)
Precondition: i is a Nat less than the number of elements.
Postcondition: Returns true if element i is selected, or false otherwise.
Selector.IsSelected(i)
We therefore have the ADT specified in Figure 13.6. The generic depth-first
search is shown in Figure 13.7.
Let us now consider the useful properties of depth-first spanning trees.
These properties concern the non-tree edges. First, we show the following
theorem regarding undirected graphs.
The above theorem gives the property of depth-first spanning trees that
makes depth-first search so useful for connected undirected graphs. Given
a connected undirected graph G and a depth-first spanning tree T of G, let
us refer to edges of G that correspond to edges in T as tree edges. We will
call all other edges back edges. By definition, tree edges connect parents
with children. Theorem 13.2 tells us that back edges connect ancestors with
descendants.
However, Theorem 13.2 does not apply to depth-first search on a directed
graph. To see why, consider the graph shown in Figure 13.8. The solid
edges in part (b) show a depth-first search tree for the graph in part (a);
the remaining edges of the graph are shown with dashed lines in part (b).
Because 0 is the root and all other vertices are reachable from 0, all other
Precondition: n is a Nat.
Postcondition: Constructs a new Searcher of size n.
Searcher(n)
Precondition: i is a Nat less than the size of this Searcher.
Postcondition: true.
Searcher.PreProc(i)
Precondition: i is a Nat less than the size of this Searcher.
Postcondition: true.
Searcher.PostProc(i)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.TreePreProc(e)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.TreePostProc(e)
Precondition: e is an Edge whose vertices are less than the size of this
Searcher.
Postcondition: true.
Searcher.OtherEdgeProc(e)
[Figure 13.8: (a) a directed graph on vertices 0 through 4; (b) a depth-first spanning tree of the graph, with the remaining edges shown dashed.]
matter which endpoint we call j. However, with a directed edge, either the
source or the destination may be unselected first, and we must consider
both cases. Given the assumption that the source is unselected first, the
remainder of the proof follows. We therefore have the following theorem.
Theorem 13.3 Let G be a directed graph with n vertices such that all
vertices are reachable from i, and let sel be a Selector of size n in which
all elements are selected. Suppose we call Dfs(G, i, sel, s), where s is a
Searcher of size n. Then for every edge (j, k) processed as a non-tree
edge, if j is unselected before k is, then j is an ancestor of k.
Thus, if we draw a depth-first spanning tree with subtrees listed from
left to right in the order we unselect them (as in Figure 13.8), there will be
no edges leading from left to right. As we can see from Figure 13.8, all three
remaining possibilities can occur, namely:
• edges from descendants to ancestors (we call these back edges); and
Theorem 13.3 gives us the property we need to make use of depth-first search
with directed graphs.
As a final observation, we note that back edges in directed graphs always
form cycles, because there is always a path along the tree edges from a vertex
to any of its descendants. Hence, a directed acyclic graph cannot have back
edges.
In the next three sections, we will show how to use depth-first search to
design algorithms for connected undirected graphs, directed acyclic graphs,
and directed graphs.
conclude that i is an articulation point iff i has at least one child j in T such
that no descendant of j is adjacent to a proper ancestor of i.
If we can efficiently test the above property, then we should be able
to find all articulation points with a single depth-first search. Note that
it is sufficient to know, for each vertex j other than the root, the highest
ancestor k of j that is adjacent to some descendant of j. The parent i of
j is an articulation point if k = i. On the other hand, if for each child
j of i, the highest ancestor k adjacent to some descendant of j is a proper
ancestor of i, then i is not an articulation point. Because a vertex is preorder
processed before all of its descendants, we can determine which of a given
set of ancestors of a vertex is the closest to the root by determining which
was preorder processed first. Thus, let us use a VisitCounter pre to keep
track of the order the vertices are preorder processed. We then need to
compute the following value for each vertex i other than the root:
We can now build a Searcher s so that Dfs(G, 0, sel, s) will find the
articulation points of G, where sel is an appropriate Selector. (Note that
it doesn’t matter which node is used as the root of the depth-first search,
so we will arbitrarily use 0.) Let n be the number of vertices in G. We
need as representation variables a VisitCounter pre of size n, an array
highest[0..n − 1], a readable array artPoints[0..n − 1] of booleans to store the
results, and a natural number rootChildren to record the number of children
of the root. Note that making artPoints readable makes this data structure
insecure, because code that can read the reference to the array can change
values in the array. We will discuss this issue in more detail shortly.
To implement the Searcher operations, we only need to determine
when the various calculations need to be done. Initialization should go in the
constructor; however, because the elements of the arrays are not needed until
the corresponding vertices are processed, we can initialize these elements in
PreProc. We want the processing of a vertex i to compute highest[i]. In
order to use recurrence (13.2), we need pre.Num(k) for each back edge {i, k}
and highest[j] for each child j of i. We therefore include the code to compute
highest[i] in OtherEdgeProc and TreePostProc. The determination
of whether a vertex i other than the root is an articulation point needs to
occur once we have computed highest[j] for each child j of i; hence, we
include this code in TreePostProc. To be able to determine whether the
root is an articulation point, we count its children in TreePostProc. We
can then make the determination once all of the processing is complete, i.e.,
in the call to PostProc for the root.
The implementation of ArtSearcher is shown in Figure 13.9. We have
not given an implementation of the TreePreProc operation — it does
nothing. We have also not specified any preconditions or postconditions
for the constructor or any of the operations. The reason for this is that
we are only interested in what happens when we use this structure with a
depth-first search. It therefore doesn’t make sense to prove its correctness
in every context. As a result, we don’t need to make this structure secure.
Furthermore, the code in each of the operations is so simple that specifying
preconditions and postconditions is more trouble than it is worth. As we
will see, it will be a straightforward matter to prove that the algorithm that
uses this structure is correct.
We can now construct an algorithm that uses depth-first search to find
the articulation points in a connected undirected Graph G. The algorithm
is shown in Figure 13.10. Let n be the number of vertices and a be the
number of edges in G, and suppose G is implemented as a ListGraph.
Because each of the operations in ArtSearcher runs in Θ(1) time, it is
easily seen that the call to Dfs runs in Θ(a) time, the same as ReachDFS.
The remaining statements run in Θ(n) time. Because G is connected, n ∈
O(a), so the entire algorithm runs in Θ(a) time.
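Stripped of the Searcher plumbing, the same computation can be sketched directly in Python. The sketch assumes a connected undirected graph given as adjacency lists in which every edge appears in both lists; because the edge back to a vertex's parent is processed like any other non-tree edge, highest[j] never exceeds pre[i] for a child j of i, so the equality test below matches the one used in ArtSearcher.

def articulation_points(adj):
    n = len(adj)
    pre = [0] * n                 # preorder numbers; 0 means "not yet visited"
    highest = [float("inf")] * n
    art = [False] * n
    counter = [0]

    def dfs(i, parent):
        counter[0] += 1
        pre[i] = counter[0]
        children = 0
        for k in adj[i]:
            if pre[k] == 0:                          # tree edge (i, k)
                children += 1
                dfs(k, i)
                highest[i] = min(highest[i], highest[k])
                if parent is not None and highest[k] == pre[i]:
                    art[i] = True                    # nothing below k reaches above i
            else:                                    # non-tree edge, incl. the parent edge
                highest[i] = min(highest[i], pre[k])
        if parent is None and children > 1:
            art[i] = True                            # the root case

    dfs(0, None)
    return [i for i in range(n) if art[i]]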
To prove that ArtPts is correct, we need to show that the call to Dfs
results in s.artPoints containing the correct boolean values. In order to
prove this, it is helpful to prove first that s.highest contains the correct
values. This proof uses the fact that Dfs performs a traversal of a depth-
first spanning tree of G.
ArtSearcher(n)
artPoints ← new Array[0..n − 1]; highest ← new Array[0..n − 1]
pre ← new VisitCounter(n); rootChildren ← 0
ArtSearcher.PreProc(i)
pre.Visit(i); artPoints[i] ← false; highest[i] ← ∞
ArtSearcher.TreePostProc(e)
i ← e.Source(); j ← e.Dest(); highest[i] ← Min(highest[i], highest[j])
if i = 0
rootChildren ← rootChildren + 1
else if highest[j] = pre.Num(i)
artPoints[i] ← true
ArtSearcher.OtherEdgeProc(e)
i ← e.Source(); k ← e.Dest()
highest[i] ← Min(highest[i], pre.Num(k))
ArtSearcher.PostProc(i)
if i = 0 and rootChildren > 1
artPoints[i] ← true
Induction Hypothesis: Assume that for any j with fewer than m descen-
We must show that s.pre.Num(k) and s.highest[j] have the correct values
when they are used. s.pre.Num(k) is used in OtherEdgeProc(e), where
e = (i, k). (We are denoting edges as directed edges because a Graph uses
only directed edges; see p. 311.) This operation is only called when k is
unselected, and hence after PreProc(k) has been called. s.pre.Num(k) has
therefore been set to its correct value. s.highest[j] is used in
TreePostProc(e), where e = (i, j). Hence, j is a child of i that has been
processed. Because j is a child of i, it
has strictly fewer than m descendants. Thus, by the Induction Hypothesis,
its processing sets it to its correct value. Thus, the processing of vertex i
sets s.highest[i] to the value given in Equation (13.2), which we have shown
to be equivalent to Equation (13.1).
Proof: Let 0 ≤ i < n. We must show that the call to Dfs results in
s.artPoints[i] being true if i is an articulation point, or false otherwise. We
first note that artPoints[i] is changed only during the processing of i. Fur-
thermore, it is initialized to false in PreProc(i). We must therefore show
that it is set to true iff i is an articulation point. We consider two cases.
Case 2: i > 0. Then artPoints[i] is set to true iff s.highest[j] has a value
equal to s.pre.Num(i) in the call to TreePostProc(e), where e = (i, j)
for some vertex j. Because (i, j) is passed to TreePostProc, j must be
a child of i. From Lemma 13.4, the call to Dfs sets s.highest[j] to the
correct value, as defined in Equation (13.1). Furthermore, an examination
of the proof of Lemma 13.4 reveals that this value is set by the processing of
vertex j. This processing is done prior to the call to TreePostProc(e), so
that s.highest[j] has the correct value by the time it is used. Furthermore,
s.pre.Num(i) is set to its proper value by PreProc(i), which is also called
before TreePostProc(e). As we have already shown, i is an articulation
point iff s.highest[j] = s.pre.Num(i) for some child j of i.
TopSortSearcher(n)
order ← new Array[0..n − 1]; loc ← n
TopSortSearcher.PostProc(i)
loc ← loc − 1; order[loc] ← i
of these types of edge (i, j), j is postorder processed before i. This property
suggests a straightforward algorithm for topological sort, namely, to order
the vertices in the reverse of the order in which they are postorder processed
by a depth-first search.
The Searcher for this algorithm needs as representation variables a
readable array order[0..n − 1] for storing the listing of vertices in topological
order and a natural number loc for storing the location in order of the last
vertex to be inserted. Only the constructor and the PostProc operation
are nonempty; these are shown in Figure 13.12. The topological sort algo-
rithm is shown in Figure 13.13. If G is implemented as a ListGraph, the
algorithm’s running time is clearly in Θ(n + a), where n is the number of
vertices and a is the number of edges in G. We leave the proof of correctness
as an exercise.
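The same idea, without the Searcher machinery, can be sketched in Python as follows: postorder process each vertex by appending it to a list, then reverse the list.

def topological_sort(adj):
    # adj[i]: list of the successors of vertex i in a directed acyclic graph.
    n = len(adj)
    visited = [False] * n
    order = []

    def dfs(i):
        visited[i] = True
        for j in adj[i]:
            if not visited[j]:
                dfs(j)
        order.append(i)           # postorder processing of i

    for i in range(n):
        if not visited[i]:
            dfs(i)
    order.reverse()
    return order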
Proof: Clearly, for every vertex j ∈ S, there is a path from j to i that stays
entirely within S. Because i is postorder processed last of the vertices in S,
this path stays within G′ . Therefore, let j be a vertex such that there is a
Induction Hypothesis: Let n > 0, and assume that for every m < n, if
there is a path of length m from k to i in G′ , then k is a descendant of i.
Case 2: (j, k) is either a forward edge or a tree edge. Then i and j are both
ancestors of k. Because j is in G′ , it can be postorder processed no later
than i. Therefore, j cannot be a proper ancestor of i. j must therefore be a
descendant of i.
The above theorem suggests the following approach to finding the strongly
connected components of G. We first do a depth-first search on the entire graph
using a postorder VisitCounter post. We then select all of the vertices.
To see how we might find an arbitrary strongly-connected component, sup-
pose some of the components have been found and unselected. We find the
selected vertex i that has maximum post.Num(i). We then find all vertices
j from among the selected vertices such that there is a path from j to i
containing only selected vertices.
We have to be careful at this point because the set of selected vertices
may not be exactly the set of vertices that are postorder processed no later
than i. Specifically, there may be a vertex j that belongs to one of the
components that have already been found, but which is postorder processed
before i. However, Theorem 13.6 tells us that because j belongs to a different
component than i, there is no path from j to i. Therefore, eliminating such
nodes will not interfere with the correct identification of a strongly connected
component. We conclude that the vertices that we find comprise the strongly
connected component containing i.
In order to be able to implement this algorithm, we need to be able to
find all vertices j from which i is reachable via selected vertices. This is
almost the same as the reachability problem covered in Section 13.2, except
that the edges are now directed, and we must follow the edges in the wrong
direction. It is not hard to see that we can use depth-first search to find all
vertices reachable from a given vertex i in a directed graph. In order to be
able to use this algorithm to find all vertices j from which i is reachable, we
must reverse the direction of the edges.
Because DfsAll processes all of the edges in the graph, we can use it
to build a new graph in which all of the edges have been reversed. In fact,
we can use the same depth-first search to record the order of the postorder
processing of the vertices. We use three representation variables:
RevSearcher(n)
reverse ← new ListMultigraph(n)
order ← new Array[0..n − 1]; loc ← n
RevSearcher.TreePreProc(e)
reverse.Put(e.Dest(), e.Source(), e.Data())
RevSearcher.OtherEdgeProc(e)
reverse.Put(e.Dest(), e.Source(), e.Data())
RevSearcher.PostProc(i)
loc ← loc − 1; order[loc] ← i
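Putting the pieces together, the strongly connected components can be found with two depth-first searches, as in the Python sketch below: the first pass records a postorder and builds the reversed graph, and the second pass searches the reversed graph, starting each time from the remaining vertex with the largest postorder number. This is only a sketch of the strategy developed above, not the book's implementation.

def strongly_connected_components(adj):
    # adj[i]: list of the successors of vertex i in a directed graph.
    n = len(adj)
    reverse = [[] for _ in range(n)]
    order = []
    visited = [False] * n

    def dfs1(i):
        visited[i] = True
        for j in adj[i]:
            reverse[j].append(i)          # record the reversed edge (j, i)
            if not visited[j]:
                dfs1(j)
        order.append(i)                   # postorder processing of i

    for i in range(n):
        if not visited[i]:
            dfs1(i)

    component = [None] * n

    def dfs2(i, c):
        component[i] = c
        for j in reverse[i]:
            if component[j] is None:
                dfs2(j, c)

    count = 0
    for i in reversed(order):             # largest postorder number first
        if component[i] is None:
            dfs2(i, count)
            count += 1
    return component                      # component[i]: index of i's component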
13.7 Summary
Many graph problems can be reduced to depth-first search. In perform-
ing the reduction, we focus on a depth-first spanning tree or a depth-first
spanning forest. Because a rooted tree is more amenable to the top-down
approach than is a graph, algorithmic design is made easier. Furthermore,
depth-first spanning trees have structural properties that are often useful in
designing graph algorithms.
The implementation of a reduction to depth-first search consists mainly
of defining an implementation of the Searcher ADT. This data structure
defines what processing will occur at the various stages of the traversal of
the depth-first spanning tree. Proofs of correctness can then focus on the
traversal, utilizing induction as necessary.
13.8 Exercises
Exercise 13.1 Analyze the worst-case running time of the algorithm Pre-
PostTraverse, shown in Figure 13.2, assuming the tree T is implemented
as a MatrixGraph.
SccSearcher(n)
components ← new Array[0..n − 1]; count ← 0
SccSearcher.PreProc(i)
components[i] ← count
SccSearcher.NextComp()
count ← count + 1
Exercise 13.2 Prove that DfsTopSort, shown in Figures 13.12 and 13.13,
meets its specification.
For example, Figure 14.1 shows a flow network whose source is 0, whose
sink is 5, and whose edges all have capacity 1. Intuitively, the capacities
represent the maximum flow that the associated edges can support. We are
interested in finding the maximum total flow from u to v that the network
can support.
The above definition is more general than what is typical. The standard
definition prohibits incoming edges to the source and outgoing edges from
the sink. However, this more general definition is useful for the development
of our algorithms.
In order to define formally a network flow, we need some additional
notation. For a vertex x in a directed graph, let
[Figure 14.1: a flow network with source 0, sink 5, internal vertices 1 through 4, and every edge capacity equal to 1.]
Thus, the flow on each edge is no more than that edge’s capacity, and the
total flow into a vertex other than the source or the sink is the same as the
total flow out of that vertex. An example of a flow on the network shown
in Figure 14.1 would have a flow of 1 on every edge except (4, 1); this edge
would have a flow of 0.
We leave it as an exercise to show that for any flow F of a flow network
(G, u, v, C),
Σ_{e∈u→} F (e) − Σ_{e∈u←} F (e) = Σ_{e∈v←} F (e) − Σ_{e∈v→} F (e).          (14.1)
the flow described above for the network in Figure 14.1 has a value of 2.
Given a flow network, the network flow problem is to find a network flow
with maximum value. Clearly, 2 is the maximum value of any flow for the
network in Figure 14.1.
In the next two sections, we will examine algorithms for the network
flow problem. In the remainder of the chapter, we will consider the bipartite
matching problem, and show how to reduce it to network flow.
is no such edge, we add it to the graph. When we combine the two flows,
we allow flows in opposite directions to cancel each other; i.e., if edge (x, y)
has flow k and edge (y, x) has flow k ′ ≤ k, we set the flow on (x, y) to k − k ′
and the flow on (y, x) to 0. Note that because any edge added to the graph
by the reduction will have a capacity of m, and the initial flow will be m
in the opposite direction, the combination of the two flows will result in no
flow on any edge that was added to the graph. We can therefore remove
these edges from the resulting flow.
Let us now formalize the construction outlined above. Let (G, u, v, C)
be a flow network, and let P be the set of edges in some augmenting path.
Let m be the minimum capacity of any edge in P . We define the residual
network of (G, u, v, C) with respect to P to be the flow network (G′ , u, v, C ′ ),
where G′ and C ′ are defined as follows:
• G′ is constructed from G by removing any edges in P with capacity
m and by adding edges (y, x) such that (x, y) ∈ P and (y, x) is not an
edge in G.
Thus, Figure 14.2(a) shows the residual network for the flow network in
Figure 14.1 with respect to the augmenting path ⟨0, 2, 4, 1, 3, 5⟩. This graph
has an augmenting path: ⟨0, 1, 4, 5⟩. The residual network with respect to
this augmenting path is shown in Figure 14.2(b). There is no augmenting
path in this graph. If we combine the flows obtained by assigning a flow of
1 to each edge in the respective paths, the flows on the edges (4, 1) in the
original graph and (1, 4) in the graph in Figure 14.2(a) cancel each other
out. The resulting flow therefore has a flow of 1 on each edge except (4, 1)
in the original network. This flow has a value of 2, which is maximum.
We now need to prove the correctness of this reduction. We begin by
showing the following lemma.
Lemma 14.1 Let (G, u, v, C) be a flow network, and let P be the set of
edges on some augmenting path. Let F1 be the flow obtained by adding a
flow of m to each edge in P , where m is the minimum capacity of any edge
in P . Let F2 be a maximum flow on the residual graph of (G, u, v, C) with
[Figure 14.2: (a) the residual network of the network in Figure 14.1 with respect to the augmenting path ⟨0, 2, 4, 1, 3, 5⟩; (b) the residual network of (a) with respect to the augmenting path ⟨0, 1, 4, 5⟩.]
Proof: We must first show that the combination of the two flows does not
give a flow where there is no edge in G. This can only happen if there is a
flow in F2 on an edge (x, y) that is not in G. Then (y, x) must be an edge
in P . The capacity of (x, y) in the residual graph is therefore m. Because
the flow on (y, x) in F1 is m, the combination of F1 and F2 cannot give a
positive flow on (x, y).
We will now show that in the combination of F1 with F2 , the flow on
each edge (x, y) is no more than C((x, y)). The only way this can happen
is if there is positive flow on (x, y) in F2 . We consider three cases.
Case 1: (x, y) ∈ P . Then C ′ ((x, y)) = C((x, y)) − m. The sum of the two
flows on (x, y) is therefore at most C((x, y)).
Case 3: (x, y) ∉ P and (y, x) ∉ P . Then C ′ ((x, y)) = C((x, y)). In the
Using the above Lemma, we can prove the theorem below. Combined
with Lemma 14.1, this theorem ensures that the reduction yields a maximum
flow for the given network.
Proof: Let F1 be a maximum flow for (G, u, v, C). Then F1 has value k.
To simplify our discussion, let us interpret F1 as a function F1 : V × V → Z,
where V is the set of vertices in G, such that F1 (x, y) gives the flow over
(x, y) minus the flow over (y, x); if either of these edges does not exist, we
use 0 for this edge’s flow.
Let us assign flows to the edges of G′ as follows:
On the other hand, the algorithm could have achieved the same flow with
two augmenting paths: ⟨0, 1, 3⟩ and ⟨0, 2, 3⟩.
In the next section, we will consider how to make good augmenting path
choices. For now, we note that in the worst case, the loop in NetworkFlow
can iterate M times, where M is the value of the maximum flow. Assuming
the initialization and the body of the loop are implemented to run in Θ(n+a)
time, where n is the number of vertices and a is the number of edges in G,
the algorithm runs in Θ(M (n + a)) time in the worst case. If we assume
that all vertices are reachable from the source, then a ≥ n − 1, and we can
simplify the running time to Θ(M a).
Before we move on to a discussion on finding augmenting paths, we note
one additional property of the Ford-Fulkerson algorithm. As long as each
augmenting path found is simple — and there is no reason a path-finding
algorithm would find a path containing a cycle — the path will not contain
any edges to the source or any edges from the sink. The obvious consequence
is that if the graph contains edges into the source or out of the sink, they
will not be used. A less obvious consequence is that after such edges are
introduced into the residual graph, they will not be used. Thus, once a flow
is added to an edge from the source or to the sink, the flow on that edge
will never be decreased.
Knowing that no vertex ever gets any closer to the source over the course
of the Edmonds-Karp algorithm, we will now prove a lemma that shows when
a vertex must get farther away from the source. If a vertex is farther than
n − 1 edges from the source, where n is the number of vertices, then it must
be unreachable. By Lemma 14.3, once a vertex becomes unreachable, it will
never become reachable. This next lemma will therefore enable us to bound
the number of iterations of the Edmonds-Karp algorithm.
Lemma 14.4 Suppose that when some vertex x is at a distance d from the
source u, the Edmonds-Karp algorithm removes an edge (x, y). Suppose that
later this edge is added again. Then after this edge is added, the distance
from u to x is at least d + 2.
If the initialization and the body of the loop are implemented to run in
Θ(n + a) time, we can conclude that the algorithm runs in O(na(n + a))
time in the worst case. Furthermore, the analysis of the last section still
applies, so that the running time is in O(min(M, na)(n + a)), where M is
the value of the maximum flow. If we assume that every vertex is reachable
from the source, we can simplify this to O(min(Ma, na²)).
Figure 14.5 A bipartite graph with partitions {0, 1, 2, 3} and {4, 5, 6, 7, 8}
{3, 7}, respectively, and they share a common vertex. Hence, any matching
must exclude either 1 or 3. Therefore, there is no matching of size larger
than 3.
We will now show how to reduce bipartite matching to network flow.
Given a bipartite graph G, we construct an instance of network flow as
follows. For simplicity, we will assume that the vertices of the bipartite
graph have already been partitioned into the sets V1 and V2 (see Exercise
13.13). We first direct all of the edges from V1 to V2 . We then add a new
source vertex u and edges from u to each vertex in V1 . Next, we add a new
sink vertex v and edges from each vertex in V2 to v. Finally, we assign a
capacity of 1 to each edge. See Figure 14.6 for the result of applying this
reduction to the graph in Figure 14.5.
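To make the construction concrete, the following Python sketch builds the capacity
function of the constructed network from a bipartite graph; the names
build_flow_network, source, and sink are illustrative assumptions, not identifiers
from the text.

def build_flow_network(v1, v2, edges):
    # Add a new source u and a new sink v; every capacity is 1.
    source, sink = "u", "v"
    capacity = {}
    for x in v1:
        capacity[(source, x)] = 1      # edge from the source to each vertex of V1
    for (x, y) in edges:
        capacity[(x, y)] = 1           # each edge of G, directed from V1 to V2
    for y in v2:
        capacity[(y, sink)] = 1        # edge from each vertex of V2 to the sink
    return source, sink, capacity

# A small example with V1 = {0, 1} and V2 = {2, 3}:
src, snk, cap = build_flow_network({0, 1}, {2, 3}, [(0, 2), (0, 3), (1, 2)])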
Consider any matching in the given bipartite graph. We can construct
a flow in the constructed network by adding a flow of 1 to each edge in the
matching, as well as to each edge leading to a matched vertex in V1 and
to each edge leading from a matched vertex in V2 . Clearly, any unmatched
vertex from the bipartite graph will have a flow of 0 on all of its incoming
and outgoing edges. Furthermore, each matched vertex in V1 will have a
flow of 1 on its incoming edge and a flow of 1 on the single outgoing edge
in the matching. Likewise, each matched vertex in V2 will have a flow of 1
on the single incoming edge in the matching and a flow of 1 on its outgoing
edge. Thus, we have constructed a flow whose value is the number of edges
in the matching.
Conversely, consider any flow on the constructed network. Because any
vertex in V1 can have an incoming flow of at most 1, at most one of its
outgoing edges will contain a positive flow. Likewise, because any vertex in
V2 can have an outgoing flow of at most 1, at most one of its incoming edges
will contain a positive flow. The edges from V1 to V2 containing positive flow
Figure 14.6 The flow network constructed from the bipartite graph shown
in Figure 14.5, with new source vertex 9 and new sink vertex 10
flow algorithms to operate without the source and/or the sink explicitly
represented.
We also note that as flow is added, the edges containing the flow —
which are the edges of a matching — have their direction reversed. Rather
than explicitly reversing the direction of the edges, we could keep track of
which edges have been included in the matching in some other way. For
example, we could use an array matching[0..n − 1] such that matching[i]
gives the vertex to which i is matched, or is −1 if i is unmatched. Because
a matching has at most one edge incident on any vertex, this may end up
being a more efficient way of keeping track of the vertices adjacent (in the
flow network) to vertices in V2 . The maximum-sized matching could also be
returned via this array.
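To illustrate this bookkeeping, the following Python sketch finds a maximum-sized
matching by searching for an augmenting path from each vertex of V1 exactly once
and recording the result in a matching array; it is a simplified illustration, not
the MatchingGraph-based implementation developed below, and the names adj and
try_augment are assumptions.

def maximum_bipartite_matching(adj, n2):
    # adj[i] lists the V2-vertices adjacent to V1-vertex i; n2 = |V2|.
    matching = [-1] * n2                  # matching[j] = V1-vertex matched to j, or -1

    def try_augment(i, visited):
        # Depth-first search for an augmenting path starting at V1-vertex i.
        for j in adj[i]:
            if j in visited:
                continue
            visited.add(j)
            # Use j if it is unmatched, or if its current partner can be re-matched.
            if matching[j] == -1 or try_augment(matching[j], visited):
                matching[j] = i
                return True
        return False

    size = 0
    for i in range(len(adj)):             # one search from each vertex of V1
        if try_augment(i, set()):
            size += 1
    return size, matching

size, matching = maximum_bipartite_matching([[0, 1], [1], [0, 2], [3]], 5)
assert size == 4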
As we observed at the end of Section 14.1, once flow is added to any
edge from the source or to any edge to the sink, that flow is never removed.
To put this in terms of the matching algorithm, once a vertex is matched, it
remains matched, although the vertex to which it is matched may change.
Furthermore, we claim that if we ever attempt to add a vertex w ∈ V1 to
the current matching M and are unable to do so (i.e., there is no path from
w to an unmatched vertex in V2 ), then we will never be able to add w to
the matching.
To see why this is true, notice that if there were a maximum-sized match-
ing containing all currently matched vertices and w, then there is a matching
M ′ containing no other vertices from V1 . If we delete all vertices from V1
that are unmatched in M ′ , then M ′ is clearly a maximum-sized matching
for the resulting graph. The Ford-Fulkerson algorithm must therefore be
able to find a path that yields M ′ from M .
As a result, we only need to do a single search from each vertex in V1 .
The following theorem summarizes this property.
Theorem 14.6 Let G be a bipartite graph, and let S be the set of vertices
in some matching on G. Suppose some maximum-sized matching includes
all of the vertices in S. Let i be some vertex not in S. Then there is a
maximum-sized matching including S ∪ {i} if there is a matching including
S ∪ {i}.
In order to implement the above optimizations, it is helpful to define
a data structure called MatchingGraph, which implements the Graph
ADT. Its purpose is to represent a particular directed graph G′ derived from
a given bipartite graph G and a matching on G. Suppose G has vertices
0, 1, . . . , n − 1. G′ then contains the vertices 0, 1, . . . , n. If {i, j} is an edge in
Figure 14.7 The MatchingGraph for the bipartite graph shown in Figure
14.5 with matching {{0, 5}, {3, 7}}
G and j is unmatched, then G′ will contain the edge (i, n). Every other edge
in G′ will represent two edges in G — an edge not in the matching followed
by an edge in the matching. Thus, for 0 ≤ i < n, 0 ≤ j < n, and i ≠ j,
G′ contains the edge (i, j) iff there is a vertex k in G such that {k, j} is in
the matching and {i, k} is an edge in G. For example, Figure 14.7 shows
the MatchingGraph for the bipartite graph of Figure 14.5 with matching
{{0, 5}, {3, 7}}.
Suppose the two partitions of G are V1 and V2 . Then augmenting paths
in the flow network constructed by the reduction correspond to paths from
unmatched vertices in V1 to n in G′ . In order to find an augmenting path in
the flow network, we need to find a path to n in G′ from an unmatched vertex
in G. For example, consider the MatchingGraph shown in Figure 14.7.
We can add vertex 2 to the matching by finding a path from 2 to 9 in G′ .
Taking the path ⟨2, 0, 9⟩ could yield the augmenting path ⟨2, 5, 0, 4⟩, which
produces the matching shown in Figure 14.5. This path could also yield the
augmenting path ⟨2, 5, 0, 6⟩ because 0 is adjacent to two unmatched vertices,
4 and 6. Alternatively, taking the path ⟨2, 9⟩ would yield the augmenting
path ⟨2, 8⟩.
Note that G′ actually represents two flow networks. If (i, j) is an edge in
G′ and j ≠ n, then i and j must both be in the same partition. Therefore,
the subgraph induced by V1 ∪ {n} represents the flow network in which edges
MatchingGraph.Size()
  return bipartite.Size() + 1

MatchingGraph.AllFrom(i)
  n ← bipartite.Size(); L ← new ConsList()
  if i < n
    foundUnmatched ← false; adj ← bipartite.AllFrom(i)
    while not adj.IsEmpty()
      e ← adj.Head(); adj ← adj.Tail()
      k ← e.Dest(); j ← matching[k]
      if j = −1 and not foundUnmatched
        L ← new ConsList(new Edge(i, n, k), L)
        foundUnmatched ← true
      else if j ≠ −1
        L ← new ConsList(new Edge(i, j, k), L)
  return L

PathSearcher(n)
  incoming ← new Array[0..n − 1]

PathSearcher.TreePreProc(e)
  incoming[e.Dest()] ← e
We arrange the edges so that when we try to add vertex 2i for 0 ≤ i <
k, we first encounter the edge {2i, 2k + i}. Because 2k + i is not in the
matching, it is added. It will then be impossible to add vertex 2i + 1,
but each node 2k + j, for 0 ≤ j < i, will be reached in the search for an
augmenting path. (For example, consider the search when trying to add 5
Figure 14.11 A bipartite graph with partitions {0, . . . , 7} and {8, . . . , 15}
to the matching {{0, 8}, {2, 9}, {4, 10}} in Figure 14.11.) Constructing this
matching therefore uses Ω(k²) time.
We can now generalize the above construction to arbitrary n by adding
or removing a few vertices adjacent to 2k − 1. Furthermore, we can add
edges {2i, 2k + j} for 0 ≤ i < k and 0 ≤ j < i − 1 without increasing the
size of the maximum-sized matching. However, these additional edges must
all be traversed when we try to add vertex 2i + 1 to the matching. This
construction therefore forces the algorithm to use Ω(na) time. Furthermore,
the number of edges added can be as many as
∑_{i=0}^{k−1} (i − 1) = k(k − 1)/2 − k = (k² − 3k)/2 = (n² − 12n)/32.
Including the n − 1 original edges, the total number of edges a is in the range
n − 1 ≤ a < n(n + 20)/32.
The above construction is more general than we really need, but its
generality shows that some simple modifications to the algorithm won’t im-
prove its asymptotic running time. For example, the graph is connected,
so processing connected components separately won’t help. Also, the two
partitions are the same size, so processing the smaller (or larger) partition
first won’t help either. Furthermore, using breadth-first search won’t help
because it will process just as many edges when no augmenting path exists.
On the other hand, this algorithm is not the most efficient one known for
this problem. In the exercises, we explore how it might be improved.
Although the optimizations we made over a direct reduction to network
flow did not improve the asymptotic running time of the algorithm, the
resulting algorithm may have other advantages. For example, suppose we are
trying to match jobs with job applicants. Each applicant may be qualified
for several jobs. We wish to fill as many jobs as possible, but still assign
jobs so that priority is given to those who applied earlier. If we process the
applicants in the order in which they applied, we will obey this priority.
14.4 Summary
The network flow problem is a general combinatorial optimization problem
to which many other problems can be reduced. Although the Ford-Fulkerson
algorithm can behave poorly when the maximum flow is large in comparison
to the size of the graph, its flexibility makes it useful for those cases in which
the maximum flow is known to be small. For cases in which the maximum
flow may be large, the Edmonds-Karp algorithm, which is simply the Ford-
Fulkerson algorithm using breadth-first search to find augmenting paths,
performs adequately.
The bipartite matching problem is an example of a problem which occurs
quite often in practice and which can be reduced to network flow to yield a
reasonably efficient algorithm. Furthermore, a careful study of the reduction
yields insight into the problem that leads to a more general algorithm.
14.5 Exercises
Exercise 14.1 Prove Equation (14.1) on page 446. [Hint: Show by induc-
tion that the net flow out of any set of vertices including the source but not
the sink is equal to the left-hand side.]
of edges in F and R together. For the purposes of your analysis, you may
assume that F and R are implemented as ListGraphs, and that the Edges
in P form a simple path in R.
Exercise 14.9 Suppose we modify the network flow problem so that the
input includes an array cap[0..n − 1] of integers such that for each vertex i,
cap[i] gives an upper bound on the flow we allow to go to and from vertex
i. Show how to reduce this problem to the ordinary network flow problem.
Your reduction must run in O(n + a) time, where n is the number of vertices
and a is the number of edges in the graph.
* Exercise 14.12 We are given two arrays of integers, R[1..m] and C[1..n],
such that
∑_{i=1}^{m} R[i] = ∑_{i=1}^{n} C[i] = k.
15.1 Convolutions
Let a = ⟨a_0, . . . , a_{m−1}⟩ and b = ⟨b_0, . . . , b_{n−1}⟩ be two vectors. We define the
convolution of a and b as the vector c = ⟨c_0, . . . , c_{m+n−2}⟩, where
c_j = ∑_{i=max(0, j−n+1)}^{min(j, m−1)} a_i b_{j−i}.
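As a quick illustration of this definition, the following Python sketch computes a
convolution directly from the formula; this direct method uses Θ(mn) multiplications
and is not the FFT-based approach developed in this chapter.

def convolution(a, b):
    m, n = len(a), len(b)
    c = [0] * (m + n - 1)
    for j in range(m + n - 1):
        # Sum a_i * b_{j-i} over the index range given in the definition.
        for i in range(max(0, j - n + 1), min(j, m - 1) + 1):
            c[j] += a[i] * b[j - i]
    return c

# Coefficients of (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2:
assert convolution([1, 2], [3, 4]) == [3, 10, 8]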
10.4 shows that this can be done in O(n^{1+ε}) time for any ε ∈ R^{>0} (though
in fact the hidden constant becomes quite large as ε approaches 0). We wish
to improve on these algorithms.
It is a well-known fact that a polynomial of degree n − 1 is uniquely
determined by its values at any n distinct points. Therefore, one way to
multiply two polynomials p(x) and q(x) whose product has degree n − 1 is
as follows:
1. Evaluate p(xi ) and q(xi ) for n distinct values xi , 0 ≤ i < n.
vA⁻¹ = pAA⁻¹ = p.
• ω^n = 1; and
• for 1 ≤ j < n, ∑_{i=0}^{n−1} ω^{ij} = 0.
We will show how to find such values in C. First, however, let us consider
why having a principal nth root of unity might be helpful. Given a principal
nth root of unity ω, let A be the n × n matrix such that A_{ij} = ω^{ij}. Given a
1 × n vector p, the product pA is said to be the discrete Fourier transform of
p with respect to ω. Note that if p is the coefficient vector for a polynomial
p(x), then pA gives the values of p(ω^j) for 0 ≤ j < n.
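The following Python sketch computes a DFT over C directly from this definition,
using Θ(n²) arithmetic operations; it is offered only as a reference point for the
divide-and-conquer algorithm developed next.

import cmath

def naive_dft(p):
    n = len(p)
    omega = cmath.exp(2j * cmath.pi / n)    # a principal nth root of unity in C
    # Entry j of the transform is p evaluated at omega^j.
    return [sum(p[i] * omega ** (i * j) for i in range(n)) for j in range(n)]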
In what follows, we will develop a divide-and-conquer algorithm for com-
puting a DFT. To simplify matters, let’s assume that n is a power of 2. The
following theorem shows an important property of principal nth roots of
unity when n is a power of 2. We will use this property in designing our
divide-and-conquer algorithm.
ω^{i(2j)} = ω^{nj} ω^{(2i−n)j} = ω^{(2i−n)j}
Note that each sum on the right-hand side is the jth component of the
DFT with respect to ω 2 of a 1 × n/2 vector. Specifically, let d′ and d′′ be the
DFTs of p′ and p′′ , respectively, with respect to ω 2 , and let d be the DFT
of p with respect to ω. Then for 0 ≤ j < n/2, we have
Furthermore,
d_{j+n/2} = ∑_{i=0}^{n/2−1} p′′_i ω^{2i(j+n/2)} + ω^{j+n/2} ∑_{i=0}^{n/2−1} p′_i ω^{2i(j+n/2)}
          = ∑_{i=0}^{n/2−1} p′′_i ω^{2ij} + ω^{j+n/2} ∑_{i=0}^{n/2−1} p′_i ω^{2ij}
          = d′′_j + ω^{j+n/2} d′_j.     (15.2)
Proof:
(ω²)^{n/4} = ω^{n/2} = −1.
Using the fact that ω^{n/2} = −1, we can now rewrite (15.2) for 0 ≤ j < n/2
as
d_{j+n/2} = d′′_j − ω^j d′_j.     (15.3)
We therefore have the divide-and-conquer algorithm, known as the Fast
Fourier Transform, shown in Figure 15.1. Note that we use the type Com-
plex to represent a complex number.
Because Fft should only be called with a vector whose size n is a power
of 2, n is not a good measure of the size of the problem instance for the
purpose of analyzing the algorithm. Instead, we will use k = lg n. Assuming
each arithmetic operation on complex numbers can be performed in Θ(1)
time, it is easily seen that the running time excluding the recursive calls is
in Θ(2^k). The worst-case running time is therefore given by the recurrence
f(k) ∈ 2f(k − 1) + Θ(2^k).
d′′ ← Fft(p′′, ω²)
υ ← 1
// Invariant: d[0..i − 1] and d[mid..mid + i − 1] contain the correct
//   values for the DFT of p, and υ = ω^i.
for i ← 0 to mid − 1
  d[i] ← d′′[i] + υ(d′[i])
  d[i + mid] ← d′′[i] − υ(d′[i])
  υ ← υω
return d
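For comparison with the pseudocode, the following Python sketch mirrors the same
divide-and-conquer scheme using floating-point complex arithmetic; it is a simplified
illustration, not a transcription of Figure 15.1.

import cmath

def fft(p, omega=None):
    n = len(p)                               # n must be a power of 2
    if omega is None:
        omega = cmath.exp(2j * cmath.pi / n)
    if n == 1:
        return list(p)
    d2 = fft(p[0::2], omega * omega)         # DFT of the even-indexed entries (p'')
    d1 = fft(p[1::2], omega * omega)         # DFT of the odd-indexed entries (p')
    d, upsilon = [0] * n, 1
    for i in range(n // 2):
        d[i] = d2[i] + upsilon * d1[i]               # d_j = d''_j + omega^j d'_j
        d[i + n // 2] = d2[i] - upsilon * d1[i]      # equation (15.3)
        upsilon *= omega
    return d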
Theorem 15.3 Let A be the n × n matrix such that for 0 ≤ i < n and
0 ≤ j < n, A_{ij} = ω^{ij}, where ω is a principal nth root of unity. Then A⁻¹ is
the matrix B, where B_{ij} = ω^{−ij}/n.
Case 1: i = j. Then
C_{ij} = (1/n) ∑_{k=0}^{n−1} ω^{k(i−j)}
       = (1/n) ∑_{k=0}^{n−1} ω^0
       = 1.
Note that the matrix A⁻¹ can be written A′/n, where A′_{ij} = ω^{−ij}. The
following theorem shows that ω⁻¹ is also a principal nth root of unity, so
that multiplication by A′ is also a DFT. As a result, we can use Fft to
compute the inverse transform.
Proof: From Theorem 15.2, we need only to show that ω^{−n/2} = −1. Because
ω^n = 1, we have
ω^{−n/2} = ω^{n−n/2} = ω^{n/2} = −1.
Base: n = 2. Then
(cos(2π/n) + i sin(2π/n))^{n/2} = cos π + i sin π = −1 + 0i = −1.
(cos(2π/k) + i sin(2π/k))^{k/2} = −1.
Induction Step:
(cos(2π/n) + i sin(2π/n))^{n/2} = ((cos(2π/n) + i sin(2π/n))²)^{n/4}
                                = (cos²(2π/n) − sin²(2π/n) + 2i cos(2π/n) sin(2π/n))^{n/4}.
Recall that cos² x − sin² x = cos 2x and 2(cos x sin x) = sin 2x.
We therefore have
(cos(2π/n) + i sin(2π/n))^{n/2} = (cos(4π/n) + i sin(4π/n))^{n/4}
                                = (cos(2π/(n/2)) + i sin(2π/(n/2)))^{(n/2)/2}
                                = −1
∑_{l=0}^{n−1} ∑_{i=0}^{n−1} ∑_{k=0}^{n−1} p_i q_k ω^{(i+k)l} ω^{−lj}/n = (1/n) ∑_{l=0}^{n−1} ∑_{i=0}^{n−1} ∑_{k=0}^{n−1} p_i q_k ω^{(i+k−j)l}
= (1/n) ∑_{i=0}^{n−1} ∑_{k=0}^{n−1} p_i q_k ∑_{l=0}^{n−1} ω^{(i+k−j)l}.
calls to Fft is in Θ(n). If k = lg m, the running time for each call to Fft is
in Θ(k·2^k) = Θ(n lg n). The overall running time is therefore in Θ(n lg n). It
is easily seen that this algorithm can be used to multiply two polynomials
over C in Θ(n lg n) time, where n is the degree of the product.
Throughout this discussion, we have been assuming that we can store ar-
bitrary complex numbers and perform arithmetic operations on them in Θ(1)
time. These assumptions are rather dubious. However, for most scientific
and engineering applications, it is sufficient to use floating-point approxima-
tions. The Convolution algorithm is therefore very useful in practice.
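A minimal Python sketch of this application is shown below: it pads the coefficient
vectors to a power-of-2 length, transforms, multiplies componentwise, and applies
the inverse transform by Theorem 15.3. It reuses the fft sketch given earlier in this
section and rounds the floating-point results back to integers.

import cmath

def poly_multiply(p, q):
    n = 1
    while n < len(p) + len(q) - 1:
        n *= 2                               # power of 2 at least the product's length
    omega = cmath.exp(2j * cmath.pi / n)
    dp = fft(p + [0] * (n - len(p)), omega)
    dq = fft(q + [0] * (n - len(q)), omega)
    # Inverse DFT: transform with omega^{-1}, then divide by n (Theorem 15.3).
    prod = fft([a * b for a, b in zip(dp, dq)], 1 / omega)
    return [round((c / n).real) for c in prod[:len(p) + len(q) - 1]]

assert poly_multiply([1, 2], [3, 4]) == [3, 10, 8]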
Example 15.6 ⟨Z, +⟩, the set of integers with addition, is an abelian group.
Example 15.7 ⟨N, +⟩, the set of natural numbers with addition, is not a
group because only 0 has an inverse.
Example 15.8 For a positive integer m, let Z_m denote the set of natural
numbers strictly less than m, and let + denote addition mod m. It is not
hard to see that ⟨Z_m, +⟩ is an abelian group, with 0 being the identity and
m − i being the inverse of each nonzero i.
• Associativity: (x · y) · z = x · (y · z).
• Distributivity: x · (y + z) = x · y + x · z and (x + y) · z = x · z + y · z.
Example 15.10 It is not hard to see that ⟨C, +, ·⟩, where + and · denote
ordinary addition and multiplication, respectively, is a commutative ring
with unit element 1.
Example 15.12 Let S = {0, 2, 4, 6}, and let + and · denote addition and
multiplication, respectively, mod 8. Then it is not hard to see that ⟨S, +, ·⟩
is a commutative ring. However, it does not have a unit element, because
0 · 2 = 0, 2 · 2 = 4, 4 · 2 = 0, and 6 · 2 = 4, so that no element of S acts as
an identity on 2.
Example 15.13 Let S be the set of 2 × 2 matrices over R, and let + and
· denote matrix addition and multiplication, respectively. Then it is not
hard to show that ⟨S, +, ·⟩ is a ring, and that the identity matrix is a unit
element. However, the ring is not commutative; for example,
[1 1] [1 0]   [2 1]
[0 1] [1 1] = [1 1],
but
[1 0] [1 1]   [1 1]
[1 1] [0 1] = [1 2].
In what follows, we will show that the results of the previous section
extend to an arbitrary commutative ring R = ⟨S, +, ·⟩ with unit element
1. For convenience, we will typically abbreviate x · y as xy. We will also
abbreviate x + (−y) as x − y.
We first observe that for x ∈ S and n ∈ N, we can define x^n as follows:
x^n = 1 if n = 0, and x^n = x·x^{n−1} otherwise.
Hence, the definition of a principal nth root of unity makes sense for R.
Furthermore, the definition of a discrete Fourier transform also makes sense
over this ring. The following theorem states that some familiar properties
of exponentiation must hold for any ring with unit element; its proof is left
as an exercise.
Theorem 15.14 Let R be any ring with unit element. Then the following
properties hold for any x in R and any m, n ∈ N:
a. x^m x^n = x^{m+n}.
b. (x^m)^n = x^{mn}.
Theorem 15.1 can be shown using only the properties given in the def-
inition of a ring, together with Theorem 15.14. It therefore applies to R.
The derivations of equations (15.1) and (15.2) use the properties of a ring,
together with commutativity, so that they also hold for R. The proof of
Theorem 15.2 applies for arbitrary rings with unit elements, so equation
(15.3) holds for R. The algorithm Fft therefore can be used to compute a
DFT over R, provided ω is a principal nth root of unity for that ring, and
that addition and multiplication on elements of the ring are the + and ·
operations from R.
ω^{n/2} mod m = m − 1, the inverse of 1 in ⟨Z_m, +⟩. One way of satisfying this
constraint is to select m = 2^k + 1 for some positive integer k. Then 2^{2k/n} is
a principal nth root of unity, provided 2k is divisible by n. Because n is a
power of 2, we should require k to be a power of 2 such that 2k ≥ n.
We also need to be able to find the multiplicative inverse of n in this
ring. Because 2^{2k/n} is a principal nth root of unity, 2^{2k} = 1 in this ring.
Therefore, n⁻¹ = 2^{2k}/n. We therefore have the following theorem.
Theorem 15.15 Let k and n be powers of 2 such that 1 ≤ n ≤ 2k, and let
m = 2^k + 1. In the ring ⟨Z_m, +, ·⟩:
Note that if k and n are both powers of 2 such that 1 ≤ n ≤ 2k, both
2^{2k/n} and 2^{2k}/n are also powers of 2. This fact is advantageous because
multiplying a BigNum by a power of 2 can be done very efficiently via the
Shift operation.
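The following short Python check illustrates these claims numerically for small
parameters; the function name check is, of course, only illustrative.

def check(k, n):
    # Assumes k and n are powers of 2 with n <= 2k, and m = 2^k + 1.
    m = 2 ** k + 1
    omega = pow(2, 2 * k // n, m)            # the claimed principal nth root of unity
    assert pow(omega, n, m) == 1
    for j in range(1, n):                    # the summation condition on a principal root
        assert sum(pow(omega, i * j, m) for i in range(n)) % m == 0
    assert (n * (2 ** (2 * k) // n)) % m == 1    # 2^{2k}/n is the inverse of n

check(k=8, n=16)    # for example, m = 257 and omega = 2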
In order to complete the reduction of arbitrary-precision multiplication
to multiplication in a ring hZm , +, ·i, we must select a specific m. Suppose
the natural numbers u and v together have a total of n bits. Then uv will
have at most n bits. We can therefore set k to the smallest power of 2 no
smaller than n, and let m = 2^k + 1. The resulting algorithm is shown in
Figure 15.3 (see Figure 10.6 on page 350 for its specification).
Let us consider how to multiply two k-bit numbers, u and v, mod 2^k + 1,
where k is a power of 2. Suppose we break u and v into b blocks of l bits
each. Let these blocks be u_0, . . . , u_{b−1} and v_0, . . . , v_{b−1}, so that
u = ∑_{i=0}^{b−1} u_i 2^{il}    and    v = ∑_{i=0}^{b−1} v_i 2^{il}.
MultFft(u, v)
  n ← Max(1, u.NumBits() + v.NumBits()); k ← 2^{⌈lg n⌉}
  return ModMult(u, v, k)

ModMult(u, v, k)
Note that the last term in the above sum (i.e., for j = 2b − 1) is 0. We
include it in order to simplify the derivation that follows.
Because k = bl, 2^{bl} = −1 in the ring ⟨Z_m, +, ·⟩, where m = 2^k + 1. We
can therefore write the product uv in this ring as
uv = ∑_{j=0}^{b−1} ∑_{i=0}^{j} u_i v_{j−i} 2^{jl} − ∑_{j=b}^{2b−1} ∑_{i=j−b+1}^{b−1} u_i v_{j−i} 2^{(j−b)l}
   = ∑_{j=0}^{b−1} ∑_{i=0}^{j} u_i v_{j−i} 2^{jl} − ∑_{j=0}^{b−1} ∑_{i=j+1}^{b−1} u_i v_{j−i+b} 2^{jl}
   = ∑_{j=0}^{b−1} 2^{jl} ( ∑_{i=0}^{j} u_i v_{j−i} − ∑_{i=j+1}^{b−1} u_i v_{j−i+b} ).
Theorem 15.16 Let R be a commutative ring with unit element, and suppose
ψ is a principal (2n)th root of unity in R. Let p and q be 1 × n vectors
over R, and let Ψ and Ψ′ be 1 × n vectors such that Ψ_j = ψ^j and Ψ′_j = ψ^{2n−j}
for 0 ≤ j < n. Then the negative wrapped convolution of p and q is given
by
Ψ′ · ((Ψ · p) ⊗ (Ψ · q)),     (15.4)
where · denotes the component-wise product of two vectors over R.
uniquely determines
∑_{i=0}^{j} u_i v_{j−i} − ∑_{i=j+1}^{b−1} u_i v_{j−i+b}.
Because each component of u and v is strictly less than 2^l, the above expression
is strictly less than b·2^{2l} and strictly greater than −b·2^{2l}. We therefore
need
2^{k′} + 1 ≥ 2b·2^{2l}
k′ ≥ lg(b·2^{2l+1} − 1).
ModMultFft(u, v, k)
  if k < 16
    return ToRing(MultiplyAdHoc(u, v), k)
  else
    if (lg k) mod 2 = 0
      b ← 2√k; l ← √k/2
    else
      b ← √(2k); l ← √(k/2)
    uarray ← new Array[0..b − 1]; varray ← new Array[0..b − 1]
    for j ← 0 to b − 1
      uarray[j] ← new BigNum(u.GetBits(jl, l))
      varray[j] ← new BigNum(v.GetBits(jl, l))
    conv ← NegConv(uarray, varray, 4l)
    return Eval(conv, k, l)
for the modular ring; however, because the precondition requires that n is a
power of 2, we don’t need to copy the elements to arrays of such a size. In
order to facilitate multiplication by n−1 , we use the variable lgInv to store
lg(n−1 ). Also, recall that the precondition for ModMult (Figure 15.3)
requires that each argument is at most k bits. However, the discrete Fourier
transforms may contain elements equal to 2^k, which has k + 1 bits. We
must therefore handle this case separately.
The principal nth root of unity used for computing the DFT will be 2^{2k/n}.
For computing the inverse DFT, we therefore must use the multiplicative
inverse of 2^{2k/n}. Because 2^{2k} mod (2^k + 1) = 1, (2^{2k/n})⁻¹ = 2^{2k−2k/n}. For
reasons of efficiency and ease of analysis, we use a boolean to indicate which
of these roots the function ModFft is to use.
The implementation of ModFft is shown in Figure 15.7. It is a fairly
straightforward adaptation of Fft (Figure 15.1) to the ring hZm , +, ·i. We
must be careful, however, when subtracting ω^i d′[i] from d′′[i] in order to
obtain d[i + mid], because ω^i d′[i] may be greater than d′′[i]. In order to
satisfy the precondition of BigNum.Subtract (Figure 4.18 on page 146),
Figure 15.7 The Fast Fourier Transform algorithm over a modular ring
we first subtract ω^i d′[i] from m, then add the result, mod m, to d′′[i]. In
order to compute m, we assume the existence of a constant one, which refers
to a BigNum representing 1.
Let us now turn to the implementation of ToRing, specified in Figure
15.4. A straightforward way of computing x mod m is to divide x by m
using long division, and return the remainder. Fortunately, the form of m
makes this long division easy. Suppose we break m and x into k-bit digits.
Then the representation of m in this radix is 11.
In order to see how each step of the long division can proceed, suppose
x = a·2^k + b, where b < 2^k and a < m. We first approximate the quotient as
a. If a ≤ b, the quotient is, in fact, a, and the remainder is b − a. If a > b,
ToRing(x, k)
  numDig ← ⌈x.NumBits()/k⌉; m ← one.Shift(k).Add(one)
  rem ← x.GetBits(k(numDig − 1), k)
  // Invariant:
  //   rem = x.GetBits((i + 1)k, x.NumBits() − (i + 1)k) mod m
  for i ← numDig − 2 to 0 by −1
    next ← x.GetBits(ik, k)
    if rem.CompareTo(next) > 0
      next ← next.Add(m)
    rem ← next.Subtract(rem)
  return rem
(a − 1)(2^k + 1) = a·2^k + a − 2^k − 1 ≤ a·2^k − 1 ≤ a·2^k + b.
a·2^k + b − (a − 1)(2^k + 1) = b + 2^k − a + 1 = b + m − a.
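The following Python sketch carries out this digit-by-digit reduction using machine
integers; it illustrates the idea behind ToRing rather than giving a BigNum
implementation.

def to_ring(x, k):
    # Reduce x mod m = 2^k + 1, processing x in k-bit digits from the top
    # and using the fact that 2^k is congruent to -1 mod m.
    m = (1 << k) + 1
    mask = (1 << k) - 1
    num_digits = max(1, -(-x.bit_length() // k))     # ceiling of NumBits/k
    rem = x >> (k * (num_digits - 1))
    for i in range(num_digits - 2, -1, -1):
        nxt = (x >> (i * k)) & mask
        if rem > nxt:
            nxt += m
        rem = nxt - rem
    return rem

assert to_ring(12345678, 8) == 12345678 % 257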
Eval(v[0..n − 1], k, l)
  m ← one.Shift(k).Add(one); m′ ← one.Shift(4l).Add(one)
  half ← m′.Shift(−1)
  pos ← new Array[0..nl − 1]; neg ← new Array[0..nl − 1]
  posCarry ← zero; negCarry ← zero
  for j ← 0 to n − 1
    if v[j].CompareTo(half) > 0
      negCarry ← negCarry.Add(m′.Subtract(v[j]))
    else
      posCarry ← posCarry.Add(v[j])
    negBits ← negCarry.GetBits(0, l); negCarry ← negCarry.Shift(−l)
    posBits ← posCarry.GetBits(0, l); posCarry ← posCarry.Shift(−l)
    Copy(negBits[0..l − 1], neg[jl..(j + 1)l − 1])
    Copy(posBits[0..l − 1], pos[jl..(j + 1)l − 1])
  posNum ← posCarry.Shift(nl).Add(new BigNum(pos))
  negNum ← negCarry.Shift(nl).Add(new BigNum(neg))
  return ToRing(posNum.Add(m.Subtract(ToRing(negNum, k))), k)
combine the two arrays into a single BigNum. The algorithm is shown in
Figure 15.9.
To analyze the running time of our multiplication algorithm, we begin by
analyzing ToRing. From the loop invariant, the value of rem never exceeds
2^k; hence, the value of next never exceeds 2^{k+1}. Thus, the body of the loop
clearly runs in Θ(k) time. The number of iterations is ⌈n/k⌉ − 1, where n
is the number of bits in x. The loop therefore runs in Θ(n) time, provided
n > k. Because the initialization runs in Θ(k) time, the entire algorithm
runs in Θ(max(n, k)) time.
In order to analyze ModFft, let us first ignore the computations whose
running times depend on k, namely, the calculation of m and the calls to
ToRing, Add, and Subtract. Thus, if we let n = 2^N, the running time
of the remaining code is in Θ(N·2^N). Specifically, we can conclude that the
total number of iterations of each of the for loops is in Θ(N·2^N).
Now let k = 2^K, and let us analyze the running time of a single iteration
of the second for loop, including the calls to ToRing, Add, and Subtract.
We first observe that for all i, d′[i] ≤ 2^k and d′′[i] ≤ 2^k. Because the number
of bits added by the Shift is at most 2k, the Shift therefore runs in O(2^K)
time. Because the result of the Shift has O(2^K) bits, the call to ToRing
runs in Θ(2^K) time. Likewise, it is easily seen that the remaining operations
run in O(2^K) time as well. A single iteration of the second for loop therefore
runs in Θ(2^K) time. We conclude that ModFft runs in Θ(N·2^{N+K}) time.
It is easily seen that the running time of PosConv, excluding the calls
to ModFft and ModMult, is in Θ(2^{N+K}). Because the first two arguments
to ModMult must have at most 2^K bits, we can describe the running
time of ModMult in terms of K. In particular, let f(K) denote the worst-case
running time of ModMult, assuming it is implemented using ModMultFft.
Because ModMult is called no more than 2^K times, the running
time of PosConv is bounded above by a function in O(N·2^{N+K}) + 2^N f(K).
Likewise, it is easily seen that NegConv has the same asymptotic running
time.
In order to analyze Eval, let l = 2^L. It is easily seen that excluding
the return statement, this function runs in Θ(max(2^K, 2^{N+L})) time.
Furthermore, it is not hard to see that when the return statement is executed,
posNum and negNum each contain at most n(3 + l) bits; hence,
ToRing(negNum, k) runs in Θ(max(2^K, 2^{N+L})) time. Likewise, it is not
hard to see that the entire return statement runs in Θ(max(2^K, 2^{N+L}))
time.
We can now obtain an asymptotic recurrence for f(K), the worst-case
running time of ModMult. In what follows, we assume K ≥ 4. We first
observe that if K is even, then b = 2^{(K/2)+1} and l = 2^{(K/2)−1}. Likewise, if
K is odd, b = 2^{(K+1)/2} and l = 2^{(K−1)/2}. We can combine these two cases
by saying that b = 2^{⌈(K+1)/2⌉} and l = 2^{⌊(K−1)/2⌋}. The running time of the
for loop is therefore in
Θ(bl) = Θ(2^{⌈(K+1)/2⌉+⌊(K−1)/2⌋}) = Θ(2^K).
K ≥ 1. Then
g(K) ∈ (O((K + 3)2^{K+3}) + 2^{⌈(K+4)/2⌉} f(⌊(K + 2)/2⌋ + 2)) / 2^K
     = O(K) + 4f(⌊K/2⌋ + 3) / 2^{⌊K/2⌋}
     = O(K) + 4g(⌊K/2⌋).     (15.6)
Applying Theorem 3.32, we have g(K) ∈ O(K²). Thus, for K ≥ 4,
f(K) = 2^{K−4} g(K − 3) ∈ 2^{K−4} O(K²) ⊆ O(2^K K²).
The running time of ModMult is therefore in O(2^K K²). We can therefore
conclude that the running time of MultFft is in
O(2^{⌈lg n⌉} ⌈lg n⌉²) = O(n lg² n),
where n is the number of bits in the product.
The above analysis is almost sufficient to show that the running time
of MultFft is in Θ(n lg² n). Specifically, we only need to show that there
are inputs for each sufficiently large n such that the call to ModMult is
made on each iteration of the first loop in PosConv. Unfortunately, such a
proof would be quite difficult. On the other hand, it seems unlikely that our
upper bound on this algorithm’s worst-case running time can be improved.
g(K) = f(K − c + 2d) / 2^K
     ∈ (O((K − c + 2d)2^{K−c+2d}) + 2^{⌈(K+2d)/2⌉} f(⌊(K − 2c + 2d)/2⌋ + d)) / 2^K
     = O(K) + 2^d f(⌊K/2⌋ − c + 2d) / 2^{⌊K/2⌋}
     = O(K) + 2^d g(⌊K/2⌋).
ModMultSS(u, v, k)
  if k < 8
    return ToRing(MultiplyAdHoc(u, v), k)
  else
    if (lg k) mod 2 = 0
      b ← √k; l ← √k
    else
      b ← √(2k); l ← √(k/2)
    uarray ← new Array[0..b − 1]; varray ← new Array[0..b − 1]
    uarray′ ← new Array[0..b − 1]; varray′ ← new Array[0..b − 1]
    for j ← 0 to b − 1
      uarray[j] ← new BigNum(u.GetBits(jl, l))
      varray[j] ← new BigNum(v.GetBits(jl, l))
      uarray′[j] ← new BigNum(u.GetBits(jl, lg b + 1))
      varray′[j] ← new BigNum(v.GetBits(jl, lg b + 1))
    conv ← NegConv(uarray, varray, 2l)
    conv′ ← NegConvSS(uarray′, varray′, lg b + 1)
    return EvalSS(conv, conv′, k, l)
and
conv′[j] = w_j mod (2b)     (15.9)
for 0 ≤ j < b.
Because 2b is a power of 2 and 2^{2l} + 1 is odd, they are relatively prime.
Theorem 15.17 therefore guarantees that if w_j ≥ 0, then it is the only natural
number less than 2b(2^{2l} + 1) that satisfies (15.8) and (15.9). Furthermore,
it is not hard to see that w_j + 2b(2^{2l} + 1) also satisfies these constraints.
Thus, Theorem 15.17 guarantees that if w_j < 0, then w_j + 2b(2^{2l} + 1) is the
only natural number less than 2b(2^{2l} + 1) that satisfies these constraints.
The proof of Theorem 15.17 will be constructive, so that we will be able to
compute the value that it guarantees. Finally, because w_j < b(2^{2l} + 1) <
w_j + 2b(2^{2l} + 1), we can determine whether the value guaranteed by Theorem
15.17 is w_j or w_j + 2b(2^{2l} + 1).
In order to prove Theorem 15.17, we need the following lemma.
Proof: Let r_1 = a mod bm, so that for some integer p, bmp + r_1 = a. Let
r_2 = a mod m, so that for some integer q,
mq + r_2 = a = r_1 + bmp,
so that m(q − bp) = r_1 − r_2.
We can multiply by 2^{2l} + 1 using a bit shift and an addition. We can then
determine w_j by comparing the above value with b(2^{2l} + 1) and subtracting
2b(2^{2l} + 1) if necessary. The algorithm is shown in Figure 15.11.
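The following Python sketch recovers w_j from its two residues using the generic
Chinese Remainder formula of Exercise 15.3, rather than the shift-and-add
computation of Figure 15.11; it is an illustration of the idea only, with m1 playing
the role of 2^{2l} + 1 and m2 the role of 2b.

def recover(r1, r2, m1, m2):
    # Find the unique W < m1*m2 with W mod m1 = r1 and W mod m2 = r2,
    # then shift it into (-m1*m2/2, m1*m2/2), since |w_j| < m1*m2/2.
    c2 = pow(m2, -1, m1)                 # (m2 * c2) mod m1 = 1
    c1 = pow(m1, -1, m2)                 # (m1 * c1) mod m2 = 1
    W = (m2 * c2 * r1 + m1 * c1 * r2) % (m1 * m2)
    return W if W < m1 * m2 // 2 else W - m1 * m2

assert recover((-7) % 17, (-7) % 8, 17, 8) == -7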
In order to implement NegConvSS, we must be able to compute a
negative wrapped convolution over a ring ⟨Z_m, +, ·⟩, where m is a power
of 2. However, because the values of the vectors are much smaller than
those used in the other convolution, we don't need to be quite as careful
regarding the efficiency of this algorithm. Specifically, we don't need to use
the FFT. Instead, we can first compute a non-wrapped convolution mod 2^k.
Let us refer to this convolution as conv[0..2b − 1]. Element j of the negative
wrapped convolution is then (conv[j] − conv[n + j]) mod 2^k. The algorithm
is shown in Figure 15.12.
Recall that PolyMult (Figure 10.1 on page 335) computes a non-wrapped
convolution of two vectors over ⟨Z, +, ·⟩. We can therefore modify
this algorithm to operate on BigNums such that all operations are mod
2^k. In order for the resulting algorithm to satisfy the specification of Non-
WrappedConv, we would also need to modify it to return an array whose
size is larger by one element, whose value will be 0. We leave the details as
an exercise.
In order to analyze the Schönhage-Strassen algorithm, which is simply
the Multiply algorithm of Figure 15.3 with ModMult implemented using
ModMultSS, we first observe that the analysis of EvalSS is similar to
the analysis of Eval in the previous section. Hence, its running time is in
Θ(max(2^K, 2^{N+L})), where k = 2^K, n = 2^N, and l = 2^L. Because Poly-
Mult runs in Θ(n^{lg 3}) time, where n is the degree of the product, Non-
WrappedConv can be implemented to run in O(n^{lg 3} M(k)) time, where
M(k) is the time needed to multiply two k-bit BigNums mod 2^k. Because
M(k) must be in Ω(k), NegConvSS then runs in O(n^{lg 3} M(k)) time.
To analyze ModMultSS, we first recall that the running time of Neg-
Conv is in O(N·2^{N+K}) + 2^N f(K), where f(K) denotes the worst-case run-
ning time of ModMult; here, we will assume that ModMult is imple-
mented with ModMultSS. If we now let 2^K be the value of k in the call to
ModMultSS, then the call to NegConv runs in O(K·2^K) + 2^{⌈K/2⌉} f(⌊K/2⌋ +
1), and NegConvSS runs in O(2^{⌈K/2⌉ lg 3} M(K)) time. Hence, even if M(K)
is in Θ(K²), the running time for these two calls together is in O(K·2^K) +
2^{⌈K/2⌉} f(⌊K/2⌋ + 1). Because the call to EvalSS runs in Θ(2^K) time, the
15.5 Summary
The Fast Fourier Transform is an efficient algorithm for computing a con-
volution, a problem which arises in a variety of applications. For numerical
applications, applying the FFT over hC, +, ·i is appropriate; however, for
number-theoretic applications like arbitrary-precision integer multiplication,
other algebraic structures are more appropriate. The algorithm extends to
any commutative ring containing a principal nth root of unity, and over
which n has a multiplicative inverse, where n is a power of 2 giving the
number of elements in the vectors.
Some rings that are particularly useful for number-theoretic applications
are rings of the form ⟨Z_m, +, ·⟩, where m is of the form 2^k + 1. The properties
of these rings contribute in several ways to the efficiency of the Schönhage-
Strassen integer multiplication algorithm. First, we can compute remainders
mod (2^k + 1) efficiently. Second, the principal nth roots of unity in these rings are
powers of 2, so that we can use bit shifting to multiply by these roots. Third,
when n is a power of 2, it has a multiplicative inverse that is also a power of
2. Fourth, we can compute a product in this ring with a negative wrapped
convolution of vectors with half as many elements as would be needed to
compute a non-wrapped convolution. Finally, because any power of 2 is
relatively prime to 2^k + 1, we can reduce by half the number of bits we use
in computing the negative wrapped convolution if we instead perform some
computation on a few bits of each value and apply the Chinese Remainder
Theorem.
15.6 Exercises
Exercise 15.1 Prove Theorem 15.14. [Hint: Use induction on either m or
n.]
Exercise 15.3
a. Prove Theorem 15.17 by showing that for any a_1 ∈ Z_{m_1} and any a_2 ∈
Z_{m_2}, if i = (m_2 c_2 a_1 + m_1 c_1 a_2) mod m_1 m_2, where (m_1 c_1) mod m_2 = 1
and (m_2 c_2) mod m_1 = 1, then i mod m_1 = a_1 and i mod m_2 = a_2.
Thus, if c is a principal nth root of unity, then the chirp transform with
respect to c is a DFT. Show how to reduce the problem of computing a
chirp transform for arbitrary c ∈ C to the problem of computing a convo-
lution. Using this reduction, give an O(n lg n) algorithm for evaluating a
chirp transform.
Exercise 15.6 A Toeplitz matrix is an n × n array A such that for 1 ≤ i < n
and 1 ≤ j < n, A_{ij} = A_{i−1,j−1}. Thus, we can describe a Toeplitz matrix
by giving only its first row and its first column. Give an algorithm for
Intractable Problems
Chapter 16
N P-Completeness
• ¬: logical negation;
• ∧: logical and.
• Is F valid? That is, does F evaluate to true for every possible assign-
ment of true or false to the variables in F?
• a single variable;
Figure 16.1 The tree representing ¬x ∨ (y ∧ x), first in abstract form and
then in concrete form, using or, not, and and, with 1 representing x and 2
representing y
and, and or to represent the three operators, and we can represent the vari-
ables using positive integers. In order to avoid unnecessary complications,
we will assume that if j represents a variable in the expression, then for
1 ≤ i ≤ j, i represents some variable in the expression (note, however, that
we cannot apply this assumption to arbitrary subtrees). For example, Fig-
ure 16.1 shows the tree representing the formula ¬x ∨ (y ∧ x), first in a more
abstract form, then in a more concrete form using 1 to represent x and 2
to represent y. Finally, we can represent an assignment of truth values to
the variables using an array A[1..n] of boolean values, where A[i] gives the
value assigned to the variable represented by i.
Given an expression tree for F and an array representing an assignment
of truth values, we can then evaluate F using BoolEval, shown in Figure
16.2. It is not hard to see that this algorithm runs in Θ(m) time, where
m is the number of operators in F (note that the number of leaves in the
expression tree can be no more than m + 1).
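The following Python sketch evaluates such an expression tree recursively, in the
spirit of BoolEval; the tuple-based tree representation is an illustrative assumption,
not the representation used in Figure 16.2.

def bool_eval(node, assignment):
    # node is ('or', l, r), ('and', l, r), ('not', child), or a positive integer
    # naming a variable; assignment maps variable numbers to booleans.
    if isinstance(node, int):
        return assignment[node]
    op = node[0]
    if op == 'not':
        return not bool_eval(node[1], assignment)
    left = bool_eval(node[1], assignment)
    right = bool_eval(node[2], assignment)
    return (left or right) if op == 'or' else (left and right)

# The formula of Figure 16.1, (not x) or (y and x), with 1 for x and 2 for y:
tree = ('or', ('not', 1), ('and', 2, 1))
assert bool_eval(tree, {1: True, 2: True}) is True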
Returning to the satisfiability problem, we now point out some charac-
teristics that this problem has in common with the other problems we will
study in this chapter. First, it is a decision problem — the output is either
“yes” or “no”. Second, when the answer is “yes”, there is a relatively short
proof of this fact — an assignment of truth values that satisfies the expres-
sion. Third, given a proposed proof of satisfiability (i.e., some assignment
of truth values to the variables), we can efficiently verify whether it does,
in fact, prove the satisfiability of the expression. However, finding such a
proof, or proving that none exists, appears to be an expensive task in the
worst case (note that there are 2n possible assignments of truth values to n
variables).
the set of all bit strings that encode boolean expressions. We can then let
Sat denote the set of all expressions in I that are satisfiable. It will also be
convenient to use Sat to denote the problem itself. In general, given a set
of instances I, we will refer to any subset X ⊆ I as a decision problem over
I.
We must now address the question of how efficient is “efficient”. We
will somewhat arbitrarily delineate the “efficient” algorithms as those that
operate within a running time bounded by some polynomial in the length
of the input. This delineation, however, is not entirely satisfactory. On the
one hand, one could make a persuasive argument that an algorithm with a
running time in Θ(n^{1000}) is not “efficient”. On the other hand, suppose some
algorithm has a running time in Θ(n^{⌈α(n)/4⌉}), where α is as defined in Section
8.4. Because n^{⌈α(n)/4⌉} ≤ n for every positive n that can be coded in binary
within our universe, such an algorithm might reasonably be considered to
be “efficient”. However, n^{⌈α(n)/4⌉} is not bounded above by any polynomial.
The main reason we equate polynomial running time with efficiency
is that polynomials have several useful closure properties. Specifically, if
p1 (n) and p2 (n) are polynomials, then so are p1 (n) + p2 (n), p1 (n)p2 (n), and
p1 (p2 (n)). As we will see, these closure properties make the theory much
cleaner. Furthermore, if we can say that a particular decision problem can-
not be solved by any polynomial-time algorithm, then we can be fairly safe
in concluding that there is no algorithm that will terminate in a reasonable
amount of time on large inputs in the worst case.
Before we formalize this idea, we need to be careful about one aspect of
our running-time measures. Specifically, we have assumed in this text that
arithmetic operations can be performed in a single step. This assumption is
valid if we can reasonably expect the numbers to fit in a single machine word.
For larger values, we should use the BigNum type in order to get an ap-
propriate measure of the running time. Also note that all of the algorithms
in this text except those using real or complex numbers can be expressed
using only booleans and natural numbers as primitive types (i.e., those not
defined in terms of other variables). Furthermore, only the algorithms of
Section 15.1 require real or complex numbers — all other algorithms can be
restricted to rational numbers, which can be expressed as a pair of natural
numbers and a sign bit. Thus, it will make sense to stipulate that an “effi-
cient” algorithm contains only boolean or natural number variables, along
with other types built from these, and that each natural number variable
will contain only a polynomial number of bits.
We can now formalize our notion of “efficient”. We say that an algorithm
A is a polynomial-time algorithm if there is a polynomial p(n) such that
We then define P to be the set of all decision problems X such that there
exists a deterministic (i.e., not randomized) polynomial-time algorithm A
deciding X. It is this set P that we will consider to be the set of efficiently
solvable decision problems.
The decision to define P using only deterministic algorithms is rather ar-
bitrary. Indeed, there is a branch of computational complexity theory that
focuses on efficient randomized algorithms. However, the study of deter-
ministic algorithms is more fundamental, and therefore is a more reasonable
starting point for us.
such that
• Y ∈ P;
Theorem 16.1 P ⊆ N P.
Note that Theorem 16.2 does not say that if Y can be decided in O(f (n))
time, then X can be decided in O(f (n)) time. Indeed, in the proof of the
theorem, the bound on the time to decide X can be much larger than the
time to decide Y. Thus, if we interpret X ≤^p_m Y as indicating that X is no
harder than Y, we must understand “no harder than” in a very loose sense
— simply that if Y ∈ P, then X ∈ P.
expression tree. For a negated variable ¬x, we use −i, where i is the integer
representing the variable x. We will again assume that for an input formula
F, if j represents a variable in F, then for 1 ≤ i ≤ j, i represents a variable
in F. Again, this assumption will not apply to arbitrary sub-formulas.
One obvious way of reducing Sat to CSat is to convert a given boolean
expression to an equivalent expression in CNF. However, there are boolean
expressions for which the shortest equivalent CNF expression has size expo-
nential in the size of the original expression. As a result, any such conversion
algorithm must require at least exponential time in the worst case.
Fortunately, our reduction doesn’t need to construct an equivalent ex-
pression, but only one that is satisfiable iff the given expression is satisfiable.
In fact, the constructed expression isn’t even required to contain the same
variables. We will use this flexibility in designing our reduction.
For the first step of our reduction, we will construct an equivalent formula
in which negations are applied only to variables. Because of this restriction,
we can simplify our representation for this kind of expression by allowing
leaves to contain either positive or negative integers, as in our representation
of CNF formulas. Using this representation, we no longer need nodes repre-
senting the ¬ operation. We will refer to this representation as a normalized
expression tree.
Fortunately, there is a polynomial-time algorithm for normalizing a bool-
ean expression tree. The algorithm uses DeMorgan’s laws:
• ¬(x ∧ y) = ¬x ∨ ¬y; and
• ¬(x ∨ y) = ¬x ∧ ¬y.
The algorithm is shown in Figure 16.3. This algorithm solves a slightly more
general problem for which the input includes a boolean neg, which indicates
whether the normalized expression should be equivalent to F or ¬F. It is
easily seen that its running time is proportional to the number of nodes in
the tree, which is in O(m), where m is the number of operators in F.
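The following Python sketch performs this normalization; like the BoolEval sketch
above, it uses a tuple-based tree with signed integers at the leaves, which is an
illustrative assumption rather than the representation of Figure 16.3.

def normalize(node, neg=False):
    # neg indicates whether the result should be equivalent to the subtree or
    # to its negation; negations are pushed to the leaves by DeMorgan's laws.
    if isinstance(node, int):
        return -node if neg else node
    op = node[0]
    if op == 'not':
        return normalize(node[1], not neg)
    if neg:                                  # swap the operator under negation
        op = 'and' if op == 'or' else 'or'
    return (op, normalize(node[1], neg), normalize(node[2], neg))

# Normalizing the negation of x or (y and x), with 1 for x and 2 for y:
assert normalize(('not', ('or', 1, ('and', 2, 1)))) == ('and', -1, ('or', -2, -1))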
As the second step in our reduction, we need to find the largest integer
used to represent a variable in a normalized expression tree. We need this
value in order to be able to introduce new variables. Such an algorithm is
shown in Figure 16.4. Clearly, its running time is in O(|F|).
As the third step in our reduction, we will construct from a normalized
expression tree F and a value larger than any integer representing a variable
in F, a CNF expression F ′ having the following properties:
F′ = F1′ ∧ F2′ is in CNF and clearly satisfies properties P1–P3 with respect
to F.
• α_1 ∨ α_2 ∨ u_1;
• ¬u_{i−2} ∨ α_i ∨ u_{i−1}, for 3 ≤ i ≤ m − 2; and
• ¬u_{m−3} ∨ α_{m−1} ∨ α_m.
We first claim that any assignment of boolean values that satisfies C can
be extended to an assignment that satisfies each of the new clauses. To see
why, first observe that if C is satisfied, then αi must be true for some i. We
can then set u_1, . . . , u_{i−2} to true and u_{i−1}, . . . , u_{m−3} to false. Then each of
the first i − 2 clauses is satisfied because u_1, . . . , u_{i−2} are true. The (i − 1)st
clause, ¬u_{i−2} ∨ α_i ∨ u_{i−1}, is satisfied because α_i is true. Finally, the remaining
clauses are satisfied because ¬u_{i−1}, . . . , ¬u_{m−3} are true.
We now claim that any assignment that satisfies the new clauses will also
satisfy C. Suppose to the contrary that all the new clauses are satisfied, but
that C is not satisfied — i.e., that α1 , . . . , αm are all false. Then in order for
the first clause to be satisfied, u1 must be true. Likewise, it is easily shown
by induction on i that each ui must be true. Then the last clause is not
satisfied — a contradiction.
If we apply the above transformation to each clause having more than
3 literals in a CNF formula F and retain those clauses with no more than
3 literals, then the resulting 3-CNF formula is satisfiable iff F is satisfiable.
Furthermore, it is not hard to implement this reduction in O(|F|) time —
the details are left as an exercise. Hence, CSat ≤pm 3-Sat. We therefore
conclude that 3-Sat is N P-complete.
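The clause transformation itself is simple enough to sketch in a few lines of Python;
here literals are nonzero integers, and the helper next_var, which supplies fresh
variable numbers, is an illustrative assumption.

from itertools import count

def split_clause(literals, next_var):
    # Split a clause with m > 3 literals into m - 2 clauses of 3 literals,
    # introducing new variables u_1, ..., u_{m-3} as described above.
    m = len(literals)
    if m <= 3:
        return [list(literals)]
    u = [next_var() for _ in range(m - 3)]
    clauses = [[literals[0], literals[1], u[0]]]
    for i in range(2, m - 2):
        clauses.append([-u[i - 2], literals[i], u[i - 1]])
    clauses.append([-u[m - 4], literals[m - 2], literals[m - 1]])
    return clauses

fresh = count(4)                 # fresh variable numbers for a formula on variables 1-3
assert split_clause([1, -2, 3, -1, 2], lambda: next(fresh)) == \
    [[1, -2, 4], [-4, 3, 5], [-5, -1, 2]]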
Figure 16.7 The graph constructed from the 3-CNF formula (x_1 ∨ ¬x_2 ∨ x_3) ∧ (¬x_1 ∨ x_3)
xi in F, two vertices xi and ¬xi , together with an edge {xi , ¬xi }. Then
any vertex cover must include either xi or ¬xi . Furthermore, by choosing
an appropriate size for the vertex cover, we might be able to prohibit the
simultaneous inclusion of both xi and ¬xi .
In order to complete the reduction, we need to ensure that any vertex
cover of size k describes a satisfying assignment for F, and that for any
satisfying assignment for F, there is a vertex cover of size k that describes
it. To this end, we will add more structure to the graph we are constructing.
We know that for a satisfying assignment, each clause contains at least one
true literal. In order to model this constraint with a graph, let us construct,
for each clause Ci , the vertices ci1 , ci2 , and ci3 , along with the edges {ci1 , ci2 },
{ci2 , ci3 }, and {ci3 , ci1 }. Then any vertex cover must contain at least two of
these three vertices.
Finally, for 1 ≤ i ≤ n and 1 ≤ j ≤ 3, we construct an additional edge
{cij , αij }. For example, Figure 16.7 shows the graph constructed from the
3-CNF formula (x1 ∨ ¬x2 ∨ x3 ) ∧ (¬x1 ∨ x3 ). By setting k = m + 2n, where m
the vertices of this directed graph. 3DM is then the natural extension of
this problem to 3-dimensional hypergraphs. Using the algorithm Matching
(Figure 14.10 on page 463), we can decide the 2-dimensional version (2DM)
in O(na) time, where n is the number of vertices and a is the number of
edges in the graph; thus, 2DM ∈ P. However, we will now show that 3DM
is N P-complete.
In order to show that 3DM ∈ N P, let us first denote an instance by
X = {x_1, . . . , x_m}
Y = {y_1, . . . , y_m}
Z = {z_1, . . . , z_m}
W = {w_1, . . . , w_n}.
We interpret a bit string φ as encoding an array A[1..m] such that each block
of b bits encodes an element of A, where b is the number of bits needed to
encode n. Any bit string that does not have length exactly bm will be
considered to be invalid. To verify that the array A encoded by φ is a proof,
we can check that
• φ is valid;
• 1 ≤ A[i] ≤ n for 1 ≤ i ≤ m; and
• each element of X∪Y ∪Z belongs to some triple wA[i] , where 1 ≤ i ≤ m.
This can easily be done in O(bm²) time — the details are left as an exercise.
Hence, 3DM ∈ N P.
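A Python sketch of the verification just described is shown below; the tuple-based
encoding of elements and triples is an illustrative assumption.

def verify_3dm(triples, A, m):
    # A is a proposed proof: a list of m indices into the triples w_1, ..., w_n.
    if len(A) != m or any(not (1 <= i <= len(triples)) for i in A):
        return False
    covered = [elem for i in A for elem in triples[i - 1]]
    required = {(s, i) for s in "xyz" for i in range(1, m + 1)}
    return required.issubset(covered)        # every element of X, Y, and Z is matched

# Elements are written ("x", i), ("y", j), ("z", k); a small instance with m = 2:
w = [(("x", 1), ("y", 1), ("z", 1)), (("x", 2), ("y", 2), ("z", 2)),
     (("x", 1), ("y", 2), ("z", 2))]
assert verify_3dm(w, [1, 2], 2) and not verify_3dm(w, [1, 3], 2)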
In order to show that 3DM is N P-hard, we need to reduce some N P-
complete problem to it. So far, we have identified five N P-complete prob-
lems: three satisfiability problems and two graph problems. However, none
of these bears much resemblance to 3DM. We therefore make use of a prin-
ciple that has proven to be quite effective over the years: when in doubt,
try 3-Sat.
As we did in showing 3-Sat ≤pm VC, we will begin by focusing on the
proofs of membership in N P for two problems. Specifically, we want to
relate the choice of a subset of W to the choice of truth values for boolean
variables. Let’s start by considering two triples, ⟨x, a_x, b_x⟩ and ⟨¬x, a_x, b_x⟩,
where x is some boolean variable. If these are the only two triples containing
a_x or b_x, any matching must include exactly one of these triples. This choice
could be used to set the value of x.
If we were to construct two such triples for each variable, we would then
need to construct triples to represent the clauses. Using a similar idea,
we could introduce, for a given clause α_{i1} ∨ α_{i2} ∨ α_{i3}, the triples ⟨α_{i1}, c_i, d_i⟩,
⟨α_{i2}, c_i, d_i⟩, and ⟨α_{i3}, c_i, d_i⟩ — one triple for each literal in the clause. Again,
any matching must contain exactly one of these triples. If we let x be false
when ⟨x, a_x, b_x⟩ is chosen, then the triple chosen for the clause must contain
a true literal.
This construction has a couple of shortcomings, however. First, because
each literal must occur exactly once in a matching, we can use a given
variable to satisfy only one clause. Furthermore, if more than one literal is
true in a given clause, there may remain literals that are unmatched. These
shortcomings should not be too surprising, as we could do essentially the
same construction producing pairs instead of triples — the third components
are redundant. Thus, if this construction had worked, we could have used
the same technique to reduce 3-Sat to 2DM, which belongs to P. We would
have therefore proved that P = N P.
In order to overcome the first shortcoming, we need to enrich our con-
struction so that we have several copies of each literal. To keep it simple, we
will make one copy for each clause, regardless of whether the literal appears
in the clause. We must be careful, however, so that when we choose the
triples to set the boolean value, we must either take all triples containing x
or all triples containing ¬x. Because we are constructing triples rather than
pairs, we can indeed accomplish these goals.
Let x_1, . . . , x_n denote all the copies of the literal x, and let ¬x_1, . . . , ¬x_n
denote all the copies of the literal ¬x. We then introduce the following
triples (see Figure 16.8):
• ⟨x_i, a_{x,i}, b_{x,i}⟩ for 1 ≤ i ≤ n;
• ⟨¬x_i, a_{x,i}, b_{x,i+1}⟩ for 1 ≤ i ≤ n − 1; and
• ⟨¬x_n, a_{x,n}, b_{x,1}⟩.
It is not too hard to see that in order to match all of the a_{x,i}'s and b_{x,i}'s, a
matching must include either those triples containing the x_i's or those triples
containing the ¬x_i's.
We can now use the construction described earlier for building triples
from clauses, except that for clause i, we include the ith copy of each literal
in its triple. Thus, in any matching, there must be for each clause at least one
triple containing a copy of a literal. However, there still may be unmatched
copies of literals. We need to introduce more triples in order to match the
remaining copies.
Suppose our 3-CNF formula F has n clauses and m variables. Then our
construction so far contains:
Figure 16.8 Triples for setting boolean values in the reduction from 3-Sat
to 3DM, with n = 4
• mn b's and n d's.
In order to make the above three sets of equal size, we add e_i to the second
set and f_i to the third set, for 1 ≤ i ≤ (m − 1)n. We then include all possible
triples ⟨x_i, e_j, f_j⟩ and ⟨¬x_i, e_j, f_j⟩ for 1 ≤ i ≤ n and 1 ≤ j ≤ (m − 1)n. Using
this construction, we can now show the following theorem.
• one triple for each literal in each clause, or at most 3n triples; and
• triples ⟨x_i, e_j, f_j⟩ and ⟨¬x_i, e_j, f_j⟩ for each variable x, each i such that
1 ≤ i ≤ n, and each j such that 1 ≤ j ≤ (m − 1)n, or 2(m − 1)mn²
triples.
Thus, the total number of triples produced is at most
We will construct a weight for each triple, plus two additional weights.
Suppose ⟨x_i, y_j, z_k⟩ ∈ W. The weight we construct for this triple will be
Therefore each weight constructed has a binary encoding with no more than
1 + ⌈3m lg(n + 1)⌉ bits. Because addition, subtraction, multiplication, and
exponentiation can all be performed in a time polynomial in the number
of bits in their operands (see Exercise 4.14, Section 10.1, Exercise 10.24,
and Sections 15.3-15.4), the construction can clearly be performed in time
polynomial in the size of the instance of 3DM. We therefore have the
following theorem.
1. there is a polynomial p1 such that p1 (|f (x)|) ≥ |x| for every instance
x of X; and
• 2-Part ∈ P.
We will show that 3DM ≤^{pp}_m 4-Part. The reduction will be somewhat
similar to the reduction from 3DM to Part, but we must be careful that
the weights we construct are not too large. Let us describe an instance of
3DM using the same notation as we did for the earlier reduction. We will
assume that each element occurs in at least one triple. Otherwise, there is
no matching, and we can create an instance with 7 items having weight 6
and one item having weight 8, so that the total weight is 50, and B = 25.
Clearly, 25/5 < 6 < 8 < 25/3; hence, this is a valid instance, but there is
clearly no way to form a subset with weight 25.
We will construct, for each triple ⟨x_i, y_j, z_k⟩ ∈ W, four weights: one
weight for each of x_i, y_j, and z_k, plus one weight for the triple itself. Because
each element of X ∪ Y ∪ Z can occur in several triples, we may construct
several items for each element. Exactly one of these will be a matching
item. All non-matching items constructed from the same element will have
the same weight, which will be different from that of the matching item
constructed from that element. We will construct the weights so that in any
4-partition, the item constructed from a triple must be grouped with either
the matching items constructed from the elements of the triple, or three
non-matching items — one corresponding to each element of the triple. In
this way, a 4-partition will exist iff W contains a matching.
As in the previous reduction, it will be convenient to view the weights
in a particular radix r, which we will specify later. In this case, however,
the weights will contain only a few radix-r digits. We will choose r to be
large enough that when we add any four of the weights we construct, each
column of digits will have a sum strictly less than r; hence, we will be able
to deal with each digit position independently in order to satisfy the various
constraints. Note that if we construct the weights so that for every triple,
the sum of the four weights constructed is the same, then this sum will be
B.
We will use the three low-order digits to enforce the constraint that the
four items within any partition must be derived from some triple and its
three components. To this end, we make the following assignments:
• For any weight constructed from xi ∈ X, we assign i + 1 to the first
digit and 0 to the second and third digits.
• For a triple ⟨x_i, y_j, z_k⟩, we assign the first three digits the values 2m − i,
2m − j, and 2m − k, respectively.
B will therefore have 2m + 1 as each of its three low-order digits.
Note that because each weight constructed from a triple has a value of
at least m + 1 in each of its three low-order digits, no two of these weights
Furthermore,
so that every weight is larger than B/5. Furthermore, each weight is less
than
r4 + 4r3 < r4 + r4 /3
= 4r4 /3
< B/3.
Our proof will therefore involve a generic reduction. This kind of re-
duction is more abstract in that we begin with an arbitrary X ∈ N P.
Specifically, the same reduction should work for any problem X that we
might choose from N P. Thus, the only assumption we can make about X
is that it satisfies the definition of N P: there exist a polynomial p(n) and a
decision problem Y ⊆ I × B, where I is the set of instances of X, such that
• Y ∈ P — that is, there exist a polynomial p′ (n) and an algorithm A
that takes an element x ∈ I and an element φ ∈ B as its inputs and
decides within p′ (|x| + |φ|) steps whether (x, φ) ∈ Y ;
• a program counter, which is initially 0 and can store any natural num-
ber less than P ;
• two input streams from which values may be read one bit at a time;
and
Because we will be using this model only for representing an algorithm for
deciding whether (x, φ) ∈ Y , we need exactly two input streams, one for x
and one for φ. Furthermore, we can represent a “yes” output by setting the
output bit to 1.
We will assume that each memory location is addressed by a unique
natural number. Each machine will then have the following instruction set:
• Input(i, l): Stores the next bit from input stream i, where i is either
0 or 1, in memory location l. If all of the input has already been read,
the value 2 is stored.
Add(0, ∗1)
such that
the problem X; however, the instance x is the input for the construction, so
that we cannot know it in advance.
Because we can use any polynomial-time M that decides Y , we can sim-
plify matters further by making a couple of assumptions about M . First,
we can assume that M contains at least one Halt(1) instruction — if there
is no input yielding a “yes” answer, we can always include such an instruc-
tion at an unreachable location. Second, because statements that cannot
be executed due to error conditions have the same effect as Halt(0), and
because the instruction set is powerful enough to check any run-time er-
ror conditions, we can assume that all statements can be executed without
error.
In addition, we make some simplifying assumptions regarding the poly-
nomials p and p′ . First, we note that by removing any negative terms from
p′ , we obtain a polynomial that is nondecreasing and never less than the
original polynomial. Thus, we can assume that p′ is nondecreasing, so that
p′ (|x| + p(|x|)) will give an upper bound on the number of steps executed by
M on an input (x, φ) with |φ| ≤ p(|x|). Furthermore, because p′ (n + p(n))
is a polynomial, we can assume that p′ (n) is an upper bound on the number
of steps taken by M on any input (x, φ) such that |x| = n and |φ| ≤ p(n).
Note that with these assumptions, p′ (n) ≥ p(n) for all n. We can therefore
choose p(n) = p′ (n), so that we can use a single polynomial p to bound both
|φ| and the number of steps executed by M . Finally, we can choose p so
that p(n) ≥ n for all n ∈ N.
As the first step in our construction, we need boolean variables to rep-
resent the various components of the state of M at various times in its
execution. First, we need variables describing the input sequences x and φ.
For x, we will use the variables x[k] for 1 ≤ k ≤ n, where n is the length
of x. Because φ is unknown, even during the execution of the construction,
we cannot know its exact length; however, we do know that its length is no
more than p(n). We therefore will use the variables φ[k] for 1 ≤ k ≤ p(n) to
represent φ.
We also need variables to keep track of which bits are unread at each
step of the execution of M. For this purpose, we will use the variables x̂i[k]
for 0 ≤ i ≤ p(n) and 0 ≤ k ≤ n, plus the variables φ̂i[k] for 0 ≤ i ≤ p(n)
and 0 ≤ k ≤ p(n). We want x̂i[k] to be true iff the kth bit of x has not been
read after i execution steps. Likewise, we want φ̂i[k] to be true iff the kth
bit of φ exists and has not been read after i execution steps.
We then need to record the value of the program counter at each execu-
tion step. We will use the variables pij for 0 ≤ i ≤ p(n) and 0 ≤ j < P for
this purpose, where P denotes the number of instructions in the program of
M . We want pij to be true iff the program counter has a value of j after i
execution steps.
Recording the values of the memory locations at each execution step
presents more of a challenge. Because a memory location can contain any
value less than 2^p(n), M can access any memory location with an address less
than 2^p(n). If we were to construct variables for each of these locations, we
would end up with exponentially many variables. We cannot hope to con-
struct a formula containing this many variables in a polynomial amount of
time. However, the number of memory locations accessed by any instruction
is at most four — the number accessed by Copy(∗l1 , ∗l2 ) in the worst case.
As a result, M can access a total of no more than 4p(n) different memory
locations. We can therefore use a technique similar to the implementation of
a VArray (see Section 9.5) in order to keep track of the memory locations
actually used.
Specifically, we will let the variables aj [k], for 1 ≤ j ≤ 4p(n) and 1 ≤ k ≤
p(n), denote the value of the kth bit of the address of some location lj , where
the first bit is the least significant bit. Here, we will let true represent 1 and
false represent 0. Then the variables vij [k], for 0 ≤ i ≤ p(n), 1 ≤ j ≤ 4p(n),
and 1 ≤ k ≤ p(n), will record the value of the kth bit of the value stored
at location lj after i execution steps. We will make no requirement that
location lj actually be used by M , nor do we require that lj be a different
location from lj′ when j ≠ j′.
Finally, we will use the additional variables ci [0..p(n)] and di [1..p(n)] for
1 ≤ i ≤ p(n). We will explain their purposes later.
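As a rough check that the construction stays polynomial in n, the following tally of the variable families just introduced may help; it is an illustration only, not part of the construction. P, the number of instructions in M's program, depends only on X and is therefore treated as a constant.

# Count of the boolean variables introduced so far, with pn standing for p(n).
# P depends only on the problem X, so it contributes only a constant factor.

def variable_count(n, pn, P):
    total = n                                  # x[1..n]
    total += pn                                # phi[1..p(n)]
    total += (pn + 1) * (n + 1)                # x-hat_i[k]
    total += (pn + 1) * (pn + 1)               # phi-hat_i[k]
    total += (pn + 1) * P                      # p_{i,j}
    total += 4 * pn * pn                       # a_j[k]
    total += (pn + 1) * 4 * pn * pn            # v_{i,j}[k], the dominant term
    total += pn * (pn + 1) + pn * pn           # c_i[0..p(n)] and d_i[1..p(n)]
    return total                               # O(p^3(n)) variables overall

print(variable_count(n=4, pn=10, P=8))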
Before we describe the formula F that we will construct, let us first
define some abbreviations, or “macros”, that will make the description of F
simpler. First, we will define the following:
If(y, z) = ¬y ∨ z.
This abbreviation specifies that if y is true, then z must also be true. How-
ever, if y is false, then no constraint is placed upon z. Note that such an
expression can be constructed in O(1) time.
We can extend the above abbreviation to specify an if-then-else con-
struct:
IfElse(y, z1 , z2 ) = If(y, z1 ) ∧ If(¬y, z2 ).
This specifies that if y is true, then z1 is true, but if not, then z2 is true.
Clearly it can be constructed in O(1) time.
We will now define an abbreviation for the specification that two vari-
ables y and z are equal:
Eq(y, true) = y
and
Eq(y, false) = ¬y.
We can also extend the abbreviation to arrays of variables as follows:
Eq(y[1..n], z[1..n]) = ⋀_{k=1}^{n} Eq(y[k], z[k]).
Because aj, a′j, l, and vij′ are arrays of p(n) elements, this expression can
be constructed in O(p^2(n)) time.
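To make these abbreviations concrete, the following sketch expands each macro into a formula string. The string representation, and the two-variable form of Eq (taken here to be mutual implication), are assumptions made only for illustration.

# Minimal sketch of the If, IfElse, and Eq abbreviations as functions that
# build formula strings.  The representation of formulas as strings is an
# illustrative choice, not part of the construction.

def If(y, z):
    # "if y then z": ¬y ∨ z
    return f"(¬{y} ∨ {z})"

def IfElse(y, z1, z2):
    # if y then z1, otherwise z2: (¬y ∨ z1) ∧ (y ∨ z2)
    return f"((¬{y} ∨ {z1}) ∧ ({y} ∨ {z2}))"

def Eq_const(y, value):
    # Eq(y, true) = y and Eq(y, false) = ¬y
    return y if value else f"¬{y}"

def Eq_vars(y, z):
    # assumed two-variable form: y and z imply each other
    return f"({If(y, z)} ∧ {If(z, y)})"

def Eq_arrays(ys, zs):
    # element-wise equality of two arrays of variables, as in the display above
    return "(" + " ∧ ".join(Eq_vars(y, z) for y, z in zip(ys, zs)) + ")"

print(IfElse("p", "q", "r"))
print(Eq_arrays(["y1", "y2"], ["z1", "z2"]))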
We then observe that the carry bit of this sum is 1 iff at least two of the
three bits are 1. Stated another way, the carry bit is 1 iff either
For the first constraint, we will specify that for 0 ≤ i ≤ p(n), there is
at most one j such that pij is true. Thus, there will be no ambiguity as to
the value of the program counter at each execution step. We specify this
constraint with the sub-formula,
F1 = ⋀_{i=0}^{p(n)} ⋀_{j=0}^{P−2} ⋀_{j′=j+1}^{P−1} If(pij, ¬pij′).
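The sub-formula F1 could then be emitted as follows; the textual encoding of the program-counter variables as p_i_j is a hypothetical choice made only for this sketch.

# Sketch of emitting F1: at most one program-counter variable is true per step.

def build_F1(pn, P):
    clauses = []
    for i in range(pn + 1):                # 0 <= i <= p(n)
        for j in range(P - 1):             # 0 <= j <= P - 2
            for jp in range(j + 1, P):     # j + 1 <= j' <= P - 1
                # If(p_{i,j}, ¬p_{i,j'}) = ¬p_{i,j} ∨ ¬p_{i,j'}
                clauses.append(f"(¬p_{i}_{j} ∨ ¬p_{i}_{jp})")
    return " ∧ ".join(clauses)             # O(p(n) · P^2) clauses of size O(1)

print(build_F1(pn=1, P=3))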
F2 = p0,0 .
F3 = Eq(x, x).
Because the size of A depends only on the problem X, this sub-formula can
be constructed in O(p(n)) time.
To complete the formula, we need constraints specifying the correct be-
havior of M . To this end, we will construct one sub-formula for each instruc-
tion in the program of M . These sub-formulas will depend on the particular
instruction. Let 0 ≤ q < P , where P is the number of instructions in
the program. In what follows, we will describe how the sub-formula Fq′ is
constructed depending on the instruction at program location q.
Regardless of the specific instruction, the sub-formula will have the same
general form. In each case, Fq′ must specify that some particular behavior
occurs whenever the program counter has a value of q. Fq′ will therefore
have the following form:
Fq′ = ⋀_{i=1}^{p(n)} If(pi−1,q, ψq(i)),        (16.1)
where ψq (i) is a predicate specifying the result of executing the ith instruc-
tion.
Each ψq (i) will be a conjunction of predicates, each specifying some
aspect of the result of executing the ith instruction. In particular, ψq (i) will
be the conjunction of the following predicates:
• Uq (i), which specifies how the memory locations are updated;
• Eq (i), which specifies what memory locations must be represented in
F in order for this instruction to be simulated (this specification is
needed to prevent Uq (i) from being vacuously satisfied);
There are some instances of the above predicates that occur for more
than one type of instruction.
• If the instruction at location q is not an Input instruction, then
and
Eq (i) = true. (16.6)
In what follows, we will define the remaining predicates for several of the
possible instructions. We leave the remaining cases as exercises.
Let us first consider an instruction Load(n, l). Because l is the only
memory location that is accessed, we can define
Eq(i) = ⋁_{j=1}^{4p(n)} Eq(aj, l).
Note that the corresponding update predicate Uq(i) specifies that every vij
such that aj = l has its value changed to n.
Let us now compute the time needed to construct the resulting sub-
formula Fq′. Because the arrays aj and l each contain p(n) elements, Eq(i)
can be constructed in O(p^2(n)) time. It is not hard to verify that Uq(i) can
be constructed in O(p^2(n)) time as well. Clearly, Pq(i) as defined in (16.4)
can be constructed in O(1) time. Finally, Iq(i) as defined in (16.3) can be
constructed in O(p(n)) time. Thus, ψq(i) can be constructed in O(p^2(n))
time. The sub-formula Fq′ can therefore be constructed in O(p^3(n)) time.
We can handle an instruction Load(n, ∗l) in a similar way, but using
the Ind abbreviation. Thus, we define
Eq(i) = ⋁_{j=1}^{4p(n)} Ind(i − 1, j, l)
and
Uq(i) = ⋀_{j=1}^{4p(n)} IfElse(Ind(i − 1, j, l), Eq(vij, n), Eq(vij, vi−1,j)).

In this case, Eq(i) and Uq(i) can be constructed in O(p^3(n)) time, so that
Fq′ can be constructed in O(p^4(n)) time.
Let us now consider an instruction IfLeq(l1 , l2 , q ′ ). Because the memory
locations l1 and l2 are referenced, we define
Eq(i) = ⋁_{j=1}^{4p(n)} Eq(aj, l1) ∧ ⋁_{j=1}^{4p(n)} Eq(aj, l2).
piq′ , pi,q+1 .
Eq(i) can be constructed in O(p^2(n)) time, and both Uq(i) and Pq(i) can
be constructed in O(p^3(n)) time. Furthermore, Iq(i) as given in (16.3) can
be constructed in O(p(n)) time. The total time needed to construct Fq′ is
therefore in O(p^4(n)).
Finally, let us consider a Halt instruction. For a Halt instruction, we
have already defined Iq (i) (16.3), Uq (i) (16.5), and Eq (i) (16.6). To define
Pq (i), we need to specify that for all i′ > i, each pi′ j is false:
Pq(i) = ⋀_{i′=i+1}^{p(n)} ⋀_{j=0}^{P−1} ¬pi′j.
is true iff the ith instruction executed by M on input (x, φ) is the instruction
at program location j. Finally, because F7 must be satisfied, one of these
instructions must be a Halt(1) instruction.
Now suppose that for some φ ∈ B, M executes a Halt(1) instruction
on input (x, φ). By our choice of the polynomial p(n), we can assume that
|φ| ≤ p(|x|). Let us now set x = x and φ[1..|φ|] = φ. We will also set
φ̂[k] = true for 1 ≤ k ≤ |φ| and φ̂[k] = false for |φ| < k ≤ p(n). We
can clearly assign truth values to the variables in the sub-formulas Fq for
2 ≤ q ≤ 6 so that all of these sub-formulas are satisfied. By the above
construction, we can then assign truth values to the variables in each of the
sub-formulas Fq′ for 1 ≤ q ≤ p(n) so that these formulas, along with F1 , are
satisfied. Such an assignment will yield pij = true iff pj is the ith instruction
executed by M on input (x, φ). Because M executes a Halt(1) instruction
on this input, F7 must also be satisfied. Therefore, F is satisfied by this
assignment.
We have therefore shown that X ≤pm Sat. Because X can be any prob-
lem in N P, it follows that Sat is N P-hard. Because Sat ∈ N P, it follows
that Sat is N P-complete.
16.9 Summary
The N P-complete problems comprise a large class of decision problems for
which no polynomial-time algorithms are known. Furthermore, if a poly-
nomial time algorithm were found for any one of these problems, we would
be able to construct polynomial-time algorithms for all of them. For this
reason, along with many others that are beyond the scope of this book, we
tend to believe that none of these problems can be solved in polynomial
time. Note, however, that this conjecture has not been proven. Indeed,
this question — whether P = N P — is the most famous open question in
theoretical computer science.
Proofs of N P-completeness consist of two parts: membership in N P
and N P-hardness. Without knowledge of any N P-complete problems, it is
quite tedious to prove a problem to be N P-hard. However, given one or
more N P-complete problems, the task of proving additional problems to be
N P-hard is greatly eased using polynomial-time many-one reductions.
Some general guidelines for finding a reduction from a known N P-
complete problem to a problem known to be in N P are as follows:
• Look for a known N P-complete problem that has similarities with the
problem in question.
16.10 Exercises
Exercise 16.1 Prove that if X, Y , and Z are decision problems such that
X ≤pm Y and Y ≤pm Z, then X ≤pm Z.
Exercise 16.8 Two graphs G and G′ are said to be isomorphic if the ver-
tices of G can be renamed so that the resulting graph is G′ . Given two
graphs G and G′ and a natural number k, we wish to decide whether G and
G′ contain isomorphic subgraphs with k vertices. Show that this problem is
N P-complete. You may use the result of Exercise 16.7.
Exercise 16.10 Repeat Exercise 16.9 for directed graphs. You may use the
result of Exercise 16.9.
Exercise 16.12 Repeat Exercise 16.11 for directed graphs. You may use
the results of Exercises 16.9, 16.10, and 16.11.
Exercise 16.16 Given a finite sequence of finite sets and a natural number
k, we wish to decide whether the sequence contains at least k mutually
disjoint sets. Show that this problem is N P-complete.
Exercise 16.18 Suppose we modify the 0-1 knapsack problem (see Section
12.4) by including a target value V as an additional input. The problem
then is to decide whether there is a subset of the items whose total weight
does not exceed the weight bound W and whose total value is at least V .
Prove that this problem is N P-complete.
Exercise 16.24 Show that the problem of deciding whether a given undi-
rected graph has a k-coloring is N P-complete for each fixed k ≥ 4. You
may use the result of Exercise 16.23.
** Exercise 16.25 Certain aspects of the board game Axis and Allies™
can be modeled as follows. The game is played on an undirected graph.
The playing pieces include fighters and aircraft carriers, each of which has
a natural number range. These pieces are each assigned to a vertex of
the graph. Each vertex may be assigned any number of pieces. A combat
scenario is valid if it is possible to move each piece to a new vertex (possibly
the same one) so that
• for each move, the distance (i.e., number of edges) from the starting
vertex to the ending vertex is no more than the range of the piece moved;
and
• after the pieces are moved, each vertex has no more than twice as
many fighters as aircraft carriers.
Exercise 16.27 The bin packing problem (BP) is to decide whether a given
set of items, each having a weight wi , can be partitioned into k disjoint sets
each having a total weight of at most W , where k and W are given positive
integers. Show that BP is N P-complete in the strong sense.
Exercise 16.31 Define the predicate Pq (i) for the case in which the in-
struction at location q is Goto(q ′ ). Show that the resulting sub-formula Fq′
can be constructed in O(p(n)) time.
Exercise 16.32 Define the predicates Eq(i) and Uq(i) for the case in which
the instruction at location q is Copy(l1, l2). Show that the resulting sub-
formula Fq′ can be constructed in O(p^4(n)) time.
Exercise 16.33 Define the predicates Eq(i) and Uq(i) for the case in which
the instruction at location q is Copy(∗l1, ∗l2). Show that the resulting sub-
formula Fq′ can be constructed in O(p^5(n)) time.
Exercise 16.34 Define the predicates Eq(i), Uq(i), and Pq(i) for the case
in which the instruction at location q is IfLeq(∗l1, ∗l2, q′). Show that the
resulting sub-formula Fq′ can be constructed in O(p^5(n)) time.
Exercise 16.35 Define the predicates Eq(i) and Uq(i) for the case in which
the instruction at location q is Add(∗l1, ∗l2). Show that the resulting sub-
formula Fq′ can be constructed in O(p^5(n)) time.
Exercise 16.36 Define the predicates Eq(i) and Uq(i) for the case in which
the instruction at location q is Subtract(∗l1, ∗l2). Show that the resulting
sub-formula Fq′ can be constructed in O(p^5(n)) time.
Exercise 16.37 Define the predicates Eq(i) and Uq(i) for the case in which
the instruction at location q is Shift(∗l). Show that the resulting sub-
formula Fq′ can be constructed in O(p^4(n)) time.
* Exercise 16.38 Define the predicates Iq(i), Eq(i), and Uq(i) for the case
in which the instruction at location q is Input(1, ∗l). Show that the resulting
sub-formula Fq′ can be constructed in O(p^5(n)) time.
solution to Exercise 16.30 is attributed to Perl and Zaks by Garey and Johnson [52].
Axis and Allies™ (mentioned in Exercise 16.25) is a registered trademark of Hasbro, Inc.
Chapter 17
Approximation Algorithms
• the time required to obtain a solution for x, excluding any time needed
to solve instances of Y , is bounded above by p(|x|); and
17.2 Knapsack
The first problem we will examine is the 0-1 knapsack problem, as defined
in Section 12.4. As is suggested by Exercise 16.18, the associated decision
problem is N P-complete; hence, the optimization problem is N P-hard.
Consider the following greedy strategy for filling the knapsack. Suppose
we take an item whose ratio of value to weight is maximum. If this item
won’t fit, we discard it and solve the remaining problem. Otherwise, we
include it in the knapsack and solve the problem that results from removing
this item and decreasing the capacity by its weight. We have thus reduced
the problem to a smaller instance of itself. Clearly, this strategy results
in a set of items whose total weight does not exceed the weight bound.
Furthermore, it is not hard to implement this strategy in O(n lg n) time,
where n is the number of items.
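A minimal sketch of this greedy strategy follows; the list-based input format is an illustrative assumption, and this is the plain greedy rule rather than the KnapsackApprox algorithm referred to later.

# Greedy strategy: repeatedly take the item with the largest value-to-weight
# ratio that still fits.  Sorting dominates, giving O(n lg n) time.

def greedy_knapsack(weights, values, W):
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    taken, remaining = [], W
    for i in order:
        if weights[i] <= remaining:        # otherwise discard the item
            taken.append(i)
            remaining -= weights[i]
    return taken

# e.g. greedy_knapsack([1, 10], [2, 10], 10) returns [0] (value 2),
# while the optimal packing is item 1 alone (value 10).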
Because the problem is N P-hard, we would not expect this greedy strat-
egy to yield an optimal solution in all cases. What we need is a way to
measure how good an approximation to an optimal solution it provides. In
order to motivate an analysis, let us consider a simple example. Consider
the following instance consisting of two items:
• The first item has weight 1 and value 2.
• The second item has weight 10 and value 10.
• The weight bound is 10.
The value-to-weight ratios of the two items are 2 and 1, respectively. The
greedy algorithm therefore takes the first item first. Because the second item
will no longer fit, the solution provided by the greedy algorithm consists of
the first item by itself. The value of this solution is 2. However, it is easily
seen that the optimal solution is the second item by itself. This solution has
a value of 10.
A common way of measuring the quality of an approximation is to form
a ratio with the actual value. Specifically, for a maximization problem, we
define the approximation ratio of a given approximation to be the ratio of
the optimal value to the approximation. Thus, the approximation ratio for
the above example is 5. For a minimization problem, we use the reciprocal
of this ratio, so that the approximation ratio is always at least 1. As the ap-
proximation ratio approaches 1, the approximation approaches the optimal
value.
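As a small illustration of this definition (the function and its interface are purely illustrative):

# Approximation ratio, defined so that it is always at least 1:
# optimal/approximation for a maximization problem, the reciprocal otherwise.

def approximation_ratio(approx_value, optimal_value, maximization=True):
    if maximization:
        return optimal_value / approx_value
    return approx_value / optimal_value

print(approximation_ratio(2, 10))   # 5.0, the ratio in the example above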
Note that for a minimization problem, the approximation ratio cannot
take a finite value if the optimal value is 0. For this reason, we will restrict
Proof: We begin by showing the lower bound. Let ǫ ∈ R>0 , and without
loss of generality, assume ǫ < 1. We first define the weight bound as
W = 2⌈4/ǫ⌉.
• The second and third items each have a weight and value of W/2.
The optimal solution clearly consists of the second and third items. This
solution has value W . Each iteration of the outer loop of KnapsackAp-
prox yields a solution containing the first item and one of the other two.
The solution returned by this algorithm therefore has a value of W/2 + 2.
S = {i | 1 ≤ i ≤ n, A[i] = true},
then
    Σ_{i∈S} w[i] ≤ W.
Because items i and k both belong to S and v[i] ≥ v[k], v[k] ≤ V (S)/2.
We therefore have
in Θ(nV ) time, where n is the number of items and V is the sum of their
values. We can make V as small as we wish by replacing each value v by
⌊v/d⌋ for some positive integer d. If some of the values become 0, we re-
move these items. Observe that because we don’t change any weights or the
weight bound, any packing for the new instance is a packing for the origi-
nal. However, because we take the floor of each v/d, the optimal packing
for the new instance might not be optimal for the original. The smaller we
make d, the better our approximation, but the less efficient our dynamic
programming algorithm.
In order to determine an appropriate value for d, we need to analyze
the approximation ratio of this approximation algorithm. Let S be some
optimal set of items. The optimal value is then
V∗ = Σ_{i∈S} vi.
Let v be the largest value of any item in the original instance. Assuming
that no item’s weight exceeds the weight bound, we can conclude that V ∗ ≥
v. Thus, if v ≥ (k + 1)n, we can satisfy the above inequality by setting d to
⌊v/((k + 1)n)⌋. However, if v < (k + 1)n, this value becomes 0. In this case,
we can certainly set d to 1, as the dynamic programming algorithm would
then give the optimal solution. We therefore set
d = max(⌊v/((k + 1)n)⌋, 1).
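The scaling step can be sketched as follows. The Θ(nV) dynamic-programming algorithm of Section 12.4 is assumed to be available as knapsack_by_value, returning the indices of the chosen items; it is not reproduced here, and the names are illustrative.

# Scaling step: divide each value by d, drop items whose scaled value is 0,
# and run the value-based dynamic-programming algorithm on the scaled instance.

def scaled_knapsack(weights, values, W, k, knapsack_by_value):
    n = len(values)
    v = max(values)                        # largest item value
    d = max(v // ((k + 1) * n), 1)         # d = max(floor(v/((k+1)n)), 1)
    scaled = [val // d for val in values]
    keep = [i for i in range(n) if scaled[i] > 0]
    sol = knapsack_by_value([weights[i] for i in keep],
                            [scaled[i] for i in keep], W)
    return [keep[i] for i in sol]          # indices into the original items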
We can clearly compute the scaled values in O(n) time. If v ≥ 2(k + 1)n,
the sum of the scaled values is no more than

    nv/d = nv/⌊v/((k + 1)n)⌋
         ≤ nv/((v − (k + 1)n)/((k + 1)n))
         = (k + 1)n^2 v/(v − (k + 1)n)
         ≤ (k + 1)n^2 v/(v/2)
         = 2(k + 1)n^2.
problem, we have

    V∗/V ≤ 1 + 1/k
    V∗ ≤ V + V∗/k
    V∗ − V ≤ V∗/k
            < 1.
Proof: We will first show as an invariant of the for loop that at most one
bin is no more than half full. This clearly holds initially. Suppose it holds
at the beginning of some iteration. If w[i] > W/2, then no matter where
w[i] is placed, it cannot increase the number of bins that are no more than
half full. Suppose w[i] ≤ W/2. Then if there is a bin that is no more than
half full, w[i] will fit into this bin. Thus, the only case in which the number
of bins that are no more than half full increases is if there are no bins that
are no more than half full. In this case, the number cannot be increased to
more than one.
We conclude that the packing returned by this algorithm has at most
one bin that is no more than half full. Suppose this packing consists of k
bins. The total weight must therefore be strictly larger than (k − 1)W/2.
BinPackingFF(W, w[1..n])
  B ← new Array[1..n]; slack ← new Array[1..n]; numBins ← 0
  for i ← 1 to n
    j ← 1
    while j ≤ numBins and w[i] > slack[j]
      j ← j + 1
    if j > numBins
      numBins ← numBins + 1; B[j] ← new ConsList(); slack[j] ← W
    B[j] ← new ConsList(i, B[j]); slack[j] ← slack[j] − w[i]
  return B[1..numBins]
The optimal packing must therefore contain more than (k −1)/2 bins. Thus,
the number of bins in the optimal packing is at least
    (k − 1)/2 + 1 = (k + 1)/2
                  ≥ k/2.
The approximation ratio is therefore at most 2.
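The first-fit strategy of BinPackingFF can also be rendered as follows, with ordinary lists standing in for the ConsList representation; the rendering is illustrative only.

# First-fit bin packing: each item goes into the first bin with enough
# remaining slack, or into a new bin if none has room.

def bin_packing_ff(W, w):
    bins, slack = [], []
    for i, weight in enumerate(w):
        for j in range(len(bins)):
            if weight <= slack[j]:         # first bin with room
                bins[j].append(i)
                slack[j] -= weight
                break
        else:                              # no existing bin has room
            bins.append([i])
            slack.append(W - weight)
    return bins

print(bin_packing_ff(10, [6, 7, 3, 4]))    # [[0, 2], [1], [3]]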
not give a polynomial-time algorithm for ǫ-ApproxBP for any ǫ < 3/2,
as the proof of Theorem 17.5 essentially shows the hardness of deciding
whether B ∗ = 2. Furthermore, Theorem 17.5 does not preclude the existence
of a pseudopolynomial algorithm with an approximation ratio bounded by
some value less than 3/2. We leave it as an exercise to show that dynamic
programming can be combined with the first-fit decreasing strategy to yield,
for any positive ǫ, an approximation algorithm with an approximation ratio
bounded by 11/9 + ǫ.
Proof: Let ǫ > 0, and let HC be the problem of deciding whether a given
undirected graph G contains a Hamiltonian cycle. By Exercise 16.9, HC
is N P-complete. Since there are no integers in the problem instance, it is
strongly N P-complete. We will show that HC ≤ppT ǫ-ApproxTSP, where
≤ppT denotes a pseudopolynomial Turing reduction. It will then follow that
ǫ-ApproxTSP is N P-hard in the strong sense.
MetricTspSearcher(n)
  pre ← new VisitCounter(n); order ← new Array[0..n − 1]

MetricTspSearcher.PreProc(i)
  pre.Visit(i); order[pre.Num(i)] ← i
11.1, page 379) or Prim’s algorithm (Figure 11.2, page 382). Our graph is
complete, so that the number of edges is in Θ(n^2), where n is the number of
vertices. If we are using a ListGraph representation, Kruskal's algorithm
is more efficient, running in Θ(n^2 lg n) time. However, if we are using a
MatrixGraph representation, Prim's algorithm is more efficient, running
in Θ(n^2) time. Because we can construct a MatrixGraph from a ListGraph
in Θ(n^2) time, we will use Prim's algorithm. The entire algorithm
is shown in Figure 17.5.
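A self-contained sketch of this strategy is given below: compute a minimum spanning tree of the complete graph with Prim's algorithm in Θ(n^2) time, then list the vertices in preorder of a depth-first traversal of the tree. A plain distance matrix and a simple Prim implementation stand in for the MatrixGraph, ListMultigraph, Selector, and MetricTspSearcher classes used in Figure 17.5.

# MST-based tour construction: Prim's algorithm (O(n^2)), followed by a
# preorder walk of the resulting tree.  When the distances satisfy the
# triangle inequality, the tour has cost at most twice the optimal.

def metric_tsp_approx(dist):
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):                     # Prim's algorithm
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    tour, stack = [], [0]                  # preorder walk of the tree
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

print(metric_tsp_approx([[0, 1, 2, 2], [1, 0, 1, 2], [2, 1, 0, 1], [2, 2, 1, 0]]))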
Assuming G is a MatrixGraph, the call to Prim runs in Θ(n^2) time.
The ListMultigraph constructor then runs in Θ(n) time. Because the
ConsList returned by Prim contains exactly n − 1 edges, and the List-
Multigraph.Put operation runs in Θ(1) time, the loop runs in Θ(n) time.
As was shown in Section 9.5, the ListGraph constructor runs in Θ(n)
time. The Selector constructor runs in Θ(n) time, and the MetricTsp-
Searcher constructor clearly runs in Θ(1) time. Because the MetricTsp-
Proof: For a given vertex i, let Wi denote the sum of the weights of all
edges {i, j} such that 0 ≤ j < i. At the end of iteration i, the value of the
cut increases by Wi − clusterInc[m], where clusterInc[m] is the sum of the
weights of the edges from i to other vertices in partition m. m is chosen
so that clusterInc[m] is minimized; hence, for each partition other than m,
the sum of the weights of the edges from i to vertices in that partition is at
least clusterInc[m]. Summing over all k partitions gives Wi ≥ k · clusterInc[m], so that

    clusterInc[m] ≤ Wi/k.
[Figure: a small example graph on vertices 0–3 with edge weights x and 1, comparing a maximum cut with the cut produced by the approximation algorithm.]
weight x. Because the sum of all other edges is less than x, the maximum cut
must contain all of the edges with weight x. Note that the edge connecting any two of
the vertices 0, k, . . . , k^2 − k has weight x; hence, a maximum cut
must place all of these k vertices into different clusters. Then for 0 ≤ i < k
and 1 ≤ j < k, there is an edge of weight x between vertex ik + j and i′k
for each i′ ≠ i. As a result, vertex ik + j must be placed in the same cluster
as vertex ik. Hence, the maximum cut partitions the vertices so that each
group forms a cluster.
Now consider the behavior of MaxCut on G. It will first place vertices
0, 1, . . . , k − 1 into different clusters. Then vertex k is adjacent to exactly
one vertex in each of the clusters via an edge with weight x. Because placing
k in any of the clusters would increase the weight of that cluster by x, k is
placed in the first cluster with vertex 0. Placing k + 1 in this cluster would
increase its weight by x+1; however, placing k +1 in any other cluster would
increase that cluster’s weight by only x. As a result, k + 1 is placed in the
second cluster with vertex 1. It is easily seen that the algorithm continues
by placing vertices into clusters in round-robin fashion, so that each cluster
ultimately contains exactly one vertex from each group.
We can use symmetry to help us to evaluate the approximation ratio of
MaxCut on G. In the maximum cut, each vertex is adjacent to (k − 1)k
vertices in other clusters via edges whose weights are all x. In the cut
produced by MaxCut, each vertex is also adjacent to (k − 1)k vertices in
other clusters; however, only (k − 1)2 of these edges have weight x, while the
remaining edges each have weight 1. The approximation ratio is therefore
    (k − 1)kx / ((k − 1)^2 x + k − 1) = kx / ((k − 1)x + 1),

which approaches

    k/(k − 1) = 1 + 1/(k − 1)
as x approaches ∞. The bound of Theorem 17.8 is therefore tight.
Let us now analyze the approximation ratio for MaxCut as an approx-
imation algorithm for the minimum cluster problem. Again we can use
symmetry to simplify the analysis. In the optimal solution, each vertex is
adjacent to k − 1 vertices in the same cluster via edges whose weights are
all 1. In the solution given by MaxCut, each vertex is adjacent to k − 1
vertices in the same cluster via edges whose weights are all x. Thus, the
approximation ratio is x, which can be chosen to be arbitrarily large. We
can therefore see that even though the maximum cut and minimum clus-
ter optimization problems are essentially the same, the MaxCut algorithm
yields vastly different approximation ratios relative to the two problems.
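For reference, the greedy algorithm analyzed above can be sketched as follows; the matrix representation of the weighted graph and the tie-breaking rule (lowest-numbered partition) are illustrative assumptions consistent with the example above.

# Greedy heuristic: each vertex i joins the partition m that minimizes
# clusterInc[m], the total weight of edges from i to vertices already placed
# in that partition.

def greedy_max_cut(weight, k):
    # weight[i][j] is the weight of edge {i, j}; weight[i][i] is ignored
    n = len(weight)
    cluster = [0] * n
    for i in range(n):
        cluster_inc = [0] * k
        for j in range(i):                 # only vertices placed so far
            cluster_inc[cluster[j]] += weight[i][j]
        cluster[i] = min(range(k), key=lambda m: cluster_inc[m])
    return cluster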
To carry this idea a step further, we will now show that the minimum
cluster problem has no approximation algorithm with a bounded approxi-
mation ratio unless P = N P. For a given ǫ ∈ R>0 and integer k ≥ 2, let the
ǫ-Approx-k-Cluster problem be the problem of finding, for a given com-
plete undirected graph G with positive integer edge weights, a k-cut whose
sum of cluster weights is at most W ∗ (1 + ǫ), where W ∗ is the minimum sum
of cluster weights. Likewise, let ǫ-Approx-Cluster be the corresponding
problem with k provided as an input. We will show that for every positive ǫ
and every integer k ≥ 3, the ǫ-Approx-k-Cluster problem is N P-hard in
the strong sense. Because ǫ-Approx-3-Cluster ≤ppT ǫ-Approx-Cluster,
it will then follow that this latter problem is also N P-hard in the strong
sense. Whether the result extends to ǫ-Approx-2-Cluster is unknown at
the time of this writing.
17.6 Summary
Using Turing reducibility, we can extend the definition of N P-hardness from
Chapter 16 to apply to problems other than decision problems in a natural
way. We can then identify certain optimization problems as being N P-hard,
either in the strong sense or the ordinary sense. One way of coping with
N P-hard optimization problems is by using approximation algorithms.
For some N P-hard optimization problems we can find polynomial ap-
proximation schemes, which take as input an instance x of the problem and
a positive real number ǫ and return, in time polynomial in |x|, an approxi-
mate solution with approximation ratio no more than 1+ǫ. If this algorithm
runs in time polynomial in |x| and 1/ǫ, it is called a fully polynomial ap-
proximation scheme.
However, Theorem 17.4 tells us that for most optimization problems, if
17.7 Exercises
Exercise 17.1 Give an approximation algorithm that takes an instance of
the knapsack problem and a positive integer k and returns a packing with
an approximation ratio of no more than 1 + 1/k. Your algorithm must run in
O(n^(k+1)) time. Prove that both of these bounds (approximation ratio and
running time) are met by your algorithm.
Exercise 17.3 The best-fit algorithm for bin packing considers the items
in the given order, always choosing the largest-weight bin in which the item
will fit. Show that the approximation ratio for this algorithm is no more
than 2.
* Exercise 17.5
* Exercise 17.7
scheme of Section 17.2 is due to Ibarra and Kim [66]. Theorem 17.7 was
shown by Sahni and Gonzalez [94].