Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon
Lecture 1: Introduction
The goal of this class is to teach you to solve computation problems, and to communicate that
your solutions are correct and efficient.
Problem
• Binary relation from problem inputs to correct outputs
• Usually don’t specify every correct output for all inputs (too many!)
– Example: Given any set of n students, is there a pair of students with same birthday?
– If birthday is just one of 365, for n > 365, answer always true by pigeon-hole
– Assume resolution of possible birthdays exceeds n (include year, time, etc.)
Algorithm
• Procedure mapping each input to a single output (deterministic)
• Algorithm solves a problem if it returns a correct output for every problem input
Correctness
• Programs/algorithms have fixed size, so how to prove correct?
• For arbitrarily large inputs, algorithm must be recursive or loop in some way
• Must use induction (why recursion is such a key concept in computer science)
Efficiency
• How fast does an algorithm produce a correct output?
– Upper bounds (O), lower bounds (Ω), tight bounds (Θ); written with ∈ or =, read as "is" or "on the order of"
– Time estimate below based on one operation per cycle on a 1 GHz single-core machine
– Particles in universe estimated < 10^100
Model of Computation
• Specification for what operations on the machine can be performed in O(1) time
• Processor supports many constant time operations on a O(1) number of words (integers):
Data Structure
• A data structure is a way to store non-constant data that supports a set of operations
• Data structures may implement the same interface with different performance
• Example: Static Array - fixed width slots, fixed length, static sequence interface
• Like Python tuple plus set at(i, x), Python list is a dynamic array (see L02)
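The birthday example below stores records in a static array through get_at / set_at. A minimal sketch of such a StaticArray class (illustrative, assuming only the fixed-length interface described above):

class StaticArray:
    '''Sketch: fixed-length sequence with O(1) get_at / set_at.'''
    def __init__(self, n):
        self.data = [None] * n            # allocate n fixed-width slots

    def get_at(self, i):                  # O(1)
        assert 0 <= i < len(self.data)
        return self.data[i]

    def set_at(self, i, x):               # O(1)
        assert 0 <= i < len(self.data)
        self.data[i] = x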
def birthday_match(students):
    '''
    Find a pair of students with the same birthday
    Input:  tuple of student (name, bday) tuples
    Output: tuple of student names or None
    '''
    n = len(students)                        # O(1)
    record = StaticArray(n)                  # O(n)
    for k in range(n):                       # n
        (name1, bday1) = students[k]         # O(1)
        # Return pair if bday1 in record
        for i in range(k):                   # k
            (name2, bday2) = record.get_at(i)   # O(1)
            if bday1 == bday2:                  # O(1)
                return (name1, name2)           # O(1)
        record.set_at(k, (name1, bday1))     # O(1)
    return None                              # O(1)
Lecture 2: Data Structures
Array Sequence
• Array is great for static operations! get at(i) and set at(i, x) in Θ(1) time!
• But not so great at dynamic operations...
• (For consistency, we maintain the invariant that array is full)
• Then inserting and removing items requires:
– reallocating the array
– shifting all items after the modified item
Linked List Sequence
• Each item stored in a node which contains a pointer to the next node in sequence
• Can now insert and delete from the front in Θ(1) time! Yay!
• (Inserting/deleting efficiently from back is also possible; you will do this in PS1)
• But now get at(i) and set at(i, x) each take O(n) time... :(
Dynamic Array Sequence
• Idea! Allocate extra space so reallocation does not occur with every dynamic operation
• Whenever array is full (r = 1), allocate Θ(n) extra space at end to fill ratio ri (e.g., 1/2)
Amortized Analysis
• Data structure analysis technique to distribute cost over many operations
• “T (n) amortized” roughly means T (n) “on average” over many operations
• However, can be very wasteful in space. Want size of data structure to stay Θ(n)
• Attempt: if very empty, resize to r = 1. Alternating insertion and deletion could be bad...
• Idea! When r < rd , resize array to ratio ri where rd < ri (e.g., rd = 1/4, ri = 1/2)
• Then Θ(n) cheap operations must be made before next expensive resize
• Can limit extra space usage to (1 + ε)n for any ε > 0 (set rd = 1/(1 + ε), ri = (rd + 1)/2)
• Python List append and pop are amortized O(1) time, other operations can be O(n)!
• (Inserting/deleting efficiently from front is also possible; you will do this in PS1)
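To make the resizing policy above concrete, here is a sketch of a dynamic array that grows when full and shrinks when the fill ratio drops below rd = 1/4, rebuilding to ratio ri = 1/2 (class and method names are illustrative, not the notes' own implementation):

class DynamicArrayStack:
    '''Sketch: amortized O(1) insert_last/delete_last via resizing.'''
    def __init__(self):
        self.size = 0                      # number of stored items n
        self.data = [None] * 1             # allocated slots

    def _resize(self, capacity):           # O(n) rebuild into a new allocation
        new = [None] * max(capacity, 1)
        for i in range(self.size):
            new[i] = self.data[i]
        self.data = new

    def insert_last(self, x):              # amortized O(1)
        if self.size == len(self.data):    # full (r = 1): grow so r becomes 1/2
            self._resize(2 * self.size)
        self.data[self.size] = x
        self.size += 1

    def delete_last(self):                 # amortized O(1)
        self.size -= 1
        x = self.data[self.size]
        self.data[self.size] = None
        if self.size <= len(self.data) // 4:   # r < rd = 1/4: shrink so r becomes ri = 1/2
            self._resize(2 * self.size)
        return x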
Lecture 3: Sorting
• Storing items in an array in arbitrary order can implement a (not so efficient) set
Set                        Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Array           n           n          n            n            n
Sorted Array    n log n     log n      n            1            log n
Sorting
• Given a sorted array, we can leverage binary search to make an efficient set data structure.
• A sort is in place if it uses O(1) extra space (implies destructive: in place ⊆ destructive)
Permutation Sort
• There are n! permutations of A, at least one of which is sorted
• Example: [2, 3, 1] → {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}
from itertools import permutations

def is_sorted(B):                      # helper: check whether B is sorted
    return all(B[i - 1] <= B[i] for i in range(1, len(B)))

def permutation_sort(A):
    '''Sort A'''
    for B in permutations(A):          # O(n!)
        if is_sorted(B):               # O(n)
            return B                   # O(1)
Solving Recurrences
• Substitution: Guess a solution, replace with representative function, recurrence holds true
• Recurrence Tree: Draw a tree representing the recursive calls and sum computation at nodes
Selection Sort
• Find a largest number in prefix A[:i + 1] and swap it to A[i]
• Example: [8, 2, 4, 9, 3], [8, 2, 4, 3, 9], [3, 2, 4, 8, 9], [3, 2, 4, 8, 9], [2, 3, 4, 8, 9]
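A sketch of this selection sort in the same recursive style as the insertion sort code below; prefix_max is an assumed helper returning the index of a largest item in the prefix:

def selection_sort(A, i=None):             # T(i)
    '''Sort A[:i + 1] (sketch following the prefix description above)'''
    if i is None: i = len(A) - 1           # O(1)
    if i > 0:                              # O(1)
        j = prefix_max(A, i)               # S(i)
        A[i], A[j] = A[j], A[i]            # O(1)
        selection_sort(A, i - 1)           # T(i - 1)

def prefix_max(A, i):                      # S(i)
    '''Return index of a maximum item in A[:i + 1]'''
    if i > 0:                              # O(1)
        j = prefix_max(A, i - 1)           # S(i - 1)
        if A[i] < A[j]:                    # O(1)
            return j                       # O(1)
    return i                               # O(1)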
Insertion Sort
• Recursively sort prefix A[:i]
• Sort prefix A[:i + 1] assuming that prefix A[:i] is sorted by repeated swaps
• Example: [8, 2, 4, 9, 3], [2, 8, 4, 9, 3], [2, 4, 8, 9, 3], [2, 4, 8, 9, 3], [2, 3, 4, 8, 9]
def insertion_sort(A, i = None):           # T(i)
    '''Sort A[:i + 1]'''
    if i is None: i = len(A) - 1           # O(1)
    if i > 0:                              # O(1)
        insertion_sort(A, i - 1)           # T(i - 1)
        insert_last(A, i)                  # S(i)

def insert_last(A, i):                     # S(i)
    '''Sort A[:i + 1] assuming sorted A[:i]'''
    if i > 0 and A[i] < A[i - 1]:          # O(1)
        A[i], A[i - 1] = A[i - 1], A[i]    # O(1)
        insert_last(A, i - 1)              # S(i - 1)
Merge Sort
• Recursively sort first half and second half (may assume power of two)
• Merge sorted halves into one sorted list (two finger algorithm)
• Example: [7, 1, 5, 6, 2, 4, 9, 3], [1, 7, 5, 6, 2, 4, 3, 9], [1, 5, 6, 7, 2, 3, 4, 9], [1, 2, 3, 4, 5, 6, 7, 9]
def merge_sort(A, a = 0, b = None):        # T(b - a = n)
    '''Sort A[a:b]'''
    if b is None: b = len(A)               # O(1)
    if 1 < b - a:                          # O(1)
        c = (a + b + 1) // 2               # O(1)
        merge_sort(A, a, c)                # T(n / 2)
        merge_sort(A, c, b)                # T(n / 2)
        L, R = A[a:c], A[c:b]              # O(n)
        merge(L, R, A, len(L), len(R), a, b)   # S(n)

def merge(L, R, A, i, j, a, b):            # S(b - a = n)
    '''Merge sorted L[:i] and R[:j] into A[a:b]'''
    if a < b:                              # O(1)
        if (j <= 0) or (i > 0 and L[i - 1] > R[j - 1]):   # O(1)
            A[b - 1] = L[i - 1]            # O(1)
            i = i - 1                      # O(1)
        else:                              # O(1)
            A[b - 1] = R[j - 1]            # O(1)
            j = j - 1                      # O(1)
        merge(L, R, A, i, j, a, b - 1)     # S(n - 1)
• merge analysis:
– Base case: for n = 0, arrays are empty, so vacuously correct
– Induction: assume correct for n, item in A[r] must be a largest number from remaining
prefixes of L and R, and since they are sorted, taking largest of last items suffices;
remainder is merged by induction
– S(0) = Θ(1), S(n) = S(n − 1) + Θ(1) =⇒ S(n) = Θ(n)
• merge sort analysis:
– Base case: for n = 1, array has one element so is sorted
– Induction: assume correct for k < n, algorithm sorts smaller halves by induction, and
then merge merges into a sorted array as proved above.
– T (1) = Θ(1), T (n) = 2T (n/2) + Θ(n)
∗ Substitution: Guess T (n) = Θ(n log n)
cn log n = Θ(n) + 2c(n/2) log(n/2) =⇒ cn log(2) = Θ(n)
∗ Recurrence Tree: complete binary tree with depth log₂ n and n leaves; level i has 2^i nodes with O(n/2^i) work each, total: Σ_{i=0}^{log₂ n} (2^i)(n/2^i) = Σ_{i=0}^{log₂ n} n = Θ(n log n)
Lecture 4: Hashing
Review
                           Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Array           n           n          n            n            n
Sorted Array    n log n     log n      n            1            log n
• Idea! Want faster search and dynamic operations. Can we find(k) faster than Θ(log n)?
Comparison Model
• In this model, assume algorithm can only differentiate items via comparisons
Decision Tree
• Any algorithm can be viewed as a decision tree of operations performed
• Need at least one leaf for each algorithm output, so search requires ≥ n + 1 leaves
• running time ≥ # comparisons ≥ max length of any root-to-leaf path ≥ height of tree
• Minimum height when binary tree is complete (all rows full except last)
• Height ≥ ⌈lg(n + 1)⌉ − 1 = Ω(log n), so running time of any comparison search is Ω(log n)
• More generally, height of tree with Θ(n) leaves and max branching factor b is Ω(logb n)
• To get faster, need an operation that allows super-constant ω(1) branching factor. How??
Direct Access Array
• Idea! Give item unique integer key k in {0, . . . , u − 1}, store item in an array at index k
• Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
• Example: if keys are ten-letter names, for one bit per name, requires 26^10 ≈ 17.6 TB space
Hashing
• Idea! If n ≪ u, map keys to a smaller range m = Θ(n) and use smaller direct access array
• Direct access array called hash table, h(k) called the hash of key k
Chaining
• Idea! Store collisions in another data structure (a chain)
• If keys roughly evenly distributed over indices, chain size is n/m = n/Ω(n) = O(1)!
• If chain has O(1) size, all operations take O(1) time! Yay!
• If not, many items may map to same location, e.g. h(k) = constant, chain size is Θ(n) :(
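A minimal sketch of a chained hash table; the hash function here is the standard universal form h(k) = ((a·k + b) mod p) mod m with a, b chosen at random, matching the parameterization described under Hash Functions below (class name, prime, and layout are illustrative assumptions):

import random

class ChainedHashTable:
    '''Sketch: hash table with chaining and a randomly chosen hash function.'''
    def __init__(self, m, p=2**31 - 1):           # p: a fixed prime assumed > u
        self.m, self.p = m, p
        self.a = random.randrange(1, p)           # random parameters for this table
        self.b = random.randrange(p)
        self.chains = [[] for _ in range(m)]      # one chain per array index

    def _hash(self, k):                           # h(k) = ((a*k + b) mod p) mod m
        return ((self.a * k + self.b) % self.p) % self.m

    def insert(self, k, v):                       # expected O(1) if chains stay small
        chain = self.chains[self._hash(k)]
        for i, (k2, _) in enumerate(chain):       # replace existing key if present
            if k2 == k:
                chain[i] = (k, v)
                return
        chain.append((k, v))

    def find(self, k):                            # expected O(1)
        for k2, v in self.chains[self._hash(k)]:
            if k2 == k:
                return v
        return None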
Hash Functions
• If u ≫ n, every hash function will have some input set that will create an O(n)-size chain
• Idea! Don’t use a fixed hash function! Choose one randomly (but carefully)!
• Parameterized by a fixed prime p > u, with a and b chosen from range {0, . . . , p − 1}
• Xij indicator random variable over h ∈ H: Xij = 1 if h(ki ) = h(kj ), Xij = 0 otherwise
• Size of chain at index h(ki) is the random variable Xi = Σ_j Xij
Dynamic
• If n/m far from 1, rebuild with new randomly chosen hash function for new size m
• Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
• So a hash table can implement dynamic set operations in expected amortized O(1) time! :)
                              Operations, worst-case O(·)
                     Container   Static     Dynamic      Order
Data Structure       build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                            delete(k)    find_max()   find_next(k)
Array                n           n          n            n            n
Sorted Array         n log n     log n      n            1            log n
Direct Access Array  u           1          1            u            u
Hash Table           n(e)        1(e)       1(a)(e)      n            n
Lecture 5: Linear Sorting
Review
• Comparison search lower bound: any decision tree with n nodes has height ≥ ⌈lg(n + 1)⌉ − 1
• Can do faster using random access indexing: an operation with linear branching factor!
• Direct access array is fast, but may use a lot of space (Θ(u))
• Expectation input-independent: choose hash function randomly from universal hash family
• Last time we achieved faster find. Can we also achieve faster sort?
                              Operations, worst-case O(·)
                     Container   Static     Dynamic      Order
Data Structure       build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                            delete(k)    find_max()   find_next(k)
Array                n           n          n            n            n
Sorted Array         n log n     log n      n            1            log n
Direct Access Array  u           1          1            u            u
Hash Table           n(e)        1(e)       1(a)(e)      n            n
def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    D = [None] * u                       # O(u) direct access array
    for x in A:                          # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):                 # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
Tuple Sort
• Item keys are tuples of equal length, i.e. item x.key = (x.k1, x.k2, x.k3, . . .).
• Want to sort on all entries lexicographically, so first key k1 is most significant
• How to sort? Idea! Use other auxiliary sorting algorithms to separately sort each key
• (Like sorting rows in a spreadsheet by multiple columns)
• What order to sort them in? Least significant to most significant!
• Exercise: [32, 03, 44, 42, 22] =⇒ [42, 22, 32, 03, 44] =⇒ [03, 22, 32, 42, 44](n=5)
• Idea! Use tuple sort with auxiliary direct access array sort to sort tuples (a, b).
• Problem! Many integers could have the same a or b value, even if input keys distinct
• Need sort allowing repeated keys which preserves input order
• Want sort to be stable: repeated keys appear in output in same order as input
• Direct access array sort cannot even sort arrays having repeated keys!
• Can we modify direct access array sort to admit multiple keys in a way that is stable?
Counting Sort
• Instead of storing a single item at each array index, store a chain, just like hashing!
• For stability, chain data structure should remember the order in which items were added
• Use a sequence data structure which maintains insertion order
• To insert item x, insert last to end of the chain at index x.key
• Then to sort, read through all chains in sequence order, returning items one by one
def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    D = [[] for i in range(u)]           # O(u) direct access array of chains
    for x in A:                          # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                      # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1
Radix Sort
• Idea! If u < n^2, use tuple sort with auxiliary counting sort to sort tuples (a, b)
• Sort least significant key b, then most significant key a
• Stability ensures previous sorts stay sorted
• Running time for this algorithm is O(2n) = O(n). Yay!
• If every key < n^c for some positive c = log_n(u), every key has at most c digits base n
• A c-digit number can be written as a c-element tuple in O(c) time
• We sort each of the c base-n digits in O(n) time
• So tuple sort with auxiliary counting sort runs in O(cn) time in total
• If c is constant, so each key is ≤ n^c, this sort is linear O(n)!
def radix_sort(A):
    "Sort A assuming items have non-negative keys"
    n = len(A)
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    c = 1 + (u.bit_length() // n.bit_length())
    class Obj: pass
    D = [Obj() for a in A]
    for i in range(n):                   # O(nc) make digit tuples
        D[i].digits = []
        D[i].item = A[i]
        high = A[i].key
        for j in range(c):               # O(c) make digit tuple
            high, low = divmod(high, n)
            D[i].digits.append(low)
    for i in range(c):                   # O(nc) sort each digit
        for j in range(n):               # O(n) assign key i to tuples
            D[j].key = D[j].digits[i]
        counting_sort(D)                 # O(n) sort on digit i
    for i in range(n):                   # O(n) output to A
        A[i] = D[i].item
Lecture 6: Binary Trees I
• Binary tree is pointer-based data structure with three pointers per node
• Example: (tree figure omitted)
Terminology
• The root of a tree has no parent (Ex: <A>)
• Define depth(<X>) of node <X> in a tree rooted at <R> to be length of path from <X> to <R>
• Define height(<X>) of node <X> to be max depth of any node in the subtree rooted at <X>
• Idea: Design operations to run in O(h) time for root height h, and maintain h = O(log n)
– Recursively list left subtree, list self, then recursively list right subtree
– Runs in O(n) time, since O(1) work is done to list each node
– Example: Traversal order is (<F>, <D>, <B>, <E>, <A>, <C>)
• Right now, traversal order has no meaning relative to the stored items
Tree Navigation
• Find first node in the traversal order of node <X>’s subtree (last is symmetric)
– If <X> has left child, recursively return the first node in the left subtree
– Otherwise, <X> is the first node, so return it
– Running time is O(h) where h is the height of the tree
– Example: first node in <A>’s subtree is <F>
Dynamic Operations
• Change the tree by a single item (only add or remove leaves):
– If <X> has no right child, make <Y> the right child of <X>
– Otherwise, make <Y> the left child of <X>’s successor (which cannot have a left child)
– Running time is O(h) where h is the height of the tree
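A sketch of the navigation and insertion operations described above, on a node with item/parent/left/right pointers (class and method names are illustrative):

class BinaryNode:
    def __init__(self, item):
        self.item = item
        self.left = self.right = self.parent = None

    def subtree_first(self):              # O(h): first node in traversal order of subtree
        node = self
        while node.left:
            node = node.left
        return node

    def successor(self):                  # O(h): next node in traversal order
        if self.right:
            return self.right.subtree_first()
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent

    def subtree_insert_after(self, new):  # O(h): insert node new directly after self
        if self.right is None:            # make new the right child of self
            self.right, new.parent = new, self
        else:                             # successor of self cannot have a left child
            succ = self.right.subtree_first()
            succ.left, new.parent = new, succ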
Application: Set
• Idea! Set Binary Tree (a.k.a. Binary Search Tree / BST):
Traversal order is sorted order increasing by key
– Equivalent to BST Property: for every node, every key in left subtree ≤ node’s key ≤
every key in right subtree
• Then can find the node with key k in node <X>’s subtree in O(h) time like binary search:
– If k is smaller than the key at <X>, recurse in left subtree (or return None)
– If k is larger than the key at <X>, recurse in right subtree (or return None)
– Otherwise, return the item stored at <X>
Application: Sequence
• Idea! Sequence Binary Tree: Traversal order is sequence order
• How do we find ith node in traversal order of a subtree? Call this operation subtree at(i)
• Could just iterate through entire traversal order, but that’s bad, O(n)
• However, if we could compute a subtree’s size in O(1), then can solve in O(h) time
• Maintain the size of each node’s subtree at the node via augmentation
• Naively, build(X) takes O(nh) time, but can be done in O(n) time; see recitation
So Far
Set                        Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Binary Tree     n log n     h          h            h            h
Goal            n log n     log n      log n        log n        log n

Sequence                   Operations, worst-case O(·)
                Container   Static          Dynamic
Data Structure  build(X)    get_at(i)       insert_first(x)   insert_last(x)   insert_at(i, x)
                            set_at(i, x)    delete_first()    delete_last()    delete_at(i)
Binary Tree     n           h               h                 h                h
Goal            n           log n           log n             log n            log n
Next Time
• Keep a binary tree balanced after insertion or deletion
Lecture 7: Binary Trees II: AVL
Height Balance
• How to maintain height h = O(log n) where n is number of nodes in tree?
• A binary tree that maintains O(log n) height under dynamic operations is called balanced
– There are many balancing schemes (Red-Black Trees, Splay Trees, 2-3 Trees, . . . )
– First proposed balancing scheme was the AVL Tree (Adelson-Velsky and Landis, 1962)
Rotations
• Need to reduce height of tree without changing its traversal order, so that we represent the
same sequence of items
• How to change the structure of a tree, while preserving traversal order? Rotations!
• A rotation relinks O(1) pointers to modify tree structure and maintains traversal order
Rotations Suffice
• Claim: O(n) rotations can transform a binary tree to any other with same traversal order.
• Proof: Repeatedly perform last possible right rotation in traversal order; resulting tree is a
canonical chain. Each rotation increases depth of the last node by 1. Depth of last node in
final chain is n − 1, so at most n − 1 rotations are performed. Reverse canonical rotations to
reach target tree.
• Can maintain height-balance by using O(n) rotations to fully balance the tree, but slow :(
– A node is height-balanced if heights of its left and right subtrees differ by at most 1
– Let skew of a node be the height of its right subtree minus that of its left subtree
– Then a node is height-balanced if its skew is −1, 0, or 1
• Claim: A binary tree with height-balanced nodes has height h = O(log n) (i.e., n = 2^Ω(h))
• Proof: Suffices to show fewest nodes F(h) in any height-h tree is F(h) = 2^Ω(h)
1 __<B>______ ______<F>____
2 <A> ___<F>___ __<B>___ <G>
3 / \ <D> <G> => <A> <D> / \
4 /___\ / \ / \ / \ / \ / \
5 /___\ / \ /___\ /___\ /_____\
6 /_____\ /_____\ /_____\
1 __<B>___________ _____<D>______
2 <A> _____<F>__ __<B>__ __<F>__
3 / \ __<D>__ <G> => <A> <C> <E> <G>
4 /___\ <C> <E> / \ / \ /_\ /_\ / \
5 /_\ /_\ /___\ /___\ /___\ /___\ /___\
6 /___\ /___\
• Global Rebalance: Add or remove a leaf from height-balanced tree T to produce tree T′.
Then T′ can be transformed into a height-balanced tree T″ using at most O(log n) rotations.
• Proof:
– Only ancestors of the affected leaf have different height in T′ than in T
– Affected leaf has at most h = O(log n) ancestors whose subtrees may have changed
– Let <X> be lowest ancestor that is not height-balanced (with skew magnitude 2)
– If a leaf was added into T :
∗ Insertion increases height of <X>, so in Case 2 or 3 of Local Rebalancing
∗ Rotation decreases subtree height: balanced after one rotation
– If a leaf was removed from T :
∗ Deletion decreased height of one child of <X>, not <X>, so only imbalance
∗ Could decrease height of <X> by 1; parent of <X> may now be imbalanced
∗ So may have to rebalance every ancestor of <X>, but at most h = O(log n) of them
• So can maintain height-balance using only O(log n) rotations after insertion/deletion!
• But requires us to evaluate whether possibly O(log n) nodes were height-balanced
Computing Height
• How to tell whether node <X> is height-balanced? Compute heights of subtrees!
• How to compute the height of node <X>? Naive algorithm:
– Recursively compute height of the left and right subtrees of <X>
– Add 1 to the max of the two heights
– Runs in Ω(n) time, since we recurse on every node :(
• Idea: Augment each node with the height of its subtree! (Save for later!)
• Height of <X> can be computed in O(1) time from the heights of its children:
– Look up the stored heights of left and right subtrees in O(1) time
– Add 1 to the max of the two heights
• During dynamic operations, we must maintain our augmentation as the tree changes shape
• Recompute subtree augmentations at every node whose subtree changes:
– Update relinked nodes in a rotation operation in O(1) time (ancestors don’t change)
– Update all ancestors of an inserted or deleted node in O(h) time by walking up the tree
– State the subtree property P(<X>) you want to store at each node <X>
– Show how to compute P(<X>) from the augmentations of <X>’s children in O(1) time
• Then stored property P(<X>) can be maintained without changing dynamic operation costs
Application: Sequence
• For sequence binary tree, we needed to know subtree sizes
• For just inserting/deleting a leaf, this was easy, but now need to handle rotations
– Can compute size from sizes of children by summing them and adding 1
Conclusion
• Set AVL trees achieve O(lg n) time for all set operations,
except O(n log n) time for build and O(n) time for iter
• Sequence AVL trees achieve O(lg n) time for all sequence operations,
except O(n) time for build and iter
Application: Sorting
• Any Set data structure defines a sorting algorithm: build (or repeatedly insert) then iter
Lecture 8: Binary Heaps
• Many sorting algorithms we’ve seen can be viewed as priority queue sort:
                       Priority Queue Operations O(·)          Priority Queue Sort
Data Structure         build(A)   insert(x)   delete_max()     Time      In-place?
Dynamic Array          n          1(a)        n                n^2       Y   (Selection Sort)
Sorted Dynamic Array   n log n    n           1(a)             n^2       Y   (Insertion Sort)
Set AVL Tree           n log n    log n       log n            n log n   N   (AVL Sort)
Goal                   n          log n(a)    log n(a)         n log n   Y   (Heap Sort)
• Can speed up find min() and find max() to O(1) time via subtree augmentation
• But this data structure is complicated and resulting sort is not in-place
• Is there a simpler data structure for just priority queue, and in-place O(n lg n) sort?
YES, binary heap and heap sort
• Essentially implement a Set data structure on top of a Sequence data structure (array), using
what we learned about binary trees
• delete max(): find max in O(n), swap max to the end and remove
• Can we find a compromise between these two array priority queue extremes?
1 d0 ______O____
2 d1 ____O____ __O__
3 d2 __O__ __O O O
4 d3 O O O
• Equivalently, complete tree is filled densely in reading order: root to leaves, left to right
• Height of the complete tree, viewed as an array of n items, is ⌈lg n⌉, so it is a balanced binary tree
• Root is at index 0; for the node at index i:
  left(i) = 2i + 1
  right(i) = 2i + 2
  parent(i) = ⌊(i − 1)/2⌋
Binary Heaps
• Idea: keep larger elements higher in tree, but only locally
• Claim: In a max-heap, every node i satisfies Q[i] ≥ Q[j] for all nodes j in subtree(i)
• Proof:
Heap Insert
• Append new item x to end of array in O(1) amortized, making it next leaf i in reading order
• Correctness:
– Max-Heap Property guarantees all nodes ≥ descendants, except Q[i] might be > some
of its ancestors (unless i is the root, so we’re done)
– If swap necessary, same guarantee is true with Q[parent(i)] instead of Q[i]
• So swap item at root node i = 0 with last item at node n − 1 in heap array
• max heapify down(i): swap root with larger child until Max-Heap Property
• Correctness:
– Max-Heap Property guarantees all nodes ≥ descendants, except Q[i] might be < some
descendants (unless i is a leaf, so we’re done)
– If swap is necessary, same guarantee is true with Q[j] instead of Q[i]
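A sketch of the two fix-up procedures implied above: max_heapify_down is named in the notes, and a symmetric max_heapify_up (assumed here) is what runs after appending on insert. The index helpers use the formulas given earlier.

def parent(i): return (i - 1) // 2
def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2

def max_heapify_up(Q, i):                   # O(log n): bubble Q[i] up toward the root
    p = parent(i)
    if i > 0 and Q[p] < Q[i]:
        Q[i], Q[p] = Q[p], Q[i]
        max_heapify_up(Q, p)

def max_heapify_down(Q, i, n=None):         # O(log n): push Q[i] down toward the leaves
    if n is None: n = len(Q)                # n = size of the heap prefix Q[:n]
    l, r, biggest = left(i), right(i), i
    if l < n and Q[biggest] < Q[l]: biggest = l
    if r < n and Q[biggest] < Q[r]: biggest = r
    if biggest != i:                        # swap with the larger child and recurse
        Q[i], Q[biggest] = Q[biggest], Q[i]
        max_heapify_down(Q, biggest, n)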
Heap Sort
• Plugging max-heap into priority queue sort gives us a new sorting algorithm
• Running time is O(n log n) because each insert and delete max takes O(log n)
• |Q| is initially zero, eventually |A| (after inserts), then zero again (after deletes)
• delete max() moves max item to end, then abandons it by decrementing |Q|
• In-place priority queue sort with Sorted Array is exactly Insertion Sort
• In-place priority queue sort with binary Max Heap is Heap Sort
• Idea! Treat full array as a complete binary tree from start, then max heapify down(i)
for i from n − 1 to 0 (leaves up):
worst-case swaps ≈ Σ_{i=0}^{n−1} height(i) = Σ_{i=0}^{n−1} (lg n − lg i) = lg(n^n / n!) = Θ(lg(n^n / (√n (n/e)^n))) = O(n)
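A sketch of this linear-time build plus the resulting in-place heap sort, reusing the max_heapify_down sketch above (helper names are illustrative, not the notes' listing):

def build_max_heap(Q):                       # O(n) total, by the swap count above
    n = len(Q)
    for i in range(n - 1, -1, -1):           # leaves up
        max_heapify_down(Q, i, n)

def heap_sort(A):                            # O(n log n), in place
    build_max_heap(A)
    for size in range(len(A) - 1, 0, -1):    # |Q| shrinks from n to 1
        A[0], A[size] = A[size], A[0]        # move current max to the end
        max_heapify_down(A, 0, size)         # restore Max-Heap Property on A[:size]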
Lecture 9: Breadth-First Search
Graph Applications
• Why? Graphs are everywhere!
• the state space of any discrete system can be represented by a transition graph
Graph Definitions
(Figure: example graphs G1, G2, G3 omitted)
• Undirected edges are unordered pairs, e.g., {u, v} for u, v ∈ V, i.e., both (u, v) and (v, u)
• In this class, we assume all graphs are simple:
– edges are distinct, e.g., (u, v) only occurs once in E (though (v, u) may appear), and
– edges are pairs of distinct vertices, e.g., u ≠ v for all (u, v) ∈ E
– Simple implies |E| = O(|V|^2), since |E| ≤ C(|V|, 2) for undirected, ≤ 2·C(|V|, 2) for directed
Neighbor Sets/Adjacencies
• The outgoing neighbor set of u ∈ V is Adj+ (u) = {v ∈ V | (u, v) ∈ E}
• For undirected graphs, Adj− (u) = Adj+ (u) and deg− (u) = deg+ (u)
• Dropping superscript defaults to outgoing, i.e., Adj(u) = Adj+ (u) and deg(u) = deg+ (u)
Graph Representations
• To store a graph G = (V, E), we need to store the outgoing edges Adj(u) for all u ∈ V
• Then for each u, need to store Adj(u) in another data structure called an adjacency list
• Common to use direct access array or hash table for Adj, since want lookup fast by vertex
• Common to use array or linked list for each Adj(u) since usually only iteration is needed1
• For the common representations, Adj has size Θ(|V |), while each Adj(u) has size Θ(deg(u))
• Since Σ_{u∈V} deg(u) ≤ 2|E| by the handshaking lemma, a graph is storable in Θ(|V| + |E|) space
• Thus, for algorithms on graphs, linear time will mean Θ(|V | + |E|) (linear in size of graph)
Examples
• Examples 1 and 2 assume vertices are labeled {0, 1, . . . , |V | − 1}, so can use a direct access
array for Adj, and store Adj(u) in an array. Example 3 uses a hash table for Adj.
• Note that in an undirected graph, connections are symmetric as every edge is outgoing twice
¹ A hash table for each Adj(u) can allow checking for an edge (u, v) ∈ E in O(1)(e) time
Paths
• A path is a sequence of vertices p = (v1 , v2 , . . . , vk ) where (vi , vi+1 ) ∈ E for all 1 ≤ i < k.
• The distance δ(u, v) from u ∈ V to v ∈ V is the minimum length of any path from u to v,
i.e., the length of a shortest path from u to v
(by convention, δ(u, v) = ∞ if u is not connected to v)
• SINGLE PAIR SHORTEST PATH(G, s, t): return distance δ(s, t), and a shortest path in G = (V, E) from s ∈ V to t ∈ V
• SINGLE SOURCE SHORTEST PATHS(G, s): return δ(s, v) for all v ∈ V, and a shortest-path tree containing a shortest path from s to every v ∈ V (defined below)
• Instead, show one algorithm that solves the hardest in O(|V | + |E|) time!
• Many paths could have length Ω(|V |), so returning every path could require Ω(|V |2 ) time
• Instead, for all v ∈ V , store its parent P (v): second to last vertex on a shortest path from s
• Let P (s) be null (no second to last vertex on shortest path from s to s)
² A path in 6.006 is a “walk” in 6.042. A “path” in 6.042 is a simple path in 6.006.
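A sketch of BFS computing level-by-level distances and parent pointers from s, assuming Adj maps each vertex to its outgoing neighbors (e.g., a dict of lists):

def bfs(Adj, s):
    '''Sketch: return (dist, parent) dicts of shortest-path distances from s.'''
    dist, parent = {s: 0}, {s: None}
    frontier = [s]                              # vertices at the current level
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in Adj[u]:                    # each edge explored at most once
                if v not in dist:               # first visit is at minimum distance
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return dist, parent                         # O(|V| + |E|) time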
Lecture 10: Depth-First Search
Previously
• Graph definitions (directed/undirected, simple, neighbors, degree)
Examples
(Figure: example graphs G1 and G2 omitted)
Correctness
• Claim: DFS visits v and correctly sets P (v) for every vertex v reachable from s
• Proof: induct on k, for claim on only vertices within distance k from s
Running Time
• Algorithm visits each vertex u at most once and spends O(1) time for each v ∈ Adj(u)
• Work upper bounded by O(1) × Σ_{u∈V} deg(u) = O(|E|)
• Unlike BFS, not returning a distance for each vertex, so DFS runs in O(|E|) time
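A sketch of recursive DFS from s setting parent pointers, with Adj as before; full_dfs repeats it from every unvisited vertex and also records the finishing order used later for topological sort:

def dfs(Adj, s, parent=None, order=None):
    '''Sketch: visit all vertices reachable from s; append each vertex to
    order as its visit finishes.'''
    if parent is None: parent, order = {s: None}, []
    for v in Adj[s]:
        if v not in parent:                 # O(1) work per outgoing edge
            parent[v] = s
            dfs(Adj, v, parent, order)
    order.append(s)                         # s finishes after all its descendants
    return parent, order

def full_dfs(Adj):
    '''Sketch: DFS from every unvisited vertex; reversed(order) is a
    topological order when the graph is a DAG.'''
    parent, order = {}, []
    for s in Adj:
        if s not in parent:
            parent[s] = None
            dfs(Adj, s, parent, order)
    return parent, order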
– Choose an arbitrary unvisited vertex s, use A to explore all vertices reachable from s
• Visits every vertex once, so both Full-BFS and Full-DFS run in O(|V | + |E|) time
Graph Connectivity
• An undirected graph is connected if there is a path connecting every pair of vertices
• In a directed graph, vertex u may be reachable from v, but v may not be reachable from u
• Connectivity is more complicated for directed graphs (we won’t discuss in this class)
• Proof: Run Full-A. For each run of A, put visited vertices in a connected component
Topological Sort
• A Directed Acyclic Graph (DAG) is a directed graph that contains no directed cycle.
• A Topological Order of a graph G = (V, E) is an ordering f on the vertices such that:
every edge (u, v) ∈ E satisfies f (u) < f (v).
• Exercise: Prove that a directed graph admits a topological ordering if and only if it is a DAG.
• How to find a topological order?
• A Finishing Order is the order in which a Full-DFS finishes visiting each vertex in G
• Claim: If G = (V, E) is a DAG, the reverse of a finishing order is a topological order
• Proof: Need to prove, for every edge (u, v) ∈ E that u is ordered before v,
i.e., the visit to v finishes before visiting u. Two cases:
– If u visited before v:
∗ Before visit to u finishes, will visit v (via (u, v) or otherwise)
∗ Thus the visit to v finishes before visiting u
– If v visited before u:
∗ u can’t be reached from v since graph is acyclic
∗ Thus the visit to v finishes before visiting u
Cycle Detection
• Full-DFS will find a topological order if a graph G = (V, E) is acyclic
• If reverse finishing order for Full-DFS is not a topological order, then G must contain a cycle
• Check if G is acyclic: for each edge (u, v), check if v is before u in reverse finishing order
• Can be done in O(|E|) time via a hash table or direct access array
• To return such a cycle, maintain the set of ancestors along the path back to s in Full-DFS
• Claim: If G contains a cycle, Full-DFS will traverse an edge from v to an ancestor of v.
• Proof: Consider a cycle (v0 , v1 , . . . , vk , v0 ) in G
– Without loss of generality, let v0 be the first vertex visited by Full-DFS on the cycle
– For each vi , before visit to vi finishes, will visit vi+1 and finish
– Will consider edge (vi , vi+1 ), and if vi+1 has not been visited, it will be visited now
– Thus, before visit to v0 finishes, will visit vk (for the first time, by v0 assumption)
– So, before visit to vk finishes, will consider (vk , v0 ), where v0 is an ancestor of vk
Lecture 11: Weighted Shortest Paths
Review
• Single-Source Shortest Paths with BFS in O(|V | + |E|) time (return distance per vertex)
• Single-Source Reachability with BFS or DFS in O(|E|) time (return only reachable vertices)
Weighted Graphs
• A weighted graph is a graph G = (V, E) together with a weight function w : E → Z
– Inside graph representation: store edge weight with each vertex in adjacency lists
– Store separate Set data structure mapping each edge to its weight
• We assume a representation that allows querying the weight of an edge in O(1) time
Examples
G1 G2
−5 −1 5 −5 −1 5
a b c d a b c d
6 9 6 9
7 −4 1 4 7 −4 1 4
8 8
e f g h e f g h
3 2 −2 3 2 −2
Weighted Paths
• The weight w(π) of a path π in a weighted graph is the sum of weights of edges in the path
• (Often use “distance” for shortest-path weight in weighted graphs, not number of edges)
• Why infimum not minimum? Possible that no finite-length minimum-weight path exists
• When? Can occur if there is a negative-weight cycle in the graph, Ex: (b, f, g, c, b) in G1
• A negative-weight cycle is a path π starting and ending at same vertex with w(π) < 0
• If this occurs, don’t want a shortest path, but may want the negative-weight cycle
• (No parent pointers: can reconstruct shortest paths tree in linear time after. Next page!)
• Already know one algorithm: Breadth-First Search! Runs in O(|V | + |E|) time when, e.g.:
– graph has positive weights, and all weights are the same
– graph has positive weights, and sum of all weights at most O(|V | + |E|)
• For general weighted graphs, we don’t know how to solve SSSP in O(|V | + |E|) time
Shortest-Paths Tree
• For BFS, we kept track of parent pointers during search. Alternatively, compute them after!
• If know δ(s, v) for all vertices v ∈ V , can construct shortest-path tree in O(|V | + |E|) time
• For weighted shortest paths from s, only need parent pointers for vertices v with finite δ(s, v)
• Parent pointers may traverse cycles of zero weight. Mark each vertex in such a cycle.
– For each v ∈ Adj+ (u) where v is marked and δ(s, v) = δ(s, u) + w(u, v):
∗ Unmark vertices in cycle containing v by traversing parent pointers from v
∗ Set P (v) = u, breaking the cycle
• Exercise: Prove this algorithm correctly computes parent pointers in linear time
DAG Relaxation
• Idea! Maintain a distance estimate d(s, v) (initially ∞) for each vertex v ∈ V ,
that always upper bounds true distance δ(s, v), then gradually lowers until d(s, v) = δ(s, v)
• Triangle Inequality: the shortest-path weight from u to v cannot be greater than the shortest
path from u to v through another vertex x, i.e., δ(u, v) ≤ δ(u, x) + δ(x, v) for all u, v, x ∈ V
• If d(s, v) > d(s, u) + w(u, v) for some edge (u, v), then triangle inequality is violated :(
• Fix by lowering d(s, v) to d(s, u) + w(u, v), i.e., relax (u, v) to satisfy violated constraint
• Claim: Relaxation is safe: maintains that each d(s, v) is weight of a path to v (or ∞) ∀v ∈ V
• Proof: Assume d(s, v 0 ) is weight of a path (or ∞) for all v 0 ∈ V . Relaxing some edge (u, v)
sets d(s, v) to d(s, u) + w(u, v), which is the weight of a path from s to v through u.
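A sketch of DAG Relaxation, assuming the vertices are supplied in a topological order (e.g., reverse DFS finishing order) and weights are given as w[(u, v)]:

import math

def dag_relaxation(Adj, w, order, s):
    '''Sketch: single-source shortest paths on a DAG.
    Adj: dict vertex -> outgoing neighbors, w: dict (u, v) -> weight,
    order: vertices in a topological order, s: source vertex.'''
    d = {v: math.inf for v in order}        # distance estimates, initially infinity
    parent = {s: None}
    d[s] = 0
    for u in order:                         # process vertices in topological order
        for v in Adj[u]:
            if d[v] > d[u] + w[(u, v)]:     # relax edge (u, v) if it violates
                d[v] = d[u] + w[(u, v)]     # the triangle inequality
                parent[v] = u
    return d, parent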
Correctness
• Claim: At end of DAG Relaxation: d(s, v) = δ(s, v) for all v ∈ V
• Proof: Induct on k: d(s, v) = δ(s, v) for all v in first k vertices in topological order
– Base case: Vertex s and every vertex before s in topological order satisfies claim at start
– Inductive step: Assume claim holds for first k′ vertices, let v be the (k′ + 1)-th
– Consider a shortest path from s to v, and let u be the vertex preceding v on path
– u occurs before v in topological order, so d(s, u) = δ(s, u) by induction
– When processing u, d(s, v) is set to be no larger (≤) than δ(s, u) + w(u, v) = δ(s, v)
– But d(s, v) ≥ δ(s, v), since relaxation is safe, so d(s, v) = δ(s, v)
• Alternatively:
– For any vertex v, DAG relaxation sets d(s, v) = min{d(s, u) + w(u, v) | u ∈ Adj− (v)}
– Shortest path to v must pass through some incoming neighbor u of v
– So if d(s, u) = δ(s, u) for all u ∈ Adj− (v) by induction, then d(s, v) = δ(s, v)
Running Time
• Initialization takes O(|V |) time, and Topological Sort takes O(|V | + |E|) time
Lecture 12: Bellman-Ford
Previously
• Weighted graphs, shortest-path weight, negative-weight cycles
• DAG Relaxation: algorithm to solve SSSP on a weighted DAG in O(|V | + |E|) time
Warmups
• Exercise 1: Given undirected graph G, return whether G contains a negative-weight cycle
• Solution: Return Yes if there is an edge with negative weight in G in O(|E|) time :O
• Exercise 2: Given SSSP algorithm A that runs in O(|V|(|V| + |E|)) time, show how to use it to solve SSSP in O(|V||E|) time
• Solution: Run BFS or DFS to find the vertices reachable from s in O(|E|) time
– Mark each vertex v not reachable from s with δ(s, v) = ∞ in O(|V |) time
– Make graph G′ = (V′, E′) with only vertices reachable from s in O(|V| + |E|) time
– Run A from s in G′.
– G′ is connected, so |V′| = O(|E′|) = O(|E|), so A runs in O(|V||E|) time
• Today, we will find a SSSP algorithm with this running time that works for general graphs!
• If graph does not contain negative-weight cycles, shortest paths are simple!
• Proof: By contradiction:
– Suppose no simple shortest path; let π be a shortest path with fewest vertices
– π not simple, so exists cycle C in π; C has non-negative weight (or else δ(s, v) = −∞)
– Removing C from π forms path π′ with fewer vertices and weight w(π′) ≤ w(π)
• Since simple paths cannot repeat vertices, finite shortest paths contain at most |V | − 1 edges
• Define the k-edge distance δ_k(s, v): minimum weight of any path from s to v using at most k edges
• Idea! Compute δ_{|V|−1}(s, v) and δ_{|V|}(s, v) for all v ∈ V
– If δ(s, v) ≠ −∞, then δ(s, v) = δ_{|V|−1}(s, v), since a shortest path is simple (or nonexistent)
– If δ_{|V|}(s, v) < δ_{|V|−1}(s, v)
∗ there exists a shorter non-simple path to v, so δ(s, v) = −∞
∗ call v a (negative cycle) witness
– However, there may be vertices with −∞ shortest-path weight that are not witnesses
• Claim 2: δ(s, v) = −∞ if and only if v is reachable from a witness
• Proof: Suffices to prove: every negative-weight cycle reachable from s contains a witness
– If C contains no witness, δ_{|V|}(s, v) ≥ δ_{|V|−1}(s, v) for all v ∈ C, a contradiction
Bellman-Ford
• Idea! Use graph duplication: make multiple copies (or levels) of the graph
• |V | + 1 levels: vertex vk in level k represents reaching vertex v from s using ≤ k edges
• If edges only increase in level, resulting graph is a DAG!
Example
(Figure: example graph G and its |V| + 1 level duplication G′ omitted)

δ(a_0, v_k):
k \ v      a      b      c      d
0          0      ∞      ∞      ∞
1          0     −5      6      ∞
2          0     −5     −9      9
3          0     −5     −9     −6
4          0     −7     −9     −6
δ(a, v)    0     −∞     −∞     −∞
Correctness
• Claim 3: δ(s_0, v_k) = δ_k(s, v) for all v ∈ V and k ∈ {0, . . . , |V|}
• Proof: By induction on k:
– Base case: true for all v ∈ V when k = 0 (only v_0 reachable from s_0 is v = s)
– Inductive Step: Assume true for all k < k′, prove for k = k′
δ(s_0, v_{k′}) = min{δ(s_0, u_{k′−1}) + w(u_{k′−1}, v_{k′}) | u_{k′−1} ∈ Adj⁻(v_{k′})}
             = min{{δ(s_0, u_{k′−1}) + w(u, v) | u ∈ Adj⁻(v)} ∪ {δ(s_0, v_{k′−1})}}
             = min{{δ_{k′−1}(s, u) + w(u, v) | u ∈ Adj⁻(v)} ∪ {δ_{k′−1}(s, v)}}   (by induction)
             = δ_{k′}(s, v)
• Claim 4: At the end of Bellman-Ford d(s, v) = δ(s, v)
• Proof: Correctly computes δ_{|V|−1}(s, v) and δ_{|V|}(s, v) for all v ∈ V by Claim 3
– If δ(s, v) ≠ −∞, correctly sets d(s, v) = δ_{|V|−1}(s, v) = δ(s, v)
– Then sets d(s, v) = −∞ for any v reachable from a witness; correct by Claim 2
Running Time
• G′ has size O(|V|(|V| + |E|)) and can be constructed in as much time
• Running DAG Relaxation on G′ takes linear time in the size of G′
• Does O(1) work for each vertex reachable from a witness
• Finding reachability of a witness takes O(|E|) time, with at most O(|V |) witnesses: O(|V ||E|)
• (Alternatively, connect super node x to witnesses via 0-weight edges, linear search from x)
• Pruning G at start to only subgraph reachable from s yields O(|V ||E|)-time algorithm
• Can use just O(|V|) space by storing only δ(s_0, v_{k−1}) and δ(s_0, v_k) for each k from 1 to |V|
• Traditionally, Bellman-Ford stores only one value per vertex, attempting to relax every edge
in |V | rounds; but estimates do not correspond to k-Edge Distances, so analysis trickier
• But these space optimizations don’t return a negative weight cycle
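A sketch of the traditional space-efficient variant mentioned above: one estimate per vertex, |V| − 1 rounds of relaxing every edge, one extra scan to find witnesses, and a final propagation marking everything reachable from a witness as −∞ (as noted, this version does not return a negative-weight cycle):

import math

def bellman_ford(Adj, w, s):
    '''Sketch: Adj: dict vertex -> outgoing neighbors, w: dict (u, v) -> weight.
    Returns d with d[v] = delta(s, v), using -inf when no finite shortest path exists.'''
    d = {v: math.inf for v in Adj}
    d[s] = 0
    for _ in range(len(Adj) - 1):               # |V| - 1 rounds of relaxing every edge
        for u in Adj:
            for v in Adj[u]:
                if d[v] > d[u] + w[(u, v)]:
                    d[v] = d[u] + w[(u, v)]
    # vertices whose estimate would still improve in round |V| are witnesses
    witnesses = {v for u in Adj for v in Adj[u] if d[v] > d[u] + w[(u, v)]}
    stack = list(witnesses)                     # propagate -inf to everything
    while stack:                                # reachable from a witness
        u = stack.pop()
        if d[u] != -math.inf:
            d[u] = -math.inf
            stack.extend(Adj[u])
    return d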
Lecture 13: Dijkstra’s Algorithm
Review
• Single-Source Shortest Paths on weighted graphs
• Previously: O(|V | + |E|)-time algorithms for small positive weights or DAGs
• Last time: Bellman-Ford, O(|V ||E|)-time algorithm for general graphs with negative weights
• Today: faster for general graphs with non-negative edge weights, i.e., for e ∈ E, w(e) ≥ 0
Dijkstra’s Algorithm
• Named for famous Dutch computer scientist Edsger Dijkstra (actually Dÿkstra!)
• Idea! Relax edges from each vertex in increasing order of distance from source s
• Idea! Efficiently find next vertex in the order using a data structure
• Changeable Priority Queue Q on items with keys and unique IDs, supporting operations:
Q.build(X)                initialize Q with items in iterator X
Q.delete_min()            remove an item with minimum key
Q.decrease_key(id, k)     find stored item with ID id and change key to k
• Assume vertex IDs are integers from 0 to |V | − 1 so can use a direct access array for D
• Build changeable priority queue Q with an item (v, d(s, v)) for each vertex v ∈ V
• While Q not empty, delete an item (u, d(s, u)) from Q that has minimum d(s, u)
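A sketch of Dijkstra using Python's heapq as a lazy priority queue in place of the changeable priority queue described above; instead of decrease_key, improved estimates are pushed as new entries and stale entries are skipped when popped:

import heapq, math

def dijkstra(Adj, w, s):
    '''Sketch: shortest-path weights from s with non-negative edge weights w[(u, v)].'''
    d = {v: math.inf for v in Adj}
    parent = {s: None}
    d[s] = 0
    Q = [(0, s)]                                 # lazy priority queue of (estimate, vertex)
    while Q:
        du, u = heapq.heappop(Q)                 # delete_min
        if du > d[u]:                            # stale entry: a better estimate exists
            continue
        for v in Adj[u]:                         # relax the outgoing edges of u
            if d[v] > d[u] + w[(u, v)]:
                d[v] = d[u] + w[(u, v)]
                parent[v] = u
                heapq.heappush(Q, (d[v], v))     # push instead of decrease_key
    return d, parent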
Example
(Figure: example weighted graph G on vertices s, a, b, c, d omitted)

Delete v from Q |  d(s, s)  d(s, a)  d(s, b)  d(s, c)  d(s, d)
s               |     0        ∞        ∞        ∞        ∞
c               |             10        ∞        3        ∞
d               |              7       11                 5
a               |              7       10
b               |                       9
δ(s, v)         |     0        7        9        3        5
Correctness
• Claim: At end of Dijkstra’s algorithm, d(s, v) = δ(s, v) for all v ∈ V
• Proof:
– If relaxation sets d(s, v) to δ(s, v), then d(s, v) = δ(s, v) at the end of the algorithm
∗ Relaxation can only decrease estimates d(s, v)
∗ Relaxation is safe, i.e., maintains that each d(s, v) is weight of a path to v (or ∞)
– Suffices to show d(s, v) = δ(s, v) when vertex v is removed from Q
∗ Proof by induction on first k vertices removed from Q
∗ Base Case (k = 1): s is first vertex removed from Q, and d(s, s) = 0 = δ(s, s)
∗ Inductive Step: Assume true for k < k′, consider the k′-th vertex v′ removed from Q
∗ Consider some shortest path π from s to v′, with w(π) = δ(s, v′)
∗ Let (x, y) be the first edge in π where y is not among the first k′ − 1 (perhaps y = v′)
∗ When x was removed from Q, d(s, x) = δ(s, x) by induction, so relaxing (x, y) ensured:
d(s, y) ≤ d(s, x) + w(x, y) = δ(s, x) + w(x, y) = δ(s, y) ≤ δ(s, v′) ≤ d(s, v′)
∗ Since v′ is removed no later than y, d(s, v′) ≤ d(s, y), so equality holds throughout and d(s, v′) = δ(s, v′)
Running Time
• Count operations on changeable priority queue Q, assuming it contains n items:
Operation                    Time     Occurrences in Dijkstra
Q.build(X) (n = |X|)         B_n      1
Q.delete_min()               M_n      |V|
Q.decrease_key(id, k)        D_n      |E|
• Total running time is O(B_{|V|} + |V| · M_{|V|} + |E| · D_{|V|})
• Assume pruned graph to search only vertices reachable from the source, so |V | = O(|E|)
• If graph is dense, i.e., |E| = Θ(|V|^2), using an Array for Q yields O(|V|^2) time
• If graph is sparse, i.e., |E| = Θ(|V|), using a Binary Heap for Q yields O(|V| log |V|) time
• A Fibonacci Heap is theoretically good in all cases, but is not used much in practice
• We won’t discuss Fibonacci Heaps in 6.006 (see 6.854 or CLRS chapter 19 for details)
• You should assume Dijkstra runs in O(|E|+|V | log |V |) time when using in theory problems
Lecture 14: Johnson’s Algorithm
Previously
• Useful when understanding whole network, e.g., transportation, circuit layout, supply chains...
• Just running an SSSP algorithm |V| times is actually pretty good, since the output has size O(|V|^2)
– |V | · O(|V | + |E|) with BFS if weights positive and bounded by O(|V | + |E|)
– |V | · O(|V | + |E|) with DAG Relaxation if acyclic
– |V | · O(|V | log |V | + |E|) with Dijkstra if weights non-negative or graph undirected
– |V | · O(|V | · |E|) with Bellman-Ford (general)
• Today: Solve APSP in any weighted graph in |V | · O(|V | log |V | + |E|) time
Approach
• Idea: Make all edge weights non-negative while preserving shortest paths!
• Claim: Can compute distances in G from distances in G′ in O(|V|(|V| + |E|)) time
– Compute shortest-path tree from distances, for each s ∈ V′ in O(|V| + |E|) time (L11)
– Also a shortest-paths tree in G, so traverse tree with DFS while also computing distances
– Takes O(|V| · (|V| + |E|)) time (which is less time than |V| times Dijkstra)
• But how to make G′ with non-negative edge weights? Is this even possible??
• Proof: Shortest paths are simple if no negative weights, but not if negative-weight cycle
• FAIL: Does not preserve shortest paths! Biases toward paths traversing fewer edges :(
• Idea! Given vertex v, add h to all outgoing edges and subtract h from all incoming edges
• Proof:
• This is a very general and useful trick to transform a graph while preserving shortest paths!
• Make graph G′: same as G but edge (u, v) ∈ E has weight w′(u, v) = w(u, v) + h(u) − h(v)
• Proof:
– (Sum of h’s telescope, since there is a positive and negative h(vi ) for each interior i)
– Every path from v0 to vk changes by the same amount
– So any shortest path will still be shortest
• i.e., is there an h such that w(u, v) + h(u) − h(v) ≥ 0 for every (u, v) ∈ E?
• Re-arrange this condition to h(v) ≤ h(u) + w(u, v), looks like triangle inequality!
• Idea! Condition would be satisfied if h(v) = δ(s, v) and δ(s, v) is finite for some s
• But graph may be disconnected, so may not exist any such vertex s... :(
• Claim: If δ(s, v) = −∞ for any v ∈ V , then the original graph has a negative-weight cycle
• Proof:
Johnson’s Algorithm
• Construct Gx from G by adding vertex x connected to each vertex v ∈ V with 0-weight edge
• Else:
Correctness
• Already proved that transformation from G to G0 preserves shortest paths
Running Time
• O(|V | + |E|) time to construct Gx
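A sketch of the overall algorithm, reusing the bellman_ford and dijkstra sketches given after Lectures 12 and 13 above (so this is only a sketch under those assumptions): add super-node x, compute h = δ(x, ·), abort on a negative-weight cycle, reweight, run Dijkstra from every source, and undo the reweighting.

import math

def johnson(Adj, w):
    '''Sketch: all-pairs shortest paths; returns D with D[u][v] = delta(u, v),
    or None if the graph has a negative-weight cycle.'''
    x = object()                                  # fresh super-node not in Adj
    Adj_x = {x: list(Adj)}
    Adj_x.update(Adj)
    w_x = dict(w)
    for v in Adj:                                 # 0-weight edges x -> v
        w_x[(x, v)] = 0
    h = bellman_ford(Adj_x, w_x, x)               # h(v) = delta(x, v)
    if any(h[v] == -math.inf for v in Adj):       # witness found: negative-weight cycle
        return None
    w2 = {(u, v): w[(u, v)] + h[u] - h[v] for (u, v) in w}   # non-negative weights
    D = {}
    for s in Adj:                                 # Dijkstra from every source in G'
        d, _ = dijkstra(Adj, w2, s)
        D[s] = {v: (d[v] - h[s] + h[v] if d[v] < math.inf else math.inf) for v in Adj}
    return D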
Lecture 15: Recursive Algorithms
Fibonacci Numbers
• Suppose we want to compute the nth Fibonacci number Fn
• To solve, either:
• A subtlety is that Fibonacci numbers grow to Θ(n) bits long, potentially much larger than word size w
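For concreteness, a sketch of both standard approaches, top-down memoization and bottom-up, ignoring the word-size subtlety noted above (convention F_0 = 0, F_1 = 1 assumed):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):                      # top down: recurse but re-use via memoization
    if n < 2:                    # base cases F_0 = 0, F_1 = 1
        return n
    return fib(n - 1) + fib(n - 2)

def fib_bottom_up(n):            # bottom up: fill subproblems in topological order
    F = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]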
Dynamic Programming
• Weird name coined by Richard Bellman
• “Recurse but re-use” (Top down: record and lookup subproblem solutions)
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
¹ This property is often called optimal substructure. It is a property of recursion, not just dynamic programming
• DAG Relaxation computes the same min values as this dynamic program, just
– step-by-step (if new value < min, update min via edge relaxation), and
– from the perspective of u and Adj+ (u) instead of v and Adj− (v)
Bowling
• Given n pins labeled 0, 1, . . . , n − 1
Bowling Algorithms
• Let’s start with a more familiar divide-and-conquer algorithm:
– Subproblems: B(i, j) = maximum score starting with just pins i, i + 1, . . . , j − 1,
for 0 ≤ i ≤ j ≤ n
– Relation:
∗ m = b(i + j)/2c
∗ Either hit m and m + 1 together, or don’t
∗ B(i, j) = max{vm · vm+1 + B(i, m) + B(m + 2, j), B(i, m + 1) + B(m + 1, j)}
– Topo. order: Increasing j − i
– Base cases: B(i, i) = 0, B(i, i + 1) = max{vi , 0}
– Original: B(0, n)
– Time: T(n) = 4T(n/2) + O(1) = O(n^2)
• This algorithm works but isn’t very fast, and doesn’t generalize well
(e.g., to allow for a bigger ball that hits three balls at once)
Bowling Code
• Converting a SRT BOT specification into code is automatic/straightforward
• Here’s the result for the Bowling Dynamic Program above:
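The original code listing is not reproduced in this extract; below is a sketch of a natural suffix formulation, assuming pin values are given in a list v, where B(i) is the maximum score obtainable from pins i, i + 1, . . . , n − 1 (skip pin i, hit it alone, or hit pins i and i + 1 together):

def bowl(v):
    '''Sketch: maximum bowling score for pins with values v[0], ..., v[n-1].'''
    n = len(v)
    B = [0] * (n + 2)                           # B[n] = B[n + 1] = 0 (no pins left)
    for i in range(n - 1, -1, -1):              # topological order: decreasing i
        B[i] = max(B[i + 1], v[i] + B[i + 1])   # skip pin i, or hit it alone
        if i + 1 < n:                           # or hit pins i and i + 1 together
            B[i] = max(B[i], v[i] * v[i + 1] + B[i + 2])
    return B[0]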
Lecture 16: Dyn. Prog. Subproblems
• “Recurse but re-use” (Top down: record and lookup subproblem solutions)
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
1. Subproblems
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
1. Subproblems
• x(i) = length of longest increasing subsequence of suffix A[i :] that includes A[i]
• For 0 ≤ i < |A|
2. Relate
• We’re told that A[i] is in LIS (first element)
• Next question: what is the second element of LIS?
– Could be any A[j] where j > i and A[j] > A[i] (so increasing)
– Or A[i] might be the last element of LIS
• x(i) = max({1 + x(j) | i < j < |A|, A[j] > A[i]} ∪ {1})
3. Topological order
• Decreasing i
4. Base
• No base case necessary, because we consider the possibility that A[i] is last
5. Original problem
• What is the first element of LIS? Guess!
• Length of LIS of A is max{x(i) | 0 ≤ i < |A|}
• Store parent pointers to reconstruct subsequence
6. Time
• # subproblems: |A|
• work per subproblem: O(|A|)
• O(|A|2 ) running time
• Exercise: speed up to O(|A| log |A|) by doing only O(log |A|) work per subproblem,
via AVL tree augmentation
def lis(A):
    a = len(A)
    x = [1] * a
    for i in reversed(range(a)):
        for j in range(i, a):
            if A[j] > A[i]:
                x[i] = max(x[i], 1 + x[j])
    return max(x)
1. Subproblems
• Choose subproblems that correspond to the state of the game
• For every contiguous subsequence of coins from i to j, 0 ≤ i ≤ j < n
• x(i, j) = maximum total value I can take starting from coins of values vi , . . . , vj
2. Relate
• I must choose either coin i or coin j (Guess!)
• Then it’s your turn, so you’ll get value x(i + 1, j) or x(i, j − 1), respectively
• To figure out how much value I get, subtract this from total coin values
• x(i, j) = max{v_i + Σ_{k=i+1}^{j} v_k − x(i + 1, j),  v_j + Σ_{k=i}^{j−1} v_k − x(i, j − 1)}
3. Topological order
• Increasing j − i
4. Base
• x(i, i) = vi
5. Original problem
• x(0, n − 1)
• Store parent pointers to reconstruct strategy
6. Time
• # subproblems: Θ(n^2)
• work per subproblem: Θ(n) to compute sums
• Θ(n^3) running time
• Exercise: speed up to Θ(n^2) time by precomputing all sums Σ_{k=i}^{j} v_k in Θ(n^2) time, via dynamic programming (!)
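A sketch of the first formulation above, with the suggested prefix-sum precomputation so each subproblem takes O(1) time (i.e., the Θ(n^2) variant of the exercise):

def max_coin_value(v):
    '''Sketch: maximum total value the first player can take from coins v[0..n-1],
    when both players play optimally and may only take an end coin.'''
    n = len(v)
    prefix = [0] * (n + 1)                      # prefix[j] = v[0] + ... + v[j-1]
    for k in range(n):
        prefix[k + 1] = prefix[k] + v[k]
    total = lambda i, j: prefix[j + 1] - prefix[i]   # sum of v[i..j]
    x = [[0] * n for _ in range(n)]
    for i in range(n):                          # base case: a single coin
        x[i][i] = v[i]
    for length in range(2, n + 1):              # topological order: increasing j - i
        for i in range(n - length + 1):
            j = i + length - 1
            take_i = v[i] + (total(i + 1, j) - x[i + 1][j])   # take coin i
            take_j = v[j] + (total(i, j - 1) - x[i][j - 1])   # take coin j
            x[i][j] = max(take_i, take_j)
    return x[0][n - 1]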
• Second solution uses subproblem expansion: add subproblems for when you move next
1. Subproblems
2. Relate
3. Topological order
• Increasing j − i
4. Base
• x(i, i, me) = vi
• x(i, i, you) = 0
5. Original problem
• x(0, n − 1, me)
• Store parent pointers to reconstruct strategy
6. Time
• # subproblems: Θ(n^2)
• work per subproblem: Θ(1)
• Θ(n^2) running time
• If you find yourself lacking information to check the desired conditions of the problem, or
lack the natural subproblem to recurse on, try subproblem constraint/expansion!
• More subproblems and constraints give the relation more to work with, so can make DP
more feasible
Lecture 17: Dyn. Prog. III
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
• # subproblems: |V | × (|V | + 1)
• Work for subproblem δ_k(s, v): O(deg_in(v))
• Total: Σ_{k=0}^{|V|} Σ_{v∈V} O(deg_in(v)) = Σ_{k=0}^{|V|} O(|E|) = O(|V| · |E|)
1. Subproblems
• x(u, v, k) = minimum weight of a path from u to v that only uses vertices from {1, 2, . . . , k} ∪ {u, v}
• For u, v ∈ V and 0 ≤ k ≤ |V|
2. Relate
• x(u, v, k) = min{x(u, k, k − 1) + x(k, v, k − 1), x(u, v, k − 1)}
• Only constant branching! No longer guessing previous vertex/edge
3. Topological order
• Increasing k: relation depends only on smaller k
4. Base
• x(u, u, 0) = 0
• x(u, v, 0) = w(u, v) if (u, v) ∈ E
• x(u, v, 0) = ∞ if none of the above
5. Original problem
• x(u, v, |V |) for all u, v ∈ V
6. Time
• O(|V |3 ) subproblems
• Each O(1) work
• O(|V |3 ) in total
• Constant number of dependencies per subproblem brings the factor of O(|E|) in the
running time down to O(|V |).
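A sketch of this dynamic program (commonly known as Floyd-Warshall), collapsing the k dimension into an in-place update; vertices are given as a list V and weights as w[(u, v)], and no negative-weight cycles are assumed:

import math

def floyd_warshall(V, w):
    '''Sketch: all-pairs shortest-path weights for vertices V and edge weights w.'''
    d = {(u, v): 0 if u == v else w.get((u, v), math.inf)   # base case k = 0
         for u in V for v in V}
    for k in V:                                  # allow k as an intermediate vertex
        for u in V:
            for v in V:
                if d[(u, v)] > d[(u, k)] + d[(k, v)]:
                    d[(u, v)] = d[(u, k)] + d[(k, v)]
    return d                                     # O(|V|^3) time, O(1) work per update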
Arithmetic Parenthesization
• Input: arithmetic expression a0 ∗1 a1 ∗2 a2 · · · ∗n−1 an−1
where each ai is an integer and each ∗i ∈ {+, ×}
1. Subproblems
• Sufficient to maximize each subarray? No! (−3) × (−3) = 9 > (−2) × (−2) = 4
• x(i, j, opt) = opt value obtainable by parenthesizing ai ∗i+1 · · · ∗j−1 aj−1
• For 0 ≤ i < j ≤ n and opt ∈ {min, max}
2. Relate
3. Topological order
4. Base
5. Original problem
• x(0, n, max)
• Store parent pointers (two!) to find parenthesization (forms binary tree!)
6. Time
Piano Fingering
• Given sequence t0 , t1 , . . . , tn−1 of n single notes to play with right hand (will generalize to
multiple notes and hands later)
• Given metric d(t, f, t′, f′) of difficulty of transitioning from note t with finger f to note t′ with finger f′
• First attempt:
1. Subproblems
2. Relate
• Solution:
1. Subproblems
• x(i, f ) = minimum total difficulty for playing notes ti , ti+1 , . . . , tn−1 starting with fin-
ger f on note ti
• For 0 ≤ i < n and 1 ≤ f ≤ F
2. Relate
3. Topological order
4. Base
5. Original problem
• min{x(0, f ) | 1 ≤ f ≤ F }
6. Time
• Θ(n · F ) subproblems
• Θ(F ) work per subproblem
• Θ(n · F^2) running time
• No dependence on the number of different notes!
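A minimal sketch of this solution, assuming d is given as a callable difficulty metric and fingers
are numbered 1 through F (names are illustrative):

    def min_total_difficulty(t, F, d):
        # x[(i, f)] = min total difficulty to play t[i:], starting with finger f on note t[i]
        n = len(t)
        x = {(n - 1, f): 0 for f in range(1, F + 1)}         # base: no transition remains
        for i in range(n - 2, -1, -1):                       # topological order: decreasing i
            for f in range(1, F + 1):                        # relate: brute-force the next finger
                x[(i, f)] = min(d(t[i], f, t[i + 1], f2) + x[(i + 1, f2)]
                                for f2 in range(1, F + 1))
        return min(x[(0, f)] for f in range(1, F + 1))       # original problem

Θ(n · F) subproblems with Θ(F) work each gives the Θ(n · F^2) running time above.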
Guitar Fingering
• Up to S (the number of strings) different ways to play the same note
• Goal: fingering f_i : t_i → {1, 2, . . . , F} specifying how to finger each note (including which
string for guitar) to minimize Σ_{i=1}^{n−1} d(t_{i−1}, f_{i−1}, t_i, f_i)
• Θ(n · T^F) subproblems
• Θ(n · T^{2F}) time
– F = 2 feet
– T = 2 (at most two notes at once)
– Exercise: handle sustained notes, using “where each foot is” (on an arrow or in the
middle) as added state for suffix subproblems
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Lecture 18: Pseudopolynomial
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
Rod Cutting
• Given a rod of length L and value v(ℓ) of rod of length ℓ for all ℓ ∈ {1, 2, . . . , L}
• Goal: Cut the rod to maximize the value of cut rod pieces
• Example: L = 7, v = [0, 1, 10, 13, 18, 20, 31, 32]
  ℓ:    0   1   2   3   4   5   6   7
  v(ℓ): 0   1  10  13  18  20  31  32
1 # recursive
2 x = {}
3 def cut_rod(l, v):
4 if l < 1: return 0 # base case
5 if l not in x: # check memo
6 for piece in range(1, l + 1): # try piece
7 x_ = v[piece] + cut_rod(l - piece, v) # recurrence
8 if (l not in x) or (x[l] < x_): # update memo
9 x[l] = x_
10 return x[l]
1 # iterative
2 def cut_rod(L, v):
3 x = [0] * (L + 1) # base case
4 for l in range(L + 1): # topological order
5 for piece in range(1, l + 1): # try piece
6 x_ = v[piece] + x[l - piece] # recurrence
7 if x[l] < x_: # update memo
8 x[l] = x_
9 return x[L]
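For the example above, either version computes the same optimum:

    v = [0, 1, 10, 13, 18, 20, 31, 32]
    print(cut_rod(7, v))    # 33: cut into pieces of lengths 2 + 2 + 3, worth 10 + 10 + 13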
Subset Sum
• Input: Sequence of n positive integers A = (a_0, a_1, . . . , a_{n−1})
• Output: Is there a subset of A that sums exactly to T? (i.e., ∃ A′ ⊆ A s.t. Σ_{a∈A′} a = T?)
• Example: A = (1, 3, 4, 12, 19, 21, 22), T = 47 allows A′ = {3, 4, 19, 21}
• In example, answer is YES. However, answer is NO for some T , e.g., 2, 6, 9, 10, 11, . . .
1. Subproblems
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
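One standard way to fill in these steps (the exact formulation in lecture may differ): define
x(i, t) = True iff some subset of a_i, . . . , a_{n−1} sums to exactly t, for 0 ≤ i ≤ n and 0 ≤ t ≤ T.
A minimal sketch:

    def subset_sum(A, T):
        n = len(A)
        x = {(n, t): (t == 0) for t in range(T + 1)}         # base: the empty suffix makes only 0
        for i in range(n - 1, -1, -1):                       # topological order: decreasing i
            for t in range(T + 1):                           # relate: either skip a_i or take it
                x[(i, t)] = x[(i + 1, t)] or (A[i] <= t and x[(i + 1, t - A[i])])
        return x[(0, T)]                                     # original problem

    # e.g., subset_sum([1, 3, 4, 12, 19, 21, 22], 47) is True, while the same call with T = 2 is False

This gives O(nT) subproblems with O(1) work each, i.e., O(nT) time, the bound whose status is
discussed next.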
Is This Polynomial?
• Input size is n + 1: one integer T and n integers in A
• On w-bit word RAM, T < 2^w and w ≥ lg(n + 1), but we don’t have an upper bound on w
• E.g., w = n is not unreasonable, but then running time is O(n · 2^n), which is exponential
Pseudopolynomial
• Algorithm has pseudopolynomial time: running time is bounded above by a constant-
degree polynomial in input size and input integers
• Such algorithms are polynomial in the case that integers are polynomially bounded in input
size, i.e., nO(1) (same case that Radix Sort runs in O(n) time)
• Counting sort O(n + u), radix sort O(n log_n u), direct-access array build O(n + u), and
Fibonacci O(n) are all pseudopolynomial algorithms we’ve seen already
• Radix sort is actually weakly polynomial (a notion in between strongly polynomial and
pseudopolynomial): bounded above by a constant-degree polynomial in the input size mea-
sured in bits, i.e., in the logarithm of the input integers
Complexity
• Is Subset Sum solvable in polynomial time when integers are not polynomially bounded?
• Subproblems:
• Subproblem constraints/expansion:
• Relation:
• Original problem:
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 1
Recitation 1
Algorithms
The study of algorithms searches for efficient procedures to solve problems. The goal of this class
is to not only teach you how to solve problems, but to teach you to communicate to others that a
solution to a problem is both correct and efficient.
• An algorithm solves a problem if for every problem input it returns a correct output.
While a problem input may have more than one correct output, an algorithm should only return one
output for a given input (it is a function). As an example, consider the problem of finding another
student in your recitation who shares the same birthday.
Problem: Given the students in your recitation, return either the names of two students
who share the same birthday and year, or state that no such pair exists.
This problem relates one input (your recitation) to one or more outputs comprising birthday-
matching pairs of students or one negative result. A problem input is sometimes called an instance
of the problem. One algorithm that solves this problem is the following.
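Maintain a record of the names and birthdays of students interviewed so far. Consider the students
one by one: check whether the current student’s birthday matches one already in the record, and if
so return that pair; otherwise add the student to the record and continue. If every student is
processed without finding a match, return that no matching pair exists (this is the procedure
implemented by birthday_match later in this recitation).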
Of course, our algorithm solves a much more general problem than the one proposed above. The
same algorithm can search for a birthday-matching pair in any set of students, not just the students
in your recitation. In this class, we try to solve problems which generalize to inputs that may be
arbitrarily large. The birthday matching algorithm can be applied to a recitation of any size. But
how can we determine whether the algorithm is correct and efficient?
Correctness
Any computer program you write will have finite size, while an input it acts on may be arbitrarily
large. Thus every algorithm we discuss in this class will need to repeat commands in the algorithm
via loops or recursion, and we will be able to prove correctness of the algorithm via induction.
Let’s prove that the birthday algorithm is correct.
Proof. Induct on the number k of students interviewed. Base case: for k = 0, there is no
matching pair, and the algorithm returns that there is no matching pair. Now assume for
induction that the algorithm returns correctly for the first k students. If the first k students
contain a matching pair, then so do the first k + 1 students, and the algorithm has already
returned a matching pair. Otherwise the first k students do not contain a matching pair, so
if the first k + 1 students contain a match, the match includes student k + 1, and the algorithm
checks whether student k + 1 has the same birthday as someone already processed.
Efficiency
What makes a computer program efficient? One program is said to be more efficient than another
if it can solve the same problem input using fewer resources. We expect that a larger input might
take more time to solve than another input having smaller size. In addition, the resources used by
a program, e.g. storage space or running time, will depend on both the algorithm used and the ma-
chine on which the algorithm is implemented. We expect that an algorithm implemented on a fast
machine will run faster than the same algorithm on a slower machine, even for the same input. We
would like to be able to compare algorithms, without having to worry about how fast our machine
is. So in this class, we compare algorithms based on their asymptotic performance relative to
problem input size, in order to ignore constant factor differences in hardware performance.
Asymptotic Notation
We can use asymptotic notation to ignore constants that do not change with the size of the problem
input. O(f (n)) represents the set of functions with domain over the natural numbers satisfying the
following property.
O Notation: Non-negative function g(n) is in O(f (n)) if and only if there exists a
positive real number c and positive integer n0 such that g(n) ≤ c · f (n) for all n ≥ n0 .
This definition upper bounds the asymptotic growth of a function for sufficiently large n, i.e.,
the bound on growth is true even if we were to scale or shift our function by a constant amount.
By convention, it is more common for people to say that a function g(n) is O(f (n)) or equal
to O(f (n)), but what they really mean is set containment, i.e., g(n) ∈ O(f (n)). So since our
problem’s input size is cn for some constant c, we can forget about c and say the input size is O(n)
(order n). A similar notation can be used for lower bounds.
Ω Notation: Non-negative function g(n) is in Ω(f (n)) if and only if there exists a
positive real number c and positive integer n0 such that c · f (n) ≤ g(n) for all n ≥ n0 .
When one function both asymptotically upper bounds and asymptotically lower bounds another
function, we use Θ notation. When g(n) = Θ(f (n)), we say that f (n) represents a tight bound
on g(n).
Θ Notation: Non-negative g(n) is in Θ(f (n)) if and only if g(n) ∈ O(f (n)) ∩ Ω(f (n)).
We often use shorthand to characterize the asymptotic growth (i.e., asymptotic complexity) of
common functions, such as those shown in the table below1 . Here we assume c ∈ Θ(1).
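    input   constant   logarithmic   linear   log-linear    quadratic   polynomial   exponential
    n       Θ(1)       Θ(log n)      Θ(n)     Θ(n log n)    Θ(n^2)      Θ(n^c)       2^Θ(n^c)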
Linear time is often necessary to solve problems where the entire input must be read in order to
solve the problem. However, if the input is already accessible in memory, many problems can
be solved in sub-linear time. For example, the problem of finding a value in a sorted array (that
has already been loaded into memory) can be solved in logarithmic time via binary search. We
focus on polynomial time algorithms in this class, typically for small values of c. There’s a big
difference between logarithmic, linear, and exponential. If n = 1000, log n ≈ 10^1, n ≈ 10^3, while
2^n ≈ 10^300. For comparison, the number of atoms in the universe is estimated around 10^80. It is
common to use the variable ‘n’ to represent a parameter that is linear in the problem input size,
though this is not always the case. For example, when talking about graph algorithms later in the
term, a problem input will be a graph parameterized by vertex set V and edge set E, so a natural
input size will be Θ(|V | + |E|). Alternatively, when talking about matrix algorithms, it is common
to let n be the width of a square matrix, where a problem input will have size Θ(n^2), specifying
each element of the n × n matrix.
1
Note that exponential 2^Θ(n^c) is a convenient abuse of notation meaning {2^p | p ∈ Θ(n^c)}.
Model of Computation
In order to precisely calculate the resources used by an algorithm, we need to model how long a
computer takes to perform basic operations. Specifying such a set of operations provides a model
of computation upon which we can base our analysis. In this class, we will use the w-bit Word-
RAM model of computation, which models a computer as a random access array of machine
words called memory, together with a processor that can perform operations on the memory.
A machine word is a sequence of w bits representing an integer from the set {0, . . . , 2^w − 1}.
A Word-RAM processor can perform basic binary operations on two machine words in constant
time, including addition, subtraction, multiplication, integer division, modulo, bitwise operations,
and binary comparisons. In addition, given a word a, the processor can read or write the word
in memory located at address a in constant time. If a machine word contains only w bits, the
processor will only be able to read and write from at most 2^w addresses in memory2. So when
solving a problem on an input stored in n machine words, we will always assume our Word-RAM
has a word size of at least w > log_2 n bits, or else the machine would not be able to access all of the
input in memory. To put this limitation in perspective, a Word-RAM model of a byte-addressable
64-bit machine allows inputs up to ∼ 10^10 GB in size.
Data Structure
The running time of our birthday matching algorithm depends on how we store the record of names
and birthdays. A data structure is a way to store a non-constant amount of data, supporting a set
of operations to interact with that data. The set of operations supported by a data structure is
called an interface. Many data structures might support the same interface, but could provide
different performance for each operation. Many problems can be solved trivially by storing data
in an appropriate choice of data structure. For our example, we will use the most primitive data
structure native to the Word-RAM: the static array. A static array is simply a contiguous sequence
of words reserved in memory, supporting a static sequence interface:
• StaticArray.get at(i): return the word stored at array index i in Θ(1) time
• StaticArray.set at(i, x): write the word x to array index i in Θ(1) time
The get at(i) and set at(i, x) operations run in constant time because each item in the
array has the same size: one machine word. To store larger objects at an array index, we can
interpret the machine word at the index as a memory address to a larger piece of memory. A Python
tuple is like a static array without set at(i, x). A Python list implements a dynamic array
(see L02).
2
For example, on a typical 32-bit machine, each byte (8 bits) is addressable (for historical reasons), so the size of
the machine’s random-access memory (RAM) is limited to (8 bits) × (2^32) ≈ 4 GB.
1 class StaticArray:
2 def __init__(self, n):
3 self.data = [None] * n
4 def get_at(self, i):
5 if not (0 <= i < len(self.data)): raise IndexError
6 return self.data[i]
7 def set_at(self, i, x):
8 if not (0 <= i < len(self.data)): raise IndexError
9 self.data[i] = x
10
11 def birthday_match(students):
12 '''
13 Find a pair of students with the same birthday
14 Input: tuple of student (name, bday) tuples
15 Output: tuple of student names or None
16 '''
17 n = len(students) # O(1)
18 record = StaticArray(n) # O(n)
19 for k in range(n): # n
20 (name1, bday1) = students[k] # O(1)
21 for i in range(k): # k Check if in record
22 (name2, bday2) = record.get_at(i) # O(1)
23 if bday1 == bday2: # O(1)
24 return (name1, name2) # O(1)
25 record.set_at(k, (name1, bday1)) # O(1)
26 return None # O(1)
This is quadratic in n, which is polynomial! Is this efficient? No! We can do better by using a
different data structure for our record. We will spend the first half of this class studying elementary
data structures, where each data structure will be tailored to support a different set of operations
efficiently.
3
This is a reasonable restriction, which allows names and birthdays to contain O(w) characters from a constant-
sized alphabet. Since w > log_2 n, this restriction still allows each student’s information to be distinct.
Asymptotics Exercises
1. Have students generate 10 functions and order them based on asymptotic growth.
2. Find a simple, tight asymptotic bound for the binomial coefficient (n choose 6006).
   Solution: The definition yields n(n − 1) · · · (n − 6005) in the numerator (a degree-6006
   polynomial in n) and 6006! in the denominator (constant with respect to n), so
   (n choose 6006) = Θ(n^6006).
3. Find a simple, tight asymptotic bound for log_6006((log n^√n)^2).
   Solution: Recall exponent and logarithm rules: log(ab) = log a + log b, log(a^b) = b log a,
   and log_a b = log b / log a. Then
       log_6006((log n^√n)^2) = log((√n log n)^2) / log 6006
                              = Θ(log n^(1/2) + log log n) = Θ(log n).
4. Show that 2^(n+1) ∈ Θ(2^n), but that 2^(2^(n+1)) ∉ O(2^(2^n)).
   Solution: In the first case, 2^(n+1) = 2 · 2^n, which is a constant factor larger than 2^n. In the
   second case, 2^(2^(n+1)) = (2^(2^n))^2, which is definitely more than a constant factor larger
   than 2^(2^n).
5. Show that (log n)^a = O(n^b) for all positive constants a and b.
   Solution: It’s enough to show n^b/(log n)^a limits to ∞ as n → ∞, and this is equivalent to
   arguing that the log of this expression approaches ∞:
       lim_{n→∞} log(n^b/(log n)^a) = lim_{n→∞} (b log n − a log log n) = lim_{x→∞} (bx − a log x) = ∞,
   as desired.
   Note: for the same reasons, n^a = O(c^n) for any c > 1.
6. Show that (log n)^(log n) = Ω(n).
   Solution: Note that m^m = Ω(2^m), so setting n = 2^m completes the proof.
7. Show that (6n)! ∉ Θ(n!), but that log((6n)!) ∈ Θ(log(n!)).
   Solution: We invoke Stirling’s approximation,
       n! = √(2πn) (n/e)^n (1 + Θ(1/n)).
   Substituting in 6n gives an expression that is at least 6^(6n) larger than the original. But taking
   the logarithm of Stirling’s approximation gives log(n!) = Θ(n log n), and substituting in 6n
   yields only constant additional factors.
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 2
Recitation 2
Sequence Implementations
Here, we will discuss three data structures to implement the sequence interface. In Problem Set
1, you will extend both Linked Lists and Dynamic arrays to make both first and last dynamic
operations O(1) time for each. Notice that none of these data structures support dynamic operations
at arbitrary index in sub-linear time. We will learn how to improve this operation in Lecture 7.
Array Sequence
Computer memory is a finite resource. On modern computers many processes may share the same
main memory store, so an operating system will assign a fixed chunk of memory addresses to
each active process. The amount of memory assigned depends on the needs of the process and the
availability of free memory. For example, when a computer program makes a request to store a
variable, the program must tell the operating system how much memory (i.e. how many bits) will
be required to store it. To fulfill the request, the operating system will find the available memory
in the process’s assigned memory address space and reserve it (i.e. allocate it) for that purpose
until it is no longer needed. Memory management and allocation is a detail that is abstracted away
by many high level languages including Python, but know that whenever you ask Python to store
something, Python makes a request to the operating system behind-the-scenes, for a fixed amount
of memory in which to store it.
Now suppose a computer program wants to store two arrays, each storing ten 64-bit words. The
program makes separate requests for two chunks of memory (640 bits each), and the operating
system fulfills the request by, for example, reserving the first ten words of the process’s assigned
address space to the first array A, and the second ten words of the address space to the second array
B. Now suppose that as the computer program progresses, an eleventh word w needs to be added
to array A. It would seem that there is no space near A to store the new word: the beginning of the
process’s assigned address space is to the left of A and array B is stored on the right. Then how
can we add w to A? One solution could be to shift B right to make room for w, but tons of data
may already be reserved next to B, which you would also have to move. Better would be to simply
request eleven new words of memory, copy A to the beginning of the new memory allocation, store
w at the end, and free the first ten words of the process’s address space for future memory requests.
A fixed-length array is the data structure that is the underlying foundation of our model of com-
putation (you can think of your computer’s memory as a big fixed-length array that your operating
system allocates from). Implementing a sequence using an array, where index i in the array cor-
responds to item i in the sequence allows get at and set at to be O(1) time because of our
random access machine. However, when deleting or inserting into the sequence, we need to move
items and resize the array, meaning these operations could take linear-time in the worst case. Below
is a full Python implementation of an array sequence.
1 class Array_Seq:
2 def __init__(self): # O(1)
3 self.A = []
4 self.size = 0
5
6 def __len__(self): return self.size # O(1)
7 def __iter__(self): yield from self.A # O(n) iter_seq
8
9 def build(self, X): # O(n)
10 self.A = [a for a in X] # pretend this builds a static array
11 self.size = len(self.A)
12
13 def get_at(self, i): return self.A[i] # O(1)
14 def set_at(self, i, x): self.A[i] = x # O(1)
15
16 def _copy_forward(self, i, n, A, j): # O(n)
17 for k in range(n):
18 A[j + k] = self.A[i + k]
19
20 def _copy_backward(self, i, n, A, j): # O(n)
21 for k in range(n - 1, -1, -1):
22 A[j + k] = self.A[i + k]
23
24 def insert_at(self, i, x): # O(n)
25 n = len(self)
26 A = [None] * (n + 1)
27 self._copy_forward(0, i, A, 0)
28 A[i] = x
29 self._copy_forward(i, n - i, A, i + 1)
30 self.build(A)
31
1 class Linked_List_Node:
2 def __init__(self, x): # O(1)
3 self.item = x
4 self.next = None
5
6 def later_node(self, i): # O(i)
7 if i == 0: return self
8 assert self.next
9 return self.next.later_node(i - 1)
Such data structures are sometimes called pointer-based or linked and are much more flexible than
array-based data structures because their constituent items can be stored anywhere in memory. A
linked list stores the address of the node storing the first element of the list called the head of the
list, along with the linked list’s size, the number of items stored in the linked list. It is easy to add
an item after another item in the list, simply by changing some addresses (i.e. relinking pointers).
In particular, adding a new item at the front (head) of the list takes O(1) time. However, the only
way to find the ith item in the sequence is to step through the items one-by-one, leading to worst-
case linear time for get at and set at operations. Below is a Python implementation of a full
linked list sequence.
1 class Linked_List_Seq:
2 def __init__(self): # O(1)
3 self.head = None
4 self.size = 0
5
6 def __len__(self): return self.size # O(1)
7
8 def __iter__(self): # O(n) iter_seq
9 node = self.head
10 while node:
11 yield node.item
12 node = node.next
13
14 def build(self, X): # O(n)
15 for a in reversed(X):
16 self.insert_first(a)
17
18 def get_at(self, i): # O(i)
19 node = self.head.later_node(i)
20 return node.item
21
Then how does Python support appending to the end of a length n Python List in worst-case O(1)
time? The answer is simple: it doesn’t. Sometimes appending to the end of a Python List requires
O(n) time to transfer the array to a larger allocation in memory, so sometimes appending to a
Python List takes linear time. However, allocating additional space in the right way can guarantee
that any sequence of n insertions only takes at most O(n) time (i.e. such linear time transfer oper-
ations do not occur often), so insertion will take O(1) time per insertion on average. We call this
asymptotic running time amortized constant time, because the cost of the operation is amortized
(distributed) across many applications of the operation.
To achieve an amortized constant running time for insertion into an array, our strategy will be to
allocate extra space in proportion to the size of the array being stored. Allocating O(n) additional
space ensures that a linear number of insertions must occur before an insertion will overflow the
allocation. A typical implementation of a dynamic array will allocate double the amount of space
needed to store the current array, sometimes referred to as table doubling. However, allocating
any constant fraction of additional space will achieve the amortized bound. Python Lists allocate
additional space according to the following formula (from the Python source code written in C):
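Rendered in Python, the over-allocation rule used by older CPython versions (before the changes
around Python 3.9) is approximately the following; treat the constants as an approximation of the
C source rather than a verbatim quote:

    def python_list_allocation(newsize):
        # roughly newsize/8 extra slots, plus a small constant for very short lists
        extra = (newsize >> 3) + (3 if newsize < 9 else 6)
        return newsize + extra          # total number of slots reserved for newsize items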
Here, the additional allocation is modest, roughly one eighth of the size of the array being appended
(bit shifting the size to the right by 3 is equivalent to floored division by 8). But the additional al-
location is still linear in the size of the array, so on average, n/8 insertions will be performed for
every linear time allocation of the array, i.e. amortized constant time.
What if we also want to remove items from the end of the array? Popping the last item can occur in
constant time, simply by decrementing a stored length of the array (which Python does). However,
if a large number of items are removed from a large list, the unused additional allocation could
occupy a significant amount of wasted memory that will not be available for other purposes. When
the length of the array becomes sufficiently small, we can transfer the contents of the array to a
new, smaller memory allocation so that the larger memory allocation can be freed. How big should
this new allocation be? If we allocate the size of the array without any additional allocation, an
immediate insertion could trigger another allocation. To achieve constant amortized running time
for any sequence of n appends or pops, we need to make sure there remains a linear fraction of
unused allocated space when we rebuild to a smaller array, which guarantees that at least Ω(n)
sequential dynamic operations must occur before the next time we need to reallocate memory.
1 class Dynamic_Array_Seq(Array_Seq):
2 def __init__(self, r = 2): # O(1)
3 super().__init__()
4 self.size = 0
5 self.r = r
6 self._compute_bounds()
7 self._resize(0)
8
9 def __len__(self): return self.size # O(1)
10
11 def __iter__(self): # O(n)
12 for i in range(len(self)): yield self.A[i]
13
14 def build(self, X): # O(n)
15 for a in X: self.insert_last(a)
16
17 def _compute_bounds(self): # O(1)
18 self.upper = len(self.A)
19 self.lower = len(self.A) // (self.r * self.r)
20
21 def _resize(self, n): # O(1) or O(n)
22 if (self.lower < n < self.upper): return
23 m = max(n, 1) * self.r
24 A = [None] * m
25 self._copy_forward(0, self.size, A, 0)
26 self.A = A
27 self._compute_bounds()
28
29 def insert_last(self, x): # O(1)a
30 self._resize(self.size + 1)
31 self.A[self.size] = x
32 self.size += 1
33
34 def delete_last(self): # O(1)a
35 self.A[self.size - 1] = None
36 self.size -= 1
37 self._resize(self.size)
38
39 def insert_at(self, i, x): # O(n)
40 self.insert_last(None)
41 self._copy_backward(i, self.size - (i + 1), self.A, i + 1)
42 self.A[i] = x
43
44 def delete_at(self, i): # O(n)
45 x = self.A[i]
46 self._copy_forward(i + 1, self.size - (i + 1), self.A, i)
47 self.delete_last()
48 return x
49 # O(n)
50 def insert_first(self, x): self.insert_at(0, x)
51 def delete_first(self): return self.delete_at(0)
Exercises:
• Suppose the next pointer of the last node of a linked list points to an earlier node in the list,
creating a cycle. Given a pointer to the head of the list (without knowing its size), describe a
linear-time algorithm to find the number of nodes in the cycle. Can you do this while using
only constant additional space outside of the original linked list?
Solution: Begin with two pointers pointing at the head of the linked list: one slow pointer
and one fast pointer. The pointers take turns traversing the nodes of the linked list, starting
with the fast pointer. On the slow pointer’s turn, the slow pointer simply moves to the next
node in the list; while on the fast pointer’s turn, the fast pointer initially moves to the next
node, but then moves on to the next node’s next node before ending its turn. Every time the
fast pointer visits a node, it checks to see whether it’s the same node that the slow pointer
is pointing to. If they are the same, then the fast pointer must have made a full loop around
the cycle, to meet the slow pointer at some node v on the cycle. Now to find the length of
the cycle, simply have the fast pointer continue traversing the list until returning back to v,
counting the number of nodes visited along the way.
To see that this algorithm runs in linear time, clearly the last step of traversing the cycle takes
at most linear time, as v is the only node visited twice while traversing the cycle. Further,
we claim the slow pointer makes at most one move per node. Suppose for contradiction the
slow pointer moves twice away from some node u before being at the same node as the fast
pointer, meaning that u is on the cycle. In the same time the slow pointer takes to traverse the
cycle from u back to u, the fast pointer will have traveled around the cycle twice, meaning
that both pointers must have existed at the same node prior to the slow pointer leaving u, a
contradiction.
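A sketch of this solution in code, assuming the list is built from Linked_List_Node objects and
does contain a cycle (the function name is illustrative):

    def cycle_length(head):
        slow = fast = head
        while True:
            fast = fast.next.next               # fast takes two steps per turn
            slow = slow.next                    # slow takes one step per turn
            if fast is slow:                    # pointers meet at some node v on the cycle
                break
        v, count = fast, 0
        while True:                             # traverse the cycle once more, counting nodes
            fast = fast.next
            count += 1
            if fast is v:
                return count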
• Given a data structure implementing the Sequence interface, show how to use it to implement
the Set interface. (Your implementation does not need to be efficient.)
Solution:
1 def Set_from_Seq(seq):
2 class set_from_seq:
3 def __init__(self): self.S = seq()
4 def __len__(self): return len(self.S)
5 def __iter__(self): yield from self.S
6
7 def build(self, A):
8 self.S.build(A)
9
10 def insert(self, x):
11 for i in range(len(self.S)):
12 if self.S.get_at(i).key == x.key:
13 self.S.set_at(i, x)
14 return
15 self.S.insert_last(x)
16
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 3
Recitation 3
Recall that in Recitation 2 we reduced the Set interface to the Sequence Interface (we simulated
one with the other). This directly provides a Set data structure from an array (albeit a poor one).
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
We would like to do better, and we will spend the next five lectures/recitations trying to do exactly
that! One of the simplest ways to get a faster Set is to store our items in a sorted array, where the
item with the smallest key appears first (at index 0), and the item with the largest key appears last.
Then we can simply binary search to find keys and support Order operations! This is still not great
for dynamic operations (items still need to be shifted when inserting or removing from the middle
of the array), but finding items by their key is much faster! But how do we get a sorted array in the
first place?
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Sorted Array          ?           log n      n            1             log n
1 class Sorted_Array_Set:
2 def __init__(self): self.A = Array_Seq() # O(1)
3 def __len__(self): return len(self.A) # O(1)
4 def __iter__(self): yield from self.A # O(n)
5 def iter_order(self): yield from self # O(n)
6
    # ... (binary search helper and find methods not shown)
22 def find_min(self): # O(1)
23 if len(self) > 0: return self.A.get_at(0)
24 else: return None
25
Sorting
Sorting an array A of comparable items into increasing order is a common subtask of many com-
putational problems. Insertion sort and selection sort are common sorting algorithms for sorting
small numbers of items because they are easy to understand and implement. Both algorithms are
incremental in that they maintain and grow a sorted subset of the items until all items are sorted.
The difference between them is subtle:
• Selection sort maintains and grows a subset of the largest i items in sorted order.
• Insertion sort maintains and grows a subset of the first i input items in sorted order.
Selection Sort
Here is a Python implementation of selection sort. Having already sorted the largest items into
sub-array A[i+1:], the algorithm repeatedly scans the array for the largest item not yet sorted
and swaps it with item A[i]. As can be seen from the code, selection sort can require Ω(n2 )
comparisons, but will perform at most O(n) swaps in the worst case.
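A minimal sketch consistent with this description (not necessarily the exact code referenced above):

    def selection_sort(A):
        for i in range(len(A) - 1, 0, -1):      # A[i+1:] already holds the largest items in order
            m = i                               # index of the largest item seen so far in A[:i+1]
            for j in range(i):                  # scan the unsorted prefix for a larger item
                if A[m] < A[j]:
                    m = j
            A[m], A[i] = A[i], A[m]             # one swap per outer iteration: O(n) swaps total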
Insertion Sort
Here is a Python implementation of insertion sort. Having already sorted sub-array A[:i], the
algorithm repeatedly swaps item A[i] with the item to its left until the left item is no larger than
A[i]. As can be seen from the code, insertion sort can require Ω(n2 ) comparisons and Ω(n2 )
swaps in the worst case.
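A minimal sketch consistent with this description (not necessarily the exact code referenced above):

    def insertion_sort(A):
        for i in range(1, len(A)):              # A[:i] already sorted
            j = i
            while j > 0 and A[j] < A[j - 1]:    # swap A[j] left until the left item is no larger
                A[j - 1], A[j] = A[j], A[j - 1]
                j -= 1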
Merge Sort
In lecture, we introduced merge sort, an asymptotically faster algorithm for sorting large numbers
of items. The algorithm recursively sorts the left and right half of the array, and then merges the
two halves in linear time. The recurrence relation for merge sort is then T (n) = 2T (n/2) + Θ(n),
which solves to T (n) = Θ(n log n). An Θ(n log n) asymptotic growth rate is much closer to
linear than quadratic, as log n grows exponentially slower than n. In particular, log n grows slower
than any polynomial nε for ε > 0.
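A sketch of such an implementation (not necessarily the exact code referenced in this recitation),
merging with a linear-size temporary copy:

    def merge_sort(A, a=0, b=None):             # sort sub-array A[a:b] in place
        if b is None:
            b = len(A)
        if b - a > 1:
            c = (a + b + 1) // 2                # split index
            merge_sort(A, a, c)                 # recursively sort left half
            merge_sort(A, c, b)                 # recursively sort right half
            temp = A[a:b]                       # linear temporary storage
            L, R = temp[:c - a], temp[c - a:]   # the two sorted halves
            i = j = 0
            for k in range(a, b):               # merge the halves back into A[a:b]
                # on ties this takes from R first, so this particular merge is not stable
                if j >= len(R) or (i < len(L) and L[i] < R[j]):
                    A[k], i = L[i], i + 1
                else:
                    A[k], j = R[j], j + 1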
Merge sort uses a linear amount of temporary storage (temp) when combining the two halves, so
it is not in-place. While there exist algorithms that perform merging using no additional space,
such implementations are substantially more complicated than the merge sort algorithm. Whether
merge sort is stable depends on how an implementation breaks ties when merging. The above
implementation is not stable, but it can be made stable with only a small modification. Can you
modify the implementation to make it stable? We’ve made CoffeeScript visualizers for the merge
step of this algorithm, as well as one showing the recursive call structure. You can find them here:
https://codepen.io/mit6006/pen/RYJdOG https://codepen.io/mit6006/pen/wEXOOq
Recurrences
There are three primary methods for solving recurrences:
• Recursion Tree: Draw a tree representing the recurrence and sum computation at nodes.
This is a very general method, and is the one we’ve used in lecture so far.
• Master Theorem: A general formula to solve a large class of recurrences. It is useful, but
can also be hard to remember.
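• Substitution: Guess a closed form for the solution, then verify it by substituting the guess into
the recurrence and checking it by induction; this is the method used below for the Master
Theorem’s polynomial special case.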
Master Theorem
The Master Theorem provides a way to solve recurrence relations in which recursive calls de-
crease problem size by a constant factor. Given a recurrence relation of the form T (n) = aT (n/b)+
f (n) and T (1) = Θ(1), with branching factor a ≥ 1, problem size reduction factor b > 1, and
asymptotically non-negative function f (n), the Master Theorem gives the solution to the recur-
rence by comparing f (n) to alogb n = nlogb a , the number of leaves at the bottom of the recursion
tree. When f (n) grows asymptotically faster than nlogb a , the work done at each level decreases
geometrically so the work at the root dominates; alternatively, when f (n) grows slower, the work
done at each level increases geometrically and the work at the leaves dominates. When their growth
rates are comparable, the work is evenly spread over the tree’s O(log n) levels.
    size n          1 node                work 1 × f(n)
    size n/b        a nodes               work a × f(n/b)
    size n/b^i      a^i nodes             work a^i × f(n/b^i)
    size 1          a^(log_b n) nodes     work a^(log_b n) × f(1)
The Master Theorem takes on a simpler form when f (n) is a polynomial, such that the recurrence
has the form T (n) = aT (n/b) + Θ(n^c) for some constant c ≥ 0.
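In the standard statement of this special case, the solution depends on how c compares to log_b a:
    Case 1: if c < log_b a, then T(n) = Θ(n^(log_b a))    (the leaves dominate)
    Case 2: if c = log_b a, then T(n) = Θ(n^c log n)      (work is balanced across the levels)
    Case 3: if c > log_b a, then T(n) = Θ(n^c)            (the root dominates)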
This special case is straightforward to prove by substitution (this can be done in recitation). To
apply the Master Theorem (or this simpler special case), you should state which case applies, and
show that your recurrence relation satisfies all conditions required by the relevant case. There are
even stronger (more general) formulas1 to solve recurrences, but we will not use them in this class.
Exercises
1. Write a recurrence for binary search and solve it.
Solution: T (n) = T (n/2) + O(1) so T (n) = O(log n) by case 2 of Master Theorem.
2. T (n) = T (n − 1) + O(1)
Solution: T (n) = O(n), length n chain, O(1) work per node.
3. T (n) = T (n − 1) + O(n)
Solution: T (n) = O(n2 ), length n chain, O(k) work per node at height k.
4. T (n) = 2T (n − 1) + O(1)
Solution: T (n) = O(2n ), height n binary tree, O(1) work per node.
1
http://en.wikipedia.org/wiki/Akra-Bazzi_method
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 4
Recitation 4
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
Sorted Array          n log n     log n      n            1             log n
We’ve learned how to implement a set interface using a sorted array, where query operations are
efficient but whose dynamic operations are lacking. Recalling that Θ(log n) growth is much closer
to Θ(1) than Θ(n), a sorted array provides really good performance! But one of the most common
operations you will do in programming is to search for something you’re storing, i.e., find(k).
Is it possible to find faster than Θ(log n)? It turns out that if the only thing we can do to items is
to compare their relative order, then the answer is no!
Comparison Model
The comparison model of computation acts on a set of comparable objects. The objects can be
thought of as black boxes, supporting only a set of binary boolean operations called comparisons
(namely <, ≤, >, ≥, =, and ≠). Each operation takes as input two objects and outputs a Boolean
value, either True or False, depending on the relative ordering of the elements. A search algorithm
operating on a set of n items will return a stored item with a key equal to the input key, or return
no item if no such item exists. In this section, we assume that each item has a unique key.
If binary comparisons are the only way to distinguish between stored items and a search key, a
deterministic comparison search algorithm can be thought of as a fixed binary decision tree rep-
resenting all possible executions of the algorithm, where each node represents a comparison per-
formed by the algorithm. During execution, the algorithm walks down the tree along a path from
the root. For any given input, a comparison search algorithm will make some comparison first, the
comparison at the root of the tree. Depending on the outcome of this comparison, the computation
will then proceed with a comparison at one of its two children. The algorithm repeatedly makes
comparisons until a leaf is reached, at which point the algorithm terminates and returns an output.
There must be a leaf for each possible output of the algorithm. For search, there are
n + 1 possible outputs, the n items and the result where no item is found, so there must be at least
n + 1 leaves in the decision tree. Then the worst-case number of comparisons that must be made
by any comparison search algorithm will be the height of the algorithm’s decision tree, i.e., the
length of any longest root to leaf path.
Exercise: Prove that the smallest height for any tree on n nodes is ⌈lg(n + 1)⌉ − 1 = Ω(log n).
Solution: We show that the maximum number of nodes in any binary tree with height h is
n ≤ T(h) = 2^(h+1) − 1, so h ≥ lg(n + 1) − 1. Proof by induction on h. The only tree of
height zero has one node, so T(0) = 1, a base case satisfying the claim. A height-h tree with the
maximum number of nodes must also have the maximum number of nodes in its two subtrees, so
T(h) = 2T(h − 1) + 1. Substituting T(h) yields 2^(h+1) − 1 = 2(2^h − 1) + 1, proving the claim.
A tree with n + 1 leaves has more than n nodes, so its height is at least Ω(log n). Thus the min-
imum number of comparisons needed to distinguish between the n items is at least Ω(log n), and
the worst-case running time of any deterministic comparison search algorithm is at least Ω(log n)!
So sorted arrays and balanced BSTs are able to support find(k) asymptotically optimally, in a
comparison model of computation.
Comparisons are very limiting because each operation performed can lead to at most constant
branching factor in the decision tree. It doesn’t matter that comparisons have branching factor
two; any fixed constant branching factor will lead to a decision tree with at least Ω(log n) height.
Not being limited to comparisons opens up the possibility of faster-than-O(log n) search.
More specifically, if we can use an operation that allows for asymptotically larger than constant
ω(1) branching factor, then our decision tree could be shallower, leading to a faster algorithm.
Now suppose we want to store a set of n items, each associated with a unique integer key in the
bounded range from 0 to some large number u − 1. We can store the items in a length u direct
access array, where each array slot i contains an item associated with integer key i, if it exists. To
find an item having integer key i, a search algorithm can simply look in array slot i to respond to
the search query in worst-case constant time! However, order operations on this data structure
will be very slow: we have no guarantee on where the first, last, or next element is in the direct
access array, so we may have to spend u time for order operations.
Worst-case constant time search comes at the cost of storage space: a direct access array must have
a slot available for every possible key in range. When u is very large compared to the number of
items being stored, storing a direct access array can be wasteful, or even impossible on modern
machines. For example, suppose you wanted to support the set find(k) operation on ten-letter
names using a direct access array. The space of possible names would be u ≈ 26^10 ≈ 1.4 × 10^14;
even storing a bit array of that length would require 17.6 Terabytes of storage space. How can we
overcome this obstacle? The answer is hashing!
1 class DirectAccessArray:
2 def __init__(self, u): self.A = [None] * u # O(u)
3 def find(self, k): return self.A[k] # O(1)
4 def insert(self, x): self.A[x.key] = x # O(1)
5 def delete(self, k): self.A[k] = None # O(1)
6 def find_next(self, k):
7 for i in range(k, len(self.A)): # O(u)
8 if self.A[i] is not None:
9 return self.A[i]
10 def find_max(self):
11 for i in range(len(self.A) - 1, -1, -1): # O(u)
12 if self.A[i] is not None:
13 return self.A[i]
14 def delete_max(self):
15 for i in range(len(self.A) - 1, -1, -1): # O(u)
16 x = self.A[i]
17 if x is not None:
18 self.A[i] = None
19 return x
Hashing
Is it possible to get the performance benefits of a direct access array while using only linear O(n)
space when n ≪ u? A possible solution could be to store the items in a smaller dynamic direct
access array, with m = O(n) slots instead of u, which grows and shrinks like a dynamic array
depending on the number of items stored. But to make this work, we need a function that maps
item keys to different slots of the direct access array, h(k) : {0, . . . , u − 1} → {0, . . . , m − 1}. We
call such a function a hash function or a hash map, while the smaller direct access array is called
a hash table, and h(k) is the hash of integer key k. If the hash function happens to be injective
over the n keys you are storing, i.e. no two keys map to the same direct access array index, then
we will be able to support worst-case constant time search, as the hash table simply acts as a direct
access array over the smaller domain m.
Unfortunately, if the space of possible keys is larger than the number of array indices, i.e. m < u,
then any hash function mapping u possible keys to m indices must map multiple keys to the same
array index, by the pigeonhole principle. If two items associated with keys k1 and k2 hash to the
same index, i.e. h(k1 ) = h(k2 ), we say that the hashes of k1 and k2 collide. If you don’t know in
advance what keys will be stored, it is extremely unlikely that your choice of hash function will
avoid collisions entirely1 . If the smaller direct access array hash table can only store one item at
each index, when collisions occur, where do we store the colliding items? Either we store collisions
somewhere else in the same direct access array, or we store collisions somewhere else. The first
strategy is called open addressing, which is the way most hash tables are actually implemented,
but such schemes can be difficult to analyze. We will adopt the second strategy called chaining.
Chaining
Chaining is a collision resolution strategy where colliding keys are stored separately from the orig-
inal hash table. Each hash table index holds a pointer to a chain, a separate data structure that sup-
ports the dynamic set interface, specifically operations find(k), insert(x), and delete(k).
It is common to implement a chain using a linked list or dynamic array, but any implementation
will do, as long as each operation takes no more than linear time. Then to insert item x into the
hash table, simply insert x into the chain at index h(x.key); and to find or delete a key k from
the hash table, simply find or delete k from the chain at index h(k).
Ideally, we want chains to be small, because if our chains only hold a constant number of items,
the dynamic set operations will run in constant time. But suppose we are unlucky in our choice of
hash function, and all the keys we want to store hash to the same index location, i.e., into the
same chain. Then the chain will have linear size, meaning the dynamic set operations could take
linear time. A good hash function will try to minimize the frequency of such collisions in order to
minimize the maximum size of any chain. So what’s a good hash function?
Hash Functions
Division Method (bad): The simplest mapping from an integer key domain of size u to a smaller
one of size m is simply to divide the key by m and take the remainder: h(k) = (k mod m), or in
Python, k % m. If the keys you are storing are uniformly distributed over the domain, the division
method will distribute items roughly evenly among hashed indices, so we expect chains to have
small size providing good performance. However, if all items happen to have keys with the same
remainder when divided by m, then this hash function will be terrible. Ideally, the performance of
our data structure would be independent of the keys we choose to store.
1
If you know all of the keys you will want to store in advance, it is possible to design a hashing scheme that will
always avoid collisions between those keys. This idea, called perfect hashing, follows from the Birthday Paradox.
Universal Hashing (good): For a large enough key domain u, every hash function will be bad for
some set of n inputs2 . However, we can achieve good expected bounds on hash table performance
by choosing our hash function randomly from a large family of hash functions. Here the expecta-
tion is over our choice of hash function, which is independent of the input. This is not expectation
over the domain of possible input keys. One family of hash functions that performs well is:
    H(m, p) = { h_ab(k) = (((a k + b) mod p) mod m) | a, b ∈ {0, . . . , p − 1} and a ≠ 0 },
where p is a prime that is larger than the key domain u. A single hash function from this family
is specified by choosing concrete values for a and b. This family of hash functions is universal3 :
for any two keys, the probability that their hashes will collide when hashed using a hash function
chosen uniformly at random from the universal family, is no greater than 1/m, i.e.
    Pr_{h∈H} { h(k_i) = h(k_j) } ≤ 1/m,   for all k_i ≠ k_j ∈ {0, . . . , u − 1}.
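As a small illustration, one can sample a hash function from this family as follows (the prime
2^31 − 1 matches the choice made in the Hash_Table_Set code below; parameter names are
illustrative, and the key domain u is assumed to be smaller than p):

    from random import randint

    def random_hash_from_family(m, p=2**31 - 1):
        a = randint(1, p - 1)                       # a != 0
        b = randint(0, p - 1)
        return lambda k: ((a * k + b) % p) % m      # h_ab(k)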
If we know that a family of hash functions is universal, then we can upper bound the expected
size of any chain, in expectation over our choice of hash function from the family. Let Xij be
the indicator random variable representing the value 1 if keys ki and kj collide for a chosen hash
function, and 0 otherwise. Then the random variable representing the number of items hashed to
index h(k_i) will be the sum X_i = Σ_j X_ij over all keys k_j from the set of n keys {k_0, . . . , k_{n−1}}
stored in the hash table. Then the expected number of keys hashed to the chain at index h(ki ) is:
    E_{h∈H}{X_i} = E_{h∈H}{ Σ_j X_ij } = Σ_j E_{h∈H}{X_ij} = 1 + Σ_{j≠i} E_{h∈H}{X_ij}
                 = 1 + Σ_{j≠i} [ (1) · Pr_{h∈H}{h(k_i) = h(k_j)} + (0) · Pr_{h∈H}{h(k_i) ≠ h(k_j)} ]
                 ≤ 1 + Σ_{j≠i} 1/m = 1 + (n − 1)/m.
If the size of the hash table is at least linear in the number of items stored, i.e. m = Ω(n), then
the expected size of any chain will be 1 + (n − 1)/Ω(n) = O(1), a constant! Thus a hash table
where collisions are resolved using chaining, implemented using a randomly chosen hash function
from a universal family, will perform dynamic set operations in expected constant time, where
the expectation is taken over the random choice of hash function, independent from the input keys!
Note that in order to maintain m = O(n), insertion and deletion operations may require you to
rebuild the direct access array to a different size, choose a new hash function, and reinsert all the
items back into the hash table. This can be done in the same way as in dynamic arrays, leading to
amortized bounds for dynamic operations.
2
If u > nm, every hash function from u to m maps some n keys to the same hash, by the pigeonhole principle.
3
The proof that this family is universal is beyond the scope of 6.006, though it is usually derived in 6.046.
1 class Hash_Table_Set:
2 def __init__(self, r = 200): # O(1)
3 self.chain_set = Set_from_Seq(Linked_List_Seq)
4 self.A = []
5 self.size = 0
6 self.r = r # 100/self.r = fill ratio
7 self.p = 2**31 - 1
8 self.a = randint(1, self.p - 1)
9 self._compute_bounds()
10 self._resize(0)
11
12 def __len__(self): return self.size # O(1)
13 def __iter__(self): # O(n)
14 for X in self.A:
15 yield from X
16
17 def build(self, X): # O(n)e
18 for x in X: self.insert(x)
19
20 def _hash(self, k, m): # O(1)
21 return ((self.a * k) % self.p) % m
22
23 def _compute_bounds(self): # O(1)
24 self.upper = len(self.A)
25 self.lower = len(self.A) * 100*100 // (self.r*self.r)
26
27 def _resize(self, n): # O(n)
28 if (self.lower >= n) or (n >= self.upper):
29 f = self.r // 100
30 if self.r % 100: f += 1
31 # f = ceil(r / 100)
32 m = max(n, 1) * f
33 A = [self.chain_set() for _ in range(m)]
34 for x in self:
35 h = self._hash(x.key, m)
36 A[h].insert(x)
37 self.A = A
38 self._compute_bounds()
39
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
Sorted Array          n log n     log n      n            1             log n
Direct Access Array   u           1          1            u             u
Hash Table            n(e)        1(e)       1(a)(e)      n             n
Exercise
Given an unsorted array A = [a_0, . . . , a_{n−1}] containing n positive integers, the DUPLICATES prob-
lem asks whether two integers in the array have the same value.
4
In 6.006, you do not have to specify these details when answering problems. You may simply quote that hash
tables can achieve the expected/amortized bounds for operations described in class.
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 5
Recitation 5
Comparison Sorting
Last time we discussed a lower bound on search in a comparison model. We can use a similar
analysis to lower bound the worst-case running time of any sorting algorithm that only uses com-
parisons. There are n! possible outputs to a sorting algorithm: the n! permutations of the items.
Then the decision tree for any deterministic sorting algorithm that uses only comparisons must
have at least n! leaves, and thus (by the same analysis as the search decision tree) must have height
at least Ω(log(n!)) = Ω(n log n)1, leading to a running time of at least Ω(n log n).
1 def direct_access_sort(A):
2 "Sort A assuming items have distinct non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [None] * u # O(u) direct access array
5 for x in A: # O(n) insert items
6 D[x.key] = x
7 i = 0
8 for key in range(u): # O(u) read out items in order
9 if D[key] is not None:
10 A[i] = D[key]
11 i += 1
1
We can prove this directly via Stirling’s approximation, n! ≈ √(2πn) (n/e)^n, or by observing that n! > (n/2)^(n/2).
Counting Sort
To solve the first problem, we simply link a chain to each direct access array index, just like in
hashing. When multiple items have the same key, we store them both in the chain associated with
their key. Later, it will be important that this algorithm be stable: that items with duplicate keys
appear in the same order in the output as the input. Thus, we choose chains that will support
a sequence queue interface to keep items in order, inserting to the end of the queue, and then
returning items back in the order that they were inserted.
1 def counting_sort(A):
2 "Sort A assuming items have non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [[] for i in range(u)] # O(u) direct access array of chains
5 for x in A: # O(n) insert into chain at x.key
6 D[x.key].append(x)
7 i = 0
8 for chain in D: # O(u) read out items in order
9 for x in chain:
10 A[i] = x
11 i += 1
Counting sort takes O(u) time to initialize the chains of the direct access array, O(n) time to insert
all the elements, and then O(u) time to scan back through the direct access array to return the
items; so the algorithm runs in O(n + u) time. Again, when u = O(n), then counting sort runs in
linear time, but this time allowing duplicate keys.
There’s another implementation of counting sort which just keeps track of how many of each key
map to each index, and then moves each item only once, rather than the implementation above, which
moves each item into a chain and then back into place. The implementation below computes the
final index location of each item via cumulative sums.
1 def counting_sort(A):
2 "Sort A assuming items have non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [0] * u # O(u) direct access array
5 for x in A: # O(n) count keys
6 D[x.key] += 1
7 for k in range(1, u): # O(u) cumulative sums
8 D[k] += D[k - 1]
9 for x in list(reversed(A)): # O(n) move items into place
10 A[D[x.key] - 1] = x
11 D[x.key] -= 1
Now what if we want to sort keys from a larger integer range? Our strategy will be to break up
integer keys into parts, and then sort each part! In order to do that, we will need a sorting strategy
to sort tuples, i.e. multiple parts.
Tuple Sort
Suppose we want to sort tuples, each containing many different keys (e.g. x.k1 , x.k2 , x.k3 , . . .), so
that the sort is lexicographic with respect to some ordering of the keys (e.g. that key k1 is more
important than key k2 is more important than key k3 , etc.). Then tuple sort uses a stable sorting
algorithm as a subroutine to repeatedly sort the objects, first according to the least important key,
then the second least important key, all the way up to most important key, thus lexicographically
sorting the objects. Tuple sort is similar to how one might sort on multiple rows of a spreadsheet
by different columns. However, tuple sort will only be correct if the sorting from previous rounds
are maintained in future rounds. In particular, tuple sort requires the subroutine sorting algorithms
be stable.
Radix Sort
Now, to increase the range of integer sets that we can sort in linear time, we break each integer up
into its multiples of powers of n, representing each item key by its sequence of digits when represented
in base n. If the integers are non-negative and the largest integer in the set is u, then this base n
number will have dlogn ue digits. We can think of these digit representations as tuples and sort
them with tuple sort by sorting on each digit in order from least significant to most significant digit
using counting sort. This combination of tuple sort and counting sort is called radix sort. If the
largest integer in the set u ≤ nc , then radix sort runs in O(nc) time. Thus, if c is constant, then
radix sort also runs in linear time!
1 def radix_sort(A):
2 "Sort A assuming items have non-negative keys"
3 n = len(A)
4 u = 1 + max([x.key for x in A]) # O(n) find maximum key
5 c = 1 + (u.bit_length() // n.bit_length())
6 class Obj: pass
7 D = [Obj() for a in A]
8 for i in range(n): # O(nc) make digit tuples
9 D[i].digits = []
10 D[i].item = A[i]
11 high = A[i].key
12 for j in range(c): # O(c) make digit tuple
13 high, low = divmod(high, n)
14 D[i].digits.append(low)
15 for i in range(c): # O(nc) sort each digit
16 for j in range(n): # O(n) assign key i to tuples
17 D[j].key = D[j].digits[i]
18 counting_sort(D) # O(n) sort on digit i
19 for i in range(n): # O(n) output to A
20 A[i] = D[i].item
We’ve made a CoffeeScript Counting/Radix sort visualizer which you can find here:
https://codepen.io/mit6006/pen/LgZgrd
Exercises
1) Sort the following integers using a base-10 radix sort.
(329, 457, 657, 839, 436, 720, 355) −→ (329, 355, 436, 457, 657, 720, 839)
2) Describe a linear time algorithm to sort n integers from the range [−n^2, . . . , n^3].
Solution: Add n^2 to each number so integers are all positive, apply Radix sort, and then subtract
n^2 from each element of the output.
3) Describe a linear time algorithm to sort a set n of strings, each having k English characters.
Solution: Use tuple sort to repeatedly sort the strings by each character from right to left with
counting sort, using the integers {0, . . . , 25} to represent the English alphabet. There are k rounds
of counting sort, and each round takes Θ(n + 26) = Θ(n) time, thus the algorithm runs in Θ(nk)
time. This running time is linear because the input size is Θ(nk).
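A minimal sketch of this algorithm in code (the function name sort_strings is an assumption, and the strings are assumed to be lowercase and all of length k):

def sort_strings(S, k):
    "Sort equal-length lowercase strings lexicographically in O(nk) time"
    S = list(S)
    for j in reversed(range(k)):            # least to most significant character
        buckets = [[] for _ in range(26)]   # counting sort on character j
        for s in S:
            buckets[ord(s[j]) - ord('a')].append(s)
        S = [s for bucket in buckets for s in bucket]   # stable read-out in order
    return S

Because each pass reads the buckets in order and appends strings in their current order, the sort from previous rounds is preserved, which is exactly the stability that tuple sort requires.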
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 6
Recitation 6
Binary Trees
A binary tree is a tree (a connected graph with no cycles) of binary nodes: a linked node con-
tainer, similar to a linked list node, having a constant number of fields:
• a pointer to an item stored at the node,
• a pointer to a parent node (possibly None),
• a pointer to a left child node (possibly None), and
• a pointer to a right child node (possibly None).
1 class Binary_Node:
2 def __init__(A, x): # O(1)
3 A.item = x
4 A.left = None
5 A.right = None
6 A.parent = None
7 # A.subtree_update() # wait for R07!
Why is a binary node called “binary”? In actuality, a binary node can be connected to three other
nodes (its parent, left child, and right child), not just two. However, we will differentiate a node’s
parent from its children, and so we call the node “binary” based on the number of children the
node has.
A binary tree has one node that is the root of the tree: the only node in the tree lacking a
parent. All other nodes in the tree can reach the root of the tree containing them by traversing
parent pointers. The set of nodes passed when traversing parent pointers from node <X> back to
the root is called the ancestors of <X> in the tree. The depth of a node <X> in the subtree rooted
at <R> is the length of the path from <X> back to <R>. The height of node <X> is the maximum
depth of any node in the subtree rooted at <X>. If a node has no children, it is called a leaf.
Why would we want to store items in a binary tree? The difficulty with a linked list is that many
linked-list nodes can be O(n) pointer hops away from the head of the list, so it may take O(n)
time to reach them. By contrast, as we’ve seen in earlier recitations, it is possible to construct a
binary tree on n nodes such that no node is more than O(log n) pointer hops away from the root,
i.e., there exist binary trees with logarithmic height. The power of a binary tree structure is that if we
can keep the height h of the tree low, i.e., O(log n), and only perform operations on the tree that
run in time on the order of the height of the tree, then these operations will run in O(h) = O(log n)
time (which is much closer to O(1) than to O(n)).
Traversal Order
The nodes in a binary tree have a natural order based on the fact that we distinguish one child to
be left and one child to be right. We define a binary tree’s traversal order based on the following
implicit characterization:
• every node in the left subtree of node <X> comes before <X> in the traversal order; and
• every node in the right subtree of node <X> comes after <X> in the traversal order.
Given a binary node <A>, we can list the nodes in <A>’s subtree by recursively listing the nodes in
<A>’s left subtree, listing <A> itself, and then recursively listing the nodes in <A>’s right subtree.
This algorithm runs in O(n) time because every node is recursed on once doing constant work.
Right now, there is no semantic connection between the items being stored and the traversal order
of the tree. Next time, we will provide two different semantic meanings to the traversal order (one
of which will lead to an efficient implementation of the Sequence interface, and the other will lead
to an efficient implementation of the Set interface), but for now, we will just want to preserve the
traversal order as we manipulate the tree.
Tree Navigation
Given a binary tree, it will be useful to be able to navigate the nodes in their traversal order effi-
ciently. Probably the most straightforward operation is to find the node in a given node’s subtree
that appears first (or last) in traversal order. To find the first node, repeatedly walk to a left child
while one exists. This operation takes O(h) time because each step of the recursion moves down the tree.
Finding the last node in a subtree is symmetric.
Given a node in a binary tree, it would also be useful to find the next node in the traversal order,
i.e., the node’s successor, or the previous node in the traversal order, i.e., the node’s predecessor.
To find the successor of a node <A>, if <A> has a right child, then <A>’s successor will be the first
node in the right child’s subtree. Otherwise, <A>’s successor cannot exist in <A>’s subtree, so we
walk up the tree to find the lowest ancestor of <A> such that <A> is in the ancestor’s left subtree.
In the first case, the algorithm only walks down the tree to find the successor, so it runs in O(h)
time. Alternatively in the second case, the algorithm only walks up the tree to find the successor,
so it also runs in O(h) time. The predecessor algorithm is symmetric.
Dynamic Operations
If we want to add or remove items in a binary tree, we must take care to preserve the traversal order
of the other items in the tree. To insert a node <B> before a given node <A> in the traversal order,
there are two cases: either node <A> has a left child or it does not. If <A> does not have a left child, then we can simply add
<B> as the left child of <A>. Otherwise, if <A> has a left child, we can add <B> as the right child of
the last node in <A>’s left subtree (which cannot have a right child). In either case, the algorithm
walks down the tree at each step, so the algorithm runs in O(h) time. Inserting after is symmetric.
To delete the item contained in a given node from its binary tree, there are two cases based on
whether the node storing the item is a leaf. If the node is a leaf, then we can simply clear the
child pointer from the node’s parent and return the node. Alternatively, if the node is not a leaf, we
can swap the node’s item with the item in the node’s successor or predecessor down the tree until
the item is in a leaf which can be removed. Since swapping only occurs down the tree, again this
operation runs in O(h) time.
1 class Binary_Node:
2 def __init__(A, x): # O(1)
3 A.item = x
4 A.left = None
5 A.right = None
6 A.parent = None
7 # A.subtree_update() # wait for R07!
8
9 def subtree_iter(A): # O(n)
10 if A.left: yield from A.left.subtree_iter()
11 yield A
12 if A.right: yield from A.right.subtree_iter()
13
14 def subtree_first(A): # O(h)
15 if A.left: return A.left.subtree_first()
16 else: return A
17
18 def subtree_last(A): # O(h)
19 if A.right: return A.right.subtree_last()
20 else: return A
21
22 def successor(A): # O(h)
23 if A.right: return A.right.subtree_first()
24 while A.parent and (A is A.parent.right):
25 A = A.parent
26 return A.parent
27
28 def predecessor(A): # O(h)
29 if A.left: return A.left.subtree_last()
30 while A.parent and (A is A.parent.left):
31 A = A.parent
32 return A.parent
33
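The remaining dynamic operations described earlier (insert before, insert after, and delete) continue this listing; a sketch consistent with that description (with the subtree_update hook again deferred to R07) is:

    # additional Binary_Node methods (sketch)
    def subtree_insert_before(A, B):        # O(h)
        if A.left:                          # B goes after the last node of A's left subtree
            A = A.left.subtree_last()
            A.right, B.parent = B, A
        else:                               # B becomes A's left child
            A.left, B.parent = B, A
        # A.subtree_update()                # wait for R07!

    def subtree_insert_after(A, B):         # O(h)
        if A.right:                         # B goes before the first node of A's right subtree
            A = A.right.subtree_first()
            A.left, B.parent = B, A
        else:                               # B becomes A's right child
            A.right, B.parent = B, A
        # A.subtree_update()                # wait for R07!

    def subtree_delete(A):                  # O(h)
        if A.left or A.right:               # A is not a leaf: swap its item downward
            if A.left: B = A.predecessor()
            else:      B = A.successor()
            A.item, B.item = B.item, A.item
            return B.subtree_delete()
        if A.parent:                        # A is a leaf: detach it from its parent
            if A.parent.left is A: A.parent.left = None
            else:                  A.parent.right = None
            # A.parent.subtree_update()     # wait for R07!
        return A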
1 class Binary_Tree:
2 def __init__(T, Node_Type = Binary_Node):
3 T.root = None
4 T.size = 0
5 T.Node_Type = Node_Type
6
7 def __len__(T): return T.size
8 def __iter__(T):
9 if T.root:
10 for A in T.root.subtree_iter():
11 yield A.item
Exercise: Given an array of items A = (a_0, . . . , a_{n−1}), describe an O(n)-time algorithm to con-
struct a binary tree T containing the items in A such that (1) the item stored in the ith node of T’s
traversal order is item a_i, and (2) T has height O(log n).
Solution: Build T by storing the middle item in a root node, and then recursively building the
remaining left and right halves in left and right subtrees. This algorithm satisfies property (1) by
definition of traversal order, and property (2) because the height roughly follows the recurrence
H(n) = 1 + H(n/2). The algorithm runs in O(n) time because every node is recursed on once
doing constant work.
1 def build(self, X): # a Binary_Tree method
2 A = [x for x in X]
3 def build_subtree(A, i, j):
4 c = (i + j) // 2
5 root = self.Node_Type(A[c])
6 if i < c: # needs to store more items in left subtree
7 root.left = build_subtree(A, i, c - 1)
8 root.left.parent = root
9 if c < j: # needs to store more items in right subtree
10 root.right = build_subtree(A, c + 1, j)
11 root.right.parent = root
12 return root
13 self.root = build_subtree(A, 0, len(A)-1)
Exercise: Argue that the following iterative procedure to return the nodes of a tree in traversal
order takes O(n) time.
1 def tree_iter(T):
2 node = T.root.subtree_first() if T.root else None # T is a Binary_Tree
3 while node:
4 yield node
5 node = node.successor()
Solution: This procedure walks around the tree traversing each edge of the tree twice: once going
down the tree, and once going back up. Then because the number of edges in a tree is one fewer
than the number of nodes, the traversal takes O(n) time.
Application: Set
To use a Binary Tree to implement a Set interface, we use the traversal order of the tree to store the
items sorted in increasing key order. This property is often called the Binary Search Tree Prop-
erty, where keys in a node’s left subtree are less than the key stored at the node, and keys in the
node’s right subtree are greater than the key stored at the node. Then finding the node containing
a query key (or determining that no node contains the key) can be done by walking down the tree,
recursing on the appropriate side.
Exercise: Make a Set Binary Tree (Binary Search Tree) by inserting student-chosen items one by
one, then searching and/or deleting student-chosen keys one by one.
1 class BST_Node(Binary_Node):
2 def subtree_find(A, k): # O(h)
3 if k < A.item.key:
4 if A.left: return A.left.subtree_find(k)
5 elif k > A.item.key:
6 if A.right: return A.right.subtree_find(k)
7 else: return A
8 return None
9
10 def subtree_find_next(A, k): # O(h)
11 if A.item.key <= k:
12 if A.right: return A.right.subtree_find_next(k)
13 else: return None
14 elif A.left:
15 B = A.left.subtree_find_next(k)
16 if B: return B
17 return A
18
19 def subtree_find_prev(A, k): # O(h)
20 if A.item.key >= k:
21 if A.left: return A.left.subtree_find_prev(k)
22 else: return None
23 elif A.right:
24 B = A.right.subtree_find_prev(k)
25 if B: return B
26 return A
27
28 def subtree_insert(A, B): # O(h)
29 if B.item.key < A.item.key:
30 if A.left: A.left.subtree_insert(B)
31 else: A.subtree_insert_before(B)
32 elif B.item.key > A.item.key:
33 if A.right: A.right.subtree_insert(B)
34 else: A.subtree_insert_after(B)
35 else: A.item = B.item
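A Set interface wrapper tying BST_Node to the Binary_Tree class might look like the following sketch (the class name Set_Binary_Tree and the details shown are assumptions; for simplicity the sketch assumes distinct keys):

class Set_Binary_Tree(Binary_Tree):         # Binary Tree Set (sketch)
    def __init__(self): super().__init__(BST_Node)

    def build(self, X):                     # O(nh) by repeated insertion
        for x in X: self.insert(x)

    def find(self, k):                      # O(h)
        if self.root is None: return None
        node = self.root.subtree_find(k)
        return node.item if node else None

    def insert(self, x):                    # O(h) (assumes x.key is not already present)
        new_node = self.Node_Type(x)
        if self.root: self.root.subtree_insert(new_node)
        else:         self.root = new_node
        self.size += 1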
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 7
Recitation 7
There are many ways to keep a binary tree balanced under insertions and deletions (Red-Black
Trees, B-Trees, 2-3 Trees, Splay Trees, etc.). The oldest (and perhaps simplest) method is called
an AVL Tree. Every node of an AVL Tree is height-balanced (i.e., satisfies the AVL Property)
where the left and right subtrees of a height-balanced node differ in height by at most 1. To put it
a different way, define the skew of a node to be the height of its right subtree minus the height of
its left subtree (where the height of an empty subtree is −1). Then a node is height-balanced if its
skew is either −1, 0, or 1. A tree is height-balanced if every node in the tree is height-balanced.
Height-balance is good because it implies balance! To see why, let F(h) denote the fewest nodes in
any height-balanced tree of height h. This quantity satisfies the recurrence
F(h) = 1 + F(h − 1) + F(h − 2) ≥ 2F(h − 2),
since a smallest height-balanced tree of height h has a root, one child subtree of height h − 1, and
another of height at least h − 2, each containing the fewest possible nodes. As base cases, the
fewest nodes in a height-balanced tree of height 0 is one, i.e., F(0) = 1, while the fewest nodes in
a height-balanced tree of height 1 is two, i.e., F(1) = 2. This recurrence is lower bounded by
F(h) ≥ 2^{h/2} = 2^{Ω(h)}, so any height-balanced tree on n nodes has height h = O(log n), as desired.
Rotations
As we add or remove nodes to our tree, it is possible that our tree will become imbalanced. We
will want to change the structure of the tree without changing its traversal order, in the hopes that
we can make the tree’s structure more balanced. We can change the structure of a tree using a local
operation called a rotation. A rotation takes a subtree that locally looks like one of the following two
configurations and modifies the connections between nodes in O(1) time to transform it into the
other configuration.
This operation preserves the traversal order of the tree while changing the depth of the nodes
in subtrees <A> and <E>. Next, we will use rotations to enforce that a balanced tree stays
balanced after inserting or deleting a node.
Maintaining Height-Balance
Suppose we have a height-balanced AVL tree, and we perform a single insertion or deletion by
adding or removing a leaf. Either the resulting tree is also height-balanced, or the leaf modification
has made at least one node in the tree have skew of magnitude greater than 1. In particular, the
only nodes in the tree whose subtrees have changed after the leaf modification are ancestors of
that leaf (at most O(h) of them), so these are the only nodes whose skew could have changed and
they could have changed by at most 1 to have magnitude at most 2. As shown in lecture via a
brief case analysis, given a subtree whose root has skew of magnitude 2 and every other node in its subtree is
height-balanced, we can restore balance to the subtree in at most two rotations. Thus to rebalance
the entire tree, it suffices to walk from the leaf to the root, rebalancing each node along the way,
performing at most O(log n) rotations in total. A detailed proof is outlined in the lecture notes and
is not repeated here; but the proof may be reviewed in recitation if students would like to see the
full argument. Below is code to implement the rebalancing algorithm presented in lecture.
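A sketch of that code, written as additional Binary_Node methods: it uses the skew() and subtree_update() helpers from the augmented listing later in this recitation, and swaps items so that the same node object remains the subtree root (keeping pointers from above, and the tree's root pointer, valid).

    # additional Binary_Node methods (sketch)
    def subtree_rotate_right(D):            # O(1)
        assert D.left
        B, E = D.left, D.right              # D's children
        A, C = B.left, B.right              # B's children
        D, B = B, D                         # keep the same node object at the subtree root
        D.item, B.item = B.item, D.item     # ...by swapping the stored items
        B.left, B.right = A, D              # root keeps subtree A and gets D as right child
        D.left, D.right = C, E              # lower node keeps subtrees C and E
        if A: A.parent = B
        if E: E.parent = D
        D.subtree_update()                  # update heights bottom-up
        B.subtree_update()

    def subtree_rotate_left(B):             # O(1) symmetric to subtree_rotate_right
        assert B.right
        A, D = B.left, B.right
        C, E = D.left, D.right
        B, D = D, B
        B.item, D.item = D.item, B.item
        D.left, D.right = B, E
        B.left, B.right = A, C
        if A: A.parent = B
        if E: E.parent = D
        B.subtree_update()
        D.subtree_update()

    def rebalance(A):                       # O(1) restore height-balance at A (|skew| = 2)
        if A.skew() == 2:
            if A.right.skew() < 0:          # zig-zag case: rotate the right child first
                A.right.subtree_rotate_right()
            A.subtree_rotate_left()
        elif A.skew() == -2:
            if A.left.skew() > 0:
                A.left.subtree_rotate_left()
            A.subtree_rotate_right()

    def maintain(A):                        # O(log n) rebalance and update up to the root
        A.rebalance()
        A.subtree_update()
        if A.parent: A.parent.maintain()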
Unfortunately, it’s not clear how to efficiently evaluate the skew of a node to determine whether
or not we need to perform rotations, because computing a node’s height naively takes time linear in
the size of the subtree. The code below to compute height recurses on every node in <A>’s subtree,
so takes at least Ω(n) time.
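Such a naive height computation would look something like:

def height(A):                              # Ω(n): recurses on every node of A's subtree
    if A is None: return -1                 # height of an empty subtree is -1
    return 1 + max(height(A.left), height(A.right))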
Rebalancing requires us to check at least Ω(log n) heights in the worst-case, so if we want rebal-
ancing the tree to take at most O(log n) time, we need to be able to evaluate the height of a node
in O(1) time. Instead of computing the height of a node every time we need it, we will speed up
computation via augmentation: in particular each node stores and maintains the value of its own
subtree height. Then when we’re at a node, evaluating its height is as simple as reading its stored
value in O(1) time. However, when the structure of the tree changes, we will need to update and
recompute the height at nodes whose height has changed.
1 def height(A):
2 if A: return A.height
3 else: return -1
In the dynamic operations presented in R06, we put commented code to call update on every node
whose subtree changed during insertions, deletions, or rotations. A rebalancing insertion or dele-
tion operation only calls subtree update on at most O(log n) nodes, so as long as updating a
node takes at most O(1) time to recompute augmentations based on the stored augmentations of
the node’s children, then the augmentations can be maintained during rebalancing in O(log n) time.
In general, the idea behind augmentation is to store additional information at each node so that
information can be queried quickly in the future. You’ve done some augmentation already in PS1,
where you augmented a singly-linked list with back pointers to make it faster to evaluate a node’s
predecessor. To augment the nodes of a binary tree with a subtree property P(<X>), you need to:
• show how to compute P(<X>) in O(1) time from the augmentations of <X>’s children.
If you can do that, then you will be able to store and maintain that property at each node without
affecting the O(log n) running time of rebalancing insertions and deletions. We’ve shown how
to traverse around a binary tree and perform insertions and deletions, each in O(h) time while
also maintaining height-balance so that h = O(log n). Now we are finally ready to implement an
efficient Sequence and Set.
1 def height(A):
2 if A: return A.height
3 else: return -1
4
5 class Binary_Node:
6 def __init__(A, x): # O(1)
7 A.item = x
8 A.left = None
9 A.right = None
10 A.parent = None
11 A.subtree_update()
12
13 def subtree_update(A): # O(1)
14 A.height = 1 + max(height(A.left), height(A.right))
15
16 def skew(A): # O(1)
17 return height(A.right) - height(A.left)
18
19 def subtree_iter(A): # O(n)
20 if A.left: yield from A.left.subtree_iter()
21 yield A
22 if A.right: yield from A.right.subtree_iter()
23
24 def subtree_first(A): # O(log n)
25 if A.left: return A.left.subtree_first()
26 else: return A
27
28 def subtree_last(A): # O(log n)
29 if A.right: return A.right.subtree_last()
30 else: return A
31
32 def successor(A): # O(log n)
33 if A.right: return A.right.subtree_first()
34 while A.parent and (A is A.parent.right):
35 A = A.parent
36 return A.parent
37
Application: Set
Using our new definition of Binary Node that maintains balance, the implementation presented
in R06 of the Binary Tree Set immediately supports all operations in h = O(log n) time,
except build(X) and iter() which run in O(n log n) and O(n) time respectively. This data
structure is what’s normally called an AVL tree, but what we will call a Set AVL.
Application: Sequence
To use a Binary Tree to implement a Sequence interface, we use the traversal order of the tree to
store the items in Sequence order. Now we need a fast way to find the ith item in the sequence
because traversal would take O(n) time. If we knew how many items were stored in our left
subtree, we could compare that size to the index we are looking for and recurse on the appropriate
side. In order to evaluate subtree size efficiently, we augment each node in the tree with the size
of its subtree. A node’s size can be computed in constant time given the sizes of its children by
summing them and adding 1.
1 class Size_Node(Binary_Node):
2 def subtree_update(A): # O(1)
3 super().subtree_update()
4 A.size = 1
5 if A.left: A.size += A.left.size
6 if A.right: A.size += A.right.size
7
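For example, finding the ith node in a subtree's traversal order could be implemented as the following Size_Node method (the name subtree_at is an assumption):

    def subtree_at(A, i):                   # O(h)
        assert 0 <= i < A.size
        L = A.left.size if A.left else 0    # number of nodes before A in its own subtree
        if i < L:   return A.left.subtree_at(i)
        elif i > L: return A.right.subtree_at(i - L - 1)
        else:       return A                # A itself is the ith node of its subtree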
Once we are able to find the ith node in a balanced binary tree in O(log n) time, the remainder of
the Sequence interface operations can be implemented directly using binary tree operations. Fur-
ther, via the first exercise in R06, we can build such a tree from an input sequence in O(n) time.
We call this data structure a Sequence AVL.
Implementations of both the Sequence and Set interfaces can be found on the following pages.
We’ve made a CoffeeScript Balanced Binary Search Tree visualizer which you can find here:
https://codepen.io/mit6006/pen/NOWddZ
1 class Seq_Binary_Tree(Binary_Tree):
2 def __init__(self): super().__init__(Size_Node)
3
4 def build(self, X):
5 def build_subtree(X, i, j):
6 c = (i + j) // 2
7 root = self.Node_Type(X[c])
8 if i < c:
9 root.left = build_subtree(X, i, c - 1)
10 root.left.parent = root
11 if c < j:
12 root.right = build_subtree(X, c + 1, j)
13 root.right.parent = root
14 root.subtree_update()
15 return root
16 self.root = build_subtree(X, 0, len(X) - 1)
17 self.size = self.root.size
18
Exercise: Make a Sequence AVL Tree or Set AVL Tree (Balanced Binary Search Tree) by inserting
student chosen items one by one. If any node becomes height-imbalanced, rebalance its ancestors
going up the tree. Here’s a Sequence AVL Tree example that may be instructive (remember to
update subtree heights and sizes as you modify the tree!).
1 T = Seq_Binary_Tree()
2 T.build([10,6,8,5,1,3])
3 T.get_at(4)
4 T.set_at(4, -4)
5 T.insert_at(4, 18)
6 T.insert_at(4, 12)
7 T.delete_at(2)
Solution:
1 Line # 1 | 2,3 | 4 | 5 | 6 | 7
2 | | | | |
3 Result None | ___8__ | ___8___ | ___8_____ | ___8_______ | __12____
4 | 10_ _1_ | 10_ _-4_ | 10_ ___-4_ | 10_ ____-4_ | __6_ __-4_
5 | 6 5 3 | 6 5 3 | 6 5__ 3 | 6 _12__ 3 | 10 5 18 3
6 | | | 18 | 5 18 |
7
8 Also labeled with subtree height H, size #:
9
10 None
11 ___________8H2#6__________
12 10H1#2_____ _____1H1#3_____
13 6H0#1 5H0#1 3H0#1
14
15 ___________8H2#6__________
16 10H1#2_____ _____1H1#3_____
17 6H0#1 5H0#1 3H0#1
18
19 ___________8H2#6___________
20 10H1#2_____ _____-4H1#3_____
21 6H0#1 5H0#1 3H0#1
22
23 ___________8H3#7_________________
24 10H1#2_____ ___________-4H2#4_____
25 6H0#1 5H1#2______ 3H0#1
26 18H0#1
27
28 ___________8H3#8_______________________
29 10H1#2_____ ____________-4H2#5_____
30 6H0#1 _____12H1#3______ 3H0#1
31 5H0#1 18H0#1
32
33 __________12H2#7____________
34 ______6H1#3_____ ______-4H1#3_____
35 10H0#1 5H0#1 18H0#1 3H0#1
Exercise: Maintain a sequence of n bits that supports two operations, each in O(log n) time:
• flip(i): flip the bit stored at index i
• count ones upto(i): return the number of bits in the prefix up to index i that are one
Solution: Maintain a Sequence Tree storing the bits as items, augmenting each node A with
A.subtree ones, the number of 1 bits in its subtree. We can maintain this augmentation in
O(1) time from the augmentations stored at its children.
1 def update(A):
2 A.subtree_ones = A.item
3 if A.left:
4 A.subtree_ones += A.left.subtree_ones
5 if A.right:
6 A.subtree_ones += A.right.subtree_ones
To implement flip(i), find the ith node A using subtree node at(i) and flip the bit stored
at A.item. Then update the augmentation at A and every ancestor of A by walking up the tree in
O(log n) time.
To implement count ones upto(i), we will first define the subtree-based recursive function
subtree count ones upto(A, i) which returns the number of 1 bits in the subtree of node
A that are at most index i within A’s subtree. Then count ones upto(i) is semantically equiv-
alent to subtree count ones upto(T.root, i). Since each recursive call makes at most
one recursive call on a child, the operation takes O(log n) time.
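A sketch of this recursion, assuming both the size augmentation of a Sequence AVL (Size_Node) and the subtree_ones augmentation maintained by update above (the function name follows the exercise text):

def subtree_count_ones_upto(A, i):          # O(log n): at most one recursive call per level
    L = A.left.size if A.left else 0        # number of nodes in A's left subtree
    if i < L:                               # prefix lies entirely inside the left subtree
        return A.left.subtree_count_ones_upto(i)
    out = A.item                            # A's own bit is in the prefix
    if A.left:
        out += A.left.subtree_ones          # the whole left subtree is in the prefix
    if i > L and A.right:                   # part of the right subtree is in the prefix
        out += A.right.subtree_count_ones_upto(i - L - 1)
    return out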
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 8
Recitation 8
Priority Queues
Priority queues provide a general framework for at least three sorting algorithms, which differ only
in the data structure used in the implementation.
Let’s look at Python code that implements these priority queues. We start with an abstract base
class that has the interface of a priority queue, maintains an internal array A of items, and trivially
implements insert(x) and delete max() (the latter being incorrect on its own, but useful for
subclasses).
1 class PriorityQueue:
2 def __init__(self):
3 self.A = []
4
5 def insert(self, x): # O(1) default implementation: append to end of self.A
6 self.A.append(x)
7
8 def delete_max(self):
9 if len(self.A) < 1:
10 raise IndexError(’pop from empty priority queue’)
11 return self.A.pop() # NOT correct on its own!
12
13 @classmethod
14 def sort(Queue, A):
15 pq = Queue() # make empty priority queue
16 for x in A: # n x T_insert
17 pq.insert(x)
18 out = [pq.delete_max() for _ in A] # n x T_delete_max
19 out.reverse()
20 return out
Shared across all implementations is a method for sorting, given implementations of insert and
delete max. Sorting simply makes two loops over the array: one to insert all the elements, and
another to populate the output array with successive maxima in reverse order.
Array Heaps
We showed implementations of selection sort and merge sort previously in recitation. Here are
implementations from the perspective of priority queues. If you were to unroll the organization of
this code, you would have essentially the same code as we presented before.
1 class PQ_Array(PriorityQueue):
2 # PriorityQueue.insert already correct: appends to end of self.A
3 def delete_max(self): # O(n)
4 n, A, m = len(self.A), self.A, 0
5 for i in range(1, n):
6 if A[m].key < A[i].key:
7 m = i
8 A[m], A[n - 1] = A[n - 1], A[m] # swap max with end of array
9 return super().delete_max() # pop from end of array
1 class PQ_SortedArray(PriorityQueue):
2 # PriorityQueue.delete_max already correct: pop from end of self.A
3 def insert(self, *args): # O(n)
4 super().insert(*args) # append to end of array
5 i, A = len(self.A) - 1, self.A # restore array ordering
6 while 0 < i and A[i].key < A[i - 1].key:
7 A[i - 1], A[i] = A[i], A[i - 1]
8 i -= 1
We use *args to allow insert to take one argument (as makes sense now) or zero arguments;
we will need the latter functionality when making the priority queues in-place.
Binary Heaps
The next implementation is based on a binary heap, which takes advantage of the logarithmic
height of a complete binary tree to improve performance. The bulk of the work done by these
functions is encapsulated by max heapify up and max heapify down below.
1 class PQ_Heap(PriorityQueue):
2 def insert(self, *args): # O(log n)
3 super().insert(*args) # append to end of array
4 n, A = self.n, self.A
5 max_heapify_up(A, n, n - 1)
6
7 def delete_max(self): # O(log n)
8 n, A = self.n, self.A
9 A[0], A[n - 1] = A[n - 1], A[0] # swap max to the last slot of the heap
10 max_heapify_down(A, n - 1, 0) # restore Max-Heap Property among first n - 1 items
11 return super().delete_max() # pop from end of array
Before we define max heapify operations, we need functions to compute parent and child
indices given an index representing a node in a tree whose root is the first element of the array. In
this implementation, if the computed index lies outside the bounds of the array, we return the input
index. Always returning a valid array index instead of throwing an error helps to simplify future
code.
1 def parent(i):
2 p = (i - 1) // 2
3 return p if 0 < i else i
4
5 def left(i, n):
6 l = 2 * i + 1
7 return l if l < n else i
8
9 def right(i, n):
10 r = 2 * i + 2
11 return r if r < n else i
Here is the meat of the work done by a max heap. Assuming all nodes in A[:n] satisfy the
Max-Heap Property except for node A[i] makes it easy for these functions to maintain the Node
Max-Heap Property locally.
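The heapify functions themselves might look like the following sketch, using the parent, left, and right index helpers above:

def max_heapify_up(A, n, c):                # T(c) = O(log c)
    p = parent(c)                           # O(1) index of parent (or c itself at the root)
    if A[p].key < A[c].key:                 # O(1) compare with parent
        A[c], A[p] = A[p], A[c]             # O(1) swap with parent
        max_heapify_up(A, n, p)             # recurse at the parent

def max_heapify_down(A, n, p):              # T(p) = O(log n - log p)
    l, r = left(p, n), right(p, n)          # O(1) indices of children (or p itself)
    c = l if A[r].key < A[l].key else r     # O(1) index of the larger child
    if A[p].key < A[c].key:                 # O(1) compare with the larger child
        A[c], A[p] = A[p], A[c]             # O(1) swap with that child
        max_heapify_down(A, n, c)           # recurse at the child

Because the helpers return the input index when a computed child or parent index falls outside the heap, both functions stop naturally at the root or at a leaf without extra bounds checks.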
1 def build_max_heap(A):
2 n = len(A)
3 for i in range(n // 2, -1, -1): # O(n) loop backward over array
4 max_heapify_down(A, n, i) # O(log n - log i) fix max heap
To see that this procedure takes O(n) instead of O(n log n) time, we compute an upper bound
explicitly using a summation. In the derivation, we use Stirling’s approximation: n! = Θ(√n (n/e)^n).

T(n) < Σ_{i=1}^{n} (log n − log i) = log(n^n / n!) = O(log(n^n / (√n (n/e)^n)))
     = O(log(e^n / √n)) = O(n log e − (1/2) log n) = O(n)
Note that using this linear-time procedure to build a max heap does not affect the asymptotic
efficiency of heap sort, because each of the n delete max operations still takes O(log n) time. But
it is a more efficient procedure in practice than initially inserting the n items one by one into an empty heap.
In-Place Heaps
To make heap sort in place1 (as well as restoring the in-place property of selection sort and inser-
tion sort), we can modify the base class PriorityQueue to take an entire array A of elements,
and maintain the queue itself in the prefix of the first n elements of A (where n <= len(A)). The
insert function is no longer given a value to insert; instead, it inserts the item already stored
in A[n], and incorporates it into the now-larger queue. Similarly, delete max does not return
a value; it merely deposits its output into A[n] before decreasing its size. This approach only
works in the case where all n insert operations come before all n delete max operations, as in
priority queue sort.
1 class PriorityQueue:
2 def __init__(self, A):
3 self.n, self.A = 0, A
4
5 def insert(self): # O(1) absorb element A[n] into the queue (sketch of omitted lines)
6 if not self.n < len(self.A):
7 raise IndexError('insert into full priority queue')
8 self.n += 1
9
10 def delete_max(self): # O(1) shrink queue; subclasses first move the max to A[n - 1]
11 if self.n < 1:
12 raise IndexError('pop from empty priority queue')
13 self.n -= 1 # NOT correct on its own!
14
15 @classmethod
16 def sort(Queue, A):
17 pq = Queue(A) # make empty priority queue
18 for i in range(len(A)): # n x T_insert
19 pq.insert()
20 for i in range(len(A)): # n x T_delete_max
21 pq.delete_max()
22 return pq.A
This new base class works for sorting via any of the subclasses: PQ Array, PQ SortedArray,
PQ Heap. The first two sorting algorithms are even closer to the original selection sort and inser-
tion sort, and the final algorithm is what is normally referred to as heap sort.
We’ve made a CoffeeScript heap visualizer which you can find here:
https://codepen.io/mit6006/pen/KxOpep
1
Recall that an in-place sort only uses O(1) additional space during execution, so only a constant number of array
elements can exist outside the array at any given time.
Exercises
1. Draw the complete binary tree associated with the sub-array A[:8]. Turn it into a max
heap via linear time bottom-up heapification. Run insert twice, and then delete max
twice.
1 A = [7, 3, 5, 6, 2, 0, 3, 1, 9, 4]
2. How would you find the minimum element contained in a max heap?
Solution: A max heap has no guarantees on the location of its minimum element, except that
it may not have any children. Therefore, one must search over all n/2 leaves of the binary
tree which takes Ω(n) time.
4. Proximate Sorting: An array of distinct integers is k-proximate if every integer of the array
is at most k places away from its place in the array after being sorted, i.e., if the ith integer
of the unsorted input array is the jth largest integer contained in the array, then |i − j| ≤ k.
In this problem, we will show how to sort a k-proximate array faster than Θ(n log n).
(a) Prove that insertion sort (as presented in this class, without any changes) will sort a
k-proximate array in O(nk) time.
Solution: To prove O(nk), we show that each of the n insertion sort rounds swaps an
item left by at most O(k) positions. In the original ordering, entries that are ≥ 2k slots apart must
already be ordered correctly: indeed, if A[s] > A[t] but t − s ≥ 2k, there is no way to
reverse the order of these two items while moving each at most k slots. This means that
for each entry A[i] in the original order, fewer than 2k of the items A[0], . . . , A[i − 1]
are greater than A[i]. Thus, on round i of insertion sort when A[i] is swapped into
place, fewer than 2k swaps are required, so round i requires O(k) time.
It’s possible to prove a stronger bound: that ai = A[i] is swapped at most k times in
round i (instead of 2k). This is a bit subtle: the final sorted index of ai is at most k slots
away from i by the k-proximate assumption, but ai might not move to its final position
immediately, but may move past its final sorted position and then be bumped to the
right in future rounds. Suppose for contradiction a loop swaps the pth largest item A[i]
to the left by more than k to position p′ < i − k, past at least k items larger than A[i].
Since A is k-proximate, i − p ≤ k, i.e., i − k ≤ p, so p′ < p. Thus at least one item
less than A[i] must exist to the right of A[i]. Let A[j] be the smallest such item, the qth
largest item in sorted order. A[j] is smaller than k + 1 items to the left of A[j], and no
item to the right of A[j] is smaller than A[j], so q ≤ j − (k + 1), i.e. j − q ≥ k + 1.
But A is k-proximate, so j − q ≤ k, a contradiction.
(b) Θ(nk) is asymptotically faster than Θ(n^2) when k = o(n), but is not asymptotically
faster than Θ(n log n) when k = ω(log n). Describe an algorithm to sort a k-proximate
array in O(n log k) time, which can be faster (but no slower) than Θ(n log n).
Solution: We perform a variant of heap sort, where the heap only stores k + 1 items
at a time. Build a min-heap H out of A[0], . . . , A[k − 1]. Then, repeatedly, insert the
next item from A into H, and then store H.delete min() as the next entry in sorted
order. So we first call H.insert(A[k]) followed by B[0] = H.delete min();
the next iteration calls H.insert(A[k+1]) and B[1] = H.delete min(); and so
on. (When there are no more entries to insert into H, do only the delete min step.)
B is the sorted answer. This algorithm works because the ith smallest entry in array A
must be one of A[0], A[1], . . . , A[i + k] by the k-proximate assumption, and by the time
we’re about to write B[i], all of these entries have already been inserted into H (and
some also deleted). Assuming entries B[0], . . . , B[i − 1] are correct (by induction), this
means the ith smallest value is still in H while all smaller values have already been
removed, so this ith smallest value is in fact H.delete min(), and B[i] gets filled
correctly. Each heap operation takes time O(log k) because there are at most k + 1
items in the heap, so the n insertions and n deletions take O(n log k) total.
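A concrete sketch of this algorithm, using Python's built-in heapq min-heap in place of the class's priority queue and treating the items themselves as keys:

import heapq

def proximate_sort(A, k):
    "Sort a k-proximate list A in O(n log k) time"
    H, B = A[:k], []                        # the heap never holds more than k + 1 items
    heapq.heapify(H)                        # O(k) build a min-heap on the first k items
    for i in range(k, len(A)):              # insert the next item, then extract the minimum
        heapq.heappush(H, A[i])
        B.append(heapq.heappop(H))
    while H:                                # drain the remaining items in sorted order
        B.append(heapq.heappop(H))
    return B

Since the output is produced in increasing order, a min-heap is the natural choice here and no reversal step is needed at the end.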
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 9
Recitation 9
Graphs
A graph G = (V, E) is a mathematical object comprising a set of vertices V (also called nodes)
and a set of edges E, where each edge in E is a two-element subset of vertices from V . A vertex
and edge are incident or adjacent if the edge contains the vertex. Let u and v be vertices. An edge
is directed if its subset pair is ordered, e.g., (u, v), and undirected if its subset pair is unordered,
e.g., {u, v} or alternatively both (u, v) and (v, u). A directed edge e = (u, v) extends from vertex
u (e’s tail) to vertex v (e’s head), with e an incoming edge of v and an outgoing edge of u. In
an undirected graph, every edge is incoming and outgoing. The in-degree and out-degree of a
vertex v denote the number of incoming and outgoing edges connected to v, respectively. Unless
otherwise specified, when we talk about degree, we generally mean out-degree.
As their name suggests, graphs are often depicted graphically, with vertices drawn as points, and
edges drawn as lines connecting the points. If an edge is directed, its corresponding line typically
includes an indication of the direction of the edge, for example via an arrowhead near the edge’s
head. Below are examples of a directed graph G1 and an undirected graph G2 .
G1 = (V1 , E1 ) V1 = {0, 1, 2, 3, 4} E1 = {(0, 1), (1, 2), (2, 0), (3, 4)}
G2 = (V2 , E2 ) V2 = {0, 1, 2, 3, 4} E2 = {{0, 1}, {0, 3}, {0, 4}, {2, 3}}
(Figure: drawings of directed graph G1 and undirected graph G2, each on the vertices {0, 1, 2, 3, 4}.)
A path1 in a graph is a sequence of vertices (v0 , . . . , vk ) such that for every ordered pair of vertices
(vi , vi+1 ), there exists an outgoing edge in the graph from vi to vi+1 . The length of a path is the
number of edges in the path, or one less than the number of vertices. A graph is called strongly
connected if there is a path from every node to every other node in the graph. Note that every
connected undirected graph is also strongly connected because every undirected edge incident to a
vertex is also outgoing. Of the two connected components of directed graph G1 , only one of them
is strongly connected.
1
These are “walks” in 6.042. A “path” in 6.042 does not repeat vertices, which we would call a simple path.
Graph Representations
There are many ways to represent a graph in code. The most common way is to store a Set data
structure Adj mapping each vertex u to another data structure Adj(u) storing the adjacencies of
u, i.e., the set of vertices that are accessible from u via a single outgoing edge. This inner data
structure is called an adjacency list. Note that we don’t store the edge pairs explicitly; we store
only the out-going neighbor vertices for each vertex. When vertices are uniquely labeled from 0
to |V | − 1, it is common to store the top-level Set Adj within a direct access array of length |V |,
where array slot i points to the adjacency list of the vertex labeled i. Otherwise, if the vertices
are not labeled in this way, it is also common to use a hash table to map each u ∈ V to Adj(u).
Then, it is common to store each adjacency list Adj(u) as a simple unordered array of the outgoing
adjacencies. For example, the following are adjacency list representations of G1 and G2 , using a
direct access array for the top-level Set and an array for each adjacency list.
Using an array for an adjacency list is a perfectly good data structure if all you need to do is loop
over the edges incident to a vertex (which will be the case for all algorithms we will discuss in
this class, so will be our default implementation). Each edge appears in any adjacency list at most
twice, so the size of an adjacency list representation implemented using arrays is Θ(|V | + |E|).
A drawback of this representation is that determining whether your graph contains a given edge
(u, v) might require Ω(|V |) time to step through the array representing the adjacency list of u or v.
We can overcome this obstacle by storing adjacency lists using hash tables instead of regular un-
sorted arrays, which will support edge checking in expected O(1) time, still using only Θ(|V |+|E|)
space. However, we won’t need this operation for our algorithms, so we will assume the simpler
unsorted-array-based adjacency list representation. Below are representations of G1 and G2 that
use a hash table for both the outer Adj Set and the inner adjacency lists Adj(u), using Python
dictionaries:
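For example, here is a sketch of such representations of G1 and G2, derived from the edge sets listed earlier, using a dict of sets:

A1 = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: set()}          # directed graph G1
A2 = {0: {1, 3, 4}, 1: {0}, 2: {3}, 3: {0, 2}, 4: {0}}   # undirected graph G2 (each edge appears twice)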
Breadth-First Search
Given a graph, a common query is to find the vertices reachable by a path from a queried vertex
s. A breadth-first search (BFS) from s discovers the level sets of s: level Li is the set of ver-
tices reachable from s via a shortest path of length i (not reachable via a path of shorter length).
Breadth-first search discovers levels in increasing order starting with i = 0, where L0 = {s} since
the only vertex reachable from s via a path of length i = 0 is s itself. Then any vertex reach-
able from s via a shortest path of length i + 1 must have an incoming edge from a vertex whose
shortest path from s has length i, so it is contained in level Li . So to compute level Li+1 , include
every vertex with an incoming edge from a vertex in Li , that has not already been assigned a level.
By computing each level from the preceding level, a growing frontier of vertices will be explored
according to their shortest path length from s.
Below is Python code implementing breadth-first search for a graph represented using index-
labeled adjacency lists, returning a parent label for each vertex in the direction of a shortest path
back to s. Parent labels (pointers) together determine a BFS tree from vertex s, containing some
shortest path from s to every other vertex in the graph.
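Below is a sketch of one standard level-by-level implementation; line numbers are included because the running-time analysis that follows refers to them.

1 def bfs(Adj, s):                          # Adj: adjacency list, s: starting vertex
2     parent = [None for v in Adj]          # O(V) (use hash if unlabeled)
3     parent[s] = s                         # O(1) root
4     level = [[s]]                         # O(1) initialize levels
5     while 0 < len(level[-1]):             # while the last level contains vertices
6         level.append([])                  # O(1) amortized, make new level
7         for v in level[-2]:               # loop over the last full level
8             for u in Adj[v]:              # O(|Adj[v]|) loop over neighbors of v
9                 if parent[u] is None:     # O(1) parent not yet assigned
10                     parent[u] = v         # O(1) assign parent
11                     level[-1].append(u)   # O(1) amortized, add to next level
12     return parent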
How fast is breadth-first search? In particular, how many times can the inner loop on lines 9–11
be executed? A vertex is added to any level at most once in line 11, so the loop in line 7 processes
each vertex v at most once. The loop in line 8 cycles through all deg(v) outgoing edges from
vertex v. Thus the inner loop is repeated at most O(Σ_{v∈V} deg(v)) = O(|E|) times. Because the
parent array returned has length |V |, breadth-first search runs in O(|V | + |E|) time.
Exercise: For graphs G1 and G2 , conducting a breadth-first search from vertex v0 yields the parent
labels and level sets below.
We can use parent labels returned by a breadth-first search to construct a shortest path from a vertex
s to vertex t, following parent pointers from t backward through the graph to s. Below is Python
code to compute the shortest path from s to t which also runs in worst-case O(|V | + |E|) time.
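A sketch of that reconstruction, built on the bfs sketch above (the function name is an assumption):

def unweighted_shortest_path(Adj, s, t):    # O(V + E)
    parent = bfs(Adj, s)                    # O(V + E) parent pointers of a BFS tree from s
    if parent[t] is None:                   # O(1) t is not reachable from s
        return None
    i, path = t, [t]                        # O(1) start from t
    while i != s:                           # O(V) walk parent pointers back to s
        i = parent[i]
        path.append(i)
    return path[::-1]                       # O(V) return the path from s to t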
Exercise: Given an unweighted graph G = (V, E), find a shortest path from s to t having an odd
number of edges.
Solution: Construct a new graph G′ = (V′, E′). For every vertex u in V, construct two vertices
u_E and u_O in V′: these represent reaching the vertex u through an even and odd number of edges,
respectively. For every edge (u, v) in E, construct the edges (u_E, v_O) and (u_O, v_E) in E′. Run
breadth-first search on G′ from s_E to find the shortest path from s_E to t_O. Because G′ is bipartite
between even and odd vertices, even paths from s_E will always end at even vertices, and odd paths
will end at odd vertices, so finding a shortest path from s_E to t_O will represent a path of odd length
in the original graph. Because G′ has 2|V| vertices and 2|E| edges, constructing G′ and running
breadth-first search from s_E each take O(|V| + |E|) time.
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 10
Recitation 10
Depth-First Search
A breadth-first search discovers vertices reachable from a queried vertex s level-by-level outward
from s. A depth-first search (DFS) also finds all vertices reachable from s, but does so by search-
ing undiscovered vertices as deep as possible before exploring other branches. Instead of exploring
all neighbors of s one after another as in a breadth-first search, depth-first searches as far as possi-
ble from the first neighbor of s before searching any other neighbor of s. Just as with breadth-first
search, depth-first search returns a set of parent pointers for vertices reachable from s in the order
the search discovered them, together forming a DFS tree. However, unlike a BFS tree, a DFS tree
will not represent shortest paths in an unweighted graph. (Additionally, DFS returns an order on
vertices discovered which will be discussed later.) Below is Python code implementing a recursive
depth-first search for a graph represented using index-labeled adjacency lists.
1 def dfs(Adj, s, parent = None, order = None): # Adj: adjacency list, s: start
2 if parent is None: # O(1) initialize parent list
3 parent = [None for v in Adj] # O(V) (use hash if unlabeled)
4 parent[s] = s # O(1) root
5 order = [] # O(1) initialize order array
6 for v in Adj[s]: # O(Adj[s]) loop over neighbors
7 if parent[v] is None: # O(1) parent not yet assigned
8 parent[v] = s # O(1) assign parent
9 dfs(Adj, v, parent, order) # Recursive call
10 order.append(s) # O(1) amortized
11 return parent, order
How fast is depth-first search? A recursive dfs call is performed only when a vertex does not have
a parent pointer, and is given a parent pointer immediately before the recursive call. Thus dfs is
called on each vertex at most once. Further, the amount of work done by each recursive search
from vertex v is proportional to the out-degree deg(v) of v. Thus, the amount of work done by
depth-first search is O(Σ_{v∈V} deg(v)) = O(|E|). Because the parent array returned has length |V|,
depth-first search runs in O(|V | + |E|) time.
Exercise: Describe a graph on n vertices for which BFS and DFS would first visit vertices in the
same order.
Solution: Many possible solutions. Two solutions are a chain of vertices from v, or a star graph
with an edge from v to every other vertex.
For historical reasons (primarily for its connection to topological sorting as discussed later) depth-
first search is often used to refer to both a method to search a graph from a specific vertex, and
as a method to search an entire graph (as in graph explore). You may do the same when answering
problems in this class.
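A sketch of such a whole-graph search, named graph_explore to match the text; it restarts dfs from every vertex that has not yet been assigned a parent:

def graph_explore(Adj):                     # O(V + E) search the entire graph
    parent = [None for v in Adj]            # O(V) parent pointers
    order = []                              # O(1) DFS finishing order
    for v in range(len(Adj)):               # O(V) loop over all vertices
        if parent[v] is None:               # v not yet reached by any search
            parent[v] = v                   # v roots a new DFS tree
            dfs(Adj, v, parent, order)      # search everything reachable from v
    return parent, order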
Exercise: Draw a graph, run DFS from a vertex, and classify each edge relative to the DFS tree.
Show that forward and cross edges cannot occur when running DFS on an undirected graph.
Solution: While performing a depth-first search, keep track of the set of ancestors of each vertex
in the DFS tree during the search (in a direct access array or a hash table). When processing
neighbor v of s in dfs(Adj, s), if v is an ancestor of s, then (s, v) is a back edge, and certifies a
cycle in the graph.
Topological Sort
A directed graph containing no directed cycle is called a directed acyclic graph or a DAG. A
topological sort of a directed acyclic graph G = (V, E) is a linear ordering of the vertices such
that for each edge (u, v) in E, vertex u appears before vertex v in the ordering. In the dfs func-
tion, vertices are added to the order list in the order in which their recursive DFS call finishes. If
the graph is acyclic, the order returned by dfs (or graph search) is the reverse of a topolog-
ical sort order. Proof by cases. One of dfs(u) or dfs(v) is called first. If dfs(u) was called
before dfs(v), dfs(v) will start and end before dfs(u) completes, so v will appear before u
in order. Alternatively, if dfs(v) was called before dfs(u), dfs(u) cannot be called before
dfs(v) completes, or else a path from v to u would exist, contradicting that the graph is acyclic;
so v will be added to order before vertex u. Reversing the order returned by DFS will then repre-
sent a topological sort order on the vertices.
Exercise: A high school contains many student organizations, each with its own hierarchical struc-
ture. For example, the school’s newspaper has an editor-in-chief who oversees all students con-
tributing to the newspaper, including a food-editor who oversees only students writing about school
food. The high school’s principal needs to line students up to receive diplomas at graduation, and
wants to recognize student leaders by giving a diploma to student a before student b whenever a
oversees b in any student organization. Help the principal determine an order to give out diplomas
that respects student organization hierarchy, or prove to the principal that no such order exists.
Solution: Construct a graph with one vertex per student, and a directed edge from student a to b if
student a oversees student b in some student organization. If this graph contains a cycle, the princi-
pal is out of luck. Otherwise, a topological sort of the students according to this graph will satisfy
the principal’s request. Run DFS on the graph (exploring the whole graph as in graph explore)
to obtain an order of DFS vertex finishing times in O(|V | + |E|) time. While performing the DFS,
keep track of the ancestors of each vertex in the DFS tree, and evaluate if each new edge processed
is a back edge. If a back edge is found from vertex u to v, follow parent pointers back to v from u to
obtain a directed cycle in the graph to prove to the principal that no such order exists. Otherwise, if
no cycle is found, the graph is acyclic and the order returned by DFS is the reverse of a topological
sort, which may then be returned to the principal.
We’ve made a CoffeeScript graph search visualizer which you can find here:
https://codepen.io/mit6006/pen/dgeKEN
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 11
Recitation 11
Weighted Graphs
For many applications, it is useful to associate a numerical weight to edges in a graph. For example,
a graph modeling a road network might weight each edge with the length of a road corresponding
to the edge, or a graph modeling an online dating network might contain edges from one user to
another weighted by directed attraction. A weighted graph is then a graph G = (V, E) together
with a weight function w : E → R, mapping edges to real-valued weights. In practice, edge
weights will often not be represented by a separate function at all; instead, each weight is stored
as a value in an adjacency matrix, or inside an edge object stored in an adjacency list or
set. For example, below are randomly weighted adjacency set representations of the graphs from
Recitation 9. A function to extract such weights might be: def w(u,v): return W[u][v].
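For concreteness, here is a sketch of one way such a W could be stored for the directed graph G1 of Recitation 9, as a dict of dicts mapping u to v to w(u, v); the weight values shown are illustrative placeholders only:

W1 = {0: {1: -2},                           # made-up example weights, not the original figure's
      1: {2:  0},
      2: {0:  1},
      3: {4:  3},
      4: {}}

def w(u, v): return W1[u][v]                # O(1) weight lookup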
Now that you have an idea of how weights could be stored, for the remainder of this class you
may simply assume that a weight function w can be stored using O(|E|) space, and can return the
weight of an edge in constant time1 . When referencing the weight of an edge e = (u, v), we will
often use the notation w(u, v) interchangeably with w(e) to refer to the weight of an edge.
Exercise: Represent graphs W1 and W2 as adjacency matrices. How could you store weights in an
adjacency list representation?
In fact, when a graph contains a cycle (a path starting and ending at the same vertex) that has
negative weight, then some shortest paths might not even exist, because for any path containing
a vertex from the negative weight cycle, a shorter path can be found by adding a tour around the
cycle. If any path from s to some vertex v contains a vertex from a negative weight cycle, we will
say the shortest path from s to v is undefined, with weight −∞. If no path exists from s to v, then
we will say the shortest path from s to v is undefined, with weight +∞. In addition to breadth-first
search, we will present three additional algorithms to compute single source shortest paths that
cater to different types of weighted graphs.
Weighted Single Source Shortest Path Algorithms

  Restrictions             |  SSSP Algorithm
  Graph     Weights        |  Name             Running Time O(·)
  -------   ------------   |  --------------   -------------------
  General   Unweighted     |  BFS              |V| + |E|
  DAG       Any            |  DAG Relaxation   |V| + |E|
  General   Any            |  Bellman-Ford     |V| · |E|
  General   Non-negative   |  Dijkstra         |V| log |V| + |E|
Relaxation
We’ve shown you one view of relaxation in lecture. Below is another framework by which you can
view DAG relaxation. As a general algorithmic paradigm, a relaxation algorithm searches for a
solution to an optimization problem by starting with a solution that is not optimal, then iteratively
improves the solution until it becomes an optimal solution to the original problem. In the single
source shortest paths problem, we would like to find the weight δ(s, v) of a shortest path from
source s to each vertex v in a graph. As a starting point, for each vertex v we will initialize an
upper bound estimate d(s, v) on the shortest path weight from s to v: +∞ for all d(s, v) except
d(s, s) = 0. During the relaxation algorithm, we will repeatedly relax some path estimate d(s, v),
decreasing it toward the true shortest path weight δ(s, v). If ever d(s, v) = δ(s, v), we say that
estimate d(s, v) is fully relaxed. When all shortest path estimates are fully relaxed, we will have
solved the original problem. Then an algorithm to find shortest paths could take the following
form:
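A sketch of that general form is shown below; line numbers are included because the discussion that follows refers to lines 5–6, and line 6 is deliberately left as pseudocode.

1 def general_relax(Adj, w, s):             # Adj: adjacency list, w: weights, s: source
2     d = [float('inf') for _ in Adj]       # shortest path estimates d(s, v)
3     parent = [None for _ in Adj]          # initialize parent pointers
4     d[s], parent[s] = 0, s                # initialize source
5     while True:                           # repeat...
6         "decrease some estimate d(s, v) toward δ(s, v)"   # (pseudocode)
7     return d, parent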
There are a number of problems with this algorithm, not least of which is that it never terminates!
But if we can repeatedly decrease each shortest path estimates to fully relax each d(s, v), we will
have found shortest paths. How do we ‘relax’ vertices and when do we stop relaxing?
To relax a shortest path estimate d(s, v), we will relax an incoming edge to v, from another vertex
u. If we maintain that d(s, u) always upper bounds the shortest path from s to u for all u ∈ V ,
then the true shortest path weight δ(s, v) can’t be larger than d(s, u) + w(u, v) or else going to u
along a shortest path and traversing the edge (u, v) would be a shorter path2 . Thus, if at any time
d(s, u) + w(u, v) < d(s, v), we can relax the edge by setting d(s, v) = d(s, u) + w(u, v), strictly
improving our shortest path estimate.
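As a sketch, this single-edge relaxation step might be written as follows (the name try_to_relax is an assumption; a parent pointer is updated alongside the estimate so that shortest paths can be reconstructed):

def try_to_relax(Adj, w, d, parent, u, v):  # O(1) relax edge (u, v) if possible
    if d[v] > d[u] + w(u, v):               # does (u, v) violate the triangle inequality?
        d[v] = d[u] + w(u, v)               # improve the estimate at v
        parent[v] = u                       # remember the better path through u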
If we only change shortest path estimates via relaxation, then we can prove that the shortest path
estimates will never become smaller than true shortest paths.
If ever we arrive at an assignment of all shortest path estimates such that no edge in the graph can
be relaxed, then we can prove that shortest path estimates are in fact shortest path distances.
Termination Lemma: If no edge can be relaxed, then d(s, v) ≤ δ(s, v) for all v ∈ V .
Proof. Suppose for contradiction that δ(s, v) < d(s, v) for some vertex v, and let π be a
shortest path from s to v. Let (a, b) be the first edge of π such that d(s, b) > δ(s, b), so
d(s, a) ≤ δ(s, a). Since sub-paths of shortest paths are shortest paths, δ(s, b) = δ(s, a) + w(a, b),
so d(s, a) + w(a, b) ≤ δ(s, b) < d(s, b), meaning edge (a, b) can be relaxed, a contradiction.
So, we can change lines 5-6 of the general relaxation algorithm to repeatedly relax edges from the
graph until no edge can be further relaxed.
It remains to analyze the running time of this algorithm, which cannot be determined unless we
provide detail for how this algorithm chooses edges to relax. If there exists a negative weight cycle
in the graph reachable from s, this algorithm will never terminate as edges along the cycle could
be relaxed forever. But even for acyclic graphs, this algorithm could take exponential time.
2
This is a special case of the triangle inequality: δ(a, c) ≤ δ(a, b) + δ(b, c) for all a, b, c ∈ V .
Exponential Relaxation
How many modifying edge relaxations could occur in an acyclic graph before all edges are fully
relaxed? Below is a weighted directed graph on 2n + 1 vertices and 3n edges for which the
relaxation framework could perform an exponential number of modifying relaxations, if edges are
relaxed in a bad order.
(Figure: the n-section graph; within each section, the three edges are labeled left, top, and right.)
This graph contains n sections, with section i containing three edges, (v_{2i}, v_{2i+1}), (v_{2i}, v_{2i+2}), and
(v_{2i+1}, v_{2i+2}), each with weight 2^{n−i}; we will call these edges within a section left, top, and right
respectively. In this construction, the lowest weight path from v_0 to v_i is achieved by traversing
top edges until v_i’s section is reached. Shortest paths from v_0 can easily be found by performing
only a linear number of modifying edge relaxations: relax the top and left edges of each successive
section. However, a bad relaxation order might result in many more modifying edge relaxations.
To demonstrate a bad relaxation order, initialize all minimum path weight estimates to +∞, except
d(s, s) = 0 for source s = v_0. First relax the left edge, then the right edge of section 0, updating
the shortest path estimate at v_2 to d(s, v_2) = 2^n + 2^n = 2^{n+1}. In actuality, the shortest path from
v_0 to v_2 is via the top edge, i.e., δ(s, v_2) = 2^n. But before relaxing the top edge of section 0,
recursively apply this procedure to fully relax the remainder of the graph, from section 1 to n − 1,
computing shortest path estimates based on the incorrect value of d(s, v_2) = 2^{n+1}. Only then relax
the top edge of section 0, after which d(s, v_2) is modified to its correct value 2^n. Lastly, fully relax
sections 1 through n − 1 one more time recursively, to their correct and final values.
How many modifying edge relaxations are performed by this edge relaxation ordering? Let T(n)
represent the number of modifying edge relaxations performed by the procedure on a graph con-
taining n vertices, with recurrence relation given by T(n) = 3 + 2T(n − 2), since each section uses
three modifying relaxations and twice recursively relaxes the remaining graph on two fewer vertices.
The solution to this recurrence is T(n) = Θ(2^{n/2}), exponential in the size of the graph. Perhaps
there exists some edge relaxation order requiring only a polynomial number of modifying edge relaxations?
DAG Relaxation
In a directed acyclic graph (DAG), there can be no negative weight cycles, so eventually relaxation
must terminate. It turns out that relaxing each outgoing edge from every vertex exactly once in
a topological sort order of the vertices, correctly computes shortest paths. This shortest paths
algorithm is sometimes called DAG Relaxation.
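A sketch of DAG Relaxation, assuming a topological order obtained from a full depth-first search (e.g., the graph_explore sketch in R10) and the try_to_relax step from above:

def DAG_relaxation(Adj, w, s):              # Adj: adjacency list, w: weights, s: source
    _, order = graph_explore(Adj)           # O(V + E) DFS finishing order
    order.reverse()                         # reverse finishing order = topological order
    d = [float('inf') for _ in Adj]         # shortest path estimates d(s, v)
    parent = [None for _ in Adj]            # initialize parent pointers
    d[s], parent[s] = 0, s                  # initialize source
    for u in order:                         # relax each out-edge exactly once,
        for v in Adj[u]:                    # in topological sort order
            try_to_relax(Adj, w, d, parent, u, v)
    return d, parent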
Claim: The DAG Relaxation algorithm computes shortest paths in a directed acyclic graph.
Proof. We prove that at termination, d(s, v) = δ(s, v) for all v ∈ V . First observe that Safety
ensures that a vertex not reachable from s will retain d(s, v) = +∞ at termination. Alterna-
tively, consider any shortest path π = (v1 , . . . , vm ) from v1 = s to any vertex vm = v reach-
able from s. The topological sort order ensures that edges of the path are relaxed in the order in
which they appear in the path. Assume for induction that before edge (vi , vi+1 ) ∈ π is relaxed,
d(s, vi ) = δ(s, vi ). Setting d(s, s) = 0 at the start provides a base case. Then relaxing edge
(vi , vi+1 ) sets d(s, vi+1 ) = δ(s, vi ) + w(vi , vi+1 ) = δ(s, vi+1 ), as sub-paths of shortest paths are
also shortest paths. Thus the procedure constructs shortest path weights as desired. Since depth-
first search runs in linear time and the loops relax each edge exactly once, this algorithm takes
O(|V | + |E|) time.
Exercise: You have been recruited by MIT to take part in a new part time student initiative where
you will take only one class per term. You don’t care about graduating; all you really want to do
is to take 19.854, Advanced Quantum Machine Learning on the Blockchain: Neural Interfaces,
but are concerned because of its formidable set of prerequisites. MIT professors will allow you
take any class as long as you have taken at least one of the class’s prerequisites prior to taking the
class. But passing a class without all the prerequisites is difficult. From a survey of your peers, you
know for each class and prerequisite pair, how many hours of stress the class will demand. Given
a list of classes, prerequisites, and surveyed stress values, describe a linear time algorithm to find a
sequence of classes that minimizes the amount of stress required to take 19.854, never taking more
than one prerequisite for any class. You may assume that every class is offered every semester.
Solution: Build a graph with a vertex for every class and a directed edge from class a to class b if b is a prerequisite of a, weighted by the stress of taking class a after having taken class b as a prerequisite. Use DAG Relaxation (topological sort relaxation) to find a shortest path from class 19.854 to every other class. From among the classes with no prerequisites (sinks of the DAG), find one whose shortest path from 19.854 has minimum total stress, and return that path reversed, as sketched below.
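One way this might look in code, reusing the DAG_relaxation sketch above (the input format — a list of class names and a list of (class, prerequisite, stress) triples — and the function name are hypothetical):

def least_stress_schedule(classes, prereqs, target='19.854'):
    idx = {c: i for i, c in enumerate(classes)}          # map class names to vertex indices
    Adj = [[] for _ in classes]
    w = {}
    for cls, pre, stress in prereqs:                     # edge from a class to its prerequisite
        Adj[idx[cls]].append(idx[pre])
        w[(idx[cls], idx[pre])] = stress
    d, parent = DAG_relaxation(Adj, w, idx[target])      # shortest stress paths from 19.854
    sinks = [i for i in range(len(classes))              # classes with no prerequisites,
             if not Adj[i] and d[i] < float('inf')]      # reachable from 19.854
    best = min(sinks, key=lambda i: d[i])                # sink with minimum total stress
    order = []
    while best != idx[target]:                           # walk parent pointers back to 19.854,
        order.append(classes[best])                      # yielding classes in the order taken
        best = parent[best]
    order.append(target)
    return order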
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 12
Recitation 12
Bellman-Ford
In lecture, we presented a version of Bellman-Ford¹ based on graph duplication and DAG Relaxation that solves SSSPs in O(|V||E|) time and space, and can return a negative-weight cycle reachable on a path from s to v, for any vertex v with δ(s, v) = −∞.
The original Bellman-Ford algorithm is easier to state but is a little less powerful. It solves SSSPs
in the same time using only O(|V |) space, but only detects whether a negative-weight cycle exists
(will not return such a negative weight cycle). It is based on the relaxation framework discussed in
R11. The algorithm is straightforward: initialize distance estimates, and then relax every edge in the graph in |V| − 1 rounds. The claim is: if the graph does not contain negative-weight cycles, then d(s, v) = δ(s, v) for all v ∈ V at termination; otherwise, if any edge is still relaxable (i.e., still violates the triangle inequality), the graph contains a negative-weight cycle. A Python implementation of the Bellman-Ford algorithm is given below.
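A minimal sketch of such an implementation, assuming the graph is given as an adjacency list Adj indexed by vertex and a dictionary w mapping each edge (u, v) to its weight:

1 def bellman_ford(Adj, w, s):
2     infinity = float('inf')
3     d = [infinity for _ in Adj]             # shortest path estimates d(s, v)
4     parent = [None for _ in Adj]            # initialize parent pointers
5     d[s], parent[s] = 0, s                  # source has distance 0
6     V = len(Adj)                            # number of vertices
7     def relax(u, v):                        # relax (u, v) if it violates the triangle inequality
8         if d[u] + w[(u, v)] < d[v]: d[v], parent[v] = d[u] + w[(u, v)], u
9     for k in range(V - 1):                  # |V| - 1 rounds
10        for u in range(V):                  # relax every edge of the graph
11            for v in Adj[u]:
12                relax(u, v)
13    # any edge that is still relaxable witnesses a negative-weight cycle
14    for u in range(V):
15        for v in Adj[u]:
16            if d[u] + w[(u, v)] < d[v]:
17                raise Exception('negative-weight cycle detected')
18    return d, parent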
This algorithm has the same overall structure as the general relaxation paradigm, but limits the
order in which edges can be processed. In particular, the algorithm relaxes every edge of the graph
(lines 10-12), in a series of |V | − 1 rounds (line 9). The following lemma establishes correctness
of the algorithm.
¹ This algorithm is called Bellman-Ford after two researchers who independently proposed the same algorithm in different contexts.
Lemma 1 At the end of relaxation round i of Bellman-Ford, d(s, v) = δ(s, v) for any vertex v that
has a shortest path from s to v which traverses at most i edges.
Proof. By induction on round i. At the start of the algorithm (at the end of round 0), the only vertex with a shortest path from s traversing at most 0 edges is vertex s, and Bellman-Ford correctly sets d(s, s) = 0 = δ(s, s). Now suppose the claim is true at the end of round i − 1, and let v be a vertex with a shortest path from s traversing at most i edges. If v has a shortest path from s traversing at most i − 1 edges, then d(s, v) = δ(s, v) prior to round i, and this will continue to hold at the end of round i by the upper-bound property². Otherwise, d(s, v) ≠ δ(s, v) prior to round i; let u be the second-to-last vertex visited along some shortest path from s to v that traverses exactly i edges. Some shortest path from s to u traverses at most i − 1 edges, so d(s, u) = δ(s, u) prior to round i. Then after the edge from u to v is relaxed during round i, d(s, v) = δ(s, v) as desired.
If the graph does not contain negative weight cycles, some shortest path is simple, and contains at
most |V |−1 edges as it traverses any vertex of the graph at most once. Thus after |V |−1 rounds of
Bellman-Ford, d(s, v) = δ(s, v) for every vertex with a simple shortest path from s to v. However,
if after |V | − 1 rounds of relaxation, some edge (u, v) still violates the triangle inequality (lines
14-17), then there exists a path from s to v using |V | edges which has lower weight than all paths
using fewer edges. Such a path cannot be simple, so it must contain a negative weight cycle.
This algorithm runs |V| − 1 relaxation rounds plus a final check over all edges, and each pass performs a constant amount of work per edge of the graph, so Bellman-Ford runs in O(|V||E|) time. Note that lines 10-11 actually take O(|V| + |E|) time to loop over the entire adjacency list structure, even for vertices adjacent to no edge. If the graph contains isolated vertices other than s, we can just remove them from Adj to ensure that |V| = O(|E|). Note that if edges are processed in a topological sort order with respect to a shortest path tree from s, then Bellman-Ford will correctly compute shortest paths from s after its first round; of course, it is not easy to find such an order. However, for many graphs, significant savings can be obtained by stopping Bellman-Ford after any round in which no edge relaxation modifies a distance estimate.
Note that this algorithm is different from the one presented in lecture in two important ways:
• The original Bellman-Ford only keeps track of one ‘layer’ of d(s, v) estimates in each round, while the lecture version keeps track of dk(s, v) for k ∈ {0, . . . , |V|}, which can then be used to construct negative-weight cycles.
• A distance estimate d(s, v) in round k of original Bellman-Ford does not necessarily equal dk(s, v), the k-edge distance to v computed in the lecture version. This is because the original Bellman-Ford may relax multiple edges along a shortest path to v in a single round, while the lecture version relaxes at most one edge per level. In other words, the distance estimate d(s, v) in round k of original Bellman-Ford is never larger than dk(s, v), but it may be much smaller and converge to a solution more quickly than the lecture version, so it may be faster in practice.
² Recall that the Safety Lemma from Recitation 11 ensures that relaxation maintains δ(s, v) ≤ d(s, v) for all v.
Exercise: Alice, Bob, and Casey are best friends who live in different corners of a rural school
district. During the summer, they decide to meet every Saturday at some intersection in the district
to play tee-ball. Each child will bike to the meeting location from their home along dirt roads.
Each dirt road between road intersections has a level of fun associated with biking along it in a
certain direction, depending on the incline and quality of the road, the number of animals passed,
etc. Road fun-ness may be positive, but could also be negative, e.g. when a road is difficult to traverse in a given direction, or passes by a scary dog, etc. The children would like to: choose
a road intersection to meet and play tee-ball that maximizes the total fun of all three children
in reaching their chosen meeting location; or alternatively, abandon tee-ball altogether in favor
of biking, if a loop of roads exists in their district along which they can bike all day with ever
increasing fun. Help the children organize their Saturday routine by finding a tee-ball location,
or determining that there exists a continuously fun bike loop in their district (for now, you do not
have to find such a loop). You may assume that each child can reach any road in the district by bike.
Solution: Construct a graph on road intersections within the district, as well as the locations a, b,
and c of the homes of the three children, with a directed edge from one vertex to another if there
is a road between them traversable in that direction by bike, weighted by negative fun-ness of the
road. If a negative weight cycle exists in this graph, such a cycle would represent a continuously
fun bike loop. To check for the existence of any negative weight cycle in the graph, run Bellman-
Ford from vertex a. If Bellman-Ford detects a negative weight cycle by finding an edge (u, v)
that can be relaxed in round |V |, return that a continuously fun bike loop exists. Alternatively, if
no negative weight cycle exists, minimum-weight paths correspond to bike routes that maximize fun. Running Bellman-Ford from vertex a then computes shortest path distances d(a, v) from a to each
vertex v in the graph. Run Bellman-Ford two more times, once from vertex b and once from vertex
c, computing shortest paths values d(b, v) and d(c, v) respectively for each vertex v in the graph.
Then for each vertex v, compute the sum d(a, v) + d(b, v) + d(c, v). A vertex that minimizes this
sum will correspond to a road intersection that maximizes total fun of all three children in reaching
it. This algorithm runs Bellman-Ford three times and then compares a constant sized sum at each
vertex, so this algorithm runs in O(|V ||E|) time.
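As a rough sketch in code, reusing the bellman_ford sketch from Recitation 12 (the input format — an adjacency list Adj and a dictionary fun mapping each directed road (u, v) to its fun-ness — is an illustrative assumption):

def plan_saturday(Adj, fun, a, b, c):
    w = {e: -fun[e] for e in fun}                        # negate fun-ness to get edge weights
    try:
        d_a, _ = bellman_ford(Adj, w, a)                 # most-fun routes from each home
        d_b, _ = bellman_ford(Adj, w, b)
        d_c, _ = bellman_ford(Adj, w, c)
    except Exception:                                    # a negative-weight cycle was detected:
        return None                                      # a continuously fun bike loop exists
    return min(range(len(Adj)),                          # intersection maximizing total fun
               key=lambda v: d_a[v] + d_b[v] + d_c[v])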
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 13
Recitation 13
Dijkstra’s Algorithm
Dijkstra is possibly the most commonly used weighted shortest paths algorithm; it is asymptotically faster than Bellman-Ford, but only applies to graphs containing non-negative edge weights, which appear often in many applications. The algorithm is fairly intuitive, though its implementation can be more complicated than that of other shortest path algorithms. Think of a weighted
graph as a network of pipes, each with non-negative length (weight). Then turn on a water faucet at
a source vertex s. Assuming the water flowing from the faucet traverses each pipe at the same rate,
the water will reach each pipe intersection vertex in the order of their shortest distance from the
source. Dijkstra’s algorithm discretizes this continuous process by repeatedly relaxing edges from
a vertex whose minimum weight path estimate is smallest among vertices whose out-going edges
have not yet been relaxed. In order to efficiently find the smallest minimum weight path estimate,
Dijkstra’s algorithm is often presented in terms of a minimum priority queue data structure. Dijkstra’s running time then depends on how efficiently the priority queue can perform its supported
operations. Below is Python code for Dijkstra’s algorithm in terms of priority queue operations.
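A minimal sketch of such an implementation, assuming the graph is an adjacency list Adj indexed by vertex, w is a dictionary mapping each edge (u, v) to its weight, and PriorityQueue supports the insert, extract_min, and decrease_key operations discussed in the next section:

1 def dijkstra(Adj, w, s):
2     d = [float('inf') for _ in Adj]         # shortest path estimates d(s, v)
3     parent = [None for _ in Adj]            # initialize parent pointers
4     d[s], parent[s] = 0, s                  # source has distance 0
5     Q = PriorityQueue()                     # initialize an empty priority queue
6     for v in range(len(Adj)):               # insert every vertex,
7         Q.insert(v, d[v])                   # keyed by its current estimate
8     for _ in range(len(Adj)):               # main loop: remove one vertex per iteration
9         u = Q.extract_min()                 # vertex with smallest estimate still in queue
10        for v in Adj[u]:                    # relax out-going edges from u
11            if d[u] + w[(u, v)] < d[v]: d[v], parent[v] = d[u] + w[(u, v)], u
12            Q.decrease_key(v, d[v])         # update v's key (no-op if v already removed)
13    return d, parent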
This algorithm follows the same structure as the general relaxation framework. Lines 2-4 initialize
shortest path weight estimates and parent pointers. Lines 5-7 initialize a priority queue with all
vertices from the graph. Lines 8-12 comprise the main loop. Each time the loop is executed, line
9 removes a vertex from the queue, so the queue will be empty at the end of the loop. The vertex
u processed in some iteration of the loop is a vertex from the queue whose shortest path weight
estimate is smallest, from among all vertices not yet removed from the queue. Then, lines 10-11
relax the out-going edges from u as usual. However, since relaxation may reduce the shortest path
weight estimate d(s, v), vertex v’s key in the queue must be updated (if it still exists in the queue);
line 12 accomplishes this update.
Why does Dijkstra’s algorithm compute shortest paths for a graph with non-negative edge weights?
The key observation is that the shortest path weight estimate of vertex u equals its actual shortest path weight, d(s, u) = δ(s, u), at the moment u is removed from the priority queue. Then by the upper-bound
property, d(s, u) = δ(s, u) will still hold at termination of the algorithm. A proof of correctness is
described in the lecture notes, and will not be repeated here. Instead, we will focus on analyzing
running time for Dijkstra implemented using different priority queues.
Exercise: Construct a weighted graph with non-negative edge weights, and apply Dijkstra’s algo-
rithm to find shortest paths. Specifically list the key-value pairs stored in the priority queue after
each iteration of the main loop, and highlight edges corresponding to constructed parent pointers.
Priority Queues
An important aspect of Dijkstra’s algorithm is the use of a priority queue. The priority queue
interface used here differs slightly from our presentation of priority queues earlier in the term.
Here, a priority queue maintains a set of key-value pairs, where vertex v is a value and d(s, v) is its
key. Aside from empty initialization, the priority queue supports three operations: insert(val, key) adds a key-value pair to the queue, extract_min() removes and returns a value from the queue whose key is minimum, and decrease_key(val, new_key) reduces the key of a given value stored in the queue to the provided new key. The running time of Dijkstra depends on the running times of these operations. Specifically, if Ti, Te, and Td are the respective running times for inserting a key-value pair, extracting a value with minimum key, and decreasing the key of a value, the running time of Dijkstra will be:
T_Dijkstra = O(|V| · Ti + |V| · Te + |E| · Td),
since each vertex is inserted and extracted exactly once, and each edge relaxation triggers at most one decrease-key.
If the graph is sparse, |E| = O(|V|), we can speed things up beyond a simple dictionary-based priority queue (O(1) insertion and decrease-key but O(|V|) extraction, for O(|V|²) total) with more sophisticated priority queue implementations. We’ve seen that a binary min heap can implement insertion and extract-min in O(log n) time. However, decreasing the key of a value stored in a priority queue requires finding the value in the heap in order to change its key, which naively could take linear time. This difficulty is easily addressed: each vertex can maintain a pointer to its stored location within the heap, or the heap can maintain a mapping from values (vertices) to locations within the heap (you were asked to do this in Problem Set 5). Either solution supports finding a given value in the heap in constant time. Then, after decreasing the value’s key, one can restore the min heap property in logarithmic time by re-heapifying along the path to the root. Since a binary heap can support each of the three operations in O(log |V|) time, the running time of Dijkstra will be:
T_Heap = O((|V| + |E|) log |V|).
For sparse graphs, that’s O(|V | log |V |)! For graphs in between sparse and dense, there is an even
more sophisticated priority queue implementation using a data structure called a Fibonacci Heap,
which supports amortized O(1) time insertion and decrease-key operations, along with O(log n)
minimum extraction. Thus using a Fibonacci Heap to implement the Dijkstra priority queue leads
to the following worst-case running time:
T_FibHeap = O(|V| log |V| + |E|).
We won’t be talking much about Fibonacci Heaps in this class, but they’re theoretically useful for
speeding up Dijkstra on graphs that have a number of edges asymptotically in between linear and
quadratic in the number of graph vertices. You may quote the Fibonacci Heap running time bound
whenever you need to argue the running time of Dijkstra when solving theory questions.
1 class Item:
2 def __init__(self, label, key):
3 self.label, self.key = label, key
4
5 class PriorityQueue: # Binary Heap Implementation
6 def __init__(self): # stores keys with unique labels
7 self.A = []
8 self.label2idx = {}
9
10 def min_heapify_up(self, c):
11 if c == 0: return
12 p = (c - 1) // 2
13 if self.A[p].key > self.A[c].key:
14 self.A[c], self.A[p] = self.A[p], self.A[c]
15 self.label2idx[self.A[c].label] = c
16 self.label2idx[self.A[p].label] = p
17 self.min_heapify_up(p)
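One way to complete the remaining PriorityQueue operations (min_heapify_down, insert, extract_min, and decrease_key), following the same label2idx bookkeeping as the partial listing above — a sketch, not the original course implementation:

    def min_heapify_down(self, p):
        l, r, c = 2 * p + 1, 2 * p + 2, p
        if l < len(self.A) and self.A[l].key < self.A[c].key: c = l
        if r < len(self.A) and self.A[r].key < self.A[c].key: c = r
        if c != p:
            self.A[c], self.A[p] = self.A[p], self.A[c]
            self.label2idx[self.A[c].label] = c
            self.label2idx[self.A[p].label] = p
            self.min_heapify_down(c)

    def insert(self, label, key):            # insert a labeled key into the heap
        self.A.append(Item(label, key))
        idx = len(self.A) - 1
        self.label2idx[label] = idx
        self.min_heapify_up(idx)

    def extract_min(self):                   # remove and return a label with minimum key
        self.A[0], self.A[-1] = self.A[-1], self.A[0]    # assumes the queue is nonempty
        self.label2idx[self.A[0].label] = 0
        del self.label2idx[self.A[-1].label]
        min_label = self.A.pop().label
        self.min_heapify_down(0)
        return min_label

    def decrease_key(self, label, key):      # reduce a stored key, if the new key is smaller
        if label in self.label2idx:          # ignore labels that were already extracted
            idx = self.label2idx[label]
            if key < self.A[idx].key:
                self.A[idx].key = key
                self.min_heapify_up(idx)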
Fibonacci Heaps are not actually used very often in practice, as they are more complex to implement and result in larger constant factor overhead than the other two implementations described above.
When the number of edges in the graph is known to be at most linear (e.g., planar or bounded
degree graphs) or at least quadratic (e.g. complete graphs) in the number of vertices, then using a
binary heap or dictionary respectively will perform as well asymptotically as a Fibonacci Heap.
We’ve made a JavaScript Dijkstra visualizer which you can find here:
https://codepen.io/mit6006/pen/BqgXWM
Exercise: CIA officer Mary Cathison needs to drive to meet with an informant across an unwelcoming city. Some roads in the city are equipped with government surveillance cameras, and Mary
will be detained if cameras from more than one road observe her car on the way to her informant.
Mary has a map describing the length of each road and the locations and ranges of surveillance
cameras. Help Mary find the shortest drive to reach her informant, being seen by at most one
surveillance camera along the way.
Solution: Construct a graph having two vertices (v, 0) and (v, 1) for every road intersection v
within the city. Vertex (v, i) represents arriving at intersection v having already been spotted by
exactly i camera(s). For each road from intersection u to v: add two directed edges from (u, 0) to
(v, 0) and from (u, 1) to (v, 1) if traveling on the road will not be visible by a camera; and add one
directed edge from (u, 0) to (v, 1) if traveling on the road will be visible. If s is Mary’s start location
and t is the location of the informant, any path from (s, 0) to (t, 0) or (t, 1) in the constructed graph
will be a path visible by at most one camera. Let n be the number of road intersections and m be the
number of roads in the network. Assuming lengths of roads are positive, use Dijkstra’s algorithm
to find the shortest such path in O(m + n log n) time using a Fibonacci Heap for Dijkstra’s priority
queue. Alternatively, since the road network is likely planar and/or bounded degree, it may be
safe to assume that m = O(n), so a binary heap could be used instead to find a shortest path in
O(n log n) time.
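A sketch of this construction in code (the input format — n intersections and a list of (u, v, length, seen) road tuples — is hypothetical, and Python’s heapq with lazy deletion stands in for a priority queue with decrease_key):

import heapq

def safe_shortest_drive(n, roads, s, t):
    # layered graph: vertex (v, i) means 'at intersection v, seen by i cameras so far'
    Adj = {(v, i): [] for v in range(n) for i in (0, 1)}
    for u, v, length, seen in roads:
        if seen:
            Adj[(u, 0)].append(((v, 1), length))         # first camera moves to layer 1
        else:
            Adj[(u, 0)].append(((v, 0), length))         # unobserved roads stay in their layer
            Adj[(u, 1)].append(((v, 1), length))
    d = {x: float('inf') for x in Adj}
    d[(s, 0)] = 0
    pq = [(0, (s, 0))]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:                                    # stale queue entry, skip
            continue
        for v, length in Adj[u]:                         # relax out-going edges
            if du + length < d[v]:
                d[v] = du + length
                heapq.heappush(pq, (d[v], v))
    return min(d[(t, 0)], d[(t, 1)])                     # seen by at most one camera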
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 14
Recitation 14
We presented the graph algorithms from the last several lectures and recitations with respect to the SSSP problem, but along the way, we also showed how to use these algorithms to solve other problems. For example, we can also count
connected components in a graph using Full-DFS or Full-BFS, topologically sort vertices in a
DAG using DFS, and detect negative weight cycles using Bellman-Ford.
Johnson’s Algorithm
The idea behind Johnson’s Algorithm is to reduce the APSP problem on a graph with arbitrary edge weights to the APSP problem on a graph with non-negative edge weights. The algorithm does this by re-weighting the edges in the original graph to non-negative values in such a way that shortest paths in the re-weighted graph are also shortest paths in the original graph. Then finding shortest paths in the re-weighted graph by running Dijkstra |V| times will solve the original problem.
How can we re-weight edges in a way that preserves shortest paths? Johnson’s clever idea is to assign each vertex v a real number h(v), and change the weight of each edge (a, b) from w(a, b) to w′(a, b) = w(a, b) + h(a) − h(b), forming a new weighted graph G′ = (V, E, w′).
Claim: A shortest path (v1, v2, . . . , vk) in G′ is also a shortest path in G from v1 to vk.
Proof. Let w(π) = Σ_{i=1}^{k−1} w(vi, vi+1) be the weight of path π in G. Then the weight of π in G′ is:
Σ_{i=1}^{k−1} w′(vi, vi+1) = Σ_{i=1}^{k−1} (w(vi, vi+1) + h(vi) − h(vi+1)) = w(π) + h(v1) − h(vk),
since the h(vi) terms telescope.
So, since each path from v1 to vk is increased by the same number h(v1 ) − h(vk ), shortest paths
remain shortest.
It remains to find a vertex assignment function h for which all edge weights w′(a, b) in the modified graph are non-negative. Johnson’s algorithm defines h in the following way: add a new node x to G with a directed edge of weight 0 from x to v for each vertex v ∈ V to construct graph G∗, letting h(v) = δ(x, v). This assignment of h ensures that w′(a, b) ≥ 0 for every edge (a, b).
Claim: If h(v) = δ(x, v) and h(v) is finite, then w′(a, b) = w(a, b) + h(a) − h(b) ≥ 0 for every edge (a, b) ∈ E.
Proof. The claim is equivalent to claiming δ(x, b) ≤ w(a, b) + δ(x, a) for every edge (a, b) ∈ E, i.e. the minimum weight of any path from x to b in G∗ is not greater than the minimum weight of any path from x to a followed by the edge from a to b, which is true by the definition of minimum weight. (This is simply a restatement of the triangle inequality.)
Johnson’s algorithm computes h(v) = δ(x, v), the (possibly negative) minimum weight distances from the added node x, using Bellman-Ford. If δ(x, v) = −∞ for any vertex v, then there must be a negative weight cycle in the graph, and Johnson’s can terminate as no output is required. Otherwise, Johnson’s re-weights the edges of G to w′(a, b) = w(a, b) + h(a) − h(b) ≥ 0, producing a graph G′ containing only non-negative edge weights. Since shortest paths in G′ are shortest paths in G, we can run Dijkstra |V| times on G′ to find single-source shortest path distances δ′(u, v) from each vertex u in G′. Then we can compute each δ(u, v) by setting it to δ′(u, v) − δ(x, u) + δ(x, v). Johnson’s takes O(|V||E|) time to run Bellman-Ford, and O(|V|(|V| log |V| + |E|)) time to run Dijkstra |V| times, so this algorithm runs in O(|V|² log |V| + |V||E|) time, asymptotically better than O(|V|²|E|).
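Putting the pieces together, a sketch of Johnson’s algorithm that reuses the bellman_ford and dijkstra sketches from Recitations 12 and 13 (same assumed adjacency-list and weight-dictionary representation):

def johnson(Adj, w):
    V = len(Adj)
    Adj_star = [list(adj) for adj in Adj] + [list(range(V))]   # add node x = V
    w_star = dict(w)
    for v in range(V):
        w_star[(V, v)] = 0                       # weight-0 edges from x to every vertex
    h, _ = bellman_ford(Adj_star, w_star, V)     # h(v) = delta(x, v); raises if a
                                                 # negative-weight cycle exists
    w_prime = {(a, b): w[(a, b)] + h[a] - h[b] for (a, b) in w}   # re-weight: all >= 0
    delta = [[None] * V for _ in range(V)]
    for u in range(V):                           # run Dijkstra from every source on G'
        d_prime, _ = dijkstra(Adj, w_prime, u)
        for v in range(V):
            delta[u][v] = d_prime[v] - h[u] + h[v]   # undo the re-weighting
    return delta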
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 15
Recitation 15
Dynamic Programming
Dynamic Programming generalizes Divide and Conquer type recurrences when subproblem de-
pendencies form a directed acyclic graph instead of a tree. Dynamic Programming often applies to
optimization problems, where you are maximizing or minimizing a single scalar value, or counting
problems, where you have to count all possibilities. To solve a problem using dynamic programming, we follow these steps as part of a recursive problem solving framework.
1. Subproblems
• Define the meaning of subproblem x(i), including the range of i
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
3. Topological order
• Argue that the relation on subproblems is acyclic, i.e., subproblem dependencies form a DAG
4. Base cases
• State solutions for all (reachable) independent subproblems where relation doesn’t apply/work
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
Implementation
Once subproblems are chosen and a DAG of dependencies is found, there are two primary methods
for solving the problem, which are functionally equivalent but are implemented differently.
• A top down approach evaluates the recursion starting from roots (vertices incident to no incoming edges). At the end of each recursive call the calculated solution to a subproblem is recorded into a memo, while at the start of each recursive call, the memo is checked to see if that subproblem has already been solved.
• A bottom up approach computes subproblems in a topological sort order of the subproblem dependency DAG (typically with loops), recording each solution into a memo so that it is already available when later subproblems need it.
Top down is a recursive view, while Bottom up unrolls the recursion. Both implementations are
valid and often used. Memoization is used in both implementations to remember computation from
previous subproblems. While it is typical to memoize all evaluated subproblems, it is often possi-
ble to remember (memoize) fewer subproblems, especially when subproblems occur in ‘rounds’.
Often we don’t just want the value that is optimized, but we would also like to return a path of
subproblems that resulted in the optimized value. To reconstruct the answer, we need to maintain
auxiliary information in addition to the value we are optimizing. Along with the value we are
optimizing, we can maintain parent pointers to the subproblem or subproblems upon which a
solution to the current subproblem depends. This is analogous to maintaining parent pointers in
shortest path problems.
The player wins the round if the value of the player’s hand (i.e., the sum of cards drawn by the
player in the round) is ≤ 21 and exceeds the value of the dealer’s hand; otherwise, the player
loses the round. The game ends when a round ends with fewer than 5 cards remaining in the deck.
Given a deck of n cards with a known order, describe an O(n)-time algorithm to determine the
maximum number of rounds the player can win by playing simplified blackjack with the deck.
Solution:
1. Subproblems
• Choose suffixes
• x(i) : maximum rounds player can win by playing blackjack using cards (ci , . . . , cn )
2. Relate
3. Topo
4. Base
5. Original
• Solve x(i) for i ∈ {1, . . . , n + 1}, via recursive top down or iterative bottom up
• x(1): the maximum rounds player can win by playing blackjack with the full deck
6. Time
• # subproblems: n + 1
• work per subproblem: O(1)
• O(n) running time
Solution:
1. Subproblems
2. Relate
• The first line must break at some word, so try all possibilities
• x(i) = min{b(i, j) + x(j + 1) | i ≤ j < n}
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n)
• work per subproblem: O(n²)
Optimization
1. Subproblems
2. Relate
• x(i, j) = Σk wk takes O(j − i) time to compute, slow!
• x(i, j) = x(i, j − 1) + wj takes O(1) time to compute, faster!
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n²)
• work per subproblem: O(1)
• O(n²) running time
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 16
Recitation 16
Solution: We could brute force in O(n³) time by computing the sum of each of the O(n²) subarrays in O(n) time. We can get a faster algorithm by noticing that the subarray with maximum sum must end somewhere. Finding the maximum subarray ending at a particular location k can be computed in O(n) time by scanning to the left from k, keeping track of a rolling sum, and remembering the maximum along the way; since there are n ending locations, this algorithm runs in O(n²) time. We can go even faster by recognizing that each successive scan to the left is redoing work that has already been done in earlier scans. Let’s use dynamic programming to reuse this work!
1. Subproblems
• x(k): the maximum sum of any (nonempty) subarray of A that ends at index k
2. Relate
• A maximum subarray ending at k either is A[k] alone, or extends a maximum subarray ending at k − 1
• x(k) = max(A[k], A[k] + x(k − 1))
3. Topo. Order
• x(k) depends only on x(k − 1), so solve in order of increasing k
4. Base
• x(0) = A[0]
5. Original
• the maximum subarray ends somewhere, so return max{x(k) | 0 ≤ k < n}
6. Time
• # subproblems: O(n)
• work per subproblem: O(1)
• time to solve original problem: O(n)
• O(n) time in total
1 # bottom up implementation
2 def max_subarray_sum(A):
3 x = [None for _ in A] # memo
4 x[0] = A[0] # base case
5 for k in range(1, len(A)): # iteration
6 x[k] = max(A[k], A[k] + x[k - 1]) # relation
7 return max(x) # original
Edit Distance
A plagiarism detector needs to detect the similarity between two texts, string A and string B. One
measure of similarity is called edit distance, the minimum number of edits that will transform
string A into string B. An edit may be one of three operations: delete a character of A, replace a
character of A with another letter, and insert a character between two characters of A. Describe an O(|A||B|)-time algorithm to compute the edit distance between A and B.
Solution:
1. Subproblems
2. Relate
• Replace changes A(i) to B(j) and removes both A(i) and B(j)
• x(i, j) = x(i − 1, j − 1) if A(i) = B(j)
• x(i, j) = 1 + min(x(i − 1, j), x(i, j − 1), x(i − 1, j − 1)) otherwise
3. Topo. Order
4. Base
5. Original
6. Time
• # subproblems: O(|A||B|)
• work per subproblem: O(1)
• O(|A||B|) running time
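A bottom-up sketch of this recurrence (assuming the convention that x(i, j) is the edit distance between the prefixes A(1 .. i) and B(1 .. j), with base cases computed individually):

def edit_distance(A, B):
    # x[i][j]: minimum edits to transform prefix A[:i] into prefix B[:j]
    x = [[None] * (len(B) + 1) for _ in range(len(A) + 1)]
    for i in range(len(A) + 1):              # base cases: delete all i characters of A[:i]
        x[i][0] = i
    for j in range(len(B) + 1):              # base cases: insert all j characters of B[:j]
        x[0][j] = j
    for i in range(1, len(A) + 1):
        for j in range(1, len(B) + 1):
            if A[i - 1] == B[j - 1]:         # last characters match: no edit needed
                x[i][j] = x[i - 1][j - 1]
            else:                            # delete from A, insert into A, or replace
                x[i][j] = 1 + min(x[i - 1][j], x[i][j - 1], x[i - 1][j - 1])
    return x[len(A)][len(B)]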
Exercise: Modify the code above to return a minimal sequence of edits to transform string A into
string B. (Note, the base cases in the above code are computed individually to make reconstructing
a solution easier.)
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 17
Recitation 17
Treasureship!
The new boardgame Treasureship is played by placing 2 × 1 ships within a 2 × n rectangular grid.
Just as in regular battleship, each 2 × 1 ship can be placed either horizontally or vertically, occupy-
ing exactly 2 grid squares, and each grid square may only be occupied by a single ship. Each grid
square has a positive or negative integer value, representing how much treasure may be acquired
or lost at that square. You may place as many ships on the board as you like, with the score of a
placement of ships being the value sum of all grid squares covered by ships. Design an efficient
dynamic-programming algorithm to determine a placement of ships that will maximize your total
score.
Solution:
1. Subproblems
[Diagram: the five possible board-boundary shapes, where row 2 extends past row 1 by an offset j ∈ {0, +1, −1, +2, −2}]
• Exists optimal placement where no two ships aligned horizontally on top of each other
• Proof: cover instead by two vertical ships next to each other!
• So actually only need first three cases above: 0, +1, −1
• Let s(i, j) represent game board subset containing columns 1 to i of row 1, and columns
1 to i + j of row 2, for j ∈ {0, +1, −1}
• x(i, j): maximum score, only placing ships on board subset s(i, j)
• for i ∈ {0, . . . , n}, j ∈ {0, +1, −1}
2. Relate
• If j = +1, can cover right-most square with horizontal ship or leave empty
• If j = −1, can cover right-most square with horizontal ship or leave empty
• If j = 0, can cover column i with vertical ship or not cover one of right-most squares
• x(i, j) =
– max{v(i, 1) + v(i − 1, 1) + x(i − 2, +1), x(i − 1, 0)} if j = −1
– max{v(i + 1, 2) + v(i, 2) + x(i, −1), x(i, 0)} if j = +1
– max{v(i, 1) + v(i, 2) + x(i − 1, 0), x(i, −1), x(i − 1, +1)} if j = 0
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n)
• work per subproblem O(1)
• O(n) running time
Wafer Power
A start-up is working on a new electronic circuit design for highly-parallel computing. Evenly-
spaced along the perimeter of a circular wafer sits n ports for either a power source or a computing
unit. Each computing unit needs energy from a power source, transferred between ports via a
wire etched into the top surface of the wafer. However, if a computing unit is connected to a power
source that is too close, the power can overload and destroy the circuit. Further, no two etched wires
may cross each other. The circuit designer needs an automated way to evaluate the effectiveness
of different designs, and has asked you for help. Given an arrangement of power sources and
computing units plugged into the n ports, describe an O(n³)-time dynamic programming algorithm
to match computing units to power sources by etching non-crossing wires between them onto the
surface of the wafer, in order to maximize the number of powered computing units, where wires
may not connect two adjacent ports along the perimeter. Below is an example wafer, with non-
crossing wires connecting computing units (white) to power sources (black).
Solution:
1. Subproblems
• Let (a1 , . . . , an ) be the ports cyclically ordered counter-clockwise around the wafer,
where ports a1 and an are adjacent
• Let ai be True if the port is a computing unit, and False if it is a power source
• Want to match ports of opposite type (computing unit to power source) via non-crossing wires
• If we match two ports across the wafer, we need to independently match the ports on either side (substrings!)
• x(i, j): maximum number of matchings, restricting to ports ak for all k ∈ {i, . . . , j}
• for i ∈ {1, . . . , n}, j ∈ {i − 1, . . . , n}
• j − i + 1 is number of ports in substring (allow j = i − 1 as an empty substring)
2. Relate
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n²)
• work per subproblem: O(n)
• O(n³) running time
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms