Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon
Lecture 1: Introduction
The goal of this class is to teach you to solve computation problems, and to communicate that
your solutions are correct and efficient.
Problem
• Binary relation from problem inputs to correct outputs
• Usually don’t specify every correct output for all inputs (too many!)
– Example: Given any set of n students, is there a pair of students with same birthday?
– If birthday is just one of 365, for n > 365, answer always true by pigeon-hole
– Assume resolution of possible birthdays exceeds n (include year, time, etc.)
Algorithm
• Procedure mapping each input to a single output (deterministic)
• Algorithm solves a problem if it returns a correct output for every problem input
Correctness
• Programs/algorithms have fixed size, so how to prove correct?
• For arbitrarily large inputs, algorithm must be recursive or loop in some way
• Must use induction (why recursion is such a key concept in computer science)
Efficiency
• How fast does an algorithm produce a correct output?
– Upper bounds (O), lower bounds (Ω), tight bounds (Θ); written with ∈ or =, read as "is" or "on the order of"
– Time estimate below based on one operation per cycle on a 1 GHz single-core machine
– Particles in universe estimated < 10^100
Model of Computation
• Specification for what operations on the machine can be performed in O(1) time
• Processor supports many constant time operations on a O(1) number of words (integers):
Data Structure
• A data structure is a way to store non-constant data that supports a set of operations
• Data structures may implement the same interface with different performance
• Example: Static Array - fixed width slots, fixed length, static sequence interface
• Like Python tuple plus set at(i, x), Python list is a dynamic array (see L02)
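The birthday example below stores records in a static array through get_at / set_at. A minimal sketch of such a StaticArray class (illustrative, assuming only the fixed-length interface described above):

class StaticArray:
    '''Sketch: fixed-length sequence with O(1) get_at / set_at.'''
    def __init__(self, n):
        self.data = [None] * n            # allocate n fixed-width slots

    def get_at(self, i):                  # O(1)
        assert 0 <= i < len(self.data)
        return self.data[i]

    def set_at(self, i, x):               # O(1)
        assert 0 <= i < len(self.data)
        self.data[i] = x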
def birthday_match(students):
    '''
    Find a pair of students with the same birthday
    Input:  tuple of student (name, bday) tuples
    Output: tuple of student names or None
    '''
    n = len(students)                        # O(1)
    record = StaticArray(n)                  # O(n)
    for k in range(n):                       # n
        (name1, bday1) = students[k]         # O(1)
        # Return pair if bday1 in record
        for i in range(k):                   # k
            (name2, bday2) = record.get_at(i)   # O(1)
            if bday1 == bday2:                  # O(1)
                return (name1, name2)           # O(1)
        record.set_at(k, (name1, bday1))     # O(1)
    return None                              # O(1)
Lecture 2: Data Structures
Array Sequence
• Array is great for static operations! get at(i) and set at(i, x) in Θ(1) time!
• But not so great at dynamic operations...
• (For consistency, we maintain the invariant that array is full)
• Then inserting and removing items requires:
– reallocating the array
– shifting all items after the modified item
Linked List Sequence
• Each item stored in a node which contains a pointer to the next node in sequence
• Can now insert and delete from the front in Θ(1) time! Yay!
• (Inserting/deleting efficiently from back is also possible; you will do this in PS1)
• But now get at(i) and set at(i, x) each take O(n) time... :(
Dynamic Array Sequence
• Idea! Allocate extra space so reallocation does not occur with every dynamic operation
• Whenever array is full (r = 1), allocate Θ(n) extra space at end to fill ratio ri (e.g., 1/2)
Amortized Analysis
• Data structure analysis technique to distribute cost over many operations
• “T (n) amortized” roughly means T (n) “on average” over many operations
• However, can be very wasteful in space. Want size of data structure to stay Θ(n)
• Attempt: if very empty, resize to r = 1. Alternating insertion and deletion could be bad...
• Idea! When r < rd , resize array to ratio ri where rd < ri (e.g., rd = 1/4, ri = 1/2)
• Then Θ(n) cheap operations must be made before next expensive resize
• Can limit extra space usage to (1 + ε)n for any ε > 0 (set rd = 1/(1 + ε), ri = (rd + 1)/2)
• Python List append and pop are amortized O(1) time, other operations can be O(n)!
• (Inserting/deleting efficiently from front is also possible; you will do this in PS1)
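To make the resizing policy above concrete, here is a sketch of a dynamic array that grows when full and shrinks when the fill ratio drops below rd = 1/4, rebuilding to ratio ri = 1/2 (class and method names are illustrative, not the notes' own implementation):

class DynamicArrayStack:
    '''Sketch: amortized O(1) insert_last/delete_last via resizing.'''
    def __init__(self):
        self.size = 0                      # number of stored items n
        self.data = [None] * 1             # allocated slots

    def _resize(self, capacity):           # O(n) rebuild into a new allocation
        new = [None] * max(capacity, 1)
        for i in range(self.size):
            new[i] = self.data[i]
        self.data = new

    def insert_last(self, x):              # amortized O(1)
        if self.size == len(self.data):    # full (r = 1): grow so r becomes 1/2
            self._resize(2 * self.size)
        self.data[self.size] = x
        self.size += 1

    def delete_last(self):                 # amortized O(1)
        self.size -= 1
        x = self.data[self.size]
        self.data[self.size] = None
        if self.size <= len(self.data) // 4:   # r < rd = 1/4: shrink so r becomes ri = 1/2
            self._resize(2 * self.size)
        return x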
Lecture 3: Sorting
• Storing items in an array in arbitrary order can implement a (not so efficient) set
Set                        Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Array           n           n          n            n            n
Sorted Array    n log n     log n      n            1            log n
Sorting
• Given a sorted array, we can leverage binary search to make an efficient set data structure.
• A sort is in place if it uses O(1) extra space (implies destructive: in place ⊆ destructive)
Permutation Sort
• There are n! permutations of A, at least one of which is sorted
• Example: [2, 3, 1] → {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}
from itertools import permutations

def is_sorted(B):                      # helper: check whether B is sorted
    return all(B[i - 1] <= B[i] for i in range(1, len(B)))

def permutation_sort(A):
    '''Sort A'''
    for B in permutations(A):          # O(n!)
        if is_sorted(B):               # O(n)
            return B                   # O(1)
Solving Recurrences
• Substitution: Guess a solution, replace with representative function, recurrence holds true
• Recurrence Tree: Draw a tree representing the recursive calls and sum computation at nodes
Selection Sort
• Find a largest number in prefix A[:i + 1] and swap it to A[i]
• Example: [8, 2, 4, 9, 3], [8, 2, 4, 3, 9], [3, 2, 4, 8, 9], [3, 2, 4, 8, 9], [2, 3, 4, 8, 9]
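A sketch of this selection sort in the same recursive style as the insertion sort code below; prefix_max is an assumed helper returning the index of a largest item in the prefix:

def selection_sort(A, i=None):             # T(i)
    '''Sort A[:i + 1] (sketch following the prefix description above)'''
    if i is None: i = len(A) - 1           # O(1)
    if i > 0:                              # O(1)
        j = prefix_max(A, i)               # S(i)
        A[i], A[j] = A[j], A[i]            # O(1)
        selection_sort(A, i - 1)           # T(i - 1)

def prefix_max(A, i):                      # S(i)
    '''Return index of a maximum item in A[:i + 1]'''
    if i > 0:                              # O(1)
        j = prefix_max(A, i - 1)           # S(i - 1)
        if A[i] < A[j]:                    # O(1)
            return j                       # O(1)
    return i                               # O(1)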
Insertion Sort
• Recursively sort prefix A[:i]
• Sort prefix A[:i + 1] assuming that prefix A[:i] is sorted by repeated swaps
• Example: [8, 2, 4, 9, 3], [2, 8, 4, 9, 3], [2, 4, 8, 9, 3], [2, 4, 8, 9, 3], [2, 3, 4, 8, 9]
def insertion_sort(A, i = None):           # T(i)
    '''Sort A[:i + 1]'''
    if i is None: i = len(A) - 1           # O(1)
    if i > 0:                              # O(1)
        insertion_sort(A, i - 1)           # T(i - 1)
        insert_last(A, i)                  # S(i)

def insert_last(A, i):                     # S(i)
    '''Sort A[:i + 1] assuming sorted A[:i]'''
    if i > 0 and A[i] < A[i - 1]:          # O(1)
        A[i], A[i - 1] = A[i - 1], A[i]    # O(1)
        insert_last(A, i - 1)              # S(i - 1)
Merge Sort
• Recursively sort first half and second half (may assume power of two)
• Merge sorted halves into one sorted list (two finger algorithm)
• Example: [7, 1, 5, 6, 2, 4, 9, 3], [1, 7, 5, 6, 2, 4, 3, 9], [1, 5, 6, 7, 2, 3, 4, 9], [1, 2, 3, 4, 5, 6, 7, 9]
def merge_sort(A, a = 0, b = None):        # T(b - a = n)
    '''Sort A[a:b]'''
    if b is None: b = len(A)               # O(1)
    if 1 < b - a:                          # O(1)
        c = (a + b + 1) // 2               # O(1)
        merge_sort(A, a, c)                # T(n / 2)
        merge_sort(A, c, b)                # T(n / 2)
        L, R = A[a:c], A[c:b]              # O(n)
        merge(L, R, A, len(L), len(R), a, b)   # S(n)

def merge(L, R, A, i, j, a, b):            # S(b - a = n)
    '''Merge sorted L[:i] and R[:j] into A[a:b]'''
    if a < b:                              # O(1)
        if (j <= 0) or (i > 0 and L[i - 1] > R[j - 1]):   # O(1)
            A[b - 1] = L[i - 1]            # O(1)
            i = i - 1                      # O(1)
        else:                              # O(1)
            A[b - 1] = R[j - 1]            # O(1)
            j = j - 1                      # O(1)
        merge(L, R, A, i, j, a, b - 1)     # S(n - 1)
• merge analysis:
– Base case: for n = 0, arrays are empty, so vacuously correct
– Induction: assume correct for n, item in A[r] must be a largest number from remaining
prefixes of L and R, and since they are sorted, taking largest of last items suffices;
remainder is merged by induction
– S(0) = Θ(1), S(n) = S(n − 1) + Θ(1) =⇒ S(n) = Θ(n)
• merge sort analysis:
– Base case: for n = 1, array has one element so is sorted
– Induction: assume correct for k < n, algorithm sorts smaller halves by induction, and
then merge merges into a sorted array as proved above.
– T (1) = Θ(1), T (n) = 2T (n/2) + Θ(n)
∗ Substitution: Guess T (n) = Θ(n log n)
cn log n = Θ(n) + 2c(n/2) log(n/2) =⇒ cn log(2) = Θ(n)
∗ Recurrence Tree: complete binary tree with depth log₂ n and n leaves; level i has 2^i nodes with O(n/2^i) work each, total: Σ_{i=0}^{log₂ n} (2^i)(n/2^i) = Σ_{i=0}^{log₂ n} n = Θ(n log n)
Lecture 4: Hashing
Review
                           Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Array           n           n          n            n            n
Sorted Array    n log n     log n      n            1            log n
• Idea! Want faster search and dynamic operations. Can we find(k) faster than Θ(log n)?
Comparison Model
• In this model, assume algorithm can only differentiate items via comparisons
Decision Tree
• Any algorithm can be viewed as a decision tree of operations performed
• Need at least one leaf for each algorithm output, so search requires ≥ n + 1 leaves
• running time ≥ # comparisons ≥ max length of any root-to-leaf path ≥ height of tree
• Minimum height when binary tree is complete (all rows full except last)
• Height ≥ ⌈lg(n + 1)⌉ − 1 = Ω(log n), so running time of any comparison search is Ω(log n)
• More generally, height of tree with Θ(n) leaves and max branching factor b is Ω(logb n)
• To get faster, need an operation that allows super-constant ω(1) branching factor. How??
Direct Access Array
• Idea! Give item unique integer key k in {0, . . . , u − 1}, store item in an array at index k
• Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
• Example: if keys are ten-letter names, for one bit per name, requires 26^10 ≈ 17.6 TB space
Hashing
• Idea! If n ≪ u, map keys to a smaller range m = Θ(n) and use smaller direct access array
• Direct access array called hash table, h(k) called the hash of key k
Chaining
• Idea! Store collisions in another data structure (a chain)
• If keys roughly evenly distributed over indices, chain size is n/m = n/Ω(n) = O(1)!
• If chain has O(1) size, all operations take O(1) time! Yay!
• If not, many items may map to same location, e.g. h(k) = constant, chain size is Θ(n) :(
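A minimal sketch of a chained hash table; the hash function here is the standard universal form h(k) = ((a·k + b) mod p) mod m with a, b chosen at random, matching the parameterization described under Hash Functions below (class name, prime, and layout are illustrative assumptions):

import random

class ChainedHashTable:
    '''Sketch: hash table with chaining and a randomly chosen hash function.'''
    def __init__(self, m, p=2**31 - 1):           # p: a fixed prime assumed > u
        self.m, self.p = m, p
        self.a = random.randrange(1, p)           # random parameters for this table
        self.b = random.randrange(p)
        self.chains = [[] for _ in range(m)]      # one chain per array index

    def _hash(self, k):                           # h(k) = ((a*k + b) mod p) mod m
        return ((self.a * k + self.b) % self.p) % self.m

    def insert(self, k, v):                       # expected O(1) if chains stay small
        chain = self.chains[self._hash(k)]
        for i, (k2, _) in enumerate(chain):       # replace existing key if present
            if k2 == k:
                chain[i] = (k, v)
                return
        chain.append((k, v))

    def find(self, k):                            # expected O(1)
        for k2, v in self.chains[self._hash(k)]:
            if k2 == k:
                return v
        return None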
Hash Functions
• If u ≫ n, every hash function will have some input set that will create an O(n)-size chain
• Idea! Don’t use a fixed hash function! Choose one randomly (but carefully)!
• Parameterized by a fixed prime p > u, with a and b chosen from range {0, . . . , p − 1}
• Xij indicator random variable over h ∈ H: Xij = 1 if h(ki ) = h(kj ), Xij = 0 otherwise
• Size of chain at index h(ki) is the random variable Xi = Σ_j Xij
Dynamic
• If n/m far from 1, rebuild with new randomly chosen hash function for new size m
• Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
• So a hash table can implement dynamic set operations in expected amortized O(1) time! :)
                              Operations, worst-case O(·)
                     Container   Static     Dynamic      Order
Data Structure       build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                            delete(k)    find_max()   find_next(k)
Array                n           n          n            n            n
Sorted Array         n log n     log n      n            1            log n
Direct Access Array  u           1          1            u            u
Hash Table           n(e)        1(e)       1(a)(e)      n            n
Lecture 5: Linear Sorting
Review
• Comparison search lower bound: any decision tree with n nodes has height ≥ ⌈lg(n + 1)⌉ − 1
• Can do faster using random access indexing: an operation with linear branching factor!
• Direct access array is fast, but may use a lot of space (Θ(u))
• Expectation input-independent: choose hash function randomly from universal hash family
• Last time we achieved faster find. Can we also achieve faster sort?
                              Operations, worst-case O(·)
                     Container   Static     Dynamic      Order
Data Structure       build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                            delete(k)    find_max()   find_next(k)
Array                n           n          n            n            n
Sorted Array         n log n     log n      n            1            log n
Direct Access Array  u           1          1            u            u
Hash Table           n(e)        1(e)       1(a)(e)      n            n
def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    D = [None] * u                       # O(u) direct access array
    for x in A:                          # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):                 # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
Tuple Sort
• Item keys are tuples of equal length, i.e. item x.key = (x.k1, x.k2, x.k3, . . .).
• Want to sort on all entries lexicographically, so first key k1 is most significant
• How to sort? Idea! Use other auxiliary sorting algorithms to separately sort each key
• (Like sorting rows in a spreadsheet by multiple columns)
• What order to sort them in? Least significant to most significant!
• Exercise: [32, 03, 44, 42, 22] =⇒ [42, 22, 32, 03, 44] =⇒ [03, 22, 32, 42, 44](n=5)
• Idea! Use tuple sort with auxiliary direct access array sort to sort tuples (a, b).
• Problem! Many integers could have the same a or b value, even if input keys distinct
• Need sort allowing repeated keys which preserves input order
• Want sort to be stable: repeated keys appear in output in same order as input
• Direct access array sort cannot even sort arrays having repeated keys!
• Can we modify direct access array sort to admit multiple keys in a way that is stable?
Counting Sort
• Instead of storing a single item at each array index, store a chain, just like hashing!
• For stability, chain data structure should remember the order in which items were added
• Use a sequence data structure which maintains insertion order
• To insert item x, insert last to end of the chain at index x.key
• Then to sort, read through all chains in sequence order, returning items one by one
def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    D = [[] for i in range(u)]           # O(u) direct access array of chains
    for x in A:                          # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                      # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1
Radix Sort
• Idea! If u < n^2, use tuple sort with auxiliary counting sort to sort tuples (a, b)
• Sort least significant key b, then most significant key a
• Stability ensures previous sorts stay sorted
• Running time for this algorithm is O(2n) = O(n). Yay!
• If every key < n^c for some positive c = log_n(u), every key has at most c digits base n
• A c-digit number can be written as a c-element tuple in O(c) time
• We sort each of the c base-n digits in O(n) time
• So tuple sort with auxiliary counting sort runs in O(cn) time in total
• If c is constant, so each key is ≤ n^c, this sort is linear O(n)!
def radix_sort(A):
    "Sort A assuming items have non-negative keys"
    n = len(A)
    u = 1 + max([x.key for x in A])      # O(n) find maximum key
    c = 1 + (u.bit_length() // n.bit_length())
    class Obj: pass
    D = [Obj() for a in A]
    for i in range(n):                   # O(nc) make digit tuples
        D[i].digits = []
        D[i].item = A[i]
        high = A[i].key
        for j in range(c):               # O(c) make digit tuple
            high, low = divmod(high, n)
            D[i].digits.append(low)
    for i in range(c):                   # O(nc) sort each digit
        for j in range(n):               # O(n) assign key i to tuples
            D[j].key = D[j].digits[i]
        counting_sort(D)                 # O(n) sort on digit i
    for i in range(n):                   # O(n) output to A
        A[i] = D[i].item
Lecture 6: Binary Trees I
• Binary tree is pointer-based data structure with three pointers per node
• Example: (tree figure omitted)
Terminology
• The root of a tree has no parent (Ex: <A>)
• Define depth(<X>) of node <X> in a tree rooted at <R> to be length of path from <X> to <R>
• Define height(<X>) of node <X> to be max depth of any node in the subtree rooted at <X>
• Idea: Design operations to run in O(h) time for root height h, and maintain h = O(log n)
– Recursively list left subtree, list self, then recursively list right subtree
– Runs in O(n) time, since O(1) work is done to list each node
– Example: Traversal order is (<F>, <D>, <B>, <E>, <A>, <C>)
• Right now, traversal order has no meaning relative to the stored items
Tree Navigation
• Find first node in the traversal order of node <X>’s subtree (last is symmetric)
– If <X> has left child, recursively return the first node in the left subtree
– Otherwise, <X> is the first node, so return it
– Running time is O(h) where h is the height of the tree
– Example: first node in <A>’s subtree is <F>
Dynamic Operations
• Change the tree by a single item (only add or remove leaves):
– If <X> has no right child, make <Y> the right child of <X>
– Otherwise, make <Y> the left child of <X>’s successor (which cannot have a left child)
– Running time is O(h) where h is the height of the tree
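A sketch of the navigation and insertion operations described above, on a node with item/parent/left/right pointers (class and method names are illustrative):

class BinaryNode:
    def __init__(self, item):
        self.item = item
        self.left = self.right = self.parent = None

    def subtree_first(self):              # O(h): first node in traversal order of subtree
        node = self
        while node.left:
            node = node.left
        return node

    def successor(self):                  # O(h): next node in traversal order
        if self.right:
            return self.right.subtree_first()
        node = self
        while node.parent and node is node.parent.right:
            node = node.parent
        return node.parent

    def subtree_insert_after(self, new):  # O(h): insert node new directly after self
        if self.right is None:            # make new the right child of self
            self.right, new.parent = new, self
        else:                             # successor of self cannot have a left child
            succ = self.right.subtree_first()
            succ.left, new.parent = new, succ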
Application: Set
• Idea! Set Binary Tree (a.k.a. Binary Search Tree / BST):
Traversal order is sorted order increasing by key
– Equivalent to BST Property: for every node, every key in left subtree ≤ node’s key ≤
every key in right subtree
• Then can find the node with key k in node <X>’s subtree in O(h) time like binary search:
– If k is smaller than the key at <X>, recurse in left subtree (or return None)
– If k is larger than the key at <X>, recurse in right subtree (or return None)
– Otherwise, return the item stored at <X>
Application: Sequence
• Idea! Sequence Binary Tree: Traversal order is sequence order
• How do we find ith node in traversal order of a subtree? Call this operation subtree at(i)
• Could just iterate through entire traversal order, but that’s bad, O(n)
• However, if we could compute a subtree’s size in O(1), then can solve in O(h) time
• Maintain the size of each node’s subtree at the node via augmentation
• Naively, build(X) takes O(nh) time, but can be done in O(n) time; see recitation
So Far
Set                        Operations, worst-case O(·)
                Container   Static     Dynamic      Order
Data Structure  build(X)    find(k)    insert(x)    find_min()   find_prev(k)
                                       delete(k)    find_max()   find_next(k)
Binary Tree     n log n     h          h            h            h
Goal            n log n     log n      log n        log n        log n

Sequence                   Operations, worst-case O(·)
                Container   Static          Dynamic
Data Structure  build(X)    get_at(i)       insert_first(x)   insert_last(x)   insert_at(i, x)
                            set_at(i, x)    delete_first()    delete_last()    delete_at(i)
Binary Tree     n           h               h                 h                h
Goal            n           log n           log n             log n            log n
Next Time
• Keep a binary tree balanced after insertion or deletion
Lecture 7: Binary Trees II: AVL
Height Balance
• How to maintain height h = O(log n) where n is number of nodes in tree?
• A binary tree that maintains O(log n) height under dynamic operations is called balanced
– There are many balancing schemes (Red-Black Trees, Splay Trees, 2-3 Trees, . . . )
– First proposed balancing scheme was the AVL Tree (Adelson-Velsky and Landis, 1962)
Rotations
• Need to reduce height of tree without changing its traversal order, so that we represent the
same sequence of items
• How to change the structure of a tree, while preserving traversal order? Rotations!
• A rotation relinks O(1) pointers to modify tree structure and maintains traversal order
Rotations Suffice
• Claim: O(n) rotations can transform a binary tree to any other with same traversal order.
• Proof: Repeatedly perform last possible right rotation in traversal order; resulting tree is a
canonical chain. Each rotation increases depth of the last node by 1. Depth of last node in
final chain is n − 1, so at most n − 1 rotations are performed. Reverse canonical rotations to
reach target tree.
• Can maintain height-balance by using O(n) rotations to fully balance the tree, but slow :(
– A node is height-balanced if heights of its left and right subtrees differ by at most 1
– Let skew of a node be the height of its right subtree minus that of its left subtree
– Then a node is height-balanced if its skew is −1, 0, or 1
• Claim: A binary tree with height-balanced nodes has height h = O(log n) (i.e., n = 2^Ω(h))
• Proof: Suffices to show fewest nodes F(h) in any height-h tree is F(h) = 2^Ω(h)
1 __<B>______ ______<F>____
2 <A> ___<F>___ __<B>___ <G>
3 / \ <D> <G> => <A> <D> / \
4 /___\ / \ / \ / \ / \ / \
5 /___\ / \ /___\ /___\ /_____\
6 /_____\ /_____\ /_____\
1 __<B>___________ _____<D>______
2 <A> _____<F>__ __<B>__ __<F>__
3 / \ __<D>__ <G> => <A> <C> <E> <G>
4 /___\ <C> <E> / \ / \ /_\ /_\ / \
5 /_\ /_\ /___\ /___\ /___\ /___\ /___\
6 /___\ /___\
• Global Rebalance: Add or remove a leaf from height-balanced tree T to produce tree T′.
Then T′ can be transformed into a height-balanced tree T″ using at most O(log n) rotations.
• Proof:
– Only ancestors of the affected leaf have different height in T′ than in T
– Affected leaf has at most h = O(log n) ancestors whose subtrees may have changed
– Let <X> be lowest ancestor that is not height-balanced (with skew magnitude 2)
– If a leaf was added into T :
∗ Insertion increases height of <X>, so in Case 2 or 3 of Local Rebalancing
∗ Rotation decreases subtree height: balanced after one rotation
– If a leaf was removed from T :
∗ Deletion decreased height of one child of <X>, not <X>, so only imbalance
∗ Could decrease height of <X> by 1; parent of <X> may now be imbalanced
∗ So may have to rebalance every ancestor of <X>, but at most h = O(log n) of them
• So can maintain height-balance using only O(log n) rotations after insertion/deletion!
• But requires us to evaluate whether possibly O(log n) nodes were height-balanced
Computing Height
• How to tell whether node <X> is height-balanced? Compute heights of subtrees!
• How to compute the height of node <X>? Naive algorithm:
– Recursively compute height of the left and right subtrees of <X>
– Add 1 to the max of the two heights
– Runs in Ω(n) time, since we recurse on every node :(
• Idea: Augment each node with the height of its subtree! (Save for later!)
• Height of <X> can be computed in O(1) time from the heights of its children:
– Look up the stored heights of left and right subtrees in O(1) time
– Add 1 to the max of the two heights
• During dynamic operations, we must maintain our augmentation as the tree changes shape
• Recompute subtree augmentations at every node whose subtree changes:
– Update relinked nodes in a rotation operation in O(1) time (ancestors don’t change)
– Update all ancestors of an inserted or deleted node in O(h) time by walking up the tree
– State the subtree property P(<X>) you want to store at each node <X>
– Show how to compute P(<X>) from the augmentations of <X>’s children in O(1) time
• Then stored property P(<X>) can be maintained without changing dynamic operation costs
Application: Sequence
• For sequence binary tree, we needed to know subtree sizes
• For just inserting/deleting a leaf, this was easy, but now need to handle rotations
– Can compute size from sizes of children by summing them and adding 1
Conclusion
• Set AVL trees achieve O(lg n) time for all set operations,
except O(n log n) time for build and O(n) time for iter
• Sequence AVL trees achieve O(lg n) time for all sequence operations,
except O(n) time for build and iter
Application: Sorting
• Any Set data structure defines a sorting algorithm: build (or repeatedly insert) then iter
Lecture 8: Binary Heaps
• Many sorting algorithms we’ve seen can be viewed as priority queue sort:
                       Priority Queue Operations O(·)          Priority Queue Sort
Data Structure         build(A)   insert(x)   delete_max()     Time      In-place?
Dynamic Array          n          1(a)        n                n^2       Y   (Selection Sort)
Sorted Dynamic Array   n log n    n           1(a)             n^2       Y   (Insertion Sort)
Set AVL Tree           n log n    log n       log n            n log n   N   (AVL Sort)
Goal                   n          log n(a)    log n(a)         n log n   Y   (Heap Sort)
• Can speed up find min() and find max() to O(1) time via subtree augmentation
• But this data structure is complicated and resulting sort is not in-place
• Is there a simpler data structure for just priority queue, and in-place O(n lg n) sort?
YES, binary heap and heap sort
• Essentially implement a Set data structure on top of a Sequence data structure (array), using
what we learned about binary trees
• delete max(): find max in O(n), swap max to the end and remove
• Can we find a compromise between these two array priority queue extremes?
1 d0 ______O____
2 d1 ____O____ __O__
3 d2 __O__ __O O O
4 d3 O O O
• Equivalently, complete tree is filled densely in reading order: root to leaves, left to right
• Height of the complete tree, viewed as an array of n items, is ⌈lg n⌉, so it is a balanced binary tree
• Root is at index 0; for the node at index i:
  left(i) = 2i + 1
  right(i) = 2i + 2
  parent(i) = ⌊(i − 1)/2⌋
Binary Heaps
• Idea: keep larger elements higher in tree, but only locally
• Claim: In a max-heap, every node i satisfies Q[i] ≥ Q[j] for all nodes j in subtree(i)
• Proof:
Heap Insert
• Append new item x to end of array in O(1) amortized, making it next leaf i in reading order
• Correctness:
– Max-Heap Property guarantees all nodes ≥ descendants, except Q[i] might be > some
of its ancestors (unless i is the root, so we’re done)
– If swap necessary, same guarantee is true with Q[parent(i)] instead of Q[i]
• So swap item at root node i = 0 with last item at node n − 1 in heap array
• max heapify down(i): swap root with larger child until Max-Heap Property
• Correctness:
– Max-Heap Property guarantees all nodes ≥ descendants, except Q[i] might be < some
descendants (unless i is a leaf, so we’re done)
– If swap is necessary, same guarantee is true with Q[j] instead of Q[i]
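A sketch of the two fix-up procedures implied above: max_heapify_down is named in the notes, and a symmetric max_heapify_up (assumed here) is what runs after appending on insert. The index helpers use the formulas given earlier.

def parent(i): return (i - 1) // 2
def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2

def max_heapify_up(Q, i):                   # O(log n): bubble Q[i] up toward the root
    p = parent(i)
    if i > 0 and Q[p] < Q[i]:
        Q[i], Q[p] = Q[p], Q[i]
        max_heapify_up(Q, p)

def max_heapify_down(Q, i, n=None):         # O(log n): push Q[i] down toward the leaves
    if n is None: n = len(Q)                # n = size of the heap prefix Q[:n]
    l, r, biggest = left(i), right(i), i
    if l < n and Q[biggest] < Q[l]: biggest = l
    if r < n and Q[biggest] < Q[r]: biggest = r
    if biggest != i:                        # swap with the larger child and recurse
        Q[i], Q[biggest] = Q[biggest], Q[i]
        max_heapify_down(Q, biggest, n)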
Heap Sort
• Plugging max-heap into priority queue sort gives us a new sorting algorithm
• Running time is O(n log n) because each insert and delete max takes O(log n)
• |Q| is initially zero, eventually |A| (after inserts), then zero again (after deletes)
• delete max() moves max item to end, then abandons it by decrementing |Q|
• In-place priority queue sort with Sorted Array is exactly Insertion Sort
• In-place priority queue sort with binary Max Heap is Heap Sort
• Idea! Treat full array as a complete binary tree from start, then max heapify down(i)
for i from n − 1 to 0 (leaves up):
worst-case swaps ≈ Σ_{i=0}^{n−1} height(i) = Σ_{i=0}^{n−1} (lg n − lg i) = lg(n^n / n!) = Θ(lg(n^n / (√n (n/e)^n))) = O(n)
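A sketch of this linear-time build plus the resulting in-place heap sort, reusing the max_heapify_down sketch above (helper names are illustrative, not the notes' listing):

def build_max_heap(Q):                       # O(n) total, by the swap count above
    n = len(Q)
    for i in range(n - 1, -1, -1):           # leaves up
        max_heapify_down(Q, i, n)

def heap_sort(A):                            # O(n log n), in place
    build_max_heap(A)
    for size in range(len(A) - 1, 0, -1):    # |Q| shrinks from n to 1
        A[0], A[size] = A[size], A[0]        # move current max to the end
        max_heapify_down(A, 0, size)         # restore Max-Heap Property on A[:size]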
Lecture 9: Breadth-First Search
Graph Applications
• Why? Graphs are everywhere!
• the state space of any discrete system can be represented by a transition graph
Graph Definitions
(Figure: example graphs G1, G2, G3 omitted)
• Undirected edges are unordered pairs, e.g., {u, v} for u, v ∈ V, i.e., both (u, v) and (v, u)
• In this class, we assume all graphs are simple:
– edges are distinct, e.g., (u, v) only occurs once in E (though (v, u) may appear), and
– edges are pairs of distinct vertices, e.g., u ≠ v for all (u, v) ∈ E
– Simple implies |E| = O(|V|^2), since |E| ≤ C(|V|, 2) for undirected, ≤ 2·C(|V|, 2) for directed
Neighbor Sets/Adjacencies
• The outgoing neighbor set of u ∈ V is Adj+ (u) = {v ∈ V | (u, v) ∈ E}
• For undirected graphs, Adj− (u) = Adj+ (u) and deg− (u) = deg+ (u)
• Dropping superscript defaults to outgoing, i.e., Adj(u) = Adj+ (u) and deg(u) = deg+ (u)
Graph Representations
• To store a graph G = (V, E), we need to store the outgoing edges Adj(u) for all u ∈ V
• Then for each u, need to store Adj(u) in another data structure called an adjacency list
• Common to use direct access array or hash table for Adj, since want lookup fast by vertex
• Common to use array or linked list for each Adj(u) since usually only iteration is needed1
• For the common representations, Adj has size Θ(|V |), while each Adj(u) has size Θ(deg(u))
• Since Σ_{u∈V} deg(u) ≤ 2|E| by the handshaking lemma, a graph is storable in Θ(|V| + |E|) space
• Thus, for algorithms on graphs, linear time will mean Θ(|V | + |E|) (linear in size of graph)
Examples
• Examples 1 and 2 assume vertices are labeled {0, 1, . . . , |V | − 1}, so can use a direct access
array for Adj, and store Adj(u) in an array. Example 3 uses a hash table for Adj.
• Note that in an undirected graph, connections are symmetric as every edge is outgoing twice
¹ A hash table for each Adj(u) can allow checking for an edge (u, v) ∈ E in O(1)(e) time
Paths
• A path is a sequence of vertices p = (v1 , v2 , . . . , vk ) where (vi , vi+1 ) ∈ E for all 1 ≤ i < k.
• The distance δ(u, v) from u ∈ V to v ∈ V is the minimum length of any path from u to v,
i.e., the length of a shortest path from u to v
(by convention, δ(u, v) = ∞ if u is not connected to v)
• SINGLE PAIR SHORTEST PATH(G, s, t): return distance δ(s, t), and a shortest path in G = (V, E) from s ∈ V to t ∈ V
• SINGLE SOURCE SHORTEST PATHS(G, s): return δ(s, v) for all v ∈ V, and a shortest-path tree containing a shortest path from s to every v ∈ V (defined below)
• Instead, show one algorithm that solves the hardest in O(|V | + |E|) time!
• Many paths could have length Ω(|V |), so returning every path could require Ω(|V |2 ) time
• Instead, for all v ∈ V , store its parent P (v): second to last vertex on a shortest path from s
• Let P (s) be null (no second to last vertex on shortest path from s to s)
² A path in 6.006 is a “walk” in 6.042. A “path” in 6.042 is a simple path in 6.006.
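A sketch of BFS computing level-by-level distances and parent pointers from s, assuming Adj maps each vertex to its outgoing neighbors (e.g., a dict of lists):

def bfs(Adj, s):
    '''Sketch: return (dist, parent) dicts of shortest-path distances from s.'''
    dist, parent = {s: 0}, {s: None}
    frontier = [s]                              # vertices at the current level
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in Adj[u]:                    # each edge explored at most once
                if v not in dist:               # first visit is at minimum distance
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return dist, parent                         # O(|V| + |E|) time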
Lecture 10: Depth-First Search
Previously
• Graph definitions (directed/undirected, simple, neighbors, degree)
Examples
(Figure: example graphs G1 and G2 omitted)
Correctness
• Claim: DFS visits v and correctly sets P (v) for every vertex v reachable from s
• Proof: induct on k, for claim on only vertices within distance k from s
Running Time
• Algorithm visits each vertex u at most once and spends O(1) time for each v ∈ Adj(u)
• Work upper bounded by O(1) × Σ_{u∈V} deg(u) = O(|E|)
• Unlike BFS, not returning a distance for each vertex, so DFS runs in O(|E|) time
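A sketch of recursive DFS from s setting parent pointers, with Adj as before; full_dfs repeats it from every unvisited vertex and also records the finishing order used later for topological sort:

def dfs(Adj, s, parent=None, order=None):
    '''Sketch: visit all vertices reachable from s; append each vertex to
    order as its visit finishes.'''
    if parent is None: parent, order = {s: None}, []
    for v in Adj[s]:
        if v not in parent:                 # O(1) work per outgoing edge
            parent[v] = s
            dfs(Adj, v, parent, order)
    order.append(s)                         # s finishes after all its descendants
    return parent, order

def full_dfs(Adj):
    '''Sketch: DFS from every unvisited vertex; reversed(order) is a
    topological order when the graph is a DAG.'''
    parent, order = {}, []
    for s in Adj:
        if s not in parent:
            parent[s] = None
            dfs(Adj, s, parent, order)
    return parent, order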
– Choose an arbitrary unvisited vertex s, use A to explore all vertices reachable from s
• Visits every vertex once, so both Full-BFS and Full-DFS run in O(|V | + |E|) time
Graph Connectivity
• An undirected graph is connected if there is a path connecting every pair of vertices
• In a directed graph, vertex u may be reachable from v, but v may not be reachable from u
• Connectivity is more complicated for directed graphs (we won’t discuss in this class)
• Proof: Run Full-A. For each run of A, put visited vertices in a connected component
Topological Sort
• A Directed Acyclic Graph (DAG) is a directed graph that contains no directed cycle.
• A Topological Order of a graph G = (V, E) is an ordering f on the vertices such that:
every edge (u, v) ∈ E satisfies f (u) < f (v).
• Exercise: Prove that a directed graph admits a topological ordering if and only if it is a DAG.
• How to find a topological order?
• A Finishing Order is the order in which a Full-DFS finishes visiting each vertex in G
• Claim: If G = (V, E) is a DAG, the reverse of a finishing order is a topological order
• Proof: Need to prove, for every edge (u, v) ∈ E that u is ordered before v,
i.e., the visit to v finishes before visiting u. Two cases:
– If u visited before v:
∗ Before visit to u finishes, will visit v (via (u, v) or otherwise)
∗ Thus the visit to v finishes before visiting u
– If v visited before u:
∗ u can’t be reached from v since graph is acyclic
∗ Thus the visit to v finishes before visiting u
Cycle Detection
• Full-DFS will find a topological order if a graph G = (V, E) is acyclic
• If reverse finishing order for Full-DFS is not a topological order, then G must contain a cycle
• Check if G is acyclic: for each edge (u, v), check if v is before u in reverse finishing order
• Can be done in O(|E|) time via a hash table or direct access array
• To return such a cycle, maintain the set of ancestors along the path back to s in Full-DFS
• Claim: If G contains a cycle, Full-DFS will traverse an edge from v to an ancestor of v.
• Proof: Consider a cycle (v0 , v1 , . . . , vk , v0 ) in G
– Without loss of generality, let v0 be the first vertex visited by Full-DFS on the cycle
– For each vi , before visit to vi finishes, will visit vi+1 and finish
– Will consider edge (vi , vi+1 ), and if vi+1 has not been visited, it will be visited now
– Thus, before visit to v0 finishes, will visit vk (for the first time, by v0 assumption)
– So, before visit to vk finishes, will consider (vk , v0 ), where v0 is an ancestor of vk
Lecture 11: Weighted Shortest Paths
Review
• Single-Source Shortest Paths with BFS in O(|V | + |E|) time (return distance per vertex)
• Single-Source Reachability with BFS or DFS in O(|E|) time (return only reachable vertices)
Weighted Graphs
• A weighted graph is a graph G = (V, E) together with a weight function w : E → Z
– Inside graph representation: store edge weight with each vertex in adjacency lists
– Store separate Set data structure mapping each edge to its weight
• We assume a representation that allows querying the weight of an edge in O(1) time
Examples
G1 G2
−5 −1 5 −5 −1 5
a b c d a b c d
6 9 6 9
7 −4 1 4 7 −4 1 4
8 8
e f g h e f g h
3 2 −2 3 2 −2
Weighted Paths
• The weight w(π) of a path π in a weighted graph is the sum of weights of edges in the path
• (Often use “distance” for shortest-path weight in weighted graphs, not number of edges)
• Why infimum not minimum? Possible that no finite-length minimum-weight path exists
• When? Can occur if there is a negative-weight cycle in the graph, Ex: (b, f, g, c, b) in G1
• A negative-weight cycle is a path π starting and ending at same vertex with w(π) < 0
• If this occurs, don’t want a shortest path, but may want the negative-weight cycle
• (No parent pointers: can reconstruct shortest paths tree in linear time after. Next page!)
• Already know one algorithm: Breadth-First Search! Runs in O(|V | + |E|) time when, e.g.:
– graph has positive weights, and all weights are the same
– graph has positive weights, and sum of all weights at most O(|V | + |E|)
• For general weighted graphs, we don’t know how to solve SSSP in O(|V | + |E|) time
Shortest-Paths Tree
• For BFS, we kept track of parent pointers during search. Alternatively, compute them after!
• If know δ(s, v) for all vertices v ∈ V , can construct shortest-path tree in O(|V | + |E|) time
• For weighted shortest paths from s, only need parent pointers for vertices v with finite δ(s, v)
• Parent pointers may traverse cycles of zero weight. Mark each vertex in such a cycle.
– For each v ∈ Adj+ (u) where v is marked and δ(s, v) = δ(s, u) + w(u, v):
∗ Unmark vertices in cycle containing v by traversing parent pointers from v
∗ Set P (v) = u, breaking the cycle
• Exercise: Prove this algorithm correctly computes parent pointers in linear time
DAG Relaxation
• Idea! Maintain a distance estimate d(s, v) (initially ∞) for each vertex v ∈ V ,
that always upper bounds true distance δ(s, v), then gradually lowers until d(s, v) = δ(s, v)
• Triangle Inequality: the shortest-path weight from u to v cannot be greater than the shortest
path from u to v through another vertex x, i.e., δ(u, v) ≤ δ(u, x) + δ(x, v) for all u, v, x ∈ V
• If d(s, v) > d(s, u) + w(u, v) for some edge (u, v), then triangle inequality is violated :(
• Fix by lowering d(s, v) to d(s, u) + w(u, v), i.e., relax (u, v) to satisfy violated constraint
• Claim: Relaxation is safe: maintains that each d(s, v) is weight of a path to v (or ∞) ∀v ∈ V
• Proof: Assume d(s, v 0 ) is weight of a path (or ∞) for all v 0 ∈ V . Relaxing some edge (u, v)
sets d(s, v) to d(s, u) + w(u, v), which is the weight of a path from s to v through u.
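A sketch of DAG Relaxation, assuming the vertices are supplied in a topological order (e.g., reverse DFS finishing order) and weights are given as w[(u, v)]:

import math

def dag_relaxation(Adj, w, order, s):
    '''Sketch: single-source shortest paths on a DAG.
    Adj: dict vertex -> outgoing neighbors, w: dict (u, v) -> weight,
    order: vertices in a topological order, s: source vertex.'''
    d = {v: math.inf for v in order}        # distance estimates, initially infinity
    parent = {s: None}
    d[s] = 0
    for u in order:                         # process vertices in topological order
        for v in Adj[u]:
            if d[v] > d[u] + w[(u, v)]:     # relax edge (u, v) if it violates
                d[v] = d[u] + w[(u, v)]     # the triangle inequality
                parent[v] = u
    return d, parent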
Correctness
• Claim: At end of DAG Relaxation: d(s, v) = δ(s, v) for all v ∈ V
• Proof: Induct on k: d(s, v) = δ(s, v) for all v in first k vertices in topological order
– Base case: Vertex s and every vertex before s in topological order satisfies claim at start
– Inductive step: Assume claim holds for first k′ vertices, let v be the (k′ + 1)-th
– Consider a shortest path from s to v, and let u be the vertex preceding v on path
– u occurs before v in topological order, so d(s, u) = δ(s, u) by induction
– When processing u, d(s, v) is set to be no larger (≤) than δ(s, u) + w(u, v) = δ(s, v)
– But d(s, v) ≥ δ(s, v), since relaxation is safe, so d(s, v) = δ(s, v)
• Alternatively:
– For any vertex v, DAG relaxation sets d(s, v) = min{d(s, u) + w(u, v) | u ∈ Adj− (v)}
– Shortest path to v must pass through some incoming neighbor u of v
– So if d(s, u) = δ(s, u) for all u ∈ Adj− (v) by induction, then d(s, v) = δ(s, v)
Running Time
• Initialization takes O(|V |) time, and Topological Sort takes O(|V | + |E|) time
Lecture 12: Bellman-Ford
Previously
• Weighted graphs, shortest-path weight, negative-weight cycles
• DAG Relaxation: algorithm to solve SSSP on a weighted DAG in O(|V | + |E|) time
Warmups
• Exercise 1: Given undirected graph G, return whether G contains a negative-weight cycle
• Solution: Return Yes if there is an edge with negative weight in G in O(|E|) time :O
• Exercise 2: Given SSSP algorithm A that runs in O(|V|(|V| + |E|)) time, show how to use it to solve SSSP in O(|V||E|) time
• Solution: Run BFS or DFS to find the vertices reachable from s in O(|E|) time
– Mark each vertex v not reachable from s with δ(s, v) = ∞ in O(|V |) time
– Make graph G′ = (V′, E′) with only vertices reachable from s in O(|V| + |E|) time
– Run A from s in G′.
– G′ is connected, so |V′| = O(|E′|) = O(|E|), so A runs in O(|V||E|) time
• Today, we will find a SSSP algorithm with this running time that works for general graphs!
• If graph does not contain negative-weight cycles, shortest paths are simple!
• Proof: By contradiction:
– Suppose no simple shortest path; let π be a shortest path with fewest vertices
– π not simple, so exists cycle C in π; C has non-negative weight (or else δ(s, v) = −∞)
– Removing C from π forms path π′ with fewer vertices and weight w(π′) ≤ w(π)
• Since simple paths cannot repeat vertices, finite shortest paths contain at most |V | − 1 edges
• Define the k-edge distance δ_k(s, v): minimum weight of any path from s to v using at most k edges
• Idea! Compute δ_{|V|−1}(s, v) and δ_{|V|}(s, v) for all v ∈ V
– If δ(s, v) ≠ −∞, then δ(s, v) = δ_{|V|−1}(s, v), since a shortest path is simple (or nonexistent)
– If δ_{|V|}(s, v) < δ_{|V|−1}(s, v)
∗ there exists a shorter non-simple path to v, so δ(s, v) = −∞
∗ call v a (negative cycle) witness
– However, there may be vertices with −∞ shortest-path weight that are not witnesses
• Claim 2: δ(s, v) = −∞ if and only if v is reachable from a witness
• Proof: Suffices to prove: every negative-weight cycle reachable from s contains a witness
– If C contains no witness, δ_{|V|}(s, v) ≥ δ_{|V|−1}(s, v) for all v ∈ C, a contradiction
Bellman-Ford
• Idea! Use graph duplication: make multiple copies (or levels) of the graph
• |V | + 1 levels: vertex vk in level k represents reaching vertex v from s using ≤ k edges
• If edges only increase in level, resulting graph is a DAG!
Example
(Figure: example graph G and its |V| + 1 level duplication G′ omitted)

δ(a_0, v_k):
k \ v      a      b      c      d
0          0      ∞      ∞      ∞
1          0     −5      6      ∞
2          0     −5     −9      9
3          0     −5     −9     −6
4          0     −7     −9     −6
δ(a, v)    0     −∞     −∞     −∞
Correctness
• Claim 3: δ(s_0, v_k) = δ_k(s, v) for all v ∈ V and k ∈ {0, . . . , |V|}
• Proof: By induction on k:
– Base case: true for all v ∈ V when k = 0 (only v_0 reachable from s_0 is v = s)
– Inductive Step: Assume true for all k < k′, prove for k = k′
δ(s_0, v_{k′}) = min{δ(s_0, u_{k′−1}) + w(u_{k′−1}, v_{k′}) | u_{k′−1} ∈ Adj⁻(v_{k′})}
             = min{{δ(s_0, u_{k′−1}) + w(u, v) | u ∈ Adj⁻(v)} ∪ {δ(s_0, v_{k′−1})}}
             = min{{δ_{k′−1}(s, u) + w(u, v) | u ∈ Adj⁻(v)} ∪ {δ_{k′−1}(s, v)}}   (by induction)
             = δ_{k′}(s, v)
• Claim 4: At the end of Bellman-Ford d(s, v) = δ(s, v)
• Proof: Correctly computes δ_{|V|−1}(s, v) and δ_{|V|}(s, v) for all v ∈ V by Claim 3
– If δ(s, v) ≠ −∞, correctly sets d(s, v) = δ_{|V|−1}(s, v) = δ(s, v)
– Then sets d(s, v) = −∞ for any v reachable from a witness; correct by Claim 2
Running Time
• G′ has size O(|V|(|V| + |E|)) and can be constructed in as much time
• Running DAG Relaxation on G′ takes linear time in the size of G′
• Does O(1) work for each vertex reachable from a witness
• Finding reachability of a witness takes O(|E|) time, with at most O(|V |) witnesses: O(|V ||E|)
• (Alternatively, connect super node x to witnesses via 0-weight edges, linear search from x)
• Pruning G at start to only subgraph reachable from s yields O(|V ||E|)-time algorithm
• Can use just O(|V|) space by storing only δ(s_0, v_{k−1}) and δ(s_0, v_k) for each k from 1 to |V|
• Traditionally, Bellman-Ford stores only one value per vertex, attempting to relax every edge
in |V | rounds; but estimates do not correspond to k-Edge Distances, so analysis trickier
• But these space optimizations don’t return a negative weight cycle
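A sketch of the traditional space-efficient variant mentioned above: one estimate per vertex, |V| − 1 rounds of relaxing every edge, one extra scan to find witnesses, and a final propagation marking everything reachable from a witness as −∞ (as noted, this version does not return a negative-weight cycle):

import math

def bellman_ford(Adj, w, s):
    '''Sketch: Adj: dict vertex -> outgoing neighbors, w: dict (u, v) -> weight.
    Returns d with d[v] = delta(s, v), using -inf when no finite shortest path exists.'''
    d = {v: math.inf for v in Adj}
    d[s] = 0
    for _ in range(len(Adj) - 1):               # |V| - 1 rounds of relaxing every edge
        for u in Adj:
            for v in Adj[u]:
                if d[v] > d[u] + w[(u, v)]:
                    d[v] = d[u] + w[(u, v)]
    # vertices whose estimate would still improve in round |V| are witnesses
    witnesses = {v for u in Adj for v in Adj[u] if d[v] > d[u] + w[(u, v)]}
    stack = list(witnesses)                     # propagate -inf to everything
    while stack:                                # reachable from a witness
        u = stack.pop()
        if d[u] != -math.inf:
            d[u] = -math.inf
            stack.extend(Adj[u])
    return d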
Lecture 13: Dijkstra’s Algorithm
Review
• Single-Source Shortest Paths on weighted graphs
• Previously: O(|V | + |E|)-time algorithms for small positive weights or DAGs
• Last time: Bellman-Ford, O(|V ||E|)-time algorithm for general graphs with negative weights
• Today: faster for general graphs with non-negative edge weights, i.e., for e ∈ E, w(e) ≥ 0
Dijkstra’s Algorithm
• Named for famous Dutch computer scientist Edsger Dijkstra (actually Dÿkstra!)
• Idea! Relax edges from each vertex in increasing order of distance from source s
• Idea! Efficiently find next vertex in the order using a data structure
• Changeable Priority Queue Q on items with keys and unique IDs, supporting operations:
Q.build(X)                initialize Q with items in iterator X
Q.delete_min()            remove an item with minimum key
Q.decrease_key(id, k)     find stored item with ID id and change key to k
• Assume vertex IDs are integers from 0 to |V | − 1 so can use a direct access array for D
• Build changeable priority queue Q with an item (v, d(s, v)) for each vertex v ∈ V
• While Q not empty, delete an item (u, d(s, u)) from Q that has minimum d(s, u)
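A sketch of Dijkstra using Python's heapq as a lazy priority queue in place of the changeable priority queue described above; instead of decrease_key, improved estimates are pushed as new entries and stale entries are skipped when popped:

import heapq, math

def dijkstra(Adj, w, s):
    '''Sketch: shortest-path weights from s with non-negative edge weights w[(u, v)].'''
    d = {v: math.inf for v in Adj}
    parent = {s: None}
    d[s] = 0
    Q = [(0, s)]                                 # lazy priority queue of (estimate, vertex)
    while Q:
        du, u = heapq.heappop(Q)                 # delete_min
        if du > d[u]:                            # stale entry: a better estimate exists
            continue
        for v in Adj[u]:                         # relax the outgoing edges of u
            if d[v] > d[u] + w[(u, v)]:
                d[v] = d[u] + w[(u, v)]
                parent[v] = u
                heapq.heappush(Q, (d[v], v))     # push instead of decrease_key
    return d, parent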
Example
(Figure: example weighted graph G on vertices s, a, b, c, d omitted)

Delete v from Q |  d(s, s)  d(s, a)  d(s, b)  d(s, c)  d(s, d)
s               |     0        ∞        ∞        ∞        ∞
c               |             10        ∞        3        ∞
d               |              7       11                 5
a               |              7       10
b               |                       9
δ(s, v)         |     0        7        9        3        5
Correctness
• Claim: At end of Dijkstra’s algorithm, d(s, v) = δ(s, v) for all v ∈ V
• Proof:
– If relaxation sets d(s, v) to δ(s, v), then d(s, v) = δ(s, v) at the end of the algorithm
∗ Relaxation can only decrease estimates d(s, v)
∗ Relaxation is safe, i.e., maintains that each d(s, v) is weight of a path to v (or ∞)
– Suffices to show d(s, v) = δ(s, v) when vertex v is removed from Q
∗ Proof by induction on first k vertices removed from Q
∗ Base Case (k = 1): s is first vertex removed from Q, and d(s, s) = 0 = δ(s, s)
∗ Inductive Step: Assume true for k < k′, consider the k′-th vertex v′ removed from Q
∗ Consider some shortest path π from s to v′, with w(π) = δ(s, v′)
∗ Let (x, y) be the first edge in π where y is not among the first k′ − 1 (perhaps y = v′)
∗ When x was removed from Q, d(s, x) = δ(s, x) by induction, so relaxing (x, y) ensured:
d(s, y) ≤ d(s, x) + w(x, y) = δ(s, x) + w(x, y) = δ(s, y) ≤ δ(s, v′) ≤ d(s, v′)
∗ Since v′ is removed no later than y, d(s, v′) ≤ d(s, y), so equality holds throughout and d(s, v′) = δ(s, v′)
Running Time
• Count operations on changeable priority queue Q, assuming it contains n items:
Operation                    Time     Occurrences in Dijkstra
Q.build(X) (n = |X|)         B_n      1
Q.delete_min()               M_n      |V|
Q.decrease_key(id, k)        D_n      |E|
• Total running time is O(B_{|V|} + |V| · M_{|V|} + |E| · D_{|V|})
• Assume pruned graph to search only vertices reachable from the source, so |V | = O(|E|)
• If graph is dense, i.e., |E| = Θ(|V|^2), using an Array for Q yields O(|V|^2) time
• If graph is sparse, i.e., |E| = Θ(|V|), using a Binary Heap for Q yields O(|V| log |V|) time
• A Fibonacci Heap is theoretically good in all cases, but is not used much in practice
• We won’t discuss Fibonacci Heaps in 6.006 (see 6.854 or CLRS chapter 19 for details)
• You should assume Dijkstra runs in O(|E|+|V | log |V |) time when using in theory problems
Lecture 14: Johnson’s Algorithm
Previously
• Useful when understanding whole network, e.g., transportation, circuit layout, supply chains...
• Just running an SSSP algorithm |V| times is actually pretty good, since the output has size O(|V|^2)
– |V | · O(|V | + |E|) with BFS if weights positive and bounded by O(|V | + |E|)
– |V | · O(|V | + |E|) with DAG Relaxation if acyclic
– |V | · O(|V | log |V | + |E|) with Dijkstra if weights non-negative or graph undirected
– |V | · O(|V | · |E|) with Bellman-Ford (general)
• Today: Solve APSP in any weighted graph in |V | · O(|V | log |V | + |E|) time
Approach
• Idea: Make all edge weights non-negative while preserving shortest paths!
• Claim: Can compute distances in G from distances in G′ in O(|V|(|V| + |E|)) time
– Compute shortest-path tree from distances, for each s ∈ V′ in O(|V| + |E|) time (L11)
– Also a shortest-paths tree in G, so traverse tree with DFS while also computing distances
– Takes O(|V| · (|V| + |E|)) time (which is less time than |V| times Dijkstra)
• But how to make G′ with non-negative edge weights? Is this even possible??
• Proof: Shortest paths are simple if no negative weights, but not if negative-weight cycle
• FAIL: Does not preserve shortest paths! Biases toward paths traversing fewer edges :(
• Idea! Given vertex v, add h to all outgoing edges and subtract h from all incoming edges
• Proof:
• This is a very general and useful trick to transform a graph while preserving shortest paths!
• Make graph G′: same as G but edge (u, v) ∈ E has weight w′(u, v) = w(u, v) + h(u) − h(v)
• Proof:
– (Sum of h’s telescope, since there is a positive and negative h(vi ) for each interior i)
– Every path from v0 to vk changes by the same amount
– So any shortest path will still be shortest
• i.e., is there an h such that w(u, v) + h(u) − h(v) ≥ 0 for every (u, v) ∈ E?
• Re-arrange this condition to h(v) ≤ h(u) + w(u, v), looks like triangle inequality!
• Idea! Condition would be satisfied if h(v) = δ(s, v) and δ(s, v) is finite for some s
• But graph may be disconnected, so may not exist any such vertex s... :(
• Claim: If δ(s, v) = −∞ for any v ∈ V , then the original graph has a negative-weight cycle
• Proof:
Johnson’s Algorithm
• Construct Gx from G by adding vertex x connected to each vertex v ∈ V with 0-weight edge
• Else:
Correctness
• Already proved that transformation from G to G0 preserves shortest paths
Running Time
• O(|V | + |E|) time to construct Gx
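A sketch of the overall algorithm, reusing the bellman_ford and dijkstra sketches given after Lectures 12 and 13 above (so this is only a sketch under those assumptions): add super-node x, compute h = δ(x, ·), abort on a negative-weight cycle, reweight, run Dijkstra from every source, and undo the reweighting.

import math

def johnson(Adj, w):
    '''Sketch: all-pairs shortest paths; returns D with D[u][v] = delta(u, v),
    or None if the graph has a negative-weight cycle.'''
    x = object()                                  # fresh super-node not in Adj
    Adj_x = {x: list(Adj)}
    Adj_x.update(Adj)
    w_x = dict(w)
    for v in Adj:                                 # 0-weight edges x -> v
        w_x[(x, v)] = 0
    h = bellman_ford(Adj_x, w_x, x)               # h(v) = delta(x, v)
    if any(h[v] == -math.inf for v in Adj):       # witness found: negative-weight cycle
        return None
    w2 = {(u, v): w[(u, v)] + h[u] - h[v] for (u, v) in w}   # non-negative weights
    D = {}
    for s in Adj:                                 # Dijkstra from every source in G'
        d, _ = dijkstra(Adj, w2, s)
        D[s] = {v: (d[v] - h[s] + h[v] if d[v] < math.inf else math.inf) for v in Adj}
    return D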
Lecture 15: Recursive Algorithms
Fibonacci Numbers
• Suppose we want to compute the nth Fibonacci number Fn
• To solve, either:
• A subtlety is that Fibonacci numbers grow to Θ(n) bits long, potentially much larger than word size w
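For concreteness, a sketch of both standard approaches, top-down memoization and bottom-up, ignoring the word-size subtlety noted above (convention F_0 = 0, F_1 = 1 assumed):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):                      # top down: recurse but re-use via memoization
    if n < 2:                    # base cases F_0 = 0, F_1 = 1
        return n
    return fib(n - 1) + fib(n - 2)

def fib_bottom_up(n):            # bottom up: fill subproblems in topological order
    F = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]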
Dynamic Programming
• Weird name coined by Richard Bellman
• “Recurse but re-use” (Top down: record and lookup subproblem solutions)
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
¹ This property is often called optimal substructure. It is a property of recursion, not just dynamic programming
• DAG Relaxation computes the same min values as this dynamic program, just
– step-by-step (if new value < min, update min via edge relaxation), and
– from the perspective of u and Adj+ (u) instead of v and Adj− (v)
Bowling
• Given n pins labeled 0, 1, . . . , n − 1
Bowling Algorithms
• Let’s start with a more familiar divide-and-conquer algorithm:
– Subproblems: B(i, j) = maximum score starting with just pins i, i + 1, . . . , j − 1,
for 0 ≤ i ≤ j ≤ n
– Relation:
∗ m = b(i + j)/2c
∗ Either hit m and m + 1 together, or don’t
∗ B(i, j) = max{vm · vm+1 + B(i, m) + B(m + 2, j), B(i, m + 1) + B(m + 1, j)}
– Topo. order: Increasing j − i
– Base cases: B(i, i) = 0, B(i, i + 1) = max{vi , 0}
– Original: B(0, n)
– Time: T(n) = 4T(n/2) + O(1) = O(n^2)
• This algorithm works but isn’t very fast, and doesn’t generalize well
(e.g., to allow for a bigger ball that hits three balls at once)
Bowling Code
• Converting a SRT BOT specification into code is automatic/straightforward
• Here’s the result for the Bowling Dynamic Program above:
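The original code listing is not reproduced in this extract; below is a sketch of a natural suffix formulation, assuming pin values are given in a list v, where B(i) is the maximum score obtainable from pins i, i + 1, . . . , n − 1 (skip pin i, hit it alone, or hit pins i and i + 1 together):

def bowl(v):
    '''Sketch: maximum bowling score for pins with values v[0], ..., v[n-1].'''
    n = len(v)
    B = [0] * (n + 2)                           # B[n] = B[n + 1] = 0 (no pins left)
    for i in range(n - 1, -1, -1):              # topological order: decreasing i
        B[i] = max(B[i + 1], v[i] + B[i + 1])   # skip pin i, or hit it alone
        if i + 1 < n:                           # or hit pins i and i + 1 together
            B[i] = max(B[i], v[i] * v[i + 1] + B[i + 2])
    return B[0]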
Lecture 16: Dyn. Prog. Subproblems
• “Recurse but re-use” (Top down: record and lookup subproblem solutions)
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
1. Subproblems
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
1. Subproblems
• x(i) = length of longest increasing subsequence of suffix A[i :] that includes A[i]
• For 0 ≤ i < |A|
2. Relate
• We’re told that A[i] is in LIS (first element)
• Next question: what is the second element of LIS?
– Could be any A[j] where j > i and A[j] > A[i] (so increasing)
– Or A[i] might be the last element of LIS
• x(i) = max({1 + x(j) | i < j < |A|, A[j] > A[i]} ∪ {1})
3. Topological order
• Decreasing i
4. Base
• No base case necessary, because we consider the possibility that A[i] is last
5. Original problem
• What is the first element of LIS? Guess!
• Length of LIS of A is max{x(i) | 0 ≤ i < |A|}
• Store parent pointers to reconstruct subsequence
6. Time
• # subproblems: |A|
• work per subproblem: O(|A|)
• O(|A|2 ) running time
• Exercise: speed up to O(|A| log |A|) by doing only O(log |A|) work per subproblem,
via AVL tree augmentation
def lis(A):
    a = len(A)
    x = [1] * a
    for i in reversed(range(a)):
        for j in range(i, a):
            if A[j] > A[i]:
                x[i] = max(x[i], 1 + x[j])
    return max(x)
1. Subproblems
• Choose subproblems that correspond to the state of the game
• For every contiguous subsequence of coins from i to j, 0 ≤ i ≤ j < n
• x(i, j) = maximum total value I can take starting from coins of values vi , . . . , vj
2. Relate
• I must choose either coin i or coin j (Guess!)
• Then it’s your turn, so you’ll get value x(i + 1, j) or x(i, j − 1), respectively
• To figure out how much value I get, subtract this from total coin values
• x(i, j) = max{v_i + Σ_{k=i+1}^{j} v_k − x(i + 1, j),  v_j + Σ_{k=i}^{j−1} v_k − x(i, j − 1)}
3. Topological order
• Increasing j − i
4. Base
• x(i, i) = vi
5. Original problem
• x(0, n − 1)
• Store parent pointers to reconstruct strategy
6. Time
• # subproblems: Θ(n^2)
• work per subproblem: Θ(n) to compute sums
• Θ(n^3) running time
• Exercise: speed up to Θ(n^2) time by precomputing all sums Σ_{k=i}^{j} v_k in Θ(n^2) time, via dynamic programming (!)
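A sketch of the first formulation above, with the suggested prefix-sum precomputation so each subproblem takes O(1) time (i.e., the Θ(n^2) variant of the exercise):

def max_coin_value(v):
    '''Sketch: maximum total value the first player can take from coins v[0..n-1],
    when both players play optimally and may only take an end coin.'''
    n = len(v)
    prefix = [0] * (n + 1)                      # prefix[j] = v[0] + ... + v[j-1]
    for k in range(n):
        prefix[k + 1] = prefix[k] + v[k]
    total = lambda i, j: prefix[j + 1] - prefix[i]   # sum of v[i..j]
    x = [[0] * n for _ in range(n)]
    for i in range(n):                          # base case: a single coin
        x[i][i] = v[i]
    for length in range(2, n + 1):              # topological order: increasing j - i
        for i in range(n - length + 1):
            j = i + length - 1
            take_i = v[i] + (total(i + 1, j) - x[i + 1][j])   # take coin i
            take_j = v[j] + (total(i, j - 1) - x[i][j - 1])   # take coin j
            x[i][j] = max(take_i, take_j)
    return x[0][n - 1]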
• Second solution uses subproblem expansion: add subproblems for when you move next
1. Subproblems
2. Relate
3. Topological order
• Increasing j − i
4. Base
• x(i, i, me) = vi
• x(i, i, you) = 0
5. Original problem
• x(0, n − 1, me)
• Store parent pointers to reconstruct strategy
6. Time
• # subproblems: Θ(n^2)
• work per subproblem: Θ(1)
• Θ(n^2) running time
• If you find yourself lacking information to check the desired conditions of the problem, or
lack the natural subproblem to recurse on, try subproblem constraint/expansion!
• More subproblems and constraints give the relation more to work with, so can make DP
more feasible
Lecture 17: Dyn. Prog. III
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
• # subproblems: |V | × (|V | + 1)
• Work for subproblem δ_k(s, v): O(deg_in(v))
• Total: Σ_{k=0}^{|V|} Σ_{v∈V} O(deg_in(v)) = Σ_{k=0}^{|V|} O(|E|) = O(|V| · |E|)
1. Subproblems
• x(u, v, k) = minimum weight of a path from u to v that only uses vertices from {1, 2, . . . , k} ∪ {u, v}
• For u, v ∈ V and 0 ≤ k ≤ |V|
2. Relate
• x(u, v, k) = min{x(u, k, k − 1) + x(k, v, k − 1), x(u, v, k − 1)}
• Only constant branching! No longer guessing previous vertex/edge
3. Topological order
• Increasing k: relation depends only on smaller k
4. Base
• x(u, u, 0) = 0
• x(u, v, 0) = w(u, v) if (u, v) ∈ E
• x(u, v, 0) = ∞ if none of the above
5. Original problem
• x(u, v, |V |) for all u, v ∈ V
6. Time
• O(|V |3 ) subproblems
• Each O(1) work
• O(|V |3 ) in total
• Constant number of dependencies per subproblem brings the factor of O(|E|) in the
running time down to O(|V |).
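A sketch of this dynamic program (commonly known as Floyd-Warshall), collapsing the k dimension into an in-place update; vertices are given as a list V and weights as w[(u, v)], and no negative-weight cycles are assumed:

import math

def floyd_warshall(V, w):
    '''Sketch: all-pairs shortest-path weights for vertices V and edge weights w.'''
    d = {(u, v): 0 if u == v else w.get((u, v), math.inf)   # base case k = 0
         for u in V for v in V}
    for k in V:                                  # allow k as an intermediate vertex
        for u in V:
            for v in V:
                if d[(u, v)] > d[(u, k)] + d[(k, v)]:
                    d[(u, v)] = d[(u, k)] + d[(k, v)]
    return d                                     # O(|V|^3) time, O(1) work per update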
Arithmetic Parenthesization
• Input: arithmetic expression a0 ∗1 a1 ∗2 a2 · · · ∗n−1 an−1
where each ai is an integer and each ∗i ∈ {+, ×}
1. Subproblems
• Sufficient to maximize each subarray? No! (−3) × (−3) = 9 > (−2) × (−2) = 4
• x(i, j, opt) = opt value obtainable by parenthesizing ai ∗i+1 · · · ∗j−1 aj−1
• For 0 ≤ i < j ≤ n and opt ∈ {min, max}
2. Relate
3. Topological order
4. Base
5. Original problem
• x(0, n, max)
• Store parent pointers (two!) to find parenthesization (forms binary tree!)
6. Time
Piano Fingering
• Given sequence t0 , t1 , . . . , tn−1 of n single notes to play with right hand (will generalize to
multiple notes and hands later)
• Given metric d(t, f, t′, f′) of difficulty of transitioning from note t with finger f to note t′ with finger f′
• First attempt:
1. Subproblems
2. Relate
• Solution:
1. Subproblems
• x(i, f ) = minimum total difficulty for playing notes ti , ti+1 , . . . , tn−1 starting with fin-
ger f on note ti
• For 0 ≤ i < n and 1 ≤ f ≤ F
2. Relate
3. Topological order
4. Base
5. Original problem
• min{x(0, f ) | 1 ≤ f ≤ F }
6. Time
• Θ(n · F ) subproblems
• Θ(F ) work per subproblem
• Θ(n · F^2) running time
• No dependence on the number of different notes!
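A minimal sketch of this solution, assuming d is given as a callable difficulty metric and fingers
are numbered 1 through F (names are illustrative):

    def min_total_difficulty(t, F, d):
        # x[(i, f)] = min total difficulty to play t[i:], starting with finger f on note t[i]
        n = len(t)
        x = {(n - 1, f): 0 for f in range(1, F + 1)}         # base: no transition remains
        for i in range(n - 2, -1, -1):                       # topological order: decreasing i
            for f in range(1, F + 1):                        # relate: brute-force the next finger
                x[(i, f)] = min(d(t[i], f, t[i + 1], f2) + x[(i + 1, f2)]
                                for f2 in range(1, F + 1))
        return min(x[(0, f)] for f in range(1, F + 1))       # original problem

Θ(n · F) subproblems with Θ(F) work each gives the Θ(n · F^2) running time above.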
Guitar Fingering
• Up to S (the number of strings) different ways to play the same note
• Goal: fingering f_i : t_i → {1, 2, . . . , F} specifying how to finger each note (including which
string for guitar) to minimize Σ_{i=1}^{n−1} d(t_{i−1}, f_{i−1}, t_i, f_i)
• Θ(n · T^F) subproblems
• Θ(n · T^{2F}) time
– F = 2 feet
– T = 2 (at most two notes at once)
– Exercise: handle sustained notes, using “where each foot is” (on an arrow or in the
middle) as added state for suffix subproblems
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Lecture 18: Pseudopolynomial
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
• Identify a question about a subproblem solution that, if you knew the answer to, reduces
the subproblem to smaller subproblem(s)
• Locally brute-force all possible answers to the question
4. Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
Rod Cutting
• Given a rod of length L and value v(ℓ) of rod of length ℓ for all ℓ ∈ {1, 2, . . . , L}
• Goal: Cut the rod to maximize the value of cut rod pieces
• Example: L = 7, v = [0, 1, 10, 13, 18, 20, 31, 32]
  ℓ:    0   1   2   3   4   5   6   7
  v(ℓ): 0   1  10  13  18  20  31  32
1 # recursive
2 x = {}
3 def cut_rod(l, v):
4 if l < 1: return 0 # base case
5 if l not in x: # check memo
6 for piece in range(1, l + 1): # try piece
7 x_ = v[piece] + cut_rod(l - piece, v) # recurrence
8 if (l not in x) or (x[l] < x_): # update memo
9 x[l] = x_
10 return x[l]
1 # iterative
2 def cut_rod(L, v):
3 x = [0] * (L + 1) # base case
4 for l in range(L + 1): # topological order
5 for piece in range(1, l + 1): # try piece
6 x_ = v[piece] + x[l - piece] # recurrence
7 if x[l] < x_: # update memo
8 x[l] = x_
9 return x[L]
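For the example above, either version computes the same optimum:

    v = [0, 1, 10, 13, 18, 20, 31, 32]
    print(cut_rod(7, v))    # 33: cut into pieces of lengths 2 + 2 + 3, worth 10 + 10 + 13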
Subset Sum
• Input: Sequence of n positive integers A = (a_0, a_1, . . . , a_{n−1})
• Output: Is there a subset of A that sums exactly to T? (i.e., ∃ A′ ⊆ A s.t. Σ_{a∈A′} a = T?)
• Example: A = (1, 3, 4, 12, 19, 21, 22), T = 47 allows A′ = {3, 4, 19, 21}
• In example, answer is YES. However, answer is NO for some T , e.g., 2, 6, 9, 10, 11, . . .
1. Subproblems
2. Relate
3. Topological order
4. Base
5. Original problem
6. Time
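One standard way to fill in these steps (the exact formulation in lecture may differ): define
x(i, t) = True iff some subset of a_i, . . . , a_{n−1} sums to exactly t, for 0 ≤ i ≤ n and 0 ≤ t ≤ T.
A minimal sketch:

    def subset_sum(A, T):
        n = len(A)
        x = {(n, t): (t == 0) for t in range(T + 1)}         # base: the empty suffix makes only 0
        for i in range(n - 1, -1, -1):                       # topological order: decreasing i
            for t in range(T + 1):                           # relate: either skip a_i or take it
                x[(i, t)] = x[(i + 1, t)] or (A[i] <= t and x[(i + 1, t - A[i])])
        return x[(0, T)]                                     # original problem

    # e.g., subset_sum([1, 3, 4, 12, 19, 21, 22], 47) is True, while the same call with T = 2 is False

This gives O(nT) subproblems with O(1) work each, i.e., O(nT) time, the bound whose status is
discussed next.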
Is This Polynomial?
• Input size is n + 1: one integer T and n integers in A
• On w-bit word RAM, T < 2^w and w ≥ lg(n + 1), but we don’t have an upper bound on w
• E.g., w = n is not unreasonable, but then running time is O(n · 2^n), which is exponential
Pseudopolynomial
• Algorithm has pseudopolynomial time: running time is bounded above by a constant-
degree polynomial in input size and input integers
• Such algorithms are polynomial in the case that integers are polynomially bounded in input
size, i.e., nO(1) (same case that Radix Sort runs in O(n) time)
• Counting sort O(n + u), radix sort O(n log_n u), direct-access array build O(n + u), and
Fibonacci O(n) are all pseudopolynomial algorithms we’ve seen already
• Radix sort is actually weakly polynomial (a notion in between strongly polynomial and
pseudopolynomial): bounded above by a constant-degree polynomial in the input size mea-
sured in bits, i.e., in the logarithm of the input integers
Complexity
• Is Subset Sum solvable in polynomial time when integers are not polynomially bounded?
• Subproblems:
• Subproblem constraints/expansion:
• Relation:
• Original problem:
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 1
Recitation 1
Algorithms
The study of algorithms searches for efficient procedures to solve problems. The goal of this class
is to not only teach you how to solve problems, but to teach you to communicate to others that a
solution to a problem is both correct and efficient.
• An algorithm solves a problem if for every problem input it returns a correct output.
While a problem input may have more than one correct output, an algorithm should only return one
output for a given input (it is a function). As an example, consider the problem of finding another
student in your recitation who shares the same birthday.
Problem: Given the students in your recitation, return either the names of two students
who share the same birthday and year, or state that no such pair exists.
This problem relates one input (your recitation) to one or more outputs comprising birthday-
matching pairs of students or one negative result. A problem input is sometimes called an instance
of the problem. One algorithm that solves this problem is the following.
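Maintain a record of the names and birthdays of students interviewed so far. Consider the students
one by one: check whether the current student’s birthday matches one already in the record, and if
so return that pair; otherwise add the student to the record and continue. If every student is
processed without finding a match, return that no matching pair exists (this is the procedure
implemented by birthday_match later in this recitation).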
Of course, our algorithm solves a much more general problem than the one proposed above. The
same algorithm can search for a birthday-matching pair in any set of students, not just the students
in your recitation. In this class, we try to solve problems which generalize to inputs that may be
arbitrarily large. The birthday matching algorithm can be applied to a recitation of any size. But
how can we determine whether the algorithm is correct and efficient?
Correctness
Any computer program you write will have finite size, while an input it acts on may be arbitrarily
large. Thus every algorithm we discuss in this class will need to repeat commands in the algorithm
via loops or recursion, and we will be able to prove correctness of the algorithm via induction.
Let’s prove that the birthday algorithm is correct.
Proof. Induct on the number k of students interviewed. Base case: for k = 0, there is no
matching pair, and the algorithm returns that there is no matching pair. Now assume for
induction that the algorithm returns correctly for the first k students. If the first k students
contain a matching pair, then so do the first k + 1 students, and the algorithm has already
returned a matching pair. Otherwise the first k students do not contain a matching pair, so
if the first k + 1 students contain a match, the match includes student k + 1, and the algorithm
checks whether student k + 1 has the same birthday as someone already processed.
Efficiency
What makes a computer program efficient? One program is said to be more efficient than another
if it can solve the same problem input using fewer resources. We expect that a larger input might
take more time to solve than another input having smaller size. In addition, the resources used by
a program, e.g. storage space or running time, will depend on both the algorithm used and the ma-
chine on which the algorithm is implemented. We expect that an algorithm implemented on a fast
machine will run faster than the same algorithm on a slower machine, even for the same input. We
would like to be able to compare algorithms, without having to worry about how fast our machine
is. So in this class, we compare algorithms based on their asymptotic performance relative to
problem input size, in order to ignore constant factor differences in hardware performance.
Asymptotic Notation
We can use asymptotic notation to ignore constants that do not change with the size of the problem
input. O(f (n)) represents the set of functions with domain over the natural numbers satisfying the
following property.
O Notation: Non-negative function g(n) is in O(f (n)) if and only if there exists a
positive real number c and positive integer n0 such that g(n) ≤ c · f (n) for all n ≥ n0 .
This definition upper bounds the asymptotic growth of a function for sufficiently large n, i.e.,
the bound on growth is true even if we were to scale or shift our function by a constant amount.
By convention, it is more common for people to say that a function g(n) is O(f (n)) or equal
to O(f (n)), but what they really mean is set containment, i.e., g(n) ∈ O(f (n)). So since our
problem’s input size is cn for some constant c, we can forget about c and say the input size is O(n)
(order n). A similar notation can be used for lower bounds.
Ω Notation: Non-negative function g(n) is in Ω(f (n)) if and only if there exists a
positive real number c and positive integer n0 such that c · f (n) ≤ g(n) for all n ≥ n0 .
When one function both asymptotically upper bounds and asymptotically lower bounds another
function, we use Θ notation. When g(n) = Θ(f (n)), we say that f (n) represents a tight bound
on g(n).
Θ Notation: Non-negative g(n) is in Θ(f (n)) if and only if g(n) ∈ O(f (n)) ∩ Ω(f (n)).
We often use shorthand to characterize the asymptotic growth (i.e., asymptotic complexity) of
common functions, such as those shown in the table below1 . Here we assume c ∈ Θ(1).
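    input   constant   logarithmic   linear   log-linear    quadratic   polynomial   exponential
    n       Θ(1)       Θ(log n)      Θ(n)     Θ(n log n)    Θ(n^2)      Θ(n^c)       2^Θ(n^c)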
Linear time is often necessary to solve problems where the entire input must be read in order to
solve the problem. However, if the input is already accessible in memory, many problems can
be solved in sub-linear time. For example, the problem of finding a value in a sorted array (that
has already been loaded into memory) can be solved in logarithmic time via binary search. We
focus on polynomial time algorithms in this class, typically for small values of c. There’s a big
difference between logarithmic, linear, and exponential. If n = 1000, log n ≈ 10^1, n ≈ 10^3, while
2^n ≈ 10^300. For comparison, the number of atoms in the universe is estimated around 10^80. It is
common to use the variable ‘n’ to represent a parameter that is linear in the problem input size,
though this is not always the case. For example, when talking about graph algorithms later in the
term, a problem input will be a graph parameterized by vertex set V and edge set E, so a natural
input size will be Θ(|V | + |E|). Alternatively, when talking about matrix algorithms, it is common
to let n be the width of a square matrix, where a problem input will have size Θ(n^2), specifying
each element of the n × n matrix.
1
Note that exponential 2^Θ(n^c) is a convenient abuse of notation meaning {2^p | p ∈ Θ(n^c)}.
Model of Computation
In order to precisely calculate the resources used by an algorithm, we need to model how long a
computer takes to perform basic operations. Specifying such a set of operations provides a model
of computation upon which we can base our analysis. In this class, we will use the w-bit Word-
RAM model of computation, which models a computer as a random access array of machine
words called memory, together with a processor that can perform operations on the memory.
A machine word is a sequence of w bits representing an integer from the set {0, . . . , 2^w − 1}.
A Word-RAM processor can perform basic binary operations on two machine words in constant
time, including addition, subtraction, multiplication, integer division, modulo, bitwise operations,
and binary comparisons. In addition, given a word a, the processor can read or write the word
in memory located at address a in constant time. If a machine word contains only w bits, the
processor will only be able to read and write from at most 2^w addresses in memory2. So when
solving a problem on an input stored in n machine words, we will always assume our Word-RAM
has a word size of at least w > log_2 n bits, or else the machine would not be able to access all of the
input in memory. To put this limitation in perspective, a Word-RAM model of a byte-addressable
64-bit machine allows inputs up to ∼ 10^10 GB in size.
Data Structure
The running time of our birthday matching algorithm depends on how we store the record of names
and birthdays. A data structure is a way to store a non-constant amount of data, supporting a set
of operations to interact with that data. The set of operations supported by a data structure is
called an interface. Many data structures might support the same interface, but could provide
different performance for each operation. Many problems can be solved trivially by storing data
in an appropriate choice of data structure. For our example, we will use the most primitive data
structure native to the Word-RAM: the static array. A static array is simply a contiguous sequence
of words reserved in memory, supporting a static sequence interface:
• StaticArray.get at(i): return the word stored at array index i in Θ(1) time
• StaticArray.set at(i, x): write the word x to array index i in Θ(1) time
The get at(i) and set at(i, x) operations run in constant time because each item in the
array has the same size: one machine word. To store larger objects at an array index, we can
interpret the machine word at the index as a memory address to a larger piece of memory. A Python
tuple is like a static array without set at(i, x). A Python list implements a dynamic array
(see L02).
2
For example, on a typical 32-bit machine, each byte (8 bits) is addressable (for historical reasons), so the size of
the machine’s random-access memory (RAM) is limited to (8 bits) × (2^32) ≈ 4 GB.
1 class StaticArray:
2 def __init__(self, n):
3 self.data = [None] * n
4 def get_at(self, i):
5 if not (0 <= i < len(self.data)): raise IndexError
6 return self.data[i]
7 def set_at(self, i, x):
8 if not (0 <= i < len(self.data)): raise IndexError
9 self.data[i] = x
10
11 def birthday_match(students):
12 '''
13 Find a pair of students with the same birthday
14 Input: tuple of student (name, bday) tuples
15 Output: tuple of student names or None
16 '''
17 n = len(students) # O(1)
18 record = StaticArray(n) # O(n)
19 for k in range(n): # n
20 (name1, bday1) = students[k] # O(1)
21 for i in range(k): # k Check if in record
22 (name2, bday2) = record.get_at(i) # O(1)
23 if bday1 == bday2: # O(1)
24 return (name1, name2) # O(1)
25 record.set_at(k, (name1, bday1)) # O(1)
26 return None # O(1)
This is quadratic in n, which is polynomial! Is this efficient? No! We can do better by using a
different data structure for our record. We will spend the first half of this class studying elementary
data structures, where each data structure will be tailored to support a different set of operations
efficiently.
3
This is a reasonable restriction, which allows names and birthdays to contain O(w) characters from a constant-
sized alphabet. Since w > log_2 n, this restriction still allows each student’s information to be distinct.
Asymptotics Exercises
1. Have students generate 10 functions and order them based on asymptotic growth.
2. Find a simple, tight asymptotic bound for the binomial coefficient (n choose 6006).
   Solution: The definition yields n(n − 1) · · · (n − 6005) in the numerator (a degree-6006
   polynomial in n) and 6006! in the denominator (constant with respect to n), so
   (n choose 6006) = Θ(n^6006).
3. Find a simple, tight asymptotic bound for log_6006((log n^√n)^2).
   Solution: Recall exponent and logarithm rules: log(ab) = log a + log b, log(a^b) = b log a,
   and log_a b = log b / log a. Then
       log_6006((log n^√n)^2) = log((√n log n)^2) / log 6006
                              = Θ(log n^(1/2) + log log n) = Θ(log n).
4. Show that 2^(n+1) ∈ Θ(2^n), but that 2^(2^(n+1)) ∉ O(2^(2^n)).
   Solution: In the first case, 2^(n+1) = 2 · 2^n, which is a constant factor larger than 2^n. In the
   second case, 2^(2^(n+1)) = (2^(2^n))^2, which is definitely more than a constant factor larger
   than 2^(2^n).
5. Show that (log n)^a = O(n^b) for all positive constants a and b.
   Solution: It’s enough to show n^b/(log n)^a limits to ∞ as n → ∞, and this is equivalent to
   arguing that the log of this expression approaches ∞:
       lim_{n→∞} log(n^b/(log n)^a) = lim_{n→∞} (b log n − a log log n) = lim_{x→∞} (bx − a log x) = ∞,
   as desired.
   Note: for the same reasons, n^a = O(c^n) for any c > 1.
6. Show that (log n)^(log n) = Ω(n).
   Solution: Note that m^m = Ω(2^m), so setting n = 2^m completes the proof.
7. Show that (6n)! ∉ Θ(n!), but that log((6n)!) ∈ Θ(log(n!)).
   Solution: We invoke Stirling’s approximation,
       n! = √(2πn) (n/e)^n (1 + Θ(1/n)).
   Substituting in 6n gives an expression that is at least 6^(6n) larger than the original. But taking
   the logarithm of Stirling’s approximation gives log(n!) = Θ(n log n), and substituting in 6n
   yields only constant additional factors.
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 2
Recitation 2
Sequence Implementations
Here, we will discuss three data structures to implement the sequence interface. In Problem Set
1, you will extend both Linked Lists and Dynamic arrays to make both first and last dynamic
operations O(1) time for each. Notice that none of these data structures support dynamic operations
at arbitrary index in sub-linear time. We will learn how to improve this operation in Lecture 7.
Array Sequence
Computer memory is a finite resource. On modern computers many processes may share the same
main memory store, so an operating system will assign a fixed chunk of memory addresses to
each active process. The amount of memory assigned depends on the needs of the process and the
availability of free memory. For example, when a computer program makes a request to store a
variable, the program must tell the operating system how much memory (i.e. how many bits) will
be required to store it. To fulfill the request, the operating system will find the available memory
in the process’s assigned memory address space and reserve it (i.e. allocate it) for that purpose
until it is no longer needed. Memory management and allocation is a detail that is abstracted away
by many high level languages including Python, but know that whenever you ask Python to store
something, Python makes a request to the operating system behind-the-scenes, for a fixed amount
of memory in which to store it.
Now suppose a computer program wants to store two arrays, each storing ten 64-bit words. The
program makes separate requests for two chunks of memory (640 bits each), and the operating
system fulfills the request by, for example, reserving the first ten words of the process’s assigned
address space to the first array A, and the second ten words of the address space to the second array
B. Now suppose that as the computer program progresses, an eleventh word w needs to be added
to array A. It would seem that there is no space near A to store the new word: the beginning of the
process’s assigned address space is to the left of A and array B is stored on the right. Then how
can we add w to A? One solution could be to shift B right to make room for w, but tons of data
may already be reserved next to B, which you would also have to move. Better would be to simply
request eleven new words of memory, copy A to the beginning of the new memory allocation, store
w at the end, and free the first ten words of the process’s address space for future memory requests.
A fixed-length array is the data structure that is the underlying foundation of our model of com-
putation (you can think of your computer’s memory as a big fixed-length array that your operating
system allocates from). Implementing a sequence using an array, where index i in the array cor-
responds to item i in the sequence allows get at and set at to be O(1) time because of our
random access machine. However, when deleting or inserting into the sequence, we need to move
items and resize the array, meaning these operations could take linear-time in the worst case. Below
is a full Python implementation of an array sequence.
1 class Array_Seq:
2 def __init__(self): # O(1)
3 self.A = []
4 self.size = 0
5
6 def __len__(self): return self.size # O(1)
7 def __iter__(self): yield from self.A # O(n) iter_seq
8
9 def build(self, X): # O(n)
10 self.A = [a for a in X] # pretend this builds a static array
11 self.size = len(self.A)
12
13 def get_at(self, i): return self.A[i] # O(1)
14 def set_at(self, i, x): self.A[i] = x # O(1)
15
16 def _copy_forward(self, i, n, A, j): # O(n)
17 for k in range(n):
18 A[j + k] = self.A[i + k]
19
20 def _copy_backward(self, i, n, A, j): # O(n)
21 for k in range(n - 1, -1, -1):
22 A[j + k] = self.A[i + k]
23
24 def insert_at(self, i, x): # O(n)
25 n = len(self)
26 A = [None] * (n + 1)
27 self._copy_forward(0, i, A, 0)
28 A[i] = x
29 self._copy_forward(i, n - i, A, i + 1)
30 self.build(A)
31
1 class Linked_List_Node:
2 def __init__(self, x): # O(1)
3 self.item = x
4 self.next = None
5
6 def later_node(self, i): # O(i)
7 if i == 0: return self
8 assert self.next
9 return self.next.later_node(i - 1)
Such data structures are sometimes called pointer-based or linked and are much more flexible than
array-based data structures because their constituent items can be stored anywhere in memory. A
linked list stores the address of the node storing the first element of the list called the head of the
list, along with the linked list’s size, the number of items stored in the linked list. It is easy to add
an item after another item in the list, simply by changing some addresses (i.e. relinking pointers).
In particular, adding a new item at the front (head) of the list takes O(1) time. However, the only
way to find the ith item in the sequence is to step through the items one-by-one, leading to worst-
case linear time for get at and set at operations. Below is a Python implementation of a full
linked list sequence.
1 class Linked_List_Seq:
2 def __init__(self): # O(1)
3 self.head = None
4 self.size = 0
5
6 def __len__(self): return self.size # O(1)
7
8 def __iter__(self): # O(n) iter_seq
9 node = self.head
10 while node:
11 yield node.item
12 node = node.next
13
14 def build(self, X): # O(n)
15 for a in reversed(X):
16 self.insert_first(a)
17
18 def get_at(self, i): # O(i)
19 node = self.head.later_node(i)
20 return node.item
21
Then how does Python support appending to the end of a length n Python List in worst-case O(1)
time? The answer is simple: it doesn’t. Sometimes appending to the end of a Python List requires
O(n) time to transfer the array to a larger allocation in memory, so sometimes appending to a
Python List takes linear time. However, allocating additional space in the right way can guarantee
that any sequence of n insertions only takes at most O(n) time (i.e. such linear time transfer oper-
ations do not occur often), so insertion will take O(1) time per insertion on average. We call this
asymptotic running time amortized constant time, because the cost of the operation is amortized
(distributed) across many applications of the operation.
To achieve an amortized constant running time for insertion into an array, our strategy will be to
allocate extra space in proportion to the size of the array being stored. Allocating O(n) additional
space ensures that a linear number of insertions must occur before an insertion will overflow the
allocation. A typical implementation of a dynamic array will allocate double the amount of space
needed to store the current array, sometimes referred to as table doubling. However, allocating
any constant fraction of additional space will achieve the amortized bound. Python Lists allocate
additional space according to the following formula (from the Python source code written in C):
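Rendered in Python, the over-allocation rule used by older CPython versions (before the changes
around Python 3.9) is approximately the following; treat the constants as an approximation of the
C source rather than a verbatim quote:

    def python_list_allocation(newsize):
        # roughly newsize/8 extra slots, plus a small constant for very short lists
        extra = (newsize >> 3) + (3 if newsize < 9 else 6)
        return newsize + extra          # total number of slots reserved for newsize items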
Here, the additional allocation is modest, roughly one eighth of the size of the array being appended
(bit shifting the size to the right by 3 is equivalent to floored division by 8). But the additional al-
location is still linear in the size of the array, so on average, n/8 insertions will be performed for
every linear time allocation of the array, i.e. amortized constant time.
What if we also want to remove items from the end of the array? Popping the last item can occur in
constant time, simply by decrementing a stored length of the array (which Python does). However,
if a large number of items are removed from a large list, the unused additional allocation could
occupy a significant amount of wasted memory that will not be available for other purposes. When
the length of the array becomes sufficiently small, we can transfer the contents of the array to a
new, smaller memory allocation so that the larger memory allocation can be freed. How big should
this new allocation be? If we allocate the size of the array without any additional allocation, an
immediate insertion could trigger another allocation. To achieve constant amortized running time
for any sequence of n appends or pops, we need to make sure there remains a linear fraction of
unused allocated space when we rebuild to a smaller array, which guarantees that at least Ω(n)
sequential dynamic operations must occur before the next time we need to reallocate memory.
1 class Dynamic_Array_Seq(Array_Seq):
2 def __init__(self, r = 2): # O(1)
3 super().__init__()
4 self.size = 0
5 self.r = r
6 self._compute_bounds()
7 self._resize(0)
8
9 def __len__(self): return self.size # O(1)
10
11 def __iter__(self): # O(n)
12 for i in range(len(self)): yield self.A[i]
13
14 def build(self, X): # O(n)
15 for a in X: self.insert_last(a)
16
17 def _compute_bounds(self): # O(1)
18 self.upper = len(self.A)
19 self.lower = len(self.A) // (self.r * self.r)
20
21 def _resize(self, n): # O(1) or O(n)
22 if (self.lower < n < self.upper): return
23 m = max(n, 1) * self.r
24 A = [None] * m
25 self._copy_forward(0, self.size, A, 0)
26 self.A = A
27 self._compute_bounds()
28
29 def insert_last(self, x): # O(1)a
30 self._resize(self.size + 1)
31 self.A[self.size] = x
32 self.size += 1
33
34 def delete_last(self): # O(1)a
35 self.A[self.size - 1] = None
36 self.size -= 1
37 self._resize(self.size)
38
39 def insert_at(self, i, x): # O(n)
40 self.insert_last(None)
41 self._copy_backward(i, self.size - (i + 1), self.A, i + 1)
42 self.A[i] = x
43
44 def delete_at(self, i): # O(n)
45 x = self.A[i]
46 self._copy_forward(i + 1, self.size - (i + 1), self.A, i)
47 self.delete_last()
48 return x
49 # O(n)
50 def insert_first(self, x): self.insert_at(0, x)
51 def delete_first(self): return self.delete_at(0)
Exercises:
• Suppose the next pointer of the last node of a linked list points to an earlier node in the list,
creating a cycle. Given a pointer to the head of the list (without knowing its size), describe a
linear-time algorithm to find the number of nodes in the cycle. Can you do this while using
only constant additional space outside of the original linked list?
Solution: Begin with two pointers pointing at the head of the linked list: one slow pointer
and one fast pointer. The pointers take turns traversing the nodes of the linked list, starting
with the fast pointer. On the slow pointer’s turn, the slow pointer simply moves to the next
node in the list; while on the fast pointer’s turn, the fast pointer initially moves to the next
node, but then moves on to the next node’s next node before ending its turn. Every time the
fast pointer visits a node, it checks to see whether it’s the same node that the slow pointer
is pointing to. If they are the same, then the fast pointer must have made a full loop around
the cycle, to meet the slow pointer at some node v on the cycle. Now to find the length of
the cycle, simply have the fast pointer continue traversing the list until returning back to v,
counting the number of nodes visited along the way.
To see that this algorithm runs in linear time, clearly the last step of traversing the cycle takes
at most linear time, as v is the only node visited twice while traversing the cycle. Further,
we claim the slow pointer makes at most one move per node. Suppose for contradiction the
slow pointer moves twice away from some node u before being at the same node as the fast
pointer, meaning that u is on the cycle. In the same time the slow pointer takes to traverse the
cycle from u back to u, the fast pointer will have traveled around the cycle twice, meaning
that both pointers must have existed at the same node prior to the slow pointer leaving u, a
contradiction.
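A sketch of this solution in code, assuming the list is built from Linked_List_Node objects and
does contain a cycle (the function name is illustrative):

    def cycle_length(head):
        slow = fast = head
        while True:
            fast = fast.next.next               # fast takes two steps per turn
            slow = slow.next                    # slow takes one step per turn
            if fast is slow:                    # pointers meet at some node v on the cycle
                break
        v, count = fast, 0
        while True:                             # traverse the cycle once more, counting nodes
            fast = fast.next
            count += 1
            if fast is v:
                return count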
• Given a data structure implementing the Sequence interface, show how to use it to implement
the Set interface. (Your implementation does not need to be efficient.)
Solution:
1 def Set_from_Seq(seq):
2 class set_from_seq:
3 def __init__(self): self.S = seq()
4 def __len__(self): return len(self.S)
5 def __iter__(self): yield from self.S
6
7 def build(self, A):
8 self.S.build(A)
9
10 def insert(self, x):
11 for i in range(len(self.S)):
12 if self.S.get_at(i).key == x.key:
13 self.S.set_at(i, x)
14 return
15 self.S.insert_last(x)
16
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 3
Recitation 3
Recall that in Recitation 2 we reduced the Set interface to the Sequence Interface (we simulated
one with the other). This directly provides a Set data structure from an array (albeit a poor one).
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
We would like to do better, and we will spend the next five lectures/recitations trying to do exactly
that! One of the simplest ways to get a faster Set is to store our items in a sorted array, where the
item with the smallest key appears first (at index 0), and the item with the largest key appears last.
Then we can simply binary search to find keys and support Order operations! This is still not great
for dynamic operations (items still need to be shifted when inserting or removing from the middle
of the array), but finding items by their key is much faster! But how do we get a sorted array in the
first place?
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Sorted Array          ?           log n      n            1             log n
1 class Sorted_Array_Set:
2 def __init__(self): self.A = Array_Seq() # O(1)
3 def __len__(self): return len(self.A) # O(1)
4 def __iter__(self): yield from self.A # O(n)
5 def iter_order(self): yield from self # O(n)
6
    # ... (binary search helper and find methods not shown)
22 def find_min(self): # O(1)
23 if len(self) > 0: return self.A.get_at(0)
24 else: return None
25
Sorting
Sorting an array A of comparable items into increasing order is a common subtask of many com-
putational problems. Insertion sort and selection sort are common sorting algorithms for sorting
small numbers of items because they are easy to understand and implement. Both algorithms are
incremental in that they maintain and grow a sorted subset of the items until all items are sorted.
The difference between them is subtle:
• Selection sort maintains and grows a subset of the largest i items in sorted order.
• Insertion sort maintains and grows a subset of the first i input items in sorted order.
Selection Sort
Here is a Python implementation of selection sort. Having already sorted the largest items into
sub-array A[i+1:], the algorithm repeatedly scans the array for the largest item not yet sorted
and swaps it with item A[i]. As can be seen from the code, selection sort can require Ω(n2 )
comparisons, but will perform at most O(n) swaps in the worst case.
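A minimal sketch consistent with this description (not necessarily the exact code referenced above):

    def selection_sort(A):
        for i in range(len(A) - 1, 0, -1):      # A[i+1:] already holds the largest items in order
            m = i                               # index of the largest item seen so far in A[:i+1]
            for j in range(i):                  # scan the unsorted prefix for a larger item
                if A[m] < A[j]:
                    m = j
            A[m], A[i] = A[i], A[m]             # one swap per outer iteration: O(n) swaps total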
Insertion Sort
Here is a Python implementation of insertion sort. Having already sorted sub-array A[:i], the
algorithm repeatedly swaps item A[i] with the item to its left until the left item is no larger than
A[i]. As can be seen from the code, insertion sort can require Ω(n2 ) comparisons and Ω(n2 )
swaps in the worst case.
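A minimal sketch consistent with this description (not necessarily the exact code referenced above):

    def insertion_sort(A):
        for i in range(1, len(A)):              # A[:i] already sorted
            j = i
            while j > 0 and A[j] < A[j - 1]:    # swap A[j] left until the left item is no larger
                A[j - 1], A[j] = A[j], A[j - 1]
                j -= 1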
Merge Sort
In lecture, we introduced merge sort, an asymptotically faster algorithm for sorting large numbers
of items. The algorithm recursively sorts the left and right half of the array, and then merges the
two halves in linear time. The recurrence relation for merge sort is then T (n) = 2T (n/2) + Θ(n),
which solves to T (n) = Θ(n log n). An Θ(n log n) asymptotic growth rate is much closer to
linear than quadratic, as log n grows exponentially slower than n. In particular, log n grows slower
than any polynomial nε for ε > 0.
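A sketch of such an implementation (not necessarily the exact code referenced in this recitation),
merging with a linear-size temporary copy:

    def merge_sort(A, a=0, b=None):             # sort sub-array A[a:b] in place
        if b is None:
            b = len(A)
        if b - a > 1:
            c = (a + b + 1) // 2                # split index
            merge_sort(A, a, c)                 # recursively sort left half
            merge_sort(A, c, b)                 # recursively sort right half
            temp = A[a:b]                       # linear temporary storage
            L, R = temp[:c - a], temp[c - a:]   # the two sorted halves
            i = j = 0
            for k in range(a, b):               # merge the halves back into A[a:b]
                # on ties this takes from R first, so this particular merge is not stable
                if j >= len(R) or (i < len(L) and L[i] < R[j]):
                    A[k], i = L[i], i + 1
                else:
                    A[k], j = R[j], j + 1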
Merge sort uses a linear amount of temporary storage (temp) when combining the two halves, so
it is not in-place. While there exist algorithms that perform merging using no additional space,
such implementations are substantially more complicated than the merge sort algorithm. Whether
merge sort is stable depends on how an implementation breaks ties when merging. The above
implementation is not stable, but it can be made stable with only a small modification. Can you
modify the implementation to make it stable? We’ve made CoffeeScript visualizers for the merge
step of this algorithm, as well as one showing the recursive call structure. You can find them here:
https://codepen.io/mit6006/pen/RYJdOG https://codepen.io/mit6006/pen/wEXOOq
Recurrences
There are three primary methods for solving recurrences:
• Recursion Tree: Draw a tree representing the recurrence and sum computation at nodes.
This is a very general method, and is the one we’ve used in lecture so far.
• Master Theorem: A general formula to solve a large class of recurrences. It is useful, but
can also be hard to remember.
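• Substitution: Guess a closed form for the solution, then verify it by substituting the guess into
the recurrence and checking it by induction; this is the method used below for the Master
Theorem’s polynomial special case.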
Master Theorem
The Master Theorem provides a way to solve recurrence relations in which recursive calls de-
crease problem size by a constant factor. Given a recurrence relation of the form T (n) = aT (n/b)+
f (n) and T (1) = Θ(1), with branching factor a ≥ 1, problem size reduction factor b > 1, and
asymptotically non-negative function f (n), the Master Theorem gives the solution to the recur-
rence by comparing f (n) to alogb n = nlogb a , the number of leaves at the bottom of the recursion
tree. When f (n) grows asymptotically faster than nlogb a , the work done at each level decreases
geometrically so the work at the root dominates; alternatively, when f (n) grows slower, the work
done at each level increases geometrically and the work at the leaves dominates. When their growth
rates are comparable, the work is evenly spread over the tree’s O(log n) levels.
    size n          1 node                work 1 × f(n)
    size n/b        a nodes               work a × f(n/b)
    size n/b^i      a^i nodes             work a^i × f(n/b^i)
    size 1          a^(log_b n) nodes     work a^(log_b n) × f(1)
The Master Theorem takes on a simpler form when f (n) is a polynomial, such that the recurrence
has the form T (n) = aT (n/b) + Θ(n^c) for some constant c ≥ 0.
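In the standard statement of this special case, the solution depends on how c compares to log_b a:
    Case 1: if c < log_b a, then T(n) = Θ(n^(log_b a))    (the leaves dominate)
    Case 2: if c = log_b a, then T(n) = Θ(n^c log n)      (work is balanced across the levels)
    Case 3: if c > log_b a, then T(n) = Θ(n^c)            (the root dominates)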
This special case is straightforward to prove by substitution (this can be done in recitation). To
apply the Master Theorem (or this simpler special case), you should state which case applies, and
show that your recurrence relation satisfies all conditions required by the relevant case. There are
even stronger (more general) formulas1 to solve recurrences, but we will not use them in this class.
Exercises
1. Write a recurrence for binary search and solve it.
Solution: T (n) = T (n/2) + O(1) so T (n) = O(log n) by case 2 of Master Theorem.
2. T (n) = T (n − 1) + O(1)
Solution: T (n) = O(n), length n chain, O(1) work per node.
3. T (n) = T (n − 1) + O(n)
Solution: T (n) = O(n2 ), length n chain, O(k) work per node at height k.
4. T (n) = 2T (n − 1) + O(1)
Solution: T (n) = O(2n ), height n binary tree, O(1) work per node.
1
http://en.wikipedia.org/wiki/Akra-Bazzi_method
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 4
Recitation 4
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
Sorted Array          n log n     log n      n            1             log n
We’ve learned how to implement a set interface using a sorted array, where query operations are
efficient but whose dynamic operations are lacking. Recalling that Θ(log n) growth is much closer
to Θ(1) than Θ(n), a sorted array provides really good performance! But one of the most common
operations you will do in programming is to search for something you’re storing, i.e., find(k).
Is it possible to find faster than Θ(log n)? It turns out that if the only thing we can do to items is
to compare their relative order, then the answer is no!
Comparison Model
The comparison model of computation acts on a set of comparable objects. The objects can be
thought of as black boxes, supporting only a set of binary boolean operations called comparisons
(namely <, ≤, >, ≥, =, and ≠). Each operation takes as input two objects and outputs a Boolean
value, either True or False, depending on the relative ordering of the elements. A search algorithm
operating on a set of n items will return a stored item with a key equal to the input key, or return
no item if no such item exists. In this section, we assume that each item has a unique key.
If binary comparisons are the only way to distinguish between stored items and a search key, a
deterministic comparison search algorithm can be thought of as a fixed binary decision tree rep-
resenting all possible executions of the algorithm, where each node represents a comparison per-
formed by the algorithm. During execution, the algorithm walks down the tree along a path from
the root. For any given input, a comparison search algorithm will make some comparison first, the
comparison at the root of the tree. Depending on the outcome of this comparison, the computation
will then proceed with a comparison at one of its two children. The algorithm repeatedly makes
comparisons until a leaf is reached, at which point the algorithm terminates and returns an output.
There must be a leaf for each possible output of the algorithm. For search, there are
n + 1 possible outputs, the n items and the result where no item is found, so there must be at least
n + 1 leaves in the decision tree. Then the worst-case number of comparisons that must be made
by any comparison search algorithm will be the height of the algorithm’s decision tree, i.e., the
length of any longest root to leaf path.
Exercise: Prove that the smallest height for any tree on n nodes is ⌈lg(n + 1)⌉ − 1 = Ω(log n).
Solution: We show that the maximum number of nodes in any binary tree with height h is
n ≤ T(h) = 2^(h+1) − 1, so h ≥ lg(n + 1) − 1. Proof by induction on h. The only tree of
height zero has one node, so T(0) = 1, a base case satisfying the claim. A height-h tree with the
maximum number of nodes must also have the maximum number of nodes in its two subtrees, so
T(h) = 2T(h − 1) + 1. Substituting T(h) yields 2^(h+1) − 1 = 2(2^h − 1) + 1, proving the claim.
A tree with n + 1 leaves has more than n nodes, so its height is at least Ω(log n). Thus the min-
imum number of comparisons needed to distinguish between the n items is at least Ω(log n), and
the worst-case running time of any deterministic comparison search algorithm is at least Ω(log n)!
So sorted arrays and balanced BSTs are able to support find(k) asymptotically optimally, in a
comparison model of computation.
Comparisons are very limiting because each operation performed can lead to at most constant
branching factor in the decision tree. It doesn’t matter that comparisons have branching factor
two; any fixed constant branching factor will lead to a decision tree with at least Ω(log n) height.
Not being limited to comparisons opens up the possibility of faster-than-O(log n) search.
More specifically, if we can use an operation that allows for asymptotically larger than constant
ω(1) branching factor, then our decision tree could be shallower, leading to a faster algorithm.
Now suppose we want to store a set of n items, each associated with a unique integer key in the
bounded range from 0 to some large number u − 1. We can store the items in a length u direct
access array, where each array slot i contains an item associated with integer key i, if it exists. To
find an item having integer key i, a search algorithm can simply look in array slot i to respond to
the search query in worst-case constant time! However, order operations on this data structure
will be very slow: we have no guarantee on where the first, last, or next element is in the direct
access array, so we may have to spend u time for order operations.
Worst-case constant time search comes at the cost of storage space: a direct access array must have
a slot available for every possible key in range. When u is very large compared to the number of
items being stored, storing a direct access array can be wasteful, or even impossible on modern
machines. For example, suppose you wanted to support the set find(k) operation on ten-letter
names using a direct access array. The space of possible names would be u ≈ 26^10 ≈ 1.4 × 10^14;
even storing a bit array of that length would require 17.6 Terabytes of storage space. How can we
overcome this obstacle? The answer is hashing!
1 class DirectAccessArray:
2 def __init__(self, u): self.A = [None] * u # O(u)
3 def find(self, k): return self.A[k] # O(1)
4 def insert(self, x): self.A[x.key] = x # O(1)
5 def delete(self, k): self.A[k] = None # O(1)
6 def find_next(self, k):
7 for i in range(k, len(self.A)): # O(u)
8 if self.A[i] is not None:
9 return self.A[i]
10 def find_max(self):
11 for i in range(len(self.A) - 1, -1, -1): # O(u)
12 if self.A[i] is not None:
13 return self.A[i]
14 def delete_max(self):
15 for i in range(len(self.A) - 1, -1, -1): # O(u)
16 x = self.A[i]
17 if x is not None:
18 self.A[i] = None
19 return x
Hashing
Is it possible to get the performance benefits of a direct access array while using only linear O(n)
space when n ≪ u? A possible solution could be to store the items in a smaller dynamic direct
access array, with m = O(n) slots instead of u, which grows and shrinks like a dynamic array
depending on the number of items stored. But to make this work, we need a function that maps
item keys to different slots of the direct access array, h(k) : {0, . . . , u − 1} → {0, . . . , m − 1}. We
call such a function a hash function or a hash map, while the smaller direct access array is called
a hash table, and h(k) is the hash of integer key k. If the hash function happens to be injective
over the n keys you are storing, i.e. no two keys map to the same direct access array index, then
we will be able to support worst-case constant time search, as the hash table simply acts as a direct
access array over the smaller domain m.
Unfortunately, if the space of possible keys is larger than the number of array indices, i.e. m < u,
then any hash function mapping u possible keys to m indices must map multiple keys to the same
array index, by the pigeonhole principle. If two items associated with keys k1 and k2 hash to the
same index, i.e. h(k1 ) = h(k2 ), we say that the hashes of k1 and k2 collide. If you don’t know in
advance what keys will be stored, it is extremely unlikely that your choice of hash function will
avoid collisions entirely1 . If the smaller direct access array hash table can only store one item at
each index, when collisions occur, where do we store the colliding items? Either we store collisions
somewhere else in the same direct access array, or we store collisions somewhere else. The first
strategy is called open addressing, which is the way most hash tables are actually implemented,
but such schemes can be difficult to analyze. We will adopt the second strategy called chaining.
Chaining
Chaining is a collision resolution strategy where colliding keys are stored separately from the orig-
inal hash table. Each hash table index holds a pointer to a chain, a separate data structure that sup-
ports the dynamic set interface, specifically operations find(k), insert(x), and delete(k).
It is common to implement a chain using a linked list or dynamic array, but any implementation
will do, as long as each operation takes no more than linear time. Then to insert item x into the
hash table, simply insert x into the chain at index h(x.key); and to find or delete a key k from
the hash table, simply find or delete k from the chain at index h(k).
Ideally, we want chains to be small, because if our chains only hold a constant number of items,
the dynamic set operations will run in constant time. But suppose we are unlucky in our choice of
hash function, and all the keys we want to store hash to the same index location, i.e., into the
same chain. Then the chain will have linear size, meaning the dynamic set operations could take
linear time. A good hash function will try to minimize the frequency of such collisions in order to
minimize the maximum size of any chain. So what’s a good hash function?
Hash Functions
Division Method (bad): The simplest mapping from an integer key domain of size u to a smaller
one of size m is simply to divide the key by m and take the remainder: h(k) = (k mod m), or in
Python, k % m. If the keys you are storing are uniformly distributed over the domain, the division
method will distribute items roughly evenly among hashed indices, so we expect chains to have
small size providing good performance. However, if all items happen to have keys with the same
remainder when divided by m, then this hash function will be terrible. Ideally, the performance of
our data structure would be independent of the keys we choose to store.
1
If you know all of the keys you will want to store in advance, it is possible to design a hashing scheme that will
always avoid collisions between those keys. This idea, called perfect hashing, follows from the Birthday Paradox.
Universal Hashing (good): For a large enough key domain u, every hash function will be bad for
some set of n inputs2 . However, we can achieve good expected bounds on hash table performance
by choosing our hash function randomly from a large family of hash functions. Here the expecta-
tion is over our choice of hash function, which is independent of the input. This is not expectation
over the domain of possible input keys. One family of hash functions that performs well is:
    H(m, p) = { h_ab(k) = (((a k + b) mod p) mod m) | a, b ∈ {0, . . . , p − 1} and a ≠ 0 },
where p is a prime that is larger than the key domain u. A single hash function from this family
is specified by choosing concrete values for a and b. This family of hash functions is universal3 :
for any two keys, the probability that their hashes will collide when hashed using a hash function
chosen uniformly at random from the universal family, is no greater than 1/m, i.e.
    Pr_{h∈H} { h(k_i) = h(k_j) } ≤ 1/m,   for all k_i ≠ k_j ∈ {0, . . . , u − 1}.
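As a small illustration, one can sample a hash function from this family as follows (the prime
2^31 − 1 matches the choice made in the Hash_Table_Set code below; parameter names are
illustrative, and the key domain u is assumed to be smaller than p):

    from random import randint

    def random_hash_from_family(m, p=2**31 - 1):
        a = randint(1, p - 1)                       # a != 0
        b = randint(0, p - 1)
        return lambda k: ((a * k + b) % p) % m      # h_ab(k)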
If we know that a family of hash functions is universal, then we can upper bound the expected
size of any chain, in expectation over our choice of hash function from the family. Let Xij be
the indicator random variable representing the value 1 if keys ki and kj collide for a chosen hash
function, and 0 otherwise. Then the random variable representing the number of items hashed to
index h(k_i) will be the sum X_i = Σ_j X_ij over all keys k_j from the set of n keys {k_0, . . . , k_{n−1}}
stored in the hash table. Then the expected number of keys hashed to the chain at index h(ki ) is:
    E_{h∈H}{X_i} = E_{h∈H}{ Σ_j X_ij } = Σ_j E_{h∈H}{X_ij} = 1 + Σ_{j≠i} E_{h∈H}{X_ij}
                 = 1 + Σ_{j≠i} [ (1) · Pr_{h∈H}{h(k_i) = h(k_j)} + (0) · Pr_{h∈H}{h(k_i) ≠ h(k_j)} ]
                 ≤ 1 + Σ_{j≠i} 1/m = 1 + (n − 1)/m.
If the size of the hash table is at least linear in the number of items stored, i.e. m = Ω(n), then
the expected size of any chain will be 1 + (n − 1)/Ω(n) = O(1), a constant! Thus a hash table
where collisions are resolved using chaining, implemented using a randomly chosen hash function
from a universal family, will perform dynamic set operations in expected constant time, where
the expectation is taken over the random choice of hash function, independent from the input keys!
Note that in order to maintain m = O(n), insertion and deletion operations may require you to
rebuild the direct access array to a different size, choose a new hash function, and reinsert all the
items back into the hash table. This can be done in the same way as in dynamic arrays, leading to
amortized bounds for dynamic operations.
2
If u > nm, every hash function from u to m maps some n keys to the same hash, by the pigeonhole principle.
3
The proof that this family is universal is beyond the scope of 6.006, though it is usually derived in 6.046.
1 class Hash_Table_Set:
2 def __init__(self, r = 200): # O(1)
3 self.chain_set = Set_from_Seq(Linked_List_Seq)
4 self.A = []
5 self.size = 0
6 self.r = r # 100/self.r = fill ratio
7 self.p = 2**31 - 1
8 self.a = randint(1, self.p - 1)
9 self._compute_bounds()
10 self._resize(0)
11
12 def __len__(self): return self.size # O(1)
13 def __iter__(self): # O(n)
14 for X in self.A:
15 yield from X
16
17 def build(self, X): # O(n)e
18 for x in X: self.insert(x)
19
20 def _hash(self, k, m): # O(1)
21 return ((self.a * k) % self.p) % m
22
23 def _compute_bounds(self): # O(1)
24 self.upper = len(self.A)
25 self.lower = len(self.A) * 100*100 // (self.r*self.r)
26
27 def _resize(self, n): # O(n)
28 if (self.lower >= n) or (n >= self.upper):
29 f = self.r // 100
30 if self.r % 100: f += 1
31 # f = ceil(r / 100)
32 m = max(n, 1) * f
33 A = [self.chain_set() for _ in range(m)]
34 for x in self:
35 h = self._hash(x.key, m)
36 A[h].insert(x)
37 self.A = A
38 self._compute_bounds()
39
                                    Operations O(·)
                      Container   Static     Dynamic      Order
Data Structure        build(X)    find(k)    insert(x)    find_min()    find_prev(k)
                                             delete(k)    find_max()    find_next(k)
Array                 n           n          n            n             n
Sorted Array          n log n     log n      n            1             log n
Direct Access Array   u           1          1            u             u
Hash Table            n(e)        1(e)       1(a)(e)      n             n
Exercise
Given an unsorted array A = [a_0, . . . , a_{n−1}] containing n positive integers, the DUPLICATES prob-
lem asks whether two integers in the array have the same value.
4
In 6.006, you do not have to specify these details when answering problems. You may simply quote that hash
tables can achieve the expected/amortized bounds for operations described in class.
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 5
Recitation 5
Comparison Sorting
Last time we discussed a lower bound on search in a comparison model. We can use a similar
analysis to lower bound the worst-case running time of any sorting algorithm that only uses com-
parisons. There are n! possible outputs to a sorting algorithm: the n! permutations of the items.
Then the decision tree for any deterministic sorting algorithm that uses only comparisons must
have at least n! leaves, and thus (by the same analysis as the search decision tree) must have height
at least Ω(log(n!)) = Ω(n log n)1, leading to a running time of at least Ω(n log n).
1 def direct_access_sort(A):
2 "Sort A assuming items have distinct non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [None] * u # O(u) direct access array
5 for x in A: # O(n) insert items
6 D[x.key] = x
7 i = 0
8 for key in range(u): # O(u) read out items in order
9 if D[key] is not None:
10 A[i] = D[key]
11 i += 1
1
We can prove this directly via Stirling’s approximation, n! ≈ √(2πn) (n/e)^n, or by observing that n! > (n/2)^(n/2).
Counting Sort
To solve the first problem, we simply link a chain to each direct access array index, just like in
hashing. When multiple items have the same key, we store them both in the chain associated with
their key. Later, it will be important that this algorithm be stable: that items with duplicate keys
appear in the same order in the output as the input. Thus, we choose chains that will support
a sequence queue interface to keep items in order, inserting to the end of the queue, and then
returning items back in the order that they were inserted.
1 def counting_sort(A):
2 "Sort A assuming items have non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [[] for i in range(u)] # O(u) direct access array of chains
5 for x in A: # O(n) insert into chain at x.key
6 D[x.key].append(x)
7 i = 0
8 for chain in D: # O(u) read out items in order
9 for x in chain:
10 A[i] = x
11 i += 1
Counting sort takes O(u) time to initialize the chains of the direct access array, O(n) time to insert
all the elements, and then O(u) time to scan back through the direct access array to return the
items; so the algorithm runs in O(n + u) time. Again, when u = O(n), then counting sort runs in
linear time, but this time allowing duplicate keys.
There’s another implementation of counting sort which just keeps track of how many of each key
map to each index, and then moves each item only once, rather than the implementation above, which
moves each item into a chain and then back into place. The implementation below computes the
final index location of each item via cumulative sums.
1 def counting_sort(A):
2 "Sort A assuming items have non-negative keys"
3 u = 1 + max([x.key for x in A]) # O(n) find maximum key
4 D = [0] * u # O(u) direct access array
5 for x in A: # O(n) count keys
6 D[x.key] += 1
7 for k in range(1, u): # O(u) cumulative sums
8 D[k] += D[k - 1]
9 for x in list(reversed(A)): # O(n) move items into place
10 A[D[x.key] - 1] = x
11 D[x.key] -= 1
Now what if we want to sort keys from a larger integer range? Our strategy will be to break up
integer keys into parts, and then sort each part! In order to do that, we will need a sorting strategy
to sort tuples, i.e. multiple parts.
Tuple Sort
Suppose we want to sort tuples, each containing many different keys (e.g. x.k1 , x.k2 , x.k3 , . . .), so
that the sort is lexicographic with respect to some ordering of the keys (e.g. that key k1 is more
important than key k2 is more important than key k3 , etc.). Then tuple sort uses a stable sorting
algorithm as a subroutine to repeatedly sort the objects, first according to the least important key,
then the second least important key, all the way up to most important key, thus lexicographically
sorting the objects. Tuple sort is similar to how one might sort on multiple rows of a spreadsheet
by different columns. However, tuple sort will only be correct if the sorting from previous rounds
are maintained in future rounds. In particular, tuple sort requires the subroutine sorting algorithms
be stable.
Radix Sort
Now, to increase the range of integer sets that we can sort in linear time, we break each integer up
into its multiples of powers of n, representing each item key by its sequence of digits when represented
in base n. If the integers are non-negative and the largest integer in the set is u, then this base n
number will have dlogn ue digits. We can think of these digit representations as tuples and sort
them with tuple sort by sorting on each digit in order from least significant to most significant digit
using counting sort. This combination of tuple sort and counting sort is called radix sort. If the
largest integer in the set u ≤ nc , then radix sort runs in O(nc) time. Thus, if c is constant, then
radix sort also runs in linear time!
1 def radix_sort(A):
2 "Sort A assuming items have non-negative keys"
3 n = len(A)
4 u = 1 + max([x.key for x in A]) # O(n) find maximum key
5 c = 1 + (u.bit_length() // n.bit_length())
6 class Obj: pass
7 D = [Obj() for a in A]
8 for i in range(n): # O(nc) make digit tuples
9 D[i].digits = []
10 D[i].item = A[i]
11 high = A[i].key
12 for j in range(c): # O(c) make digit tuple
13 high, low = divmod(high, n)
14 D[i].digits.append(low)
15 for i in range(c): # O(nc) sort each digit
16 for j in range(n): # O(n) assign key i to tuples
17 D[j].key = D[j].digits[i]
18 counting_sort(D) # O(n) sort on digit i
19 for i in range(n): # O(n) output to A
20 A[i] = D[i].item
We’ve made a CoffeeScript Counting/Radix sort visualizer which you can find here:
https://codepen.io/mit6006/pen/LgZgrd
Exercises
1) Sort the following integers using a base-10 radix sort.
(329, 457, 657, 839, 436, 720, 355) −→ (329, 355, 436, 457, 657, 720, 839)
2) Describe a linear time algorithm to sort n integers from the range [−n^2, . . . , n^3].
Solution: Add n^2 to each number so integers are all positive, apply Radix sort, and then subtract
n^2 from each element of the output.
3) Describe a linear time algorithm to sort a set n of strings, each having k English characters.
Solution: Use tuple sort to repeatedly sort the strings by each character from right to left with
counting sort, using the integers {0, . . . , 25} to represent the English alphabet. There are k rounds
of counting sort, and each round takes Θ(n + 26) = Θ(n) time, thus the algorithm runs in Θ(nk)
time. This running time is linear because the input size is Θ(nk).
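A minimal sketch of this algorithm in code (the function name sort_strings is an assumption, and the strings are assumed to be lowercase and all of length k):

def sort_strings(S, k):
    "Sort equal-length lowercase strings lexicographically in O(nk) time"
    S = list(S)
    for j in reversed(range(k)):            # least to most significant character
        buckets = [[] for _ in range(26)]   # counting sort on character j
        for s in S:
            buckets[ord(s[j]) - ord('a')].append(s)
        S = [s for bucket in buckets for s in bucket]   # stable read-out in order
    return S

Because each pass reads the buckets in order and appends strings in their current order, the sort from previous rounds is preserved, which is exactly the stability that tuple sort requires.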
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 6
Recitation 6
Binary Trees
A binary tree is a tree (a connected graph with no cycles) of binary nodes: a linked node con-
tainer, similar to a linked list node, having a constant number of fields:
• a pointer to an item stored at the node,
• a pointer to a parent node (possibly None),
• a pointer to a left child node (possibly None), and
• a pointer to a right child node (possibly None).
1 class Binary_Node:
2 def __init__(A, x): # O(1)
3 A.item = x
4 A.left = None
5 A.right = None
6 A.parent = None
7 # A.subtree_update() # wait for R07!
Why is a binary node called “binary”? In actuality, a binary node can be connected to three other
nodes (its parent, left child, and right child), not just two. However, we will differentiate a node’s
parent from its children, and so we call the node “binary” based on the number of children the
node has.
A binary tree has one node that is the root of the tree: the only node in the tree lacking a
parent. All other nodes in the tree can reach the root of the tree containing them by traversing
parent pointers. The set of nodes passed when traversing parent pointers from node <X> back to
the root is called the ancestors of <X> in the tree. The depth of a node <X> in the subtree rooted
at <R> is the length of the path from <X> back to <R>. The height of node <X> is the maximum
depth of any node in the subtree rooted at <X>. If a node has no children, it is called a leaf.
Why would we want to store items in a binary tree? The difficulty with a linked list is that many
linked-list nodes can be O(n) pointer hops away from the head of the list, so it may take O(n)
time to reach them. By contrast, as we’ve seen in earlier recitations, it is possible to construct a
binary tree on n nodes such that no node is more than O(log n) pointer hops away from the root,
i.e., there exist binary trees with logarithmic height. The power of a binary tree structure is that if we
can keep the height h of the tree low, i.e., O(log n), and only perform operations on the tree that
run in time on the order of the height of the tree, then these operations will run in O(h) = O(log n)
time (which is much closer to O(1) than to O(n)).
Traversal Order
The nodes in a binary tree have a natural order based on the fact that we distinguish one child to
be left and one child to be right. We define a binary tree’s traversal order based on the following
implicit characterization:
• every node in the left subtree of node <X> comes before <X> in the traversal order; and
• every node in the right subtree of node <X> comes after <X> in the traversal order.
Given a binary node <A>, we can list the nodes in <A>’s subtree by recursively listing the nodes in
<A>’s left subtree, listing <A> itself, and then recursively listing the nodes in <A>’s right subtree.
This algorithm runs in O(n) time because every node is recursed on once doing constant work.
Right now, there is no semantic connection between the items being stored and the traversal order
of the tree. Next time, we will provide two different semantic meanings to the traversal order (one
of which will lead to an efficient implementation of the Sequence interface, and the other will lead
to an efficient implementation of the Set interface), but for now, we will just want to preserve the
traversal order as we manipulate the tree.
Tree Navigation
Given a binary tree, it will be useful to be able to navigate the nodes in their traversal order effi-
ciently. Probably the most straightforward operation is to find the node in a given node’s subtree
that appears first (or last) in traversal order. To find the first node, repeatedly walk to a left child
while one exists. This operation takes O(h) time because each step of the recursion moves down the tree.
Finding the last node in a subtree is symmetric.
Given a node in a binary tree, it would also be useful to find the next node in the traversal order,
i.e., the node’s successor, or the previous node in the traversal order, i.e., the node’s predecessor.
To find the successor of a node <A>, if <A> has a right child, then <A>’s successor will be the first
node in the right child’s subtree. Otherwise, <A>’s successor cannot exist in <A>’s subtree, so we
walk up the tree to find the lowest ancestor of <A> such that <A> is in the ancestor’s left subtree.
In the first case, the algorithm only walks down the tree to find the successor, so it runs in O(h)
time. Alternatively in the second case, the algorithm only walks up the tree to find the successor,
so it also runs in O(h) time. The predecessor algorithm is symmetric.
Dynamic Operations
If we want to add or remove items in a binary tree, we must take care to preserve the traversal order
of the other items in the tree. To insert a node <B> before a given node <A> in the traversal order,
there are two cases: either node <A> has a left child or it does not. If <A> does not have a left child, then we can simply add
<B> as the left child of <A>. Otherwise, if <A> has a left child, we can add <B> as the right child of
the last node in <A>’s left subtree (which cannot have a right child). In either case, the algorithm
walks down the tree at each step, so the algorithm runs in O(h) time. Inserting after is symmetric.
To delete the item contained in a given node from its binary tree, there are two cases based on
whether the node storing the item is a leaf. If the node is a leaf, then we can simply clear the
child pointer from the node’s parent and return the node. Alternatively, if the node is not a leaf, we
can swap the node’s item with the item in the node’s successor or predecessor down the tree until
the item is in a leaf which can be removed. Since swapping only occurs down the tree, again this
operation runs in O(h) time.
1 class Binary_Node:
2 def __init__(A, x): # O(1)
3 A.item = x
4 A.left = None
5 A.right = None
6 A.parent = None
7 # A.subtree_update() # wait for R07!
8
9 def subtree_iter(A): # O(n)
10 if A.left: yield from A.left.subtree_iter()
11 yield A
12 if A.right: yield from A.right.subtree_iter()
13
14 def subtree_first(A): # O(h)
15 if A.left: return A.left.subtree_first()
16 else: return A
17
18 def subtree_last(A): # O(h)
19 if A.right: return A.right.subtree_last()
20 else: return A
21
22 def successor(A): # O(h)
23 if A.right: return A.right.subtree_first()
24 while A.parent and (A is A.parent.right):
25 A = A.parent
26 return A.parent
27
28 def predecessor(A): # O(h)
29 if A.left: return A.left.subtree_last()
30 while A.parent and (A is A.parent.left):
31 A = A.parent
32 return A.parent
33
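The remaining dynamic operations described earlier (insert before, insert after, and delete) continue this listing; a sketch consistent with that description (with the subtree_update hook again deferred to R07) is:

    # additional Binary_Node methods (sketch)
    def subtree_insert_before(A, B):        # O(h)
        if A.left:                          # B goes after the last node of A's left subtree
            A = A.left.subtree_last()
            A.right, B.parent = B, A
        else:                               # B becomes A's left child
            A.left, B.parent = B, A
        # A.subtree_update()                # wait for R07!

    def subtree_insert_after(A, B):         # O(h)
        if A.right:                         # B goes before the first node of A's right subtree
            A = A.right.subtree_first()
            A.left, B.parent = B, A
        else:                               # B becomes A's right child
            A.right, B.parent = B, A
        # A.subtree_update()                # wait for R07!

    def subtree_delete(A):                  # O(h)
        if A.left or A.right:               # A is not a leaf: swap its item downward
            if A.left: B = A.predecessor()
            else:      B = A.successor()
            A.item, B.item = B.item, A.item
            return B.subtree_delete()
        if A.parent:                        # A is a leaf: detach it from its parent
            if A.parent.left is A: A.parent.left = None
            else:                  A.parent.right = None
            # A.parent.subtree_update()     # wait for R07!
        return A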
1 class Binary_Tree:
2 def __init__(T, Node_Type = Binary_Node):
3 T.root = None
4 T.size = 0
5 T.Node_Type = Node_Type
6
7 def __len__(T): return T.size
8 def __iter__(T):
9 if T.root:
10 for A in T.root.subtree_iter():
11 yield A.item
Exercise: Given an array of items A = (a_0, . . . , a_{n−1}), describe an O(n)-time algorithm to con-
struct a binary tree T containing the items in A such that (1) the item stored in the ith node of T’s
traversal order is item a_i, and (2) T has height O(log n).
Solution: Build T by storing the middle item in a root node, and then recursively building the
remaining left and right halves in left and right subtrees. This algorithm satisfies property (1) by
definition of traversal order, and property (2) because the height roughly follows the recurrence
H(n) = 1 + H(n/2). The algorithm runs in O(n) time because every node is recursed on once
doing constant work.
1 def build(self, X): # a Binary_Tree method
2 A = [x for x in X]
3 def build_subtree(A, i, j):
4 c = (i + j) // 2
5 root = self.Node_Type(A[c])
6 if i < c: # needs to store more items in left subtree
7 root.left = build_subtree(A, i, c - 1)
8 root.left.parent = root
9 if c < j: # needs to store more items in right subtree
10 root.right = build_subtree(A, c + 1, j)
11 root.right.parent = root
12 return root
13 self.root = build_subtree(A, 0, len(A)-1)
Exercise: Argue that the following iterative procedure to return the nodes of a tree in traversal
order takes O(n) time.
1 def tree_iter(T):
2 node = T.root.subtree_first() if T.root else None # T is a Binary_Tree
3 while node:
4 yield node
5 node = node.successor()
Solution: This procedure walks around the tree traversing each edge of the tree twice: once going
down the tree, and once going back up. Then because the number of edges in a tree is one fewer
than the number of nodes, the traversal takes O(n) time.
Application: Set
To use a Binary Tree to implement a Set interface, we use the traversal order of the tree to store the
items sorted in increasing key order. This property is often called the Binary Search Tree Prop-
erty, where keys in a node’s left subtree are less than the key stored at the node, and keys in the
node’s right subtree are greater than the key stored at the node. Then finding the node containing
a query key (or determining that no node contains the key) can be done by walking down the tree,
recursing on the appropriate side.
Exercise: Make a Set Binary Tree (Binary Search Tree) by inserting student-chosen items one by
one, then searching and/or deleting student-chosen keys one by one.
1 class BST_Node(Binary_Node):
2 def subtree_find(A, k): # O(h)
3 if k < A.item.key:
4 if A.left: return A.left.subtree_find(k)
5 elif k > A.item.key:
6 if A.right: return A.right.subtree_find(k)
7 else: return A
8 return None
9
10 def subtree_find_next(A, k): # O(h)
11 if A.item.key <= k:
12 if A.right: return A.right.subtree_find_next(k)
13 else: return None
14 elif A.left:
15 B = A.left.subtree_find_next(k)
16 if B: return B
17 return A
18
19 def subtree_find_prev(A, k): # O(h)
20 if A.item.key >= k:
21 if A.left: return A.left.subtree_find_prev(k)
22 else: return None
23 elif A.right:
24 B = A.right.subtree_find_prev(k)
25 if B: return B
26 return A
27
28 def subtree_insert(A, B): # O(h)
29 if B.item.key < A.item.key:
30 if A.left: A.left.subtree_insert(B)
31 else: A.subtree_insert_before(B)
32 elif B.item.key > A.item.key:
33 if A.right: A.right.subtree_insert(B)
34 else: A.subtree_insert_after(B)
35 else: A.item = B.item
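A Set interface wrapper tying BST_Node to the Binary_Tree class might look like the following sketch (the class name Set_Binary_Tree and the details shown are assumptions; for simplicity the sketch assumes distinct keys):

class Set_Binary_Tree(Binary_Tree):         # Binary Tree Set (sketch)
    def __init__(self): super().__init__(BST_Node)

    def build(self, X):                     # O(nh) by repeated insertion
        for x in X: self.insert(x)

    def find(self, k):                      # O(h)
        if self.root is None: return None
        node = self.root.subtree_find(k)
        return node.item if node else None

    def insert(self, x):                    # O(h) (assumes x.key is not already present)
        new_node = self.Node_Type(x)
        if self.root: self.root.subtree_insert(new_node)
        else:         self.root = new_node
        self.size += 1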
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 7
Recitation 7
There are many ways to keep a binary tree balanced under insertions and deletions (Red-Black
Trees, B-Trees, 2-3 Trees, Splay Trees, etc.). The oldest (and perhaps simplest) method is called
an AVL Tree. Every node of an AVL Tree is height-balanced (i.e., satisfies the AVL Property)
where the left and right subtrees of a height-balanced node differ in height by at most 1. To put it
a different way, define the skew of a node to be the height of its right subtree minus the height of
its left subtree (where the height of an empty subtree is −1). Then a node is height-balanced if its
skew is either −1, 0, or 1. A tree is height-balanced if every node in the tree is height-balanced.
Height-balance is good because it implies balance! To see why, let F(h) denote the fewest nodes in
any height-balanced tree of height h. This quantity satisfies the recurrence
F(h) = 1 + F(h − 1) + F(h − 2) ≥ 2F(h − 2),
since a smallest height-balanced tree of height h has a root, one child subtree of height h − 1, and
another of height at least h − 2, each containing the fewest possible nodes. As base cases, the
fewest nodes in a height-balanced tree of height 0 is one, i.e., F(0) = 1, while the fewest nodes in
a height-balanced tree of height 1 is two, i.e., F(1) = 2. This recurrence is lower bounded by
F(h) ≥ 2^{h/2} = 2^{Ω(h)}, so any height-balanced tree on n nodes has height h = O(log n), as desired.
Rotations
As we add or remove nodes to our tree, it is possible that our tree will become imbalanced. We
will want to change the structure of the tree without changing its traversal order, in the hopes that
we can make the tree’s structure more balanced. We can change the structure of a tree using a local
operation called a rotation. A rotation takes a subtree that locally looks like one of the following two
configurations and modifies the connections between nodes in O(1) time to transform it into the
other configuration.
This operation preserves the traversal order of the tree while changing the depth of the nodes
in subtrees <A> and <E>. Next, we will use rotations to enforce that a balanced tree stays
balanced after inserting or deleting a node.
Maintaining Height-Balance
Suppose we have a height-balanced AVL tree, and we perform a single insertion or deletion by
adding or removing a leaf. Either the resulting tree is also height-balanced, or the leaf modification
has made at least one node in the tree have skew of magnitude greater than 1. In particular, the
only nodes in the tree whose subtrees have changed after the leaf modification are ancestors of
that leaf (at most O(h) of them), so these are the only nodes whose skew could have changed and
they could have changed by at most 1 to have magnitude at most 2. As shown in lecture via a
brief case analysis, given a subtree whose root has skew of magnitude 2 and every other node in its subtree is
height-balanced, we can restore balance to the subtree in at most two rotations. Thus to rebalance
the entire tree, it suffices to walk from the leaf to the root, rebalancing each node along the way,
performing at most O(log n) rotations in total. A detailed proof is outlined in the lecture notes and
is not repeated here; but the proof may be reviewed in recitation if students would like to see the
full argument. Below is code to implement the rebalancing algorithm presented in lecture.
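A sketch of that code, written as additional Binary_Node methods: it uses the skew() and subtree_update() helpers from the augmented listing later in this recitation, and swaps items so that the same node object remains the subtree root (keeping pointers from above, and the tree's root pointer, valid).

    # additional Binary_Node methods (sketch)
    def subtree_rotate_right(D):            # O(1)
        assert D.left
        B, E = D.left, D.right              # D's children
        A, C = B.left, B.right              # B's children
        D, B = B, D                         # keep the same node object at the subtree root
        D.item, B.item = B.item, D.item     # ...by swapping the stored items
        B.left, B.right = A, D              # root keeps subtree A and gets D as right child
        D.left, D.right = C, E              # lower node keeps subtrees C and E
        if A: A.parent = B
        if E: E.parent = D
        D.subtree_update()                  # update heights bottom-up
        B.subtree_update()

    def subtree_rotate_left(B):             # O(1) symmetric to subtree_rotate_right
        assert B.right
        A, D = B.left, B.right
        C, E = D.left, D.right
        B, D = D, B
        B.item, D.item = D.item, B.item
        D.left, D.right = B, E
        B.left, B.right = A, C
        if A: A.parent = B
        if E: E.parent = D
        B.subtree_update()
        D.subtree_update()

    def rebalance(A):                       # O(1) restore height-balance at A (|skew| = 2)
        if A.skew() == 2:
            if A.right.skew() < 0:          # zig-zag case: rotate the right child first
                A.right.subtree_rotate_right()
            A.subtree_rotate_left()
        elif A.skew() == -2:
            if A.left.skew() > 0:
                A.left.subtree_rotate_left()
            A.subtree_rotate_right()

    def maintain(A):                        # O(log n) rebalance and update up to the root
        A.rebalance()
        A.subtree_update()
        if A.parent: A.parent.maintain()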
Unfortunately, it’s not clear how to efficiently evaluate the skew of a node to determine whether
or not we need to perform rotations, because computing a node’s height naively takes time linear in
the size of the subtree. The code below to compute height recurses on every node in <A>’s subtree,
so takes at least Ω(n) time.
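Such a naive height computation would look something like:

def height(A):                              # Ω(n): recurses on every node of A's subtree
    if A is None: return -1                 # height of an empty subtree is -1
    return 1 + max(height(A.left), height(A.right))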
Rebalancing requires us to check at least Ω(log n) heights in the worst-case, so if we want rebal-
ancing the tree to take at most O(log n) time, we need to be able to evaluate the height of a node
in O(1) time. Instead of computing the height of a node every time we need it, we will speed up
computation via augmentation: in particular each node stores and maintains the value of its own
subtree height. Then when we’re at a node, evaluating its height is as simple as reading its stored
value in O(1) time. However, when the structure of the tree changes, we will need to update and
recompute the height at nodes whose height has changed.
1 def height(A):
2 if A: return A.height
3 else: return -1
In the dynamic operations presented in R06, we put commented code to call update on every node
whose subtree changed during insertions, deletions, or rotations. A rebalancing insertion or dele-
tion operation only calls subtree update on at most O(log n) nodes, so as long as updating a
node takes at most O(1) time to recompute augmentations based on the stored augmentations of
the node’s children, then the augmentations can be maintained during rebalancing in O(log n) time.
In general, the idea behind augmentation is to store additional information at each node so that
information can be queried quickly in the future. You’ve done some augmentation already in PS1,
where you augmented a singly-linked list with back pointers to make it faster to evaluate a node’s
predecessor. To augment the nodes of a binary tree with a subtree property P(<X>), you need to:
• show how to compute P(<X>) in O(1) time from the augmentations of <X>’s children.
If you can do that, then you will be able to store and maintain that property at each node without
affecting the O(log n) running time of rebalancing insertions and deletions. We’ve shown how
to traverse around a binary tree and perform insertions and deletions, each in O(h) time while
also maintaining height-balance so that h = O(log n). Now we are finally ready to implement an
efficient Sequence and Set.
1 def height(A):
2 if A: return A.height
3 else: return -1
4
5 class Binary_Node:
6 def __init__(A, x): # O(1)
7 A.item = x
8 A.left = None
9 A.right = None
10 A.parent = None
11 A.subtree_update()
12
13 def subtree_update(A): # O(1)
14 A.height = 1 + max(height(A.left), height(A.right))
15
16 def skew(A): # O(1)
17 return height(A.right) - height(A.left)
18
19 def subtree_iter(A): # O(n)
20 if A.left: yield from A.left.subtree_iter()
21 yield A
22 if A.right: yield from A.right.subtree_iter()
23
24 def subtree_first(A): # O(log n)
25 if A.left: return A.left.subtree_first()
26 else: return A
27
28 def subtree_last(A): # O(log n)
29 if A.right: return A.right.subtree_last()
30 else: return A
31
32 def successor(A): # O(log n)
33 if A.right: return A.right.subtree_first()
34 while A.parent and (A is A.parent.right):
35 A = A.parent
36 return A.parent
37
Application: Set
Using our new definition of Binary Node that maintains balance, the implementation presented
in R06 of the Binary Tree Set immediately supports all operations in h = O(log n) time,
except build(X) and iter() which run in O(n log n) and O(n) time respectively. This data
structure is what’s normally called an AVL tree, but what we will call a Set AVL.
Application: Sequence
To use a Binary Tree to implement a Sequence interface, we use the traversal order of the tree to
store the items in Sequence order. Now we need a fast way to find the ith item in the sequence
because traversal would take O(n) time. If we knew how many items were stored in our left
subtree, we could compare that size to the index we are looking for and recurse on the appropriate
side. In order to evaluate subtree size efficiently, we augment each node in the tree with the size
of its subtree. A node’s size can be computed in constant time given the sizes of its children by
summing them and adding 1.
1 class Size_Node(Binary_Node):
2 def subtree_update(A): # O(1)
3 super().subtree_update()
4 A.size = 1
5 if A.left: A.size += A.left.size
6 if A.right: A.size += A.right.size
7
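For example, finding the ith node in a subtree's traversal order could be implemented as the following Size_Node method (the name subtree_at is an assumption):

    def subtree_at(A, i):                   # O(h)
        assert 0 <= i < A.size
        L = A.left.size if A.left else 0    # number of nodes before A in its own subtree
        if i < L:   return A.left.subtree_at(i)
        elif i > L: return A.right.subtree_at(i - L - 1)
        else:       return A                # A itself is the ith node of its subtree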
Once we are able to find the ith node in a balanced binary tree in O(log n) time, the remainder of
the Sequence interface operations can be implemented directly using binary tree operations. Fur-
ther, via the first exercise in R06, we can build such a tree from an input sequence in O(n) time.
We call this data structure a Sequence AVL.
Implementations of both the Sequence and Set interfaces can be found on the following pages.
We’ve made a CoffeeScript Balanced Binary Search Tree visualizer which you can find here:
https://codepen.io/mit6006/pen/NOWddZ
1 class Seq_Binary_Tree(Binary_Tree):
2 def __init__(self): super().__init__(Size_Node)
3
4 def build(self, X):
5 def build_subtree(X, i, j):
6 c = (i + j) // 2
7 root = self.Node_Type(X[c])
8 if i < c:
9 root.left = build_subtree(X, i, c - 1)
10 root.left.parent = root
11 if c < j:
12 root.right = build_subtree(X, c + 1, j)
13 root.right.parent = root
14 root.subtree_update()
15 return root
16 self.root = build_subtree(X, 0, len(X) - 1)
17 self.size = self.root.size
18
Exercise: Make a Sequence AVL Tree or Set AVL Tree (Balanced Binary Search Tree) by inserting
student chosen items one by one. If any node becomes height-imbalanced, rebalance its ancestors
going up the tree. Here’s a Sequence AVL Tree example that may be instructive (remember to
update subtree heights and sizes as you modify the tree!).
1 T = Seq_Binary_Tree()
2 T.build([10,6,8,5,1,3])
3 T.get_at(4)
4 T.set_at(4, -4)
5 T.insert_at(4, 18)
6 T.insert_at(4, 12)
7 T.delete_at(2)
Solution:
1 Line # 1 | 2,3 | 4 | 5 | 6 | 7
2 | | | | |
3 Result None | ___8__ | ___8___ | ___8_____ | ___8_______ | __12____
4 | 10_ _1_ | 10_ _-4_ | 10_ ___-4_ | 10_ ____-4_ | __6_ __-4_
5 | 6 5 3 | 6 5 3 | 6 5__ 3 | 6 _12__ 3 | 10 5 18 3
6 | | | 18 | 5 18 |
7
8 Also labeled with subtree height H, size #:
9
10 None
11 ___________8H2#6__________
12 10H1#2_____ _____1H1#3_____
13 6H0#1 5H0#1 3H0#1
14
15 ___________8H2#6__________
16 10H1#2_____ _____1H1#3_____
17 6H0#1 5H0#1 3H0#1
18
19 ___________8H2#6___________
20 10H1#2_____ _____-4H1#3_____
21 6H0#1 5H0#1 3H0#1
22
23 ___________8H3#7_________________
24 10H1#2_____ ___________-4H2#4_____
25 6H0#1 5H1#2______ 3H0#1
26 18H0#1
27
28 ___________8H3#8_______________________
29 10H1#2_____ ____________-4H2#5_____
30 6H0#1 _____12H1#3______ 3H0#1
31 5H0#1 18H0#1
32
33 __________12H2#7____________
34 ______6H1#3_____ ______-4H1#3_____
35 10H0#1 5H0#1 18H0#1 3H0#1
Exercise: Maintain a sequence of n bits that supports two operations, each in O(log n) time:
• flip(i): flip the bit stored at index i
• count ones upto(i): return the number of bits in the prefix up to index i that are one
Solution: Maintain a Sequence Tree storing the bits as items, augmenting each node A with
A.subtree ones, the number of 1 bits in its subtree. We can maintain this augmentation in
O(1) time from the augmentations stored at its children.
1 def update(A):
2 A.subtree_ones = A.item
3 if A.left:
4 A.subtree_ones += A.left.subtree_ones
5 if A.right:
6 A.subtree_ones += A.right.subtree_ones
To implement flip(i), find the ith node A using subtree node at(i) and flip the bit stored
at A.item. Then update the augmentation at A and every ancestor of A by walking up the tree in
O(log n) time.
To implement count ones upto(i), we will first define the subtree-based recursive function
subtree count ones upto(A, i) which returns the number of 1 bits in the subtree of node
A that are at most index i within A’s subtree. Then count ones upto(i) is semantically equiv-
alent to subtree count ones upto(T.root, i). Since each recursive call makes at most
one recursive call on a child, the operation takes O(log n) time.
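A sketch of this recursion, assuming both the size augmentation of a Sequence AVL (Size_Node) and the subtree_ones augmentation maintained by update above (the function name follows the exercise text):

def subtree_count_ones_upto(A, i):          # O(log n): at most one recursive call per level
    L = A.left.size if A.left else 0        # number of nodes in A's left subtree
    if i < L:                               # prefix lies entirely inside the left subtree
        return A.left.subtree_count_ones_upto(i)
    out = A.item                            # A's own bit is in the prefix
    if A.left:
        out += A.left.subtree_ones          # the whole left subtree is in the prefix
    if i > L and A.right:                   # part of the right subtree is in the prefix
        out += A.right.subtree_count_ones_upto(i - L - 1)
    return out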
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 8
Recitation 8
Priority Queues
Priority queues provide a general framework for at least three sorting algorithms, which differ only
in the data structure used in the implementation.
Let’s look at Python code that implements these priority queues. We start with an abstract base
class that has the interface of a priority queue, maintains an internal array A of items, and trivially
implements insert(x) and delete max() (the latter being incorrect on its own, but useful for
subclasses).
1 class PriorityQueue:
2 def __init__(self):
3 self.A = []
4
5 def insert(self, x): # O(1) default implementation: append to end of self.A
6 self.A.append(x)
7
8 def delete_max(self):
9 if len(self.A) < 1:
10 raise IndexError(’pop from empty priority queue’)
11 return self.A.pop() # NOT correct on its own!
12
13 @classmethod
14 def sort(Queue, A):
15 pq = Queue() # make empty priority queue
16 for x in A: # n x T_insert
17 pq.insert(x)
18 out = [pq.delete_max() for _ in A] # n x T_delete_max
19 out.reverse()
20 return out
Shared across all implementations is a method for sorting, given implementations of insert and
delete max. Sorting simply makes two loops over the array: one to insert all the elements, and
another to populate the output array with successive maxima in reverse order.
Array Heaps
We showed implementations of selection sort and merge sort previously in recitation. Here are
implementations from the perspective of priority queues. If you were to unroll the organization of
this code, you would have essentially the same code as we presented before.
1 class PQ_Array(PriorityQueue):
2 # PriorityQueue.insert already correct: appends to end of self.A
3 def delete_max(self): # O(n)
4 n, A, m = len(self.A), self.A, 0
5 for i in range(1, n):
6 if A[m].key < A[i].key:
7 m = i
8 A[m], A[n - 1] = A[n - 1], A[m] # swap max with end of array
9 return super().delete_max() # pop from end of array
1 class PQ_SortedArray(PriorityQueue):
2 # PriorityQueue.delete_max already correct: pop from end of self.A
3 def insert(self, *args): # O(n)
4 super().insert(*args) # append to end of array
5 i, A = len(self.A) - 1, self.A # restore array ordering
6 while 0 < i and A[i].key < A[i - 1].key:
7 A[i - 1], A[i] = A[i], A[i - 1]
8 i -= 1
We use *args to allow insert to take one argument (as makes sense now) or zero arguments;
we will need the latter functionality when making the priority queues in-place.
Binary Heaps
The next implementation is based on a binary heap, which takes advantage of the logarithmic
height of a complete binary tree to improve performance. The bulk of the work done by these
functions is encapsulated by max heapify up and max heapify down below.
1 class PQ_Heap(PriorityQueue):
2 def insert(self, *args): # O(log n)
3 super().insert(*args) # append to end of array
4 n, A = self.n, self.A
5 max_heapify_up(A, n, n - 1)
6
7 def delete_max(self): # O(log n)
8 n, A = self.n, self.A
9 A[0], A[n - 1] = A[n - 1], A[0] # swap max to the last slot of the heap
10 max_heapify_down(A, n - 1, 0) # restore Max-Heap Property among first n - 1 items
11 return super().delete_max() # pop from end of array
Before we define max heapify operations, we need functions to compute parent and child
indices given an index representing a node in a tree whose root is the first element of the array. In
this implementation, if the computed index lies outside the bounds of the array, we return the input
index. Always returning a valid array index instead of throwing an error helps to simplify future
code.
1 def parent(i):
2 p = (i - 1) // 2
3 return p if 0 < i else i
4
5 def left(i, n):
6 l = 2 * i + 1
7 return l if l < n else i
8
9 def right(i, n):
10 r = 2 * i + 2
11 return r if r < n else i
Here is the meat of the work done by a max heap. Assuming all nodes in A[:n] satisfy the
Max-Heap Property except for node A[i] makes it easy for these functions to maintain the Node
Max-Heap Property locally.
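The heapify functions themselves might look like the following sketch, using the parent, left, and right index helpers above:

def max_heapify_up(A, n, c):                # T(c) = O(log c)
    p = parent(c)                           # O(1) index of parent (or c itself at the root)
    if A[p].key < A[c].key:                 # O(1) compare with parent
        A[c], A[p] = A[p], A[c]             # O(1) swap with parent
        max_heapify_up(A, n, p)             # recurse at the parent

def max_heapify_down(A, n, p):              # T(p) = O(log n - log p)
    l, r = left(p, n), right(p, n)          # O(1) indices of children (or p itself)
    c = l if A[r].key < A[l].key else r     # O(1) index of the larger child
    if A[p].key < A[c].key:                 # O(1) compare with the larger child
        A[c], A[p] = A[p], A[c]             # O(1) swap with that child
        max_heapify_down(A, n, c)           # recurse at the child

Because the helpers return the input index when a computed child or parent index falls outside the heap, both functions stop naturally at the root or at a leaf without extra bounds checks.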
1 def build_max_heap(A):
2 n = len(A)
3 for i in range(n // 2, -1, -1): # O(n) loop backward over array
4 max_heapify_down(A, n, i) # O(log n - log i) fix max heap
To see that this procedure takes O(n) instead of O(n log n) time, we compute an upper bound
explicitly using a summation. In the derivation, we use Stirling’s approximation: n! = Θ(√n (n/e)^n).

T(n) < Σ_{i=1}^{n} (log n − log i) = log(n^n / n!) = O(log(n^n / (√n (n/e)^n)))
     = O(log(e^n / √n)) = O(n log e − (1/2) log n) = O(n)
Note that using this linear-time procedure to build a max heap does not affect the asymptotic
efficiency of heap sort, because each of the n delete max operations still takes O(log n) time. But
it is a more efficient procedure in practice than initially inserting the n items one by one into an empty heap.
In-Place Heaps
To make heap sort in place1 (as well as restoring the in-place property of selection sort and inser-
tion sort), we can modify the base class PriorityQueue to take an entire array A of elements,
and maintain the queue itself in the prefix of the first n elements of A (where n <= len(A)). The
insert function is no longer given a value to insert; instead, it inserts the item already stored
in A[n], and incorporates it into the now-larger queue. Similarly, delete max does not return
a value; it merely deposits its output into A[n] before decreasing its size. This approach only
works in the case where all n insert operations come before all n delete max operations, as in
priority queue sort.
1 class PriorityQueue:
2 def __init__(self, A):
3 self.n, self.A = 0, A
4
5 def insert(self): # O(1) absorb element A[n] into the queue (sketch of omitted lines)
6 if not self.n < len(self.A):
7 raise IndexError('insert into full priority queue')
8 self.n += 1
9
10 def delete_max(self): # O(1) shrink queue; subclasses first move the max to A[n - 1]
11 if self.n < 1:
12 raise IndexError('pop from empty priority queue')
13 self.n -= 1 # NOT correct on its own!
14
15 @classmethod
16 def sort(Queue, A):
17 pq = Queue(A) # make empty priority queue
18 for i in range(len(A)): # n x T_insert
19 pq.insert()
20 for i in range(len(A)): # n x T_delete_max
21 pq.delete_max()
22 return pq.A
This new base class works for sorting via any of the subclasses: PQ Array, PQ SortedArray,
PQ Heap. The first two sorting algorithms are even closer to the original selection sort and inser-
tion sort, and the final algorithm is what is normally referred to as heap sort.
We’ve made a CoffeeScript heap visualizer which you can find here:
https://codepen.io/mit6006/pen/KxOpep
1
Recall that an in-place sort only uses O(1) additional space during execution, so only a constant number of array
elements can exist outside the array at any given time.
Exercises
1. Draw the complete binary tree associated with the sub-array A[:8]. Turn it into a max
heap via linear time bottom-up heapification. Run insert twice, and then delete max
twice.
1 A = [7, 3, 5, 6, 2, 0, 3, 1, 9, 4]
2. How would you find the minimum element contained in a max heap?
Solution: A max heap has no guarantees on the location of its minimum element, except that
it may not have any children. Therefore, one must search over all n/2 leaves of the binary
tree which takes Ω(n) time.
4. Proximate Sorting: An array of distinct integers is k-proximate if every integer of the array
is at most k places away from its place in the array after being sorted, i.e., if the ith integer
of the unsorted input array is the jth largest integer contained in the array, then |i − j| ≤ k.
In this problem, we will show how to sort a k-proximate array faster than Θ(n log n).
(a) Prove that insertion sort (as presented in this class, without any changes) will sort a
k-proximate array in O(nk) time.
Solution: To prove O(nk), we show that each of the n insertion sort rounds swaps an
item left by at most O(k) positions. In the original ordering, entries that are ≥ 2k slots apart must
already be ordered correctly: indeed, if A[s] > A[t] but t − s ≥ 2k, there is no way to
reverse the order of these two items while moving each at most k slots. This means that
for each entry A[i] in the original order, fewer than 2k of the items A[0], . . . , A[i − 1]
are greater than A[i]. Thus, on round i of insertion sort when A[i] is swapped into
place, fewer than 2k swaps are required, so round i requires O(k) time.
It’s possible to prove a stronger bound: that ai = A[i] is swapped at most k times in
round i (instead of 2k). This is a bit subtle: the final sorted index of ai is at most k slots
away from i by the k-proximate assumption, but ai might not move to its final position
immediately, but may move past its final sorted position and then be bumped to the
right in future rounds. Suppose for contradiction a loop swaps the pth largest item A[i]
to the left by more than k to position p′ < i − k, past at least k items larger than A[i].
Since A is k-proximate, i − p ≤ k, i.e., i − k ≤ p, so p′ < p. Thus at least one item
less than A[i] must exist to the right of A[i]. Let A[j] be the smallest such item, the qth
largest item in sorted order. A[j] is smaller than k + 1 items to the left of A[j], and no
item to the right of A[j] is smaller than A[j], so q ≤ j − (k + 1), i.e. j − q ≥ k + 1.
But A is k-proximate, so j − q ≤ k, a contradiction.
(b) Θ(nk) is asymptotically faster than Θ(n^2) when k = o(n), but is not asymptotically
faster than Θ(n log n) when k = ω(log n). Describe an algorithm to sort a k-proximate
array in O(n log k) time, which can be faster (but no slower) than Θ(n log n).
Solution: We perform a variant of heap sort, where the heap only stores k + 1 items
at a time. Build a min-heap H out of A[0], . . . , A[k − 1]. Then, repeatedly, insert the
next item from A into H, and then store H.delete min() as the next entry in sorted
order. So we first call H.insert(A[k]) followed by B[0] = H.delete min();
the next iteration calls H.insert(A[k+1]) and B[1] = H.delete min(); and so
on. (When there are no more entries to insert into H, do only the delete min step.)
B is the sorted answer. This algorithm works because the ith smallest entry in array A
must be one of A[0], A[1], . . . , A[i + k] by the k-proximate assumption, and by the time
we’re about to write B[i], all of these entries have already been inserted into H (and
some also deleted). Assuming entries B[0], . . . , B[i − 1] are correct (by induction), this
means the ith smallest value is still in H while all smaller values have already been
removed, so this ith smallest value is in fact H.delete min(), and B[i] gets filled
correctly. Each heap operation takes time O(log k) because there are at most k + 1
items in the heap, so the n insertions and n deletions take O(n log k) total.
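A concrete sketch of this algorithm, using Python's built-in heapq min-heap in place of the class's priority queue and treating the items themselves as keys:

import heapq

def proximate_sort(A, k):
    "Sort a k-proximate list A in O(n log k) time"
    H, B = A[:k], []                        # the heap never holds more than k + 1 items
    heapq.heapify(H)                        # O(k) build a min-heap on the first k items
    for i in range(k, len(A)):              # insert the next item, then extract the minimum
        heapq.heappush(H, A[i])
        B.append(heapq.heappop(H))
    while H:                                # drain the remaining items in sorted order
        B.append(heapq.heappop(H))
    return B

Since the output is produced in increasing order, a min-heap is the natural choice here and no reversal step is needed at the end.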
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 9
Recitation 9
Graphs
A graph G = (V, E) is a mathematical object comprising a set of vertices V (also called nodes)
and a set of edges E, where each edge in E is a two-element subset of vertices from V . A vertex
and edge are incident or adjacent if the edge contains the vertex. Let u and v be vertices. An edge
is directed if its subset pair is ordered, e.g., (u, v), and undirected if its subset pair is unordered,
e.g., {u, v} or alternatively both (u, v) and (v, u). A directed edge e = (u, v) extends from vertex
u (e’s tail) to vertex v (e’s head), with e an incoming edge of v and an outgoing edge of u. In
an undirected graph, every edge is incoming and outgoing. The in-degree and out-degree of a
vertex v denote the number of incoming and outgoing edges connected to v, respectively. Unless
otherwise specified, when we talk about degree, we generally mean out-degree.
As their name suggests, graphs are often depicted graphically, with vertices drawn as points, and
edges drawn as lines connecting the points. If an edge is directed, its corresponding line typically
includes an indication of the direction of the edge, for example via an arrowhead near the edge’s
head. Below are examples of a directed graph G1 and an undirected graph G2 .
G1 = (V1 , E1 ) V1 = {0, 1, 2, 3, 4} E1 = {(0, 1), (1, 2), (2, 0), (3, 4)}
G2 = (V2 , E2 ) V2 = {0, 1, 2, 3, 4} E2 = {{0, 1}, {0, 3}, {0, 4}, {2, 3}}
(Figure: drawings of directed graph G1 and undirected graph G2, each on the vertices {0, 1, 2, 3, 4}.)
A path1 in a graph is a sequence of vertices (v0 , . . . , vk ) such that for every ordered pair of vertices
(vi , vi+1 ), there exists an outgoing edge in the graph from vi to vi+1 . The length of a path is the
number of edges in the path, or one less than the number of vertices. A graph is called strongly
connected if there is a path from every node to every other node in the graph. Note that every
connected undirected graph is also strongly connected because every undirected edge incident to a
vertex is also outgoing. Of the two connected components of directed graph G1 , only one of them
is strongly connected.
1
These are “walks” in 6.042. A “path” in 6.042 does not repeat vertices, which we would call a simple path.
Graph Representations
There are many ways to represent a graph in code. The most common way is to store a Set data
structure Adj mapping each vertex u to another data structure Adj(u) storing the adjacencies of
u, i.e., the set of vertices that are accessible from u via a single outgoing edge. This inner data
structure is called an adjacency list. Note that we don’t store the edge pairs explicitly; we store
only the out-going neighbor vertices for each vertex. When vertices are uniquely labeled from 0
to |V | − 1, it is common to store the top-level Set Adj within a direct access array of length |V |,
where array slot i points to the adjacency list of the vertex labeled i. Otherwise, if the vertices
are not labeled in this way, it is also common to use a hash table to map each u ∈ V to Adj(u).
Then, it is common to store each adjacency list Adj(u) as a simple unordered array of the outgoing
adjacencies. For example, the following are adjacency list representations of G1 and G2 , using a
direct access array for the top-level Set and an array for each adjacency list.
Using an array for an adjacency list is a perfectly good data structure if all you need to do is loop
over the edges incident to a vertex (which will be the case for all algorithms we will discuss in
this class, so will be our default implementation). Each edge appears in any adjacency list at most
twice, so the size of an adjacency list representation implemented using arrays is Θ(|V | + |E|).
A drawback of this representation is that determining whether your graph contains a given edge
(u, v) might require Ω(|V |) time to step through the array representing the adjacency list of u or v.
We can overcome this obstacle by storing adjacency lists using hash tables instead of regular un-
sorted arrays, which will support edge checking in expected O(1) time, still using only Θ(|V |+|E|)
space. However, we won’t need this operation for our algorithms, so we will assume the simpler
unsorted-array-based adjacency list representation. Below are representations of G1 and G2 that
use a hash table for both the outer Adj Set and the inner adjacency lists Adj(u), using Python
dictionaries:
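For example, here is a sketch of such representations of G1 and G2, derived from the edge sets listed earlier, using a dict of sets:

A1 = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: set()}          # directed graph G1
A2 = {0: {1, 3, 4}, 1: {0}, 2: {3}, 3: {0, 2}, 4: {0}}   # undirected graph G2 (each edge appears twice)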
Breadth-First Search
Given a graph, a common query is to find the vertices reachable by a path from a queried vertex
s. A breadth-first search (BFS) from s discovers the level sets of s: level Li is the set of ver-
tices reachable from s via a shortest path of length i (not reachable via a path of shorter length).
Breadth-first search discovers levels in increasing order starting with i = 0, where L0 = {s} since
the only vertex reachable from s via a path of length i = 0 is s itself. Then any vertex reach-
able from s via a shortest path of length i + 1 must have an incoming edge from a vertex whose
shortest path from s has length i, so it is contained in level Li . So to compute level Li+1 , include
every vertex with an incoming edge from a vertex in Li , that has not already been assigned a level.
By computing each level from the preceding level, a growing frontier of vertices will be explored
according to their shortest path length from s.
Below is Python code implementing breadth-first search for a graph represented using index-
labeled adjacency lists, returning a parent label for each vertex in the direction of a shortest path
back to s. Parent labels (pointers) together determine a BFS tree from vertex s, containing some
shortest path from s to every other vertex in the graph.
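Below is a sketch of one standard level-by-level implementation; line numbers are included because the running-time analysis that follows refers to them.

1 def bfs(Adj, s):                          # Adj: adjacency list, s: starting vertex
2     parent = [None for v in Adj]          # O(V) (use hash if unlabeled)
3     parent[s] = s                         # O(1) root
4     level = [[s]]                         # O(1) initialize levels
5     while 0 < len(level[-1]):             # while the last level contains vertices
6         level.append([])                  # O(1) amortized, make new level
7         for v in level[-2]:               # loop over the last full level
8             for u in Adj[v]:              # O(|Adj[v]|) loop over neighbors of v
9                 if parent[u] is None:     # O(1) parent not yet assigned
10                     parent[u] = v         # O(1) assign parent
11                     level[-1].append(u)   # O(1) amortized, add to next level
12     return parent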
How fast is breadth-first search? In particular, how many times can the inner loop on lines 9–11
be executed? A vertex is added to any level at most once in line 11, so the loop in line 7 processes
each vertex v at most once. The loop in line 8 cycles through all deg(v) outgoing edges from
vertex v. Thus the inner loop is repeated at most O(Σ_{v∈V} deg(v)) = O(|E|) times. Because the
parent array returned has length |V |, breadth-first search runs in O(|V | + |E|) time.
Exercise: For graphs G1 and G2 , conducting a breadth-first search from vertex v0 yields the parent
labels and level sets below.
We can use parent labels returned by a breadth-first search to construct a shortest path from a vertex
s to vertex t, following parent pointers from t backward through the graph to s. Below is Python
code to compute the shortest path from s to t which also runs in worst-case O(|V | + |E|) time.
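A sketch of that reconstruction, built on the bfs sketch above (the function name is an assumption):

def unweighted_shortest_path(Adj, s, t):    # O(V + E)
    parent = bfs(Adj, s)                    # O(V + E) parent pointers of a BFS tree from s
    if parent[t] is None:                   # O(1) t is not reachable from s
        return None
    i, path = t, [t]                        # O(1) start from t
    while i != s:                           # O(V) walk parent pointers back to s
        i = parent[i]
        path.append(i)
    return path[::-1]                       # O(V) return the path from s to t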
Exercise: Given an unweighted graph G = (V, E), find a shortest path from s to t having an odd
number of edges.
Solution: Construct a new graph G′ = (V′, E′). For every vertex u in V, construct two vertices
u_E and u_O in V′: these represent reaching the vertex u through an even and odd number of edges,
respectively. For every edge (u, v) in E, construct the edges (u_E, v_O) and (u_O, v_E) in E′. Run
breadth-first search on G′ from s_E to find the shortest path from s_E to t_O. Because G′ is bipartite
between even and odd vertices, even paths from s_E will always end at even vertices, and odd paths
will end at odd vertices, so finding a shortest path from s_E to t_O will represent a path of odd length
in the original graph. Because G′ has 2|V| vertices and 2|E| edges, constructing G′ and running
breadth-first search from s_E each take O(|V| + |E|) time.
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 10
Recitation 10
Depth-First Search
A breadth-first search discovers vertices reachable from a queried vertex s level-by-level outward
from s. A depth-first search (DFS) also finds all vertices reachable from s, but does so by search-
ing undiscovered vertices as deep as possible before exploring other branches. Instead of exploring
all neighbors of s one after another as in a breadth-first search, depth-first searches as far as possi-
ble from the first neighbor of s before searching any other neighbor of s. Just as with breadth-first
search, depth-first search returns a set of parent pointers for vertices reachable from s in the order
the search discovered them, together forming a DFS tree. However, unlike a BFS tree, a DFS tree
will not represent shortest paths in an unweighted graph. (Additionally, DFS returns an order on
vertices discovered which will be discussed later.) Below is Python code implementing a recursive
depth-first search for a graph represented using index-labeled adjacency lists.
1 def dfs(Adj, s, parent = None, order = None): # Adj: adjacency list, s: start
2 if parent is None: # O(1) initialize parent list
3 parent = [None for v in Adj] # O(V) (use hash if unlabeled)
4 parent[s] = s # O(1) root
5 order = [] # O(1) initialize order array
6 for v in Adj[s]: # O(Adj[s]) loop over neighbors
7 if parent[v] is None: # O(1) parent not yet assigned
8 parent[v] = s # O(1) assign parent
9 dfs(Adj, v, parent, order) # Recursive call
10 order.append(s) # O(1) amortized
11 return parent, order
How fast is depth-first search? A recursive dfs call is performed only when a vertex does not have
a parent pointer, and is given a parent pointer immediately before the recursive call. Thus dfs is
called on each vertex at most once. Further, the amount of work done by each recursive search
from vertex v is proportional to the out-degree deg(v) of v. Thus, the amount of work done by
depth-first search is O(Σ_{v∈V} deg(v)) = O(|E|). Because the parent array returned has length |V|,
depth-first search runs in O(|V | + |E|) time.
Exercise: Describe a graph on n vertices for which BFS and DFS would first visit vertices in the
same order.
Solution: Many possible solutions. Two solutions are a chain of vertices from v, or a star graph
with an edge from v to every other vertex.
For historical reasons (primarily for its connection to topological sorting as discussed later) depth-
first search is often used to refer to both a method to search a graph from a specific vertex, and
as a method to search an entire graph (as in graph explore). You may do the same when answering
problems in this class.
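A sketch of such a whole-graph search, named graph_explore to match the text; it restarts dfs from every vertex that has not yet been assigned a parent:

def graph_explore(Adj):                     # O(V + E) search the entire graph
    parent = [None for v in Adj]            # O(V) parent pointers
    order = []                              # O(1) DFS finishing order
    for v in range(len(Adj)):               # O(V) loop over all vertices
        if parent[v] is None:               # v not yet reached by any search
            parent[v] = v                   # v roots a new DFS tree
            dfs(Adj, v, parent, order)      # search everything reachable from v
    return parent, order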
Exercise: Draw a graph, run DFS from a vertex, and classify each edge relative to the DFS tree.
Show that forward and cross edges cannot occur when running DFS on an undirected graph.
Solution: While performing a depth-first search, keep track of the set of ancestors of each vertex
in the DFS tree during the search (in a direct access array or a hash table). When processing
neighbor v of s in dfs(Adj, s), if v is an ancestor of s, then (s, v) is a back edge, and certifies a
cycle in the graph.
Topological Sort
A directed graph containing no directed cycle is called a directed acyclic graph or a DAG. A
topological sort of a directed acyclic graph G = (V, E) is a linear ordering of the vertices such
that for each edge (u, v) in E, vertex u appears before vertex v in the ordering. In the dfs func-
tion, vertices are added to the order list in the order in which their recursive DFS call finishes. If
the graph is acyclic, the order returned by dfs (or graph search) is the reverse of a topolog-
ical sort order. Proof by cases. One of dfs(u) or dfs(v) is called first. If dfs(u) was called
before dfs(v), dfs(v) will start and end before dfs(u) completes, so v will appear before u
in order. Alternatively, if dfs(v) was called before dfs(u), dfs(u) cannot be called before
dfs(v) completes, or else a path from v to u would exist, contradicting that the graph is acyclic;
so v will be added to order before vertex u. Reversing the order returned by DFS will then repre-
sent a topological sort order on the vertices.
Exercise: A high school contains many student organizations, each with its own hierarchical struc-
ture. For example, the school’s newspaper has an editor-in-chief who oversees all students con-
tributing to the newspaper, including a food-editor who oversees only students writing about school
food. The high school’s principal needs to line students up to receive diplomas at graduation, and
wants to recognize student leaders by giving a diploma to student a before student b whenever a
oversees b in any student organization. Help the principal determine an order to give out diplomas
that respects student organization hierarchy, or prove to the principal that no such order exists.
Solution: Construct a graph with one vertex per student, and a directed edge from student a to b if
student a oversees student b in some student organization. If this graph contains a cycle, the princi-
pal is out of luck. Otherwise, a topological sort of the students according to this graph will satisfy
the principal’s request. Run DFS on the graph (exploring the whole graph as in graph explore)
to obtain an order of DFS vertex finishing times in O(|V | + |E|) time. While performing the DFS,
keep track of the ancestors of each vertex in the DFS tree, and evaluate if each new edge processed
is a back edge. If a back edge is found from vertex u to v, follow parent pointers back to v from u to
obtain a directed cycle in the graph to prove to the principal that no such order exists. Otherwise, if
no cycle is found, the graph is acyclic and the order returned by DFS is the reverse of a topological
sort, which may then be returned to the principal.
We’ve made a CoffeeScript graph search visualizer which you can find here:
https://codepen.io/mit6006/pen/dgeKEN
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 11
Recitation 11
Weighted Graphs
For many applications, it is useful to associate a numerical weight to edges in a graph. For example,
a graph modeling a road network might weight each edge with the length of a road corresponding
to the edge, or a graph modeling an online dating network might contain edges from one user to
another weighted by directed attraction. A weighted graph is then a graph G = (V, E) together
with a weight function w : E → R, mapping edges to real-valued weights. In practice, edge
weights will often not be represented by a separate function at all; instead, each weight is stored
as a value in an adjacency matrix, or inside an edge object stored in an adjacency list or
set. For example, below are randomly weighted adjacency set representations of the graphs from
Recitation 9. A function to extract such weights might be: def w(u,v): return W[u][v].
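For concreteness, here is a sketch of one way such a W could be stored for the directed graph G1 of Recitation 9, as a dict of dicts mapping u to v to w(u, v); the weight values shown are illustrative placeholders only:

W1 = {0: {1: -2},                           # made-up example weights, not the original figure's
      1: {2:  0},
      2: {0:  1},
      3: {4:  3},
      4: {}}

def w(u, v): return W1[u][v]                # O(1) weight lookup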
Now that you have an idea of how weights could be stored, for the remainder of this class you
may simply assume that a weight function w can be stored using O(|E|) space, and can return the
weight of an edge in constant time1 . When referencing the weight of an edge e = (u, v), we will
often use the notation w(u, v) interchangeably with w(e) to refer to the weight of an edge.
Exercise: Represent graphs W1 and W2 as adjacency matrices. How could you store weights in an
adjacency list representation?
In fact, when a graph contains a cycle (a path starting and ending at the same vertex) that has
negative weight, then some shortest paths might not even exist, because for any path containing
a vertex from the negative weight cycle, a shorter path can be found by adding a tour around the
cycle. If any path from s to some vertex v contains a vertex from a negative weight cycle, we will
say the shortest path from s to v is undefined, with weight −∞. If no path exists from s to v, then
we will say the shortest path from s to v is undefined, with weight +∞. In addition to breadth-first
search, we will present three additional algorithms to compute single source shortest paths that
cater to different types of weighted graphs.
Weighted Single Source Shortest Path Algorithms

  Restrictions             |  SSSP Algorithm
  Graph     Weights        |  Name             Running Time O(·)
  -------   ------------   |  --------------   -------------------
  General   Unweighted     |  BFS              |V| + |E|
  DAG       Any            |  DAG Relaxation   |V| + |E|
  General   Any            |  Bellman-Ford     |V| · |E|
  General   Non-negative   |  Dijkstra         |V| log |V| + |E|
Relaxation
We’ve shown you one view of relaxation in lecture. Below is another framework by which you can
view DAG relaxation. As a general algorithmic paradigm, a relaxation algorithm searches for a
solution to an optimization problem by starting with a solution that is not optimal, then iteratively
improves the solution until it becomes an optimal solution to the original problem. In the single
source shortest paths problem, we would like to find the weight δ(s, v) of a shortest path from
source s to each vertex v in a graph. As a starting point, for each vertex v we will initialize an
upper bound estimate d(s, v) on the shortest path weight from s to v: +∞ for all d(s, v) except
d(s, s) = 0. During the relaxation algorithm, we will repeatedly relax some path estimate d(s, v),
decreasing it toward the true shortest path weight δ(s, v). If ever d(s, v) = δ(s, v), we say that
estimate d(s, v) is fully relaxed. When all shortest path estimates are fully relaxed, we will have
solved the original problem. Then an algorithm to find shortest paths could take the following
form:
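A sketch of that general form is shown below; line numbers are included because the discussion that follows refers to lines 5–6, and line 6 is deliberately left as pseudocode.

1 def general_relax(Adj, w, s):             # Adj: adjacency list, w: weights, s: source
2     d = [float('inf') for _ in Adj]       # shortest path estimates d(s, v)
3     parent = [None for _ in Adj]          # initialize parent pointers
4     d[s], parent[s] = 0, s                # initialize source
5     while True:                           # repeat...
6         "decrease some estimate d(s, v) toward δ(s, v)"   # (pseudocode)
7     return d, parent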
There are a number of problems with this algorithm, not least of which is that it never terminates!
But if we can repeatedly decrease each shortest path estimates to fully relax each d(s, v), we will
have found shortest paths. How do we ‘relax’ vertices and when do we stop relaxing?
To relax a shortest path estimate d(s, v), we will relax an incoming edge to v, from another vertex
u. If we maintain that d(s, u) always upper bounds the shortest path from s to u for all u ∈ V ,
then the true shortest path weight δ(s, v) can’t be larger than d(s, u) + w(u, v) or else going to u
along a shortest path and traversing the edge (u, v) would be a shorter path2 . Thus, if at any time
d(s, u) + w(u, v) < d(s, v), we can relax the edge by setting d(s, v) = d(s, u) + w(u, v), strictly
improving our shortest path estimate.
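As a sketch, this single-edge relaxation step might be written as follows (the name try_to_relax is an assumption; a parent pointer is updated alongside the estimate so that shortest paths can be reconstructed):

def try_to_relax(Adj, w, d, parent, u, v):  # O(1) relax edge (u, v) if possible
    if d[v] > d[u] + w(u, v):               # does (u, v) violate the triangle inequality?
        d[v] = d[u] + w(u, v)               # improve the estimate at v
        parent[v] = u                       # remember the better path through u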
If we only change shortest path estimates via relaxation, then we can prove that the shortest path
estimates will never become smaller than true shortest paths.
If ever we arrive at an assignment of all shortest path estimates such that no edge in the graph can
be relaxed, then we can prove that shortest path estimates are in fact shortest path distances.
Termination Lemma: If no edge can be relaxed, then d(s, v) ≤ δ(s, v) for all v ∈ V .
Proof. Suppose for contradiction that δ(s, v) < d(s, v) for some vertex v, and let π be a
shortest path from s to v. Let (a, b) be the first edge of π such that d(s, b) > δ(s, b), so
d(s, a) ≤ δ(s, a). Since sub-paths of shortest paths are shortest paths, δ(s, b) = δ(s, a) + w(a, b),
so d(s, a) + w(a, b) ≤ δ(s, b) < d(s, b), meaning edge (a, b) can be relaxed, a contradiction.
So, we can change lines 5-6 of the general relaxation algorithm to repeatedly relax edges from the
graph until no edge can be further relaxed.
It remains to analyze the running time of this algorithm, which cannot be determined unless we
provide detail for how this algorithm chooses edges to relax. If there exists a negative weight cycle
in the graph reachable from s, this algorithm will never terminate as edges along the cycle could
be relaxed forever. But even for acyclic graphs, this algorithm could take exponential time.
2
This is a special case of the triangle inequality: δ(a, c) ≤ δ(a, b) + δ(b, c) for all a, b, c ∈ V .
Exponential Relaxation
How many modifying edge relaxations could occur in an acyclic graph before all edges are fully
relaxed? Below is a weighted directed graph on 2n + 1 vertices and 3n edges for which the
relaxation framework could perform an exponential number of modifying relaxations, if edges are
relaxed in a bad order.
(Figure: the n-section graph; within each section, the three edges are labeled left, top, and right.)
This graph contains n sections, with section i containing three edges, (v_{2i}, v_{2i+1}), (v_{2i}, v_{2i+2}), and
(v_{2i+1}, v_{2i+2}), each with weight 2^{n−i}; we will call these edges within a section left, top, and right
respectively. In this construction, the lowest weight path from v_0 to v_i is achieved by traversing
top edges until v_i’s section is reached. Shortest paths from v_0 can easily be found by performing
only a linear number of modifying edge relaxations: relax the top and left edges of each successive
section. However, a bad relaxation order might result in many more modifying edge relaxations.
To demonstrate a bad relaxation order, initialize all minimum path weight estimates to +∞, except
d(s, s) = 0 for source s = v_0. First relax the left edge, then the right edge of section 0, updating
the shortest path estimate at v_2 to d(s, v_2) = 2^n + 2^n = 2^{n+1}. In actuality, the shortest path from
v_0 to v_2 is via the top edge, i.e., δ(s, v_2) = 2^n. But before relaxing the top edge of section 0,
recursively apply this procedure to fully relax the remainder of the graph, from section 1 to n − 1,
computing shortest path estimates based on the incorrect value of d(s, v_2) = 2^{n+1}. Only then relax
the top edge of section 0, after which d(s, v_2) is modified to its correct value 2^n. Lastly, fully relax
sections 1 through n − 1 one more time recursively, to their correct and final values.
How many modifying edge relaxations are performed by this edge relaxation ordering? Let T(n)
represent the number of modifying edge relaxations performed by the procedure on a graph con-
taining n vertices, with recurrence relation given by T(n) = 3 + 2T(n − 2), since each section uses
three modifying relaxations and twice recursively relaxes the remaining graph on two fewer vertices.
The solution to this recurrence is T(n) = Θ(2^{n/2}), exponential in the size of the graph. Perhaps
there exists some edge relaxation order requiring only a polynomial number of modifying edge relaxations?
DAG Relaxation
In a directed acyclic graph (DAG), there can be no negative weight cycles, so eventually relaxation
must terminate. It turns out that relaxing each outgoing edge from every vertex exactly once in
a topological sort order of the vertices, correctly computes shortest paths. This shortest paths
algorithm is sometimes called DAG Relaxation.
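A sketch of DAG Relaxation, assuming a topological order obtained from a full depth-first search (e.g., the graph_explore sketch in R10) and the try_to_relax step from above:

def DAG_relaxation(Adj, w, s):              # Adj: adjacency list, w: weights, s: source
    _, order = graph_explore(Adj)           # O(V + E) DFS finishing order
    order.reverse()                         # reverse finishing order = topological order
    d = [float('inf') for _ in Adj]         # shortest path estimates d(s, v)
    parent = [None for _ in Adj]            # initialize parent pointers
    d[s], parent[s] = 0, s                  # initialize source
    for u in order:                         # relax each out-edge exactly once,
        for v in Adj[u]:                    # in topological sort order
            try_to_relax(Adj, w, d, parent, u, v)
    return d, parent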
Claim: The DAG Relaxation algorithm computes shortest paths in a directed acyclic graph.
Proof. We prove that at termination, d(s, v) = δ(s, v) for all v ∈ V . First observe that Safety
ensures that a vertex not reachable from s will retain d(s, v) = +∞ at termination. Alterna-
tively, consider any shortest path π = (v1 , . . . , vm ) from v1 = s to any vertex vm = v reach-
able from s. The topological sort order ensures that edges of the path are relaxed in the order in
which they appear in the path. Assume for induction that before edge (vi , vi+1 ) ∈ π is relaxed,
d(s, vi ) = δ(s, vi ). Setting d(s, s) = 0 at the start provides a base case. Then relaxing edge
(vi , vi+1 ) sets d(s, vi+1 ) = δ(s, vi ) + w(vi , vi+1 ) = δ(s, vi+1 ), as sub-paths of shortest paths are
also shortest paths. Thus the procedure constructs shortest path weights as desired. Since depth-
first search runs in linear time and the loops relax each edge exactly once, this algorithm takes
O(|V | + |E|) time.
Exercise: You have been recruited by MIT to take part in a new part time student initiative where
you will take only one class per term. You don’t care about graduating; all you really want to do
is to take 19.854, Advanced Quantum Machine Learning on the Blockchain: Neural Interfaces,
but are concerned because of its formidable set of prerequisites. MIT professors will allow you
take any class as long as you have taken at least one of the class’s prerequisites prior to taking the
class. But passing a class without all the prerequisites is difficult. From a survey of your peers, you
know for each class and prerequisite pair, how many hours of stress the class will demand. Given
a list of classes, prerequisites, and surveyed stress values, describe a linear time algorithm to find a
sequence of classes that minimizes the amount of stress required to take 19.854, never taking more
than one prerequisite for any class. You may assume that every class is offered every semester.
Solution: Build a graph with a vertex for every class and a directed edge from class a to class b if b is a prerequisite of a, weighted by the stress of taking class a after having taken class b as a prerequisite. Use DAG Relaxation (topological sort relaxation) to find a shortest path from class 19.854 to every other class. From among the classes with no prerequisites (sinks of the DAG), find one whose shortest path from 19.854 has minimum total stress, and return that path reversed, as sketched below.
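One way this might look in code, reusing the DAG_relaxation sketch above (the input format — a list of class names and a list of (class, prerequisite, stress) triples — and the function name are hypothetical):

def least_stress_schedule(classes, prereqs, target='19.854'):
    idx = {c: i for i, c in enumerate(classes)}          # map class names to vertex indices
    Adj = [[] for _ in classes]
    w = {}
    for cls, pre, stress in prereqs:                     # edge from a class to its prerequisite
        Adj[idx[cls]].append(idx[pre])
        w[(idx[cls], idx[pre])] = stress
    d, parent = DAG_relaxation(Adj, w, idx[target])      # shortest stress paths from 19.854
    sinks = [i for i in range(len(classes))              # classes with no prerequisites,
             if not Adj[i] and d[i] < float('inf')]      # reachable from 19.854
    best = min(sinks, key=lambda i: d[i])                # sink with minimum total stress
    order = []
    while best != idx[target]:                           # walk parent pointers back to 19.854,
        order.append(classes[best])                      # yielding classes in the order taken
        best = parent[best]
    order.append(target)
    return order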
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 12
Recitation 12
Bellman-Ford
In lecture, we presented a version of Bellman-Ford¹ based on graph duplication and DAG Relaxation that solves SSSPs in O(|V||E|) time and space, and can return a negative-weight cycle reachable on a path from s to v, for any vertex v with δ(s, v) = −∞.
The original Bellman-Ford algorithm is easier to state but is a little less powerful. It solves SSSPs
in the same time using only O(|V |) space, but only detects whether a negative-weight cycle exists
(will not return such a negative weight cycle). It is based on the relaxation framework discussed in
R11. The algorithm is straightforward: initialize distance estimates, and then relax every edge in the graph in |V| − 1 rounds. The claim is: if the graph does not contain negative-weight cycles, then d(s, v) = δ(s, v) for all v ∈ V at termination; otherwise, if any edge is still relaxable (i.e., still violates the triangle inequality), the graph contains a negative-weight cycle. A Python implementation of the Bellman-Ford algorithm is given below.
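A minimal sketch of such an implementation, assuming the graph is given as an adjacency list Adj indexed by vertex and a dictionary w mapping each edge (u, v) to its weight:

1 def bellman_ford(Adj, w, s):
2     infinity = float('inf')
3     d = [infinity for _ in Adj]             # shortest path estimates d(s, v)
4     parent = [None for _ in Adj]            # initialize parent pointers
5     d[s], parent[s] = 0, s                  # source has distance 0
6     V = len(Adj)                            # number of vertices
7     def relax(u, v):                        # relax (u, v) if it violates the triangle inequality
8         if d[u] + w[(u, v)] < d[v]: d[v], parent[v] = d[u] + w[(u, v)], u
9     for k in range(V - 1):                  # |V| - 1 rounds
10        for u in range(V):                  # relax every edge of the graph
11            for v in Adj[u]:
12                relax(u, v)
13    # any edge that is still relaxable witnesses a negative-weight cycle
14    for u in range(V):
15        for v in Adj[u]:
16            if d[u] + w[(u, v)] < d[v]:
17                raise Exception('negative-weight cycle detected')
18    return d, parent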
This algorithm has the same overall structure as the general relaxation paradigm, but limits the
order in which edges can be processed. In particular, the algorithm relaxes every edge of the graph
(lines 10-12), in a series of |V | − 1 rounds (line 9). The following lemma establishes correctness
of the algorithm.
¹ This algorithm is called Bellman-Ford after two researchers who independently proposed the same algorithm in different contexts.
Lemma 1 At the end of relaxation round i of Bellman-Ford, d(s, v) = δ(s, v) for any vertex v that
has a shortest path from s to v which traverses at most i edges.
Proof. By induction on round i. At the start of the algorithm (at the end of round 0), the only vertex with a shortest path from s traversing at most 0 edges is vertex s, and Bellman-Ford correctly sets d(s, s) = 0 = δ(s, s). Now suppose the claim is true at the end of round i − 1, and let v be a vertex with a shortest path from s traversing at most i edges. If v has a shortest path from s traversing at most i − 1 edges, then d(s, v) = δ(s, v) prior to round i, and this will continue to hold at the end of round i by the upper-bound property². Otherwise, d(s, v) ≠ δ(s, v) prior to round i; let u be the second-to-last vertex visited along some shortest path from s to v that traverses exactly i edges. Some shortest path from s to u traverses at most i − 1 edges, so d(s, u) = δ(s, u) prior to round i. Then after the edge from u to v is relaxed during round i, d(s, v) = δ(s, v) as desired.
If the graph does not contain negative weight cycles, some shortest path is simple, and contains at
most |V |−1 edges as it traverses any vertex of the graph at most once. Thus after |V |−1 rounds of
Bellman-Ford, d(s, v) = δ(s, v) for every vertex with a simple shortest path from s to v. However,
if after |V | − 1 rounds of relaxation, some edge (u, v) still violates the triangle inequality (lines
14-17), then there exists a path from s to v using |V | edges which has lower weight than all paths
using fewer edges. Such a path cannot be simple, so it must contain a negative weight cycle.
This algorithm runs |V| − 1 relaxation rounds plus a final check over all edges, and each pass performs a constant amount of work per edge of the graph, so Bellman-Ford runs in O(|V||E|) time. Note that lines 10-11 actually take O(|V| + |E|) time to loop over the entire adjacency list structure, even for vertices adjacent to no edge. If the graph contains isolated vertices other than s, we can just remove them from Adj to ensure that |V| = O(|E|). Note that if edges are processed in a topological sort order with respect to a shortest path tree from s, then Bellman-Ford will correctly compute shortest paths from s after its first round; of course, it is not easy to find such an order. However, for many graphs, significant savings can be obtained by stopping Bellman-Ford after any round in which no edge relaxation modifies a distance estimate.
Note that this algorithm is different from the one presented in lecture in two important ways:
• The original Bellman-Ford only keeps track of one ‘layer’ of d(s, v) estimates in each round, while the lecture version keeps track of dk(s, v) for k ∈ {0, . . . , |V|}, which can then be used to construct negative-weight cycles.
• A distance estimate d(s, v) in round k of original Bellman-Ford does not necessarily equal dk(s, v), the k-edge distance to v computed in the lecture version. This is because the original Bellman-Ford may relax multiple edges along a shortest path to v in a single round, while the lecture version relaxes at most one edge per level. In other words, the distance estimate d(s, v) in round k of original Bellman-Ford is never larger than dk(s, v), but it may be much smaller and converge to a solution more quickly than the lecture version, so it may be faster in practice.
² Recall that the Safety Lemma from Recitation 11 ensures that relaxation maintains δ(s, v) ≤ d(s, v) for all v.
Exercise: Alice, Bob, and Casey are best friends who live in different corners of a rural school
district. During the summer, they decide to meet every Saturday at some intersection in the district
to play tee-ball. Each child will bike to the meeting location from their home along dirt roads.
Each dirt road between road intersections has a level of fun associated with biking along it in a
certain direction, depending on the incline and quality of the road, the number of animals passed,
etc. Road fun-ness may be positive, but could also be negative, e.g. when a road is difficult to traverse in a given direction, or passes by a scary dog, etc. The children would like to: choose
a road intersection to meet and play tee-ball that maximizes the total fun of all three children
in reaching their chosen meeting location; or alternatively, abandon tee-ball altogether in favor
of biking, if a loop of roads exists in their district along which they can bike all day with ever
increasing fun. Help the children organize their Saturday routine by finding a tee-ball location,
or determining that there exists a continuously fun bike loop in their district (for now, you do not
have to find such a loop). You may assume that each child can reach any road in the district by bike.
Solution: Construct a graph on road intersections within the district, as well as the locations a, b,
and c of the homes of the three children, with a directed edge from one vertex to another if there
is a road between them traversable in that direction by bike, weighted by negative fun-ness of the
road. If a negative weight cycle exists in this graph, such a cycle would represent a continuously
fun bike loop. To check for the existence of any negative weight cycle in the graph, run Bellman-
Ford from vertex a. If Bellman-Ford detects a negative weight cycle by finding an edge (u, v)
that can be relaxed in round |V |, return that a continuously fun bike loop exists. Alternatively, if
no negative weight cycle exists, minimum-weight paths correspond to bike routes that maximize fun. Running Bellman-Ford from vertex a then computes shortest path distances d(a, v) from a to each
vertex v in the graph. Run Bellman-Ford two more times, once from vertex b and once from vertex
c, computing shortest paths values d(b, v) and d(c, v) respectively for each vertex v in the graph.
Then for each vertex v, compute the sum d(a, v) + d(b, v) + d(c, v). A vertex that minimizes this
sum will correspond to a road intersection that maximizes total fun of all three children in reaching
it. This algorithm runs Bellman-Ford three times and then compares a constant sized sum at each
vertex, so this algorithm runs in O(|V ||E|) time.
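As a rough sketch in code, reusing the bellman_ford sketch from Recitation 12 (the input format — an adjacency list Adj and a dictionary fun mapping each directed road (u, v) to its fun-ness — is an illustrative assumption):

def plan_saturday(Adj, fun, a, b, c):
    w = {e: -fun[e] for e in fun}                        # negate fun-ness to get edge weights
    try:
        d_a, _ = bellman_ford(Adj, w, a)                 # most-fun routes from each home
        d_b, _ = bellman_ford(Adj, w, b)
        d_c, _ = bellman_ford(Adj, w, c)
    except Exception:                                    # a negative-weight cycle was detected:
        return None                                      # a continuously fun bike loop exists
    return min(range(len(Adj)),                          # intersection maximizing total fun
               key=lambda v: d_a[v] + d_b[v] + d_c[v])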
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 13
Recitation 13
Dijkstra’s Algorithm
Dijkstra is possibly the most commonly used weighted shortest paths algorithm; it is asymptotically faster than Bellman-Ford, but only applies to graphs containing non-negative edge weights, which appear often in many applications. The algorithm is fairly intuitive, though its implementation can be more complicated than that of other shortest path algorithms. Think of a weighted
graph as a network of pipes, each with non-negative length (weight). Then turn on a water faucet at
a source vertex s. Assuming the water flowing from the faucet traverses each pipe at the same rate,
the water will reach each pipe intersection vertex in the order of their shortest distance from the
source. Dijkstra’s algorithm discretizes this continuous process by repeatedly relaxing edges from
a vertex whose minimum weight path estimate is smallest among vertices whose out-going edges
have not yet been relaxed. In order to efficiently find the smallest minimum weight path estimate,
Dijkstra’s algorithm is often presented in terms of a minimum priority queue data structure. Dijkstra’s running time then depends on how efficiently the priority queue can perform its supported
operations. Below is Python code for Dijkstra’s algorithm in terms of priority queue operations.
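A minimal sketch of such an implementation, assuming the graph is an adjacency list Adj indexed by vertex, w is a dictionary mapping each edge (u, v) to its weight, and PriorityQueue supports the insert, extract_min, and decrease_key operations discussed in the next section:

1 def dijkstra(Adj, w, s):
2     d = [float('inf') for _ in Adj]         # shortest path estimates d(s, v)
3     parent = [None for _ in Adj]            # initialize parent pointers
4     d[s], parent[s] = 0, s                  # source has distance 0
5     Q = PriorityQueue()                     # initialize an empty priority queue
6     for v in range(len(Adj)):               # insert every vertex,
7         Q.insert(v, d[v])                   # keyed by its current estimate
8     for _ in range(len(Adj)):               # main loop: remove one vertex per iteration
9         u = Q.extract_min()                 # vertex with smallest estimate still in queue
10        for v in Adj[u]:                    # relax out-going edges from u
11            if d[u] + w[(u, v)] < d[v]: d[v], parent[v] = d[u] + w[(u, v)], u
12            Q.decrease_key(v, d[v])         # update v's key (no-op if v already removed)
13    return d, parent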
This algorithm follows the same structure as the general relaxation framework. Lines 2-4 initialize
shortest path weight estimates and parent pointers. Lines 5-7 initialize a priority queue with all
vertices from the graph. Lines 8-12 comprise the main loop. Each time the loop is executed, line
9 removes a vertex from the queue, so the queue will be empty at the end of the loop. The vertex
u processed in some iteration of the loop is a vertex from the queue whose shortest path weight
estimate is smallest, from among all vertices not yet removed from the queue. Then, lines 10-11
relax the out-going edges from u as usual. However, since relaxation may reduce the shortest path
weight estimate d(s, v), vertex v’s key in the queue must be updated (if it still exists in the queue);
line 12 accomplishes this update.
Why does Dijkstra’s algorithm compute shortest paths for a graph with non-negative edge weights?
The key observation is that the shortest path weight estimate of vertex u equals its actual shortest path weight, d(s, u) = δ(s, u), at the moment u is removed from the priority queue. Then by the upper-bound
property, d(s, u) = δ(s, u) will still hold at termination of the algorithm. A proof of correctness is
described in the lecture notes, and will not be repeated here. Instead, we will focus on analyzing
running time for Dijkstra implemented using different priority queues.
Exercise: Construct a weighted graph with non-negative edge weights, and apply Dijkstra’s algo-
rithm to find shortest paths. Specifically list the key-value pairs stored in the priority queue after
each iteration of the main loop, and highlight edges corresponding to constructed parent pointers.
Priority Queues
An important aspect of Dijkstra’s algorithm is the use of a priority queue. The priority queue
interface used here differs slightly from our presentation of priority queues earlier in the term.
Here, a priority queue maintains a set of key-value pairs, where vertex v is a value and d(s, v) is its
key. Aside from empty initialization, the priority queue supports three operations: insert(val, key) adds a key-value pair to the queue, extract_min() removes and returns a value from the queue whose key is minimum, and decrease_key(val, new_key) reduces the key of a given value stored in the queue to the provided new key. The running time of Dijkstra depends on the running times of these operations. Specifically, if Ti, Te, and Td are the respective running times for inserting a key-value pair, extracting a value with minimum key, and decreasing the key of a value, the running time of Dijkstra will be:
T_Dijkstra = O(|V| · Ti + |V| · Te + |E| · Td),
since each vertex is inserted and extracted exactly once, and each edge relaxation triggers at most one decrease-key.
If the graph is sparse, |E| = O(|V|), we can speed things up beyond a simple dictionary-based priority queue (O(1) insertion and decrease-key but O(|V|) extraction, for O(|V|²) total) with more sophisticated priority queue implementations. We’ve seen that a binary min heap can implement insertion and extract-min in O(log n) time. However, decreasing the key of a value stored in a priority queue requires finding the value in the heap in order to change its key, which naively could take linear time. This difficulty is easily addressed: each vertex can maintain a pointer to its stored location within the heap, or the heap can maintain a mapping from values (vertices) to locations within the heap (you were asked to do this in Problem Set 5). Either solution supports finding a given value in the heap in constant time. Then, after decreasing the value’s key, one can restore the min heap property in logarithmic time by re-heapifying along the path to the root. Since a binary heap can support each of the three operations in O(log |V|) time, the running time of Dijkstra will be:
T_Heap = O((|V| + |E|) log |V|).
For sparse graphs, that’s O(|V | log |V |)! For graphs in between sparse and dense, there is an even
more sophisticated priority queue implementation using a data structure called a Fibonacci Heap,
which supports amortized O(1) time insertion and decrease-key operations, along with O(log n)
minimum extraction. Thus using a Fibonacci Heap to implement the Dijkstra priority queue leads
to the following worst-case running time:
T_FibHeap = O(|V| log |V| + |E|).
We won’t be talking much about Fibonacci Heaps in this class, but they’re theoretically useful for
speeding up Dijkstra on graphs that have a number of edges asymptotically in between linear and
quadratic in the number of graph vertices. You may quote the Fibonacci Heap running time bound
whenever you need to argue the running time of Dijkstra when solving theory questions.
1 class Item:
2 def __init__(self, label, key):
3 self.label, self.key = label, key
4
5 class PriorityQueue: # Binary Heap Implementation
6 def __init__(self): # stores keys with unique labels
7 self.A = []
8 self.label2idx = {}
9
10 def min_heapify_up(self, c):
11 if c == 0: return
12 p = (c - 1) // 2
13 if self.A[p].key > self.A[c].key:
14 self.A[c], self.A[p] = self.A[p], self.A[c]
15 self.label2idx[self.A[c].label] = c
16 self.label2idx[self.A[p].label] = p
17 self.min_heapify_up(p)
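One way to complete the remaining PriorityQueue operations (min_heapify_down, insert, extract_min, and decrease_key), following the same label2idx bookkeeping as the partial listing above — a sketch, not the original course implementation:

    def min_heapify_down(self, p):
        l, r, c = 2 * p + 1, 2 * p + 2, p
        if l < len(self.A) and self.A[l].key < self.A[c].key: c = l
        if r < len(self.A) and self.A[r].key < self.A[c].key: c = r
        if c != p:
            self.A[c], self.A[p] = self.A[p], self.A[c]
            self.label2idx[self.A[c].label] = c
            self.label2idx[self.A[p].label] = p
            self.min_heapify_down(c)

    def insert(self, label, key):            # insert a labeled key into the heap
        self.A.append(Item(label, key))
        idx = len(self.A) - 1
        self.label2idx[label] = idx
        self.min_heapify_up(idx)

    def extract_min(self):                   # remove and return a label with minimum key
        self.A[0], self.A[-1] = self.A[-1], self.A[0]    # assumes the queue is nonempty
        self.label2idx[self.A[0].label] = 0
        del self.label2idx[self.A[-1].label]
        min_label = self.A.pop().label
        self.min_heapify_down(0)
        return min_label

    def decrease_key(self, label, key):      # reduce a stored key, if the new key is smaller
        if label in self.label2idx:          # ignore labels that were already extracted
            idx = self.label2idx[label]
            if key < self.A[idx].key:
                self.A[idx].key = key
                self.min_heapify_up(idx)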
Fibonacci Heaps are not actually used very often in practice, as they are more complex to implement and result in larger constant factor overhead than the other two implementations described above.
When the number of edges in the graph is known to be at most linear (e.g., planar or bounded
degree graphs) or at least quadratic (e.g. complete graphs) in the number of vertices, then using a
binary heap or dictionary respectively will perform as well asymptotically as a Fibonacci Heap.
We’ve made a JavaScript Dijkstra visualizer which you can find here:
https://codepen.io/mit6006/pen/BqgXWM
Exercise: CIA officer Mary Cathison needs to drive to meet with an informant across an unwelcoming city. Some roads in the city are equipped with government surveillance cameras, and Mary
will be detained if cameras from more than one road observe her car on the way to her informant.
Mary has a map describing the length of each road and the locations and ranges of surveillance
cameras. Help Mary find the shortest drive to reach her informant, being seen by at most one
surveillance camera along the way.
Solution: Construct a graph having two vertices (v, 0) and (v, 1) for every road intersection v
within the city. Vertex (v, i) represents arriving at intersection v having already been spotted by
exactly i camera(s). For each road from intersection u to v: add two directed edges from (u, 0) to
(v, 0) and from (u, 1) to (v, 1) if traveling on the road will not be visible by a camera; and add one
directed edge from (u, 0) to (v, 1) if traveling on the road will be visible. If s is Mary’s start location
and t is the location of the informant, any path from (s, 0) to (t, 0) or (t, 1) in the constructed graph
will be a path visible by at most one camera. Let n be the number of road intersections and m be the
number of roads in the network. Assuming lengths of roads are positive, use Dijkstra’s algorithm
to find the shortest such path in O(m + n log n) time using a Fibonacci Heap for Dijkstra’s priority
queue. Alternatively, since the road network is likely planar and/or bounded degree, it may be
safe to assume that m = O(n), so a binary heap could be used instead to find a shortest path in
O(n log n) time.
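A sketch of this construction in code (the input format — n intersections and a list of (u, v, length, seen) road tuples — is hypothetical, and Python’s heapq with lazy deletion stands in for a priority queue with decrease_key):

import heapq

def safe_shortest_drive(n, roads, s, t):
    # layered graph: vertex (v, i) means 'at intersection v, seen by i cameras so far'
    Adj = {(v, i): [] for v in range(n) for i in (0, 1)}
    for u, v, length, seen in roads:
        if seen:
            Adj[(u, 0)].append(((v, 1), length))         # first camera moves to layer 1
        else:
            Adj[(u, 0)].append(((v, 0), length))         # unobserved roads stay in their layer
            Adj[(u, 1)].append(((v, 1), length))
    d = {x: float('inf') for x in Adj}
    d[(s, 0)] = 0
    pq = [(0, (s, 0))]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:                                    # stale queue entry, skip
            continue
        for v, length in Adj[u]:                         # relax out-going edges
            if du + length < d[v]:
                d[v] = du + length
                heapq.heappush(pq, (d[v], v))
    return min(d[(t, 0)], d[(t, 1)])                     # seen by at most one camera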
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 14
Recitation 14
We presented the graph algorithms from the last several lectures and recitations with respect to the SSSP problem, but along the way, we also showed how to use these algorithms to solve other problems. For example, we can also count
connected components in a graph using Full-DFS or Full-BFS, topologically sort vertices in a
DAG using DFS, and detect negative weight cycles using Bellman-Ford.
Johnson’s Algorithm
The idea behind Johnson’s Algorithm is to reduce the APSP problem on a graph with arbitrary edge weights to the APSP problem on a graph with non-negative edge weights. The algorithm does this by re-weighting the edges in the original graph to non-negative values in such a way that shortest paths in the re-weighted graph are also shortest paths in the original graph. Then finding shortest paths in the re-weighted graph by running Dijkstra |V| times will solve the original problem.
How can we re-weight edges in a way that preserves shortest paths? Johnson’s clever idea is to assign each vertex v a real number h(v), and change the weight of each edge (a, b) from w(a, b) to w′(a, b) = w(a, b) + h(a) − h(b), forming a new weighted graph G′ = (V, E, w′).
Claim: A shortest path (v1, v2, . . . , vk) in G′ is also a shortest path in G from v1 to vk.
Proof. Let w(π) = Σ_{i=1}^{k−1} w(vi, vi+1) be the weight of path π in G. Then the weight of π in G′ is:
Σ_{i=1}^{k−1} w′(vi, vi+1) = Σ_{i=1}^{k−1} (w(vi, vi+1) + h(vi) − h(vi+1)) = w(π) + h(v1) − h(vk),
since the h(vi) terms telescope.
So, since each path from v1 to vk is increased by the same number h(v1 ) − h(vk ), shortest paths
remain shortest.
It remains to find a vertex assignment function h for which all edge weights w′(a, b) in the modified graph are non-negative. Johnson’s algorithm defines h in the following way: add a new node x to G with a directed edge of weight 0 from x to v for each vertex v ∈ V to construct graph G∗, letting h(v) = δ(x, v). This assignment of h ensures that w′(a, b) ≥ 0 for every edge (a, b).
Claim: If h(v) = δ(x, v) and h(v) is finite, then w′(a, b) = w(a, b) + h(a) − h(b) ≥ 0 for every edge (a, b) ∈ E.
Proof. The claim is equivalent to claiming δ(x, b) ≤ w(a, b) + δ(x, a) for every edge (a, b) ∈ E, i.e. the minimum weight of any path from x to b in G∗ is not greater than the minimum weight of any path from x to a followed by the edge from a to b, which is true by the definition of minimum weight. (This is simply a restatement of the triangle inequality.)
Johnson’s algorithm computes h(v) = δ(x, v), the (possibly negative) minimum weight distances from the added node x, using Bellman-Ford. If δ(x, v) = −∞ for any vertex v, then there must be a negative weight cycle in the graph, and Johnson’s can terminate as no output is required. Otherwise, Johnson’s re-weights the edges of G to w′(a, b) = w(a, b) + h(a) − h(b) ≥ 0, producing a graph G′ containing only non-negative edge weights. Since shortest paths in G′ are shortest paths in G, we can run Dijkstra |V| times on G′ to find single-source shortest path distances δ′(u, v) from each vertex u in G′. Then we can compute each δ(u, v) by setting it to δ′(u, v) − δ(x, u) + δ(x, v). Johnson’s takes O(|V||E|) time to run Bellman-Ford, and O(|V|(|V| log |V| + |E|)) time to run Dijkstra |V| times, so this algorithm runs in O(|V|² log |V| + |V||E|) time, asymptotically better than O(|V|²|E|).
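Putting the pieces together, a sketch of Johnson’s algorithm that reuses the bellman_ford and dijkstra sketches from Recitations 12 and 13 (same assumed adjacency-list and weight-dictionary representation):

def johnson(Adj, w):
    V = len(Adj)
    Adj_star = [list(adj) for adj in Adj] + [list(range(V))]   # add node x = V
    w_star = dict(w)
    for v in range(V):
        w_star[(V, v)] = 0                       # weight-0 edges from x to every vertex
    h, _ = bellman_ford(Adj_star, w_star, V)     # h(v) = delta(x, v); raises if a
                                                 # negative-weight cycle exists
    w_prime = {(a, b): w[(a, b)] + h[a] - h[b] for (a, b) in w}   # re-weight: all >= 0
    delta = [[None] * V for _ in range(V)]
    for u in range(V):                           # run Dijkstra from every source on G'
        d_prime, _ = dijkstra(Adj, w_prime, u)
        for v in range(V):
            delta[u][v] = d_prime[v] - h[u] + h[v]   # undo the re-weighting
    return delta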
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 15
Recitation 15
Dynamic Programming
Dynamic Programming generalizes Divide and Conquer type recurrences when subproblem de-
pendencies form a directed acyclic graph instead of a tree. Dynamic Programming often applies to
optimization problems, where you are maximizing or minimizing a single scalar value, or counting
problems, where you have to count all possibilities. To solve a problem using dynamic programming, we follow these steps as part of a recursive problem solving framework.
1. Subproblems
• Define the meaning of subproblem x(i), including the range of i
2. Relate subproblem solutions recursively x(i) = f (x(j), . . .) for one or more j < i
3. Topological order
• Argue that the relation on subproblems is acyclic, i.e., subproblem dependencies form a DAG
4. Base cases
• State solutions for all (reachable) independent subproblems where relation doesn’t apply/work
5. Original problem
6. Time analysis
• Σ_{x∈X} work(x), or if work(x) = O(W) for all x ∈ X, then |X| · O(W)
• work(x) measures nonrecursive work in relation; treat recursions as taking O(1) time
Implementation
Once subproblems are chosen and a DAG of dependencies is found, there are two primary methods
for solving the problem, which are functionally equivalent but are implemented differently.
• A top down approach evaluates the recursion starting from roots (vertices incident to no incoming edges). At the end of each recursive call the calculated solution to a subproblem is recorded into a memo, while at the start of each recursive call, the memo is checked to see if that subproblem has already been solved.
• A bottom up approach computes subproblems in a topological sort order of the subproblem dependency DAG (typically with loops), recording each solution into a memo so that it is already available when later subproblems need it.
Top down is a recursive view, while Bottom up unrolls the recursion. Both implementations are
valid and often used. Memoization is used in both implementations to remember computation from
previous subproblems. While it is typical to memoize all evaluated subproblems, it is often possi-
ble to remember (memoize) fewer subproblems, especially when subproblems occur in ‘rounds’.
Often we don’t just want the value that is optimized, but we would also like to return a path of
subproblems that resulted in the optimized value. To reconstruct the answer, we need to maintain
auxiliary information in addition to the value we are optimizing. Along with the value we are
optimizing, we can maintain parent pointers to the subproblem or subproblems upon which a
solution to the current subproblem depends. This is analogous to maintaining parent pointers in
shortest path problems.
The player wins the round if the value of the player’s hand (i.e., the sum of cards drawn by the
player in the round) is ≤ 21 and exceeds the value of the dealer’s hand; otherwise, the player
loses the round. The game ends when a round ends with fewer than 5 cards remaining in the deck.
Given a deck of n cards with a known order, describe an O(n)-time algorithm to determine the
maximum number of rounds the player can win by playing simplified blackjack with the deck.
Solution:
1. Subproblems
• Choose suffixes
• x(i) : maximum rounds player can win by playing blackjack using cards (ci , . . . , cn )
2. Relate
3. Topo
4. Base
5. Original
• Solve x(i) for i ∈ {1, . . . , n + 1}, via recursive top down or iterative bottom up
• x(1): the maximum rounds player can win by playing blackjack with the full deck
6. Time
• # subproblems: n + 1
• work per subproblem: O(1)
• O(n) running time
Solution:
1. Subproblems
2. Relate
• The first line must break at some word, so try all possibilities
• x(i) = min{b(i, j) + x(j + 1) | i ≤ j < n}
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n)
• work per subproblem: O(n²)
Optimization
1. Subproblems
2. Relate
• x(i, j) = Σk wk takes O(j − i) time to compute, slow!
• x(i, j) = x(i, j − 1) + wj takes O(1) time to compute, faster!
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n²)
• work per subproblem: O(1)
• O(n²) running time
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 16
Recitation 16
Solution: We could brute force in O(n³) time by computing the sum of each of the O(n²) subarrays in O(n) time. We can get a faster algorithm by noticing that the subarray with maximum sum must end somewhere. Finding the maximum subarray ending at a particular location k can be computed in O(n) time by scanning to the left from k, keeping track of a rolling sum, and remembering the maximum along the way; since there are n ending locations, this algorithm runs in O(n²) time. We can go even faster by recognizing that each successive scan to the left is redoing work that has already been done in earlier scans. Let’s use dynamic programming to reuse this work!
1. Subproblems
• x(k): the maximum sum of any (nonempty) subarray of A that ends at index k
2. Relate
• A maximum subarray ending at k either is A[k] alone, or extends a maximum subarray ending at k − 1
• x(k) = max(A[k], A[k] + x(k − 1))
3. Topo. Order
• x(k) depends only on x(k − 1), so solve in order of increasing k
4. Base
• x(0) = A[0]
5. Original
• the maximum subarray ends somewhere, so return max{x(k) | 0 ≤ k < n}
6. Time
• # subproblems: O(n)
• work per subproblem: O(1)
• time to solve original problem: O(n)
• O(n) time in total
1 # bottom up implementation
2 def max_subarray_sum(A):
3 x = [None for _ in A] # memo
4 x[0] = A[0] # base case
5 for k in range(1, len(A)): # iteration
6 x[k] = max(A[k], A[k] + x[k - 1]) # relation
7 return max(x) # original
Edit Distance
A plagiarism detector needs to detect the similarity between two texts, string A and string B. One
measure of similarity is called edit distance, the minimum number of edits that will transform
string A into string B. An edit may be one of three operations: delete a character of A, replace a
character of A with another letter, and insert a character between two characters of A. Describe an O(|A||B|)-time algorithm to compute the edit distance between A and B.
Solution:
1. Subproblems
2. Relate
• Replace changes A(i) to B(j) and removes both A(i) and B(j)
• x(i, j) = x(i − 1, j − 1) if A(i) = B(j)
• x(i, j) = 1 + min(x(i − 1, j), x(i, j − 1), x(i − 1, j − 1)) otherwise
3. Topo. Order
4. Base
5. Original
6. Time
• # subproblems: O(|A||B|)
• work per subproblem: O(1)
• O(|A||B|) running time
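A bottom-up sketch of this recurrence (assuming the convention that x(i, j) is the edit distance between the prefixes A(1 .. i) and B(1 .. j), with base cases computed individually):

def edit_distance(A, B):
    # x[i][j]: minimum edits to transform prefix A[:i] into prefix B[:j]
    x = [[None] * (len(B) + 1) for _ in range(len(A) + 1)]
    for i in range(len(A) + 1):              # base cases: delete all i characters of A[:i]
        x[i][0] = i
    for j in range(len(B) + 1):              # base cases: insert all j characters of B[:j]
        x[0][j] = j
    for i in range(1, len(A) + 1):
        for j in range(1, len(B) + 1):
            if A[i - 1] == B[j - 1]:         # last characters match: no edit needed
                x[i][j] = x[i - 1][j - 1]
            else:                            # delete from A, insert into A, or replace
                x[i][j] = 1 + min(x[i - 1][j], x[i][j - 1], x[i - 1][j - 1])
    return x[len(A)][len(B)]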
Exercise: Modify the code above to return a minimal sequence of edits to transform string A into
string B. (Note, the base cases in the above code are computed individually to make reconstructing
a solution easier.)
Introduction to Algorithms: 6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon Recitation 17
Recitation 17
Treasureship!
The new boardgame Treasureship is played by placing 2 × 1 ships within a 2 × n rectangular grid.
Just as in regular battleship, each 2 × 1 ship can be placed either horizontally or vertically, occupy-
ing exactly 2 grid squares, and each grid square may only be occupied by a single ship. Each grid
square has a positive or negative integer value, representing how much treasure may be acquired
or lost at that square. You may place as many ships on the board as you like, with the score of a
placement of ships being the value sum of all grid squares covered by ships. Design an efficient
dynamic-programming algorithm to determine a placement of ships that will maximize your total
score.
Solution:
1. Subproblems
[Diagram: the five possible board-boundary shapes, where row 2 extends past row 1 by an offset j ∈ {0, +1, −1, +2, −2}]
• Exists optimal placement where no two ships aligned horizontally on top of each other
• Proof: cover instead by two vertical ships next to each other!
• So actually only need first three cases above: 0, +1, −1
• Let s(i, j) represent game board subset containing columns 1 to i of row 1, and columns
1 to i + j of row 2, for j ∈ {0, +1, −1}
• x(i, j): maximum score, only placing ships on board subset s(i, j)
• for i ∈ {0, . . . , n}, j ∈ {0, +1, −1}
2. Relate
• If j = +1, can cover right-most square with horizontal ship or leave empty
• If j = −1, can cover right-most square with horizontal ship or leave empty
• If j = 0, can cover column i with vertical ship or not cover one of right-most squares
• x(i, j) =
– max{v(i, 1) + v(i − 1, 1) + x(i − 2, +1), x(i − 1, 0)} if j = −1
– max{v(i + 1, 2) + v(i, 2) + x(i, −1), x(i, 0)} if j = +1
– max{v(i, 1) + v(i, 2) + x(i − 1, 0), x(i, −1), x(i − 1, +1)} if j = 0
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n)
• work per subproblem O(1)
• O(n) running time
Wafer Power
A start-up is working on a new electronic circuit design for highly-parallel computing. Evenly-
spaced along the perimeter of a circular wafer sits n ports for either a power source or a computing
unit. Each computing unit needs energy from a power source, transferred between ports via a
wire etched into the top surface of the wafer. However, if a computing unit is connected to a power
source that is too close, the power can overload and destroy the circuit. Further, no two etched wires
may cross each other. The circuit designer needs an automated way to evaluate the effectiveness
of different designs, and has asked you for help. Given an arrangement of power sources and
computing units plugged into the n ports, describe an O(n³)-time dynamic programming algorithm
to match computing units to power sources by etching non-crossing wires between them onto the
surface of the wafer, in order to maximize the number of powered computing units, where wires
may not connect two adjacent ports along the perimeter. Below is an example wafer, with non-
crossing wires connecting computing units (white) to power sources (black).
Solution:
1. Subproblems
• Let (a1 , . . . , an ) be the ports cyclically ordered counter-clockwise around the wafer,
where ports a1 and an are adjacent
• Let ai be True if the port is a computing unit, and False if it is a power source
• Want to match ports of opposite type (computing unit to power source) via non-crossing wires
• If we match two ports across the wafer, we need to independently match the ports on either side (substrings!)
• x(i, j): maximum number of matchings, restricting to ports ak for all k ∈ {i, . . . , j}
• for i ∈ {1, . . . , n}, j ∈ {i − 1, . . . , n}
• j − i + 1 is number of ports in substring (allow j = i − 1 as an empty substring)
2. Relate
3. Topo
4. Base
5. Original
6. Time
• # subproblems: O(n²)
• work per subproblem: O(n)
• O(n³) running time
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms