
Algorithms for Data Science

CSOR W4246
Eleni Drinea
Computer Science Department
Columbia University
Tuesday, September 8, 2015

Outline

1 Overview

2 A first algorithm: insertion sort

3 Analysis of algorithms

4 Efficient algorithms

5 Asymptotic notation

Today

1 Overview

Algorithms

An algorithm is a well-defined computational procedure that transforms the input (a set of values) into the output (a new set of values).

The desired input/output relationship is specified by the statement of the computational problem for which the algorithm is designed.

An algorithm is correct if, for every input, it halts with the correct output.

Efficient Algorithms

In this course we are interested in algorithms that are correct and efficient.

Efficiency is related to the resources an algorithm uses: time and space.
- How much time/space are used?
- How do they scale as the input size grows?

We will primarily focus on efficiency in running time.

Running time

Running time = number of primitive computational steps performed; typically these are
1. arithmetic operations: add, subtract, multiply, divide fixed-size integers
2. data movement operations: load, store, copy
3. control operations: branching, subroutine call and return

We will use pseudocode for our algorithm descriptions.

Today

2 A first algorithm: insertion sort

Sorting

- Input: A list A of n integers x1, ..., xn.
- Output: A permutation x'1, x'2, ..., x'n of the n integers where they are sorted in non-decreasing order, i.e., x'1 ≤ x'2 ≤ ... ≤ x'n.

Example
- Input: n = 6, A = {9, 3, 2, 6, 8, 5}
- Output: A = {2, 3, 5, 6, 8, 9}

What data structure should we use to represent the list?

Array: a collection of items of the same data type that
- allows for random access
- is zero-indexed in C++ and Java

Insertion sort in English

[Figure: array A with a sorted prefix A[1..i-1] and an unsorted suffix; key = A[i] is the next element to insert.]

1. Start with a (trivially) sorted subarray of size 1 consisting of the first element of A.
2. Increase the size of the sorted subarray by 1 by inserting the next element of A into its correct location.
   - Compare that next element, call it key, with every element x of the sorted subarray, starting from the right.
   - If x > key, move x one position to the right.
   - Else (x ≤ key), insert key after x.
3. Repeat Step 2 until the sorted subarray has size n.

Example of insertion sort: n = 6, A = {9, 3, 2, 6, 8, 5}

[Figure: snapshots of A at the beginning of each iteration; the sorted prefix grows by one element per iteration.]

beginning of iteration i = 2: A = 9 3 2 6 8 5   (sorted prefix: 9;         key = 3)
beginning of iteration i = 3: A = 3 9 2 6 8 5   (sorted prefix: 3 9;       key = 2)
beginning of iteration i = 4: A = 2 3 9 6 8 5   (sorted prefix: 2 3 9;     key = 6)
beginning of iteration i = 5: A = 2 3 6 9 8 5   (sorted prefix: 2 3 6 9;   key = 8)
beginning of iteration i = 6: A = 2 3 6 8 9 5   (sorted prefix: 2 3 6 8 9; key = 5)
end of iteration i = 6:       A = 2 3 5 6 8 9   (sorted)

Pseudocode

Let A be an array of n integers.

insertion-sort(A)
for i = 2 to n do
    key = A[i]
    // Insert A[i] into the sorted subarray A[1, i-1]
    j = i - 1
    while j > 0 and A[j] > key do
        A[j + 1] = A[j]
        j = j - 1
    end while
    A[j + 1] = key
end for
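
The pseudocode translates directly into runnable code. Below is a minimal Python sketch (an illustration, not from the slides; Python lists are zero-indexed, so i runs from 1 to n - 1 and the guard j > 0 becomes j >= 0):

def insertion_sort(A):
    """Sort the list A in place in non-decreasing order."""
    n = len(A)
    for i in range(1, n):
        key = A[i]
        # Insert A[i] into the sorted subarray A[0..i-1].
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]   # shift the larger element one position right
            j -= 1
        A[j + 1] = key

A = [9, 3, 2, 6, 8, 5]
insertion_sort(A)
print(A)   # [2, 3, 5, 6, 8, 9]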

Today

3 Analysis of algorithms

Analysis of algorithms

- Correctness: formal proof, often by induction.
- Running time: number of primitive computational steps.
  - Not the same as the time it takes to execute the algorithm; we want a measure that is independent of hardware.
  - We want to know how the running time scales with the size of the input.
- Space: how much space is required by the algorithm.

Analysis of insertion-sort

Notation: A[i, j] is the subarray of A that starts at position i and ends at position j.

- Correctness: follows from the key observation that after loop i, the subarray A[1, i] is sorted.
- Running time: number of primitive computational steps.
- Space: in-place algorithm (at most a constant number of elements of A are stored outside A at any time).

Example of induction

Fact 1.
For all n ≥ 1, Σ_{i=1}^n i = n(n+1)/2.

Proof.
- Base case: n = 1: Σ_{i=1}^1 i = 1 = 1·2/2.
- Inductive hypothesis: Assume that the statement is true for n ≥ 1, that is, Σ_{i=1}^n i = n(n+1)/2.
- Inductive step: We show that the statement is true for n + 1. That is, Σ_{i=1}^{n+1} i = (n+1)(n+2)/2. (Show this!)
- Conclusion: It follows that the statement is true for all n, since we can apply the inductive step for n = 2, 3, ....

Correctness of insertion-sort

Notation: A[i, j] is the subarray of A that starts at position i and ends at position j.

Minor change in the pseudocode: in line 1, start from i = 1 rather than i = 2. How does this change affect the algorithm?

Claim 1.
Let n ≥ 1 be a positive integer. For all 1 ≤ i ≤ n, after the i-th loop, the subarray A[1, i] is sorted.

Correctness of insertion-sort follows if we show Claim 1 (why?).

Proof of Claim 1

By induction on i.

- Base case: i = 1, trivial.
- Induction hypothesis: assume that the statement is true for some 1 ≤ i < n.
- Inductive step: show it is true for i + 1. In loop i + 1, element key = A[i + 1] is inserted into A[1, i], which is sorted (by the induction hypothesis). Since key is inserted after the first element A[ℓ], 1 ≤ ℓ ≤ i, found scanning from the right such that key ≥ A[ℓ], and all elements in A[ℓ + 1, i] are pushed one position to the right with their order preserved, the statement is true for i + 1.
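
Claim 1 can also be checked mechanically. A small sketch (insertion_sort_checked is a hypothetical helper, zero-indexed as in the earlier Python version) that asserts the sorted-prefix invariant after every outer iteration:

def insertion_sort_checked(A):
    """Insertion sort instrumented to assert Claim 1 after each loop."""
    n = len(A)
    for i in range(1, n):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key
        # Claim 1, zero-indexed: after loop i, the prefix A[0..i] is sorted.
        assert all(A[k] <= A[k + 1] for k in range(i))

insertion_sort_checked([9, 3, 2, 6, 8, 5])   # completes without AssertionError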

Running time T(n) of insertion-sort

for i = 2 to n do                                      line 1
    key = A[i]                                         line 2
    // Insert A[i] into the sorted subarray A[1, i-1]
    j = i - 1                                          line 3
    while j > 0 and A[j] > key do                      line 4
        A[j + 1] = A[j]                                line 5
        j = j - 1                                      line 6
    end while
    A[j + 1] = key                                     line 7
end for

How many primitive computational steps are executed by the algorithm? Equivalently, what is the running time T(n)? Bounds on T(n)?

For 2 ≤ i ≤ n, let ti = # times line 4 is executed. Line 1 is executed n times; lines 2, 3 and 7 are executed n - 1 times each; and lines 5 and 6 are executed ti - 1 times each during iteration i. Then

    T(n) = n + 3(n - 1) + Σ_{i=2}^n ti + 2 Σ_{i=2}^n (ti - 1) = 3 Σ_{i=2}^n ti + 2n - 1

- Which input yields the smallest (best-case) running time?
- Which input yields the largest (worst-case) running time?

Best-case running time: 5n - 4 (an already-sorted input, so ti = 1 for every i).

Worst-case running time: 3n²/2 + 7n/2 - 4 (a reverse-sorted input, so ti = i, and Σ_{i=2}^n i = n(n+1)/2 - 1).
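
Both closed forms are easy to confirm empirically. A minimal sketch (count_steps is a hypothetical helper applying the line-by-line accounting above, zero-indexed):

def count_steps(A):
    """Count primitive steps of insertion-sort per the accounting above."""
    A = list(A)
    n = len(A)
    steps = n                      # line 1: the for-loop test runs n times
    for i in range(1, n):
        steps += 3                 # lines 2, 3 and 7: once per iteration
        key = A[i]
        j = i - 1
        while True:
            steps += 1             # line 4: every while test, incl. the failing one
            if j >= 0 and A[j] > key:
                steps += 2         # lines 5 and 6
                A[j + 1] = A[j]
                j -= 1
            else:
                break
        A[j + 1] = key
    return steps

n = 6
print(count_steps(list(range(n))))         # sorted input:   5n - 4 = 26
print(count_steps(list(range(n, 0, -1))))  # reversed input: 3n²/2 + 7n/2 - 4 = 71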

Worst-case analysis

Definition 2.
Worst-case running time: the largest possible running time of the algorithm over all inputs of a given size n.

Why worst-case analysis?
- It gives well-defined, computable bounds.
- Average-case analysis can be tricky: how do we generate a random instance?

The worst-case running time of insertion-sort is quadratic. Is insertion-sort efficient?

Today

4 Efficient algorithms

Efficiency of insertion-sort and the brute force solution

Compare to the brute force solution:
- At each step, generate a new permutation of the n integers.
- If sorted, stop and output the permutation.

Worst-case analysis: generate n! permutations. Is the brute force solution efficient?

- Efficiency relates to the performance of the algorithm as n grows.
- Stirling's approximation formula: n! ≥ (n/e)^n.
  - For n = 10, generate 3.67^10 ≈ 2^19 permutations.
  - For n = 50, generate 18.3^50 ≈ 2^210 permutations.
  - For n = 100, generate 36.7^100 ≈ 2^520 permutations!

The brute force solution is not efficient.
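
For contrast, here is a runnable sketch of the brute force approach (illustrative only; itertools.permutations enumerates permutations lazily):

import itertools

def brute_force_sort(A):
    """Generate permutations of A until a sorted one is found."""
    for perm in itertools.permutations(A):
        # Check whether this permutation is in non-decreasing order.
        if all(perm[k] <= perm[k + 1] for k in range(len(perm) - 1)):
            return list(perm)

print(brute_force_sort([9, 3, 2, 6, 8, 5]))   # [2, 3, 5, 6, 8, 9]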

Efficient algorithms: Attempt 1

Definition 3 (Attempt 1).
An algorithm is efficient if it achieves better worst-case performance than brute-force search.

Caveat: this fails to discuss the scaling properties of the algorithm.
- If the input size grows by a constant factor, we would like the running time T(n) of the algorithm to increase by a constant factor as well.
- Note that polynomial running times scale well: on input of size n, T(n) is at most c·n^d for constants c, d > 0, so doubling n increases T(n) by at most the constant factor 2^d (see the sketch below).
- The smaller the exponent of the polynomial, the better.
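
A two-line check of that scaling claim (c = 5 and d = 2 are arbitrary illustrative constants):

c, d = 5, 2
T = lambda n: c * n**d
print(T(2000) / T(1000))   # 4.0 == 2**d, regardless of the starting n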

Efficient algorithms

Definition 4.
An algorithm is efficient if it has a polynomial running time.

Caveat
- What about huge constants in front of the leading term, or large exponents?

However
- Small-degree polynomial running times exist for most problems that can be solved in polynomial time.
- Conversely, problems for which no polynomial-time algorithm is known tend to be very hard in practice.
- So we can distinguish between easy and hard problems.

Remark 1.
Today's big data: even low-degree polynomials might be too slow!

Are we done with sorting?

Insertion sort is efficient. Are we done with sorting?

1. Can we do better?
2. And what is better? E.g., is T(n) = n² + n - 4 worth aiming for?

Running time in terms of # primitive steps

To discuss this, we need a coarser classification of running times of algorithms; exact characterizations
- are too detailed;
- do not reveal similarities between running times in an immediate way as n grows large;
- are often meaningless: pseudocode steps will expand by a constant factor that depends on the hardware.

Asymptotic analysis

A framework that will allow us to compare the rates of growth of different running times as the input size n grows.
- We will express the running time as a function of the number of primitive steps.
- The number of primitive steps is itself a function of the size of the input n; hence the running time is a function of the input size n.
- To compare functions expressing running times, we will ignore their low-order terms and focus solely on the highest-order term.

Today

5 Asymptotic notation

Asymptotic upper bounds: Big-O notation

Definition 5 (O).
We say that T(n) = O(f(n)) if there exist constants c > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) ≤ c·f(n).

[Figure: for T(n) = O(f(n)), the curve T(n) lies below c·f(n) for all n ≥ n0.]

Examples:
- T(n) = an² + b, a, b > 0 constants, and f(n) = n²: here T(n) = O(n²), witnessed by c = a + b and n0 = 1, since an² + b ≤ (a + b)n² for all n ≥ 1.
- T(n) = an² + b and f(n) = n³: T(n) = O(n³) as well; an upper bound need not be tight.
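
The first example can also be sanity-checked numerically with the proposed witnesses (a = 3 and b = 5 are arbitrary illustrative constants):

a, b = 3, 5                # arbitrary positive constants
c, n0 = a + b, 1           # proposed witnesses for T(n) = O(n²)
assert all(a * n**2 + b <= c * n**2 for n in range(n0, 10**4))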

Asymptotic lower bounds: Big-Ω notation

Definition 6 (Ω).
We say that T(n) = Ω(f(n)) if there exist constants c > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) ≥ c·f(n).

[Figure: for T(n) = Ω(f(n)), the curve T(n) lies above c·f(n) for all n ≥ n0.]

Examples:
- T(n) = an² + b, a, b > 0 constants, and f(n) = n²: T(n) ≥ an² for all n ≥ 1, so T(n) = Ω(n²).
- T(n) = an² + b, a, b > 0 constants, and f(n) = n: T(n) = Ω(n); a lower bound need not be tight either.

Asymptotic tight bounds: Θ notation

Definition 7 (Θ).
We say that T(n) = Θ(f(n)) if there exist constants c1, c2 > 0 and n0 ≥ 0 s.t. for all n ≥ n0, we have c1·f(n) ≤ T(n) ≤ c2·f(n).

[Figure: for T(n) = Θ(f(n)), the curve T(n) is sandwiched between c1·f(n) and c2·f(n) for all n ≥ n0.]

Equivalent definition:
T(n) = Θ(f(n)) if T(n) = O(f(n)) and T(n) = Ω(f(n)).

Examples:
- T(n) = an² + b, a, b > 0 constants, and f(n) = n².
- T(n) = n log n + n and f(n) = n log n.

Asymptotic upper bounds that are not tight: little-o

Definition 8 (o).
We say that T(n) = o(f(n)) if for any constant c > 0 there exists a constant n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) < c·f(n).

- Intuitively, T(n) becomes insignificant relative to f(n) as n → ∞.
- Proof by showing that lim_{n→∞} T(n)/f(n) = 0 (if the limit exists).

Examples:
- T(n) = an² + b, a, b > 0 constants, and f(n) = n³.
- T(n) = n log n and f(n) = n².
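
The limit test can be eyeballed numerically. A rough sketch for the second example (math.log is the natural logarithm; the base only changes the ratio by a constant factor):

import math

T = lambda n: n * math.log(n)
f = lambda n: n**2
for n in (10**2, 10**4, 10**6):
    print(n, T(n) / f(n))   # the ratio tends to 0, so n log n = o(n²)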

Asymptotic lower bounds that are not tight: little-ω

Definition 9 (ω).
We say that T(n) = ω(f(n)) if for any constant c > 0 there exists n0 ≥ 0 s.t. for all n ≥ n0, we have T(n) > c·f(n).

- Intuitively, T(n) becomes arbitrarily large relative to f(n) as n → ∞.
- T(n) = ω(f(n)) implies that lim_{n→∞} T(n)/f(n) = ∞, if the limit exists. Then f(n) = o(T(n)).

Examples:
- T(n) = n² and f(n) = n log n.
- T(n) = 2ⁿ and f(n) = n⁵.

Basic rules for omitting low-order terms from functions

1. Ignore multiplicative factors: e.g., 10n³ becomes n³.
2. n^a dominates n^b if a > b: e.g., n² dominates n.
3. Exponentials dominate polynomials: e.g., 2ⁿ dominates n⁴.
4. Polynomials dominate logarithms: e.g., n dominates log³n.

For large enough n:

    log n < n < n log n < n² < 2ⁿ < nⁿ
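
The ordering on the last line can be tabulated directly (a small sketch; only small n are shown because 2ⁿ and nⁿ explode quickly):

import math

print(f"{'n':>4} {'log n':>7} {'n log n':>9} {'n²':>6} {'2ⁿ':>8} {'nⁿ':>22}")
for n in (2, 4, 8, 16):
    print(f"{n:>4} {math.log2(n):>7.1f} {n * math.log2(n):>9.1f} "
          f"{n**2:>6} {2**n:>8} {n**n:>22}")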

Properties of asymptotic growth rates

Transitivity
1. If f = O(g) and g = O(h), then f = O(h).
2. If f = Ω(g) and g = Ω(h), then f = Ω(h).
3. If f = Θ(g) and g = Θ(h), then f = Θ(h).

Sums of (up to a constant number of) functions
1. If f = O(h) and g = O(h), then f + g = O(h).
2. Let k be a fixed constant, and let f1, f2, ..., fk, h be functions s.t. for all i, fi = O(h). Then f1 + f2 + ... + fk = O(h).

Transpose symmetry
- f = O(g) if and only if g = Ω(f).
- f = o(g) if and only if g = ω(f).
