0% found this document useful (0 votes)
7 views

MAT1830NotesBooklet (3)

Uploaded by

mai
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

MAT1830NotesBooklet (3)

Uploaded by

mai
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 70

Contents

Lesson 1: What is MAT1830 about?


Lesson 2: Divisors and primes
Lesson 3: Congruences
Lesson 4: Logic
Lesson 5: Tautologies and logical equivalence
Lesson 6: Rules of inference
Lesson 7: Predicates and quantifiers
Lesson 8: Predicate logic
Lesson 9: Mathematical induction
Lesson 10: Induction and well-ordering
Lesson 11: Sets
Lesson 12: Operations on sets
Lesson 13: Functions
Lesson 14: Examples of functions
Lesson 15: Composition and inversion
Lesson 16: Relations
Lesson 17: Equivalence relations
Lesson 18: Order relations
Lesson 19: Selections and arrangements
Lesson 20: Pascal’s triangle
Lesson 21: Probability
Lesson 22: Conditional probability and Bayes’ theorem
Lesson 23: Random variables
Lesson 24: Expectation and variance
Lesson 25: Discrete distributions
Lesson 26: Recursion
Lesson 27: Recursive algorithms
Lesson 28: Recursion, lists and sequences
Lesson 29: Graphs
Lesson 30: Walks, paths and trails
Lesson 31: Degree
Lesson 32: Trees
Lesson 33: Trees, queues and stacks
Lesson 1: What is MAT1830 about?

Discrete mathematics studies objects which we’ll see, even if you are pretty sure that some-
have distinct separated values (e.g. integers), as thing is true, it can be really useful to have a
opposed to objects which vary smoothly (e.g. proof of it, for a number of reasons.
real numbers). You can think of it as being
“digital” mathematics rather than “analogue” 1.3 Maths in computer science
mathematics.
Discrete mathematics is particularly impor- As we mentioned above, maths and computer
tant in computer science and the two fields are science are very closely related. The topics in
very closely linked. this course all have many applications to com-
This course covers a wide range of topics in puter science. For example:
discrete mathematics including the following:
ˆ Number theory is used in cryptography
ˆ Numbers to enable secure communication, identity
verification, online banking and shopping
ˆ Logic etc.

ˆ Induction and recursion ˆ Logic is used in digital circuit design and


in program control flow.
ˆ Sets, functions and relations
ˆ Induction and recursion are used to study
ˆ Probability algorithms and their effectiveness.
ˆ Graph theory ˆ Functions are important in the theory of
programming and relations are vital in
1.1 What to expect database theory and design.
What we do here might be a bit different to a lot ˆ Probability is vital for understanding ran-
of the maths you’ve done in the past. We’ll be domised algorithms and for creating sys-
concentrating on really understanding the con- tems to deal with uncertain situations.
cepts, rather than simply learning how to solve
certain types of questions. ˆ Graph theory is used in software which
For a lot of the questions we ask, there won’t solves allocation and scheduling problems.
be a single fixed procedure you can apply to get
the answer. Instead, you’ll have to think care- Questions
fully about what the question is asking and try
1.1 What maths that you’ve done in the past
to work out what is really going on. Don’t be
would count as discrete? What would
afraid to try different things, play around, and
count as continuous instead? Are there
look at examples.
grey areas?
We’ll also be emphasising the importance of
proving results. 1.2 Why might proofs be important to math-
ematicians and computer scientists?
1.2 Proofs
1.3 Can you think of other links between
A proof is essentially just a water-tight argu- maths and computer science?
ment that a certain statement must be true. As
Lesson 2: Divisors and primes

2.2 Recognising primes


We say that integer a divides integer b if
b = qa for some integer q. If an integer n > 1 has a divisor, it has a divisor
√ √
⩽ n, because for any divisor a > n we also

have the divisor n/a, which is < n.
Example. 2 divides 6 because 6 = 3 × 2. Thus to test whether 10001 is prime, say, we
only have to see whether any of the√numbers
This is the same as saying that division with 2, 3, 4, . . . ⩽ 100 divide 10001, since 10001 <
remainder gives remainder 0. Thus a does not 101. (The least divisor found is in fact 73, be-
divide b when the remainder is ̸= 0. cause 10001 = 73 × 137.)
This explains a common algorithm for recog-
Example. 3 does not divide 14 because it leaves nising whether n is prime: try dividing n by

remainder 2: 14 = 4 × 3 + 2. a = 2, 3, . . . while a ⩽ n.
The algorithm is written with a boolean vari-
When a divides b we also say: able prime, and n is prime if prime = T (true)
ˆ a is a divisor of b, when the algorithm terminates.
ˆ a is a factor of b,
ˆ b is divisible by a, assign a the value 2.
ˆ b is a multiple of a. assign prime the value T.

while a ⩽ n and prime= T
if a divides n
2.1 Primes give prime the value F
A positive integer p > 1 is a prime if its only else
positive integer divisors are 1 and p. Thus the increase the value of a by 1.
first few prime numbers are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . .
2.3 Finding divisors
The number 1 is not counted as a prime, as this
would spoil the This algorithm also finds a prime divisor of n.
Either

the least a ⩽ n which divides n,
Fundamental Theorem of Arithmetic.
or,
Each integer > 1 can be expressed in exactly √
if we do not find a divisor among the a ⩽ n, n
one way, up to order, as a product of primes.
itself is prime.

Example. 210 = 2 × 3 × 5 × 7, and this is the 2.4 The greatest common divisor of
only product of primes which equals 210. two numbers
It is remarkable that we can find the greatest
This would not be true if 1 was counted as
common divisor of positive integers m and n,
a prime, because many factorisations involve 1.
gcd(m, n), without finding their prime divisors.
E.g.
This is done by the famous Euclidean al-
210 = 1 × 2 × 3 × 5 × 7 = 12 × 2 × 3 × 5 × 7 = . . . gorithm, which repeatedly divides the greater
number by the smaller, keeping the smaller num- given above, the gcd of the numbers in the last
ber and the remainder. two columns is always gcd(m, n).)

Euclidean Algorithm. 2.6 Extended Euclidean algorithm


Input: positive integers m and n with m ⩾ n
If we have used the Euclidean algorithm to find
Output: gcd(m, n)
that gcd(m, n) = d, we can “work backwards”
a := m, b := n through its steps to find integers a and b such
r := remainder when a is divided by b that am + bn = d.
while r ̸= 0 do
a := b Example. For our m = 237, n = 105 example
b := r above:
r := remainder when a is divided by b
end 3= 27 − 1×24
return b
3= 27 − 1(105 − 3×27) = −105 + 4×27
3 = −105 + 4(237 − 2×105) = 4×237 − 9×105
Example. m = 237, n = 105
The first values are a = 237, b = 105, So we see that a = 4 and b = −9 is a solution in
so r = 237 − 2 × 105 = 27. this case.
The next values are a = 105, b = 27, Our first line above was a rearrangement of
so r = 105 − 3 × 27 = 24. the second last line of our original Euclidean al-
The next values are a = 27, b = 24, gorithm working. In the second line we made a
so r = 27 − 1 × 24 = 3. substitution for 24 based on the second line of
The next values are a = 24, b = 3, our original Euclidean algorithm working. In
so r = 24 − 8 × 3 = 0. the third line we made a substitution for 27
Thus the final value of b is 3, which is based on the first line of our original Euclidean
gcd(237, 105). algorithm working.
This can be set out more neatly:
Questions
237 = 2 × 105 + 27
105 = 3 × 27 + 24 2.1 Write down multiples of 13, and multiples
27 = 1 × 24 + 3 of 21, until you find a multiple of 13 and
24 = 8 × 3 + 0 a multiple of 21 which differ by 1.

2.5 The Euclidean algorithm works! 2.2 Can a multiple of 15 and a multiple of 21
differ by 1? If not, what is the small-
We start with the precondition m ⩾ n > 0. est positive difference between such mul-
Then the division theorem tells us there is a re- tiples?
mainder r < b when a = m is divided by b = n.
Repeating the process gives successively smaller 2.3 Find gcd(13, 21) and gcd(15, 21), and sug-
remainders, and hence the algorithm eventually gest how they are related to the results in
returns a value. Questions 2.1 and 2.2.
That the value returned value is actually
2.4 Work out the prime factorisations of 999
gcd(m, n) relies on the following fact.
and 1000.

Fact. If a, b and k are integers, then 2.5 You should find no common prime factor
of 999 and 1000. How could you see this
gcd(a − kb, b) = gcd(a, b).
without factorising the numbers? (Hint: a
By using this fact repeatedly, we can show that common divisor of 1000 and 999 is also a
after each execution of the while loop in the al- common divisor of . . . what?)
gorithm gcd(b, r) = gcd(m, n). When the algo-
rithm terminates, this means b = gcd(b, 0) =
gcd(m, n). (Equivalently, in the neat set out
Lesson 3: Congruences

We’re used to classifying the integers as ei- Example.


ther even or odd. The even integers are those
19 ≡ 13 (mod 6) because 6 divides 19-13
that can be written as 2k for some integer k.
The odd integers are those that can be written 12 ≡ 20 (mod 4) because 4 divides 20-12
as 2k + 1 for some integer k. 22 ≡ 13 (mod 3) because 3 divides 22-13

even . . . , −6, −4, −2, 0, 2, 4, 6, . . .


odd . . . , −5, −3, −1, 1, 3, 5, . . . 3.2 Working with congruences
When working with congruences modulo some
fixed integer n, we can “substitute in” just like
This classification is useful because even and we can with equalities.
odd integers have particular properties. For ex-
ample, the sum of any two odd integers is even.
Similarly we can split the integers into three
If a ≡ b (mod n) and b ≡ c (mod n), then
classes: those that are 3k for some integer k,
those that are 3k + 1 for some integer k, and a ≡ c (mod n).
those that are 3k + 2 for some integer k.
Example. Suppose x ≡ 13 (mod 7). Then
3k . . . , −9, −6, −3, 0, 3, 6, 9, . . . x ≡ 6 (mod 7) because 13 ≡ 6 (mod 7).
3k + 1 . . . , −8, −5, −2, 1, 4, 7, 10, . . .
We can add, subtract and multiply congru-
3k + 2 . . . , −7, −4, −1, 2, 5, 8, 11, . . . ences just like we can with equations.

If a1 ≡ b1 (mod n) and a2 ≡ b2 (mod n), then


These classes also have particular properties.
For example, the sum of an integer in the sec- ˆ a1 + a2 ≡ b1 + b2 (mod n)
ond class and an integer in the third class will
always be in the first class. ˆ a1 − a2 ≡ b1 − b2 (mod n)
We don’t have to stop with 3. We could di-
ˆ a1 a2 ≡ b1 b2 (mod n).
vide integers into 4 different classes according to
their remainders when divided by 4, and so on.
Example. If x ≡ 3 (mod 8) and y ≡ 2 (mod 8),
then
3.1 Congruences ˆ x + y ≡ 5 (mod 8)

ˆ x − y ≡ 1 (mod 8)
Let n ⩾ 2 be an integer. We say integers a
and b are congruent modulo n and write ˆ xy ≡ 6 (mod 8).
a ≡ b (mod n) We can also deduce that x + 4 ≡ 7 (mod 8),
when n divides a − b. that 4x ≡ 12 (mod 8) and so on, because obvi-
ously 4 ≡ 4 (mod 8). Note as well that 4x ≡
12 (mod 8) can be simplified to 4x ≡ 4 (mod 8).
In some situations we can also “divide 3.4 Modular inverses
through” a congruence by an integer.
A modular multiplicative inverse of an inte-
If a ≡ b (mod n) and d divides a, b and n, ger a modulo n is an integer x such that
then ax ≡ 1 (mod n).
a b n
d ≡ d (mod d ).
From the last section we know that such an in-
verse will exist if and only if gcd(a, n) = 1. If
3.3 Solving linear congruences inverses do exist then we can find them using
the extended Euclidean algorithm (there will
Think of a congruence like 7x ≡ 5 (mod 9). This be lots of inverses, but they will all be in one
will hold if 9 divides 7x − 5 or in other words if congruence class modulo n). These inverses
there is an integer y such that 7x − 5 = 9y. So have important applications to cryptography
to solve our original congruence we can find an and random number generation.
integer solution to 7x − 9y = 5.
Some congruences don’t have solutions. Example. 8 should have a multiplicative in-
For example, there is no solution to 10x ≡ verse modulo 45 because gcd(8, 45) = 1. Using
6 (mod 20) because there are no integers x and the extended Euclidean algorithm we see that
y such that 10x − 20y = 6. −3 × 45 + 17 × 8 = 1. So 8 × 17 ≡ 1 (mod 45).
To find an expression for all the integers x This means that 17 is a multiplicative inverse of
that satisfy a congruence like ax ≡ b (mod n), 8 modulo 45.
first find d = gcd(a, n) and then act as follows.
If d = 1: Find integers x′ and y ′ such that Questions
ax − ny ′ = b. The integers x that satisfy the

original congruence are exactly those for which 3.1 Are the following true or false?
x ≡ x′ (mod n). ˆ 6 ≡ 3 (mod 3)
If d > 1 and d divides b: The method
above will still work but it will only give some ˆ 9 ≡ 18 (mod 8)
of the solutions. To find all of the solutions, ˆ 5x + 6 ≡ 2x (mod 3)
first divide through the congruence by d to get
3.2 Prove all of the facts about congruences
the equivalent congruence ad x ≡ db (mod nd ) and
that were stated in this lesson (use the
then use the method above on the new congru-
definition of congruence modulo n and the
ence.
definition of divides).
If d doesn’t divide b: The congruence has
no solutions. 3.3 Find an expression for all the integers x
Example. Find all integers x such that that satisfy 9x ≡ 12 (mod 60).
36x ≡ 10 (mod 114).
Using the Euclidean algorithm we find
gcd(36, 114) = 6. So 6 divides 36x − 114y
for any integers x and y, and consequently
36x − 114y ̸= 10. This means that there are
no integers x such that 36x ≡ 10 (mod 114).

Example. Find all integers x such that


24x ≡ 8 (mod 44).
Using the Euclidean algorithm we find
gcd(24, 44) = 4. So we divide through by 4 to
get the equivalent congruence 6x ≡ 2 (mod 11).
Using the extended Euclidean algorithm we see
that 2×6−1×11 = 1, and hence 4×6−2×11 = 2.
Thus the integers x such that 24x ≡ 8 (mod 44)
are exactly the integers x ≡ 4 (mod 11).
Lesson 4: Logic

The simplest and most commonly used part Similarly, p ∨ q is true when p is true or q is
of logic is the logic of “and”, “or” and “not”, true, but now we have to be more precise, be-
which is known as propositional logic. cause “or” has at least two meanings in ordinary
A proposition is any sentence which has a speech.
definite truth value (true= T or false= F), such
as We define ∨ by the truth table
1 + 1 = 2, or p q p∨q
10 is a prime number. T T T
but not T F T
What is your name? or F T T
This sentence is false. F F F

Propositions are denoted by letters such as


This is the inclusive sense of “p or q” (often writ-
p, q, r, . . . , and they are combined into com-
ten “p and/or q” and meaning at least one of p,
pound propositions by connectives such as ∧
q is true).
(and), ∨ (or) and ¬ (not).
Finally, “not” ¬ (also called negation) is de-
fined as follows.

We define ¬ by the truth table


4.1 Connectives ∧, ∨ and ¬
p ¬p
∧, ∨ and ¬ are called “connectives” because they
T F
can be used to connect two sentences p and q
into one. These particular connectives are de- F T
fined so that they agree with the most com-
mon interpretations of the words “and”, “or” The connectives ∧, ∨ and ¬, are functions of the
and “not”. propositional variables p and q, which can take
To define p ∧ q, for example, we only have to the two values T and F. For this reason, ∧, ∨
say that p ∧ q is true only when p is true and q and ¬ are also called truth functions.
is true.
4.2 Implication

We define ∧ by the following truth table: Another important truth function is p → q,


which corresponds to “if p then q” or “p implies
p q p∧q q” in ordinary speech.
T T T In ordinary speech the value of p → q de-
T F F pends only on what happens when p is true. For
example to decide whether
F T F
F F F MCG flooded → the cricket is off
it is enough to see what happens when the MCG
is flooded. Thus we agree that p → q is true
when p is false. in some programming languages.

We define → by the truth table


p q p→q
T T T
T F F
F T T
F F T

4.3 Other connectives


Two other important connectives are ↔ (“if and
only if”) and ∨ (“exclusive or”).
The sentence p ↔ q is true exactly when the
truth values of p and q agree.

We define ↔ by the truth table


p q p↔q
T T T
T F F
F T F
F F T

We could also write p ↔ q as (p → q) ∧ (q → p).


We’ll see how to prove this in the next lesson.
The sentence p ∨ q is true exactly when the
truth values of p and q disagree.

We define ∨ by the truth table

p q p∨q
T T F
T F T
F T T
F F F

4.4 Remarks
1. The symbols ∧ and ∨ are intentionally sim-
ilar to the symbols ∩ and ∪ for set intersection
and union because
x ∈ A ∩ B ⇔ (x ∈ A) ∧ (x ∈ B)
x ∈ A ∪ B ⇔ (x ∈ A) ∨ (x ∈ B)
(We study sets later.)

2. The “exclusive or” function ∨ is written XOR


3. If we write 0 for F and 1 for T then ∨ becomes
the function
p q p∨q
1 1 0
1 0 1
0 1 1
0 0 0
This is also known as the “mod 2 sum”, because
1 + 1 = 2 ≡ 0 (mod 2). (It could also be called
the “mod 2 difference” because a + b is the same
a − b, mod 2).

4. The mod 2 sum occurs in many homes where


two switches p, q control the same light. The
truth value of p ∨ q tells whether the light is on
or not, and the light can be switched to the op-
posite state by switching the value of either p or
q.

Questions
4.1 Which of the following are propositions?
1 + 1 = 3, 1 + 1, 3 divides 7, 3÷7

4.2 Let f be the proposition “foo” and let b be


the proposition “bar”. Write the following
propositions in symbols, using f, b, → and
¬.

ˆ if foo, then bar.


ˆ bar if foo.
ˆ bar only if foo.
ˆ foo implies not bar.
ˆ foo is sufficient for bar.
ˆ foo is necessary for bar.

4.3 In the following examples, is the “or” in-


tended to be inclusive or exclusive?

ˆ Would you like coffee or tea?


ˆ Oranges or lemons are a good source
of vitamin C.
ˆ He will arrive in a minute or two.
Lesson 5: Tautologies and logical equivalence

A major problem in logic is to recognise 5.2 Logical equivalence


statements that are “always true” or “always
false”.
Sentences ϕ and ψ are logically equivalent if they
are the same truth function, which also means
5.1 Tautologies and contradictions ϕ ↔ ψ is a tautology. This relation between
sentences is written ϕ ⇔ ψ or ϕ ≡ ψ.
A sentence ϕ in propositional logic is a formula
with variables p, q, r, . . . which can take the val-
ues T and F. The possible interpretations of ϕ
are all possible assignments of values to its vari-
ables.

Example. p → q ≡ (¬p) ∨ q
A sentence in propositional logic is We know p → q has the truth table
ˆ a tautology if it has value T under all
interpretations; p q p→q
ˆ a contradiction if it has value F under T T T
all interpretations. T F F
F T T
We can check whether ϕ is a tautology, a F F T
contradiction, or neither by computing its value Now (¬p) ∨ q has the truth table
for all possible values of its variables.
p q ¬p (¬p) ∨ q
Example. (¬p) ∨ p is a tautology. T T F T
The truth table for (¬p) ∨ p is T F F F
p ¬p (¬p) ∨ p F T T T
T F T F F T T
F T T So p → q and (¬p) ∨ q have the same truth
So (¬p)∨p has value T under all interpretations, table (looking just at their columns). It follows
and thus is a tautology. (It is sometimes known from this that p → q can always be rewritten
as the law of the excluded middle). as (¬p) ∨ q. In fact, all truth functions can be
expressed in terms of ∧, ∨, and ¬.
We can similarly compute the values of any
truth function ϕ, so this is an algorithm for
recognising tautologies. However, if ϕ has n
variables, they have 2n sets of values, so the
amount of computation grows rapidly with n.
One of the biggest unsolved problems of logic This is like finding identities in algebra – one
and computer science is to find an efficient algo- uses known equivalences to rearrange, expand
rithm for recognising tautologies. and simplify.
5.3 Useful equivalences brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we
can write either side as p ∨ q ∨ r. This is like
The following equivalences are the most fre- p + (q + r) = (p + q) + r = p + q + r in ordinary
quently used in this “algebra of logic”. algebra.

Equivalence law 3. The distributive laws are used to “expand”


p ↔ q ≡ (p → q) ∧ (q → p) combinations of ∧ and ∨.
Implication law p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p → q ≡ (¬p) ∨ q is like
Double Negation law p(q + r) = pq + pr.
¬¬p ≡ p
The other distributive law is not like anything
Idempotent laws in ordinary algebra.
p∧p≡p
4. Some of these laws are redundant, in the sense
p∨p≡p
that other laws imply them. For example, the
Commutative laws absorption law
p∧q ≡q∧p
p ∧ (p ∨ q) ≡ p
p∨q ≡q∨p
follows from the distributive, idempotent, iden-
Associative laws
tity and annihilation laws:
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∧ (p ∨ q) ≡ (p ∧ p) ∨ (p ∧ q)
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
by distributive law
Distributive laws
≡ p ∨ (p ∧ q)
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r) by idempotent law
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r) ≡ (p ∧ T) ∨ (p ∧ q)
De Morgan’s laws by identity law
¬(p ∧ q) ≡ (¬p) ∨ (¬q) ≡ p ∧ (T ∨ q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q) by distributive law
Identity laws ≡ p∧T
p∧T≡p by annihilation law
p∨F≡p ≡ p
by identity law
Annihilation laws
p∧F≡F
Questions
p∨T≡T
Inverse laws 5.1 Explain why there are 8 ways to assign
truth values to variables p, q, r; 16 ways
p ∧ (¬p) ≡ F
to assign truth values to variables p, q, r, s;
p ∨ (¬p) ≡ T and in general 2n ways to assign truth val-
Absorption laws ues to n variables.
p ∧ (p ∨ q) ≡ p
5.2 Use truth tables to verify the de Morgan’s
p ∨ (p ∧ q) ≡ p laws and absorption laws.

5.3 If p ∨ q is the “exclusive or” discussed last


Remarks
lesson, see whether it satisfies the distribu-
1. The commutative laws are used to rear- tive laws
range terms, as in ordinary algebra. The law
p ∨ q ≡ q ∨ p is like p + q = q + p in ordinary p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
algebra, and p ∧ q ≡ q ∧ p is like pq = qp. p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)

2. The associative laws are used to remove


Lesson 6: Rules of inference

Last time we saw how to recognise tautolo- Example. The contrapositive of


gies and logically equivalent sentences by com-
“If it’s a bird then it has feathers.”
puting their truth tables. Another way is to in-
fer new sentences from old by rules of inference. is
“If it doesn’t have feathers, then it’s not a bird.”
6.1 Replacement
The contrapositive has the same meaning as the
Any sentence may be replaced by a logically original statement.
equivalent sentence. Any series of such replace-
ments therefore leads to a sentence equivalent to
Example. On the other hand, the negation of
the one we started with.
the statement
Using replacement is like the usual method
of proving identities in algebra – make a series “If it’s a bird then it has feathers.”
of replacements until the left hand side is found is
equal to the right hand side.
“It’s a bird and it doesn’t have feathers.”
This is (roughly speaking) the opposite of the
Example. Prove that x → y ≡ (¬y) → (¬x).
original statement. Note that the negation of
an “implies” statement is an “and” statement,
x → y ≡ (¬x) ∨ y not another “implies” statement.
by implication law
≡ y ∨ (¬x)
by commutative law 6.3 Using logic laws
≡ (¬¬y) ∨ (¬x)
Example. Prove that p → (q → p) is a tautol-
by law of double negation ogy.
≡ (¬y) → (¬x)
by implication law p → (q → p)
≡ (¬p) ∨ (q → p)
6.2 Contrapositives
by implication law
≡ (¬p) ∨ ((¬q) ∨ p)
x → y ≡ (¬y) → (¬x)
by implication law
(¬y) → (¬x) is the contrapositive of x → y.
≡ (¬p) ∨ (p ∨ (¬q))
by commutative law
Example. The contrapositive of
≡ ((¬p) ∨ p) ∨ (¬q)
MCG flooded → cricket is off
by associative law
is ≡ (p ∨ (¬p)) ∨ (¬q)
Cricket is on → MCG not flooded. by commutatve law
An implication and its contrapositive are ≡ T ∨ (¬q) by inverse law
equivalent: they mean the same thing! ≡ T by annihilation law
Example. Prove that ((p → q) ∧ p) → q is a Questions
tautology.
6.1 The slogan “no pain, no gain” stands for
an implication. What is it?
((p → q) ∧ p) → q
6.2 What is the contrapositive of “no pain, no
≡ ¬((p → q) ∧ p) ∨ q
gain”?
by implication law
≡ (¬(p → q) ∨ (¬p)) ∨ q 6.3 Write the following sentences as implica-
by de Morgan’s law tions, and then write their contrapositives.

≡ ¬(p → q) ∨ ((¬p) ∨ q) ˆ You can’t make an omelette without


by associative law breaking eggs.
≡ ¬(p → q) ∨ (p → q) ˆ If n is even, so is n2
by implication law ˆ Haste makes waste.
≡ (p → q) ∨ ¬(p → q)
6.4 Show that p → (q → (r → p)) is a tautol-
by commutative law ogy using the laws of logic.
≡ T by inverse law
6.5 Find a tautology with n variables which
This tautology says that “if p implies q and p is
is p → (q → p) for n = 2 and p →
true then q is true”.
(q → (r → p)) for n = 3.

6.4 Logical consequence

A sentence ψ is a logical consequence of a sen-


tence ϕ, if ψ = T whenever ϕ = T. We write
this as ϕ ⇒ ψ.

It is the same to say that ϕ → ψ is a tautol-


ogy, but ϕ ⇒ ψ makes it clearer that we are dis-
cussing a relation between the sentences ϕ and
ψ.
Any sentence ψ logically equivalent to ϕ is
a logical consequence of ϕ, but not all conse-
quences of ψ are equivalent to it.

Example. p ∧ q ⇒ p
p is a logical consequence of p ∧ q, because p = T
whenever p ∧ q = T. However, we can have
p ∧ q = F when p = T (namely, when q = F).
Hence p ∧ q and p are not equivalent.
This example shows that ⇒ is not symmet-
ric:
(p ∧ q) ⇒ p but p ⇏ (p ∧ q)
This is where ⇒ differs from ≡, because if ϕ ≡ ψ
then ψ ≡ ϕ.
In fact, we build the relation ≡ from ⇒ the
same way ↔ is built from →:
ϕ ≡ ψ means (ϕ ⇒ ψ) and (ψ ⇒ ϕ).
Lesson 7: Predicates and quantifiers

We get a more expressive language than Another way is to use quantifiers:


propositional logic by admitting predicates like ˆ ∀ (meaning “for all”) and
ˆ ∃ (meaning “there exists” or “there is”).
P (n), Q(x, y), R(a, b, c)
These stand for properties or relations such as
Example. ∃nP (n) is the (true) sentence
P (n) : n is prime
Q(x, y) : x ⩽ y there exists an n such that n is prime.
R(a, b, c) : a + b = c. ∀nP (n) is the (false) sentence
Those with one variable, such as “n is prime,” for all n, n is prime.
are usually called properties, while those with
two or more variables, such as “x ⩽ y,” are usu-
ally called relations. Note that when ∃n is read “there exists an
n” we also add a “such that.”

7.1 Predicates
A predicate such as “n is prime” is not a proposi- 7.3 Quantifiers and connectives
tion because it is neither true nor false. Rather,
it is a function P (n) of n with the Boolean val- We can also combine quantifiers with connec-
ues T (true) or F (false). In this case, P (n) is a tives from propositional logic.
function of natural numbers defined by
(
T if n is prime Example. Let Sq(n) be the predicate “n is a
P (n) =
F otherwise. square,” and let P os(n) be the predicate “n is
Similarly, the “x ⩽ y” predicate is a function of positive” as above. Then we can symbolise the
pairs of real numbers, defined by following sentences:
( There is a positive square:
T if x ⩽ y
R(x, y) = ∃n(P os(n) ∧ Sq(n)).
F otherwise.
Since most of mathematics involves properties There is a positive integer which is not a
and relations such as these, only a language with square:
predicates is adequate for mathematics (and ∃n(P os(n) ∧ ¬Sq(n))
computer science).
All squares are positive:
∀n(Sq(n) → P os(n))
7.2 Building sentences from predi-
cates
Notice that the “All. . . are” combination in
One way to create a sentence from a predicate English actually involves an implication. This
is to replace its variables by constants. For ex- is needed because we are making a claim only
ample, when P (n) is the predicate “n is prime,” about squares and the implication serves to
P (3) is the sentence “3 is prime.” “narrow down” the set we are examining.
7.4 Alternating quantifiers Remark. Another way to say “you can’t fool
all of the people all of the time” is
Combinations of quantifiers like ∀x∃y . . . , “for
all x there is a y . . .” are common in mathe- ∃p∃t¬F (p, t).
matics, and can be confusing. It helps to have
some examples in mind to recall the difference Questions
between ∀x∃y . . . and ∃y∀x . . . 7.1 Write “roses are red” in the language of
The relation x < y is convenient to illustrate predicate logic, using
such combinations; we write x < y as the pred-
icate L(x, y) rose(x) for “x is a rose”
Then red(x) for “x is red.”
∀x∃yL(x, y) 7.2 If P (n) stands for “n is prime” and E(n)
is the (true) sentence stands for “n is even,” what does P (n) ∧
(¬E(n)) say about n?
for all x there is a y such that x < y,
which says that there is no greatest number. 7.3 Using the predicates
But with the opposite combination of quan- pol(x) for “x is a politician”
tifiers we have
liar(x) for “x is a liar”
∃y∀xL(x, y) represent the following statements in logic.
is the false sentence
ˆ all politicians are liars
there is a y such that for all x, x < y,
ˆ some politicians are liars
which says there is a number greater than all
ˆ no politicians are liars
numbers.
Even though these statements are usually ˆ some politicians are not liars.
written without brackets they are effectively
Are any of these sentences logically equiv-
bracketed “from the centre”. So ∀x∃yL(x, y)
alent?
means ∀x(∃yL(x, y)) and ∃y∀xL(x, y) means
∃y(∀xL(x, y)).

7.5 An example from Abraham Lin-


coln

You can fool all of the people some of the time


and
you can fool some of the people all of the time
but
you can’t fool all of the people all of the time.
Let F (p, t) be the predicate:
person p can be fooled at time t.
Then
∀p∃tF (p, t) says
you can fool all of the people some of the time,
∃p∀tF (p, t) says
you can fool some of the people all of the time,
¬∀p∀tF (p, t) says
you can’t fool all of the people all of the time.
Hence Lincoln’s sentence in symbols is:
∀p∃tF (p, t) ∧ ∃p∀tF (p, t) ∧ ¬∀p∀tF (p, t)
Lesson 8: Predicate logic

8.1 Valid sentences Example. The sentence


The language of predicate logic is based on ∀x∃yQ(x, y) → ∃x∀yQ(x, y)
predicate symbols, variables, constants, brack- is false if we interpret Q(x, y) as x ⩽ y on the
ets, ∀, ∃ and connectives. The examples from real numbers. With this interpretation
last lesson illustrate how these ingredients are
used to form sentences. ∀x∃yQ(x, y) is true
(for any number there is a larger number), but
A sentence in predicate logic is valid if it has
∃x∀yQ(x, y) is false
value T under all interpretations.
(there is no number ⩽ all numbers). Hence the
implication is false.
This is similar to the definition of a tautology
in propositional logic. But now “all interpreta-
tions” means all interpretations of the predicate
symbols, which is more complicated. The inter- 8.4 Consequence and equivalence
pretation of a symbol P (n), say, must include
both the range of the variable n, as well as say- As in propositional logic, a sentence ψ is a logi-
ing those n for which P (n) is true. cal consequence of a sentence ϕ if any interpre-
tation which makes ϕ true makes ψ true. Again
8.2 Interpretations we write ϕ ⇒ ψ if ψ is a consequence of ϕ, and
this is the same as saying ϕ → ψ is valid.
For example, one interpretation of P (n) is “n is
positive,” where n ranges over the real numbers.
Under this interpretation, ∀nP (n) is false. Example. Any interpretation which makes
A different interpretation of P (n) is “n is ∀x∀yP (x, y) true makes ∀y∀xP (x, y) true, and
positive,” where n ranges over the numbers > 2. so ∀x∀yP (x, y) ⇒ ∀y∀xP (x, y).
Under this interpretation, ∀nP (n) is true.
Unlike in propositional logic, there are in- Similarly, sentences ψ and ϕ are equivalent,
finitely many different interpretations of each written ψ ≡ ϕ, if each is a consequence of the
formula. Thus there is no truth table method other. Some sentences are equivalent for “propo-
for predicate logic. We cannot decide whether a sitional logic reasons.”
formula is valid by testing all interpretations.

8.3 Recognising valid sentences Example. We have

Nevertheless, in some cases, we can see that a ∀x(P (x) ∧ Q(x)) ≡ ∀x(Q(x) ∧ P (x))
sentence is true for all interpretations. simply because

Example. ∀x∀yP (x, y) → ∀y∀xP (x, y) is true P (x) ∧ Q(x) ≡ Q(x) ∧ P (x)
for all properties P , and hence is valid. for any x.
Likewise, we can sometimes see that a sen-
tence is not valid by finding an interpretation However there are also equivalences that
which makes it false. genuinely involve quantifiers.
8.5 Useful equivalences 8.7* Completeness and undecidabil-
ity
Two important equivalences involving quanti-
fiers are In 1930, Gödel proved that there is a complete
set of rules of inference for predicate logic. This
means, in particular, that there is an algorithm
to list the valid sentences.
¬∀xP (x) ≡ ∃x¬P (x) However, in 1936, Church and Turing proved
that there is no algorithm to list the logically
¬∃xP (x) ≡ ∀x¬P (x)
false sentences. This means, in particular, that
predicate logic is undecidable: there is no algo-
These make sense intuitively. For example, rithm which, for any sentence ϕ, decides whether
¬∀xP (x) means P (x) is not true for all x, hence ϕ is valid or not.
there is an x for which P (x) is false, that is, This negative result is due to the power of
∃x¬P (x). predicate logic: it can express all mathemati-
They can also be viewed as “infinite De Mor- cal or computational problems, and it is known
gan’s laws.” If x ranges over {1, 2, 3, . . .} for ex- that some of these problems cannot be solved by
ample, then algorithm.
∀xP (x) ≡ P (1) ∧ P (2) ∧ P (3) ∧ · · ·
Questions
and
8.1 Give interpretations which make the fol-
∃xP (x) ≡ P (1) ∨ P (2) ∨ P (3) ∨ · · ·
lowing sentences false.
Hence
∃nP (n) → ∀nP (n)
¬∀xP (x) ≡ ¬ (P (1) ∧ P (2) ∧ P (3) ∧ · · · )
∀x∀y (R(x, y) → R(y, x))
≡ (¬P (1)) ∨ (¬P (2)) ∨ (¬P (3)) ∨ · · ·
∀m∃nS(m, n)
by de Morgan’s law
≡ ∃x¬P (x). 8.2 Give interpretations which show that the
sentences
And similarly
∃x (P (x) ∧ L(x))
¬∃xP (x) ≡ ¬ (P (1) ∨ P (2) ∨ P (3) ∨ · · · )
and
≡ (¬P (1)) ∧ (¬P (2)) ∧ (¬P (3)) ∧ · · ·
by de Morgan’s law ∃x (P (x) ∧ ¬L(x))
≡ ∀x¬P (x). are not equivalent.

8.3 Is ∃y∀xR(x, y) a logical consequence of


∀x∃yR(x, y)?
If so, explain why. If not, give an interpre-
8.6 Simplification tation which makes ∀x∃yR(x, y) true and
∃y∀xR(x, y) false.
The infinite de Morgan’s laws allow a certain
simplification of predicate formulas by “pushing 8.4 Is ∀x∃yR(x, y) a logical consequence of
¬ inside quantifiers.” ∃y∀xR(x, y)?
If so, explain why. If not, give an interpre-
tation which makes ∃y∀xR(x, y) true and
Example. ∀x∃yR(x, y) false.
¬∀x∃yQ(x, y) ≡ ∃x¬∃yQ(x, y)
≡ ∃x∀y¬Q(x, y). 8.5 Explain why ¬∀p∀tF (p, t) ≡ ∃p∃t¬F (p, t).

It is in fact possible to transform any quanti-


fied statement in predicate logic to an equivalent
with all quantifiers at the front.
Lesson 9: Mathematical induction

Since the natural numbers 0, 1, 2, 3, . . . are as required. This completes the induction step,
generated by a process which begins with 0 and and hence completes the proof.
repeatedly adds 1, we have the following. Example 2. Prove there are 2n n-bit binary
strings.

Property P is true for all natural numbers if Let P (n) be “there are 2n n-bit binary strings”.
1. P (0) is true. Base step. There is 20 = 1 0-bit binary string
2. P (k) ⇒ P (k + 1) for all k ∈ N. (the empty string) so P (0) is true.

This is called the principle of mathematical Induction step. We want to prove that
induction.
It is used in a style of proof called proof by there are 2k k-bit binary strings
induction, which consists of two steps. ⇒ there are 2k+1 (k + 1)-bit binary strings
Base step: Proof that the required property P Well, a (k + 1)-bit binary string is either W 0 or
is true for 0. W 1, where W is any k-bit binary string. Thus
Induction step: Proof that if P (k) is true if there are 2k k-bit binary strings W , there are
then P (k + 1) is true, for each k ∈ N. 2 × 2k = 2k+1 (k + 1)-bit binary strings.
This completes the induction step, and hence
completes the proof.
9.1 Examples

Example 1. Prove that 3 divides n3 + 2n for 9.2 Starting the base step higher
all n ∈ N
It is not always appropriate to start the induc-
Let P (n) be “3 divides n3 + 2n”. tion at 0. Some properties are true only from a
Base step. 3 divides 03 + 2 × 0 = 0, so P (0) is certain positive integer upwards, in which case
true. the induction starts at that integer.

Induction step. We want to prove Example 3. Prove n! > 2n for all integers
3 divides k 3 + 2k n⩾4
⇒ 3 divides (k + 1)3 + 2(k + 1). Let P (n) be “n! > 2n ”.
Well, Base step. 4! = 4 × 3 × 2 = 24 > 16 = 24 , so
P (4) is true.
(k + 1)3 + 2(k + 1)
= k 3 + 3k 2 + 3k + 1 + 2k + 2 Induction step. We want to prove k! > 2k ⇒
(k + 1)! > 2k+1 for all integers k ⩾ 4.
= k 3 + 2k + 3k 2 + 3k + 3
Now, for k ⩾ 4, if k! > 2k ,
= k 3 + 2k + 3(k 2 + k + 1).
(k+1)! = (k+1)×k! > (k+1)×2k > 2×2k = 2k+1 .
Therefore,
(The first > holds because we are assuming
3 divides k 3 + 2k k! > 2k and the second holds because k ⩾ 4.)
⇒ 3 divides k 3 + 2k + 3(k 2 + k + 1) Thus k! > 2k ⇒ (k + 1)! > 2k+1 , as required to
⇒ 3 divides (k + 1)3 + 2(k + 1) complete the induction.
So n! > 2n for all n ⩾ 4. Remark. Another proof is to write down
1 + 2 + 3 + · · · + (n − 1) + n
Example 4. Prove any integer value n ⩾ 8 (in n + (n − 1) + · · · + 3 + 2 + 1
cents) is obtainable with 3c and 5c stamps. and observe that each of the n columns sums
to n + 1. Thus the sum of twice the series is
Let P (n) be “n cents is obtainable with 3c and
n(n + 1), and hence the sum of the series itself
5c stamps”.
is n(n + 1)/2. One could argue that this proof
Base step. 8c can be obtained by a 3c plus a 5c uses induction stealthily, to prove that the sum
stamp. So P (8) is true. of each column is the same.
Induction step. We have to show that if k cents
is obtainable, so is (k + 1) cents, when k ⩾ 8. Questions
Case 1. The k cents is obtained using a 5c
In most induction problems set for students we
stamp (among others). Replace the 5c stamp by
skip the experimental part, which is finding what
two 3c stamps, thus obtaining k + 1 cents.
to prove. Before trying to prove that 3 divides
Case 2. If the k cents is obtained using only
n3 + 2n, for example, someone needs to guess
3c stamps, there are at least three of them (since
that it is true, perhaps by trying n = 1, 2, 3, 4.
k ⩾ 8). In this case, replace three 3c stamps by
two 5c stamps, again obtaining k + 1 cents. 9.1 In this question, try to guess what ? stands
Thus in either case, when k ⩾ 8, P (k) ⇒ for, by trying a few values of n.
P (k + 1). This completes the induction proof
that n cents are obtainable from 3c and 5c ˆ ? divides n2 + n
stamps, for all integers n ⩾ 8. ˆ The sum of the first n odd numbers
is ?
ˆ 1
1×2
1
+ 2×3 1
+ 3×4 1
+ . . . + n(n+1) = 1−?
9.3 Sums of series 9.2 If you correctly guessed the sum
1 1 1 1
Induction is often used to prove that sum for- 1×2 + 2×3 + 3×4 + ... + n(n+1) ,
mulas are correct. you might wonder why it is so simple.
Here is a clue:
Example 5. Prove 1+2+3+· · ·+n = n(n+1)/2 1
= 1
− 12 .
1×2 1
for all integers n ⩾ 1.
1 1
Let P (n) be “1 + 2 + 3 + · · · + n = n(n + 1)/2”. What is 2×3 ? 3×4 ?
How does this lead to a simple formula for
Base step. When n = 1, the left hand side is 1,
1 1 1 1
and the right hand side is 1(1 + 1)/2 = 2/2 = 1, 1×2 + 2×3 + 3×4 + ... + n(n+1) ?
so P (1) is true.
OK, if we can guess formulas correctly,
Induction step. We have to prove that why bother proving them by induction?
1 + 2 + ··· + k = k(k+1) The reason is that a statement which fits
2
(k+1)(k+2) many values of n can still be wrong.
⇒ 1 + 2 + · · · + k + (k + 1) = 2 .
Now, if P (k) is true, 9.3 Show that n2 + n + 41 is a prime number
for n = 1, 2, 3, 4 (and go further, if you
1 + 2 + · · · + k + (k + 1) like). Do you think n2 + n + 41 is prime
= (1 + 2 + · · · + k) + (k + 1) for all natural numbers n?
k(k+1)
= 2 + (k + 1) using P (k)
= (k + 1)( k2 + 1)
(k+1)(k+2)
= 2
as required.
This completes the induction proof.
Lesson 10: Induction and well-ordering

In the previous lesson we were able to prove Example 2. Prove that every positive integer
a property P holds for 0, 1, 2, . . . as follows: is a sum of distinct powers of 2. (Just a power
Base step. Prove P (0) of two by itself counts as a “sum”.)
Induction step. Prove P (k) ⇒ P (k + 1) for
The idea behind this proof is to repeatedly sub-
each natural number k.
tract the largest possible power of 2. We illus-
This is sufficient to prove that P (n) holds for
trate with the number 27.
all natural numbers n, but it may be difficult to
prove that P (k + 1) follows from P (k). It may 27 − largest power of 2 less than 27
in fact be easier to prove the induction step = 27 − 16 = 11
11 − largest power of 2 less than 11
= 11 − 8 = 3
3 − largest power of 2 less than 3
P (0) ∧ P (1) ∧ · · · ∧ P (k) ⇒ P (k + 1).
=3−2=1
Hence 27 = 16 + 8 + 2 + 1 = 24 + 23 + 21 + 20 .
That is, it may help to assume P holds for (It is only interesting to find distinct powers
all numbers before k + 1. Induction with this of 2, because of course each integer ⩾ 1 is a sum
style of induction step is sometimes called the of 1s, and 1 = 20 .)
strong form of mathematical induction. The strong induction proof goes as follows.
Let P (n) be “n is a sum of distinct powers of
2”.
Base step. 1 = 20 , so 1 is a sum of (one) power
10.1 Examples of strong induction of 2. Thus P (1) is true.
Induction step. Suppose each of the numbers
Example 1. Prove that every integer ⩾ 2 is a 1, 2, 3, . . . , k is a sum of distinct powers of 2. We
product of primes. (Just a prime by itself counts wish to prove that k +1 is a sum of distinct pow-
as a “product”.) ers of 2.
Let P (n) be “n is a product of primes”. This is certainly true if k + 1 is a power of 2. If
not, let 2j be the greatest power of 2 less than
Base step. 2 is a prime, hence a product of (one) k + 1. Then
prime. So P (2) is true.
i = k + 1 − 2j
Induction step. Suppose 2, 3, . . . , k are products
is one of the numbers 1, 2, 3, ..., k, and hence it
of primes. We wish to prove that k +1 is a prod-
is a sum of distinct powers of 2.
uct of primes.
Also, the powers of 2 that sum to i are all
This is certainly true if k + 1 is a prime. If not
less than 2j , otherwise 2j is less than half k + 1,
k + 1 = i × j, contrary to the choice of 2j as the largest power
for some natural numbers i and j less than k +1. of 2 less than k + 1.
But then i and j are products of primes by our Hence k + 1 = 2j + powers of 2 that sum to
assumption, hence so is i × j = k + 1. i is a sum of distinct powers of 2.
This completes the induction proof. This completes the induction proof.

10.2 Well-ordering and descent But then 2 = m1 /n1 , and we can repeat
the argument to show that m1 and n1 are both
Induction expresses the fact that each natural
even, so m1 = 2m2 and n1 = 2n2 , and so on.
number n can be reached by starting at 0 and
Since the argument can be repeated indefi-
going upwards (e.g. adding 1) a finite number
nitely, we get an infinite descending sequence of
of times.
natural numbers
Equivalent facts are that it is only a finite
number of steps downwards from any natural m > m1 > m2 > · · ·
number to 0, that any descending sequence of which is impossible.
natural numbers is finite, and that any set of Hence
√ there are no natural numbers m and
natural numbers has a least element. n with 2 = m/n.
This property is called well-ordering of the
natural numbers. It is often convenient to ar- Questions
range a proof to “work downwards” and appeal
to well-ordering by saying that the process of 10.1 For each of the following statements, say
working downwards must eventually stop. which is likely to require strong induction
Such proofs are equivalent to induction, for its proof.
though they are sometimes called “infinite de- an+1 −1
ˆ 1 + a + a2 + · · · + an = a−1
scent” or similar.
ˆ ¬ (p1 ∨ p2 ∨ p3 ∨ · · · ∨ pn ) ≡ (¬p1 ) ∧
(¬p2 ) ∧ (¬p3 ) ∧ · · · ∧ (¬pn )
10.3 Proofs by descent
ˆ Each fraction m n
< 1 is a sum of
Example 1. Prove that any integer ⩾ 2 has a distinct fractions with numerator 1
prime divisor. for example, 11 1 1 1

12 = 2 + 3 + 12 .
If n is prime, then it is a prime divisor of itself.
10.2 There is something else which tells you ev-
If not, let n1 < n be a divisor of n.
ery integer ⩾ 1 is a sum of distinct powers
If n1 is prime, it is a prime divisor of n. If
of 2. What is it?
not, let n2 < n1 be a divisor of n1 (and hence of
n). 10.3 Is every integer ⩾ 1 a sum of distinct pow-
If n2 is prime, it is a prime divisor of n. If ers of 3?
not, let n3 < n2 be a divisor of n2 , etc.
The sequence n > n1 > n2 > n3 > · · · must
eventually terminate, and this means we find a
prime divisor of n.

Example 2. Prove 2 is irrational.

Suppose that 2 = m/n for natural numbers m
and n. We will show this is impossible. Since the
square of an odd number is odd, we can argue
as follows

2 = m/n
⇒ 2 = m2 /n2 squaring both sides
⇒ m2 = 2n2
⇒ m2 is even
⇒ m is even
since the square of an odd number is odd
⇒ m = 2m1 say
⇒ 2n2 = m2 = 4m21
⇒ n2 = 2m21
⇒ n is even, = 2n1 say
Lesson 11: Sets

Sets are vital in expressing mathematics for- 11.2 Universal set


mally and are also very important data struc-
The idea of a “set of all sets” leads to logical
tures in computer science.
difficulties. Difficulties are avoided by always
A set is basically just an unordered collection
working within a local “universal set” which in-
of distinct objects, which we call its elements or
cludes only those objects under consideration.
members. Note that there is no notion of order
For example, when discussing arithmetic it
for a set, even though we often write down its
might be sufficient to work just with the num-
elements in some order for convenience. Also,
bers 0, 1, 2, 3, . . .. Our universal set could then
there is no notion of multiplicity: an object is
be taken as
either in a set or not – it cannot be in the set
multiple times. N = {0, 1, 2, 3, . . .},
and other sets of interest, e.g. {x : x is prime},
Sets A and B are equal when every element are parts of N.
of A is an element of B and vice-versa.
11.3 Subsets

11.1 Set notation We say that A is a subset of B and write


A ⊆ B when each element of A is an element
of B.
ˆ x ∈ S means x is an element of set S.

ˆ {x1 , x2 , x3 , . . .} is the set with elements


Example. The set of primes forms a subset of
x1 , x2 , x3 , . . . .
N, that is {x : x is prime} ⊆ N.
ˆ {x : P (x)} is the set of all x with property
P. 11.4 Characteristic functions
A subset A of B can be specified by its charac-
Example. teristic function χA , which tells which elements
17 ∈ {x : x is prime} = {2, 3, 5, 7, 11, 13, . . .} of B are in A and which are not.
{1, 2, 3} = {3, 1, 2} (
{1, 1, 1} = {1} 1 if x ∈ A
χA (x) =
0 if x ∈ /A
Sometimes it is more convenient to use a
slight variation on the colon notation given Example. The subset A = {a, c} of B =
above. For a set S and a property P , we {a, b, c} has the characteristic function χA with
sometimes write {x ∈ S : P (x)} instead of χA (a) = 1, χA (b) = 0, χA (c) = 1.
{x : x ∈ S and P (x)}.
We also write this function more simply as

For a finite set S, we write |S| for the number a b c


of elements of S. 1 0 1
In fact we can list all characteristic functions subset
on {a, b, c}, and hence all subsets of {a, b, c}, by
{0, 2, 4, 6, . . .} = {n ∈ N : n is even}
listing all sequences of three binary digits:
corresponds to the property of being even.
characteristic function subset Similarly, the set
a b c {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .}
0 0 0 {} corresponds to the property of being prime. The
0 0 1 {c} power set P(N) corresponds to all possible prop-
0 1 0 {b} erties of natural numbers.
0 1 1 {b, c}
11.7* What are numbers?
1 0 0 {a}
1 0 1 {a, c} The most common approach to building mathe-
1 1 0 {a, b} matics up from logical foundations considers all
mathematical objects to be fundamentally made
1 1 1 {a, b, c}
of sets. One simple way to define numbers using
sets (due to von Neumann) is the following.
We could similarly list all the subsets of a
0 = {}
four-element set, and there would be 24 = 16 of
them, corresponding to the 24 sequences of 0s 1 = {0}
and ls. 2 = {0, 1}
In the same way, we find that an n-element ..
.
set has 2n subsets, because there are 2n binary
n + 1 = {0, 1, 2, . . . , n}
sequences of length n. (Each of the n places in
the sequence can be filled in two ways.) We are not going to use this definition in this
course. Still, it is interesting that numbers can
11.5 Power set be defined in such a simple way.

The set of all subsets of a set U is called the Questions


power set P(U ) of U . 11.1 Suppose E(x) stands for “x is even” and
F (x) stands for “5 divides x.”
Example. We see from the previous table that ˆ What is the set {x : E(x) ∧ F (x)}?
P({a, b, c}) is the set
ˆ Write a formula using E(x) and
{}, {c}, {b}, {b, c}, {a}, {a, c}, {a, b}, {a, b, c} . F (x) which describes the set
{5, 15, 25, 35, . . .}.
If U has n elements, then P(U ) has 2n ele-
11.2 How many subsets does the set
ments.
{2, 5, 10, 20} have?
(The reason P(U ) is called the “power” set 11.3 Consider the infinitely many sets
is probably that the number of its elements is
this power of 2. In fact, the power set of U is {x : 0 < x < 1}
sometimes written 2U .) {x : 0 < x < 12 }
{x : 0 < x < 13 }
11.6 Sets and properties {x : 0 < x < 14 }
..
We mentioned at the beginning that {x : P (x)} .
stands for the set of objects x with property P . Do they have any element in common?
Thus sets correspond to properties.
Properties of the natural numbers
0, 1, 2, 3, . . . , for example, correspond to sub-
sets of the set N = {0, 1, 2, 3, . . .}. Thus the
Lesson 12: Operations on sets

There is an “arithmetic” of sets similar to or- 12.4 Difference A − B


dinary arithmetic. There are operations similar
The difference A − B of sets A and B consists of
to addition, subtraction and multiplication.
the elements in A and not in B, indicated by the
shaded region in the following Venn diagram.
12.1 Venn diagrams
The simple operations on sets can be visualised U
with the help of Venn diagrams, which show sets A B
A, B, C, . . . as disks within a rectangle represent-
ing the universal set U .

U
A B The difference U − B relative to the univer-
sal set U is called the complement B of B. Here
is the Venn diagram of B.

U
A B
12.2 Union A ∪ B
The union A ∪ B of sets A and B consists of
the elements in A or B, and is indicated by the
shaded region in the following Venn diagram.

U 12.5 Symmetric difference A△B


A B The union of A − B and B − A is called the
symmetric difference A△B of A and B.

U
A B

12.3 Intersection A ∩ B
The intersection A ∩ B of sets A and B consists
of the elements in A and B, indicated by the
shaded region in the following Venn diagram. A△B consists of the elements of one of A, B
but not the other.
U It is clear from the diagram that we have not
A B only
A△B = (A − B) ∪ (B − A),
but also
A△B = (A ∪ B) − (A ∩ B).
12.6 Ordered Pairs area l × w. In fact, we call it an “l × w rectan-
gle.” This is probably the reason for using the
Sometimes we do want order to be important.
× sign, and for calling A × B a “product.”
In computer science arrays are ubiquitous ex-
amples of ordered data structures. In maths,
ordered pairs are frequently used. An ordered
Questions
pair (a, b) consists simply of a first object a and 12.1 Draw a Venn diagram for A ∩ B. What is
a second object b. The objects a and b are some- another name for this set?
times called the entries or coordinates of the or-
dered pair. 12.2 Check the de Morgan laws by drawing
Venn diagrams for A ∪ B, A ∩ B, A ∩ B
and A ∪ B
Two ordered pairs (a, b) and (c, d) are equal
if and only if a = c and b = d. 12.3 Find which of the following is true by
drawing suitable Venn diagrams.
Example. {0, 1} = {1, 0} but (0, 1) ̸= (1, 0). A ∩ (B△C) = (A ∩ B)△(A ∩ C)?
A△(B ∩ C) = (A△B) ∩ (A△C)?
There’s no reason we need to stop with pairs.
We can similarly define ordered triples, quadru- 12.4 If plane = line × line, what do you think
ples, and so on. When there are k coordinates, line × circle is? What about circle × cir-
we call the object an ordered k-tuple. Two or- cle?
dered k-tuples are equal if and only if their ith
coordinates are equal for i = 1, 2, . . . , k.

12.7 Cartesian product A × B

The set of ordered pairs


A × B = {(a, b) : a ∈ A and b ∈ B}
is the cartesian product of sets A and B.

The commonest example is where A = B =


R (the set of real numbers, or the number line).
Then the pairs (a, b) are points in the plane, so
R × R is the plane.

(a, b)
b

O a

Because Descartes used this idea in geome-


try, the cartesian product is named after him.

12.8 A × B and multiplication


If A has |A| elements and B has |B| elements,
then A × B has |A| × |B| elements.
Similarly, if L is a line of length l, and W is
a line of length w, then L × W is a rectangle of
Lesson 13: Functions


A function can be thought of as a “black 2. The square root function sqrt(x) = x with
box” which accepts inputs and, for each input, domain R⩾0 , codomain R, and pairs
produces a single output. √
{(x, x) : x ∈ R and x ⩾ 0}.

13.1 Defining functions via sets 2


Formally we represent a function f as a set X
of possible inputs, a set Y so that every out- 1
put of f is guaranteed to be in Y , and a set
of (input,output) pairs from X × Y . The vital
0
property of a function is that each input gives
exactly one output.
-1
A function f consists of a domain X, a -1 0 1 2 3
codomain Y , and a set of ordered pairs from
The image of this function (the set of y values)
X × Y which has exactly one ordered pair
is the set R⩾0 .
(x, y) for each x ∈ X.
When (a, b) is in this set we write f (a) = b. 3. The cubing function cube(x) = x3 with do-
main R, codomain R, and pairs
The set of y values occurring in these pairs is
the image of f . {(x, x3 ) : x ∈ R},

Note that the image of a function is always


a subset of its codomain but they may or may
not be equal.

If the image of a function is equal to its


1
codomain, we say the function is onto.

Examples. 0

1. The squaring function square(x)= x2 with -1


domain R, codomain R, and pairs -1 0 1
2
{(x, x ) : x ∈ R},
which form what we usually call the plot of the
squaring function.
1
The image of this function is the whole of the
0 codomain R, so it is onto.
-1 0 1

The image of this function (the set of y val-


ues) is the set R⩾0 of real numbers ⩾ 0.
13.2 Arrow notation functions are or are not one-to-one.

Example. The function f : R → R given by


If f is a function with domain A and
f (x) = 6x + 2 is one-to-one because
codomain B we write
f (x1 ) = f (x2 )
f : A → B,
⇒ 6x1 + 2 = 6x2 + 2
and we say that f is from A to B.
⇒ 6x1 = 6x2
For example, we could define ⇒ x1 = x2 .
square : R → R. Example. The function f : R → R given
We could also define by f (x) = x2 + 1 is not one-to-one because
f (−1) = 2 and f (1) = 2 and so
square : R → R⩾0 .
f (−1) = f (1).
Likewise, we could define
cube : R → R. Questions
However we could not define 13.1 Some of the following “rules” do not define
⩾0 genuine functions. Which are they?
cube : R → R ,
because for some x ∈ R, cube(x) is negative. ˆ For each set S of natural numbers, let
For example, cube(−1) = −1. f (S) be the least element of S.
ˆ For each set X of real numbers be-
13.3 One-to-one functions tween 0 and 1, let g(X) be the least
element of X.
A function f : X → Y is one-to-one if for ˆ For each circle C in the (x, y) plane,
each y in the image of f there is only one let h(C) be the minimum distance
x ∈ X such that f (x) = y. from C to the x-axis.
For example, the function cube(x) is one-to- ˆ For each pair A, B of sets of real num-
one because each real number y is the cube of bers, let s(A, B) be the smallest set
exactly one real number x. such that both A and B are subsets
The function square: R → R is not one-to- of s(A, B).
one because the real number 1 is the square of ˆ For each pair A, B of sets of real num-
two different real numbers, 1 and −1. (In fact bers, let t(A, B) be the largest set
each real y > 0 is the square of two different real that is a subset of both A and B.
√ √
numbers, y and − y)
On the other hand, square : R⩾0 → R is one- 13.2 For each of the following, say which can be
to-one because each real number y in R⩾0 is the defined with domain R and codomain R.
square of only one real number in R⩾0 , namely √ √
√ x2 , 1/x, log x, x, 3 x
y.
The last example shows that the domain of
a function is an important part of its descrip-
tion, because changing the domain can change
the properties of the function.

13.4 Proving a function is one-to-one


There is an equivalent way of phrasing the def-
inition of one-to-one: a function f : X → Y is
one-to-one when, for all x1 , x2 ∈ X,
f (x1 ) = f (x2 ) ⇒ x1 = x2 .
This can be useful for proving that some
Lesson 14: Examples of functions

The functions discussed in the last lesson 14.3 Characteristic functions


were familiar functions of real numbers. Many
A subset of N = {0, 1, 2, 3, . . .} can be repre-
other examples occur elsewhere, however.
sented by its characteristic function. For exam-
ple, the set of squares is represented by the func-
tion χ : N → {0, 1} defined by
(
14.1 Functions of several variables 1 if n is a square
χ(n) =
0 if n is not a square
We might define a function which has the following sequence of values
sum : R × R → R by sum(x, y) = x + y. 110010000100000010000000010000000000100 . . .
Because the domain of this function is R×R, the (with 1s at the positions of the squares
inputs to this function are ordered pairs (x, y) 0, 1, 4, 9, 16, 25, 36, . . .).
of real numbers. Because its codomain is R, we Any property of natural numbers can like-
are guaranteed that each output will be a real wise be represented by a characteristic function.
number. This function can be thought of as a For example, the function χ above represents
function of two variables x and y. the property of being a square.
Similarly we might define a function Thus any set or property of natural numbers
is represented by a function
binomial : R × R × N → R
χ : N → {0, 1}.
by
Characteristic functions of two or more vari-
binomial(a, b, n) = (a + b)n . ables represent relations between two or more
Here the inputs are ordered triples (x, y, n) such objects. For example, the relation x ⩽ y be-
that x and y are real numbers and n is a natural tween real numbers x and y has the character-
number. We can think of this as a function of istic function χ : R × R → {0, 1} defined by
three variables. (
1 if x ⩽ y
χ(x, y) =
0 otherwise.

14.2 Sequences 14.4 Boolean functions


The connectives ∧, ∨ and ¬ are functions of vari-
An infinite sequence of numbers, such as ables whose values come from the set B = {T, F}
1, 21 , 14 , 18 , 16
1
,..., of Boolean values (named after George Boole).
¬ is a function of one variable, so
can be viewed as the function f : N → R de-
fined by f (n) = 2−n . In this case, the inputs to ¬:B→B
f are natural numbers, and its outputs are real and it is completely defined by giving its values
numbers. on T and F, namely
Any infinite sequence a0 , a1 , a2 , a3 , . . . can be
viewed as a function g(n) = an from N to some ¬T = F and ¬F = T.
set containing the values an . This is what we previously did by giving the
truth table of ¬. The construction of f is sometimes called a “di-
∧ and ∨ are functions of two variables, so agonalisation argument”, because we get its val-
ues by switching values along the diagonal in the
∧:B×B→B
table of values of f0 , f1 , f2 , f3 , . . ..
and
∨:B×B→B
They are completely defined by giving their val-
ues on the pairs (T, T), (T, F), (F, T), (F, F) in
B × B, which is what their truth tables do.
Questions
14.1 Suggest domains and codomains for the
14.5* Characteristic functions and following functions, writing the domain as
subsets of N a cartesian product where applicable.
Mathematicians say that two (possibly infinite) gcd, reciprocal, remainder ∩, ∪
sets A and B have the same cardinality (size) if
there is a one-to-one and onto function from A 14.2 If A and B are subsets of N with charac-
to B. This function associates each element of A teristic functions χA and χB respectively,
with a unique element of B and vice-versa. With what set does the function χA (n)χB (n)
this definition, it is not too hard to show that, represent?
for example, N and Z have the same cardinality
14.3 How many Boolean functions of n vari-
(they are both “countably infinite”).
ables are there?
It turns out, though, that P(N) has a strictly
greater cardinality than N. We can prove this
by showing: no sequence f0 , f1 , f2 , f3 , . . . in-
cludes all characteristic functions for subsets of
N. (This shows that there are more characteris-
tic functions than natural numbers.)
In fact, for any infinite list f0 , f1 , f2 , f3 , . . .
of characteristic functions, we can define a char-
acteristic function f which is not on the list.
Imagine each function given as the infinite se-
quence of its values, so the list might look like
this:
f0 values 0101010101 . . .
f1 values 0000011101 . . .
f2 values 1111111111 . . .
f3 values 0000000000 . . .
f4 values 1001001001 . . .
..
.
Now if we switch each of the underlined values
to its opposite, we get a characteristic function
(
1 if fn (n) = 0
f (n) =
0 if fn (n) = 1
which is different from each function on the list.
In fact, it has a different value from fn on the
number n.
For the given example, f has values
11011 . . .
Lesson 15: Composition and inversion

Complicated functions are often built from 15.2 Conditions for composition
simple parts. For example, the function f : R →
Composite functions do not always exist.
R defined by f (x) = (x2 + 1)3 is computed by
doing the following steps in succession: Example. If reciprocal : R − {0} → R is de-
fined by reciprocal(x) = x1 and predecessor :
ˆ square, R → R is defined by predecessor(x) = x − 1,
then reciprocal ◦ predecessor does not exist, be-
ˆ add 1, cause predecessor(1) = 0 is not a legal input for
reciprocal.
ˆ cube.
To avoid this problem, we demand that the
We say that f (x) = (x2 + 1)3 is the composite codomain of h be equal to the domain of g for
of the functions (from R to R) g ◦ h to exist. This ensures that each output of
h will be a legal input for g.
ˆ square(x)=x2 ,

ˆ successor(x)=x + 1, Let h : A → B and g : C → D be func-


tions. Then g ◦ h : A → D exists if and only
ˆ cube(x)=x3 . if B = C.

15.1 Notation for composite func-


15.3 The identity function
tions
On each set A the function iA : A → A defined
In the present example we write by
f (x) = cube(successor(square(x))), iA (x) = x,
or is called the identity function (on A).
f = cube ◦ successor ◦ square.
15.4 Inverse functions
Let h : X → Y and g : Y → Z be functions. Functions f : A → A and g : A → A are said to
The function g ◦ h : X → Z is defined by be inverses (of each other) if
g ◦ h(x) = g(h(x)) f ◦ g = g ◦ f = iA .
and is called the composite of g and h. Example. square and sqrt are inverses of each
other on the set R⩾0 of reals ⩾ 0.

Warning: Remember that g ◦ h means “do sqrt(square(x)) = x and square(sqrt(x)) = x.


h first, then g.” g ◦ h is usually different from In fact, this is exactly what sqrt is supposed
h ◦ g. to do – reverse the process of squaring. How-
ever, this works only if we restrict the domain to
R⩾0 . On R we do not have sqrt(square(x)) = x
Example.
because, for example,
square(successor(x)) = (x + 1)2 = x2 + 2x + 1
successor(square(x)) = x2 + 1 sqrt(square(−1)) = sqrt(1) = 1.
This problem arises whenever we seek an in- 15.6 Operations
verse for a function which is not one-to-one. The
An operation is a particular type of function,
squaring function on R sends both 1 and −1 to
with domain A × A × A × . . . × A and codomain
1, but we want a single value 1 for sqrt(1). Thus
A, for some set A.
we have to restrict the squaring function to R⩾0 .
For example, the addition function f (a, b) =
a + b is called an operation on R, because f :
15.5 Conditions for inversion R × R → R. (That is, addition is a function of
two real variables, which takes real values.)
A function f can have an inverse without its An operation with one variable is called
domain and codomain being equal. unary, an operation with two variables is called
binary, an operation with three variables is
The inverse of a function f : A → B is a called ternary, and so on.
function f −1 : B → A such that Examples
f −1 ◦ f = iA and f ◦ f −1 = iB . 1. Addition is a binary operation on R.

Note that f −1 ◦ f and f ◦ f −1 are both iden- 2. Successor is a unary operation on N.


tity functions but they have different domains.
Not every function has an inverse, but we 3. Intersection is a binary operation on P(A)
can neatly classify the ones that do. for any set A.

4. Complementation is a unary operation on


Let f : A → B be a function. Then : f −1 P(A) for any set A.
B → A exists if and only if f is one-to-one
and onto. Questions
15.1 Suppose f, m and s are the following func-
Example: ex and log tions on the set of people.
Consider f : R → R⩾0 − {0} defined by f (x) =
m(x) = mother of x
ex . We know that ex is one-to-one (e.g. because
it is strictly increasing), and onto. So it has an f (x) = father of x
inverse f −1 on R⩾0 − {0}. s(x) = spouse of x

3 What are the English terms for the follow-


ing composite functions?
2 m ◦ s, f ◦ s, m ◦ m, f ◦ m, s ◦ s

1 15.2 Write the following functions as compos-


ites of square(x), sqrt(x), successor(x) and
0 cube(x).

-4 -3 -2 -1 0 1 1 + x3 , x3/2 , (1 + x)3 , (1 + x3 )2
Plot of y = ex . 15.3 What interesting feature do the following
In fact, f −1 = log(y) where functions have in common? (Hint: con-
sider their inverses.)
log : R⩾0 − {0} → R.
Now ˆ ¬ on B
ˆ The reciprocal, f (x) = x1 , on R − {0}
elog x = x and log(ex ) = x,
ˆ The function g(x) = x
x−1 , on R − {1}.
so elog x and log(ex )
are both identity functions,
but they have different domains.
The domain of elog x is R⩾0 − {0} (note log
is defined only for reals > 0). The domain of
log(ex ) is R.
Lesson 16: Relations

Mathematical objects can be related in var- This relation is also a function (the identity
ious ways, and any particular way of relating function on R), since there is exactly one pair
objects is called a relation on the set of objects for each x ∈ R.
in question.
(This also applies to relations in the every-
day sense. For example, “parent of” is a relation
on the set of people.)

A binary relation R on a set A consists of A 2. The < relation on R.


and a set of ordered pairs from A × A. This relation consists of all the pairs (x, y)
When (a, b) is in this set we write aRb. with x < y. It is the following shaded subset
of the plane.
Similarly, a ternary relation on A would be
1
defined by a set of ordered triples from A×A×A,
and so on. (A unary relation on A is just a sub-
0
set of A.)

-1
16.1 Relations and functions -1 0 1
Any function f : X → Y can be viewed as a (The dashed line indicates that the points
relation R on X ∪ Y . The relation is defined by where x = y are omitted.)
xRy if and only if y = f (x).
However, not every relation is a function.
Remember that a function must have exactly
one output y for each input x in its domain. In
a relation, on the other hand, an element x may
be related to many elements y, or to none at all. 3. Algebraic curves.
An algebraic curve consists of the points
(x, y) satisfying an equation p(x, y) = 0,
16.2 Examples where p is a polynomial.
1. Equality on R. E.g. unit circle x2 + y 2 − 1 = 0.
This is the relation consisting of the pairs 1
(x, x) for x ∈ R. Thus it is the following
subset of the plane. 0
1
-1
0 -1 0 1

Notice that this relation is not a function,


-1 because there are two pairs with the same x,
-1 0 1 e.g. (0, 1) and (0, −1).
Likewise, the curve y 2 = x2 (x + 1) is not a These properties are clear if one remembers
function. that a ≡ b (mod n) means a and b have the same
2 remainder on division by n.

1 Questions
16.1 Which of the following relations R(x, y)
0 satisfy ∀x∃yR(x, y)?

-1 ˆ x∧y =T (for propositions x, y)


ˆ x⊆y (for sets x, y of natural num-
-2 bers)
-1 0 1 2 ˆ x>y (for real numbers x, y )
ˆ x divides y (for natural numbers
x, y)

16.2 Use logic symbols and the ⩽ relation to


4. The subset relation ⊆. write a relation between real numbers x, y
This consists of the ordered pairs of sets which says that the point (x, y) lies in the
(A, B) such that A ⊆ B. A and B must both square with corners (0,0), (1,0), (0,1) and
be subsets of some universal set U . (1,1).

5. Congruence modulo n.
For a fixed n, congruence modulo n is a bi-
nary relation. It consists of all the ordered
pairs of integers (a, b) such that n divides
a − b.

16.3 Properties of congruence


As the symbol ≡ suggests, congruence mod n is
a lot like equality. Numbers a and b which are
congruent mod n are not necessarily equal, but
they are “equal up to multiples of n,” because
they have equal remainders when divided by n.
Because congruence is like equality, congru-
ence a ≡ b (mod n) behave a lot like equations.
In particular, they have the following three prop-
erties.

1. Reflexive property.
a ≡ a (mod n)
for any number a.

2. Symmetric property.
a ≡ b (mod n) ⇒ b ≡ a (mod n)
for any numbers a and b.

3. Transitive property.
a ≡ b (mod n) and b ≡ c (mod n) ⇒
a ≡ c (mod n)
for any numbers a, b and c.
Lesson 17: Equivalence relations

3. Similarity of triangles.
An equivalence relation R on a set A is a bi-
Triangles ABC and A′ B ′ C ′ are similar if
nary relation with the following three prop-
erties. AB BC CA
′ ′
= ′ ′ = ′ ′.
AB BC CA
1. Reflexivity. E.g. the following triangles are similar
aRa C′
for all a ∈ A.

2. Symmetry.
C
aRb ⇒ bRa
for all a, b ∈ A.

3. Transitivity. A B B′ A′
aRb and bRc ⇒ aRc 4. Parallelism of lines.
for all a, b, c ∈ A. The relation L∥M (L is parallel to M ) is an
equivalence relation.
Equality and congruence mod n (for fixed n) are
Remark
examples of equivalence relations.
In all these cases the relation is an equivalence
because it says that objects are the same in some
respect.
17.1 Other equivalence relations

1. Equivalence of fractions. 1. Equivalent fractions have the same re-


Two fractions are equivalent if they reduce duced form.
to the same fraction when the numerator and
denominator of each are divided by their gcd.
E.g. 24 and 36 are equivalent because both re- 2. Congruent triangles have the same side
duce to 12 . lengths.

2. Congruence of triangles.
Triangles ABC and A′ B ′ C ′ are congruent if 3. Similar triangles have the same shape.
AB = A′ B ′ , BC = B ′ C ′ and CA = C ′ A′ . E.g.
the following triangles are congruent.
4. Parallel lines have the same direction.
C C′

Sameness is always reflexive (a is the same as


a), symmetric (if a is the same as b, then b is
the same as a) and transitive (if a is the same
A B B′ A′
as b and b is the same as c, then a is the same
as c).
17.2 Equivalence classes 17.4 Partitions and equivalence
classes
Conversely, we can show that if R is a reflexive,
symmetric and transitive relation then aRb says A partition of a set S is a set of subsets of S
that a and b are the same in some respect: they such that each element of S is in exactly one
have the same R-equivalence class. of the subsets.

Using what we showed in the last section, we


If R is an equivalence relation we define the have the following.
R-equivalence class of a to be
[a] = {s : sRa}. If R is an equivalence relation on a set A,
then the equivalence classes of R form a par-
Thus [a] consists of all the elements related to tition of A. Two elements of A are related if
a. It can also be defined as {s : aRs}, because and only if they are in the same equivalence
sRa if and only if aRs, by symmetry of R. class.

Examples
ˆ The parallel equivalence class of a line L Example. Let R be the relation on Z defined
consists of all lines parallel to L. by aRb if and only if a ≡ b (mod 3). The three
ˆ The equivalence class of 1 for congruence equivalence classes of R are
mod 2 is the set of all odd numbers. {x : x ≡ 0 (mod 3)} = {3k : k ∈ Z}
{x : x ≡ 1 (mod 3)} = {3k + 1 : k ∈ Z}
{x : x ≡ 2 (mod 3)} = {3k + 2 : k ∈ Z}.
17.3 Equivalence class properties These partition the set Z.

Claim. If two elements are related by an equiv- Questions


alence relation R on a set A, their equivalence
classes are equal. 17.1 Which of the following relations between
integers x and y are equivalence relations?
Proof. Suppose a, b ∈ A and aRb. Now
ˆ |x| = |y|
s ∈ [a] ⇒ sRa by definition of [a]
ˆ x3 − y 3 = 1
⇒ sRb by transitivity of R
ˆ x divides y
since sRa and aRb
ˆ 5 divides x − y
⇒ s ∈ [b] by definition of [b].
Thus all elements of [a] belong to [b]. Similarly, 17.2 For those relations in Question 17.1 that
all elements of [b] belong to [a], hence [a] = [b]. are not equivalence relations, say which
□ properties of equivalence they fail to sat-
isfy.
Claim. If R is an equivalence relation on a
set A, each element of A belongs to exactly one 17.3 For those that are equivalence relations,
equivalence class. say what is the “same” about the related
objects.
Proof. Suppose a, b, c ∈ A, and c ∈ [a] ∩ [b].
17.4 Also, for those relations that are equiva-
c ∈ [a] and c ∈ [b] lence relations, describe their equivalence
⇒ cRa and cRb classes.
by definition of [a] and [b]
⇒ aRc and cRb by symmetry
⇒ aRb by transitivity
⇒ [a] = [b]
by the previous claim.
Lesson 18: Order relations

18.1 Partial order relations 18.2 Total order relations


A total order relation is a special kind of partial
A partial order relation R on a set A is a bi- order relation that “puts everything in order”.
nary relation with the following three prop-
erties.
A total order relation R on a set A is a par-
1. Reflexivity. tial order relation that also has the property
aRb or bRa for all a, b ∈ A.
aRa
for all a ∈ A.
Examples.
2. Antisymmetry.
1. ⩽ on R
aRb and bRa ⇒ a = b This is a total order relation because for all
for all a, b ∈ A. real numbers a and b we have a ⩽ b or b ⩽ a.

3. Transitivity. 2. ⊆ on P(N).
This is not a total order because, for example,
aRb and bRc ⇒ aRc
{1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.
for all a, b, c ∈ A.
3. Divisibility on N.
Examples. This is not a total order because, for exam-
ple, 2 does not divide 3 and 3 does not divide
1. ⩽ on R. 2.
Reflexive: a ⩽ a for all a ∈ R.
Antisymmetric: a ⩽ b and b ⩽ a ⇒ a = b for 4. Alphabetical order of words.
all a, b ∈ R. This is a total order because given any two
Transitive: a ⩽ b and b ⩽ c ⇒ a ⩽ c for all different words, one will appear before the
a, b, c ∈ R. other in alphabetical order.
2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N). 18.3 Hasse diagrams
Antisymmetric: A ⊆ B and B ⊆ A ⇒ A = B A partial order relation R on a finite set A can be
for all A, B ∈ P(N). represented as a Hasse diagram. The elements
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for of A are written on the page and connected by
all A, B, C ∈ P(N). lines so that, for any a, b ∈ A, aRb exactly when
3. Divisibility on N. b can be reached from a by travelling upward
The relation “a divides b” on natural num- along the lines.
bers is reflexive, antisymmetric and transi-
Example. A Hasse diagram for the relation ⊆
tive. We leave checking this as an exercise.
on the set P({1, 2}) can be drawn as follows.
4. Alphabetical order of words.
{1, 2}
Words on the English alphabet are alphabet-
ically ordered by comparing the leftmost let-
{1} {2}
ter at which they differ. We leave checking
that this relation is reflexive, antisymmetric
and transitive as an exercise. ∅
Example. A Hasse diagram for the relation “di- Example. The relation ⩽ on {x : x ∈ R, x ⩾ 0}
vides” on the set {1, 2, 3, 5, 6, 10, 15, 30} can be is not a well-order relation. For example, the
drawn as follows. subset {x : x ∈ R, x > 3} has no least element.
30
Questions
6 15 18.1 Explain why “antisymmetric” does not
10
mean “not symmetric”. Give an example
3 of a relation which is neither symmetric
2 5
nor antisymmetric.
1 18.2 Draw a diagram of the positive divisors of
42 under the relation “divides.” Why does
Example. A Hasse diagram for the relation ⩽
it resemble the diagram for the positive di-
on the set {1, 2, 3, 4, 5} can be drawn as follows.
visors of 30}?
5
18.3 Invent a partial order relation on N × N.
Is your ordering a total ordering? Is your
4
ordering a well-ordering?
3

Notice how this last Hasse diagram can be


simply drawn as a vertical chain, when the pre-
vious two are “wider” and more complicated.
This corresponds to the fact that the last exam-
ple was of a total order relation but the previous
two were not of total order relations.

18.4 Well-ordering
A well-order relation on a set is a total order
relation that also has the property that each
nonempty set of its elements contains a least el-
ement.

A well-order relation R on a set A is a to-


tal order relation such that, for all nonempty
S ⊆ A, there exists an ℓ ∈ S such that ℓRa
for all a ∈ S.

Example. The relation ⩽ on N is a well-order


relation because every nonempty subset of N has
a least element.

The well-ordering of N is the basis of proofs


by induction.

Example. The relation ⩽ on Z is not a well-


order relation. For example, Z itself has no least
element.
Lesson 19: Selections and Arrangements

19.1 Ordered selections without For every unordered list our reviewer could
repetition make there are 3! = 6 corresponding possible or-
dered lists. And we’ve seen that she could make
A reviewer is going to compare ten phones and
10 × 9 × 8 ordered lists. So the number of un-
list, in order, a top three. In how many ways can
ordered lists she could make is 10×9×8
6 .
she do this? More generally, how many ways are
For every combination of r elements from a
there to arrange r objects chosen from a set of
set of n elements there are r! corresponding per-
n objects?
mutations. So, using our formula for the number
In our example, the reviewer has 10 options
of permutations we have the following.
for her favourite, but then only 9 for her second-
favourite, and 8 for third-favourite. So there are
10 × 9 × 8 ways she could make her list. The number of combinations of r elements
For an ordered selection without repetition from a set of n elements (0 ⩽ r ⩽ n) is
of r elements from a set of n elements there are n(n − 1) · · · (n − r + 1) n! n

= = .
r! r!(n − r)! r
n options for the 1st element
Notice that the notation nr is used for r!(n−r)!
n!

n−1 options for the 2nd element .
n−2 options for the 3rd element Expressions like this are called binomial coeffi-
.. .. cients. We’ll see why they are called this in the
. . next lesson.
n−r+1 options for the rth element.
19.3 Ordered selections with
So we have the following formula.
repetition
An ordered selection of r elements from a set X
The number of ordered selections without
is really just a sequence of length r with each
repetition of r elements from a set of n el-
term in X. If X has n elements, then there are
ements (0 ⩽ r ⩽ n) is
n possibilities for each term and so:
n!
n(n − 1) · · · (n − r + 1) = .
(n − r)!
The number of sequences of r terms, each
When r = n and all the elements of a set S from some set of n elements, is
are ordered, we just say that this is a permuta- r
tion of S. Our formula tells us there are n! such | ×n×
n {z· · · × n} = n .
r
permutations. For example, there are 3! = 6
permutations of the set {a, b, c}:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a). 19.4 Unordered selections with
repetition
19.2 Unordered selections without A shop has a special deal on any four cans of soft
repetition drink. Cola, lemonade and sarsaparilla flavours
What if our reviewer instead chose an unordered are available. In how many ways can you select
top three? In how many ways could she do that? four cans?
More generally, how many ways are there to We can write a selection in a table, for ex-
choose (without order) r objects from a set of ample,
n objects?
C L S C L S
and .
• •• • • •••
A combination of r elements from a set S is
a subset of S with r elements. We can change a table like this into a string
of zeroes and ones, by moving from left to right
reading a “•” as a 0 and a column separator as Questions
a 1. The tables above would be converted into
19.1 A bank requires a PIN that is a string of
0 1 0 0 1 0 and 1 0 1 0 0 0 four decimal digits. How many such PINs
are there? How many are made of four
Notice that each string has four zeroes (one different digits?
for each can selected) and two ones (one fewer
than the number of flavours). We can choose 19.2 How many binary strings of length 5 are
a string like this by beginning with a string of there? How many of these contain exactly
six ones and then choosing four ones to change two 1s?
6

to zeroes. There are 4 ways to do this and so
6

there are 4 possible can selections. 19.3 In a game, each of ten players holds red,
An unordered selection of r elements, with blue and green marbles, and places one
repetition allowed, from a set X of n elements marble in a bag. How many possibilities
can be thought of as a multiset with r elements, are there for the colours of marbles in the
each in X. As in the example, we can represent bag? If each player chooses their colour at
each such multiset with a string of r zeroes and random are all of these possibilities equally
n − 1 ones. We can choose a string like this by likely?
beginning with a string of n + r − 1 ones and
then choosing r ones to change to zeroes.

The number of multisets of r elements, each


from a set of n elements, is
n+r−1 (n + r − 1)!

= .
r r!(n − 1)!

19.5 The pigeonhole principle


The pigeonhole principle is a reasonably obvious
statement, but can still be very useful.

If n items are placed in m containers with


n > m, then at least one container has at
least two items.

Example. If a drawer contains only blue, black


and white socks and you take out four socks
without looking at them, then you are guaran-
teed to have two of the same colour.

We can generalise the pigeonhole principle


as follows.

If n items are placed in m containers, then at


n
least one container has at least ⌈ m ⌉ items.

n
In the above ⌈ m ⌉ means the smallest integer
n n
greater than or equal to m (or m “rounded up”).

Example. If 21 tasks have been distributed be-


tween four processor cores, the busiest core must
have been assigned at least 6 tasks.
Lesson 20: Pascal’s triangle

20.1 Pascal’s triangle try in the triangle is the sum of the two entries
above it. To see why this is, we’ll begin with an
We can write the binomial coefficients in an (in-
example.
finite) triangular array as follows:
Example. Why is 62 = 52 + 51 ?

0

1
0
1 There are 62 combinations of 2 elements of
0 1 {1, 2, 3, 4, 5, 6}. Every such combination either
2 2 2

3
0 31 32 3
0 41 42 43 4 ˆ does not contain a 6, in which case it is
4
one of the 52 combinations of 2 elements

0 1 2 3 4
5 5 5 5 5 5

0 1 2 3 4 5 of {1, 2, 3, 4, 5}; or
6 6 6 6 6 6 6

0 1 2 3 4 5 6
.. .. .. ˆ does contain a 6, in which case the rest of
. . . the combination is one of the 51 combi-
Here are the first ten rows with the entries as nations of 1 element from {1, 2, 3, 4, 5}.
integers: So 62 = 52 + 51 .

1
We can make a similar argument in general.
1 1
Let X be a set of n elements and x is a fixed
1 2 1
1 3 3 1 n−1
of X. For any r ∈ {1, . . . , n}, there are
element
r combinations of r elements of X that do
1 4 6 4 1
not contain x and there are n−1
r−1 combinations
1 5 10 10 5 1
of r elements of X that do contain x. So:
1 6 15 20 15 6 1
1 7
35 35 21 7 1 21
1 8
56 70 56 28 8 28 1 n n−1 n−1
= + for 1 ⩽ r ⩽ n.
1 9 36 84 126 126 84 36 9 1 r r r−1
1 10 45 120 210 252 210 120 45 10 1
This shows that every internal entry in Pascal’s
This triangular array is often called Pascal’s triangle is the sum of the two above it.
triangle (although Pascal was nowhere near the
first to discover it). 20.3 The binomial theorem
(x + y)0 = 1
20.2 Patterns (x + y)1 = x+y
Writing the binomial coeffcients this way reveals (x + y)2 = x2 + 2xy + y 2
3
(x + y) = x + 3x2 y + 3xy 2 + y 3
3
a lot of different patterns in them. Perhaps the
most obvious is that every row reads the same (x + y)4 = x4 + 4x3 y + 6x2 y 2 + 4xy 3 + y 4
left-to-right and right-to-left. Choosing r ele- (x+y) = x +5x4 y+10x3 y 2 +10x2 y 3 +5xy 4 +y 5
5 5

ments from a set of n elements to be in a combi-


nation is equivalent to choosing n − r elements Notice that the coefficients on the right are
from the same set to not be in the combination. exactly the same as the entries in Pascal’s tri-
So: angle. Why does this happen? Think about
expanding (x + y)3 and finding the coefficient of

n n
xy 2 , for example.
= for 0 ⩽ r ⩽ n. (x + y)(x + y)(x + y) = xxx + xxy + xyx + xyy
r n−r
+yxx + yxy + yyx + yyy
This shows that every row reads the same left- = x3 + 3x2 y + 3xy 2 + y 3
to-right and right-to-left. The coefficient of xy 2 is 3 because we have
Another pattern is that every “internal” en- three terms in the sum above that contain two
y’s (those underlined). This is because there are Questions
3
2 ways to choose two of the three factors in a 20.1 Substitute x = 1 and y = −1 into the
term to be y’s.
statement of the binomial theorem. What
The same logic holds in general. The coeffi- does this tell you about the rows of Pas-
cient of xn−r y r in (x + y)n will be nr because
cal’s triangle?
there will be nr ways to choose r of the n fac-
tors in a term to be y’s. This fact is called the 20.2 Find a pattern in the sums of the rows
binomial theorem. in Pascal’s triangle. Prove your pattern
holds using the binomial theorem. Also
prove it holds by considering the powerset
Binomial theorem For any n ∈ N,
of a set.
(x+y)n = n0 xn y 0 + n1 xn−1 y 1 + n2 xn−2 y 2 +

n 20.3 Use inclusion-exclusion to work out how
+ nn x0 y n .
1 n−1
· · · + n−1 x y
many numbers in the set {1, . . . , 100} are
divisible by 2 or 3 or 5.

20.4 Inclusion-exclusion
A school gives out prizes to its best ten students
in music and its best eight students in art. If
three students receive prizes in both, how many
students get a prize? If we try to calculate this
as 10 + 8 then we have counted the three over-
achievers twice. To compensate we need to sub-
tract three and calculate 10 + 8 − 3 = 15.
In general, if A and B are finite sets then we
have
|A ∪ B| = |A| + |B| − |A ∩ B|.
With a bit more care we can see that if A, B
and C are sets then we have
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B|
−|A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.
This is part of a more general law called the
inclusion-exclusion principle.

Let X1 , X2 , . . . , Xt be finite sets. To calculate


|X1 ∪ X2 ∪ · · · ∪ Xt |:
ˆ add the sizes of the sets;
ˆ subtract the sizes of the 2-way intersections;
ˆ add the sizes of the 3-way intersections;
ˆ subtract the sizes of the 4-way intersections;
..
.
ˆ add/subtract the size of the t-way intersection.

To see why this works, think of an element x that


is in n of the sets X1 , X2 , . . . , Xt . It is counted

n n n n n
− + − + ··· ±
1 2 3 4 n
times. By the Binomial theorem with x = 1 and
y = −1 (see Question 20.1), this is equal to 1.
So each element is counted exactly once.
Lesson 21: Probability

Probability gives us a way to model ran- It can be convenient to give this as a table:
dom processes mathematically. These processes
s 1 2 3 4
could be anything from the rolling of dice, to .
1 1 1 1
radioactive decay of atoms, to the performance Pr(s) 2 4 8 8
of a stock market index. The mathematical en- Example. Rolling a fair six-sided die could
vironment we work in when dealing with prob- be modeled by a probability space with sample
abilities is called a probability space. space S = {1, 2, 3, 4, 5, 6} and probability func-
tion Pr given as follows.

21.1 Probability spaces s 1 2 3 4 5 6


1 1 1 1 1 1
.
Pr(s) 6 6 6 6 6 6
We’ll start with a formal definition and then
A sample space like this one where every out-
look at some examples of how the definition is
come has an equal probability is sometimes
used.
called a uniform sample space. Outcomes from
a uniform sample space are said to have been
A probability space consists of taken uniformly at random.

ˆ a set S called a sample space which con- 21.2 Events


tains all the possible outcomes of the ran-
dom process; and An event is a subset of the sample space.
ˆ a probability function Pr : S → [0, 1] such
An event is just a collection of outcomes we are
that the sum of the probabilities of the out-
interested in for some reason.
comes in S is 1.
Example. In the die rolling example with S =
Each time the process occurs it should produce {1, 2, 3, 4, 5, 6}, we could define the event of
exactly one outcome (never zero or more than rolling at least a 3. Formally, this would be the
one). The probability of an outcome is a mea- set {3, 4, 5, 6}. We could also define the event of
sure of the likeliness that it will occur. It is rolling an odd number as the set {1, 3, 5}.
given as a real number between 0 and 1 inclu-
sive, where 0 indicates that the outcome cannot The probability of an event A is the sum of
occur and 1 indicates that the outcome must oc- the probabilities of the outcomes in A.
cur.

Example. Example. In the spinner example, for the event


A = {1, 2, 4}, we have
4 Pr(A) = Pr(1) + Pr(2) + Pr(4)
3 = 1 1 1
1 2 + 4 + 8
7
= 8.
2
In a uniform sample space (where all out-
comes are equally likely) the probability of an
The spinner above might be modeled by a prob- event A can be calculated as:
ability space with sample space S = {1, 2, 3, 4}
number of outcomes in A |A|
and probability function given as follows. Pr(A) = = .
 number of outcomes |S|
1
2 for s = 1



 1 for s = 2 21.3 Operations on events

4
Pr(s) = 1
8 for s = 3 Because events are defined as sets we can per-



 1

form set operations on them. If A and B are
8 for s = 4.
1
events for a sample space S, then Pr(s) = 8 for any s ∈ S. So,
1
ˆ A ∪ B is the event “A or B,” A = {100, 101, 110, 111} Pr(A) = 2
1
B = {010, 011, 110, 111} Pr(B) = 2
ˆ A ∩ B is the event “A and B,” 3
C = {011, 101, 110} Pr(C) = 8
1
ˆ A is the event “not A.” A ∩ B = {110, 111} Pr(A ∩ B) = 4
1
A ∩ C = {101, 110} Pr(A ∩ C) = 4
We always take the sample space as our univer-
sal set, so A means S − A. So Pr(A ∩ B) = Pr(A)Pr(B) but Pr(A ∩ C) ̸=
Pr(A)Pr(C).

21.4 Probabilities of unions 21.6 Warning


We saw in the section on the inclusion-exclusion Random processes can occur in both discrete
principle that |A ∪ B| = |A| + |B| − |A ∩ B| for and continuous settings, and probability theory
finite sets A and B. We have a similar law in can be applied in either setting. In this lesson,
probability. and in the next four lessons, we are discussing
only the discrete case. Many of the definitions
For any two events A and B, and results we state apply only in this case. Our
definition of a probability space, for example, is
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
actually the definition of a discrete probability
space, and so on.
Example. In our die rolling example, let A = The discrete setting provides a good envi-
{1, 2} and B = {2, 3, 4} be events. Then ronment to learn most of the vital concepts and
intuitions of probability theory. What you learn
|A| |B| |A ∩ B| 2 3 1 2
Pr(A∪B) = + − = + − = . here is very useful in itself, and will act as a good
|S| |S| |S| 6 6 6 3
base if you go on to study continuous probabil-
Two events A and B are mutually exclusive ity.
if P r(A ∩ B) = 0, that is, if A and B cannot
occur together. For mutually exclusive events, Questions
we have
An integer is chosen uniformly at random from
Pr(A ∪ B) = Pr(A) + Pr(B). the set {1, 2, . . . , 30}. Let A be the event that
the integer is at most 20. Let B be the event
21.5 Independent events that the integer is divisible by 6. Let C be the
event that the integer’s last digit is a 5.
We say that two events are independent when
the occurrence or non-occurrence of one event 21.1 Write A, B and C as sets, and find their
does not affect the likelihood of the other occur- probabilities.
ring.
21.2 Find the probabilities of A ∪ B, A ∪ C and
B ∪ C. Which pairs of A, B, C are mutu-
Two events A and B are independent if ally exclusive?
Pr(A ∩ B) = Pr(A)Pr(B). 21.3 Find the probabilities of A ∩ B, A ∩ C and
B ∩ C. Which pairs of A, B, C are inde-
pendent?
Example. A binary string of length 3 is gener-
ated uniformly at random. The event A that the
first bit is a 1 is independent of the event B that
the second bit is a 1. But A is not independent
of the event C that the string contains exactly
two 1s.
Formally, the sample space is S =
{000, 001, 010, 011, 100, 101, 110, 111} and
Lesson 22: Conditional probability and Bayes’ theorem

Your friend believes that Python coding has least 3 and B be the event that the result was
become more popular than AFL in Melbourne. even. What is Pr(A|B)?
She bets you $10 that the next person to pass 1
Pr(A ∩ B) = Pr(4) = 8
you on the street will be a Python program-
1 1 3
mer. You feel confident about this bet. How- Pr(B) = Pr(2) + Pr(4) = 4 + 8 = 8
ever, when you see a man in a “Hello, world!” Thus,
Pr(A∩B)
t-shirt approaching, you don’t feel so confident Pr(A|B) = Pr(B) = ( 18 )/( 38 ) = 13 .
any more. Why is this? Example. A binary string of length 6 is gener-
We can think about this with a diagram. ated uniformly at random. Let A be the event
The rectangle represents the set of people in that the first bit is a 1 and B be the event that
Melbourne, the circle P is the set of Python the string contains two 1s. What is Pr(A|B)?
coders, and the circle T is the set of “Hello, There are 26 strings in our sample space.
world!” t-shirt owners. Now A ∩ B occurs when the first bit is 1 and the
5
rest of the string contains 1 one. There
6 are 1
5
such strings and so Pr(A ∩ B) = 1 /2 . Also,
there are 62 strings containing two 1s and so
Pr(B) = 62 /26 . Thus,
T P
Pr(A∩B) 5 6
= 13 .

Pr(A|B) = Pr(B) = 1 / 2

22.2 Independence again


Initially, you feel confident because the circle Our definition of conditional probability gives us
P takes up a small proportion of the rectan- another way of defining independence. We can
gle. But when you learn that your randomly say that events A and B are independent if
selected person is in the circle T , you feel bad
Pr(A) = Pr(A|B).
because the circle P covers almost all of T . In
mathematical language, the probability that a This makes sense intuitively: it is a formal way
random Melbournian is a Python coder is low, of saying that the likelihood of A does not de-
but the probability that a random Melbournian pend on whether or not B occurs.
is a Python coder given that they own a “Hello,
world!” t-shirt is high. 22.3 Independent repeated trials
Generally if we perform exactly the same action
22.1 Conditional probability multiple times, the results for each trial will be
independent of the others. For example, if we
Conditional probabilities measure the likelihood
roll a die twice, then the result of the first roll
of an event, given that some other event occurs.
will be independent of the result of the second.
For two independent repeated trials, each
from a sample space S, our overall sample
For events A and B, the conditional probabil- space is S × S and our probability function will
ity of A given B is be given by Pr((s1 , s2 )) = Pr(s1 )Pr(s2 ). For
Pr(A|B) = Pr(A∩B)
Pr(B) . three independent repeated trials the sample
space is S × S × S and the probability function
This definition also implies that Pr((s1 , s2 , s3 )) = Pr(s1 )Pr(s2 )Pr(s3 ), and so on.
Pr(A ∩ B) = Pr(A|B)Pr(B). Example. The spinner from the previous ex-
ample is spun twice. What is the probability
Example. The spinner from the last lesson is that the results add to 5?
spun. Let A be the event that the result was at A total of 5 can be obtained as (1,4), (4,1), (2,3)
or (3,2). Because the spins are independent: Example. A binary string is created so that the
1 1 1 first bit is a 0 with probability 31 and then each
Pr((1, 4)) = Pr((4, 1)) = 2 × 8 = 16 subsequent bit is the same as the preceding one
1 1 1
Pr((2, 3)) = Pr((3, 2)) = 4 × 8 = 32 with probability 34 . What is the probability that
So, because (1,4), (4,1), (2,3) and (3,2) are mu- the first bit is 0, given that the second bit is 0?
tually exclusive, the probability of the total be- Let F be the event that the first bit is 0 and
1 1 1 1 3 let S be the event that the second bit is 0. So
ing 5 is 16 + 16 + 32 + 32 = 16 .
Pr(F ) = 13 . If F occurs then the second bit will
be 0 with probability 43 and so Pr(S|F ) = 34 . If
22.4 Bayes’ theorem F does not occur then the second bit will be 0
with probability 14 and so Pr(S|F ) = 14 . So, by
Bayes’ theorem gives a way of calculating the
Bayes theorem,
conditional probability of an event A given an
event B when we already know the probabilities Pr(F |S) = Pr(F )Pr(S|F )
of A, of B given A, and of B given A. Pr(F )Pr(S|F )+Pr(F )Pr(S|F )
1
× 34
= 1
3
× 4 + 23 × 14
3
3

Bayes’ theorem. For the events A and B, = ( 14 )/( 12


5
)
Pr(B|A)Pr(A)
Pr(A|B) = . = 35 .
Pr(B|A)Pr(A) + Pr(B|A)Pr(A)

Note that the denominator above is simply an Questions


expression for Pr(B). The fact that
22.1 An integer is selected uniformly at random
Pr(B) = Pr(B|A)Pr(A) + Pr(B|A)Pr(A) from the set {1, 2, . . . , 15}. What is the
is due to the law of total probability. probability that it is divisible by 5, given
that it is odd?

22.5 Bayes’ theorem examples 22.2 A standard die is rolled twice. What is the
probability that the first roll is a 1, given
Example. Luke Skywalker discovers that some that the sum of the rolls is 6?
porgs have an extremely rare genetic mutation
that makes them powerful force users. He de- 22.3 A bag contains three black marbles and
velops a test for this mutation that is right 99% two white marbles and they are randomly
of the time and decides to test all the porgs on selected and removed, one at a time un-
Ahch-To. Suppose there are 100 mutant porgs til the bag in empty. Use Bayes’ theo-
in the population of 24 million. We would guess rem to calculate the probability that the
that the test would come up positive for 99 of the first marble selected is black, given that
100 mutants, but also for 239 999 non-mutants. the second marble selected is black.
We are assuming that the conditional prob-
ability of a porg testing positive given it’s a mu-
tant is 0.99. But what is the conditional prob-
ability of it being a mutant given that it tested
positive? From our guesses, we would expect
99
this to be 99+239999 ≈ 0.0004. Bayes’ theorem
gives us a way to formalise this:

Pr(P |M )Pr(M )
Pr(M |P ) = Pr(P |M )Pr(M )+Pr(P |M )Pr(M )
100
×0.99
= 100
24000000
×0.99+(1− 100
)×0.01
24000000 24000000

99
= 99+239999

≈ 0.0004.
Lesson 23: Random variables

In a game, three standard dice will be rolled Example. A standard die is rolled three times.
and the number of sixes will be recorded. We Let X be the number of sixes rolled. What is the
could let X stand for the number of sixes rolled. probability distribution of X? Obviously X can
Then X is a special kind of variable whose value only take values in {0, 1, 2, 3}. Each roll there
is based on a random process. These are called is a six with probability 16 and not a six with
random variables. probability 65 . The rolls are independent.
Because the value of X is random, it doesn’t 5 5 5
Pr(X = 0) = 6 × 6 × 6
make sense to ask whether X = 0, for exam-
ple. But we can ask what the probability is that Pr(X = 1) = ( 61 )( 65 )( 56 ) + ( 56 )( 16 )( 56 ) + ( 56 )( 56 )( 16 )
X = 0 or that X ⩾ 2. This is because “X = 0” Pr(X = 2) = ( 16 )( 61 )( 56 ) + ( 16 )( 56 )( 16 ) + ( 56 )( 16 )( 16 )
and “X ⩾ 2” correspond to events from our 1 1 1
sample space. Pr(X = 3) = 6 × 6 × 6
So the probability distribution of X is
23.1 Formal definition x 0 1 2 3
125 75 15 1
.
Formally, a random variable is defined as a func- Pr(X = x) 216 216 216 216
tion from the sample space to R. In the example
above, X is a function from the process’s sample 23.3 Independence
space that maps every outcome to the number
of sixes in that outcome. We have seen that two events are independent
when the occurrence or non-occurrence of one
Example. Let X be the number of 1s in event does not affect the likelihood of the other
a binary string of length 2 chosen uniformly occurring. Similarly two random variables are
at random. Formally, X is a function from independent if the value of one does not affect
{00, 01, 10, 11} to {0, 1, 2} such that the likelihood that the other will take a certain
value.
X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2.
For most purposes, however, we can think of X Random variables X and Y are independent
as simply a special kind of variable. if, for all x and y,
Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).
23.2 Probability distribution
We can describe the behaviour of a random vari-
Example. An integer is generated uniformly at
able X by listing, for each value x that X can
random from the set {10, 11, . . . , 29}. Let X and
take, the probability that X = x. This gives the
Y be its first and second (decimal) digit. Then
probability distribution of the random variable.
X and Y are independent random variables be-
Again, formally this listing is a function from
cause, for x ∈ {1, 2} and {0, 1, . . . , 9},
the values of X to their probabilities.
1
Pr(X = x ∧ Y = y) = 20
Example. Continuing with the last example, 1 1
= 2 × 10
the probability distribution of X is given by
= Pr(X = x)Pr(Y = y).
 1

 4 if x = 0
Pr(X = x) = 1
if x = 1 23.4 Operations
 2
1
if x = 2.

4 From a random variable X, we can create new
It can be convenient to give this as a table: random variables such as X + 1, 2X and X 2 .
These variables work as you would expect them
x 0 1 2 to.
1 1 1
.
Pr(X = x) 4 2 4
Example. If X is the random variable with dis- Questions
tribution
23.1 An elevator is malfunctioning. Every
x −1 0 1 minute it is equally likely to ascend one
1 1 1
,
Pr(X = x) 6 3 2
floor, descend one floor, or stay where it
is. When it begins malfunctioning it is on
then the distributions of X + 1, 2X and X 2 are
level 5. Let X be the level it is on three
y 0 1 2 minutes later. Find the probability distri-
Pr(X + 1 = y) 1 1 1 bution for X.
6 3 2

y −2 0 2 y 0 1 23.2 An integer is generated uniformly at ran-


1 1 1 1 2
. dom from the set {11, 12, . . . , 30}. Let X
Pr(2X = y) 6 3 2 Pr(X 2 = y) 3 3
and Y be its first and second (decimal)
digit. Are the random variables X and Y
23.5 Sums and products
independent?
From random variables X and Y we can define a
23.3 Let X and Y be independent random vari-
new random variable Z = X + Y . Working out
ables with distributions
the distribution of Z can be complicated, how-
ever. We give an example below of doing this x 0 2
when X and Y are independent. Pr(X = x) 1 3
4 4

Example. Let X and Y be independent ran-


dom variables with y 0 1 2 3
1 1 1 1
.
x 0 1 2 y 0 1 2 3 Pr(Y = y) 4 4 4 4
1 1 1 1 1 1 1
.
Pr(X = x) 4 2 4 Pr(Y = y) 6 3 3 6 Find the probability distribution of Z =
X +Y.
Let Z = X + Y . To find Pr(Z = z) for some
value of z, we must consider all the ways that
X + Y could equal z. For example, X + Y = 3
could occur as (X, Y ) = (0, 3), (X, Y ) = (1, 2)
or (X, Y ) = (2, 1). Because X and Y are inde-
pendent, we can find the probability that each
of these occur
1 1 1
Pr(X = 0 ∧ Y = 3) = 4 × 6 = 24 ,
Pr(X = 1 ∧ Y = 2) = 12 × 13 = 61 ,
Pr(X = 2 ∧ Y = 1) = 41 × 13 = 12
1
.
So, because the three are mutually exclusive,
1 1 1 7
Pr(Z = 3) = + + = .
24 6 12 24
Doing similar calculations for each possible
value, we see that the distribution of Z is
z 0 1 2 3 4 5
1 1 7 7 1 1
.
Pr(Z = z) 24 6 24 24 6 24

The distribution of a product of two indepen-


dent random variables can be found in a similar
way.
Finding the distribution of sums or prod-
ucts of dependent random variables is even more
complicated. In general, this requires knowing
the probability of each possible combination of
values the variables can take.
Lesson 24: Expectation and variance

A standard die is rolled some number of Then


times and the average of the rolls is calculated. 2 1 1
E[X] = × −10 + ×4+ × 10 = −1
If the die is rolled only once this average is just 5 2 10
the value rolled and is equally likely to be 1, 2, Because this value is negative, Acme shares will
3, 4, 5 or 6. If the die is rolled ten times, then almost certainly decrease in value over the long
the average might be between 1 and 2 but this term.
is pretty unlikely – it’s much more likely to be Notice that it was important that we
between 3 and 4. If the die is rolled ten thou- weighted our average using the probabilities
sand times, then we can be almost certain that here. If we had just taken the average of -10,
the average will be very close to 3.5. We will 4 and 10 we would have gotten the wrong an-
see that 3.5 is the expected value of a random swer by ignoring the fact that some values were
variable representing the die roll. more likely than others.

24.1 Expected value 24.2 Law of large numbers


When we said “average” above, we really meant Our initial die-rolling example hinted that the
“mean”. Remember that the mean of a collec- average of a large number of independent trials
tion of numbers is the sum of the numbers di- will get very close to the expected value. This
vided by how many of them there are. So the is mathematically guaranteed by a famous the-
mean of x1 , . . . , xt is x1 +···+x
t
t
. The mean of 2,2,3 orem called the law of large numbers.
2+2+3+11
and 11 is 4 = 4.5, for example.
The expected value of a random variable is Let X1 , X2 , . . . be independent random vari-
calculated as a weighted average of its possible ables, all with the same distribution and ex-
values. pected value µ. Then
lim 1 (X1 + · · · + Xn ) = µ.
n→∞ n
If X is a random variable with distribution
x x1 x2 · · · xt
,
Pr(X = x) p1 p2 · · · pt 24.3 Linearity of expectation
then the expected value of X is We saw in the last lesson that adding random
E[X] = p1 x1 + p2 x2 + · · · + pt xt . variables can be difficult. Finding the expected
value of a sum of random variables is easy if we
know the expected values of the variables.
Example. If X is a random variable represent-
ing a die roll, then
1 1 1
If X and Y are random variables, then
E[X] = ×1+ × 2 + ··· + × 6 = 3.5.
6 6 6 E[X + Y ] = E[X] + E[Y ].
Example. Someone estimates that each year
the share price of Acme Corporation has a 10% This works even if X and Y are not independent.
chance of increasing by $10, a 50% chance of in- Similarly, finding the expected value of a
creasing by $4, and a 40% chance of falling by scalar multiple of a random variable is easy if
$10. Assuming that this estimate is good, are we know the expected value of the variable.
Acme shares likely to increase in value over the
long term?
If X is a random variable and s ∈ R, then
We can represent the change in the Acme
share price by a random variable X with distri- E[sX] = sE[X].
bution
x −10 4 10 Example. Two standard dice are rolled. What
2 1 1
.
Pr(X = x) 5 2 10
is the expected total?
1
Let X1 and X2 be random variables repre- value with probability 100 . So
senting the first and second die rolls. From the 99 1
earlier example E[X1 ] = E[X2 ] = 3.5 and so Var[X] = × (−1)2 + × 992 = 99.
100 100
Similarly,
E[X1 + X2 ] = E[X1 ] + E[X2 ] = 3.5 + 3.5 = 7.
1 1
Var[Y ] = 2 × (−1)2 + 2 × 12 = 1
Example. What is the expected number of ‘11’ 1 1
Var[Z] = 2 × (−50)2 + 2 × 502 = 2500.
substrings in a binary string of length 5 chosen
uniformly at random? Notice that the variance of X is much smaller
For i = 1, . . . , 4, let Xi be a random vari- than the variance of Z because X is very likely
able that is equal to 1 if the ith and (i + 1)th to be close to its expected value whereas Z will
bits of the string are both 1 and is equal to 0 certainly be far from its expected value.
otherwise. Then X1 + · · · + X4 is the number
Example. Let X be a random variable with
of ‘11’ substrings in the string. Because the bits
distribution given by
are independent, Pr(Xi = 1) = 21 × 12 = 14 and
E[Xi ] = 41 for i = 1, . . . , 4. So, x 0 2 6
1 1 1
.
4 Pr(X = x) 6 2 3
E[X1 + · · · + X4 ] = E[X1 ] + · · · + E[X4 ] = = 1.
4 Then the expected value of X is
Note that the variables X1 , . . . , X4 in the above 1 1 1
example were not independent, but we were still E[X] = ×0+ ×2+ × 6 = 3.
6 2 3
allowed to use linearity of expectation. So, the variance of X is
1 1 1
Var[X] = ×(0−3)2 + ×(2−3)2 + ×(6−3)2 = 5.
24.4 Variance 6 2 3

Think of the random variables X, Y and Z Questions


whose distributions are given below.
24.1 Do you agree or disagree with the following
x −1 99 y −1 1 statement? “The expected value of a ran-
99 1 1 1
Pr(X = x) 100 100 Pr(Y = y) 2 2 dom variable is the value it is most likely
to take.”
z −50 50
1 1
Pr(Z = z) 2 2 24.2 Let X be the sum of 1000 spins of the
spinner from Lesson 21, and let Y be 1000
These variables are very different. Perhaps X
times the result of a single spin. Find E[X]
corresponds to buying a raffle ticket, Y to mak-
and E[Y ]. Which of X and Y do you think
ing a small bet on a coin flip, and Z to making
would have greater variance?
a large bet on a coin flip. However, if you only
consider expected value, all of these variables 24.3 Let X be the number of heads occurring
look the same – they each have expected value when three fair coins are flipped. Find
0. E[X] and Var[X].
To give a bit more information about a ran-
dom variable we can define its variance, which
measures how “spread out” its distribution is.

If X is a random variables with E[X] = µ,


Var[X] = E[(X − µ)2 ].

So the variance is a measure of how much we


expect the variable to differ from its expected
value.

Example. The variable X above will be 1


smaller than its expected value with probabil-
99
ity 100 and will be 99 larger than its expected
Lesson 25: Discrete distributions

In this lesson we’ll introduce some of the 25.3 Geometric distribution


most common and useful (discrete) probability
This distribution gives the probability that, in a
distributions. These arise in various different
sequence of independent Bernoulli trials, we see
real-world situations.
exactly k failures before the first success.

25.1 Discrete uniform distribution The geometric distribution with parameter


p ∈ [0, 1] is given by
This type of distribution arises when we choose
Pr(X = k) = p(1 − p)k for k ∈ N.
one of a set of consecutive integers so that all
choices are equally likely. 1−p 1−p
We have E[X] = p and Var[X] = p2
.

Geometric distribution with p = 0.5


The discrete uniform distribution with pa-
rameters a, b ∈ Z (a ⩽ b) is given by 0.5
1
Pr(X = k) = b−a+1 for k ∈ {a, a + 1, . . . , b}.
0.4
a+b (b−a+1)2 −1
We have E[X] = and Var[X] = .
Pr(X = k)

2 12 0.3
Uniform distribution with a = 3, b = 8
0.2
0.2

0.1
0.15
Pr(X = k)

0
0 2 4 6 8 10 12
0.1 k

0.05
Example. If every minute there is a 1% chance
that your internet connection cuts out then the
probability of staying online for exactly x con-
0 secutive minutes is approximated by a geometric
0 2 4 6 8 10
k distribution with p = 0.01. It follows that the
expected value is 1−0.01
0.01 = 99 minutes and the
variance is 1−0.01
(0.01)2
= 9900.
25.2 Bernoulli distribution
25.4 Binomial distribution
This type of distribution arises when we have a
single process that succeeds with probability p This distribution gives the probability that, in
and fails otherwise. Such a process is called a a sequence of n independent Bernoulli trials, we
Bernoulli trial. see exactly k successes.

The Bernoulli distribution with parameter The binomial distribution with parameters
p ∈ [0, 1] is given by n ∈ Z+ and p ∈ [0, 1] is given by
(
Pr(X = k) = nk pk (1 − p)n−k

p for k = 1
Pr(X = k) =
1 − p for k = 0. for k ∈ {0, . . . , n}.

We have E[X] = p and Var[X] = p(1 − p). We have E[X] = np and Var[X] = np(1 − p).
Binomial distribution with n = 20, p = 0.5 λ = 6 approximates the probability it receives
k calls in a certain minute. It follows that the
0.2 expected value is 6 calls and the variance is 6.

0.15 Questions
Pr(X = k)

25.1 There is a 95% chance of a packet being


0.1 received after being sent down a noisy line,
and the packet is resent until it is received.
0.05 What is the probability that the packet is
received within the first three attempts?

0 25.2 A factory aims to have at most 2% of the


0 5 10 15 20 components it makes be faulty. What is
k
the probability of a quality control test of
Example. If 1000 people search a term on a 20 random components finding that 2 or
certain day and each of them has a 10% chance more are faulty, if the factory is exactly
of clicking a sponsored link, then the number of meeting its 2% target?
clicks on that link is approximated by a binomial
25.3 The number of times a machine needs ad-
distribution with n = 1000 and p = 0.1. It fol-
justing during a day approximates a Pois-
lows that the expected value is 1000 × 0.1 = 100
son distribution, and on average the ma-
clicks and the variance is 1000 × 0.1 × 0.9 = 90.
chine needs to be adjusted three times per
day. What is the probability it does not
25.5 Poisson distribution
need adjusting on a particular day?
In many situations where we know that an aver-
age of λ events occur per time period, this dis-
tribution gives a good model of the probability
that k events occur in a time period.

The Poisson distribution with parameter


λ ∈ R (where λ > 0) is given by
λk e−λ
Pr(X = k) = for k ∈ N.
k!

We have E[X] = λ and Var[X] = λ.


Poisson distribution with λ = 4

0.2

0.15
Pr(X = k)

0.1

0.05

0
0 2 4 6 8 10 12 14 16
k

Example. If a call centre usually receives 6 calls


per minute, then a Poisson distribution with
Lesson 26: Recursion

Just as the structure of the natural numbers Remark. Using a recursive program to com-
supports induction as a method of proof, it sup- pute Fibonacci numbers can easily lead to stack
ports induction as a method of definition or of overflow, because each value depends on two
computation. previous values (each of which depends on an-
When used in this way, induction is usually other two, and so on).
called recursion, and one speaks of a recursive A more efficient way to use the recursive defi-
definition or a recursive algorithm. nition is to use three variables to store F (k + 1),
F (k) and F (k − 1). The new values of these
26.1 Recursive Definitions variables, as k increases by 1, depend only on
the three stored values, not on all the previous
Many well known functions f (n) are most easily
values.
defined in the “base step, induction step” for-
mat, because f (n + 1) depends in some simple
way on f (n).
The induction step in the definition is more 26.2 Properties of recursively defined
commonly called the recurrence relation for f , functions
and the base step the initial value.
These are naturally proved by induction, using
Example. The factorial f (n) = n! a base step and induction step which parallel
those in the definition of the function.
Initial value. 0! = 1.
Recurrence relation. (k + 1)! = (k + 1) × k!
Many programming languages allow this Example. For n ⩾ 5, 10 divides n!
style of definition, and the value of the func-
tion is then computed by a descent to the initial
value. Proof Base step.
For example, to compute 4!, the machine 5! = 5 × 4 × 3 × 2 × 1 = 10 × 4 × 3,
successively computes
hence 10 divides 5!.
4! = 4 × 3! Induction step. We have to show
= 4 × (3 × 2!) 10 divides k! =⇒ 10 divides (k + 1)!
= 4 × (3 × (2 × (1!)))
Since (k + 1)! = (k + 1) × k! by the recurrence
= 4 × (3 × (2 × (1 × 0!))) relation for factorial, the induction step is clear,
which can finally be evaluated since 0! = 1. and hence the induction is complete.
Remark. The numbers 4, 3, 2, 1 have to be
stored on a “stack” before the program reaches Example. F (0) + F (1) + · · · + F (n) = F (n +
the initial value 0! = 1 which finally enables it 2) − 1.
to evaluate 4!.
Thus a recursive program, though short,
may run slowly and even cause “stack overflow.” Proof Base step. F (0) = 0 = F (2) − 1,
because F (2) = 1.
Induction step. We have to show
Example. The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, . . . F (0) + F (1) + · · · + F (k)
= F (k + 2) − 1
The nth number F (n) in this sequence is defined
by ⇒ F (0) + F (1) + · · · + F (k + 1)
= F (k + 3) − 1.
Initial values. F (0) = 0, F (1) = 1.
Recurrence relation. F (k +1) = F (k)+F (k −1).
Well, hence it follows (by induction) that
F (0) + F (1) + · · · + F (k) f (n) = number of n bit strings
= F (k + 2) − 1 with no consecutive 0s
⇒ F (0) + F (1) + · · · + F (k + 1) = F (n + 2).
= F (k + 2) + F (k + 1) − 1,
by adding F (k + 1) to both sides Questions
⇒ F (0) + F (1) + · · · + F (k + 1) 26.1 A function s(n) is defined recursively by
= F (k + 3) − 1 Initial value: s(0) = 0
since F (k + 2) + F (k + 1) = F (k + 3) Recurrence relation: s(n + 1) = s(n) + 2n + 1
by the Fibonacci recurrence relation Write down the first few values of s(n),
This completes the induction. and guess an explicit formula for s.

26.2 Check that the formula you guessed in


26.3 Problems with recursive solu-
Question 26.1 satisfies
tions
s(0) = 0 and s(n + 1) = s(n) + 2n + 1
Sometimes a problem about n reduces to a prob-
lem about n − 1 (or smaller numbers), in which (This proves by induction that your guess
case the solution may be a known recursively is correct.)
defined function.
26.3 If a sequence satisfies the Fibonacci recur-
Example. Find how many n-bit binary rence relation,
strings contain no two consecutive 0s. f (n) = f (n − 1) + f (n − 2),
We can divide this problem into two cases. must it agree with the Fibonacci sequence
from some point onward?
1. Strings which end in 1, e.g. 1101101.
In this case, the string before the 1 (110110
here) can be any (n − 1) bit string with no
consecutive 0s.

2. Strings which end in 0, e.g. 1011010.


In this case, the string must actually end
in 10, to avoid consecutive 0s, but the
string before the 10 (10110 here) can be
any (n − 2) bit string with no consecutive
0s.

Thus among strings with no consecutive 0s we


find

1. Those with n bits ending in 1 correspond


to those with (n − 1) bits.

2. Those with n bits ending in 0 correspond


to those with (n − 2) bits.

Hence if we let f (n) be the number of such


strings with n bits we have
f (n) = f (n − 1) + f (n − 2).
This is the Fibonacci recurrence relation.
It can also be checked that
f (1) = 2 = F (3) and f (2) = 3 = F (4),
Lesson 27: Recursive Algorithms

Recursion may be used to define functions 27.2 Products


whose definition normally involves “ · · · ”, to give
algorithms for computing these functions, and to
Example. 1 × 2 × 3 × · · · × n
prove some of their properties.
This is the function n! defined recursively by
27.1 Sums Initial value. 0! = 1
Recurrence relation. (k + 1)! = (k + 1) × k!
Example. 1 + 2 + 3 + · · · + n
This is the function f (n) defined by
Initial value. f (1) = 1
Recurrence relation. f (k + 1) = f (k) + (k + 1)

Example. 1 + a + a2 + · · · + an
27.3 Sum and product Notation
This is the function g(n) defined by
Initial value. g(0) = 1 n
Recurrence relation. g(k + 1) = g(k) + ak+1
X
1 + 2 + 3 + · · · + n is written k,
k=1
We can use this relation to prove by induc-
n+1 n
tion that g(n) = a a−1−1 (a formula for the sum X
1 + a + a2 + · · · + an is written ak .
of a geometric series), provided a ̸= 1. k=0

Proof Σ is capital sigma, standing for “sum.”


a0+1 −1 n
Base step. For n = 0, 1 = g(0) = a−1 , as Y
1 × 2 × 3 × · · · × n is written k.
required.
k=1
Induction step. We want to prove
Π is capital pi, standing for “product.”
ak+1 − 1 ak+2 − 1
g(k) = ⇒ g(k + 1) = .
a−1 a−1
Well,
ak+1 − 1
g(k) =
a−1
ak+1 − 1 27.4 Binary search algorithm
⇒ g(k + 1) = + ak+1
a−1
ak+1 − 1 + (a − 1)ak+1
⇒ g(k + 1) = Given a list of n numbers in order
a−1
a k+2 +a k+1 − ak+1 − 1 x1 < x2 < · · · < xn ,
=
a−1 we can find whether a given number a is in the
a k+2 −1
= as required. list by repeatedly “halving” the list.
a−1 The algorithm binary search is specified
This completes the induction. recursively by a base step and a recursive step.
More generally, binary search needs at most
Base step. If the list is empty,
⌊log2 n⌋ “halvings” to search a list of n numbers,
report ‘a is not in the list.’
where ⌊log2 n⌋ is the floor of log2 n, the greatest
integer ⩽ log2 n.
Recursive step If the list is not empty, see
whether its middle element is a. If so, report Remark. In an alphabetical list of 1,000,000
‘a found.’ names, which is just under 220 , it takes at most
Otherwise, if the middle element m > a, 19 halvings (using alphabetical order) to find
binary search the list of elements < m. whether a particular name is present.
And if the middle element m < a, binary
search the list of elements > m. 27.7 20 questions
A mathematically ideal way to play 20 questions
27.5 Correctness would be to divide the number of possibilities in
half with each question.
We prove that the algorithm works on a list of E.g. if the answer is an integer, do binary
n items by strong induction on n. search on the list of possible answers. If the
Base step. The algorithm works correctly on answer is a word, do binary search on the list
a list of 0 numbers, by reporting that a is not in of possible answers (ordered alphabetically). If
the list. this is done, then 20 questions suffice to find the
Induction step. Assuming the algorithm correct answer out of 220 = 1, 048, 576 possibili-
works correctly on any list of < k + 1 numbers, ties.
suppose we have a list of k + 1 numbers.
The recursive step either finds a as the mid-
Questions
dle number in the list, or else produces a list of
P
< k +1 numbers to search, which by assumption 27.1 Rewrite the following sums using nota-
it will do correctly. tion.
This completes the induction.
• 1 + 4 + 9 + 16 + · · · + n2
Remark. This example shows how easy it is to • 1 − 2 + 3 − 4 + · · · − 2n
prove correctness of recursive algorithms, which
may be why they are popular despite the prac- 27.2 Which of the proofs in this lesson uses
tical difficulties in implementing them. strong induction?

27.3 Imagine a game where the object is to


27.6 Running time identify a natural number between 1 and
log2 n is the number x such that 220 using 20 questions with YES-NO an-
swers. The lesson explains why 20 ques-
n = 2x . tions are sufficient to identify any such
For example, 1024 = 210 , and therefore number.
Explain why less than 20 YES-NO ques-
log2 1024 = 10.
tions are not always sufficient.
Similarly log2 512 = 9, and hence log2 1000 is
between 9 and 10.
Repeatedly dividing 1000 by 2 (and discard-
ing remainders of 1) runs for 9 steps:
500, 250, 125, 62, 31, 15, 7, 3, 1
The 10 halving steps for 1024 are
512, 256, 128, 64, 32, 16, 8, 4, 2, 1
This means that the binary search algorithm
would do at most 9 “halvings” in searching a
list of 1000 numbers and at most 10 “halvings”
for 1024 numbers.
Lesson 28: Recursion, lists and sequences

A list or sequence of objects from a set X is “Unfolding” tn , we see that multiplication


a function f from {1, 2, . . . , n} to X, or by r is done n − 1 times, hence
(if infinite) from {1, 2, 3, . . .} to X.
tn = arn−1 .
We usually write f (k) as xk and the list as
⟨x1 , x2 , . . . , xn ⟩, or ⟨x1 , x2 , x3 , . . .⟩. Thus The above recurrence relations are called
first order because tk+1 depends on only the
f (1) = x1 = first item on list
previous value, tk . (Or, because the values of
f (2) = x2 = second item on list all terms follow from one initial value.)
.. A second order recurrence relation requires
.
two initial values, and is usually harder to un-
The empty list is written ⟨⟩. fold.
Example. Example. A simple sequence in disguise
⟨m, a, t, h, s⟩ is a function f from {1, 2, 3, 4, 5}
into the English alphabet, with f (1) = m, Initial values. t0 = 1, t1 = 2
f (2) = a, etc. Recurrence relation. tk+1 = 2tk − tk−1

28.1 Sequences Calculating the first values, we find

A sequence is also a list, but when we use the t2 = 2t1 − t0 = 2 × 2 − 1 = 3,


term sequence we are usually interested in the t3 = 2t2 − t1 = 2 × 3 − 2 = 4,
rule by which the successive terms t1 , t2 , . . . are t4 = 2t3 − t2 = 2 × 4 − 3 = 5.
defined.
Often, the rule is a recurrence relation.
It looks like tn = n + 1, and indeed we can
Example. Arithmetic sequence prove this by induction. For the base step we
a, a + d, a + 2d, a + 3d, . . . have the initial values t0 = 1 = 0 + 1 and
t1 = 2 = 1 + 1. We do the induction step by
strong induction: assuming tn = n + 1 for all
This is defined by n ⩽ k, we deduce that tk+1 = k + 2.
Initial value. t1 = a In fact we have
Recurrence relation. tk+1 = tk + d
tk+1 = 2tk − tk−1
“Unfolding” this recurrence relation from tn by the recurrence relation
back to t1 , we see that d gets added n − 1 times, = 2(k + 1) − k
hence
by our assumption
tn = a + (n − 1)d. = 2k + 2 − k = k + 2
as required.
Example. Geometric sequence This completes the induction.
a, ar, ar2 , ar3 , . . . Example. Fibonacci sequence
Initial value. t1 = a
Recurrence relation. tk+1 = rtk Initial values. t0 = 0, t1 = 1
Recurrence relation. tk+1 = tk + tk−1 28.2 Let Tn be the number of ways of tiling a
2 × n strip with 2 × 1 tiles (which may be
It is possible to write tn directly as a func- rotated so they are 1 × 2). Find Tn for
tion of n. The function
√ is not at all obvious, n = 1, 2, 3, 4. Find a recurrence relation
because it involves 5: for Tn .
√ √ !
1 1 + 5 n 1 − 5 n
tn = √ − .
5 2 2
We do not go into how to find such a formula in
this unit. However, if someone gave you such a
formula you could prove it is correct by induc-
tion.

28.2 Relations – homogeneous and in-


homogeneous
Recurrence relations such as
tk+1 = 2tk
and
tk+1 = tk + tk−1
in which each term is a multiple of some tj , are
called homogeneous.
The characteristic property of any homoge-
neous equation is that if tn = f (n) is a solution,
then so is tn = cf (n), for any constant c.
E.g. tn = 2n is a solution of tk+1 = 2tk , and
so is tn = c2n , for any constant c.
Relations like tk+1 = tk + 3, in which there
is a term other than the tj terms, are called in-
homogeneous.
Homogeneous recurrence relations are usu-
ally easier to solve, and in fact there is a gen-
eral method for solving them (which we will not
cover in this unit).
There is no general method for solving in-
homogeneous recurrence relations, though they
can often be solved if the term other than the tj
terms is simple.

Questions
28.1 Find the next four values of each of the fol-
lowing recurrence relations. What order is
each recurrence relation? Which are ho-
mogeneous and which are inhomogeneous?

(a) rk+1 = rk + k 2 , r0 = 0.

(b) sk+1 = 3sk − 2sk−1 , s0 = 1, s1 = 2.

(c) tk+1 = tk +tk−2 +1, t0 = 1, t1 = 1, t2 = 1.


Lesson 29: Graphs

allowed to have multiple edges between the same


A graph consists of a finite set of objects
pair of vertices and also edges called loops join-
called vertices together with a set of un-
ing a vertex to itself. (A normal graph is some-
ordered pairs of vertices, called edges.
times called a simple graph to emphasise it is not
Graphs are normally represented by pic- a multigraph.) In directed graphs the edges have
tures, with vertex A represented by a dot la- a direction associated with them. In hypergraphs
belled A and each edge {A, B} represented by edges may join more that two vertices. For var-
a line joining A and B. Sometimes we do not ious problems it is useful to consider graphs
include the vertex labels when the names of the where the vertices and/or edges have weights,
vertices are not important. labels or colours etc. etc. We will focus on the
Such pictures are helpful for displaying data basic definition here, but will occasionally men-
or relationships, and they make it easy to recog- tion these variants.
nise properties which might otherwise not be no-
ticed. 29.3 Important kinds of graphs
The description by sets of vertices and edges
is useful when graphs have to be manipulated A complete graph is a graph in which every
by computer. It is also a useful starting point pair of vertices is joined by an edge.
for precise definitions of graph concepts.
Below are pictures of complete graphs with
29.1 Examples of graphs 2, 3, 4 and 5 vertices.

Description Picture

vertex set: {A, B, C} A


edge set: A path of length ℓ is a graph whose ver-
{{A, B}, {B, C}, {C, A}} tices can be renamed so that its vertex
B C
set is {V1 , . . . , Vℓ+1 } and its edge set is
To save space, an edge {A, B} is often writ- {V1 V2 , V2 V3 , . . . , Vℓ Vℓ+1 }.
ten simply as AB (or, equivalently, BA). We
will do this from now on. We say this is a path from V1 to Vℓ+1 (or,
equivalently, from Vℓ+1 to V1 ). Note that the
Description Picture
length of a path refers to its number of edges,
not its number of vertices. Below are pictures of
vertex set: {A, B, C, D} A paths of lengths 1, 2, 3 and 4.
edge set: {AB, BC, AD,
BD, CD} B D

Warning: A graph can A cycle of length ℓ (for ℓ ⩾ 3) is a graph


be represented by pictures A B whose vertices can be renamed so that its
that look very different. vertex set is {V1 , . . . , Vℓ } and its edge set is
This last example could be C D {V1 V2 , V2 V3 , . . . , Vℓ−1 Vℓ , Vℓ V1 }.
redrawn as:
Below are pictures of cycles of lengths 3, 4,
29.2 Variants of graphs 5 and 6.

Many different variants of graphs are used in


different contexts. For example multigraphs are
A graph is bipartite if its vertices can A graph is bipartite if and only if it has no
be renamed so that its vertex set is subgraph that is an odd-length cycle.
{U1 , U2 , . . . , Ui , V1 , V2 , . . . , Vj } and each of its
edges joins a vertex in {U1 , U2 , . . . , Ui } to a
vertex in {V1 , V2 , . . . , Vj }. 29.5 Connectivity

The graph is a complete bipartite graph A graph G is connected if, for any two of its
with parts of sizes i and j if every vertex in vertices A and B, it has a subgraph that is a
{U1 , U2 , . . . , Ui } is joined by an edge to every path from A to B.
vertex in {V1 , V2 , . . . , Vj }. Below are pictures of
some bipartite graphs. The vertices have been A graph that is not connected is discon-
arranged to the left and right to make it obvious nected. All the examples of graphs we have seen
the graphs are bipartite. so far in this lesson have been connected.
Example. The graph pictured below is discon-
nected. For example it does not contain a path
from D to E.

A D
Below are pictures of some complete bipar-
tite graphs. C B

Questions
29.1 Write the vertex sets and edge sets for the
Notice that all paths are bipartite and cy- graphs corresponding to the following pic-
cles of even length are bipartite. Cycles of odd tures
length are not bipartite, however. A D A B A B

29.4 Subgraphs
B C D C C D
A subgraph of a graph G is a graph whose
29.2 Draw pictures of graphs with the following
vertex set is a subset of the vertex set of G
vertex and edge sets.
and whose edge set is a subset of the edge set
(a) vertex set: {A, B, C, D}
of G.
edge set: {AB, BC, BD}
So a subgraph of a graph G is a graph that (b) vertex set: {A, B, C, D, E}
can be obtained from G by (possibly) deleting edge set: {AB, BC, CA, DE}
edges and/or vertices. Note that every graph is
29.3 What is the maximum number of edges
a subgraph of itself.
that a bipartite graph with 6 vertices can
Example. The graph pictured on the right be- have? What is the maximum number of
low is a subgraph of the graph pictured on the edges that a bipartite graph with n ver-
left. tices can have?
A D A

B C B C

Sometimes important properties of a graph


can be phrased in terms of its subgraphs. For
example, the following is true.
Lesson 30: Walks, paths and trails

30.2 Adjacency matrix


A walk of length ℓ in a graph G is a sequence
of vertices
If two vertices are joined by an edge we say that
V1 , V2 , V3 , . . . , Vℓ , Vℓ+1 they are adjacent.
where there is an edge of G joining vertex Vi A simple graph G with vertices V1 , V2 , . . . , Vn
to vertex Vi+1 for all i ∈ {1, 2, . . . , ℓ}. is described by an adjacency matrix which has
(i, j) entry (ith row and j th column) aij given by
We say that this walk traverses the edges (
1 if Vi is adjacent to Vj in G,
V1 V2 , V2 V3 , . . . , Vℓ Vℓ+1 . If Vℓ+1 = V1 the walk is aij =
said to be closed. Note that the length of the 0 otherwise.
walk counts the number of “steps” and so is one V1
less than the number of vertices in the sequence.
For example, the graph
V2 V3
A trail is a walk that traverses each edge at  
0 0 1
most once.
has adjacency matrix  0 0 1 
 
Remembering the definition of a path from 1 1 0
the last lesson, a walk that visits each vertex at
most once corresponds to a path.

30.1 Examples
30.3 Adjacency matrix powers
In these pictures, a walk is indicated by a di-
rected curve running alongside the actual edges
The product of matrices
in the walk.
   
a11 a12 · · · b11 b12 · · ·
a21 a22 · · · × b21 b22 · · ·
   

··· ··· ··· ··· ··· ···


is the matrix whose (i, j) entry is
A walk which is not a trail or a path. (Repeated
ai1 b1j + ai2 b2j + ai3 b3j + · · · ,
edge, repeated vertex.)
–the “dot product” of the ith row
ai1 ai2 ai3 ···
of the matrix on the left with the jth column
b1j
A trail which is not a path. (Repeated vertex.)
b2j
b3j
..
.
A nonclosed walk and a closed walk. of the matrix on the right.
30.4 The travelling salesman problem
The (i, j) entry in the ℓth power of the ad-
jacency matrix gives the number of walks of Given a connected graph in which each edge has
length ℓ between Vi and Vj . been assigned a positive weight, the travelling
salesman problem asks us to find a walk of the
For example, suppose we want the number of smallest possible weight that visits every ver-
walks of length 2 from V3 to V3 in the graph tex. (The problem’s name comes from thinking
V1 of the vertices as towns and the edge weights as
distances between towns.) For example, BADC
V2 V3
is a solution to the travelling salesman problem
The adjacency matrix M tells us that the fol- on the edge-weighted graph given below
lowing edges exist. C
 
··· ··· 1 ← V1 toV3
5
· · · · · · 1 ← V2 toV3
 
A
1 1 0 6
4 3
↑ ↑
6
V3 V3 B D
to to
V1 V2 Questions
So when we square this matrix, the (3, 3) entry
in M 2 30.1 A graph G has adjacency matrix
   
0 1 1 0 0
1  
1 1 0 1 = 1 × 1 + 1 × 1 = 2
  1 0 1 0 0
 
1 1 0 0 0
 
0  
0 0 0 0 1
counts the walks from V3 to V3 , namely  
0 0 0 1 0
V3 → V1 → V3 and V3 → V2 → V3 .
Decide, without drawing the graph,
Similarly, the (i, j) entry in M 2 is the num-
whether G is connected or not. Then draw
ber of walks of length 2 from Vi to Vj . The (i, j)
G.
entry in M 3 is the number of walks of length 3
from to Vi to Vj , and so on. 30.2 Draw the graph with adjacency matrix
In fact,  
    0 1 1
1 1 0 0 0 1 M = 1 0 0  ,
 
M2 × M =  1 1 0  ×  0 0 1 
   
1 0 0
0 0 2 1 1 0
using V1 , V2 and V3 as names for the ver-
has (3, 2) entry
tices corresponding to columns 1, 2 and 3.
 
0 30.3 Without doing any matrix multiplication,
0 0 2 0 = 2.
 
find M 3 .
1
Hence the number of walks of length 3 from V3
to V2 is 2.
Lesson 31: Degree

The degree of a vertex A in a graph G is the


number of edges of G that include A.

For example, if G is
Lecture 31: Degree
C
Lecture 28. Degree The question came up: is it possible for a
A B walk to cross all seven bridges without crossing
the same bridge twice?
then the degree of B is 2 and the degrees of A An equivalent question is whether there is
The degree of a vertex A in a graph G is the
and C arenumber
bothof1. times A occurs in the list of edges of The questiona came
trailup:which includes
is it possible for a walkall edges in the following

For example, if G is r'>v---....


G. to cross all seven bridges without crossing the
multigraph.
B same bridge twice?

An equivalent question is whether there is a
then the list of edges is AA, AB, AB and hence trail which includes all edges in theA
following
31.1 The handshaking lemma
degree(A) = 4. graph. A
Intuitively speaking, degree(A) is the number
of "ends" of edges occurring at A. In particular, B� D
a loop at A contributes 2 to the degree of A.
B D
In any graph, C
sum of degrees = 2× number of edges.
31.1 The handshaking lemma 31.3 Euler's solution
C
In any graph, Euler (1737) observed that the answer is no, be­
The reason
sum offor the= name
degrees is ofthat
2 x number edges.if each edge
cause

. .
is viewed asThe
a reason
handshake,
for the name is that if each edge is 1. Each time a walk enters and leaves a vertex
viewed as a handshake, 31.3 Euler’s solution
it "uses up" 2 from the degree.
� 2. Hence if all edges are used by the walk, all
then at each vertex V Eulerthe(1737)
vertices except first and observed
last must havethat the answer is no, be-
then at each vertex V = number of hands.
degree(V) even degree.
cause
Hence 3. The seven bridges graph in fact has four ver­
degree(V ) = number of hands. tices of odd degree.
sum of degrees 1. Each time a walk enters and leaves a ver-
Hence = total number of hands
sum of =degrees 31.4 Euler's theorem
tex it “uses up” 2 from the degree.
2 x number of handshakes
The same argument shows in general that
= total number
An important of hands
consequence
A graph with >2. 2 odd
Hence if all
edges are used by the walk, all
degree vertices
= × handshaking
2The number of implies that in
handshakes
lemma any has no trail using all its edges.
graph the sum of degrees is even (being vertices except the first and last must have
2xsomething). Thus it is impossible, e.g. for And a similar argument
evenshows
degree.
An important consequence
a graph to have degrees 1,2,3,4,5.
A graph with odd degree vertices
The handshaking lemma implies thatofin anyhas no closed trail using all its edges.
31.2 The seven bridges
graph the sumKonigsberg of degrees is even (being 3. The seven bridges graph in fact has four
(Because in this case the first and last vertex
2×something). ThusKonigsberg
In 18th century it is impossible,
there were seven forthe same, and its vertices
e.g. are of odd
degree is ''used up" bydegree.
a
bridges connecting islands in the river to the closed trail as follows: 1 at the start, 2 each time
a graph tobanks
have degrees
as follows. 1,2,3,4,5. through, 1 at the end.)

56 31.4 Euler’s theorem


31.2 The seven bridges of Königsberg
A trail that uses every edge of a graph exactly
once is called an Euler trail.
In 18th century Königsberg there were seven
bridges connecting islands in the river to the The argument from the last section shows in
banks as follows. general that
6. Since a graph has only a finite number of
A graph with > 2 odd degree vertices has no
edges, this process eventually halts. The
Euler trail.
result will be a closed trail which uses all
And a similar argument shows the edges (this requires the graph to be
connected, since any unused edge would
be connected to used ones, and thus would
A graph with odd degree vertices has no have eventually been used).
closed Euler trail.

(Because in this case the first and last vertex 31.6 Bridges
are the same, and its degree is “used up” by a A bridge in a connected graph G is an edge
closed trail as follows: 1 at the start, 2 each time whose removal disconnects G. E.g. the edge
through, 1 at the end.) B is a bridge in the following graph.

31.5 The converse theorem B


If, conversely, we have a graph G whose vertices The construction of an Euler trail is improved
all have even degree, must it have an Euler trail? by doing the following (Fleury’s algorithm).
Not necessarily. For example, G might be ˆ Erase each edge as soon as it is used.
the following disconnected graph.
ˆ Use a bridge in the remaining graph only
if there is no alternative.
It turns out, when this algorithm is used, that it
However, if we also assume the graph is con- is not necessary to make any detours. The im-
nected then it must have a closed Euler trail. provement, however, comes at the cost of need-
ing an algorithm to recognise bridges.
A connected graph with no odd degree ver-
tices has a closed Euler trail.
Questions
31.1 For each of the following sequences, con-
Such a closed trail can be constructed as follows: struct a graph whose vertices have those
1. Starting at any vertex V1 , follow a trail t1 degrees, or explain why no such graph ex-
for as long as possible. ists.
2. The trail t1 eventually returns to V1 , be- • 1, 2, 3, 4
cause it can leave any other vertex it en- • 1, 2, 1, 2, 1
ters. (Immediately after the start, V1 has • 1, 2, 2, 2, 1
one “used” edge, and hence an odd num-
ber of “unused” edges. Any other vertex 31.2 How many bridges are there in a path with
has an even number of “unused” edges.) n edges? How many bridges are there in a
cycle with n edges?
3. If t1 does not use all edges, retrace it to
the first vertex V2 where t1 meets an edge 31.3 A graph H has adjacency matrix
not in t1 .  
0 1 1 0 1
4. At V2 add a “detour” to t1 by following a 
1

0 0 1 0
trail out of V2 as long as possible, not using  
1 0 0 0 1
 
edges in t1 . As before, this trail eventually  
returns to its starting point V2 , where we 0 1 0 0 0
 
resume the trail t1 . Let t2 be the trail t1 1 0 1 0 0
plus the detour from V2 .
What are the degrees of its vertices? Does
5. If t2 does not use all the edges, retrace t2 H have an Euler trail? Does H have a
to the first vertex V3 where t2 meets an closed Euler trail?
edge not in t2 . Add a detour at V3 , and so
on.
Lesson 32: Trees

Remarks
A tree is a graph that is connected and has
no subgraph that is a cycle. 1. This proof also shows that any edge in a
tree is a bridge.
For example,
2. Since a tree has one more vertex than edge,
it follows that m trees have m more ver-
tices than edges.

3. The theorem also shows that adding any


edge to a tree (without adding a vertex)
is a tree. creates a cycle. (Since the graph remains
connected, but has too many edges to be
a tree.)
32.1 The number of edges in a tree
These remarks can be used to come up with
A tree with n vertices has n − 1 edges. several equivalent definitions of tree.
Next we see how any connected graph can
The proof is by strong induction on n. be related to trees.
Base step. A tree with 1 vertex has 0 edges
(an edge requires at least 2 vertices). 32.2 Spanning trees
Induction step. Supposing any tree with
j ⩽ k vertices has j −1 edges, we have to deduce A spanning tree of a graph G is a tree that
that a tree with k + 1 vertices has k edges. is subgraph of G and includes every vertex of
Well, given a tree Tk+1 with k + 1 vertices, G.
we consider any edge e in Tk+1 , e.g.
For example,

A e B

is a spanning tree of
Removing e disconnects the ends A and B
of e. (If they were still connected, by some path
p, then p and e together would form a cycle in
Tk+1 , contrary to its being a tree.)
Thus Tk+1 − {e} consists of two trees, say Ti
and Tj with i and j vertices respectively. We Any connected graph G contains a spanning
have i + j = k + 1 but both i, j ⩽ k, so our tree.
induction assumption gives
This is proved by induction on the number of
Ti has i − 1 edges, Tj has j − 1 edges. edges.
But then Tk+1 = Ti ∪ Tj ∪ {e} has Base step. If G has no edge but is connected
(i − 1) + (j − 1) + 1 = (i + j) − 1 = k edges, as then it consists of a single vertex. Hence G itself
required. is a spanning tree of G.
Induction step. Suppose any connected 4. The algorithm always works, though this
graph with ⩽ k edges has a spanning tree, and is not obvious, and the proof is not re-
we have to find a spanning tree of a connected quired for this course. (You can find it,
graph Gk+1 with k + 1 edges. e.g. in Chartrand’s Introductory Graph
If Gk+1 has no cycle then Gk+1 is itself a Theory.)
tree, hence a spanning tree of itself.
If Gk+1 has a cycle p we can remove any edge 5. Another problem which can solved by a
e from p and Gk+1 − {e} is connected (because “greedy” algorithm is splitting a natural
vertices previously connected via e are still con- number n into powers of 2. Begin by
nected via the rest of p). Since Gk+1 − {e} has subtracting the largest such power 2m ⩽
one edge less, it contains a spanning tree T by n from n, then repeat the process with
induction, and T is also a spanning tree of Gk+1 . n − 2m , etc.

Remark It follows from these two theorems Questions


that a graph G with n vertices and n − 2 edges 32.1 Which of the following graphs are trees?
(or less) is not connected. In each case we insist that m ̸= n.
If it were, G would have a spanning tree T ,
with the same n vertices. But then T would • vertex set {1, 2, 3, 5, 7}
have n − 1 edges, which is impossible, since it is an edge between m and n
more than the number of edges of G. if m divides n or n divides m
• vertex set {1, 2, 3, 4, 5}
32.3 The greedy algorithm an edge between m and n
Given a connected graph with weighted edges, if m divides n or n divides m
a minimal weight spanning tree T of G may be • vertex set {2, 3, 4, 5, 6}
constructed as follows. an edge between m and n
if m divides n or n divides m
1. Start with T empty.
32.2 Find spanning trees of the following
2. While T is not a spanning tree for graphs (cube and dodecahedron).
G, add to T an edge ek+1 of minimal
weight among those which do not cre-
ate a cycle in T , together with the ver-
tices of ek+1 .

This is also known as Kruskal’s algorithm.

Remarks
32.3 Also find spanning trees of the cube and
1. T is not necessarily a tree at all steps of dodecahedron which are paths.
the algorithm, but it is at the end.

2. For a graph with n vertices, the algorithm


runs for n − 1 steps, because this is the
number of edges in a tree with n vertices.

3. The algorithm is called “greedy” because


it always takes the cheapest step available,
without considering how this affects future
steps. For example, an edge of weight 4
may be chosen even though this prevents
an edge of length 5 being chosen at the
next step.
Lesson 33: Trees, queues and stacks

To search a graph G systematically, it helps each vertex v to a “predecessor” among the ad-
to have a spanning tree T , together with an or- jacent vertices of v already in T . An arbitrary
dering of the vertices of T . vertex is chosen as the root V0 of T .

1. Initially, T = tree with just one vertex V0 ,


33.1 Breadth first ordering
Q = the queue containing only V0 .
The easiest ordering to understand is called
2. While Q is nonempty
breadth first, because it orders vertices “across”
the tree in “levels.” 2.1. Let V be the vertex at the head of Q
2.2. If there is an edge e = V W in G
Level 0 is a given “root” vertex.
where W is not in T
Level 1 is the vertices one edge
2.2.1. Add e and W to T
away from the root.
2.2.2. Insert W in Q (at the tail).
Level 2 are the vertices two edges
away from the root, 2.3. Else remove V from Q.
. . . and so on. Remarks
Example.
1. If the graph G is not connected, the al-
gorithm gives a spanning tree of the con-
A, B, C, D, E, F, G A nected component containing the root ver-
is a breadth first ordering of tex A, the part of G containing all vertices
B C connected to A.

2. Thus we can recognise whether G is con-


DEF G nected by seeing whether all its vertices
are included when the algorithm termi-
nates.
33.2 Queues
Breadth first ordering amounts to putting ver- 3. Being able to recognise connectedness en-
tices in a queue - a list processed on a “first ables us, e.g., to recognise bridges.
come, first served” or “first in, first out” basis.

ˆ The root vertex is first in the queue (hence


first out).

ˆ Vertices adjacent to the head vertex v in


the queue go to the tail of the queue (hence
they come out after v), if they are not al-
ready in it.

ˆ The head vertex v does not come out of


the queue until all vertices adjacent to v
have gone in.

33.3 Breadth first algorithm


For any connected graph G, this algorithm not
only orders the vertices of G in a queue Q, it
also builds a spanning tree T of G by attaching
Example. Example.
A We use the same G, and take the top of S to be
its right hand end.
If G = B C , with root vertex A.
D E
Step S T
Then Q and T grow as follows: 1 A A
A
2 AB B
Step Q T
A A
1 A B C
3 ABC
A
B A
2 AB B C
A E
B C 4 ABCE
3 ABC
A
4 BC B C
A 4 ABCED D E
B C
D 6 ABCE
5 BCD
A 7 ABC
B C 8 AB
6 BCDE D E 9 A
7 CDE
8 DE
9 E Questions
33.1 The following list gives the state, at suc-
cessive stages, of either a queue or a stack.
A
33.4 Depth first algorithm AB
This is the same except it has a stack S instead ABC
of a queue Q. S is “last in, first out,” so we BC
insert and remove vertices from the same end of BCD
S (called the top of the stack). CD
D
Which is it: a queue or a stack?
1. Initially, T = tree with just one vertex V0 ,
S = the stack containing only V0 . 33.2 Construct a breadth first spanning tree for
the graph
2. While S is nonempty
2.1. Let V be the vertex at the top of S A B D

2.2. If there is an edge e = V W in G


where W is not in T C E
2.2.1. Add e and W to T 33.3 Construct a depth first spanning tree for
2.2.2. Insert W in S (at the top). the graph
2.3. Else remove V from S. A

Remark. The breadth first and depth first al- C D


B
gorithms give two ways to construct a spanning
tree of a connected graph.
E
Useful notation

Sets of numbers
N the set of natural numbers {0, 1, 2, 3, . . .}
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q the set of rational numbers { ab : a, b ∈ Z, b ̸= 0}
R the set of real numbers

Number Theory
a|b a divides b b = qa for some q ∈ Z
gcd(a, b) greatest common divisor of a and b
a ≡ b (mod n) a and b are congruent modulo n n | (a − b)

Logic
¬p not p
p∧q p and q
p∨q p or q
p→q p implies q
∀x for all x
∃x there exists x

Sets
x∈A x is an element of A
{x : P (x)} the set of x such that P (x)
|A| the number of elements in A
A⊆B A is a subset of B
A∩B A intersect B {x : x ∈ A ∧ x ∈ B}
A∪B A union B {x : x ∈ A ∨ x ∈ B}
A−B set difference A minus B {x : x ∈ A ∧ x ∈
/ B}
A△B A symmetric difference B {x : x ∈ A ∨ x ∈ B}

Functions
f :A→B f is a function from A to B

Probability
Pr(E) probability of E
Pr(A|B) conditional probability of A given B
E[X] expected value of X
Var[X] variance of X

Sums and products


Pb
f (i) sum of f (i) from i = a to i = b f (a) + f (a + 1) + · · · + f (b)
i=a
b
Q
f (i) product of f (i) from i = a to i = b f (a) × f (a + 1) × · · · × f (b)
i=a
Useful formulas

Logic Laws Binomial theorem


p ↔ q ≡ (p → q) ∧ (q → p) n
(x + y)n = xn y 0 + n1 xn−1 y 1 + n2 xn−2 y 2 +

0
p → q ≡ (¬p) ∨ q n
· · · + n−1
1 n−1
x y + nn x0 y n

¬¬p ≡ p
p∧p≡p
p∨p≡p Conditional probability
p∧q ≡q∧p Pr(A ∩ B)
Pr(A|B) =
Pr(B)
p∨q ≡q∨p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r Bayes’ theorem
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r Pr(B|A)Pr(A)
Pr(A|B) =
Pr(B|A)Pr(A) + Pr(B|A)Pr(A)
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q) Discrete uniform distribution
¬(p ∨ q) ≡ (¬p) ∧ (¬q) 1
Pr(X = k) = for k ∈ {a, a + 1, . . . , b}
b−a+1
p∧T≡p a+b (b−a+1)2 −1
E[X] = 2 , Var[X] = 12
p∨F≡p
Bernoulli distribution
p∧F≡F (
p∨T≡T p for k = 1
Pr(X = k) =
1 − p for k = 0
p ∧ (¬p) ≡ F
E[X] = p, Var[X] = p(1 − p)
p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p Geometric distribution
p ∨ (p ∧ q) ≡ p Pr(X = k) = p(1 − p)k for k ∈ N
1−p 1−p
¬∀xP (x) ≡ ∃x¬P (x) E[X] = p , Var[X] = p2
¬∃xP (x) ≡ ∀x¬P (x)
Binomial distribution
n k

Pr(X = k) = p (1 − p)n−k for k ∈ {0, . . . , n}
k
Ordered selections without repetition
E[X] = np, Var[X] = np(1 − p)
n!
n(n − 1) · · · (n − r + 1) =
(n − r)! Poisson distribution
λk e−λ
Unordered selections without repetition Pr(X = k) = for k ∈ N
k!
n(n − 1) · · · (n − r + 1) n! n E[X] = λ, Var[X] = λ

= =
r! r!(n − r)! r

Ordered selections with repetition


nr

Unordered selections with repetition


n+r−1 (n + r − 1)!

=
r r!(n − 1)!

You might also like