MAT1830NotesBooklet (3)
MAT1830NotesBooklet (3)
Discrete mathematics studies objects which we’ll see, even if you are pretty sure that some-
have distinct separated values (e.g. integers), as thing is true, it can be really useful to have a
opposed to objects which vary smoothly (e.g. proof of it, for a number of reasons.
real numbers). You can think of it as being
“digital” mathematics rather than “analogue” 1.3 Maths in computer science
mathematics.
Discrete mathematics is particularly impor- As we mentioned above, maths and computer
tant in computer science and the two fields are science are very closely related. The topics in
very closely linked. this course all have many applications to com-
This course covers a wide range of topics in puter science. For example:
discrete mathematics including the following:
Number theory is used in cryptography
Numbers to enable secure communication, identity
verification, online banking and shopping
Logic etc.
Example. 210 = 2 × 3 × 5 × 7, and this is the 2.4 The greatest common divisor of
only product of primes which equals 210. two numbers
It is remarkable that we can find the greatest
This would not be true if 1 was counted as
common divisor of positive integers m and n,
a prime, because many factorisations involve 1.
gcd(m, n), without finding their prime divisors.
E.g.
This is done by the famous Euclidean al-
210 = 1 × 2 × 3 × 5 × 7 = 12 × 2 × 3 × 5 × 7 = . . . gorithm, which repeatedly divides the greater
number by the smaller, keeping the smaller num- given above, the gcd of the numbers in the last
ber and the remainder. two columns is always gcd(m, n).)
2.5 The Euclidean algorithm works! 2.2 Can a multiple of 15 and a multiple of 21
differ by 1? If not, what is the small-
We start with the precondition m ⩾ n > 0. est positive difference between such mul-
Then the division theorem tells us there is a re- tiples?
mainder r < b when a = m is divided by b = n.
Repeating the process gives successively smaller 2.3 Find gcd(13, 21) and gcd(15, 21), and sug-
remainders, and hence the algorithm eventually gest how they are related to the results in
returns a value. Questions 2.1 and 2.2.
That the value returned value is actually
2.4 Work out the prime factorisations of 999
gcd(m, n) relies on the following fact.
and 1000.
Fact. If a, b and k are integers, then 2.5 You should find no common prime factor
of 999 and 1000. How could you see this
gcd(a − kb, b) = gcd(a, b).
without factorising the numbers? (Hint: a
By using this fact repeatedly, we can show that common divisor of 1000 and 999 is also a
after each execution of the while loop in the al- common divisor of . . . what?)
gorithm gcd(b, r) = gcd(m, n). When the algo-
rithm terminates, this means b = gcd(b, 0) =
gcd(m, n). (Equivalently, in the neat set out
Lesson 3: Congruences
x − y ≡ 1 (mod 8)
Let n ⩾ 2 be an integer. We say integers a
and b are congruent modulo n and write xy ≡ 6 (mod 8).
a ≡ b (mod n) We can also deduce that x + 4 ≡ 7 (mod 8),
when n divides a − b. that 4x ≡ 12 (mod 8) and so on, because obvi-
ously 4 ≡ 4 (mod 8). Note as well that 4x ≡
12 (mod 8) can be simplified to 4x ≡ 4 (mod 8).
In some situations we can also “divide 3.4 Modular inverses
through” a congruence by an integer.
A modular multiplicative inverse of an inte-
If a ≡ b (mod n) and d divides a, b and n, ger a modulo n is an integer x such that
then ax ≡ 1 (mod n).
a b n
d ≡ d (mod d ).
From the last section we know that such an in-
verse will exist if and only if gcd(a, n) = 1. If
3.3 Solving linear congruences inverses do exist then we can find them using
the extended Euclidean algorithm (there will
Think of a congruence like 7x ≡ 5 (mod 9). This be lots of inverses, but they will all be in one
will hold if 9 divides 7x − 5 or in other words if congruence class modulo n). These inverses
there is an integer y such that 7x − 5 = 9y. So have important applications to cryptography
to solve our original congruence we can find an and random number generation.
integer solution to 7x − 9y = 5.
Some congruences don’t have solutions. Example. 8 should have a multiplicative in-
For example, there is no solution to 10x ≡ verse modulo 45 because gcd(8, 45) = 1. Using
6 (mod 20) because there are no integers x and the extended Euclidean algorithm we see that
y such that 10x − 20y = 6. −3 × 45 + 17 × 8 = 1. So 8 × 17 ≡ 1 (mod 45).
To find an expression for all the integers x This means that 17 is a multiplicative inverse of
that satisfy a congruence like ax ≡ b (mod n), 8 modulo 45.
first find d = gcd(a, n) and then act as follows.
If d = 1: Find integers x′ and y ′ such that Questions
ax − ny ′ = b. The integers x that satisfy the
′
original congruence are exactly those for which 3.1 Are the following true or false?
x ≡ x′ (mod n). 6 ≡ 3 (mod 3)
If d > 1 and d divides b: The method
above will still work but it will only give some 9 ≡ 18 (mod 8)
of the solutions. To find all of the solutions, 5x + 6 ≡ 2x (mod 3)
first divide through the congruence by d to get
3.2 Prove all of the facts about congruences
the equivalent congruence ad x ≡ db (mod nd ) and
that were stated in this lesson (use the
then use the method above on the new congru-
definition of congruence modulo n and the
ence.
definition of divides).
If d doesn’t divide b: The congruence has
no solutions. 3.3 Find an expression for all the integers x
Example. Find all integers x such that that satisfy 9x ≡ 12 (mod 60).
36x ≡ 10 (mod 114).
Using the Euclidean algorithm we find
gcd(36, 114) = 6. So 6 divides 36x − 114y
for any integers x and y, and consequently
36x − 114y ̸= 10. This means that there are
no integers x such that 36x ≡ 10 (mod 114).
The simplest and most commonly used part Similarly, p ∨ q is true when p is true or q is
of logic is the logic of “and”, “or” and “not”, true, but now we have to be more precise, be-
which is known as propositional logic. cause “or” has at least two meanings in ordinary
A proposition is any sentence which has a speech.
definite truth value (true= T or false= F), such
as We define ∨ by the truth table
1 + 1 = 2, or p q p∨q
10 is a prime number. T T T
but not T F T
What is your name? or F T T
This sentence is false. F F F
p q p∨q
T T F
T F T
F T T
F F F
4.4 Remarks
1. The symbols ∧ and ∨ are intentionally sim-
ilar to the symbols ∩ and ∪ for set intersection
and union because
x ∈ A ∩ B ⇔ (x ∈ A) ∧ (x ∈ B)
x ∈ A ∪ B ⇔ (x ∈ A) ∨ (x ∈ B)
(We study sets later.)
Questions
4.1 Which of the following are propositions?
1 + 1 = 3, 1 + 1, 3 divides 7, 3÷7
Example. p → q ≡ (¬p) ∨ q
A sentence in propositional logic is We know p → q has the truth table
a tautology if it has value T under all
interpretations; p q p→q
a contradiction if it has value F under T T T
all interpretations. T F F
F T T
We can check whether ϕ is a tautology, a F F T
contradiction, or neither by computing its value Now (¬p) ∨ q has the truth table
for all possible values of its variables.
p q ¬p (¬p) ∨ q
Example. (¬p) ∨ p is a tautology. T T F T
The truth table for (¬p) ∨ p is T F F F
p ¬p (¬p) ∨ p F T T T
T F T F F T T
F T T So p → q and (¬p) ∨ q have the same truth
So (¬p)∨p has value T under all interpretations, table (looking just at their columns). It follows
and thus is a tautology. (It is sometimes known from this that p → q can always be rewritten
as the law of the excluded middle). as (¬p) ∨ q. In fact, all truth functions can be
expressed in terms of ∧, ∨, and ¬.
We can similarly compute the values of any
truth function ϕ, so this is an algorithm for
recognising tautologies. However, if ϕ has n
variables, they have 2n sets of values, so the
amount of computation grows rapidly with n.
One of the biggest unsolved problems of logic This is like finding identities in algebra – one
and computer science is to find an efficient algo- uses known equivalences to rearrange, expand
rithm for recognising tautologies. and simplify.
5.3 Useful equivalences brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we
can write either side as p ∨ q ∨ r. This is like
The following equivalences are the most fre- p + (q + r) = (p + q) + r = p + q + r in ordinary
quently used in this “algebra of logic”. algebra.
Example. p ∧ q ⇒ p
p is a logical consequence of p ∧ q, because p = T
whenever p ∧ q = T. However, we can have
p ∧ q = F when p = T (namely, when q = F).
Hence p ∧ q and p are not equivalent.
This example shows that ⇒ is not symmet-
ric:
(p ∧ q) ⇒ p but p ⇏ (p ∧ q)
This is where ⇒ differs from ≡, because if ϕ ≡ ψ
then ψ ≡ ϕ.
In fact, we build the relation ≡ from ⇒ the
same way ↔ is built from →:
ϕ ≡ ψ means (ϕ ⇒ ψ) and (ψ ⇒ ϕ).
Lesson 7: Predicates and quantifiers
7.1 Predicates
A predicate such as “n is prime” is not a proposi- 7.3 Quantifiers and connectives
tion because it is neither true nor false. Rather,
it is a function P (n) of n with the Boolean val- We can also combine quantifiers with connec-
ues T (true) or F (false). In this case, P (n) is a tives from propositional logic.
function of natural numbers defined by
(
T if n is prime Example. Let Sq(n) be the predicate “n is a
P (n) =
F otherwise. square,” and let P os(n) be the predicate “n is
Similarly, the “x ⩽ y” predicate is a function of positive” as above. Then we can symbolise the
pairs of real numbers, defined by following sentences:
( There is a positive square:
T if x ⩽ y
R(x, y) = ∃n(P os(n) ∧ Sq(n)).
F otherwise.
Since most of mathematics involves properties There is a positive integer which is not a
and relations such as these, only a language with square:
predicates is adequate for mathematics (and ∃n(P os(n) ∧ ¬Sq(n))
computer science).
All squares are positive:
∀n(Sq(n) → P os(n))
7.2 Building sentences from predi-
cates
Notice that the “All. . . are” combination in
One way to create a sentence from a predicate English actually involves an implication. This
is to replace its variables by constants. For ex- is needed because we are making a claim only
ample, when P (n) is the predicate “n is prime,” about squares and the implication serves to
P (3) is the sentence “3 is prime.” “narrow down” the set we are examining.
7.4 Alternating quantifiers Remark. Another way to say “you can’t fool
all of the people all of the time” is
Combinations of quantifiers like ∀x∃y . . . , “for
all x there is a y . . .” are common in mathe- ∃p∃t¬F (p, t).
matics, and can be confusing. It helps to have
some examples in mind to recall the difference Questions
between ∀x∃y . . . and ∃y∀x . . . 7.1 Write “roses are red” in the language of
The relation x < y is convenient to illustrate predicate logic, using
such combinations; we write x < y as the pred-
icate L(x, y) rose(x) for “x is a rose”
Then red(x) for “x is red.”
∀x∃yL(x, y) 7.2 If P (n) stands for “n is prime” and E(n)
is the (true) sentence stands for “n is even,” what does P (n) ∧
(¬E(n)) say about n?
for all x there is a y such that x < y,
which says that there is no greatest number. 7.3 Using the predicates
But with the opposite combination of quan- pol(x) for “x is a politician”
tifiers we have
liar(x) for “x is a liar”
∃y∀xL(x, y) represent the following statements in logic.
is the false sentence
all politicians are liars
there is a y such that for all x, x < y,
some politicians are liars
which says there is a number greater than all
no politicians are liars
numbers.
Even though these statements are usually some politicians are not liars.
written without brackets they are effectively
Are any of these sentences logically equiv-
bracketed “from the centre”. So ∀x∃yL(x, y)
alent?
means ∀x(∃yL(x, y)) and ∃y∀xL(x, y) means
∃y(∀xL(x, y)).
Nevertheless, in some cases, we can see that a ∀x(P (x) ∧ Q(x)) ≡ ∀x(Q(x) ∧ P (x))
sentence is true for all interpretations. simply because
Example. ∀x∀yP (x, y) → ∀y∀xP (x, y) is true P (x) ∧ Q(x) ≡ Q(x) ∧ P (x)
for all properties P , and hence is valid. for any x.
Likewise, we can sometimes see that a sen-
tence is not valid by finding an interpretation However there are also equivalences that
which makes it false. genuinely involve quantifiers.
8.5 Useful equivalences 8.7* Completeness and undecidabil-
ity
Two important equivalences involving quanti-
fiers are In 1930, Gödel proved that there is a complete
set of rules of inference for predicate logic. This
means, in particular, that there is an algorithm
to list the valid sentences.
¬∀xP (x) ≡ ∃x¬P (x) However, in 1936, Church and Turing proved
that there is no algorithm to list the logically
¬∃xP (x) ≡ ∀x¬P (x)
false sentences. This means, in particular, that
predicate logic is undecidable: there is no algo-
These make sense intuitively. For example, rithm which, for any sentence ϕ, decides whether
¬∀xP (x) means P (x) is not true for all x, hence ϕ is valid or not.
there is an x for which P (x) is false, that is, This negative result is due to the power of
∃x¬P (x). predicate logic: it can express all mathemati-
They can also be viewed as “infinite De Mor- cal or computational problems, and it is known
gan’s laws.” If x ranges over {1, 2, 3, . . .} for ex- that some of these problems cannot be solved by
ample, then algorithm.
∀xP (x) ≡ P (1) ∧ P (2) ∧ P (3) ∧ · · ·
Questions
and
8.1 Give interpretations which make the fol-
∃xP (x) ≡ P (1) ∨ P (2) ∨ P (3) ∨ · · ·
lowing sentences false.
Hence
∃nP (n) → ∀nP (n)
¬∀xP (x) ≡ ¬ (P (1) ∧ P (2) ∧ P (3) ∧ · · · )
∀x∀y (R(x, y) → R(y, x))
≡ (¬P (1)) ∨ (¬P (2)) ∨ (¬P (3)) ∨ · · ·
∀m∃nS(m, n)
by de Morgan’s law
≡ ∃x¬P (x). 8.2 Give interpretations which show that the
sentences
And similarly
∃x (P (x) ∧ L(x))
¬∃xP (x) ≡ ¬ (P (1) ∨ P (2) ∨ P (3) ∨ · · · )
and
≡ (¬P (1)) ∧ (¬P (2)) ∧ (¬P (3)) ∧ · · ·
by de Morgan’s law ∃x (P (x) ∧ ¬L(x))
≡ ∀x¬P (x). are not equivalent.
Since the natural numbers 0, 1, 2, 3, . . . are as required. This completes the induction step,
generated by a process which begins with 0 and and hence completes the proof.
repeatedly adds 1, we have the following. Example 2. Prove there are 2n n-bit binary
strings.
Property P is true for all natural numbers if Let P (n) be “there are 2n n-bit binary strings”.
1. P (0) is true. Base step. There is 20 = 1 0-bit binary string
2. P (k) ⇒ P (k + 1) for all k ∈ N. (the empty string) so P (0) is true.
This is called the principle of mathematical Induction step. We want to prove that
induction.
It is used in a style of proof called proof by there are 2k k-bit binary strings
induction, which consists of two steps. ⇒ there are 2k+1 (k + 1)-bit binary strings
Base step: Proof that the required property P Well, a (k + 1)-bit binary string is either W 0 or
is true for 0. W 1, where W is any k-bit binary string. Thus
Induction step: Proof that if P (k) is true if there are 2k k-bit binary strings W , there are
then P (k + 1) is true, for each k ∈ N. 2 × 2k = 2k+1 (k + 1)-bit binary strings.
This completes the induction step, and hence
completes the proof.
9.1 Examples
Example 1. Prove that 3 divides n3 + 2n for 9.2 Starting the base step higher
all n ∈ N
It is not always appropriate to start the induc-
Let P (n) be “3 divides n3 + 2n”. tion at 0. Some properties are true only from a
Base step. 3 divides 03 + 2 × 0 = 0, so P (0) is certain positive integer upwards, in which case
true. the induction starts at that integer.
Induction step. We want to prove Example 3. Prove n! > 2n for all integers
3 divides k 3 + 2k n⩾4
⇒ 3 divides (k + 1)3 + 2(k + 1). Let P (n) be “n! > 2n ”.
Well, Base step. 4! = 4 × 3 × 2 = 24 > 16 = 24 , so
P (4) is true.
(k + 1)3 + 2(k + 1)
= k 3 + 3k 2 + 3k + 1 + 2k + 2 Induction step. We want to prove k! > 2k ⇒
(k + 1)! > 2k+1 for all integers k ⩾ 4.
= k 3 + 2k + 3k 2 + 3k + 3
Now, for k ⩾ 4, if k! > 2k ,
= k 3 + 2k + 3(k 2 + k + 1).
(k+1)! = (k+1)×k! > (k+1)×2k > 2×2k = 2k+1 .
Therefore,
(The first > holds because we are assuming
3 divides k 3 + 2k k! > 2k and the second holds because k ⩾ 4.)
⇒ 3 divides k 3 + 2k + 3(k 2 + k + 1) Thus k! > 2k ⇒ (k + 1)! > 2k+1 , as required to
⇒ 3 divides (k + 1)3 + 2(k + 1) complete the induction.
So n! > 2n for all n ⩾ 4. Remark. Another proof is to write down
1 + 2 + 3 + · · · + (n − 1) + n
Example 4. Prove any integer value n ⩾ 8 (in n + (n − 1) + · · · + 3 + 2 + 1
cents) is obtainable with 3c and 5c stamps. and observe that each of the n columns sums
to n + 1. Thus the sum of twice the series is
Let P (n) be “n cents is obtainable with 3c and
n(n + 1), and hence the sum of the series itself
5c stamps”.
is n(n + 1)/2. One could argue that this proof
Base step. 8c can be obtained by a 3c plus a 5c uses induction stealthily, to prove that the sum
stamp. So P (8) is true. of each column is the same.
Induction step. We have to show that if k cents
is obtainable, so is (k + 1) cents, when k ⩾ 8. Questions
Case 1. The k cents is obtained using a 5c
In most induction problems set for students we
stamp (among others). Replace the 5c stamp by
skip the experimental part, which is finding what
two 3c stamps, thus obtaining k + 1 cents.
to prove. Before trying to prove that 3 divides
Case 2. If the k cents is obtained using only
n3 + 2n, for example, someone needs to guess
3c stamps, there are at least three of them (since
that it is true, perhaps by trying n = 1, 2, 3, 4.
k ⩾ 8). In this case, replace three 3c stamps by
two 5c stamps, again obtaining k + 1 cents. 9.1 In this question, try to guess what ? stands
Thus in either case, when k ⩾ 8, P (k) ⇒ for, by trying a few values of n.
P (k + 1). This completes the induction proof
that n cents are obtainable from 3c and 5c ? divides n2 + n
stamps, for all integers n ⩾ 8. The sum of the first n odd numbers
is ?
1
1×2
1
+ 2×3 1
+ 3×4 1
+ . . . + n(n+1) = 1−?
9.3 Sums of series 9.2 If you correctly guessed the sum
1 1 1 1
Induction is often used to prove that sum for- 1×2 + 2×3 + 3×4 + ... + n(n+1) ,
mulas are correct. you might wonder why it is so simple.
Here is a clue:
Example 5. Prove 1+2+3+· · ·+n = n(n+1)/2 1
= 1
− 12 .
1×2 1
for all integers n ⩾ 1.
1 1
Let P (n) be “1 + 2 + 3 + · · · + n = n(n + 1)/2”. What is 2×3 ? 3×4 ?
How does this lead to a simple formula for
Base step. When n = 1, the left hand side is 1,
1 1 1 1
and the right hand side is 1(1 + 1)/2 = 2/2 = 1, 1×2 + 2×3 + 3×4 + ... + n(n+1) ?
so P (1) is true.
OK, if we can guess formulas correctly,
Induction step. We have to prove that why bother proving them by induction?
1 + 2 + ··· + k = k(k+1) The reason is that a statement which fits
2
(k+1)(k+2) many values of n can still be wrong.
⇒ 1 + 2 + · · · + k + (k + 1) = 2 .
Now, if P (k) is true, 9.3 Show that n2 + n + 41 is a prime number
for n = 1, 2, 3, 4 (and go further, if you
1 + 2 + · · · + k + (k + 1) like). Do you think n2 + n + 41 is prime
= (1 + 2 + · · · + k) + (k + 1) for all natural numbers n?
k(k+1)
= 2 + (k + 1) using P (k)
= (k + 1)( k2 + 1)
(k+1)(k+2)
= 2
as required.
This completes the induction proof.
Lesson 10: Induction and well-ordering
In the previous lesson we were able to prove Example 2. Prove that every positive integer
a property P holds for 0, 1, 2, . . . as follows: is a sum of distinct powers of 2. (Just a power
Base step. Prove P (0) of two by itself counts as a “sum”.)
Induction step. Prove P (k) ⇒ P (k + 1) for
The idea behind this proof is to repeatedly sub-
each natural number k.
tract the largest possible power of 2. We illus-
This is sufficient to prove that P (n) holds for
trate with the number 27.
all natural numbers n, but it may be difficult to
prove that P (k + 1) follows from P (k). It may 27 − largest power of 2 less than 27
in fact be easier to prove the induction step = 27 − 16 = 11
11 − largest power of 2 less than 11
= 11 − 8 = 3
3 − largest power of 2 less than 3
P (0) ∧ P (1) ∧ · · · ∧ P (k) ⇒ P (k + 1).
=3−2=1
Hence 27 = 16 + 8 + 2 + 1 = 24 + 23 + 21 + 20 .
That is, it may help to assume P holds for (It is only interesting to find distinct powers
all numbers before k + 1. Induction with this of 2, because of course each integer ⩾ 1 is a sum
style of induction step is sometimes called the of 1s, and 1 = 20 .)
strong form of mathematical induction. The strong induction proof goes as follows.
Let P (n) be “n is a sum of distinct powers of
2”.
Base step. 1 = 20 , so 1 is a sum of (one) power
10.1 Examples of strong induction of 2. Thus P (1) is true.
Induction step. Suppose each of the numbers
Example 1. Prove that every integer ⩾ 2 is a 1, 2, 3, . . . , k is a sum of distinct powers of 2. We
product of primes. (Just a prime by itself counts wish to prove that k +1 is a sum of distinct pow-
as a “product”.) ers of 2.
Let P (n) be “n is a product of primes”. This is certainly true if k + 1 is a power of 2. If
not, let 2j be the greatest power of 2 less than
Base step. 2 is a prime, hence a product of (one) k + 1. Then
prime. So P (2) is true.
i = k + 1 − 2j
Induction step. Suppose 2, 3, . . . , k are products
is one of the numbers 1, 2, 3, ..., k, and hence it
of primes. We wish to prove that k +1 is a prod-
is a sum of distinct powers of 2.
uct of primes.
Also, the powers of 2 that sum to i are all
This is certainly true if k + 1 is a prime. If not
less than 2j , otherwise 2j is less than half k + 1,
k + 1 = i × j, contrary to the choice of 2j as the largest power
for some natural numbers i and j less than k +1. of 2 less than k + 1.
But then i and j are products of primes by our Hence k + 1 = 2j + powers of 2 that sum to
assumption, hence so is i × j = k + 1. i is a sum of distinct powers of 2.
This completes the induction proof. This completes the induction proof.
√
10.2 Well-ordering and descent But then 2 = m1 /n1 , and we can repeat
the argument to show that m1 and n1 are both
Induction expresses the fact that each natural
even, so m1 = 2m2 and n1 = 2n2 , and so on.
number n can be reached by starting at 0 and
Since the argument can be repeated indefi-
going upwards (e.g. adding 1) a finite number
nitely, we get an infinite descending sequence of
of times.
natural numbers
Equivalent facts are that it is only a finite
number of steps downwards from any natural m > m1 > m2 > · · ·
number to 0, that any descending sequence of which is impossible.
natural numbers is finite, and that any set of Hence
√ there are no natural numbers m and
natural numbers has a least element. n with 2 = m/n.
This property is called well-ordering of the
natural numbers. It is often convenient to ar- Questions
range a proof to “work downwards” and appeal
to well-ordering by saying that the process of 10.1 For each of the following statements, say
working downwards must eventually stop. which is likely to require strong induction
Such proofs are equivalent to induction, for its proof.
though they are sometimes called “infinite de- an+1 −1
1 + a + a2 + · · · + an = a−1
scent” or similar.
¬ (p1 ∨ p2 ∨ p3 ∨ · · · ∨ pn ) ≡ (¬p1 ) ∧
(¬p2 ) ∧ (¬p3 ) ∧ · · · ∧ (¬pn )
10.3 Proofs by descent
Each fraction m n
< 1 is a sum of
Example 1. Prove that any integer ⩾ 2 has a distinct fractions with numerator 1
prime divisor. for example, 11 1 1 1
12 = 2 + 3 + 12 .
If n is prime, then it is a prime divisor of itself.
10.2 There is something else which tells you ev-
If not, let n1 < n be a divisor of n.
ery integer ⩾ 1 is a sum of distinct powers
If n1 is prime, it is a prime divisor of n. If
of 2. What is it?
not, let n2 < n1 be a divisor of n1 (and hence of
n). 10.3 Is every integer ⩾ 1 a sum of distinct pow-
If n2 is prime, it is a prime divisor of n. If ers of 3?
not, let n3 < n2 be a divisor of n2 , etc.
The sequence n > n1 > n2 > n3 > · · · must
eventually terminate, and this means we find a
prime divisor of n.
√
Example 2. Prove 2 is irrational.
√
Suppose that 2 = m/n for natural numbers m
and n. We will show this is impossible. Since the
square of an odd number is odd, we can argue
as follows
√
2 = m/n
⇒ 2 = m2 /n2 squaring both sides
⇒ m2 = 2n2
⇒ m2 is even
⇒ m is even
since the square of an odd number is odd
⇒ m = 2m1 say
⇒ 2n2 = m2 = 4m21
⇒ n2 = 2m21
⇒ n is even, = 2n1 say
Lesson 11: Sets
U
A B The difference U − B relative to the univer-
sal set U is called the complement B of B. Here
is the Venn diagram of B.
U
A B
12.2 Union A ∪ B
The union A ∪ B of sets A and B consists of
the elements in A or B, and is indicated by the
shaded region in the following Venn diagram.
U
A B
12.3 Intersection A ∩ B
The intersection A ∩ B of sets A and B consists
of the elements in A and B, indicated by the
shaded region in the following Venn diagram. A△B consists of the elements of one of A, B
but not the other.
U It is clear from the diagram that we have not
A B only
A△B = (A − B) ∪ (B − A),
but also
A△B = (A ∪ B) − (A ∩ B).
12.6 Ordered Pairs area l × w. In fact, we call it an “l × w rectan-
gle.” This is probably the reason for using the
Sometimes we do want order to be important.
× sign, and for calling A × B a “product.”
In computer science arrays are ubiquitous ex-
amples of ordered data structures. In maths,
ordered pairs are frequently used. An ordered
Questions
pair (a, b) consists simply of a first object a and 12.1 Draw a Venn diagram for A ∩ B. What is
a second object b. The objects a and b are some- another name for this set?
times called the entries or coordinates of the or-
dered pair. 12.2 Check the de Morgan laws by drawing
Venn diagrams for A ∪ B, A ∩ B, A ∩ B
and A ∪ B
Two ordered pairs (a, b) and (c, d) are equal
if and only if a = c and b = d. 12.3 Find which of the following is true by
drawing suitable Venn diagrams.
Example. {0, 1} = {1, 0} but (0, 1) ̸= (1, 0). A ∩ (B△C) = (A ∩ B)△(A ∩ C)?
A△(B ∩ C) = (A△B) ∩ (A△C)?
There’s no reason we need to stop with pairs.
We can similarly define ordered triples, quadru- 12.4 If plane = line × line, what do you think
ples, and so on. When there are k coordinates, line × circle is? What about circle × cir-
we call the object an ordered k-tuple. Two or- cle?
dered k-tuples are equal if and only if their ith
coordinates are equal for i = 1, 2, . . . , k.
(a, b)
b
O a
√
A function can be thought of as a “black 2. The square root function sqrt(x) = x with
box” which accepts inputs and, for each input, domain R⩾0 , codomain R, and pairs
produces a single output. √
{(x, x) : x ∈ R and x ⩾ 0}.
Examples. 0
Complicated functions are often built from 15.2 Conditions for composition
simple parts. For example, the function f : R →
Composite functions do not always exist.
R defined by f (x) = (x2 + 1)3 is computed by
doing the following steps in succession: Example. If reciprocal : R − {0} → R is de-
fined by reciprocal(x) = x1 and predecessor :
square, R → R is defined by predecessor(x) = x − 1,
then reciprocal ◦ predecessor does not exist, be-
add 1, cause predecessor(1) = 0 is not a legal input for
reciprocal.
cube.
To avoid this problem, we demand that the
We say that f (x) = (x2 + 1)3 is the composite codomain of h be equal to the domain of g for
of the functions (from R to R) g ◦ h to exist. This ensures that each output of
h will be a legal input for g.
square(x)=x2 ,
Mathematical objects can be related in var- This relation is also a function (the identity
ious ways, and any particular way of relating function on R), since there is exactly one pair
objects is called a relation on the set of objects for each x ∈ R.
in question.
(This also applies to relations in the every-
day sense. For example, “parent of” is a relation
on the set of people.)
-1
16.1 Relations and functions -1 0 1
Any function f : X → Y can be viewed as a (The dashed line indicates that the points
relation R on X ∪ Y . The relation is defined by where x = y are omitted.)
xRy if and only if y = f (x).
However, not every relation is a function.
Remember that a function must have exactly
one output y for each input x in its domain. In
a relation, on the other hand, an element x may
be related to many elements y, or to none at all. 3. Algebraic curves.
An algebraic curve consists of the points
(x, y) satisfying an equation p(x, y) = 0,
16.2 Examples where p is a polynomial.
1. Equality on R. E.g. unit circle x2 + y 2 − 1 = 0.
This is the relation consisting of the pairs 1
(x, x) for x ∈ R. Thus it is the following
subset of the plane. 0
1
-1
0 -1 0 1
1 Questions
16.1 Which of the following relations R(x, y)
0 satisfy ∀x∃yR(x, y)?
5. Congruence modulo n.
For a fixed n, congruence modulo n is a bi-
nary relation. It consists of all the ordered
pairs of integers (a, b) such that n divides
a − b.
1. Reflexive property.
a ≡ a (mod n)
for any number a.
2. Symmetric property.
a ≡ b (mod n) ⇒ b ≡ a (mod n)
for any numbers a and b.
3. Transitive property.
a ≡ b (mod n) and b ≡ c (mod n) ⇒
a ≡ c (mod n)
for any numbers a, b and c.
Lesson 17: Equivalence relations
3. Similarity of triangles.
An equivalence relation R on a set A is a bi-
Triangles ABC and A′ B ′ C ′ are similar if
nary relation with the following three prop-
erties. AB BC CA
′ ′
= ′ ′ = ′ ′.
AB BC CA
1. Reflexivity. E.g. the following triangles are similar
aRa C′
for all a ∈ A.
2. Symmetry.
C
aRb ⇒ bRa
for all a, b ∈ A.
3. Transitivity. A B B′ A′
aRb and bRc ⇒ aRc 4. Parallelism of lines.
for all a, b, c ∈ A. The relation L∥M (L is parallel to M ) is an
equivalence relation.
Equality and congruence mod n (for fixed n) are
Remark
examples of equivalence relations.
In all these cases the relation is an equivalence
because it says that objects are the same in some
respect.
17.1 Other equivalence relations
2. Congruence of triangles.
Triangles ABC and A′ B ′ C ′ are congruent if 3. Similar triangles have the same shape.
AB = A′ B ′ , BC = B ′ C ′ and CA = C ′ A′ . E.g.
the following triangles are congruent.
4. Parallel lines have the same direction.
C C′
Examples
The parallel equivalence class of a line L Example. Let R be the relation on Z defined
consists of all lines parallel to L. by aRb if and only if a ≡ b (mod 3). The three
The equivalence class of 1 for congruence equivalence classes of R are
mod 2 is the set of all odd numbers. {x : x ≡ 0 (mod 3)} = {3k : k ∈ Z}
{x : x ≡ 1 (mod 3)} = {3k + 1 : k ∈ Z}
{x : x ≡ 2 (mod 3)} = {3k + 2 : k ∈ Z}.
17.3 Equivalence class properties These partition the set Z.
3. Transitivity. 2. ⊆ on P(N).
This is not a total order because, for example,
aRb and bRc ⇒ aRc
{1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.
for all a, b, c ∈ A.
3. Divisibility on N.
Examples. This is not a total order because, for exam-
ple, 2 does not divide 3 and 3 does not divide
1. ⩽ on R. 2.
Reflexive: a ⩽ a for all a ∈ R.
Antisymmetric: a ⩽ b and b ⩽ a ⇒ a = b for 4. Alphabetical order of words.
all a, b ∈ R. This is a total order because given any two
Transitive: a ⩽ b and b ⩽ c ⇒ a ⩽ c for all different words, one will appear before the
a, b, c ∈ R. other in alphabetical order.
2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N). 18.3 Hasse diagrams
Antisymmetric: A ⊆ B and B ⊆ A ⇒ A = B A partial order relation R on a finite set A can be
for all A, B ∈ P(N). represented as a Hasse diagram. The elements
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for of A are written on the page and connected by
all A, B, C ∈ P(N). lines so that, for any a, b ∈ A, aRb exactly when
3. Divisibility on N. b can be reached from a by travelling upward
The relation “a divides b” on natural num- along the lines.
bers is reflexive, antisymmetric and transi-
Example. A Hasse diagram for the relation ⊆
tive. We leave checking this as an exercise.
on the set P({1, 2}) can be drawn as follows.
4. Alphabetical order of words.
{1, 2}
Words on the English alphabet are alphabet-
ically ordered by comparing the leftmost let-
{1} {2}
ter at which they differ. We leave checking
that this relation is reflexive, antisymmetric
and transitive as an exercise. ∅
Example. A Hasse diagram for the relation “di- Example. The relation ⩽ on {x : x ∈ R, x ⩾ 0}
vides” on the set {1, 2, 3, 5, 6, 10, 15, 30} can be is not a well-order relation. For example, the
drawn as follows. subset {x : x ∈ R, x > 3} has no least element.
30
Questions
6 15 18.1 Explain why “antisymmetric” does not
10
mean “not symmetric”. Give an example
3 of a relation which is neither symmetric
2 5
nor antisymmetric.
1 18.2 Draw a diagram of the positive divisors of
42 under the relation “divides.” Why does
Example. A Hasse diagram for the relation ⩽
it resemble the diagram for the positive di-
on the set {1, 2, 3, 4, 5} can be drawn as follows.
visors of 30}?
5
18.3 Invent a partial order relation on N × N.
Is your ordering a total ordering? Is your
4
ordering a well-ordering?
3
18.4 Well-ordering
A well-order relation on a set is a total order
relation that also has the property that each
nonempty set of its elements contains a least el-
ement.
19.1 Ordered selections without For every unordered list our reviewer could
repetition make there are 3! = 6 corresponding possible or-
dered lists. And we’ve seen that she could make
A reviewer is going to compare ten phones and
10 × 9 × 8 ordered lists. So the number of un-
list, in order, a top three. In how many ways can
ordered lists she could make is 10×9×8
6 .
she do this? More generally, how many ways are
For every combination of r elements from a
there to arrange r objects chosen from a set of
set of n elements there are r! corresponding per-
n objects?
mutations. So, using our formula for the number
In our example, the reviewer has 10 options
of permutations we have the following.
for her favourite, but then only 9 for her second-
favourite, and 8 for third-favourite. So there are
10 × 9 × 8 ways she could make her list. The number of combinations of r elements
For an ordered selection without repetition from a set of n elements (0 ⩽ r ⩽ n) is
of r elements from a set of n elements there are n(n − 1) · · · (n − r + 1) n! n
= = .
r! r!(n − r)! r
n options for the 1st element
Notice that the notation nr is used for r!(n−r)!
n!
n−1 options for the 2nd element .
n−2 options for the 3rd element Expressions like this are called binomial coeffi-
.. .. cients. We’ll see why they are called this in the
. . next lesson.
n−r+1 options for the rth element.
19.3 Ordered selections with
So we have the following formula.
repetition
An ordered selection of r elements from a set X
The number of ordered selections without
is really just a sequence of length r with each
repetition of r elements from a set of n el-
term in X. If X has n elements, then there are
ements (0 ⩽ r ⩽ n) is
n possibilities for each term and so:
n!
n(n − 1) · · · (n − r + 1) = .
(n − r)!
The number of sequences of r terms, each
When r = n and all the elements of a set S from some set of n elements, is
are ordered, we just say that this is a permuta- r
tion of S. Our formula tells us there are n! such | ×n×
n {z· · · × n} = n .
r
permutations. For example, there are 3! = 6
permutations of the set {a, b, c}:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a). 19.4 Unordered selections with
repetition
19.2 Unordered selections without A shop has a special deal on any four cans of soft
repetition drink. Cola, lemonade and sarsaparilla flavours
What if our reviewer instead chose an unordered are available. In how many ways can you select
top three? In how many ways could she do that? four cans?
More generally, how many ways are there to We can write a selection in a table, for ex-
choose (without order) r objects from a set of ample,
n objects?
C L S C L S
and .
• •• • • •••
A combination of r elements from a set S is
a subset of S with r elements. We can change a table like this into a string
of zeroes and ones, by moving from left to right
reading a “•” as a 0 and a column separator as Questions
a 1. The tables above would be converted into
19.1 A bank requires a PIN that is a string of
0 1 0 0 1 0 and 1 0 1 0 0 0 four decimal digits. How many such PINs
are there? How many are made of four
Notice that each string has four zeroes (one different digits?
for each can selected) and two ones (one fewer
than the number of flavours). We can choose 19.2 How many binary strings of length 5 are
a string like this by beginning with a string of there? How many of these contain exactly
six ones and then choosing four ones to change two 1s?
6
to zeroes. There are 4 ways to do this and so
6
there are 4 possible can selections. 19.3 In a game, each of ten players holds red,
An unordered selection of r elements, with blue and green marbles, and places one
repetition allowed, from a set X of n elements marble in a bag. How many possibilities
can be thought of as a multiset with r elements, are there for the colours of marbles in the
each in X. As in the example, we can represent bag? If each player chooses their colour at
each such multiset with a string of r zeroes and random are all of these possibilities equally
n − 1 ones. We can choose a string like this by likely?
beginning with a string of n + r − 1 ones and
then choosing r ones to change to zeroes.
n
In the above ⌈ m ⌉ means the smallest integer
n n
greater than or equal to m (or m “rounded up”).
20.1 Pascal’s triangle try in the triangle is the sum of the two entries
above it. To see why this is, we’ll begin with an
We can write the binomial coefficients in an (in-
example.
finite) triangular array as follows:
Example. Why is 62 = 52 + 51 ?
0
1
0
1 There are 62 combinations of 2 elements of
0 1 {1, 2, 3, 4, 5, 6}. Every such combination either
2 2 2
3
0 31 32 3
0 41 42 43 4 does not contain a 6, in which case it is
4
one of the 52 combinations of 2 elements
0 1 2 3 4
5 5 5 5 5 5
0 1 2 3 4 5 of {1, 2, 3, 4, 5}; or
6 6 6 6 6 6 6
0 1 2 3 4 5 6
.. .. .. does contain a 6, in which case the rest of
. . . the combination is one of the 51 combi-
Here are the first ten rows with the entries as nations of 1 element from {1, 2, 3, 4, 5}.
integers: So 62 = 52 + 51 .
1
We can make a similar argument in general.
1 1
Let X be a set of n elements and x is a fixed
1 2 1
1 3 3 1 n−1
of X. For any r ∈ {1, . . . , n}, there are
element
r combinations of r elements of X that do
1 4 6 4 1
not contain x and there are n−1
r−1 combinations
1 5 10 10 5 1
of r elements of X that do contain x. So:
1 6 15 20 15 6 1
1 7
35 35 21 7 1 21
1 8
56 70 56 28 8 28 1 n n−1 n−1
= + for 1 ⩽ r ⩽ n.
1 9 36 84 126 126 84 36 9 1 r r r−1
1 10 45 120 210 252 210 120 45 10 1
This shows that every internal entry in Pascal’s
This triangular array is often called Pascal’s triangle is the sum of the two above it.
triangle (although Pascal was nowhere near the
first to discover it). 20.3 The binomial theorem
(x + y)0 = 1
20.2 Patterns (x + y)1 = x+y
Writing the binomial coeffcients this way reveals (x + y)2 = x2 + 2xy + y 2
3
(x + y) = x + 3x2 y + 3xy 2 + y 3
3
a lot of different patterns in them. Perhaps the
most obvious is that every row reads the same (x + y)4 = x4 + 4x3 y + 6x2 y 2 + 4xy 3 + y 4
left-to-right and right-to-left. Choosing r ele- (x+y) = x +5x4 y+10x3 y 2 +10x2 y 3 +5xy 4 +y 5
5 5
20.4 Inclusion-exclusion
A school gives out prizes to its best ten students
in music and its best eight students in art. If
three students receive prizes in both, how many
students get a prize? If we try to calculate this
as 10 + 8 then we have counted the three over-
achievers twice. To compensate we need to sub-
tract three and calculate 10 + 8 − 3 = 15.
In general, if A and B are finite sets then we
have
|A ∪ B| = |A| + |B| − |A ∩ B|.
With a bit more care we can see that if A, B
and C are sets then we have
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B|
−|A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.
This is part of a more general law called the
inclusion-exclusion principle.
Probability gives us a way to model ran- It can be convenient to give this as a table:
dom processes mathematically. These processes
s 1 2 3 4
could be anything from the rolling of dice, to .
1 1 1 1
radioactive decay of atoms, to the performance Pr(s) 2 4 8 8
of a stock market index. The mathematical en- Example. Rolling a fair six-sided die could
vironment we work in when dealing with prob- be modeled by a probability space with sample
abilities is called a probability space. space S = {1, 2, 3, 4, 5, 6} and probability func-
tion Pr given as follows.
Your friend believes that Python coding has least 3 and B be the event that the result was
become more popular than AFL in Melbourne. even. What is Pr(A|B)?
She bets you $10 that the next person to pass 1
Pr(A ∩ B) = Pr(4) = 8
you on the street will be a Python program-
1 1 3
mer. You feel confident about this bet. How- Pr(B) = Pr(2) + Pr(4) = 4 + 8 = 8
ever, when you see a man in a “Hello, world!” Thus,
Pr(A∩B)
t-shirt approaching, you don’t feel so confident Pr(A|B) = Pr(B) = ( 18 )/( 38 ) = 13 .
any more. Why is this? Example. A binary string of length 6 is gener-
We can think about this with a diagram. ated uniformly at random. Let A be the event
The rectangle represents the set of people in that the first bit is a 1 and B be the event that
Melbourne, the circle P is the set of Python the string contains two 1s. What is Pr(A|B)?
coders, and the circle T is the set of “Hello, There are 26 strings in our sample space.
world!” t-shirt owners. Now A ∩ B occurs when the first bit is 1 and the
5
rest of the string contains 1 one. There
6 are 1
5
such strings and so Pr(A ∩ B) = 1 /2 . Also,
there are 62 strings containing two 1s and so
Pr(B) = 62 /26 . Thus,
T P
Pr(A∩B) 5 6
= 13 .
Pr(A|B) = Pr(B) = 1 / 2
22.5 Bayes’ theorem examples 22.2 A standard die is rolled twice. What is the
probability that the first roll is a 1, given
Example. Luke Skywalker discovers that some that the sum of the rolls is 6?
porgs have an extremely rare genetic mutation
that makes them powerful force users. He de- 22.3 A bag contains three black marbles and
velops a test for this mutation that is right 99% two white marbles and they are randomly
of the time and decides to test all the porgs on selected and removed, one at a time un-
Ahch-To. Suppose there are 100 mutant porgs til the bag in empty. Use Bayes’ theo-
in the population of 24 million. We would guess rem to calculate the probability that the
that the test would come up positive for 99 of the first marble selected is black, given that
100 mutants, but also for 239 999 non-mutants. the second marble selected is black.
We are assuming that the conditional prob-
ability of a porg testing positive given it’s a mu-
tant is 0.99. But what is the conditional prob-
ability of it being a mutant given that it tested
positive? From our guesses, we would expect
99
this to be 99+239999 ≈ 0.0004. Bayes’ theorem
gives us a way to formalise this:
Pr(P |M )Pr(M )
Pr(M |P ) = Pr(P |M )Pr(M )+Pr(P |M )Pr(M )
100
×0.99
= 100
24000000
×0.99+(1− 100
)×0.01
24000000 24000000
99
= 99+239999
≈ 0.0004.
Lesson 23: Random variables
In a game, three standard dice will be rolled Example. A standard die is rolled three times.
and the number of sixes will be recorded. We Let X be the number of sixes rolled. What is the
could let X stand for the number of sixes rolled. probability distribution of X? Obviously X can
Then X is a special kind of variable whose value only take values in {0, 1, 2, 3}. Each roll there
is based on a random process. These are called is a six with probability 16 and not a six with
random variables. probability 65 . The rolls are independent.
Because the value of X is random, it doesn’t 5 5 5
Pr(X = 0) = 6 × 6 × 6
make sense to ask whether X = 0, for exam-
ple. But we can ask what the probability is that Pr(X = 1) = ( 61 )( 65 )( 56 ) + ( 56 )( 16 )( 56 ) + ( 56 )( 56 )( 16 )
X = 0 or that X ⩾ 2. This is because “X = 0” Pr(X = 2) = ( 16 )( 61 )( 56 ) + ( 16 )( 56 )( 16 ) + ( 56 )( 16 )( 16 )
and “X ⩾ 2” correspond to events from our 1 1 1
sample space. Pr(X = 3) = 6 × 6 × 6
So the probability distribution of X is
23.1 Formal definition x 0 1 2 3
125 75 15 1
.
Formally, a random variable is defined as a func- Pr(X = x) 216 216 216 216
tion from the sample space to R. In the example
above, X is a function from the process’s sample 23.3 Independence
space that maps every outcome to the number
of sixes in that outcome. We have seen that two events are independent
when the occurrence or non-occurrence of one
Example. Let X be the number of 1s in event does not affect the likelihood of the other
a binary string of length 2 chosen uniformly occurring. Similarly two random variables are
at random. Formally, X is a function from independent if the value of one does not affect
{00, 01, 10, 11} to {0, 1, 2} such that the likelihood that the other will take a certain
value.
X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2.
For most purposes, however, we can think of X Random variables X and Y are independent
as simply a special kind of variable. if, for all x and y,
Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).
23.2 Probability distribution
We can describe the behaviour of a random vari-
Example. An integer is generated uniformly at
able X by listing, for each value x that X can
random from the set {10, 11, . . . , 29}. Let X and
take, the probability that X = x. This gives the
Y be its first and second (decimal) digit. Then
probability distribution of the random variable.
X and Y are independent random variables be-
Again, formally this listing is a function from
cause, for x ∈ {1, 2} and {0, 1, . . . , 9},
the values of X to their probabilities.
1
Pr(X = x ∧ Y = y) = 20
Example. Continuing with the last example, 1 1
= 2 × 10
the probability distribution of X is given by
= Pr(X = x)Pr(Y = y).
1
4 if x = 0
Pr(X = x) = 1
if x = 1 23.4 Operations
2
1
if x = 2.
4 From a random variable X, we can create new
It can be convenient to give this as a table: random variables such as X + 1, 2X and X 2 .
These variables work as you would expect them
x 0 1 2 to.
1 1 1
.
Pr(X = x) 4 2 4
Example. If X is the random variable with dis- Questions
tribution
23.1 An elevator is malfunctioning. Every
x −1 0 1 minute it is equally likely to ascend one
1 1 1
,
Pr(X = x) 6 3 2
floor, descend one floor, or stay where it
is. When it begins malfunctioning it is on
then the distributions of X + 1, 2X and X 2 are
level 5. Let X be the level it is on three
y 0 1 2 minutes later. Find the probability distri-
Pr(X + 1 = y) 1 1 1 bution for X.
6 3 2
2 12 0.3
Uniform distribution with a = 3, b = 8
0.2
0.2
0.1
0.15
Pr(X = k)
0
0 2 4 6 8 10 12
0.1 k
0.05
Example. If every minute there is a 1% chance
that your internet connection cuts out then the
probability of staying online for exactly x con-
0 secutive minutes is approximated by a geometric
0 2 4 6 8 10
k distribution with p = 0.01. It follows that the
expected value is 1−0.01
0.01 = 99 minutes and the
variance is 1−0.01
(0.01)2
= 9900.
25.2 Bernoulli distribution
25.4 Binomial distribution
This type of distribution arises when we have a
single process that succeeds with probability p This distribution gives the probability that, in
and fails otherwise. Such a process is called a a sequence of n independent Bernoulli trials, we
Bernoulli trial. see exactly k successes.
The Bernoulli distribution with parameter The binomial distribution with parameters
p ∈ [0, 1] is given by n ∈ Z+ and p ∈ [0, 1] is given by
(
Pr(X = k) = nk pk (1 − p)n−k
p for k = 1
Pr(X = k) =
1 − p for k = 0. for k ∈ {0, . . . , n}.
We have E[X] = p and Var[X] = p(1 − p). We have E[X] = np and Var[X] = np(1 − p).
Binomial distribution with n = 20, p = 0.5 λ = 6 approximates the probability it receives
k calls in a certain minute. It follows that the
0.2 expected value is 6 calls and the variance is 6.
0.15 Questions
Pr(X = k)
0.2
0.15
Pr(X = k)
0.1
0.05
0
0 2 4 6 8 10 12 14 16
k
Just as the structure of the natural numbers Remark. Using a recursive program to com-
supports induction as a method of proof, it sup- pute Fibonacci numbers can easily lead to stack
ports induction as a method of definition or of overflow, because each value depends on two
computation. previous values (each of which depends on an-
When used in this way, induction is usually other two, and so on).
called recursion, and one speaks of a recursive A more efficient way to use the recursive defi-
definition or a recursive algorithm. nition is to use three variables to store F (k + 1),
F (k) and F (k − 1). The new values of these
26.1 Recursive Definitions variables, as k increases by 1, depend only on
the three stored values, not on all the previous
Many well known functions f (n) are most easily
values.
defined in the “base step, induction step” for-
mat, because f (n + 1) depends in some simple
way on f (n).
The induction step in the definition is more 26.2 Properties of recursively defined
commonly called the recurrence relation for f , functions
and the base step the initial value.
These are naturally proved by induction, using
Example. The factorial f (n) = n! a base step and induction step which parallel
those in the definition of the function.
Initial value. 0! = 1.
Recurrence relation. (k + 1)! = (k + 1) × k!
Many programming languages allow this Example. For n ⩾ 5, 10 divides n!
style of definition, and the value of the func-
tion is then computed by a descent to the initial
value. Proof Base step.
For example, to compute 4!, the machine 5! = 5 × 4 × 3 × 2 × 1 = 10 × 4 × 3,
successively computes
hence 10 divides 5!.
4! = 4 × 3! Induction step. We have to show
= 4 × (3 × 2!) 10 divides k! =⇒ 10 divides (k + 1)!
= 4 × (3 × (2 × (1!)))
Since (k + 1)! = (k + 1) × k! by the recurrence
= 4 × (3 × (2 × (1 × 0!))) relation for factorial, the induction step is clear,
which can finally be evaluated since 0! = 1. and hence the induction is complete.
Remark. The numbers 4, 3, 2, 1 have to be
stored on a “stack” before the program reaches Example. F (0) + F (1) + · · · + F (n) = F (n +
the initial value 0! = 1 which finally enables it 2) − 1.
to evaluate 4!.
Thus a recursive program, though short,
may run slowly and even cause “stack overflow.” Proof Base step. F (0) = 0 = F (2) − 1,
because F (2) = 1.
Induction step. We have to show
Example. The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, . . . F (0) + F (1) + · · · + F (k)
= F (k + 2) − 1
The nth number F (n) in this sequence is defined
by ⇒ F (0) + F (1) + · · · + F (k + 1)
= F (k + 3) − 1.
Initial values. F (0) = 0, F (1) = 1.
Recurrence relation. F (k +1) = F (k)+F (k −1).
Well, hence it follows (by induction) that
F (0) + F (1) + · · · + F (k) f (n) = number of n bit strings
= F (k + 2) − 1 with no consecutive 0s
⇒ F (0) + F (1) + · · · + F (k + 1) = F (n + 2).
= F (k + 2) + F (k + 1) − 1,
by adding F (k + 1) to both sides Questions
⇒ F (0) + F (1) + · · · + F (k + 1) 26.1 A function s(n) is defined recursively by
= F (k + 3) − 1 Initial value: s(0) = 0
since F (k + 2) + F (k + 1) = F (k + 3) Recurrence relation: s(n + 1) = s(n) + 2n + 1
by the Fibonacci recurrence relation Write down the first few values of s(n),
This completes the induction. and guess an explicit formula for s.
Example. 1 + a + a2 + · · · + an
27.3 Sum and product Notation
This is the function g(n) defined by
Initial value. g(0) = 1 n
Recurrence relation. g(k + 1) = g(k) + ak+1
X
1 + 2 + 3 + · · · + n is written k,
k=1
We can use this relation to prove by induc-
n+1 n
tion that g(n) = a a−1−1 (a formula for the sum X
1 + a + a2 + · · · + an is written ak .
of a geometric series), provided a ̸= 1. k=0
Questions
28.1 Find the next four values of each of the fol-
lowing recurrence relations. What order is
each recurrence relation? Which are ho-
mogeneous and which are inhomogeneous?
(a) rk+1 = rk + k 2 , r0 = 0.
Description Picture
The graph is a complete bipartite graph A graph G is connected if, for any two of its
with parts of sizes i and j if every vertex in vertices A and B, it has a subgraph that is a
{U1 , U2 , . . . , Ui } is joined by an edge to every path from A to B.
vertex in {V1 , V2 , . . . , Vj }. Below are pictures of
some bipartite graphs. The vertices have been A graph that is not connected is discon-
arranged to the left and right to make it obvious nected. All the examples of graphs we have seen
the graphs are bipartite. so far in this lesson have been connected.
Example. The graph pictured below is discon-
nected. For example it does not contain a path
from D to E.
A D
Below are pictures of some complete bipar-
tite graphs. C B
Questions
29.1 Write the vertex sets and edge sets for the
Notice that all paths are bipartite and cy- graphs corresponding to the following pic-
cles of even length are bipartite. Cycles of odd tures
length are not bipartite, however. A D A B A B
29.4 Subgraphs
B C D C C D
A subgraph of a graph G is a graph whose
29.2 Draw pictures of graphs with the following
vertex set is a subset of the vertex set of G
vertex and edge sets.
and whose edge set is a subset of the edge set
(a) vertex set: {A, B, C, D}
of G.
edge set: {AB, BC, BD}
So a subgraph of a graph G is a graph that (b) vertex set: {A, B, C, D, E}
can be obtained from G by (possibly) deleting edge set: {AB, BC, CA, DE}
edges and/or vertices. Note that every graph is
29.3 What is the maximum number of edges
a subgraph of itself.
that a bipartite graph with 6 vertices can
Example. The graph pictured on the right be- have? What is the maximum number of
low is a subgraph of the graph pictured on the edges that a bipartite graph with n ver-
left. tices can have?
A D A
B C B C
30.1 Examples
30.3 Adjacency matrix powers
In these pictures, a walk is indicated by a di-
rected curve running alongside the actual edges
The product of matrices
in the walk.
a11 a12 · · · b11 b12 · · ·
a21 a22 · · · × b21 b22 · · ·
For example, if G is
Lecture 31: Degree
C
Lecture 28. Degree The question came up: is it possible for a
A B walk to cross all seven bridges without crossing
the same bridge twice?
then the degree of B is 2 and the degrees of A An equivalent question is whether there is
The degree of a vertex A in a graph G is the
and C arenumber
bothof1. times A occurs in the list of edges of The questiona came
trailup:which includes
is it possible for a walkall edges in the following
. .
is viewed asThe
a reason
handshake,
for the name is that if each edge is 1. Each time a walk enters and leaves a vertex
viewed as a handshake, 31.3 Euler’s solution
it "uses up" 2 from the degree.
� 2. Hence if all edges are used by the walk, all
then at each vertex V Eulerthe(1737)
vertices except first and observed
last must havethat the answer is no, be-
then at each vertex V = number of hands.
degree(V) even degree.
cause
Hence 3. The seven bridges graph in fact has four ver
degree(V ) = number of hands. tices of odd degree.
sum of degrees 1. Each time a walk enters and leaves a ver-
Hence = total number of hands
sum of =degrees 31.4 Euler's theorem
tex it “uses up” 2 from the degree.
2 x number of handshakes
The same argument shows in general that
= total number
An important of hands
consequence
A graph with >2. 2 odd
Hence if all
edges are used by the walk, all
degree vertices
= × handshaking
2The number of implies that in
handshakes
lemma any has no trail using all its edges.
graph the sum of degrees is even (being vertices except the first and last must have
2xsomething). Thus it is impossible, e.g. for And a similar argument
evenshows
degree.
An important consequence
a graph to have degrees 1,2,3,4,5.
A graph with odd degree vertices
The handshaking lemma implies thatofin anyhas no closed trail using all its edges.
31.2 The seven bridges
graph the sumKonigsberg of degrees is even (being 3. The seven bridges graph in fact has four
(Because in this case the first and last vertex
2×something). ThusKonigsberg
In 18th century it is impossible,
there were seven forthe same, and its vertices
e.g. are of odd
degree is ''used up" bydegree.
a
bridges connecting islands in the river to the closed trail as follows: 1 at the start, 2 each time
a graph tobanks
have degrees
as follows. 1,2,3,4,5. through, 1 at the end.)
(Because in this case the first and last vertex 31.6 Bridges
are the same, and its degree is “used up” by a A bridge in a connected graph G is an edge
closed trail as follows: 1 at the start, 2 each time whose removal disconnects G. E.g. the edge
through, 1 at the end.) B is a bridge in the following graph.
Remarks
A tree is a graph that is connected and has
no subgraph that is a cycle. 1. This proof also shows that any edge in a
tree is a bridge.
For example,
2. Since a tree has one more vertex than edge,
it follows that m trees have m more ver-
tices than edges.
A e B
is a spanning tree of
Removing e disconnects the ends A and B
of e. (If they were still connected, by some path
p, then p and e together would form a cycle in
Tk+1 , contrary to its being a tree.)
Thus Tk+1 − {e} consists of two trees, say Ti
and Tj with i and j vertices respectively. We Any connected graph G contains a spanning
have i + j = k + 1 but both i, j ⩽ k, so our tree.
induction assumption gives
This is proved by induction on the number of
Ti has i − 1 edges, Tj has j − 1 edges. edges.
But then Tk+1 = Ti ∪ Tj ∪ {e} has Base step. If G has no edge but is connected
(i − 1) + (j − 1) + 1 = (i + j) − 1 = k edges, as then it consists of a single vertex. Hence G itself
required. is a spanning tree of G.
Induction step. Suppose any connected 4. The algorithm always works, though this
graph with ⩽ k edges has a spanning tree, and is not obvious, and the proof is not re-
we have to find a spanning tree of a connected quired for this course. (You can find it,
graph Gk+1 with k + 1 edges. e.g. in Chartrand’s Introductory Graph
If Gk+1 has no cycle then Gk+1 is itself a Theory.)
tree, hence a spanning tree of itself.
If Gk+1 has a cycle p we can remove any edge 5. Another problem which can solved by a
e from p and Gk+1 − {e} is connected (because “greedy” algorithm is splitting a natural
vertices previously connected via e are still con- number n into powers of 2. Begin by
nected via the rest of p). Since Gk+1 − {e} has subtracting the largest such power 2m ⩽
one edge less, it contains a spanning tree T by n from n, then repeat the process with
induction, and T is also a spanning tree of Gk+1 . n − 2m , etc.
Remarks
32.3 Also find spanning trees of the cube and
1. T is not necessarily a tree at all steps of dodecahedron which are paths.
the algorithm, but it is at the end.
To search a graph G systematically, it helps each vertex v to a “predecessor” among the ad-
to have a spanning tree T , together with an or- jacent vertices of v already in T . An arbitrary
dering of the vertices of T . vertex is chosen as the root V0 of T .
Sets of numbers
N the set of natural numbers {0, 1, 2, 3, . . .}
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q the set of rational numbers { ab : a, b ∈ Z, b ̸= 0}
R the set of real numbers
Number Theory
a|b a divides b b = qa for some q ∈ Z
gcd(a, b) greatest common divisor of a and b
a ≡ b (mod n) a and b are congruent modulo n n | (a − b)
Logic
¬p not p
p∧q p and q
p∨q p or q
p→q p implies q
∀x for all x
∃x there exists x
Sets
x∈A x is an element of A
{x : P (x)} the set of x such that P (x)
|A| the number of elements in A
A⊆B A is a subset of B
A∩B A intersect B {x : x ∈ A ∧ x ∈ B}
A∪B A union B {x : x ∈ A ∨ x ∈ B}
A−B set difference A minus B {x : x ∈ A ∧ x ∈
/ B}
A△B A symmetric difference B {x : x ∈ A ∨ x ∈ B}
Functions
f :A→B f is a function from A to B
Probability
Pr(E) probability of E
Pr(A|B) conditional probability of A given B
E[X] expected value of X
Var[X] variance of X
¬¬p ≡ p
p∧p≡p
p∨p≡p Conditional probability
p∧q ≡q∧p Pr(A ∩ B)
Pr(A|B) =
Pr(B)
p∨q ≡q∨p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r Bayes’ theorem
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r Pr(B|A)Pr(A)
Pr(A|B) =
Pr(B|A)Pr(A) + Pr(B|A)Pr(A)
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q) Discrete uniform distribution
¬(p ∨ q) ≡ (¬p) ∧ (¬q) 1
Pr(X = k) = for k ∈ {a, a + 1, . . . , b}
b−a+1
p∧T≡p a+b (b−a+1)2 −1
E[X] = 2 , Var[X] = 12
p∨F≡p
Bernoulli distribution
p∧F≡F (
p∨T≡T p for k = 1
Pr(X = k) =
1 − p for k = 0
p ∧ (¬p) ≡ F
E[X] = p, Var[X] = p(1 − p)
p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p Geometric distribution
p ∨ (p ∧ q) ≡ p Pr(X = k) = p(1 − p)k for k ∈ N
1−p 1−p
¬∀xP (x) ≡ ∃x¬P (x) E[X] = p , Var[X] = p2
¬∃xP (x) ≡ ∀x¬P (x)
Binomial distribution
n k
Pr(X = k) = p (1 − p)n−k for k ∈ {0, . . . , n}
k
Ordered selections without repetition
E[X] = np, Var[X] = np(1 − p)
n!
n(n − 1) · · · (n − r + 1) =
(n − r)! Poisson distribution
λk e−λ
Unordered selections without repetition Pr(X = k) = for k ∈ N
k!
n(n − 1) · · · (n − r + 1) n! n E[X] = λ, Var[X] = λ
= =
r! r!(n − r)! r