Review Notes - Probability
Viraj Agashe
December 2021
Contents
1 Probability Basics
1.1 Important Terminologies
1.2 Axiomatic Definition
1.3 Properties of Probability
1.4 Classical Definition
1.5 More Concepts
2 Random Variable
2.1 Distribution Function
2.1.1 Cumulative Distribution Function
2.2 Types of Random Variables
2.2.1 Discrete RV
2.2.2 Continuous Type RV
2.2.3 Mixed Type RV
2.3 Probability Distribution Relations
2.4 Function of Random Variable
2.4.1 Distribution of Y
4 Generating Functions
4.1 Probability Generating Function
4.2 Moment Generating Function
4.3 Characteristic Function
5 Random Vectors
5.1 Joint Distribution
5.2 Joint PDF
6 Independent Random Variables
6.1 Functions of Independent RVs
6.2 i.i.d. Random Variables
7 Conditional Distributions
7.1 Discrete RVs
7.2 Continuous RVs
10 Limiting Distributions
10.1 Convergence in Distribution
10.2 Convergence in Probability
10.3 Convergence in rth Moment
10.4 Convergence Almost Surely
1 Probability Basics
1.1 Important Terminologies
1. Sample Space (Ω): The set of all possible outcomes of a random experiment.
2. σ-field: A collection F of subsets of Ω which satisfies
(i) Ω ∈ F
(ii) If A ∈ F then Ā ∈ F
(iii) If A1, A2, ... ∈ F then ⋃_{i=1}^∞ Ai ∈ F (closure under countable unions)
3. Event: Any element of F
4. Sample point: Any element ω of Ω
5. Borel σ-field (B) on R: The collection of Borel sets on R. Borel sets are those sets which can be
formed from countable unions, countable intersections and relative complements of open intervals.
1.2 Axiomatic Definition
A probability measure on (Ω, F) is a function P : F → [0, 1] satisfying:
(i) P(A) ≥ 0 ∀ A ∈ F
(ii) P(Ω) = 1
(iii) For pairwise disjoint events A1, A2, ... ∈ F, P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)
The triple (Ω, F, P) is called a probability space.
1.3 Properties of Probability
For any events A, B ∈ F,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
1.4 Classical Definition
If the sample space is finite and all outcomes are equally likely, then for any event A,
P(A) = |A| / |Ω|
1.5 More Concepts
1. Conditional Probability: For events A, B ∈ F with P(B) > 0, the conditional probability of A given B is
P(A|B) = P(A ∩ B) / P(B)
2. Independence: Events A and B are independent if
P(A ∩ B) = P(A)P(B)
(i) Pairwise Independent: A sequence of events {Ai} is pairwise independent if P(Ai)P(Aj) = P(Ai ∩ Aj) ∀ i ≠ j.
(ii) Mutually Independent: A sequence of events {Ai} is mutually independent if for every finite sub-collection Ai1, Ai2, ..., Aik,
P(Ai1 ∩ Ai2 ∩ ... ∩ Aik) = P(Ai1)P(Ai2)...P(Aik)
3. Total Probability Theorem: For pairwise disjoint and exhaustive events A1, A2, ..., An we have, for any event B,
P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai)
4. Bayes' Theorem: For any event B ∈ F with P(B) > 0, and pairwise disjoint, exhaustive events A1, A2, ..., An, we have,
P(Ai|B) = P(Ai)P(B|Ai) / Σ_{j=1}^{n} P(Aj)P(B|Aj)
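As a quick numerical illustration of the last two results, here is a minimal Python sketch; the three-event prior and the likelihood values are made-up numbers, not from the notes:

# Bayes' theorem with made-up priors P(A_i) and likelihoods P(B|A_i).
priors = [0.5, 0.3, 0.2]        # P(A_i): pairwise disjoint, exhaustive
likelihoods = [0.9, 0.5, 0.1]   # P(B|A_i)

# Total probability theorem: P(B) = sum_i P(B|A_i) P(A_i)
p_b = sum(l * q for l, q in zip(likelihoods, priors))

# Bayes' theorem: P(A_i|B) = P(A_i) P(B|A_i) / P(B)
posteriors = [l * q / p_b for l, q in zip(likelihoods, priors)]
print(p_b)          # 0.62
print(posteriors)   # the posteriors sum to 1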
2 Random Variable
Let (Ω, F, P) be a probability space. A real-valued function X : Ω → R is said to be a random
variable iff
X⁻¹{(−∞, x]} ∈ F ∀ x ∈ R.
Figure 1: Collect the set of outcomes ω ∈ Ω which under the mapping X gives values from (−∞, x].
If this lies in F then X is a random variable.
2.1 Distribution Function
2.1.1 Cumulative Distribution Function
The cumulative distribution function (CDF) of a RV X is defined as FX(x) = P(X ≤ x), x ∈ R. It is non-decreasing, right-continuous, and satisfies FX(x) → 0 as x → −∞ and FX(x) → 1 as x → ∞.
2.2 Types of Random Variables
2.2.1 Discrete RV
If the CDF of the Random Variable (RV) is a step function with a countable number of (jump) discontinuities, then it is a discrete RV. For a discrete RV, the CDF is given by:
FX(x) = Σ_{xi ≤ x} P(X = xi)
The probability mass function (PMF) of a discrete RV is defined as P(x) = P(X = xi) when x = xi for some i, and 0 at all other points. Properties:
1. P(x) ≥ 0 ∀ x ∈ R
2. Σx P(x) = 1
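A minimal Python sketch of these definitions, using a made-up three-point PMF:

# A discrete RV with made-up support {1, 2, 3}; PMF values sum to 1.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

assert all(q >= 0 for q in pmf.values())        # P(x) >= 0
assert abs(sum(pmf.values()) - 1.0) < 1e-12     # sum_x P(x) = 1

def cdf(x):
    # F_X(x) = sum of P(X = x_i) over x_i <= x: a step function.
    return sum(q for xi, q in pmf.items() if xi <= x)

print(cdf(0), cdf(1), cdf(2.5), cdf(3))   # -> 0, 0.2, 0.7, 1.0 (up to float rounding)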
2.2.2 Continuous Type RV
A RV X is of continuous type if its CDF can be written as FX(x) = ∫_{−∞}^{x} f(t) dt for some non-negative function f. The function f is called the probability density function (PDF) of the continuous type RV. Note that f(x) = F′(x) wherever the derivative exists. It satisfies:
1. f(t) ≥ 0 ∀ t ∈ R
2. ∫_{−∞}^{∞} f(t) dt = 1
For small ∆x, the PDF gives the approximation
P(x ≤ X ≤ x + ∆x) ≈ f(x)∆x
2.4 Function of Random Variable
Let X be a RV and g a Borel-measurable function. Then Y = g(X) is also a random variable.
2.4.1 Distribution of Y
We can find the distribution of the random variable Y as follows:
FY(y) = P(Y ≤ y) = P(g(X) ≤ y)
From here, we may determine the distribution of Y. Note that if g is strictly monotonic and differentiable then the following holds:
fY(y) = fX(g⁻¹(y)) |d g⁻¹(y)/dy| for y in the range of g, and fY(y) = 0 otherwise.
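As a sanity check on the monotone-transform formula, a short numpy sketch with the illustrative choice g(x) = eˣ applied to X ∼ N(0, 1), so that Y is lognormal:

import numpy as np
from math import erf, log, sqrt

# Monotone transform Y = g(X) with g(x) = exp(x) and X ~ N(0, 1).
# Here g^{-1}(y) = log(y) and |d g^{-1}/dy| = 1/y, so f_Y(y) = f_X(log y)/y.
rng = np.random.default_rng(0)
y = np.exp(rng.standard_normal(100_000))

def F_Y(y0):
    # F_Y(y0) = P(exp(X) <= y0) = Phi(log y0), Phi the standard normal CDF.
    return 0.5 * (1.0 + erf(log(y0) / sqrt(2.0)))

print(np.mean(y <= 2.0), F_Y(2.0))   # both ≈ 0.756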
3 Expectation
The expectation (mean) of a RV X is defined as E(X) = Σi xi P(X = xi) for a discrete RV and E(X) = ∫_{−∞}^{∞} x f(x) dx for a continuous type RV, provided the sum/integral converges absolutely. Properties:
1. E(c) = c, where c is a constant.
2. E(aX + b) = aE(X) + b
3. If P (X ≥ 0) = 1, E(X) ≥ 0 if it exists.
4. If X is continuous type with CDF F(x) then E(X) is given by,
E(X) = ∫_{0}^{∞} (1 − F(x)) dx − ∫_{−∞}^{0} F(x) dx
3.2 Variance
Let X be a random variable for which E(X) = µ exists. The second order moment about the mean is called the variance, defined as,
Var(X) = E[(X − µ)²]
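A quick numerical check on a made-up PMF, computing the variance both as E[(X − µ)²] and via the equivalent identity E(X²) − µ² (the identity is standard, though not stated above):

# Made-up PMF on {0, 1, 2}.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

mu = sum(x * q for x, q in pmf.items())                     # E(X)
var_def = sum((x - mu) ** 2 * q for x, q in pmf.items())    # E[(X - mu)^2]
var_alt = sum(x * x * q for x, q in pmf.items()) - mu**2    # E(X^2) - mu^2
print(mu, var_def, var_alt)   # 1.0 0.5 0.5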
3.3 Higher Order Moments
3.3.1 n-th Order Moment about Mean
For a RV X for which the n-th order moment about the mean exists, we denote it by
µn = E[(X − µ)^n]
and the n-th order moment about the origin by µ′n = E(X^n).
Note that if µ′n exists then µ′r also exists for all r < n.
Three standard inequalities involving moments:
Markov's Inequality: For a non-negative RV X and any t > 0,
P(X ≥ t) ≤ E(X)/t
Chebyshev's Inequality: For a RV X with mean µ and variance σ², for any ε > 0,
P(|X − µ| ≥ ε) ≤ σ²/ε²
Jensen's Inequality: For a convex function g (with the expectations existing),
E(g(X)) ≥ g(E(X))
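A simulation sketch of the first two bounds, using an exponential RV with rate 1, so E(X) = 1 and Var(X) = 1; the choices of distribution, t, and ε are illustrative:

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=200_000)   # E(X) = 1, Var(X) = 1

t, eps = 3.0, 2.0
# Markov: P(X >= t) <= E(X)/t
print(np.mean(x >= t), 1.0 / t)
# Chebyshev: P(|X - mu| >= eps) <= sigma^2 / eps^2
print(np.mean(np.abs(x - 1.0) >= eps), 1.0 / eps**2)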
4 Generating Functions
4.1 Probability Generating Function
Let X be a non-negative integer valued RV with pk = P(X = k). Then we can define the PGF as,
GX(s) = Σ_{k=0}^{∞} pk s^k
Remarks:
1. The series converges at least for |s| ≤ 1.
2. GX(s) = E(s^X)
3. GX^(r)(s) denotes the r-th derivative of GX. Evaluating it at s = 1 gives the factorial moment of r-th order, i.e.
GX^(r)(1) = E(X(X − 1)...(X − r + 1))
4. We can recover the probabilities from the k-th derivatives of the PGF at 0, i.e.
P(X = k) = (1/k!) GX^(k)(0)
5. If X and Y have the same PGF for all s then X and Y have the same distribution.
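A short symbolic sketch of remarks 3 and 4, taking X ∼ Bin(5, 3/10) (an illustrative choice) whose PGF is (1 − p + ps)⁵:

import sympy as sp

s = sp.symbols('s')
n, p = 5, sp.Rational(3, 10)
G = (1 - p + p * s) ** n           # PGF of Bin(5, 3/10)

# Remark 3: the first factorial moment at s = 1 is E(X) = n p.
print(sp.diff(G, s).subs(s, 1))    # 3/2

# Remark 4: P(X = 2) = G''(0) / 2!
print(sp.diff(G, s, 2).subs(s, 0) / sp.factorial(2))
print(sp.binomial(n, 2) * p**2 * (1 - p)**3)   # same value computed directly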
4.2 Moment Generating Function
The MGF of a RV X is defined as MX(t) = E(e^{tX}), for those t at which the expectation exists. Remarks:
1. One can get the n-th order moments about the origin from the n-th derivative of the MGF at the origin, i.e.
E(X^n) = MX^(n)(0)
2. If X and Y have the same MGF for all t in a neighbourhood of 0, then X and Y have the same distribution.
5 Random Vectors
A collection of n random variables (X1 , X2 , ...Xn ) over a probability space (Ω, F, P ) is called a random
vector.
5.1 Joint Distribution
The joint CDF of a random vector (X1, X2) is defined as,
F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2), x1, x2 ∈ R
It satisfies:
1. F(x1, x2) is non-decreasing and continuous from the right w.r.t. each of the coordinates x1, x2.
2. When x1 → ∞ and x2 → ∞ then F(x1, x2) → 1.
3. When x1 → −∞ or x2 → −∞ then F(x1, x2) → 0.
4. For every (a, c), (b, d) s.t. a < b and c < d we have,
P(a < X1 ≤ b, c < X2 ≤ d) = F(b, d) − F(a, d) − F(b, c) + F(a, c) ≥ 0
The distribution of one of the random variables constituting a random vector is called a marginal
distribution. From a joint CDF F(x1, x2), we can obtain the marginal distribution of X1 as,
FX1(x1) = lim_{x2 → ∞} F(x1, x2)
By summing the joint PMF over one of the coordinates, we can get the marginal distribution of the
other random variable, i.e.
PX1(x1) = Σ_{x2} P(x1, x2)
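A tiny numpy sketch of marginalization over a made-up 2×2 joint PMF:

import numpy as np

# Made-up joint PMF of (X1, X2); rows index x1 in {0,1}, columns x2 in {0,1}.
joint = np.array([[0.1, 0.3],
                  [0.2, 0.4]])

p_x1 = joint.sum(axis=1)   # P_{X1}(x1): sum over x2
p_x2 = joint.sum(axis=0)   # P_{X2}(x2): sum over x1
print(p_x1, p_x2, joint.sum())   # [0.4 0.6] [0.3 0.7] 1.0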
6 Independent Random Variables
We say that X and Y are independent random variables if and only if
FX,Y(x, y) = FX(x)FY(y) ∀ x, y ∈ R
For discrete type random variables, a necessary and sufficient condition is,
P(X = xi, Y = yj) = P(X = xi)P(Y = yj) ∀ i, j
For continuous type random variables, an equivalent condition involving PDFs is:
fX,Y(x, y) = fX(x)fY(y)
Note that for independent random variables, E(XY) = E(X)E(Y). The converse is not necessarily true.
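A standard counterexample for the converse, sketched numerically: with X ∼ N(0, 1) and Y = X², E(XY) = E(X³) = 0 = E(X)E(Y), yet Y is a function of X:

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)
y = x ** 2

# Uncorrelated: E(XY) - E(X)E(Y) ≈ 0.
print(np.mean(x * y) - np.mean(x) * np.mean(y))

# But dependent: conditioning on |X| > 1 forces Y > 1.
print(np.mean(y <= 1), np.mean(y[np.abs(x) > 1] <= 1))   # ≈ 0.68 vs 0.0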
7 Conditional Distributions
7.1 Discrete RVs
For two random variables X, Y of discrete type, the conditional PMF of X given Y = y is given by:
PX|Y(x|y) = P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y)
provided P(Y = y) > 0.
7.2 Continuous RVs
Similarly, for continuous type RVs, the conditional PDF of X given Y = y is defined, for fY(y) > 0, as
fX|Y(x|y) = f(x, y) / fY(y)
8.1 Distribution of Function of RV
Consider any 2D continuous type RV with joint PDF f (x, y). Define Z = H1 (X, Y ) and W =
H2 (X, Y ). Assuming that H1 , H2 are Borel-measurable, we can solve for the distribution of Z, W
under the assumptions:
• It is possible to solve z = H1 (x, y) and w = H2 (x, y) uniquely for x, y in terms of z, w. Let the
solution be x = g1 (z, w) and y = g2 (z, w).
• The partial derivatives of x, y wrt z, w exist and are continuous.
Then the joint PDF of Z, W can be written as,
fZ,W(z, w) = fX,Y(g1(z, w), g2(z, w)) |J(z, w)|
where J(z, w) = ∂(g1, g2)/∂(z, w) is the Jacobian determinant of the inverse transformation.
Some important special cases, with f the joint PDF of (X, Y):
1. If Z = X + Y then
fZ(z) = ∫_{−∞}^{∞} f(z − y, y) dy
2. If U = X − Y then
fU(u) = ∫_{−∞}^{∞} f(u + y, y) dy
3. If V = XY then
fV(v) = ∫_{−∞}^{∞} f(x, v/x) · (1/|x|) dx
4. If W = X/Y then
fW(w) = ∫_{−∞}^{∞} f(yw, y) · |y| dy
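A simulation sketch of case 1 for X, Y i.i.d. Uniform(0, 1) (an illustrative choice), where the convolution formula gives the triangular density fZ(z) = z on [0, 1] and 2 − z on [1, 2]:

import numpy as np

rng = np.random.default_rng(3)
z = rng.uniform(size=400_000) + rng.uniform(size=400_000)

hist, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
triangle = np.where(mids <= 1, mids, 2 - mids)
print(np.max(np.abs(hist - triangle)))   # small => histogram matches the formula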
9 Covariance
The covariance of two RVs X and Y is defined as,
cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
Properties of covariance:
1. cov(aX, Y ) = acov(X, Y )
2. cov(X + Y, Z) = cov(X, Z) + cov(Y, Z)
3. cov(Σi Xi, Y) = Σi cov(Xi, Y)
4. If X, Y are independent, cov(X, Y ) = 0.
9.1 Variance Formula
The variance of a sum of random variables can be expressed as:
var(Σi Xi) = Σi var(Xi) + Σ Σ_{i ≠ j} cov(Xi, Xj)
The correlation coefficient of X and Y is defined as,
ρ(X, Y) = cov(X, Y) / (σX σY)
Here, σX = √(var(X)). Note that |ρ(X, Y)| ≤ 1.
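A numpy sketch of the variance formula for two correlated variables (the construction of y below is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(200_000)
y = 0.5 * x + rng.standard_normal(200_000)   # correlated with x by construction

# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # approximately equal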
10 Limiting Distributions
10.1 Convergence in Distribution
Let X1, X2, ... be a sequence of random variables with CDFs F1, F2, ... respectively. We say that
{Xn} converges in distribution to X, i.e. Xn →d X, if
Fn(x) → FX(x) as n → ∞ at every point x at which FX is continuous.
10.4 Convergence Almost Surely
We say that {Xn} converges almost surely to X, i.e. Xn →a.s. X, if
P({ω ∈ Ω : Xn(ω) → X(ω)}) = 1
11 Laws of Large Numbers
11.1 Weak Law of Large Numbers
Let X1, X2, ..., Xn be a sequence of i.i.d. random variables with mean µ and variance σ². Then for any
ε > 0 we have,
P(|(X1 + X2 + ... + Xn)/n − µ| > ε) ≤ σ²/(nε²)
The proof follows from Chebyshev's inequality applied to the sample mean, which has mean µ and variance σ²/n.
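A simulation sketch of the bound for Uniform(0, 1) samples, where µ = 1/2 and σ² = 1/12; the choice of distribution and ε is illustrative:

import numpy as np

rng = np.random.default_rng(5)
mu, var, eps = 0.5, 1 / 12, 0.05

for n in (10, 100, 1000):
    means = rng.uniform(size=(5000, n)).mean(axis=1)
    # Empirical P(|mean - mu| > eps) vs. the bound sigma^2 / (n eps^2).
    print(n, np.mean(np.abs(means - mu) > eps), var / (n * eps**2))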
13.3 Geometric Distribution
PMF:
P(k) = (1 − p)^{k−1} p, k = 1, 2, ...
Properties:
• In a sequence of independent Bernoulli trials, the number of trials up to and including the 1st success has a geometric distribution.
• Denoted by X ∼ Geo(p).
• Mean: E(X) = 1/p
• Variance: var(X) = (1 − p)/p²
• MGF: M(t) = pe^t / (1 − (1 − p)e^t), t < −ln(1 − p)
13.4 Poisson Distribution
PMF:
P(k) = e^{−λ} λ^k / k!, k = 0, 1, 2, ...
Properties:
• Denoted by X ∼ Poisson(λ).
• Mean: E(X) = λ
• Variance: var(X) = λ
• MGF: M(t) = e^{λ(e^t − 1)}
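A scipy sketch confirming the stated moments (the parameter values p = 0.3 and λ = 2.5 are illustrative):

from scipy import stats

p, lam = 0.3, 2.5
print(stats.geom(p).mean(), 1 / p)                           # both 1/p
print(stats.geom(p).var(), (1 - p) / p**2)                   # both (1-p)/p^2
print(stats.poisson(lam).mean(), stats.poisson(lam).var())   # both lambda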
14 Common Continuous Distributions
14.1 Uniform Distribution
PDF:
f(x) = 1/(b − a) for a < x < b, and 0 otherwise
Properties:
• Denoted by X ∼ U(a, b).
• Mean: E(X) = (a + b)/2
• Variance: var(X) = (b − a)²/12
• MGF: M(t) = (e^{bt} − e^{at}) / (t(b − a)), t ≠ 0 (with M(0) = 1)
14.2 Exponential Distribution
PDF:
f(x) = λe^{−λx}, x ≥ 0, and 0 otherwise
Properties:
• Denoted by X ∼ exp(λ).
• Mean: E(X) = 1/λ
• Variance: var(X) = 1/λ²
• MGF: M(t) = 1/(1 − t/λ), t < λ
14.4 Normal Distribution
PDF:
f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, x ∈ R
Properties:
• Denoted by X ∼ N(µ, σ²).
• If we consider Z = (X − µ)/σ then Z is standard normal distributed, i.e. Z ∼ N(0, 1), with
f(z) = (1/√(2π)) e^{−z²/2}
• Mean: µ
• Variance: σ²
• MGF: M(t) = e^{µt + σ²t²/2}
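A quick Monte Carlo check of the MGF formula at an illustrative point (µ = 1, σ = 2, t = 0.7):

import numpy as np

rng = np.random.default_rng(6)
mu, sigma, t = 1.0, 2.0, 0.7

x = rng.normal(mu, sigma, size=1_000_000)
print(np.mean(np.exp(t * x)))                 # empirical E(e^{tX})
print(np.exp(mu * t + sigma**2 * t**2 / 2))   # formula value ≈ 5.37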