Random variables and distributions
Prof. Miloš Stanković
Introduction
• Fundamental concept in probability and statistics
• E.g. coin toss – we assign the number 1 to heads and 0 to tails
Definition
• A random variable is a mapping from the set Ω (the set of all elementary events)
to the set of real numbers ℝ
• Typically a random variable is denoted by a capital letter, and a particular
value that it can take by a small letter
• e.g. {𝑋 = 𝑥} is the event {𝜔 ∈ Ω|𝑋(𝜔) = 𝑥}
Distribution of random variable
• To a random variable 𝑋 we can assign a probability function in the
following way:
$$P_X(B) = P(\{\omega \in \Omega \mid X(\omega) \in B\}), \quad B \subset \mathbb{R}$$
• Or just:
$$P_X(B) = P(X \in B)$$
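• A minimal Python sketch of the induced probability $P_X(B)$; the fair-die sample space and the mapping $X(\omega) = \omega$ are illustrative assumptions:

```python
# P_X(B) = P({ω ∈ Ω : X(ω) ∈ B}) for a fair die (illustrative example)
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]               # elementary events
P = {w: Fraction(1, 6) for w in omega}   # uniform probability on Ω
X = lambda w: w                          # the random variable X(ω) = ω

def P_X(B):
    """Induced probability P_X(B) = P({ω : X(ω) ∈ B})."""
    return sum(P[w] for w in omega if X(w) in B)

print(P_X({2, 4, 6}))  # P(X is even) = 1/2
```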
Probability distribution of discrete random variables
• If the set of all possible values that a RV can take is discrete (finite or
countable) we say that the RV is discrete
• Then,
$\{p_i = P(X = x_i),\ i = 1, 2, \dots\}$ is the probability mass function (PMF) of RV 𝑋
Binomial distribution
• RV 𝑋 is the number of successes in 𝑛 independent Bernoulli experiments, each with success probability 𝑝:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n$$
• 𝑋~Bin(𝑛, 𝑝)
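• A short sketch of the binomial PMF in Python; the values $n = 10$, $p = 0.5$ (ten fair coin tosses) are illustrative:

```python
# Binomial PMF P(X = k) = C(n, k) p^k (1-p)^(n-k)
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
print(pmf[5])                      # P(X = 5) ≈ 0.2461
print(abs(sum(pmf) - 1) < 1e-12)   # the PMF sums to 1
```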
Discrete uniform distribution
• Finite discrete RV which, with the same probability, can take any of 𝑛
possible values 𝑥1 , … , 𝑥𝑛
• PMF: $P(X = x_i) = \frac{1}{n}, \quad i = 1, \dots, n$
• It can be parameterized using an interval $[a, b]$ of integers: the values are $a, a+1, \dots, b$, with $n = b - a + 1$
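• A tiny sketch of the discrete uniform PMF on $\{a, \dots, b\}$; the fair-die interval $[1, 6]$ is an illustrative choice:

```python
# Discrete uniform on the integers a, a+1, ..., b:
# each of the n = b - a + 1 values has probability 1/n
a, b = 1, 6
n = b - a + 1
pmf = {k: 1 / n for k in range(a, b + 1)}
print(pmf[3])             # 1/6 for a fair die
print(sum(pmf.values()))  # 1.0
```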
Poisson distribution
• A Poisson RV models the number of certain events that happen in a unit of
time (or space), when the events occur independently of each other!
• Examples:
• Number of emails received in one day
• Number of phone calls in one day
• Number of buses arriving at a station in a unit of time
• Number of newborn babies in one day
• Number of radioactive decays in a unit of time
• Number of trees per unit of area in a forest
• From the given assumptions the following PMF can be derived:
$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots, \quad \lambda > 0$$
• 𝑋~Poiss(𝜆)
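• A short Poisson PMF sketch; the rate $\lambda = 4$ (e.g. four emails per day on average) is an illustrative assumption:

```python
# Poisson PMF P(X = k) = λ^k e^{-λ} / k!
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0
print(poisson_pmf(0, lam))                         # P(no email) ≈ 0.0183
print(sum(poisson_pmf(k, lam) for k in range(3)))  # P(X ≤ 2) ≈ 0.2381
```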
Hypergeometric distribution
• RV 𝑋 is the number of marked items among 𝑟 items drawn without replacement from a set of 𝑛 items of which 𝑚 are marked:
$$P(X = k) = \frac{\binom{m}{k}\binom{n-m}{r-k}}{\binom{n}{r}}, \quad k = 0, 1, 2, \dots, r$$
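• A sketch of the hypergeometric PMF; the card-drawing parameters (aces in a 5-card hand from a 52-card deck) are an illustrative assumption:

```python
# Hypergeometric PMF: k marked items among r drawn without replacement
# from n items of which m are marked
from math import comb

def hypergeom_pmf(k, n, m, r):
    return comb(m, k) * comb(n - m, r - k) / comb(n, r)

n, m, r = 52, 4, 5   # deck of 52, 4 aces, hand of 5
print(hypergeom_pmf(1, n, m, r))                            # P(exactly one ace) ≈ 0.2995
print(sum(hypergeom_pmf(k, n, m, r) for k in range(r + 1)))  # 1.0
```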
Negative binomial distribution
• RV 𝑋 is the number of Bernoulli experiments up to the 𝑟-th success $(r \ge 1)$
• If the 𝑟-th success happened in the 𝑘-th experiment, it follows that in the first $k - 1$
experiments there were $r - 1$ successes
• This can happen in $\binom{k-1}{r-1}$ ways, and the probability of each of these outcomes is
$p^{r-1}(1-p)^{k-r}$
• Hence, the PMF is:
$$P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, \dots$$
• Example: How many times do we need to roll a die in order to claim that, with probability
0.99, we had at least two sixes?
• Answer: $p = \frac{1}{6}$, $r = 2$, $\sum_{k=2}^{m} P(X = k) \ge 0.99$, which gives $m \ge 37$.
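• A numeric check of this example: the sketch below finds the smallest 𝑚 with $\sum_{k=2}^{m} P(X = k) \ge 0.99$ for $X \sim$ NegBin$(r = 2, p = 1/6)$:

```python
# Smallest m such that sum_{k=2}^{m} P(X = k) >= 0.99
from math import comb

def negbin_pmf(k, r, p):
    # P(X = k): the r-th success occurs on trial k, k = r, r+1, ...
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

r, p = 2, 1/6
total, m = 0.0, r - 1
while total < 0.99:
    m += 1
    total += negbin_pmf(m, r, p)
print(m)  # 37, matching the answer in the text
```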
Cumulative distribution function and continuous random variables
• The distribution of a continuous random variable cannot be characterized
by a probability mass function!
• E.g. if RV 𝑋 can take any value from the interval $[0,1]$ with the same
probability, then for each individual point $x \in [0,1]$:
$$P(X = x) = 0$$
• A PMF doesn't make sense in this case!
• Instead, a continuous RV is described by its probability density function (PDF) $f$:
$$P(X < b) = P(X \le b) = \int_{-\infty}^{b} f(t)\,dt$$
$$P(X > a) = P(X \ge a) = \int_{a}^{+\infty} f(t)\,dt$$
$$\int_{-\infty}^{+\infty} f(t)\,dt = 1$$
Formal interpretation of PDF
• For small Δ𝑥, the following approximate equality can be derived:
$$f(x)\,\Delta x \approx P(x \le X \le x + \Delta x), \quad \Delta x \to 0$$
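• A quick numeric check of this interpretation using the standard normal density from scipy.stats; the point $x = 1$ and step $\Delta x = 10^{-4}$ are illustrative:

```python
# f(x)·Δx ≈ P(x ≤ X ≤ x + Δx) for small Δx
from scipy.stats import norm

x, dx = 1.0, 1e-4
lhs = norm.pdf(x) * dx
rhs = norm.cdf(x + dx) - norm.cdf(x)
print(lhs, rhs)  # both ≈ 2.4197e-05; they agree as Δx → 0
```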
Exponential distribution
• PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, $f(x) = 0$ for $x < 0$; $X \sim \mathrm{Exp}(\lambda)$, $\lambda > 0$
• The exponential distribution is memoryless: if we know that a device has worked without
malfunctions for 𝑠 hours, the probability that it will malfunction in the next 𝑡 hours is the
same as the probability that it will malfunction 𝑡 hours after we turn it on!
• Connection with the Poisson distribution
(Figure: PDF of $X \sim \mathrm{Exp}(0.5)$)
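• A numeric check of the memorylessness property with scipy.stats; $\lambda = 0.5$ matches the figure, while $s = 3$ and $t = 2$ are illustrative:

```python
# P(X > s + t | X > s) = P(X > t) for X ~ Exp(λ)
from scipy.stats import expon

lam = 0.5
s, t = 3.0, 2.0
dist = expon(scale=1 / lam)        # scipy parameterizes by scale = 1/λ
lhs = dist.sf(s + t) / dist.sf(s)  # conditional survival probability
rhs = dist.sf(t)
print(lhs, rhs)  # both ≈ 0.3679 = e^{-λt}
```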
Normal (Gaussian) distribution
• The most important distribution in probability and statistics
• RVs which are the result of a large number of random influences, where
the effect of each individual influence is negligible with respect to their
total sum, will have a normal distribution!
• Hence, this distribution appears most frequently in natural
processes (e.g. measurement noise)
• We will prove this statement later – Central Limit Theorem
• PDF: $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < +\infty$
• 𝑋~Norm(0,1) – mathematical expectation is 0, variance is 1
• The CDF is not an elementary function! (it must be calculated numerically)
• For arbitrary expectation and variance:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < +\infty$$
• 𝑋~Norm(𝜇, 𝜎 2 )
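• A short scipy sketch evaluating the normal PDF and the numerically computed CDF; $\mu = 5.2$, $\sigma^2 = 3.7$ match the figure below:

```python
# Normal PDF and (numerically evaluated) CDF via scipy.stats
import numpy as np
from scipy.stats import norm

mu, sigma = 5.2, np.sqrt(3.7)
dist = norm(loc=mu, scale=sigma)  # scipy takes the std. dev. σ, not σ²
print(dist.pdf(mu))               # density peak, 1/(σ√(2π)) ≈ 0.2074
print(dist.cdf(mu))               # 0.5 by symmetry
print(dist.cdf(mu + sigma) - dist.cdf(mu - sigma))  # ≈ 0.6827 (±1σ)
```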
(Figure: PDF of $X \sim \mathrm{Norm}(5.2, 3.7)$)
Random vectors
• In practice we usually observe several RVs defined on the same set of
elementary events Ω
• e.g. in machine learning we typically have a very large number of RVs
$(X_1, X_2, \dots, X_n)$
Joint Cumulative Distribution Function for 2D RV
• A random vector $(X, Y)$ is given; its joint CDF is $F_{X,Y}(x, y) = P(X \le x, Y \le y)$
• The CDFs of RVs 𝑋 and 𝑌 are called marginal CDFs and can be obtained in the
following way:
$$F_X(x) = \lim_{y \to +\infty} F_{X,Y}(x, y), \qquad F_Y(y) = \lim_{x \to +\infty} F_{X,Y}(x, y)$$
• If the joint CDF can be written as
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du,$$
then $f_{X,Y}(x, y)$ is called the joint PDF of the random vector $(X, Y)$
• If this function is continuous it can be obtained as the derivative of the CDF:
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y)$$
• Also, it directly follows that: $f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dy$
Example – discrete random vector
• The joint PMF of the discrete random vector $(X, Y)$ is given (rows: $X = i$, columns: $Y = j$):

  i \ j |  1   |  2   |  3
    1   | 5/24 | 1/12 | 1/6
    2   |  ?   | 7/24 |  0

• From the condition that the sum of probabilities of all the values should be
1, we get $P(X = 2, Y = 1) = \frac{1}{4}$
• Marginal PMFs (summing up rows or columns):
• $P(Y = 1) = P(X = 2, Y = 1) + P(X = 1, Y = 1) = \frac{11}{24}$
• $P(Y = 2) = P(X = 2, Y = 2) + P(X = 1, Y = 2) = \frac{3}{8}$
•…
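• A short numpy check of the marginals above (a sketch; the "?" entry is filled with the derived value 1/4):

```python
# Marginal PMFs as row/column sums of the joint PMF table
import numpy as np

joint = np.array([[5/24, 1/12, 1/6],
                  [1/4,  7/24, 0.0]])
assert abs(joint.sum() - 1) < 1e-12  # probabilities sum to 1

p_X = joint.sum(axis=1)  # marginal of X: [11/24, 13/24]
p_Y = joint.sum(axis=0)  # marginal of Y: [11/24, 3/8, 1/6]
print(p_X, p_Y)
```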
Example – 2D uniform distribution
• Given is a region $D \subset \mathbb{R}^2$ with area 𝑆
• PDF: $f_{X,Y}(x, y) = \frac{1}{S}$ for $(x, y) \in D$, and $f_{X,Y}(x, y) = 0$ for $(x, y) \notin D$
• Then, for $B \subset D$:
$$P((X, Y) \in B) = \iint_B \frac{1}{S}\, dx\, dy = \frac{\mathrm{Area}(B)}{S}$$
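• A Monte Carlo sketch of $P((X, Y) \in B) = \mathrm{Area}(B)/S$; the choice $D = [0,1]^2$ with 𝐵 the quarter of the unit disk is an illustrative assumption:

```python
# Monte Carlo estimate of P((X,Y) ∈ B) for the 2D uniform distribution
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(size=(2, 100_000))  # uniform points in D = [0,1]²
in_B = x**2 + y**2 <= 1                # B: quarter of the unit disk
print(in_B.mean())  # ≈ π/4 ≈ 0.7854 = Area(B)/S
```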
Independence of random variables
• One of the fundamental characteristics describing the relationship
between RVs is their (in)dependence
• If RVs are not independent, their relationship can be described in
more precise terms (we will see later)
• RVs 𝑋 and 𝑌 are independent if and only if, for all $x, y$:
$$F_{X,Y}(x, y) = P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y) = F_X(x)\, F_Y(y)$$
• Example: $f_{X,Y}(x, y) = 6e^{-2x-3y}$ for $x, y \ge 0$, and $f_{X,Y}(x, y) = 0$ otherwise
• Here $f_{X,Y}(x, y) = (2e^{-2x})(3e^{-3y}) = f_X(x)\, f_Y(y)$, so 𝑋 and 𝑌 are independent
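• A numeric check of this factorization: the sketch below computes the marginals with scipy's quad and compares their product with the joint density at an illustrative point:

```python
# Independence check for f(x,y) = 6 e^{-2x-3y}, x, y ≥ 0
import numpy as np
from scipy.integrate import quad

f = lambda x, y: 6 * np.exp(-2 * x - 3 * y)

def f_X(x):  # marginal of X: integrate out y
    return quad(lambda y: f(x, y), 0, np.inf)[0]

def f_Y(y):  # marginal of Y: integrate out x
    return quad(lambda x: f(x, y), 0, np.inf)[0]

x0, y0 = 0.7, 1.2
print(f_X(x0) * f_Y(y0), f(x0, y0))  # equal: X and Y are independent
```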
Functions of random variables
• Example: find the PMF of the RV $U = X + Y$ for a discrete random vector $(X, Y)$
• For each value of the pair $(X, Y)$ we get a value of 𝑈
• We find the PMF by assigning the probability of each value of the pair $(X, Y)$ to the
corresponding value of 𝑈 (see the sketch below)
• For more complicated distributions and functions there is a general procedure/formula for
finding the resulting distribution (PDF, CDF)
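• A sketch of this procedure, assuming the joint PMF from the earlier table example:

```python
# PMF of U = X + Y: each joint probability P(X = x, Y = y)
# is accumulated into the corresponding value of U
from collections import defaultdict

joint = {(1, 1): 5/24, (1, 2): 1/12, (1, 3): 1/6,
         (2, 1): 1/4,  (2, 2): 7/24, (2, 3): 0.0}

pmf_U = defaultdict(float)
for (x, y), p in joint.items():
    pmf_U[x + y] += p

print(dict(pmf_U))  # {2: 5/24, 3: 1/3, 4: 11/24, 5: 0.0}
```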