
One-Dimensional Discrete Random Variables (DRVs) and their Probability Distributions (PDs)

Introduction To One-Dimensional Random Variables (RVs)

- Motivation: Experimental outcomes (i.e., the outcomes in a sample space of a random experiment) are not always numbers. If they were, then
• the sample space could be treated as a quantitative data set, which in turn would enable one to study/describe statistical features of the experimental outcomes, and/or
• every event of the sample space could be described mathematically, which in turn would make studying the probabilistic properties of the random experiment easier.
+ Goal of this Section: Assigning (real) numbers to experimental outcomes (i.e., numeralization of experimental outcomes), based on the experimental objective (i.e., the objective for which the random experiment is conducted).

Definition Random Variable (RV)*

A function assigning a real number to each outcome in a sample space of a random experiment, based on an experimental objective, is called a random variable (RV).

* Instead of "random variable," the terms "chance variable," "stochastic variable," and "variate" are also used in some books.

- Some Introductory Remarks on RVs:

- If the experimental outcomes are meaningful numbers, then those numbers may simply be used to define the corresponding RV.
- The set of all RV-values (i.e., the range of an RV), assigned to experimental outcomes, is called the state space of that RV. Some authors use RX to denote the state space of an RV X.
+ An RV is usually denoted by an uppercase letter such as X, Y, Z, . . ., and its typical value is then denoted by a lowercase letter such as x, y, z, . . ..
+ Every event of a sample space, on which an RV is defined, corresponds to and is identified with a subset of the state space of the RV, and vice versa. Considering this correspondence,
+ Every event of a sample space, on which an RV X is defined, is identified with either

X = a, X > a, X ≥ a, X < a, X ≤ a, a < X < b, a ≤ X < b, a < X ≤ b, a ≤ X ≤ b, for some real numbers a, b,

or some combination of these.

STT 201 & 342: Lecture Note V
Instructor: Mehdi Nikpour — Introduction To (One-Dimensional) DRVs — Spring 2021
+ we may also assign probabilities to subsets of the state space of an RV: that is, if an event E of a sample space, on which an RV X is defined, corresponds to and is identified with a subset E of the state space of RV X, then the probability of the subset E of the state space of X is defined as the probability of the event E of the sample space on which X is defined:

PX(E) := P(E).

+ It's sometimes useful to consider the state space of an RV as a (population/sample) quantitative data set, and its values as (population/sample) data values. With this in mind, statistical features of the experimental outcomes (on which the RV is defined) can be studied/described using the probabilities assigned to the subsets of the state space.
+ Depending on whether the state space of an RV is countable or not, there are two types of RVs:

Definition (One-Dimensional) Discrete and Continuous RVs

• An RV whose state space is either finite or countably infinite is called a discrete RV (DRV).
+ DRVs represent count data, such as the number of defectives in a sample of k items or the number of highway fatalities per year in a given country.
• An RV whose state space is uncountable is called a continuous RV (CRV); that is, its state space is an interval containing all the real numbers within a (continuous) range (i.e., the RV can take on values on a continuous scale).
+ CRVs represent measured data, such as all possible heights, weights, temperatures, distances, or lifetimes.

Example 1.
Random Experiment: Tossing a coin, one side labeled H and the other T, three times, and observing which side faces up.
Sample Space: {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
Experimental Objective: Counting the number of times T faces up.
Defining The Random Variable: Based on the experimental objective, let X be the number of times T faces up, so its state space is {0, 1, 2, 3}. Therefore, X is a DRV.
Identifying Some Events:
The event of observing exactly two T's ≡ X = 2.


The event of observing exactly two H's ≡ X = 1.
The event of observing no H's ≡ X = 3.
The event of observing at least one T ≡ X ≥ 1.
The event of observing at most one H ≡ X ≥ 2.
The event of observing either one or three T's ≡ X = 1 or X = 3.

Example 2.
Random Experiment: Randomly selecting a newly charged battery, using it to operate a certain appliance, and observing when it fails (i.e., when it can no longer supply enough energy to operate the appliance).

Experimental Objective: Measuring the failure time, in hours, of each randomly selected newly charged battery.

Defining The Random Variable: Based on the experimental objective, the RV of interest is time (which we denote by T). Knowing that time (as a variable) can hypothetically take any non-negative real number, this RV is a CRV and its state space is [0, ∞[.

Identifying Some Events:

The event of observing a newly charged battery which lasts at most 2 hours ≡ T ≤ 2.
The event of observing a newly charged battery which lasts at least 5 hours ≡ T ≥ 5.
The event of observing a newly charged battery which lasts between 2 and 5 hours ≡ 2 ≤ T ≤ 5.
The event of observing a newly charged battery which lasts either less than 2.5 hours or more than 4.2 hours ≡ T <
2.5 or T > 4.2.

+ In contrast to DRVs, which can take on any single value in their state space, CRVs can't take on single values in their state space; that's because of the limitations we have in our measuring tools. For this reason, CRVs are only allowed to take on values on intervals contained in their state space.

Cumulative Distribution Functions (CDFs) of One-Dimensional RVs

+ In what follows,

+ wherever it's mentioned that "Let X be an RV," you should assume in advance that there is a random experiment with an experimental objective for which X is defined on its sample space, and the X-values are the numerical assignments to the experimental outcomes.
+ The expression "X ≤ x, X < x, X ≥ x, or X > x" (for an RV X and a typical real number x) corresponds to or is identified with an event of the sample space, on which X is defined, containing those experimental outcomes whose assigned X-values are ≤, <, ≥, or > x; that is, you should simply treat any of these expressions as an event.

- One way to study and/or to specify the probabilistic properties of an RV (regardless of whether it is discrete or continuous) is to determine a function, called the cumulative distribution function, which is defined on all real numbers:


Definition Cumulative Distribution Function (CDF)

Let X be an RV. The function, denoted and defined as

FX : R −→ [0, 1]
     x ↦ FX(x) := P(X ≤ x),

is called the cumulative distribution function (CDF) of X.

+ Notice that
+ the CDF is defined for every real number.
+ FX(x) gives the probability that the RV X takes on any value less than or equal to x, which equals the probability of the event (of the sample space on which X is defined) containing those experimental outcomes whose assigned X-values are less than or equal to x.
- Some authors call this function the distribution function, cumulative function, or cumulative distribution of X.
- The CDF of an RV is usually presented in either tabular (mostly for DRVs, not CRVs), graphical, or mathematical form.

Example 3. Determine the CDF of the DRV defined in Example 1.

Solution First, let's assume the coin is fair. This assumption, along with the Multiplication Rule for three independent events, yields

P(HHH) = P(HHT) = P(HTH) = P(THH) = P(HTT) = P(THT) = P(TTH) = P(TTT) = 1/8.
Since the state space is {0, 1, 2, 3},
• X = 0 ≡ {HHH}. Therefore, P(X = 0) = P(HHH) = 1/8,
• X = 1 ≡ {HHT, HTH, THH}. Therefore, P(X = 1) = P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH) = 3/8,
• X = 2 ≡ {HTT, THT, TTH}. Therefore, P(X = 2) = P({HTT, THT, TTH}) = P(HTT) + P(THT) + P(TTH) = 3/8,
• X = 3 ≡ {TTT}. Therefore, P(X = 3) = P(TTT) = 1/8.

Therefore,
• FX(x) = 0 for every x < 0, because X doesn't take on any value less than 0; that is, P(X < 0) = P(∅) = 0.

• Knowing that 0 is the only value that X can take on over the interval ]−∞, 1[, we have

FX(x) = P(X ≤ x) = P(X = 0) = 1/8, for 0 ≤ x < 1.

• Knowing that 0 and 1 are the only values that X can take on over the interval ]−∞, 2[, we have (the events being mutually exclusive)

FX(x) = P(X ≤ x) = P(X = 0 or X = 1) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 1/2, for 1 ≤ x < 2.


• Knowing that 0, 1 and 2 are the only values that X can take on over the interval ]−∞, 3[, we have (the events being mutually exclusive)

FX(x) = P(X ≤ x) = P(X = 0 or X = 1 or X = 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/8 + 3/8 + 3/8 = 7/8, for 2 ≤ x < 3.

• Knowing that 0, 1, 2 and 3 are the only values that X can take on over the interval ]−∞, ∞[, we have (the events being mutually exclusive)

FX(x) = P(X ≤ x) = P(X = 0 or X = 1 or X = 2 or X = 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1/8 + 3/8 + 3/8 + 1/8 = 1, for 3 ≤ x.

Therefore, the CDF of the DRV X, say FX, may be presented in either

• tabular form as

R      | x < 0 | 0 ≤ x < 1 | 1 ≤ x < 2 | 2 ≤ x < 3 | x ≥ 3
FX(x)  |   0   |    1/8    |    1/2    |    7/8    |   1

• graphical form (a step-function graph, omitted here), or

• mathematical form as

FX(x) = 0    for x < 0,
        1/8  for 0 ≤ x < 1,
        1/2  for 1 ≤ x < 2,
        7/8  for 2 ≤ x < 3,
        1    for x ≥ 3.
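The PMF and CDF worked out above can be checked by brute-force enumeration. The following sketch (the names `pmf` and `F` are ours, not from the notes) tallies the 8 equally likely outcomes and evaluates the CDF once on each of the five pieces:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair-coin tosses
# and tally the PMF of X = number of tails.
pmf = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("T")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

def F(x):
    """CDF of X: F(x) = P(X <= x), defined for every real x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# One value from each of the five pieces of the tabular form.
print(F(-1), F(0.5), F(1.9), F(2), F(10))  # 0 1/8 1/2 7/8 1
```

Note that `F` is defined for every real `x`, not just the DRV-values, matching the definition of a CDF.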

- Some Observations On the CDF-Graph of the DRV in Example 3:

+ The graph is bounded; that is, it lies between y = 0 and y = 1.
+ The graph is increasing (i.e., the CDF increases from 0 to 1).
+ The graph is a step function; that is, it consists of finitely or countably infinitely many horizontal line segments (besides the left and right tails of the graph).


+ Since the DRV-values have a minimum, the graph has a 0-tail. And, since the DRV-values have a maximum, the graph has a 1-tail.

+ The graph has (finite) jump discontinuities at the DRV-values, but is right-continuous at those values.

+ The amount of each jump at any DRV-value equals the probability at that DRV-value.

General Properties of the CDF of an RV

Let FX be the CDF of an RV X. FX satisfies the following conditions:
I. 0 ≤ FX(x) ≤ 1, for every real number x.
II. lim_{x→+∞} FX(x) = 1.
III. lim_{x→−∞} FX(x) = 0.
IV. It's increasing on R; that is, if x1 ≤ x2, then FX(x1) ≤ FX(x2).
V. It's right-continuous; that is, for every x, lim_{δ→0⁺} [FX(x + δ) − FX(x)] = 0, or simply lim_{x→a⁺} FX(x) = FX(a), for every a ∈ R.

On Probability Distributions of One-Dimensional DRVs: Probability Mass Functions (PMFs)

+ In what follows,

+ wherever it's mentioned that "A DRV, say X, with values, say x1, x2, . . .," you should assume in advance that there is a random experiment with an experimental objective for which X is defined on its discrete sample space, and the xi's are the numerical assignments to the experimental outcomes.

+ The expression "X = a" (for a DRV X and a typical real number a) corresponds to or is identified with the event of the sample space, on which X is defined, containing those experimental outcomes which are assigned to a by X.

- Another way to study and/or to specify the probabilistic properties of a DRV is to determine a function, called the probability mass function, which is defined on all real numbers:


Definition Probability Mass Function (PMF)

Let X be a DRV with state space RX (notice that RX may be either finite or countably infinite). The function, denoted and defined as

pX : R −→ [0, 1]
     x ↦ pX(x) := P(X = x) if x ∈ RX, and 0 if x ∉ RX,

which satisfies Σ_{x∈RX} pX(x) = 1, is called the probability mass function (PMF) of X.

+ Notice that

+ 0 ≤ pX(x) ≤ 1 for every real number x.

+ pX(x) gives the probability that the DRV X takes on the value x, which equals the probability of the event (of the sample space on which X is defined) containing those experimental outcomes which are assigned to x by X.
- Some authors call this function the probability function, probability distribution function, or probability law of X.
- The PMF of a DRV is usually presented in either tabular, graphical (the so-called line graph), or mathematical form.

Example 4. Determine the PMF of the DRV in Example 1.

Solution First, let's assume the coin is fair. This assumption, along with the Multiplication Rule for three independent events, yields

P(HHH) = P(HHT) = P(HTH) = P(THH) = P(HTT) = P(THT) = P(TTH) = P(TTT) = 1/8.

Since the state space is {0, 1, 2, 3},
X = 0 ≡ {HHH}. Therefore, P(X = 0) = P(HHH) = 1/8,
X = 1 ≡ {HHT, HTH, THH}. Therefore, P(X = 1) = P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH) = 3/8,
X = 2 ≡ {HTT, THT, TTH}. Therefore, P(X = 2) = P({HTT, THT, TTH}) = P(HTT) + P(THT) + P(TTH) = 3/8,
X = 3 ≡ {TTT}. Therefore, P(X = 3) = P(TTT) = 1/8.

Therefore, the PMF of the DRV X, say pX, may be presented in either

tabular form as

xi      | 0   | 1   | 2   | 3
pX(xi)  | 1/8 | 3/8 | 3/8 | 1/8

graphical form (a line graph, omitted here),


or

mathematical form as

pX(x) = 1/8 for x = 0, 3,
        3/8 for x = 1, 2,
        0   otherwise.

PMF vs. CDF of a DRV

Let X be a DRV whose values are listed in ascending order as xi's. If pX and FX are its PMF and CDF, respectively, then
I. pX(xi) = FX(xi) − FX(xi−1), for every DRV-value xi.
- In other words, the value of pX(xi) equals the amount of the jump of the CDF at the DRV-value xi.

II. FX(x) = Σ_{xi : xi ≤ x} pX(xi), for every real number x.
- In other words, the value of FX(x) is constructed by simply adding together the probabilities P(X = xi) for the DRV-values xi that are no larger than x.
+ Notice that I. and II. simply state that knowledge of either the PMF or the CDF allows the other function to be calculated.
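Rules I. and II. can be illustrated with the fair-coin DRV of Examples 3-4; the sketch below (helper names `cdf` and `recovered` are ours) builds the CDF from the PMF and then recovers the PMF from the CDF's jumps:

```python
from fractions import Fraction as Fr

xs  = [0, 1, 2, 3]                                  # DRV-values in ascending order
pmf = {0: Fr(1, 8), 1: Fr(3, 8), 2: Fr(3, 8), 3: Fr(1, 8)}

# II. CDF from PMF: F(x) = sum of p(xi) over all DRV-values xi <= x.
def cdf(x):
    return sum((pmf[v] for v in xs if v <= x), Fr(0))

# I. PMF from CDF: p(xi) = F(xi) - F(xi-1), the size of the CDF's jump at xi
# (for the smallest DRV-value, F just to its left is 0).
recovered = {}
for i, v in enumerate(xs):
    left = cdf(xs[i - 1]) if i > 0 else Fr(0)
    recovered[v] = cdf(v) - left

print(recovered == pmf)  # True: the round trip recovers the PMF exactly
```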


Specifying Probabilistic Properties of a DRV Using its PMF or CDF

Let X be a DRV with state space RX, and let a and b be any two real numbers such that a < b. If pX and FX are its PMF and CDF, respectively, then

I. P(X < a) = P(X ≤ a) − P(X = a) = FX(a) − pX(a) if a ∈ RX; FX(a) if a ∉ RX.

II. P(X ≤ a) = FX(a).

III. P(X > a) = 1 − P(X ≤ a) = 1 − FX(a).

IV. P(X ≥ a) = 1 − P(X < a) = 1 − FX(a) + pX(a) if a ∈ RX; 1 − FX(a) if a ∉ RX.

V. P(a < X < b) = P(X < b) − P(X ≤ a) = [FX(b) − FX(a)] − pX(b) if b ∈ RX; FX(b) − FX(a) if b ∉ RX.

VI. P(a ≤ X < b) = P(X < b) − P(X < a) =
    [FX(b) − FX(a)] − [pX(b) − pX(a)] if a and b ∈ RX;
    [FX(b) − FX(a)] + pX(a)           if a ∈ RX but b ∉ RX;
    [FX(b) − FX(a)] − pX(b)           if b ∈ RX but a ∉ RX;
    FX(b) − FX(a)                     if a and b ∉ RX.

VII. P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = FX(b) − FX(a).

VIII. P(a ≤ X ≤ b) = P(X ≤ b) − P(X < a) = [FX(b) − FX(a)] + pX(a) if a ∈ RX; FX(b) − FX(a) if a ∉ RX.
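The case distinctions above collapse nicely in code, because pX is 0 off the state space. A small sketch of ours, using the fair-coin DRV of Examples 3-4 as test data, implements rules V and VIII:

```python
from fractions import Fraction as Fr

pmf = {0: Fr(1, 8), 1: Fr(3, 8), 2: Fr(3, 8), 3: Fr(1, 8)}      # Examples 3-4

F = lambda x: sum((q for v, q in pmf.items() if v <= x), Fr(0))  # CDF
p = lambda x: pmf.get(x, Fr(0))                                  # PMF, 0 off RX

# V.    P(a < X < b)   = [F(b) - F(a)] - p(b)   (p(b) = 0 when b is not in RX)
P_open   = lambda a, b: F(b) - F(a) - p(b)
# VIII. P(a <= X <= b) = [F(b) - F(a)] + p(a)   (p(a) = 0 when a is not in RX)
P_closed = lambda a, b: F(b) - F(a) + p(a)

print(P_open(0, 2))    # P(0 < X < 2) = p(1) = 3/8
print(P_closed(1, 2))  # p(1) + p(2) = 3/4
```

Because `p` returns 0 for arguments outside RX, each lambda covers both branches of its rule at once.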

Supplementary Examples
Example 5. Evaluate the constant c so that the following function satisfies the conditions of being considered as a PMF:

f(x) = c/N, for x = 0, 1, 2, . . . , N.

Solution Here we presume that the (not stated) RV is X, whose values are x = 0, 1, 2, . . . , N. Now, for the function f to be a PMF, the first condition is that it takes on non-negative values; that is, f(x) ≥ 0 for x = 0, 1, 2, . . . , N, which implies

c/N ≥ 0 ≡ c ≥ 0.

Then, for the function f to be a PMF, the second condition is that the sum of all its values is 1; that is, Σ_{x=0}^{N} f(x) = 1, which


implies

1 = Σ_{x=0}^{N} f(x) = Σ_{x=0}^{N} (c/N) = (c/N) Σ_{x=0}^{N} 1 = (N + 1) · (c/N),

so that

c = N/(N + 1).

Example 6. Can the following function be considered as a PMF of an RV?

f(x) = 3 / (4 · x!(3 − x)!), for x = 0, 1, 2, 3.

Solution Here we presume that the (not stated) RV is X, whose values are x = 0, 1, 2, 3. Now, for the function f to be a PMF, the first condition is that it takes on non-negative values; that is, f(x) ≥ 0 for x = 0, 1, 2, 3. Let's check its values:

x     | 0   | 1   | 2   | 3
f(x)  | 1/8 | 3/8 | 3/8 | 1/8

Hence, the first condition is satisfied. Then, for the function f to be a PMF, the second condition is that the sum of all its values is 1; that is, Σ_x f(x) = 1. Let's check the sum:

Σ_x f(x) = f(0) + f(1) + f(2) + f(3) = 1/8 + 3/8 + 3/8 + 1/8 = 1.

Therefore, the given function may be considered as a PMF.

Example 7. Evaluate the constant k for which the following function satisfies the conditions of being a PMF:

f(x) = k (1/2)^x, for x = 0, 1, 2, . . . .

Solution Again, we may presume that the (not stated) RV is X, whose values are x = 0, 1, 2, . . .. Since (1/2)^x, for x = 0, 1, 2, . . ., is positive, the first condition is satisfied for any k ≥ 0. For the second condition to be satisfied, we should have Σ_{x=0}^{∞} f(x) = 1, which implies

1 = Σ_{x=0}^{∞} f(x) = k (1/2)^0 + k (1/2)^1 + k (1/2)^2 + · · ·
  = k (1 + 1/2 + 1/4 + · · ·)        (a geometric series)
  = k · 1/(1 − 1/2) = 2k,

so that k = 1/2.
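The value k = 1/2 can be sanity-checked numerically: a long partial sum of the series gets as close to 1 as floating point allows (a small sketch of ours, not part of the notes):

```python
# Partial sums of k * (1/2)**x with k = 1/2: a geometric series summing to 1.
k = 0.5
partial = sum(k * 0.5 ** x for x in range(60))  # terms beyond x ~ 60 are negligible
print(round(partial, 12))  # 1.0
```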

Example 8. Let X be a DRV whose state space is {−2, 0, 1, 2}, and whose PMF is given as

x      | −2  | 0   | 1   | 2
pX(x)  | 0.2 | 0.1 | 0.5 | 0.2

Evaluate the probability that |X| = 2.

Solution Since the events X = −2 and X = 2 are mutually exclusive,

P(|X| = 2) = P(X = −2 or X = 2) = P(X = −2) + P(X = 2) = 0.2 + 0.2 = 0.4.

Notice that |X| may also be considered as an RV whose state space is {0, 1, 2}, and its PMF can be constructed from the PMF of the RV X as

|x|         | 0   | 1   | 2
p|X|(|x|)   | 0.1 | 0.5 | 0.4

Remark 3.0.1. Let f be a function defined on the state space of an RV X. f(X) (i.e., the composition function f ∘ X) may be considered as an RV, but its corresponding candidate PMF, p_{f(X)}, should be checked to satisfy the conditions of being a PMF.

Example 9. Let X be a DRV whose state space is {0, 1, 2, 3}, and whose PMF is given as

x      | 0   | 1   | 2   | 3
pX(x)  | 0.3 | 0.2 | 0.1 | 0.4

We introduce another RV Y whose values are obtained as y := x². Determine the PMF of the RV Y.

Solution First, let's find the Y-values:

x       | 0 | 1 | 2 | 3
y = x²  | 0 | 1 | 4 | 9

from which we have

Y = 0 ≡ X = 0,  Y = 1 ≡ X = 1,  Y = 4 ≡ X = 2,  Y = 9 ≡ X = 3.

The latter observation shows that

P(Y = 0) = P(X = 0) = 0.3,  P(Y = 1) = P(X = 1) = 0.2,  P(Y = 4) = P(X = 2) = 0.1,  P(Y = 9) = P(X = 3) = 0.4.

Therefore, the tabular presentation of the PMF of Y is

y      | 0   | 1   | 4   | 9
pY(y)  | 0.3 | 0.2 | 0.1 | 0.4

Notice that the conditions of being a PMF are satisfied here.

Example 10. Let X be a DRV whose state space is {−1, 0, 1, 2}, and whose PMF is given as

x      | −1  | 0   | 1   | 2
pX(x)  | 0.1 | 0.3 | 0.4 | 0.2

We introduce another RV Y whose values are obtained as y := x². Determine the PMF of the RV Y.

Solution First, let's find the Y-values:

x       | −1 | 0 | 1 | 2
y = x²  |  1 | 0 | 1 | 4

hence, the state space of Y is {0, 1, 4}, from which we have

Y = 0 ≡ X = 0,  Y = 1 ≡ X = −1 or X = 1,  Y = 4 ≡ X = 2.

The latter observation shows that (the events X = −1 and X = 1 being mutually exclusive)

P(Y = 0) = P(X = 0) = 0.3,  P(Y = 1) = P(X = −1 or X = 1) = 0.1 + 0.4 = 0.5,  P(Y = 4) = P(X = 2) = 0.2.

Therefore, the tabular presentation of the PMF of Y is

y      | 0   | 1   | 4
pY(y)  | 0.3 | 0.5 | 0.2

Notice that the conditions of being a PMF are satisfied here.
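The merging of x = −1 and x = 1 into the single value y = 1 is exactly what distinguishes Example 10 from Example 9. A generic "pushforward" of a PMF through a function can be sketched as follows (the helper name `pushforward` is ours):

```python
from collections import defaultdict
from fractions import Fraction as Fr

# PMF of X from Example 10.
pX = {-1: Fr(1, 10), 0: Fr(3, 10), 1: Fr(2, 5), 2: Fr(1, 5)}

# PMF of Y = f(X): x-values mapped to the same y-value pool their probability.
def pushforward(pmf, f):
    out = defaultdict(Fr)        # Fr() == Fraction(0)
    for x, p in pmf.items():
        out[f(x)] += p
    return dict(out)

pY = pushforward(pX, lambda x: x ** 2)
print(sorted(pY.items()))        # y = 1 gets 1/10 + 2/5 = 1/2
```

Because probabilities are pooled rather than relabeled, the result automatically satisfies the conditions of being a PMF.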

Example 11. An investment firm offers its customers municipal bonds that mature after varying numbers of years. Given that the CDF of T, the number of years to maturity for a randomly selected bond, is

FT(t) = 0    for t < 1,
        1/4  for 1 ≤ t < 3,
        1/2  for 3 ≤ t < 5,
        3/4  for 5 ≤ t < 7,
        1    for t ≥ 7,

evaluate (a) P(T = 5), (b) P(T > 3), (c) P(T ≥ 5), (d) P(1.4 < T < 6), and (e) P(T ≤ 5 | T ≥ 2).

Solution First, notice that the state space of the DRV T is {1, 3, 5, 7}, since its CDF has (finite) jump discontinuities only at these values.
It may be helpful to determine the PMF as well. Keeping in mind that the PMF-value at each DRV-value equals the jump of the corresponding CDF at that DRV-value,

pT(1) = FT(1) − 0 = 1/4,  pT(3) = FT(3) − FT(1) = 1/4,  pT(5) = FT(5) − FT(3) = 1/4,  pT(7) = FT(7) − FT(5) = 1/4;

that is,

pT(t) = 1/4 for t = 1, 3, 5, 7, and 0 otherwise.

(a) P(T = 5) = pT(5) = 1/4.

(b) P(T > 3) = 1 − P(T ≤ 3) = 1 − FT(3) = 1 − 1/2 = 1/2.

(c) P(T ≥ 5) = 1 − P(T < 5) = 1 − [P(T ≤ 5) − P(T = 5)] = 1 − [FT(5) − pT(5)] = 1 − (3/4 − 1/4) = 1/2.

(d) P(1.4 < T < 6) = P(T < 6) − P(T ≤ 1.4) = [P(T ≤ 6) − P(T = 6)] − P(T ≤ 1.4) = [FT(6) − pT(6)] − FT(1.4) = (3/4 − 0) − 1/4 = 1/2.


(e) Since the event "T ≥ 2 and T ≤ 5" is the event 2 ≤ T ≤ 5,

P(T ≤ 5 | T ≥ 2) = P(T ≥ 2 and T ≤ 5) / P(T ≥ 2)
                 = [P(T ≤ 5) − P(T < 2)] / [1 − P(T < 2)]
                 = [P(T ≤ 5) − (P(T ≤ 2) − P(T = 2))] / [1 − (P(T ≤ 2) − P(T = 2))]
                 = [FT(5) − (FT(2) − pT(2))] / [1 − (FT(2) − pT(2))]
                 = [3/4 − (1/4 − 0)] / [1 − (1/4 − 0)] = (1/2)/(3/4) = 2/3.

Example 12. An office has four copying machines, and the RV X measures how many of them are in use at a particular moment in time. Suppose that P(X = 0) = 0.08, P(X = 1) = 0.11, P(X = 2) = 0.27, and P(X = 3) = 0.33. (a) What is P(X = 4)? (b) Draw a line graph of the PMF. (c) Construct and plot the CDF.

Solution Notice that the experimental objective is the number of copying machines (out of 4) in use at a particular moment in time. Therefore the RV, denoted by X, is a DRV with values 0, 1, 2, 3 and 4 (i.e., its state space is {0, 1, 2, 3, 4}). And, according to the given information, four of its PMF-values are known:

pX(0) := P(X = 0) = 0.08,  pX(1) := P(X = 1) = 0.11,  pX(2) := P(X = 2) = 0.27, and pX(3) := P(X = 3) = 0.33.

(a) Since the sum of all PMF-values is 1, we should have

1 = Σ_{i=0}^{4} pX(i) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.79 + P(X = 4),

so that P(X = 4) = 0.21.

Therefore, the PMF of the DRV X is

pX(x) = 0.08 if x = 0,
        0.11 if x = 1,
        0.27 if x = 2,
        0.33 if x = 3,
        0.21 if x = 4,
        0    otherwise.


(b) (The line graph of the PMF is omitted here.)

(c) • Since the DRV X has no value less than 0, FX(x) = 0 for x < 0.

• For every 0 ≤ x < 1, since 0 is the only X -value no larger than x, FX (x) = P (X = 0) = 0.08 for 0 ≤ x < 1.

• For every 1 ≤ x < 2, since 0 and 1 are the only X -values no larger than x, we have

FX (x) = P (X = 0) + P (X = 1) = 0.19, for 1 ≤ x < 2.

• For every 2 ≤ x < 3, since 0, 1 and 2 are the only X -values no larger than x, we have

FX (x) = P (X = 0) + P (X = 1) + P (X = 2) = 0.46, for 2 ≤ x < 3.

• For every 3 ≤ x < 4, since 0, 1, 2 and 3 are the only X -values no larger than x, we have

FX (x) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3) = 0.79, for 3 ≤ x < 4.

• For every x ≥ 4, since all the X -values are no larger than x, we have

FX (x) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4) = 1, for x ≥ 4.

Therefore, the CDF of the DRV X is presented as

FX(x) = 0    for x < 0,
        0.08 for 0 ≤ x < 1,
        0.19 for 1 ≤ x < 2,
        0.46 for 2 ≤ x < 3,
        0.79 for 3 ≤ x < 4,
        1    for x ≥ 4.

(Its CDF-graph, a step function with jumps at 0, 1, 2, 3, 4, is omitted here.)
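Parts (a) and (c) can be reproduced in a few lines; in the sketch below (variable names are ours), `accumulate` builds the running totals that are exactly the CDF values at the jump points:

```python
from itertools import accumulate

# Known PMF-values from the problem statement; the mass at 4 is what's left.
pmf = {0: 0.08, 1: 0.11, 2: 0.27, 3: 0.33}
pmf[4] = round(1 - sum(pmf.values()), 10)          # (a) 0.21

# (c) CDF values at the jump points are the running totals of the PMF.
cdf_at_jumps = [round(c, 10) for c in accumulate(pmf[x] for x in sorted(pmf))]
print(pmf[4], cdf_at_jumps)
```

The `round(..., 10)` calls only clean up floating-point noise from the decimal probabilities.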


Example 13. The following table presents the CDF of a DRV X. Make a table and line graph of its PMF.

R      | x < −4 | −4 ≤ x < −1 | −1 ≤ x < 0 | 0 ≤ x < 2 | 2 ≤ x < 3 | 3 ≤ x < 7 | x ≥ 7
FX(x)  |  0.00  |    0.21     |    0.32    |   0.39    |   0.68    |   0.81    | 1.00

Solution Notice that, since the given CDF has (finite) jumps at −4, −1, 0, 2, 3, and 7, the state space of the DRV X is {−4, −1, 0, 2, 3, 7}. Knowing that the PMF-value at a DRV-value equals the amount of the jump of the corresponding CDF at that DRV-value, we have

pX(x) = 0.21                                   if x = −4,
        FX(−1) − FX(−4) = 0.32 − 0.21 = 0.11   if x = −1,
        FX(0) − FX(−1) = 0.39 − 0.32 = 0.07    if x = 0,
        FX(2) − FX(0) = 0.68 − 0.39 = 0.29     if x = 2,
        FX(3) − FX(2) = 0.81 − 0.68 = 0.13     if x = 3,
        FX(7) − FX(3) = 1.00 − 0.81 = 0.19     if x = 7,
        0                                      otherwise.

(The line graph of the PMF is omitted here.)
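Differencing the CDF table mechanically yields the same PMF; in this sketch of ours, the leading 0.0 plays the role of F just to the left of −4:

```python
# CDF values at the jump points of Example 13, in ascending order.
xs = [-4, -1, 0, 2, 3, 7]
Fs = [0.21, 0.32, 0.39, 0.68, 0.81, 1.00]

# PMF at each jump point = size of the CDF's jump there: F(x_i) - F(x_{i-1}).
pmf = {x: round(f - f_prev, 10)
       for x, f, f_prev in zip(xs, Fs, [0.0] + Fs[:-1])}
print(pmf)
```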


Example 14. Suppose that two fair dice are rolled and that the two numbers recorded are multiplied to obtain a final score. Construct and plot the PMF and the CDF of the final score.

Solution The sample space of the random experiment in question may be represented as

{(i, j) | i, j = 1, 2, 3, 4, 5, 6},

where i and j are the face-up numbers of the first and the second die. Since the experimental objective is multiplying the two face-up numbers, letting X be the DRV of the random experiment in question, the state space of X is

RX = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 30, 36},

where

X = 1 ≡ {(1, 1)}    X = 2 ≡ {(1, 2), (2, 1)}    X = 3 ≡ {(1, 3), (3, 1)}
X = 4 ≡ {(1, 4), (2, 2), (4, 1)}    X = 5 ≡ {(1, 5), (5, 1)}    X = 6 ≡ {(1, 6), (2, 3), (3, 2), (6, 1)}
X = 8 ≡ {(2, 4), (4, 2)}    X = 9 ≡ {(3, 3)}    X = 10 ≡ {(2, 5), (5, 2)}
X = 12 ≡ {(2, 6), (3, 4), (4, 3), (6, 2)}    X = 15 ≡ {(3, 5), (5, 3)}    X = 16 ≡ {(4, 4)}
X = 18 ≡ {(3, 6), (6, 3)}    X = 20 ≡ {(4, 5), (5, 4)}    X = 24 ≡ {(4, 6), (6, 4)}
X = 25 ≡ {(5, 5)}    X = 30 ≡ {(5, 6), (6, 5)}    X = 36 ≡ {(6, 6)}.

Since the dice are fair,

P(X = 1) = 1/36,   P(X = 2) = 1/18,   P(X = 3) = 1/18,   P(X = 4) = 1/12,   P(X = 5) = 1/18,
P(X = 6) = 1/9,    P(X = 8) = 1/18,   P(X = 9) = 1/36,   P(X = 10) = 1/18,  P(X = 12) = 1/9,
P(X = 15) = 1/18,  P(X = 16) = 1/36,  P(X = 18) = 1/18,  P(X = 20) = 1/18,  P(X = 24) = 1/18,
P(X = 25) = 1/36,  P(X = 30) = 1/18,  P(X = 36) = 1/36,
from which the PMF and the CDF of X are

pX(x) = 1/36 if x = 1, 9, 16, 25, 36,
        1/18 if x = 2, 3, 5, 8, 10, 15, 18, 20, 24, 30,
        1/12 if x = 4,
        1/9  if x = 6, 12,
        0    otherwise,

and

FX(x) = 0     for x < 1,         1/36  for 1 ≤ x < 2,
        1/12  for 2 ≤ x < 3,     5/36  for 3 ≤ x < 4,
        2/9   for 4 ≤ x < 5,     5/18  for 5 ≤ x < 6,
        7/18  for 6 ≤ x < 8,     4/9   for 8 ≤ x < 9,
        17/36 for 9 ≤ x < 10,    19/36 for 10 ≤ x < 12,
        23/36 for 12 ≤ x < 15,   25/36 for 15 ≤ x < 16,
        13/18 for 16 ≤ x < 18,   7/9   for 18 ≤ x < 20,
        5/6   for 20 ≤ x < 24,   8/9   for 24 ≤ x < 25,
        11/12 for 25 ≤ x < 30,   35/36 for 30 ≤ x < 36,
        1     for x ≥ 36.

Here, we don't plot the line graph of the PMF and the graph of the CDF.
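The 18-point state space and the PMF above can be reproduced by enumerating all 36 equally likely ordered rolls; a short sketch of ours:

```python
from collections import Counter
from fractions import Fraction as Fr

# Tally the product over all 36 equally likely ordered rolls of two fair dice.
counts = Counter(i * j for i in range(1, 7) for j in range(1, 7))
pmf = {x: Fr(c, 36) for x, c in sorted(counts.items())}

# Spot-check against the piecewise PMF above.
print(len(pmf), pmf[4], pmf[6])  # 18 1/12 1/9
```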


Example 15. Two cards are drawn at random from a pack of cards, without replacement, one at a time. Let the RV X be the number of cards drawn from the heart suit. (a) Construct the PMF. (b) Construct the CDF. (c) What is the most likely value of the RV X?

Solution The sample space of the random experiment in question may be represented as

{(C1, C2) | C1, C2 = H, D, S, or C},

where C1 represents the first card drawn from the 52 cards and C2 the second one drawn from the remaining 51 cards, and C1 (or C2) = H, D, S, or C states that the first (or the second) drawn card is from the heart, diamond, spade, or club suit, respectively. Since the experimental objective is the number of cards drawn from the heart suit, the state space of X is RX = {0, 1, 2}, where

X = 0 ≡ the event that neither card is a heart (39 · 38 = 1482 outcomes),
X = 1 ≡ the event that exactly one card is a heart (2 · 13 · 39 = 1014 outcomes),
X = 2 ≡ the event that both cards are hearts (13 · 12 = 156 outcomes).

Since each draw is done randomly, all 52 · 51 = 2652 ordered pairs are equally likely, so

P(X = 0) = 1482/2652 = 19/34,  P(X = 1) = 1014/2652 = 13/34, and P(X = 2) = 156/2652 = 1/17.
Therefore,
(a) the PMF of X is

pX(x) = 19/34 if x = 0,
        13/34 if x = 1,
        1/17  if x = 2,
        0     otherwise,

and
(b) the CDF of X is

FX(x) = 0     for x < 0,
        19/34 for 0 ≤ x < 1,
        16/17 for 1 ≤ x < 2,
        1     for x ≥ 2.

(c) Since the largest value of the PMF is attained at X = 0, the most likely value of the RV X is 0.
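The counts above can be verified by enumerating every ordered pair of distinct deck positions (a sketch of ours; only the suit category matters, so the deck is reduced to hearts vs. non-hearts):

```python
from collections import Counter
from fractions import Fraction as Fr

# 13 hearts ('H') and 39 non-hearts ('N'); ordered draws without replacement.
deck = ["H"] * 13 + ["N"] * 39
counts = Counter(
    (deck[i] == "H") + (deck[j] == "H")      # X = number of hearts among the two
    for i in range(52) for j in range(52) if i != j
)
pmf = {x: Fr(c, 52 * 51) for x, c in counts.items()}
print(pmf[0], pmf[1], pmf[2])  # 19/34 13/34 1/17
```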

Example 16. A fair coin is tossed three times. A player wins $1 if the first toss is a head, but loses $1 if the first toss is a tail. Similarly, the player wins $2 if the second toss is a head, but loses $2 if the second toss is a tail, and wins or loses $3 according to the result of the third toss. Let the RV X be the total winnings after the three tosses (possibly a negative value if losses are incurred). (a) Construct the PMF. (b) Construct the CDF. (c) What is the most likely value of the RV X?

Solution Labeling one side of the coin with H for head and the other side with T for tail, the sample space of this random experiment may be represented by

{(H, H, H), (H, H, T), (H, T, H), (T, H, H), (H, T, T), (T, H, T), (T, T, H), (T, T, T)}.


Since the experimental objective is the total winnings after the three tosses, according to the given information, we have

{(H, H, H)} ≡ X = $6    {(H, H, T), (T, T, H)} ≡ X = $0    {(H, T, H)} ≡ X = $2    {(T, H, H)} ≡ X = $4
{(H, T, T)} ≡ X = −$4    {(T, H, T)} ≡ X = −$2    {(T, T, T)} ≡ X = −$6,

from which the state space of X is RX = {−$6, −$4, −$2, $0, $2, $4, $6}. Since the coin is fair,

P(X = −6) = 1/8,  P(X = −4) = 1/8,  P(X = −2) = 1/8,  P(X = 0) = 1/4,
P(X = 2) = 1/8,   P(X = 4) = 1/8,   P(X = 6) = 1/8.

Therefore,
(a) the PMF of X is

pX(x) = 1/8 if x = ±6, ±4, ±2,
        1/4 if x = 0,
        0   otherwise,

and
(b) the CDF of X is

FX(x) = 0   for x < −6,
        1/8 for −6 ≤ x < −4,
        1/4 for −4 ≤ x < −2,
        3/8 for −2 ≤ x < 0,
        5/8 for 0 ≤ x < 2,
        3/4 for 2 ≤ x < 4,
        7/8 for 4 ≤ x < 6,
        1   for x ≥ 6.

(c) Since the largest value of the PMF is attained at X = 0, the most likely value of the RV X is 0.
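The winnings RV is a sum of three signed stakes, which enumerates neatly (a sketch of ours; `stakes` and `pmf` are our names):

```python
from itertools import product
from collections import Counter
from fractions import Fraction as Fr

# Winnings are +/-1, +/-2, +/-3 depending on H/T in tosses 1, 2, 3.
stakes = (1, 2, 3)
counts = Counter(
    sum(s if side == "H" else -s for s, side in zip(stakes, tosses))
    for tosses in product("HT", repeat=3)
)
pmf = {x: Fr(c, 8) for x, c in sorted(counts.items())}
print(pmf)  # only x = 0 occurs twice (HHT and TTH), hence its mass 1/4
```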

Example 17. Four cards are labeled $1, $2, $3, and $6. A player pays $4, selects two cards at random, and then receives
the sum of the winnings indicated on the two cards. Calculate the PMF and the CDF of the net winnings (that is, winnings
minus the $4 payment).

Solution The sample space of this random experiment may be presented as


{(1, 2), (1, 3), (1, 6), (2, 1), (2, 3), (2, 6), (3, 1), (3, 2), (3, 6), (6, 1), (6, 2), (6, 3)},

where, for instance, (1, 2) represents the outcome that the first selected card is labeled $1 and the second selected one is labeled $2. Since the experimental objective is the "sum of the labeled numbers on the two selected cards minus 4" (i.e., the net winnings), letting X be the RV in question, we have
{(3, 6), (6, 3)} ≡ X = $5   {(2, 6), (6, 2)} ≡ X = $4   {(1, 6), (6, 1)} ≡ X = $3
{(2, 3), (3, 2)} ≡ X = $1   {(3, 1), (1, 3)} ≡ X = $0   {(1, 2), (2, 1)} ≡ X = −$1,

from which the state space of X is RX = {−$1, $0, $1, $3, $4, $5}. Since the two cards are selected at random, all twelve ordered pairs are equally likely, so

P (X = −1) = P (X = 0) = P (X = 1) = P (X = 3) = P (X = 4) = P (X = 5) = 1/6.

Therefore, the PMF of X is

pX (x) = 1/6  if x = −1, 0, 1, 3, 4, 5
       = 0    otherwise,


and the CDF of X is

FX (x) = 0    for x < −1
       = 1/6  for −1 ≤ x < 0
       = 1/3  for 0 ≤ x < 1
       = 1/2  for 1 ≤ x < 3
       = 2/3  for 3 ≤ x < 4
       = 5/6  for 4 ≤ x < 5
       = 1    for x ≥ 5.

Example 18. A company has five warehouses, only two of which have a particular product in stock. A salesperson calls the
five warehouses in a random order until a warehouse with the product is reached. Let the RV X be the number of calls made
by the salesperson, and calculate its PMF and CDF.

Solution Let's denote either of the two warehouses that have the product in stock by W✓, and any of the three that don't by W×. Then every outcome in the sample space may be presented as

−1 , −2 , −3 , −4 , −5 ,

where the five slots are to be filled with two W✓'s and three W×'s, and the subscripts show the calling order; for instance, W1×, W2×, W3✓, W4×, W5✓ means the third and fifth warehouses called are the stocked ones. Therefore, the sample space consists of 5! = 120 (ordered) outcomes.
According to the description of the RV X in question, the state space of X is {1, 2, 3, 4} (i.e., either the first, second, third, or fourth call reaches a warehouse with the product). Therefore, we have

X = 1 ≡ {W1✓, −2, −3, −4, −5 | the rest filled with one W✓ and three W×'s}: consists of 2 × 4! = 48 outcomes,
X = 2 ≡ {W1×, W2✓, −3, −4, −5 | the rest filled with one W✓ and two W×'s}: consists of 3 × 2 × 3! = 36 outcomes,
X = 3 ≡ {W1×, W2×, W3✓, −4, −5 | the rest filled with one W✓ and one W×}: consists of 3 × 2 × 2 × 2! = 24 outcomes,
X = 4 ≡ {W1×, W2×, W3×, W4✓, −5 | the rest filled with one W✓}: consists of 3 × 2 × 1 × 2 × 1 = 12 outcomes.

Since the calling order is random, all 120 outcomes are equally likely, so

P (X = 1) = 48/120 = 2/5,  P (X = 2) = 36/120 = 3/10,  P (X = 3) = 24/120 = 1/5,  P (X = 4) = 12/120 = 1/10.
Therefore, the PMF of X is

pX (x) = 2/5   if x = 1
       = 3/10  if x = 2
       = 1/5   if x = 3
       = 1/10  if x = 4
       = 0     otherwise,

and the CDF of X is

FX (x) = 0     for x < 1
       = 2/5   for 1 ≤ x < 2
       = 7/10  for 2 ≤ x < 3
       = 9/10  for 3 ≤ x < 4
       = 1     for x ≥ 4.
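The counting argument can be verified by enumerating all 5! orderings; a small sketch (the labels 'S' for a stocked warehouse and 'F' for one without the product are my own):

```python
from fractions import Fraction
from itertools import permutations

# All 5! = 120 ordered arrangements of two stocked ('S') and three
# unstocked ('F') warehouses; X is the position of the first 'S'.
orders = list(permutations("SSFFF"))
counts = {}
for order in orders:
    x = order.index("S") + 1
    counts[x] = counts.get(x, 0) + 1

pmf = {x: Fraction(c, len(orders)) for x, c in counts.items()}
```

This reproduces P(X = 1) = 2/5, P(X = 2) = 3/10, P(X = 3) = 1/5, and P(X = 4) = 1/10.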

Example 19. After manufacture, computer disks are tested for errors. Let X be the number of errors detected on a randomly
chosen disk. The following table presents values of the CDF FX (x) of X :


x 0 1 2 3 4
FX (x) 0.41 0.72 0.83 0.95 1

(a) What is the probability that two or fewer errors are detected? (b) What is the probability that more than three errors
are detected? (c) What is the probability that exactly one error is detected? (d) What is the probability that no errors are
detected? (e) What is the most probable number of errors to be detected?

Solution Knowing that the state space of X is {0, 1, 2, 3, 4}, we have

(a) P (X ≤ 2) = FX (2) = 0.83.

(b) P (X > 3) = 1 − P (X ≤ 3) = 1 − FX (3) = 1 − 0.95 = 0.05.

(c) P (X = 1) = P (X ≤ 1) − P (X ≤ 0) = FX (1) − FX (0) = 0.72 − 0.41 = 0.31.

(d) P (X = 0) = P (X ≤ 0) = FX (0) = 0.41.

(e) Since P (X = 1) = 0.31, P (X = 2) = FX (2) − FX (1) = 0.11, P (X = 3) = FX (3) − FX (2) = 0.12, and P (X = 4) = FX (4) − FX (3) = 0.05, while P (X = 0) = 0.41, detecting no errors is the most probable case.
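Since pX(x) = FX(x) − FX(x − 1), the whole PMF can be recovered from the tabulated CDF by successive differences; a quick sketch:

```python
# Recover the PMF from the tabulated CDF: p(x) = F(x) - F(previous x),
# with F taken as 0 before the smallest value of the state space.
cdf = {0: 0.41, 1: 0.72, 2: 0.83, 3: 0.95, 4: 1.00}

pmf, prev = {}, 0.0
for x in sorted(cdf):
    pmf[x] = round(cdf[x] - prev, 2)
    prev = cdf[x]

most_probable = max(pmf, key=pmf.get)
```

This reproduces P(X = 1) = 0.31 and confirms 0 as the most probable number of errors.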

Numerical Summary/Descriptive Measures of a DRV

- As mentioned before, the state space of an RV may be viewed as a quantitative data set from which the basic statistical
features of outcomes (of a random experiment) can be studied/described.

- In this section, two numerical summary/descriptive measures of a DRV are defined and interpreted.


Definition Mean/Expected-Value of a DRV


Let X be a DRV, with state space RX, and pX be its PMF. The mean/expected-value of X, denoted by either
µX or E(X), is defined as the sum of the products of each X-value and its corresponding PMF-value:

    µX or E(X) := Σ_{x∈RX} x · pX(x),        (3.1)

where the sum is taken over all values in the state space of X.
- Observation: Notice that since each X-value is weighted by its corresponding PMF-value, Σ_{x∈RX} x · pX(x) is
considered the weighted arithmetic mean of all X-values, whose weights are their corresponding PMF-values.
+ This observation states that E(X) may be regarded as the representative value for X; that is, E(X) is the
unique number satisfying

    Σ_{x∈RX} (x − E(X)) pX(x) = 0.

+ This observation also interprets E(X) as the X-value that we expect to observe per repetition, on average,
if we perform the corresponding random experiment a large number of times (it is because of this interpretation
that µX is also called the expected-value of X and is denoted by E(X) as well).
+ Notice that if the values of an RV have some measurement unit, then E(X) is measured in the same unit.
- In Statistics and Probability Theory, E (in E(X)) is considered a real-valued function acting on RVs, and is
called the (mathematical) expectation.
- The notations µX and E(X) may be used interchangeably, but some authors prefer µX when the "mean of the
X-values" is to be emphasized, and E(X) when the "expected-value of X" is to be emphasized.
- Some authors call E(X) the mean of the distribution of X (or of the X-values).

Definition Symmetric DRV


Let X be a DRV whose PMF is pX . If there's a real number a for which pX (a + x) = pX (a − x), for every x ∈ R,
then DRV X is said to be symmetric and a is called the point of symmetry of X .
- Simply, a DRV is symmetric if the line graph of its PMF is symmetric w.r.t. a (hypothetical) vertical line passing
through a point on the horizontal axis. In this case, the point on the horizontal axis is the point of symmetry
of X .
+ Consequence: If a DRV is symmetric, then its expected-value is its point of symmetry.

+ Fact: For any real-valued function g and any RV X, g(X) is also considered an RV, provided that g(X) is defined.

- Note that g(X) is a composition of two functions.

- Hence, the concept of expectation can be extended in a natural way from the expectation E(X) of an RV X to the
expectation E [g(X)] of the RV g(X).


Proposition
Let X be a DRV and pX be its PMF. If g is any real-valued function for which g(X) is defined, then the expectation
of g(X) is

    E[g(X)] = Σ g(x) · pX(x),        (3.2)

where the sum is taken over all those X-values which are in the domain of the function g.
- Note that E[g(X)] may be viewed as either the mean or the expected-value of the DRV g(X). Accordingly, the notation
µ_{g(X)} is also used to denote the expectation of g(X).
- The equality in (3.2) is known as the Law of the Unconscious Statistician.

- Consequence: Some Special Cases of (3.2) For a DRV X (with state space RX) whose PMF is pX,

(a) if g(x) = b for some real number b, then

    E(b) = Σ_{x∈RX} b · pX(x) = b Σ_{x∈RX} pX(x) = b · 1 = b.

  + This result is also valid if X is a CRV.

(b) if g(x) = ax for some real number a, then

    E(aX) = Σ_{x∈RX} ax · pX(x) = a Σ_{x∈RX} x · pX(x) = aE(X).

  + This result is also valid if X is a CRV.

(c) if g(x) = ax + b for some real numbers a and b, then

    E(aX + b) = Σ_{x∈RX} (ax + b) · pX(x) = a Σ_{x∈RX} x · pX(x) + b Σ_{x∈RX} pX(x) = aE(X) + b.

  + This result is also valid if X is a CRV.

(d) if g(x) = (x + b)² for some real number b, then

    E[(X + b)²] = Σ_{x∈RX} (x² + 2bx + b²) · pX(x)
                = Σ_{x∈RX} x² · pX(x) + 2b Σ_{x∈RX} x · pX(x) + b² Σ_{x∈RX} pX(x)
                = E(X²) + 2bE(X) + b².

  + This result is also valid if X is a CRV.

- Note that in (c), if a := 1 and b := −E(X), then

    E[X − E(X)] = E(X) − E(X) = 0.

- Note that in (d), if b := −E(X), then

    E[(X − E(X))²] = E(X²) − 2[E(X)]² + [E(X)]² = E(X²) − [E(X)]².        (3.3)
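The identities above are easy to sanity-check numerically. A minimal sketch with an arbitrary, made-up PMF (the particular values carry no significance, beyond summing to 1):

```python
from fractions import Fraction

# An arbitrary illustrative PMF (chosen only so the probabilities sum to 1).
pmf = {-1: Fraction(1, 4), 2: Fraction(1, 4), 5: Fraction(1, 2)}

def E(g):
    """E[g(X)] = sum of g(x) * pX(x), i.e. the Law of the Unconscious Statistician."""
    return sum(g(x) * p for x, p in pmf.items())

EX = E(lambda x: x)
a, b = 3, -5
assert E(lambda x: a * x + b) == a * EX + b                                    # case (c)
assert E(lambda x: (x + b) ** 2) == E(lambda x: x * x) + 2 * b * EX + b * b    # case (d)
assert E(lambda x: x - EX) == 0                                                # centering note
```

Exact rational arithmetic (`fractions.Fraction`) keeps the checks free of floating-point round-off.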


Definition Variance of an RV
Let X be an RV.ᵃ The variance of X, denoted by either σ²X or Var(X), is defined as the expectation of [X − E(X)]²;
that is,

    σ²X or Var(X) := E[(X − E(X))²]  or  E[(X − µX)²]        (3.4)
                   = E(X²) − [E(X)]², according to (3.3).    (3.5)

Therefore, if X is a DRV, with state space RX, and pX is its PMF, then the variance of X is obtained as either

    σ²X or Var(X) = Σ_{x∈RX} (x − µX)² · pX(x),  using (3.4)        (3.6)

or

    σ²X or Var(X) = Σ_{x∈RX} x² · pX(x) − ( Σ_{x∈RX} x · pX(x) )²,  using (3.5)        (3.7)

where the sums are taken over all values in the state space of X.
- The formulas in (3.6) and (3.7) are equivalent; their difference is that with (3.7) you don't need to perform the
extra operations of first subtracting µX from each X-value and then raising the difference to the second power.
This, in turn, reduces the rounding error.
- Observation: Note that since each X-value is weighted by its corresponding PMF-value, the expression in (3.6)
is considered the weighted arithmetic mean of all squared deviations of the X-values from their expected-value,
whose weights are their corresponding PMF-values.
+ This observation states that Var(X) may be regarded as a measure of dispersion of the X-values around their
expected-value.
+ This observation also interprets Var(X) as the expected-value of the squared deviations of the X-values from
their expected-value; that is, if we perform the corresponding random experiment a large number of times,
then we expect to observe σ²X as the squared deviation from the expected-value of X per repetition, on average.
+ Conclusion: a lower value of the variance of an RV indicates that its values are spread over a relatively
smaller range around its expected-value. In contrast, a larger value of the variance of an RV indicates
that its values are spread over a relatively larger range around its expected-value.
ᵃ This definition is valid for both DRVs and CRVs.

Proposition
Let X be an RV. If g is any real-valued function for which g(X) is defined, then the variance of g(X) is

    Var[g(X)] = E[(g(X))²] − (E[g(X)])².        (3.8)

- Note that Var[g(X)] may be viewed as the variance of the RV g(X). Accordingly, the notation σ²_{g(X)} is also
used to denote the variance of g(X).

- Consequence: Some Special Cases For an RV X,

(a) if g(x) = b for some real number b, then

    Var(b) = E(b²) − [E(b)]² = b² − b² = 0.


(b) if g(x) = ax for some real number a, then

    Var(aX) = E(a²X²) − [E(aX)]² = a²E(X²) − a²[E(X)]² = a²(E(X²) − [E(X)]²) = a²Var(X).

(c) if g(x) = ax + b for some real numbers a and b, then

    Var(aX + b) = E[(aX + b)²] − [E(aX + b)]²
                = E(a²X²) + E(2abX) + E(b²) − [E(aX) + E(b)]²
                = (a²E(X²) + 2abE(X) + b²) − (a²[E(X)]² + 2abE(X) + b²)
                = a²(E(X²) − [E(X)]²)
                = a²Var(X).
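Cases (a) and (c) can likewise be verified numerically; a sketch on a made-up PMF:

```python
from fractions import Fraction

# An arbitrary illustrative PMF (probabilities sum to 1).
pmf = {0: Fraction(1, 5), 1: Fraction(3, 10), 2: Fraction(1, 2)}

def E(g):
    return sum(g(x) * p for x, p in pmf.items())

def Var(g):
    # Var[g(X)] = E[g(X)^2] - (E[g(X)])^2, i.e. formula (3.8)
    return E(lambda x: g(x) ** 2) - E(g) ** 2

a, b = -2, 7
assert Var(lambda x: b) == 0                                   # case (a): constants don't vary
assert Var(lambda x: a * x + b) == a ** 2 * Var(lambda x: x)   # case (c)
```

The exact fractions make the equalities hold identically rather than only up to round-off.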

Definition Standard Deviation of an RV

Let X be an RV. The standard deviation of X, denoted by either σX or SD(X), is defined as the square root of
its variance; that is,

    SD(X) := √Var(X)  or  σX := √(σ²X).

+ Note that whatever is mentioned about the variance of an RV is also valid for its standard deviation, keeping
in mind that the standard deviation is obtained by taking the square root of the variance.
+ Notice that if the values of an RV have some measurement unit, then SD(X) is measured in the same unit.

Chebyshev's Inequality

- In the lecture note on Numerical Summary Measures, according to Chebyshev's Inequality, we saw that in any
quantitative data set with known A-mean µ and standard deviation σ, at least (1 − 1/k²) × 100% of the data values
are between µ − kσ and µ + kσ, provided that k > 1. In other words, any symmetric interval centered at µ of length
2d contains at least (1 − σ²/d²) × 100% of the data values, provided that d > σ.

- Therefore, the variance of a quantitative data set gives us a lower bound on the proportion of data values
distributed over a relatively large symmetric interval about their mean.

- Here we want to state Chebyshev's Inequality in terms of the probability distribution of an RV. Notice that the
state space of an RV can be considered a quantitative data set whose values are weighted by some probabilities.


Chebyshev's Inequality
Let X be an RV (regardless of being discrete or continuous). If µ and σ are the mean/expected-value and standard
deviation of X, respectively, then for every k > 1,

    P(µ − kσ ≤ X ≤ µ + kσ) = P(|X − µ| ≤ kσ) ≥ 1 − 1/k²,        (3.9)

or, equivalently,

    P(X ≤ µ − kσ or X ≥ µ + kσ) = P(|X − µ| ≥ kσ) ≤ 1/k².        (3.10)

- (3.9) simply says that the probability that an RV takes a value within k standard deviations of its
mean/expected-value is not less than 1 − 1/k².
- (3.10) simply says that the probability that an RV takes a value not within k standard deviations of its
mean/expected-value is not more than 1/k².
+ A simple consequence of Chebyshev's Inequality is that the farther a value is from the mean/expected-value,
the less chance the RV has of taking it.
+ One simple advantage of Chebyshev's Inequality is that, without knowing the probability distribution of an
RV, you can find a lower bound for the probability that the RV takes a value between two numbers that are
equidistant from its mean/expected-value; that is, for any RV X with mean/expected-value µ and standard
deviation σ, if a := µ − d and b := µ + d, then

    P(a ≤ X ≤ b) ≥ 1 − σ²/d²,        (3.11)

provided that d > σ.

- Some Simple Observations: If X is an RV with expected-value µ and standard deviation σ,

• for k = 1.5, there's at least approximately a 55.6% chance that X takes a value within 1.5σ of µ, or equivalently,
at most approximately a 44.4% chance that X takes a value not within 1.5σ of µ.
• for k = 2, there's at least a 75% chance that X takes a value within 2σ of µ, or equivalently, at most
a 25% chance that X takes a value not within 2σ of µ.
• for k = 2.5, there's at least an 84% chance that X takes a value within 2.5σ of µ, or equivalently,
at most a 16% chance that X takes a value not within 2.5σ of µ.
• for k = 3, there's at least approximately an 88.9% chance that X takes a value within 3σ of µ, or equivalently,
at most approximately an 11.1% chance that X takes a value not within 3σ of µ.
+ Remark: In some problems, you may be asked to determine an interval around the mean/expected-value of an RV such that
there's at least a p% chance for the RV to take a value within that interval. Setting 1 − 1/k² = p/100 gives
k = 10/√(100 − p), so, using Chebyshev's Inequality, one such interval is

    ( µ − 10σ/√(100 − p) , µ + 10σ/√(100 − p) ),   provided that 10 > √(100 − p),

where µ and σ are the mean/expected-value and standard deviation, respectively, of the RV.

Supplementary Examples

Example 20. Suppose that the RV X takes the values −2, 1, 4, and 6 with probability values 1/3, 1/6, 1/3, and 1/6, respectively. (a) Find the expectation of X. (b) Find the variance of X using the formula Var(X) = E[(X − E(X))²]. (c) Find the variance of X using the formula Var(X) = E(X²) − [E(X)]².

Solution The tabular presentation of the PMF of X is

x        −2    1    4    6
pX (x)   1/3   1/6  1/3  1/6

(a)

E(X) = Σ x · pX(x) = (−2 × 1/3) + (1 × 1/6) + (4 × 1/3) + (6 × 1/6) = 11/6.

(b)

Var(X) = E[(X − E(X))²] = Σ (x − E(X))² · pX(x)
       = (−2 − 11/6)² × 1/3 + (1 − 11/6)² × 1/6 + (4 − 11/6)² × 1/3 + (6 − 11/6)² × 1/6
       = 529/108 + 25/216 + 169/108 + 625/216 = 2046/216 = 341/36.

(c)

Var(X) = E(X²) − [E(X)]² = Σ x² · pX(x) − (11/6)²
       = ((−2)² × 1/3) + (1² × 1/6) + (4² × 1/3) + (6² × 1/6) − (11/6)²
       = 77/6 − (11/6)² = 341/36.

Example 21. Computer chips often contain surface imperfections. For a certain type of computer chip, the PMF of the
number of defects X is presented in the following table:
x 0 1 2 3 4
pX (x) 0.4 0.3 0.15 0.1 0.05
Evaluate (a) P (X ≤ 2), (b) P (X > 2), (c) E(X), (d) Var(X).

Solution Since the state space of X is {0, 1, 2, 3, 4},

(a) The probability that a chip of this type contains at most two surface imperfections is
P (X ≤ 2) = P (X = 0) + P (X = 1) + P (X = 2) = 0.4 + 0.3 + 0.15 = 0.85.

(b) The probability that a chip of this type contains more than two surface imperfections is
P (X > 2) = 1 − P (X ≤ 2) = 1 − 0.85 = 0.15.


(c)

E(X) = (0 × 0.4) + (1 × 0.3) + (2 × 0.15) + (3 × 0.1) + (4 × 0.05) = 1.1,

indicating that, on average, a computer chip of this type contains 1.1 surface imperfections.

(d)

Var(X) = E(X²) − [E(X)]² = (0² × 0.4) + (1² × 0.3) + (2² × 0.15) + (3² × 0.1) + (4² × 0.05) − 1.1² = 1.39,

indicating that the number of surface imperfections for this type of chip is relatively concentrated around 1.1.

Example 22. Calculate the variance and standard deviation of the number of copying machines in use at a particular moment in Example 12.

Solution Since

E(X) = Σ_{x=0}^{4} x · pX(x) = (0 × 0.08) + (1 × 0.11) + (2 × 0.27) + (3 × 0.33) + (4 × 0.21) = 2.48,

indicating that, over a long period of time, on average 2.48 copying machines are in use at a particular moment, we have

Var(X) = E(X²) − [E(X)]² = Σ_{x=0}^{4} x² · pX(x) − 2.48²
       = (0² × 0.08) + (1² × 0.11) + (2² × 0.27) + (3² × 0.33) + (4² × 0.21) − 2.48²
       = 7.52 − 2.48² = 1.3696,

and

SD(X) = √Var(X) = √1.3696 ≈ 1.17.

Since one standard deviation from the expected-value forms the interval [1.31, 3.65], which contains only two values of X (i.e., 40% of the X-values), the majority of the X-values are spread over a relatively larger range around the expected-value. Therefore, the X-values are more variable.

Example 23. Calculate the variance and standard deviation of the number of warehouses called by the salesperson in Example 18.

Solution Since

E(X) = Σ_{x=1}^{4} x · pX(x) = (1 × 2/5) + (2 × 3/10) + (3 × 1/5) + (4 × 1/10) = 2,

indicating that, over a long period of time, on average the salesperson can order the product after two calls, we have

Var(X) = E(X²) − [E(X)]² = Σ_{x=1}^{4} x² · pX(x) − 2²
       = (1² × 2/5) + (2² × 3/10) + (3² × 1/5) + (4² × 1/10) − 2²
       = 5 − 4 = 1,
and

SD(X) = √Var(X) = √1 = 1.

Since one standard deviation from the expected-value forms the interval [1, 3], which contains three values of X (i.e., 75% of the X-values), the majority of the X-values are spread over a relatively smaller range around the expected-value. Therefore, the X-values are less variable.

Example 24. According to the following tabular form of the PMF of a DRV X :
x 2 4 6 8 10
pX (x) 0.1 0.25 0.3 0.25 0.1
Evaluate E(−3X + 7).

Solution Since E(−3X + 7) = −3E(X) + 7, first we need to evaluate E(X). From the given table, we get

E(X) = Σ x · pX(x) = (2 × 0.1) + (4 × 0.25) + (6 × 0.3) + (8 × 0.25) + (10 × 0.1) = 6.

Therefore,
E(−3X + 7) = −3E(X) + 7 = (−3)(6) + 7 = −11.

Example 25. If E(X) = 1, E(X 2 ) = 2, and the RV Z is dened as Z := (2X − 1)2 , evaluate E(Z).

Solution

E(Z) = E[(2X − 1)2 ] = E(4X 2 − 4X + 1)


= 4E(X 2 ) − 4E(X) + 1 = (4 × 2) − (4 × 1) + 1 = 5.

Example 26. Let X be an RV. If the mean of X is 0 and its variance is 9, evaluate the mean of the new RV Y dened as
Y := X(X + 1).

Solution Given are E(X) = 0 and Var(X) = 9, but required is E(Y) = E[X(X + 1)]. Hence, we have

E(Y) = E(X² + X) = E(X²) + E(X)
     = {Var(X) + [E(X)]²} + E(X)
     = (9 + 0²) + 0 = 9.

Example 27. For an RV X , if E(X) = 2 and E [X(X − 4)] = 5, evaluate its variance.

Solution Since
5 = E [X(X − 4)] = E(X 2 − 4X) = E(X 2 ) − 4E(X) which implies E(X 2 ) = 5 + (4 × 2) = 13,
we have
2
Var(X) = E(X 2 ) − [E(X)] = 13 − 22 = 9.

Example 28. For an RV X , if E(X) = 1.5 and E(X 2 ) = 6, evaluate Var(−2X + 5).

Solution Since
2
Var(X) = E(X 2 ) − [E(X)] = 6 − 1.52 = 3.75,
we have
Var(−2X + 5) = (−2)2 Var(X) = 4 × 3.75 = 15.


Example 29. For an RV X, if E(X) = 1 and Var(X) = 5, evaluate E[(X + 2)²].

Solution Since

Var(X) = E(X²) − [E(X)]²
     5 = E(X²) − 1²
     6 = E(X²),

we have

E[(X + 2)²] = E(X² + 4X + 4) = E(X²) + 4E(X) + 4
            = 6 + (4 × 1) + 4 = 14.

Example 30. Suppose that you are organizing a game where you charge players $2 to roll two dice and then you pay them
the difference in the scores. (a) What is the variance of your profit from each game? (b) If you are playing a game in which
you have positive expected winnings, would you prefer a small or a large variance in the winnings?

Solution Here the RV is my net gain after the game is played once. Letting X be the RV, it may be written as

X = 2 − |i1 − i2| (in $),

where i1 and i2 are the faced-up scores of the first and second dice, respectively; that is, i1, i2 = 1, 2, 3, 4, 5, 6. Hence, the
state space of X is {−3, −2, −1, 0, 1, 2}, and presenting the sample space of rolling two dice as

{(i1, i2) | i1, i2 = 1, 2, 3, 4, 5, 6},

we have

X = −3 ≡ {(1, 6), (6, 1)}
X = −2 ≡ {(1, 5), (5, 1), (2, 6), (6, 2)}
X = −1 ≡ {(1, 4), (4, 1), (2, 5), (5, 2), (3, 6), (6, 3)}
X = 0 ≡ {(1, 3), (3, 1), (2, 4), (4, 2), (3, 5), (5, 3), (4, 6), (6, 4)}
X = 1 ≡ {(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 5), (5, 4), (5, 6), (6, 5)}
X = 2 ≡ {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.

Now, assuming the dice are fair, we get

P (X = −3) = 1/18,  P (X = −2) = 1/9,  P (X = −1) = 1/6,  P (X = 0) = 2/9,  P (X = 1) = 5/18,  P (X = 2) = 1/6,

from which the PMF of X is

pX (x) = 1/18  if x = −3
       = 1/9   if x = −2
       = 1/6   if x = −1
       = 2/9   if x = 0
       = 5/18  if x = 1
       = 1/6   if x = 2
       = 0     otherwise.

Since

E(X) = Σ_{x=−3}^{2} x · pX(x) = (−3 × 1/18) + (−2 × 1/9) + (−1 × 1/6) + (0 × 2/9) + (1 × 5/18) + (2 × 1/6) = 1/18 ≈ $0.06,

indicating that, over a long period of time, my profit would be about 6 cents per game, on average,


(a) the variance is

Var(X) = E(X²) − [E(X)]² = Σ_{x=−3}^{2} x² · pX(x) − (1/18)²
       = 37/18 − 1/324 = 665/324 ≈ 2.05.

(b) As the organizer, for this expected winnings I'd prefer a smaller variance. But as the expected winnings becomes more
positive, I'd prefer a larger variance.
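The whole computation can be double-checked by enumerating the 36 equally likely rolls:

```python
from fractions import Fraction
from itertools import product

# Organizer's net gain per game: collect $2, pay out the score difference.
pmf = {}
for i1, i2 in product(range(1, 7), repeat=2):
    x = 2 - abs(i1 - i2)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())                 # 1/18
var = sum(x * x * p for x, p in pmf.items()) - mean ** 2  # 665/324
```

This reproduces E(X) = 1/18 and Var(X) = 665/324 exactly.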

Example 31. In a game a player either loses $1 with a probability 0.25, wins $1 with a probability 0.4, or wins $4 with a
probability 0.35. What are the expectation and the standard deviation of the winnings?

Solution Letting X be the RV representing the player's net winnings from the game, the tabular form of its PMF is

x        −$1   $1   $4
pX (x)   0.25  0.4  0.35

from which the expected winnings are

E(X) = (−1 × 0.25) + (1 × 0.4) + (4 × 0.35) = $1.55,

indicating that, over a long period of time, the player's net winnings are $1.55 per game, on average (that is, it's a winning game
for the player!), and the standard deviation of the player's net winnings is

SD(X) = √Var(X) = √(E(X²) − [E(X)]²)
      = √(((−1)² × 0.25) + (1² × 0.4) + (4² × 0.35) − 1.55²) = √(6.25 − 1.55²)
      ≈ $1.96,

indicating that the net winnings are less variable around the expected winnings.

Example 32. By investing in a particular stock, a person can make a profit in one year of $4000 with probability 0.3 or take
a loss of $1000 with probability 0.7. What is this person's expected gain?

Solution Let the RV X be the person's profit (in dollars) in one year after investing in that stock. The tabular presentation
of its PMF is

x        4,000  −1,000
pX (x)   0.3    0.7

Therefore, the person's expected gain is

E(X) = (4,000 × 0.3) + (−1,000 × 0.7) = $500,

indicating that, over a long period of time, the person's net profit is $500 per year, on average.

Example 33. Suppose that an antique jewelry dealer is interested in purchasing a gold necklace for which the probabilities
are 0.2, 0.3, 0.4, and 0.1, respectively, that she will be able to sell it for a profit of $250, sell it for a profit of $150, break
even, or sell it for a loss of $150. What is her expected profit?

Solution Let the RV X be the net profit (in dollars) of selling the purchased necklace. The tabular presentation of its PMF
is


x        −150  0    150  250
pX (x)   0.1   0.4  0.3  0.2

from which the expected net profit is

E(X) = (−150 × 0.1) + (0 × 0.4) + (150 × 0.3) + (250 × 0.2) = $80,

which suggests it's a good deal.


Example 34. A private pilot wishes to insure an airplane for $200,000. The insurance company estimates that a total loss
will occur with probability 0.001, a 50% loss with probability 0.01, and a 25% loss with probability 0.2. Ignoring all other
partial losses, what premium should the insurance company charge each year to realize an average profit of $500?

Solution Let R be the fixed premium to be charged per year, and let the RV X be the company's yearly net profit (in dollars)
from selling the insurance. Hence, the values of X are R − 200,000, R − 100,000, R − 50,000, and R. According to the given
info, the tabular presentation of the PMF of X is

x        R − 200,000   R − 100,000   R − 50,000   R
pX (x)   0.001         0.01          0.2          0.789

To determine R, requiring E(X) to be $500, we have

E(X) = [(R − 200,000) × 0.001] + [(R − 100,000) × 0.01] + [(R − 50,000) × 0.2] + (R × 0.789)
 500 = R − 11,200
   R = $11,700.
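Since the expectation simplifies to R minus the expected claim payout, the premium can be computed directly; a sketch:

```python
# Expected claim payout: each possible loss times its probability.
claims = [(200_000, 0.001), (100_000, 0.01), (50_000, 0.2)]
expected_payout = sum(loss * prob for loss, prob in claims)  # 11,200

target_profit = 500
premium = target_profit + expected_payout                    # 11,700
```

This is just the rearrangement E(X) = R − (expected payout) = 500 solved for R.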

Example 35. An RV X has mean µX = 10 and variance σ²X = 4. Using Chebyshev's theorem, approximate (a)
P (|X − 10| ≥ 4), (b) P (|X − 10| < 4), (c) P (4 < X < 16), (d) the value of the constant c such that P (|X − 10| ≥ c) ≤ 0.05.

Solution Note that σX = √4 = 2.

(a) Based on the form of Chebyshev's Inequality in (3.10), we set

kσX = 4
 2k = 4
  k = 2.

Therefore,

P (|X − µX| ≥ kσX) ≤ 1/k²
P (|X − 10| ≥ 4) ≤ 1/2² = 0.25.

(b)

P (|X − 10| < 4) = 1 − P (|X − 10| ≥ 4) ≥ 1 − 0.25 = 0.75.

(c) Since 4 < µX = 10 < 16, and µX − 4 = 10 − 4 = 6 and 16 − µX = 16 − 10 = 6, the points 4 and 16 are equidistant from
µX = 10. Hence, based on the form of Chebyshev's Inequality in (3.11), setting d = 6, we have

P (µX − d ≤ X ≤ µX + d) ≥ 1 − σ²X/d²
P (4 ≤ X ≤ 16) ≥ 1 − 4/6² = 8/9.


(d) Based on the form of Chebyshev's Inequality in (3.10), setting kσX = c and 1/k² = 0.05 implies

c = kσX = 2√(1/0.05) = 2√20 = 4√5.

Example 36. The time taken to serve a customer at a fast-food restaurant has a mean of 75.0 seconds and a standard
deviation of 7.3 seconds. Use Chebyshev's inequality to calculate time intervals that have (at least) (a) 75% and (b) 89%
probabilities of containing a particular service time.

Solution Let X be the RV representing the time (in seconds) taken to serve a customer at that fast-food restaurant,
with µX = 75 and σX = 7.3.

(a) Based on the form of Chebyshev's Inequality in (3.11), we set

1 − σ²X/d² = 0.75
1 − 7.3²/d² = 0.75
0.25 = 7.3²/d²
d² = 7.3²/0.25 = 213.16
d = 14.6.

Therefore, at least 75% of service times are between µX − d = 75 − 14.6 = 60.4 seconds and µX + d = 75 + 14.6 = 89.6 seconds.

(b) Based on the form of Chebyshev's Inequality in (3.11), we set

1 − σ²X/d² = 0.89
1 − 7.3²/d² = 0.89
0.11 = 7.3²/d²
d² = 7.3²/0.11
d = √(7.3²/0.11) ≈ 22.01.

Therefore, at least 89% of service times are between µX − d ≈ 75 − 22.01 ≈ 53 seconds and µX + d ≈ 75 + 22.01 ≈ 97 seconds.

Example 37. A machine produces iron bars whose lengths have a mean of 110.8 cm and a standard deviation of 0.5 cm. Use
Chebyshev's inequality to obtain a lower bound on the probability that an iron bar chosen at random has a length between
109.55 cm and 112.05 cm.

Solution Let X be the RV representing the length (in centimeters) of a randomly chosen iron bar produced by the
machine, with µX = 110.8 and σX = 0.5.
Since 109.55 < µX = 110.8 < 112.05, and µX − 109.55 = 110.8 − 109.55 = 1.25 and 112.05 − µX = 112.05 − 110.8 = 1.25,

the given lengths are equidistant from µX = 110.8.

Based on the form of Chebyshev's Inequality in (3.11), setting d = 1.25, we have

P (µX − d ≤ X ≤ µX + d) ≥ 1 − σ²X/d²
P (109.55 ≤ X ≤ 112.05) ≥ 1 − 0.5²/1.25² = 0.84.

Therefore, the required lower bound is 0.84.

Some Common Discrete Probability Distributions (DPDs)

- Goal of This Section: Describing the PMFs of some common DRVs, and their expectations and variances, by mathematical
formulas which depend on certain parameter values.

- Motivation: In some (statistical) populations, the characteristic under study is (or is approximated by) one of these
common DRVs. Therefore, knowing the PMF of that standard DRV helps us learn about the population by drawing
a sample from the population and analyzing the sample data.

- In what follows, keep in mind that the phrase probability distribution of DRV X refers to the collection

{(x, pX (x)) | x ∈ RX } ,

where pX and RX are PMF and state space of the DRV X , respectively; that is, simply it refers to all values of the
DRV X with their assigned probabilities.

The Bernoulli Probability Distribution (Bernoulli PD)

Bernoulli Trial
Any random experiment having only two outcomes (according to some experimental objective) is called a Bernoulli
trial.
+ Notice that, in a Bernoulli trial,

- the outcome of our interest is called success (and is usually denoted by S ) and the other one is called
failure (and is usually denoted by F .) Therefore, the sample space of a Bernoulli trial may be presented
as {S, F }.
- if the probability of success S is p, then the probability of failure is 1 − p.


Bernoulli RV
For a Bernoulli trial, an RV X which assigns 1 to success and 0 to failure; that is, X(S) = 1 and X(F ) = 0, is called
the Bernoulli RV.
+ Notice that,
- the Bernoulli RV is a DRV whose state space is {0, 1}.
- the Bernoulli RV gives the number of successes in a Bernoulli trial.
- if the probability of success in a Bernoulli trial, whose Bernoulli RV denoted by X , is p, then

P (X = 1) = p and P (X = 0) = 1 − p.

+ Since the probabilities assigned to the values of a Bernoulli RV are determined completely by the probability
of success, the probability of success in a Bernoulli trial is called the parameter of the Bernoulli distribution.
Accordingly, instead of stating that an "RV X has the Bernoulli distribution with parameter p," the following notation
is used:
X ∼ Ber(p).

Probabilistic Properties of a Bernoulli RV


If X ∼ Ber(p), then
• the PMF of X is denoted and given by

  Ber(x; p) := P (X = x) = { p^x (1 − p)^{1−x}   if x = 0, 1
                           { 0                   otherwise

• the CDF of X is given by

  FX (x) = P (X ≤ x) = { 0       if x < 0
                        { 1 − p   if 0 ≤ x < 1
                        { 1       if x ≥ 1

• the mean/expected-value of X is given by


E(X) = p.

• the variance of X is given by


Var(X) = p(1 − p).
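The boxed formulas above can be checked numerically. The sketch below (plain Python; the function names are my own, not from the note) computes the PMF directly and recovers the mean and variance from the definitions E(X) = Σ x·pX(x) and Var(X) = E(X²) − E(X)²:

```python
def ber_pmf(x, p):
    """PMF of X ~ Ber(p): p at x = 1, 1 - p at x = 0, and 0 elsewhere."""
    return p if x == 1 else (1 - p if x == 0 else 0.0)

def ber_mean(p):
    # E(X) = sum of x * p_X(x) over the state space {0, 1}, which reduces to p
    return sum(x * ber_pmf(x, p) for x in (0, 1))

def ber_var(p):
    # Var(X) = E(X^2) - E(X)^2; since x^2 = x on {0, 1}, this equals p - p^2 = p(1 - p)
    return sum(x**2 * ber_pmf(x, p) for x in (0, 1)) - ber_mean(p)**2
```

For p = 0.8, for instance, these return E(X) = 0.8 and Var(X) = 0.8 · 0.2 = 0.16, matching E(X) = p and Var(X) = p(1 − p).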

Example 38. Two fair dice are rolled. Let X = 1 if the dice come up doubles and let X = 0 otherwise. Let Y = 1 if the
sum is 6, and let Y = 0 otherwise. Let Z = 1 if the dice come up both doubles and with a sum of 6 (that is, double 3), and
let Z = 0 otherwise. Determine the probabilistic properties of the RVs X , Y , and Z .

Solution Here, for the random experiment of rolling two fair dice, three experimental objectives are defined, each of which
makes the random experiment a Bernoulli trial:
• If the experimental objective is to observe whether the faced-up numbers on the two dice are the same, then success is
{(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}, so p = 6/36 = 1/6. Hence, X ∼ Ber(1/6). Therefore,

  pX (x) = { (1/6)^x (5/6)^{1−x}   if x = 0, 1
           { 0                     otherwise

  FX (x) = { 0     if x < 0
           { 5/6   if 0 ≤ x < 1
           { 1     if x ≥ 1

  E(X) = 1/6,   Var(X) = 5/36.

• If the experimental objective is to observe whether the sum of the faced-up numbers on the two dice is 6, then success is
{(1, 5), (5, 1), (2, 4), (4, 2), (3, 3)}, so p = 5/36. Hence, Y ∼ Ber(5/36). Therefore,

  pY (y) = { (5/36)^y (31/36)^{1−y}   if y = 0, 1
           { 0                        otherwise

  FY (y) = { 0       if y < 0
           { 31/36   if 0 ≤ y < 1
           { 1       if y ≥ 1

  E(Y ) = 5/36,   Var(Y ) = (5/36)(31/36) = 155/1296.

• If the experimental objective is to observe whether the faced-up numbers on the two dice are the same and their sum is 6,
then success is {(3, 3)}, so p = 1/36. Hence, Z ∼ Ber(1/36). Therefore,

  pZ (z) = { (1/36)^z (35/36)^{1−z}   if z = 0, 1
           { 0                        otherwise

  FZ (z) = { 0       if z < 0
           { 35/36   if 0 ≤ z < 1
           { 1       if z ≥ 1

  E(Z) = 1/36,   Var(Z) = 35/36² = 35/1296.
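The success counts in Example 38 can be recovered by brute-force enumeration of the 36 equally likely outcomes. The sketch below (plain Python; the variable names are my own) counts the favorable outcomes for each of the three objectives:

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

doubles      = [o for o in outcomes if o[0] == o[1]]      # success event for X
sum_is_six   = [o for o in outcomes if o[0] + o[1] == 6]  # success event for Y
double_three = [o for o in outcomes if o == (3, 3)]       # success event for Z

# Success probability for X: number of doubles over total outcomes
p_X = len(doubles) / len(outcomes)  # 6/36 = 1/6
```

Counting gives 6, 5, and 1 favorable outcomes respectively, so the three success probabilities are 6/36, 5/36, and 1/36.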

The Binomial Probability Distribution (Binomial PD)

Binomial Experiment
Any random experiment formed by repeating one Bernoulli trial a finite number of times and independentlya is
called a Binomial experiment.
+ Notice that, if one Bernoulli trial is (independently) repeated n times, then each outcome of the resulting Binomial
experiment may be presented as a sequence of k S 's and (n − k) F 's, where k = 0, 1, 2, 3, . . . , n. Hence, the sample
space of the resulting Binomial experiment has 2^n outcomes.
a Repeating one Bernoulli trial a finite number of times is said to be independent if occurrences of success and failure across the repetitions
are unrelated.

Binomial RV
On the sample space of a Binomial experiment, an RV which assigns the number of successes to each of its outcomes
is called the Binomial RV.
+ Notice that, if one (Bernoulli) trial with probability of success p is repeated n times independently, then

- the Binomial RV of the resulting (Binomial) experiment, say X , is a DRV whose state space is
{0, 1, 2, 3, . . . , n}, where X = k identifies the event consisting of the experimental outcomes that are sequences of
k successes and (n − k) failures. Since there are n!/[(k!)(n − k)!] of such outcomes, which are mutually
exclusive, we have

  P (X = k) = P (observing k S 's and (n − k) F 's)
            = [n!/(k!(n − k)!)] · P (S) · · · · · P (S) · P (F ) · · · · · P (F )    (k factors of P (S) = p and (n − k) factors of P (F ) = 1 − p)
            = [n!/(k!(n − k)!)] · p^k (1 − p)^{n−k}
            = C(n, k) · p^k (1 − p)^{n−k},

where C(n, k) := n!/[k!(n − k)!] denotes the binomial coefficient.

+ Since the assigned probabilities to the values of a Binomial RV are determined completely by the probability of
success of each trial and the number of trials, the probability of success p of each trial and the number of trials n
in a Binomial experiment are called the parameters of the Binomial distribution. For this reason, instead of stating
that an "RV X has the Binomial distribution with trial probability of success p and number of trials n", the
following notation is used
X ∼ Bin(n, p).


Probabilistic Properties of a Binomial RV


If X ∼ Bin(n, p), then
• the PMF of X is denoted and given by

  Bin(x; n, p) := P (X = x) = { C(n, x) · p^x (1 − p)^{n−x}   if x = 0, 1, 2, . . . , n
                              { 0                             otherwise

+ Notice that Bin(x; n, p) gives the probability of observing x successes in n trials (each with success probability
p).
• the CDF of X is given by

  FX (x) = P (X ≤ x) = { 0                            if x < 0
                        { Σ_{i=0}^{k} Bin(i; n, p)    if k ≤ x < k + 1, for k = 0, 1, 2, . . . , n − 1
                        { 1                            if x ≥ n

• the mean/expected-value of X is given by


E(X) = np.

• the variance of X is given by


Var(X) = np(1 − p).

+ In X ∼ Bin(n, p), if p = 0.5, then

  Bin(n/2 − x; n, p) = Bin(n/2 + x; n, p),   for x < n/2;

that is, the Binomial PD is symmetric w.r.t. the (hypothetical) vertical line through n/2 (i.e., its mean/expected-value)
on the horizontal axis.
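A minimal sketch of the Binomial PMF using Python's math.comb (the function name is my own choice). It verifies that the probabilities sum to 1, that the mean matches np, and illustrates the p = 0.5 symmetry noted above:

```python
from math import comb

def bin_pmf(x, n, p):
    """Bin(x; n, p): probability of exactly x successes in n independent trials."""
    if 0 <= x <= n:
        return comb(n, x) * p**x * (1 - p)**(n - x)
    return 0.0

n, p = 10, 0.5
total = sum(bin_pmf(x, n, p) for x in range(n + 1))      # probabilities sum to 1
mean  = sum(x * bin_pmf(x, n, p) for x in range(n + 1))  # equals n * p = 5
# Symmetry about n/2 when p = 0.5: Bin(n/2 - x; n, p) == Bin(n/2 + x; n, p)
symmetric = abs(bin_pmf(3, n, p) - bin_pmf(7, n, p)) < 1e-15
```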

On n-Sampling, with Replacement, from a Binomial Population

Some Terminologies

- Binomial Population: Any (statistical) population which can be divided into two groups based on a (definite)
characteristic is said to be a Binomial population.
- In a Binomial population, the group of interest is called the success subpopulation, and the other one
is called the failure subpopulation. Each member of the success subpopulation is called a success.
- In a Binomial population of size N , if the size of the success subpopulation is n_S , then n_S /N is called the
(population) success proportion and is denoted by p; that is, p := n_S /N .
- n-Sampling with replacement from a Binomial Population: The experiment of randomly selecting a
sample of size n, with replacement, from a Binomial population, and observing the number of successes in the
sample.


Two Important Results:


If X is the RV that gives the number of successes in each outcome of an n-sampling with replacement from a Binomial
population with population proportion p, then
+
  X ∼ Bin(n, p),   E(X) = np,   Var(X) = np(1 − p).

+ Letting P̂ := X/n, P̂ is an RV which gives the sample success proportion for each selected sample of size n, and

  E(P̂) = E(X)/n = np/n = p,   Var(P̂) = Var(X)/n² = np(1 − p)/n² = p(1 − p)/n.

Moreover, the variance of P̂ decreases as the sample size n increases, so that there is a tendency for P̂ to become
closer and closer to the population proportion p as the sample size n increases.
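The claim that Var(P̂) = p(1 − p)/n shrinks as n grows can be seen directly. This sketch (the helper name is my own) tabulates the variance for increasing sample sizes:

```python
def var_phat(n, p):
    # Var(P-hat) = Var(X)/n^2 = n*p*(1-p)/n^2 = p*(1-p)/n
    return p * (1 - p) / n

p = 0.3
variances = [var_phat(n, p) for n in (10, 100, 1000)]
# Each tenfold increase in n divides Var(P-hat) by exactly 10.
```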

Supplementary Examples

Example 39. Suppose that X ∼ Bin(7, 0.8). Calculate:

(a) P (X = 4)   (b) P (X ≠ 2)   (c) P (X ≤ 3)   (d) P (X ≥ 6)

(e) P (1 < X ≤ 5)   (f ) P (X > 5|X > 1)   (g) E(X)   (h) Var(X)

Solution Here, the RV X has the Binomial distribution with parameters n = 7 and p = 0.8, with state space
{0, 1, 2, 3, 4, 5, 6, 7}:

(a)

  P (X = 4) = Bin(4; 7, 0.8) = C(7, 4) · 0.8^4 · 0.2^3 = 35 · 0.8^4 · 0.2^3 ≈ 0.1147.

(b)

  P (X ≠ 2) = 1 − P (X = 2) = 1 − Bin(2; 7, 0.8) = 1 − C(7, 2) · 0.8^2 · 0.2^5 ≈ 0.9957.

(c)

  P (X ≤ 3) = FX (3) = Σ_{i=0}^{3} Bin(i; 7, 0.8) = Σ_{i=0}^{3} C(7, i) · 0.8^i · 0.2^{7−i}
            = C(7, 0) · 0.8^0 · 0.2^7 + C(7, 1) · 0.8^1 · 0.2^6 + C(7, 2) · 0.8^2 · 0.2^5 + C(7, 3) · 0.8^3 · 0.2^4
            ≈ 0.0333.

(d)

  P (X ≥ 6) = P (X = 6) + P (X = 7) = Bin(6; 7, 0.8) + Bin(7; 7, 0.8)
            = C(7, 6) · 0.8^6 · 0.2^1 + C(7, 7) · 0.8^7 · 0.2^0
            ≈ 0.5767.


(e)

  P (1 < X ≤ 5) = P (X = 2) + P (X = 3) + P (X = 4) + P (X = 5)
                = Bin(2; 7, 0.8) + Bin(3; 7, 0.8) + Bin(4; 7, 0.8) + Bin(5; 7, 0.8)
                = C(7, 2) · 0.8^2 · 0.2^5 + C(7, 3) · 0.8^3 · 0.2^4 + C(7, 4) · 0.8^4 · 0.2^3 + C(7, 5) · 0.8^5 · 0.2^2
                ≈ 0.4229.

(f)

  P (X > 5|X > 1) = P (X > 5 & X > 1)/P (X > 1) = P (X > 5)/P (X > 1)
                  = [P (X = 6) + P (X = 7)]/[1 − P (X = 1) − P (X = 0)]
                  = [Bin(6; 7, 0.8) + Bin(7; 7, 0.8)]/[1 − Bin(1; 7, 0.8) − Bin(0; 7, 0.8)]
                  = [C(7, 6) · 0.8^6 · 0.2^1 + C(7, 7) · 0.8^7 · 0.2^0]/[1 − C(7, 1) · 0.8^1 · 0.2^6 − C(7, 0) · 0.8^0 · 0.2^7]
                  ≈ 0.5769.

(g)
E(X) = 7 · 0.8 = 5.6.

(h)
Var(X) = 7 · 0.8 · 0.2 = 1.12.
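The hand computations in Example 39 can be cross-checked with a few lines of Python using math.comb (the helper name is my own):

```python
from math import comb

def bin_pmf(x, n, p):
    # Bin(x; n, p) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 7, 0.8
p_a = bin_pmf(4, n, p)                               # part (a): P(X = 4)
p_c = sum(bin_pmf(i, n, p) for i in range(4))        # part (c): P(X <= 3)
p_d = bin_pmf(6, n, p) + bin_pmf(7, n, p)            # part (d): P(X >= 6)
mean, var = n * p, n * p * (1 - p)                   # parts (g) and (h)
```

Rounding to four decimal places reproduces 0.1147, 0.0333, and 0.5767, together with E(X) = 5.6 and Var(X) = 1.12.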

Example 40. An archer hits a bull's-eye with a probability of 0.09, and the results of different attempts can be taken to
be independent of each other. If the archer shoots nine arrows, calculate the probability that: (a) Exactly two arrows score
bull's-eyes. (b) At least two arrows score bull's-eyes. (c) What is the expected number of bull's-eyes scored?

Solution Shooting one arrow at the bull's-eye is a Bernoulli trial with success probability p = 0.09. Since it's assumed
the results of different shot arrows are independent of each other, shooting nine arrows at the bull's-eye is a Binomial
experiment. Letting X be the number of arrows, out of 9 attempts, that score a bull's-eye, we have

  X ∼ Bin(9, 0.09) and RX = {0, 1, 2, 3, . . . , 9}.

(a)

  P (X = 2) = Bin(2; 9, 0.09) = C(9, 2) · 0.09^2 · 0.91^7 ≈ 0.1507.

(b)

  P (X ≥ 2) = 1 − P (X < 2) = 1 − P (X = 1) − P (X = 0)
            = 1 − C(9, 1) · 0.09^1 · 0.91^8 − C(9, 0) · 0.09^0 · 0.91^9 ≈ 0.1912.

(c)

  E(X) = 9 · 0.09 = 0.81,

indicating that, on average, 0.81 arrows in every set of 9 attempts score a bull's-eye.


Example 41. A fair die is rolled eight times. Calculate the probability that there are: (a) Exactly five even numbers (b)
Exactly one 6 (c) No 4s.

Solution

(a) Here, since rolling a fair die to get an even number is a Bernoulli trial with success probability p = 0.5, and assuming the
faced-up numbers in the 8 rolls are independent of each other, rolling a fair die 8 times to observe whether an even number
is faced up in each roll is a Binomial experiment. Letting X be the number of even faced-up numbers in 8 rolls of a fair
die, we have

  X ∼ Bin(8, 0.5) and RX = {0, 1, 2, 3, . . . , 8}.

Therefore, the required probability is

  P (X = 5) = Bin(5; 8, 0.5) = C(8, 5) · 0.5^5 · 0.5^3 ≈ 0.2188.

(b) Here, since rolling a fair die to get the number 6 is a Bernoulli trial with success probability p = 1/6, and assuming the
faced-up numbers in the 8 rolls are independent of each other, rolling a fair die 8 times to observe whether the number 6 is
faced up in each roll is a Binomial experiment. Letting Y be the number of times 6 is faced up in 8 rolls of a fair die, we have

  Y ∼ Bin(8, 1/6) and RY = {0, 1, 2, 3, . . . , 8}.

Therefore, the required probability is

  P (Y = 1) = Bin(1; 8, 1/6) = C(8, 1) · (1/6)^1 · (5/6)^7 ≈ 0.3721.

(c) Here, since rolling a fair die to get the number 4 is a Bernoulli trial with success probability p = 1/6, and assuming the
faced-up numbers in the 8 rolls are independent of each other, rolling a fair die 8 times to observe whether the number 4 is
faced up in each roll is a Binomial experiment. Letting Z be the number of times 4 is faced up in 8 rolls of a fair die, we have

  Z ∼ Bin(8, 1/6) and RZ = {0, 1, 2, 3, . . . , 8}.

Therefore, the required probability is

  P (Z = 0) = Bin(0; 8, 1/6) = C(8, 0) · (1/6)^0 · (5/6)^8 ≈ 0.2326.

Example 42. A multiple-choice quiz consists of ten questions, each with five possible answers of which only one is correct.
A student passes the quiz if seven or more correct answers are obtained. (a) What is the probability that a student who
guesses blindly at all of the questions will pass the quiz? (b) What is the probability of passing the quiz if, on each question,
a student can eliminate three incorrect answers and then guesses between the remaining two?

Solution

(a) Here, since randomly choosing one choice out of 5 to observe whether the correct choice is chosen is a Bernoulli trial with
success probability p = 1/5, and assuming the chosen answers (in a 5-choice quiz consisting of 10 questions) are independent of
each other, randomly choosing one choice for each of ten 5-choice questions and observing the number of correctly answered
questions is a Binomial experiment. Letting X be the number of correctly answered questions in such an experiment,
we have

  X ∼ Bin(10, 1/5) and RX = {0, 1, 2, 3, . . . , 10}.

Therefore, the required probability is

  P (X ≥ 7) = P (X = 7) + P (X = 8) + P (X = 9) + P (X = 10)
            = C(10, 7) · (1/5)^7 (4/5)^3 + C(10, 8) · (1/5)^8 (4/5)^2 + C(10, 9) · (1/5)^9 (4/5)^1 + C(10, 10) · (1/5)^10 (4/5)^0
            ≈ 0.00086.


(b) Here, since three choices out of 5 are removed, we may assume each question in the quiz has only two choices, so the
success probability per question is 1/2. By what's mentioned in the solution of part (a), letting Y be the number of correctly
answered questions in a 2-choice quiz consisting of ten questions (where a choice for each question is chosen randomly),
we have

  Y ∼ Bin(10, 1/2) and RY = {0, 1, 2, 3, . . . , 10}.

Therefore, the required probability is

  P (Y ≥ 7) = P (Y = 7) + P (Y = 8) + P (Y = 9) + P (Y = 10)
            = C(10, 7) · 0.5^7 · 0.5^3 + C(10, 8) · 0.5^8 · 0.5^2 + C(10, 9) · 0.5^9 · 0.5^1 + C(10, 10) · 0.5^10 · 0.5^0
            ≈ 0.1719.

Example 43. A flu virus hits a company employing 180 people. Independently of the other employees, there is a probability
of p = 0.35 that each person needs to take sick leave. (a) What are the expectation and variance of the proportion of the
workforce who need to take sick leave? (b) In general, what value of the sick rate p produces the largest variance for this
proportion?

Solution Letting X be the number of employees who need to take sick leave, the proportion of employees who need to
take sick leave is X/180.
Knowing that X ∼ Bin(180, 0.35), E(X) = 180 · 0.35 = 63 and Var(X) = 180 · 0.35 · 0.65 = 40.95. Therefore,
(a)

  E(X/180) = E(X)/180 = 0.35   and   Var(X/180) = Var(X)/180² ≈ 0.0013.

(b) If the sick rate is p, then X ∼ Bin(180, p) and

  Var(X/180) = Var(X)/180² = p(1 − p)/180.

Hence, since the variance of X/180 is maximized when p(1 − p) is maximized, p = 0.5 maximizes the variance of the
proportion of the workforce who need to take sick leave.
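Part (b)'s claim that p = 0.5 maximizes p(1 − p)/180 can be confirmed with a quick grid search (a sketch of my own, not part of the note):

```python
def var_proportion(p, n=180):
    # Variance of the sick-leave proportion X/n when X ~ Bin(n, p)
    return p * (1 - p) / n

# Search p over a fine grid and pick the maximizer of the variance.
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=var_proportion)
```

The grid search returns p = 0.5, consistent with the fact that p(1 − p) is a concave parabola peaking at p = 1/2.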

Example 44. A company receives 60% of its orders over the Internet. Within a collection of 18 independently placed orders,
what is the probability that (a) between eight and ten of the orders are received over the Internet? (b) no more than four
of the orders are received over the Internet?

Solution Letting X be the number of orders received over the Internet out of 18 independently placed orders, each with
success probability 0.6, we have

  X ∼ Bin(18, 0.6) and RX = {0, 1, 2, 3, . . . , 18}.

Therefore,
(a)

  P (8 ≤ X ≤ 10) = P (X = 8) + P (X = 9) + P (X = 10)
                 = C(18, 8) · 0.6^8 · 0.4^10 + C(18, 9) · 0.6^9 · 0.4^9 + C(18, 10) · 0.6^10 · 0.4^8
                 ≈ 0.3789.


(b)

  P (X ≤ 4) = P (X = 4) + P (X = 3) + P (X = 2) + P (X = 1) + P (X = 0)
            = C(18, 4) · 0.6^4 · 0.4^14 + C(18, 3) · 0.6^3 · 0.4^15 + C(18, 2) · 0.6^2 · 0.4^16 + C(18, 1) · 0.6^1 · 0.4^17 + C(18, 0) · 0.6^0 · 0.4^18
            ≈ 0.0013.

The Geometric Probability Distribution (Geometric PD)

Geometric Experiment
Any random experiment formed by independently repeating one Bernoulli trial until the first success occurs is called
a Geometric experiment.
+ Notice that the sample space of a Geometric experiment may be presented as

  {S, F S, F F S, F F F S, F F F F S, . . . }.

Geometric RV
On the sample space of a Geometric experiment, an RV which assigns to each of its outcomes the number of times
the Bernoulli trial is performeda is called the Geometric RV.
+ Notice that, if one (Bernoulli) trial with probability of success p is repeated independently until the first success
occurs,
- the Geometric RV of the resulting (Geometric) experiment, say X , is a DRV whose state space is {1, 2, 3, . . .},
where X = k identifies the outcome with k − 1 failures (which occur in the first k − 1 trials) and 1 success
(which occurs in the last trial). Hence, we have

  P (X = k) = P (observing F F . . . F S, with (k − 1) F 's)
            = P (F ) · · · · · P (F ) · P (S)    ((k − 1) factors of P (F ) = 1 − p and one factor of P (S) = p)
            = (1 − p)^{k−1} p.

+ Since the assigned probabilities to the values of a Geometric RV are determined completely by the probability of
success of each trial, the probability of success p of each trial in a Geometric experiment is called the parameter
of the Geometric distribution. For this reason, instead of stating that an "RV X has the Geometric distribution with
trial probability of success p", the following notation is used

  X ∼ G(p).
a That is, the number of times the Bernoulli trial is required to be performed until the first success occurs.


Probabilistic Properties of a Geometric RV


If X ∼ G(p), then
• the PMF of X is denoted and given by

  G(x; p) := P (X = x) = { (1 − p)^{x−1} p   if x = 1, 2, 3, . . .
                         { 0                 otherwise

+ Notice that G(x; p) gives the probability of observing the first success (with success probability p) in the xth
trial.
• the CDF of X is given by

  FX (x) = P (X ≤ x) = { 0                 if x < 1
                        { 1 − (1 − p)^k    if k ≤ x < k + 1, for k = 1, 2, 3, . . .

• the mean/expected-value of X is given by

  E(X) = 1/p.

• the variance of X is given by

  Var(X) = (1 − p)/p².
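These Geometric formulas can be sanity-checked numerically. The sketch below (helper names my own) uses the closed-form CDF 1 − (1 − p)^k and approximates E(X) = 1/p by a truncated series:

```python
def geo_pmf(x, p):
    """G(x; p) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p)**(x - 1) * p if x >= 1 else 0.0

def geo_cdf(k, p):
    # P(X <= k) = 1 - (1 - p)^k for integer k >= 1
    return 1 - (1 - p)**k if k >= 1 else 0.0

p = 0.7
# Truncating at 200 terms is harmless here: the tail (1 - p)^199 is negligible.
approx_mean = sum(x * geo_pmf(x, p) for x in range(1, 200))  # ~ 1/p
```

With p = 0.7 the CDF at k = 5 is 1 − 0.3⁵ = 0.99757, matching Example 45(d) below, and the truncated mean agrees with 1/0.7 ≈ 1.43.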

Supplementary Examples

Example 45. If X ∼ G(0.7), calculate


(a) P (X = 4) (b) P (X = 0) (c) P (X ≤ 1) (d) P (X ≤ 5) (e) P (X ≥ 8) (f ) P (X ≤ 8|X ≥ 5)

Solution Notice that the state space of X is {1, 2, 3, . . .}.


(a)

  P (X = 4) = G(4; 0.7) = 0.3^3 · 0.7 = 0.0189.

(b) P (X = 0) = G(0; 0.7) = 0, since 0 is not in the state space of X .

(c) P (X ≤ 1) = P (X = 1) = G(1; 0.7) = 0.7.

(d) P (X ≤ 5) = 1 − 0.3^5 = 0.99757.

(e)

  P (X ≥ 8) = 1 − P (X < 8) = 1 − P (X ≤ 7)
            = 1 − (1 − 0.3^7) = 0.3^7 ≈ 0.00022.

(f)

  P (X ≤ 8|X ≥ 5) = P (X ≤ 8 & X ≥ 5)/P (X ≥ 5) = P (5 ≤ X ≤ 8)/P (X ≥ 5)
                  = [P (X ≤ 8) − P (X ≤ 4)]/[1 − P (X ≤ 4)]
                  = [(1 − 0.3^8) − (1 − 0.3^4)]/[1 − (1 − 0.3^4)]
                  = (0.3^4 − 0.3^8)/0.3^4 = 1 − 0.3^4 ≈ 0.9919.


Example 46. Refer to Example 40, (a) If the archer shoots a series of arrows, what is the probability that the first bull's-eye
is scored with the fourth arrow? (b) What is the expected number of arrows shot before the first bull's-eye is scored?

Solution Here, shooting a series of arrows until the first bull's-eye is scored (where it's assumed the results of different
attempts can be taken to be independent of each other, and the probability of hitting it is 0.09) is a Geometric experiment.
Letting X be the Geometric RV for this experiment; that is, the RV X gives the number of shot arrows/attempts until the
first bull's-eye is scored, we have X ∼ G(0.09). Therefore,
(a) P (X = 4) = G(4; 0.09) = 0.91^3 · 0.09 ≈ 0.0678.
(b) E(X) = 1/0.09 ≈ 11.1, which indicates that, on average, the first bull's-eye is scored at about the 11th attempt per series
of shot arrows; accordingly, the expected number of arrows shot before the first bull's-eye is scored is E(X − 1) = E(X) − 1 ≈ 10.1.

Example 47. Cards are chosen randomly from a pack of cards with replacement. (a) Calculate the probability that the first
heart is obtained on the third drawing. (b) If the first two cards drawn are spades, what is the probability that the first
heart is obtained on the fifth drawing?

Solution Here, randomly choosing a card from a deck of 52 cards to observe if it's a heart is a Bernoulli trial with success
probability 0.25. Hence, randomly choosing cards, one at a time with replacement from a deck of 52, until the first heart is
drawn is a Geometric experiment. Letting X be the Geometric RV for this experiment; that is, the RV X gives the number
of randomly selected cards until the first heart is drawn, we have X ∼ G(0.25). Therefore,
(a) P (X = 3) = G(3; 0.25) = 0.75^2 · 0.25 = 0.140625.
(b) Since the cards are drawn with replacement, the outcomes of the first two draws are independent of the outcomes of the
subsequent draws. Given that the first two cards are spades (hence not hearts), the first heart is obtained on the fifth
drawing exactly when the third and fourth draws are not hearts and the fifth draw is a heart. Therefore, the required
(conditional) probability is

  P (first heart on the 5th drawing | first two cards are spades) = 0.75^2 · 0.25 = 0.140625,

the same as in part (a), by independence.

Example 48. When a fisherman catches a fish, it is a young one with a probability of 0.23, and it is returned to the water.
On the other hand, an adult fish is kept to be eaten later. (a) What is the expected number of fish caught by the fisherman
before an adult fish is caught? (b) What is the probability that the fifth fish caught is the first young fish?

Solution

(a) Here, catching an adult fish is a Bernoulli trial with success probability 0.77. Hence, catching fish until the first adult
fish is caught is a Geometric experiment. Letting X be the Geometric RV for this experiment; that is, the RV X gives
the number of caught fish until the first adult fish is caught, we have X ∼ G(0.77). Therefore, E(X) = 1/0.77 ≈ 1.3,
indicating that, on average, approximately 1.3 fish are caught until the first adult fish is caught.
Accordingly, the expected number of fish caught by the fisherman before an adult fish is caught is given by

  E(X − 1) = E(X) − 1 ≈ 0.3.

(b) Here, catching a young fish is a Bernoulli trial with success probability 0.23. Hence, catching fish until the first young
fish is caught is a Geometric experiment. Letting Y be the Geometric RV for this experiment; that is, the RV Y gives the
number of caught fish until the first young fish is caught, we have Y ∼ G(0.23). Therefore,

  P (Y = 5) = G(5; 0.23) = 0.77^4 · 0.23 ≈ 0.0809.


Example 49. Refer to Example 44, within a certain period of time, what is the probability that the fifth order received is
the first Internet order?

Solution Here, checking an order to see if it's received over the Internet is a Bernoulli trial with success probability
0.6. Hence, checking orders one by one (within a certain period of time) to observe the first one received over the Internet
is a Geometric experiment. Letting X be the Geometric RV for this experiment; that is, the RV X gives the number of
received orders (within a certain period of time) until observing the first one ordered over the Internet, we have X ∼ G(0.6).
Therefore, the required probability is

  P (X = 5) = G(5; 0.6) = 0.4^4 · 0.6 = 0.01536.

Example 50. The refusal rate for telephone polls is known to be approximately 20%. A newspaper report indicates that
50 people were interviewed before the first refusal. (a) Comment on the validity of the report. Use a probability in your
argument. (b) What is the expected number of people interviewed before a refusal?

Solution Here, observing whether a telephone poll is refused by the interviewee is a Bernoulli trial with success
probability 0.2. Hence, randomly making telephone calls to poll until observing the first refusal is a Geometric experiment.
Letting X be the Geometric RV for this experiment; that is, the RV X gives the number of telephone calls made to poll
(within a certain period of time) until observing the first refusal, we have X ∼ G(0.2).
(a) To assess the given report, we first calculate the probability of observing the first refusal on the 51st telephone poll:

  P (X = 51) = G(51; 0.2) = 0.8^50 · 0.2 ≈ 2.9 × 10^−6,

indicating that it's highly unlikely for the first refusal to occur only after 50 completed interviews if the refusal rate is 20%.
Therefore, the newspaper report couldn't be validated!
(b) On average, the number of telephone polls to be made before the first refusal is

  E(X − 1) = E(X) − 1 = 1/0.2 − 1 = 4.
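The order-of-magnitude argument in part (a) is easy to reproduce; this short sketch (variable names my own) evaluates G(51; 0.2) and the expectation from part (b):

```python
p = 0.2
# G(51; 0.2) = (1 - p)^50 * p: first refusal exactly on the 51st call
prob_first_refusal_on_51st = (1 - p)**50 * p
# Expected number of calls before the first refusal: E(X) - 1 = 1/p - 1
expected_before_refusal = 1 / p - 1
```

The probability comes out to about 2.9 × 10⁻⁶, small enough to cast serious doubt on the report.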

Example 51. Computer technology has produced an environment in which robots operate with the use of microprocessors.
The probability that a robot fails during any 6-hour shift is 0.10. What is the probability that a robot will operate through
at most 5 shifts before it fails?

Solution Here, observing whether a robot fails during any 6-hour shift is a Bernoulli trial with success probability 0.1.
Hence, observing the first failure of the robot's operation in a series of 6-hour shifts is a Geometric experiment. Letting X
be the Geometric RV for this experiment; that is, the RV X gives the number of 6-hour shifts of operating the robot until
the first failed operation is observed, we have X ∼ G(0.1). Operating through at most 5 shifts before failing means the
failure occurs on one of the first 6 shifts, so the required probability is

  P (X ≤ 6) = 1 − 0.9^6 ≈ 0.4686.

Example 52. A general contracting firm experiences cost overruns on 20% of its contracts. In a company audit, 20 contracts
are sampled at random. (a) What is the probability that exactly four of them experience cost overruns? (b) What is the
probability that fewer than three of them experience cost overruns? (c) What is the probability that none of them experience
cost overruns? (d) Find the mean number that experience cost overruns. (e) Find the standard deviation of the number
that experience cost overruns.

Solution Letting X be the number of sampled contracts that experience cost overruns, we have X ∼ Bin(20, 0.2) and
RX = {0, 1, 2, 3, . . . , 20}. Therefore,
(a) P (X = 4) = Bin(4; 20, 0.2) = C(20, 4) · 0.2^4 · 0.8^16 ≈ 0.2182.
(b) P (X < 3) = P (X = 0) + P (X = 1) + P (X = 2)
             = C(20, 0) · 0.2^0 · 0.8^20 + C(20, 1) · 0.2^1 · 0.8^19 + C(20, 2) · 0.2^2 · 0.8^18 ≈ 0.2061.
(c) P (X = 0) = 0.8^20 ≈ 0.0115.
(d) E(X) = 20 · 0.2 = 4.
(e) Var(X) = 20 · 0.2 · 0.8 = 3.2, so the standard deviation is √3.2 ≈ 1.789.

The Negative Binomial Probability Distribution (Negative Binomial PD): Generalization of Geometric PD

Negative Binomial Experiment

Any random experiment formed by independently repeating one Bernoulli trial until (i.e., up to and including) the
rth success occurs is called a Negative Binomial experiment.

+ Notice that, if one (Bernoulli) trial with probability of success p is repeated independently until the rth success
occurs, each outcome of the resulting Negative Binomial experiment may be presented as a sequence containing k
F 's, where k = 0, 1, 2, 3, . . ., and r S 's (of which one occurs in the rightmost position); a typical outcome of
such a Negative Binomial experiment is a sequence of the form

  F · · · F S · · · S S,    with k F 's, (r − 1) S 's, and a final S, where k = 0, 1, 2, 3, . . .

Note that, for every choice of k, there exist (k + r − 1)!/[k!(r − 1)!] = C(k + r − 1, k) of such sequences. Therefore,
the sample space of a Negative Binomial experiment is countably infinite.

Negative Binomial RV
On the sample space of a Negative Binomial experiment, an RV which assigns to each of its outcomes the number of
times the Bernoulli trial is performeda is called the Negative Binomial RV.
+ Notice that, if one (Bernoulli) trial with probability of success p is repeated independently until the rth success
occurs,
- the Negative Binomial RV of the resulting (Negative Binomial) experiment, say X , is a DRV whose state
space is {r, r + 1, r + 2, . . .}, where X = k identifies the event consisting of outcomes with (k − r) failures and
r successes (of which one occurs in the last trial). Since there are (k − 1)!/[(r − 1)!(k − r)!] of such outcomes,
which are mutually exclusive, we have

  P (X = k) = P (observing (k − r) failures and r successes, of which one occurs in the last trial)
            = [(k − 1)!/((r − 1)!(k − r)!)] · P (F ) · · · · · P (F ) · P (S) · · · · · P (S)    ((k − r) factors of 1 − p and r factors of p)
            = [(k − 1)!/((r − 1)!(k − r)!)] · (1 − p)^{k−r} p^r
            = C(k − 1, r − 1) · (1 − p)^{k−r} p^r.

+ Since the assigned probabilities to the values of a Negative Binomial RV are determined completely by the
probability of success of each trial and the number of successes required, the probability of success p of each trial
and the required number of successes r in a Negative Binomial experiment are called the parameters of
the Negative Binomial distribution. For this reason, instead of stating that an "RV X has the Negative Binomial
distribution with trial probability of success p and required successes r", the following notation is used

  X ∼ NBin(r, p).
a That is, the number of times the Bernoulli trial is required to be performed until the rth success occurs.


Probabilistic Properties of a Negative Binomial RV


If X ∼ NBin(r, p), then
• the PMF of X is denoted and given by

  NBin(x; r, p) := P (X = x) = { C(x − 1, r − 1) · (1 − p)^{x−r} p^r   if x = r, r + 1, r + 2, . . .
                               { 0                                     otherwise

+ Notice that NBin(x; r, p) gives the probability of observing the rth success (with success probability p) in the
xth trial.

• the CDF of X is given by

  FX (x) = P (X ≤ x) = { 0                                 if x < r
                        { Σ_{i=0}^{k} NBin(r + i; r, p)    if r + k ≤ x < r + k + 1, for k = 0, 1, 2, 3, . . .

• the mean/expected-value of X is given by

  E(X) = r/p.

• the variance of X is given by

  Var(X) = r(1 − p)/p².
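A minimal sketch of the Negative Binomial PMF using math.comb (the helper name is my own), checking the mean r/p with a truncated sum:

```python
from math import comb

def nbin_pmf(x, r, p):
    """NBin(x; r, p): probability that the r-th success occurs on trial x."""
    if x >= r:
        return comb(x - 1, r - 1) * (1 - p)**(x - r) * p**r
    return 0.0

r, p = 3, 0.6
# Truncating at 200 terms is harmless: the neglected tail is geometrically small.
approx_mean = sum(x * nbin_pmf(x, r, p) for x in range(r, 200))  # ~ r/p = 5
```

For r = 3 and p = 0.6, nbin_pmf(6, 3, 0.6) returns C(5, 2) · 0.4³ · 0.6³ = 0.13824, and the truncated mean agrees with r/p = 5.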

Supplementary Examples

Example 53. If X ∼ NBin(3, 0.6), calculate

(a) P (X = 2)   (b) P (X = 6)   (c) P (X ≤ 6)   (d) P (X ≥ 5)   (e) P (X ≤ 3)   (f ) P (X ≤ 6|X ≥ 5)

Solution Here, note that the state space of X is {3, 4, 5, . . .}.
(a) Since 2 is not in the state space of X , P (X = 2) = 0.
(b)

  P (X = 6) = NBin(6; 3, 0.6) = C(5, 2) · 0.4^3 · 0.6^3 = 0.13824.

(c)

  P (X ≤ 6) = P (X = 6) + P (X = 5) + P (X = 4) + P (X = 3)
            = NBin(6; 3, 0.6) + NBin(5; 3, 0.6) + NBin(4; 3, 0.6) + NBin(3; 3, 0.6)
            = C(5, 2) · 0.4^3 · 0.6^3 + C(4, 2) · 0.4^2 · 0.6^3 + C(3, 2) · 0.4^1 · 0.6^3 + C(2, 2) · 0.4^0 · 0.6^3
            = 0.8208.

(d)

  P (X ≥ 5) = 1 − P (X < 5) = 1 − P (X ≤ 4) = 1 − [P (X = 4) + P (X = 3)]
            = 1 − NBin(4; 3, 0.6) − NBin(3; 3, 0.6)
            = 1 − C(3, 2) · 0.4^1 · 0.6^3 − C(2, 2) · 0.4^0 · 0.6^3
            = 0.5248.


(e)

  P (X ≤ 3) = P (X = 3) = NBin(3; 3, 0.6) = C(2, 2) · 0.4^0 · 0.6^3 = 0.216.

(f)

  P (X ≤ 6|X ≥ 5) = P (X ≤ 6 & X ≥ 5)/P (X ≥ 5) = P (5 ≤ X ≤ 6)/[1 − P (X < 5)]
                  = [P (X = 5) + P (X = 6)]/[1 − P (X = 4) − P (X = 3)]
                  = [NBin(5; 3, 0.6) + NBin(6; 3, 0.6)]/[1 − NBin(4; 3, 0.6) − NBin(3; 3, 0.6)]
                  = [C(4, 2) · 0.4^2 · 0.6^3 + C(5, 2) · 0.4^3 · 0.6^3]/[1 − C(3, 2) · 0.4^1 · 0.6^3 − C(2, 2) · 0.4^0 · 0.6^3]
                  = 0.3456/0.5248 ≈ 0.6585.

Example 54. Refer to Example 40, (a) what is the probability that the third bull's-eye is scored with the tenth arrow? (b)
what is the expected number of arrows shot before the third bull's-eye is scored?

Solution Here, shooting a series of arrows until the third bull's-eye is scored (where it's assumed the results of different
attempts can be taken to be independent of each other, and the probability of hitting it is 0.09) is a Negative Binomial
experiment. Letting X be the Negative Binomial RV for this experiment; that is, the RV X gives the number of shot
arrows/attempts until the third bull's-eye is scored, we have X ∼ NBin(3, 0.09). Therefore,

(a) P (X = 10) = NBin(10; 3, 0.09) = C(9, 2) · 0.91^7 · 0.09^3 ≈ 0.0136.

(b) E(X) = 3/0.09 ≈ 33.3, which indicates that, on average, the third bull's-eye is scored at about the 33.3rd arrow per
series of shot arrows; accordingly, the expected number of arrows shot before the third bull's-eye is scored is
E(X − 1) = E(X) − 1 ≈ 32.3.

Example 55. Refer to Example 47, calculate the probability that the fourth heart is obtained on the tenth drawing.

Solution Here, randomly choosing a card from a deck of 52 cards to observe if it's a heart is a Bernoulli trial with success
probability 0.25. Hence, randomly choosing cards, one at a time with replacement from a deck of 52, until the fourth heart
is drawn is a Negative Binomial experiment. Letting X be the Negative Binomial RV for this experiment; that is, the RV
X gives the number of randomly selected cards until the fourth heart is drawn, we have X ∼ NBin(4, 0.25). Therefore, the
required probability is

  P (X = 10) = NBin(10; 4, 0.25) = C(9, 3) · 0.75^6 · 0.25^4 ≈ 0.0584.

Example 56. Refer to Example 48, suppose that the fisherman wants three fish to eat for lunch. (a) What is the probability
that the first time the fisherman can stop for lunch is immediately after the sixth fish has been caught? (b) If the fisherman
catches eight fish, what is the probability that there are sufficient fish for lunch?

Solution Here, catching an adult fish is a Bernoulli trial with success probability 0.77. Hence, catching fish until the
third adult fish is caught is a Negative Binomial experiment. Letting X be the Negative Binomial RV for this experiment;
that is, the RV X gives the number of caught fish until the third adult fish is caught, we have X ∼ NBin(3, 0.77). Therefore,


(a) the required probability is

  P (X = 6) = NBin(6; 3, 0.77) = C(5, 2) · 0.23^3 · 0.77^3 ≈ 0.0555.

(b) the required probability is

  P (X ≤ 8) = NBin(8; 3, 0.77) + NBin(7; 3, 0.77) + · · · + NBin(3; 3, 0.77)
            = C(7, 2) · 0.23^5 · 0.77^3 + C(6, 2) · 0.23^4 · 0.77^3 + · · · + C(2, 2) · 0.23^0 · 0.77^3
            ≈ 0.9973.
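Part (b)'s sum of six Negative Binomial terms is tedious by hand; the sketch below (helper name my own) accumulates it programmatically:

```python
from math import comb

def nbin_pmf(x, r, p):
    # NBin(x; r, p) = C(x-1, r-1) (1-p)^(x-r) p^r for x = r, r+1, ...
    return comb(x - 1, r - 1) * (1 - p)**(x - r) * p**r

# P(X <= 8) for X ~ NBin(3, 0.77): the third adult fish arrives within 8 catches
prob_enough = sum(nbin_pmf(x, 3, 0.77) for x in range(3, 9))
```

Summing the six terms x = 3, . . . , 8 reproduces the value 0.9973 obtained above.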

Example 57. Refer to Example 44, within a certain period of time, what is the probability that the eighth order received is
the fourth Internet order?

Solution Here, checking an order to see if it's received over the Internet is a Bernoulli trial with success probability 0.6.
Hence, checking orders one by one (within a certain period of time) to observe the fourth one received over the internet is
a Negative Binomial experiment. Letting X be the Negative Binomial RV for this experiment; that is, the RV X gives the
number of received orders (within a certain period of time) until observing the fourth one ordered over the internet, we have
X ∼ NBin(4, 0.6). Therefore, the required probability is

P(X = 8) = NBin(8; 4, 0.6) = C(7, 3) · 0.4^4 · 0.6^4 ≈ 0.1161.

Example 58. Consider a fair six-sided die. The die is rolled until a 6 is obtained for the third time. What is the expectation
of the number of die rolls needed?

Solution Here, rolling a fair die once to observe if 6 is obtained is a Bernoulli trial with success probability 1/6. Hence,
rolling a fair die several times until 6 is obtained for the third time is a Negative Binomial experiment. Letting X be the
Negative Binomial RV for this experiment; that is, the RV X gives the number of the fair die rolls until 6 is obtained for the
third time, we have X ∼ NBin(3, 1/6). Therefore, E(X) = 3/(1/6) = 18, indicating that, on average, 6 is obtained for the
third time on the 18th roll of the fair die.
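The value E(X) = 18 can also be sanity-checked by simulation. The following sketch (the function name is mine, not the note's) repeats the die-rolling experiment many times and averages the roll counts:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def rolls_until_third_six():
    """Roll a fair die until 6 appears for the third time; return the roll count."""
    count = sixes = 0
    while sixes < 3:
        count += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    return count

trials = 100_000
mean = sum(rolls_until_third_six() for _ in range(trials)) / trials
print(mean)  # close to the theoretical E(X) = 3/(1/6) = 18
```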

Example 59. A fair coin is tossed until the fifth head is obtained. What is the probability that the coin is tossed exactly
ten times?

Solution Here, tossing a fair coin once to observe if a head is obtained is a Bernoulli trial with success probability 0.5.
Hence, tossing a fair coin several times until a head is obtained for the fifth time is a Negative Binomial experiment. Letting
X be the Negative Binomial RV for this experiment; that is, the RV X gives the number of fair coin tosses until a head is
obtained for the fifth time, we have X ∼ NBin(5, 0.5). Therefore, the required probability is

P(X = 10) = NBin(10; 5, 0.5) = C(9, 4) · 0.5^5 · 0.5^5 ≈ 0.1230.

The Hypergeometric Probability Distribution (Hypergeometric PD)


- For this part, we need some terminology stated (in this lecture note) after introducing the Binomial PD. So, go back
there to remind yourself.


Hypergeometric Experiment
Any experiment which can be modeled as randomly selecting a sample (of a given and fixed size), without replacement,
from a Binomial population with distinct members, and observing the number of successes in the sample (i.e., sampling
without replacement from a Binomial population) is called a Hypergeometric experiment.
- In a Hypergeometric experiment, three quantities are of importance: Binomial population size, sample size, and
success subpopulation size.
+ Notice that the sample space of a Hypergeometric experiment, where the sizes of the Binomial population,
sample, and success subpopulation are N , n, and r, respectively, consists of all samples of size n to be selected
from the Binomial population of size N . Therefore,
+ the sample space of such a Hypergeometric experiment consists of C(N, n) = N !/(n!(N − n)!) equally likely
outcomes (i.e., samples), and
+ each outcome of the sample space contains i successes and (n − i) failures, where
- if n ≥ r: either n > N − r, in which case n + r − N ≤ i ≤ r, or n ≤ N − r, in which case 0 ≤ i ≤ r;
- if n ≤ r: either n > N − r, in which case n + r − N ≤ i ≤ n, or n ≤ N − r, in which case 0 ≤ i ≤ n.
In short, max{0, n + r − N } ≤ i ≤ min{n, r}.

Hypergeometric RV
On the sample space of a Hypergeometric experiment, an RV which assigns to each outcome the number of successes
contained in it is called the Hypergeometric RV.
+ The Hypergeometric RV, say X , assigning the number of successes to every possible randomly selected sample
of size n, from the Binomial population of size N with success subpopulation size of r, is a DRV whose values
are any integer between max{0, n + r − N } and min{n, r}:
- where X = k identifies the event consisting of all randomly selected samples of size n containing k successes
and n − k failures (without replacement), and
- since there are C(r, k) · C(N − r, n − k) of such samples (according to the Multiplication Principle),

P(X = k) = [C(r, k) · C(N − r, n − k)] / C(N, n).

+ Since the assigned probabilities to the values of a Hypergeometric RV are determined completely by the Binomial
population size N , success subpopulation size r and sample size n, these sizes, in a Hypergeometric experiment,
are called the parameters of the Hypergeometric distribution. For, instead of stating that an "RV X has
the Hypergeometric distribution with parameters N , n, and r", the following notation is used
X ∼ HG(N, n, r).


Probabilistic Properties of a Hypergeometric RV


If X ∼ HG(N, n, r), then
• the PMF of X is denoted and given by

HG(x; N, n, r) := P(X = x) = [C(r, x) · C(N − r, n − x)] / C(N, n)   if max{0, n − (N − r)} ≤ x ≤ min{n, r},

and 0 otherwise.

+ Notice that HG(x; N, n, r) gives the probability of observing x successes in a sample of size n randomly selected,
without replacement, from a Binomial population of size N containing r successes.
• the mean/expected-value of X is given by

E(X) = nr/N.

• the variance of X is given by

Var(X) = [(N − n)/(N − 1)] · n · (r/N) · (1 − r/N).
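These formulas can be written out directly with Python's math.comb. Below is a minimal sketch (hg_pmf and hg_mean are hypothetical helper names, not from the note):

```python
from math import comb

def hg_pmf(x, N, n, r):
    """P(X = x) for X ~ HG(N, n, r): x successes in a sample of size n
    drawn without replacement from N items, r of which are successes."""
    if x < max(0, n + r - N) or x > min(n, r):
        return 0.0
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def hg_mean(N, n, r):
    # Mean of the Hypergeometric RV: n * r / N.
    return n * r / N

# Example 60(a) below: X ~ HG(11, 7, 6)
print(round(hg_pmf(4, 11, 7, 6), 4))  # 0.4545 (= 150/330)
```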

The Hypergeometric PD vs. The Binomial PD: Approximating the Hypergeometric PD with the Binomial
PD

Fact:
In a Hypergeometric experiment, sampling may be viewed as with replacement if the size of a randomly selected
sample is small enough compared to the Binomial population. Hence, in such a case, the Hypergeometric PD can be
approximated using the Binomial PD with parameters sample size and population success proportion, as

HG(x; N, n, r) ≈ Bin(x; n, r/N).

+ Rule of Thumb: This approximation works well if n/N < 0.05.
+ Notice that with this approximation, the mean/expected-value of the Hypergeometric RV doesn't change, but its
variance is approximated by n · (r/N) · (1 − r/N).
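The rule of thumb can be checked numerically, e.g. for a sample of n = 12 from a population of N = 3400 with r = 1800 successes, so that n/N < 0.05. This is a sketch with hypothetical helper names:

```python
from math import comb

def hg_pmf(x, N, n, r):
    # Exact Hypergeometric probability of x successes.
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def bin_pmf(x, n, p):
    # Binomial probability used as the approximation.
    return comb(n, x) * p**x * (1 - p)**(n - x)

exact = hg_pmf(7, 3400, 12, 1800)
approx = bin_pmf(7, 12, 1800 / 3400)
print(round(exact, 4), round(approx, 4))  # the two values agree closely
```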

Supplementary Examples

Example 60. If X ∼ HG(11, 7, 6), calculate


(a) P (X = 4) (b) P (X ≤ 6) (c) P (X < 4)

Solution Notice that since N = 11, n = 7, and r = 6, the state space of X consists of all integers between max{0, 2} = 2 and
min{7, 6} = 6. Hence, the admissible values for X are 2, . . . , 6. Therefore,

(a)
P(X = 4) = HG(4; 11, 7, 6) = [C(6, 4) · C(5, 3)] / C(11, 7) = 150/330 ≈ 0.4545.

(b) P(X ≤ 6) = 1.


(c)

P(X < 4) = P(X = 3) + P(X = 2)
         = HG(3; 11, 7, 6) + HG(2; 11, 7, 6)
         = [C(6, 3) · C(5, 4)] / C(11, 7) + [C(6, 2) · C(5, 5)] / C(11, 7)
         = 100/330 + 15/330 ≈ 0.3485.

Example 61. A committee consists of eight right-wing members and seven left-wing members. A subcommittee is formed
by randomly choosing five of the committee members. Draw a line graph of the PMF of the number of right-wing members
serving on the subcommittee.

Solution Here, the Binomial population is the committee with N = 15 members and the success subpopulation is the
right-wing members with r = 8. Now, the experiment is to randomly select a subcommittee of n = 5 members without
replacement and to observe the right-wing members, which is a Hypergeometric experiment. Hence, letting X be the
Hypergeometric RV of this experiment; that is, the RV which gives the number of right-wing members in each selected
subcommittee, we have X ∼ HG(15, 5, 8). Notice that the values of the RV X are all integers between max{0, −2} = 0 and
min{5, 8} = 5; that is, RX = {0, 1, 2, 3, 4, 5}. Therefore,


P(X = 0) = HG(0; 15, 5, 8) = [C(8, 0) · C(7, 5)] / C(15, 5) = 21/3003
P(X = 1) = HG(1; 15, 5, 8) = [C(8, 1) · C(7, 4)] / C(15, 5) = 280/3003
P(X = 2) = HG(2; 15, 5, 8) = [C(8, 2) · C(7, 3)] / C(15, 5) = 980/3003
P(X = 3) = HG(3; 15, 5, 8) = [C(8, 3) · C(7, 2)] / C(15, 5) = 1176/3003
P(X = 4) = HG(4; 15, 5, 8) = [C(8, 4) · C(7, 1)] / C(15, 5) = 490/3003
P(X = 5) = HG(5; 15, 5, 8) = [C(8, 5) · C(7, 0)] / C(15, 5) = 56/3003

and the line graph plots these six probabilities against the values x = 0, 1, . . . , 5 (graph not reproduced here).
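The six probabilities must sum to 1, as any PMF does; a quick numerical check (a sketch using the standard library; hg_pmf is a hypothetical helper name):

```python
from math import comb

def hg_pmf(x, N, n, r):
    # Hypergeometric probability of x successes in a sample of size n.
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

pmf = [hg_pmf(x, 15, 5, 8) for x in range(6)]  # X ~ HG(15, 5, 8), x = 0..5
print([round(p, 4) for p in pmf])
print(round(sum(pmf), 4))  # 1.0
```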


Example 62. A box contains 17 balls of which 10 are red and 7 are blue. A sample of 5 balls is chosen at random and
placed in a jar. Calculate the probability that: (a) The jar contains exactly 3 red balls. (b) The jar contains exactly 1 red
ball. (c) The jar contains more blue balls than red balls.

Solution Here, the Binomial population is the box contains N = 17 balls and the success subpopulation is the red balls
r = 10. Now, the experiment is to randomly select a sample of n = 5 balls without replacement and to observe the red
balls, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment; that is, the
RV which gives the number of red balls in each selected sample, we have X ∼ HG(17, 5, 10). Notice that the values of the
RV X are all integers between max{0, −2} = 0 and min{5, 10} = 5; that is, RX = {0, 1, 2, 3, 4, 5}. Therefore,

(a)

P(X = 3) = HG(3; 17, 5, 10) = [C(10, 3) · C(7, 2)] / C(17, 5) = 2520/6188 ≈ 0.4072.

(b)

P(X = 1) = HG(1; 17, 5, 10) = [C(10, 1) · C(7, 4)] / C(17, 5) = 350/6188 ≈ 0.0566.

(c) To get a sample with more blue balls than red ones, the number of red balls in the sample should be either 2 or 1 (here,


we don't consider 0 red balls.) So, the required probability is

P(X = 2 or X = 1) = P(X = 2) + P(X = 1) = HG(2; 17, 5, 10) + HG(1; 17, 5, 10)
                 = [C(10, 2) · C(7, 3)] / C(17, 5) + [C(10, 1) · C(7, 4)] / C(17, 5)
                 = 1575/6188 + 350/6188 ≈ 0.3111.

Example 63. A jury of 12 people is selected at random from a group of 16 men and 18 women. (a) What is the probability
that the jury contains exactly 7 women? (b) Suppose that the jury is selected at random from a group of 1600 men and 1800
women. Use the binomial approximation to the hypergeometric distribution to calculate the probability that in this case the
jury contains exactly 7 women.

Solution

(a) Here, the Binomial population is the group of N = 34 men and women and the success subpopulation is the women group
r = 18. Now, the experiment is to randomly select a sample of n = 12 people without replacement and to observe the
selected women, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment;
that is, the RV which gives the number of women in each selected sample, we have X ∼ HG(34, 12, 18). Notice that
the values of the RV X are all integers between max{0, −4} = 0 and min{12, 18} = 12; that is, RX = {0, 1, 2, . . . , 12}.
Therefore, the required probability is

P(X = 7) = HG(7; 34, 12, 18) = [C(18, 7) · C(16, 5)] / C(34, 12) = 139,007,232/548,354,040 ≈ 0.2535.

(b) Here, the Binomial population is the group of N = 3400 men and women and the success subpopulation is the women
group r = 1800. Now, the experiment is to randomly select a sample of n = 12 people without replacement and to
observe the selected women, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this
experiment; that is, the RV which gives the number of women in each selected sample, we have X ∼ HG(3400, 12, 1800).
Notice that the Binomial population is large compared to the sample (i.e., 12/3400 < 0.05). Therefore, instead of directly
calculating the required probability, we use the following approximation:

HG(x; 3400, 12, 1800) ≈ Bin(x; 12, 18/34).

Accordingly,

P(X = 7) = HG(7; 3400, 12, 1800) ≈ Bin(7; 12, 18/34) = C(12, 7) · (18/34)^7 · (16/34)^5 ≈ 0.2131.

Example 64. Five cards are selected at random from a pack of cards without replacement. (a) What is the probability that
exactly three of them are picture cards (kings, queens, or jacks)? (b) If a hand of 13 cards is dealt from the pack, what are
the expectation and variance of the number of picture cards that it contains?

Solution

(a) Here, the Binomial population is a pack of cards N = 52 and the success subpopulation is picture cards r = 12. Now, the
experiment is to randomly select a sample of n = 5 cards without replacement and to observe the selected picture cards,


which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment; that is, the RV
which gives the number of picture cards in each selected sample, we have X ∼ HG(52, 5, 12). Notice that the values of
the RV X are all integers between max{0, −35} = 0 and min{5, 12} = 5; that is, RX = {0, 1, 2, 3, 4, 5}. Therefore, the
required probability is

P(X = 3) = HG(3; 52, 5, 12) = [C(12, 3) · C(40, 2)] / C(52, 5) = 171,600/2,598,960 ≈ 0.0660.

(b) Here, the Binomial population is a pack of cards N = 52 and the success subpopulation is picture cards r = 12. Now, the
experiment is to randomly select a sample of n = 13 cards without replacement and to observe the selected picture cards,
which is a Hypergeometric experiment. Hence, letting Y be the Hypergeometric RV of this experiment; that is, the RV
which gives the number of picture cards in each selected sample, we have Y ∼ HG(52, 13, 12). Notice that the values of
the RV Y are all integers between max{0, −27} = 0 and min{13, 12} = 12; that is, RY = {0, 1, 2, . . . , 12}. Therefore,
the mean number of picture cards, over all samples of size 13 selected without replacement, per repetition, is

E(Y ) = (13 · 12)/52 = 3,

and the variance is

Var(Y ) = [(52 − 13)/(52 − 1)] · 13 · (12/52) · (1 − 12/52) = 30/17.

Example 65. There are 11 items of a product on a shelf in a retail outlet, and unknown to the customers, 4 of the items are
outdated. Suppose that a customer takes 3 items at random. (a) What is the probability that none of the outdated products
are selected by the customer? (b) What is the probability that exactly 2 of the items taken by the customer are outdated?

Solution Here, the Binomial population is the items of a product on a shelf in a retail outlet N = 11 and the success
subpopulation is the outdated ones r = 4. Now, the experiment is to randomly select a sample of n = 3 of the items without
replacement and to observe the selected outdated products, which is a Hypergeometric experiment. Hence, letting X be the
Hypergeometric RV of this experiment; that is, the RV which gives the number of outdated products in each selected sample,
we have X ∼ HG(11, 3, 4). Notice that the values of the RV X are all integers between max{0, −4} = 0 and min{3, 4} = 3;
that is, RX = {0, 1, 2, 3}. Therefore,
(a) the required probability is

P(X = 0) = HG(0; 11, 3, 4) = [C(4, 0) · C(7, 3)] / C(11, 3) = 35/165 ≈ 0.2121.

(b) the required probability is

P(X = 2) = HG(2; 11, 3, 4) = [C(4, 2) · C(7, 1)] / C(11, 3) = 42/165 ≈ 0.2545.

Example 66. A plate has 15 cupcakes on it, of which 9 are chocolate and 6 are strawberry. A child randomly selects 5 of the
cupcakes and eats them. What is the probability that the number of chocolate cupcakes remaining on the plate is between
5 and 7 inclusive?


Solution Here, the Binomial population is the plate of cupcakes N = 15 and the success subpopulation is the chocolate
cupcakes r = 9. Now, the experiment is to randomly select a sample of n = 5 of the cupcakes without replacement and to
observe the selected chocolate cupcakes, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric
RV of this experiment; that is, the RV which gives the number of chocolate cupcakes in each selected sample, we have
X ∼ HG(15, 5, 9). Notice that the values of the RV X are all integers between max{0, −1} = 0 and min{5, 9} = 5; that is,
RX = {0, 1, 2, 3, 4, 5}. Hence, if Y is the RV giving the number of chocolate cupcakes in the plate after each sampling, then
Y = 9 − X . Therefore, the required probability is

P(5 ≤ Y ≤ 7) = P(5 ≤ 9 − X ≤ 7) = P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)
            = HG(2; 15, 5, 9) + HG(3; 15, 5, 9) + HG(4; 15, 5, 9)
            = [C(9, 2) · C(6, 3)] / C(15, 5) + [C(9, 3) · C(6, 2)] / C(15, 5) + [C(9, 4) · C(6, 1)] / C(15, 5)
            = 720/3003 + 1260/3003 + 756/3003 = 2736/3003 ≈ 0.9111.

Example 67. (a) A box contains 8 red balls and 8 blue balls, and 4 balls are taken at random without replacement. What
is the probability that 2 red balls and 2 blue balls are taken? (b) A box contains 50,000 red balls and 50,000 blue balls, and
4 balls are taken at random without replacement. Estimate the probability that 2 red balls and 2 blue balls are taken.

Solution

(a) Here, the Binomial population is the box of balls N = 16 and (let's say) the success subpopulation is the red balls r = 8.
Now, the experiment is to randomly select a sample of n = 4 of the balls without replacement and to observe the selected
red balls, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment; that
is, the RV which gives the number of red balls in each selected sample, we have X ∼ HG(16, 4, 8). Notice that the values
of the RV X are all integers between max{0, −4} = 0 and min{4, 8} = 4; that is, RX = {0, 1, 2, 3, 4}. Therefore, the
required probability is
P(X = 2) = HG(2; 16, 4, 8) = [C(8, 2) · C(8, 2)] / C(16, 4) = 784/1820 ≈ 0.4308.

(b) Here, the Binomial population is the group of N = 100, 000 balls and (let's say) the success subpopulation is the red balls
r = 50, 000. Now, the experiment is to randomly select a sample of n = 4 balls without replacement and to observe the
selected red balls, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment;
that is, the RV which gives the number of red balls in each selected sample, we have X ∼ HG(100, 000, 4, 50, 000).
Notice that the Binomial population is large compared to the sample (i.e., 4/100,000 < 0.05). Therefore, instead of directly
calculating the required probability, we use the following approximation:

HG(x; 100,000, 4, 50,000) ≈ Bin(x; 4, 50,000/100,000).

Accordingly,

P(X = 2) = HG(2; 100,000, 4, 50,000) ≈ Bin(2; 4, 0.5) = C(4, 2) · 0.5^2 · 0.5^2 = 0.375.

Example 68. In a ground water contamination study, the researchers identify 25 possible sites for drilling and sample
collection. Unknown to the researchers, 19 of these sites have ground water with a high contamination, while the other 6 sites
have ground water with a low contamination. The researchers have a budget that only allows them to drill at 5 sites, so they
randomly choose these 5 sites from their list of 25 sites. What is the probability that at least 4 out of the 5 sites that the


researchers examine have ground water with a high contamination?

Solution Here, the Binomial population is the set of N = 25 sites for drilling and sample collection, and the success
subpopulation is the r = 19 sites having ground water with a high contamination. Now, the experiment is to randomly select
a sample of n = 5 of the sites without replacement and to observe the selected ones having ground water with a high
contamination, which is a Hypergeometric experiment. Hence, letting X be the Hypergeometric RV of this experiment; that is, the RV which
gives the number of sites having ground water with a high contamination in each selected sample, we have X ∼ HG(25, 5, 19).
Notice that the values of the RV X are all integers between max{0, −1} = 0 and min{5, 19} = 5; that is, RX = {0, 1, 2, 3, 4, 5}.
Therefore, the required probability is

P(X ≥ 4) = P(X = 4) + P(X = 5) = HG(4; 25, 5, 19) + HG(5; 25, 5, 19)
         = [C(19, 4) · C(6, 1)] / C(25, 5) + [C(19, 5) · C(6, 0)] / C(25, 5)
         = 23,256/53,130 + 11,628/53,130 = 34,884/53,130 ≈ 0.6566.

Example 69. An investor is considering making investments in ten companies. Unknown to the investor, only three of these
companies will be successful. If the investor randomly chooses four of the companies, what is the probability that at least
two will be successful?

Solution Here, the Binomial population is the set of N = 10 companies considered for investment, and the success
subpopulation is the r = 3 companies that will be successful. Now, the experiment is to randomly select a sample of n = 4 of
the companies without replacement and to observe the selected ones that will be successful, which is a Hypergeometric
experiment. Hence, letting X be the Hypergeometric RV of this experiment; that is, the RV which gives the number of successful
companies in each selected sample, we have X ∼ HG(10, 4, 3). Notice that the values of the RV X are all integers between
max{0, −3} = 0 and min{4, 3} = 3; that is, RX = {0, 1, 2, 3}. Therefore, the required probability is

P(X ≥ 2) = P(X = 2) + P(X = 3) = HG(2; 10, 4, 3) + HG(3; 10, 4, 3)
         = [C(3, 2) · C(7, 2)] / C(10, 4) + [C(3, 3) · C(7, 1)] / C(10, 4)
         = 63/210 + 7/210 = 70/210 ≈ 0.3333.

Example 70. A company purchases large lots of a certain kind of electronic device. A method is used that rejects a lot if 2
or more defective units are found in a random sample of 100 units. (a) What is the mean number of defective units found
in a sample of 100 units if the lot is 2% defective? (b) What is the variance?

Solution Here, the Binomial population is a purchased lot of the electronic devices, of (large) size N , and the success
subpopulation consists of the r defective units. Now, the experiment is to randomly select a sample of n = 100 units
without replacement and to observe the selected defective units, which is a Hypergeometric experiment. Hence, letting X be
the Hypergeometric RV of this experiment; that is, the RV which gives the number of defective units in each selected sample,
we have X ∼ HG(N, 100, r).
(a) Given that the lot is large (so that 100/N < 0.05) and that r/N = 0.02, we have

HG(x; N, 100, r) ≈ Bin(x; 100, 0.02).

Therefore, the mean number of defective units found in a sample of 100 units is E(X) ≈ 100 · 0.02 = 2.


(b) With what's mentioned in (a), we have


Var(X) ≈ 100 · 0.02 (1 − 0.02) = 1.96.

The Poisson Probability Distribution (Poisson PD)

Poisson Process
Any random experiment of observing occurrences of a certain event during a given time interval or in a specified
region of space, such that
• the number of outcomes occurring in one time interval or specified region of space is independent of the number
that occur in any other disjoint time interval or region, and
• there exists a positive quantity, say λ, satisfying the following conditions:

(i) The probability that a single outcome would occur
- during a small enough time interval (of length, say ∆t) is approximately λ · ∆t, or
- in a small enough region of space (of size, say ∆l) is approximately λ · ∆l,
(ii) The probability that no outcome would occur
- during a small enough time interval (of length, say ∆t) is approximately (1 − λ · ∆t), or
- in a small enough region of space (of size, say ∆l) is approximately (1 − λ · ∆l),
(iii) The probability that more than one outcome would occur in such a small enough time interval or fall in such a
small enough region of space is negligible,
is called a Poisson process.
+ In practice, the quantity λ may be considered as the mean/average number of event occurrences per unit of the
time interval or the specified region of space; that is, the mean/average rate of event occurrences.
+ Therefore, if the total length of the time interval is T or the total size of the specified region of space is S ,
when/where the event occurrences are observed, then the mean/average number of the event occurrences
(which is denoted by Λ) is λT or λS , respectively.

Poisson RV
The RV which gives the number of the event occurrences in a Poisson process is called the Poisson RV, whose state
space is {0, 1, 2, 3, . . .}.
+ It's proved that the probability that a Poisson RV takes any of its values is completely determined by the
quantity Λ in the Poisson process. Hence, Λ is called the parameter of the Poisson PD. For, instead of stating
that an "RV X has the Poisson distribution with parameter Λ", the following notation is used
X ∼ Poisson(Λ).

+ Notice that, in a Poisson process, letting X represent the number of event occurrences,
- given that their mean/average rate is λ, in s units of time or a region of space, we have
X ∼ Poisson(λs).

- given that their mean/average is Λ, we have


X ∼ Poisson(Λ).


Probabilistic Properties of a Poisson RV


If X ∼ Poisson(Λ), then
• the PMF of X is denoted by and proved to be

Poisson(x; Λ) := P(X = x) = e^(−Λ) · Λ^x / x!   if x = 0, 1, 2, . . . ,

and 0 otherwise, where e ≈ 2.71828 . . . is Euler's number.

+ Notice that Poisson(x; Λ) gives the probability of observing x occurrences of an event (in a Poisson process)
whose average number of occurrences during the given time interval or specified region of space is Λ.
• the CDF of X is given by

F_X(x) = P(X ≤ x) = 0   if x < 0, and, for k ≤ x < k + 1 with k = 0, 1, 2, . . . ,

F_X(x) = Σ_{i=0}^{k} Poisson(i; Λ) = e^(−Λ) · (1 + Λ + Λ^2/2! + · · · + Λ^k/k!).

• the mean/expected-value of X is given by

E(X) = Λ.

• the variance of X is given by

Var(X) = Λ.
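A minimal implementation of the Poisson PMF and CDF (a sketch with hypothetical helper names, using only the standard library):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): sum of the PMF up to floor(x)."""
    return sum(poisson_pmf(i, lam) for i in range(int(x) + 1))

# X ~ Poisson(2.1), as in Example 71 below
print(round(poisson_pmf(0, 2.1), 4), round(poisson_cdf(2, 2.1), 4))  # 0.1225 0.6496
```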

The Poisson PD as a limiting form of the Binomial PD

Theorem
Let X ∼ Bin(n, p). If, as n → +∞ and p → 0+ , np → Λ (where Λ is a positive real number), then
Bin(n, p) −→ Poisson(Λ), as n → +∞ and p → 0+ .

+ Consequence: Approximating the Binomial PD with the Poisson PD Assume X ∼ Bin(n, p). If
n is large and p is small enough that np < 5, then

Bin(x; n, p) ≈ Poisson(x; np).
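The approximation can be checked against the exact Binomial CDF, e.g. for n = 500 and p = 0.005 (so np = 2.5 < 5). This is a sketch with hypothetical helper names:

```python
from math import comb, exp, factorial

def bin_cdf(x, n, p):
    # Exact Binomial P(X <= x).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def poisson_cdf(x, lam):
    # Poisson P(X <= x), the approximating value.
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(x + 1))

exact = bin_cdf(3, 500, 0.005)
approx = poisson_cdf(3, 2.5)
print(round(exact, 4), round(approx, 4))  # the two values agree closely
```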

Supplementary Examples

Example 71. If X ∼ Poisson(2.1), calculate:


(a) P (X = 0) (b) P (X ≤ 2) (c) P (X ≥ 5) (d) P (X = 1|X ≤ 2)

Solution Note that the state space of X is {0, 1, 2, 3, . . .}


(a)

P(X = 0) = Poisson(0; 2.1) = e^(−2.1) · 2.1^0 / 0! = e^(−2.1) ≈ 0.1225.


(b)

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = Poisson(0; 2.1) + Poisson(1; 2.1) + Poisson(2; 2.1)
         = e^(−2.1) · (1 + 2.1 + 2.1^2/2!)
         ≈ 0.6496.

(c)

P(X ≥ 5) = 1 − P(X < 5) = 1 − [Poisson(0; 2.1) + Poisson(1; 2.1) + Poisson(2; 2.1) + Poisson(3; 2.1) + Poisson(4; 2.1)]
         = 1 − e^(−2.1) · (1 + 2.1 + 2.1^2/2! + 2.1^3/3! + 2.1^4/4!)
         ≈ 0.0621.

(d)

P(X = 1 | X ≤ 2) = P(X = 1 & X ≤ 2) / P(X ≤ 2) = P(X = 1) / [P(X = 0) + P(X = 1) + P(X = 2)]
                = Poisson(1; 2.1) / [Poisson(0; 2.1) + Poisson(1; 2.1) + Poisson(2; 2.1)]
                = [e^(−2.1) · 2.1] / [e^(−2.1) · (1 + 2.1 + 2.1^2/2!)] ≈ 0.3959.

Example 72. The number of cracks in a ceramic tile has a Poisson distribution with a mean of Λ = 2.4. (a) What is the
probability that a tile has no cracks? (b) What is the probability that a tile has four or more cracks?

Solution Letting the RV X give the number of cracks in a ceramic tile, we have X ∼ Poisson(2.4), and its state space
is {0, 1, 2, 3, . . .}. Notice that Λ = 2.4 indicates that there are, on average, 2.4 cracks in every ceramic tile. Therefore, the
required probabilities are
(a)

P(X = 0) = Poisson(0; 2.4) = e^(−2.4) · 2.4^0 / 0! ≈ 0.0907.

(b)

P(X ≥ 4) = 1 − P(X < 4) = 1 − [Poisson(3; 2.4) + Poisson(2; 2.4) + Poisson(1; 2.4) + Poisson(0; 2.4)]
         = 1 − e^(−2.4) · (1 + 2.4 + 2.4^2/2! + 2.4^3/3!)
         ≈ 0.2213.

Example 73. On average there are about 25 imperfections in 100 meters of optical cable. Use the Poisson distribution to
estimate (a) the probability that there are no imperfections in 1 meter of cable. (b) What is the probability that there is no
more than one imperfection in 1 meter of cable?

Solution Since there are, on average, 25 imperfections in 100 meters of optical cable, there are, on average, 0.25
imperfections in 1 meter of optical cable. Hence, letting the RV X give the number of imperfections in 1 meter of optical cable,
we have X ∼ Poisson(0.25), and its state space is {0, 1, 2, 3, . . .}. Therefore, the required probabilities are


(a)

P(X = 0) = Poisson(0; 0.25) = e^(−0.25) · 0.25^0 / 0! ≈ 0.7788.

(b)

P(X ≤ 1) = P(X = 1) + P(X = 0)
         = Poisson(1; 0.25) + Poisson(0; 0.25)
         = e^(−0.25) · (0.25 + 1)
         ≈ 0.9735.

Example 74. On average there are four traffic accidents in a city during one hour of rush-hour traffic. Use the Poisson
distribution to calculate the probability that in one such hour there are (a) no accidents (b) at least six accidents

Solution Letting the RV X give the number of traffic accidents in the city during one hour of rush-hour traffic, we have
X ∼ Poisson(4), and its state space is {0, 1, 2, 3, . . .}. Therefore, the required probabilities are
(a)

P(X = 0) = Poisson(0; 4) = e^(−4) · 4^0 / 0! ≈ 0.0183.

(b)

P(X ≥ 6) = 1 − P(X < 6) = 1 − [Poisson(5; 4) + Poisson(4; 4) + Poisson(3; 4) + Poisson(2; 4) + Poisson(1; 4) + Poisson(0; 4)]
         = 1 − e^(−4) · (1 + 4 + 4^2/2! + 4^3/3! + 4^4/4! + 4^5/5!)
         ≈ 0.2149.

Example 75. A box contains 500 electrical switches, each one of which has a probability of 0.005 of being defective. Use the
Poisson distribution to make an approximate calculation of the probability that the box contains no more than 3 defective
switches.

Solution Letting RV X be the number of defective electrical switches in the box, we have X ∼ Bin(500, 0.005) with
state space {0, 1, 2, . . . , 500}. But, since 500 · 0.005 = 2.5 < 5,
Bin(x; 500, 0.005) ≈ Poisson(x; 2.5).

Therefore, the required probability is


P(X ≤ 3) = P(X = 3) + P(X = 2) + P(X = 1) + P(X = 0)
         = Bin(3; 500, 0.005) + Bin(2; 500, 0.005) + Bin(1; 500, 0.005) + Bin(0; 500, 0.005)
         ≈ Poisson(3; 2.5) + Poisson(2; 2.5) + Poisson(1; 2.5) + Poisson(0; 2.5)
         = e^(−2.5) · (1 + 2.5 + 2.5^2/2! + 2.5^3/3!)
         ≈ 0.7576.

Example 76. The average number of field mice per acre in a 5-acre wheat field is estimated to be 12. Find the probability
that fewer than 7 field mice are found (a) on a given acre; (b) on 2 of the next 3 acres inspected.

Solution Notice that the number of field mice observed follows a Poisson process with mean rate λ = 12 mice per acre.


(a) Letting the RV X give the number of field mice found on a given acre, we have X ∼ Poisson(12), and its state space is
{0, 1, 2, 3, . . .}. Therefore, the required probability is

P(X < 7) = P(X = 6) + P(X = 5) + P(X = 4) + P(X = 3) + P(X = 2) + P(X = 1) + P(X = 0)
         = Σ_{i=0}^{6} Poisson(i; 12)
         = e^(−12) · Σ_{i=0}^{6} 12^i/i!
         ≈ 0.0458.

(b) By part (a), each inspected acre, independently, contains fewer than 7 field mice with probability p ≈ 0.0458. Hence,
letting the RV Y give the number of acres, out of the next 3 inspected, on which fewer than 7 field mice are found, we
have Y ∼ Bin(3, 0.0458). Therefore, the required probability is

P(Y = 2) = Bin(2; 3, 0.0458) = C(3, 2) · 0.0458^2 · (1 − 0.0458) ≈ 0.0060.

Example 77. Hospital administrators in large cities anguish about traffic in emergency rooms. At a particular hospital in a
large city, the staff on hand cannot accommodate the patient traffic if there are more than 10 emergency cases in a given hour.
It is assumed that patient arrival follows a Poisson process, and historical data suggest that, on the average, 5 emergencies
arrive per hour. (a) What is the probability that in a given hour the staff cannot accommodate the patient traffic? (b)
What is the probability that more than 20 emergencies arrive during a 3-hour shift?

Solution

(a) Letting the RV X give the number of patients arriving in the emergency room of that hospital per hour, we have
X ∼ Poisson(5), and its state space is {0, 1, 2, 3, . . .}. Therefore, the required probability is

P(X > 10) = 1 − P(X ≤ 10) = 1 − Σ_{i=0}^{10} Poisson(i; 5)
          = 1 − e^(−5) · Σ_{i=0}^{10} 5^i/i!
          ≈ 0.0137.

(b) Since the average number of emergencies arriving per hour is 5, the average number of emergencies arriving during a 3-hour
shift is 15. Letting the RV Y give the number of patients arriving in the emergency room of a particular hospital in that large
city per 3-hour shift, we have Y ∼ Poisson(15) and its state space is {0, 1, 2, 3, . . .}. Therefore, the required probability

is

P (Y > 20) = 1 − P (Y ≤ 20) = 1 − Σ_{i=0}^{20} P (Y = i)

           = 1 − Σ_{i=0}^{20} Poisson(i; 15)

           = 1 − e^−15 Σ_{i=0}^{20} 15^i / i!

           ≈ 0.0830.
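As a quick numerical check of both answers (a sketch, not from the notes; the function name is mine):

```python
import math

def poisson_sf(k, lam):
    # P(X > k) = 1 - P(X <= k) for X ~ Poisson(lambda)
    return 1 - sum(math.exp(-lam) * lam**i / math.factorial(i)
                   for i in range(k + 1))

print(round(poisson_sf(10, 5), 4))   # 0.0137 (staff overwhelmed in a given hour)
print(round(poisson_sf(20, 15), 4))  # 0.083, i.e. the 0.0830 above
```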

Example 78. The probability that a person will die when he or she contracts a viral infection is 0.004. Of the next 4000
people infected, what is the mean number who will die?

Solution    Notice that, since the number of infected people n = 4000 is large and the death probability p = 0.004 is small,
the number of deaths, X ∼ Bin(4000, 0.004), is well approximated by a Poisson distribution with mean λ = np = 16; that is,
letting the RV X give the number of deaths in a group of 4000 people infected by the virus, we have X ∼ Poisson(16)
approximately. Therefore, E(X) = 16, indicating that on average 16 deaths would occur in any group of 4000 infected people.
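The Poisson model here really is an approximation to the exact Bin(4000, 0.004) count; a sketch comparing the two PMFs (my own code, standard library only):

```python
import math

n, p = 4000, 0.004
lam = n * p   # Poisson mean: 16 expected deaths

def binom_pmf(x):
    # exact Bin(4000, 0.004) probability
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    # approximating Poisson(16) probability
    return math.exp(-lam) * lam**x / math.factorial(x)

# n is large and p is small, so the two PMFs nearly coincide
for x in (10, 16, 25):
    print(x, round(binom_pmf(x), 5), round(poisson_pmf(x), 5))
```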

The Multinomial Probability Distribution (Multinomial PD): Generalization of the Binomial PD

Multinomial Experiment
Any random experiment formed by independently repeating, a finite number of times, a single trial that has more than
two outcomes, each occurring with some constant probability, is called a Multinomial experiment.
+ Notice that, if a trial having k > 2 outcomes O1 , O2 , . . . , Ok is (independently) repeated n times, then each
outcome of the resulting Multinomial experiment may be presented as a sequence of n1 O1 s, n2 O2 s, ..., and nk
Ok s, where n1 + n2 + · · · + nk = n. Hence, the sample space of the resulting Multinomial experiment has k^n
outcomes.


Multinomial RVs
On the sample space of a Multinomial experiment, which is formed by independently repeating a trial having k > 2
outcomes, the ordered k-tuple of RVs, each of which assigns the number of occurrences of one and only one of those k
outcomes, is called the Multinomial RV.

+ Notice that, if a trial having k > 2 outcomes O1 , O2 , . . . , Ok (the trial outcomes), with constant occurrence
probabilities p1 , . . . , pk (the trial outcomes probabilities) satisfying p1 + · · · + pk = 1, is repeated n times
independently, then
- the Multinomial RV of the resulting (Multinomial) experiment, say (X1 , . . . , Xk ), is a DRV whose state
space may be presented as

{(x1 , . . . , xk ) | x1 , . . . , xk = 0, 1, 2, . . . , n & x1 + · · · + xk = n} ,

where (X1 = x1 , . . . , Xk = xk ) identifies the event consisting of the experimental outcomes that are sequences
of x1 O1 s, ..., and xk Ok s. Since there are

n! / (x1 ! · · · · · xk !)

such outcomes, which are mutually exclusive, we have

P [(X1 = x1 , . . . , Xk = xk )] = P (observing x1 O1 s, . . . , and xk Ok s)
    = [n! / (x1 ! · · · · · xk !)] · P (O1 ) · · · P (O1 ) · · · · · P (Ok ) · · · P (Ok )    (each factor P (Oi ) = pi appearing xi times)
    = [n! / (x1 ! · · · · · xk !)] · (p1^x1 × · · · × pk^xk ) ,

where x1 + · · · + xk = n and p1 + · · · + pk = 1.
+ Since the assigned probabilities to the values of a Multinomial RV are determined completely by the k trial outcomes
probabilities and the number of times the trial is repeated, p1 , . . . , pk and n are called the parameters of the
Multinomial distribution. So, instead of stating that "the RV (X1 , . . . , Xk ) has the Multinomial distribution with
trial outcomes probabilities p1 , . . . , pk , with the trial repeated n times", the following notation is used:

(X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ).


Probabilistic Properties of a Multinomial RV

If (X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ), then
• the PMF of (X1 , . . . , Xk ) is denoted and given by

Mn(x1 , . . . , xk ; n, p1 , . . . , pk ) := P [(X1 = x1 , . . . , Xk = xk )]

    = [n! / (x1 ! · · · · · xk !)] · p1^x1 × · · · × pk^xk    if xi = 0, 1, 2, . . . , n for i = 1, . . . , k, where x1 + · · · + xk = n;
    = 0    otherwise.

+ Notice that Mn(x1 , . . . , xk ; n, p1 , . . . , pk ) gives the probability of observing x1 O1 s, ..., and xk Ok s if a trial,
with k outcomes O1 , . . . , Ok occurring with probabilities p1 , . . . , pk , is independently repeated n times.
+ Notice that each component RV Xi in (X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ) has the Binomial distribution with parameters n
and pi :

Xi ∼ Bin(n, pi ).

• the mean/expected value of each component RV Xi in (X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ) is given by

E(Xi ) = npi .

• the variance of each component RV Xi in (X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ) is given by

Var(Xi ) = npi (1 − pi ).
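A minimal sketch of this PMF and of the Binomial-marginal property just stated (the function names and the small numerical example are my own, not from the notes):

```python
import math

def multinomial_pmf(counts, probs):
    # Mn(x1,...,xk; n, p1,...,pk) for counts (x1,...,xk) and probs (p1,...,pk)
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)   # exact integer multinomial coefficient
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p**x
    return coef * prob

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Marginal check: for (X1, X2, X3) ~ Mn(5, 0.2, 0.3, 0.5), X1 ~ Bin(5, 0.2),
# so summing the joint PMF over x2 (with x3 = 5 - x1 - x2) recovers Bin(5, 0.2).
n, probs = 5, (0.2, 0.3, 0.5)
for x1 in range(n + 1):
    marginal = sum(multinomial_pmf((x1, x2, n - x1 - x2), probs)
                   for x2 in range(n - x1 + 1))
    assert abs(marginal - binom_pmf(x1, n, probs[0])) < 1e-12
print("marginals match Bin(5, 0.2)")
```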


Some Computational Results for the Multinomial PD

• Generally speaking, for any positive integer n,

Σ_{x1 +···+xk = n} n! / (x1 ! · · · xk !) = k^n ,

where xi = 0, 1, . . . , n, for i = 1, . . . , k.
• Generally speaking, for any positive integer n (this is the Multinomial theorem),

Σ_{x1 +···+xk = n} [n! / (x1 ! · · · xk !)] · p1^x1 × · · · × pk^xk = (p1 + · · · + pk )^n ,

where pi ∈ R and xi = 0, 1, . . . , n, for i = 1, . . . , k.

• Given (X1 , . . . , Xk ) ∼ Mn(n, p1 , . . . , pk ), if the values of at least one of the Xi s are given, the corresponding cumulative
probability can still be evaluated. Indeed, without loss of generality, assume the values of the first r Xi s are given; that is,
X1 = x1′ , X2 = x2′ , . . . , and Xr = xr′ are known, but the values of Xr+1 , Xr+2 , . . . , Xk are unknown. In such a case,
summing the PMF over all possible values of the unknown components gives

P [X1 = x1′ , X2 = x2′ , . . . , Xr = xr′ ]
    = Σ_{xr+1 +···+xk = n − (x1′ +···+xr′ )} Mn(x1′ , . . . , xr′ , xr+1 , . . . , xk ; n, p1 , . . . , pk )
    = [n! / (x1′ ! · x2′ ! · · · xr′ ! · (n − (x1′ + · · · + xr′ ))!)] · p1^x1′ · p2^x2′ · · · pr^xr′ · (pr+1 + pr+2 + · · · + pk )^(n − (x1′ + · · · + xr′ )) .
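The last identity can be verified by brute force for small cases; this sketch (my own code, with made-up numbers) fixes X1 and sums out the remaining components:

```python
import math

def mn_pmf(counts, probs):
    # multinomial PMF: n!/(x1!...xk!) * p1^x1 ... pk^xk
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p**x
    return coef * prob

# (X1, X2, X3, X4) ~ Mn(6, 0.1, 0.2, 0.3, 0.4); fix X1 = 2, sum out X2, X3, X4
n, probs, x1 = 6, (0.1, 0.2, 0.3, 0.4), 2
m = n - x1
brute = sum(mn_pmf((x1, x2, x3, m - x2 - x3), probs)
            for x2 in range(m + 1) for x3 in range(m - x2 + 1))
# closed form from the identity: n!/(x1!(n-x1)!) * p1^x1 * (p2+p3+p4)^(n-x1)
closed = math.comb(n, x1) * probs[0]**x1 * sum(probs[1:])**m
print(brute, closed)   # both equal C(6,2) * 0.1^2 * 0.9^4
```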

Supplementary Examples
Example 79. In a consumer satisfaction survey the responses are "very unsatisfactory," "unsatisfactory," "average,"
"satisfactory," and "very satisfactory." If each of these responses is equally likely, what is the probability that in ten surveys
each answer will be selected twice?

Solution    Here, the trial is to observe which one of these five responses is chosen by a consumer, whose trial outcomes
and trial outcomes probabilities are
O1 := "very unsatisfactory" is chosen on the survey, P (O1 ) := p1 = 0.2;
O2 := "unsatisfactory" is chosen on the survey, P (O2 ) := p2 = 0.2;
O3 := "average" is chosen on the survey, P (O3 ) := p3 = 0.2;
O4 := "satisfactory" is chosen on the survey, P (O4 ) := p4 = 0.2;
O5 := "very satisfactory" is chosen on the survey, P (O5 ) := p5 = 0.2.

Now, let this trial be (independently) repeated n := 10 times while its trial outcomes probabilities remain fixed. Introducing
five RVs Xi (i = 1, . . . , 5), which give the number of times the Oi s occur in 10 repetitions of the trial respectively, the RV
(X1 , . . . , X5 ) is a Multinomial RV with

(X1 , . . . , X5 ) ∼ Mn (10, 0.2, 0.2, 0.2, 0.2, 0.2) ,

whose state space is {(x1 , . . . , x5 ) | xi = 0, 1, 2, . . . , 10 & x1 + · · · + x5 = 10}, having 1001 members.


Therefore, the probability that in ten surveys each answer will be selected twice is

P [(X1 = 2, X2 = 2, X3 = 2, X4 = 2, X5 = 2)] = Mn(2, 2, 2, 2, 2; 10, 0.2, 0.2, 0.2, 0.2, 0.2)

    = [10! / (2! · 2! · 2! · 2! · 2!)] · 0.2^2 · 0.2^2 · 0.2^2 · 0.2^2 · 0.2^2

    ≈ 0.0116.
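Numerically (a sketch of my own, standard library only):

```python
import math

# multinomial coefficient 10!/(2!^5), times the common factor 0.2^10
coef = math.factorial(10) // math.factorial(2)**5
prob = coef * 0.2**10
print(coef, round(prob, 4))   # 113400 0.0116
```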

Example 80. A company receives 60% of its orders over the Internet. Suppose that 30% of the orders received over the
Internet are large orders, and 40% of the orders received by other means are large orders. Out of eight independently placed
orders, what is the probability that two will be large orders received over the Internet, two will be small orders received over
the Internet, two will be large orders not received over the Internet, and two will be small orders not received over the Internet?

Solution    Here, the trial is to observe the status of a received order, whose trial outcomes and trial outcomes probabilities
are
O1 := Receiving a small order via the Internet, P (O1 ) := p1 = 0.42;
O2 := Receiving a large order via the Internet, P (O2 ) := p2 = 0.18;
O3 := Receiving a large order not via the Internet, P (O3 ) := p3 = 0.16;
O4 := Receiving a small order not via the Internet, P (O4 ) := p4 = 0.24.

Now, let this trial be (independently) repeated n := 8 times while its trial outcomes probabilities remain fixed. Introducing
four RVs Xi (i = 1, . . . , 4), which give the number of times the Oi s occur in 8 repetitions of the trial respectively, the RV
(X1 , . . . , X4 ) is a Multinomial RV with

(X1 , . . . , X4 ) ∼ Mn (8, 0.42, 0.18, 0.16, 0.24) ,

whose state space is {(x1 , . . . , x4 ) | xi = 0, 1, 2, . . . , 8 & x1 + · · · + x4 = 8}, having 165 members.


Therefore, the probability that two will be large orders received over the Internet, two will be small orders received over the
Internet, two will be large orders not received over the Internet, and two will be small orders not received over the Internet is

P [(X1 = 2, X2 = 2, X3 = 2, X4 = 2)] = Mn(2, 2, 2, 2; 8, 0.42, 0.18, 0.16, 0.24)

    = [8! / (2! · 2! · 2! · 2!)] · 0.42^2 · 0.18^2 · 0.16^2 · 0.24^2

    ≈ 0.0212.

Example 81. A researcher plants 22 seedlings. After one month, independent of the other seedlings, each seedling has a
probability of 0.08 of being dead, a probability of 0.19 of exhibiting slow growth, a probability of 0.42 of exhibiting medium
growth, and a probability of 0.31 of exhibiting strong growth. (a) What is the expected number of seedlings in each of
these four categories after one month? Calculate the probability that after one month: (b) Exactly three seedlings are dead,
exactly four exhibit slow growth, and exactly six exhibit medium growth. (c) Exactly five seedlings are dead, exactly five
exhibit slow growth, and exactly seven exhibit strong growth. (d) No more than two seedlings have died.

Solution Here, the trial is to observe the status of a planted seedling after one month, whose trial outcomes and trial
outcomes probabilities are
O1 := planted seedling is dead, P (O1 ) := p1 = 0.08;
O2 := planted seedling exhibits slow growth, P (O2 ) := p2 = 0.19;
O3 := planted seedling exhibits medium growth, P (O3 ) := p3 = 0.42;
O4 := planted seedling exhibits strong growth, P (O4 ) := p4 = 0.31.


Now, let this trial be (independently) repeated n := 22 times while its trial outcomes probabilities remain fixed. Introducing
four RVs Xi (i = 1, . . . , 4), which give the number of times the Oi s occur in 22 repetitions of the trial respectively, the RV
(X1 , . . . , X4 ) is a Multinomial RV with

(X1 , . . . , X4 ) ∼ Mn (22, 0.08, 0.19, 0.42, 0.31) ,

whose state space is {(x1 , . . . , x4 ) | xi = 0, 1, 2, . . . , 22 & x1 + · · · + x4 = 22}, having 2,300 members.


(a) The expected numbers of seedlings in the four categories after one month are

E(X1 ) = 22 · 0.08 = 1.76, E(X2 ) = 22 · 0.19 = 4.18, E(X3 ) = 22 · 0.42 = 9.24, E(X4 ) = 22 · 0.31 = 6.82,

indicating that, among the 22 planted seedlings, on average 1.76 would die and 4.18, 9.24, and 6.82 would exhibit slow,
medium, and strong growth, respectively, one month after planting.
(b) The probability that after one month exactly three seedlings are dead, exactly four exhibit slow growth, and exactly six
exhibit medium growth (so the remaining nine exhibit strong growth) is

P [(X1 = 3, X2 = 4, X3 = 6, X4 = 9)] = Mn(3, 4, 6, 9; 22, 0.08, 0.19, 0.42, 0.31)

    = [22! / (3! · 4! · 6! · 9!)] · 0.08^3 · 0.19^4 · 0.42^6 · 0.31^9

    ≈ 0.0029.

(c) The probability that exactly five seedlings are dead, exactly five exhibit slow growth, and exactly seven exhibit strong
growth (so the remaining five exhibit medium growth) is

P [(X1 = 5, X2 = 5, X3 = 5, X4 = 7)] = Mn(5, 5, 5, 7; 22, 0.08, 0.19, 0.42, 0.31)

    = [22! / (5! · 5! · 5! · 7!)] · 0.08^5 · 0.19^5 · 0.42^5 · 0.31^7

    ≈ 0.00038.

(d) The event that no more than two seedlings have died is presented as

{(x1 , x2 , x3 , x4 ) | x1 = 0, 1, 2 & x2 , x3 , x4 = 0, 1, 2, . . . , 22 & x1 + x2 + x3 + x4 = 22} ,

which is the union of the following mutually exclusive events:

A0 := {(0, x2 , x3 , x4 ) | x2 , x3 , x4 = 0, 1, 2, . . . , 22 & x2 + x3 + x4 = 22}
      (the event that no planted seedling, among 22, would die after one month);
A1 := {(1, x2 , x3 , x4 ) | x2 , x3 , x4 = 0, 1, 2, . . . , 21 & x2 + x3 + x4 = 21}
      (the event that one planted seedling, among 22, would die after one month);
A2 := {(2, x2 , x3 , x4 ) | x2 , x3 , x4 = 0, 1, 2, . . . , 20 & x2 + x3 + x4 = 20}
      (the event that two planted seedlings, among 22, would die after one month).

Note that event A0 consists of 276 mutually exclusive outcomes, event A1 consists of 253 mutually exclusive outcomes,
and event A2 consists of 231 mutually exclusive outcomes. And

P (A0 ) = Σ_{x2 +x3 +x4 = 22} Mn(0, x2 , x3 , x4 ; 22, 0.08, 0.19, 0.42, 0.31)
        = Σ_{x2 +x3 +x4 = 22} [22! / (x2 ! · x3 ! · x4 !)] · 0.08^0 · 0.19^x2 · 0.42^x3 · 0.31^x4
        = (0.19 + 0.42 + 0.31)^22 = 0.92^22 ,


P (A1 ) = Σ_{x2 +x3 +x4 = 21} Mn(1, x2 , x3 , x4 ; 22, 0.08, 0.19, 0.42, 0.31)
        = Σ_{x2 +x3 +x4 = 21} [22! / (1! · x2 ! · x3 ! · x4 !)] · 0.08^1 · 0.19^x2 · 0.42^x3 · 0.31^x4
        = 0.08 · 22 · Σ_{x2 +x3 +x4 = 21} [21! / (x2 ! · x3 ! · x4 !)] · 0.19^x2 · 0.42^x3 · 0.31^x4
        = 0.08 · 22 · (0.19 + 0.42 + 0.31)^21 = 0.08 · 22 · 0.92^21 ,

P (A2 ) = Σ_{x2 +x3 +x4 = 20} Mn(2, x2 , x3 , x4 ; 22, 0.08, 0.19, 0.42, 0.31)
        = Σ_{x2 +x3 +x4 = 20} [22! / (2! · x2 ! · x3 ! · x4 !)] · 0.08^2 · 0.19^x2 · 0.42^x3 · 0.31^x4
        = 0.08^2 · 231 · Σ_{x2 +x3 +x4 = 20} [20! / (x2 ! · x3 ! · x4 !)] · 0.19^x2 · 0.42^x3 · 0.31^x4
        = 0.08^2 · 231 · (0.19 + 0.42 + 0.31)^20 = 0.08^2 · 231 · 0.92^20 ,

where 231 = 22! / (2! · 20!).

Therefore, the probability that no more than two of the 22 seedlings have died after one month is
P (A0 ) + P (A1 ) + P (A2 ) ≈ 0.7442.
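Since each component is marginally Binomial (here X1 ∼ Bin(22, 0.08)), part (d) collapses to three Binomial terms; a sketch (my own code):

```python
import math

# P(at most two of 22 seedlings die): the three surviving categories collapse
# into probability 0.19 + 0.42 + 0.31 = 0.92, so X1 ~ Bin(22, 0.08)
p = sum(math.comb(22, j) * 0.08**j * 0.92**(22 - j) for j in range(3))
print(round(p, 4))   # 0.7442
```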

Example 82. A garage sells three types of tires: type A, type B, and type C. A customer purchases type A with probability
0.23, type B with probability 0.48, and type C with probability 0.29. (a) What is the probability that the next 11 customers
purchase four sets of type A, five sets of type B, and two sets of type C? (b) What is the probability that fewer than three
sets of type A are sold to the next seven customers?

Solution    Here, the trial is to randomly select and purchase one of the given three types of tires by a customer, whose
trial outcomes and trial outcomes probabilities are
A := One set of type A tires is randomly selected and purchased, P (A) := pA = 0.23;
B := One set of type B tires is randomly selected and purchased, P (B) := pB = 0.48;
C := One set of type C tires is randomly selected and purchased, P (C) := pC = 0.29.

(a) Now, let this trial be (independently) repeated n := 11 times while its trial outcomes probabilities remain fixed.
Introducing three RVs XA , XB , and XC , which give the number of times A, B , or C occurs in 11 repetitions of the trial,
the RV (XA , XB , XC ) is a Multinomial RV with

(XA , XB , XC ) ∼ Mn(11, 0.23, 0.48, 0.29),

whose state space is {(xA , xB , xC ) | xA , xB , xC = 0, 1, 2, . . . , 11 & xA + xB + xC = 11}, having 78 members.


Therefore, the required probability is

P [(XA = 4, XB = 5, XC = 2)] = Mn(4, 5, 2; 11, 0.23, 0.48, 0.29)

    = [11! / (4! · 5! · 2!)] · 0.23^4 · 0.48^5 · 0.29^2

    ≈ 0.0416.

(b) Now, let this trial be (independently) repeated n := 7 times while its trial outcomes probabilities remain fixed.
Introducing three RVs YA , YB , and YC , which give the number of times A, B , or C occurs in 7 repetitions of the trial,
the RV (YA , YB , YC ) is a Multinomial RV with

(YA , YB , YC ) ∼ Mn(7, 0.23, 0.48, 0.29),

whose state space is {(yA , yB , yC ) | yA , yB , yC = 0, 1, 2, . . . , 7 & yA + yB + yC = 7}, having 36 members.

Hence, the event that fewer than three sets of type A are sold to the next seven customers is presented as

{(yA , yB , yC ) | yA = 0, 1, 2 & yB , yC = 0, 1, 2, . . . , 7 & yA + yB + yC = 7} ,


which is the union of the following mutually exclusive events:

A0 := {(0, yB , yC ) | yB , yC = 0, 1, 2, . . . , 7 & yB + yC = 7}
      (the event that no set of type A tires is purchased by the 7 customers);
A1 := {(1, yB , yC ) | yB , yC = 0, 1, 2, . . . , 6 & yB + yC = 6}
      (the event that one set of type A tires is purchased by the 7 customers);
A2 := {(2, yB , yC ) | yB , yC = 0, 1, 2, . . . , 5 & yB + yC = 5}
      (the event that two sets of type A tires are purchased by the 7 customers).

Note that event A0 consists of 8 mutually exclusive outcomes, event A1 consists of 7 mutually exclusive outcomes, and
event A2 consists of 6 mutually exclusive outcomes. And

P (A0 ) = Σ_{yB +yC = 7} Mn(0, yB , yC ; 7, 0.23, 0.48, 0.29)
        = Σ_{yB +yC = 7} [7! / (yB ! · yC !)] · 0.23^0 · 0.48^yB · 0.29^yC
        = (0.48 + 0.29)^7 = 0.77^7 ,

P (A1 ) = Σ_{yB +yC = 6} Mn(1, yB , yC ; 7, 0.23, 0.48, 0.29)
        = Σ_{yB +yC = 6} [7! / (1! · yB ! · yC !)] · 0.23^1 · 0.48^yB · 0.29^yC
        = 0.23 · 7 · Σ_{yB +yC = 6} [6! / (yB ! · yC !)] · 0.48^yB · 0.29^yC
        = 0.23 · 7 · (0.48 + 0.29)^6 = 0.23 · 7 · 0.77^6 ,

P (A2 ) = Σ_{yB +yC = 5} Mn(2, yB , yC ; 7, 0.23, 0.48, 0.29)
        = Σ_{yB +yC = 5} [7! / (2! · yB ! · yC !)] · 0.23^2 · 0.48^yB · 0.29^yC
        = 0.23^2 · 21 · Σ_{yB +yC = 5} [5! / (yB ! · yC !)] · 0.48^yB · 0.29^yC
        = 0.23^2 · 21 · (0.48 + 0.29)^5 = 0.23^2 · 21 · 0.77^5 ,

where 21 = 7! / (2! · 5!).

Therefore, the probability that fewer than three sets of type A are sold to the next seven customers is
P (A0 ) + P (A1 ) + P (A2 ) ≈ 0.7967.
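The same Binomial collapse checks this answer numerically (a sketch of mine):

```python
import math

# YA ~ Bin(7, 0.23): types B and C collapse into probability 0.48 + 0.29 = 0.77
p = sum(math.comb(7, j) * 0.23**j * 0.77**(7 - j) for j in range(3))
print(round(p, 4))   # 0.7967
```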

Example 83. A fair die is rolled 15 times. Calculate the probability that there are: (a) Exactly three 6s and three 5s; (b)
Exactly three 6s, three 5s, and four 4s; (c) Exactly two 6s. (d) What is the expected number of 6s obtained?

Solution    Here, the trial is to roll a fair die once, whose trial outcomes and trial outcomes probabilities are

O1 := 1 is faced up, P (O1 ) := p1 = 1/6;  O2 := 2 is faced up, P (O2 ) := p2 = 1/6;  O3 := 3 is faced up, P (O3 ) := p3 = 1/6;
O4 := 4 is faced up, P (O4 ) := p4 = 1/6;  O5 := 5 is faced up, P (O5 ) := p5 = 1/6;  O6 := 6 is faced up, P (O6 ) := p6 = 1/6.

Now, let this trial be (independently) repeated n := 15 times while its trial outcomes probabilities remain fixed. Introducing
six RVs Xi (i = 1, . . . , 6), which give the number of times the Oi s occur in 15 repetitions of the trial respectively, the RV
(X1 , . . . , X6 ) is a Multinomial RV with

(X1 , . . . , X6 ) ∼ Mn (15, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6) ,

whose state space is {(x1 , . . . , x6 ) | xi = 0, 1, 2, . . . , 15 & x1 + · · · + x6 = 15}, having 15,504 members.


(a) The event of observing exactly three 6s and three 5s in 15 rolls of a die is presented as

{(x1 , x2 , x3 , x4 , 3, 3) | x1 , . . . , x4 = 0, 1, 2, . . . , 9 & x1 + · · · + x4 = 9} ,

whose probability is

Σ_{x1 +···+x4 = 9} Mn(x1 , x2 , x3 , x4 , 3, 3; 15, 1/6, . . . , 1/6)
    = Σ_{x1 +···+x4 = 9} [15! / (x1 ! · x2 ! · x3 ! · x4 ! · 3! · 3!)] · (1/6)^x1 · (1/6)^x2 · (1/6)^x3 · (1/6)^x4 · (1/6)^3 · (1/6)^3
    = (1/6)^15 · Σ_{x1 +···+x4 = 9} 15! / (x1 ! · x2 ! · x3 ! · x4 ! · 3! · 3!)
    = (1/6)^15 · 100100 · Σ_{x1 +···+x4 = 9} 9! / (x1 ! · x2 ! · x3 ! · x4 !)
    = (1/6)^15 · 100100 · 4^9
    ≈ 0.0558,

where 100100 = 15! / (9! · 3! · 3!).

(b) The event of observing exactly three 6s, three 5s, and four 4s in 15 rolls of a die is presented as

{(x1 , x2 , x3 , 4, 3, 3) | x1 , x2 , x3 = 0, 1, 2, 3, 4, 5 & x1 + x2 + x3 = 5} ,

whose probability is

Σ_{x1 +x2 +x3 = 5} Mn(x1 , x2 , x3 , 4, 3, 3; 15, 1/6, . . . , 1/6)
    = (1/6)^15 · Σ_{x1 +x2 +x3 = 5} 15! / (x1 ! · x2 ! · x3 ! · 4! · 3! · 3!)
    = (1/6)^15 · 12612600 · Σ_{x1 +x2 +x3 = 5} 5! / (x1 ! · x2 ! · x3 !)
    = (1/6)^15 · 12612600 · 3^5
    ≈ 0.0065,

where 12612600 = 15! / (5! · 4! · 3! · 3!).

(c) The event of observing exactly two 6s in 15 rolls of a die is presented as

{(x1 , x2 , x3 , x4 , x5 , 2) | x1 , . . . , x5 = 0, 1, . . . , 13 & x1 + · · · + x5 = 13} ,


whose probability is

Σ_{x1 +···+x5 = 13} Mn(x1 , x2 , x3 , x4 , x5 , 2; 15, 1/6, . . . , 1/6)
    = (1/6)^15 · Σ_{x1 +···+x5 = 13} 15! / (x1 ! · x2 ! · x3 ! · x4 ! · x5 ! · 2!)
    = (1/6)^15 · 105 · Σ_{x1 +···+x5 = 13} 13! / (x1 ! · x2 ! · x3 ! · x4 ! · x5 !)
    = (1/6)^15 · 105 · 5^13
    ≈ 0.2726,

where 105 = 15! / (13! · 2!).

(d) The expected number of 6s obtained is

E(X6 ) = 15 · (1/6) = 2.5,

indicating that, on average, 2.5 sixes would be observed in each set of 15 rolls.
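Parts (a), (b), and (c) can be checked with the closed forms derived above (a sketch of mine; the variable names are made up):

```python
import math

sixth = 1 / 6

# (a) exactly three 6s and three 5s: (1/6)^15 * [15!/(9!*3!*3!)] * 4^9
pa = (math.factorial(15) // (math.factorial(9) * 36)) * 4**9 / 6**15
# (b) exactly three 6s, three 5s, and four 4s: (1/6)^15 * [15!/(5!*4!*3!*3!)] * 3^5
pb = (math.factorial(15) // (math.factorial(4) * math.factorial(5) * 36)) * 3**5 / 6**15
# (c) exactly two 6s: a plain Binomial computation, C(15,2)*(1/6)^2*(5/6)^13
pc = math.comb(15, 2) * sixth**2 * (5 * sixth)**13

print(round(pa, 4), round(pb, 4), round(pc, 4))   # 0.0558 0.0065 0.2726
```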

