Notes STA408 - Chapter 2 PDF
Sampling Distribution
Outline
2.1 Definition
2.2 Central Limit Theorem
Objectives
At the end of this chapter, students should be able to:
1. Understand the difference between a population and a sample.
2. Understand the Central Limit Theorem.
2.1 Definition
(a) Population
A population is any entire collection of objects or people from which we may collect data. It is
the entire group we are interested in, which we wish to describe or draw conclusions about.
(b) Sample
A sample is a subset of objects from a larger population.
(c) Experiment
An experiment is any process or study which results in the collection of data, the outcome of
which is unknown. In statistics, the term is usually restricted to situations in which the
researcher has control over some of the conditions under which the experiment takes place.
(d) Parameter
A parameter is a value, usually unknown (and which therefore has to be estimated), used to
represent a certain population characteristic. For example, the population mean is a
parameter that is often used to indicate the average value of a quantity.
(e) Statistic
A sample statistic is a numerical descriptive measure of a sample. It is calculated from
the observations in the sample and is used to give information about unknown values in the
corresponding population. Hence, a statistic is used to estimate an unknown population
parameter. For example, the average of the data in a sample is used to give information
about the overall average in the population from which that sample was drawn.
The sampling distribution of the sample mean is the distribution of the means computed from
all possible random samples of a specific size taken from a population.
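The definitions above can be illustrated with a minimal sketch. The five-value population below is hypothetical (it is not taken from these notes), and samples are drawn without replacement purely to keep the list of all possible samples short. The population mean plays the role of the parameter, each sample mean is a statistic, and the full list of sample means is the sampling distribution of the sample mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five values (illustrative only).
population = [2, 4, 6, 8, 10]
mu = mean(population)  # population mean: a parameter

n = 2  # sample size
# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# One sample mean per sample: each value is a statistic.
sample_means = [mean(s) for s in samples]

# The list sample_means is the sampling distribution of the sample mean.
mean_of_sample_means = mean(sample_means)

print("population mean:", mu)
print("number of possible samples:", len(samples))
print("sample means:", sorted(sample_means))
print("mean of the sample means:", mean_of_sample_means)
```

In this small case the mean of all the sample means coincides with the population mean, which is why the sample mean is a reasonable estimator of the population mean.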
2.2 Central Limit Theorem
In many cases we can approximate the distribution of the sample mean by a normal
distribution when the sample size (n) is large. This result is called the Central Limit
Theorem. A sample size of 30 or more is considered large enough by most statisticians
for the Central Limit Theorem to apply.
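Let $x_1, x_2, \ldots, x_n$ be a random sample drawn from an arbitrary distribution with mean $\mu$ and
variance $\sigma^2$. Then the distribution of the sample mean, $\bar{X}$, will approach a normal
distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. We may write it as:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or} \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$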
For the Central Limit Theorem:
1. The sample mean will be normally distributed when the original variable is normally
distributed, regardless of the sample size, n.
2. The sample mean will be approximately normally distributed even when the original
variable is not normally distributed. The approximation becomes more accurate
as the sample size, n, increases.
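A small simulation can make the second case concrete. The sketch below is an illustration under stated assumptions, not part of the original notes: the population is taken to be exponential with rate 1 (so its mean and standard deviation are both 1), the sample size is 30, and 5000 samples are drawn. It compares the mean and standard deviation of the simulated sample means with the values the theorem predicts.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, a skewed, non-normal
# distribution whose mean and standard deviation are both 1.
mu, sigma = 1.0, 1.0
n = 30          # size of each sample
repeats = 5000  # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

observed_mean = mean(sample_means)
observed_sd = stdev(sample_means)
expected_sd = sigma / n ** 0.5  # standard deviation predicted by the theorem

print(f"mean of sample means: {observed_mean:.3f} (theory: {mu})")
print(f"sd of sample means:   {observed_sd:.3f} (theory: {expected_sd:.3f})")
```

Re-running with a larger n should shrink the spread of the sample means roughly in proportion to the square root of n.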
Example 1
The average time that engineering workers spend working on weekends is 7.23 hours. Assume
the distribution is approximately normal with a standard deviation of 0.6 hours.
(i) Find the probability that an individual in that trade works fewer than 7 hours
on the weekend.
(ii) If a sample of 50 workers is randomly selected, find the probability that the mean of
the sample will be less than 7 hours.
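Solution:
(i) $X \sim N(\mu, \sigma^2)$, so $X \sim N(7.23, 0.6^2)$
$$P(X < 7) = P\left(Z < \frac{7 - 7.23}{0.6}\right) = P(Z < -0.38) = 0.3520$$
(ii) $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so $\bar{X} \sim N\left(7.23, \frac{0.6^2}{50}\right)$
$$P(\bar{X} < 7) = P\left(Z < \frac{7 - 7.23}{0.6/\sqrt{50}}\right) = P(Z < -2.71) = 0.00336$$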
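The two probabilities in Example 1 can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; rounding each z-value to two decimals is an assumption made here to mimic a printed normal table, so the results should agree closely with the table values quoted in the solution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.23, 0.6       # population mean and standard deviation (hours)
std_normal = NormalDist()   # Z ~ N(0, 1)

# (i) One worker: standardise with sigma.
z_single = round((7 - mu) / sigma, 2)   # rounded as for a table lookup
p_single = std_normal.cdf(z_single)

# (ii) Mean of n = 50 workers: standardise with sigma / sqrt(n).
n = 50
z_mean = round((7 - mu) / (sigma / sqrt(n)), 2)
p_mean = std_normal.cdf(z_mean)

print(f"(i)  z = {z_single}, P(X < 7)    = {p_single:.4f}")
print(f"(ii) z = {z_mean}, P(Xbar < 7) = {p_mean:.5f}")
```

The only difference between the two parts is the divisor: the standard deviation for a single observation, and the standard error for the sample mean.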
Example 2
The daily average air pollution index (API) in XYZ country is 94. Assume the standard
deviation is 18. If a random sample of 32 days is selected, find the probability that their
mean API is between 84 and 101.
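Solution:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so $\bar{X} \sim N\left(94, \frac{18^2}{32}\right)$
$$P(84 < \bar{X} < 101) = P\left(\frac{84 - 94}{18/\sqrt{32}} < Z < \frac{101 - 94}{18/\sqrt{32}}\right) = P(-3.14 < Z < 2.20)$$
$$= 1 - 0.00084 - 0.0139 = 0.98526$$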
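As a cross-check on Example 2, the sampling distribution of the mean can be built directly as a normal distribution with mean 94 and standard error 18/√32. This is a minimal sketch using the standard library's `NormalDist`; it does not round the z-values, so it may differ from the table-based answer in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 94, 18, 32

# Sampling distribution of the sample mean: N(mu, sigma^2 / n).
# NormalDist takes the standard deviation, i.e. the standard error here.
standard_error = sigma / sqrt(n)
xbar = NormalDist(mu, standard_error)

# P(84 < Xbar < 101) as a difference of two cumulative probabilities.
p_between = xbar.cdf(101) - xbar.cdf(84)

print(f"standard error = {standard_error:.4f}")
print(f"P(84 < Xbar < 101) = {p_between:.5f}")
```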
Exercises
Chapter 2 Sampling Distribution
1. The length of long-distance telephone calls has a mean of 20 minutes and a standard
deviation of 5 minutes. Suppose a sample of 45 telephone calls is used to reflect on the
population of all long-distance calls.
a) What is the chance that the average talk time of the 45 calls is between 18 and 19
minutes?
b) What theorem do we need in order to solve (a)?
4. A drinks machine dispenses lemonade into cups. It is electronically controlled to cut off
the flow of lemonade. The volume dispensed into each cup varies according to a normal
distribution with a mean of 185 ml and a standard deviation of 4 ml.
a) Find the mean and standard deviation of the sampling distribution of the sample
mean of seven randomly selected cups.
b) What can you say about the shape of the sampling distribution of the sample
mean? State the theorem used.
c) What is the probability that the mean volume of the seven selected cups is less than
181 ml?
d) What is the probability that the mean volume of the seven selected cups is more than
188 ml?
[Answer: a) mean =185, standard error = 1.51 b) Normal distribution, Central Limit theorem
c) 0.004 d) 0.0234 ]
5. A previous study showed that the tire lifetime for a particular brand of vehicle has a
mean of 80,000 km and a standard deviation of 35,000 km.
a) What would be the distribution, mean and standard error of the mean lifetime of a
random sample of 60 vehicles?
b) What is the probability that the sample mean lifetime for these 50 vehicles exceeds
88,500 km?
1. The mean weight of a box of cereal filled by a machine is 425.0 grams, with a standard
deviation of 11.3 grams. The weight is assumed to be normally distributed.
a) What is the probability that a box of cereal weighs more than 450 grams?
b) If the weight of a box of cereal is less than m grams, the box is rejected. Estimate the
value of m if 11.5% of the boxes are rejected.
c) In a sample of 150 boxes, find the probability that the mean weight of the boxes is
over 427 grams.
[Answer: a) 0.013 b) 411.4 c) 0.015 ]
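The answers quoted for this exercise can be reproduced with a short sketch, given here as a way to check working rather than as the intended solution method. Part (b) runs in the opposite direction from the worked examples: it starts from a probability and recovers a cut-off value, which `NormalDist.inv_cdf` (the inverse of the cumulative distribution function) handles. Small differences from the quoted answers are expected because no z-values are rounded.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 425.0, 11.3          # box weight in grams
weight = NormalDist(mu, sigma)   # distribution of a single box

# a) P(X > 450): upper-tail probability for one box.
p_a = 1 - weight.cdf(450)

# b) Find m with P(X < m) = 0.115, using the inverse CDF.
m = weight.inv_cdf(0.115)

# c) P(Xbar > 427) for n = 150: use the standard error sigma / sqrt(n).
n = 150
xbar = NormalDist(mu, sigma / sqrt(n))
p_c = 1 - xbar.cdf(427)

print(f"a) P(X > 450)    = {p_a:.4f}")
print(f"b) m             = {m:.1f} g")
print(f"c) P(Xbar > 427) = {p_c:.4f}")
```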
2. A firm claims that the life of the LED light bulbs it manufactures follows a normal
distribution with a mean of 45000 hours and a standard deviation of 9000 hours.
a) What is the probability that a randomly chosen LED light bulb lasts less than 35000
hours?
b) A sample of 27 LED light bulbs is randomly drawn. Find the probability that the mean
life of the bulbs is at least 50000 hours.
3. A catalog advertises pure honey from Cameron Highlands in 300 g jars, 500 g jars and
1 kg jars.
a) The weight, Q g, of pure honey in a 300 g jar may be modelled by a normal
distribution with a mean of 305 g and a variance of 30 g². Find the probability that the
weight of pure honey in a 300 g jar is more than 320 g.
b) The actual weight, R kg, of pure honey in a 1 kg jar may be modelled by a normal
distribution with a mean of 1.25 kg and a variance of 20 kg². A random sample of seven
1 kg jars is selected. Find the probability that the sample mean weight of pure honey
is less than 1.4 kg.
c) The weight, S g, of pure honey in a 500 g jar may be modelled by a normal
distribution with a mean of 521 g and a standard deviation of 10 g. A random sample of
ten 500 g jars is selected. Find the probability that none of the ten jars contains
between 515 g and 522 g of honey.