Lecture No. Probability & Statistics
Sampling Distribution of p̂
Sampling Distribution of X̄₁ − X̄₂
TOPICS FOR TODAY
Sampling Distribution of X̄₁ − X̄₂
(continued)
Sampling Distribution of p̂₁ − p̂₂
Point Estimation
Desirable Qualities of a Good Point
Estimator
–Unbiasedness
–Consistency
EXAMPLE: Car batteries produced by
company A have a mean life of 4.3 years with
a standard deviation of 0.6 years.
A similar battery produced by company B
has a mean life of 4.0 years and a standard
deviation of 0.4 years.
What is the probability that a random
sample of 49 batteries from company A will
have a mean life of at least 0.5 years more
than the mean life of a sample of 36 batteries
from company B?
SOLUTION: We are given the following data:
Population A:
μ₁ = 4.3 years, σ₁ = 0.6 years, n₁ = 49
Population B:
μ₂ = 4.0 years, σ₂ = 0.4 years, n₂ = 36
Both sample sizes (n₁ = 49, n₂ = 36) are large
enough to assume that the sampling distribution
of the differences is approximately normal,
such that:
Mean:
μ_{X̄₁ − X̄₂} = μ₁ − μ₂ = 4.3 − 4.0 = 0.3 years
Standard error:
σ_{X̄₁ − X̄₂} = √(σ₁²/n₁ + σ₂²/n₂) = √((0.6)²/49 + (0.4)²/36) ≈ 0.1086 years
Hence
Z = ((X̄₁ − X̄₂) − 0.3) / 0.1086
is approximately N(0, 1).
We are required to find the probability that the
mean life of a sample of 49 batteries produced by
company A will be at least 0.5 years longer than
the mean life of a sample of 36 batteries produced
by company B, i.e. we are required to find:
P(X̄₁ − X̄₂ ≥ 0.5).
Transforming X̄₁ − X̄₂ = 0.5 to a z-value, we find:
z = (0.5 − 0.3) / 0.1086 = 1.84
[Figure: normal curve of X̄₁ − X̄₂ centered at 0.3, with the area to the right of 0.5 (z = 1.84) shaded.]
P(X̄₁ − X̄₂ ≥ 0.5) = P(Z ≥ 1.84)
= 0.5 − P(0 ≤ Z ≤ 1.84)
= 0.5 − 0.4671
= 0.0329
In other words, (given that the real
difference between the mean lifetimes of
batteries of company A and batteries of
company B is 4.3 - 4.0 = 0.3 years), the
probability that a sample of 49 batteries
produced by company A will have
a mean life of at least 0.5 years longer than the
mean life of a sample of 36 batteries produced
by company B, is only 3.3%.
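As a quick check on the arithmetic above, the same probability can be computed with the Python standard library. This is only a sketch of the calculation in the example; the variable names are ours, not part of the lecture:

```python
import math

def normal_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Data from the example
mu1, sigma1, n1 = 4.3, 0.6, 49   # company A
mu2, sigma2, n2 = 4.0, 0.4, 36   # company B

mean_diff = mu1 - mu2                                  # 0.3 years
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # ~0.1086 years

z = (0.5 - mean_diff) / se_diff                        # ~1.84
prob = normal_tail(z)                                  # ~0.033, matching the 3.3% above
print(round(z, 2), round(prob, 4))
```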
Next, we consider the Sampling Distribution
of the Differences between Proportions.
Suppose there are two binomial
populations with proportions of successes p₁
and p₂ respectively.
Let independent random samples of sizes n₁
and n₂ be drawn from the respective
populations, and the differences p̂₁ − p̂₂
between the proportions of all possible
pairs of samples be computed.
Then, a probability distribution of the
differences p̂₁ − p̂₂ can be obtained.
Such a probability distribution is
called the sampling distribution
of the differences between the
proportions p̂₁ − p̂₂.
We illustrate the sampling
distribution of p̂₁ − p̂₂
with the help of the following example:
EXAMPLE: It is claimed that 30% of the
households in Community A and 20% of the
households in Community B have at least one
teenager.
A simple random sample of 100
households from each community yields the
following results:
p̂_A = 0.34, p̂_B = 0.13.
What is the probability of observing a
difference this large or larger if the claims
are true?
SOLUTION: We assume that if the
claims are true, the sampling distribution of
p̂_A − p̂_B is approximately normally
distributed.
Since we are reasonably confident that our
sampling distribution is approximately normal,
we can find any required probability by
computing the relevant area under the normal
curve; in order to do so, we first need to
convert our variable p̂_A − p̂_B to Z.
In order to convert p̂_A − p̂_B to Z, we need
the values of μ_{p̂_A − p̂_B} as well as σ_{p̂_A − p̂_B}. Now
μ_{p̂₁ − p̂₂} = p₁ − p₂ and
σ_{p̂₁ − p̂₂} = √(p₁q₁/n₁ + p₂q₂/n₂),
where q = 1 – p.
Hence, in this example, we have:
μ_{p̂_A − p̂_B} = 0.30 − 0.20 = 0.10 and
σ²_{p̂_A − p̂_B} = (0.30)(0.70)/100 + (0.20)(0.80)/100 = 0.0037
The observed difference in sample proportions is
p̂_A − p̂_B = 0.34 − 0.13 = 0.21.
The probability that we wish to determine is
represented by the area to the right of 0.21 in the
sampling distribution of p̂_A − p̂_B.
To find this area, we compute
z = (0.21 − 0.10) / √0.0037 = 0.11 / 0.06 ≈ 1.83
[Figure: normal curve of p̂_A − p̂_B centered at 0.10, with the area to the right of 0.21 (z = 1.83) shaded.]
The area between z = 0 and z = 1.83 is 0.4664.
Hence, the area to the right of z = 1.83 is
0.0336, i.e. 3.36%, which is the probability of
observing a difference as large as or larger
than the one actually observed.
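The same computation can be sketched in Python. Note that the hand calculation above rounds √0.0037 to 0.06; carrying full precision gives z ≈ 1.81 and a tail area of about 0.035, slightly different from the 0.0336 obtained with the rounded value:

```python
import math

def normal_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

pA, pB, n = 0.30, 0.20, 100        # claimed proportions, sample size per community
obs_diff = 0.34 - 0.13             # observed difference, 0.21

mean_diff = pA - pB                                 # 0.10
se = math.sqrt(pA*(1 - pA)/n + pB*(1 - pB)/n)       # sqrt(0.0037), ~0.0608
z = (obs_diff - mean_diff) / se                     # ~1.81 (1.83 if se is rounded to 0.06)
prob = normal_tail(z)                               # ~0.035
print(round(se, 4), round(z, 2), round(prob, 3))
```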
The students are encouraged to try to
interpret this result with reference to the
situation at hand, as, in attempting to solve a
statistical problem, it is very important not just
to apply various formulae and obtain numerical
results, but to interpret the results with
reference to the problem under consideration.
Does the result indicate that at least one
of the two claims is untrue, or does it imply
something else?
1) We have discussed various sampling
distributions with reference to the simplest
technique of random sampling, i.e. simple
random sampling.
And, with reference to simple random
sampling, it should be kept in mind that this
sampling technique is appropriate when the
population is homogeneous.
2) Let us consider the reason why the
standard deviation of the sampling distribution
of any statistic is known as its standard error:
Consider the fact that any statistic, considered
as an estimate of the corresponding population
parameter, should be as close in magnitude to
the parameter as possible.
The difference between the value of the statistic
and the value of the parameter can be regarded
as an error and is called ‘sampling error’.
Geometrically, each one of these errors
can be represented by a horizontal line
segment below the X-axis, as shown below:
[Figure: several values x̄₁, x̄₂, …, x̄₆ of the sample mean plotted around μ on the X-axis, with each sampling error drawn as a horizontal segment.]
The above diagram clearly indicates that
there are various magnitudes of this error,
depending on how far or how close the values
of our statistic are in different samples.
The standard deviation of X̄ gives us a
‘standard’ value of this error, and hence the
term ‘Standard Error’.
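The fact that the standard deviation of X̄ measures a ‘standard’ size of sampling error can be seen in a small simulation: the standard deviation of many simulated sample means comes out close to σ/√n. The population parameters (μ = 50, σ = 10) and the sample size below are our own illustrative choices:

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 50.0, 10.0, 25    # hypothetical population and sample size

# Draw many samples of size n, recording each sample mean
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

observed_se = statistics.stdev(sample_means)   # spread of the sample means
theoretical_se = sigma / n**0.5                # sigma / sqrt(n) = 2.0
print(round(observed_se, 2), theoretical_se)
```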
Having presented the basic ideas
regarding sampling distributions, we now
begin the discussion regarding POINT
ESTIMATION:
POINT ESTIMATION
Point estimation of a population parameter
provides an estimate, a single value calculated
from the sample, that is likely to be close in
magnitude to the unknown parameter.
The difference between ‘Estimate’ and
‘Estimator’:
An estimate is a numerical value of the
unknown parameter obtained by applying a rule
or a formula, called an estimator, to a sample
X1, X2, …, Xn of size n, taken from a
population.
In other words, an estimator stands for the rule
or method that is used to estimate a parameter
whereas an estimate stands for the numerical
value obtained by substituting the sample
observations in the rule or the formula.
If X₁, X₂, …, Xₙ is a random sample of
size n from a population with mean μ, then
X̄ = (1/n) ΣXᵢ
is an estimator of μ, and x̄, the numerical
value of X̄, is an estimate of μ
(i.e. a point estimate of μ).
In general, θ (the Greek letter theta) is
customarily used to denote an unknown
parameter that could be a mean, median,
proportion or standard deviation, while an
estimator of θ is commonly denoted by
θ̂, or sometimes by T.
It is important to note that an estimator is
always a statistic which is a function of the
sample observations and hence is a random
variable as the sample observations are likely
to vary from sample to sample.
In other words:
In repeated sampling, an estimator is a
random variable, and has a probability
distribution, which is known as its sampling
distribution.
Having presented the basic definition of a
point estimator, we now consider some
desirable qualities of a good point estimator:
In this regard, the point to be understood
is that a point estimator is considered a good
estimator if it satisfies various criteria.
UNBIASEDNESS: An estimator θ̂ is said to be
unbiased if E(θ̂) = θ. In particular, E(X̄) = μ,
which implies that the distribution
of X̄ is centered at μ.
What this means is that, although many of
the individual sample means are either under-
estimates or over-estimates of the true
population mean, in the long run, the over-
estimates balance the under-estimates so that
the mean value of the sample means comes
out to be equal to the population mean.
Let us now consider some other estimators
which possess the desirable property of
being unbiased:
The sample median is also an unbiased
estimator of μ when the population is
normally distributed
(i.e. if X is normally distributed,
then E(X̃) = μ).
Also, as far as p̂, the proportion of
successes in the sample, is concerned,
we have:
Considering the binomial random variable X
(which denotes the number of successes in n
trials), we have:
E(p̂) = E(X/n) = (1/n) E(X) = np/n = p.
Hence, the sample proportion is
an unbiased estimator of the
population parameter p.
But, as far as the sample variance S² is
concerned, it can be mathematically proved
that E(S²) ≠ σ².
The bias E(θ̂) − θ of an estimator is positive
if E(θ̂) > θ, and is negative if E(θ̂) < θ.
If, however, we define
s² = Σ(x − x̄)² / (n − 1),
then, since E(s²) = σ², s² is an unbiased
estimator of σ².
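The claim that dividing by n − 1 rather than n removes the bias of the sample variance can be checked by simulation. The normal population N(0, 4) and the sample size n = 5 below are illustrative assumptions, not part of the lecture:

```python
import random
import statistics

random.seed(7)
mu, sigma2, n = 0.0, 4.0, 5      # hypothetical population N(0, 4), samples of size 5

biased, unbiased = [], []
for _ in range(50_000):
    xs = [random.gauss(mu, sigma2**0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar)**2 for x in xs)
    biased.append(ss / n)          # divisor n: E(S^2) = (n-1)/n * sigma^2 = 3.2
    unbiased.append(ss / (n - 1))  # divisor n-1: E(s^2) = sigma^2 = 4.0

print(round(statistics.fmean(biased), 2), round(statistics.fmean(unbiased), 2))
```

With small samples the difference between the two divisors is large, which is why the bias is so visible here.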
An estimator θ̂ is said to be
a consistent estimator of the
parameter θ if, for any
arbitrarily small positive
quantity ε,
lim(n→∞) P(|θ̂ − θ| < ε) = 1.
In other words, an estimator θ̂ is
called a consistent estimator
of θ if the probability that θ̂ is
very close to θ approaches
unity with an increase in the
sample size.
It should be noted
that consistency is a
large sample property.
Another point to be
noted is that a consistent
estimator may or may not
be unbiased.
The sample mean
X̄ = (1/n) ΣXᵢ,
which is an unbiased estimator of μ,
is a consistent estimator of the
mean μ.
The sample proportion p̂
is also a consistent
estimator of the parameter
p of a population that has a
binomial distribution.
The median is not a
consistent estimator of μ
when the population has
a skewed distribution.
The sample variance
S² = (1/n) Σ(Xᵢ − X̄)²,
though a biased estimator, is
a consistent estimator of the
population variance σ².
Generally speaking, it can
be proved that a statistic
whose STANDARD ERROR
decreases with an increase in
the sample size, will be
consistent.
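Consistency of X̄ can likewise be illustrated by estimating P(|X̄ − μ| < ε) for growing sample sizes and watching it approach 1. The population (μ = 10, σ = 5) and threshold ε = 0.5 below are hypothetical choices for the sketch:

```python
import random
import statistics

random.seed(3)
mu, sigma, eps = 10.0, 5.0, 0.5   # hypothetical population; closeness threshold

def coverage(n, trials=5000):
    """Estimate P(|Xbar - mu| < eps) for samples of size n."""
    hits = sum(
        abs(statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) - mu) < eps
        for _ in range(trials)
    )
    return hits / trials

# The probability of being within eps of mu rises toward 1 as n grows
for n in (10, 100, 1000):
    print(n, coverage(n))
```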
IN TODAY’S LECTURE,
YOU LEARNT
Sampling Distribution of X̄₁ − X̄₂
(continued)
Sampling Distribution of p̂₁ − p̂₂
Point Estimation
Desirable Qualities of a Good Point
Estimator
–Unbiasedness
–Consistency
IN THE NEXT LECTURE,
YOU WILL LEARN