Edur 8131 Notes 5 T Test
(a) Formulas
Recall the Z test formula:
\[ z_{\bar{X}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
The one sample t-test, which is very similar to the Z test, has the following formula:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{\bar{X} - \mu}{s_{\bar{X}}} \]
where the only difference is the use of the sample SD (s) in place of σ in the Z test. That is, the standard
error of the mean is now estimated by the formula:
\[ s_{\bar{X}} = \frac{SD}{\sqrt{n}} \]
where the symbol $s_{\bar{X}}$ is used to indicate that the standard error of the mean is being estimated with the
sample SD. Recall that the standard error of the mean for the Z test was calculated as:
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Non-directional:
H1: μ ≠ 1000
H0: μ = 1000
Lower-tailed test
H1: μ < 1000
H0: μ ≥ 1000 or
H0: μ = 1000 (this one is preferred)
Upper-tailed test
H1: μ > 1000
H0: μ ≤ 1000 or
H0: μ = 1000 (this one is preferred)
Note that for the directional hypotheses, the alternative, H1, states what one expects to find (as long as a
relationship, or difference, is expected). For example, if one expects that a sample of students will have a
higher than average IQ, then H1: μ > 1000. Similarly, if one expects that a given sample of students will
have a lower than average IQ, then H1: μ < 1000.
df (or ν) = n - 1
where, as before, n is the sample size. Degrees of freedom can be described as the amount of information
available in the sample after certain mathematical restrictions are applied to the data.
Two-tailed tests:
If t ≤ -tcrit or t ≥ tcrit, then reject H0; otherwise, fail to reject H0
Note that tcrit symbolizes the critical t-value found in t tables and is different from t, which is the
calculated t-ratio obtained from sample data.
(f) An example
A physical education teacher wishes to know whether his class of students is statistically above or below
the national average in weight. The national average for eighth graders is μ = 100. The student weights
for his class are: 99, 98, 105, 110, 115, 103, 88, 125, 130, and 115. For this sample, n = 10, $\bar{X}$ = 108.8,
SD = 12.839, so
\[ t = \frac{108.8 - 100}{12.839 / \sqrt{10}} = \frac{8.8}{4.060} = 2.167 \]
Version: 3/1/2012
The goal of the test is to determine whether the sample has an average weight that is statistically different
from the national average. This calls for a non-directional test because the specific direction of the mean
difference (higher or lower) was not indicated. Therefore,
H0: μ = 100
H1: μ ≠ 100
With α = .05 and df = n - 1 = 9, the critical values are tcrit = ±2.262, and the decision rule is:
If 2.167 ≤ -2.262 or 2.167 ≥ 2.262, then reject H0; otherwise, fail to reject (FTR) H0.
Since 2.167 is neither less than -2.262 nor greater than 2.262, the null is not rejected (i.e., fail to reject)
and one concludes that the sample does not have a statistically different mean from the national average.
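The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the standard library, not part of the Stata workflow used elsewhere in these notes; the variable names are chosen here for illustration.

```python
import math
import statistics

# Student weights from the example above.
weights = [99, 98, 105, 110, 115, 103, 88, 125, 130, 115]
mu0 = 100  # national average stated in H0

n = len(weights)
mean = statistics.mean(weights)
sd = statistics.stdev(weights)   # sample SD (divides by n - 1)
se = sd / math.sqrt(n)           # estimated standard error of the mean
t = (mean - mu0) / se            # calculated t-ratio
df = n - 1

print(f"n={n} mean={mean:.1f} SD={sd:.3f} SE={se:.3f} t={t:.3f} df={df}")
```

The printed values should agree with the hand calculation (mean 108.8, SD 12.839, t 2.167) up to rounding.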
What happens if instead it was hypothesized that the sample would have a lower than average weight,
i.e.,
H0: μ ≥ 100
H1: μ < 100
This is a lower-tailed test since the sample is expected to have a lower mean score. If α = .05, the
corresponding critical t-value is tcrit = -1.833. The decision rule states:
If 2.167 ≤ -1.833, then reject H0; otherwise, fail to reject H0.
Since 2.167 is not less than -1.833, the null is again not rejected.
Finally, had one hypothesized that the sample would be above average, then
H0: μ ≤ 100
H1: μ > 100
This is an upper-tailed test. The critical t-value, for an alpha of .05, is tcrit = 1.833, and this time the null
is rejected (2.167 ≥ 1.833), so the sample can be said to have a statistically higher average weight than normal.
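The three decision rules used in this example can be written as one small function. The sketch below is illustrative only; the function name `decide` and its `tail` labels are made up here, and the critical values are the table values quoted in the text.

```python
def decide(t, t_crit, tail="two"):
    """Apply a t-test decision rule.

    t      -- calculated t-ratio from the sample
    t_crit -- positive critical value from a t table
    tail   -- "two", "lower", or "upper"
    """
    if tail == "two":
        reject = t <= -t_crit or t >= t_crit
    elif tail == "lower":
        reject = t <= -t_crit
    elif tail == "upper":
        reject = t >= t_crit
    else:
        raise ValueError("tail must be 'two', 'lower', or 'upper'")
    return "reject H0" if reject else "fail to reject H0"


t = 2.167  # calculated t from the weight example
print("two-tailed  :", decide(t, 2.262, "two"))
print("lower-tailed:", decide(t, 1.833, "lower"))
print("upper-tailed:", decide(t, 1.833, "upper"))
```

Note that the same calculated t leads to different decisions depending on the hypothesis chosen before looking at the data.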
It is also possible to perform hypothesis testing with the t-test without using critical t-values. Recall that
the Z test had decision rules for p-values. Calculated probability values, p-values, are usually reported
with statistical software, and the decision rules for the Z test also apply to the t test. See earlier notes for
these decision rules. (This section to be discussed further in class.)
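For readers curious where a reported p-value comes from, the following sketch computes a two-tailed p-value by numerically integrating the t density. It is an illustration under stated assumptions, not how statistical software necessarily does it: the helper names `t_pdf` and `two_tailed_p` are invented here, and Simpson's rule is just one convenient integration method.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Pr(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area between 0 and |t|
    return 1 - 2 * area           # what is left in the two tails


# Weight example from these notes: t = 2.167 with 9 df.
p = two_tailed_p(2.167, 9)
print(f"two-tailed p = {p:.4f}")
print("reject H0 at alpha = .05" if p <= 0.05 else "fail to reject H0 at alpha = .05")
```

A p-value above α leads to the same decision as a calculated t that falls short of the critical value.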
(g) Assumptions
The assumptions for the one sample t-test are identical to the Z test: normality and independence.
(h) Exercises
(1) Raw MAT scores are 31, 38, 27, 41, 39, and 36. Is this sample statistically different from the national
average MAT of 30? Set α = .01.
(3) Same scores and α = .05, but this time hypothesize that the sample average will be greater than
population average.
(4) Fifteen students have a sample MAT mean of 32.3 with a sample standard deviation of 4.73. Does
this sample of students have a mean MAT score that is statistically different from the national average at
the .01 level with a two tailed test? What about α = .10 and a two tailed test? What about α = .05 and an
upper-tailed test?
(5) Suppose the following random sample of ITBS-math scores are observed in your middle school: 45,
58, 65, 63, 35, 43, 78, 55, 58, 69, 81, and 49. Is this evidence that your middle school has a student
population above average in terms of mathematics skills (national average ITBS-math is 50)? Set alpha at
.05.
(6) You wish to determine whether you are getting cheated every time you buy a bag of apples. The
standard bag of apples that you buy states that it contains one pound (16 ounces) of apples. After you get
home you notice that the bag only contains 15.5 ounces, not the stated 16 ounces. To determine whether
or not the company is systematically cheating the consumer, you decide to buy every 16 ounce bag of
apples in the three local grocery stores. After weighing each bag you find the following weights: 14.3,
15.5, 16.3, 17.0, 15.2, 15.9, 14.8, 15.0, 15.2, 15.9, 15.7, 15.6, and 16.1. Setting the significance level at
.05, does it seem the company is systematically cheating the consumer? Which should you perform, an
upper-, lower-, or two-tailed test? Why?
(7) Ford claims that its new car, Aspire, gets 39 mpg on the highway. Consumer Reports magazine
wishes to test this claim, so they hire you for $1500 to perform the statistical testing. They buy 10
Aspires and road test each. They find the following mpg estimates for the cars: 32, 43, 39, 38, 34, 36, 35,
38, 39, and 36. Their question to you is: Does our sample of Aspires have an estimated mpg that is
different from Ford's claim? Set alpha at .05 and give them an answer.
Exercises 1 through 3
. ttest mat=30
Ho: mean = 30
t = 2.46 with 5 d.f.
Pr > |t| = 0.0574
Exercise 4
. ttesti 15 32.3 4.73 30
Ho: mean = 30
t = 1.88 with 14 d.f.
Pr > |t| = 0.0806
Exercise 5
. ttest itbs=50
Ho: mean = 50
t = 2.05 with 11 d.f.
Pr > |t| = 0.0649
Exercise 6
. ttest weight = 16
Ho: mean = 16
t = -2.17 with 12 d.f.
Pr > |t| = 0.0503
Exercise 7
. ttest mpg = 39
Ho: mean = 39
t = -2.05 with 9 d.f.
Pr > |t| = 0.0711
When estimating a parameter, one typically uses a point estimate like $\bar{X}$, s, or s². Using these point
estimates, one may construct an interval: a range of values that is likely to include the parameter being
estimated.
\[ (1 - \alpha)\mathrm{CI} = \bar{X} \pm {}_{1-\alpha/2}t_{df}\; s_{\bar{X}} \]
\[ (1 - \alpha)\mathrm{CI} = \bar{X} \pm {}_{1-\alpha/2}t_{critical}\; \frac{SD}{\sqrt{n}} \]
or simply
\[ (1 - \alpha)\mathrm{CI} = \left( \bar{X} - {}_{1-\alpha/2}t_{critical}\; \frac{SD}{\sqrt{n}},\;\; \bar{X} + {}_{1-\alpha/2}t_{critical}\; \frac{SD}{\sqrt{n}} \right) \]
This is a 100(1 - α)% confidence interval. That is, if α = .05, then this is a 100(1 - .05) = 100(.95) = 95%
confidence interval, or .95CI. A .95CI means that, in the long run, 95% of the intervals constructed like
this from repeated random samples will contain the population value μ. This means that if 100 such
intervals were constructed, on average the population value of μ would be correctly included in 95 of
those intervals while 5 would fail to include μ.
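The long-run interpretation can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population mean and SD, the seed, and the number of replications are arbitrary choices made for illustration, and the critical value 2.201 is the two-tailed .05 table value for df = 11.

```python
import math
import random
import statistics

random.seed(8131)          # arbitrary seed so the run is repeatable
MU, SIGMA = 100, 15        # hypothetical population mean and SD
N = 12                     # sample size, so df = 11
T_CRIT = 2.201             # table value for df = 11, alpha = .05, two-tailed
REPS = 2000                # number of random samples to draw

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.mean(sample)
    half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    # Count the interval as a "hit" if it contains the population mean.
    if mean - half_width <= MU <= mean + half_width:
        hits += 1

coverage = hits / REPS
print(f"proportion of intervals containing mu: {coverage:.3f}")
```

The proportion printed should land close to .95, though it will vary somewhat from one seed to another.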
To calculate this CI, choose α, say at .05, then construct the interval by simply finding the critical value
associated with α = .05, and filling in the rest of the formula.
Example:
Construct .95CI for a class of high school students (n = 12) with a mean IQ of 120 and a standard
deviation of 16.5.
\[ \left( 120 - 2.201\,\frac{16.5}{\sqrt{12}},\;\; 120 + 2.201\,\frac{16.5}{\sqrt{12}} \right) = (109.517,\ 130.483) \]
With such an interval, one may state that one is 95% confident that this interval contains the true μ for all
students who are like the students in the particular high school class (apparently smart students).
Based upon this confidence interval, it seems that this high school class is quite different from the mean
score typically found for IQ tests in the population. How does one know this?
The CI may also be used as a non-directional hypothesis test. If the hypothesized population value of μ is
not within the CI, then H0: μ = 100 may be rejected. Since the value 100 is not within the interval
constructed, which ranges from 109.5 to 130.5, one may conclude that the sample data appear to differ,
statistically, from the hypothesized value of 100. In this particular case, the sample data show a mean that
is higher than the expected value of 100.
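The interval and the CI-based test for the IQ example can be sketched as follows. The function name `t_interval` is made up for illustration, and the critical value is the table value used in the example; the third decimal may differ slightly from the hand calculation because of intermediate rounding.

```python
import math


def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) limits of the CI: mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# IQ example: n = 12, mean = 120, SD = 16.5, t_crit = 2.201 (df = 11, alpha = .05)
lower, upper = t_interval(mean=120, sd=16.5, n=12, t_crit=2.201)

mu0 = 100                                # value stated in H0
reject = not (lower <= mu0 <= upper)     # reject when mu0 falls outside the CI

print(f".95CI = ({lower:.3f}, {upper:.3f})")
print("reject H0" if reject else "fail to reject H0")
```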
Exercises
(1) Construct a .99CI for the following scores: 120, 123, 125, 101, 98, 101. Test the hypothesis, using the
.99CI, that H0: μ = 100.
(4) Fifteen students have an SAT mean of 1200 with a standard deviation of 150. Does this sample of
students have a mean SAT score statistically different from the population mean of 1000 at α = .05?
Use a CI to answer this question. Is the mean statistically different if a .99CI is used?
3. The Two-Independent Samples t test (also called the Two Group t test)
(a) Situation
Both the Z test and the one sample t-test allow one to statistically compare the mean of one sample of
observations with a given population value (e.g., μ). If one is interested in comparing two independent
groups, then the two independent sample t-test may be appropriate.
For example, suppose one is using a posttest only control group design to examine the effect of computer
assisted learning in geography achievement among third graders. The control (or comparison) group is
taught U.S. geography with the traditional methods using maps, textbooks, and workbooks. The
experimental group uses the computer game Where in the U.S. is Carmen SanDiego. At the end of the
lesson, both groups are given the same posttest. A two group independent t-test would be appropriate for
determining statistical difference between the control and experimental groups.
Non-directional:
The experimental and control group will have different levels of achievement in US geography.
H1: μ1 ≠ μ2
H0: μ1 = μ2
where μ1 represents μ for group 1 (experimental group) and μ2 represents μ for group 2 (control group).
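\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \]
Since it is usually assumed that μ1 - μ2 = 0.00 (no difference in the population values), the t formula can
be simplified to
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{SE_d} \]
where
\[ SE_d = s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]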
Note that SEd represents the standard error of the difference, which, like the standard error of the mean,
represents the standard deviation of the sampling distribution for $\bar{X}_1 - \bar{X}_2$. The symbols $s_1^2$ and $s_2^2$
represent the variances for group 1 and group 2, respectively.
Recall that the sampling distribution of the sample mean has a known distribution that approaches the
normal distribution when sample sizes are large. The sampling distribution for $\bar{X}_1 - \bar{X}_2$ also follows the
central limit theorem. Note that the mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ is equal to μ1 - μ2. The
standard error for $\bar{X}_1 - \bar{X}_2$ is $SE_d = s_{\bar{X}_1 - \bar{X}_2}$.
df (or ν) = n1 + n2 - 2
where n1 is the sample size for group 1 (experimental group) and n2 is the sample size for group 2
(control group).
Two-tailed test
If t ≤ -tcrit or t ≥ tcrit, then reject H0; otherwise, fail to reject H0
One-tailed (upper-tailed, group 1 anticipated to have higher mean than group 2) test
If t ≥ tcrit, then reject H0; otherwise, fail to reject H0
One-tailed (lower-tailed, group 1 anticipated to have lower mean than group 2) test
If t ≤ -tcrit, then reject H0; otherwise, fail to reject H0
(f) Assumptions
The two independent samples t-test requires that the raw scores in both populations be normally
distributed and independent. Also, the two populations should have equal (homogeneous) variances. The
two group t-test is generally robust to non-normality and unequal variance (provided n1 ≈ n2), but is not
robust to dependence of observations.
(g) An Example
Recall the geography experiment. The summary statistics for both groups are:
          Experimental    Control
Mean      87.889          83.111
SD        4.256           5.578
n         9               9
The experimental group has a mean of 87.889 and a standard deviation of 4.256, and the control group
had a mean of 83.111 and a standard deviation of 5.578. There were 9 students in the experimental group
and 9 students in the control group. So the two independent group t-test, with α = .05 and a
non-directional test, would be:
\[ t = \frac{87.889 - 83.111}{\sqrt{\frac{4.256^2}{9} + \frac{5.578^2}{9}}} = \frac{4.778}{2.339} = 2.043 \]
and the degrees of freedom are df = n1 + n2 - 2 = 9 + 9 - 2 = 16. The critical t is: tcrit = 2.120. The
rejection regions are t ≤ -2.120 and t ≥ 2.120, and the decision rule is:
If 2.043 ≤ -2.120 or 2.043 ≥ 2.120, then reject H0; otherwise, fail to reject H0.
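The correct decision is fail to reject H0. One would therefore conclude that the experimental and control
groups do not differ statistically, at the .05 level, in U.S. geography achievement.
Note, however, what would happen if one hypothesized that the experimental group would have higher
scores than the control group. If α = .05, the critical value for an upper-tailed test would be 1.746, so the
decision rule would be:
If 2.043 ≥ 1.746, then reject H0; otherwise, fail to reject H0.
Since 2.043 is greater than 1.746, H0 is rejected.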
The data indicate that students who learn with the computer program Carmen SanDiego show a
statistically significant, at the .05 level, higher achievement score in U.S. geography. Thus, use of
the software appears to benefit students.
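The two-group calculation can be reproduced from the summary statistics alone. The sketch below uses the SEd formula exactly as written in these notes; the function name `two_sample_t` is invented here. With unequal group sizes, a pooled-variance standard error (as used by many software packages) can give a somewhat different value, so treat this as an illustration for the equal-n case.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t from summary statistics.

    Returns (t, se_d, df) with se_d = sqrt(s1^2/n1 + s2^2/n2) and df = n1 + n2 - 2.
    """
    se_d = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se_d
    df = n1 + n2 - 2
    return t, se_d, df


# Geography example: experimental group first, control group second.
t, se_d, df = two_sample_t(87.889, 4.256, 9, 83.111, 5.578, 9)
print(f"SEd = {se_d:.3f}, t = {t:.3f}, df = {df}")

# Critical values quoted in the text for df = 16 and alpha = .05.
print("two-tailed  :", "reject H0" if abs(t) >= 2.120 else "fail to reject H0")
print("upper-tailed:", "reject H0" if t >= 1.746 else "fail to reject H0")
```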
The question one may ask after rejecting H0 is just how strong an impact does the treatment have on
student achievement. One measure of the strength of the association between the treatment and the
outcome is eta squared, η²:
\[ \eta^2 = \frac{t^2}{t^2 + df} \]
\[ \eta^2 = \frac{2.043^2}{2.043^2 + 16} = \frac{4.174}{4.174 + 16} = .207 \]
The value obtained for η² may be interpreted in a manner identical to r², such as the variance explained or
predicted in posttest scores by the treatment. In fact, if one calculates a Pearson's correlation between the
two numerical variables listed in the table below (posttest scores and the indicator of treatment
[1=treatment, 0=control]), the obtained r will be equal to .455 and the r² will be .207!
This should indicate to you that one may actually use a Pearson correlation to determine whether two
groups are statistically different. For example, using the same experimental data, one could reproduce the
same t value obtained from the two independent groups t-test using only the correlation r:
\[ t = \frac{r\sqrt{df}}{\sqrt{1 - r^2}} = \frac{.455\sqrt{16}}{\sqrt{1 - .207}} \approx 2.04 \]
In short, the two group independent t-test and the Pearson correlation coefficient provide identical
inferential results. The two group t-test requires the calculation of η² in order to determine the strength of
the relationship between the IV and DV.
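The link between t, η², and r can be verified numerically. This sketch starts from the t and df of the geography example and assumes the standard conversion t = r·sqrt(df)/sqrt(1 - r²); variable names are illustrative only.

```python
import math

t, df = 2.043, 16                 # calculated t and df from the geography example

eta_sq = t ** 2 / (t ** 2 + df)   # eta squared
r = math.sqrt(eta_sq)             # |r|; takes the sign of t, positive here

# Going the other way: recover t from the correlation alone.
t_from_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"eta squared = {eta_sq:.3f}")
print(f"r = {r:.3f}")
print(f"t recovered from r = {t_from_r:.3f}")
```

The recovered t matches the original because the two formulas are algebraic rearrangements of each other.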
Effect size (ES), denoted in the research literature as d and/or Δ, may be calculated with one of two
formulas. First, d is
\[ d = \frac{\bar{X}_1 - \bar{X}_2}{SD_{within}} \]
where
\[ SD_{within} = \sqrt{\frac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{(n_1 - 1) + (n_2 - 1)}} \]
Second, Δ is
\[ \Delta = \frac{\bar{X}_1 - \bar{X}_2}{SD_{control\ group}} \]
where SDcontrol group is simply the SD of the control group (if one is present).
Note that both d and Δ describe the magnitude of the difference between the two group means in standard
deviation units. So, for example, if d or Δ = .2, then this indicates that the two group means differ by .2
standard deviations. The larger either d or Δ, the greater the difference between two groups, and, hence,
the larger the effect of the treatment.
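For the geography example:
\[ SD_{within} = \sqrt{\frac{\sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2}{(n_1 - 1) + (n_2 - 1)}} = \sqrt{\frac{144.908 + 248.913}{(9 - 1) + (9 - 1)}} = \sqrt{\frac{393.821}{16}} = \sqrt{24.614} = 4.961 \]
\[ d = \frac{\bar{X}_1 - \bar{X}_2}{SD_{within}} = \frac{87.889 - 83.111}{4.961} = 0.963 \]
\[ \Delta = \frac{\bar{X}_1 - \bar{X}_2}{SD_{control\ group}} = \frac{87.889 - 83.111}{5.578} = 0.857 \]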
Either ES is appropriate to use when an experimental group is compared to a control group. When two
groups are compared and the two groups do not represent experimental and control (such as males vs.
females), then one should use d as the measure of ES.
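Both effect sizes can be computed from the group means, SDs, and sample sizes. The sketch below recovers each group's sum of squares as (n - 1)·SD², which is algebraically the same as summing squared deviations from raw scores; the function name `sd_within` is chosen here for illustration.

```python
import math


def sd_within(sd1, n1, sd2, n2):
    """Pooled within-group SD from each group's SD and sample size."""
    ss1 = (n1 - 1) * sd1 ** 2     # sum of squared deviations, group 1
    ss2 = (n2 - 1) * sd2 ** 2     # sum of squared deviations, group 2
    return math.sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)))


# Geography example: experimental (e) and control (c) groups.
mean_e, sd_e, n_e = 87.889, 4.256, 9
mean_c, sd_c, n_c = 83.111, 5.578, 9

sdw = sd_within(sd_e, n_e, sd_c, n_c)
d = (mean_e - mean_c) / sdw       # difference in pooled-SD units
delta = (mean_e - mean_c) / sd_c  # difference in control-group-SD units

print(f"SDwithin = {sdw:.3f}, d = {d:.3f}, delta = {delta:.3f}")
```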
(k) Exercises
(1) Determine whether boys have a statistically different, at the 1% level, ITBS math score from girls.
The mean math score for boys is 78 (s = 5.3) and the mean for girls is 73 (s = 6.1). There are 25 boys and
25 girls.
(a) What is the correct H0 and H1 in both written and symbol form?
(b) What are the critical and calculated t-values?
(2) Determine whether a statistical difference exists between men and women in weight:
Men: 156, 158, 175, 203, 252, 195
Women: 149, 119, 168, 123, 155, 126
(a) Test for a non-directional H0 with α = .01; what is the correct H0, H1?
(b) Test for a non-directional H0 with α = .10.
(c) Test the hypothesis that men will have lower weight, and set α = .10. What is the correct H0, H1?
(3) Two classes of educational research were taught with two different methods of instruction, teacher
guided (TG) and self paced (SP). Which had the better student achievement at the end of the quarter?
(a) Test for a non-directional H0 with α = .01; what is the correct H0, H1?
(b) Test for a non-directional H0 with α = .10.
(c) Test the hypothesis that TG will have higher scores, and set α = .05. What is the correct H0, H1?
Example 1
. ttesti 25 78 5.3 25 73 6.1
Example 2
. ttest weight, by(sex)
Example 3
. ttest scores, by(groups)
The correlated t test allows the researcher to consider differences between two groups or sets of scores
that are related to one-another. Under what conditions is one likely to find correlated or dependent
samples or groups?
Condition 1
Before/After Studies; Multiple Measures on the Same Subject = This type of data occurs most often with
pretest-treatment-posttest experimental designs. These types of designs are used to determine whether
some treatment will change posttest scores relative to the pretest score. The pretest and posttest scores
are related because the scores are taken from the same individuals, i.e., each person is measured twice.
Examples:
(a) A student takes the SAT, enrolls in an SAT enhancement class, and then retakes the SAT. Two scores
from the same student exist.
(b) A teacher measured the reading performance of a third-grader, presented some treatment designed to
increase reading performance, then remeasured the student's reading performance (two scores from the
same individual).
(c) A PE teacher measures the vertical jumping ability of his class, provides his class a weight training
program for one month, then remeasures vertical jumping ability of each student (two scores from same
students).
Condition 2
Matched-Subjects = Two groups are involved in the study (experimental and control); and they are
matched on some extraneous variable(s) that is likely to be related to the dependent variable being
examined.
Examples:
(a) A teacher is interested in determining whether "Hooked on Phonics" increases third-grade students'
reading performance. Using two groups of students, group A (the experimental group) will use "Hooked on
Phonics" for one month, and group B (the control) will be exposed to the usual reading lessons during the
month. The teacher knows that IQ influences reading performance, so to control for the effects of IQ on
the dependent variable (which is a posttest on reading performance), the researcher matches students in
the two groups on their IQ levels in a fashion similar to the schematic below:
In this scheme, students from both groups are matched according to their IQ levels. It is important to
match on IQ since we would expect students with higher IQs to perform better on a reading test than
students with lower IQs.
(b) As another example, one might make a comparison of faculty salary between men and women to
determine whether sexual discrimination exists. It would be important to match men and women on
academic rank since we know that assistant professors, on average, make less than associate and full
professors.
Condition 3
Naturally occurring pairs = Natural pairs, such as husbands and wives, twins, brothers, sisters, brothers
and sisters, parents and their children, etc. With naturally occurring pairs, one would expect the pairs to
hold similar feelings, beliefs, attitudes, etc., so their scores will generally be related to one-another.
Examples:
(a) Determining whether husbands' attitudes toward politics are similar to their wives'. Since people tend
to marry others like themselves, one would expect most husbands and wives to hold similar political
views.
(b) Determining whether boys' IQ differs from girls' IQ. Since brothers and sisters are similar genetically,
one might anticipate the two to have similar IQs, that is, their IQs are likely to be related; therefore,
brothers and sisters need to be matched.
Hypothesis Formulation:
The hypothesis tested with the correlated t-test is the same as in the independent t-test.
For example, suppose one is interested in determining whether boys or girls get higher math scores on the ITBS.
Clearly, intelligence plays an important part in determining mathematics performance, so this is a factor
that needs to be controlled through matching. One may formulate several hypotheses, as demonstrated
below.
Non-directional:
The average ITBS math scores will differ between boys and girls; their scores will differ on average.
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{SE_d} \]
where $\bar{X}_1 - \bar{X}_2$ is the difference between the two sample means, and the denominator is the standard error
of the difference, SEd.
Note that this is identical to the formula for the two independent sample t test. The difference between
the formulas for the independent and the correlated t test occurs in the calculation of the standard error of
the difference.
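For the correlated t test the standard error of the difference is calculated as:
\[ SE_d = s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} - 2 r_{12}\,\frac{s_1}{\sqrt{n}}\,\frac{s_2}{\sqrt{n}}} \]
but in the independent t test it is assumed that the groups are not related (scores between groups are not
correlated), so the standard error loses the correlation term in the formula, i.e.:
\[
\begin{aligned}
s_{\bar{X}_1 - \bar{X}_2}
  &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} - 2 r_{12}\,\frac{s_1}{\sqrt{n}}\,\frac{s_2}{\sqrt{n}}} \\
  &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} - 2(0)\,\frac{s_1}{\sqrt{n}}\,\frac{s_2}{\sqrt{n}}} \\
  &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} - 0}
   = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\end{aligned}
\]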
If there is no correlation, then the SEd formula reduces to the SEd formula given in the independent
samples t-test. In short, the primary difference between the two t tests is the calculation of the standard
error of the difference, SEd.
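The effect of the correlation term on SEd can be seen numerically. In the sketch below, the function name `se_diff` is invented for illustration, the SDs and n are borrowed from the geography example purely as convenient numbers, and the correlation of .50 is a hypothetical value.

```python
import math


def se_diff(sd1, sd2, n, r=0.0):
    """Standard error of the difference for two sets of n scores.

    r is the correlation between the paired scores; r = 0 gives the
    independent-samples formula.
    """
    term1 = sd1 ** 2 / n
    term2 = sd2 ** 2 / n
    corr_term = 2 * r * (sd1 / math.sqrt(n)) * (sd2 / math.sqrt(n))
    return math.sqrt(term1 + term2 - corr_term)


se_indep = se_diff(4.256, 5.578, 9)          # r = 0: independent groups
se_corr = se_diff(4.256, 5.578, 9, r=0.50)   # hypothetical correlation of .50

print(f"SEd with r = 0  : {se_indep:.3f}")
print(f"SEd with r = .50: {se_corr:.3f}")
```

A positive correlation shrinks SEd, which makes the calculated t larger for the same mean difference.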
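\[ t = \frac{\bar{d} - \mu_d}{\sqrt{s_d^2 / n}} = \frac{\bar{d}}{\sqrt{s_d^2 / n}} = \frac{\bar{d}}{SE_d} \]
where d is the difference between the two scores in each pair and μd, the population mean difference, is
usually assumed to be 0 under H0. The mean difference is
\[ \bar{d} = \frac{\sum d}{n} \]
\[ SE_d = \sqrt{s_d^2 / n} \]
where $s_d^2$ is the variance of the difference scores, and is calculated like a regular variance, i.e.,
\[ s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} \]
In short, the correlated t test may be viewed as the mean of the difference, $\bar{d}$, divided by the standard
error of the difference, SEd.
\[ t = \frac{\bar{d}}{SE_d} \]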
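The difference-score version of the correlated t test can be sketched directly from paired scores. The function name `paired_t` and the pretest/posttest scores below are hypothetical, invented only to show the steps; they are not data from these notes.

```python
import math
import statistics


def paired_t(first, second):
    """Correlated (paired) t: mean difference divided by its standard error.

    Differences are computed as first - second. Returns (t, df).
    """
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)                      # mean of the differences
    se_d = math.sqrt(statistics.variance(d) / n)    # sqrt(s_d^2 / n)
    return d_bar / se_d, n - 1


# Hypothetical pretest and posttest scores for five students.
pre = [10, 12, 9, 14, 11]
post = [12, 15, 9, 17, 13]

t, df = paired_t(pre, post)
print(f"t = {t:.3f} with {df} df")
```

Because the differences are pretest minus posttest, a negative t here indicates that posttest scores were higher on average.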
Degrees of Freedom:
The df for the correlated t test is calculated as:
df = n - 1
Decision Rules:
The decision rules are the same as in the independent two-sample t test:
Two-tailed tests:
If t ≤ -tcrit or t ≥ tcrit, then reject H0; otherwise, fail to reject H0
Note that tcrit symbolizes the critical t-value found in t tables and is different from t, which is the
calculated t-ratio obtained from sample data.
Example 1:
Suppose we are interested in determining whether salary differs between men and women faculty at
GSU. When randomly selecting subjects for the study, it is important that we take into consideration their
academic rank since full professors make more money than associate professors, and associates make
more money than assistant professors, on average. Test the hypothesis of no difference between men and
women, H0: μ1 = μ2, at the 5% significance level.
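\[ \bar{d} = \frac{\sum d}{n} = \frac{16000}{7} = 2285.714 \]
\[ SE_d = \sqrt{s_d^2 / n} \]
where $s_d^2$ is the variance of the difference scores, and is calculated like a regular variance, i.e.,
\[ s_d^2 = \frac{\sum (d - \bar{d})^2}{n - 1} \]
\[ SE_d = \sqrt{\frac{s_d^2}{n}} = \sqrt{\frac{\sum (d - \bar{d})^2 / (n - 1)}{n}} = \sqrt{\frac{89928571.43 / (7 - 1)}{7}} = \sqrt{\frac{14988095.24}{7}} = 1463.269 \]
\[ t = \frac{\bar{d}}{\sqrt{s_d^2 / n}} = \frac{\bar{d}}{SE_d} = \frac{2285.714}{1463.269} = 1.562 \]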
The critical values at the .05 level for df = n - 1 = 6 are ±2.447, so fail to reject H0 and conclude that
salaries do not appear to differ between men and women faculty at GSU even after controlling for
academic rank.
What do you think would happen if an independent samples t test were used to analyze the above data?
Calculate the regular independent t test and see: Mmen = 38000, SDmen = 10012.49, Mwomen = 35714.29,
and SDwomen = 11250.4.
Which is more powerful (recall that power represents the probability of rejecting a false H0), the
independent or correlated t test? Why?
Example 2:
A researcher wishes to discover whether or not the intake of orange juice increases the potassium level in
the bloodstream. A group of 12 elderly patients are selected from those in a nursing home, where
previous diet has been controlled. Potassium blood levels are measured for each subject. Next, each
subject is given a quart of orange juice, and, two hours later, potassium levels are again measured. Test
the difference in potassium levels at the 5% level. The data are as follows (the scaled scores represent
potassium blood levels):
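\[ \bar{d} = \frac{\sum d}{n} = \frac{-24}{12} = -2.00 \]
\[ SE_d = \sqrt{s_d^2 / n} = \sqrt{\frac{\sum (d - \bar{d})^2 / (n - 1)}{n}} = \sqrt{\frac{24 / (12 - 1)}{12}} = \sqrt{\frac{24 / 11}{12}} = \sqrt{\frac{2.182}{12}} = .426 \]
\[ t = \frac{\bar{d}}{\sqrt{s_d^2 / n}} = \frac{\bar{d}}{SE_d} = \frac{-2}{.426} = -4.695 \]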
The hypothesis was that orange juice will increase potassium in the blood stream, i.e., the pretest scores
will be lower than the posttest scores. This hypothesis indicates that a lower-tailed test is needed since
H0: μ1 ≥ μ2 and H1: μ1 < μ2.
The critical value at the .05 level for df = 12 - 1 = 11 is -1.796, so we reject H0, and conclude that orange
juice does appear to increase the amount of potassium in the blood stream for elderly people.
Exercises:
(1) A researcher is interested in determining whether typing speed is affected by the kind of typewriter
(electric versus manual) used. A group of student typists, equally experienced on both types of machines,
are randomly selected and are matched on the basis of their typing speed (error-free words per minute).
One group is then tested on an electric machine and the other group on a manual machine. Test H0 at the
1% significance level. The data are as follows:
(a) What are the correct H0 and H1 in both written and symbolic form?
(b) What is (are) the critical value(s)?
(c) What is the obtained (calculated) t value?
(d) Did you reject or fail to reject H0?
(e) Write your conclusion as if explaining the results to non-statisticians.
(2) A psychologist wishes to look at the relationship between frustration and positive attitude. He
hypothesizes that frustration affects attitude. Students are given an "Attitude Toward Psychologists"
(ATP) instrument prior to taking their first exam in an introductory psychology course. After completing
the ATP instrument, the students are then administered their course exam. The teacher, a psychologist,
made the exam especially difficult in an attempt to frustrate his students. After completing the exam, all
students were asked to fill out the ATP instrument again. High scores on the ATP instrument indicate
more positive attitudes toward psychologists. The data are as follows:
(a) What are the correct H0 and H1 in both written and symbolic form?
(b) What is (are) the critical value(s)?
(c) What is the obtained (calculated) t value?
(d) Did you reject or fail to reject H0?
(e) Did frustration influence the students' attitude toward psychologists? Write your conclusion as if
explaining the results to non-statisticians.
For additional examples, see chapter exercises in book and notes on course web page.