Chi-Square Test

The document provides an overview of the Chi-square test, a non-parametric statistical method used to assess the goodness of fit between observed and expected frequencies, as well as to test the significance of associations between attributes. It outlines the characteristics, assumptions, and degrees of freedom relevant to the test, along with practical applications and examples demonstrating its use in hypothesis testing. The document concludes with specific case studies illustrating how to apply the Chi-square test in various scenarios.

Chi-Square Test

Dr. M. Shamim Uddin Khan


Introduction: The Chi-square ($\chi^2$) test is one of the most commonly used tests in statistics. Generally it is
used for testing hypotheses concerning the distribution of a random variable rather than a parameter of the
distribution, for which reason it is referred to as a non-parametric test. The Chi-square test is applied to test
goodness of fit, that is, to check how well the distribution of observed data agrees with an assumed
theoretical distribution. It is therefore a measure of the divergence between actual and expected frequencies.
Chi-square tests also enable us to test whether more than two population proportions can be considered equal.
For the Chi-square test to be applicable, the observed and expected frequencies must be grouped in the same
way, and the theoretical distribution must be adjusted to give the same total frequency as the observed
data. The test is, in fact, a technique through which researchers can
(i) test the goodness of fit, (ii) test the significance of association between two attributes, and (iii) test the
homogeneity or the significance of population variance. If the calculated value of $\chi^2$ is greater than the
tabled value of $\chi^2$ at a certain level of significance, we reject the hypothesis. If there is no difference
between the actual and expected frequencies, $\chi^2$ is zero. If the calculated value of $\chi^2$
is less than the tabled value at a certain level of significance, it is said to be non-significant. Thus, the
$\chi^2$ test describes the discrepancy between theory and observation.
Characteristics of Chi-Square Test:
1. The test is based on events or frequencies, whereas tests built on a theoretical distribution are based on
the mean and standard deviation.
2. The test is used to draw inferences, specifically for testing hypotheses; it is not useful for
estimation.
3. The test can be applied to the entire set of observed and expected frequencies.
4. For every increase in the number of degrees of freedom, a new chi-square distribution is formed.
5. It is a general purpose test and as such is highly useful in research.
Assumptions of Chi-square Test:
1. Observations recorded and used are collected on a random basis.
2. All the observations must be independent.
3. All the events must be mutually exclusive.
4. No group should contain very few items, say fewer than 10. (Where the frequencies are less
than 10, regrouping is done by combining the frequencies of adjoining groups so that the new
frequencies are at least 10. Some statisticians take this number as 5, but 10 is regarded
as better by most statisticians.)
5. The overall number of items must also be reasonably large. (It should normally be at least 50,
however small the number of groups may be.)
6. For comparison purpose, the data must be in original units.
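The regrouping described in assumption 4 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `pool_small_classes`, the left-to-right merging order, and the sample frequencies are all assumptions made for the example. Whatever grouping is chosen must be applied identically to the observed and the expected frequencies.

```python
def pool_small_classes(freqs, minimum=10):
    """Merge adjoining classes until every pooled frequency is at least `minimum`.

    `freqs` lists the class frequencies in their natural order.
    Classes are merged from left to right (an assumption of this sketch).
    """
    pooled = []
    running = 0
    for f in freqs:
        running += f
        if running >= minimum:
            pooled.append(running)
            running = 0
    # A leftover tail that is still too small is merged into the last pooled class.
    if running > 0:
        if pooled:
            pooled[-1] += running
        else:
            pooled.append(running)
    return pooled


# Hypothetical frequencies, for illustration only.
example = [2, 5, 14, 30, 12, 6, 3]
print(pool_small_classes(example))             # threshold of 10
print(pool_small_classes(example, minimum=5))  # the more lenient threshold of 5
```

The total frequency is unchanged by pooling, but the number of classes falls, so the degrees of freedom must be counted from the pooled classes.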
Degrees of Freedom: When we compare the computed value of $\chi^2$ with the table value, the degrees of
freedom must be known. The degrees of freedom are the number of classes to which values can be assigned
at will without violating the restrictions. For example, suppose we must choose four numbers whose total is 50.
We are free to select any three numbers, say 10, 15 and 20, but the fourth number must then be 5. Thus our
freedom of choice is reduced by one and the degrees of freedom are three. As the restrictions increase, the
freedom is reduced.
Thus, $\nu = n - k$, where $\nu$ is the degrees of freedom, $k$ is the number of independent constraints, and
$n$ is the number of frequency classes.
For a contingency table the degrees of freedom are $\nu = (c - 1)(r - 1)$,
where $c$ is the number of columns and $r$ is the number of rows.
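The two counting rules above can be written as a pair of small Python helpers. This is only a sketch; the function names are invented for the illustration.

```python
def df_frequency_classes(n_classes, n_constraints=1):
    """Degrees of freedom = number of classes minus number of independent constraints."""
    return n_classes - n_constraints


def df_contingency(rows, cols):
    """Degrees of freedom of an r x c contingency table: (c - 1)(r - 1)."""
    return (cols - 1) * (rows - 1)


# Four numbers constrained to total 50: three can be chosen freely.
print(df_frequency_classes(4, 1))
# A 2 x 2 table with fixed row and column totals: one free cell.
print(df_contingency(2, 2))
```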

Uses of $\chi^2$:
1. $\chi^2$ as a Test for Comparing Variance: The Chi-square value is often used to judge the
significance of population variance, i.e. we can use the test to judge whether a random sample has been drawn
from a normal population with mean $\mu$ and with a specified variance $\sigma_p^2$.

$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$$

where $\sigma_s^2$ = variance of the sample, $\sigma_p^2$ = variance of the population, $(n - 1)$ = degrees of freedom,
$n$ being the number of items in the sample. Then, by comparing the calculated value with the table value
of Chi-square for $(n - 1)$ degrees of freedom at a given level of significance, we may either accept or
reject the null hypothesis. If the calculated value of $\chi^2$ is less than the tabled value at a certain level of
significance, the null hypothesis is accepted, but if the calculated value of $\chi^2$ is equal to or greater than the
tabled value of $\chi^2$ at that level of significance, the hypothesis is rejected.
Problem 1: The weights of 10 students are as follows:
Student No.    1   2   3   4   5   6   7   8   9   10
Weight (kg)   38  40  45  53  47  43  55  48  52  49
Can we say that the variance of the distribution of weights of all students from which the above sample of
10 students was drawn is equal to 20 kg²? Test this at the 5 per cent and 1 per cent levels of significance.
Solution: First of all we work out the variance of the sample data, $\sigma_s^2$, as under:
Student No.   $X_i$ (weight in kg)   $X_i - \bar{X}$   $(X_i - \bar{X})^2$
1             38                     -9                81
2             40                     -7                49
3             45                     -2                 4
4             53                      6                36
5             47                      0                 0
6             43                     -4                16
7             55                      8                64
8             48                      1                 1
9             52                      5                25
10            49                      2                 4
n = 10        Total = 470                              Total = 280

$$\bar{X} = \frac{\sum X_i}{n} = \frac{470}{10} = 47 \text{ kg}$$

$$\sigma_s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{280}{10 - 1} = 31.11$$

Let the null hypothesis be $H_0: \sigma_p^2 = 20$. In order to test this hypothesis we work out the $\chi^2$ value as under:

$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1) = \frac{31.11}{20}\,(10 - 1) \approx 14.0$$

The degrees of freedom in the given case are $(n - 1) = (10 - 1) = 9$. At the 5% level of significance the table
value of $\chi^2$ is 16.92, and at the 1% level of significance it is 21.67 for 9 d.f.; both of these are greater than
the calculated value of about 14.0. Hence we accept the null hypothesis and conclude that the variance of the
given distribution can be taken as 20 kg² at both the 5% and 1% levels of significance. In other words, the
sample can be said to have been taken from a population with variance 20 kg².
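The arithmetic of Problem 1 can be checked with a short Python sketch. The variable names are chosen for the illustration, and the table values for 9 d.f. (16.92 at 5%, 21.67 at 1%) are hard-coded from a standard chi-square table rather than computed.

```python
# Weights (kg) of the 10 students from Problem 1.
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
hypothesised_variance = 20.0  # population variance under the null hypothesis

n = len(weights)
mean = sum(weights) / n
# Sample variance with divisor (n - 1), as in the worked solution.
sample_variance = sum((x - mean) ** 2 for x in weights) / (n - 1)
# Test statistic: (sample variance / hypothesised variance) * (n - 1).
chi_square = sample_variance / hypothesised_variance * (n - 1)

# Table values of chi-square for 9 d.f., hard-coded from a standard table.
table_values = {"5%": 16.92, "1%": 21.67}
decisions = {
    level: "reject" if chi_square >= value else "accept"
    for level, value in table_values.items()
}

print(f"mean = {mean:.2f}, variance = {sample_variance:.2f}, chi-square = {chi_square:.2f}")
for level, decision in decisions.items():
    print(f"{level} level: {decision} the null hypothesis")
```

Carrying the variance unrounded gives a statistic of 14.0; working by hand from the rounded figure 31.11 gives a value marginally below that. Either way the statistic is well under both table values.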
2. $\chi^2$ as a Non-Parametric Test: $\chi^2$ is an important non-parametric test and as such no rigid
assumptions are necessary in respect of the type of population. We require only the degrees of freedom
for using this test. As a non-parametric test, $\chi^2$ can be used (i) as a test of goodness of fit and (ii) as a
test of independence. It is calculated with the help of the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed frequency and $E$ is the expected frequency of a class or cell.
Case (i): $\chi^2$ as a Test of Goodness of Fit: Through the $\chi^2$ test we can measure the deviations between
the observed values and the expected values. Here we are concerned not with the parameters but
with the form of the distribution. The $\chi^2$ test enables us to see how well an assumed theoretical distribution
(such as the Binomial, Poisson, or normal distribution) fits the observed data.
When some theoretical distribution is fitted to the given data, we are always interested in knowing
how well this distribution fits the observed data. The Chi-square test can answer this. If the
calculated value of Chi-square is less than the table value at a certain level of significance, the fit is
considered to be a good one, which means that the divergence between the observed and expected
frequencies is attributable to fluctuations of sampling. But if the calculated value of Chi-square is greater
than its table value, the fit is not considered to be a good one.
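Problem 2: Four coins were tossed 160 times and the following results were obtained:
No. of heads            0    1    2    3    4
Observed frequencies   17   52   54   31    6
Under the assumption that the coins are balanced, find the expected frequencies of getting 0, 1, 2, 3 or 4
heads and test the goodness of fit.
Solution: The hypothesis is that the coins are unbiased, so the number of heads follows a Binomial
distribution with $n = 4$ and $p = 1/2$.
No. of heads (x)   Expected frequency = $160 \times \binom{4}{x} (1/2)^4$
0                  160 × 1 × 1/16 = 10
1                  160 × 4 × 1/16 = 40
2                  160 × 6 × 1/16 = 60
3                  160 × 4 × 1/16 = 40
4                  160 × 1 × 1/16 = 10

No. of heads   O     E     O-E   $(O-E)^2$   $(O-E)^2/E$
0              17    10     7     49         4.9
1              52    40    12    144         3.6
2              54    60    -6     36         0.6
3              31    40    -9     81         2.025
4               6    10    -4     16         1.6

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.9 + 3.6 + 0.6 + 2.025 + 1.6 = 12.725$$

d.f. = 5 - 1 = 4; the table value of $\chi^2$ for 4 d.f. at the 5% level of significance is 9.488.
The calculated value is greater than the table value. Therefore the fit is poor.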
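The expected frequencies and the statistic in Problem 2 can be reproduced with the following Python sketch. It assumes, as the solution does, that the number of heads is Binomial with four trials and probability 1/2; the 5% table value for 4 d.f. (9.488) is hard-coded from a standard table, and the variable names are illustrative.

```python
from math import comb

# Observed frequencies of 0, 1, 2, 3 and 4 heads in 160 tosses of four coins.
observed = [17, 52, 54, 31, 6]
n_coins = 4
n_trials = sum(observed)

# Expected frequency of x heads under unbiased coins: N * C(4, x) * (1/2)^4.
expected = [n_trials * comb(n_coins, x) * 0.5 ** n_coins for x in range(n_coins + 1)]

# Chi-square statistic: sum over classes of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1  # one constraint: the totals must agree
table_value_5pct = 9.488  # standard table value for 4 d.f. at the 5% level
fit_is_good = chi_square < table_value_5pct

print("expected:", expected)
print(f"chi-square = {chi_square:.3f} on {df} d.f.")
print("good fit" if fit_is_good else "poor fit")
```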

Case (ii): $\chi^2$ as a Test of Independence: The $\chi^2$ test can be used to find out whether two or more
attributes are associated or not. For example, for coaching classes and successful candidates, or marriage and
failure, we can find out whether the attributes are related or independent. We take the hypothesis that the
attributes are independent. If the calculated value of $\chi^2$ is less than the table value at a certain level of
significance, the hypothesis is accepted; otherwise it is rejected.
Problem 3: A certain drug was administered to 500 people out of a total of 800 included in the sample
to test its efficacy against typhoid. The results are given below:
            Typhoid   No Typhoid   Total
Drug          200        300        500
No Drug       280         20        300
Total         480        320        800
On the basis of these data, can it be concluded that the drug is effective in preventing typhoid?
Solution: Let the hypothesis be 'the drug is not effective in preventing typhoid'.
Expected cell frequency = (row total × column total) / grand total
The table of expected frequencies is
            Typhoid   No Typhoid   Total
Drug          300        200        500
No Drug       180        120        300
Total         480        320        800

O      E      O-E    $(O-E)^2$   $(O-E)^2/E$
200    300    -100   10000       33.33
280    180     100   10000       55.56
300    200     100   10000       50.00
 20    120    -100   10000       83.33
800    800       0              222.22

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 222.22$$

d.f. = (2 - 1)(2 - 1) = 1; the table value of $\chi^2$ for 1 d.f. at the 5% level of significance is 3.84.
The computed value of $\chi^2$ is much greater than the table value. Therefore, the hypothesis that the drug is
not effective is rejected. Hence we conclude that the drug is effective in preventing typhoid.
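The steps of Problem 3 (expected frequencies from the marginal totals, then the sum of $(O-E)^2/E$ over all cells) can be sketched as a small Python function. The function name `chi_square_independence` and its return layout are assumptions of this illustration; the 5% table value for 1 d.f. (3.84) is hard-coded from a standard table.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected table) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected cell frequency = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: Drug, No Drug. Columns: Typhoid, No Typhoid.
drug_table = [[200, 300], [280, 20]]
statistic, df, expected = chi_square_independence(drug_table)

table_value_5pct = 3.84  # standard table value for 1 d.f. at the 5% level
print("expected:", expected)
print(f"chi-square = {statistic:.2f} on {df} d.f.")
print("reject independence" if statistic >= table_value_5pct else "accept independence")
```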
Problem 4: In an experiment on the immunization of goats against anthrax the following results were
obtained. Derive your inference on the vaccine.
                          Died of Anthrax   Survived   Total
Inoculated with vaccine          2             10        12
Not inoculated                   6              6        12
Total                            8             16        24
Solution: Let us take the hypothesis that the vaccine is not effective, i.e. that the two attributes are
independent.
Expected frequency of any cell = (row total × column total) / grand total
The table of expected frequencies is
                          Died of Anthrax   Survived   Total
Inoculated with vaccine          4              8        12
Not inoculated                   4              8        12
Total                            8             16        24

O     E     O-E   $(O-E)^2$   $(O-E)^2/E$
 2     4    -2     4          1.0
10     8     2     4          0.5
 6     4     2     4          1.0
 6     8    -2     4          0.5
24    24     0    16          3.0

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0$$

d.f. = (2 - 1)(2 - 1) = 1; the table value of $\chi^2$ for 1 d.f. at the 5% level of significance is 3.84.
The computed value of $\chi^2$ is 3, which is less than the table value. Therefore, the null hypothesis may be
accepted. Hence we conclude that, on this evidence, the vaccine cannot be said to be effective in controlling
the disease.
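Before relying on a result from a sample this small, it may be worth checking it against assumptions 4 and 5 listed earlier. The sketch below does so for the goat data; it assumes that the minimum-frequency rule is applied to the expected cell frequencies, and the function name and returned keys are invented for the illustration.

```python
def check_chi_square_assumptions(table, min_expected=10, min_total=50):
    """Compare a contingency table with the rule-of-thumb sample-size assumptions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected cell frequency = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    smallest = min(e for row in expected for e in row)
    return {
        "total": grand_total,
        "smallest_expected": smallest,
        "total_ok": grand_total >= min_total,
        "cells_ok": smallest >= min_expected,
    }


# Rows: inoculated, not inoculated. Columns: died of anthrax, survived.
goats = [[2, 10], [6, 6]]
report = check_chi_square_assumptions(goats)
print(report)
```

With only 24 animals and a smallest expected frequency of 4, both checks fail, so the conclusion drawn from this table should be treated with caution.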
