Hypothesis Testing Using Minitab

Hypothesis testing is used to determine whether to accept or reject claims about a population based on a sample. A hypothesis test examines the null hypothesis of "no effect" against the alternative hypothesis. If the p-value is less than the significance level, the null hypothesis can be rejected. The document then discusses performing hypothesis tests in Minitab, including specifying hypotheses, choosing a significance level, collecting data, and comparing the p-value to the significance level to determine whether to reject or fail to reject the null hypothesis. Finally, different types of hypothesis tests like z-tests, t-tests, ANOVA, and chi-square tests are described.

Uploaded by

Aero Acad

Hypothesis testing using Minitab

What is a hypothesis test?


A hypothesis test is a rule that specifies whether to accept or reject a claim about a population
depending on the evidence provided by a sample of data.

A hypothesis test examines two opposing hypotheses about a population: the null hypothesis
and the alternative hypothesis. The null hypothesis is the statement being tested. Usually the
null hypothesis is a statement of "no effect" or "no difference". The alternative hypothesis is
the statement you want to be able to conclude is true based on evidence provided by the
sample data.

Based on the sample data, the test determines whether to reject the null hypothesis. You use a
p-value to make the determination. If the p-value is less than the significance level (denoted as
α or alpha), then you can reject the null hypothesis.

Performing a basic hypothesis test


1. Specify the hypotheses.

Formulate the hypotheses.

The null hypothesis is: The population mean of all the observations is equal to a fixed value (say
10 cm). Formally, this is written as: H0: μ = 10

2. Then, choose from the following alternative hypotheses:

Condition to test                                   Alternative hypothesis

The population mean is less than the target.        one-sided: μ < 10

The population mean is greater than the target.     one-sided: μ > 10

The population mean differs from the target.        two-sided: μ ≠ 10

Formally, the two-sided alternative is written as Ha: μ ≠ 10


3. Choose a significance level (also called alpha or α).

A significance level of 0.05 is the most commonly used significance level. However, depending
on the criticality of the process, other values of α may be chosen.

4. Collect the data.


5. Compare the p-value from the test to the significance level.

After performing the hypothesis test, Minitab gives a p-value. Compare the p-value against
the significance level α (0.05).

6. Decide whether to reject or fail to reject the null hypothesis.

If p-value > α, fail to reject the null hypothesis; otherwise, reject the null hypothesis.
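
The decision rule in steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration: the function name `decide` is hypothetical, and treating a p-value exactly equal to α as a rejection is an assumption that follows the "otherwise reject" wording of step 6.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value and a significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Assumption: the boundary case p == alpha counts as a rejection.
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.010))  # p-value below alpha
print(decide(0.300))  # p-value above alpha
```

Note that the function never returns "accept H0": a large p-value only means the sample does not give enough evidence against the null hypothesis.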

Different types of Hypothesis tests


Z-test
In a z-test, the sample is assumed to be normally distributed. A z-score is calculated with population
parameters such as “population mean” and “population standard deviation” and is used to validate a
hypothesis that the sample drawn belongs to the same population.
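
As a rough illustration of the z-score described above, the following Python sketch computes z = (sample mean - hypothesized mean) / (σ / √n) and a two-sided p-value from the standard normal distribution. The function name `one_sample_z` and all the numbers are hypothetical; they are not taken from this document's data.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test.

    sigma is the known population standard deviation.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: 25 parts averaging 10.4 cm, with a known
# population standard deviation of 1.0 cm, tested against 10 cm.
z, p = one_sample_z(sample_mean=10.4, mu0=10.0, sigma=1.0, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")
```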

T-test
A t-test is used to compare means, either of two samples or of one sample against a known value. A
t-test requires a normal distribution of the sample. A t-test is used when the population parameters
(mean and standard deviation) are not known.

There are three versions of the t-test:

1. Independent samples t-test which compares mean for two groups

2. Paired sample t-test which compares means from the same group at different times

3. One sample t-test which tests the mean of a single group against a known mean.

ANOVA
ANOVA, also known as analysis of variance, is used to compare multiple (three or more) samples with
a single test. There are 2 major flavors of ANOVA:

1. One-way ANOVA: It is used to compare the means of three or more samples/groups of
a single independent variable.
2. MANOVA: MANOVA allows us to test the effect of one or more independent variables on two or more
dependent variables. In addition, MANOVA can also detect differences in the correlation between
dependent variables across the groups of the independent variables.

Chi-Square Test
The chi-square test is used to compare categorical variables. There are two types of chi-square test:

1. Goodness of fit test, which determines if a sample matches the population.

2. Test of independence, which compares two categorical variables in a contingency
table to check whether they are related.
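
For the goodness of fit test, the chi-square statistic is the sum of (observed - expected)² / expected over all categories. The Python sketch below shows the arithmetic only; the function name and the counts are hypothetical and do not come from this document.

```python
def chi_square_statistic(observed, expected):
    """Return the chi-square goodness-of-fit statistic."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 items expected to fall evenly into 3 categories.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # degrees of freedom for a goodness-of-fit test
print(f"chi-square = {chi2:.3f} with {dof} degrees of freedom")
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (5.991 for 2 degrees of freedom at α = 0.05), or converted to a p-value by the software.
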
One sample t-test using Minitab
Assumptions
A one-sample t-test has four assumptions.

o Assumption 1: The dependent variable should be measured at a continuous level.

o Assumption 2: The data are independent (i.e., not correlated/related), which means that there
is no relationship between the observations.

These two assumptions cannot be tested using the software and have to be ensured by the researcher.

Assumptions 3 and 4 relate to the nature of your data and can be checked using Minitab.

o Assumption 3: There should be no significant outliers. It is possible to test this assumption using
a simple box plot (see chapter on Graphs using Minitab). A box plot in Minitab marks with an asterisk
each observation that lies more than 1.5 times the interquartile range beyond the edges of the box.
These are potential outliers; they should be investigated and, if warranted, removed
from the dataset before proceeding with the test.

o Assumption 4: The dependent variable should be approximately normally distributed. You can
test for normality using the Anderson-Darling test in Minitab (see normality test
in Chapter____).
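
Box plots conventionally flag a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The Python sketch below applies that rule; the function name and the data are hypothetical, and the quartile interpolation used by `statistics.quantiles` is assumed to be close enough to the software's for illustration.

```python
from statistics import quantiles


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying beyond the box plot whisker limits."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_limit = q1 - whisker * iqr
    upper_limit = q3 + whisker * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]


# Hypothetical measurements with one suspicious value.
weights = [10, 11, 12, 13, 14, 15, 16, 17, 18, 40]
print(boxplot_outliers(weights))
```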

Example
An order was placed for a component that was to weigh 50 milligrams with a tolerance of +/- 1.5
milligrams. 2 vendors were asked to provide 50 samples of the components so that they could be
compared and the order could be placed on one of them. The data from the samples is shown:

Vendor A                               Vendor B
49.26  51.07  48.05  49.56  46.41      47.33  52.84  47.13  49.09  51.13
49.75  48.95  49.70  48.32  48.16      49.31  51.08  50.35  52.24  50.16
49.57  49.98  49.34  48.07  50.72      54.19  50.79  50.35  52.44  49.44
49.17  51.17  49.22  49.35  50.12      51.34  51.27  49.09  47.59  49.42
49.53  48.40  50.27  48.60  50.57      51.06  48.96  51.09  49.67  48.61
49.87  48.97  49.36  48.50  49.79      48.29  47.11  52.20  51.32  48.65
49.92  47.32  47.99  48.08  48.36      48.70  47.11  46.69  50.04  50.19
48.10  49.87  48.33  48.37  49.27      51.97  52.09  52.04  49.30  49.07
50.90  49.55  49.37  48.39  50.34      52.08  48.50  50.92  51.07  50.06
49.76  49.67  50.16  48.89  49.39      52.03  48.33  50.55  46.20  49.55

Analyse the data and determine whether they conform to the target value of 50 mg.

Setup in Minitab
In Minitab, we set up the variable, Vendor A (say), under column “C1”. Then, we enter the scores on the
dependent variable (i.e., the weight of components) into the column.

The hypothesis will be:


Null Hypothesis (H0): μA = 50
Alternative Hypothesis (Ha): μA ≠ 50
Test in Minitab
1. Choose Stat > Basic Statistics > 1-Sample t.

2. This opens a dialog box. From the drop-down list, select “One or more samples, each in a column”.
3. In the right-lower box, enter variable name “Vendor A”. This can be selected from the left box
that lists all the available variables in the worksheet.
4. Select Perform hypothesis test, and in Hypothesized mean, enter 50.

5. Click the Graphs button, and then select Histogram. Click OK in each dialog box.

Note: By default, Minitab uses 95% confidence intervals, which equates to declaring statistical
significance at the p < .05 level. If you want to change this, you can do so by first clicking on
the “Options” button, which opens the 1-Sample t - Options dialogue box, where the level of significance
can be set to the required value.
Output of the one-sample t-test in Minitab
The Minitab output for the one-sample t-test is shown below:

Minitab will present the descriptive statistics including the sample size (the "N" column), mean (the
"Mean" column), standard deviation (the "StDev" column) and the standard error of the mean ("SE
Mean" column), as well as the 95% confidence interval (CI) of the mean ("95% CI").

Finally, the results of the one-sample t-test include the value of the known or hypothesized population
mean you are comparing your sample data to (the Test of mu = 50 vs not = 50 row), the observed t-
value (the "T" column) and the statistical significance (2-tailed p-value) of the one-sample t-test (the "P"
column).

As brought out earlier, if p-value > α (0.05, in this case), we fail to reject the null hypothesis;
otherwise, we reject it. Here, the p-value is reported as “0.000”, i.e., it is smaller than 0.0005.
In other words, if the true average weight were 50 mg, the probability of getting a sample mean at
least as far from 50 mg as that of Vendor A would be extremely low. Hence
we reject the null hypothesis that the average weight of the components is 50 mg.
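
As a cross-check outside Minitab, the t-value can be recomputed from the Vendor A data as t = (sample mean - 50) / (s / √n). The Python sketch below does this with the standard library only. The function name is hypothetical, and because the standard library has no t-distribution, the decision uses the two-sided 5% critical value for 49 degrees of freedom (about 2.01, taken from a t-table) instead of a p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Vendor A weights (mg), copied from the example data.
vendor_a = [
    49.26, 49.75, 49.57, 49.17, 49.53, 49.87, 49.92, 48.10, 50.90, 49.76,
    51.07, 48.95, 49.98, 51.17, 48.40, 48.97, 47.32, 49.87, 49.55, 49.67,
    48.05, 49.70, 49.34, 49.22, 50.27, 49.36, 47.99, 48.33, 49.37, 50.16,
    49.56, 48.32, 48.07, 49.35, 48.60, 48.50, 48.08, 48.37, 48.39, 48.89,
    46.41, 48.16, 50.72, 50.12, 50.57, 49.79, 48.36, 49.27, 50.34, 49.39,
]


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # the "SE Mean" column
    return (mean(values) - mu0) / standard_error, n - 1


t_stat, dof = one_sample_t(vendor_a, mu0=50.0)
T_CRITICAL = 2.01  # approximate two-sided 5% critical value for 49 df

print(f"mean = {mean(vendor_a):.3f}, t = {t_stat:.2f}, df = {dof}")
print("reject H0" if abs(t_stat) > T_CRITICAL else "fail to reject H0")
```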

The histogram for “Vendor A” is as shown below:


The histogram clearly shows that the observations are concentrated more to the left of the required
target value.

Just as an exercise, we will repeat the same steps for “Vendor B”. The results for Vendor B are shown
below:

The p-value in this case is greater than α (0.05, in this case); hence, we fail to reject the null
hypothesis that the average weight of the components is 50 mg.

Also, the histogram clearly shows that the observations are concentrated to the center of the required
target value.

2-Sample T Test in Minitab


When working with data sets, there will often be a need to compare two groups to each other. A
2-sample T test is a hypothesis test to study whether there is a statistically significant
difference between the means of two populations. The 2-sample T test compares two
categories within the same categorical variable (here, two vendors), which becomes valuable when
trying to answer questions about the effect of adding a program or making a change to a sample of
subjects.

Example
An order was placed for a component that was to weigh 50 milligrams with a tolerance of +/- 1.5
milligrams. 2 vendors were asked to provide 50 samples of the components so that they could be
compared and the order could be placed on one of them. The data from the samples is shown:

Vendor A                               Vendor B
49.26  51.07  48.05  49.56  46.41      47.33  52.84  47.13  49.09  51.13
49.75  48.95  49.70  48.32  48.16      49.31  51.08  50.35  52.24  50.16
49.57  49.98  49.34  48.07  50.72      54.19  50.79  50.35  52.44  49.44
49.17  51.17  49.22  49.35  50.12      51.34  51.27  49.09  47.59  49.42
49.53  48.40  50.27  48.60  50.57      51.06  48.96  51.09  49.67  48.61
49.87  48.97  49.36  48.50  49.79      48.29  47.11  52.20  51.32  48.65
49.92  47.32  47.99  48.08  48.36      48.70  47.11  46.69  50.04  50.19
48.10  49.87  48.33  48.37  49.27      51.97  52.09  52.04  49.30  49.07
50.90  49.55  49.37  48.39  50.34      52.08  48.50  50.92  51.07  50.06
49.76  49.67  50.16  48.89  49.39      52.03  48.33  50.55  46.20  49.55

Analyse the data and determine whether the average weight of components from each vendor is equal.

Setup in Minitab
We will be comparing the weight of components between Vendor A and Vendor B. We will assume that
each sample is normally distributed and that the two samples have equal variances. The hypothesis will be:
Null Hypothesis (H0): μA = μB
Alternative Hypothesis (Ha): μA ≠ μB

Where μA is the mean of one population and μB is the mean of the other population of our interest.

In Minitab, we set up one variable, Vendor A (say), under column “C1” and the other Variable “Vendor
B” under column “C2”. Then, we enter the scores on both the variables into the respective columns.

Test in Minitab
In this example, we will be using a 2-Sample t data file for Minitab.
1. Click Stat → Basic Statistics → 2-Sample t.

A new window named “Two-Sample t for the Mean” pops up.

2. From the drop-down list, select “Each sample is in its own column”. Click in the blank box next to
“Samples”; “Vendor A” and “Vendor B” then appear in the list box on the left.
3. Select “Vendor A” and “Vendor B” as the “Samples.”

4. Click “Options”.
Set the required confidence level. Check the box that says “Assume Equal Variances”.

5. Click “OK” to save, and click “OK” again to run the test.

Results of our 2-Sample t Test:


The results for our study of how to run a 2-sample t test in Minitab (when σ1 = σ2) appear automatically
in the session window after clicking “OK.” Minitab’s output is below.

Take notice of a couple of important bits of information provided by the output: the mean of Vendor A
and of Vendor B, the number of data points for each vendor (represented by ‘N’), and each standard
deviation.

The key statistical output provided by Minitab when running a 2-sample t test is the P-Value. Since the
p-value of the t-test (assuming equal variance) is 0.01, it is less than the alpha level of 0.05. Therefore
we reject the null hypothesis, which was (H0): μA = μB.
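
The equal-variance t statistic can also be recomputed by hand from the two columns. The Python sketch below uses the pooled variance sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) and t = (mean A - mean B) / √(sp²(1/n1 + 1/n2)). The function name is hypothetical, and the sketch stops at the t statistic because the standard library has no t-distribution for the p-value.

```python
from math import sqrt
from statistics import mean, variance

# Weights (mg) copied from the example data.
vendor_a = [
    49.26, 49.75, 49.57, 49.17, 49.53, 49.87, 49.92, 48.10, 50.90, 49.76,
    51.07, 48.95, 49.98, 51.17, 48.40, 48.97, 47.32, 49.87, 49.55, 49.67,
    48.05, 49.70, 49.34, 49.22, 50.27, 49.36, 47.99, 48.33, 49.37, 50.16,
    49.56, 48.32, 48.07, 49.35, 48.60, 48.50, 48.08, 48.37, 48.39, 48.89,
    46.41, 48.16, 50.72, 50.12, 50.57, 49.79, 48.36, 49.27, 50.34, 49.39,
]
vendor_b = [
    47.33, 49.31, 54.19, 51.34, 51.06, 48.29, 48.70, 51.97, 52.08, 52.03,
    52.84, 51.08, 50.79, 51.27, 48.96, 47.11, 47.11, 52.09, 48.50, 48.33,
    47.13, 50.35, 50.35, 49.09, 51.09, 52.20, 46.69, 52.04, 50.92, 50.55,
    49.09, 52.24, 52.44, 47.59, 49.67, 51.32, 50.04, 49.30, 51.07, 46.20,
    51.13, 50.16, 49.44, 49.42, 48.61, 48.65, 50.19, 49.07, 50.06, 49.55,
]


def pooled_two_sample_t(x, y):
    """Return (t statistic, degrees of freedom), assuming equal variances."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, n1 + n2 - 2


t_stat, dof = pooled_two_sample_t(vendor_a, vendor_b)
print(f"mean A = {mean(vendor_a):.3f}, mean B = {mean(vendor_b):.3f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A negative t here simply means that the Vendor A mean is below the Vendor B mean; the two-sided test looks at the absolute value.
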
One-way ANOVA using Minitab
Introduction
The one-way analysis of variance (ANOVA) is used to determine whether the mean of a dependent
variable is the same in two or more unrelated, independent groups of an independent variable.
However, it is typically only used when you have three or more independent, unrelated groups, since
an independent t-test is more commonly used when you have just two groups.

Assumptions

The one-way ANOVA has six assumptions. You cannot test the first three of these assumptions with
Minitab because they relate to your study design and choice of variables. However, you should check
whether your study meets these three assumptions before moving on. If these assumptions are not met,
there is likely to be a different statistical test that you can use instead. Assumptions 1, 2 and 3 are
explained below:

o Assumption 1: The dependent variable should be measured on a continuous level (i.e., it is
an interval or ratio variable).

o Assumption 2: The independent variable should consist of two or more categorical,
independent (unrelated) groups.

o Assumption 3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves.

These assumptions cannot be tested using the software and have to be ensured by the researcher.

Assumptions 4, 5 and 6 relate to the nature of your data and can be checked using Minitab.

o Assumption 4: There should be no significant outliers. It is possible to test this assumption using
a simple box plot (see chapter on Graphs using Minitab). A box plot in Minitab marks with an asterisk
each observation that lies more than 1.5 times the interquartile range beyond the edges of the box.
These are potential outliers; they should be investigated and, if warranted, removed
from the dataset before proceeding with the test.

o Assumption 5: The dependent variable should be approximately normally distributed for each
group of the independent variable. You can test for normality using the Anderson-Darling test
in Minitab.

o Assumption 6: There needs to be homogeneity of variances. You can test this assumption in
Minitab using Levene's test for homogeneity of variances.

Example:
An order was placed for a component that was to weigh 50 milligrams with a tolerance of +/- 1.5
milligrams. 4 vendors were asked to provide 50 samples of the components so that they could be
compared and the order could be placed on one of them. The data from the samples is shown:
Vendor A (columns 1-2), Vendor B (columns 3-4), Vendor C (columns 5-6), Vendor D (columns 7-8)
49.26 49.36 47.33 52.20 51.24 50.70 48.96 48.87
49.75 47.99 49.31 46.69 49.98 50.97 48.78 49.47
49.57 48.33 54.19 52.04 50.75 51.06 49.44 49.59
49.17 49.37 51.34 50.92 50.60 51.63 49.36 48.93
49.53 50.16 51.06 50.55 50.67 51.97 49.24 48.17
49.87 49.56 48.29 49.09 50.68 50.48 49.24 49.04
49.92 48.32 48.70 52.24 51.18 50.43 49.56 49.30
48.10 48.07 51.97 52.44 50.84 51.31 48.53 49.74
50.90 49.35 52.08 47.59 52.56 51.07 49.35 48.56
49.76 48.60 52.03 49.67 51.66 50.16 48.66 48.65
51.07 48.50 52.84 51.32 51.14 50.80 49.30 48.73
48.95 48.08 51.08 50.04 51.76 50.21 48.61 49.53
49.98 48.37 50.79 49.30 51.19 50.03 49.51 48.59
51.17 48.39 51.27 51.07 51.47 50.58 48.69 48.94
48.40 48.89 48.96 46.20 51.10 51.60 48.93 49.07
48.97 46.41 47.11 51.13 50.68 51.42 49.34 49.11
47.32 48.16 47.11 50.16 50.91 51.70 49.20 48.61
49.87 50.72 52.09 49.44 51.27 49.98 49.03 48.49
49.55 50.12 48.50 49.42 51.26 51.30 48.60 49.60
49.67 50.57 48.33 48.61 50.98 50.76 49.67 49.42
48.05 49.79 47.13 48.65 51.11 51.50 47.84 47.82
49.70 48.36 50.35 50.19 51.03 50.84 48.87 48.72
49.34 49.27 50.35 49.07 51.52 51.21 48.43 48.59
49.22 50.34 49.09 50.06 51.14 50.92 49.22 48.92
50.27 49.39 51.09 49.55 50.62 51.64 48.59 48.68

Analyse the data and determine whether the average weight of components from each vendor is equal
to each other.

Note: This problem is similar to the previous problem shown under the 2-sample t-test. However, the
t-test is limited to a maximum of 2 samples. In this case, since there are 4 samples, ANOVA will be
used for hypothesis testing.

The hypotheses are as follows:

Null Hypothesis (H0): μA = μB = μC = μD


Alternative Hypothesis (Ha): At least one of the means is significantly different from the others.

Setup in Minitab
We will be comparing the weight of components between Vendor A, Vendor B, Vendor C and Vendor D.
We will assume that each sample is normally distributed and that the samples have equal variances.
In Minitab, we set up each variable under a separate column: “C1”, “C2”, “C3” and “C4”. Then, we enter the
scores on all the variables into the respective columns.

Test in Minitab
1. Click Stat → ANOVA → One-Way…

A new window named “One-Way Analysis of Variance” opens up.

2. From the drop-down list, select “Response data are in separate column for each factor level”.
Click in the blank box under “Responses”; “Vendor A”, “Vendor B”, “Vendor C” and
“Vendor D” then appear in the list box on the left.
3. Select “Vendor A”, “Vendor B”, “Vendor C” and “Vendor D” as the “Responses.”
4. Click “Options”.

Set the required confidence level. Check the box that says “Assume Equal Variances”.
5. Click “OK” to save, and click “OK” again to run the test.

Results of ANOVA:
The results for our study of how to run ANOVA in Minitab appear automatically in the session window
after clicking “OK.” Minitab’s output is below.
The initial part of the output gives information about the Method used (the two hypotheses,
significance level, etc.) and the factors and levels. Here we see that there is only one factor and 4 levels.

The “Analysis of Variance” section gives us the p-value, which can be compared against the level of
significance (alpha). Here, we see that the p-value is 0.000, which is less than alpha. As brought out
earlier, when p-value < alpha, we reject the null hypothesis. In other words, if the components from
the 4 vendors had the same average weight, sample data like these would be very unlikely (the
p-value is reported as 0.000, i.e., below 0.0005).

To determine how well the model fits your data, examine the goodness-of-fit statistics in the model
summary table.
S is measured in the units of the response variable and represents how far the data values fall from
the fitted values. The lower the value of S, the better the model describes the response.

R² is the percentage of variation in the response that is explained by the model. The higher the R² value,
the better the model fits your data. R² is always between 0% and 100%.

However, a low value of S and a high value of R² do not by themselves indicate that the model
meets the model assumptions. You should check the residual plots to verify the assumptions.

Use predicted R² to determine how well your model predicts the response for new observations. Models
that have larger predicted R² values have better predictive ability.

A predicted R² that is substantially less than R² may indicate that the model is over-fit.
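
To make the F statistic, S and R² more concrete, the Python sketch below computes them from the between-group and within-group sums of squares. The function name is hypothetical, and only the first five values of each vendor's first column are used to keep the listing short, so the numbers it prints will not match the Minitab output for the full data set.

```python
from math import sqrt
from statistics import mean

# Subset of the example data: first five values per vendor (assumption:
# used only to keep the sketch short, not a replacement for the full data).
groups = {
    "Vendor A": [49.26, 49.75, 49.57, 49.17, 49.53],
    "Vendor B": [47.33, 49.31, 54.19, 51.34, 51.06],
    "Vendor C": [51.24, 49.98, 50.75, 50.60, 50.67],
    "Vendor D": [48.96, 48.78, 49.44, 49.36, 49.24],
}


def one_way_anova(groups):
    """Return F, S, R-squared (%) and degrees of freedom for one-way ANOVA."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "F": ms_between / ms_within,
        "S": sqrt(ms_within),
        "R_sq": 100.0 * ss_between / (ss_between + ss_within),
        "df": (k - 1, n - k),
    }


result = one_way_anova(groups)
print(f"F = {result['F']:.2f} on {result['df']} degrees of freedom")
print(f"S = {result['S']:.3f}, R-sq = {result['R_sq']:.1f}%")
```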

We also see an interval plot as part of the results:

Use the interval plot to display the mean and confidence interval for each group.

The interval plots show the following:

o Each dot represents a sample mean.

o Each interval is a 95% confidence interval for the mean of a group. You can be 95% confident
that the population mean of a group is within that group's confidence interval.
