SigmaPlot Statistics User Guide
SigmaPlot Statistics
5.5.6 Multiple Comparison Options for a One Way ANOVA
5.5.7 Interpreting One Way ANOVA Results
5.5.8 One Way ANOVA Report Graphs
5.6 Two Way Analysis of Variance (ANOVA)
5.6.1 About the Two Way ANOVA
5.6.2 Performing a Two Way ANOVA
5.6.3 Arranging Two Way ANOVA Data
5.6.4 Setting Two Way ANOVA Options
5.6.5 Running a Two Way ANOVA
5.6.6 Multiple Comparison Options for a Two Way ANOVA
5.6.7 Performing a One Way ANOVA on Two Way ANOVA Data
5.6.8 Interpreting Two Way ANOVA Results
5.6.9 Two Way ANOVA Report Graphs
5.7 Three Way Analysis of Variance (ANOVA)
5.7.1 About the Three Way ANOVA
5.7.2 Performing a Three Way ANOVA
5.7.3 Arranging Three Way ANOVA Data
5.7.4 Setting Three Way ANOVA Options
5.7.5 Running a Three Way ANOVA
5.7.6 Multiple Comparison Options for a Three Way ANOVA
5.7.7 Interpreting Three Way ANOVA Results
5.7.8 Three Way ANOVA Report Graphs
5.8 Kruskal-Wallis Analysis of Variance on Ranks
5.8.1 About the Kruskal-Wallis ANOVA on Ranks
5.8.2 Performing an ANOVA on Ranks
5.8.3 Arranging ANOVA on Ranks Data
5.8.4 Setting the ANOVA on Ranks Options
5.8.5 Running an ANOVA on Ranks
5.8.6 Multiple Comparison Options for ANOVA on Ranks
5.8.7 Interpreting ANOVA on Ranks Results
5.8.8 ANOVA on Ranks Report Graphs
5.9 Performing a Multiple Comparison
5.9.1 Holm-Sidak Test
5.9.2 Tukey Test
5.9.3 Student-Newman-Keuls (SNK) Test
5.9.4 Bonferroni t-Test
5.9.5 Fisher’s Least Significance Difference Test
5.9.6 Dunnett’s Test
5.9.7 Dunn’s test
5.9.8 Duncan’s Multiple Range
6 Comparing Repeated Measurements of the Same Individuals
6.1 About Repeated Measures Tests
6.1.1 Parametric and Nonparametric Tests
6.1.2 Comparing Individuals Before and After a Single Treatment
6.1.3 Comparing Individuals Before and After Multiple Treatments
6.2 Data Format for Repeated Measures Tests
6.2.1 Raw Data
6.2.2 Indexed Data
6.3 Paired t-Test
6.3.1 Performing a Paired t-test
6.3.2 Arranging Paired t-Test Data
6.3.3 Setting Paired t-Test Options
6.3.4 Running a Paired t-Test
7.3.3 Arranging z-test Data
7.3.4 Setting z-test Options
7.3.5 Running a z-Test
7.3.6 Interpreting Proportion Comparison Results
7.4 Chi-square Analysis of Contingency Tables
7.4.1 About the Chi-Square Test
7.4.2 Performing a Chi-Square Test
7.4.3 Arranging Chi-Square Data
7.4.4 Setting Chi-Square Options
7.4.5 Running a Chi-Square Test
7.4.6 Interpreting Results of a Chi-Squared Analysis of Contingency Tables
7.5 The Fisher Exact Test
7.5.1 About the Fisher Exact Test
7.5.2 Performing a Fisher Exact Test
7.5.3 Arranging Fisher Exact Test Data
7.5.4 Running a Fisher Exact Test
7.5.5 Interpreting Results of a Fisher Exact Test
7.6 McNemar’s Test
7.6.1 About McNemar’s Test
7.6.2 Performing McNemar’s Test
7.6.3 Arranging McNemar Test Data
7.6.4 Setting McNemar’s Options
7.6.5 Running McNemar’s Test
7.6.6 Interpreting Results of McNemar’s Test
7.7 Relative Risk Test
7.7.1 About the Relative Risk Test
7.7.2 Performing the Relative Risk Test
7.7.3 Arranging Relative Risk Test Data
7.7.4 Setting Relative Risk Test Options
7.7.5 Running the Relative Risk Test
7.7.6 Interpreting Results of the Relative Risk Test
7.8 Odds Ratio Test
7.8.1 About the Odds Ratio Test
7.8.2 Performing the Odds Ratio Test
7.8.3 Arranging Odds Ratio Test Data
7.8.4 Setting Odds Ratio Test Options
7.8.5 Running the Odds Ratio Test
7.8.6 Interpreting Results of the Odds Ratio Test
8 Prediction and Correlation
8.1 About Regression
8.1.1 Correlation
8.1.2 Data Format for Regression and Correlation
8.2 Simple Linear Regression
8.2.1 About the Simple Linear Regression
8.2.2 Performing a Linear Regression
8.2.3 Arranging Linear Regression data
8.2.4 Setting Linear Regression Options
8.2.5 Running a Linear Regression
8.2.6 Interpreting Simple Linear Regression Results
8.2.7 Simple Linear Regression Report Graphs
8.3 Multiple Linear Regression
8.3.1 About the Multiple Linear Regression
8.10 Deming Regression
8.10.1 About Deming Regression
8.10.2 Performing a Deming Regression
8.10.3 Arranging Deming Regression Data
8.10.4 Setting Deming Regression Options
8.10.5 Running a Deming Regression
8.10.6 Interpreting Deming Regression Results
8.10.7 Deming Regression Result Graph
9 Survival Analysis
9.1 Five Survival Tests
9.2 Data Format for Survival Analysis
9.2.1 Raw Data
9.2.2 Indexed Data
9.3 Single Group Survival Analysis
9.3.1 Performing a Single Group Survival Analysis
9.3.2 Arranging Single Group Survival Analysis Data
9.3.3 Setting Single Group Test Options
9.3.4 Running a Single Group Survival Analysis
9.3.5 Interpreting Single Group Survival Results
9.3.6 Single Group Survival Graph
9.4 LogRank Survival Analysis
9.4.1 Performing a LogRank Analysis
9.4.2 Arranging LogRank Survival Analysis Data
9.4.3 Setting LogRank Survival Options
9.4.4 Running a LogRank Survival Analysis
9.4.5 Interpreting LogRank Survival Results
9.4.6 LogRank Survival Graph
9.5 Gehan-Breslow Survival Analysis
9.5.1 Performing a Gehan-Breslow Analysis
9.5.2 Arrange Gehan-Breslow Survival Analysis Data
9.5.3 Setting Gehan-Breslow Survival Options
9.5.4 Running a Gehan-Breslow Survival Analysis
9.5.5 Interpreting Gehan-Breslow Survival Results
9.5.6 Gehan-Breslow Survival Graph
9.6 Cox Regression
9.6.1 About Cox Regression
9.6.2 Performing a Cox Regression Proportional Hazards Model
9.6.3 Performing a Cox Regression Stratified Model
9.6.4 Arranging Cox Regression Data
9.6.5 Setting Cox Regression Proportional Hazards Options
9.6.6 Setting Cox Regression Stratified Model Options
9.6.7 Running a Cox Regression Proportional Hazards Model
9.6.8 Running a Cox Regression Stratified Model
9.6.9 Interpreting Cox Regression Results
9.6.10 Cox Regression Graphs
9.6.11 How to Create a Cox Regression Graph
9.7 Survival Curve Graph Examples
9.7.1 Using Test Options to Modify Graphs
9.7.2 Editing Survival Graphs Using the Property Browser
9.8 Failures, Censored Values, and Ties
10 Computing Power and Sample Size
10.1 About Power
10.2 About Sample Size
1 Statistics
SigmaPlot Statistics provides a wide range of powerful yet easy-to-use statistical analyses
specifically designed to meet the needs of researchers, without requiring in-depth knowledge
of the math behind the procedures performed. The tests and features described in this user’s
manual include:
• Using the Advisor Wizard. For more information, see 2.1 Using the Advisor Wizard.
• Using SigmaPlot procedures. For more information, see 3 Using Statistical Procedures.
• Comparing two or more groups. For more information, see 5 Comparing Two or More
Groups.
• Comparing repeated measurements of the same individuals. For more information, see 6
Comparing Repeated Measurements of the Same Individuals.
• Comparing frequencies, rates, and proportions. For more information, see 7 Comparing
Frequencies, Rates, and Proportions.
• Prediction and correlation. For more information, see 8 Prediction and Correlation.
• Survival analysis. For more information, see 9 Survival Analysis.
• Computing power and sample size. For more information, see 10 Computing Power and
Sample Size.
• Generating report graphs. For more information, see 11.1 Generating Report Graphs.
2 The Advisor Wizard
Topics Covered in this Chapter
♦ Using the Advisor Wizard
Use the Advisor Wizard to determine the appropriate test for analyzing your data.
2. When the Advisor Wizard appears, answer the questions about what you want to do and
the format of your data. Click Next to go to the next panel, Back to go to the preceding
panel, Finish to view the suggested test, or Cancel to close the Advisor Wizard.
3. After the Advisor Wizard suggests a test, click Run to perform the test. The Pick
Columns dialog box for the suggested test appears prompting you to select the worksheet
columns with the data you want to test. For more information, see 3.1.4 Selecting the
Data to Test.
The remainder of this section describes the answers for each dialog box.
or the distributions or proportions of different groups. Click Next. You are asked to describe
how your data is measured. For more information, see 2.1.2 How are the data measured?.
Predict a trend, find a correlation, or fit a curve. Select this option if you want to use
regression to predict a dependent variable from one or more independent variables, or describe
the strength of association between two variables with a correlation coefficient. For example,
select this option if you want to see if you can predict the average caloric intake of an animal
from its weight. Click Next. You are asked to describe how your data is measured. For more
information, see 2.1.2 How are the data measured?.
Determine the sample size for an experimental design. Select this option if you want to
determine the desired sample size for an experiment you intend to perform. Click Next. You
are asked to describe how your data is measured. For more information, see 2.1.2 How are
the data measured?.
Determine the sensitivity of an experimental design. Select this option to determine the
power or ability of a test to detect an effect for an experiment you want to perform. Click
Next. You are asked to describe how your data is measured. For more information, see 2.1.2
How are the data measured?.
Measure the strength of association between a treatment and an event. Select this option
if you want to measure the strength of association between a treatment or risk factor and
a specified event that occurs in members of a population. Click Next. You are asked if
your study is retrospective or prospective. For more information, see 2.1.11 Is your study
retrospective or prospective?.
2.1.2 How are the data measured?
By numeric values. Select By numeric values if your data are measured on a continuous
scale using numbers. Examples of numeric values include height, weight, concentrations,
ages, or any measurement where there is an arithmetic relationship between values.
• If you are comparing groups or treatments for differences, you are asked if you have
repeated observations on the same individuals. For more information, see 2.1.4 Did you
apply more than one treatment per subject?.
• If you are predicting a trend, you are prompted to select the type of prediction you want to
perform. For more information, see 2.1.7 What kind of prediction do you want to make?.
• If you are determining the sample size or the sensitivity of an experimental design,
you are asked how many groups or treatments you have. For more information, see 2.1.5
How many groups or treatments are there?.
By order or rank. Select By order or rank if your data are measured on a rank scale that has
an ordering relationship, but no arithmetic relationship, between values.
For example, clinical status is often measured on an ordinal scale, such as: Healthy = 1;
Feeling ill = 2; Sick = 3; Hospitalized = 4; and Dead = 5. These ratings show that being
dead is worse than being healthy, but they do not indicate that being dead is five times worse
than being healthy.
• If you are comparing groups or treatments for differences, you are asked if you have
repeated observations on the same individuals. For more information, see 2.1.4 Did you
apply more than one treatment per subject?.
• If you are predicting a trend, click Finish. The Advisor suggests computing the Spearman
Rank Correlation. For more information, see 8.9 Spearman Rank Order Correlation.
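The Spearman rank correlation suggested above works on ranks rather than on the raw values, which is why it suits ordinal scales such as the clinical status example. The following is a minimal Python sketch, not SigmaPlot's own implementation: it assumes tied values receive the average of the rank positions they occupy, then computes the ordinary correlation of the two rank vectors. The function names and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over every value tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j are 0-based, so their 1-based mean is (i + j) / 2 + 1.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable whose values are all tied has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: clinical status (1 = Healthy ... 5 = Dead) and a severity score.
status = [1, 2, 2, 3, 5]
score = [0.4, 1.1, 0.9, 2.5, 7.0]
print(spearman_rho(status, score))
```

Because only the ordering of the codes matters, recoding the clinical status scale with any other increasing numbers leaves the result unchanged.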
By proportion or number of observations (for example, male vs. female). Select By
proportion or number of observations in categories if your data are measured on a nominal
scale, which counts the number or proportion of observations that fall into categories, and
where there is no ordering relationship between the categories (such as Democrat versus Republican).
• If you are comparing groups or treatments for differences, you are asked if you have
repeated observations on the same individuals. For more information, see 2.1.4 Did you
apply more than one treatment per subject?.
• If you are predicting a trend, click Finish. SigmaPlot suggests running a Multiple Logistic
Regression. Click Run to perform the test, Cancel to exit the Advisor Wizard and
return to the worksheet, or Help for information on the test. For more information, see
8.4 Multiple Logistic Regression.
• If you are determining the sample size or the sensitivity of an experimental design, you are
asked how your data is formatted. For more information, see 2.1.6 What kind of data do you have?.
By survival time. Select By survival time if you have measurements that correspond to the
time to an event. This event is typically a death, but other events, such as the time to motor
failure or the time to vascular graft closure, are equally valid.
• If you wish to describe your survival data’s statistics or are comparing survival groups for
significant differences, then you are asked if your data includes potential risk factors that
may affect survival times.
• If you are comparing survival groups for significant differences, you are then asked whether
later survival times are less accurate.
2.1.5 How many groups or treatments are there?
Tip
Click Finish to view the suggested test, then Run to perform it. You can also click
Back to return to the previous dialog box, Cancel to return to the worksheet, or click
Help for information on using the Advisor Wizard.
Select one of the following:
One. Select this option if you have only one experimental group. For more
information, see 4.1 One-Sample t-Test.
Two. Select this option if you have two different experimental groups or if your subjects
underwent two different treatments.
For example, if you are comparing differences in hormone levels between men and women,
or if you are measuring the change in individuals before and after a drug treatment, there
are two groups.
• If you are comparing two different groups on an arithmetic scale, SigmaPlot suggests the
independent t-test. For more information, see 5.3 Unpaired t-Test. You can read descriptions
of the results for this procedure. For more information, see 5.3.6 Interpreting t-Test Results.
• If you are determining sample size or power for a comparison of two groups on an arithmetic
scale, SigmaPlot suggests that you perform t-test sample size or power computations. For
more information, see 10.9 Determining the Minimum Sample Size for a t-Test. You can
also determine the power. For more information, see 10.3 Determining the Power of a t-Test.
• If you are comparing the same subjects undergoing two different treatments on an arithmetic
scale, SigmaPlot suggests performing the Paired t-test. For more information, see 6.3
Paired t-Test. You can also read descriptions of the results for this procedure. For more
information, see 6.3.5 Interpreting Paired t-Test Results.
• If you are determining sample size or power for a comparison of the same subjects
undergoing two treatments on an arithmetic scale, SigmaPlot suggests performing Paired
t-test sample size or power computations. For more information, see 10.10 Determining
the Minimum Sample Size for a Paired t-Test. You can also read directions on determining
power. For more information, see 10.4 Determining the Power of a Paired t-Test.
• If you are comparing two different groups on a rank scale, SigmaPlot suggests performing
the Mann-Whitney Rank Sum Test. For more information, see 5.4 Mann-Whitney Rank
Sum Test. You can also read descriptions of the results for this procedure. For more
information, see 5.4.6 Interpreting Rank Sum Test Results.
• If you are comparing the same subjects undergoing two different treatments on a rank scale,
SigmaPlot suggests performing the Wilcoxon Signed Rank Test. For more information,
see 6.4 Wilcoxon Signed Rank Test. You can also read descriptions of the results for this
procedure. For more information, see 6.4.6 Interpreting Signed Rank Test Results.
Three or more. Select this option if you have three or more different groups to compare,
or are comparing the response of the same subjects to three or more different treatments.
For example, if you collected ethnic diversity data from five different cities, or subjected
individuals to a series of four dietary changes and measured change in serum cholesterol, you
are analyzing three or more groups.
• If you are comparing three or more different groups on an arithmetic scale, SigmaPlot
suggests performing One Way ANOVA. For more information, see 5.5 One Way Analysis
of Variance (ANOVA).
• If you are determining sample size or power for a comparison of three or more different
groups on an arithmetic scale, SigmaPlot suggests performing One Way ANOVA sample
size computations. For more information, see 10.12 Determining the Minimum Sample Size
for a One Way ANOVA. You can also perform power computations. For more information,
see 10.6 Determining the Power of a One Way ANOVA.
• If you are comparing the same subjects undergoing three or more different treatments on an
arithmetic scale, SigmaPlot suggests performing One Way Repeated Measures ANOVA. For
more information, see 6.5 One Way Repeated Measures Analysis of Variance (ANOVA).
You can also read descriptions of the results for this procedure. For more information, see
6.5.7 Interpreting One Way Repeated Measures ANOVA Results.
• If you are comparing three or more different groups on a rank scale, SigmaPlot suggests the
Kruskal-Wallis ANOVA on Ranks. For more information, see 5.8 Kruskal-Wallis Analysis
of Variance on Ranks. You can also read descriptions of the results for this procedure. For
more information, see 5.8.7 Interpreting ANOVA on Ranks Results.
• If you are comparing the same subjects undergoing three or more different treatments on a
rank scale, SigmaPlot suggests the Friedman Repeated Measures ANOVA on Ranks. For
more information, see 6.7 Friedman Repeated Measures Analysis of Variance on Ranks.
You can also read descriptions of the results for this procedure. For more information, see
6.7.7 Interpreting Repeated Measures ANOVA on Ranks Results.
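The suggestions for one-factor comparisons in this section follow a small decision table keyed on the measurement scale, on whether the same subjects receive every treatment, and on the number of groups. The Python sketch below only restates that table for the two-group and three-or-more-group cases; the function name and argument values are hypothetical, and the Advisor Wizard itself is driven through its dialog boxes, not through any code interface.

```python
def suggest_test(scale: str, repeated: bool, groups: int) -> str:
    """Restate the Advisor's one-factor suggestions (hypothetical helper).

    scale    -- "arithmetic" (by numeric values) or "rank" (by order or rank)
    repeated -- True if the same subjects undergo every treatment
    groups   -- number of groups or treatments being compared (2 or more)
    """
    if groups < 2:
        raise ValueError("a comparison needs at least two groups or treatments")
    table = {
        ("arithmetic", False, "two"): "Unpaired t-Test",
        ("arithmetic", True, "two"): "Paired t-Test",
        ("rank", False, "two"): "Mann-Whitney Rank Sum Test",
        ("rank", True, "two"): "Wilcoxon Signed Rank Test",
        ("arithmetic", False, "many"): "One Way ANOVA",
        ("arithmetic", True, "many"): "One Way Repeated Measures ANOVA",
        ("rank", False, "many"): "Kruskal-Wallis ANOVA on Ranks",
        ("rank", True, "many"): "Friedman Repeated Measures ANOVA on Ranks",
    }
    key = (scale, repeated, "two" if groups == 2 else "many")
    if key not in table:
        raise ValueError(f"unsupported measurement scale: {scale!r}")
    return table[key]


print(suggest_test("rank", True, 4))
```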
There are two combinations of groups or treatments to consider (for example, males and
females from different cities). Select this option if each experimental subject is affected by
two different experimental factors or underwent two different treatments simultaneously. Note
that different levels of a factor, such as male and female for gender, are not considered to
be different factors.
For example, if you were comparing only males and females, you would have only one factor;
however, if you compared males and females from different countries, there would be two
factors, gender and nationality.
• If you are comparing three or more different groups on an arithmetic scale, SigmaPlot
suggests performing Two Way ANOVA. For more information, see 5.6 Two Way Analysis of
Variance (ANOVA). You can also read descriptions of the results for this procedure. For
more information, see 5.6.8 Interpreting Two Way ANOVA Results.
• If you are comparing the same subjects undergoing three or more repeated treatments on
an arithmetic scale, SigmaPlot suggests Two Way Repeated Measures ANOVA. Note that
either one or both factors can be repeated treatments. For more information, see 6.6 Two
Way Repeated Measures Analysis of Variance (ANOVA). You can also read descriptions
of the results for this procedure. For more information, see 6.6.7 Interpreting Two Way
Repeated Measures ANOVA Results.
There are three combinations of groups to consider. Select this option if each experimental
subject is affected by three different experimental factors or underwent three different
treatments simultaneously. Note that different levels of a factor, such as male and female for
gender, or Italian and German for nationality, are not considered to be different factors.
For example, if you are comparing only males and females from Italy and Germany, you have
only two factors. However, if you are comparing males and females from different countries
with different diets, there are three factors: gender, nationality, and diet.
If you select this option, SigmaPlot suggests you run a Three Way ANOVA. For more
information, see 5.7 Three Way Analysis of Variance (ANOVA).
This is a measure of the association between two variables. If you are determining power
or sample size, this option also appears. SigmaPlot suggests performing power or sample size
computations for a correlation coefficient.
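One common textbook approach to sample size for a correlation coefficient uses Fisher's z transformation. The sketch below assumes that approximation for a two-sided test of zero correlation; it is not necessarily the method SigmaPlot uses internally, and the function name is hypothetical. It relies on `statistics.NormalDist`, available in Python 3.8 and later.

```python
import math
from statistics import NormalDist


def correlation_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r with a two-sided test of rho = 0.

    Fisher's z approximation: n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and lie strictly between -1 and 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)  # round up to a whole number of subjects


print(correlation_sample_size(0.3))
```

Smaller correlations shrink atanh(r) toward zero, so the required sample size grows quickly as the effect you want to detect gets weaker.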
2.1.6 What kind of data do you have?
Fit a curved line through the data. Select this answer to find an equation that predicts the
dependent variable from an independent variable without assuming a straight line relationship.
If you select to fit a curved line through your data, SigmaPlot asks you what kind of curve you
want to use. For more information, see 2.1.8 What kind of curve do you want to use?.
Predict a dependent variable from several independent variables. Select this option if you
want to predict a dependent variable from more than one independent variable using the linear
relationship $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k$, where $y$ is the dependent
variable, $x_1, x_2, x_3, \ldots, x_k$ are the $k$ independent variables, $b_0$ is the constant
(intercept), and $b_1, b_2, b_3, \ldots, b_k$ are the regression coefficients. As the value of any
$x_i$ varies, the corresponding value of $y$ increases or decreases in proportion, by $b_i$ for each
unit change in $x_i$ when the other independent variables are held constant.
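The coefficients of this linear relationship are usually estimated by ordinary least squares. The following is a minimal pure-Python sketch, assuming a small, well-conditioned data set and solving the normal equations (X'X)b = X'y directly; dedicated regression software typically uses more numerically stable factorizations. The function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: independent variables are collinear")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # Prepend 1.0 to each observation so the first coefficient is the intercept b0.
    rows = [[1.0] + [float(v) for v in obs] for obs in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(b, obs):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one observation (x1, ..., xk)."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], obs))


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]
b = fit_linear(X, y)
print([round(v, 6) for v in b])
```

Because the sample data contain no noise, the fitted coefficients should come out very close to 1, 2 and -3.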
If you select this option, SigmaPlot asks how you want to specify the independent variables.
For more information, see 2.1.9 How do you want to specify the independent variables?.
Measure the strength of association between pairs of variables. Select this option to find
how closely the value of one variable predicts the value of another (for example, the likelihood
that a variable increases or decreases when the other variable increases or decreases), without
specifying which is the dependent and independent variable.
If you select this option, click Finish. SigmaPlot suggests computing the Pearson Product
Moment Correlation.
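For reference, the Pearson product moment correlation coefficient is the co-variation of the
two variables divided by the square root of the product of their squared deviations. This
minimal sketch (the function name `pearson_r` is hypothetical, not a SigmaPlot function) also
shows that the result does not depend on which variable is treated as dependent: swapping the
arguments gives the same value.

```python
import math
from typing import Sequence

# Hypothetical helper (not a SigmaPlot function).


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation coefficient of paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
print(pearson_r([1, 3, 2, 4], [1, 2, 3, 4]))  # same value with the roles swapped
```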
2.1.10 How do you want SigmaPlot to select the independent variable?
variables are selected as columns from the worksheet when the regression procedure is
performed.
Select one of the following:
Include all selected independent variables in the equation. Select this option if you want
to compute a single equation using all independent variables you select for the equation,
regardless of whether they contribute significantly to predicting the dependent variable.
If you select this option, click Finish. SigmaPlot suggests performing a Multiple Linear
Regression. For more information, see 8.3 Multiple Linear Regression. You can also read
descriptions of the results for this procedure. For more information, see 8.3.6 Interpreting
Multiple Linear Regression Results.
Let SigmaPlot select the "best" variables to include in the equation. Select this option if
you want SigmaPlot to screen the potential independent variables you select and only include
ones that significantly contribute to predicting the dependent variable. You are then asked how
you want to select the independent variables. For more information, see 2.1.10 How do you
want SigmaPlot to select the independent variable?.
If you select this option, click Finish. SigmaPlot suggests Best Subsets Regression. For
more information, see 8.7 Best Subsets Regression. You can also read descriptions of the
results for this procedure. For more information, see 8.7.7 Interpreting Best Subset Regression
Results.
SigmaPlot selects the sets of independent variables that "best" predict the dependent variable
using criteria specified in the Best Subsets Regression Options dialog box.
3 Using Statistical Procedures
Topics Covered in this Chapter
♦ Running Procedures
♦ Choosing the Procedure to Use
♦ Describing Your Data with Basic Statistics
♦ Choosing the Group Comparison Test to Use
♦ Choosing the Repeated Measures Test to Use
♦ Choosing the Rate and Proportion Comparison to Use
♦ Choosing the Prediction or Correlation Method
♦ Choosing the Survival Analysis to Use
♦ Testing Normality
♦ Determining Experimental Power and Sample Size
The statistical procedure you use to analyze a given data set depends on the goals of your
analysis and the nature of your data. The Advisor Wizard asks you questions about your
goals and your data, then selects the appropriate test. For more information, see 2.1 Using
the Advisor Wizard.
1. Entering or importing and arranging your data appropriately in the worksheet. For more
information, see 3.1.1 Arranging Worksheet Data.
2. Determining and choosing the test you want to perform. For more information, see 3.1.2
Selecting a Test.
3. If desired, setting the test options using the selected test’s Options dialog box. For more
information, see 3.1.3 Setting Test Options.
4. Running the test by picking the worksheet columns with the data you want to test using
the Pick Columns dialog box. For more information, see 3.1.4 Selecting the Data to Test.
5. Viewing, generating, and interpreting the test reports and graphs. For more information,
see 3.1.5 Reports and Result Graphs.
• Data format for rate and proportion tests. For more information, see 7.2 Data Format for
Rate and Proportion Tests.
1. Click the Analysis tab, and then click the Tests drop-down arrow.
You can configure almost all statistics procedures with a set of options. Use these settings to
perform additional tests and procedures. You may wish to enable or disable some of these
options or change assumption checking parameters; all changes are saved between sessions.
To change option settings before you run a test:
1. Select the test. For more information, see 3.1.2 Selecting a Test.
2. Click Options.
4. Click the tab of the options you want to view. Select a check box to include an option in
the test. Clear a check box if you do not want to use that test option.
3.1.4 Selecting the Data to Test
Figure 3.1 An Example of a Test Options Dialog Box. Each test has its own
settings.
5. To accept the current settings without continuing the test, click Apply. To close the dialog
box without changing any settings or running the test, click Cancel.
When you run a test whose data can be arranged in more than one format, use the Select
Data panel to select the worksheet columns with the data you want to test and to specify
how your data is arranged in the worksheet.
1. Select the appropriate format from the Data Format drop-down list, then click Next.
If the test you are running uses only one type of data format, the Select Data panel appears
prompting you to select the columns with the data you want to test (see the following step).
2. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data drop-down list.
The dialog box indicates the type of data you are selecting.
The first selected column is assigned to the first entry in the Selected Columns list, and all
successively selected columns are assigned to successive entries in the list. The number
or title of each selected column appears in its entry. The number of columns you can select
depends on the test you are running and the format of your data.
3. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
4. If you are running a Forward or Backward Stepwise Regression, click Next.
The Select Data dialog box appears. For more information, see 8.6 Stepwise Linear
Regression.
5. Click Finish to perform the test on the data in the selected columns.
After the computations are completed, the report appears. For more information, see
3.1.6 Repeating Tests.
1. Make sure the report is the active window. If it isn’t, click the Report tab.
2. Click Create Result Graph.
The Create Result Graph dialog box appears, from which you can select an available
Graph Type and create a graph.
SigmaPlot does not create graphs for rate and proportion tests, best subset and
incremental polynomial regression reports, or normality reports.
Note
If you close a report without generating or saving a graph, the graph is not
recoverable.
3.1.5.1 Editing, Saving, and Opening Reports and Graphs
Repeating a test involves running the last test you performed, using the same worksheet
columns.
To repeat a test using new data columns:
1. Click the Analysis tab, and then click Run. For more information, see 3.1 Running
Procedures.
3. Make sure the last test you performed is displayed in the drop-down list in the Statistics
group.
4. If desired, edit the data in the columns used by the test. You can add data and change
values and column titles.
5. To change the option settings before you rerun the test, click Options, change the
desired options, then click OK to accept the changes and close the dialog box.
6. Click Run.
The Select Data panel appears with the columns used in the last procedure selected.
7. Click Finish to repeat the procedure using these columns. After the computations are
complete, a new report appears.
• Use repeated measures comparisons to test the differences in the same individuals before
and after one or more treatments or changes in condition. For more information, see 6.1
About Repeated Measures Tests.
• Use rate and proportion analysis to compare the distribution of groups that are divided or
fall into different categories or classes (for example, male versus female, or reaction versus
no reaction). For more information, see 7.1 About Rate and Proportion Tests.
• Use survival analysis to determine statistics about the time to an event and to compare two or more
time-to-event data sets. For more information, see 9 Survival Analysis.
• Use power and sample size determination to calculate the sensitivity, or power, of an
experimental test, or to compute the experimental sample size required to achieve a desired
sensitivity. For more information, see 10 Computing Power and Sample Size.
• Use Odds Ratio or Relative Risk to measure the strength of association between some
event and a treatment or risk factor. For more information, see 10 Computing Power and
Sample Size.
Type of Experiment

| Scale of Measurement | Two groups of different individuals | Three or more groups of different individuals | Same individuals before and after a single treatment | Same individuals after multiple treatments | Predict a variable or find an association between variables |
|---|---|---|---|---|---|
| Numeric, normally distributed with equal variances | Unpaired t-test | One Way or Two Way ANOVA | Paired t-test | One Way or Two Way Repeated Measures ANOVA | Regression or Pearson Product Moment Correlation |
| By rank or order, or numeric but non-normally distributed or with unequal variances | Mann-Whitney Rank Sum Test | Kruskal-Wallis ANOVA on Ranks | Wilcoxon Signed Rank Test | Friedman Repeated Measures ANOVA on Ranks | Spearman Rank Order |
| By distribution in different categories | Chi-Square Analysis of Contingency Tables | Chi-Square Analysis of Contingency Tables | McNemar’s Test | Not Available | Not Available |
All statistical procedure commands are found in the Statistics group on the Analysis tab.
3.3 Describing Your Data with Basic Statistics
You select the statistics that you would like to calculate in the Descriptive Statistics Options
dialog box.
To change descriptive statistics test options:
3.3.2 Setting Descriptive Statistics Options
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. To open the Options for Descriptive Statistics dialog box, on the Analysis tab, click
Descriptive Statistics from the drop-down list in the Statistics group.
3. Click Options.
4. Clear any of the selected statistics settings you do not want to include in the report. For
more information, see 3.3.4 Descriptive Statistics Results.
The specific summary statistics that are appropriate for a given data set depend on the
nature of the data. If the observations are normally distributed, then the mean and
standard deviation provide a good description of the data. If not, then the median and
percentiles often provide a better description of the data.
5. To change the confidence interval, enter any number from 1 to 99 (95 and 99 are the
most commonly used intervals) into the Confidence Interval Mean box.
6. To change the percentile or confidence intervals computed, edit the values in the
Percentile box.
7. To select all statistics options, click Select All. To clear all selections, click Clear.
8. Click Run Test to perform the test with the selected options settings.
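As a rough illustration of what these options report, the sketch below computes N, the mean,
the standard deviation, the standard error of the mean, the median, and two percentiles using
Python's standard `statistics` module. The function names are hypothetical. Percentile
definitions differ between packages; the linear interpolation rule used here is an assumption
and may not match the values SigmaPlot reports, especially for small samples.

```python
import math
import statistics
from typing import Dict, Sequence

# Hypothetical helpers (not SigmaPlot functions).


def percentile(sorted_data: Sequence[float], pct: float) -> float:
    """Percentile by linear interpolation between closest ranks (one of several conventions)."""
    pos = (len(sorted_data) - 1) * pct / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def describe(values: Sequence[float], lower_pct: float = 25, upper_pct: float = 75) -> Dict[str, float]:
    """Basic summary statistics for one column of data."""
    data = sorted(values)
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return {
        "N": n,
        "mean": statistics.mean(data),
        "std_dev": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(data),
        "lower_percentile": percentile(data, lower_pct),
        "upper_percentile": percentile(data, upper_pct),
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```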
Tip
To set the number of decimal places displayed, click the Sigma Button, and
then click Options. In the Options dialog box, click the Report tab, and select
Number of significant digits.
If you want to select your data before you run the procedure, drag the pointer over your data.
To describe your data:
1. On the Analysis tab, in the Statistics group, click the Tests drop-down list, and then
select Describe Data.
The Descriptive Statistics - Select Data dialog box appears prompting you to specify
a data format.
Tip
If you selected columns before you chose the test, the selected columns
automatically appear in the Selected Columns list.
2. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list. The number
or title of each selected column appears in its row. You can select up to 64 columns of
data for the Descriptive Statistics Test.
3.3.4 Descriptive Statistics Results
3. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
4. Click Finish to describe the data in the selected columns. After the computations are
completed, the report appears.
3.3.5.1 Creating a Descriptive Statistics Result Graph
• Box plot of the percentiles and median of column data. The Descriptive Statistics test
box plot graphs the percentiles and the median of column data. For more information,
see 11.1.5 Box Plot.
The Create Result Graph dialog box appears displaying the types of graphs available for
the Descriptive Statistics report.
3. Select the type of graph you want to create from the Graph Type list and click OK. The
specified graph appears in a graph window or in the report.
Tip
You can also double-click the desired graph in the list.
3.4.2 When to Compare Many Groups
Tip
You can tell SigmaPlot to analyze your data and test for normal distribution and equal
variance. If assumptions of normality and equal variance are violated, the alternative
parametric or nonparametric test is suggested. Activate and configure assumption tests
in the t-test and Mann-Whitney Rank Sum Test Options dialog boxes.
SigmaPlot tests for normality using either the Shapiro-Wilk or the Kolmogorov-Smirnov test,
and for equal variance using the Levene Median test.
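To make the equal variance check concrete, the sketch below computes a Levene-type statistic
on the absolute deviations of each value from its group median (often called the
Brown-Forsythe variant), which is what the name Levene Median test suggests; SigmaPlot's exact
implementation is assumed, not documented here. The statistic is a one way ANOVA F computed on
those deviations. Its P value would come from an F distribution with (df1, df2) degrees of
freedom, which this standard-library-only sketch does not evaluate. The function name is
hypothetical.

```python
import statistics
from typing import Sequence, Tuple

# Hypothetical helper (not a SigmaPlot function); assumes the median-centered Levene test.


def levene_median_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df1, df2) for a Levene test centered on group medians."""
    # Replace every value by its absolute deviation from its own group median.
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    group_means = [sum(g) / len(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    # One way ANOVA on the absolute deviations.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


print(levene_median_f([1, 2, 3], [2, 4, 6]))
```

A large F suggests the group variances differ; a small F is consistent with equal variances.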
Analysis of variance techniques (both parametric and nonparametric) test the hypothesis of
no differences between the groups, but do not indicate what the differences are. You can use
the multiple comparison procedures (post-hoc tests) provided by SigmaPlot to isolate these
differences.
To always test for differences among the groups, select Always Perform on the Post Hoc Tests
tab in the ANOVA options dialog boxes. For more information, see 5.5.4 Setting One Way
ANOVA Options. You can also specify that multiple comparisons test for a difference
only when the ANOVA P value is significant by selecting the Only When ANOVA P Value is
Significant option and then setting the desired P value.
The specific multiple comparison procedures to use for each ANOVA are selected in the
Multiple Comparison Options dialog box. To open:
3.5 Choosing the Repeated Measures Test to Use
• If your sample effects are not normally distributed, choose the Wilcoxon Signed Rank Test.
For more information, see 6.4 Wilcoxon Signed Rank Test. The Wilcoxon Signed Rank Test ranks
the differences by their absolute size, then computes its test statistic from the sums of
these signed ranks rather than directly from the data.
• If your samples are already ordered according to qualitative ranks, such as poor, fair, good,
and very good, use the Wilcoxon Signed Rank Test.
The advantage of the paired t-test is that, assuming normality and equal variance, it is slightly
more sensitive (that is, it has greater power) than the Wilcoxon Signed Rank Test. When
these assumptions are not met, the Wilcoxon Signed Rank Test is more reliable.
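A minimal sketch of the rank sums behind the Wilcoxon Signed Rank Test for before-and-after
data: zero differences are dropped, the remaining differences are ranked by absolute size
(tied values receive the average of their ranks), and the ranks are summed separately for
positive and negative differences. Taking W as the difference of the two rank sums is an
assumed convention, the helper names are hypothetical, and no P value is computed here.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers (not SigmaPlot functions).


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, W) for paired data, with W = W_plus - W_minus."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, w_plus - w_minus


print(signed_rank_sums([10, 12, 9, 15, 11], [12, 15, 9, 14, 16]))
```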
Tip
You can tell SigmaPlot to analyze your data and test for normality. If the assumption
of normality is violated, the alternative parametric or nonparametric test is suggested.
Assumption tests are activated and configured in the Paired t-Test and Wilcoxon
Options dialog boxes.
SigmaPlot tests for normality using either the Shapiro-Wilk or Kolmogorov-Smirnov test.
3.5.3 When to Use One and Two Way RM ANOVA
parametric or nonparametric test is suggested. These tests are specified in the repeated
measures one and two way and Friedman options dialog boxes.
SigmaPlot tests for normality using either the Shapiro-Wilk or Kolmogorov-Smirnov test, and
for equal variance using the Levene Median test.
Repeated measures analysis of variance techniques (both parametric and nonparametric) test
the hypothesis of no effect among treatments, but do not indicate which treatments have an
effect. You can use the multiple comparison procedures provided by SigmaPlot to isolate
the differences in effect.
To always test for differences among the groups, select Always Perform on the Post
Hoc Tests tab in the ANOVA options dialog boxes. You can also specify that multiple
comparisons test for a difference only when the ANOVA P value is significant by selecting
Only When ANOVA P Value is Significant and then setting the desired P value.
Select the specific multiple comparison procedures to use for each ANOVA under Multiple
Comparisons on the Post Hoc Tests tab of the ANOVA Options dialog box.
To open:
1. Select the appropriate test from the test drop-down list in the Statistics group on the
Analysis tab.
2. Click Options.
3.7 Choosing the Prediction or Correlation Method
• Use Best Subset Regression to evaluate all possible models of the regression equation, and
identify those with the best predictive ability (according to a specified criterion). For more
information, see 8.7 Best Subsets Regression.
Tip
You can use these procedures to find Multiple Linear Regression models. Choose
Polynomial or Nonlinear Regression for curved data sets.
3.9 Testing Normality
censored values with large survival times provide an example of this situation. For more
information, see 9.5 Gehan-Breslow Survival Analysis.
• Use Cox Regression to study the impact of potential risk factors on survival time. For a
single group, use the Proportional Hazards model. For multiple groups, use the Stratified
model. For more information, see 9.6 Cox Regression.
1. Enter, transform, or import the data to be tested for normality into data worksheet columns.
2. If desired, set the P value used to pass or fail the data on the Report tab of the Options
dialog box. For more information, see 3.9.3 Setting the P Value for the Normality Test.
3. Click the Analysis tab.
4. In the Statistics group, select Normality from the Tests drop-down list.
The Shapiro-Wilk and Kolmogorov-Smirnov tests use a P value to determine whether the data
passes or fails. Set this P value on the Report tab of the Options dialog box.
To set the P value for the Normality test:
3.9.5 Running a Normality Test
To run a Normality test, you need to select the data to test. Use the Pick Columns dialog box
to select the worksheet columns with the data you want to test.
To run a Normality test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab, and then in the Statistics group, select Normality from the Tests
drop-down list.
The Normality - Select Data dialog box appears. If you selected columns before you
chose the test, the selected columns automatically appear in the Selected Columns list.
3. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list. The number or
title of each selected column appears in its row. You can select up to 64 columns of data
for the Normality test.
4. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
5. Click Finish to test the data in the selected columns. After the computations are
completed, the report appears. For more information, see the SigmaPlot 12 User’s Guide.
3.9.6.2 P Values
The P value reported by the Shapiro-Wilk or Kolmogorov-Smirnov test is the probability of
observing a departure from normality at least as large as the one in your data if the data
were drawn from a normal population. If the P value computed by the test is greater than the
P value set on the Report tab of the Options dialog box, your data can be considered normal.
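For intuition about the Kolmogorov-Smirnov version, the sketch below computes its test
statistic: the largest vertical distance between the empirical distribution of the data and a
normal curve whose mean and standard deviation are estimated from the same data. Turning that
distance into a P value requires the Lilliefors (or Kolmogorov) distribution, which this
sketch does not compute, and SigmaPlot's own implementation is not shown here. The function
names are hypothetical.

```python
import math
import statistics
from typing import Sequence

# Hypothetical helpers (not SigmaPlot functions); computes the statistic only, not the P value.


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def ks_normal_statistic(values: Sequence[float]) -> float:
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = normal_cdf((x - mean) / sd)
        # Compare with the empirical CDF just after (i/n) and just before ((i-1)/n) x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(ks_normal_statistic([-1, 1]))
```

Larger distances correspond to smaller P values; the pass or fail decision then compares that P
value with the threshold set on the Report tab.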
1. Click the Report tab, and then in the Result Graphs group, click Create Result Graph.
The Create Graph dialog box appears displaying the types of graphs available for the
Normality report.
2. Select the type of graph you want to create from the Graph Type list, then click OK. For
more information, see 11.1 Generating Report Graphs. The specified graph appears in a
graph window or in the report. For more information, see Creating and Modifying Graphs.
3.10 Determining Experimental Power and Sample Size
3. Select the type of graph you want to create from the Graph Type list, then click OK. The
specified graph appears in a graph window or in the report. For more information, see
11.1 Generating Report Graphs.
1. Click the Analysis tab, and then in the Statistics group, select Power from the Tests
drop-down list.
2. When the Power dialog box appears, specify the remaining parameters of the data. For
more information, see 10 Computing Power and Sample Size.
1. Click the Analysis tab, and then in the Statistics group, select Sample Size from the
Tests drop-down list.
2. When the Sample Size dialog box appears, specify the power and the remaining
parameters of the data. For more information, see 10 Computing Power and Sample Size.
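As a back-of-the-envelope check on a sample size result, the common normal-approximation
formula for detecting a difference δ in a mean with a two-sided test is
n = ((z(1-α/2) + z(power)) · σ / δ)², rounded up. The sketch below evaluates it with
`statistics.NormalDist`. It assumes a one-sample or paired comparison with a known standard
deviation σ, and because it ignores the t-distribution it can slightly underestimate the n that
an exact t-based computation would return. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical helper (not a SigmaPlot function): normal-approximation sample size.


def sample_size_mean(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n needed to detect a mean difference delta with a two-sided test."""
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sigma / delta) ** 2
    return math.ceil(n)


print(sample_size_mean(1.0, 1.0))  # difference equal to one standard deviation
print(sample_size_mean(0.5, 1.0))  # halving the difference roughly quadruples n
```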
4 Single Group Analysis
Topics Covered in this Chapter
♦ One-Sample t-Test
♦ One-Sample Signed Rank Test
1. Enter or arrange your data appropriately in the worksheet. For more information, see
4.1.3 Arranging One-Sample t-Test Data.
2. If desired, set the t-test options. For more information, see 4.1.4 Setting One-Sample
t-Test Data Options.
3. Click the Analysis tab, and then in the Statistics group, from the Tests drop-down list,
select:
Single Group→One-Sample t-test
4. Run the test. For more information, see 4.1.5 Running a One-Sample t-Test.
5. Generate report graphs. For more information, see 4.1.7 One-Sample t-Test Report
Graphs.
Tip
If you are going to run the test after changing test options, and want to select
your data before you run the test, drag the pointer over the column title to select
the data column.
Options settings are saved between SigmaPlot sessions.
3. To continue the test, click Run Test. The Pick Columns dialog box appears.
4.1.4.1 Options for One-Sample t-Test: Criterion
Test Mean. Enter the test, or hypothesized, population mean. This is the value that will be
compared to the computed mean. The default setting is 0.
Summary Table. Displays the number of observations for a column or group, the number of
missing values for a column or group, the average value for the column or group, the standard
deviation of the column or group, and the standard error of the mean for the column or group.
Confidence Intervals. Displays the confidence interval for the difference of the means.
To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly
used intervals).
Residuals in Column. Displays residuals in the report and saves the residuals of the test to
the specified worksheet column. Edit the number or select a number from the drop-down list.
Figure 4.1 The Options for One-Sample t-Test Dialog Box Displaying the
Summary Table, Confidence Intervals, and Residuals Options
Power. The power or sensitivity of a test is the probability that the test will detect a difference
between the groups if there is really a difference.
Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there
is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance
of error is acceptable, or that you are willing to conclude there is a significant difference
when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists.
Larger values of α make it easier to conclude that there is a difference, but also increase the
risk of reporting a false positive.
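The trade-off between α and power can be sketched with a normal approximation to the power of
a two-sided one-sample test of a mean. This is an assumption for illustration: an exact
computation uses the noncentral t-distribution, and the function name below is hypothetical.
When the true difference is zero, the "power" equals α, and for a fixed sample size a smaller
α gives lower power.

```python
from statistics import NormalDist

# Hypothetical helper (not a SigmaPlot function): normal-approximation power.


def approx_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample test of a mean."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * n ** 0.5 / sigma  # true difference in standard error units
    # Probability of landing in either rejection region when the true difference is delta.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(approx_power(1.0, 1.0, 20, alpha=0.05))
print(approx_power(1.0, 1.0, 20, alpha=0.01))  # stricter alpha, lower power
```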
4.1.5 Running a One-Sample t-Test
Figure 4.2 The Options for One-Sample t-Test Dialog Box Displaying the Power
Option
If you want to select your data before you run the test, drag the pointer over your data.
1. Click the Analysis tab, and then in the Statistics group, from the Tests drop-down list,
select:
Single Group→One-Sample t-test
The Pick Columns for t-test dialog box appears prompting you to specify a data format.
Figure 4.3 The Pick Columns for One-Sample t-test Dialog Box Prompting You
to Specify a Data Format
2. Select the appropriate data format from the Data Format drop-down list. For more
information, see 4.1.3 Arranging One-Sample t-Test Data.
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns. For statistical summary data you are prompted to select
three columns.
Figure 4.4 The Pick Columns for One-Sample t-test Dialog Box Prompting You
to Select Data Columns
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the t-test on the selected columns. After the computations are
completed, the report appears. For more information, see the SigmaPlot 12 User’s Guide.
4.1.6.1 Result Explanations
The test statistic is

$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where

$\bar{x}$ = sample mean
$\mu$ = hypothesized population mean (the Test Mean)
$\sigma$ = sample standard deviation
$n$ = sample size

By random sampling of the population, assuming the null hypothesis is true, this quantity
defines a random variable T, whose distribution is Student’s central t-distribution with n - 1
degrees of freedom. The (two-sided) P-value for this test is computed as P(|T| > |t|), where P
denotes the probability distribution for T. This P-value is then compared with the significance
level α that is set by the user. If the P-value is less than α, there is a significant difference
between the mean of the sampled population and μ.
The (1 - α)100% confidence interval for the true population mean is:

$$\bar{x} \pm t_{\alpha,n-1}\,\frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$, $\sigma$, and $n$ are defined above, and $t_{\alpha,n-1}$ is the value that
satisfies $P(|T| > t_{\alpha,n-1}) = \alpha$.
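The two formulas can be checked by hand with the sketch below, which computes t with its
n - 1 degrees of freedom and the confidence interval for a supplied critical value. The Python
standard library has no t-distribution, so the critical value t(α, n-1) is passed in as an
argument (for example, from a t table) and the P-value is not computed. The function names are
hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical helpers (not SigmaPlot functions).


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> Tuple[float, int]:
    """Return t = (xbar - mu) / (sigma / sqrt(n)) and its n - 1 degrees of freedom."""
    n = len(values)
    xbar = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return (xbar - mu) / (sigma / math.sqrt(n)), n - 1


def mean_confidence_interval(values: Sequence[float], t_crit: float) -> Tuple[float, float]:
    """Return xbar -/+ t_crit * sigma / sqrt(n); t_crit must come from a t table or library."""
    n = len(values)
    xbar = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return xbar - half_width, xbar + half_width


print(one_sample_t([5, 7, 9], mu=5))
print(mean_confidence_interval([5, 7, 9], t_crit=4.303))  # t table value for 95%, 2 df
```

For the three values 5, 7, and 9 tested against a mean of 5, t equals √3 (about 1.73) with 2
degrees of freedom.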
In addition to the numerical results, expanded explanations of the results may also appear. You
can enable or disable this explanatory text in the Options dialog box. For more information,
see Setting Report Options.
Normality Test. Normality test results show whether the data passed or failed the test of the
assumption that the samples were drawn from a normal population and the P value calculated
by the test. All parametric tests require normally distributed source populations.
Summary Table. SigmaPlot can generate a summary table listing the sample size N, the
number of missing values, the mean, the standard deviation, and the standard error of the mean
(SEM). This result is displayed unless you disable Summary Table in the Options for t-test
dialog box.
• N (Size). The number of non-missing observations for that column or group.
• Missing. The number of missing values for that column or group.
• Mean. The average value for the column. If the observations are normally distributed
the mean is the center of the distribution.
• Standard Deviation. A measure of variability. If the observations are normally distributed,
about two-thirds will fall within one standard deviation above or below the mean, and about
95% of the observations will fall within two standard deviations above or below the mean.
• Standard Error of the Mean. A measure of how closely the mean computed from the sample
approximates the true population mean.
The Create Result Graph dialog box appears displaying the types of graphs available
for the One-Sample t-Test results.
Figure 4.5 The Create Graph Dialog Box for the One-Sample t-test Report
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see 11.1 Generating
Report Graphs.
4.2 One-Sample Signed Rank Test
1. Enter or arrange your data appropriately in the worksheet. For more information, see 4.2.3
Arranging One-Sample Signed Rank Test Data.
2. If desired, set the test options. For more information, see 4.2.4 Setting One-Sample
Signed Rank Test Options.
3. Click the Analysis tab, and then in the Statistics group, from the Tests drop-down list,
select:
Single Group→One-Sample Signed Rank
4. Run the test. For more information, see 4.2.5 Running a One-Sample Signed Rank Test.
Tip
If you are going to run the test after changing test options, and want to select
your data before you run the test, drag the pointer over the column title to select
the data column.
Options settings are saved between SigmaPlot sessions.
3. To continue the test, click Run Test. The Pick Columns dialog box appears.
4.2.4.3 Options for One-Sample Signed Rank: Results
Summary Table. Select to place a summary table in the report. This table displays the number
of observations for the group, the number of missing values, the computed median value, and
percentiles. Text boxes are available to enter two percentile values for the data. By default, the
summary table check box is selected and the percentile values are given as 25% and 75%.
Confidence Intervals. Select to display the confidence interval for the population median.
The confidence level can be any number from 1 to 99 (95 and 99 are the most commonly
used). By default, the check box is selected and the confidence level is set to 95%.
Yates Correction Factor. When the sample size exceeds 20, the normal distribution is used to
approximate the P-value for the test. Without a correction, this approximate P-value is smaller
than it should be, because the actual distribution of the test statistic is discrete whereas
the normal distribution is continuous. The Yates continuity correction adjusts the statistic to
compensate for this discrepancy.
Select or clear the check box to turn the Yates Correction Factor on or off.
For the derivation of the Yates correction, consult any standard statistics reference.
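As a sketch of how a continuity correction enters the normal approximation, the code below
uses the textbook form for the signed rank statistic, taking W as the difference between the
positive and negative rank sums. Under the null hypothesis W has mean 0 and variance
n(n+1)(2n+1)/6, and W changes in steps of 2, so the correction subtracts 1 (half a step) from
|W|. This assumes no tied absolute differences; SigmaPlot's exact formula, including any tie
adjustment, is not shown in this guide, and the function name is hypothetical.

```python
import math
from typing import Tuple

# Hypothetical helper (not a SigmaPlot function); textbook normal approximation, no tie correction.


def signed_rank_normal_p(w: float, n: int, yates: bool = True) -> Tuple[float, float]:
    """Return (z, two-sided P) for W = W+ - W- with n nonzero differences."""
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)  # standard deviation of W under H0
    correction = 1.0 if yates else 0.0  # half of W's step size of 2
    z = max(abs(w) - correction, 0.0) / sd
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return z, p


print(signed_rank_normal_p(8, 4))
print(signed_rank_normal_p(8, 4, yates=False))  # uncorrected P is smaller
```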
If you want to select your data before you run the test, drag the pointer over your data.
1. Click the Analysis tab, and then in the Statistics group, from the Tests drop-down list,
select:
Single Group→One-Sample Signed Rank
The One-Sample Signed Rank Test - Select Data dialog box appears prompting you to
select one column of data to test.
2. To assign the desired worksheet column to the Selected Columns list, select the
column in the worksheet, or select the column from the Data for Data drop-down list.
3. To change your selection, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
4. Click Finish to run the test. After the computations are completed, the report appears.
For more information, see the SigmaPlot 12 User’s Guide.
After clicking Finish in the Test Wizard, the normality test specified in Test Options tests
the normality of the data. If the normality test passes (the data is consistent with a normal
distribution), a message box appears with the option to select the one-sample t-test as an
alternate test. If you select this option, a report for the one-sample t-test appears instead of
the report for the one-sample signed rank test. Likewise, if the normality test fails when
running the one-sample t-test, a message box appears giving you the option to switch to
the one-sample signed rank test.
In addition to the numerical results, expanded explanations of the results may also appear. You
can enable or disable this explanatory text in the Options dialog box. For more information,
see the SigmaPlot 12 User’s Guide.
Normality Test. Normality test results show whether the data passed or failed the test of the
assumption that the samples were drawn from a normal population, along with the P value
calculated by the test.
Summary Table. A single-line summary table of the basic statistics for the input data: the
name of the group column, the number of cases and missing values, the sample median, and
the lower and upper percentiles.
Hypothesized Population Median. The value of the Test Median that was entered on the
Criterion tab of the Options for One Sample Signed Rank Test dialog box.
Test Statistics and P-Values. Values of the rank sums, their difference W, the Z-statistic
(normal approximation), and the estimated and exact significance probabilities (the P-values).
The exact P-value is based on the Wilcoxon distribution. The estimated P-value is based on
the normal approximation to the Wilcoxon distribution; when enabled, the Yates correction is
used to adjust the estimated P-value. The sample size necessary for the normal approximation
to hold varies among sources, with most agreeing that the sample size should be at least 20.
Both P-values are reported if the number of values that differ from the hypothesized median
is 20 or less. Otherwise, only the estimated P-value is reported.
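The quantities in this part of the report can be illustrated with a short Python sketch. It is not SigmaPlot's implementation. It assumes the following: zero differences are dropped, tied absolute differences receive average ranks, the variance is not adjusted for ties, the exact P-value is computed only when 20 or fewer nonzero differences remain, and the Yates correction is applied as a 0.5 shift of the positive rank sum toward its null mean. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def exact_two_sided_p(t_plus, n):
    """Exact two-sided P from the Wilcoxon distribution (assumes no ties)."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)  # counts[s] = number of sign patterns with T+ == s
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    t = min(t_plus, max_sum - t_plus)  # distance into the nearer tail
    tail = sum(counts[: int(t) + 1]) / 2 ** n
    return min(1.0, 2 * tail)


def signed_rank_test(data, median0, yates=True):
    """One-sample signed rank test against a hypothesized median (illustrative)."""
    diffs = [x - median0 for x in data if x != median0]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("every value equals the hypothesized median")
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie adjustment
    dev = abs(t_plus - mean)
    if yates:
        dev = max(dev - 0.5, 0.0)  # continuity correction toward the null mean
    z = dev / sd
    p_estimated = 2 * (1 - NormalDist().cdf(z))
    p_exact = exact_two_sided_p(t_plus, n) if n <= 20 else None
    return {"W": t_plus - t_minus, "z": z,
            "p_estimated": p_estimated, "p_exact": p_exact}


result = signed_rank_test([1.0, 2.0, 3.0, 4.0, 5.0], median0=0.0)
print(result["W"], result["p_exact"])  # 15.0 0.0625
```

With five values all above the hypothesized median, the exact two-sided P-value is 2/32. The Yates-corrected normal approximation gives a somewhat smaller value, which shows why small samples use the exact distribution.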
Yates Correction. A statement indicating whether the Yates correction was used.
Confidence Interval. The lower and upper limits of the confidence interval of the population
median.
Interpretation of P-Value. An interpretation of the significance probability that differs
depending on whether the result is positive (significant) or not. The significance level is set on
the Assumption Checking tab of the Options for One Sample Signed Rank Test dialog
box. The default significance level is .05.
5 Comparing Two or More Groups
Topics Covered in this Chapter
♦ About Group Comparison Tests
♦ Data Format for Group Comparison Tests
♦ Unpaired t-Test
♦ Mann-Whitney Rank Sum Test
♦ One Way Analysis of Variance (ANOVA)
♦ Two Way Analysis of Variance (ANOVA)
♦ Three Way Analysis of Variance (ANOVA)
♦ Kruskal-Wallis Analysis of Variance on Ranks
♦ Performing a Multiple Comparison
Use group comparison tests to compare random samples from two or more different groups
for differences in the mean or median values that cannot be attributed to random sampling
variation.
If you are comparing the effects of different treatments on the same individuals, use repeated
measures procedures. For more information, see 3.2 Choosing the Procedure to Use.
If you are comparing two groups, you can use either of the following tests:
1. An Unpaired t-test (a parametric test). For more information, see 5.3 Unpaired t-Test.
2. A Mann-Whitney Rank Sum Test (a nonparametric test). For more information, see 5.4
Mann-Whitney Rank Sum Test.
Columns 1 and 2 are arranged as raw data. Columns 3, 4, and 5 are arranged as descriptive
statistics using the sample size, mean, and standard deviation. Columns 6 and 7 are arranged
as group indexed data, with column 6 as the factor column and column 7 as the data column.
5.2.1 Descriptive Statistics
Note
SigmaPlot tests accept messy and unbalanced data and do not require equal sample
sizes in the groups being compared. There are no problems associated with missing
data or uneven columns. Missing values are represented by empty cells or double
dashes (“--”). Text items may also be treated as missing if the test expects only
numeric values.
t-Tests and Rank Tests. The groups to be compared are always placed in two columns.
Paired t-tests and signed rank tests (both repeated measures tests) assume that the data for each
subject is in the same row.
One Way ANOVA and One Way ANOVA on Ranks. Data for each group is placed in
separate columns, with as many columns as there are groups. One way repeated measures
ANOVA and one way repeated measures ANOVA on ranks assume that the data for each
subject is in the same row. For more information, see 5.6.3 Arranging Two Way ANOVA Data.
Figure 5.2 Data Format for a Two Way ANOVA with Two Factor Indexed Data
Column 1 is the first factor column, column 2 is the second factor column, and column
3 contains the data.
Two Way ANOVAs require two factor columns and one data column. Three Way ANOVAs
require three factor columns and one data column, and repeated measures ANOVAs require
an additional subject column to identify the subject of each measurement.
The order of the rows containing the index and data does not matter; for example, the rows do
not have to be grouped or sorted by factor level or subject.
Note
If you are analyzing entire columns of data, the location in the worksheet of the
factor, subject, and data columns does not matter. For more information, see 5.2.2.2
Indexed Data.
Independent t-test and Mann-Whitney rank sum test. The group index is in a factor
column, and the corresponding data points to be compared are in a second column.
Paired t-test and Wilcoxon signed rank test. Repeated measures comparisons require an
additional subject index column, which indicates the subject for each level and data point.
One way ANOVA and Kruskal-Wallis ANOVA on ranks. The factor column contains the
group index, and the data column contains the corresponding data points. Indexed data for one
way ANOVA contains only two columns.
Two way ANOVA. Two Way ANOVAs require two factor columns, one for each factor. Each data
point is identified by its combination of levels of the two factors. For example, the factors in a
drug treatment test are Gender and Drug, and the levels are Male/Female and Drug A/Drug B.
Three way ANOVA. Three Way ANOVAs require three factor columns, one for each factor. Each
data point is identified by its combination of levels of the three factors.
For more information, see 5.7.3 Arranging Three Way ANOVA Data.
Repeated measures ANOVA. These tests require an additional subject column, which
identifies the data points for each subject.
A Two Way Repeated Measures ANOVA requires both a subject column and two factor
columns, as well as a data column.
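The indexed layouts described above can be illustrated with a short Python sketch. The function names raw_to_indexed and indexed_to_groups are hypothetical and not part of SigmaPlot. The sketch uses None as a stand-in for an empty (missing) worksheet cell. The second function shows that rows are regrouped by their combination of factor levels, so the row order does not affect which data points belong together.

```python
from collections import defaultdict


def raw_to_indexed(groups):
    """Turn {group_name: [values]} (one column per group) into (factor, value) rows."""
    rows = []
    for name, values in groups.items():
        for v in values:
            if v is None:  # an empty cell is treated as missing and skipped
                continue
            rows.append((name, v))
    return rows


def indexed_to_groups(rows):
    """Regroup (level, ..., value) rows by their combination of factor levels.

    Works for one factor column (one way) or several (two or three way).
    """
    cells = defaultdict(list)
    for *levels, value in rows:
        cells[tuple(levels)].append(value)
    return dict(cells)
```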
Tip
Depending on your t-test options settings, if you attempt to perform a t-test on
non-normal populations or populations with unequal variances, SigmaPlot will inform
you that the data is unsuitable for a t-test, and suggest the Mann-Whitney Rank Sum
Test instead. For more information, see 5.3.4 Setting t-Test Options.
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.3.3 Arranging t-Test Data.
2. If desired, set the t-test options. For more information, see 5.3.4 Setting t-Test Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Two Groups→t-test
5. Run the test. For more information, see 5.3.5 Running a t-Test.
6. Generate report graphs. For more information, see 5.3.7 t-Test Report Graphs.
Columns 1 and 2 are arranged as raw data. Columns 3, 4, and 5 are arranged as descriptive
statistics using the sample size, mean, and standard deviation. Columns 6 and 7 are arranged
as group indexed data, with column 6 as the factor column and column 7 as the data column.
5.3.4 Setting t-Test Options
1. Select t-test from the Select Test drop-down list in the Statistics group on the Analysis
tab.
2. Click Current Test Options.
The Options for t-test dialog box appears with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 5.3.4.1 Options
for t-Test: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 5.3.4.2
Options for t-Test: Results.
• Post Hoc Tests. Compute the power or sensitivity of the test. For more information, see
5.3.4.3 Options for t-Test: Post Hoc Tests.
Tip
If you are going to run the test after changing test options, and want to select your
data before you run the test, drag the pointer over your data.
The normality assumption test checks for a normally distributed population. The equal
variance assumption test checks the variability about the group means.
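The equal variance check is reported as the Levene Median test. A common form of this test, also known as the Brown-Forsythe test, runs a one way ANOVA on each observation's absolute deviation from its own group median. The minimal sketch below assumes that form. SigmaPlot's implementation may differ in detail, and the function name is hypothetical. It returns only the F statistic, because the Python standard library has no F distribution for computing the P value.

```python
from statistics import mean, median


def levene_median(groups):
    """Brown-Forsythe (median-centered Levene) F statistic for a list of groups."""
    # absolute deviation of each observation from its own group median
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = mean([x for d in devs for x in d])
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum((x - mean(d)) ** 2 for d in devs for x in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

If SciPy is available, `scipy.stats.levene(*groups, center="median")` computes this statistic together with its P value.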
Figure 5.4 The Options for t-test Dialog Box Displaying the Assumption
Checking Options
5.3.4.2 Options for t-Test: Results
Restriction
There are extreme conditions of data distribution that these tests cannot take into
account. For example, the Levene Median test fails to detect differences in variance of
several orders of magnitude; however, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption tests.
Figure 5.5 The Options for t-test Dialog Box Displaying the Summary Table,
Confidence Intervals, and Residuals Options
Figure 5.6 The Options for t-test Dialog Box Displaying the Power Option
If you want to select your data before you run the test, drag the pointer over your data.
Figure 5.7 The Pick Columns for t-test Dialog Box Prompting You to Specify a
Data Format
3. Select the appropriate data format (for example, Raw or Indexed) from the Data Format drop-down
list. For more information, see 5.2 Data Format for Group Comparison Tests.
4. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns. For statistical summary data you are prompted to select
three columns.
Figure 5.8 The Pick Columns for t-test Dialog Box Prompting You to Select
Data Columns
6. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Finish to run the t-test on the selected columns. After the computations are
completed, the report appears.
5.3.6.1 Result Explanations
The standard error of the difference is a measure of the precision with which this difference
can be estimated.
You can conclude from "large" absolute values of t that the samples were drawn from different
populations. A large t indicates that the difference between the treatment group means is
larger than what would be expected from sampling variability alone (that is, the
differences between the two groups are statistically significant). A small t (near 0) indicates
that there is no significant difference between the samples.
• Degrees of Freedom. Degrees of freedom reflect the sample sizes, which affect the
ability of the t-test to detect differences in the means. As degrees of freedom (sample sizes)
increase, the ability to detect a difference with a smaller t increases.
• P Value. The P value is the probability of being wrong in concluding that there is a true
difference in the two groups (that is, the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on t). The smaller the P value, the greater
the probability that the samples are drawn from different populations. Traditionally, you can
conclude there is a significant difference when P < 0.05.
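The t statistic, its degrees of freedom, and the standard error of the difference can be sketched in Python. The sketch assumes the pooled-variance (Student) form of the unpaired t-test, which is the form that matches the equal variance assumption checked above. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2  # degrees of freedom grow with both sample sizes
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (mean(a) - mean(b)) / se_diff
    return t, df, se_diff
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` computes the same pooled-variance statistic and also returns its P value.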
Confidence Interval for the Difference of the Means. If the confidence interval does not
include zero, you can conclude that there is a significant difference between the means
with the level of confidence specified. This can also be described as P < α (alpha), where α is
the acceptable probability of incorrectly concluding that there is a difference.
The level of confidence is set in the Options for t-test dialog box; it is typically
100(1-α), or 95%. Larger values of confidence result in wider intervals and smaller values in
narrower intervals. For a further explanation of α, see Alpha below.
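Assuming the pooled-variance form of the t-test, the textbook form of this interval is shown below. In it, $s_{\bar{x}_1-\bar{x}_2}$ is the standard error of the difference, $\nu = n_1 + n_2 - 2$ is the degrees of freedom, and $t_{1-\alpha/2,\,\nu}$ is the corresponding critical value of the t distribution. SigmaPlot's internal computation is not documented here, so treat this as the standard form rather than the program's exact method.

```latex
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{1-\alpha/2,\,\nu}\; s_{\bar{x}_1-\bar{x}_2},
\qquad \nu = n_1 + n_2 - 2
```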
Power. The power, or sensitivity, of a t-test is the probability that the test will detect a
difference between the groups if there really is a difference. The closer the power is to 1,
the more sensitive the test.
t-test power is affected by the sample size of both groups, the chance of erroneously reporting
a difference, α (alpha), the difference of the means, and the standard deviation.
This result is set in the Options for t-test dialog box.
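The factors that drive power can be illustrated with a rough Python sketch. Power for a t-test is usually computed from the noncentral t distribution. This sketch swaps in the simpler normal approximation, which slightly overstates power for small samples. It also assumes a common standard deviation for both groups. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_t_test_power(diff, sd, n1, n2, alpha=0.05):
    """Approximate two-sided power of an unpaired t-test (normal approximation)."""
    z = NormalDist()
    se = sd * (1 / n1 + 1 / n2) ** 0.5  # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = abs(diff) / se              # expected size of the test statistic
    # probability that the statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
```

With no true difference, the sketch returns α. Power then rises as the difference of the means or the sample sizes grow, and falls as the standard deviation or α shrinks.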
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error (a Type I error is when you reject the
hypothesis of no effect when this hypothesis is true).
The α value is set in the Options for t-test dialog box; a value of α = 0.05 indicates that a
one in twenty chance of error is acceptable, or that you are willing to conclude there is a
significant difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there is a difference but also
increase the risk of reporting a false positive (a Type I error).
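The meaning of α as a long-run false positive rate can be checked with a small simulation. To keep the sketch short, it uses a two-sample z-test with a known standard deviation of 1 rather than a t-test, and it draws both samples from the same population, so every rejection is a Type I error. The function name and parameters are illustrative assumptions.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=10, trials=20000, seed=1):
    """Fraction of tests that reject a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n) ** 0.5  # known sigma = 1 in both populations
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same population as a
        if abs(sum(a) / n - sum(b) / n) / se > z_crit:
            rejections += 1
    return rejections / trials
```

The returned rate stays close to the chosen α. Lowering α to 0.01 lowers the false positive rate accordingly.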
The Create Graph dialog box appears displaying the types of graphs available for the
t-test results.
5.3.7.1 How to Create a Graph of the t-test Data
Figure 5.10 The Create Graph Dialog Box for the t-test Report
4. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see the SigmaPlot
12 User’s Guide.
5.4.1 About the Mann-Whitney Rank Sum Test
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.4.3 Arranging Rank Sum Data.
2. If desired, set the Rank Sum options. For more information, see 5.4.4 Setting
Mann-Whitney Rank Sum Test Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Two Groups→Rank Sum Test
5. Run the test. For more information, see 5.4.5 Running a Rank Sum Test.
6. Generate report graphs. For more information, see 5.4.7 Rank Sum Test Report Graphs.
Figure 5.12 Valid Data Formats for a Mann-Whitney Rank Sum Test
Columns 1 and 2 are arranged as raw data. Columns 3 and 4 are arranged as group indexed
data, with column 3 as the factor column.
Tip
If you are going to run the test after changing test options, and want to select your
data before you run the test, drag the pointer over your data.
The normality assumption test checks for a normally distributed population. The equal
variance assumption test checks the variability about the group means.
5.4.4.2 Options for Rank Sum Test: Results
Confidence Intervals. Displays the confidence interval for the difference of the means.
To change the confidence level, enter any number from 1 to 99 (95 and 99 are the most commonly
used levels).
Figure 5.13 The Options for Rank Sum Test Dialog Box Displaying the Summary
Table Options
Yates Correction Factor. When a statistical test uses a χ² distribution with one degree of
freedom, such as analysis of a 2 x 2 contingency table or McNemar’s test, the calculated χ²
tends to produce P values that are too small when compared with the actual distribution
of the χ² test statistic. The theoretical χ² distribution is continuous, whereas the distribution
of the χ² test statistic is discrete. Use the Yates Correction Factor to adjust the computed χ²
value down to compensate for this discrepancy. Using the Yates correction makes a test more
conservative; that is, it increases the P value and reduces the chance of a false positive
conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P
value is computed from a χ² distribution with one degree of freedom. For descriptions of the
derivation of the Yates correction, consult any standard statistics reference.
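For a 2 x 2 contingency table, the standard Yates-corrected statistic reduces each absolute difference between an observed count $O_i$ and its expected count $E_i$ by 0.5 before squaring. Because each term shrinks, the corrected χ² is smaller and its P value is larger:

```latex
\chi^2_{\text{Yates}} = \sum_{i} \frac{\left( \lvert O_i - E_i \rvert - 0.5 \right)^2}{E_i}
```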
If you want to select your data before you run the test, drag the pointer over your data.
5.4.5 Running a Rank Sum Test
Figure 5.14 The Pick Columns for Rank Sum Test Dialog Box Prompting You to
Specify a Data Format
3. Select the appropriate data format from the Data Format drop-down list. For more
information, see 5.2 Data Format for Group Comparison Tests.
4. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
Figure 5.15 The Pick Columns for Rank Sum Test Dialog Box Prompting You to
Select Data Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns.
6. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Finish to run the Rank Sum Test on the selected columns.
If you elected to test for normality and equal variance, SigmaPlot performs the test for
normality (Shapiro-Wilk or Kolmogorov-Smirnov) and the test for equal variance (Levene
Median). If your data pass both tests, SigmaPlot informs you and suggests continuing
your analysis using an unpaired t-test. For more information, see 5.3 Unpaired t-Test.
After the computations are completed, the report appears.
by the test. For nonparametric procedures, this test may fail, because nonparametric tests
do not assume normally distributed source populations. This result is set in the Options for
Rank Sum Test dialog box.
Equal Variance Test. Equal Variance test results show whether the data passed or
failed the test of the assumption that the samples were drawn from populations with the
same variance, along with the P value calculated by the test. Nonparametric tests do not assume
equal variance of the source populations. This result is set in the Options for Rank Sum
Test dialog box.
Summary Table. SigmaPlot generates a summary table listing the sample sizes N, number
of missing values, medians, and percentiles unless you disable the Display Summary Table
option in the Options for Rank Sum Test dialog box.
• N (Size). The number of non-missing observations for that column or group.
• Missing. The number of missing values for that column or group.
• Medians. The "middle" observation as computed by listing all the observations from
smallest to largest and selecting the largest value of the smallest half of the observations.
The median observation has an equal number of observations greater than and less than
that observation.
• Percentiles. The two percentile points that define the upper and lower tails of the observed
values.
• T Statistic. The T statistic is the sum of the ranks in the smaller sample group, or in
the first selected group if both groups are the same size. This value is compared to the
population of all possible rankings to determine the probability of this T occurring.
• P Value. The P value is the probability of being wrong in concluding that there is a true
difference in the two groups (that is, the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on T). The smaller the P value, the greater
the probability that the samples are drawn from different populations.
Traditionally, you can conclude there is a significant difference when P < 0.05.
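The T statistic described above can be sketched in Python. The groups are ranked together, and T sums the ranks of the smaller group, or of the first group when both are the same size. Giving tied values the average of their ranks is an assumption of this sketch, as is the hypothetical function name. The step that compares T with all possible rankings to obtain the P value is not shown.

```python
def rank_sum_t(a, b):
    """Rank sum T: sum of pooled ranks in the smaller group (first group on a tie)."""
    pooled = sorted(a + b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        # find the run of values tied with pooled[i]
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    group = a if len(a) <= len(b) else b
    return sum(rank_of[x] for x in group)
```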
The Create Graph dialog box appears displaying the types of graphs available for the
Rank Sum Test results.
Figure 5.16 The Create Graph Dialog Box for the Rank Sum Test Report
5. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see 11.1 Generating
Report Graphs.
5.5 One Way Analysis of Variance (ANOVA)
Figure 5.17 A Box Plot of the Result Data for a Rank Sum Test
Note
Depending on your ANOVA options settings, if you attempt to perform an ANOVA
on non-normal populations or populations with unequal variances, SigmaPlot informs
you that the data is unsuitable for a parametric test, and suggests the Kruskal-Wallis
ANOVA on Ranks. For more information, see 5.5.4 Setting One Way ANOVA
Options.
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.5.3 Arranging One Way ANOVA Data.
2. If desired, set One Way ANOVA options. For more information, see 5.5.4 Setting One
Way ANOVA Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Many Groups→One Way ANOVA
5. Run the test. For more information, see 5.5.5 Running a One Way ANOVA.
6. Generate report graphs. For more information, see 5.5.8 One Way ANOVA Report
Graphs.
Columns 1 through 3 are arranged as groups in columns. Columns 4, 5, and 6 are arranged
as descriptive statistics using the mean, standard deviation, and size. Columns 7 and 8 are
arranged as group indexed data, with column 7 as the factor column.
5.5.4 Setting One Way ANOVA Options
1. Select One Way ANOVA from the Select Test drop-down list in the Statistics group
on the Analysis tab.
2. Click Current Test Options. The Options for One Way ANOVA dialog box appears
with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 5.5.4.1 Options
for One Way ANOVA: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 5.5.4.2
Options for One Way ANOVA: Results.
• Post Hoc Test. Compute the power or sensitivity of the test and enable multiple
comparisons. For more information, see 5.5.4.3 Options for One Way ANOVA: Post
Hoc Tests.
Tip
If you are going to run the test after changing test options, and want to select your
data before you run the test, drag the pointer over your data.
Figure 5.19 The Options for One Way ANOVA Dialog Box Displaying the
Assumption Checking Options
5.5.4.2 Options for One Way ANOVA: Results
Figure 5.20 The Options for One Way ANOVA Dialog Box Displaying the
Summary Table, Confidence Intervals, and Residuals Options
Figure 5.21 The Options for One Way ANOVA Dialog Box Displaying the Power
and Multiple Comparison Options
Multiple Comparisons
One Way ANOVAs test the hypothesis of no differences between the several treatment groups,
but do not determine which groups are different, or the sizes of these differences. Multiple
comparison procedures isolate these differences. You can choose to always perform multiple
comparisons or to only perform multiple comparisons if a One Way ANOVA detects a
difference.
The P value used to determine if the ANOVA detects a difference is set on the Report tab of
the Options dialog box. If the P value produced by the One Way ANOVA is less than the P
value specified in the box, a difference in the groups is detected and the multiple comparisons
are performed. For more information, see the SigmaPlot 12 User’s Guide.
• Always Perform. Select to perform multiple comparisons whether or not the ANOVA
detects a difference.
• Only When ANOVA P Value is Significant. Select to perform multiple comparisons
only if the ANOVA detects a difference.
• Significance Value for Multiple Comparisons. Select either .05 or .01 from the
Significance Value for Multiple Comparisons drop-down list. This value determines the
likelihood that the multiple comparison incorrectly concludes that there is a significant
difference in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference if there is less
than a 5% chance that the multiple comparison is incorrect in detecting a difference.
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method.
5.5.5 Running a One Way ANOVA
If you want to select your data before you run the test, drag the pointer over your data.
Figure 5.22 The Pick Columns for One Way ANOVA Dialog Box Prompting You
to Specify a Data Format
3. Select the appropriate data format from the Data Format drop-down list. For more
information, see 5.2 Data Format for Group Comparison Tests.
4. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
Figure 5.23 The Pick Columns for One Way ANOVA Dialog Box Prompting You
to Select Data Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw data, select one worksheet column for each
group; for indexed data, you are prompted to select two worksheet columns.
6. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Finish to run the One Way ANOVA on the selected columns.
If you elected to test for normality and equal variance, SigmaPlot performs the test
for normality (Shapiro-Wilk or Kolmogorov-Smirnov) and the test for equal variance
(Levene Median).
If your data pass both tests, SigmaPlot continues the analysis with the One Way
ANOVA.
After the computations are completed, the report appears. For more information, see
the SigmaPlot 12 User’s Guide.
8. Click Finish to perform the One Way ANOVA.
If you elected to test for normality and equal variance, and your data fails either test,
SigmaPlot warns you and suggests continuing your analysis using the nonparametric
Kruskal-Wallis ANOVA on Ranks. For more information, see 5.8 Kruskal-Wallis
Analysis of Variance on Ranks.
If you selected to run multiple comparisons only when the P value is significant,
and the P value is not significant, the One Way ANOVA report appears after the test is
complete.
5.5.6 Multiple Comparison Options for a One Way ANOVA
If the ANOVA P value is significant, or you selected to always perform
multiple comparisons, the Multiple Comparison Options dialog box appears prompting
you to select a multiple comparison method.
5.5.7.1 Result Explanations
Summary Table. If you enabled this option in the Options for One Way ANOVA dialog box,
SigmaPlot generates a summary table listing the sample sizes N, number of missing values,
mean, standard deviation, differences of the means and standard deviations, and standard
error of the means.
• N (Size). The number of non-missing observations for that column or group.
• Missing. The number of missing values for that column or group.
• Mean. The average value for the column. If the observations are normally distributed
the mean is the center of the distribution.
• Standard Deviation. A measure of variability. If the observations are normally distributed,
about two-thirds will fall within one standard deviation above or below the mean, and about
95% of the observations will fall within two standard deviations above or below the mean.
• Standard Error of the Mean. A measure of how closely the mean computed from the
sample approximates the true population mean.
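The summary table entries can be sketched for a single column in Python. The sketch assumes that the standard deviation is the sample standard deviation (divisor n - 1) and that the standard error of the mean is that standard deviation divided by the square root of N. It uses None as a stand-in for an empty cell. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def summary_row(column):
    """N, missing count, mean, SD and SEM for one worksheet column (None = missing)."""
    values = [x for x in column if x is not None]
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 divisor); needs n >= 2
    return {"N": n, "Missing": len(column) - n, "Mean": mean(values),
            "SD": sd, "SEM": sd / math.sqrt(n)}
```

Because the standard error divides by the square root of N, quadrupling the sample size roughly halves it. Larger samples therefore estimate the population mean more precisely.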
Confidence Interval for the Difference of the Means. If the confidence interval does not
include zero, you can conclude that there is a significant difference between the means
with the level of confidence specified. This can also be described as P < α (alpha), where α is
the acceptable probability of incorrectly concluding that there is a difference.
The level of confidence is adjusted in the options dialog box; this is typically 100(1-α), or 95%.
Larger values of confidence result in wider intervals and smaller values in narrower intervals.
Power. The power of the performed test is displayed unless you disable this option in the
Options for One Way ANOVA dialog box.
The power, or sensitivity, of a One Way ANOVA is the probability that the test will detect a
difference among the groups if there really is a difference. The closer the power is to 1,
the more sensitive the test.
ANOVA power is affected by the sample sizes, the number of groups being compared, the
chance of erroneously reporting a difference α (alpha), the observed differences of the group
means, and the observed standard deviations of the samples.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error. A Type I error is when you reject the
hypothesis of no effect when this hypothesis is true.
Set this value in the Options for One Way ANOVA dialog box; the suggested value is α =
0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α
result in stricter requirements before concluding there is a significant difference, but a greater
possibility of concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference but also increase the risk
of seeing a false difference (a Type I error).
ANOVA Table. The ANOVA table lists the results of the one way ANOVA.
DF (Degrees of Freedom). Degrees of freedom represent the number of groups and the sample
size, which affect the sensitivity of the ANOVA.
• The degrees of freedom between groups is a measure of the number of groups (k - 1 for
k groups)
• The degrees of freedom within groups (sometimes called the error or residual degrees of
freedom) is a measure of the total sample size, adjusted for the number of groups (N - k
for N observations in k groups)
• The total degrees of freedom is a measure of the total sample size (N - 1)
SS (Sum of Squares). The sum of squares is a measure of variability associated with each
element in the ANOVA data table.
• The sum of squares between the groups measures the variability of the average differences
of the sample groups
• The sum of squares within the groups (also called error or residual sum of squares) measures
the underlying variability of all individual samples
• The total sum of squares measures the total variability of the observations about the grand
mean (mean of all observations)
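MS (Mean Squares). The mean squares provide two estimates of the population variance.
Comparing these variance estimates is the basis of analysis of variance.
The mean square between groups is:
$$MS_{\text{between}} = \frac{SS_{\text{between}}}{DF_{\text{between}}}$$
The mean square within groups (also called the residual or error mean square) is:
$$MS_{\text{within}} = \frac{SS_{\text{within}}}{DF_{\text{within}}}$$
F Statistic. The F test statistic is the ratio of these two variance estimates:
$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$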
If the F ratio is around 1, you can conclude that there are no significant differences between
groups (that is, the data groups are consistent with the null hypothesis that all the
samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples was drawn from a
different population (that is, the variability is larger than what is expected from random
variability in the population). To determine exactly which groups are different, examine
the multiple comparison results.
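The quantities in the ANOVA table can be reproduced by hand. The sketch below is plain Python with a hypothetical function name, not SigmaPlot code; it computes the degrees of freedom, sums of squares, mean squares, and F ratio for a one way ANOVA, showing that for k groups and N observations DF between = k − 1, DF within = N − k, and DF total = N − 1. It stops short of the P value, which needs the F distribution (not available in the Python standard library).

```python
from statistics import fmean


def one_way_anova(groups):
    """Degrees of freedom, sums of squares, mean squares and F for a one way ANOVA.

    groups: a list of lists, one inner list of observations per group.
    """
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = fmean(all_obs)

    # Between groups: spread of the group means about the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups (error/residual): spread of each observation about its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    # Total: spread of all observations about the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

    df_between, df_within, df_total = k - 1, n_total - k, n_total - 1
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within, df_total),
        "ss": (ss_between, ss_within, ss_total),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


result = one_way_anova([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(result["df"], result["ss"], result["F"])
```

For this small example the sums of squares are 38 (between), 6 (within), and 44 (total), up to floating-point rounding, so the between and within sums add up to the total, and F = 19 / 1 = 19.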
P Value. The P value is the probability of obtaining an F at least as large as the one observed
if all the samples were in fact drawn from the same population; it is the risk of being wrong if
you conclude that there is a true difference between the groups (that is, the probability of
falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the
P value, the stronger the evidence that the samples are drawn from different populations.
Traditionally, you can conclude that there are significant differences when P < 0.05.
Multiple Comparisons. If you chose to perform multiple comparisons, a table of the
comparisons between group pairs is displayed. The multiple comparison procedure is
activated in the Options for One Way ANOVA dialog box. The tests used in the multiple
comparison procedure are selected in the Multiple Comparison Options dialog box.
Multiple comparison results are used to determine exactly which treatments are different,
since the ANOVA results only inform you that two or more of the groups are different. The
specific type of multiple comparison results depends on the comparison test used and whether
the comparison was made pairwise or versus a control.
• All pairwise comparison results list comparisons of all possible combinations of group
pairs; the all pairwise tests are the Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s
test and the Bonferroni t-test.
• Comparisons versus a single control group list only comparisons with the selected control
group. The control group is selected during the actual multiple comparison procedure.
The comparison versus a control tests are the Bonferroni t-test and the Dunnett’s, Fisher
LSD, and Duncan’s tests.
For descriptions of how the parametric multiple comparison results are derived, consult any
standard statistics reference.
Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each
pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for
that comparison. The Bonferroni t-test can be used to compare all groups or to compare
versus a control.
You can conclude from "large" values of t that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding
that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot
confidently conclude that there is a difference.
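A minimal sketch of the Bonferroni t statistics follows. It assumes the usual form of the test after an ANOVA: each pair's difference of means is divided by a standard error built from the within-groups (error) mean square, with the within-groups degrees of freedom. The function name is hypothetical, and instead of P values (which need the t distribution) it returns the Bonferroni-adjusted significance level, α divided by the number of comparisons made.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def bonferroni_t(groups, names, ms_within, alpha=0.05, control=None):
    """Bonferroni t statistics for pairs of groups (sketch).

    ms_within: within-groups (error) mean square from the ANOVA table.
    control:   index of the control group, or None for all pairwise comparisons.
    Returns the per-comparison alpha and (name_i, name_j, difference, t) rows.
    """
    k = len(groups)
    if control is None:
        pairs = list(combinations(range(k), 2))                    # all pairwise
    else:
        pairs = [(control, j) for j in range(k) if j != control]  # versus control
    per_comparison_alpha = alpha / len(pairs)  # Bonferroni: split alpha across comparisons
    rows = []
    for i, j in pairs:
        diff = fmean(groups[i]) - fmean(groups[j])
        se = sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
        rows.append((names[i], names[j], diff, diff / se))
    return per_comparison_alpha, rows


adj_alpha, rows = bonferroni_t([[4, 5, 6], [6, 7, 8], [9, 10, 11]], ["A", "B", "C"], ms_within=1.0)
print(adj_alpha)
for row in rows:
    print(row)
```

Each |t| would then be compared with the two-sided critical t value at the adjusted level, which is why comparing only against a control (fewer pairs) gives each comparison a less strict threshold.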
The difference of the means is a gauge of the size of the difference between the two groups.
Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results.
The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise
comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s
tests can be used to compare a control group to other groups, they are not recommended for this
type of comparison.
Dunnett’s test only compares a control group to all other groups. All tests compute the q test
statistic, and display whether or not P < 0.05 or < 0.01 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
The Difference of the Means is a gauge of the size of the difference between the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a
significant difference. p is an indication of the differences in the ranks of the group means
being compared. Group means are ranked in order from largest to smallest, and p is the
number of means spanned in the comparison. For example, if you are comparing four means,
p = 4 when comparing the largest to the smallest, and p = 2 when comparing the second smallest
to the smallest. For the Tukey test, p is always equal to the total number of groups.
If two groups are found not to be significantly different, all comparisons among groups whose
means rank between those two groups are also assumed not to be significantly different, and a
result of DNT (Do Not Test) appears for those comparisons.
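The rank span p and the DNT rule can be sketched as follows. This is an illustration under stated assumptions, not SigmaPlot's implementation: group means are assumed distinct, ranks run from 0 for the largest mean to k − 1 for the smallest, and p equals the number of groups for the Tukey test. Deciding which comparisons are actually significant needs the studentized range distribution and is left out. Both function names are hypothetical.

```python
def span_p(means, a, b, test="snk"):
    """Number of means spanned (p) when comparing groups a and b.

    means: list of group means; a, b: indexes of the two groups compared.
    """
    if test == "tukey":
        return len(means)  # Tukey: p always equals the number of groups
    # Rank group means from largest (rank 0) to smallest.
    order = sorted(range(len(means)), key=lambda g: means[g], reverse=True)
    rank = {g: r for r, g in enumerate(order)}
    return abs(rank[a] - rank[b]) + 1


def dnt_rank_pairs(nonsignificant):
    """Rank pairs marked DNT because they lie between the ranks of a pair
    already found not to differ. Ranks use 0 for the largest mean."""
    dnt = set()
    for r1, r2 in nonsignificant:
        lo, hi = min(r1, r2), max(r1, r2)
        for x in range(lo, hi + 1):
            for y in range(x + 1, hi + 1):
                if (x, y) != (lo, hi):  # the tested pair itself is not DNT
                    dnt.add((x, y))
    return dnt


means = [10.0, 7.0, 5.0, 3.0]
print(span_p(means, 0, 3), span_p(means, 2, 3), span_p(means, 0, 3, test="tukey"))
print(sorted(dnt_rank_pairs([(0, 2)])))
```

With four means, span_p returns 4 for the largest versus the smallest and 2 for the second smallest versus the smallest, matching the example in the text; if ranks 0 and 2 do not differ, the comparisons (0, 1) and (1, 2) become DNT.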
5.5.8 One Way ANOVA Report Graphs
• Scatter plot with error bars of the column means. The One Way ANOVA scatter plot
graphs the group means as single points with error bars indicating the standard deviation.
For more information, see 11.1.2 Scatter Plot.
• Histogram of the residuals. The One Way ANOVA histogram plots the raw residuals in a
specified range, using a defined interval set. For more information, see 11.1.8 Histogram of
Residuals.
• Normal probability plot of the residuals. The One Way ANOVA probability plot graphs
the frequency of the raw residuals. For more information, see 11.1.9 Normal Probability
Plot.
• Multiple comparison graphs. The One Way ANOVA multiple comparison graphs plot
significant differences between levels of a significant factor. For more information, see
11.1.15 Multiple Comparison Graphs.
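The raw residuals used by the histogram and normal probability plot are, for a one way ANOVA, each observation minus its group mean. The sketch below computes them and counts them in a fixed set of equal-width intervals; the start, width, and bin count stand in for the specified range and interval set, and values outside the range are simply dropped. Function names are hypothetical.

```python
import math
from statistics import fmean


def raw_residuals(groups):
    """Raw residuals of a one way ANOVA: each observation minus its group mean."""
    return [x - fmean(g) for g in groups for x in g]


def histogram(values, start, width, n_bins):
    """Count values in the intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < n_bins:  # values outside the specified range are ignored
            counts[i] += 1
    return counts


res = raw_residuals([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(res)
print(histogram(res, start=-1.5, width=1.0, n_bins=3))
```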
The Create Result Graph dialog box appears displaying the types of graphs available
for the One Way ANOVA results.
4. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see 11.1 Generating
Report Graphs.
5.6 Two Way Analysis of Variance (ANOVA)
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.6.3 Arranging Two Way ANOVA Data.
2. If desired, set Two Way ANOVA options. For more information, see 5.6.4 Setting Two
Way ANOVA Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Many Groups→Two Way ANOVA
5. Run the test. For more information, see 5.6.5 Running a Two Way ANOVA.
6. Generate report graphs. For more information, see 5.6.9 Two Way ANOVA Report
Graphs.
The factors are gender and drug, and the levels are Male/Female and Drug A/Drug B.
5.6.3.1 Indexing Raw Data for a Two-Way ANOVA
If your data is missing data points or even whole cells, SigmaPlot detects this and offers the
appropriate ways to proceed. For more information, see 5.6.3.2 Missing Data and Empty Cells Data.
The Two-Way ANOVA test requires that the data be entered as indexed data. If your data is in
a raw format, you can use a transform to convert it into an indexed format and then run the
ANOVA.
In any Two-Way ANOVA, there are two factors, each divided into a number of levels. For
example, Gender could be one factor with two levels: male and female. Drug Treatment could
be another factor with three levels: Drug A, Drug B, Drug C.
Each combination of two levels, one from each factor, is called a cell. For example, all of the
data measured for males receiving Drug A would be a cell. When the data for each cell is
written into its own worksheet column, this is known as a "raw data format" for Two-Way
ANOVA. The number of columns equals the number of cells. Because each column holds the data
for one combination of factor levels, the title of each column uses the names of the two levels.
The example above is a worksheet containing raw data for a Two-Way ANOVA. Note that the
title of each column is composed of two names separated by a hyphen. The names refer to
levels from different factors. There are six columns, and so there are six cells in the ANOVA.
To convert this data to Indexed format:
Tip
You can either select the columns from the worksheet, or you can select each
column individually from the Data for Group drop-down list.
5. Click Finish.
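What the conversion does to the data can be illustrated outside SigmaPlot. This Python sketch is not the SigmaPlot transform itself; it assumes, as in the example worksheet, that each raw column title names a level of the first factor and a level of the second factor separated by a hyphen. The function name raw_to_indexed is hypothetical.

```python
def raw_to_indexed(raw):
    """Convert raw two-way data (one worksheet column per cell) to indexed rows.

    raw: dict mapping a column title such as "Male-Drug A" to that cell's values.
    Returns a list of (factor A level, factor B level, value) tuples.
    """
    rows = []
    for title, values in raw.items():
        # Split on the first hyphen only: "Male-Drug A" -> ("Male", "Drug A").
        level_a, level_b = (part.strip() for part in title.split("-", 1))
        for value in values:
            rows.append((level_a, level_b, value))
    return rows


print(raw_to_indexed({"Male-Drug A": [1.2, 1.5], "Female-Drug B": [2.0]}))
```

A title without a hyphen raises a ValueError here, and a first-factor level name that itself contains a hyphen would be split in the wrong place, so real column titles may need a different separator.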
Figure 5.25 Data for a Two Way ANOVA with a Missing Value in the Male/Drug A
Cell
A general linear model approach is used in these situations.
Figure 5.26 Data for a Two Way ANOVA with a Missing Data Cell (Male/Drug A)
In this case, you can either perform a one factor analysis or assume no interaction between the factors.
5.6.3.2.3 Connected versus Disconnected Data
The no interaction assumption does not always permit a two factor analysis when there is more
than one empty cell. The non-empty cells must be geometrically connected in order to do the
computation. You cannot perform Two Way ANOVAs on disconnected data.
Data arranged in a two-dimensional grid is guaranteed to be connected if you can draw a series
of straight vertical and horizontal lines connecting all occupied cells without changing
direction in an empty cell.
Figure 5.27 Example of Drawing Straight Horizontal and Vertical Lines Through
Connected Data
It is important to note that failure to meet the above requirement does not imply that the data is
disconnected. The data in the table below, for example, is connected.
Figure 5.28 Example of Connected Data that You Can’t Draw a Series of Straight
Vertical and Horizontal Lines Through
SigmaPlot automatically checks for this condition. If disconnected data is encountered during
a Two Way ANOVA, SigmaPlot suggests treatment of the problem as a One Way ANOVA.
Because this data is not geometrically connected (the data shares no factor levels in common)
a two way ANOVA cannot be performed, even assuming no interaction.
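One standard way to check connectedness (offered as an illustration, not as SigmaPlot's own algorithm) is to treat every row level and every column level as a node and every non-empty cell as an edge joining its row and column. The data is connected when all of these nodes end up in a single group. A minimal union-find sketch with a hypothetical function name:

```python
def is_connected(cells):
    """cells: iterable of (row level, column level) pairs for the non-empty cells."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for row, col in cells:
        # Tag nodes so a row level and a column level with the same name stay distinct.
        parent[find(("row", row))] = find(("col", col))
    roots = {find(node) for node in list(parent)}
    return len(roots) == 1


# Male/Drug A empty, every other cell occupied: still connected.
print(is_connected([("Male", "Drug B"), ("Male", "Drug C"),
                    ("Female", "Drug A"), ("Female", "Drug B"), ("Female", "Drug C")]))
# Males only received Drug A and females only Drug B: no shared levels, so disconnected.
print(is_connected([("Male", "Drug A"), ("Female", "Drug B")]))
```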
5.6.4.1 Options Two Way ANOVA: Assumption Checking
1. If you are going to run the test after changing test options and want to select your data
before you run the test, drag the pointer over the data.
2. Select Two Way ANOVA from the Select Test drop-down list in the Statistics group
on the Analysis tab.
3. Click Current Test Options. The Options for Two Way ANOVA dialog box appears
with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 5.6.4.1 Options
Two Way ANOVA: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 5.6.4.2
Options Two Way ANOVA: Results.
• Post Hoc Tests. Compute the power, or sensitivity, of the test and set the multiple
comparison options. For more information, see 5.6.4.3 Options Two Way ANOVA: Post Hoc Tests.
Tip
Options settings are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. The Pick Columns dialog box appears. For more
information, see 5.6.5 Running a Two Way ANOVA.
5. To accept the current settings and close the options dialog box, click OK.
Select the Assumption Checking tab from the options dialog box to view the options for
Normality and Equal Variance. The normality assumption test checks for a normally
distributed population. The equal variance assumption test checks the variability about the
group means.
Figure 5.31 The Options for Two Way ANOVA Dialog Box Displaying the
Assumption Checking Options
Normality Testing. SigmaPlot uses the Shapiro-Wilk or Kolmogorov-Smirnov test to test for
a normally distributed population.
Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability
about the group means.
P Values for Normality and Equal Variance. Enter the corresponding P value in the P
Value to Reject box. The P value determines the probability of being incorrect in concluding
that the data is not normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value computed by the test is greater
than the P set here, the test passes.
To require a stricter adherence to normality and equal variance, increase the P value.
Because parametric statistical methods are relatively robust to violations of these
assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for
example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal.
For example, a P value of 0.050 requires greater deviations from normality to flag the data as
non-normal than a value of 0.100.
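To make the equal variance check and the pass rule concrete, the sketch below computes a Levene median statistic in its Brown-Forsythe form (a one way ANOVA F computed on the absolute deviations of each observation from its group median) and applies the "passes if the computed P is greater than P Value to Reject" rule described above. Turning the statistic into a P value requires the F distribution with k − 1 and N − k degrees of freedom, which the Python standard library lacks, so the P value is taken as an input. Function names are hypothetical and this is not SigmaPlot's code.

```python
from statistics import fmean, median


def levene_median_statistic(groups):
    """Brown-Forsythe (Levene median) statistic: a one way ANOVA F computed on
    absolute deviations from each group's median."""
    z = [[abs(x - median(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    n, k = len(all_z), len(z)
    grand = fmean(all_z)
    between = sum(len(g) * (fmean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - fmean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within  # ZeroDivisionError if every group has constant deviations


def assumption_passes(computed_p, p_to_reject=0.05):
    """The assumption check passes when the test's P value exceeds P Value to Reject."""
    return computed_p > p_to_reject


print(levene_median_statistic([[1, 2, 3, 4], [2, 4, 6, 8]]))
print(assumption_passes(0.20), assumption_passes(0.01))
```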
Restriction
There are extreme conditions of data distribution that these tests cannot take into
account. For example, the Levene Median test fails to detect differences in variance of
several orders of magnitude; however, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption tests.
5.6.4.2 Options Two Way ANOVA: Results
Figure 5.32 The Options for Two Way ANOVA Dialog Box Displaying the
Summary Table, Confidence Intervals, and Residuals Options
Summary Table. Select Summary Table under Report to display the number of observations
for a column or group, the number of missing values for a column or group, the average value
for the column or group, the standard deviation of the column or group, and the standard
error of the mean for the column or group.
Confidence Intervals. Select Confidence Intervals under Report to display the confidence
interval for the difference of the means. To change the interval, enter any number from 1 to 99
(95 and 99 are the most commonly used intervals).
Residuals in Column. Select Residuals in Column to display residuals in the report and save
the residuals of the test to the specified worksheet column. Edit the column number or select
a number from the drop-down list.
Power. The power or sensitivity of a test is the probability that the test will detect a difference
between the groups if there is really a difference.
Use Alpha Value. Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The
suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable,
or that you are willing to conclude there is a significant difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists.
Larger values of α make it easier to conclude that there is a difference, but also increase the
risk of reporting a false positive.
Multiple Comparisons
Two Way ANOVAs test the hypothesis of no differences between the several treatment groups,
but do not determine which groups are different, or the sizes of these differences. Use multiple
comparisons to isolate these differences whenever a Two Way ANOVA detects a difference.
The P value used to determine if the ANOVA detects a difference is set in the Report tab of
the Options dialog box. If the P value produced by the Two Way ANOVA is less than the P
value specified in the box, a difference in the groups is detected and the multiple comparisons
are performed. For more information, see Setting Report Options.
• Always Perform. Select to perform multiple comparisons whether or not the Two Way
ANOVA detects a difference.
• Only When ANOVA P Value is Significant. Select to perform multiple comparisons only if the
ANOVA detects a difference.
Significance Value for Multiple Comparisons. Select either .05 or .01. This value sets the
acceptable likelihood that the multiple comparison is incorrect in concluding that there is a
significant difference in the treatments. A value of .05 indicates that the multiple comparisons
will detect a difference if there is less than a 5% chance that the multiple comparison is
incorrect in detecting a difference. A value of .01 indicates that the multiple comparisons will
detect a difference if there is less than a 1% chance that the multiple comparison is incorrect
in detecting a difference.
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method. For more information, see 5.9 Performing
a Multiple Comparison.
If you want to select your data before you run the test, drag the pointer over your data.
Figure 5.33 The Pick Columns for Two Way ANOVA Dialog Box Prompting You to
Select Data Columns
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list. The number
or title of each selected column appears in its row. You are prompted to pick a minimum of
three worksheet columns.
6. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in
the Selected Columns list.
7. Click Finish to perform the Two Way ANOVA.
• If your data is missing data points, missing cells, or is otherwise unbalanced, you are
prompted to perform the appropriate procedure.
• If you are missing data points, but still have at least one observation in each cell,
SigmaPlot automatically proceeds with the Two Way ANOVA using a general linear
model.
• If you are missing a cell, but the data is connected, you can proceed by either
performing a two way analysis assuming no interaction between the factors, or
converting the problem into a one way design with each non-empty cell a different
level of a single factor.
• If your data is not geometrically connected, you cannot perform a Two Way ANOVA.
Either treat the problem as a One Way ANOVA, or cancel the test.
10. If the P value for multiple comparisons is significant, or you selected to always perform
multiple comparisons, the Multiple Comparisons Options dialog box appears prompting
you to select a multiple comparison method. For more information, see 5.6.6 Multiple
Comparison Options for a Two Way ANOVA.
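Converting the problem into a one way design, as described in the procedure above, amounts to relabeling each non-empty cell as one level of a single factor. A sketch, assuming the data is already in indexed (factor A level, factor B level, value) form with None marking a missing value; the function name is hypothetical.

```python
from collections import defaultdict


def cells_as_one_way(indexed_rows):
    """Group indexed two-way data so each non-empty cell is one level of a single factor.

    indexed_rows: iterable of (factor A level, factor B level, value); None marks a missing value.
    """
    groups = defaultdict(list)
    for level_a, level_b, value in indexed_rows:
        if value is not None:  # a cell holding only missing values never gets an entry
            groups[f"{level_a}-{level_b}"].append(value)
    return dict(groups)


print(cells_as_one_way([("Male", "Drug B", 3), ("Female", "Drug A", 4),
                        ("Male", "Drug B", 5), ("Female", "Drug A", None)]))
```

The resulting groups can then be analyzed as an ordinary one way ANOVA, at the cost of no longer separating the two factors' effects.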
5.6.6 Multiple Comparison Options for a Two Way ANOVA
Figure 5.34 The Multiple Comparison Options Dialog Box for a Two Way ANOVA
This dialog box displays the P values for each of the two experimental factors and of the
interaction between the two factors. Only the options with P values less than or equal to the
value set in the Options dialog box are selected. You can disable multiple comparison testing
for a factor by clicking the selected option. If no factor is selected, multiple comparison
results are not reported.
There are seven multiple comparison tests to choose from for the Two Way ANOVA:
• Holm-Sidak test. For more information, see 5.9.1 Holm-Sidak Test.
• Tukey Test. For more information, see 5.9.2 Tukey Test.
• Student-Newman-Keuls Test.
• Bonferroni t-test. For more information, see 5.9.4 Bonferroni t-Test.
• Fisher’s LSD. For more information, see 5.9.5 Fisher’s Least Significance Difference Test.
• Dunnett’s Test. For more information, see 5.9.6 Dunnett’s Test.
• Duncan’s Multiple Range Test. For more information, see 5.9.8 Duncan’s Multiple Range.
The Tukey and Student-Newman-Keuls tests are recommended for determining the
difference among all treatments. If you have only a few treatments, you may want to select the
simpler Bonferroni t-test.
The Dunnett’s test is recommended for determining the differences between the experimental
treatments and a control group. If you have only a few treatments or observations, you can
select the simpler Bonferroni t-test.
Figure 5.35 The Multiple Comparison Options Dialog Box Prompting You to
Select Control Groups
Note
In both cases the Bonferroni t-test is most sensitive with a small number of groups.
Dunnett’s test is not available if you have fewer than six observations.
There are two types of multiple comparison available for the Two Way ANOVA. The types
of comparison you can make depend on the selected multiple comparison test. For more
information, see 5.6 Two Way Analysis of Variance (ANOVA).
• All pairwise comparisons test the difference between each treatment or level within the two
factors separately (that is, among the different rows and columns of the data table)
• Multiple comparisons versus a control test the difference between all the different
combinations of the factors (that is, all the cells in the data table)
When comparing the two factors separately, the levels within one factor are compared among
themselves without regard to the second factor, and vice versa. These results should be used
when the interaction is not statistically significant.
When the interaction is statistically significant, interpreting multiple comparisons among
different levels of each experimental factor may not be meaningful. SigmaPlot also suggests
performing a multiple comparison between all the cells.
5.6.6.1 Holm-Sidak Test
The result of all comparisons is a listing of the similar and different group pairs, that is,
those groups that are and are not detectably different from each other. Because no statistical
test eliminates uncertainty, multiple comparison procedures sometimes produce ambiguous
groupings.
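The Holm-Sidak test named in this section's heading is a step-down procedure. A minimal sketch, assuming the unadjusted P values of the m individual comparisons are already available (this sketch does not show how SigmaPlot computes them): sort the P values in ascending order, compare the i-th smallest with the Sidak level 1 − (1 − α)^(1/(m − i + 1)), and stop at the first comparison that fails, since every larger P value is then also not significant. The function name is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return True/False (significant or not) for each comparison, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P value first
    significant = [False] * m
    for step, idx in enumerate(order):
        # Sidak-adjusted level for the m - step comparisons still being tested.
        critical = 1 - (1 - alpha) ** (1 / (m - step))
        if p_values[idx] > critical:
            break  # this and all larger P values are not significant
        significant[idx] = True
    return significant


print(holm_sidak([0.01, 0.04, 0.03]))
print(holm_sidak([0.001, 0.02, 0.04]))
```

Because the threshold loosens as comparisons are rejected, this procedure can detect differences that a single Bonferroni threshold of α/m would miss.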
not attempt to control the error rate when detecting differences between groups, it is
not recommended.
Dunnett’s test is the analog of the Student-Newman-Keuls Test for the case of multiple
comparisons against a single control group. It is conducted similarly to the Bonferroni t-test,
but with a more sophisticated mathematical model of the way the error accumulates in order to
derive the associated table of critical values for hypothesis testing. This test is less conservative
than the Bonferroni Test, and is only available for multiple comparisons vs. a control.
Duncan’s test works the same way as the Tukey and the Student-Newman-Keuls tests, except
that it is less conservative in determining whether the difference between groups is significant,
because it allows a wider range of error rates. Although it has greater power to detect differences
than the Tukey and the Student-Newman-Keuls tests, it has less control over the Type I
error rate and is, therefore, not recommended.
When your data is missing too many observations to perform a valid Two Way ANOVA, you
can still analyze your data using a One Way ANOVA.
To perform a One Way ANOVA:
5.6.8.1 Result Explanations
Summary tables of least square means for each factor and for both factors together can also
be generated. This result and additional results are enabled in the Options for Two Way
ANOVA dialog box. For more information, see 5.6.4 Setting Two Way ANOVA Options.
Click a selected check box to enable or disable a test option. All options are saved between
SigmaPlot sessions.
You can also generate tables of multiple comparisons. Multiple Comparison results are also
specified in the Options for Two Way ANOVA dialog box. The tests used in the multiple
comparisons are selected in the Multiple Comparisons Options dialog box. For more
information, see 5.6.6 Multiple Comparison Options for a Two Way ANOVA.
Dependent Variable. This is the data column title of the indexed worksheet data you are
analyzing with the Two Way ANOVA. Determining if the values in this column are affected
by the different factor levels is the objective of the Two Way ANOVA.
Normality Test. Normality test results display whether the data passed or failed the test of the
assumption that they were drawn from a normal population and the P value calculated by the
test. Normally distributed source populations are required for all parametric tests.
This result appears if you enabled normality testing in the Two Way ANOVA Options dialog
box.
Equal Variance Test. Equal Variance test results display whether or not the data passed or
failed the test of the assumption that the samples were drawn from populations with the same
variance and the P value calculated by the test. Equal variance of the source population is
assumed for all parametric tests.
This result appears if you enabled equal variance testing in the Two Way ANOVA Options
dialog box.
ANOVA Table. The ANOVA table lists the results of the Two Way ANOVA.
Tip
When there are missing data, the best estimate of these values is automatically
calculated using a general linear model.
DF (Degrees of Freedom). Degrees of freedom represent the number of groups in each factor
and the sample size, which affect the sensitivity of the ANOVA.
• The degrees of freedom for each factor is a measure of the number of levels in each factor.
• The interaction degrees of freedom is a measure of the total number of cells.
• The error degrees of freedom (sometimes called the residual or within groups degrees of
freedom) is a measure of the sample size after accounting for the factors and interaction.
• The total degrees of freedom is a measure of the total sample size.
SS (Sum of Squares). The sum of squares is a measure of variability associated with each
element in the ANOVA data table.
• The factor sums of squares measure the variability between the rows or columns of
the table, considered separately.
• The interaction sum of squares measures the variability of the average differences
among the cells beyond the variation between the rows and columns considered
separately; this is a gauge of the interaction between the factors.
• The error sum of squares (also called residual or within group sum of squares) is a measure
of the underlying random variation in the data, that is, the variability not associated
with the factors or their interaction.
• The total sum of squares is a measure of the total variability in the data; if there are no
missing data, the total sum of squares equals the sum of the other table sums of squares.
MS (Mean Squares). The mean squares provide different estimates of the population
variance. Comparing these variance estimates is the basis of analysis of variance.
The mean square for each factor
$$MS_{\text{factor}} = \frac{SS_{\text{factor}}}{DF_{\text{factor}}}$$
is an estimate of the variance of the underlying population computed from the variability
between levels of the factor.
The interaction mean square
$$MS_{\text{interaction}} = \frac{SS_{\text{interaction}}}{DF_{\text{interaction}}}$$
is an estimate of the variance of the underlying population computed from the variability
associated with the interactions of the factors.
The error mean square (residual, or within groups)
$$MS_{\text{error}} = \frac{SS_{\text{error}}}{DF_{\text{error}}}$$
is an estimate of the variability in the underlying population, computed from the random
component of the observations.
F Statistic. The F test statistic is provided for comparisons within each factor and between
the factors.
The F ratio to test each factor is
$$F = \frac{\text{mean square for the factor}}{\text{mean square of the error}} = \frac{MS_{\text{factor}}}{MS_{\text{error}}}$$
If the F ratio is around 1, you can conclude that there are no significant differences between
factor levels or that there is no interaction between factors (that is, the data groups are
consistent with the null hypothesis that all the samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples for that factor or
combination of factors was drawn from a different population (that is, the variability is
larger than what is expected from random variability in the population). To determine exactly
which groups are different, examine the multiple comparison results.
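For a balanced design, where every cell holds the same number of observations, the sums of squares, mean squares, and F ratios described above can be computed directly. The sketch below assumes balanced data with at least two observations per cell; with missing data SigmaPlot uses a general linear model instead, which this sketch does not attempt. The function name and the nested-list data layout are assumptions of the example.

```python
from statistics import fmean


def two_way_anova_balanced(data):
    """data[i][j] is the list of observations for level i of factor A and level j of factor B.
    Assumes a balanced design: every cell has the same number (>= 2) of observations."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell_means = [[fmean(cell) for cell in row] for row in data]
    grand = fmean(x for row in data for cell in row for x in cell)
    a_means = [fmean(row) for row in cell_means]  # equal cell sizes: the row (factor A) means
    b_means = [fmean(cell_means[i][j] for i in range(a)) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b  # cell variation not explained by rows or columns alone
    ss_err = sum((x - cell_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])

    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_err}
    ms = {key: ss[key] / df[key] for key in df}
    f = {key: ms[key] / ms["error"] for key in ("A", "B", "AxB")}  # each F uses MS error
    return df, ss, ms, f


df, ss, ms, f = two_way_anova_balanced([[[1, 3], [5, 7]], [[2, 4], [10, 12]]])
print(df)
print(ss)
print(f)
```

For this example the sums of squares are 18, 72, 8, and 8 (adding up to the total sum of squares of 106), and the F ratios are 9, 36, and 4.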
P Value. The P value is the probability of obtaining an F at least as large as the one observed
if there were in fact no differences; it is the risk of being wrong if you conclude that there is a
true difference between the groups (that is, the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on F). The smaller the P value, the stronger the
evidence that the samples are drawn from different populations.
Traditionally, you can conclude there are significant differences if P < 0.05.
Power. The power, or sensitivity, of a Two Way ANOVA is the probability that the test will
detect the observed difference among the groups if there really is a difference. The closer the
power is to 1, the more sensitive the test. The power for the comparison of the groups within
the two factors and the power for the comparison of the interactions are all displayed. These
results are set in the Options for Two Way ANOVA dialog box.
ANOVA power is affected by the sample sizes, the number of groups being compared, the
chance of erroneously reporting a difference α (alpha), the observed differences of the group
means, and the observed standard deviations of the samples.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error (a Type I error is when you reject the
hypothesis of no effect when this hypothesis is true).
The α value is set in the Options for Two Way ANOVA dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values
of α result in stricter requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a Type II error).
Larger values of α make it easier to conclude that there is a difference, but also increase the
risk of seeing a false difference (a Type I error).
Summary Table. The least square means and standard error of the means are displayed for
each factor separately (summary table row and column), and for each combination of factors
(summary table cells). If there are missing values, the least square means are estimated using a
general linear model.
• Mean. The average value for the column. If the observations are normally distributed, the
mean is the center of the distribution.
• Standard Error of the Mean. A measure of how closely the mean computed from the sample
approximates the true population mean.
When there are no missing data, the least squares means equal the cell and marginal (row and column) means. When there are missing data, the least squares means provide the best estimate of these values, using a general linear model. These means and standard errors are used when performing multiple comparisons (see following section).
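For a full model (with interaction) in which every cell has at least one observation, the least squares mean of a row is the unweighted average of its cell means, whereas the ordinary marginal mean pools all observations and so weights each cell by its sample size. The sketch below, using hypothetical numbers and a hypothetical function name, shows that the two agree for balanced data and diverge when a value is missing. It only illustrates the idea; SigmaPlot's general linear model also handles cases this sketch does not, such as empty cells.

```python
from statistics import fmean


def row_means(cells):
    """cells maps (row, col) -> list of observations; returns (LS means, raw means) by row."""
    rows = sorted({r for r, _ in cells})
    ls_means, raw_means = {}, {}
    for r in rows:
        row_cells = [obs for (rr, _), obs in cells.items() if rr == r]
        # Least squares (unweighted) mean: average of the cell means.
        ls_means[r] = fmean(fmean(obs) for obs in row_cells)
        # Ordinary marginal mean: pool every observation, so larger cells weigh more.
        raw_means[r] = fmean(x for obs in row_cells for x in obs)
    return ls_means, raw_means


balanced = {("A", 1): [4.0, 6.0], ("A", 2): [10.0, 12.0]}
unbalanced = {("A", 1): [4.0, 6.0], ("A", 2): [10.0]}  # one value missing from cell (A, 2)
print(row_means(balanced))    # both means are 8.0
print(row_means(unbalanced))  # LS mean 7.5, raw mean about 6.67
```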
Multiple Comparisons. If a difference is found among the groups, multiple comparison
tables can be computed. Multiple comparison procedures are activated in the Options for Two
Way ANOVA dialog box. The tests used in the multiple comparisons are set in the Multiple
Comparisons Options dialog box.
Multiple comparison results are used to determine exactly which groups are different, since
the ANOVA results only inform you that two or more of the groups are different. Two factor
multiple comparison for a full Two Way ANOVA also compares:
• Groups within each factor without regard to the other factor (this is a marginal comparison; that is, only the columns or only the rows in the table are compared).
• All combinations of factors (all cells in the table are compared with each other).
The specific type of multiple comparison results depends on the comparison test used and
whether the comparison was made pairwise or versus a control.
• All pairwise comparison results list comparisons of all possible combinations of group pairs; the all pairwise tests are the Holm-Sidak, Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Bonferroni t-test.
• Comparisons versus a single control group list only comparisons with the selected control group. The control group is selected during the actual multiple comparison procedure. The comparison versus a control tests are Holm-Sidak, Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, Dunnett’s, and Bonferroni t-test.
Holm-Sidak Test Results. The Holm-Sidak Test can be used for both pairwise comparisons
and comparisons versus a control group. It is more powerful than the Tukey and Bonferroni
tests and, consequently, it is able to detect differences that these other tests do not. It is
recommended as the first-line procedure for pairwise comparison testing.
When performing the test, the P values of all comparisons are computed and ordered from
smallest to largest. Each P value is then compared to a critical level that depends upon the
significance level of the test (set in the test options), the rank of the P value, and the total
number of comparisons made. A P value less than the critical level indicates there is a
significant difference between the corresponding two groups.
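The ordering step described above can be sketched as follows. This is a minimal illustration of the standard Holm-Sidak step-down rule, assuming that the critical level for the i-th smallest of k P values is 1 - (1 - α)^(1/(k - i + 1)) and that testing stops at the first comparison that is not significant. The guide does not spell out SigmaPlot's exact formula, and the function name is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return (p, critical_level, significant) for each P value, in the original order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # indices, smallest P first
    results = [None] * k
    still_testing = True
    for rank, i in enumerate(order):                      # rank 0 = smallest P value
        crit = 1 - (1 - alpha) ** (1 / (k - rank))        # Sidak level for the k - rank remaining tests
        significant = still_testing and p_values[i] < crit
        if not significant:
            still_testing = False                         # step-down: stop at the first failure
        results[i] = (p_values[i], crit, significant)
    return results


for p, crit, sig in holm_sidak([0.03, 0.001, 0.04]):
    print(f"P = {p:.3f}  critical level = {crit:.4f}  significant: {sig}")
```

In this example 0.04 is below its own critical level of 0.05 but is not flagged, because the procedure already stopped at 0.03, which exceeded its critical level of about 0.0253.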
Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each
pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for
that comparison. The Bonferroni t-test can be used to compare all groups or to compare
versus a control.
You can conclude from "large" values of t that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding
that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot
confidently conclude that there is a difference.
The Difference of Means is a gauge of the size of the difference between the levels or cells
being compared.
The degrees of freedom DF for the marginal comparisons are a measure of the number of
groups (levels) within the factor being compared. The degrees of freedom when comparing all
cells is a measure of the sample size after accounting for the factors and interaction. This is
the same as the error or residual degrees of freedom.
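One common form of the Bonferroni t statistic for a pair of groups uses the error mean square from the ANOVA as the pooled variance estimate, and judges each of k comparisons at the per-comparison level α/k. The sketch below assumes that form; the function names and numbers are hypothetical, it does not compute P values (that requires the t distribution with the error degrees of freedom), and SigmaPlot's report may present the results differently.

```python
from math import sqrt


def bonferroni_t(mean_a, mean_b, n_a, n_b, ms_error):
    """t for one pairwise comparison, using the ANOVA error mean square as the pooled variance."""
    difference_of_means = mean_a - mean_b
    return difference_of_means / sqrt(ms_error * (1 / n_a + 1 / n_b))


def bonferroni_alpha(alpha, k):
    """Per-comparison significance level when k comparisons are made."""
    return alpha / k


print(round(bonferroni_t(12.0, 9.0, n_a=6, n_b=6, ms_error=4.5), 3))  # 2.449
print(bonferroni_alpha(0.05, 3))  # 0.05 / 3, about 0.0167
```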
Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results.
The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison.
Dunnett’s test only compares a control group to all other groups. All tests compute the q test
statistic, the number of means spanned in the comparison p, and display whether or not P
< 0.05 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
p is the parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest gives p = 4, and comparing the second smallest to the smallest gives p = 2.
If a group is found to be not significantly different from another group, all comparisons among groups whose ranks lie between the ranks of those two groups are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.
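The ranking of means, the value of p, and the DNT rule can be sketched as follows. This is a simplified illustration with hypothetical names and data: the decision function is a stand-in for the real SNK or Duncan's rule, which compares q with a critical value that depends on p, and the order in which SigmaPlot tests pairs is assumed here to be from the widest span to the narrowest.

```python
def spans(means):
    """p for every pair: the number of means spanned when means are ranked largest to smallest."""
    ranked = sorted(means, key=means.get, reverse=True)  # position 0 = largest mean
    return {(a, b): j - i + 1
            for i, a in enumerate(ranked)
            for j, b in enumerate(ranked) if j > i}


def step_down(means, is_significant):
    """Test pairs from the widest span to the narrowest, marking nested pairs as DNT."""
    ranked = sorted(means, key=means.get, reverse=True)
    position = {name: i for i, name in enumerate(ranked)}
    results, not_different = {}, []
    for (a, b), p in sorted(spans(means).items(), key=lambda item: -item[1]):
        lo, hi = position[a], position[b]
        if any(first <= lo and hi <= last for first, last in not_different):
            results[(a, b)] = "DNT"  # lies inside a span already found not different
        elif is_significant(a, b, p):
            results[(a, b)] = "different"
        else:
            results[(a, b)] = "not different"
            not_different.append((lo, hi))
    return results


means = {"A": 20.0, "B": 18.0, "C": 17.5, "D": 10.0}  # hypothetical group means


def is_significant(a, b, p):
    # Hypothetical stand-in: a real SNK test compares q with a critical value that depends on p.
    return abs(means[a] - means[b]) > 3.0


print(spans(means)[("A", "D")], spans(means)[("C", "D")])  # 4 2
print(step_down(means, is_significant))
```

Because A and C are found not different, the pairs nested inside that span (A versus B and B versus C) are reported as DNT.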
The Difference of Means is a gauge of the size of the difference between the groups or cells
being compared.
The degrees of freedom DF for the marginal comparisons are a measure of the number of
groups (levels) within the factor being compared. The degrees of freedom when comparing all
cells is a measure of the sample size after accounting for the factors and interaction (this is the
same as the error or residual degrees of freedom).
The Create Result Graph dialog box appears displaying the types of graphs available for
the Two Way ANOVA results.
4. Select the type of graph you want to create from the Graph Type list.
5. Click OK, or double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see 11.1 Generating Report Graphs.
5.7 Three Way Analysis of Variance (ANOVA)
test, use the Rank Transform to convert the observations to ranks, then run a Three Way
ANOVA on the ranks.
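The rank transform replaces each observation in the data column with its rank, after which the ordinary Three Way ANOVA is run on the ranks with the factor columns unchanged. A minimal sketch, assuming tied values receive the average of the ranks they occupy (a common convention; the guide does not state which convention SigmaPlot's Rank Transform uses). The function name is hypothetical.

```python
def rank_transform(values):
    """Replace each observation by its rank (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


print(rank_transform([7.1, 3.2, 5.0, 3.2, 9.4]))  # [4.0, 1.5, 3.0, 1.5, 5.0]
```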
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.7.3 Arranging Three Way ANOVA Data.
2. If desired, set the Three Way ANOVA options. For more information, see 5.7.4 Setting
Three Way ANOVA Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Many Groups→Three Way ANOVA
5. Run the test. For more information, see 5.7.5 Running a Three Way ANOVA.
6. Generate report graphs. For more information, see 5.7.8 Three Way ANOVA Report
Graphs.
The factors are gender, drug, and time period. The levels are Male/Female, Drug A/Drug B,
and Day 1, 2, and 3.
If your data is missing data points or even whole cells, SigmaPlot detects this and handles it automatically. For more information, see 5.7.3.1 Missing Data and Empty Cells Data.
Column 1 is the first factor index, column 2 is the second factor index, column 3 is the third
factor index, and column 4 is the data.
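In this indexed layout each worksheet row carries one observation plus its three factor levels, so missing values and empty cells show up as unequal or zero counts per combination of levels. The sketch below tabulates those counts for hypothetical data that mirror the missing Male/Drug A, Day 1 cell of Figure 5.41. It only illustrates the layout, is not SigmaPlot's own detection logic, and can only see levels that occur somewhere in the data.

```python
from collections import Counter
from itertools import product


def cell_summary(rows):
    """rows are (factor1, factor2, factor3, value) tuples, like worksheet columns 1-4."""
    counts = Counter(row[:3] for row in rows)
    # Only levels that actually occur in the data are known here.
    levels = [sorted({row[i] for row in rows}) for i in range(3)]
    empty_cells = [cell for cell in product(*levels) if counts[cell] == 0]
    return counts, empty_cells


rows = [  # hypothetical observations
    ("Female", "Drug A", "Day 1", 11.2), ("Female", "Drug A", "Day 1", 10.8),
    ("Female", "Drug B", "Day 1", 14.0), ("Male", "Drug B", "Day 1", 13.1),
]
counts, empty_cells = cell_summary(rows)
print(dict(counts))  # unequal counts per cell: the data are unbalanced
print(empty_cells)   # [('Male', 'Drug A', 'Day 1')]
```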
data; however, SigmaPlot properly handles all occurrences of missing and unbalanced data
automatically.
Missing Data Points. If there are missing values, SigmaPlot automatically handles the missing
data by using a general linear model approach. This approach constructs hypothesis tests using
the marginal sums of squares (also commonly called the Type III or adjusted sums of squares).
Figure 5.40 Data for a Three Way ANOVA with a Missing Value in the Male,
Drug A, Day 1 Cell
Figure 5.41 Data for a Three Way ANOVA with a Missing Cell (Male/Drug A, Day 1)
You can use either a two factor analysis or assume no interaction between factors.
Assumption of no interaction analyzes the main effects of each treatment separately.
5.7.3.2 Connected versus Disconnected Data
DANGER
It can be dangerous to assume there is no interaction between the three factors
in a Three Way ANOVA. Under some circumstances, this assumption can lead
to a meaningless analysis, particularly if you are interested in studying the
interaction effect.
The no interaction assumption does not always permit a two factor analysis when there is more
than one empty cell. The non-empty cells must be geometrically connected in order to do the
computation. You cannot perform Three Way ANOVAs on disconnected data.
Data arranged in a two-dimensional grid, where you can draw a series of straight vertical
and horizontal lines connecting all occupied cells, without changing direction in an empty
cell, is guaranteed to be connected.
Figure 5.42 Example of Drawing Straight Horizontal and Vertical Lines through
Connected Data
It is important to note that failure to meet the above requirement does not imply that the data is
disconnected. The data in the table below, for example, is connected.
Figure 5.43 Example of Connected Data that You Can’t Draw a Series of Straight
Vertical and Horizontal Lines Through
Because this data is not geometrically connected (they share no factor levels in common), a
Three Way ANOVA cannot be performed, even assuming no interaction.
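For a two-dimensional layout, one standard way to decide connectedness is to treat every row level and every column level as a node and every occupied cell as a link between its row and its column; the data is connected exactly when all of these nodes end up linked together. This also confirms layouts such as Figure 5.43, which the straight-line rule cannot. The sketch below uses a small union-find structure; it is an illustration under that definition, not necessarily the algorithm SigmaPlot uses, and the function name is hypothetical.

```python
def is_connected(occupied):
    """occupied: iterable of (row_level, column_level) cells that contain data."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps the trees shallow
            node = parent[node]
        return node

    for row, col in occupied:
        # Each occupied cell links its row level with its column level.
        root_row, root_col = find(("row", row)), find(("col", col))
        parent[root_row] = root_col
    roots = {find(node) for node in list(parent)}
    return len(roots) == 1


staircase = [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "C")]
two_blocks = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "C"), (4, "C")]
print(is_connected(staircase))   # True
print(is_connected(two_blocks))  # False: rows 3-4 share no column level with rows 1-2
```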
1. If you are going to run the test after changing test options and want to select your data
before you run the test, drag the pointer over the data.
2. On the Analysis tab, in the Statistics group, click Options. The Options for Three Way
ANOVA dialog box appears with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 5.7.4.1 Options
for Two Way ANOVA: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 5.6.4.2 Options for Two Way ANOVA: Results.
• Post Hoc Tests. Compute the power or sensitivity of the test and enable multiple comparisons. For more information, see 5.6.4.3 Options for Two Way ANOVA: Post Hoc Tests.
Tip
If you are going to run the test after changing test options, and want to select your
data before you run the test, drag the pointer over your data.
Options settings are saved between SigmaPlot sessions.
3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more information, see 5.7.5 Running a Three Way ANOVA.
4. To accept the current settings and close the options dialog box, click OK.
Figure 5.45 The Options for Three Way ANOVA Dialog Box Displaying the
Assumption Checking Options
Normality Testing. SigmaPlot uses the Shapiro-Wilk or Kolmogorov-Smirnov test to test for
a normally distributed population.
Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability
about the group means.
P Values for Normality and Equal Variance. Type the corresponding P value in the P
Value to Reject box. The P value determines the probability of being incorrect in concluding
that the data is not normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value computed by the test is greater
than the P set here, the test passes.
To require a stricter adherence to normality and/or equal variance, increase the P value. Because parametric statistical methods are relatively robust to violations of their assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and/or equal variance, decrease P. Requiring
smaller values of P to reject the normality assumption means that you are willing to accept
greater deviations from the theoretical normal distribution before you flag the data as
non-normal. For example, a P value of 0.050 requires greater deviations from normality to
flag the data as non-normal than a value of 0.100.
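The pass/fail rule above can be written as a one-line comparison, which makes the counterintuitive direction easy to check: a larger P Value to Reject is the stricter setting. A minimal sketch with a hypothetical function name and a hypothetical computed P value:

```python
def assumption_test_passes(p_computed, p_to_reject=0.05):
    """A normality or equal variance check passes when the computed P exceeds P Value to Reject."""
    return p_computed > p_to_reject


p = 0.07  # hypothetical P value reported by a normality test
print(assumption_test_passes(p, 0.05))  # True: the data pass at the suggested 0.050
print(assumption_test_passes(p, 0.10))  # False: the stricter 0.100 setting flags them
```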
Note
There are extreme conditions of data distribution that these tests cannot take into
account. For example, the Levene Median test fails to detect differences in variance of
several orders of magnitude; however, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption tests.
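The Levene Median test mentioned in the note is commonly defined (the Brown-Forsythe variant of Levene's test) as a one way ANOVA F statistic computed on the absolute deviations of each observation from its group median. The sketch below computes that statistic, assuming this definition; SigmaPlot's exact implementation may differ, and converting W to a P value requires the F distribution with k - 1 and N - k degrees of freedom, which is not shown.

```python
from statistics import fmean, median


def levene_median_w(groups):
    """Brown-Forsythe (median-centered Levene) statistic for k groups of observations."""
    # Absolute deviation of each observation from its own group median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_means = [fmean(zi) for zi in z]
    grand_mean = fmean(v for zi in z for v in zi)
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


print(levene_median_w([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))  # about 2.057
```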
Summary Table. Select Summary Table under Report to display the number of observations
for a column or group, the number of missing values for a column or group, the average value
for the column or group, the standard deviation of the column or group, and the standard
error of the mean for the column or group.
Confidence Intervals. Select Confidence Intervals under Report to display the confidence
interval for the difference of the means. To change the interval, enter any number from 1 to 99
(95 and 99 are the most commonly used intervals).
Residuals in Column. Use the Residuals in Column drop-down list to display residuals in the report and to save the residuals of the test to the specified worksheet column. Edit the number or select a number from the drop-down list.
5.7.4.3 Options for Two Way ANOVA: Post Hoc Tests
Power. The power or sensitivity of a test is the probability that the test will detect a difference
between the groups if there is really a difference.
Use Alpha Value. Change the alpha value by editing the number in the Alpha Value box.
Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. The
suggested value is α = 0.05. This indicates that a one in twenty chance of error is acceptable,
or that you are willing to conclude there is a significant difference when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists. Larger values of α make it easier to conclude that there is a difference, but also increase the risk of reporting a false positive.
Multiple Comparisons
Three Way ANOVAs test the hypothesis of no differences between the several treatment
groups, but do not determine which groups are different, or the sizes of these differences.
Multiple comparisons isolate these differences whenever a Three Way ANOVA detects a
difference.
The P value used to determine if the ANOVA detects a difference is set in the Report tab of
the Options dialog box. If the P value produced by the Three Way ANOVA is less than the P
value specified in the box, a difference in the groups is detected and the multiple comparisons
are performed. For more information, see Setting Report Options.
• Always Perform. Select to perform multiple comparisons whether or not the Three Way ANOVA detects a difference.
• Only When ANOVA P Value is Significant. Select to perform multiple comparisons only if the ANOVA detects a difference.
Significant Multiple Comparison Value. Select either .05 or .10 from the Significance Value for Multiple Comparisons drop-down list. This value sets the acceptable likelihood that a multiple comparison incorrectly concludes that there is a significant difference in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference if there is less
than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of
.10 indicates that the multiple comparisons will detect a difference if there is less than 10%
chance that the multiple comparison is incorrect in detecting a difference.
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting
you to choose a multiple comparison test.
If you want to select your data before you run the test, drag the pointer over your data.
Figure 5.48 The Pick Columns for Three Way ANOVA Dialog Box
3. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list. The number
or title of each selected column appears in its row. You are prompted to pick four worksheet columns: the three factor columns and the data column.
4. To change your selections, select the assignment in the list, then select a new column from the worksheet. You can also clear a column assignment by double-clicking it in the Selected Columns list.
5. Click Finish to perform the Three Way ANOVA. The Three Way ANOVA report appears if:
• You elected to test for normality and equal variance, and your data passes both tests.
• Your data has no missing data points or missing cells, and is not otherwise unbalanced.
• You selected not to perform multiple comparisons, or you selected to run multiple comparisons only when the P value is significant and the P value is not significant.
If you elected to test for normality and equal variance, and your data fails either test, either
continue or transform your data, then perform the Three Way ANOVA on the transformed
data. If your data is missing data points, missing cells, or is otherwise unbalanced, you are
prompted to perform the appropriate procedure.
Figure 5.49 The Multiple Comparison Options Dialog Box for a Three Way
ANOVA
This dialog box displays the P values for each of the experimental factors and of the interaction
between the factors. Only the options with P values less than or equal to the value set in the
Options dialog box are selected. You can disable multiple comparison testing for a factor by
clicking the selected option. If no factor is selected, multiple comparison results are not
reported.
There are seven multiple comparison tests to choose from for the Three Way ANOVA. You
can choose to perform the:
• Holm-Sidak test. For more information, see 5.9.1 Holm-Sidak Test.
• Tukey Test. For more information, see 5.9.2 Tukey Test.
• Student-Newman-Keuls Test. For more information, see 5.9.3 Student-Newman-Keuls
(SNK) Test.
• Bonferroni t-test. For more information, see 5.9.4 Bonferroni t-Test.
• Fisher’s LSD. For more information, see 5.9.5 .
• Dunnett’s Test. For more information, see 5.9.6 Dunnett’s Test.
• Duncan’s Multiple Range Test. For more information, see 5.9.8 Duncan’s Multiple Range.
Figure 5.50 The Multiple Comparison Options Dialog Box Prompting You to
Select a Control Group
There are two types of multiple comparison available for the Three Way ANOVA. The types of comparison you can make depend on the selected multiple comparison test.
• All pairwise comparisons compare every possible pair of groups, for example, every pair of levels within a factor, or every pair of cells in the data table.
• Multiple comparisons versus a control compare each group only with a single control group that you select.
When comparing the factors separately, the levels within one factor are compared among themselves without regard to the other factors. These results should be used when the interaction is not statistically significant.
When the interaction is statistically significant, interpreting multiple comparisons among different levels of each experimental factor may not be meaningful. In this case, SigmaPlot suggests performing a multiple comparison between all the cells.
The result of both comparisons is a listing of the similar and different group pairs, that is, those groups that are and are not detectably different from each other. Because no statistical
test eliminates uncertainty, multiple comparison procedures sometimes produce ambiguous
groupings.
In addition to the numerical results, expanded explanations of the results may also appear.
You can turn off this text on the Options dialog box. For more information, see Setting
Report Options.
You can also set the number of decimal places to display in the Options dialog box.
5.7.7.1.4 Equal Variance Test
This result appears if you enabled normality testing in the Options for Three Way ANOVA
dialog box.
is an estimate of the variance of the underlying population computed from the variability
between levels of the factor.
is an estimate of the variance of the underlying population computed from the variability
associated with the interactions of the factors.
The error mean square (residual, or within groups):
$$MS_{error} = \frac{\text{error sum of squares}}{\text{error degrees of freedom}} = \frac{SS_{error}}{DF_{error}}$$
is an estimate of the variability in the underlying population, computed from the random component of the observations.
F Statistic. The F test statistic is provided for comparisons within each factor and between
the factors.
The F ratio to test each factor is:
$$F_{factor} = \frac{\text{mean square for the factor}}{\text{error mean square}} = \frac{MS_{factor}}{MS_{error}}$$
If the F ratio is around 1, there is no evidence of significant differences between factor levels or of an interaction between factors (that is, the data groups are consistent with the null hypothesis that all the samples were drawn from the same population).
If F is a large number, you can conclude that at least one of the samples for that factor or combination of factors was drawn from a different population (that is, the variability is larger than what is expected from random variability in the population). To determine exactly which groups are different, examine the multiple comparison results.
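For a single factor, the mean squares and the F ratio above reduce to a short calculation. The sketch below is a one-factor simplification with hypothetical data and a hypothetical function name, offered only to show how MS_error, MS_factor, and F relate. In a Three Way ANOVA each factor and interaction has its own sum of squares, and with missing data SigmaPlot uses Type III (adjusted) sums of squares, which this sketch does not compute.

```python
from statistics import fmean


def one_factor_anova(groups):
    """Return (MS_factor, MS_error, F) for one factor with k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(y for g in groups for y in g)
    group_means = [fmean(g) for g in groups]
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_factor = ss_factor / (k - 1)      # DF_factor = k - 1
    ms_error = ss_error / (n_total - k)  # MS_error = SS_error / DF_error
    return ms_factor, ms_error, ms_factor / ms_error  # F = MS_factor / MS_error


print(one_factor_anova([[4, 5, 6], [7, 8, 9], [10, 11, 12]]))  # (27.0, 1.0, 27.0)
```

Here the group means differ far more than the scatter within groups, so F is much larger than 1.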
P Value. The P value is the probability of being wrong in concluding that there is a true difference between the groups (that is, the probability of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater the probability that the samples are drawn from different populations.
Traditionally, you can conclude there are significant differences if P < 0.05.
5.7.7.1.6 Power
The power, or sensitivity, of a Three Way ANOVA is the probability that the test will detect the observed difference among the groups if there really is a difference. The closer the power is to 1, the more sensitive the test. The power for the comparison of the groups within each of the three factors and the power for the comparison of the interactions are all displayed. Power computation is enabled in the Options for Three Way ANOVA dialog box.
ANOVA power is affected by the sample sizes, the number of groups being compared, the chance of erroneously reporting a difference α (alpha), the observed differences of the group means, and the observed standard deviations of the samples.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a difference. An α error is also called a Type I error (a Type I error is when you reject the hypothesis of no effect when this hypothesis is true).
Set the value in the Options for Three Way ANOVA dialog box; the suggested value is α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values of α result in stricter requirements before concluding there is a significant difference, but a greater possibility of concluding there is no difference when one exists (a Type II error). Larger values of α make it easier to conclude that there is a difference, but also increase the risk of seeing a false difference (a Type I error).
Bonferroni t-test Results. The Bonferroni t-test lists the differences of the means for each
pair of groups, computes the t values for each pair, and displays whether or not P < 0.05 for
that comparison. The Bonferroni t-test can be used to compare all groups or to compare
versus a control.
You can conclude from "large" values of t that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of erroneously concluding
that there is a significant difference is less than 5%. If it is greater than 0.05, you cannot
confidently conclude that there is a difference.
The Difference of Means is a gauge of the size of the difference between the levels or cells
being compared.
The degrees of freedom DF for the marginal comparisons are a measure of the number of
groups (levels) within the factor being compared. The degrees of freedom when comparing all
cells is a measure of the sample size after accounting for the factors and interaction. This is
the same as the error or residual degrees of freedom.
Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results. The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s tests can be used to compare a control group to other groups, they are not recommended for this type of comparison.
Dunnett’s test only compares a control group to all other groups. All tests compute the q test
statistic and display whether or not P < 0.05 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a significant difference. p is an indication of the differences in the ranks of the group means being compared. Group means are ranked in order from largest to smallest, and p is the number of means spanned in the comparison. For example, when comparing four means, comparing the largest to the smallest gives p = 4, and comparing the second smallest to the smallest gives p = 2.
If a group is found to be not significantly different from another group, all comparisons among groups whose ranks lie between the ranks of those two groups are also assumed not to be significantly different, and a result of DNT (Do Not Test) appears for those comparisons.
The Difference of Means is a gauge of the size of the difference between the groups or cells
being compared.
The degrees of freedom DF for the marginal comparisons are a measure of the number of
groups (levels) within the factor being compared. The degrees of freedom when comparing all
cells is a measure of the sample size after accounting for the factors and interaction (this is the
same as the error or residual degrees of freedom).
5.7.8.1 How to Create a Three Way ANOVA Report Graph
• Histogram of the residuals. For more information, see 11.1.8 Histogram of Residuals.
• Normal probability plot of the residuals. For more information, see 11.1.9 Normal
Probability Plot.
• Multiple comparison graphs. For more information, see 11.1.15 Multiple Comparison
Graphs.
• Profile plots.
The Create Result Graph dialog box appears displaying the types of graphs available for
the Three Way ANOVA results.
3. Select the type of graph you want to create from the Graph Type list, then click OK.
Ranks on two groups, SigmaPlot tells you to perform a Rank Sum Test instead. For more
information, see 5.4 Mann-Whitney Rank Sum Test.
The null hypothesis you test is that there is no difference in the distribution of values between
the different groups.
The Kruskal-Wallis ANOVA on Ranks is a nonparametric test that does not require assuming
all the samples were drawn from normally distributed populations with equal variances.
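The Kruskal-Wallis statistic works on ranks rather than raw values: all observations are ranked together, and H measures how far the group rank sums depart from what equal distributions would give. The sketch below uses the textbook formula with the usual correction for ties; the function name is hypothetical, and converting H to a P value (chi-square approximation with k - 1 degrees of freedom, or exact tables for small samples) is not shown, nor is SigmaPlot's exact implementation.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for k independent groups, with the usual correction for ties."""
    pooled = sorted(y for g in groups for y in g)
    n = len(pooled)
    counts = Counter(pooled)
    first_position = {}
    for position, y in enumerate(pooled, start=1):
        first_position.setdefault(y, position)
    # Tied values share the average of the ranks they occupy.
    rank = {y: first_position[y] + (counts[y] - 1) / 2 for y in counts}
    rank_sum_term = sum(sum(rank[y] for y in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    tie_correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / tie_correction


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # about 7.2
print(kruskal_wallis_h([[1, 2], [2, 3]]))                   # about 1.5 (one tie)
```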
1. Enter or arrange your data appropriately in the worksheet. For more information, see
5.8.3 Arranging ANOVA on Ranks Data.
2. If desired, set the ANOVA on Ranks options. For more information, see 5.8.4 Setting
the ANOVA on Ranks Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Compare Many Groups→ANOVA on Ranks
5. Run the test. For more information, see 5.8.5 Running an ANOVA on Ranks.
6. Generate report graphs. For more information, see 5.8.8 ANOVA on Ranks Report
Graphs.
5.8.4 Setting the ANOVA on Ranks Options
Columns 1 through 3 are arranged as raw data. Columns 4 and 5 are arranged as indexed data,
with column 4 as the factor column and column 5 as the data column.
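Converting raw data (one worksheet column per group) to the indexed format (a factor column plus a data column) is a simple reshaping. The sketch below assumes empty worksheet cells are represented as None and skipped; the function name, group names, and values are hypothetical, and this is not a SigmaPlot command.

```python
def raw_to_indexed(columns):
    """Turn raw data (one worksheet column per group) into indexed (factor, data) rows."""
    rows = []
    for group_name, values in columns.items():
        for value in values:
            if value is not None:  # skip empty worksheet cells
                rows.append((group_name, value))
    return rows


raw = {"Group 1": [5.1, 4.8, None], "Group 2": [6.0, 6.3, 5.9], "Group 3": [7.2, 7.0, None]}
for factor, value in raw_to_indexed(raw):
    print(factor, value)
```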
1. Select ANOVA on Ranks from the Select Test drop-down list in the Statistics group
on the Analysis tab.
2. Click Current Test Options. The Options for ANOVA on Ranks dialog box appears
with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 5.8.4.1 Options
for ANOVA on Ranks: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 5.8.4.2
Options for ANOVA on Ranks: Results.
• Post Hoc Test. Compute the power or sensitivity of the test and enable multiple
comparisons. For more information, see 5.8.4.3 Options for ANOVA on Ranks: Post
Hoc Tests.
3. To continue the test, click Run Test.
4. To accept the current settings, click OK.
Click the Assumption Checking tab from the options dialog box to view the Normality and
Equal Variance options. The normality assumption test checks for a normally distributed
population. The equal variance assumption test checks the variability about the group means.
• Normality. SigmaPlot uses either the Shapiro-Wilk or Kolmogorov-Smirnov test to test for
a normally distributed population.
• Equal Variance. SigmaPlot tests for equal variance by checking the variability about the
group means.
• P Values for Normality and Equal Variance. Enter the corresponding P value in the
P Value to Reject box. The P value determines the probability of being incorrect in
concluding that the data is not normally distributed (the P value is the risk of falsely
rejecting the null hypothesis that the data is normally distributed). If the P value computed
by the test is greater than the P set here, the test passes.
To require a stricter adherence to normality and/or equal variance, increase the P value. Because parametric statistical methods are relatively robust to violations of their assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and equal variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal.
For example, a P value of 0.050 requires greater deviations from normality to flag the data as
non-normal than a value of 0.100.
Note
There are extreme conditions of data distribution that these tests cannot take into
account. For example, the Levene Median test fails to detect differences in variance of
several orders of magnitude; however, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption tests.
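The Note above refers to the Levene Median test. As an illustration only, the sketch below computes the textbook Brown-Forsythe form of that statistic: a one way ANOVA F computed on the absolute deviations from each group median. It uses only the Python standard library. The function name and data are hypothetical. SigmaPlot's exact computation, including how it converts F to a P value, is an assumption that this guide does not document.

```python
from statistics import mean, median


def levene_median_f(groups):
    """Brown-Forsythe (Levene median) F statistic: a one way ANOVA F computed
    on the absolute deviations of each observation from its group median."""
    deviations = []
    for g in groups:
        m = median(g)
        deviations.append([abs(x - m) for x in g])
    k = len(deviations)                          # number of groups
    n_total = sum(len(d) for d in deviations)    # number of observations
    grand = mean([v for d in deviations for v in d])
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations)
    ss_within = sum((v - mean(d)) ** 2 for d in deviations for v in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Two made-up groups; the second is more spread out than the first
print(levene_median_f([[1, 2, 3], [2, 4, 6]]))  # approximately 0.8
```

The F value is then compared to an F distribution with k − 1 and N − k degrees of freedom (k groups, N observations). That step is omitted here because the Python standard library has no F distribution.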
Figure 5.53 The Options for ANOVA on Ranks Dialog Box Displaying the
Summary Table Option
in the box, a difference in the groups is detected and the multiple comparisons are performed.
For more information, see Setting Report Options.
Multiple Comparisons. You can choose to always perform multiple comparisons or to only
perform multiple comparisons if the ANOVA on Ranks detects a difference.
• Always Perform. Select to perform multiple comparisons whether or not the ANOVA
detects a difference.
• Only When ANOVA P Value is Significant. Select to perform multiple comparisons
only if the ANOVA detects a difference.
• Significance Value for Multiple Comparisons. Select a value from the Significance Value
for Multiple Comparisons drop-down list. This value determines the likelihood of the
multiple comparison being incorrect in concluding that there is a significant difference
in the treatments.
A value of 0.05 indicates that the multiple comparisons will detect a difference if there is less
than a 5% chance that the multiple comparison is incorrect in detecting a difference.
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method. For more information, see 5.8.6 Multiple
Comparison Options for ANOVA on Ranks.
Attention
Because no statistical test eliminates uncertainty, multiple comparison tests sometimes
produce ambiguous groupings.
5.8.5 Running an ANOVA on Ranks
If you want to select your data before you run the test, drag the pointer over your data.
To run an ANOVA on Ranks:
Figure 5.54 The Pick Columns for ANOVA on Ranks Dialog Box Prompting
You to Specify A Data Format
4. Select the appropriate data format from the Data Format drop-down list. For more
information, see 5.2 Data Format for Group Comparison Tests.
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
6. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list.
Figure 5.55 The Pick Columns for ANOVA on Ranks Dialog Box Prompting
You to Select Data Columns
The number or title of each selected column appears in its row. You are prompted to pick a
minimum of two and a maximum of 64 columns for raw data, and two columns containing at
least three treatments for indexed data. If you have fewer than three treatments,
a message appears telling you to use the Rank Sum Test. For more information, see 5.4
Mann-Whitney Rank Sum Test.
8. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in
the Selected Columns list.
9. If you elected to test for normality and equal variance, and your data fails either test,
either continue or transform your data, then perform the test on the transformed data.
10. Click Finish to perform the ANOVA on Ranks. The ANOVA on Ranks report appears if
you:
• Elected to test for normality and equal variance, and your data passes both tests
• Selected not to perform multiple comparisons, or selected to run multiple comparisons
only when the P value is significant and the P value is not significant. For
more information, see 5.8.7 Interpreting ANOVA on Ranks Results.
11. If the P value for multiple comparisons is significant, or you selected to always perform
multiple comparisons, the Multiple Comparisons Options dialog box appears prompting
you to select a multiple comparison method. For more information, see 5.8.6 Multiple
Comparison Options for ANOVA on Ranks.
5.8.6 Multiple Comparison Options for ANOVA on Ranks
In addition to the numerical results, expanded explanations of the results may also appear.
You can turn off this text on the Options dialog box. For more information, see Setting
Report Options.
You can also set the number of decimal places to display in the Options dialog box.
Normality test results display whether the data passed or failed the test of the assumption
that it was drawn from a normal population and the P value calculated by the test. For
nonparametric procedures, this test can fail, since nonparametric tests do not assume normally
distributed source populations.
These results appear unless you disabled normality testing in the Options for ANOVA on
Ranks dialog box.
Equal Variance test results display whether the data passed or failed the test of the
assumption that the samples were drawn from populations with the same variance and the
P value calculated by the test. Nonparametric tests do not assume equal variances of the
source populations.
These results appear unless you disabled equal variance testing in the Options for ANOVA
on Ranks dialog box.
5.8.7.1.3 Summary Table
If you selected this option in the Options for ANOVA on Ranks dialog box, SigmaPlot
generates a summary table listing the medians, the percentiles defined in the Options dialog
box, and sample sizes N.
N (Size). The number of non-missing observations for that column or group.
Missing. The number of missing values for that column or group.
Median. The "middle" observation as computed by listing all the observations from smallest
to largest and selecting the largest value of the smallest half of the observations. The median
observation has an equal number of observations greater than and less than that observation.
Percentiles. The two percentile points that define the upper and lower tails of the observed
values.
5.8.7.1.4 H Statistic
The ANOVA on Ranks test statistic H is computed by ranking all observations from smallest
to largest without regard for treatment group. The average rank of each treatment group is
then computed, and these averages are compared.
For large sample sizes, this value is compared to the chi-square distribution (an approximation
of the distribution of all possible values of H) to determine the probability of obtaining this H.
For small sample sizes, the actual distribution of H is used.
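The ranking procedure described above is the Kruskal-Wallis test. The following is a minimal standard-library sketch of the textbook formula. Tied values receive their average rank, and the usual tie correction is applied, though whether SigmaPlot applies exactly this correction is an assumption. The P value uses the chi-square approximation with k − 1 degrees of freedom, which, as noted above, is intended for large samples. The function names and the three-observation groups are hypothetical and only show the arithmetic.

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter


def mid_ranks(values):
    """Rank from smallest to largest; tied values share their average rank."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Rank all observations together, ignoring group, then compare group rank sums."""
    pooled = [x for g in groups for x in g]
    ranks = mid_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Standard tie correction: 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def chi2_sf(x, df):
    """Upper-tail chi-square probability from the series for the lower
    regularized incomplete gamma function (adequate for moderate x)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(h, chi2_sf(h, df=len(groups) - 1))  # H is 7.2 (up to rounding); P is about 0.027
```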
If H is small, the average ranks observed in each treatment group are approximately the same.
You can conclude that the data is consistent with the null hypothesis that all the samples were
drawn from the same population (that is, no treatment effect).
If H is large, the variability among the average ranks is larger than expected from
random variability in the population, and you can conclude that the samples were drawn
from different populations (that is, the differences between the groups are statistically
significant).
P Value. The P value is the probability of being wrong in concluding that there is a true
difference in the groups (that is, the probability of falsely rejecting the null hypothesis,
or committing a Type I error, based on H). The smaller the P value, the stronger the evidence
that the samples are significantly different. Traditionally, you can conclude there are
significant differences when P < 0.05.
If a difference is found among the groups and you elected to perform multiple
comparisons, a table of the comparisons between group pairs is displayed. The multiple
comparison procedure is activated in the Options for ANOVA on Ranks dialog box. The test
used in the multiple comparison procedure is selected in the Multiple Comparison Options
dialog box.
Multiple comparison results are used to determine exactly which groups are different, since
the ANOVA results only inform you that two or more of the groups are different. The specific
type of multiple comparison results depends on the comparison test used and whether the
comparison was made pairwise or versus a control.
• All pairwise comparison results list comparisons of all possible combinations of group
pairs. The all pairwise tests are the Tukey, Student-Newman-Keuls, and Dunn’s tests.
• Comparisons versus a single control list only comparisons with the selected control group.
The control group is selected during the actual multiple comparison procedure. The
comparison versus a control tests are Dunnett’s test and Dunn’s test.
Tukey, Student-Newman-Keuls, and Dunnett’s Test Results
The Tukey and Student-Newman-Keuls (SNK) tests are all pairwise comparisons of every
combination of group pairs. Dunnett’s test only compares a control group to all other groups.
All tests compute the q test statistic. They also display the number of rank sums spanned in the
comparison, p, and whether or not P < 0.05 or < 0.01 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the probability of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
The Difference of Ranks is a gauge of the size of the real difference between the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs to be to
indicate a significant difference. p is an indication of the differences in the ranks of the group
means being compared. Group rank sums are ranked in order from largest to smallest in an
SNK or Dunnett’s test, so p is the number of rank sums spanned in the comparison. For
example, when comparing four rank sums, comparing the largest to the smallest gives p = 4, and
comparing the second smallest to the smallest gives p = 2.
If a group is found to be not significantly different than another group, all groups with ranks
in between the rank sums of the two groups that are not different are also assumed not to be
significantly different, and a result of DNT (Do Not Test) appears for those comparisons.
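To make the counting of p and the DNT rule concrete, the sketch below orders the group rank sums from largest to smallest and computes p as the number of rank sums spanned. It then tests pairs from the widest span inward, marking any pair nested inside a non-significant pair as DNT. The group labels and rank sums are made up. `is_significant` is a hypothetical stand-in for the q test that SigmaPlot actually performs.

```python
from itertools import combinations


def span(rank_sums, a, b):
    """p: the number of rank sums spanned when comparing groups a and b,
    counting both endpoints, after ordering from largest to smallest."""
    order = sorted(rank_sums, key=rank_sums.get, reverse=True)
    return abs(order.index(a) - order.index(b)) + 1


def stepwise_comparisons(rank_sums, is_significant):
    """Test pairs from the widest span inward; pairs that lie between two
    groups already found not different are reported as DNT (Do Not Test)."""
    order = sorted(rank_sums, key=rank_sums.get, reverse=True)
    pairs = sorted(combinations(range(len(order)), 2),
                   key=lambda ij: ij[1] - ij[0], reverse=True)
    not_different = []  # (i, j) position ranges already found not different
    results = {}
    for i, j in pairs:
        a, b = order[i], order[j]
        if any(lo <= i and j <= hi for lo, hi in not_different):
            results[(a, b)] = "DNT"
        elif is_significant(a, b, j - i + 1):
            results[(a, b)] = "significant"
        else:
            results[(a, b)] = "not significant"
            not_different.append((i, j))
    return results


# Hypothetical rank sums for four groups, and a stand-in for the q test
rank_sums = {"A": 40, "B": 30, "C": 20, "D": 10}
significant_pairs = {("A", "D"), ("B", "D"), ("C", "D")}

print(span(rank_sums, "A", "D"))  # 4: largest versus smallest
print(span(rank_sums, "C", "D"))  # 2: second smallest versus smallest
print(stepwise_comparisons(rank_sums, lambda a, b, p: (a, b) in significant_pairs))
```

In this made-up case, A versus C is not significant. A versus B and B versus C lie between those two groups, so they are reported as DNT. C versus D is still tested.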
Dunn’s Test Results
Dunn’s test is used to compare all groups or to compare versus a control.
Dunn’s test lists the difference of rank means, computes the Q test statistic, and displays
whether or not P < 0.05, for each group pair.
You can conclude from "large" values of Q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
The Difference of Rank Means is a gauge of the size of the difference between the two groups.
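Dunn's Q for a single pair is commonly computed as the difference of mean ranks divided by its standard error, √(N(N + 1)/12 · (1/nᵢ + 1/nⱼ)), where N is the total number of observations. The sketch below uses that textbook form without a tie correction. This guide does not state whether SigmaPlot adds a tie correction or how it turns Q into a P value, so treat this as an illustration only. Function names and data are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def mid_ranks(values):
    """Rank from smallest to largest; tied values share their average rank."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]


def dunn_q(groups, i, j):
    """Return (difference of rank means, Q) for groups i and j."""
    pooled = [x for g in groups for x in g]
    ranks = mid_ranks(pooled)
    n = len(pooled)
    rank_means, start = [], 0
    for g in groups:
        rank_means.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    diff = rank_means[i] - rank_means[j]
    se = math.sqrt(n * (n + 1) / 12 * (1 / len(groups[i]) + 1 / len(groups[j])))
    return diff, diff / se


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(dunn_q(groups, 2, 0))  # difference of rank means 6.0, Q about 2.68
```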
5.9 Performing a Multiple Comparison
The selected graph appears in a graph window. For more information, see 11.1 Generating
Report Graphs.
The multiple comparison test you choose depends on the treatments you are testing. Click
Cancel if you do not want to perform a multiple comparison test.
To perform a multiple comparison test:
1. Select which factors you wish to compare under Select Factors to Compare.
This option is automatically selected if the P value produced by the ANOVA (displayed
in the upper left corner of the dialog box) is less than or equal to the P value set in the
Options dialog box, and multiple comparisons are performed. If the P value displayed
in the dialog box is greater than the P value set in the Options dialog box, multiple
comparisons are not performed. For more information, see Setting Report Options.
2. Select the desired multiple comparison test from the Suggested Test drop-down list.
3. Select a Comparison Type. The types of comparisons available depend on the selected
test. All Pairwise compares all possible pairs of treatments and is available for the Tukey,
Student-Newman-Keuls, Bonferroni, Fisher LSD, and Duncan’s tests.
Versus Control compares all experimental treatments to a single control group and is
available for the Tukey, Bonferroni, Fisher LSD, Dunnett’s, and Duncan’s tests. It is not
recommended for the Tukey, Fisher LSD, or Duncan’s test.
4. If you select Versus Control, you must also select the control group from the list of
groups.
5. If you selected an all pairwise comparison test, click Finish to continue with the test and
view the report. For more information, see 5.5.7 Interpreting One Way ANOVA Results.
6. If you selected a multiple comparison versus a control test, click Next. The Multiple
Comparisons Options dialog box prompts you to select a control group. Select the
desired control group from the list, then click Finish to continue the test and view the
report.
5.9.4 Bonferroni t-Test
6 Comparing Repeated Measurements of the Same Individuals
Topics Covered in this Chapter
♦ About Repeated Measures Tests
♦ Data Format for Repeated Measures Tests
♦ Paired t-Test
♦ Wilcoxon Signed Rank Test
♦ One Way Repeated Measures Analysis of Variance (ANOVA)
♦ Two Way Repeated Measures Analysis of Variance (ANOVA)
♦ Friedman Repeated Measures Analysis of Variance on Ranks
Use repeated measures procedures to test for differences in the same individuals before and
after one or more different treatments or changes in condition.
When comparing random samples from two or more groups consisting of different individuals,
use group comparison tests. For more information, see 3.2 Choosing the Procedure to Use.
• Wilcoxon Signed Rank Test. This is a nonparametric test. For more information, see
6.4 Wilcoxon Signed Rank Test.
6.2.1 Raw Data
Columns 1 and 2 in the worksheet above are arranged as raw data. Columns 3, 4, and 5
are arranged as indexed data, with column 3 as the subject column, column 4 as the factor
column, and column 5 as the data column.
1. Enter or arrange your data in the worksheet. For more information, see 6.3.2 Arranging
Paired t-Test Data.
2. If desired, set the Paired t-test options. For more information, see 6.3.3 Setting Paired
t-Test Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Before and After→Paired t-test
5. Run the test. For more information, see 6.3.4 Running a Paired t-Test.
6. Generate report graphs. For more information, see 6.3.6 Paired t-Test Report Graphs.
6.3.3 Setting Paired t-Test Options
1. Select Paired t-test from the Select Test drop-down list in the Statistics group on the
Analysis tab.
2. Click Options. The Options for Paired t-test dialog box appears with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 6.3.3.1 Options
for Paired t-test: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 6.3.3.2
Options for Paired t-Test: Results.
• Post Hoc Tests. Compute the power or sensitivity of the test. For more information,
see 5.3.4.3 Options for t-Test: Post Hoc Tests.
Tip
If you are going to run the test after changing test options, and want to select your
data before you run the test, drag the pointer over your data.
Options settings are saved between SigmaPlot sessions.
3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more
information, see 6.3.4 Running a Paired t-Test.
To accept the current settings and close the options dialog box, click OK.
Figure 6.3 The Options for Paired t-test Dialog Box Displaying the Assumption
Checking Options
Normality. SigmaPlot uses either the Shapiro-Wilk or Kolmogorov-Smirnov test to test for a
normally distributed population.
• P Value to Reject. Enter the corresponding P value in the P Value to Reject box. The
P value determines the probability of being incorrect in concluding that the data is not
normally distributed (the P value is the risk of falsely rejecting the null hypothesis that the
data is normally distributed). If the P value computed by the test is greater than the P
set here, the test passes.
To require a stricter adherence to normality, increase the P value. Because parametric
statistical methods are relatively robust to violations of the assumptions, the
suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less
evidence to conclude that data is not normal.
To relax the requirement of normality, decrease P. Requiring smaller values of P to reject
the normality assumption means that you are willing to accept greater deviations from the
theoretical normal distribution before you flag the data as non-normal. For example, a P
value of 0.050 requires greater deviations from normality to flag the data as non-normal
than a value of 0.100.
Restriction
Although the normality test is robust in detecting data from populations that are
non-normal, there are extreme conditions of data distribution that this test cannot
take into account; however, these conditions should be easily detected by simply
examining the data without resorting to the automatic assumption test.
6.3.3.2 Options for Paired t-Test: Results
Summary Table. Displays, for each column or group, the number of observations, the number
of missing values, the mean, the standard deviation, and the standard error of the mean.
Confidence Intervals. Displays the confidence interval for the difference of the means.
To change the interval, enter any number from 1 to 99 (95 and 99 are the most commonly
used intervals).
Residuals in Column. Displays residuals in the report and saves the residuals of the test to
the specified worksheet column. Edit the number or select a number from the drop-down list.
Power. The power or sensitivity of a test is the probability that the test will detect a difference
between the groups if there is really a difference.
Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there
is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance
of error is acceptable, or that you are willing to conclude there is a significant difference
when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists.
Larger values of α make it easier to conclude that there is a difference, but also increase the
risk of reporting a false positive.
Figure 6.4 The Options for Paired t-test Dialog Box Displaying the Power Option
If you want to select your data before you run the test, drag the pointer over your data.
6.3.4 Running a Paired t-Test
Figure 6.5 The Pick Columns for Paired t-test Dialog Box Prompting You to
Specify a Data Format
3. Select the appropriate data format (Raw or Indexed) from the Data Format drop-down
list. For more information, see 6.2 Data Format for Repeated Measures Tests.
4. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
Figure 6.6 The Pick Columns for Paired t-test Dialog Box Prompting You to
Select Data Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns. For statistical summary data you are prompted to select
three columns.
6. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Finish to run the t-test on the selected columns. After the computations are
completed, the report appears. For more information, see 6.3.5 Interpreting Paired t-Test
Results.
Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
6.3.5.2 Summary Table
This result appears unless you disabled normality testing in the Paired t-test Options dialog
box. For more information, see 6.3.3 Setting Paired t-Test Options.
6.3.5.3 Difference
The difference of the group before and after the treatment is described in terms of the mean
of the differences (changes) in the subjects before and after the treatment, and the standard
deviation and standard error of the mean difference.
The standard error of the mean difference is a measure of the precision with which the mean
difference estimates the true difference in the underlying population.
6.3.5.4 t Statistic
The t-test statistic is computed by subtracting the value before the intervention from the
value observed after the intervention in each experimental subject. The remaining analysis
is conducted on these differences.
The t-test statistic is the ratio:
$$ t = \frac{\text{mean difference of the subjects before and after}}{\text{standard error of the mean difference}} $$
You can conclude from large (bigger than ~2) absolute values of t that the treatment affected
the variable of interest (you reject the null hypothesis of no difference). A large t indicates that
the difference in observed value after and before the treatment is larger than would be
expected from random variability alone (that is, that the effect is statistically significant).
A small t (near 0) indicates that there is no significant difference between the samples (little
difference in the means before and after the treatment).
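The computation can be sketched in a few lines of standard-library Python. Take the after-minus-before difference for each subject, then divide the mean difference by its standard error (the standard deviation of the differences divided by √n). The degrees of freedom are n − 1. The function name and data are hypothetical. Converting t to a P value needs a t distribution (for example, `scipy.stats.t.sf`), which is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic computed on the after-minus-before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, se, mean_diff / se, n - 1  # difference, SE, t, degrees of freedom


before = [10, 12, 9, 11]
after = [12, 14, 10, 13]
print(paired_t(before, after))  # (1.75, 0.25, 7.0, 3)
```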
Degrees of Freedom. The degrees of freedom is a measure of the sample size, which affects
the ability of t to detect differences in the mean effects. As degrees of freedom increase, the
ability to detect a difference with a smaller t increases.
P Value. The P value is the probability of being wrong in concluding that there is a true effect
(that is, the probability of falsely rejecting the null hypothesis, or committing a Type I
error, based on t). The smaller the P value, the stronger the evidence that the treatment effect
is significant. Traditionally, you can conclude there is a significant difference when P < 0.05.
6.3.5.6 Power
The power, or sensitivity, of a Paired t-test is the probability that the test will detect a
difference between treatments if there really is a difference. The closer the power is to 1,
the more sensitive the test.
Paired t-test power is affected by the sample sizes, the chance of erroneously reporting a
difference α (alpha), the observed differences of the subject means, and the observed standard
deviations of the samples.
This result is displayed unless you disabled it in the Options for Paired t-test dialog box. For
more information, see 6.3.3 Setting Paired t-Test Options.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error. A Type I error is when you reject the
hypothesis of no effect when this hypothesis is true.
Set the value in the Options for Paired t-test dialog box; the suggested value is α = 0.05
which indicates that a one in twenty chance of error is acceptable. Smaller values of α result
in stricter requirements before concluding there is a significant difference, but a greater
possibility of concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference but also increase the risk
of seeing a false difference (a Type I error).
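The factors listed above can be seen in a rough sketch. This guide does not describe SigmaPlot's own power calculation. The code below uses a normal approximation to the two-sided paired t-test instead of the t distribution, so it tends to overstate power for small samples. The function name and inputs are hypothetical.

```python
from statistics import NormalDist


def paired_t_power_approx(mean_diff, sd_diff, n, alpha=0.05):
    """Approximate two-sided power of a paired t-test.

    Uses the standard normal distribution in place of the t distribution,
    so the result is only a rough guide for small samples."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic for the assumed difference and SD
    shift = abs(mean_diff) / (sd_diff / n ** 0.5)
    # Probability of landing beyond either critical value
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(paired_t_power_approx(0.0, 0.5, 4))   # no true difference: equals alpha, about 0.05
print(paired_t_power_approx(1.75, 0.5, 4))  # large difference relative to SD: close to 1
```

With no true difference, the approximate power equals α. It rises toward 1 as the mean difference, the sample size, or α increases, or as the standard deviation decreases.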
6.3.6.1 How to Create a Graph of the Paired t-test Data
The Create Graph dialog box appears displaying the types of graphs available for the
Paired t-test results.
Figure 6.7 The Create Graph Dialog Box for Paired t-test Report Graphs
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
6.4.1 About the Signed Rank Test
1. Enter or arrange your data in the data worksheet. For more information, see 6.4.3
Arranging Signed Rank Data.
2. If desired, set the Signed Rank Test options. For more information, see 6.4.4 Setting
Signed Rank Test Options.
3. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Before and After→Signed Rank Test
4. Run the test. For more information, see 6.4.5 Running a Signed Rank Test.
5. Generate report graphs. For more information, see 6.4.7 Signed Rank Test Report Graphs.
Figure 6.9 Valid Data Formats for a Wilcoxon Signed Rank Test
Columns 1 and 2 are arranged as raw data. Columns 3 and 4 are arranged as indexed data,
with column 3 as the factor column.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. On the Analysis tab, in the Statistics group, click Options. The Options for Signed
Rank Test dialog box appears with two tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality. For more information, see 6.4.4.1 Options for Signed Rank
Test: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report. For more information, see 6.4.4.2 Options for Signed Rank Test: Results.
3. To continue the test, click Run Test. The Pick Columns dialog box appears. For more
information, see 6.4.5 Running a Signed Rank Test.
4. To accept the current settings and close the options dialog box, click OK.
6.4.4.1 Options for Signed Rank Test: Assumption Checking
Note
Equal Variance is not available for the Signed Rank Test because Signed Rank Tests
are based on changes in each individual rather than on different individuals in the
selected population, making equal variance testing unnecessary.
Normality. SigmaPlot uses either the Shapiro-Wilk or Kolmogorov-Smirnov test to test for a
normally distributed population.
P Value to Reject. Enter the corresponding P value in the P Value to Reject box. The P value
determines the probability of being incorrect in concluding that the data is not normally
distributed (the P value is the risk of falsely rejecting the null hypothesis that the data is
normally distributed). If the P value computed by the test is greater than the P set here, the
test passes.
To require a stricter adherence to normality, increase the P value. Because parametric
statistical methods are relatively robust to violations of the assumptions, the
suggested value in SigmaPlot is 0.050. Larger values of P (for example, 0.100) require less
evidence to conclude that data is not normal.
To relax the requirement of normality, decrease P. Requiring smaller values of P to reject
the normality assumption means that you are willing to accept greater deviations from the
theoretical normal distribution before you flag the data as non-normal. For example, a P
value of 0.050 requires greater deviations from normality to flag the data as non-normal
than a value of 0.100.
Restriction
Although this assumption test is robust in detecting data from populations that are
non-normal, there are extreme conditions of data distribution that this test cannot
take into account; however, these conditions should be easily detected by simply
examining the data without resorting to the automatic assumption test.
To run a test, you need to select the data to test by dragging the pointer over your data. Then
use the Pick Columns dialog box to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.
To run a Signed Rank Test:
1. On the Analysis tab, in the Statistics group, from the Tests drop-down list, select:
Before and After→Signed Rank Test
The Pick Columns dialog box appears prompting you to specify a data format.
Figure 6.10 The Pick Columns for Signed Rank Test Dialog Box Prompting
You to Specify a Data Format
2. Select the appropriate data format from the Data Format drop-down list.
If your data is grouped in columns, select Raw. If your data is in the form of a group
index column(s) paired with a data column(s), select Indexed.
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and all
successively selected columns are assigned to successive rows in the list.
The number or title of each selected column appears in its row. You are prompted to pick
two columns for raw data and three columns for indexed data.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to perform the test. If you elected to test for normality, SigmaPlot performs
the test for normality (Shapiro-Wilk or Kolmogorov-Smirnov). If your data pass the test,
SigmaPlot informs you and suggests continuing your analysis using a Paired t-test.
When the test is complete, the report appears displaying the results of the Signed Rank
Test.
6.4.6 Interpreting Signed Rank Test Results
Normality test results display whether the data passed or failed the test of the assumption that
the differences before and after treatment were drawn from a normal distribution, and the P value calculated
by the test. For nonparametric procedures this test can fail, since nonparametric tests do not
require normally distributed source populations. This result appears unless you disabled
normality testing in the Options for Signed Rank Test dialog box. For more information,
see 6.4.4 Setting Signed Rank Test Options.
SigmaPlot generates a summary table listing the sample sizes N, number of missing values
(if any), medians, and percentiles. All of these results are displayed in the report unless you
disable them in the Signed Rank Test Options dialog box. For more information, see 6.4.4
Setting Signed Rank Test Options.
N (Size). The number of non-missing observations for that column or group.
Missing. The number of missing values for that column or group.
Medians. The "middle" observation as computed by listing all the observations from smallest
to largest and selecting the largest value of the smallest half of the observations. The median
observation has an equal number of observations greater than and less than that observation.
Percentiles. The two percentile points that define the upper and lower tails of the observed
values.
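If you want to check the summary table values outside SigmaPlot, the following minimal Python sketch computes N, Missing, the median, and two percentile points with the standard library `statistics` module. It assumes that missing values are stored as `None` and that the two percentile points are the 25th and 75th percentiles. Both are assumptions, and SigmaPlot's percentile interpolation rule may differ from the `inclusive` method used here. `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(column):
    """Return N, missing count, median, and 25th/75th percentiles for one column.

    Missing values are assumed to be stored as None (a hypothetical convention).
    """
    observed = [x for x in column if x is not None]
    # "inclusive" interpolation is one common percentile rule; SigmaPlot's may differ
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "N": len(observed),
        "Missing": len(column) - len(observed),
        "Median": statistics.median(observed),  # averages the two middle values when N is even
        "25%": q1,
        "75%": q3,
    }


before = [12.0, 15.5, None, 9.8, 14.1, 11.3]
print(summarize(before))
```

For this column, five values are observed and one is missing. The median is 12.0, and the 25th and 75th percentiles fall on 11.3 and 14.1.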
6.4.6.3 W Statistic
The Wilcoxon test statistic W is computed by ranking the absolute values of the differences
between the observations before and after the treatment, then attaching the signs of the
differences to the corresponding ranks. The signed ranks are summed and compared.
If the absolute value of W is "large", you can conclude that there was a treatment effect (for
example, the ranks tend to have the same sign, so there is a statistically significant difference
before and after the treatment).
If W is small, the positive ranks are similar to the negative ranks, and you can conclude
that there is no treatment effect.
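The ranking and summing described above can be sketched in Python as follows. This is a minimal illustration, not SigmaPlot's implementation. The hypothetical `signed_rank_w` function computes differences as after minus before, drops zero differences, and gives tied absolute differences their average rank. These are common conventions that SigmaPlot may handle differently. The sketch computes W only, not the P value.

```python
def signed_rank_w(before, after):
    """Wilcoxon signed rank statistic W: the sum of the signed ranks of (after - before)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # tied absolute differences share the average of their ranks
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        start = end + 1
    # attach the sign of each difference to its rank, then sum
    return sum(r if d > 0 else -r for r, d in zip(ranks, diffs))


before = [10, 12, 9, 14, 11]
after = [13, 15, 8, 18, 11]
print(signed_rank_w(before, after))  # 8.0
```

For the example data, the differences are 3, 3, -1, and 4; the unchanged fifth subject is dropped. The signed ranks are 2.5, 2.5, -1, and 4, so W is 8.0.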
P Value. The P value is the probability of being wrong in concluding that there is a true
effect (for example, the probability of falsely rejecting the null hypothesis, or committing a
Type I error, based on W). The smaller the P value, the greater the probability that there
is a treatment effect.
Traditionally, you can conclude there is a significant difference when P < 0.05.
The Create Graph dialog box appears displaying the types of graphs available for the
Signed Rank Test results.
Figure 6.12 The Create Graph Dialog Box for the Signed Rank Test Report
3. Select the type of graph you want to create from the Graph Type list.
4. Click OK, or double-click the desired graph in the list. For more information, see 11.1
Generating Report Graphs. The specified graph appears in a graph window or in the report.
6.5 One Way Repeated Measures Analysis of Variance (ANOVA)
Tip
Depending on your One Way Repeated Measures ANOVA option settings, if you
attempt to perform an ANOVA on a non-normal population, SigmaPlot informs you
that the data is unsuitable for a parametric test and suggests the Friedman ANOVA
on Ranks instead.
1. Enter or arrange your data in the worksheet. For more information, see 6.5.3 Arranging
One Way Repeated Measures ANOVA Data.
2. If desired, set One Way Repeated Measures ANOVA options. For more information, see
6.5.4 Setting One Way Repeated Measures ANOVA Options.
3. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→One Way Repeated Measures ANOVA
4. Run the test. For more information, see 6.5.5 Running a One Way Repeated Measures
ANOVA.
5. Generate report graphs. For more information, see 6.5.8 One Way Repeated Measures
ANOVA Report Graphs.
6.5.4 Setting One Way Repeated Measures ANOVA Options
Figure 6.14 Valid Data Formats for a One Way Repeated Measures ANOVA
Columns 1 through 3 in the worksheet above are arranged as raw data. Columns 4, 5, and 6
are arranged as indexed data, with column 4 as the treatment index column, column 5 as
the subject index column, and column 6 as the data column.
Missing Data Points
If there are missing values, SigmaPlot automatically handles the missing data by using a
general linear model. This approach constructs hypothesis tests using the marginal sums
of squares (also commonly called the Type III or adjusted sums of squares); however, the
columns must still be equal in length.
1. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→One Way Repeated Measures ANOVA
2. Click Options in the Statistics group. The Options for One Way RM ANOVA dialog
box appears with three tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance. For more information, see 6.5.4.1 Options
for One Way Repeated Measures ANOVA: Assumption Checking.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column. For more information, see 6.5.4.2
Options for One Way RM ANOVA: Results.
• Post Hoc Test. Compute the power or sensitivity of the test and enable multiple
comparisons. For more information, see 6.5.4.3 Options for One Way RM ANOVA:
Post Hoc Tests.
3. To continue the test, click Run Test. For more information, see 6.5.5 Running a One
Way Repeated Measures ANOVA.
4. To accept the current settings and close the options dialog box, click OK.
Figure 6.15 The Options for One Way RM ANOVA Dialog Box Displaying the
Assumption Checking Options
6.5.4.2 Options for One Way RM ANOVA: Results
• Equal Variance Testing. SigmaPlot tests for equal variance by checking the variability
about the group means.
• P Values for Normality and Equal Variance. The P value determines the probability of
being incorrect in concluding that the data is not normally distributed (P value is the risk of
falsely rejecting the null hypothesis that the data is normally distributed). If the P computed
by the test is greater than the P set here, the test passes.
To require a stricter adherence to normality and/or equal variance, increase the P value.
Because parametric statistical methods are relatively robust to violations of their
assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for
example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal.
For example, a P value of 0.010 requires greater deviations from normality to flag the data as
non-normal than a value of 0.050.
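The pass/fail rule above, and the statistic behind a Levene Median style equal variance test, can be sketched in Python. The sketch assumes that the Levene Median test is the Brown-Forsythe variant of Levene's test, which is a one way ANOVA F computed on the absolute deviations from each group's median. SigmaPlot's exact computation is not described here. Converting F to a P value requires the F distribution with (k - 1, N - k) degrees of freedom, which the Python standard library does not provide. For that reason, `assumption_passes` takes the computed P as an input. Both function names are hypothetical.

```python
import statistics


def levene_median_f(groups):
    """Brown-Forsythe (Levene median) F: one way ANOVA on |x - group median|."""
    devs = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    group_means = [statistics.fmean(d) for d in devs]
    ss_between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    ss_within = sum((x - m) ** 2 for d, m in zip(devs, group_means) for x in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def assumption_passes(p_computed, p_threshold=0.05):
    """A normality or equal variance check passes when the computed P exceeds the threshold."""
    return p_computed > p_threshold


groups = [[1, 2, 3, 4], [2, 4, 6, 8]]
print(levene_median_f(groups))
print(assumption_passes(0.20), assumption_passes(0.03))  # True False
```

Lowering `p_threshold` to 0.01 relaxes the check: a computed P of 0.03 then passes.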
Note
There are extreme conditions of data distribution that these tests cannot take into
account. For example, the Levene Median test fails to detect differences in variance of
several orders of magnitude; however, these conditions should be easily detected by
simply examining the data without resorting to the automatic assumption tests.
Summary Table. Select to display the number of observations for a column or group, the
number of missing values for a column or group, the average value for the column or group,
the standard deviation of the column or group, and the standard error of the mean for the
column or group.
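The Summary Table columns can be reproduced with a short Python sketch. It assumes that missing values are stored as `None` and that the standard deviation is the sample standard deviation (n - 1 denominator). The standard error of the mean is computed as SD / sqrt(N). `summary_row` is a hypothetical name.

```python
import math
import statistics


def summary_row(column):
    """N, missing count, mean, standard deviation, and SEM for one column.

    Missing values are assumed to be stored as None (a hypothetical convention).
    """
    observed = [x for x in column if x is not None]
    n = len(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (n - 1 denominator)
    return {
        "N": n,
        "Missing": len(column) - n,
        "Mean": statistics.fmean(observed),
        "Std Dev": sd,
        "SEM": sd / math.sqrt(n),  # standard error of the mean
    }


print(summary_row([2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```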
Confidence Intervals. Select to display the confidence interval for the difference of the
means. To change the interval, enter any number from 1 to 99 (95 and 99 are the most
commonly used intervals).
Residuals in Column. Select to display residuals in the report and to save the residuals of
the test to the specified worksheet column. Edit the number or select a number from the
drop-down list.
Figure 6.16 The Options for One Way ANOVA Dialog Box Displaying the
Summary Table Options
Power. The power or sensitivity of a test is the probability that the test will detect a difference
between the groups if there is really a difference.
Use Alpha Value. Alpha (α) is the acceptable probability of incorrectly concluding that there
is a difference. The suggested value is α = 0.05. This indicates that a one in twenty chance
of error is acceptable, or that you are willing to conclude there is a significant difference
when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists.
Larger values of α make it easier to conclude that there is a difference, but also increase the
risk of reporting a false positive.
6.5.4.3 Options for One Way RM ANOVA: Post Hoc Tests
Figure 6.17 The Options for One Way ANOVA Dialog Box Displaying the Power
and Multiple Comparison Options
Multiple Comparisons
A One Way Repeated Measures ANOVA tests the hypothesis of no differences between the
several treatment groups, but does not determine which groups are different, or the sizes of
these differences. Multiple comparison procedures isolate these differences.
The P value used to determine if the ANOVA detects a difference is set on the Report tab of
the Options dialog box. If the P value produced by the One Way ANOVA is less than the P
value specified in the box, a difference in the groups is detected and the multiple comparisons
are performed. For more information, see Setting Report Options.
• Always Perform. Select to perform multiple comparisons whether or not the ANOVA
detects a difference.
• Only When ANOVA P Value is Significant. Select to perform multiple comparisons
only if the ANOVA detects a difference.
• Significance Value for Multiple Comparisons. Select either .05 or .01 from the
Significance Value for Multiple Comparisons drop-down list. This value determines the
likelihood of the multiple comparison being incorrect in concluding that there is
a significant difference in the treatments.
A value of .05 indicates that the multiple comparisons will detect a difference if there is less
than 5% chance that the multiple comparison is incorrect in detecting a difference. A value of
.10 indicates that the multiple comparisons will detect a difference if there is less than 10%
chance that the multiple comparison is incorrect in detecting a difference.
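Taken together, the Always Perform and Only When ANOVA P Value is Significant options, plus the significance value for each comparison, amount to the decision rule sketched below in Python. The function and parameter names are hypothetical and only mirror the dialog box settings described above. The pairwise P values themselves come from the chosen multiple comparison method.

```python
def run_multiple_comparisons(anova_p, anova_p_threshold=0.05, always_perform=False):
    """Return True when the post hoc multiple comparisons should be performed."""
    if always_perform:  # "Always Perform"
        return True
    return anova_p < anova_p_threshold  # "Only When ANOVA P Value is Significant"


def comparison_detects_difference(pair_p, significance_value=0.05):
    """A pairwise comparison reports a difference when its P is below the chosen value."""
    return pair_p < significance_value


print(run_multiple_comparisons(0.03))                       # True
print(run_multiple_comparisons(0.20))                       # False
print(run_multiple_comparisons(0.20, always_perform=True))  # True
```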
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method.
If you want to select your data before you run the test, drag the pointer over your data.
1. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→One Way Repeated Measures ANOVA
The Pick Columns for One Way RM ANOVA dialog box appears prompting you to
specify a data format.
Figure 6.18 The Pick Columns for One Way RM ANOVA Dialog Box Prompting
You to Specify a Data Format
2. Select the appropriate data format from the Data Format drop-down list. For more
information, see 6.2 Data Format for Repeated Measures Tests.
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
Figure 6.19 The Pick Columns for One Way RM ANOVA Dialog Box Prompting
You to Select Data Columns
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the One Way RM ANOVA on the selected columns.
If you elected to test for normality and equal variance, and your data fail either test,
SigmaPlot warns you and suggests continuing your analysis using the nonparametric
Friedman Repeated Measures ANOVA on Ranks. For more information, see 6.7 Friedman
Repeated Measures Analysis of Variance on Ranks.
If you selected to run multiple comparisons only when the P value is significant,
and the P value is not significant, the One Way RM ANOVA report appears after the test is
complete. For more information, see 6.5.7 Interpreting One Way Repeated Measures
ANOVA Results.
If the P value for multiple comparisons is significant, or you selected to always perform
multiple comparisons, the Multiple Comparisons Options dialog box appears prompting
you to select a multiple comparison method. For more information, see 6.5.6 Multiple
Comparison Options (One Way RM ANOVA).
6.5.7.1 If There Were Missing Data Cells
Figure 6.20 Example of the One Way Repeated Measures ANOVA Report
Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
If your data contained missing values, the report indicates the results were computed using a
general linear model. The ANOVA table includes the degrees of freedom used to compute F,
the estimated mean square equations are listed, and the summary table displays the estimated
least square means.
For descriptions of the derivations for One Way Repeated Measures ANOVA results, consult
an appropriate statistics reference.
Normality test results display whether the data passed or failed the test of the assumption that
the differences of the changes originate from a normal distribution, and the P value calculated
by the test. Normally distributed source populations are required for all parametric tests.
This result appears unless you disabled normality testing in the Options for One Way
RM ANOVA dialog box.
6.5.7.5 Power
The power of the performed test is displayed unless you disable this option in the Options for
One Way RM ANOVA dialog box.
The power, or sensitivity, of a One Way Repeated Measures ANOVA is the probability that the
test will detect a difference among the treatments if there really is a difference. The closer
the power is to 1, the more sensitive the test.
Repeated measures ANOVA power is affected by the sample sizes, the number of treatments
being compared, the chance of erroneously reporting a difference α (alpha), the observed
differences of the group means, and the observed standard deviations of the samples.
Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error. A Type I error occurs when you reject the
hypothesis of no effect when this hypothesis is true.
Set this value in the Options for One Way RM ANOVA dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values
of α result in stricter requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a Type II error).
Larger values of α make it easier to conclude that there is a difference but also increase the risk
of seeing a false difference (a Type I error).
DF (Degrees of Freedom). Degrees of freedom reflect the number of groups and the sample
size, which affect the sensitivity of the ANOVA.
• The degrees of freedom between subjects is a measure of the number of subjects
• The degrees of freedom within subjects is a measure of the total number of observations,
adjusted for the number of treatments
• The degrees of freedom for the treatments is a measure of the number of treatments
• The residual degrees of freedom is a measure of the difference between the number of
observations, adjusted for the number of subjects and treatments
• The total degrees of freedom is a measure of both number of subjects and treatments
SS (Sum of Squares). The sum of squares is a measure of variability associated with each
element in the ANOVA data table.
• The sum of squares between the subjects measures the variability of the average responses
of each subject.
• The sum of squares within the subjects measures the underlying total variability within
each subject.
• The sum of squares of the treatments measures the variability of the mean treatment
responses within the subjects.
• The residual sum of squares measures the underlying variability among all observations
after accounting for differences between subjects and between treatments.
• The total sum of squares measures the total variability.
MS (Mean Squares). The mean squares provide two estimates of the population variances.
Comparing these variance estimates is the basis of analysis of variance.
The mean square of the treatments is:
$$MS_{\text{between}} = \frac{\text{sum of squares between groups}}{\text{degrees of freedom between groups}} = \frac{SS_{\text{between}}}{DF_{\text{between}}}$$
6.5.7.7 F Statistic
The F test statistic is a ratio used to gauge the differences of the effects. If there are no missing
data, F is calculated as:
$$F = \frac{\text{estimated population variance between groups}}{\text{estimated population variance within groups}} = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
If the F ratio is around 1, you can conclude that there are no differences among treatments (the
data is consistent with the null hypothesis that there are no treatment effects).
If F is a large number, the variability among the effect means is larger than expected from
random variability, and you can conclude that the treatments have different
effects (the differences among the treatments are statistically significant).
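For complete data (no missing values), the quantities in the ANOVA table can be sketched with the textbook repeated measures decomposition below. This is an illustrative Python sketch, not SigmaPlot's code. It does not attempt the general linear model SigmaPlot uses when values are missing. The hypothetical `one_way_rm_anova` function divides the treatment mean square by the residual mean square, which is the usual within-subjects error term in a repeated measures design. The P value would come from the F distribution with k - 1 and (n - 1)(k - 1) degrees of freedom; it is not computed here.

```python
def one_way_rm_anova(data):
    """One way repeated measures ANOVA table for complete (no missing) data.

    data[i][j] is the response of subject i under treatment j.
    """
    n = len(data)      # number of subjects
    k = len(data[0])   # number of treatments
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    treatment_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_within = ss_total - ss_subjects
    ss_treatments = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_residual = ss_within - ss_treatments

    df = {
        "between subjects": n - 1,
        "within subjects": n * (k - 1),
        "treatments": k - 1,
        "residual": (n - 1) * (k - 1),
        "total": n * k - 1,
    }
    ms_treatments = ss_treatments / df["treatments"]
    ms_residual = ss_residual / df["residual"]
    return {
        "DF": df,
        "SS": {
            "between subjects": ss_subjects,
            "within subjects": ss_within,
            "treatments": ss_treatments,
            "residual": ss_residual,
            "total": ss_total,
        },
        "MS": {"treatments": ms_treatments, "residual": ms_residual},
        "F": ms_treatments / ms_residual,
    }


table = one_way_rm_anova([[1, 3, 5], [2, 5, 5], [3, 4, 8]])
print(table["F"])  # 12.0
```

For this three subject by three treatment example, SS total is 34 and SS between subjects is 6. SS treatments is 24 and SS residual is 4. This gives F = 12 / 1 = 12.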
P Value. The P value is the probability of being wrong in concluding that there is a true
difference between the groups (for example, the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater
the probability that the samples are drawn from different populations. Traditionally, you can
conclude that there are significant differences when P < 0.05.
6.5.8 One Way Repeated Measures ANOVA Report Graphs
The difference of the means is a gauge of the size of the difference between the two treatments.
Tukey, Student-Newman-Keuls, Fisher LSD, Duncan’s, and Dunnett’s Test Results.
The Tukey, Student-Newman-Keuls (SNK), Fisher LSD, and Duncan’s tests are all pairwise
comparisons of every combination of group pairs. While the Tukey, Fisher LSD, and Duncan’s
tests can be used to compare a control group to other groups, they are not recommended for this
type of comparison.
Dunnett’s test only compares a control group to all other groups. All tests compute the q test
statistic, and display whether or not P < 0.05 or < 0.01 for that pair comparison.
You can conclude from "large" values of q that the difference of the two groups being
compared is statistically significant.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
The Difference of the Means is a gauge of the size of the difference between the two groups.
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a
significant difference. p is an indication of the differences in the ranks of the group means
being compared. Group means are ranked in order from largest to smallest in an SNK test,
so p is the number of means spanned in the comparison. For example, when comparing four
means, p = 4 when comparing the largest to the smallest, and p = 2 when comparing the second
smallest to the smallest.
If a treatment is found to be not significantly different from another treatment, all treatments
with p ranks in between the p ranks of the two treatments that are not different are also
assumed not to be significantly different, and a result of DNT (Do Not Test) appears for
those comparisons.
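The bookkeeping for p and DNT can be sketched as follows. This minimal Python sketch covers only the ordering logic described above, not the q computation. `is_significant` is a hypothetical callback standing in for the actual pairwise test. Comparisons are processed from the widest span (largest p) to the narrowest, so any pair nested inside a span already judged not different is marked DNT.

```python
from itertools import combinations


def snk_bookkeeping(means, is_significant):
    """Assign p and a result to every pair of groups, widest spans first.

    means maps group name -> mean. is_significant(a, b) is a hypothetical callback
    standing in for the actual q test of one pair.
    """
    order = sorted(means, key=means.get, reverse=True)  # largest mean first
    # (lo, hi) are rank positions; p = hi - lo + 1 means are spanned
    pairs = sorted(combinations(range(len(order)), 2), key=lambda pair: pair[0] - pair[1])
    not_different = []  # rank spans already judged not significantly different
    results = {}
    for lo, hi in pairs:
        a, b = order[lo], order[hi]
        p = hi - lo + 1
        if any(nlo <= lo and hi <= nhi for nlo, nhi in not_different):
            results[(a, b)] = (p, "DNT")  # nested inside a span that is not different
        elif is_significant(a, b):
            results[(a, b)] = (p, "significant")
        else:
            results[(a, b)] = (p, "not significant")
            not_different.append((lo, hi))
    return results


def demo_test(a, b):
    # hypothetical outcome: only groups A and C are judged alike
    return {a, b} != {"A", "C"}


means = {"A": 10.0, "B": 7.5, "C": 6.0, "D": 3.2}
for pair, outcome in snk_bookkeeping(means, demo_test).items():
    print(pair, outcome)
```

With four means ranked A, B, C, D, comparing A with D gives p = 4 and comparing C with D gives p = 2, matching the example above. Because the demo test judges A and C alike, the nested pairs A-B and B-C are reported as DNT.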
2. On the Report tab, in the Result Graphs group, click Create Result Graph.
The Create Result Graph dialog box appears displaying the types of graphs available for
the One Way Repeated Measures ANOVA results.
Figure 6.21 The Create Graph Dialog Box for a One Way RM ANOVA Report
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
6.6 Two Way Repeated Measures Analysis of Variance (ANOVA)
If you want to consider the effects of only one factor on your experimental groups, use One
Way Repeated Measures ANOVA.
There is no equivalent in SigmaPlot for a two factor repeated measures comparison for samples
drawn from non-normal populations. If your data is non-normal, you can use transforms to make
the data comply better with the assumptions of analysis of variance. If the
sample size is large, and you want to do a nonparametric test, use Rank transform (available
in the Transform group on the Analysis tab) to convert the observations to ranks, then do a
Two Way ANOVA on the ranks.
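The rank transform step can be sketched in Python as below. The hypothetical `rank_transform` function replaces each observation by its rank among all observations and gives tied values their average rank. Whether SigmaPlot's Rank transform uses the same tie rule is an assumption. The ranked column would then be analyzed with a Two Way ANOVA in place of the original values.

```python
def rank_transform(values):
    """Replace each observation by its rank among all observations; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


print(rank_transform([3.1, 1.2, 3.1, 5.0]))  # [2.5, 1.0, 2.5, 4.0]
```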
1. Enter or arrange your data in the data worksheet. For more information, see 6.6.3
Arranging Two Way Repeated Measures ANOVA Data.
2. Set the Two Way Repeated Measures ANOVA options. For more information, see 6.6.4
Setting Two Way Repeated Measures ANOVA Options.
3. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→Two Way Repeated Measures ANOVA
4. Run the test. For more information, see 6.6.5 Running a Two Way Repeated Measures
ANOVA.
5. Generate report graphs. For more information, see 6.6.8 Two Way Repeated Measures
ANOVA Report Graphs.
6.6.3 Arranging Two Way Repeated Measures ANOVA Data
Figure 6.23 Data for a Two Way Repeated Factor ANOVA with one repeated
factor (salinity).
If you want to test the effect of different salinities and temperatures on the activity of
a single species of shrimp, you have a two factor experiment with two repeated treatments,
salinity and temperature. In both cases, the different combinations of treatment/factor levels
are the cells of the comparison. SigmaPlot automatically handles both one and two repeated
treatment factors.
Figure 6.24 Data for a Two Way Repeated Factor ANOVA with two repeated
factors (temperature and salinity).
data. However, SigmaPlot properly handles all occurrences of missing and unbalanced data
automatically.
Missing Data Point(s). If there are missing values, SigmaPlot automatically handles the
missing data by using a general linear model. This approach constructs hypothesis tests using
the marginal sums of squares (also commonly called the Type III or adjusted sums of squares).
Figure 6.25 Data for a Two Way Repeated Factor ANOVA with one repeated
factor (salinity) and a missing data point.
Figure 6.26 Data for a Two Way Repeated Factor ANOVA with two repeated
factors (temperature and salinity) and a missing cell.
6.6.3.2 Connected versus Disconnected Data
Data with missing cells that still have repeated factor data for every subject can be analyzed
either by assuming no interaction or by treating the problem as a One Way ANOVA.
If you treat the problem as One Way ANOVA, each cell in the table is treated as a different
level of a single experimental factor. This approach is the most conservative analysis because
it requires no additional assumptions about the nature of the data or experimental design.
The no interaction assumption requires that the non-empty cells be geometrically
connected in order to compute a two factor no interaction model. You cannot
perform Two Way Repeated Measures ANOVA on data disconnected by empty cells.
Figure 6.27 Data for a Two Way Repeated Factor ANOVA with geometrically
disconnected data.
This data cannot be analyzed with a Two Way Repeated Measures ANOVA.
When the data is geometrically connected, you can draw a series of straight vertical and
horizontal lines connecting all cells containing data without changing direction in any empty
cells. SigmaPlot automatically checks for this condition. If disconnected data is encountered
during Two Way Repeated Measures ANOVA, SigmaPlot suggests treatment of the problem
as a One Way Repeated Measures ANOVA.
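The connectivity check can be sketched as a graph search. The sketch relies on one reading of the rule above: a straight line may cross empty cells but may only change direction in a cell with data. Under that reading, two non-empty cells are linked whenever they share a row or a column. The data is connected when every non-empty cell can be reached from any other through such links. The hypothetical `is_connected` function is an illustration, not SigmaPlot's own check.

```python
def is_connected(filled):
    """Return True if the non-empty cells of a two factor layout are geometrically connected.

    filled is a set of (row_level, column_level) pairs for cells that contain data.
    """
    cells = list(filled)
    if not cells:
        return True
    seen = {cells[0]}
    stack = [cells[0]]
    while stack:
        row, col = stack.pop()
        for cell in cells:
            # cells sharing a row or column can be joined by one straight line
            if cell not in seen and (cell[0] == row or cell[1] == col):
                seen.add(cell)
                stack.append(cell)
    return len(seen) == len(cells)


print(is_connected({(0, 0), (1, 1)}))          # False: diagonal cells only
print(is_connected({(0, 0), (0, 1), (1, 1)}))  # True: one off-diagonal cell added
```

A layout with data only in the two diagonal cells of a 2 x 2 table is disconnected. Adding either off-diagonal cell connects it.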
For descriptions of the concept of connectivity, consult an appropriate statistics
reference.
Another case of an empty cell can occur when both factors are repeated, and there are no
data for one level for one of the subjects. SigmaPlot automatically handles this situation by
converting the problem to a One Way Repeated Measures ANOVA.
Figure 6.28 Data for a Two Way Repeated Factor ANOVA with two factors
repeated and no data for one level for a subject.
This data cannot be analyzed as a Two Way Repeated Measures ANOVA problem.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→Two Way Repeated Measures ANOVA
3. Click Options. The Options for Two Way RM ANOVA dialog box appears with three
tabs:
• Assumption Checking. Adjust the parameters of a test to relax or restrict the testing of
your data for normality and equal variance.
• Results. Display the statistics summary and the confidence interval for the data in the
report and save residuals to a worksheet column.
• Post Hoc Test. Compute the power or sensitivity of the test and enable multiple
comparisons.
4. To continue the test, click Run Test. For more information, see 6.6.5 Running a Two
Way Repeated Measures ANOVA.
5. To accept the current settings and close the options dialog box, click OK.
Summary Table. Select Summary Table to display the number of observations for a column
or group, the number of missing values for a column or group, the average value for the
column or group, the standard deviation of the column or group, and the standard error of
the mean for the column or group.
Confidence Interval. Select Confidence Intervals to display the confidence interval for the
difference of the means. To change the interval, enter any number from 1 to 99 (95 and 99
are the most commonly used intervals). Clear the check box if you do not want to
include the confidence interval in the report.
Select Residuals to display residuals in the report and to save the residuals of the test to the
specified worksheet column. To change the column the residuals are saved to, edit the number
or select a number from the drop-down list.
6.6.5 Running a Two Way Repeated Measures ANOVA
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method.
To run a test, you need to select the data to test. If you want to select your data before you
run the test, drag the pointer over your data.
1. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→Two Way Repeated Measures ANOVA
The Pick Columns for Two Way RM ANOVA dialog box appears prompting you to
specify a data format.
2. Select the appropriate data format from the Data Format drop-down list. For more
information, see 6.2 Data Format for Repeated Measures Tests.
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the Two Way RM ANOVA on the selected columns.
7. If you elected to test for normality and equal variance, SigmaPlot performs the test
for normality (Shapiro-Wilk or Kolmogorov-Smirnov) and the test for equal variance
(Levene Median). If your data fail either test, SigmaPlot informs you. You can either
continue, or transform your data, then perform a Two Way Repeated Measures ANOVA
on the transformed data.
8. If your data have empty cells, you are prompted to perform the appropriate procedure.
• If you are missing a cell, but the data is still connected, you may have to proceed
by either assuming no interaction between the factors, or by performing a one factor
analysis on each cell.
• If your data is not geometrically connected, or if a subject is missing data for one
level, you cannot perform a Two Way Repeated Measures ANOVA. Continue using
a One Way ANOVA, or cancel the test.
• If you are missing a few data points, but there is still at least one observation in each
cell, SigmaPlot automatically proceeds. For more information, see 6.6.3 Arranging
Two Way Repeated Measures ANOVA Data.
9. If you selected to run multiple comparisons only when the P value is significant, and the P
value is not significant, the Two Way Repeated Measures ANOVA report appears after the test is
complete. For more information, see 6.6.4 Setting Two Way Repeated Measures ANOVA Options.
If the P value for multiple comparisons is significant, or you selected to always perform
multiple comparisons, the Multiple Comparisons Options dialog box appears prompting
you to select a multiple comparison method.
6.6.7 Interpreting Two Way Repeated Measures ANOVA Results
If your data contained missing values but no empty cells, the report indicates the results
were computed using a general linear model. The ANOVA table includes the approximate
degrees of freedom used to compute F, the estimated mean square equations are listed, and
the summary table displays the estimated least square means.
If your data contained empty cells, you either analyzed the problem assuming no interaction,
or treated the problem as a One Way ANOVA.
• If you chose to assume no interaction, no statistics for factor interaction are calculated.
• If you performed a One Way ANOVA, the results shown are identical to one way ANOVA
results.
• For more information, see 6.5.7 Interpreting One Way Repeated Measures ANOVA Results.
This is the column title of the indexed worksheet data you are analyzing with the Two Way
Repeated Measures ANOVA. Determining if the values in this column are affected by the
different factor levels is the objective of the Two Way Repeated Measures ANOVA.
6.6.7.5 ANOVA Table
is an estimate of the variance of the underlying population computed from the variability
between levels of the factor.
The interaction mean square
$$MS_{\text{inter}} = \frac{\text{sum of squares for the interaction}}{\text{degrees of freedom for the interaction}} = \frac{SS_{\text{inter}}}{DF_{\text{inter}}}$$
is an estimate of the variance of the underlying population computed from the variability
associated with the interactions of the factors.
The error mean square (residual, or within groups)
$$MS_{\text{error}} = \frac{\text{error sum of squares}}{\text{error degrees of freedom}} = \frac{SS_{\text{error}}}{DF_{\text{error}}}$$
is an estimate of the variability in the underlying population, computed from the random
component of the observations.
F Test Statistic. The F test statistic is provided for comparisons within each factor and
between the factors.
If there are no missing data, the F statistic within the factors is
Note
If there are missing data or empty cells, SigmaPlot automatically adjusts the F
computations to account for the offsets of the expected mean squares.
If the F ratio is around 1, the data is consistent with the null hypothesis that there is no effect
(that is, no differences among treatments).
If F is a large number, the variability among the means is larger than expected from random
variability in the population, and you can conclude that the samples were drawn from different
populations (that is, the differences between the treatments are statistically significant).
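The mean square and F computations above can be illustrated with a short Python sketch. It assumes you already have the sums of squares and degrees of freedom from an ANOVA table; the numbers are hypothetical, and for simplicity every effect is tested against a single error term, as in an ordinary two way ANOVA. In a repeated measures design each effect may be tested against a different error term, so SigmaPlot's denominators can differ.

```python
def mean_square(ss: float, df: int) -> float:
    """Mean square = sum of squares / degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ss / df


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F = mean square of the effect / error mean square."""
    return ms_effect / ms_error


# Hypothetical ANOVA table entries: source -> (sum of squares, degrees of freedom)
table = {
    "factor A": (120.0, 2),
    "factor B": (45.0, 1),
    "interaction": (30.0, 2),
    "error": (180.0, 18),
}

ms = {source: mean_square(ss, df) for source, (ss, df) in table.items()}
# Every effect is tested against the single error term in this simplified sketch
f_values = {source: f_ratio(ms[source], ms["error"])
            for source in table if source != "error"}

for source, f in f_values.items():
    print(f"{source}: MS = {ms[source]:.2f}, F = {f:.2f}")
```

Here the interaction F of 1.5 is close to 1, consistent with no interaction effect, while factor A's F of 6.0 points toward a real effect; the P value indicates how strong that evidence is.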
P value. The P value is the probability of being wrong in concluding that there is a true
difference between the treatments (that is, the probability of falsely rejecting the null
hypothesis, or committing a Type I error, based on F). The smaller the P value, the greater
the probability that the samples are drawn from different populations. Traditionally, you can
conclude there are significant differences if P < 0.05.
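The P value for an F statistic is its upper-tail probability under the F distribution. For F with (df1, df2) degrees of freedom, P(F > f) equals the regularized incomplete beta function I_x(df2/2, df1/2) at x = df2 / (df2 + df1·f). The sketch below evaluates it with only the Python standard library, using the continued-fraction method described in Numerical Recipes; it is an illustration, not SigmaPlot's implementation, and the function names are hypothetical. If SciPy is available, `scipy.stats.f.sf(f, df1, df2)` returns the same quantity.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_p_value(f: float, df1: int, df2: int) -> float:
    """Upper-tail probability P(F > f) for an F distribution with (df1, df2) DF."""
    if f <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


# Hypothetical example: F = 6.0 with 2 and 18 degrees of freedom
p = f_p_value(6.0, 2, 18)
print(f"P = {p:.4f}")
```

For df1 = 2 the tail has the closed form (1 + 2f/df2)^(-df2/2), so this example should give 0.6^9, about 0.0101, which makes a convenient check.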
Approximate DF (Degrees of Freedom). If a general linear model was used, the ANOVA
table also includes the approximate degrees of freedom that allow for the missing value(s). See
DF (Degrees of Freedom) above for an explanation of the degrees of freedom for each variable.
6.6.7.6 Power
The power of the performed test is displayed unless you disable this option in the Options for
Two Way RM ANOVA dialog box.
The power, or sensitivity, of a Two Way Repeated Measures ANOVA is the probability that
the test will detect a difference among the treatments if there really is a difference. The closer
the power is to 1, the more sensitive the test.
Repeated Measures ANOVA power is affected by the sample sizes, the number of treatments
being compared, the chance of erroneously reporting a difference α (alpha), the observed
differences of the group means, and the observed standard deviations of the samples.
Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error. A Type I error occurs when you reject the
hypothesis of no effect when this hypothesis is true.
Set the value in the Options for Two Way RM ANOVA dialog box; the suggested value is
α = 0.05, which indicates that a one in twenty chance of error is acceptable. Smaller values
of α result in stricter requirements before concluding there is a significant difference, but a
greater possibility of concluding there is no difference when one exists (a Type II error).
Larger values of α make it easier to conclude that there is a difference but also increase the risk
of seeing a false difference (a Type I error).
If there were missing data and a general linear model was used, the linear equations for the
expected mean squares computed by the model are displayed. These equations are displayed
only if a general linear model was used.
The least squares means and standard errors of the means are displayed for each factor separately
(summary table row and column), and for each combination of factors (summary table cells).
If there are missing values, the least squares means are estimated using a general linear model.
Mean. The average value for the condition or group.
Standard Error of the Mean. A measure of uncertainty in the mean.
The Least Squares Mean and associated Standard Error are computed based on all the data.
These values can differ from the values computed from the data in the individual cells. In
particular, if the design is balanced, the least squares standard errors will be equal for all cells. (If the
sample sizes in different cells are different, the least squares standard errors will be different,
depending on the sample sizes, with larger standard errors associated with smaller sample
sizes.) These standard errors will differ from the standard errors computed from each
cell separately.
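The effect of cell sample size on the standard errors can be illustrated with a simple Python sketch. It assumes an ordinary (non-repeated-measures) two way model, in which the standard error of each cell's least squares mean is the square root of the error mean square divided by the number of observations in that cell. SigmaPlot's repeated measures computation is more involved, so treat this only as an illustration of the trend; the error mean square, cell names, and cell sizes are hypothetical.

```python
import math


def cell_standard_errors(ms_error: float, cell_sizes: dict) -> dict:
    """Standard error of each cell's least squares mean: sqrt(MS_error / n_cell)."""
    return {cell: math.sqrt(ms_error / n) for cell, n in cell_sizes.items()}


ms_error = 10.0  # hypothetical error mean square from the ANOVA table

balanced = cell_standard_errors(ms_error, {"A1B1": 5, "A1B2": 5, "A2B1": 5, "A2B2": 5})
unbalanced = cell_standard_errors(ms_error, {"A1B1": 8, "A1B2": 5, "A2B1": 5, "A2B2": 2})

print(balanced)    # every cell has the same standard error
print(unbalanced)  # the cell with n = 2 has the largest standard error
```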
This table is generated if you select the option to display the summary table in the Options
for Two Way RM ANOVA dialog box. For more information, see 6.6.4 Setting Two Way Repeated
Measures ANOVA Options.
6.6.7.9 Multiple Comparisons
p is a parameter used when computing q. The larger the p, the larger q needs to be to indicate a
significant difference. p is an indication of the differences in the ranks of the group means
being compared. Group means are ranked in order from largest to smallest in an SNK test,
so p is the number of means spanned in the comparison. For example, when comparing four
means, p = 4 when comparing the largest to the smallest, and p = 2 when comparing the second
smallest to the smallest.
If a treatment is found to be not significantly different from another treatment, all treatments
whose ranks fall between the ranks of the two treatments that are not different are also
assumed not to be significantly different, and a result of DNT (Do Not Test) appears for
those comparisons.
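The bookkeeping for p and DNT can be sketched in Python. The sketch ranks hypothetical group means from largest to smallest, computes p as the number of means spanned by each pair, tests the widest spans first, and marks as DNT any pair that lies inside a span already found not significant. The `is_significant` callable is a hypothetical stand-in for comparing q with its critical value; in the example it simply checks whether the difference of means reaches 3.0.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple


def snk_plan(means: Dict[str, float],
             is_significant: Callable[[str, str, int], bool]
             ) -> Dict[Tuple[str, str], Tuple[int, str]]:
    """Return {(larger group, smaller group): (p, result)} for every pair of groups."""
    ranked = sorted(means, key=means.get, reverse=True)  # largest mean first
    # Widest spans first; ties keep their original order
    pairs = sorted(combinations(range(len(ranked)), 2),
                   key=lambda ij: ij[1] - ij[0], reverse=True)
    not_significant_spans = []
    results = {}
    for i, j in pairs:
        p = j - i + 1  # number of means spanned by this comparison
        hi, lo = ranked[i], ranked[j]
        if any(start <= i and j <= end for start, end in not_significant_spans):
            results[(hi, lo)] = (p, "DNT")  # nested inside a non-significant span
        elif is_significant(hi, lo, p):
            results[(hi, lo)] = (p, "significant")
        else:
            results[(hi, lo)] = (p, "not significant")
            not_significant_spans.append((i, j))
    return results


means = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 2.0}  # hypothetical group means
results = snk_plan(means, lambda hi, lo, p: means[hi] - means[lo] >= 3.0)
for pair, (p, outcome) in results.items():
    print(pair, "p =", p, outcome)
```

Because A versus C is not significant, the comparisons nested inside that span (A versus B and B versus C) are reported as DNT, while C versus D, which lies outside it, is still tested.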
Note
SigmaPlot does not apply the DNT logic to all pairwise comparisons because of
differences in the degrees of freedom between different cell pairs.
The Difference of Means is a gauge of the size of the difference between the treatments
or cells being compared.
The degrees of freedom DF for the marginal comparisons are a measure of the number of
treatments (levels) within the factor being compared. The degrees of freedom when comparing
all cells is a measure of the sample size after accounting for the factors and interaction (this is
the same as the error or residual degrees of freedom).
The Create Result Graph dialog box appears displaying the types of graphs available for
the Two Way Repeated Measures ANOVA results.
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see Modifying
Graphs Using the Property Browser.
6.7 Friedman Repeated Measures Analysis of Variance on Ranks
1. Enter or arrange your data in the worksheet. For more information, see 6.7.3 Arranging
Repeated Measures ANOVA on Ranks Data.
2. Set the Repeated Measures ANOVA on Ranks options. For more information, see 6.7.4 Setting
the Repeated Measures ANOVA on Ranks Options.
3. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→Repeated Measures ANOVA on Ranks
4. Run the test. For more information, see 6.7.5 Running a Repeated Measures ANOVA
on Ranks.
5. Generate report graph. For more information, see 6.7.8 Repeated Measures ANOVA
on Ranks Report Graphs.
6.7.4.2 Options for RM ANOVA on Ranks: Results
• P Values for Normality and Equal Variance. Enter the corresponding P value in the P
Value to Reject box. The P value determines the probability of being incorrect in concluding
that the data is not normally distributed (the P value is the risk of falsely rejecting the null
hypothesis that the data is normally distributed). If the P value computed by the test is
greater than the P set here, the test passes.
To require a stricter adherence to normality and/or equal variance, increase the P value.
Because parametric statistical methods are relatively robust to violations of their
assumptions, the suggested value in SigmaPlot is 0.050. Larger values of P (for
example, 0.100) require less evidence to conclude that data is not normal.
To relax the requirement of normality and/or equal variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal. For
example, a P value of 0.01 for the normality test requires greater deviations from normality to
flag the data as non-normal than a value of 0.05.
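The pass/fail rule described above reduces to a single comparison. A minimal Python sketch; the function name and the computed P value of 0.03 are hypothetical.

```python
def assumption_test_passes(computed_p: float, p_to_reject: float = 0.05) -> bool:
    """An assumption test passes when the computed P value exceeds P Value to Reject."""
    return computed_p > p_to_reject


computed_p = 0.03  # hypothetical P value returned by a normality test
print(assumption_test_passes(computed_p, 0.05))  # False: flagged as non-normal
print(assumption_test_passes(computed_p, 0.01))  # True: relaxed requirement, test passes
```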
Restriction
Although the assumption tests are robust in detecting data from populations that
are non-normal or with unequal variances, there are extreme conditions of data
distribution that these tests cannot take into account. For example, the Levene Median
test fails to detect differences in variance of several orders of magnitude; however,
these conditions should be easily detected by simply examining the data without
resorting to the automatic assumption tests.
A value of 0.05 indicates that the multiple comparisons will report a difference only if there is
less than a 5% chance that the reported difference is incorrect. A value of 0.10 indicates that
the multiple comparisons will report a difference if there is less than a 10% chance that the
reported difference is incorrect.
Note
If multiple comparisons are triggered, the Multiple Comparison Options dialog box
appears after you pick your data from the worksheet and run the test, prompting you
to choose a multiple comparison method. For more information, see 6.7.6 Multiple
Comparison Options (RM ANOVA on ranks).
To run a Repeated Measures ANOVA on Ranks, you need to select the data to test. If you
want to select your data before you run the test, drag the pointer over your data.
1. On the Analysis tab, in the Statistics group, from the Tests drop-down list select:
Repeated Measures→Repeated Measures ANOVA on Ranks
The Pick Columns for RM ANOVA on Ranks dialog box appears prompting you
to specify a data format.
2. Select the appropriate data format from the Data Format drop-down list. For more
information, see 6.2 .
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The title of
selected columns appears in each row. For raw and indexed data, you are prompted to
select two worksheet columns.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the RM ANOVA on Ranks test on the selected columns.
If you elected to test for normality and equal variance, SigmaPlot performs the test
for normality (Shapiro-Wilk or Kolmogorov-Smirnov) and the test for equal variance
(Levene Median). If your data passes both tests, SigmaPlot informs you and suggests
continuing your analysis using One Way Repeated Measures ANOVA.
If you did not enable multiple comparison testing in the Options for RM ANOVA on
Ranks dialog box, the Repeated Measures ANOVA on Ranks report appears after the
test is complete.
If you did enable the Multiple Comparisons option in the options dialog box, the Multiple
Comparison Options dialog box appears prompting you to select a multiple comparison
method. For more information, see 6.7.6 Multiple Comparison Options (RM ANOVA
on ranks).
6.7.7.5 Multiple Comparisons
Traditionally, you can conclude there are significant differences when P < 0.05.
If the P value for the comparison is less than 0.05, the likelihood of being incorrect in
concluding that there is a significant difference is less than 5%. If it is greater than 0.05, you
cannot confidently conclude that there is a difference.
The difference of the rank sums is a gauge of the size of the difference between the two treatments.
A result of DNT (do not test) appears for those comparison pairs whose difference of rank
means is less than the difference of the first comparison pair found not to be significantly
different. For more information, see 6.7.8 Repeated Measures ANOVA on Ranks
Report Graphs.
The Create Result Graph dialog box appears displaying the types of graphs available
for the One Way Repeated Measures ANOVA results.
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
The selected graph appears in a graph window. For more information, see Modifying
Graphs Using the Property Browser.
7 Comparing Frequencies, Rates, and Proportions
Topics Covered in this Chapter
♦ About Rate and Proportion Tests
♦ Data Format for Rate and Proportion Tests
♦ Comparing Proportions Using the z-Test
♦ Chi-square Analysis of Contingency Tables
♦ The Fisher Exact Test
♦ McNemar’s Test
♦ Relative Risk Test
♦ Odds Ratio Test
Use rate and proportion tests to compare two or more sets of data for differences in the number
of individuals that fall into different classes or categories. You can find all of these tests on the
Analysis tab, in the Statistics group, under Rates and Proportions in the Tests drop-down list.
If you are comparing groups where the data is measured on a numeric scale, use the
appropriate group comparison or repeated measures tests. For more information, see 3.2
Choosing the Procedure to Use.
Use a z-test to determine whether the proportions of two groups found within a single category
differ significantly. To perform a z-test, see 7.3.2 Performing a z-test.
7.2.1 z-test
The data for a z-test is always placed in two worksheet rows by two columns. The size (total
number of observations) of each group is in one column, and the corresponding proportion p
of the observations within the category is in a second column. The number of observations
must always be an integer, and the proportions p must be between 0 and 1.
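A minimal Python sketch of the checks described above, assuming the Size and Proportion columns have been read into two lists. The function name and values are hypothetical. It also returns the number of observations in the category implied by each row (size × proportion).

```python
def z_test_counts(sizes, proportions):
    """Validate two rows of z-test data and return the implied count in the category."""
    if len(sizes) != 2 or len(proportions) != 2:
        raise ValueError("a z-test needs exactly two rows (groups)")
    counts = []
    for n, p in zip(sizes, proportions):
        if not float(n).is_integer() or n <= 0:
            raise ValueError(f"size {n} must be a positive integer")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"proportion {p} must be between 0 and 1")
        counts.append(round(n * p))  # observations in the category for this group
    return counts


# Hypothetical worksheet: Size column and Proportion column
sizes = [100, 100]
proportions = [0.30, 0.45]
print(z_test_counts(sizes, proportions))  # [30, 45]
```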
Figure 7.1 A Contingency Table describing the number of Lowland and Alpine
species found at different locations.
Raw Data. You can report the group and category of each individual observation by placing
the group in one worksheet column and the corresponding category in another column. Each
row corresponds to a single observation, so there should be as many rows of data as there
are total numbers of observations.
SigmaPlot automatically cross-tabulates these data and performs the χ2 analysis on the
resulting contingency table. For more information, see 7.4.3 Arranging Chi-Square Data.
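The cross tabulation of raw data can be pictured with a short Python sketch. It is only an illustration: the function name and the site and species labels are hypothetical, and the sketch orders rows and columns alphabetically, which may not match SigmaPlot's ordering.

```python
from collections import Counter


def cross_tabulate(groups, categories):
    """Count observations for each (group, category) pair from two raw data columns."""
    if len(groups) != len(categories):
        raise ValueError("group and category columns must have the same length")
    counts = Counter(zip(groups, categories))
    row_labels = sorted(set(groups))
    col_labels = sorted(set(categories))
    # Counter returns 0 for combinations that never occur
    table = [[counts[(g, c)] for c in col_labels] for g in row_labels]
    return row_labels, col_labels, table


# Hypothetical raw data: one row per observation
groups = ["Site 1", "Site 1", "Site 2", "Site 1", "Site 2", "Site 2", "Site 2"]
categories = ["Alpine", "Lowland", "Alpine", "Alpine", "Lowland", "Lowland", "Alpine"]
rows, cols, table = cross_tabulate(groups, categories)
print(rows, cols, table)
```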
Figure 7.2 Worksheet Data Arrangement for Contingency Table Data from the
Table above
Columns 1 through 3 in the worksheet above are in tabular format, and columns 4 and 5 are
raw data.
Figure 7.3 A 2 x 2 Contingency Table describing the number of harbor seals and
sea lions found on two different islands.
Tabulated Data. Tabulated data is arranged in a contingency table showing the number of
observations for each cell. The worksheet rows and columns correspond to the groups and
categories. The number of observations must always be an integer.
Raw Data. A group identifier is placed in one worksheet column and the corresponding
category in another column. There must be exactly two groups and exactly two categories.
Each row corresponds to a single observation, so there should be as many rows of
data as there are total numbers of observations.
SigmaPlot automatically cross-tabulates this data and performs the Fisher Exact Test on the
resulting contingency table. For more information, see 7.5.3 Arranging Fisher Exact Test Data.
Columns 1 and 2 in the worksheet above are in tabular format and columns 3 and 4 are raw
data observations. A Fisher Exact Test requires data for a 2 x 2 table.
7.2.4 McNemar’s Test
Raw Data. A category identifier is placed in one worksheet column and the corresponding
category in another column. Both columns must use the same number of category types. Each
row corresponds to a single observation, so there should be as many rows of data as there
are total numbers of observations.
SigmaPlot automatically cross-tabulates this data and performs McNemar’s Test on the
resulting contingency table. For more information, see 7.6.3 Arranging McNemar Test Data.
Columns 1 through 3 in the worksheet above are in tabular format, and columns 4 through 6
are raw data observations. McNemar’s Test requires data for tables with equal numbers of
columns and rows; here, a 3 x 3 table.
7.3.2 Performing a z-test
To perform a z-test:
1. Enter or arrange your data in the data worksheet. For more information, see 7.3.3
Arranging z-test Data.
2. If desired, set the z-test options. For more information, see 7.3.4 Setting z-test Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→z-test
5. Run the test. For more information, see 7.3.5 Running a z-Test.
6. View and interpret the z-test report. For more information, see 7.3.6 Interpreting
Proportion Comparison Results.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select z-test from the Select Test drop-down list in the Statistics group on the Analysis tab.
3. Click Current Test Options.
The Options for z-test dialog box appears. For more information, see 7.3.4.1 Options
for z-test.
4. Click a check box to enable or disable a test option. All options are saved between
SigmaPlot sessions.
5. To continue the test, click Run Test. For more information, see 7.3.5 Running a z-Test.
6. To accept the current settings and close the options dialog box, click OK.
conclusion. The Yates correction is applied to 2 x 2 tables and other statistics where the P
value is computed from a χ2 distribution with one degree of freedom.
Select or clear the check box to turn the Yates Correction Factor on or off.
Confidence Interval. This is the confidence interval for the difference of proportions. To
change the specified interval, select the box and type any number from 1 to 99 (95 and 99 are
the most commonly used intervals).
7.3.5 Running a z-Test
To run a test, you need to select the data to test. The Pick Columns dialog box is used to
select the worksheet columns with the data you want to test and to specify how your data is
arranged in the worksheet.
To run a z-test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→z-test
The Pick Columns dialog box appears. If you selected columns before you chose the
test, the selected columns appear in the column list. If you have not selected columns, the
dialog box prompts you to pick your data.
Figure 7.8 The z-test — Select Data Dialog Box Prompting You to Select Data
Columns
4. To assign the desired worksheet columns to the Selected Columns list, select the columns
in the worksheet, or select the columns from the Data for Size or Proportion drop-down
list.
The first selected column is assigned to the Size row in the Selected Columns list, and the
second column is assigned to the Proportion row in the list. The titles of the selected columns
appear in each row. You can only select one Size and one Proportion data column.
5. To change your selections, select the assignment in the list, then select a new column from
the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to perform the test. The report appears displaying the results of the z-test.
For more information, see 7.3.6 Interpreting Proportion Comparison Results.
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see the SigmaPlot 12 User’s Guide.
Standard Error of the Difference. The standard error of the difference is a measure of the
precision with which this difference can be estimated.
7.3.6.2 z statistic
The z statistic is
$$z = \frac{\text{difference of the sample proportions}}{\text{standard error of the difference}}$$
You can conclude from "large" absolute values of z that the proportions of the populations are
different. A large z indicates that the difference between the proportions is larger than what
would be expected from sampling variability alone (that is, that the difference between
the proportions of the two groups is statistically significant). A small z (near 0) indicates that
there is no significant difference between the proportions of the two groups.
If you enabled the Yates correction in the Options for z-test dialog box, the magnitude of z is
slightly reduced to correct for approximating the discrete distribution of the observed counts
with the continuous normal distribution. For more information, see 7.3.4 Setting z-test Options.
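The z statistic and its two-tailed P value can be sketched in Python with only the standard library. The sketch assumes the usual pooled-proportion standard error and, for the Yates option, the conventional continuity correction of ½(1/n1 + 1/n2) subtracted from the absolute difference; SigmaPlot's internal computation may differ in detail, and the function name and data are hypothetical.

```python
import math


def two_proportion_z(n1, p1, n2, p2, yates=False):
    """Return (z, two-tailed P) for comparing two sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # overall proportion in the category
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    diff = p1 - p2
    magnitude = abs(diff)
    if yates:
        # Continuity correction shrinks the difference toward zero
        magnitude = max(0.0, magnitude - 0.5 * (1 / n1 + 1 / n2))
    z = math.copysign(magnitude / se, diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed P from the normal distribution
    return z, p_value


z, p = two_proportion_z(100, 0.30, 100, 0.45)
z_yates, p_yates = two_proportion_z(100, 0.30, 100, 0.45, yates=True)
print(f"z = {z:.3f}, P = {p:.4f}")
print(f"Yates: z = {z_yates:.3f}, P = {p_yates:.4f}")
```

The corrected z is smaller in magnitude, so its P value is larger, which is the conservative adjustment described above.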
P Value. The P value is the probability of being wrong in concluding that there is a difference
in the proportions of the two groups (that is, the probability of falsely rejecting the null
hypothesis, or committing a Type I error). The smaller the P value, the greater the probability
that the samples are drawn from populations with different proportions. Traditionally, you
conclude that there are significant differences when P < 0.05.
7.3.6.4 Power
The power, or sensitivity, of a z-test is the probability that the test will detect a difference
among the groups if there really is a difference. The closer the power is to 1, the more
sensitive the test. z-test power is affected by the sample size and the observed proportions
of the samples.
This result is displayed unless you disabled it in the Options for z-test dialog box. For more
information, see 7.3.4 Setting z-test Options.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error (a Type I error is when you reject the
hypothesis of no effect when this hypothesis is true).
The α value is set in the z-test Power dialog box; the suggested value is α = 0.05, which
indicates that a one in twenty chance of error is acceptable. Smaller values of α result in
stricter requirements before concluding there is a difference in proportions, but a greater
possibility of concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference, but also increase the risk
of seeing a false difference (a Type I error).
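A rough way to see how sample size and alpha drive z-test power is the normal approximation sketched below in Python. It is a textbook approximation, not necessarily the method SigmaPlot uses; it ignores the tiny probability of rejecting in the opposite direction, and the proportions and sample sizes are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def z_test_power(n1, p1, n2, p2, alpha=0.05):
    """Approximate power of a two-tailed two-proportion z-test (normal approximation)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5  # SE assuming no difference
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5     # SE at the observed proportions
    # Probability that z exceeds the critical value in the direction of the observed difference
    return std.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


for n in (50, 100, 200):
    print(n, round(z_test_power(n, 0.30, n, 0.45), 3))
print("alpha = 0.10:", round(z_test_power(100, 0.30, 100, 0.45, alpha=0.10), 3))
```

Power rises with sample size and with a larger α, matching the trade-off described above.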
Figure 7.10 A Contingency Table describing the number of Lowland and Alpine
species found at different locations.
The χ2 test uses the percentages of the row and column totals for each cell to compute the
expected number of observations per cell if the treatment had no effect. The χ2 statistic
summarizes the difference between the expected and the observed frequencies. For more
information, see 7.2 Data Format for Rate and Proportion Tests.
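The expected frequency of each cell is its row total times its column total divided by the grand total. A minimal Python sketch with a hypothetical 2 x 2 table:

```python
def expected_frequencies(table):
    """Expected count per cell if the rows and columns are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[30, 70], [45, 55]]  # hypothetical 2 x 2 contingency table
print(expected_frequencies(observed))  # [[37.5, 62.5], [37.5, 62.5]]
```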
1. Enter or arrange your data appropriately in the data worksheet. For more information, see
7.4.3 Arranging Chi-Square Data.
2. If desired, set the Chi-Square options. For more information, see 7.4.4.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→Chi-Square
5. Run the test. For more information, see 7.4.5 Running a Chi-Square Test.
6. View and interpret the Chi-Square report. For more information, see 7.4.6 Interpreting
Results of a Chi-Squared Analysis of Contingency tables.
7.4.3 Arranging Chi-Square Data
Columns 1 through 3 in the worksheet above are arranged as a contingency table. Columns 4
and 5 are raw data for the observations. Each row corresponds to a single observation. Note
that not all the raw data points are shown, as the columns are longer than fifteen rows.
Tabulated Data. Tabulated data is arranged in a contingency table using the worksheet rows
and columns as the groups and categories. The number of observations for each combination
of group and category is entered into the appropriate cell.
Raw Data. Raw data uses a row for each individual observation, and places the corresponding
groups for the observations in one column and the categories in a second column. SigmaPlot
automatically determines the number of groups and categories used. For more information,
see 7.2 Data Format for Rate and Proportion Tests.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select Chi-square from the Select Test drop-down list in the Statistics group on the
Analysis tab.
3. Click a check box to enable or disable a test option. All options are saved between
SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see 7.4.5 Running a
Chi-Square Test.
5. To accept the current settings and close the options dialog box, click OK.
7.4.5 Running a Chi-Square Test
To run a test, you need to select the data to test. Use the Pick Columns dialog box to select the
worksheet columns with the data you want to test and to specify how your data is arranged in
the worksheet.
To run a Chi-Square Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→Chi-Square
The Chi-Square — Data Format dialog box appears prompting you to specify a data
format.
4. Select the appropriate data format from the Data Format drop-down list. If you are
testing contingency table data, select Tabulated. If your data is arranged in raw format,
select Raw. For more information, see 7.4.3 Arranging Chi-Square Data.
Figure 7.13 The Chi-Square — Data Format Dialog Box Prompting You to Select
a Data Format
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns
in the worksheet, or select the columns from the Data for Observations or Category
drop-down list.
The first selected column is assigned to the first Observation or Category row in the
Selected Columns list, and all successively selected columns are assigned to successive
rows in the list. The title of selected columns appears in each row. For raw data, you
are prompted to select two worksheet columns. For tabulated data you are prompted to
select up to 64 columns.
7. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in
the Selected Columns list.
Figure 7.14 The Chi-Square — Select Data Dialog Box Prompting You to Select
Data Columns
8. Click Finish to run the test. If there are too many cells in a contingency table with
expected values below 5, SigmaPlot either:
• Suggests that you redefine the groups or categories in the contingency table to reduce
the number of cells and increase the number of observations per cell.
• Suggests the Fisher Exact Test if the table is a 2 x 2 contingency table.
When there are many cells with expected observations of 5 or less, the theoretical χ2
distribution does not accurately describe the actual distribution of the χ2 test statistic, and
the resulting P values may not be accurate.
The Fisher Exact Test computes the exact two-tailed probability of observing a specific 2 x 2
contingency table, and does not require that the expected frequencies in all cells exceed 5.
When the test is complete, the χ2 test report appears. For more information, see 7.4.6
Interpreting Results of a Chi-Squared Analysis of Contingency tables.
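The exact rule SigmaPlot uses to decide that there are "too many" small expected values is not given here. The Python sketch below assumes the common guideline that the χ2 approximation is acceptable when no more than 20% of cells have expected counts below 5, and otherwise recommends the alternatives listed above. The function name and both thresholds are assumptions.

```python
def small_expected_check(table, limit=5.0, max_fraction=0.20):
    """Return (cells below limit, total cells, advice) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < limit)
    if small / len(expected) <= max_fraction:
        return small, len(expected), "chi-square approximation acceptable"
    if len(table) == 2 and len(table[0]) == 2:
        return small, len(expected), "use the Fisher Exact Test"
    return small, len(expected), "combine groups or categories"


print(small_expected_check([[3, 1], [1, 3]]))      # every expected count is 2.0
print(small_expected_check([[30, 70], [45, 55]]))  # no small expected counts
```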
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
7.4.6.2 Chi-Square
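χ2 is the sum, over all cells of the table, of the squared differences between the observed
and expected frequencies, each divided by the expected frequency:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
The expected frequencies are computed assuming that the rows and columns are independent.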
If the value of χ2 is large, you can conclude that the distributions are different (that is,
there are large differences between the expected and observed frequencies, indicating
that the rows and columns are not independent).
Values of χ2 near zero indicate that the pattern in the contingency table is no different from
what one would expect if the counts were distributed at random.
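The χ2 computation, including the optional Yates correction for 2 x 2 tables, can be sketched in Python with only the standard library. The P value uses a series expansion of the regularized lower incomplete gamma function; this is an illustration with hypothetical function names, not SigmaPlot's code. With SciPy, `scipy.stats.chi2_contingency(table, correction=...)` performs the whole test and `scipy.stats.chi2.sf(stat, df)` gives the tail probability.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (series for the regularized gamma)."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = math.exp(-h + a * math.log(h) - math.lgamma(a + 1.0))
    if term == 0.0:  # underflow: the tail is about 0 when h >> a, about 1 when h << a
        return 0.0 if h > a else 1.0
    total, n = term, 1
    while term > 1e-16 * total and n < 100000:
        term *= h / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def chi_square_test(table, yates=True):
    """Return (chi-square, degrees of freedom, P) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    is_2x2 = len(table) == 2 and len(table[0]) == 2
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # assumes independence
            diff = abs(observed - expected)
            if yates and is_2x2:
                diff = max(0.0, diff - 0.5)  # Yates continuity correction, 2 x 2 only
            stat += diff ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


observed = [[30, 70], [45, 55]]  # hypothetical 2 x 2 contingency table
stat, df, p = chi_square_test(observed, yates=False)
print(f"chi-square = {stat:.3f}, DF = {df}, P = {p:.4f}")
stat_y, df_y, p_y = chi_square_test(observed)
print(f"Yates: chi-square = {stat_y:.3f}, DF = {df_y}, P = {p_y:.4f}")
```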
Yates Correction. The Yates correction is used to adjust the χ2 and therefore the P value for 2
x 2 tables to more accurately reflect the true distribution of χ2. The Yates correction is enabled
in the Options for Chi-Square dialog box, and is only applied to 2 x 2 tables.
P Value. The P value is the probability of being wrong in concluding that there is a true
difference in the distribution of the numbers of observations (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error, based on χ2). The
smaller the P value, the greater the probability that the samples are drawn from populations
with different distributions among the categories. Traditionally, you conclude that there are
significant differences when P < 0.05.
7.4.6.3 Power
The power, or sensitivity, of a Chi-Square test is the probability that the test will detect a
difference among the groups if there really is a difference. The closer the power is to 1, the
more sensitive the test. Chi-Square power is affected by the sample size and the observed
proportions of the samples. This result is displayed if you selected this option in the Options
for Chi-Square dialog box.
Alpha. Alpha (α) is the acceptable probability of incorrectly concluding that there is a
difference. An α error is also called a Type I error (a Type I error is when you reject the
hypothesis of no effect when this hypothesis is true).
Set the α value in the Power Option dialog box. The suggested value is α = 0.05,
which indicates that a one in twenty chance of error is acceptable. Smaller values of α result
in stricter requirements before concluding there is a difference in distribution, but a greater
possibility of concluding there is no difference when one exists (a Type II error). Larger
values of α make it easier to conclude that there is a difference, but also increase the risk
of seeing a false difference (a Type I error).
1. Enter or arrange your data in the data worksheet. For more information, see 7.5.3
Arranging Fisher Exact Test Data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→Fisher Exact Test
4. Run the test. For more information, see 7.5.4 Running a Fisher Exact Test.
5. View and interpret the Fisher Exact Test report. For more information, see 7.5.5
Interpreting Results of a Fisher Exact Test.
Columns 1 and 2 in the worksheet above are arranged as a 2 x 2 contingency table, and
columns 3 and 4 are the raw observation data.
Tabulated Data. Tabulated or contingency table data uses the rows to represent the two
groups, and the columns to represent the two categories, or vice versa. The number of
individuals that fall into each combination of groups and categories is entered into each cell.
There should be no more than two rows and two columns.
Raw Data. Raw data uses a row for each individual observation, and places the corresponding
groups for the observations in one column and the categories in a second column. There
should be no more than two different groups and two types of categories.
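SigmaPlot builds the table from raw data itself. As an illustration only, this minimal Python sketch shows how raw observations (one group label and one category label per worksheet row) correspond to the counts in a tabulated contingency table. The function name `tabulate_raw` and the alphabetical ordering of rows and columns are assumptions of the sketch, not SigmaPlot behavior.

```python
from collections import Counter


def tabulate_raw(groups, categories):
    """Count raw observations into a contingency table.

    groups[i] and categories[i] describe observation i. Rows of the returned
    table are groups and columns are categories, both sorted alphabetically
    (an assumption of this sketch).
    """
    if len(groups) != len(categories):
        raise ValueError("each observation needs both a group and a category")
    row_labels = sorted(set(groups))
    col_labels = sorted(set(categories))
    counts = Counter(zip(groups, categories))
    table = [[counts[(g, c)] for c in col_labels] for g in row_labels]
    return row_labels, col_labels, table


groups = ["treated", "treated", "control", "control", "treated", "control"]
outcomes = ["cured", "not cured", "not cured", "not cured", "cured", "cured"]
rows, cols, table = tabulate_raw(groups, outcomes)
print(rows, cols, table)
# ['control', 'treated'] ['cured', 'not cured'] [[1, 2], [2, 1]]
```

Both layouts carry the same information; the raw layout simply lists each individual instead of the cell totals.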
To run a test, you need to select the data to test. Use the Test Wizard to select the worksheet
columns with the data you want to test and to specify how your data is arranged in the
worksheet.
To run a Fisher Exact Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→Fisher Exact Test
The Fisher Exact — Data Format dialog box appears prompting you to specify a data
format.
Figure 7.17 The Fisher Exact — Data Format Dialog Box Prompting You to
Specify a Data Format
4. Select the appropriate data format from the Data Format drop-down list. If you are
testing contingency table data, select Tabulated. If your data is arranged in raw format,
select Raw. For more information, see 7.5.3 Arranging Fisher Exact Test Data.
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns
in the worksheet, or select the columns from the Data for Observations or Category
drop-down list.
The first selected column is assigned to the first Observation or Category row in the
Selected Columns list, and all successively selected columns are assigned to successive
rows in the list. The title of selected columns appears in each row. For raw data, you are
prompted to select two worksheet columns. For tabulated data you are prompted to
select up to 64 columns.
7. To change your selections, select the assignment in the list, then select a new column from
the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
Figure 7.18 The Fisher Exact — Select Data Dialog Box Prompting You to
Select Data Columns
8. Click Finish to run the test. If there are no cells in the table with expected values below 5,
SigmaPlot suggests the χ2 test instead. (You can use the Fisher Exact Test, but it takes
longer to compute.)
Note
The Fisher Exact Test computes the exact two-tailed probabilities of observing
a specific 2 x 2 contingency table, and does not require that the expected
frequencies in all cells exceed 5.
The Fisher Exact Test is performed. When the test is complete, the Fisher Exact Test
report appears. For more information, see 7.5.5 Interpreting Results of a Fisher Exact Test.
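The check in step 8 depends on the expected frequency of each cell. A minimal Python sketch, assuming the usual definition under independence (row total × column total ÷ grand total); SigmaPlot performs this check internally, and the function names are hypothetical.

```python
def expected_frequencies(table):
    """Expected count for each cell if rows and columns are independent:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def any_expected_below(table, threshold=5.0):
    """True if at least one expected frequency is below the threshold."""
    return any(e < threshold for row in expected_frequencies(table) for e in row)


table = [[3, 9], [8, 2]]
print(expected_frequencies(table))  # [[6.0, 6.0], [5.0, 5.0]]
print(any_expected_below(table))    # False
```

For this table no expected value is below 5, which is the situation in which the χ2 test is suggested instead.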
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text, and set the number of decimal places to display, in the Options dialog
box. For more information, see Setting Report Options.
7.5.5.1 P Value
The P value is the two-tailed probability of being wrong in concluding that there is a true
difference in the distribution of the numbers of observations (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P
value, the greater the probability that the samples are drawn from populations with different
distributions among the two categories.
Traditionally, you conclude that there are significant differences when P < 0.05.
Note
The Fisher Exact Test computes P directly using a two-tailed probability.
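With the row and column totals fixed, the count in one cell of a 2 x 2 table follows a hypergeometric distribution, which is what makes an exact P possible. This Python sketch assumes the common two-tailed convention of summing the probabilities of all tables that are no more probable than the observed one; the guide does not state which convention SigmaPlot uses, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    With the row and column totals held fixed, the top-left cell follows a
    hypergeometric distribution. The P value sums the probabilities of every
    table that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # probability that the top-left cell equals x, given the fixed totals
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # the small relative tolerance keeps tables tied with the observed one
    p = sum(q for q in probs if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


print(fisher_exact_two_tailed(1, 9, 11, 3))  # about 0.00276
```

Because every possible table is enumerated, the run time grows with the table totals, which is why the χ2 test is faster when its assumptions hold.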
7.6 McNemar’s Test
Row Percentage. The percentage of observations in each row of the contingency table,
obtained by dividing the observed frequency counts in the cells by the total number of
observations in that row.
Column Percentage. The percentage of observations in each column of the contingency
table, obtained by dividing the observed frequency counts in the cells by the total number of
observations in that column.
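The two percentages above differ only in which total the cell count is divided by. A minimal Python sketch (hypothetical function names; every row and column total is assumed to be nonzero):

```python
def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


def column_percentages(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]


table = [[15, 5], [10, 30]]
print(row_percentages(table))     # [[75.0, 25.0], [25.0, 75.0]]
print(column_percentages(table))  # first column: 60.0 and 40.0
```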
1. Enter or arrange your data appropriately in the data worksheet. For more information,
see 7.6.3 Arranging McNemar Test Data.
2. View and interpret the McNemar Test report. For more information, see 7.6.6 Interpreting
Results of McNemar’s Test.
Tabulated Data. For tabulated or contingency table data, the worksheet rows correspond to
one set of treatment categories and the columns to the other set of treatment categories. The
number of individuals that fall into each combination of the categories is entered into each
cell. The categories assigned to the rows are assumed to be in the same order as the
columns. Because the same set of categories is used for the two different treatments, the
table always has the same number of rows and columns.
Raw Data. Raw data uses a row for each individual observation, and places the corresponding
groups for the first treatment category in one column and the second treatment category in a
second column. There should be the same number of categories in each column.
Specify the data format to use when running a test in the Pick Columns dialog box.
Columns 1 through 3 in the worksheet above are arranged as a 3 x 3 contingency table, and
columns 4 and 5 are raw observation data.
7.6.4.1 Options for McNemar’s
Use the McNemar Test options to enable the Yates Correction Factor.
To change McNemar Test options:
1. If you are going to run the test after changing test options and want to select your data
before you run the test, drag the pointer over your data.
2. Select McNemar Test from the Tests drop-down list in the Statistics group on the
Analysis tab.
3. Click Options.
4. Select Yates Correction Factor to include the Yates Correction Factor in the test report.
For more information, see 7.6.4.1 Options for McNemar’s.
5. To continue the test, click Run Test.
6. To close the options dialog box and accept the current settings without continuing the
test, click OK.
Yates correction is applied to 2 x 2 tables and other statistics where the P value is computed
from a χ2 distribution with one degree of freedom.
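For a 2 x 2 table of paired responses, McNemar's statistic uses only the two off-diagonal cells, and the Yates correction reduces their absolute difference by 1 before squaring. This Python sketch follows that textbook form; it is an illustration under that assumption, not SigmaPlot's code, and the function name is hypothetical. The P value uses the identity P(χ2 > x) = erfc(√(x/2)) for one degree of freedom.

```python
from math import erfc, sqrt


def mcnemar_2x2(table, yates=True):
    """McNemar chi-square and P value for a 2 x 2 table of paired responses.

    Only the off-diagonal (discordant) cells b and c enter the statistic;
    the Yates correction subtracts 1 from |b - c| before squaring.
    """
    (_, b), (c, _) = table  # diagonal cells are ignored
    discordant = b + c
    if discordant == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c)
    if yates:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / discordant
    # upper tail of the chi-square distribution with one degree of freedom
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


print(mcnemar_2x2([[20, 12], [4, 30]], yates=False))  # chi2 = 4.0
print(mcnemar_2x2([[20, 12], [4, 30]], yates=True))   # chi2 = 3.0625
```

The corrected statistic is smaller, so its P value is larger (more conservative).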
To run the McNemar Test, you need to select the data to test. Use the Pick Columns dialog
box to select the worksheet columns with the data you want to test and to specify how your
data is arranged in the worksheet.
To run McNemar’s Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→McNemar’s Test
The Pick Columns dialog box appears prompting you to specify a data format.
Figure 7.22 The McNemar’s — Data Format Dialog Box Prompting You to
Specify a Data Format
4. Select the appropriate data format from the Data Format drop-down list. If you are
testing contingency table data, select Tabulated. If your data is arranged in raw format,
select Raw. For more information, see 7.6.3 Arranging McNemar Test Data.
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns
in the worksheet, or select the columns from the Data for Observations or Category
drop-down list.
The first selected column is assigned to the first Observation or Category row in the
Selected Columns list, and all successively selected columns are assigned to successive
rows in the list. The title of selected columns appears in each row. For raw data, you
are prompted to select two worksheet columns. For tabulated data you are prompted to
select up to 64 worksheet columns.
7. To change your selections, select the assignment in the list, then select a new column from
the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
Figure 7.23 The McNemar’s — Select Data Dialog Box Prompting You to Select
Data Columns
8. Click Finish to run the test. The McNemar’s test report appears. For more information,
see 7.6.6 Interpreting Results of McNemar’s Test.
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text, and set the number of decimal places to display, in the Options dialog
box. For more information, see Setting Report Options.
7.6.6.1 Chi-Square
χ2 is the summed squared differences between the observed frequencies in each cell of the
table and the expected frequencies, ignoring observations on the diagonal cells of the table
where the individuals responded identically to the treatments.
Large values of the χ2 test statistic indicate that individuals responded differently to the
different treatments (that is, that there are differences between the expected and
observed frequencies).
Values of χ2 near zero indicate that the pattern in the contingency table is no different from
what one would expect if the counts were distributed at random.
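One common way to compute a χ2 that ignores the diagonal of a k x k table is Bowker's test of symmetry: each off-diagonal cell is compared with the average of itself and its mirror cell, which reduces to McNemar's statistic for a 2 x 2 table. The Python sketch below assumes SigmaPlot's calculation is equivalent; skipping empty cell pairs when counting degrees of freedom is also an assumption. The helper `chi2_sf` is hypothetical and computes the χ2 upper tail from the series for the incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable X with df degrees of freedom,
    using the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def mcnemar_square(table):
    """Chi-square for a k x k table of paired responses, ignoring the diagonal.

    Under the null hypothesis the expected count in cell (i, j) and in its
    mirror cell (j, i) is their average, so each pair contributes
    (n_ij - n_ji)**2 / (n_ij + n_ji).
    """
    k = len(table)
    chi2, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # empty pairs carry no information
                chi2 += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return chi2, df, chi2_sf(chi2, df)


table = [[10, 5, 2], [1, 12, 3], [4, 1, 9]]
chi2, df, p = mcnemar_square(table)
print(round(chi2, 3), df, round(p, 3))
```

A symmetric table (every cell equal to its mirror) gives χ2 = 0, the "no difference" case described above.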
P Value. The P value is the probability of being wrong in concluding that there is a true
difference in the distribution of the numbers of observations (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error). The smaller the P
value, the greater the probability that the samples are drawn from populations with different
distributions among the categories. Traditionally, you conclude that there are significant
differences when P < 0.05.
The null hypothesis for the Relative Risk Test is that the value of RR for the entire population
is 1. If the computed value of RR is significantly different from 1, then the treatment either
significantly increases or decreases the risk of the event in the population.
The data for a Relative Risk Test can always be represented in a 2x2 contingency table. The
probability of significance calculation for the test is based on the chi-square statistic for this
table. If the expected number of observations for any cell of the table is less than 5, then the
Fisher-Exact test is used to compute the probability. For more information, see 7.8.1 About
the Odds Ratio Test.
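To make the tested quantity concrete, this Python sketch computes the relative risk from a 2 x 2 table with the treatment group in the first row and the event in the first column. The confidence interval uses the common log (Katz) method; the guide does not describe SigmaPlot's interval method, so treat that part, and the function name, as assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk(table, confidence=95):
    """Relative risk with a log-method confidence interval.

    table = [[a, b], [c, d]]: first row = treatment group, second row =
    control group; first column = event, second column = no event.
    Requires a > 0 and c > 0.
    """
    (a, b), (c, d) = table
    risk_treated = a / (a + b)
    risk_control = c / (c + d)
    rr = risk_treated / risk_control
    # standard error of ln(RR)
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


rr, lower, upper = relative_risk([[30, 70], [15, 85]])
print(rr, round(lower, 2), round(upper, 2))  # RR = 2.0; interval roughly 1.15 to 3.48
```

Here 30% of the treated group and 15% of the control group had the event, so RR = 2; an interval that excludes 1 is consistent with concluding that RR differs from 1.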
1. Enter or arrange your data appropriately in the data worksheet. For more information, see
7.7.3 Arranging Relative Risk Test Data.
2. If desired, set the Relative Risk options. For more information, see 7.7.4 Setting Relative
Risk Test Options.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select Relative Risk from the Select Test drop-down list in the Statistics group on the
Analysis tab.
3. Click Current Test Options. The Options for Relative Risk dialog box appears. For
more information, see 7.7.4.1 Options for Relative Risk.
4. Click a check box to enable or disable a test option. All options are saved between
SigmaPlot sessions.
5. To continue the test, click Run Test. For more information, see 7.7.5 Running the
Relative Risk Test.
6. To accept the current settings and close the options dialog box, click OK.
To run the Relative Risk Test, you need to select the data to test. Use the Test Wizard to
select the worksheet columns with the data you want to test and to specify how your data is
arranged in the worksheet.
To run the Relative Risk Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
Figure 7.25 The Relative Risk - Data Format Dialog Box Prompting You to
Specify a Data Format
4. Select the appropriate data format from the Data Format drop-down list. If you are
testing contingency table data, select Tabulated. If your data is arranged in raw format,
select Raw. For more information, see 7.7.3 Arranging Relative Risk Test Data.
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Observations or
Category drop-down list.
If you selected the tabulated data format, the first selected column is assigned to the
Event row in the Selected Columns list and the second selected column is assigned to the
No Event row in the list. If the raw data format was selected, the first selected column is
assigned to the Event row in the Selected Columns list and the second selected column
is assigned to the Group row in the list.
7. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
Figure 7.26 The Relative Risk - Select Data Dialog Box Prompting You to Select
Data Columns
8. Click Finish to run the test. The Relative Risk test report appears. For more information,
see 7.7.6 Interpreting Results of the Relative Risk Test.
The odds ratio estimates the ratio of the odds of the event for an individual in the
population exposed to the risk factor to the odds of the event for an individual not exposed
to the risk factor.
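A minimal Python sketch of the calculation, assuming the exposed group is in the first row and the event in the first column. The confidence interval uses the common log (Woolf) method, which is an assumption here since the guide does not specify SigmaPlot's method; the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio(table, confidence=95):
    """Odds ratio with a log-method (Woolf) confidence interval.

    table = [[a, b], [c, d]]: first row = group exposed to the risk factor,
    second row = unexposed group; first column = event, second = no event.
    All four cells must be greater than zero.
    """
    (a, b), (c, d) = table
    ratio = (a / b) / (c / d)  # odds of the event, exposed vs. unexposed
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)


ratio, lower, upper = odds_ratio([[30, 70], [15, 85]])
print(round(ratio, 3), round(lower, 2), round(upper, 2))  # OR about 2.429
```

For the same table the relative risk is 2.0, while the odds ratio is about 2.43: the two measures agree closely only when the event is rare.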
1. Enter or arrange your data appropriately in the data worksheet. For more information,
see 7.8.3 Arranging Odds Ratio Test Data.
2. View and interpret the Odds Ratio Test report. For more information, see 7.8.6
Interpreting Results of the Odds Ratio Test.
7.8.4 Setting Odds Ratio Test Options
for event and the two levels for treatment. To distinguish the two levels in the event column,
the label in the first row will always represent the event. In the treatment column, the label
that represents the treatment is determined by a setting in the Test Options dialog box. For
more information, see 7.8.4 Setting Odds Ratio Test Options.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Current Test Options
The Options for Odds Ratio dialog box appears. For more information, see 7.8.4.1
Options for Odds Ratio.
4. Click a check box to enable or disable a test option. All options are saved between
SigmaPlot sessions.
5. To continue the test, click Run Test. For more information, see 7.8.5 Running the Odds
Ratio Test.
6. To accept the current settings and close the options dialog box, click OK.
values which are too small when compared with the actual distribution of the χ2 test statistic.
The theoretical χ2 distribution is continuous, whereas the χ2 produced with real data is discrete.
The Yates continuity correction adjusts the chi-square statistic so that P values computed
from the continuous chi-square probability distribution are more accurate.
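For a 2 x 2 table, the Yates correction subtracts 0.5 from each absolute difference between observed and expected frequencies before squaring. This Python sketch shows the statistic with and without the correction; it follows the textbook formula and is not taken from SigmaPlot (the function name is hypothetical).

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=True):
    """Pearson chi-square for a 2 x 2 table, optionally Yates-corrected.

    The Yates correction subtracts 0.5 from each |observed - expected|
    before squaring, which lowers the statistic and raises the P value.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # one degree of freedom: P(chi-square > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, erfc(sqrt(chi2 / 2))


table = [[30, 70], [15, 85]]
print(chi_square_2x2(table, yates=False))  # chi2 about 6.45
print(chi_square_2x2(table, yates=True))   # chi2 about 5.62
```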
Confidence Interval. This is the confidence interval for the population value of the odds
ratio. To change the specified interval, select the box and type any number from 1 to 99 (95
and 99 are the most commonly used intervals).
Use the first row of the selected data as the treatment group. The Odds Ratio Test assumes
the population has been sampled into two groups, where the members of one group receive a
treatment and the members of the other group do not. This option is selected by default, so
the first row of data represents the treatment group; use it to specify which group in the
worksheet is the treatment group. This option applies to both tabulated and raw data formats.
To run the Odds Ratio Test, you need to select the data to test. Use the Pick Columns dialog
box to select the worksheet columns with the data you want to test and to specify how your
data is arranged in the worksheet.
To run the Odds Ratio Test:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Rates and Proportions→Odds Ratio Test
The Odds Ratio - Data Format dialog box appears prompting you to specify a data
format.
4. Select the appropriate data format from the Data Format drop-down list. If you are
testing contingency table data, select Tabulated. If your data is arranged in raw format,
select Raw. For more information, see 7.8.3 Arranging Odds Ratio Test Data.
5. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list. If you have not selected
columns, the dialog box prompts you to pick your data.
6. To assign the desired worksheet columns to the Selected Columns list, select the columns
in the worksheet, or select the columns from the Data for Observations or Category
drop-down list.
If the tabulated data format was selected, the first selected column is assigned to the
Event row in the Selected Columns list and the second selected column is assigned to the
No Event row in the list. If the raw data format was selected, the first selected column is
assigned to the Event row in the Selected Columns list and the second selected column
is assigned to the Group row in the list.
7. To change your selections, select the assignment in the list, then select a new column from
the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
8. Click Finish to run the test. The Odds Ratio test report appears. For more information,
see 7.8.6 Interpreting Results of the Odds Ratio Test.
8 Prediction and Correlation
Topics Covered in this Chapter
♦ About Regression
♦ Simple Linear Regression
♦ Multiple Linear Regression
♦ Multiple Logistic Regression
♦ Polynomial Regression
♦ Stepwise Linear Regression
♦ Best Subsets Regression
♦ Pearson Product Moment Correlation
♦ Spearman Rank Order Correlation
♦ Deming Regression
Prediction uses regression and correlation techniques to describe the relationship between two
or more variables. For more information, see 3.7 Choosing the Prediction or Correlation
Method.
Multiple Linear Regression is similar to simple linear regression, but uses multiple independent
variables to fit the general equation for a multidimensional plane

$$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_kx_k$$

where $y$ is the dependent variable, $x_1, x_2, x_3, \dots, x_k$ are the $k$ independent variables,
$b_0$ is the constant (intercept), and $b_1, b_2, b_3, \dots, b_k$ are the $k$ regression coefficients.
As the value of $x_i$ increases by 1 (with the other independent variables held constant), the
corresponding value of $y$ increases or decreases by $b_i$, depending on the sign of $b_i$.
Regression is a parametric statistical method that assumes that the residuals (differences
between the predicted and observed values of the dependent variable) are normally distributed
with constant variance.
Because the regression coefficients are computed by minimizing the sum of squared residuals,
this technique is often called least squares regression.
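For the special case of one independent variable (simple linear regression), the least squares coefficients have a closed form. This Python sketch is illustrative only and uses made-up data; with several independent variables the same criterion is minimized by solving the normal equations, which is not shown here. The function name is hypothetical.

```python
def least_squares_line(x, y):
    """Fit y = b0 + b1*x by minimizing the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, residuals = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))  # 0.05 1.99
```

A side effect of least squares with an intercept is that the residuals always sum to zero.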
8.1.1 Correlation
Correlation procedures measure the strength of association between two variables, which can
be used as a gauge of the certainty of prediction. Unlike regression, it is not necessary to
define one variable as the independent variable and one as the dependent variable.
The correlation coefficient r is a number that varies between –1 and +1. A correlation of –1
indicates there is a perfect negative relationship between the two variables, with one always
decreasing as the other increases. A correlation of +1 indicates there is a perfect positive
relationship between the two variables, with both always increasing together. A correlation of
0 indicates no relationship between the two variables.
There are two types of correlation coefficients.
• The Pearson Product Moment Correlation, a parametric statistic which assumes a normal
distribution and constant variance of the residuals. For more information, see 8.8 Pearson
Product Moment Correlation.
• The Spearman Rank Order Correlation, a nonparametric association test that does not
require assuming normality or constant variance of the residuals. For more information,
see 8.9 Spearman Rank Order Correlation.
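Both coefficients can be computed in a few lines. In this Python sketch (hypothetical function names, not a SigmaPlot API), the Spearman coefficient is computed as the Pearson coefficient of the ranks, with tied values given the average of the ranks they span, which is one standard way to handle ties.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y))     # below 1: the relationship is not a straight line
print(spearman_rho(x, y))  # 1.0: y always increases as x increases
```

Because it uses only ranks, the Spearman coefficient reaches +1 for any relationship in which one variable always increases with the other.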
8.2.1 About the Simple Linear Regression
• You want to predict a trend in data, or predict the value of a variable from the value of
another variable, by fitting a straight line through the data.
• You know there is exactly one independent variable.
The independent variable is the known, or predictor, variable, such as time or temperature.
When the independent variable is varied, it produces a corresponding value for the dependent,
or response, variable. If you know there is more than one independent variable, use multiple
linear regression.
1. Enter or arrange your data in the worksheet. For more information, see 8.2.3 Arranging
Linear Regression data.
2. If desired, set the Linear Regression options. For more information, see 8.2.4 Setting
Linear Regression Options.
3. Select the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Regression→Linear
5. Run the test. For more information, see 8.2.5 Running a Linear Regression.
6. Generate report graphs. For more information, see 8.2.7 Simple Linear Regression
Report Graphs.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select the Analysis tab.
3. In the Statistics group, click Options. The Options for Linear Regression dialog box
appears with four tabs:
• Assumption Checking. Click the Assumption Checking tab to view the
Normality, Constant Variance, and Durbin-Watson options. For more information, see
8.2.4.1 Options for Linear Regression: Assumption Checking.
• Residuals. Click the Residuals tab to view the residual options. For more information,
see 8.2.4.2 Options for Linear Regression: Residuals.
• More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS
Prediction Error, and Standardized Coefficients options. For more information, see
8.2.4.3 Options for Linear Regression: More Statistics.
• Other Diagnostics. Click the Other Diagnostics tab to view the Influence and Power
options. For more information, see 8.2.4.4.
4. Select a check box to enable or disable a test option. Options settings are saved between
SigmaPlot sessions. For more information, see 8.2.6 Interpreting Simple Linear
Regression Results.
5. To continue the test, click Run Test.
6. To accept the current settings and close the options dialog box, click OK.
8.2.4.1 Options for Linear Regression: Assumption Checking
To relax the requirement of independence, increase the acceptable difference from 2.0.
Select the Residuals tab in the options dialog box to view the Predicted Values, Raw,
Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.
Predicted Values. Use this option to calculate the predicted value of the dependent variable
for each observed value of the independent variable(s), then save the results to the worksheet.
Click the selected check box if you do not want to include predicted values in the worksheet.
To assign predicted values to a worksheet column, select the worksheet column you want to
save the predicted values to from the corresponding drop-down list. If you select none and the
Predicted Values check box is selected, the values appear in the report but are not assigned to
the worksheet.
Raw Residuals. The raw residuals are the differences between the predicted and observed
values of the dependent variable. To include raw residuals in the report, make sure this check
box is selected. Click the selected check box if you do not want to include raw residuals in
the worksheet.
To assign the raw residuals to a worksheet column, select the number of the desired column
from the corresponding drop-down list. If you select none from the drop-down list and the Raw
check box is selected, the values appear in the report but are not assigned to the worksheet.
Standardized Residuals. The standardized residual is the residual divided by the standard
error of the estimate. The standard error of the residuals is essentially the standard deviation
of the residuals, and is a measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected. Click the selected
check box if you do not want to include standardized residuals in the worksheet.
SigmaPlot automatically flags data points whose standardized residuals exceed the value
specified in the Flag Values > edit box. These data points are considered to have "large"
standardized residuals, that is, they are outlying data points. You can change which data
points are flagged by editing this value. The suggested residual value is 2.5.
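A minimal Python sketch of the calculation for a straight-line fit, using made-up data with one outlier. It assumes residual = observed minus predicted and a standard error of the estimate with n − k − 1 degrees of freedom; the function name and data are hypothetical, and SigmaPlot performs this internally.

```python
from math import sqrt


def standardized_residuals(x, y, flag_value=2.5):
    """Fit y = b0 + b1*x, divide each residual by the standard error of the
    estimate, and flag points whose |standardized residual| > flag_value."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    k = 1  # one independent variable
    s = sqrt(sum(e * e for e in residuals) / (n - k - 1))  # standard error of the estimate
    standardized = [e / s for e in residuals]
    flagged = [i for i, r in enumerate(standardized) if abs(r) > flag_value]
    return standardized, flagged


x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[5] += 10  # one outlying observation
standardized, flagged = standardized_residuals(x, y)
print(flagged)  # [5]
```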
Studentized Residuals. Studentized residuals scale the standardized residuals by taking
into account the greater precision of the regression line near the middle of the data versus
the extremes. The Studentized residuals tend to be distributed according to the Student t
distribution, so the t distribution can be used to define "large" values of the Studentized
residuals. SigmaPlot automatically flags data points with "large" values of the Studentized
residuals, that is, outlying data points; by default, the flagged data points lie outside the
95% confidence interval for the regression population.
To include Studentized residuals in the report, make sure this check box is selected. Click the
selected check box if you do not want to include Studentized residuals in the worksheet.
Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
To include Studentized deleted residuals in the report, make sure this check box is selected.
Click the selected check box if you do not want to include Studentized deleted residuals in
the worksheet.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; by default, the flagged data points lie outside the
95% confidence interval for the regression population.
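For a straight-line fit, both quantities can be computed from the leverage h of each point. This Python sketch uses the textbook definitions: the Studentized residual is e / (s·√(1 − h)), and the Studentized deleted residual uses the standard identity t = r·√((n − p − 1)/(n − p − r²)), which gives the same result as refitting without the point. The data and function name are hypothetical, and SigmaPlot's internal formulas are assumed, not confirmed, to match.

```python
from math import sqrt


def studentized_residuals(x, y):
    """Studentized and Studentized deleted residuals for y = b0 + b1*x.

    The deleted version uses t = r * sqrt((n - p - 1) / (n - p - r**2)),
    which is equivalent to computing the residual from a fit that leaves
    the point out (undefined if that fit would be perfect).
    """
    n, p = len(x), 2  # p = number of coefficients (b0 and b1)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in residuals) / (n - p))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    student = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    deleted = [r * sqrt((n - p - 1) / (n - p - r * r)) for r in student]
    return student, deleted


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9, 20.0]
student, deleted = studentized_residuals(x, y)
print([round(r, 2) for r in student])
print([round(t, 2) for t in deleted])
```

The deleted residual for the last point is far larger than its Studentized residual because that point pulls the fitted line toward itself.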
Note
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, make sure the Report Flagged Values Only check box is
selected. Clear this option to include all standardized and Studentized residuals in the report.
Select DFFITS to compute this value for all points and flag influential points, that is,
those with DFFITS greater than the value specified in the Flag Values > edit box. The
suggested value is 2.0 standard errors, which indicates that the point has a strong influence on
the data. To avoid flagging more influential points, increase this value; to flag less influential
points, decrease this value.
• Leverage. Leverage is used to identify the potential influence of a point on the results of the
regression equation. Leverage depends only on the value of the independent variable(s).
Observations with high leverage tend to be at the extremes of the independent variables,
where small changes in the independent variables can have large effects on the predicted
values of the dependent variable.
The expected leverage of a data point is $(k+1)/n$, where there are $k$ independent variables and $n$ data
points. Observations with leverages much higher than the expected leverages are potentially
influential points.
Select Leverage to compute the leverage for each point and automatically flag potentially
influential points, that is, points with leverages greater than the specified
value times the expected leverage. The suggested value is 2.0 times the expected leverage
for the regression. To avoid flagging more potentially influential points, increase this value;
to flag points with less potential influence, lower this value.
• Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on
the estimates of the parameters in the regression equation. Cook’s distance assesses how
much the values of the regression coefficients change if a point is deleted from the analysis.
Cook’s distance depends on both the values of the independent and dependent variables.
Select Cook’s Distance to compute this value for all points and flag influential points, that
is, those with a Cook’s distance greater than the specified value. The suggested value is
4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances
exceeding 4 indicate that the point has a major effect on the values of the parameter estimates.
To avoid flagging more influential points, increase this value; to flag less influential points,
lower this value.
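The three influence measures can be computed together for a straight-line fit. This Python sketch uses the textbook formulas (leverage h = 1/n + (x − x̄)²/Sxx, Cook's distance r²h / (p(1 − h)) from the Studentized residual r, and DFFITS = t·√(h/(1 − h)) from the Studentized deleted residual t) and the suggested flag values quoted above. The data and function name are hypothetical, and SigmaPlot's exact formulas are assumed rather than confirmed.

```python
from math import sqrt


def influence_diagnostics(x, y):
    """Leverage, Cook's distance, and DFFITS for a straight-line fit y = b0 + b1*x."""
    n, p = len(x), 2  # p = k + 1 coefficients, with k = 1 independent variable
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in residuals) / (n - p))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    student = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    deleted = [r * sqrt((n - p - 1) / (n - p - r * r)) for r in student]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(student, leverage)]
    dffits = [t * sqrt(h / (1 - h)) for t, h in zip(deleted, leverage)]
    return leverage, cooks, dffits


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9, 20.0]
leverage, cooks, dffits = influence_diagnostics(x, y)
expected_leverage = 2 / len(x)  # (k + 1) / n
flag_leverage = [i for i, h in enumerate(leverage) if h > 2.0 * expected_leverage]
flag_cooks = [i for i, d in enumerate(cooks) if d > 4.0]
flag_dffits = [i for i, f in enumerate(dffits) if abs(f) > 2.0]
print(flag_leverage, flag_cooks, flag_dffits)  # [] [] [7]
```

With these data only DFFITS flags the last point: its Cook's distance (about 2.1) is above 1, so it is possibly influential, but it stays below the suggested flag value of 4.0.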
Report Flagged Values Only. To include only the influential points flagged by the
influential point tests in the report, make sure you’ve selected Report Flagged Values Only.
Clear this check box to include all influential points in the report.
8.2.5 Running a Linear Regression
To run a Simple Linear Regression, you need to select the data to test. You use the Pick
Columns dialog box to select the worksheet columns with the data you want to test.
To run a Linear Regression:
1. If you want to select your data before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Regression→Linear
The Pick Columns for Linear Regression dialog box appears. If you selected columns
before you chose the test, the columns appear in the Selected Columns list. If you have
not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Dependent or Data
for Independent drop-down list.
The first selected column is assigned to the dependent row in the Selected Columns list,
and the second column is assigned to the independent row in the list. The titles of the selected
columns appear in each row. You can only select one dependent and one independent
data column.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance,
and independent residuals, SigmaPlot performs the tests for normality (Shapiro-Wilk or
Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail
any of these tests, SigmaPlot warns you. When the test is complete, the Simple Linear
Regression report appears.
If you selected to place predicted values and residuals the worksheet, they are placed in
the specified column and are labeled by content and source column.
Which other results appear in the report depends on the options you enable or disable in the
Options for Linear Regression dialog box. For more information, see 8.2.4 Setting Linear
Regression Options.
Result Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
8.2.6.5 Beta (Standardized Coefficient b)
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from "large" t values that the independent variable can be used to predict the
dependent variable (that is, that the coefficient is not zero).
P Value. P is the P value calculated for t. The P value is the probability of being wrong in
concluding that there is a true association between the variables (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller
the P value, the greater the probability that the independent variable can be used to predict the
dependent variable.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
MS (Mean Square). The mean square provides two estimates of the population variances.
Comparing these variance estimates is the basis of analysis of variance.
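The mean square regression is a measure of the variation of the regression from the mean of
the dependent variable, or
$$MS_{reg} = \frac{\text{regression sum of squares}}{\text{regression degrees of freedom}} = \frac{SS_{reg}}{DF_{reg}}$$
The residual mean square is a measure of the variation of the residuals about the regression
line, or
$$MS_{res} = \frac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \frac{SS_{res}}{DF_{res}}$$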
If F, the ratio of the regression mean square to the residual mean square ($F = MS_{reg}/MS_{res}$),
is a large number, you can conclude that the independent variable contributes to the
prediction of the dependent variable (that is, the slope of the line is different from zero,
and the "unexplained variability" is smaller than what is expected from random sampling
variability). If the F ratio is around 1, you can conclude that there is no association between
the variables (that is, the data are consistent with the null hypothesis that all the samples
are just randomly distributed about the population mean, regardless of the value of the
independent variable).
P Value. The P value is the probability of being wrong in concluding that there is an
association between the dependent and independent variables (that is, the probability of
falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the
P value, the greater the probability that there is an association.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
Tip
In simple linear regression, the P value for the ANOVA is identical to the P value
associated with the t of the slope coefficient, and $F = t^2$, where t is the t value associated
with the slope.
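The ANOVA quantities above and the $F = t^2$ relationship can be checked numerically. The following is a minimal sketch, assuming NumPy is available and an ordinary least-squares line with a constant; the function name simple_regression_anova is hypothetical, and this is not SigmaPlot's internal code.

```python
import numpy as np


def simple_regression_anova(x, y):
    """Hypothetical helper: ANOVA table entries for y = b0 + b1*x, plus the slope t value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    b1, b0 = np.polyfit(x, y, 1)  # polyfit returns the highest power first
    predicted = b0 + b1 * x
    ss_reg = np.sum((predicted - y.mean()) ** 2)  # regression sum of squares
    ss_res = np.sum((y - predicted) ** 2)         # residual sum of squares
    df_reg, df_res = 1, n - 2                     # one slope; n minus two coefficients
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    f = ms_reg / ms_res
    # t value of the slope: coefficient divided by its standard error
    se_b1 = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2))
    t = b1 / se_b1
    return {"SS_reg": ss_reg, "SS_res": ss_res, "DF_reg": df_reg,
            "DF_res": df_res, "MS_reg": ms_reg, "MS_res": ms_res,
            "F": f, "t": t}
```

For example, simple_regression_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]) gives SS_reg = 39.601 and SS_res = 0.107 (up to floating-point rounding), and F equals t squared.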
PRESS, the Predicted Residual Error Sum of Squares, is a measure of how well a regression
model predicts the observations.
The PRESS statistic is computed by summing the squares of the prediction errors (the
differences between predicted and observed values) for each observation, with that point
deleted from the computation of the estimated regression model.
8.2.6.8 Durbin-Watson Statistic
One important use of the PRESS statistic is for model comparison. If several different
regression models are applied to the same data, the one with the smallest PRESS statistic
has the best predictive capability.
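The leave-one-out definition of PRESS can be written out directly. This is a minimal sketch, assuming NumPy and a linear model with a constant term; press_statistic is a hypothetical name, and refitting once per observation is chosen for clarity rather than speed.

```python
import numpy as np


def press_statistic(X, y):
    """Hypothetical helper: PRESS for an OLS fit with a constant, by leave-one-out refits."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # one independent variable -> single column
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # fit the model without observation i
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        prediction_error = y[i] - design[i] @ coef
        press += prediction_error ** 2
    return press
```

To compare models, call press_statistic on each candidate set of independent variables for the same y and prefer the smallest value.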
8.2.6.11 Power
This result is displayed if you selected this option in the Options dialog box. The power, or
sensitivity, of a regression is the probability that the model correctly describes the
relationship of the variables, if there is a relationship.
Regression power is affected by the number of observations, the chance of erroneously
reporting a difference, α (alpha), and the correlation coefficient r associated with the regression.
Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that the model
is correct. An α error is also called a Type I error (a Type I error occurs when you reject the
hypothesis of no association when this hypothesis is true).
Set the value in the Power Options dialog box; the suggested value is α = 0.05 which indicates
that a one in twenty chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values of α make it easier
to conclude that the model is correct, but also increase the risk of accepting a bad model (a
Type I error).
The regression diagnostic results display only the predicted values, residual results,
and other diagnostics selected in the Options for Linear Regression dialog box. All results
that qualify as outlying values are flagged with a < symbol. The trigger values to flag residuals
as outliers are set in the Options for Linear Regression dialog box.
If you selected Report Cases with Outliers Only, only those observations that have one or
more residuals flagged as outliers are reported; however, all other results for that observation
are also displayed.
Row. This is the row number of the observation.
Predicted Values. This is the value for the dependent variable predicted by the regression
model for each observation.
Residuals. These are the raw residuals, the difference between the predicted and observed
values for the dependent variable.
Standardized Residuals. The standardized residual is the raw residual divided by the
standard error of the estimate $s_{yx}$.
If the residuals are normally distributed about the regression line, about 68% of the
standardized residuals have values between -1 and +1, and about 95% of the standardized
residuals have values between -2 and +2. A larger standardized residual indicates that the
point is far from the regression line; the suggested threshold for flagging a point as an outlier is 2.5.
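A minimal sketch of the standardized residual and the 2.5 flag threshold, assuming NumPy and an ordinary least-squares fit with a constant. The name standardized_residuals is hypothetical; residuals are taken as observed minus predicted, and the code is not SigmaPlot's implementation.

```python
import numpy as np


def standardized_residuals(X, y, flag_value=2.5):
    """Hypothetical helper: raw residual / s_yx for an OLS fit with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # one independent variable -> single column
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    raw = y - design @ coef  # observed minus predicted
    # Standard error of the estimate: sqrt(SS_res / (n - k - 1))
    s_yx = np.sqrt(raw @ raw / (n - k - 1))
    standardized = raw / s_yx
    return standardized, np.abs(standardized) > flag_value
```

Because the squared standardized residuals always sum to n - k - 1, a very small data set cannot produce a value above 2.5; the flag is only meaningful with a reasonable number of observations.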
Studentized Residuals. The Studentized residual is a standardized residual that also takes
into account the greater confidence of the predicted values of the dependent variable in the
"middle" of the data set. By weighting the values of the residuals of the extreme data points
(those with the lowest and highest independent variable values), the Studentized residual is
more sensitive than the standardized residual in detecting outliers.
This residual is also known as the internally Studentized residual because the standard error
of the estimate is computed using all data.
Studentized Deleted Residuals. The Studentized deleted residual, or externally Studentized
residual, is a Studentized residual that uses the standard error of the estimate $s_{yx(-i)}$,
computed after deleting the data point associated with the residual. Deleting the data point
from the variance computation reflects the greater effect of outlying points.
Both Studentized and Studentized deleted residuals that lie outside a specified confidence
interval for the regression are flagged as outlying points; the suggested confidence value is
95%.
The Studentized deleted residual is more sensitive than the Studentized residual in detecting
outliers, since the Studentized deleted residual results in much larger values for outliers than
the Studentized residual.
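Both kinds of Studentized residual can be computed from the hat matrix of the fit. This is a minimal sketch, assuming NumPy and a least-squares model with a constant; studentized_residuals is a hypothetical name. It uses the standard identity $SS_{res(-i)} = SS_{res} - e_i^2/(1 - h_{ii})$, so no point actually has to be refit; SigmaPlot's own computation may differ.

```python
import numpy as np


def studentized_residuals(X, y):
    """Hypothetical helper: internally and externally Studentized residuals for OLS with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    hat = design @ np.linalg.pinv(design)  # H = D (D'D)^-1 D'
    h = np.diag(hat)                       # leverage of each observation
    raw = y - hat @ y                      # observed minus predicted
    df_res = n - k - 1
    s_yx = np.sqrt(raw @ raw / df_res)
    internal = raw / (s_yx * np.sqrt(1 - h))
    # s_yx(-i) from SS_res(-i) = SS_res - e_i^2 / (1 - h_ii), with one fewer degree of freedom
    s_deleted = np.sqrt((raw @ raw - raw ** 2 / (1 - h)) / (df_res - 1))
    external = raw / (s_deleted * np.sqrt(1 - h))
    return internal, external
```

For an outlying point, the externally Studentized (deleted) value is much larger than the internal one, because the outlier no longer inflates the standard error used to scale it.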
8.2.6.13 Influence Diagnostics
Predicted. This is the value for the dependent variable predicted by the regression model
for each observation.
Regression. The confidence interval for the regression line gives the range of variable
values computed for the region containing the true relationship between the dependent and
independent variables, for the specified level of confidence.
Population. The confidence interval for the population gives the range of variable values
computed for the region containing the population from which the observations were drawn,
for the specified level of confidence.
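The two intervals differ only in whether the scatter of a single new observation is added to the uncertainty of the fitted line. The following is a minimal sketch, assuming NumPy and a least-squares fit with a constant; regression_and_population_intervals is a hypothetical name, and the caller supplies the two-sided critical t value for n - k - 1 degrees of freedom (for example from a t table, about 3.182 for 95% and 3 degrees of freedom) because NumPy has no t quantile function.

```python
import numpy as np


def regression_and_population_intervals(X, y, t_crit):
    """Hypothetical helper: per-observation regression and population confidence limits.

    t_crit is the two-sided critical t value for n - k - 1 degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    hat = design @ np.linalg.pinv(design)
    h = np.diag(hat)
    predicted = hat @ y
    s_yx = np.sqrt(np.sum((y - predicted) ** 2) / (n - k - 1))
    # Regression (mean response) interval: uncertainty of the fitted line only
    half_regression = t_crit * s_yx * np.sqrt(h)
    # Population (prediction) interval: adds the scatter of a single observation
    half_population = t_crit * s_yx * np.sqrt(1.0 + h)
    return (predicted,
            predicted - half_regression, predicted + half_regression,
            predicted - half_population, predicted + half_population)
```

The regression interval is narrowest at the mean of the independent variable, and the population interval is always wider than the regression interval.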
The Create Result Graph dialog box appears displaying the types of graphs available for
the Linear Regression results.
3. Select the type of graph you want to create from the Graph Type list, then click OK.
For more information, see 11.1 Generating Report Graphs.
8.3.1 About the Multiple Linear Regression
• You know there are two or more independent variables and want to find a model with
these independent variables.
The independent variables are the known, or predictor, variables. When the independent
variables are varied, they produce a corresponding value for the dependent, or response,
variable.
If you know there is only one independent variable, use Simple Linear Regression. If you
are not sure whether all independent variables should be used in the model, use Stepwise or Best
Subsets Regression to identify the important independent variables from the selected possible
independent variables.
If the relationship is not a straight line or plane, use Polynomial or Nonlinear Regression,
or use a variable transformation.
1. Enter or arrange your data appropriately in the worksheet. For more information, see 8.3.3
Arranging Multiple Linear Regression Data.
2. Generate report graphs. For more information, see 8.3.7 Multiple Linear Regression
Report Graphs.
1. If you are going to run the test after changing test options and want to select your data
before you run the test, drag the pointer over the data.
2. Click the Analysis tab.
3. In the Statistics group, select Multiple Linear Regression from the Tests drop-down list.
4. Click Options.
The Options for Multiple Linear Regression dialog box appears with four tabs:
• Assumption Checking. Click the Assumption Checking tab to view the Normality,
Constant Variance, and Durbin-Watson options. For more information, see 8.3.4.1
Options for Multiple Linear Regression: Assumption Checking.
• Residuals. Click the Residuals tab to view the residual options. For more information,
see 8.3.4.2 Options for Multiple Linear Regression: Residuals.
• More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS
Prediction Error, and Standardized Coefficients options. For more information, see 8.3.4.3
Options for Multiple Linear Regression: More Statistics.
• Other Diagnostics. Click the Other Diagnostics tab to view the Influence, Variance Inflation
Factor, and Power options. For more information, see 8.3.4.4 Options for Multiple
Linear Regression: Other Diagnostics.
5. Select or clear a check box to enable or disable a test option. Options settings are saved
between SigmaPlot sessions. For more information, see 8.3.6 Interpreting Multiple
Linear Regression Results.
6. To continue the test, click Run Test.
7. To accept the current settings and close the options dialog box, click OK.
8.3.4.1 Options for Multiple Linear Regression: Assumption Checking
Click the Residuals tab in the options dialog box to view the Predicted Values, Raw,
Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.
Predicted Values. Use this option to calculate the predicted value of the dependent variable
for each observed value of the independent variable(s), then save the results to the data
worksheet.
To assign predicted values to a worksheet column, select the worksheet column you want to
save the predicted values to from the corresponding drop-down list. If you select none and the
Predicted Values check box is selected, the values appear in the report but are not assigned to
the worksheet.
Raw Residuals. The raw residuals are the differences between the predicted and observed
values of the dependent variable. To include raw residuals in the report, make sure this
check box is selected.
To assign the raw residuals to a worksheet column, select the number of the desired column
from the corresponding drop-down list. If you select none from the drop-down list and the Raw
check box is selected, the values appear in the report but are not assigned to the worksheet.
Standardized Residuals. The standardized residual is the residual divided by the standard
error of the estimate. The standard error of the estimate is essentially the standard deviation
of the residuals, and is a measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.
SigmaPlot automatically flags data points lying outside of the confidence interval specified
in the corresponding box. These data points are considered to have "large" standardized
residuals, that is, they are outlying data points. You can change which data points are flagged
by editing the value in the Flag Values > edit box.
Studentized Residuals. Studentized residuals scale the standardized residuals by taking
into account the greater precision of the regression line near the middle of the data versus
the extremes. The Studentized residuals tend to be distributed according to the Student t
distribution, so the t distribution can be used to define "large" values of the Studentized
residuals. SigmaPlot automatically flags data points with "large" values of the Studentized
residuals, that is, outlying data points; by default, points are flagged if they lie outside the
95% confidence interval for the regression population.
To include Studentized residuals in the report, make sure this check box is selected. Clear
the check box if you do not want to include Studentized residuals in the report.
Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
To include Studentized deleted residuals in the report, make sure this check box is selected.
Clear the check box if you do not want to include Studentized deleted residuals in the report.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; by default, points are flagged if they lie outside the
95% confidence interval for the regression population.
Tip
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
8.3.4.3 Options for Multiple Linear Regression: More Statistics
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, select Report Flagged Values Only. Clear this option to include
all standardized and Studentized residuals in the report.
influential data points. Most influential points are data points that are outliers, that is, they
do not "line up" with the rest of the data points. These points can have a disproportionately
strong influence on the calculation of the regression line. You can use several influence
tests to identify and quantify influential points.
DFFITS. $DFFITS_i$ is the number of estimated standard errors that the predicted value changes
for the $i$th data point when it is removed from the data set. It is another measure of the
influence of a data point on the prediction used to compute the regression coefficients.
Predicted values that change by more than two standard errors when the data point is removed
are considered to be influential.
Select DFFITS to compute this value for all points and flag influential points, that is,
those with DFFITS greater than the value specified in the Flag Values > box. The suggested
value is 2.0 standard errors, which indicates that the point has a strong influence on the data.
To avoid flagging more influential points, increase this value; to flag less influential points,
decrease this value. For more information, see 8.3.4.4.1 What to Do About Influential Points.
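DFFITS can be obtained from the Studentized deleted residual and the leverage, since $DFFITS_i = t_i\sqrt{h_{ii}/(1-h_{ii})}$, which equals the change in the $i$th predicted value on deleting point $i$, divided by $s_{yx(-i)}\sqrt{h_{ii}}$. This is a minimal sketch, assuming NumPy and a least-squares fit with a constant; the name dffits and the default flag_value of 2.0 (taken from the suggested value above) are illustrative, not SigmaPlot's code.

```python
import numpy as np


def dffits(X, y, flag_value=2.0):
    """Hypothetical helper: DFFITS for each observation of an OLS fit with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    h = np.diag(design @ np.linalg.pinv(design))  # leverages h_ii
    raw = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    df_res = n - k - 1
    # Standard error of the estimate with point i deleted, s_yx(-i)
    s_deleted = np.sqrt((raw @ raw - raw ** 2 / (1 - h)) / (df_res - 1))
    studentized_deleted = raw / (s_deleted * np.sqrt(1 - h))
    values = studentized_deleted * np.sqrt(h / (1 - h))
    return values, np.abs(values) > flag_value
```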
Leverage. Select Leverage to identify the potential influence of a point on the results of the
regression equation. Leverage depends only on the value of the independent variable(s).
Observations with high leverage tend to be at the extremes of the independent variables,
where small changes in the independent variables can have large effects on the predicted
values of the dependent variable.
The expected leverage of a data point is $\frac{k+1}{n}$, where there are k independent
variables and n data points. Observations with leverages much higher than the expected
leverages are potentially influential points.
Select Leverage to compute the leverage for each point and automatically flag potentially
influential points, that is, points whose leverage is greater than the specified value times the
expected leverage. The suggested value is 2.0 times the expected leverage for the regression
(that is, $\frac{2(k+1)}{n}$). To avoid flagging more potentially influential points, increase this
value; to flag points with less potential influence, lower this value. For more information, see
8.3.4.4.1 What to Do About Influential Points.
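Leverage is the diagonal of the hat matrix, so it can be computed from the independent variables alone. A minimal sketch, assuming NumPy and a model with a constant; the helper name leverage is hypothetical, and the default flag_value of 2.0 mirrors the suggested value above.

```python
import numpy as np


def leverage(X, flag_value=2.0):
    """Hypothetical helper: hat-matrix diagonal h_ii, flagged above flag_value * (k+1)/n."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    # Diagonal of H = D (D'D)^-1 D'; depends only on the independent variables
    h = np.diag(design @ np.linalg.pinv(design))
    expected = (k + 1) / n  # the leverages always sum to k + 1
    return h, h > flag_value * expected
```

For x values 1 through 9 plus one value of 30, only the point at 30 exceeds 2(k + 1)/n = 0.4.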
Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the
estimates of the parameters in the regression equation. Cook’s distance assesses how much the
values of the regression coefficients change if a point is deleted from the analysis. Cook’s
distance depends on both the values of the independent and dependent variables.
Select Cook’s Distance to compute this value for all points and flag influential points, for
example, those with a Cook’s distance greater than the specified value. The suggested value is
4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances
exceeding 4 indicate that the point has a major effect on the values of the parameter estimates.
To avoid flagging more influential points, increase this value; to flag less influential points,
lower this value. For more information, see 8.3.4.4.1 What to Do About Influential Points.
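Cook's distance can be computed without refitting, using $D_i = \frac{e_i^2\, h_{ii}}{p\, MS_{res}\,(1-h_{ii})^2}$ with p = k + 1 coefficients; this equals the sum of squared changes in all predicted values when point i is deleted, divided by $p\, MS_{res}$. The following is a minimal sketch assuming NumPy and a fit with a constant; cooks_distance is a hypothetical name and the default flag_value of 4.0 follows the suggested value above.

```python
import numpy as np


def cooks_distance(X, y, flag_value=4.0):
    """Hypothetical helper: Cook's distance for each observation of an OLS fit with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    p = k + 1  # number of estimated coefficients, including the constant
    design = np.column_stack([np.ones(n), X])
    hat = design @ np.linalg.pinv(design)
    h = np.diag(hat)
    raw = y - hat @ y
    ms_res = raw @ raw / (n - p)
    # Closed form equivalent to refitting the model without point i
    d = raw ** 2 * h / (p * ms_res * (1 - h) ** 2)
    return d, d > flag_value
```

A point can have a modest raw residual and still a large Cook's distance when its leverage is high, because it pulls the fitted line toward itself.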
8.3.4.4 Options for Multiple Linear Regression: Other Diagnostics
Report Flagged Values Only. To include only the influential points flagged by the
influential point tests in the report, select Report Flagged Values Only. Clear this option to
include all influential points in the report.
Power. The power of a regression is its ability to detect the observed relationship in the data.
Alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.
Select Power to compute the power for the multiple linear regression data. Change the alpha
value by editing the number in the Alpha Value edit box. The suggested value is α = 0.05.
This indicates that a one in twenty chance of error is acceptable, or that you are willing to
conclude there is a significant relationship when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
relationship, but a greater possibility of concluding there is no relationship when one exists.
Larger values of α make it easier to conclude that there is a relationship, but also increase the
risk of reporting a false positive.
Variance Inflation Factor. Select Variance Inflation Factor to measure the multicollinearity
of the independent variables, that is, the degree to which each independent variable can be
predicted from a linear combination of the other independent variables in the fit.
Regression procedures assume that the independent variables are statistically independent of
each other, that is, that the value of one independent variable does not affect the value of
another. However, this ideal situation rarely occurs in the real world. When the independent
variables are correlated, or contain redundant information, the estimates of the parameters in
the regression model can become unreliable.
The parameters in regression models quantify the theoretically unique contribution of each
independent variable to predicting the dependent variable. When the independent variables are
correlated, they contain some common information and "contaminate" the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.
For more information, see 8.3.4.4.2 What to Do About Multicollinearity.
There are two types of multicollinearity:
• Structural Multicollinearity. Structural multicollinearity occurs when the regression
equation contains several independent variables which are functions of each other. The most
common form of structural multicollinearity occurs when a polynomial regression equation
contains several powers of the independent variable. Because these powers (for example,
$x$, $x^2$, and so on) are correlated with each other, structural multicollinearity occurs. Including
interaction terms in a regression equation can also result in structural multicollinearity.
• Sample-Based Multicollinearity. Sample-based multicollinearity occurs when the sample
observations are collected in such a way that the independent variables are correlated (for
example, if age, height, and weight are collected on children of varying ages, each variable
has a correlation with the others).
SigmaPlot can automatically detect multicollinear independent variables using the variance
inflation factor.
Flagging Multicollinear Data. Use the value in the Flag Values > edit box as a threshold
for multicollinear variables. The default threshold value is 4.0, meaning that any value
greater than 4.0 will be flagged as multicollinear. To make this test more sensitive to possible
multicollinearity, decrease this value. To allow greater correlation of the independent variables
before flagging the data as multicollinear, increase this value.
When the variance inflation factor is large, there are redundant variables in the regression
model, and the parameter estimates may not be reliable. Variance inflation factor values above
4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.
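The variance inflation factor of each independent variable is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that variable on all the other independent variables. This is a minimal sketch, assuming NumPy and at least two independent variables; variance_inflation_factors is a hypothetical name, and the default flag_value of 4.0 follows the default threshold described above.

```python
import numpy as np


def variance_inflation_factors(X, flag_value=4.0):
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), regressing x_j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_total = np.sum((target - target.mean()) ** 2)
        # 1 / (1 - R^2) simplifies to SS_total / SS_residual
        vif[j] = ss_total / (resid @ resid)
    return vif, vif > flag_value
```

Uncorrelated independent variables give values of 1.0; two nearly identical variables give very large values, and both are flagged.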
Report Flagged Values Only. To include only the points flagged by the influential
point tests and the values exceeding the variance inflation threshold in the report, select Report
Flagged Values Only. Clear this option to include all influential points in the report.
To run a Multiple Linear Regression, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you want to test.
To run a Multiple Linear Regression:
1. If you want to select your data before you run the regression, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Regression→Multiple Linear
The Pick Columns for Multiple Linear Regression dialog box appears. If you selected
columns before you chose the test, the selected columns appear in the column list. If you
have not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet or from the Data for Dependent or Independent drop-down
list.
The first selected column is assigned to the Dependent row in the Selected Columns list,
and all successively selected columns are assigned to the Independent rows in the list.
The titles of the selected columns appear in each row. You can select up to 64 independent
columns.
8.3.6 Interpreting Multiple Linear Regression Results
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to perform the regression. If you elected to test for normality, constant
variance, and/or independent residuals, SigmaPlot performs the tests for normality
(Shapiro-Wilk or Kolmogorov-Smirnov), constant variance, and independent residuals. If
your data fail any of these tests, SigmaPlot warns you. When the test is complete, the
report appears displaying the results of the Multiple Linear Regression.
If you selected to place residuals and other test results in the worksheet, they are placed in
the specified column and are labeled by content and source column.
The standard error of the estimate $s_{yx}$ is a measure of the actual variability about the regression
plane of the underlying population. The underlying population generally falls within about
two standard errors of the estimate of the observed sample.
Coefficients. The value for the constant and coefficients of the independent variables for the
regression model are listed.
Standard Error. The standard errors of the regression coefficients (analogous to the standard
error of the mean). The true regression coefficients of the underlying population generally
fall within about two standard errors of the observed sample coefficients. Large standard
errors may indicate multicollinearity.
These values are used to compute t and confidence intervals for the regression.
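In multiple regression the standard errors come from the diagonal of $MS_{res}(D^\top D)^{-1}$, where D is the design matrix with a column of ones for the constant. A minimal sketch, assuming NumPy; coefficient_table is a hypothetical name and the code is not SigmaPlot's implementation.

```python
import numpy as np


def coefficient_table(X, y):
    """Hypothetical helper: coefficients, standard errors, and t values for OLS with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ms_res = resid @ resid / (n - k - 1)
    # Covariance of the estimates: MS_res * (D'D)^-1; standard errors are its diagonal roots
    cov = ms_res * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se
```

The first entry of each returned array refers to the constant; the remaining entries follow the column order of X.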
8.3.6.5 Beta
These are the coefficients of the regression equation standardized to dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
where $b_i$ is the regression coefficient, $s_{x_i}$ is the standard deviation of the independent
variable $x_i$, and $s_y$ is the standard deviation of the dependent variable y.
These results are displayed if the Standardized Coefficients option was selected in the
Regression Options dialog box.
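The standardized coefficient formula translates directly into code. A minimal sketch, assuming NumPy, a fit with a constant, and sample standard deviations (n - 1 denominator); standardized_coefficients is a hypothetical name.

```python
import numpy as np


def standardized_coefficients(X, y):
    """Hypothetical helper: beta_i = b_i * s_xi / s_y for an OLS fit with a constant."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    b = coef[1:]  # drop the constant, which has no standardized counterpart
    # Sample standard deviations (n - 1 denominator)
    return b * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

The same values result from regressing the z-scored dependent variable on the z-scored independent variables; with a single independent variable, beta equals the correlation coefficient r.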
t Statistic. The t statistic tests the null hypothesis that the coefficient of the independent
variable is zero, that is, the independent variable does not contribute to predicting the
dependent variable. t is the ratio of the regression coefficient to its standard error, or:
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from "large" t values that the independent variable can be used to predict the
dependent variable (that is, that the coefficient is not zero).
P value. P is the P value calculated for t. The P value is the probability of being wrong in
concluding that there is a true association between the variables (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The smaller
the P value, the greater the probability that the variables are correlated.
Traditionally, you can conclude that the independent variable contributes to predicting the
dependent variable when P < 0.05.
VIF (Variance Inflation Factor). The variance inflation factor is a measure of
multicollinearity. It measures the "inflation" of the standard error of each regression parameter
(coefficient) for an independent variable due to redundant information in other independent
variables.
8.3.6.6 Analysis of Variance (ANOVA) Table
If the variance inflation factor is 1.0, there is no redundant information in the other independent
variables. If the variance inflation factor is much larger, there are redundant variables in the
regression model, and the parameter estimates may not be reliable.
Variance inflation factor values for independent variables above the specified value are
flagged with a > symbol, indicating multicollinearity with other independent variables. The
suggested value is 4.0.
The ANOVA (analysis of variance) table lists the ANOVA statistics for the regression and
the corresponding F value.
SS (Sum of Squares). The sums of squares are measures of variability of the dependent
variable.
• The sum of squares due to regression measures the difference of the regression plane from
the mean of the dependent variable.
• The residual sum of squares is a measure of the size of the residuals, which are the
differences between the observed values of the dependent variable and the values predicted
by the regression model.
• The total sum of squares is a measure of the overall variability of the dependent variable
about its mean value.
DF (Degrees of Freedom). Degrees of freedom represent the number of observations and
variables in the regression equation.
• The regression degrees of freedom equals the number of independent variables.
• The residual degrees of freedom is the number of observations less the number
of terms (including the constant) in the equation.
• The total degrees of freedom is the number of observations less one.
MS (Mean Square). The mean square provides two estimates of the population variances.
Comparing these variance estimates is the basis of analysis of variance.
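The mean square regression is a measure of the variation of the regression from the mean of
the dependent variable, or:
$$MS_{reg} = \frac{\text{regression sum of squares}}{\text{regression degrees of freedom}} = \frac{SS_{reg}}{DF_{reg}}$$
The residual mean square is a measure of the variation of the residuals about the regression
plane, or:
$$MS_{res} = \frac{\text{residual sum of squares}}{\text{residual degrees of freedom}} = \frac{SS_{res}}{DF_{res}}$$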
F Statistic. The F test statistic gauges the ability of the regression equation, containing all
independent variables, to predict the dependent variable. It is the ratio
$$F = \frac{\text{regression mean square}}{\text{residual mean square}} = \frac{MS_{reg}}{MS_{res}}$$
If F is a large number, you can conclude that the independent variables contribute to the
prediction of the dependent variable (that is, at least one of the coefficients is different
from zero, and the "unexplained variability" is smaller than what is expected from random
sampling variability about the mean value of the dependent variable). If the F ratio is around
1, you can conclude that there is no association between the variables (that is, the data are
consistent with the null hypothesis that all the samples are just randomly distributed).
P Value. The P value is the probability of being wrong in concluding that there is an
association between the dependent and independent variables (that is, the probability of
falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller the
P value, the greater the probability that there is an association.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
SSincr. SSincr, the incremental or Type I sum of squares, is a measure of the new predictive
information contained in an independent variable, as it is added to the equation.
The incremental sum of squares measures the increase in the regression sum of squares (and
reduction in the sum of squared residuals) obtained when that independent variable is added to
the regression equation, after all independent variables above it have been entered.
You can gauge the additional contribution of each independent variable by comparing these
values.
SSmarg. SSmarg, the marginal or Type III sum of squares, is a measure of the unique
predictive information contained in an independent variable, after taking into account all other
independent variables. You can gauge the independent contribution of each independent
variable by comparing these values.
The marginal sum of squares measures the reduction in the sum of squared residuals obtained
by entering the independent variable last, after all other variables in the equation have been
entered.
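Both kinds of sum of squares are differences between residual sums of squares of nested least-squares fits. This is a minimal sketch, assuming NumPy and models with a constant; incremental_and_marginal_ss and _residual_ss are hypothetical names. The column order of X determines the incremental values, just as the order of the independent variables does in the report.

```python
import numpy as np


def _residual_ss(design, y):
    """Residual sum of squares of a least-squares fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return resid @ resid


def incremental_and_marginal_ss(X, y):
    """Hypothetical helper: SSincr (Type I) and SSmarg (Type III) for each column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    ones = np.ones((n, 1))
    ss_full = _residual_ss(np.column_stack([ones, X]), y)
    previous = _residual_ss(ones, y)  # constant only: total SS about the mean
    incremental, marginal = np.empty(k), np.empty(k)
    for j in range(k):
        # Type I: drop in SS_res when x_j is added after the columns before it
        current = _residual_ss(np.column_stack([ones, X[:, : j + 1]]), y)
        incremental[j] = previous - current
        previous = current
        # Type III: rise in SS_res when x_j alone is removed from the full model
        reduced = np.column_stack([ones, np.delete(X, j, axis=1)])
        marginal[j] = _residual_ss(reduced, y) - ss_full
    return incremental, marginal
```

The incremental values always add up to the regression sum of squares, and the last variable's incremental and marginal values coincide; when the independent variables are uncorrelated, the two columns are identical.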
PRESS, the Predicted Residual Error Sum of Squares, is a measure of how well a regression
model predicts the observations.
The PRESS statistic is computed by summing the squares of the prediction errors (the
differences between predicted and observed values) for each observation, with that point
deleted from the computation of the estimated regression model.
One important use of the PRESS statistics is for model comparison. If several different
regression models are applied to the same data, the one with the smallest PRESS statistic
has the best predictive capability.
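The leave-one-out definition above can be sketched directly. This minimal example assumes an ordinary least-squares model with an intercept and uses NumPy; the function names are hypothetical. It also shows the well-known shortcut that gives the same value from a single fit, using each point's leverage h_ii: PRESS = Σ (e_i / (1 − h_ii))².

```python
import numpy as np

def press(X, y):
    """PRESS by explicit leave-one-out refits of an OLS model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        err = y[i] - A[i] @ beta  # prediction error for the deleted point
        total += err ** 2
    return float(total)

def press_via_leverage(X, y):
    """Same quantity from one fit: sum of (e_i / (1 - h_ii))^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)  # hat matrix
    e = y - H @ y              # ordinary residuals
    h = np.diag(H)             # leverages
    return float(np.sum((e / (1 - h)) ** 2))
```

For model comparison, compute PRESS for each candidate model on the same data and prefer the smallest value; for example, a quadratic model fitted to curved data should give a much smaller PRESS than a straight line.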
8.3.6.12 Power
This result is displayed if you selected this option in the Options for Multiple Linear
Regression dialog box.
The power, or sensitivity, of a regression is the probability that the regression model can detect
the observed relationship among the variables, if there is a relationship in the underlying
population.
Regression power is affected by the number of observations, the chance of erroneously
reporting a difference, α (alpha), and the slope of the regression.
Alpha (α). Alpha (α) is the acceptable probability of incorrectly concluding that the model
is correct. An α error is also called a Type I error (a Type I error is when you reject the
hypothesis of no association when this hypothesis is true).
Set the value in the Power Options dialog box; the suggested value is α = 0.05, which indicates
that a one-in-twenty chance of error is acceptable. Smaller values of α result in stricter
requirements before concluding the model is correct, but a greater possibility of concluding
the model is bad when it is really correct (a Type II error). Larger values of α make it easier
to conclude that the model is correct, but also increase the risk of accepting a bad model (a
Type I error).
8.3.6.15 Confidence Intervals
If you selected Report Cases with Outliers Only, only observations that have one or more
values flagged as outliers are reported; however, all other results for those observations
are also displayed.
Row. This is the row number of the observation.
Cook’s Distance . Cook’s distance is a measure of how great an effect each point has on the
estimates of the parameters in the regression equation. It is a measure of how much the values
of the regression coefficients would change if that point is deleted from the analysis.
Values above 1 indicate that a point is possibly influential. Cook’s distances exceeding 4
indicate that the point has a major effect on the values of the parameter estimates. Points with
Cook’s distances greater than the specified value are flagged as influential; the suggested
value is 4.
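Cook's distance can be computed from a single least-squares fit. The sketch below assumes an ordinary least-squares model with an intercept and uses the standard formula D_i = e_i² h_ii / (p s² (1 − h_ii)²), where p = k + 1 is the number of coefficients and s² is the residual mean square; it is an illustration with hypothetical function names, not SigmaPlot's code. The flagging helper uses the suggested cutoff of 4.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                   # k + 1 coefficients
    H = A @ np.linalg.pinv(A)        # hat matrix
    h = np.diag(H)                   # leverages
    e = y - H @ y                    # residuals
    s2 = float(e @ e) / (n - p)      # residual mean square
    return (e ** 2 / (p * s2)) * h / (1 - h) ** 2

def flag_influential(d, cutoff=4.0):
    """Row indices whose Cook's distance exceeds the cutoff (suggested 4)."""
    return [i for i, di in enumerate(d) if di > cutoff]
```

The shortcut agrees exactly with the definition in the text: refitting without point i and measuring how far the coefficient vector moves (scaled by p s²) gives the same number.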
Leverage. Leverage values identify potentially influential points. Observations with leverages
a specified factor greater than the expected leverages are flagged as potentially influential
points; the suggested value is 2.0 times the expected leverage.
The expected leverage of a data point is
$$\frac{k+1}{n}$$
where there are k independent variables and n data points.
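Leverages are the diagonal elements of the hat matrix. This minimal NumPy sketch, assuming a least-squares model with an intercept and a hypothetical function name, computes them and flags points whose leverage exceeds the suggested 2.0 times the expected leverage (k + 1)/n. The leverages always sum to k + 1, which is why their average is the expected leverage.

```python
import numpy as np

def flag_high_leverage(X, factor=2.0):
    """Return (leverages, flagged row indices) for an OLS model with intercept."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)  # n x k
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    h = np.diag(A @ np.linalg.pinv(A))  # diagonal of the hat matrix
    expected = (k + 1) / n
    flagged = [i for i in range(n) if h[i] > factor * expected]
    return h, flagged
```

With x = 1, 2, ..., 9, 30, the isolated value 30 has leverage of about 0.91, well above the cutoff 2(1 + 1)/10 = 0.4, so only that row is flagged.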
These results are displayed if you selected them in the Options for Multiple Linear Regression
dialog box. If the confidence interval does not include zero, you can conclude that the
coefficient is different from zero with the level of confidence specified. This can also be
described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that
the coefficient is different from zero, and the confidence level is 100(1 - α)%.
The specified confidence level can be any value from 1 to 99; the suggested confidence level
for both intervals is 95%.
Row. This is the row number of the observation.
Predicted. This is the value for the dependent variable predicted by the regression model
for each observation.
Regression. The confidence interval for the regression gives the range of variable values
computed for the region containing the true relationship between the dependent and
independent variables, for the specified level of confidence.
Population. The confidence interval for the population gives the range of variable values
computed for the region containing the population from which the observations were drawn,
for the specified level of confidence.
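The two intervals differ only in whether the scatter of individual observations is included. The sketch below assumes the standard least-squares formulas: the regression (mean response) half-width is t·sqrt(s²·h_i) and the population (new observation) half-width is t·sqrt(s²·(1 + h_i)), where h_i = a_i(AᵀA)⁻¹a_iᵀ. The Python standard library has no Student t quantile, so the critical value `t_crit` (a hypothetical parameter name) is supplied by the caller, for example from a t table with n − (k + 1) degrees of freedom.

```python
import numpy as np

def regression_intervals(X, y, t_crit):
    """Return (predicted, reg_lo, reg_hi, pop_lo, pop_hi) for each observation.

    t_crit: two-sided Student t critical value with n - (k + 1) degrees
    of freedom for the chosen confidence level (supplied by the caller).
    """
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ beta
    resid = y - pred
    s2 = float(resid @ resid) / (n - k - 1)  # residual mean square
    # h_i = a_i (A'A)^-1 a_i' for each row a_i of A
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    half_reg = t_crit * np.sqrt(s2 * h)        # regression (mean response)
    half_pop = t_crit * np.sqrt(s2 * (1 + h))  # population (new observation)
    return pred, pred - half_reg, pred + half_reg, pred - half_pop, pred + half_pop
```

For five points, a 95% interval uses t ≈ 3.182 (3 degrees of freedom). The population interval is always wider than the regression interval because it adds the variability of a single observation to the uncertainty in the fitted line.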
1. With the Multiple Linear Regression report in view, click the Report tab.
2. In the Graph Results group, click Create Result Graph.
3. The Create Result Graph dialog box appears displaying the types of graphs available
for the Multiple Linear Regression results.
4. Select the type of graph you want to create from the Graph Type list, then click OK,
or double-click the desired graph in the list. For more information, see 11.1 Generating
Report Graphs.
If you select Scatter Plot Residuals, Bar Chart Std Residuals, or Regression, Conf.
& Pred, a dialog box appears prompting you to select the column with the independent
variable you want to use in the graph.
If you select 3D Scatter & Mesh, or 3D Residual Scatter, and you have more than two
columns of independent variables, a dialog box appears prompting you to select the two
columns with the independent variables you want to plot.
5. Select the columns with the independent variables you want to use in the graph, then click
OK. The graph appears using the specified independent variables.
8.4 Multiple Logistic Regression
where y is the dependent variable, P(y = 1) is the predicted probability that the dependent
variable is a positive response or has a value of 1, b0 through bk are the k+1 regression
coefficients, and x1 through xk are the independent variables.
As the values of xi vary, the corresponding estimated probability that y = 1 increases or decreases,
depending on the sign of the associated regression coefficient bi.
Multiple Logistic Regression finds the set of values of the regression coefficients most likely
to predict the observed values of the dependent variable, given the observed values of the
independent variables.
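Finding the coefficients "most likely" to produce the observed data is maximum likelihood estimation. The sketch below uses Newton-Raphson iteration, one common way to maximize the logistic likelihood; SigmaPlot's actual algorithm is not documented here, so treat this as an illustration under that assumption. It uses the standard model P(y = 1) = 1 / (1 + exp(−(b0 + b1x1 + ... + bkxk))); the function names are hypothetical.

```python
import numpy as np

def logistic_prob(A, b):
    """P(y = 1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    return 1.0 / (1.0 + np.exp(-(A @ b)))

def fit_logistic(X, y, iters=25):
    """Maximum likelihood coefficients [b0, b1, ..., bk] via Newton-Raphson."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([np.ones(len(y)), X])  # column of 1s for b0
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        p = logistic_prob(A, b)
        grad = A.T @ (y - p)                   # gradient of the log likelihood
        W = p * (1 - p)
        hess = A.T @ (A * W[:, None])          # information matrix
        step = np.linalg.solve(hess, grad)
        b = b + step
        if np.max(np.abs(step)) < 1e-10:       # converged
            break
    return b
```

At the maximum, the gradient of the log likelihood is zero. The iteration does not converge if the two outcome groups are perfectly separated by the independent variables, because the likelihood then has no finite maximum.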
1. Enter or arrange your data appropriately in the worksheet. For more information, see 8.4.3
Arranging Multiple Logistic Regression Data.
2. Set the Logistic Regression options. For more information, see 8.4.4.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Regression→Multiple Logistic
5. Run the test. For more information, see 8.4.5.
6. View and interpret the Multiple Logistic Regression report. For more information, see
8.4.6 Interpreting Multiple Logistic Regression Results.
1. If you are going to run the test after changing test options and want to select your data
before you run the test, drag the pointer over the data.
2. Click the Analysis tab.
3. In the Statistics group, select Multiple Logistic Regression from the Select Test
drop-down list.
4. Click Options. The Options for Multiple Logistic Regression dialog box appears
with three tabs:
• Criterion. Click the Criterion tab to view the criterion options. For more information,
see 8.4.4.1 Options for Multiple Logistic Regression: Criterion.
• More Statistics. Click the More Statistics tab to view the Standard Error Coefficients,
Wald Statistic, Odds Ratio, Odds Ratio Confidence, Coefficients P Values, Predicted
Values, and Variance Inflation Factor options. For more information, see 8.4.4.2.
• Residuals. Click the Residuals tab to view the residual and influence options. For
more information, see 8.4.4.3 Options for Multiple Logistic Regression: Residuals.
Option settings are saved between SigmaPlot sessions.
5. To continue the test, click Run Test.
6. To accept the current settings and close the options dialog box, click OK.
are assigned a value of 0 or a reference value. The default threshold is 0.5. The resulting
contingency table can be analyzed with a Chi-Square test. As with the Hosmer-Lemeshow
statistic, a large P value indicates a good fit between the logistic regression equation and the
data. For more information, see 8.4.6 Interpreting Multiple Logistic Regression Results.
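The classification step can be sketched as a 2×2 count of observed versus predicted responses. This is a minimal illustration with hypothetical function names; it assumes a probability exactly equal to the threshold is classified as 1, a tie rule the text does not specify. The default threshold is 0.5, as stated above.

```python
def classification_table(observed, probabilities, threshold=0.5):
    """Return 2x2 counts as table[observed][predicted].

    Assumption: predicted = 1 when the estimated probability is at or
    above the threshold, otherwise 0.
    """
    table = [[0, 0], [0, 0]]
    for y, p in zip(observed, probabilities):
        predicted = 1 if p >= threshold else 0
        table[y][predicted] += 1
    return table

def percent_correct(table):
    """Percentage of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in table)
    return 100.0 * (table[0][0] + table[1][1]) / total
```

For observed values 0, 0, 1, 1, 1 with estimated probabilities 0.2, 0.6, 0.7, 0.4, 0.9, the table is [[1, 1], [1, 2]], and 60% of cases are classified correctly.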
Number of Independent Variable Combinations. If the number of unique combinations
of the independent variables is not large compared to the number of independent variables,
your logistic regression results may be unreliable. To calculate the number of independent
variable combinations and warn if there are not enough combinations compared to the number of
independent variables, select the Number of Independent Variable Combinations check box. If
the calculated number of combinations is less than the value in the corresponding edit box, a
dialog box appears warning you that the number of independent variable combinations is too
small, and asks if you want to continue. If you select Yes, the warning message appears in
the report.
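Counting unique combinations amounts to counting distinct rows of the independent variable columns. The sketch below is an illustration only; `minimum` is a hypothetical stand-in for the value entered in the edit box, and the warning text is invented for the example.

```python
def count_combinations(rows):
    """Number of distinct combinations of independent variable values."""
    return len({tuple(r) for r in rows})

def check_combinations(rows, minimum):
    """Return (count, warning or None), warning when count < minimum."""
    n = count_combinations(rows)
    if n < minimum:
        return n, f"Warning: only {n} independent variable combinations"
    return n, None
```

For rows (1, 0), (1, 0), (2, 1), (2, 0), there are three distinct combinations because the first two rows repeat.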
$$z = \frac{b_i^2}{s_{b_i}^2}$$
where P is the probability of the event happening. The odds ratio for an independent variable
is computed as
$$\text{Odds ratio} = e^{\beta_1}$$
where β1 is the regression coefficient. The odds ratio is an estimate of the increase (or
decrease) in the odds for an outcome if the independent variable value is increased by 1.
8.4.4.2 Options for Multiple Logistic Regression: Statistics
Odds Ratio Confidence. The odds ratio confidence intervals are defined as
$$e^{\,b_i \pm z_{1-\alpha/2}\, s_{b_i}}$$
where $z_{1-\alpha/2}$ is the point on the axis of the standard normal distribution that corresponds to the desired
confidence interval.
The default confidence used is 95%. To change the confidence used, change the percentage in
the corresponding edit box.
Coefficients P Value. The Coefficients P Value is the probability of being incorrect
in concluding that each independent variable has a significant effect on determining the
dependent variable. The smaller the P value, the more likely it is that the independent variable
actually predicts the dependent variable.
Use the Wald Statistic to test whether the coefficients associated with the independent
variables are significantly different from zero. The significance of independent variables is
tested by comparing the observed value of the coefficients with the associated standard error
of the coefficient. If the observed value of the coefficient is large compared to the standard
error, you can conclude that the coefficients are significantly different from zero and that the
independent variables contribute significantly to predicting the dependent variable. For more
information, see 8.4.6 Interpreting Multiple Logistic Regression Results.
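The coefficient statistics described above can be computed from a coefficient and its standard error. This sketch uses only the Python standard library and hypothetical function names. It assumes the P value refers the Wald statistic z = b²/s_b² to a chi-square distribution with 1 degree of freedom, which is equivalent to a two-sided normal test on b/s_b; the text does not state SigmaPlot's exact reference distribution.

```python
import math
from statistics import NormalDist

def coefficient_summary(b, se, confidence=0.95):
    """Return (Wald statistic, P value, odds ratio, odds ratio CI)."""
    wald = b ** 2 / se ** 2  # z = b_i^2 / s_{b_i}^2
    # Assumed reference: chi-square with 1 df, i.e. two-sided normal on b/se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(b) / se))
    odds_ratio = math.exp(b)
    # z_{1 - alpha/2}: standard normal point for the requested confidence
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    ci = (math.exp(b - z * se), math.exp(b + z * se))
    return wald, p, odds_ratio, ci
```

For b = 0.7 and s_b = 0.35, the Wald statistic is 4.0 and P ≈ 0.046. The 95% odds ratio interval lies just above 1, which matches the conclusion that the coefficient differs from zero.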
Predicted Values. Use this option to calculate the predicted value of the dependent variable for
each observed value of the independent variable(s), then save the results to the data worksheet.
For logistic regression the predicted values indicate the probability of a positive response. For
more information, see 8.4.6 Interpreting Multiple Logistic Regression Results.
To assign predicted values to a worksheet column, select the worksheet column you want to
save the predicted values to from the corresponding drop-down list. If you select none and the
Predicted Values check box is selected, the values appear in the report but are not assigned to
the worksheet.
Variance Inflation Factor. Use this option to measure the multicollinearity of the independent
variables, or the linear combination of the independent variables in the fit.
Regression procedures assume that the independent variables are statistically independent of
each other, that is, that the value of one independent variable does not affect the value of
another. However, this ideal situation rarely occurs in the real world. When the independent
variables are correlated, or contain redundant information, the estimates of the parameters in
the regression model can become unreliable.
The parameters in regression models quantify the theoretically unique contribution of each
independent variable to predicting the dependent variable. When the independent variables are
correlated, they contain some common information and "contaminate" the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.
There are two types of multicollinearity.
8.4.4.3 Options for Multiple Logistic Regression: Residuals
Raw Residuals. The raw residuals are the differences between the predicted and observed
values of the dependent variables. To include raw residuals in the report, make sure this check
box is selected. Clear the check box if you do not want to include raw residuals in
the report.
To assign the raw residuals to a worksheet column, select the number of the desired column
from the corresponding drop-down list. If you select none from the drop-down list and the Raw
check box is selected, the values appear in the report but are not assigned to the worksheet.
Studentized Residuals. Studentized residuals take into account the greater precision of
the regression estimates near the middle of the data versus the extremes. The Studentized
residuals tend to be distributed according to the Student t distribution, so the t distribution
can be used to define "large" values of the Studentized residuals. SigmaPlot automatically
flags data points with "large" values of the Studentized residuals, that is, outlying
data points; by default, points are flagged when they lie outside the 95% confidence interval
for the regression population.
To include Studentized residuals in the report, make sure this check box is selected. Clear the
check box if you do not want to include Studentized residuals in the report.
Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
To include Studentized deleted residuals in the report, make sure this check box is selected.
Clear the check box if you do not want to include Studentized deleted residuals in
the report.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; by default, points are flagged when they lie outside the
95% confidence interval for the regression population.
Note
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, select Report Flagged Values Only. Clear this option to
include all standardized and Studentized residuals in the report.
Influence
Influence options automatically detect instances of influential data points. Most influential
points are data points which are outliers, that is, they do not "line up" with the rest of the
data points. These points can have a disproportionately strong influence on the
calculation of the regression line. You can use several influence tests to identify and quantify
influential points.
Leverage. Leverage is used to identify the potential influence of a point on the results of the
regression equation. Leverage depends only on the value of the independent variable(s).
Observations with high leverage tend to be at the extremes of the independent variables,
where small changes in the independent variables can have large effects on the predicted
values of the dependent variable.
The expected leverage of a data point is
$$\frac{k+1}{n}$$
where there are k independent variables and n data points. Observations with leverages much
higher than the expected leverages are potentially influential points.
Select Leverage to compute the leverage for each point and automatically flag potentially
influential points, that is, points with leverages greater than the
specified value times the expected leverage. The suggested value is 2.0 times the expected
leverage for the regression:
$$\frac{2(k+1)}{n}$$
To avoid flagging more potentially influential points, increase this value; to flag points
with less potential influence, lower this value.
Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the
estimates of the parameters in the regression equation. Cook’s distance assesses how much the
values of the regression coefficients change if a point is deleted from the analysis. Cook’s
distance depends on both the values of the independent and dependent variables.
Select Cook’s Distance to compute this value for all points and flag influential points, that is,
those with a Cook’s distance greater than the specified value. The suggested value is
4.0. Cook’s distances above 1 indicate that a point is possibly influential. Cook’s distances
exceeding 4 indicate that the point has a major effect on the values of the parameter estimates.
To avoid flagging more influential points, increase this value; to flag less influential points,
lower this value.
To run a Multiple Logistic Regression, you need to select the data to test. Use the Pick
Columns for Multiple Logistic Regression dialog box to select the worksheet columns with
the data you want to test.
To run a Multiple Logistic Regression:
1. If you want to select your data before you run the regression, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Regression→Multiple Logistic
The Pick Columns for Multiple Logistic Regression dialog box appears. If you
selected columns before you chose the test, the selected columns appear in the Selected
Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select
the columns in the worksheet, or select the columns from the Data for Dependent,
Independent, or Count drop-down list.
Select the column containing the number of times each dependent and independent variable
combination is repeated as the Count column. The title of each selected column
appears in each row.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance,
and/or independent residuals, SigmaPlot performs the tests for normality (Shapiro-Wilk
or Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fails
either of these tests, SigmaPlot warns you. When the test is complete, the report appears
displaying the results of the Multiple Logistic Regression.
If you selected to place residuals and other test results in the worksheet, they are placed in
the specified column and are labeled by content and source column.
where P is the probability of a "positive" response (for example, value of the dependent
variable equal to 1) and x1, x2, x3, ..., xk are the independent variables and b1, b2, b3,..., bk are
the regression coefficients. The equation can be rewritten by applying the logit transformation
to both sides of this equation.
$$\operatorname{Logit} P = \ln\left(\frac{P}{1-P}\right)$$
8.4.6.8 Likelihood Ratio Test Statistic
poor agreement. The Pearson Chi-Square option is set in the Options for Multiple Logistic
Regression dialog box.
where the yi and μi are respectively the observed and predicted values of the dependent
variable, and n is the number of observations. Note that ln(1) is zero and the observed values
must be 0 or 1. Thus the closer the predicted values are to the observed, the closer this sum
will be to zero.
The -2 log likelihood is also equal to the sum of the squared deviance residuals.
The -2 log likelihood (LL) statistic is related to the likelihood ratio (LR): LR = LL0 - LL, where
LL0 is the -2 log likelihood of a regression model having none of the independent variables,
just a constant term. In viewing this relationship, note that both LL0 and LL are positive, and a
smaller LL (closer to zero) reflects a better fit. (At the extremes, LL will be zero when
there is a perfect fit, and LL will equal LL0 when there is no fit whatsoever.) Thus the larger
the LR, the larger the implied explanatory power of the independent variables for the given
dependent variable.
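The -2 log likelihood and the likelihood ratio can be sketched for 0/1 responses using the standard binary log likelihood, -2 Σ [yi ln μi + (1 − yi) ln(1 − μi)]. This is an illustration with hypothetical function names, assuming that the constant-only model predicts the sample proportion of positive responses for every case (the maximum likelihood fit of a constant term).

```python
import math

def minus_two_log_likelihood(y, mu):
    """-2 * sum[y ln(mu) + (1 - y) ln(1 - mu)] for 0/1 observations y."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
    return -2.0 * total

def likelihood_ratio(y, mu_full):
    """LR = LL0 - LL, where LL0 comes from the constant-only model."""
    p0 = sum(y) / len(y)  # constant-only model: sample proportion of 1s
    ll0 = minus_two_log_likelihood(y, [p0] * len(y))
    ll = minus_two_log_likelihood(y, mu_full)
    return ll0 - ll
```

For y = 0, 1, 0, 1, the constant-only model gives LL0 = 8 ln 2 ≈ 5.55; predictions of 0.1, 0.9, 0.2, 0.8 give LL ≈ 1.31, so LR ≈ 4.23.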
The responses classified by the logistic model are derived by comparing estimated logistic
probabilities in the Probability Table to the specified threshold probability value (see
preceding section).
This table appears in the report if the Classification Table option is selected in the Options
dialog box.
where z is the Wald statistic, $b_i$ is the observed value of the estimated coefficient, and
$s_{b_i}$ is the standard error of the coefficient.
$$\text{Odds ratio} = e^{\beta_1}$$
where $\beta_1$ is the regression coefficient. The odds ratio is an estimate of the increase (or
decrease) in the odds for an outcome if the independent variable value is increased by 1.
Odds Ratio Confidence. These two values represent the lower and upper ends of the
confidence interval in which the true odds ratio lies. The level of confidence (95%) is specified
in the options dialog.
VIF (Variance Inflation Factor). The variance inflation factor is a measure of
multicollinearity. It measures the "inflation" of the standard error of each regression parameter
(coefficient) for an independent variable due to redundant information in other independent
variables.
If the variance inflation factor is 1.0, there is no redundant information in the other independent
variables. If the variance inflation factor is much larger, there are redundant variables in the
regression model, and the parameter estimates may not be reliable.
Variance inflation factor values for independent variables above the specified value are flagged
with a > symbol, indicating multicollinearity with other independent variables.
The presence of serious multicollinearity indicates that you have too many redundant
independent variables in your regression equation. To improve the quality of the regression
equation, you should delete the redundant variables. The cutoff value for flagging
multicollinearity is set in the Options dialog box. The suggested value is 4.0.
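A common way to compute the variance inflation factor is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing independent variable j on all the other independent variables. This NumPy sketch uses that standard definition with hypothetical function names. It flags values above the suggested cutoff of 4.0; SigmaPlot's internal computation is not shown in the text.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x k)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return out

def flag_vif(values, cutoff=4.0):
    """Indices of variables whose VIF exceeds the cutoff (suggested 4.0)."""
    return [j for j, v in enumerate(values) if v > cutoff]
```

Uncorrelated columns give VIF = 1.0. A column that is nearly a copy of another gives very large values for both columns, so both are flagged.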
where yi and μi are respectively the observed and predicted values of the dependent variable
for the ith case.
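The deviance residual is defined as:
$$d_i = \begin{cases} -\sqrt{2\ln\dfrac{1}{1-\mu_i}} & \text{for } y_i = 0 \\[4pt] +\sqrt{2\ln\dfrac{1}{\mu_i}} & \text{for } y_i = 1 \end{cases}$$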
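The deviance residual formula translates directly into code. The sketch also includes a Pearson residual for comparison. It assumes the usual binomial form (yi − μi)/sqrt(μi(1 − μi)), because the Pearson equation itself is not shown in this section. The check below confirms the statement that the -2 log likelihood equals the sum of the squared deviance residuals. Function names are hypothetical.

```python
import math

def pearson_residual(y, mu):
    """Assumed binomial form: (y - mu) / sqrt(mu * (1 - mu))."""
    return (y - mu) / math.sqrt(mu * (1 - mu))

def deviance_residual(y, mu):
    """d = -sqrt(2 ln(1/(1 - mu))) for y = 0, +sqrt(2 ln(1/mu)) for y = 1."""
    if y == 0:
        return -math.sqrt(2 * math.log(1 / (1 - mu)))
    return math.sqrt(2 * math.log(1 / mu))
```

The sign of each deviance residual follows the sign of y − μ: it is positive for observed 1s and negative for observed 0s.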
also displayed. The way the residuals are calculated depends on whether Pearson or Deviance
is selected as the residual type in the Options dialog box.
Row. This is the row number of the observation. Note that if your data has a case with a value
missing, the corresponding row is entirely omitted from the table of residuals.
Pearson/Deviance Residuals. The Residual table displays either Pearson or Deviance
residuals, depending on the Residual Type option setting in the Options for Logistic
Regression dialog box.
Both Pearson and Deviance residuals indicate goodness of fit between the logistic equation
and the data, with smaller values indicating a better fit. These two residual types are calculated
differently and affect the way the Studentized residuals in the table are calculated.
Pearson residuals, also known as standardized residuals, are the raw residuals divided by the
standard error. Deviance residuals are a measure of how much each point contributes to the
-2 log likelihood, which is minimized as part of the maximum likelihood procedure.
Raw Residuals. Raw residuals are the difference between the predicted and observed values
for each of the subjects or cases.
Studentized Residuals. The Studentized residual is a standardized residual that also takes
into account the greater confidence of the predicted values of the dependent variable in the
"middle" of the data set.
This residual is also known as the internally Studentized residual, because the standard error
of the estimate is computed using all data.
Studentized Deleted Residual. The Studentized deleted residual, or externally Studentized
residual, is a Studentized residual which uses the standard error, computed after deleting
the data point associated with the residual.
Both Studentized and Studentized deleted residuals that lie outside a specified confidence
interval for the regression are flagged as outlying points; the suggested confidence value is
95%.
The Studentized deleted residual is more sensitive than the Studentized residual in detecting
outliers, since the Studentized deleted residual results in much larger values for outliers than
the Studentized residual.
Leverage. Leverage values identify potentially influential points. Observations with leverages
a specified factor greater than the expected leverages are flagged as potentially influential
points; the suggested value is 2.0 times the expected leverage.
The expected leverage of a data point is
$$\frac{k+1}{n}$$
where there are k independent variables and n data points.
8.5 Polynomial Regression
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Click the Analysis tab.
3. In the Statistics group, select Polynomial Regression from the Select Test drop-down
list.
4. Click Options. The Options for Polynomial Regression dialog box appears. If you
select Incremental Order as the regression type, only the Criterion options are available.
If you select Order Only, then the following tabs appear:
• Criterion. Click the Criterion tab to view the criterion options.
• Assumption Checking. Click the Assumption Checking tab to view the Normality,
Constant Variance, and Durbin-Watson options.
• Residuals. Click the Residuals tab to view the residual options.
• More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS
Prediction Error, and Standardized Coefficients options.
• Post Hoc. Click the Post Hoc Tests tab to view the Power options.
Option settings are saved between SigmaPlot sessions.
5. To continue the test, click Run Test.
6. To accept the current settings and close the dialog box, click OK.
assumption may be violated, and you should consider trying a different model (for example,
one that more closely follows the pattern of the data), or transforming one or more of the
independent variables to stabilize the variance.
P Values for Normality and Constant Variance. The P value determines the probability of
being incorrect in concluding that the data is not normally distributed (P value is the risk of
falsely rejecting the null hypothesis that the data is normally distributed). If the P computed by
the test is greater than the P set here, the test passes.
To require a stricter adherence to normality and/or constant variance, increase the P value.
Because the parametric statistical methods are relatively robust in terms of detecting violations
of the assumptions, the suggested value in SigmaPlot is 0.05. Larger values of P (for example,
0.10) require less evidence to conclude that the residuals are not normally distributed or
the constant variance assumption is violated.
To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal. For
example, a P value of 0.01 for the normality test requires greater deviations from normality to
flag the data as non-normal than a value of 0.05.
Note
Although the assumption tests are robust in detecting data from populations that
are non-normal or with non-constant variances, there are extreme conditions of
data distribution that these tests cannot detect; however, these conditions should be
easily detected by visually examining the data without resorting to the automatic
assumption tests.
Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals
for their independence of each other. The Durbin-Watson statistic is a measure of serial
correlation between the residuals. The residuals are often correlated when the independent
variable is time, and the deviation between the observation and the regression line at one
time is related to the deviation at the previous time. If the residuals are not correlated, the
Durbin-Watson statistic will be close to 2.
Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as
evidence of a serial correlation in the Difference from 2 Value box. If the computed Durbin-Watson
statistic deviates from 2.0 by more than the entered value, SigmaPlot warns you that the residuals
may not be independent. The suggested deviation value is 0.50; that is, Durbin-Watson
statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.
To require a stricter adherence to independence, decrease the acceptable difference from 2.0.
To relax the requirement of independence, increase the acceptable difference from 2.0.
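The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. This standard-library sketch (with hypothetical function names) computes it and applies the "difference from 2" rule described above, using the suggested value of 0.50.

```python
def durbin_watson(residuals):
    """Sum of (e_t - e_{t-1})^2 divided by the sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def residuals_look_correlated(residuals, max_difference=0.5):
    """True when the statistic deviates from 2.0 by more than max_difference."""
    return abs(durbin_watson(residuals) - 2.0) > max_difference
```

Residuals that alternate in sign (1, −1, 1, −1) give 3.0, which indicates negative serial correlation. Residuals that stay the same (1, 1, 1, 1) give 0, which indicates strong positive correlation. Both cases are flagged.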
Click the Residuals tab in the Options for Polynomial Regression dialog box to view the
Predicted Values, Raw, Standardized, Studentized, Studentized Deleted, and Report Flagged
Values Only options.
Predicted Values. Use this option to calculate the predicted value of the dependent variable
for each observed value of the independent variable(s), then save the results to the worksheet.
Clear the check box if you do not want to include predicted values.
To assign predicted values to a worksheet column, select the worksheet column you want
to save the predicted values to from the corresponding drop-down list. If you select none
and the Predicted Values check box is selected, the values appear in the report but are not
assigned to the worksheet.
Raw Residuals. The raw residuals are the differences between the predicted and observed
values of the dependent variables. To include raw residuals in the report, make sure this check
box is selected. Clear the check box if you do not want to include raw residuals in
the report.
To assign the raw residuals to a worksheet column, select the number of the desired column
from the corresponding drop-down list. If you select none from the drop-down list and the Raw
check box is selected, the values appear in the report but are not assigned to the worksheet.
Standardized Residuals. Select Standardized Residuals to include them in the report. The
standardized residual is the residual divided by the standard error of the estimate. The standard
error of the residuals is essentially the standard deviation of the residuals, and is a measure of
variability around the regression line.
SigmaPlot automatically flags data points lying outside of the confidence interval specified
in the corresponding box. These data points are considered to have "large" standardized
residuals, that is, outlying data points. You can change which data points are flagged by
editing the value in the Flag Values > edit box. The suggested residual value is 2.5.
Studentized Residuals. Select Studentized Residuals to include them in the report.
Studentized residuals scale the standardized residuals by taking into account the greater
precision of the regression line near the middle of the data versus the extremes. The
Studentized residuals tend to be distributed according to the Student t distribution, so the t
distribution can be used to define "large" values of the Studentized residuals. SigmaPlot
automatically flags data points with "large" values of the Studentized residuals, that is,
outlying data points; by default, points are flagged when they lie outside the 95% confidence
interval for the regression population.
Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; by default, points are flagged when they lie outside the
95% confidence interval for the regression population.
Note
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, select Report Flagged Values Only.
Click the More Statisticstab in the options dialog to view the confidence interval options.
You can set the confidence interval for the population, regression, or both and then save
them to the worksheet.
Confidence Interval for the Population. The confidence interval for the population gives the
range of values that define the region that contains the population from which the observations
were drawn.
To include confidence intervals for the population in the report, select Population.
Confidence Interval for the Regression. The confidence interval for the regression line gives
the range of values that defines the region containing the true mean relationship between the
dependent and independent variables, with the specified level of confidence.
To include confidence intervals for the regression in the report, select Regression and then
specify a confidence level by entering a value in the percentage box. The confidence level can
be any value from 1 to 99. The suggested confidence level for all intervals is 95%.
Clear the selected check box if you do not want to include the confidence intervals for the
regression in the report.
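The two interval types differ only by the extra 1 under the square root, which accounts for the scatter of individual observations around the mean relationship. The sketch below assumes a straight-line fit and asks the caller for the t critical value, because the Python standard library has no t quantile function; 2.101 is the two-sided 95% value for 18 degrees of freedom. `interval_bands` is a hypothetical name.

```python
import math

def interval_bands(x, y, t_crit):
    """For each x, return (predicted, reg_low, reg_high, pop_low, pop_high).

    t_crit is the two-sided Student t critical value for n - 2 degrees of
    freedom, supplied by the caller.
    """
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - p))
    rows = []
    for xi in x:
        pred = b0 + b1 * xi
        core = 1 / n + (xi - xbar) ** 2 / sxx
        half_reg = t_crit * s * math.sqrt(core)      # mean response (regression)
        half_pop = t_crit * s * math.sqrt(1 + core)  # a new observation (population)
        rows.append((pred, pred - half_reg, pred + half_reg,
                     pred - half_pop, pred + half_pop))
    return rows

x = [float(i) for i in range(20)]
y = [2.0 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
rows = interval_bands(x, y, t_crit=2.101)  # t(0.975) with 18 degrees of freedom
```

Both intervals are centered at the predicted value and are narrowest near the mean of the independent variable; the population interval is always the wider of the two.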
Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the
worksheet, select the column number of the first column you want to save the intervals to from
the Starting in Column drop-down list. The selected intervals are saved to the worksheet
starting with the specified column and continuing with successive columns in the worksheet.
PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the
regression equation predicts the observations. Leave this check box selected to evaluate the fit
of the equation using the PRESS statistic. Clear the selected check box if you do not want
to include the PRESS statistic in the report.
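PRESS (predicted residual sum of squares) is the sum of squared errors made when each observation is predicted from a fit that leaves that observation out; smaller values indicate that the equation predicts new observations better. The sketch below assumes a straight-line fit and uses hypothetical function names. It computes PRESS with the leverage shortcut e_i/(1 - h_i) and, for comparison, by literally refitting without each point.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def press(x, y):
    """PRESS via the leverage shortcut: sum of (e_i / (1 - h_i))**2."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - xbar) ** 2 / sxx
        total += ((yi - (b0 + b1 * xi)) / (1 - h)) ** 2
    return total

def press_by_refitting(x, y):
    """Same statistic computed literally: refit without each point, then predict it."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
print(press(x, y), press_by_refitting(x, y))
```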
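Standardized Coefficients. These are the coefficients of the regression equation standardized
to dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
where $b_i$ is the regression coefficient, $s_{x_i}$ is the standard deviation of the independent
variable $x_i$, and $s_y$ is the standard deviation of the dependent variable y.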
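With a single independent variable, the standardized coefficient reduces to the Pearson correlation coefficient between x and y, which makes a convenient check. A minimal sketch, assuming one independent variable; `standardized_slope` is an illustrative name.

```python
import statistics

def standardized_slope(x, y):
    """beta = b * s_x / s_y for a single independent variable."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return b * statistics.stdev(x) / statistics.stdev(y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 19.0, 33.0, 38.0, 52.0]
print(standardized_slope(x, y))
```

Because the units of x and y cancel, the result is the same whether x is measured in, say, seconds or minutes.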
8.5.6 Interpreting Incremental Polynomial Regression Results
To run a Polynomial Regression you need to select the data to test. Use the Pick Columns
dialog box to select the worksheet columns with the data you want to test.
To run a Polynomial Regression:
1. If you want to select your data before you run the regression, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Regression→Polynomial
The Pick Columns for Polynomial Regression dialog box appears. If you selected
columns before you chose the test, the selected columns appear in the column list. If you
have not selected columns, the dialog prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Dependent and
Independent drop-down list.
The first selected column is assigned to the Dependent Variable row in the Selected
Columns list, and the second column is assigned to the Independent Variable row. The
title of selected columns appears in each row. You are only prompted for one dependent
and one independent variable column.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance,
and/or independent residuals, SigmaPlot performs the tests for normality (Shapiro-Wilk
or Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail
any of these tests, SigmaPlot warns you. When the test is complete, the report appears
displaying the results of the Polynomial Regression.
If you are performing a regression using one order only, and selected to place predicted
values, residuals, and/or other test results in the worksheet, they are placed in the specified
data columns and are labeled by content and source column.
Remember
Worksheet results can only be obtained from a single-order (order only) polynomial regression.
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
MSincr (Incremental Mean Square). The incremental mean square is a measure of the
reduction in variation of the residuals about the regression equation gained with this order
polynomial:
$$MS_{incr} = \frac{SS_{incr}}{DF_{incr}} = \frac{\text{incremental sum of squares}}{\text{incremental degrees of freedom}}$$
8.5.6.3 Assumption Testing
F Value. The F test statistic gauges the ability of the independent variable to predict the
dependent variable.
• The incremental F value gauges the increase in contribution of each added order of the
independent variable in predicting the dependent variable. It is the ratio
$$F_{incr} = \frac{MS_{incr}}{MS_{res}} = \frac{\text{incremental variation from the dependent variable mean}}{\text{residual variation about the regression curve}}$$
If the incremental F is large and the overall F jumps to a large number, you can conclude that
adding that order of the independent variable predicts the dependent variable significantly
better than the previous model. The "best" order polynomial to use is generally the highest
order polynomial that produces a marked improvement in predictive ability.
• The overall F value gauges the contribution of all orders of the independent variable in
predicting the dependent variable. It is the ratio
$$F = \frac{MS_{reg}}{MS_{res}} = \frac{\text{regression variation from the dependent variable mean}}{\text{residual variation about the regression curve}}$$
When the overall F ratio is around 1, you can conclude that there is no association between
the independent and dependent variables (that is, the data is consistent with the null hypothesis that all
the samples are just randomly distributed).
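The incremental and overall F values can be tabulated order by order. This Python sketch is illustrative only: it fits each polynomial by solving the normal equations with a small Gauss-Jordan routine (adequate for small, well-scaled examples, not a substitute for a numerically robust solver), uses MS_res = SSE/(n - k - 1) for order k, and treats each added order as one incremental degree of freedom. All function names are hypothetical.

```python
def solve(A, c):
    """Solve A b = c by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, m + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][m] / M[i][i] for i in range(m)]

def poly_sse(x, y, order):
    """Residual sum of squares of the least-squares polynomial of the given order."""
    cols = [[xi ** j for xi in x] for j in range(order + 1)]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    b = solve(A, c)
    fitted = [sum(bj * col[k] for bj, col in zip(b, cols)) for k in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def incremental_table(x, y, max_order):
    """Incremental and overall F for polynomial orders 1..max_order."""
    n = len(y)
    ybar = sum(y) / n
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    prev_sse = ss_tot                    # order 0: the mean alone
    rows = []
    for k in range(1, max_order + 1):
        sse = poly_sse(x, y, k)
        ms_res = sse / (n - k - 1)       # residual mean square for this order
        ss_incr = prev_sse - sse         # reduction gained by adding x**k
        df_incr = 1                      # one added term per order
        ms_incr = ss_incr / df_incr
        ms_reg = (ss_tot - sse) / k      # all k terms together
        rows.append({"order": k, "ss_incr": ss_incr,
                     "f_incr": ms_incr / ms_res, "f_overall": ms_reg / ms_res})
        prev_sse = sse
    return rows

x = [float(i) for i in range(10)]
y = [xi * xi + (0.3 if i % 2 == 0 else -0.3) for i, xi in enumerate(x)]
for row in incremental_table(x, y, 3):
    print(row["order"], round(row["f_incr"], 1), round(row["f_overall"], 1))
```

For data that are essentially quadratic, the order-2 row shows a large incremental F and the order-3 row a small one, which is the pattern the text describes for choosing the "best" order. For order 1 the incremental and overall F are identical by construction.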
P Value. P is the P value calculated for F. The P value is the probability of being wrong in
concluding that there is a true association between the dependent and independent variables
(that is, the probability of falsely rejecting the null hypothesis, or committing a Type
I error, based on F). The smaller the P value, the greater the probability that there is an
association.
• The incremental P value is the probability of being wrong in concluding that the added
independent variable order improves the prediction of the dependent variable.
• The overall P value is the probability of being wrong in concluding that the polynomial of
that order predicts the dependent variable.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
Normality. The normality test result displays whether the polynomial model passed or
failed the test of the assumption that the source population is normally distributed around the
regression curve, and the P value calculated by the test. All regression techniques require a source
population to be normally distributed about the regression curve.
If this assumption appears to be violated, a warning appears in the report. Failure of the
normality test can indicate the presence of outlying influential points or an incorrect regression
model.
Constant Variance. The constant variance test results list whether that polynomial
model passed the test for constant variance of the residuals about the regression, and the
P value computed for that order polynomial. All regression techniques require the variance of
the residuals about the regression curve to be constant.
Rsq. The coefficient of determination R² is a measure of how well the regression model
describes the data.
R² values near 1 indicate that the curve is a good description of the relation between the
independent and dependent variables. R² values near 0 indicate that the values of the
independent variable do not predict the dependent variable.
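The coefficient of determination is usually computed as R² = 1 - SS_res/SS_tot, the fraction of the variation about the mean that the model accounts for. A minimal sketch with a hypothetical function name:

```python
def r_squared(y, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return 1 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit: prints 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean only: prints 0.0
```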
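F Statistic. The F test statistic gauges the contribution of the regression equation to predict the
dependent variable. It is the ratio
$$F = \frac{MS_{reg}}{MS_{res}} = \frac{\text{regression variation from the dependent variable mean}}{\text{residual variation about the regression}}$$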
8.5.7.3 Standard Error of the Estimate
If F is a large number, you can conclude that the independent variable contributes to the
prediction of the dependent variable (that is, the "unexplained variability" is smaller
than what is expected from random sampling variability of the dependent variable about its
mean). If the F ratio is around 1, you can conclude that there is no association between the
variables (that is, the data is consistent with the null hypothesis that all the samples are
just randomly distributed).
P Value. P is the P value calculated for F. The P value is the probability of being wrong in
concluding that there is a true association between the variables (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller
the P value, the greater the probability that the variables are correlated.
stabilize the variance and obtain more accurate estimates of the parameters in the regression
equation.
This result appears unless you disabled constant variance testing in the Options for Polynomial
Regression dialog box.
The regression diagnostic results display only the values for the predicted values, residual
results, and other diagnostics selected in the Options for Polynomial Regression dialog. All
results that qualify as outlying values are flagged with a < symbol. The trigger values to flag
residuals as outliers are set in the Options for Polynomial Regression dialog.
If you selected Report Cases with Outliers Only, only those observations that have one or
more residuals flagged as outliers are reported; however, all other results for that observation
are also displayed.
Row. This is the row number of the observation.
Residuals. These are the raw residuals, the differences between the predicted and observed
values for the dependent variable.
Standardized Residuals. The standardized residual is the raw residual divided by the
standard error of the estimate, $s_{yx}$.
If the residuals are normally distributed about the regression line, about 68% of the
standardized residuals have values between -1 and +1, and about 95% of the standardized
residuals have values between -2 and +2. A larger standardized residual indicates that the
point is far from the regression line; the suggested value flagged as an outlier is 2.5.
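The 68% and 95% figures come from the standard normal distribution, and the same distribution gives the fraction of points expected beyond the suggested 2.5 flag value (a little over 1%). A quick check with Python's standard `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
within_1 = z.cdf(1) - z.cdf(-1)        # fraction between -1 and +1
within_2 = z.cdf(2) - z.cdf(-2)        # fraction between -2 and +2
beyond_2_5 = 2 * (1 - z.cdf(2.5))      # fraction with |value| > 2.5
print(f"{within_1:.3f} {within_2:.3f} {beyond_2_5:.4f}")
```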
These results are displayed if you selected them in the Options for Polynomial Regression
dialog box. If the confidence interval does not include zero, you can conclude that the
coefficient is different from zero with the level of confidence specified. This can also be
described as P < α (alpha), where α is the acceptable probability of incorrectly concluding that
the coefficient is different from zero, and the confidence level is 100(1 - α)%.
The specified confidence level can be any value from 1 to 99; the suggested confidence level
for both intervals is 95%.
Row. This is the row number of the observation.
Predicted. This is the value for the dependent variable predicted by the regression model
for each observation.
Regression. These are the values that define the region containing the true relationship
between the dependent and independent variables, for the specified level of confidence,
centered at the predicted value.
This result is displayed if you selected it in the Options for Polynomial Regression dialog
box. The specified confidence level can be any value from 1 to 99; the suggested confidence
level is 95%.
Population Confidence Interval. These are the values that define the region containing the
population from which the observations were drawn, for the specified level of confidence,
centered at the predicted value.
8.5.8 Polynomial Regression Report Graphs
This result is displayed if you selected it in the Options for Polynomial Regression dialog
box. The specified confidence level can be any value from 1 to 99; the suggested confidence
level is 95%.
1. With the Polynomial Regression report in view, click the Report tab.
2. In the Result Graphs group, click Create Result Graph.
The Create Result Graph dialog box appears displaying the types of graphs available for
the Polynomial Regression report.
3. Select the type of graph you want to create from the Graph Type list, then click OK, or
double-click the desired graph in the list.
Subsets Regression. If the relationship is not a straight line or plane, use Polynomial or
Nonlinear Regression.
1. Enter or arrange your data in the worksheet. For more information, see 8.6.3 Arranging
Stepwise Regression Data.
2. If desired, set the Stepwise Regression options.
8.6.3 Arranging Stepwise Regression Data
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select Forward Stepwise Regression from the Tests drop-down list in the
Statistics group on the Analysis tab.
3. Click Options. The Options for Forward Stepwise Regression dialog box appears
with five tabs:
• Criterion. Click the Criterion tab to view the F-to-Enter, F-to-Remove, and
Number of Steps options. For more information, see 8.6.4.1 Options for Forward
Stepwise Regression: Criterion.
• Assumption Checking. Click the Assumption Checking tab to view the Normality,
Constant Variance, and Durbin-Watson options. For more information, see 8.6.4.2
Options for Forward Stepwise Regression: Assumption Checking.
• Residuals. Click the Residuals tab to view the residual options. For more information,
see 8.6.4.3 Options for Forward Stepwise Regression: Residuals.
• More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS
Prediction Error, and Standardized Coefficients options. For more information, see 8.6.4.4
Options for Forward Stepwise Regression: More Statistics.
• Other Diagnostics. Click the Other Diagnostics tab to view the Power options. For more
information, see 8.6.4.5 Options for Forward Stepwise Regression: Other Diagnostics.
Options settings are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see 8.6.6 Running a Stepwise
Regression.
5. To accept the current settings and close the dialog box, click OK.
8.6.4.2 Options for Forward Stepwise Regression: Assumption Checking
ability of the regression equation to predict the dependent variable are still accepted. However,
the regression may still contain redundant variables, resulting in multicollinearity.
Remember
The F-to-Remove value should always be less than or equal to the F-to-Enter value, to
avoid cycling variables in and out of the regression model.
Increasing the F-to-Remove value makes it easier to delete variables from the equation, as
variables that contain more predictive value can be removed. Important variables may also
be deleted, however.
Tip
If you are performing forward stepwise regression and you want any variable that has
been entered to remain in the equation, set the F-to-Remove value to zero.
Number of Steps. Use this option to set the maximum number of steps permitted before
the stepwise algorithm stops. Note that if the algorithm stops because it ran out of steps,
the results are probably not reliable. The suggested number of steps is 20 added or deleted
independent variables.
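A minimal sketch of forward stepwise selection driven by F-to-Enter, F-to-Remove, and a step limit, in the spirit of the options described above. It is not SigmaPlot's implementation: the helper names, the pure-Python least-squares solver, and the convention that each pass of the loop counts as one step are assumptions made for illustration.

```python
def solve(A, c):
    """Solve A b = c by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, m + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][m] / M[i][i] for i in range(m)]

def sse_for(X, y, subset):
    """Residual sum of squares of y regressed on an intercept plus X[j] for j in subset."""
    n = len(y)
    cols = [[1.0] * n] + [X[j] for j in subset]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    b = solve(A, c)
    return sum((y[k] - sum(bj * col[k] for bj, col in zip(b, cols))) ** 2 for k in range(n))

def forward_stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=20):
    """Return the indices of the columns of X left in the model, in entry order."""
    n = len(y)
    model = []
    for _ in range(max_steps):          # each pass counts as one step here
        changed = False
        sse_cur = sse_for(X, y, model)
        # Enter the candidate with the largest incremental F, if it beats F-to-Enter.
        best = None
        for j in range(len(X)):
            if j in model:
                continue
            sse_new = sse_for(X, y, model + [j])
            f = (sse_cur - sse_new) / (sse_new / (n - len(model) - 2))
            if best is None or f > best[0]:
                best = (f, j)
        if best is not None and best[0] > f_enter:
            model.append(best[1])
            sse_cur = sse_for(X, y, model)
            changed = True
        # Remove the member with the smallest incremental F, if it is below F-to-Remove.
        worst = None
        for j in model:
            rest = [m for m in model if m != j]
            f = (sse_for(X, y, rest) - sse_cur) / (sse_cur / (n - len(model) - 1))
            if worst is None or f < worst[0]:
                worst = (f, j)
        if worst is not None and worst[0] < f_remove:
            model.remove(worst[1])
            changed = True
        if not changed:
            break
    return model

noise = [0.3, -0.4, 0.1, 0.5, -0.2, -0.3, 0.4, -0.1, 0.2, -0.5, 0.0, 0.1]
x1 = [float(i) for i in range(12)]
x2 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0, 4.0, 5.0]
x3 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
y = [3.0 * a + e for a, e in zip(x1, noise)]
X = [x1, x2, x3]
print(forward_stepwise(X, y))
```

Setting `f_remove=0.0` means an entered variable is never removed, matching the Tip above; keeping `f_remove` below `f_enter` prevents a variable from being entered and removed in the same pass.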
0.10) require less evidence to conclude that the residuals are not normally distributed or
the constant variance assumption is violated.
To relax the requirement of normality and/or constant variance, decrease P. Requiring smaller
values of P to reject the normality assumption means that you are willing to accept greater
deviations from the theoretical normal distribution before you flag the data as non-normal. For
example, a P value of 0.01 for the normality test requires greater deviations from normality to
flag the data as non-normal than a value of 0.05.
Note
Although the assumption tests are robust in detecting data from populations that
are non-normal or with non-constant variances, there are extreme conditions of
data distribution that these tests cannot detect; however, these conditions should be
easily detected by visually examining the data without resorting to the automatic
assumption tests.
Durbin-Watson Statistic. SigmaPlot uses the Durbin-Watson statistic to test residuals
for their independence of each other. The Durbin-Watson statistic is a measure of serial
correlation between the residuals. The residuals are often correlated when the independent
variable is time, and the deviation between the observation and the regression line at one
time is related to the deviation at the previous time. If the residuals are not correlated, the
Durbin-Watson statistic will be near 2.
Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as
evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson
statistic deviates from 2.0 by more than the entered value, SigmaPlot warns you that the residuals
may not be independent. The suggested deviation value is 0.50; that is, Durbin-Watson
Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.
To require a stricter adherence to independence, decrease the acceptable difference from 2.0.
To relax the requirement of independence, increase the acceptable difference from 2.0.
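The statistic is the sum of squared differences between successive residuals divided by the residual sum of squares, and the flagging rule compares its distance from 2.0 with the allowed difference. The sketch below assumes the residuals are listed in observation order (for example, time order); the function names are hypothetical, and the 0.50 default mirrors the suggested deviation above.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def residuals_flagged(residuals, max_difference=0.50):
    """True when the statistic deviates from 2.0 by more than the allowed difference."""
    return abs(durbin_watson(residuals) - 2.0) > max_difference

print(durbin_watson([1.0, -1.0, -1.0, 1.0]))               # prints 2.0
print(residuals_flagged([1.0, 1.2, 0.9, 1.1, -1.0, -1.1]))  # runs of one sign: prints True
```

Residuals that stay on one side of the curve for several observations in a row push the statistic toward 0, while residuals that alternate in sign push it toward 4.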
8.6.4.4 Options for Forward Stepwise Regression: More Statistics
of the residuals, and is a measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.
SigmaPlot automatically flags data points whose standardized residuals exceed the value
entered in the corresponding box. These data points are considered to have "large" standardized
residuals, that is, they are outlying data points. You can change which data points are flagged by
editing the value in the Flag Values > edit box.
Studentized Residuals. Studentized residuals scale the standardized residuals by taking
into account the greater precision of the regression line near the middle of the data versus
the extremes. The Studentized residuals tend to be distributed according to the Student t
distribution, so the t distribution can be used to define "large" values of the Studentized
residuals. SigmaPlot automatically flags data points with "large" values of the Studentized
residuals, that is, outlying data points; with the suggested setting, the flagged points lie outside the
95% confidence interval for the regression population.
To include Studentized residuals in the report, make sure this check box is selected. Clear the
check box if you do not want to include Studentized residuals in the report.
Studentized Deleted Residuals. Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
To include Studentized deleted residuals in the report, make sure this check box is selected.
Clear the check box if you do not want to include Studentized deleted residuals in
the report.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; with the suggested setting, the flagged points lie outside the
95% confidence interval for the regression population.
Note
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, make sure the Report Flagged Values Only check box is
selected. Clear this option to include all standardized and Studentized residuals in the report.
box. The confidence level can be any value from 1 to 99. The suggested confidence level is
95%. Clear the selected check box if you do not want to include the confidence intervals for
the population in the report.
Saving Confidence Intervals to the Worksheet. To save the confidence intervals to the
worksheet, select the column number of the first column you want to save the intervals to from
the Starting in Column drop-down list. The selected intervals are saved to the worksheet
starting with the specified column and continuing with successive columns in the worksheet.
PRESS Prediction Error. The PRESS Prediction Error is a measure of how well the
regression equation predicts the observations. Leave this check box selected to evaluate the fit
of the equation using the PRESS statistic. Clear the selected check box if you do not want
to include the PRESS statistic in the report.
Standardized Coefficients. These are the coefficients of the regression equation standardized
to dimensionless values,
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$,
and $s_y$ = standard deviation of dependent variable y.
To include the standardized coefficients in the report, select Standardized Coefficients. Clear
the check box if you do not want to include the standardized coefficients in the report.
8.6.4.5.1 Variance Inflation Factor
Observations with high leverage tend to be at the extremes of the independent variables,
where small changes in the independent variables can have large effects on the predicted
values of the dependent variable.
The expected leverage of a data point is
$$\frac{k + 1}{n}$$
where there are k independent variables and n data points. Observations with leverages much
higher than the expected leverages are potentially influential points.
Select the Leverage check box to compute the leverage for each point and automatically flag
potentially influential points, that is, those points that have leverages greater than
the specified value times the expected leverage. The suggested value is 2.0 times the expected
leverage for the regression (that is, $2(k + 1)/n$). To avoid flagging more potentially
influential points, increase this value; to flag points with less potential influence, lower this value.
Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the
estimates of the parameters in the regression equation. Cook’s distance assesses how much the
values of the regression coefficients change if a point is deleted from the analysis. Cook’s
distance depends on both the values of the independent and dependent variables.
Select the Cook’s Distance check box to compute this value for all points and flag influential
points, that is, those with a Cook’s distance greater than the specified value. The
suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential.
Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the
parameter estimates. To avoid flagging more influential points, increase this value; to flag less
influential points, lower this value.
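Cook's distance can be written as D_i = e_i² h_i / (p s² (1 - h_i)²), where e_i is the raw residual, h_i the leverage, p the number of parameters, and s² the residual mean square. This equals the summed squared change in every predicted value when point i is deleted, divided by p·s². The sketch below assumes a straight-line fit (p = 2) and uses hypothetical names; it is not SigmaPlot's code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def cooks_distance(x, y):
    """Cook's distance for each point of a straight-line fit (p = 2 parameters)."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in e) / (n - p)
    distances = []
    for xi, ei in zip(x, e):
        h = 1 / n + (xi - xbar) ** 2 / sxx
        distances.append(ei * ei / (p * s2) * h / (1 - h) ** 2)
    return distances

def flag_cooks(x, y, flag_value=4.0):
    """Indices whose Cook's distance exceeds the flag value."""
    return [i for i, d in enumerate(cooks_distance(x, y)) if d > flag_value]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 45.0]
print(flag_cooks(x, y))
```

Unlike leverage, Cook's distance depends on both variables: the point at x = 30 is flagged here because it is both far out on the x axis and off the trend of the other points.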
Report Flagged Values Only. To include only the influential points flagged by the
influential point tests in the report, make sure the Report Flagged Values Only check box is
selected. Clear this option to include results for all points in the report.
The Variance Inflation Factor option measures the multicollinearity of the independent
variables, or the linear combination of the independent variables in the fit.
Regression procedures assume that the independent variables are statistically independent of
each other, that is, that the value of one independent variable does not affect the value of
another. However, this ideal situation rarely occurs in the real world. When the independent
variables are correlated, or contain redundant information, the estimates of the parameters in
the regression model can become unreliable.
The parameters in regression models quantify the theoretically unique contribution of each
independent variable to predicting the dependent variable. When the independent variables are
correlated, they contain some common information and "contaminate" the estimates of the
parameters. If the multicollinearity is severe, the parameter estimates can become unreliable.
There are two types of multicollinearity.
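The variance inflation factor for the j-th independent variable is commonly defined as VIF_j = 1/(1 - R_j²), where R_j² comes from regressing that variable on all the other independent variables; a value of 1 means the variable shares no linear information with the others. A minimal Python sketch under that definition; the function names are hypothetical, and the small Gauss-Jordan solver is for illustration only.

```python
def solve(A, c):
    """Solve A b = c by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, m + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][m] / M[i][i] for i in range(m)]

def r_squared_on(target, others):
    """R^2 from regressing one column on an intercept plus the other columns."""
    n = len(target)
    cols = [[1.0] * n] + others
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * b for a, b in zip(ci, target)) for ci in cols]
    b = solve(A, c)
    fitted = [sum(bj * col[k] for bj, col in zip(b, cols)) for k in range(n)]
    mean = sum(target) / n
    ss_tot = sum((t - mean) ** 2 for t in target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return 1 - ss_res / ss_tot

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2) for each independent variable column in X."""
    vifs = []
    for j in range(len(X)):
        others = [X[k] for k in range(len(X)) if k != j]
        vifs.append(1 / (1 - r_squared_on(X[j], others)))
    return vifs

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]   # closely tracks x1
print(variance_inflation_factors([x1, x2]))
```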
8.6.4.5.2 Power
Select the Other Diagnostics tab in the options dialog box to view the Power options. If the Other
Diagnostics tab is hidden, click the right-pointing arrow to the right of the tabs to move it into view.
Use the left-pointing arrow to move the other tabs back into view.
The power of a regression is the power to detect the observed relationship in the data. The
alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.
Check the Power check box to compute the power for the stepwise linear regression data.
Change the alpha value by editing the number in the Alpha Value edit box. The suggested
value is α = 0.05. This indicates that a one in twenty chance of error is acceptable, or that you
are willing to conclude there is a significant relationship when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
relationship, but a greater possibility of concluding there is no relationship when one exists.
Larger values of α make it easier to conclude that there is a relationship, but also increase the
risk of reporting a false positive.
8.6.5 Setting Backward Stepwise Regression Options
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select Backward Stepwise Regression from the Tests drop-down list in the Statistics
group on the Analysis tab.
3. Click Options. The Options for Backward Stepwise Regression dialog box appears
with five tabs:
• Criterion. Click the Criterion tab to view the F-to-Enter, F-to-Remove, and
Number of Steps options. For more information, see 8.6.5.1 Options for Backward
Stepwise Regression: Criterion.
• Assumption Checking. Click the Assumption Checking tab to view the Normality,
Constant Variance, and Durbin-Watson options. For more information, see 8.6.5.2
Options for Backward Stepwise Regression: Assumption Checking.
• Residuals. Click the Residuals tab to view the residual options. For more information,
see 8.6.5.3 Options for Backward Stepwise Regression: Residuals.
• More Statistics. Click the More Statistics tab to view the confidence intervals, PRESS
Prediction Error, and Standardized Coefficients options. For more information, see 8.6.5.4
Options for Backward Stepwise Regression: More Statistics.
• Other Diagnostics. Click the Other Diagnostics tab to view the Power options. For more
information, see 8.6.5.5 Options for Backward Stepwise Regression: Other Diagnostics.
Options settings are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see 8.6.6 Running a Stepwise
Regression.
5. To accept the current settings and close the dialog box, click OK.
F-to-Enter Value. The F-to-Enter value controls which independent variables are entered
into the regression equation during forward stepwise regression, or re-entered after each step
during backward stepwise regression.
The F-to-Enter value is the minimum incremental F value associated with an independent
variable before it can be entered into the regression equation. All independent variables
producing incremental F values above the F-to-Enter value are added to the model.
The suggested F-to-Enter value is 4.0. Increasing F-to-Enter requires a potential independent
variable to have a greater effect on the ability of the regression equation to predict the
dependent variable before it is accepted, but may stop too soon and exclude important
variables.
Remember
The F-to-Enter value should always be greater than or equal to the F-to-Remove
value, to avoid cycling variables in and out of the regression model.
Reducing the F-to-Enter value makes it easier to add a variable, because it relaxes the
importance of a variable required before it is accepted, but may produce redundant variables
and result in multicollinearity.
Tip
If you are performing backward stepwise regression and you want any variable
that has been removed to remain deleted, increase the F-to-Enter value to a large
number, for example, 100000.
F-to-Remove Value. The F-to-Remove value controls which independent variables are
deleted from the regression equation during backward stepwise regression, or removed after
each step in forward stepwise regression.
The F-to-Remove is the maximum incremental F value associated with an independent
variable before it can be removed from the regression equation. All independent variables
producing incremental F values below the F-to-Remove value are deleted from the model.
The suggested F-to-Remove value is 3.9. Reducing the F-to-Remove value makes it easier to
retain a variable in the regression equation because variables that have smaller effects on the
ability of the regression equation to predict the dependent variable are still accepted. However,
the regression may still contain redundant variables, resulting in multicollinearity.
Remember
The F-to-Remove value should always be less than or equal to the F-to-Enter value, to
avoid cycling variables in and out of the regression model.
Increasing the F-to-Remove value makes it easier to delete variables from the equation, as
variables that contain more predictive value can be removed. Important variables may also
be deleted, however.
Tip
If you are performing backward stepwise regression and you want any variable that
has been entered to remain in the equation, set the F-to-Remove value to zero.
Number of Steps. Use this option to set the maximum number of steps permitted before
the stepwise algorithm stops. Note that if the algorithm stops because it ran out of steps,
the results are probably not reliable. The suggested number of steps is 20 added or deleted
independent variables.
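A backward counterpart to the criterion options above: start with every independent variable in the equation, remove the weakest while its incremental F is below F-to-Remove, and let a removed variable back in if its incremental F rises above F-to-Enter. This is an illustrative sketch, not SigmaPlot's implementation; the names, the pure-Python solver, and counting each loop pass as one step are assumptions.

```python
def solve(A, c):
    """Solve A b = c by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, m + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][m] / M[i][i] for i in range(m)]

def sse_for(X, y, subset):
    """Residual sum of squares of y regressed on an intercept plus X[j] for j in subset."""
    n = len(y)
    cols = [[1.0] * n] + [X[j] for j in subset]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    b = solve(A, c)
    return sum((y[k] - sum(bj * col[k] for bj, col in zip(b, cols))) ** 2 for k in range(n))

def backward_stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=20):
    """Return the sorted indices of the columns of X left in the model."""
    n = len(y)
    model = list(range(len(X)))         # start with every variable entered
    for _ in range(max_steps):          # each pass counts as one step here
        changed = False
        sse_cur = sse_for(X, y, model)
        # Remove the member with the smallest incremental F, if it is below F-to-Remove.
        worst = None
        for j in model:
            rest = [m for m in model if m != j]
            f = (sse_for(X, y, rest) - sse_cur) / (sse_cur / (n - len(model) - 1))
            if worst is None or f < worst[0]:
                worst = (f, j)
        if worst is not None and worst[0] < f_remove:
            model.remove(worst[1])
            sse_cur = sse_for(X, y, model)
            changed = True
        # Re-enter a removed variable whose incremental F now exceeds F-to-Enter.
        best = None
        for j in range(len(X)):
            if j in model:
                continue
            sse_new = sse_for(X, y, model + [j])
            f = (sse_cur - sse_new) / (sse_new / (n - len(model) - 2))
            if best is None or f > best[0]:
                best = (f, j)
        if best is not None and best[0] > f_enter:
            model.append(best[1])
            changed = True
        if not changed:
            break
    return sorted(model)

noise = [0.3, -0.4, 0.1, 0.5, -0.2, -0.3, 0.4, -0.1, 0.2, -0.5, 0.0, 0.1]
x1 = [float(i) for i in range(12)]
x2 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0, 4.0, 5.0]
x3 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
y = [3.0 * a + e for a, e in zip(x1, noise)]
X = [x1, x2, x3]
print(backward_stepwise(X, y))
```

Passing a very large `f_enter` (such as the 100000 suggested in the Tip above) keeps removed variables out; passing `f_remove=0.0` keeps every variable in the equation.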
8.6.5.2 Options for Backward Stepwise Regression: Assumption Checking
time is related to the deviation at the previous time. If the residuals are not correlated, the
Durbin-Watson statistic will be near 2.
Difference from 2 Value. Enter the acceptable deviation from 2.0 that you consider as
evidence of a serial correlation in the Difference from 2.0 box. If the computed Durbin-Watson
statistic deviates from 2.0 by more than the entered value, SigmaPlot warns you that the residuals
may not be independent. The suggested deviation value is 0.50; that is, Durbin-Watson
Statistic values greater than 2.5 or less than 1.5 flag the residuals as correlated.
To require a stricter adherence to independence, decrease the acceptable difference from 2.0.
To relax the requirement of independence, increase the acceptable difference from 2.0.
Click the Residuals tab in the options dialog box to view the Predicted Values, Raw,
Standardized, Studentized, Studentized Deleted, and Report Flagged Values Only options.
Predicted Values. Select this option to calculate the predicted value of the dependent
variable for each observed value of the independent variable(s), then save the results to the
data worksheet. Clear the check box if you do not want to include predicted values in
the worksheet.
To assign predicted values to a worksheet column, select the worksheet column you want to
save the predicted values to from the corresponding drop-down list. If you select none and the
Predicted Values check box is selected, the values appear in the report but are not assigned to
the worksheet.
Raw Residuals. The raw residuals are the differences between the predicted and observed
values of the dependent variable. To include raw residuals in the report, make sure this check
box is selected. Clear the check box if you do not want to include raw residuals in
the report.
To assign the raw residuals to a worksheet column, select the number of the desired column
from the corresponding drop-down list. If you select none from the drop-down list and the Raw
check box is selected, the values appear in the report but are not assigned to the worksheet.
Standardized Residuals. The standardized residual is the residual divided by the standard
error of the estimate. The standard error of the estimate is essentially the standard deviation
of the residuals, and is a measure of variability around the regression line. To include
standardized residuals in the report, make sure this check box is selected.
SigmaPlot automatically flags data points whose standardized residuals exceed the value
specified in the corresponding box. These data points are considered to have "large" standardized
residuals, that is, they are outlying data points. You can change which data points are flagged by
editing the value in the Flag Values > edit box.
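A minimal sketch of standardized residuals and threshold flagging follows. It assumes a straight-line fit with one independent variable (SigmaPlot's stepwise regression can use several), residuals computed as observed minus predicted, and a flag value of 2.5; the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def standardized_residuals(x, y, flag_value=2.5):
    """Raw residuals divided by the standard error of the estimate s_yx."""
    a, b = fit_line(x, y)
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed - predicted
    n, k = len(x), 1                                   # one independent variable
    s_yx = math.sqrt(sum(e * e for e in raw) / (n - k - 1))
    std = [e / s_yx for e in raw]
    flagged = [i for i, s in enumerate(std) if abs(s) > flag_value]
    return std, flagged

x = list(range(12))
y = [float(xi) for xi in x]
y[5] += 10.0                  # one outlying observation
std, flagged = standardized_residuals(x, y)
print(flagged)                # [5]
```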
Studentized Residuals. Studentized residuals scale the standardized residuals by taking
into account the greater precision of the regression line near the middle of the data versus
the extremes. The Studentized residuals tend to be distributed according to the Student t
distribution, so the t distribution can be used to define "large" values of the Studentized
residuals. SigmaPlot automatically flags data points with "large" values of the Studentized
residuals, that is, outlying data points; with the suggested setting, points that lie outside the
95% confidence interval for the regression population are flagged.
To include Studentized residuals in the report, make sure this check box is selected. Click the
selected check box if you do not want to include Studentized residuals in the worksheet.
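The internally Studentized residual divides each raw residual by s_yx times the square root of (1 - h), where h is the point's leverage. The sketch below assumes a straight-line fit, for which h = 1/n + (x - mean x)^2 / Sxx; the critical value t_crit = 2.228 (two-sided 95% for 10 residual degrees of freedom) is an assumed input, not a value SigmaPlot exposes under that name.

```python
import math

def studentized_residuals(x, y, t_crit=2.228):
    """Internally Studentized residuals for y = a + b*x, plus the indices
    of points whose |residual| exceeds the assumed critical t value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e * e for e in raw) / (n - 2))
    # Leverage: larger for x values far from the mean of x.
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    r = [e / (s_yx * math.sqrt(1 - hi)) for e, hi in zip(raw, h)]
    flagged = [i for i, ri in enumerate(r) if abs(ri) > t_crit]
    return r, flagged

x = list(range(12))
y = [float(xi) for xi in x]
y[5] += 10.0                  # one outlying observation
r, flagged = studentized_residuals(x, y)
```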
Studentized Deleted Residuals Studentized deleted residuals are similar to the Studentized
residual, except that the residual values are obtained by computing the regression equation
without using the data point in question.
To include Studentized deleted residuals in the report, make sure this check box is selected.
Click the selected check box if you do not want to include Studentized deleted residuals in
the worksheet.
SigmaPlot can automatically flag data points with "large" values of the Studentized deleted
residual, that is, outlying data points; with the suggested setting, points that lie outside the
95% confidence interval for the regression population are flagged.
Note
Both Studentized and Studentized deleted residuals use the same confidence interval
setting to determine outlying points.
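The Studentized deleted residual does not require refitting the regression once for every point; a standard identity converts an internally Studentized residual r into the deleted version. The sketch below assumes n observations and p = k + 1 parameters; the function name is hypothetical.

```python
import math

def studentized_deleted(r, n, p):
    """Externally Studentized (deleted) residual from the internally
    Studentized residual r, using r * sqrt((n - p - 1) / (n - p - r^2))."""
    return r * math.sqrt((n - p - 1) / (n - p - r * r))

print(studentized_deleted(1.0, 12, 2))  # 1.0
print(studentized_deleted(3.0, 12, 2))  # 9.0
```

The second call shows why the deleted form is more sensitive: a point that is already large (r = 3.0) becomes much larger (9.0) once it no longer inflates its own variance estimate.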
Report Flagged Values Only. To include only the flagged standardized and Studentized
deleted residuals in the report, make sure the Report Flagged Values Only check box is
selected. Clear this option to include all standardized and Studentized residuals in the report.
$$\beta_i = b_i \frac{s_{x_i}}{s_y}$$
where $b_i$ = regression coefficient, $s_{x_i}$ = standard deviation of the independent variable $x_i$, and
$s_y$ = standard deviation of the dependent variable $y$.
To include the standardized coefficients in the report, select Standardized Coefficients. Clear
the check box if you do not want to include the standardized coefficients in the report.
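The standardized coefficient formula can be checked with the Python standard library. This sketch assumes sample standard deviations (statistics.stdev) and a raw coefficient b_i that has already been estimated; the function name is hypothetical.

```python
import statistics

def standardized_coefficient(b_i, x_i, y):
    """beta_i = b_i * s_xi / s_y, using sample standard deviations."""
    return b_i * statistics.stdev(x_i) / statistics.stdev(y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x exactly, so the raw coefficient is 2
beta = standardized_coefficient(2.0, x, y)
```

With a single independent variable the standardized coefficient equals the correlation coefficient, so a perfect linear relationship gives beta = 1.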
The expected leverage of a data point is $(k+1)/n$, where there are k independent variables and n data points. Observations with leverages much
higher than the expected leverages are potentially influential points.
Select the Leverage check box to compute the leverage for each point and automatically flag
potentially influential points, that is, those points that have leverages greater than
the specified value times the expected leverage. The suggested value is 2.0 times the expected
leverage for the regression, that is,
$$\frac{2(k+1)}{n}$$
To avoid flagging more potentially influential points, increase this value; to flag points
with less potential influence, lower this value.
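A sketch of leverage flagging for a straight-line fit (k = 1) follows. For one independent variable the leverage is h = 1/n + (x - mean x)^2 / Sxx; with several independent variables it comes from the diagonal of the hat matrix instead. The function name and the factor argument are hypothetical.

```python
def leverage_flags(x, factor=2.0):
    """Leverages for y = a + b*x and the indices of points whose leverage
    exceeds factor * (k + 1) / n."""
    n, k = len(x), 1                      # one independent variable
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    expected = (k + 1) / n
    flagged = [i for i, hi in enumerate(h) if hi > factor * expected]
    return h, flagged

h, flagged = leverage_flags([1, 2, 3, 4, 5, 20])
print(flagged)   # [5]: x = 20 lies far from the other x values
```

The leverages always sum to k + 1, which is why (k + 1)/n is the average, or expected, leverage.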
Cook’s Distance. Cook’s distance is a measure of how great an effect each point has on the
estimates of the parameters in the regression equation. Cook’s distance assesses how much the
values of the regression coefficients change if a point is deleted from the analysis. Cook’s
distance depends on both the values of the independent and dependent variables.
Select the Cook’s Distance check box to compute this value for all points and flag influential
points, that is, those with a Cook’s distance greater than the specified value. The
suggested value is 4.0. Cook’s distances above 1 indicate that a point is possibly influential.
Cook’s distances exceeding 4 indicate that the point has a major effect on the values of the
parameter estimates. To avoid flagging more influential points, increase this value; to flag less
influential points, lower this value.
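Cook's distance combines the residual and the leverage: D = e^2 h / (p s^2 (1 - h)^2), where s^2 is the residual mean square and p is the number of parameters. The sketch below assumes a straight-line fit (p = 2) and the suggested flag value of 4.0; the function name is hypothetical.

```python
def cooks_distance(x, y, flag_value=4.0):
    """Cook's distance for each point of a straight-line fit, plus the
    indices of points whose distance exceeds flag_value."""
    n, p = len(x), 2                           # intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in e) / (n - p)    # residual mean square
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    d = [ei * ei * hi / (p * s2 * (1 - hi) ** 2) for ei, hi in zip(e, h)]
    return d, [i for i, di in enumerate(d) if di > flag_value]

# The last point has high leverage and pulls the line toward itself.
d, flagged = cooks_distance([1, 2, 3, 4, 5, 20], [1, 2, 3, 4, 5, 0])
print(flagged)   # [5]
```

Note that the influential point's raw residual is small; its large Cook's distance comes from its leverage.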
Report Flagged Values Only. To include only the influential points flagged by the
influential point tests in the report, make sure the Report Flagged Values Only check box is
selected. Clear this option to include all influential points in the report.
When the variance inflation factor is large, there are redundant variables in the regression
model, and the parameter estimates may not be reliable. Variance inflation factor values above
4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.
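The variance inflation factor for an independent variable is 1 / (1 - R^2), where R^2 comes from regressing that variable on the other independent variables. The sketch below assumes exactly two independent variables, in which case R^2 is the squared correlation between them and both variables share the same VIF; the function name is hypothetical.

```python
def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2) for a model with two independent variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 * s12 / (s11 * s22)   # R^2 from regressing x1 on x2
    return 1.0 / (1.0 - r2)

nearly_redundant = vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 6, 8, 11])
unrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
```

Here nearly_redundant is far above 10 (serious multicollinearity), while unrelated equals 1 because the two columns are uncorrelated.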
What to Do About Multicollinearity. Sample-based multicollinearity can sometimes be
resolved by collecting more data under other conditions to break up the correlation among the
independent variables. If this is not possible, the regression equation is overparameterized and
one or more of the independent variables must be dropped to eliminate the multicollinearity.
Structural multicollinearities can be resolved by centering the independent variable before
forming the power or interaction terms.
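The effect of centering on structural multicollinearity can be seen directly: x and x^2 are almost perfectly correlated, but after subtracting the mean of x the correlation between the centered value and its square vanishes when the x values are symmetric about their mean. The data below are hypothetical.

```python
def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

x = [float(i) for i in range(1, 11)]
raw_r = correlation(x, [xi ** 2 for xi in x])        # close to 1
mean_x = sum(x) / len(x)
c = [xi - mean_x for xi in x]                        # centered values
centered_r = correlation(c, [ci ** 2 for ci in c])   # 0 for symmetric x
```

For x values that are not symmetric about the mean, centering reduces the correlation rather than removing it completely.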
Report Flagged Values Only. To include only the points flagged by the influential
point tests and values exceeding the variance inflation threshold in the report, make sure the
Report Flagged Values Only check box is selected. Clear this option to include all influential
points in the report.
What to Do About Influential Points. Influential points have two possible causes:
• There is something wrong with the data point, caused by an error in observation or data
entry.
• The model is incorrect.
If a mistake was made in data collection or entry, correct the value. If you do not know the
correct value, you may be able to justify deleting the data point. If the model appears to be
incorrect, try regression with different independent variables, or a Nonlinear Regression.
8.6.5.5.2 Power
Click the Other Diagnostics tab in the options dialog box to view the Power options. If Other
Diagnostics is hidden, click the right pointing arrow to the right of the tabs to move it into view.
Use the left pointing arrow to move the other tabs back into view.
The power of a regression is the power to detect the observed relationship in the data. The
alpha (α) is the acceptable probability of incorrectly concluding there is a relationship.
Select Power to compute the power for the stepwise linear regression data. Change the alpha
value by editing the number in the Alpha Value box. The suggested value is α = 0.05. This
indicates that a one in twenty chance of error is acceptable, or that you are willing to conclude
there is a significant relationship when P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
relationship, but a greater possibility of concluding there is no relationship when one exists.
Larger values of α make it easier to conclude that there is a relationship, but also increase the
risk of reporting a false positive.
To run a Stepwise Regression you need to select the data to test. Use the Pick Columns dialog
box to select the worksheet columns with the data you want to test.
To run a Stepwise Regression:
1. If you want to select your data before you run the regression, drag the pointer over your
data.
The first selected column is assigned to the Dependent Variable row in the Selected
Columns list, and the second column is assigned to the Independent Variable row. The
title of selected columns appears in each row. You are only prompted for one dependent
and one independent variable column.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the regression. If you elected to test for normality, constant variance,
and/or independent residuals, SigmaPlot performs the tests for normality (Shapiro-Wilk
or Kolmogorov-Smirnov), constant variance, and independent residuals. If your data fail
any of these tests, SigmaPlot warns you. When the test is complete, the report appears
displaying the results of the Stepwise Regression.
If you are performing a regression using one order only, and selected to place predicted
values, residuals, and/or other test results in the worksheet, they are placed in the specified
data columns and are labeled by content and source column.
Note
Worksheet results can only be obtained using order only stepwise regression.
8.6.7.2 Step
The step number, variable added or removed, R, R² and the adjusted R² for the equation, and
standard error of the estimate are all listed under this heading.
R and R Squared. R, the multiple correlation coefficient, and R², the coefficient of
determination for Stepwise Regression, are both measures of how well the regression model
describes the data. R values near 1 indicate that the equation is a good description of the
relation between the independent and dependent variables.
R equals 0 when the values of the independent variables do not allow any prediction of the
dependent variable, and equals 1 when you can perfectly predict the dependent variable from
the independent variables.
Adjusted R Squared. The adjusted R², R²adj, is also a measure of how well the regression
model describes the data, but takes into account the number of independent variables, which
reflects the degrees of freedom. Larger R²adj values (nearer to 1) indicate that the equation is a
good description of the relation between the independent and dependent variables.
Standard Error of the Estimate. The standard error of the estimate syx is a measure of the
actual variability about the regression plane of the underlying population. The underlying
population generally falls within about two standard errors of the observed sample. This
statistic is displayed for the results of each step.
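The three statistics reported for each step can be computed from the observed and predicted values. This sketch assumes a least-squares fit that includes a constant term, k independent variables, and R² = 1 - SSres/SStot; the function name and data are hypothetical.

```python
import math

def fit_summary(observed, predicted, k):
    """Return R^2, adjusted R^2, and the standard error of the estimate."""
    n = len(observed)
    mean = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra variables
    s_yx = math.sqrt(ss_res / (n - k - 1))
    return r2, r2_adj, s_yx

r2, r2_adj, s_yx = fit_summary([1, 2, 3, 4], [1.1, 1.9, 3.1, 3.9], k=1)
# r2 is about 0.992, r2_adj about 0.988, s_yx about 0.141
```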
8.6.7.4 Variables in Model
The residual mean square is a measure of the variation of the residuals about the regression
plane, or
$$MS_{res} = \frac{SS_{res}}{DF_{res}}$$
where $SS_{res}$ is the residual sum of squares and $DF_{res}$ is the residual degrees of freedom.
If F is a large number, you can conclude that the independent variables contribute to the
prediction of the dependent variable (that is, at least one of the coefficients is different
from zero, and the "unexplained variability" is smaller than what is expected from random
sampling variability of the dependent variable about its mean). If the F ratio is around 1,
you can conclude that there is no association between the variables (that is, the data are
consistent with the null hypothesis that all the samples are just randomly distributed).
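The F ratio compares the regression mean square with the residual mean square. This sketch assumes k independent variables plus a constant, so the residual degrees of freedom are n - k - 1; computing the corresponding P value needs the F distribution, which the Python standard library does not provide, so it is omitted. The function name is hypothetical.

```python
def f_ratio(ss_reg, ss_res, n, k):
    """F = (SS_reg / k) / (SS_res / (n - k - 1))."""
    ms_reg = ss_reg / k              # regression mean square
    ms_res = ss_res / (n - k - 1)    # residual mean square
    return ms_reg / ms_res

f = f_ratio(ss_reg=4.96, ss_res=0.04, n=4, k=1)   # about 248
```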
P Value. The P value is the probability of being wrong in concluding that there is an
association between the dependent and independent variables (that is, the probability of
falsely rejecting the null hypothesis, or committing a Type I error, based on F). The smaller
the P value, the greater the probability that there is an association.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
Information about the independent variables used in the regression equation for the current
step is listed under this heading. The values of the variable coefficients, standard errors, the
F-to-Remove, and the corresponding P value for the F-to-Remove are listed. These statistics
are displayed for each step. An asterisk (*) indicates variables that were forced into the model.
Coefficients. The value for the constant and coefficients of the independent variables for the
regression model are listed.
Standard Error. The standard errors are estimates of the uncertainty in the regression
coefficients (analogous to the standard error of the mean). The true regression coefficients of
the underlying population generally fall within about two standard errors of the observed
sample coefficients. Large standard errors may indicate multicollinearity.
F-to-Enter. The F-to-Enter gauges the increase in predicting the dependent variable gained by
adding the independent variable to the regression equation. It is the ratio
$$F_{\text{enter}} = \frac{\text{regression variation from the dependent variable mean associated with adding } x_j \text{ when } x_1, \ldots, x_{j-1} \text{ are already in the equation}}{\text{residual variation about the regression containing } x_1, \ldots, x_j}$$
If the F-to-Enter for a variable is larger than the F-to-Enter cutoff specified with the Stepwise
Regression options, the variable remains in or is added back to the equation.
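One common way to evaluate this ratio is the partial F statistic for two nested models that differ by one variable: the drop in residual sum of squares from adding the variable, divided by the residual mean square of the larger model. This is a sketch under that assumption, not SigmaPlot's code; the same function gives an F-to-Remove when the smaller model is the one without the variable being considered for removal. The function name and numbers are hypothetical.

```python
def partial_f(ss_res_smaller, ss_res_larger, n, p_larger):
    """Partial F for one variable in nested least-squares models.

    ss_res_smaller: residual SS of the model without the variable
    ss_res_larger:  residual SS of the model with the variable
    p_larger:       parameters (including the constant) in the larger model
    """
    return (ss_res_smaller - ss_res_larger) / (ss_res_larger / (n - p_larger))

f_enter = partial_f(10.0, 6.0, n=20, p_larger=3)   # 4 / (6 / 17), about 11.33
```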
Note
The F-to-Remove value is the cutoff that determines if a variable is removed from or
stays out of the equation.
P Value. P is the P value calculated for the F-to-Enter value. The P value is the probability
of being wrong in concluding that adding the independent variable contributes to predicting
the dependent variable (that is, the probability of falsely rejecting the null hypothesis,
or committing a Type I error, based on F-to-Enter). The smaller the P value, the greater the
probability that adding the variable contributes to the model.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
The variables not entered or removed from the model are listed under this heading, along with
their corresponding F-to-Remove and P values.
F-to-Remove. The F-to-Remove gauges the decrease in the ability to predict the dependent
variable that would result from removing the independent variable from the regression equation.
If the F-to-Remove for a variable is smaller than the F-to-Remove cutoff specified with the
stepwise regression options, the variable is removed from the equation.
Remember
It is the F-to-Enter value that determines which variable is reentered into or remains in
the equation.
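In conventional backward elimination, each step removes the variable with the smallest F-to-Remove, provided that value falls below the cutoff, and variables forced into the model are never removed. The sketch below assumes the F-to-Remove values are already computed and the cutoff is supplied by the caller; the function name and variable names are hypothetical.

```python
def variable_to_remove(f_to_remove, cutoff, forced=()):
    """Return the variable with the smallest F-to-Remove below cutoff,
    skipping forced variables, or None if every variable stays."""
    candidates = {v: f for v, f in f_to_remove.items() if v not in forced}
    if not candidates:
        return None
    weakest = min(candidates, key=candidates.get)
    return weakest if candidates[weakest] < cutoff else None

stats = {"x1": 12.0, "x2": 1.5, "x3": 2.8}
print(variable_to_remove(stats, cutoff=3.9))                 # x2
print(variable_to_remove(stats, cutoff=3.9, forced={"x2"}))  # x3
print(variable_to_remove(stats, cutoff=1.0))                 # None
```

Removing one variable per step matters because every remaining F-to-Remove changes after the model is refitted.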
P Value. P is the P value calculated for the F-to-Remove value. The P value is the probability
of being wrong in concluding that the independent variable contributes to predicting
the dependent variable (that is, the probability of falsely rejecting the null hypothesis,
or committing a Type I error, based on F-to-Remove). The smaller the P value, the greater the
probability that the variable contributes to the model.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
8.6.7.6 PRESS Statistic
8.6.7.10 Power
This result is displayed if you selected this option in the Options for Stepwise Regression
dialog box.
The power, or sensitivity, of a regression is the probability that the model correctly describes
the relationship of the variables, if there is a relationship.
The regression diagnostic results display only the values for the predicted and residual results
selected in the Options for Stepwise Regression dialog. All results that qualify as outlying
values are flagged with a < symbol. The trigger values to flag residuals as outliers are set in
the Options for Stepwise Regression dialog box.
If you selected Report Cases with Outliers Only, only those observations that have one or
more residuals flagged as outliers are reported; however, all other results for that observation
are also displayed.
Predicted Values. This is the value for the dependent variable predicted by the regression
model for each observation. If these values were saved to the worksheet, they may be used to
plot the regression using SigmaPlot.
Residuals. These are the raw residuals, the difference between the predicted and observed
values for the dependent variables.
Standardized Residuals. The standardized residual is the raw residual divided by the
standard error of the estimate.
If the residuals are normally distributed about the regression, about 68% of the standardized
residuals have values between -1 and +1, and about 95% of the standardized residuals have
values between -2 and +2. A larger standardized residual indicates that the point is far from
the regression; the suggested value flagged as an outlier is 2.5.
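The percentages quoted above follow from the standard normal distribution, assuming the standardized residuals are approximately standard normal. Python's statistics.NormalDist reproduces them; fraction_within is a hypothetical helper name.

```python
from statistics import NormalDist

def fraction_within(z):
    """Fraction of a standard normal distribution between -z and +z."""
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(fraction_within(1), 3))        # 0.683
print(round(fraction_within(2), 3))        # 0.954
print(round(1 - fraction_within(2.5), 4))  # 0.0124
```

The last line shows that, for well-behaved data, only about 1.2% of points should exceed the suggested outlier value of 2.5.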
Studentized Residuals. The Studentized residual is a standardized residual that also takes
into account the greater confidence of the data points in the "middle" of the data set. By
weighting the values of the residuals of the extreme data points (those with the lowest and
highest independent variable values), the Studentized residual is more sensitive than the
standardized residual in detecting outliers.
Both Studentized and Studentized deleted residuals that lie outside a specified confidence
interval for the regression are flagged as outlying points; the suggested confidence value is
95%.
This residual is also known as the internally Studentized residual, because the standard error
of the estimate is computed using all data.
Studentized Deleted Residual. The Studentized deleted residual, or externally Studentized
residual, is a Studentized residual which uses the standard error of the estimate $s_{yx(-i)}$,
computed by deleting the data point associated with the residual. This reflects the greater
effect of outlying points by deleting the data point from the variance computation.
Both Studentized and Studentized deleted residuals that lie outside a specified confidence
interval for the regression are flagged as outlying points; the suggested confidence value is
95%.
The Studentized deleted residual is more sensitive than the Studentized residual in detecting
outliers, since the Studentized deleted residual results in much larger values for outliers than
the Studentized residual.
(alpha), where α is the acceptable probability of incorrectly concluding that the coefficient is
different from zero, and the confidence interval is 100(1 − α)%.
The specified confidence level can be any value from 1 to 99; the suggested confidence level
for both intervals is 95%.
Pred (Predicted Values). This is the value for the dependent variable predicted by the
regression model for each observation.
Mean. The confidence interval for the regression gives the range of variable values computed
for the region containing the true relationship between the dependent and independent
variables, for the specified level of confidence.
Obs (Observations). The confidence interval for the population gives the range of variable
values computed for the region containing the population from which the observations were
drawn, for the specified level of confidence.
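For a straight-line fit, both intervals are centered on the predicted value; the interval for the population (Obs) is wider because it adds the scatter of individual observations to the uncertainty in the line itself. The sketch below assumes one independent variable and takes the critical t value as an input (2.306 is the two-sided 95% value for 8 degrees of freedom); the function name and data are hypothetical.

```python
import math

def regression_intervals(x, y, x0, t_crit):
    """Prediction at x0 and half-widths of the regression (Mean) and
    population (Obs) confidence intervals for a straight-line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    s_yx = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    core = 1 / n + (x0 - mx) ** 2 / sxx
    mean_half = t_crit * s_yx * math.sqrt(core)      # interval for the regression
    obs_half = t_crit * s_yx * math.sqrt(1 + core)   # interval for the population
    return a + b * x0, mean_half, obs_half

x = [float(i) for i in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
pred, mean_half, obs_half = regression_intervals(x, y, x0=5.5, t_crit=2.306)
```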
The Create Result Graph dialog box appears displaying the types of graphs available for
the Stepwise Regression results.
3. Select the type of graph you want to create from the Graph Type list, then click OK.
For more information, see 11.1 Generating Report Graphs.
8.7 Best Subsets Regression
Mallows Cp. Cp is a gauge of the size of the bias introduced into the estimate of the dependent
variable when independent variables are omitted from the regression equation, as computed
from the number of parameters plus a measure of the difference between the predicted and
true population means of the dependent variable.
The optimal value of Cp is equal to the number of parameters (the independent variables used
in the subset plus the constant), or:
$$C_p = p = k + 1$$
where p is the number of parameters and k is the number of independent variables.
The closer the value of Cp is to the number of parameters, the less likely a relevant variable
was omitted. Note that the fully specified model always has Cp = p.
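A widely used form of Mallows Cp is SSres(subset) / MSres(full model) - (n - 2p). This section does not show SigmaPlot's exact computation, so this form is an assumption; it does reproduce the property that the full model has Cp = p. The function name and numbers are hypothetical.

```python
def mallows_cp(ss_res_subset, p, ss_res_full, p_full, n):
    """Cp = SS_res(subset) / MS_res(full) - (n - 2p)."""
    ms_res_full = ss_res_full / (n - p_full)   # residual mean square, full model
    return ss_res_subset / ms_res_full - (n - 2 * p)

full = mallows_cp(12.0, 4, 12.0, 4, n=20)    # 4.0, equal to p
subset = mallows_cp(15.0, 3, 12.0, 4, n=20)  # 6.0, larger than p = 3
```

The subset model's Cp of 6 is well above its 3 parameters, suggesting that a relevant variable was omitted.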
1. Enter or arrange your data in the worksheet. For more information, see 8.7.4 Arranging
Best Subset Regression Data.
2. If desired, set the Best Subset Regression options. For more information, see 8.7.5 Setting
Best Subset Regression Options.
3. Click the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Regression→Best Subsets
5. View and interpret the Best Subset Regression report. For more information, see 8.7.7
Interpreting Best Subset Regression Results.
1. If you are going to run the test after changing test options, and want to select your data
before you run the test, drag the pointer over your data.
2. Select Best Subset Regression from the Select Test drop-down list in the Statistics
group on the Analysis tab.
3. Click Options.
The Options for Best Subset Regression dialog box appears with the Criterion tab in
view. For more information, see 8.7.5.1 Options for Best Subset Regression: Criterion.
Options settings are saved between SigmaPlot sessions.
4. To continue the test, click Run Test. For more information, see 8.6.6 Running a Stepwise
Regression.
5. To accept the current settings and close the dialog box, click OK.
Use the value in the Flag Values > edit box as a threshold for multicollinear variables. The
default threshold value is 4.0, meaning that any value greater than 4.0 will be flagged as
multicollinear. To make this test more sensitive to possible multicollinearity, decrease this
value. To allow greater correlation of the independent variables before flagging the data as
multicollinear, increase this value.
When the variance inflation factor is large, there are redundant variables in the regression
model, and the parameter estimates may not be reliable. Variance inflation factor values above
4 suggest possible multicollinearity; values above 10 indicate serious multicollinearity.
What to Do About Multicollinearity. Sample-based multicollinearity can sometimes be
resolved by collecting more data under other conditions to break up the correlation among the
independent variables. If this is not possible, the regression equation is overparameterized and
one or more of the independent variables must be dropped to eliminate the multicollinearity.
Structural multicollinearities can be resolved by centering the independent variable before
forming the power or interaction terms.
To run a Best Subset Regression, you need to select the data to test. You use the Pick Columns
dialog box to select the worksheet columns with the data you want to test.
To run a Best Subset Regression:
1. If you want to select your data before you run the regression, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Regression→Best Subsets
The Pick Columns for Best Subset Regression dialog box appears. If you selected
columns before you chose the test, the selected columns appear in the column list. If you
have not selected columns, the dialog prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Dependent and
Independent drop-down list.
The first selected column is assigned to the Dependent Variable row in the Selected
Columns list, and the second column is assigned to the Independent Variable row. The
title of selected columns appears in each row. You are only prompted for one dependent
and one independent variable column.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the regression. The Best Subset Regression is performed. When the
test is complete, the Best Subset regression report appears.
Tip
No predicted values, residuals and other test results are computed or placed in
the worksheet. To view results for models, note which independent variables
were used for that model, then perform a Multiple Linear Regression using only
those independent variables.
is an estimate of the variability in the underlying population, computed from the random
component of the observations.
Residual Sum of Squares. The residual sum of squares is a measure of the size of the
residuals, which are the differences between the observed values of the dependent variable and
the values predicted by the regression model.
$$t = \frac{\text{regression coefficient}}{\text{standard error of regression coefficient}}$$
You can conclude from "large" t values that the independent variable(s) can be used to predict
the dependent variable (that is, that the coefficient is not zero).
P Value. P is the P value calculated for t. The P value is the probability of being wrong in
concluding that there is a true association between the variables (that is, the probability
of falsely rejecting the null hypothesis, or committing a Type I error, based on t). The
smaller the P value, the greater the probability that the independent variable helps predict the
dependent variable.
Traditionally, you can conclude that the independent variable can be used to predict the
dependent variable when P < 0.05.
VIF (Variance Inflation Factor). The variance inflation factor is a measure of
multicollinearity. It measures the "inflation" of the variance of a regression parameter
(coefficient) estimate for an independent variable due to redundant information in other
independent variables.
If the variance inflation factor is at or near 1.0, there is no redundant information in the other
independent variables. If the variance inflation factor is much larger, there are redundant
variables in the regression model, and the parameter estimates may not be reliable.
This result appears unless it was disabled in the Options for Best Subset Regression dialog
box.
Pearson Product Moment Correlation is a parametric test that assumes the residuals (distances
of the data points from the regression line) are normally distributed with constant variance.
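The Pearson correlation coefficient and the t statistic used to test whether it differs from zero can be sketched as follows. The t value has n - 2 degrees of freedom; converting it to a P value requires the t distribution, which is not in the Python standard library. The function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)      # 6 / sqrt(60), about 0.775
t = t_for_r(r, len(x))   # sqrt(4.5), about 2.121
```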
1. Enter or arrange your data appropriately in the data worksheet. For more information, see
8.8.3 Arranging Pearson Product Moment Correlation Data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Correlation→Pearson Product Moment
4. Run the test by selecting the worksheet columns with the data you want to test using the
Pick Columns dialog box. For more information, see 8.8.4.
5. View and interpret the Pearson Product Moment Report. For more information, see 8.8.5
Interpreting Pearson Product Moment Correlation Results.
6. Generate the report graph. For more information, see 8.8.6.
To run a Pearson Product Moment test, you need to select the data to test. The Pick Columns
dialog box is used to select the worksheet columns with the data you want to test.
To run a Pearson Product Moment Correlation:
1. If you want to select your data before you run the test, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Correlation→Pearson Product Moment
The Pick Columns for Pearson Product Moment dialog box appears. If you selected
columns before you chose the test, the selected columns appear in the column list. If you
have not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Variable drop-down
list.
The selected columns are assigned to the Variables row in the Selected Columns list in
the order they are selected from the worksheet. The title of selected columns appears in
each row. You can select up to 64 variable columns. SigmaPlot computes the correlation
coefficient for every possible pair.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish. The correlation coefficient is computed. When the test is complete, the
Pearson Product Moment Correlation Coefficient report appears.
8.8.5.2 P Value
8.8.5.2 P Value
The P value is the probability of being wrong in concluding that there is a true association
between the variables (for example, the probability of falsely rejecting the null hypothesis,
or committing a Type I error). The smaller the P value, the greater the probability that the
variables are correlated.
Traditionally, you can conclude that the two variables are correlated when P < 0.05.
1. With the Pearson Product Moment report in view, click the Report tab.
2. In the Result Graphs group, click Create Result Graph.
The Create Result Graph dialog box appears displaying a Scatter Matrix graph.
3. Click OK.
8.9.1 About the Spearman Rank Order Correlation Coefficient
If you want to assume that the value of one variable affects the other, use some form of
regression. If you need to find the correlation of normally distributed data, use the parametric
Pearson Product Moment Correlation.
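The Spearman rank order correlation is the Pearson correlation computed on the ranks of the data instead of the raw values, so it captures any monotonic relationship. This sketch assumes tied values receive the average of their ranks; SigmaPlot's exact tie handling is not described in this section, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

print(average_ranks([1, 2, 2, 3]))                    # [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # 1.0
```

The cubic data in the last line are not linear, yet rs = 1 because y always increases with x.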
To run a Spearman Rank Order Correlation test, you need to select the data to test. The Pick
Columns dialog box is used to select the worksheet columns with the data you want to test
and to specify how your data is arranged in the worksheet.
To run a Spearman Rank Order Correlation:
1. If you want to select your data before you run the test, drag the pointer over your
data.
2. Click the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Correlation→Spearman Correlation
The Pick Columns for Spearman Correlation dialog box appears. If you selected
columns before you chose the test, the selected columns appear in the column list. If you
have not selected columns, the dialog box prompts you to pick your data.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Variable drop-down
list.
The selected columns are assigned to the Variables row in the Selected Columns list in
the order they are selected from the worksheet. The title of selected columns appears in
each row. You can select up to 64 variable columns. SigmaPlot computes the correlation
coefficient for every possible pair.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish. The correlation coefficient is computed. When the test is complete, the
Spearman Rank Order Correlation Coefficient report appears.
8.9.5.1 Spearman Correlation Coefficient rs
8.9.5.2 P Value
The P value is the probability of being wrong in concluding that there is a true association
between the variables (that is, the probability of falsely rejecting the null hypothesis,
or committing a Type I error). The smaller the P value, the stronger the evidence that the
variables are correlated.
Traditionally, you can conclude that the two variables are associated when P < 0.05.
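As a rough illustration of how a P value can be attached to rs: under the null hypothesis of no association, and for moderately large samples, rs·√(n−1) is approximately standard normal. SigmaPlot may use a different (for example, exact or t-based) computation, so treat this large-sample approximation as a sketch only; `spearman_p_approx` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def spearman_p_approx(rs, n):
    """Two-sided P value for rs from the large-sample normal approximation."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    z = abs(rs) * sqrt(n - 1)
    return 2 * (1 - NormalDist().cdf(z))
```

With rs = 0.6 from 30 pairs, `spearman_p_approx(0.6, 30)` is well below 0.05, so by the traditional rule the variables would be judged associated.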
1. With the Spearman Correlation report in view, click the Report tab.
2. In the Result Graphs group, click Create Result Graph.
The Create Result Graph dialog box appears displaying a Scatter Matrix graph.
3. Click OK.
$$\min_{a,\,b} \sum_k \frac{(a + b x_k - y_k)^2}{\sigma_{y_k}^2 + b^2 \sigma_{x_k}^2}$$

where $(x_k, y_k)$ is the $k$th observation, $\sigma_{x_k}$ is the standard deviation of $x_k$, and $\sigma_{y_k}$ is the standard
deviation of $y_k$.
There are two types of Deming Regression: Simple Deming Regression and General Deming
Regression. Use Simple Deming Regression when the data errors are constant among
all measurements for each of the two variables. If the two constant error values for the
independent and dependent variables are equal to each other, then Simple Deming Regression
is often called Orthogonal Regression.
Use General Deming Regression to allow arbitrary values for the error at each observation.
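The objective above can be made concrete with a short Python sketch. `deming_objective` evaluates the minimized sum for given a and b; `simple_deming` fits Simple Deming Regression with the standard closed-form slope, where `delta` is the assumed ratio of the Y error variance to the X error variance (delta = 1 gives orthogonal regression). The names are hypothetical and this is not SigmaPlot's algorithm; General Deming Regression would instead minimize `deming_objective` numerically.

```python
from math import sqrt

def deming_objective(a, b, x, y, sx, sy):
    """Sum over k of (a + b*x_k - y_k)^2 / (sy_k^2 + b^2 * sx_k^2)."""
    return sum((a + b * xk - yk) ** 2 / (syk ** 2 + b ** 2 * sxk ** 2)
               for xk, yk, sxk, syk in zip(x, y, sx, sy))

def simple_deming(x, y, delta=1.0):
    """Closed-form Simple Deming fit; delta = (Y error SD / X error SD)^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    d = syy - delta * sxx
    slope = (d + sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    intercept = my - slope * mx
    return intercept, slope
```

For data lying exactly on y = 2x + 1, `simple_deming` returns an intercept of 1 and a slope of 2, and the objective at that fit is zero.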
1. Enter or arrange your data in the worksheet. For more information, see 8.10.3 Arranging
Deming Regression Data.
2. If desired, set the Deming Regression options.
3. Select the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Regression→Deming
5. Generate report graphs.
8.10.3 Arranging Deming Regression Data
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_k \frac{(a + b x_k - y_k)^2}{\sigma_{y_k}^2 + b^2 \sigma_{x_k}^2}$$

where $(x_k, y_k)$ is the $k$th observation, $\sigma_{x_k}$ is the standard deviation of $x_k$, and $\sigma_{y_k}$ is the
standard deviation of $y_k$.
This option has no effect on the computation of the best-fit parameter values, but does affect
the computation of the standard errors of both parameters.
Add table of predicted means. Select to put a table of the predicted means (or true values)
for every observation in both the X and Y variables in the report. This table also includes the
values of the residuals between the observations and the predicted means.
Add parameter covariance matrix. Select to put the two-by-two variance-covariance matrix
for the parameters in the report.
Confidence Intervals.
• Confidence level. Set the percent confidence level for the confidence intervals for the
parameters and for the confidence bands for the regression line. The default value is 95%.
Graph
Create new graph. Select to create a result graph of Deming Regression after the report is
created. By default, the result graph consists of the regression line and the raw data. Although
this graph does not appear with error bars, you can add them later by modifying the graph. For
more information, see Modifying Error Bars.
Graph features.
• Scatter plot of predicted means. Select to place a scatter plot of the predicted means for
each data point on top of the regression line. Data for the predicted means also appears in
the worksheet.
• Confidence Bands. Select to add a pair of curves on the graph that represent the lower
and upper limits of the confidence intervals for the predicted means at specific values of the
independent variable over the range of the input data. Data for the confidence bands
appears in the worksheet.
If you want to select your data before you run the test, drag the pointer over your data.
1. Click the Analysis tab, and then in the Statistics group, from the Tests drop-down list,
select:
Regression→Deming
The Deming Regression - Data Format panel of the Regression Wizard appears
prompting you to specify a data format.
2. Select either XY Pair or XY Pair-Errors from the Data Format drop-down list.
3. Click Next to pick the data columns for the test. If you selected columns before you chose
the test, the selected columns appear in the Selected Columns list.
4. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for Data drop-down list.
The first selected column is assigned to the first row in the Selected Columns list, and
all successively selected columns are assigned to successive rows in the list. The titles of
the selected columns appear in each row. The number of columns you are prompted to
select depends on the data format you chose.
5. To change your selections, select the assignment in the list, then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
6. Click Finish to run the Deming regression on the selected columns. After the
computations are completed, the report appears.
8.10.7 Deming Regression Result Graph
Data Summary. This section simply shows the total number of selected observations and the
number of missing data rows. What constitutes a missing data row was discussed above in the
description of the Deming Regression Wizard.
Regression Measures. There are four values that measure the strength of association between
the two data variables and the error variance scaling used in the model.
• The correlation coefficient measures the linear correlation between the data for the
independent and dependent variables. In Simple Deming Regression, a high enough
correlation coefficient (> 0.975) is sometimes used as a criterion for using simple linear
regression as an alternative to Deming Regression.
• The chi-square statistic is the optimal value of the sum that was minimized to obtain the
best-fit parameters (equivalent to maximizing the log-likelihood function). If the user
provided exact (or nearly so) standard deviations of the data, then this statistic has an
approximate chi-square distribution with N-2 degrees of freedom, where N is the number
of pairs of observations.
• The reduced chi-square statistic is simply the value of the chi-square statistic divided by
the degrees of freedom. It estimates the variance scaling for the data, assuming the user
selected the option to apply a scaling factor to the data errors.
• The degrees of freedom is the integer N-2, where N is the number of pairs of observations:
N-2 = (2N observations for X and Y) – (N linear constraints on the means for both X
and Y) – (2 parameters).
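A small Python sketch ties these measures together: given best-fit values of a and b, the chi-square statistic is the minimized sum, the degrees of freedom are N − 2, and the reduced chi-square is their ratio. It assumes known error standard deviations for every observation, and the function name is hypothetical; it is not SigmaPlot's code.

```python
def deming_fit_measures(a, b, x, y, sx, sy):
    """Return (chi_square, degrees_of_freedom, reduced_chi_square)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observation pairs")
    # The same sum that Deming Regression minimizes, evaluated at (a, b).
    chi_square = sum((a + b * xk - yk) ** 2 / (syk ** 2 + b ** 2 * sxk ** 2)
                     for xk, yk, sxk, syk in zip(x, y, sx, sy))
    dof = n - 2  # 2N observations - N constraints on the means - 2 parameters
    return chi_square, dof, chi_square / dof
```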
Parameter Estimates. This is a table of the estimated values and statistics for the intercept
and slope parameters for the best-fit line, and asymptotic values of certain statistics. The
statistical values reported are the standard error of the parameters and their individual
confidence intervals.
The statistics are affected not only by the data set but also by options selected by the user. In the
Options dialog, the user can choose to interpret the data errors either absolutely or relative
to some scaling factor. Also, there is a setting in the program’s configuration file (spw.ini)
that allows the user to choose between two estimation methods for the standard errors - York
and Williamson’s method which is similar to the so-called delta method and is based directly
on the values of the observations, and another method based on maximum likelihood theory
(MLE method) which is based on using the predicted means.
Covariance Matrix. This is a two-by-two matrix whose diagonal entries are the variances of
the parameters and whose off-diagonal entry is the covariance between both parameters.
Hypothesis Testing. Two F-tests are used to test the hypotheses that the slope is 0 and that
the slope is 1.
Predicted Means. This is a table of the predicted (estimated) means for the distributions
from which the data is sampled. Two predicted means are given for each observed data point,
one for the X measurement and one for the Y measurement. The residual difference between
the measured value of each variable and its predicted mean is also given.
Confidence bands for the predicted values of the dependent variable can also be added to the
graph. These bands measure the accuracy of the predicted values for Y assuming specified
values for X.
9 Survival Analysis
Topics Covered in this Chapter
♦ Five Survival Tests
♦ Data Format for Survival Analysis
♦ Single Group Survival Analysis
♦ LogRank Survival Analysis
♦ Gehan-Breslow Survival Analysis
♦ Cox Regression
♦ Survival Curve Graph Examples
♦ Failures, Censored Values, and Ties
Survival analysis studies the variable that is the time to some event. The term survival
originates from the event death. But the event need not be death; it can be the time to any
event. This could be the time to closure of a vascular graft or the time when a mouse footpad
swells from infection. Of course it need not be medical or biological. It could be the time a
motor runs until it fails. For consistency we will use survival and death (or failure) here.
Sometimes death does not occur during the study, the patient dies from some other cause,
or the patient relocates to another part of the country. Though a death did not
occur, this information is useful since the patient survived up until the time he or she left the
study. When this occurs the patient is referred to as censored. This comes from the expression
censored from observation – the data has been lost from view of the study. Examples of
censored values are patients who moved to another geographic location before the study ended
and patients who are alive when the study ended. Kaplan-Meier survival analysis includes
both failures (death) and censored values.
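A minimal Python sketch of the Kaplan-Meier estimate shows how failures and censored values enter the calculation differently: failures lower the survival probability, while censored values only shrink the number at risk. The standard error uses Greenwood's formula, a common choice that may not match SigmaPlot's exact computation; the function name and the 1/0 status coding are assumptions for illustration. At tied times, failures are counted before censored values leave the risk set.

```python
from math import sqrt, nan

def kaplan_meier(times, events):
    """Kaplan-Meier table for one group.

    times  -- positive survival times
    events -- 1 for a failure (event), 0 for a censored value
    Returns rows (time, failures, number_at_risk, survival, std_error), one per
    distinct failure time; number_at_risk counts subjects at risk just before it.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count failures and censored values tied at time t.
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
                se = survival * sqrt(greenwood_sum)
            else:
                se = nan  # Greenwood term is undefined once everyone at risk fails
            rows.append((t, deaths, n_at_risk, survival, se))
        # Both failures and censored values at t leave the risk set afterwards.
        n_at_risk -= deaths + censored
    return rows
```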
Figure 9.1 Raw Data Format for a Survival Analysis with Two Groups
9.2.2 Indexed Data
In the figure above, columns 1 and 2 contain the survival time and status values for the first
group, Affected Node. Columns 3 and 4 contain the same values for the second group, Total
Node. The report and the survival curve graph use the text strings (“Affected Node”, “Total
Node”) found in the survival time column titles.
Important
The worksheet columns for each group must be the same length. If they are not, the extra
cells in the longer column are treated as missing. All non-positive survival times are also
treated as missing, as are all status values not defined as either a failure or a censored
value.
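The missing-value rules above can be expressed as a short cleaning step. In this Python sketch (hypothetical function name; blank cells represented as `None`), a row is kept only if it has a positive numeric time and a status label that was defined as either an event or a censored value; everything else, including the extra cells of the longer column, is dropped as missing.

```python
def clean_group(time_col, status_col, event_labels, censored_labels):
    """Return (time, is_event) pairs for one Time/Status column pair."""
    # Pad the shorter column with None so its extra partner cells count as missing.
    length = max(len(time_col), len(status_col))
    times = list(time_col) + [None] * (length - len(time_col))
    statuses = list(status_col) + [None] * (length - len(status_col))
    rows = []
    for t, s in zip(times, statuses):
        if not isinstance(t, (int, float)) or t <= 0:
            continue  # blank or non-positive survival time
        if s in event_labels:
            rows.append((t, True))
        elif s in censored_labels:
            rows.append((t, False))
        # any other status value is treated as missing
    return rows
```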
In the example above, group is in column 1, survival time is in column 2 and the status
variable is in column 3.
Note
The Index and Unindex transforms are not designed for converting between survival
analysis data formats. To use these features you must index and unindex the survival
time and status variables separately and then reorganize the resulting columns.
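Because the Index transform works on one variable at a time, it can help to see the target layout explicitly. This Python sketch (hypothetical function; groups given as a dictionary of group name to a Time column and a Status column) produces the three indexed columns Group, Time, and Status in one pass, keeping the time and status values of each row together. It only reorganizes values and does not apply the missing-value rules.

```python
def raw_to_indexed(groups):
    """groups: {name: (time_column, status_column)} -> (group, time, status) columns."""
    group_col, time_col, status_col = [], [], []
    for name, (times, statuses) in groups.items():
        for t, s in zip(times, statuses):  # zip stops at the shorter column
            group_col.append(name)
            time_col.append(t)
            status_col.append(s)
    return group_col, time_col, status_col
```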
9.3.3.1 Options for Survival Single Group: Graph Options
1. If you are going to analyze your survival curve after changing test options, and want to
select your data before you create the curve, then drag the pointer over your data.
2. Select Survival Single Group from the Tests drop-down list in the Statistics group.
3. Click Options in the Statistics group. The Options for Survival Single Group dialog
box appears with two tabs:
• Graph Options. Click the Graph Options tab to view the graph symbol, line and
scaling options. You can select additional statistical graph elements here. For more
information, see 9.3.3.1 Options for Survival Single Group: Graph Options.
• Results. Click the Results tab to specify the survival time units and to modify the
content of the report and worksheet. For more information, see 9.3.3.2 Options for
Single Group Survival: Results.
SigmaPlot saves the options settings between sessions.
4. To continue the test, click Run Test.
Tip
All options in these dialog boxes are "sticky" and remain in the state that you
have selected until you change them.
Additional Plot Statistics. You can add two different types of graph elements to your survival
curve from the Type drop-down list:
• 95% Confidence Intervals. Selecting this adds the upper and lower confidence lines in a
stepped line format.
• Standard Error Bars. Selecting this will add error bars for the standard errors of the
survival probability. These are placed at the failure times. All of these elements will be
graphed with the same color as the survival curve. You may change these colors, and other
graph attributes, in the Property Browser after creating the graph.
Report.
• Cumulative Probability Table. Clear this option to exclude the cumulative probability
table from the report. This reduces the length of the report for large data sets.
Worksheet.
• 95% Confidence Intervals. Select this to place the survival curve upper and lower 95%
confidence interval values into the worksheet. These are placed into the first empty
worksheet columns.
Time Units. Select a time unit from the drop-down list or enter a unit. These units are used in
the graph axis titles and the survival report.
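One common way to form pointwise 95% limits from a survival probability S and its standard error is S ± z·SE, clipped to the range 0 to 1. SigmaPlot may use a different interval (for example, a log-transformed one), so this Python sketch with the hypothetical name `linear_ci` is only an illustration of the idea.

```python
from statistics import NormalDist

def linear_ci(survival, std_error, level=0.95):
    """Pointwise confidence limits S +/- z*SE, clipped to the range [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    lower = max(0.0, survival - z * std_error)
    upper = min(1.0, survival + z * std_error)
    return lower, upper
```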
To run a single group survival analysis you need to select survival time and status data
columns to analyze. Use the Pick Columns panel to select these two columns in the worksheet.
To run a Single Group analysis:
1. Specify any options for your graph and report. For more information, see 9.3.3 Setting
Single Group Test Options.
2. If you want to select your data before you run the test then drag the pointer over your
data. The Survival Time column must precede and be adjacent to the Status column.
3. Select the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Survival→Kaplan-Meier→Single Group
The Pick Columns for Survival Single Group dialog box appears prompting you to
select your data columns. If you selected columns before you chose the test, the selected
columns appear in the Selected Columns list.
9.3.4 Running a Single Group Survival Analysis
Figure 9.3 The Pick Columns for Survival Single Group Panel Prompting You
to Select Time and Status Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for drop-down list. The
first selected column is assigned to the first row (Time) in the Selected Columns list, and
the next selected column is assigned to the next row (Status) in the list. The number or
title of selected columns appears in each row.
6. To change your selections, select the assignment in the list and then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Next to choose the status variables. The status variables found in the columns you
selected are shown in the Status labels in selected columns window. Select these and
click the right arrow buttons to place the event variables in the Event window and the
censored variable in the Censored window.
Figure 9.4 The Pick Columns for Survival Single Group Panel Prompting You to
Select the Status Variables.
You can have more than one Event label and more than one Censored label. You must
select one Event label in order to proceed. You need not select a censored variable,
though, and some data sets will not have any censored values. You need not select all the
variables; any data associated with cleared status variables will be considered missing.
8. Click the back arrows to remove labels from the Event and Censored windows. This
places them back in the Status labels in selected columns window.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis.
If the next data set contains exactly the same status labels, or if you are re-analyzing your
present data set, then the saved selections appear in the Event and Censored windows.
9. Click Finish to create the survival graph and report. The results you obtain depend on
the Test Options that you selected. For more information, see 9.3.3 Setting Single Group
Test Options.
9.3.5.1 Report Header Information
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display. For more information, see Setting Report Options.
The report header includes the date and time that the analysis was performed. The data source
is identified by the worksheet title containing the data being analyzed and the notebook
name. The event and censor labels used in this analysis are listed. Also, the time units used
are displayed.
The survival probability table lists all event times and, for each event time, the number
of events that occurred, the number of subjects remaining at risk, the cumulative survival
probability and its standard error. The upper and lower 95% confidence limits are not
displayed, but these may be placed into the worksheet. Censored times are not shown, but you
can infer their existence from jumps in the Number at Risk data and from the summary table
immediately below this table.
You can turn the display of this table off by clearing this option in the Results tab of Test
Options. This is useful for large data sets.
9.4 LogRank Survival Analysis
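The LogRank test statistic has the chi-square form $\sum_i (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the
observed and expected numbers of failures in group $i$. It generates a P value that is the
probability of obtaining, by chance, survival curves at least as different as those observed.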
The LogRank test assumes that there is no difference in the accuracy of the data at any given
time. This is different from the Gehan-Breslow test that weights the early data more since
it assumes that this data is more accurate.
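The difference in weighting can be sketched for two groups with the variance-based form of the statistic, U²/V, where U sums weighted differences between observed and expected failures in the first group. A weight of 1 at every failure time gives a LogRank statistic; weighting each failure time by the total number at risk gives the Gehan-Breslow statistic, which emphasizes early times when more subjects are at risk. This form generally gives a different value from the (O − E)²/E approximation shown above, and the function name and 1/0 status coding are hypothetical; it is not SigmaPlot's code.

```python
def weighted_two_group(group1, group2, weight="logrank"):
    """Two-group weighted statistic U^2 / V (1 degree of freedom).

    group1, group2 -- (times, events) pairs of lists; events 1 = failure, 0 = censored
    weight="logrank"       -> every failure time has weight 1
    weight="gehan-breslow" -> weight = total number at risk at that time
    """
    if weight not in ("logrank", "gehan-breslow"):
        raise ValueError("weight must be 'logrank' or 'gehan-breslow'")
    (t1, e1), (t2, e2) = group1, group2
    failure_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    u = v = 0.0
    for t in failure_times:
        n1 = sum(1 for ti in t1 if ti >= t)
        n2 = sum(1 for ti in t2 if ti >= t)
        d1 = sum(1 for ti, ei in zip(t1, e1) if ti == t and ei)
        d2 = sum(1 for ti, ei in zip(t2, e2) if ti == t and ei)
        n, d = n1 + n2, d1 + d2
        w = 1.0 if weight == "logrank" else float(n)
        u += w * (d1 - d * n1 / n)  # weighted observed minus expected
        if n > 1:  # hypergeometric variance term is zero when n == 1
            v += w * w * n1 * n2 * d * (n - d) / (n * n * (n - 1))
    return u * u / v
```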
1. If you are going to analyze your survival curve after changing test options, and want to
select your data before you create the curve, then drag the pointer over your data.
2. Select Survival LogRank from the Tests drop-down list in the Statistics group.
3. Click Options in the Statistics group. The Options for Survival LogRank dialog box
appears with three tabs:
• Graph Options. Click the Graph Options tab to view the graph symbol, line and
scaling options. Additional statistical graph elements may also be selected here. For
more information, see 9.4.3.1 Options for Survival LogRank: Graph Options.
• Results. Click the Results tab to specify the survival time units and to modify the
content of the report and worksheet. For more information, see 9.4.3.2 Options for
Survival Log Rank: Results.
• Post Hoc Tests. Click the Post Hoc Tests tab to modify the multiple comparison
options. For more information, see 9.4.3.3 Options for Survival LogRank: Post Hoc
Tests.
SigmaPlot saves options settings between sessions.
4. To continue the test, click Run Test. For more information, see 9.4.4 Running a
LogRank Survival Analysis.
5. To accept the current settings and close the options dialog box, click OK.
9.4.3.2 Options for Survival Log Rank: Results
Additional Plot Statistics. Two different types of graph elements may be added to your
survival curves. You can select one of two Types:
• 95% Confidence Intervals. Selecting this will add the upper and lower confidence lines
in a stepped line format.
• Standard Error Bars. Selecting this will add error bars for the standard errors of the
survival probability. These are placed at the failure times. All of these elements will be
graphed with the same color as the survival curve. You may change these colors, and other
graph attributes, in the Property Browser after the graph has been created.
To run a LogRank survival analysis you need to select data in the worksheet and specify
the status variables.
To run a LogRank Survival analysis:
1. If you want to select your data before you run the test then drag the pointer over your
data. The columns must be adjacent and in the correct order (Time, Status for Raw data
and Group, Time, Status for Indexed data). For more information, see 9.4.2 Arranging
LogRank Survival Analysis Data.
2. Select the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Survival→Kaplan-Meier→LogRank
The Pick Columns for Survival LogRank dialog box appears.
4. From the Data Format drop-down list select either:
• Raw data format when you have groups of data in multiple Time, Status column pairs.
• Indexed data format when you have the groups specified by a column.
Figure 9.6 The Data Format Panel With Raw Data Format Selected
5. Click Next to display the Pick Columns panel that prompts you to select your data
columns. If you selected columns before you chose the test, the selected columns appear
in the Selected Columns list.
9.4.4 Running a LogRank Survival Analysis
Figure 9.7 The Pick Columns Panel for Survival LogRank Raw Data Format
Prompting You to Select Multiple Time and Status Columns
6. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for drop-down list.
The first selected column is assigned to the first row (Time 1) in the Selected Columns
list, and the next selected column is assigned to the next row (Status 1) in the list. The
number or title of selected columns appears in each row. Continue selecting Time, Status
columns for all groups that you wish to analyze.
7. To change your selections, select the assignment in the list and then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
8. Click Next to choose the status variables. The status variables found in the columns you
selected are shown in the Status labels in selected columns: box. Select these and
click the right arrow buttons to place the event variables in the Event: window and the
censored variable in the Censored: window.
Figure 9.8 The Pick Columns for Survival LogRank Panel Prompting You to
Select the Status Variables
You can have more than one Event label and more than one Censored label. You must
select one Event label in order to proceed. You need not select a censored variable,
though, and some data sets will not have any censored values. You need not select all the
variables; any data associated with unselected status variables will be considered missing.
Figure 9.9 The Pick Columns for Survival LogRank Dialog Showing the Results
of Selecting the Status Variables
9. Click the back arrow keys to remove labels from the Event: and Censored: windows.
This places them back in the Status labels in selected columns: window.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis. If
the next data set contains exactly the same status labels, or if you are re-analyzing your
present data set, then the saved selections appear in the Event: and Censored: windows.
10. Click Finish to create the survival graph and report. The results you obtain depend on the
Test Options that you selected.
If you selected Indexed data format then the Pick Columns panel asks you to select the
three columns in the worksheet for your Group, Time and Status.
Figure 9.10 The Pick Columns Panel for Survival LogRank Indexed Data Format
Prompting You to Select Group, Time and Status Columns
11. Click Next to select the groups you want to include in the analysis. If you want to analyze
all groups found in the Group column then select Select all groups. Otherwise select
groups from the Data for Group drop-down list. You can select subsets of all groups and
select them in the order that you wish to see them in the report.
Figure 9.11 The Group Selection Panel for Survival LogRank Indexed Data
Format Prompting You to Select Groups to Analyze
12. Click Next to select the status variables as described above and then continue to complete
the analysis to create the report and graph.
9.4.4.1 Multiple Comparison Options
LogRank tests the hypothesis of no differences between the several survival groups, but does
not determine which groups are different, or the sizes of the differences. Multiple comparison
tests isolate these differences by running comparisons between the experimental groups.
The multiple comparison results are displayed in the report if you selected to always run
multiple comparisons in the Options for Survival LogRank dialog box, or if you selected to run
them only when the P value is significant and LogRank produces a P value equal to or less
than the trigger P value.
There are two multiple comparison tests to choose from for the LogRank survival analysis:
• Holm-Sidak. For more information, see 9.4.4.1.1.
• Bonferroni. For more information, see 9.4.4.1.2.
The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a
control group. It is more powerful than the Bonferroni test and, consequently, it is able to
detect differences that the Bonferroni test does not. It is recommended as the first-line
procedure for pairwise comparison testing.
When performing the test, the P values of all comparisons are computed and ordered from
smallest to largest. Each P value is then compared to a critical level that depends upon the
significance level of the test (set in the test options), the rank of the P value, and the total
number of comparisons made. A P value less than the critical level indicates there is a
significant difference between the corresponding two groups.
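The procedure above can be sketched in Python using the standard Holm-Sidak critical level for the comparison of rank j among k comparisons, 1 − (1 − α)^(1/(k − j + 1)). The sketch also applies the usual step-down rule (once one ordered comparison is not significant, the remaining ones are not either); both the formula and the stopping rule are assumptions about the standard method rather than a description of SigmaPlot's code, and `holm_sidak` is a hypothetical name.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return (p, critical_level, significant) tuples in ascending P order."""
    k = len(p_values)
    results = []
    still_testing = True
    for rank, p in enumerate(sorted(p_values), start=1):
        critical = 1 - (1 - alpha) ** (1 / (k - rank + 1))
        # Step-down rule: after the first non-significant result, stop rejecting.
        significant = still_testing and p < critical
        if not significant:
            still_testing = False
        results.append((p, critical, significant))
    return results
```

The critical level grows with rank: for four comparisons at α = 0.05 it rises from about 0.0127 for the smallest P value to 0.05 for the largest.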
Figure 9.12 Holm-Sidak Multiple Comparison Results for VA Lung Cancer Study
The Bonferroni test performs pairwise comparisons with paired chi-square tests. It is
computationally similar to the Holm-Sidak test except that it is not sequential (the critical
level used is fixed for all comparisons). The critical level is the ratio of the family P value
to the number of comparisons. It is a more conservative test than the Holm-Sidak test in
that the chi-square value required to conclude that a difference exists becomes much larger
than it really needs to be.
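In the VA Lung Cancer Study example there are four cell types and therefore six pairwise
comparisons, so the critical level is constant at 0.05/6 = 0.00833. Because the critical level
does not increase, as it does for the Holm-Sidak test, there will tend to be fewer comparisons
with significant differences.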
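The fixed Bonferroni critical level can be computed directly from the number of groups: k groups give k(k − 1)/2 pairwise comparisons, and each P value is compared with the family P value divided by that count. This Python sketch uses a hypothetical function name and assumes all pairwise comparisons are made.

```python
def bonferroni(p_values, n_groups, alpha=0.05):
    """Return (critical_level, significant flags) for all pairwise comparisons."""
    m = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
    critical = alpha / m
    return critical, [p < critical for p in p_values]
```

With four groups, `bonferroni([0.001, 0.01], 4)` uses the critical level 0.05/6 and flags only the first comparison, whereas the Holm-Sidak step-down procedure can accept a P value of 0.01 as significant when it ranks second.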
Figure 9.13 Bonferroni Multiple Comparison Results for VA Lung Cancer Study
9.4.5.1 Report Header Information
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display in the Options dialog box. For more information, see Setting Report Options.
The report header includes the date and time that the analysis was performed. The data source
is identified by the worksheet title containing the data being analyzed and the notebook
name. The event and censor labels used in this analysis are listed. Also, the time units used
are displayed.
The survival probability table lists all event times and, for each event time, the number
of events that occurred, the number of subjects remaining at risk, the cumulative survival
probability and its standard error. The upper and lower 95% confidence limits are not
displayed, but these may be placed into the worksheet. Censored times are not shown, but you
can infer their existence from jumps in the Number at Risk data and from the summary table
immediately below this table.
You can turn the display of this table off by clearing this option in the Results tab of Test
Options. This is useful to keep the report a reasonable length when you have large data sets.
In the graph above, the default Test Options (gray scale colors and solid circle symbols) were
used. Squamous and large cell carcinomas do not appear to differ significantly, nor do small
cell carcinoma and adenocarcinoma. This is confirmed by the LogRank test. For more
information, see 9.4.3 Setting LogRank Survival Options.
You can control the graph in two ways:
• You can set the graph options to become the default values until they are changed. For
more information, see Setting Page Options.
• After the graph is created you can modify it using SigmaPlot’s Property Browser. For more
information, see Modifying Graphs Using the Property Browser. Each object in the graph
is a separate plot (for example, survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the appearance of your graph.
9.5 Gehan-Breslow Survival Analysis
1. If you are going to analyze your survival curve after changing test options, and want to
select your data before you create the curve, then drag the pointer over your data.
2. Select the Analysis tab.
3. In the Statistics group, from the Tests drop-down list, select:
Survival→Kaplan-Meier→Gehan-Breslow
4. Click Options. The Options for Survival Gehan-Breslow dialog box appears with
three tabs:
• Graph Options. Click the Graph Options tab to view the graph symbol, line and
scaling options. You can select additional statistical graph elements here. For more
information, see 9.5.3.1 Options for Survival Gehan-Breslow: Graph Options.
• Results. Click the Results tab to specify the survival time units and to modify the
content of the report and worksheet. For more information, see 9.5.3.2 Options for
Survival Gehan-Breslow: Results.
• Post Hoc Tests. Click the Post Hoc Tests tab to modify the multiple comparison
options. For more information, see 9.5.3.3 Options for Survival Gehan-Breslow: Post
Hoc Tests.
SigmaPlot saves the options settings between sessions.
5. To continue the test, click Run Test. The Pick Columns panel appears.
6. To accept the current settings, click OK.
9.5.3.2 Options for Survival Gehan-Breslow: Results
Tip
If multiple comparisons are triggered, the report shows the results of the comparison.
You may elect to always show them by clearing Only when Survival P Value is
Significant.
To run a Gehan-Breslow survival analysis you need to select data in the worksheet and specify
the status variables.
To run a Gehan-Breslow Survival analysis:
1. Specify any options for your graph, report and post-hoc tests. For more information, see
9.5.3 Setting Gehan-Breslow Survival Options.
2. If you want to select your data before you run the test then drag the pointer over your
data. The columns must be adjacent and in the correct order, that is, Time, Status
for Raw data and Group, Time, Status for Indexed data.
3. Select the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Survival→Kaplan-Meier→Gehan-Breslow
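5. Click Run.
6. From the Data Format drop-down list select either:
• Raw data format when you have groups of data in multiple Time, Status column pairs.
• Indexed data format when you have the groups specified by a column.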
Figure 9.16 The Data Format Panel With Raw Data Format Selected
9.5.4 Running a Gehan-Breslow Survival Analysis
7. Click Next to display the Pick Columns panel that prompts you to select your data
columns. If you selected columns before you chose the test, the selected columns appear
in the Selected Columns list.
8. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for drop-down list.
The first selected column is assigned to the first row (Time 1) in the Selected Columns
list, and the next selected column is assigned to the next row (Status 1) in the list. The
number or title of selected columns appears in each row. Continue selecting Time, Status
columns for all groups that you wish to analyze.
Figure 9.17 The Pick Columns for Survival LogRank Panel Prompting You to
Select Multiple Time and Status Columns
9. To change your selections, select the assignment in the list and then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
Figure 9.18 The Pick Columns for Survival Gehan-Breslow Panel Prompting You
to Select the Status Variables
10. Click Next to choose the status variables. The status variables found in the columns you
selected are shown in the Status labels in selected columns: window. Select these and
click the right arrow buttons to place the event variables in the Event: window and the
censored variable in the Censored: window.
Figure 9.19 The Pick Columns for Survival Gehan-Breslow Dialog Showing the
Results of Selecting the Status Variables
You can have more than one Event: label and more than one Censored: label. You must select at
least one Event: label to proceed. You don’t need to select a censored variable, though, and some
data sets will not have any censored values. You also don’t need to select all the variables;
any data associated with cleared status variables are considered missing.
11. Click the back arrow buttons to remove labels from the Event: and Censored: windows.
This places them back in the Status labels in selected columns: window.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis.
If the next data set contains exactly the same status labels, or if you are analyzing your
present data set again, then the saved selections appear in the Event and Censored
windows.
12. Click Finish to create the survival graph and report. The results you obtain will depend
on the Test Options that you selected.
If you selected Indexed data format then the Pick Columns panel asks you to select the
three columns in the worksheet for your Group, Time and Status.
Figure 9.20 The Pick Columns Panel for Survival Gehan-Breslow Indexed Data
Format Prompting You to Select Group, Time and Status Columns
13. Click Next to select the groups you want to include in the analysis. If you want to analyze
all groups found in the Group column, select Select all groups. Otherwise, select
groups from the Data for Group drop-down list. You can select a subset of the groups and
choose the order in which they appear in the report.
Figure 9.21 The Group Selection Panel for Survival Gehan-Breslow Indexed
Data Format Prompting You to Select Groups to Analyze
14. Click Next to select the status variables as described above, and then complete the
analysis to create the report and graph.
9.5.4.1 Multiple Comparison Options
There are two multiple comparison tests to choose from for the Gehan-Breslow survival
analysis.
• Holm-Sidak.
• Bonferroni. For more information, see 9.5.4.1.2.
The Holm-Sidak Test can be used for both pairwise comparisons and comparisons versus a
control group. It is more powerful than the Bonferroni test and, consequently, it is able to
detect differences that the Bonferroni test does not. It is recommended as the first-line procedure
for pairwise comparison testing.
When performing the test, the P values of all comparisons are computed and ordered from
smallest to largest. Each P value is then compared to a critical level that depends upon the
significance level of the test (set in the test options), the rank of the P value, and the total
number of comparisons made. A P value less than the critical level indicates there is a
significant difference between the corresponding two groups.
Figure 9.22 Holm-Sidak Multiple Comparison Results for VA Lung Cancer Study
The Bonferroni test performs pairwise comparisons with paired chi-square tests. It is
computationally similar to the Holm-Sidak test except that it is not sequential (the critical
level used is fixed for all comparisons). The critical level for the Bonferroni test is the ratio
of the family P value to the number of comparisons. It is a more conservative test than the
Holm-Sidak test in that the chi-square value required to conclude that a difference exists
becomes much larger than it really needs to be.
The critical level is constant at 0.05/6 = 0.00833. Since the critical level does not increase,
as it does for the Holm-Sidak test, there will tend to be fewer comparisons with significant
differences. This occurs here with three significant comparisons as compared to four for
the Holm-Sidak case.
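The difference between a fixed and a rising critical level can be made concrete for six comparisons, as in the four-group example. This is a sketch under stated assumptions: the Holm-Sidak formula is the standard one and may not match SigmaPlot's implementation in every detail, and the P values are hypothetical, not the study's results.

```python
def bonferroni_critical(alpha, k):
    # Bonferroni: the same critical level alpha / k for every comparison.
    return [alpha / k] * k


def holm_sidak_critical(alpha, k):
    # Holm-Sidak: the critical level for the i-th smallest P value rises with i.
    return [1.0 - (1.0 - alpha) ** (1.0 / (k - i + 1)) for i in range(1, k + 1)]


def count_significant(sorted_p, critical):
    # Step through the sorted P values and stop at the first failure.
    # For a constant critical level this equals counting every P value below it.
    count = 0
    for p, c in zip(sorted_p, critical):
        if p >= c:
            break
        count += 1
    return count


alpha, k = 0.05, 6
pvals = sorted([0.0001, 0.004, 0.008, 0.012, 0.30, 0.75])  # hypothetical
bonf = bonferroni_critical(alpha, k)
hs = holm_sidak_critical(alpha, k)
print("Bonferroni critical level:", round(bonf[0], 5))
print("Holm-Sidak critical levels:", [round(c, 5) for c in hs])
print("Significant (Bonferroni):", count_significant(pvals, bonf))
print("Significant (Holm-Sidak):", count_significant(pvals, hs))
```

With these hypothetical values Bonferroni flags three comparisons and Holm-Sidak flags four, because the Holm-Sidak critical level has already risen above 0.00833 by the third and fourth ranks.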
Figure 9.23 Bonferroni Multiple Comparison Results for VA Lung Cancer Study
9.5.5 Interpreting Gehan-Breslow Survival Results
Results Explanations
The number of significant digits displayed in the report may be set in the Report Options
dialog box. For more information, see Setting Report Options.
The report header includes the date and time that the analysis was performed. The data source
is identified by the worksheet title containing the data being analyzed and the notebook
name. The event and censor labels used in this analysis are listed. Also, the time units used
are displayed.
The survival probability table lists all event times and, for each event time, the number
of events that occurred, the number of subjects remaining at risk, the cumulative survival
probability and its standard error. The upper and lower 95% confidence limits are not
displayed but these may be placed into the worksheet. Censored times are not shown, but you
can infer their existence from jumps in the Number at Risk data and from the summary table
immediately below this table.
You can turn the display of this table off by clearing this option in the Results tab of Test
Options. This is useful to keep the report a reasonable length when you have large data sets.
The data summary table shows the total number of cases. The sum of the number of events,
censored and missing values, shown below this, will equal the total number of cases.
The mean and percentile survival times and their statistics are listed in this table. The median
survival time is commonly used in publications.
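The columns of the survival probability table, the summary counts, and the median can be sketched with the Kaplan-Meier product-limit estimator and Greenwood's formula for the standard error. This is a minimal illustration under assumptions: Greenwood standard errors, a simple linear 95% interval, and a median defined as the first event time at which survival falls to 0.5 or below. SigmaPlot's exact conventions (for example, its confidence-interval transformation) may differ, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """times: survival times; events: 1 = event (failure), 0 = censored."""
    n_at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    removed = Counter(times)  # failed and censored subjects both leave the risk set
    surv, greenwood_sum, table = 1.0, 0.0, []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            se = surv * math.sqrt(greenwood_sum)  # Greenwood standard error
            table.append({"time": t, "events": d, "at_risk": n_at_risk,
                          "survival": surv, "std_err": se,
                          "lower95": max(0.0, surv - 1.96 * se),
                          "upper95": min(1.0, surv + 1.96 * se)})
        n_at_risk -= removed[t]
    return table


def median_survival(table):
    # First event time at which cumulative survival drops to 0.5 or below.
    for row in table:
        if row["survival"] <= 0.5:
            return row["time"]
    return None  # median not reached


times = [1, 2, 2, 3, 4, 5, 5, 6, 8, 9]    # hypothetical
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]   # 1 = event, 0 = censored
table = kaplan_meier(times, events)
for row in table:
    print(row["time"], row["events"], row["at_risk"],
          round(row["survival"], 4), round(row["std_err"], 4))
n_events = sum(events)
print("Cases:", len(times), "Events:", n_events, "Censored:", len(events) - n_events)
print("Median survival:", median_survival(table))
```

Censored times (3 and 8 here) never appear as rows. Their presence shows up only as a drop in the number at risk that is larger than the number of events at the previous row.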
In the graph above, incrementing colors, percent survival and 95% confidence interval options
were selected from Test Options. For more information, see 9.5.3 Setting Gehan-Breslow
Survival Options. The Holm-Sidak test showed these two curves to be significantly different
at the 0.001 level.
You can control the graph in two ways:
• You can set the graph options to become the default values until they are changed. For
more information, see Setting Page Options.
• After the graph is created you can modify it using SigmaPlot’s Property Browser. For more
information, see Modifying Graphs Using the Property Browser. Each object in the graph
is a separate plot (for example, survival curve, failure symbols, censored symbols, upper
confidence limit, etc.) so you have considerable control over the appearance of your graph.
9.6 Cox Regression
$$h(t, X_1, X_2, \ldots, X_n) = h_0(t)\,\exp(b_1X_1 + b_2X_2 + \cdots + b_nX_n)$$
where $X_1, X_2, \ldots, X_n$ are the covariates in the study. The function $h_0$ is called the baseline
hazard function and only depends upon time. The exponential factor on the right-hand side of
the equation involves the covariates, but does not depend on time. In our implementation of
Cox Regression, we are assuming that every covariate is time-independent and so its value for
each subject remains constant over time (it is possible, however, to extend Cox Regression to
include time-dependent covariates).
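The proportional-hazards property follows directly from this formula. For two covariate profiles, the baseline factor h0(t) cancels in the ratio of their hazards. The sketch below checks this numerically. It also shows the usual relationship between a baseline survival value S0(t), with all covariates at zero, and a covariate-adjusted one, S(t) = S0(t)^exp(b·X). The Weibull-shaped baseline hazard, the coefficients, and the covariate profiles are hypothetical choices for illustration only.

```python
import math


def hazard(t, x, b, baseline_hazard):
    # Cox model: h(t, X) = h0(t) * exp(b1*X1 + ... + bn*Xn)
    linear = sum(bi * xi for bi, xi in zip(b, x))
    return baseline_hazard(t) * math.exp(linear)


def adjusted_survival(s0, x, b):
    # Covariate-adjusted survival from the baseline value: S = S0 ** exp(b . X)
    linear = sum(bi * xi for bi, xi in zip(b, x))
    return s0 ** math.exp(linear)


def weibull_baseline(t):
    # Hypothetical baseline hazard that changes with time (Weibull, shape 1.5).
    return 1.5 * math.sqrt(t)


b = [0.7, -0.02]     # hypothetical coefficients
x_a = [1.0, 60.0]    # hypothetical profile: treated, age 60
x_b = [0.0, 50.0]    # hypothetical profile: untreated, age 50

for t in (0.5, 1.0, 2.0, 5.0):
    ratio = hazard(t, x_a, b, weibull_baseline) / hazard(t, x_b, b, weibull_baseline)
    print(f"t = {t}: hazard ratio = {ratio:.4f}")

print("exp(b . (Xa - Xb)) =",
      round(math.exp(sum(bi * (a - c) for bi, a, c in zip(b, x_a, x_b))), 4))
print("S(t) for profile A when S0(t) = 0.8:", round(adjusted_survival(0.8, x_a, b), 4))
```

Every printed ratio is the same value, exp(0.5) ≈ 1.6487, even though the baseline hazard itself changes with time. A covariate whose effect changes over time, like the Gender example discussed below, would break this equality.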
The coefficients $b_1, b_2, \ldots, b_n$ in our model are constants, independent of both time and the
covariates, and their values are determined from the regression analysis by maximizing a
quantity known as the partial likelihood function. The resulting values of the coefficients are
called the best-fit coefficients or, sometimes, the maximum likelihood estimates. Once the
coefficients are determined, a separate procedure estimates the values of the baseline
survival function at the sampled event times. The baseline survival function is defined by
setting all covariates to zero. Denoting this function by $S_0$, the covariate-adjusted survival
functions and cumulative hazard functions are determined for each event time t by:
$$S(t, X_1, \ldots, X_n) = S_0(t)^{\exp(b_1X_1 + b_2X_2 + \cdots + b_nX_n)}, \qquad H(t, X_1, \ldots, X_n) = -\ln S(t, X_1, \ldots, X_n)$$
Our model of the hazard function shows that if the hazard is evaluated for two different sets
of covariate values, then the two hazards are proportional over time: their ratio does not
depend on t. This is the reason the Cox model is called a proportional hazards model. It is
possible that a potential covariate for the model does not satisfy this assumption. For example,
suppose we have the covariate Gender in a survival study. If males are dying at twice the rate of
females during the first month of a study, and both genders die at the same rate during the
next month of the study, then the ratio of the hazards, or the hazard ratio, for males to females
is not constant over time and the proportionality assumption fails. Such a covariate cannot
be included in the hazard model.
A covariate may also be omitted from the model because its value is based on the design of
the study and has secondary importance as a risk factor for survival. For example, when a
study is performed at two different clinics to determine the impact of age and drug therapy on
patient recovery, the variable Clinic is such a covariate.
Any variable that is recorded in the survival data but is not included as a covariate in the
hazard model, for the reasons described above, is called a stratification variable.
Each value or level of such a variable is called a stratum; collectively, the levels are the strata.
When a stratification variable is present, the survival study is partitioned into groups, one
for each stratum, where each group has its own survival function that is determined from the
regression analysis. The best-fit coefficients are the same for each stratum, but the baseline
time-dependent factors in the model are different.
9.6.2 Performing a Cox Regression Proportional Hazards Model
6. Interpret the Cox Regression results. For more information, see 9.6.9 Interpreting Cox
Regression Results.
1. Select Cox PH Model from the Select Test drop-down list in the Statistics group on the
Analysis tab.
2. Click Options. The Options for Cox PH Model dialog box appears with three tabs:
• Criterion. Click the Criterion tab to specify variable selection and convergence
options. For more information, see 9.6.5.1 Options for Cox Regression Proportional
Hazard: Criterion.
• Results. Click the Results tab to specify the survival time units and to modify the
content of the report and worksheet. For more information, see 9.6.5.3 Options for Cox
Regression Proportional Hazard: Results.
• Graph Options. Click the Graph Options tab to view the graph symbol, line and
scaling options. You can select additional statistical graph elements here. For more
information, see 9.6.5.2 Options for Cox Regression Proportional Hazard: Graph
Options.
SigmaPlot saves the options settings between sessions.
3. To continue the test, click Run Test.
Note
All options in these dialog boxes are "sticky" and remain in the state that you
have selected until you change them.
9.6.5.1 Options for Cox Regression Proportional Hazard: Criterion
value should not be changed unless there is a problem with obtaining convergence. The
default value is 1.0.
• Maximum Iterations. The Maximum Iterations value is the largest number of iterations
(successive improvements to the coefficients) allowed in order to obtain convergence. If this
value is exceeded in the regression process, then the algorithm exits regardless of whether the
convergence criterion (determined by the Tolerance) has been satisfied. The default is 20.
Time units. Select a time unit from the drop-down list or enter a unit. These units are used in
the graph axis titles and the survival report.
9.6.6 Setting Cox Regression Stratified Model Options
1. If you are going to analyze your survival curve after changing test options, and want to
select your data before you create the curve, then drag the pointer over your data.
2. Select Cox Stratified Model from the Select Test drop-down list in the Statistics group
on the Analysis tab.
3. Click Options. The Options for Cox Stratified Model dialog box appears with three
tabs:
• Criterion. Click the Criterion tab to specify variable selection and convergence
options. For more information, see 9.6.6.1 Options for Cox Regression Stratified
Model: Criterion.
• Graph Options. Click the Graph Options tab to view the graph symbol, line and
scaling options. You can select additional statistical graph elements here. For more
information, see 9.6.6.2 Options for Cox Regression Stratified Model: Graph Options.
• Results. Click the Results tab to specify the survival time units and to modify the
content of the report and worksheet. For more information, see 9.6.6.3 Options for
Cox Regression Stratified Model: Results.
SigmaPlot saves the options settings between sessions.
4. To continue the test, click Run Test.
covariates that are no longer significant are removed. This procedure continues until all
covariates have been entered into the model or until each covariate not in the model makes no
significant contribution.
• P-to-Enter. This value establishes the criterion for entering a covariate into the hazard
model. A covariate is entered into the model only if adding it produces a significant change
in the likelihood function. A change is significant if the probability associated with this
change (the P-value) is less than the P-to-Enter value.
• P-to-Remove. This value establishes the criterion for removing a covariate from the hazard
model. A covariate is removed from the model only if removing it produces no significant
change in the likelihood function. A change is not significant if the probability
associated with this change (the P-value) is greater than the P-to-Remove value. The
default value is .10.
To prevent the regression algorithm from cycling, the P-to-Remove value must be greater
than the P-to-Enter value.
• Maximum Steps. This integer value is the largest number of steps allowed for entering
covariates. If this value is attained in the regression process, then the algorithm exits
regardless of the stopping criteria indicated above.
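The interaction between P-to-Enter and P-to-Remove can be sketched as a single decision applied at each step. This is a hypothetical simplification, not SigmaPlot's algorithm. It assumes you already have, for the current model, the P value of the likelihood change for adding each excluded covariate and for removing each included one. A real implementation refits the model after every move and counts moves against Maximum Steps. Checking removals before entries is also an assumption here. The function and covariate names are illustrative.

```python
def next_action(enter_p, remove_p, p_to_enter, p_to_remove):
    """Hypothetical helper: decide one stepwise move.

    enter_p:  {covariate: P value for adding it to the current model}
    remove_p: {covariate: P value for removing it from the current model}
    """
    if p_to_remove <= p_to_enter:
        # A covariate could then be entered and removed again indefinitely.
        raise ValueError("P-to-Remove must be greater than P-to-Enter")
    # Remove the weakest included covariate whose removal is not significant.
    if remove_p:
        name, p = max(remove_p.items(), key=lambda kv: kv[1])
        if p > p_to_remove:
            return ("remove", name)
    # Otherwise enter the strongest excluded covariate if its addition is significant.
    if enter_p:
        name, p = min(enter_p.items(), key=lambda kv: kv[1])
        if p < p_to_enter:
            return ("enter", name)
    return ("stop", None)


print(next_action({"age": 0.03, "stage": 0.20}, {"treatment": 0.40}, 0.05, 0.10))
print(next_action({"age": 0.03, "stage": 0.20}, {"treatment": 0.01}, 0.05, 0.10))
```

The first call returns ('remove', 'treatment') and the second returns ('enter', 'age'). The thresholds 0.05 and 0.10 in the example are illustrative and are not necessarily SigmaPlot's defaults.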
Convergence. These options control the behavior of the regression algorithm.
• Tolerance. This value determines the upper limit for the two quantities that measure
convergence. One quantity is the coordinate of the gradient of the likelihood function with
largest absolute value. The other quantity is a distance measure of the model’s coefficients
between two consecutive iterations. The default value is 1e-008.
• Step Length. This value refers to the initial value of the parameter that controls the
direction and size of the change in coefficients between two consecutive iterations. This
value should not be changed unless there is a problem with obtaining convergence. The
default value is 1.0.
• Maximum Iterations. This integer value is the largest number of iterations (successive
improvements to the coefficients) allowed in order to obtain convergence. If this value is
exceeded in the regression process, then the algorithm exits regardless of whether the
convergence criterion (determined by the Tolerance) has been satisfied. The default value is 20.
To run a Cox Regression Proportional Hazards Model analysis you need to select survival
time, status, and covariate data columns to analyze. Use the Select Data panel of the Test
Wizard to select these columns from the worksheet.
To run a Cox Regression Proportional Hazards Model analysis:
1. Specify any options for your graph and report. For more information, see 9.6.5 Setting
Cox Regression Proportional Hazards Options.
2. If you want to select your data before you run the test then drag the pointer over your
data. Your data must be selected in contiguous columns with the (survival) Time column
first, followed by the Status column, and then one or more Covariate columns. From the
menus select:
Figure 9.26 The Pick Columns for Cox PH Model Panel Prompting You to Select
Time, Status, and Covariate Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for drop-down list. The
first selected column is assigned to the first row (Time) in the Selected Columns list, the
next selected column is assigned to the next row (Status) in the list, and then the next
column is assigned to the next row (Covariate). The number or title of selected columns
appears in each row.
6. To change your selections, select the assignment in the list and then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Next to choose the Categorical Covariates. The drop-down list displays all of the
covariates that you had selected on the Select Data panel. To select a categorical covariate,
click an item in this list; the selection is entered in the Selected Covariates
list. If a covariate column is not listed as categorical and it contains a non-numeric data
entry, then this entry is treated as missing. Making a selection on this panel is optional as
there may be no categorical covariates in the study.
8. Click Next to choose the status variables. The status variables found in the columns you
selected are shown in the Status labels in selected columns window. Select these and
click the right arrow buttons to place the event variables in the Event window and the
censored variable in the Censored window.
Figure 9.27 The Pick Columns for Cox PH Model Panel Prompting You to Select
the Status Variables.
You can have more than one Event label and more than one Censored label. You must
select at least one Event label in order to proceed. You need not select a censored variable,
though, and some data sets will not have any censored values. You need not select all the
variables; any data associated with cleared status variables will be considered missing.
9. Click the back arrows to remove labels from the Event and Censored windows. This
places them back in the Status labels in selected columns window.
SigmaPlot saves the Event and Censored labels that you selected for your next analysis.
If the next data set contains exactly the same status labels, or if you are re-analyzing your
present data set, then the saved selections appear in the Event and Censored windows.
10. Click Finish to create the survival graph and report. The results you obtain depend on
the Test Options that you selected. For more information, see 9.3.3 Setting Single Group
Test Options.
9.6.8 Running a Cox Regression Stratified Model
To run a Cox Regression Stratified Model analysis you need to select strata, survival time,
status, and covariate data columns to analyze. Use the Select Data panel of the Test Wizard to
select these columns from the worksheet.
The Strata column contains the various levels of the stratification variable that are used to
separate the survival study into groups, each with its own baseline survival curve. The term
baseline refers to computations that result when all covariates are set to zero.
To run a Cox Regression Stratified Model analysis:
1. Specify any options for your graph and report. For more information, see 9.6.6 Setting
Cox Regression Stratified Model Options.
2. If you want to select your data before you run the test then drag the pointer over your data.
Your data must be selected in contiguous columns with the Strata column first, followed
by the (survival) Time column, the Status column, and then one or more Covariate columns.
3. Select the Analysis tab.
4. In the Statistics group, from the Tests drop-down list, select:
Survival→Cox Regression→Stratified Model
The Cox Stratified Model - Select Data panel of the Test Wizard appears prompting
you to select your data columns. If you selected columns before you chose the test, the
selected columns appear in the Selected Columns list.
Figure 9.28 The Pick Columns for Cox Stratified Model Panel Prompting You to
Select Time, Status, and Covariate Columns
5. To assign the desired worksheet columns to the Selected Columns list, select the
columns in the worksheet, or select the columns from the Data for drop-down list. The
first selected column is assigned to the first row (Strata) in the Selected Columns list,
the next selected column is assigned to the next row (Time) in the list, the next selected
column is assigned to the next row (Status), and then the next column is assigned to the
next row (Covariate). The number or title of selected columns appears in each row.
6. To change your selections, select the assignment in the list and then select a new column
from the worksheet. You can also clear a column assignment by double-clicking it in the
Selected Columns list.
7. Click Next to choose the Categorical Covariates. The drop-down list displays all of the
covariates that you selected on the Select Data panel. To select a categorical covariate,
click an item in this list; the selection is entered in the Selected Covariates list. If
a covariate column is not listed as categorical and it contains a non-numeric data entry,
then this entry is treated as missing. Making a selection on this panel is optional as there
may be no categorical covariates in the study.
8. Click Next to choose the status variables. The status variables found in the columns you
selected are shown in the Status labels in selected columns window. Select these and
click the right arrow buttons to place the event variables in the Event window and the
censored variable in the Censored window.
You can have more than one Event label and more than one Censored label. You must
select at least one Event label in order to proceed. You need not select a censored variable,
though, and some data sets will not have any censored values. You need not select all the
variables; any data associated with cleared status variables will be considered missing.
9. Click Finish to create the survival graph and report. The results you obtain depend on
the Test Options that you selected. For more information, see 9.3.3 Setting Single Group
Test Options.
9.6.9 Interpreting Cox Regression Results
Results Explanations
In addition to the numerical results, expanded explanations of the results may also appear. You
can turn off this text on the Options dialog box. You can also set the number of decimal places
to display. For more information, see Setting Report Options.
Header. This includes the name of the test, date stamp, and data source, as for all other tests.
Event and Censor Labels. This is a listing of the labels that you selected in the Test Wizard.
There can be more than one label of each type.
Time Unit. This information comes from a setting in the Test Options dialog box and is used
to indicate the unit of survival time on result graphs.
Stratification Variable. This is the worksheet column (identified by title) used to stratify the
data if you are using the Stratified Model test. This section does not appear if you’re using the
Proportional Hazards test.
Basic summary of time-event data over strata. This is a table whose first column is a list of
the strata for the stratification variable. The remaining columns have integer entries and are
titled: Cases, Missing, Events, Censored, and % Censored. The last row of the table gives the
total over all strata. If there is no stratification variable, then the table has one row of data.
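The summary table can be reproduced from the raw columns with a few counts per stratum. This sketch assumes a hypothetical encoding in which each row has a stratum label and a status that is 'event', 'censored', or None for a missing or cleared status. It also assumes that % Censored is computed relative to non-missing cases, which may not match the report's convention. The function name and data are illustrative.

```python
def strata_summary(rows):
    """Hypothetical helper. rows: iterable of (stratum, status),
    where status is 'event', 'censored', or None."""
    keys = ("Cases", "Missing", "Events", "Censored")
    table = {}
    for stratum, status in rows:
        counts = table.setdefault(stratum, dict.fromkeys(keys, 0))
        counts["Cases"] += 1
        if status == "event":
            counts["Events"] += 1
        elif status == "censored":
            counts["Censored"] += 1
        else:
            counts["Missing"] += 1  # missing or cleared status label
    # Last row: totals over all strata (computed before "Total" is added).
    table["Total"] = {k: sum(c[k] for c in table.values()) for k in keys}
    for counts in table.values():
        used = counts["Events"] + counts["Censored"]
        # Assumption: % Censored is relative to non-missing cases.
        counts["% Censored"] = 100.0 * counts["Censored"] / used if used else 0.0
    return table


rows = [("Clinic A", "event"), ("Clinic A", "censored"), ("Clinic A", "event"),
        ("Clinic B", "event"), ("Clinic B", None), ("Clinic B", "censored")]
for stratum, counts in strata_summary(rows).items():
    print(stratum, counts)
```

With no stratification variable, every row would carry the same stratum label, and the table would have a single data row plus the total.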
Regression analysis. This section contains the results of maximizing the partial log-likelihood
function for the Cox Proportional Hazards Model to obtain the maximum likelihood estimates
of the coefficients. The partial likelihood function is based on the Breslow method for
resolving ties.
The coefficient values found by the regression are used to represent the hazard as a function
of time and the covariates. Each categorical covariate in the model is replaced by one or
more reference coded dummy variables, each with its own coefficient, before the regression
analysis is performed. The optimization process uses an iterative Damped-Newton method
with zero as the starting value for each coefficient.
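The fitting step can be illustrated for a single covariate. The sketch below maximizes the Breslow partial log-likelihood with a damped Newton iteration. The iteration starts at zero, halves the step whenever the log-likelihood would decrease, and stops when both the gradient and the coefficient change fall below a tolerance or a maximum number of iterations is reached. It mirrors the Tolerance, Step Length, and Maximum Iterations options described earlier, but it is a hypothetical simplification, not SigmaPlot's implementation. It then derives the Wald chi-square and hazard ratio reported in the Model Estimates and Confidence sections. The function names and data are invented for illustration.

```python
import math


def breslow_terms(b, times, events, x):
    """Partial log-likelihood, score, and information for one covariate.

    Ties use the Breslow approximation: all subjects failing at time t
    share the same risk-set denominator.
    """
    loglik = score = info = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        failed = [xi for ti, e, xi in zip(times, events, x) if e == 1 and ti == t]
        risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set at t
        w = [math.exp(b * xi) for xi in risk]
        a0 = sum(w)
        a1 = sum(xi * wi for xi, wi in zip(risk, w))
        a2 = sum(xi * xi * wi for xi, wi in zip(risk, w))
        mean = a1 / a0
        d = len(failed)
        loglik += b * sum(failed) - d * math.log(a0)
        score += sum(failed) - d * mean
        info += d * (a2 / a0 - mean * mean)
    return loglik, score, info


def fit_cox_1d(times, events, x, tol=1e-8, step_length=1.0, max_iter=20):
    """Hypothetical helper: damped Newton maximization starting from b = 0."""
    b = 0.0
    loglik, score, info = breslow_terms(b, times, events, x)
    for iteration in range(1, max_iter + 1):
        step = step_length * score / info
        # Damping: halve the step until the log-likelihood does not decrease.
        while True:
            trial = breslow_terms(b + step, times, events, x)
            if trial[0] >= loglik or abs(step) < tol:
                break
            step /= 2.0
        b += step
        loglik, score, info = trial
        # Converged when both the gradient and the coefficient change are small.
        if abs(score) < tol and abs(step) < tol:
            break
    return b, 1.0 / math.sqrt(info), loglik, iteration


times = [2, 3, 3, 5, 6, 7, 9, 10, 12, 15]   # hypothetical survival times
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]     # 1 = event, 0 = censored
group = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]      # hypothetical 0/1 covariate
b, se, loglik, n_iter = fit_cox_1d(times, events, group)
wald = (b / se) ** 2                          # Wald chi-square, 1 df
p_value = math.erfc(math.sqrt(wald / 2.0))    # upper tail of chi-square(1)
z = 1.959963984540054                         # 97.5th normal percentile
print(f"b = {b:.4f}  SE = {se:.4f}  Wald = {wald:.3f}  P = {p_value:.4f}  iterations = {n_iter}")
print(f"hazard ratio = {math.exp(b):.3f}  "
      f"95% CI = ({math.exp(b - z * se):.3f}, {math.exp(b + z * se):.3f})")
```

A categorical covariate would first be replaced by reference-coded 0/1 dummy variables. That requires a multivariable version of the same Newton step, with a gradient vector and an information matrix.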
The output of the analysis depends upon the variable selection method that is specified in the
Test Options dialog, either Complete or Stepwise. The default method is Complete, where
all covariates selected by the user are used to model the hazard function. When the default
method is used, the results show the maximum value of the log-likelihood function, the
number of iterations to convergence, and the tolerance used in the criterion for convergence.
If the Stepwise method is chosen, then only the covariates that contribute most to increasing
the value of the likelihood function are included in the hazard model. The included covariates
are determined using a step-by-step procedure. More details on the stepwise-regression results
are given later.
Testing the Global Null Hypothesis. This is the hypothesis that all coefficients in the hazard
model are zero. SigmaPlot provides two tests: the (partial) Likelihood Ratio test and the Global
Chi-Square test (also called Score test). The statistic for each test has a chi-square distribution
with p degrees of freedom, where p is the number of covariates. The default significance level
of the test is .05, which can be changed on the Report tab of the Tools/Options dialog box.
A significant result means that at least one of the covariates has a significant effect on survival
time. If the result is not significant, then there is no evidence that any covariate influences the
survival time, and a Kaplan-Meier analysis should be considered for computing survival probabilities.
Both tests are used by many survival software applications and they usually agree in their
determination of significance. In the event they disagree, then the result of the Likelihood
Ratio test should be used as it is more accurate.
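The Likelihood Ratio statistic is twice the difference between the maximized log-likelihood and the log-likelihood with all coefficients at zero. It is referred to a chi-square distribution with p degrees of freedom. The sketch below computes it using a closed-form chi-square tail probability for integer degrees of freedom. The log-likelihood values and function names are hypothetical. The Score test is not shown, because it needs the gradient and information evaluated at zero.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * density * total


def likelihood_ratio_test(loglik_model, loglik_null, n_covariates):
    # Global null hypothesis: all coefficients are zero.
    stat = 2.0 * (loglik_model - loglik_null)
    return stat, chi2_sf(stat, n_covariates)


stat, p = likelihood_ratio_test(-95.2, -101.7, 3)  # hypothetical log-likelihoods
print(f"LR chi-square = {stat:.2f} with 3 df, P = {p:.4f}")
```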
Model Estimates. This is a table of the best-fit coefficient values and their basic statistics. It
has five columns. The first column gives the names of the covariates. If stepwise regression
is used, then only the names of the covariates included in the model will be listed. The
remaining columns will be titled Coefficient, StdErr, Wald Chi-Square, and P Value. The
Wald Chi-Square statistic measures the significance of the covariate, testing the hypothesis
that the coefficient is zero. The significance level is the same as the one used for testing the
Global Null Hypothesis.
Confidence. There are two sets of confidence intervals. The first set is a table with four
columns giving the confidence intervals of the coefficients for each covariate listed in the
Model Estimates section. The first two columns are the same as the Model Estimates table. The
confidence level has a default value of 95%, but can be changed in the Test Options dialog box.
The second set is a table with four columns that includes the hazard ratio for each covariate in
the model. The hazard ratio for a covariate is the factor by which the hazard rate is multiplied
for a one-unit increase in the value of the covariate. When the covariate represents a dummy
variable corresponding to some group in a categorical covariate, then the hazard ratio measures the
hazard rate for that group relative to the reference group. In this case, the confidence interval
in columns 3 and 4 can be used to test the hypothesis that the two groups have the same
hazard rate, that is, that the hazard ratio is 1: if 1 lies outside the confidence interval, the
hypothesis is rejected.
9.6.10 Cox Regression Graphs
The Create Result Graph dialog box appears displaying the available graphs for the
Cox Regression report.
3. Select the report graph you want to create, then click OK, or double-click the graph
in the list.
The Covariate Values for Plot dialog box appears, in which you can select the covariate
values to use to specify the graph data.
9.7.1 Using Test Options to Modify Graphs
The examples below show four variations that can be achieved by modifying the test options
for survival curves. Once you’ve selected a test from the Statistics toolbar, you can open
the test options dialog box by selecting from the menus:
Statistics→Current Test Options
The options used to create the examples below appear on the Graph Options tab of any
of the Options for Survival dialog boxes.
Survival curve with censored symbols. Under Status Symbols, select Censored.
Survival curve with censored and failure symbols. Under Status Symbols, select both
Censored and Failures.
Survival curve with both symbol types and 95% confidence intervals. To add 95%
confidence intervals:
Figure 9.31 Survival Curve with both Symbol Types and 95% Confidence
Intervals
Survival curve with standard error bars. To add standard error bars:
3. Select Additional Plot Statistics.
4. From the Type drop-down list, select Standard Error Bars.
Figure 9.33 Survival Curve with both Symbol Types and 95% Confidence
Intervals
The confidence interval lines were changed from small gray dashed to solid blue. The
censored symbol type was also changed from a solid circle to a square.
Figure 9.34 Modifications made using the Property Browser to a Survival Curve
with both Symbol Types and 95% Confidence Interval
9.8 Failures, Censored Values, and Ties
• Larger step decreases result from multiple failures occurring at the same time (ties).
• The curve does not decrease at a censored value.
• Tied failure (and failure and censored) values superimpose at the appropriate inside corner
of the step survival curve.
• It is useful to display symbols for censored values.
• It is not necessary to display symbols for failures.
• The survival curve decreases to zero if the largest survival time is a failure.
• Censored values cause the survival curve to decrease more slowly.
Failures and censored values are shown above as open and filled circles, respectively. A single
failure is shown at time = 1.0. It is located at the inner corner of the step curve. All failures
occur at the inner corners so it is not necessary to display failure symbols. You can display
failure symbols in SigmaPlot, but by default they are not visible. Two tied failures are shown
at time = 2.0. They superimpose at the inner corner of the step that has decreased roughly
twice as much as the step for a single failure. Four censored values, two of which are tied,
are shown in the time interval between 2.0 and 8.0. Censored values do not cause a decrease
in the survival curve and nothing unusual occurs at tied censor values. Four tied values, two
failures and two censored, are shown at time = 8.0 (the censored values are slightly displaced
for clarity). They occur at the inside corner of the step since that is where failures are located.
The censored value at time = 19.0 prevents the survival curve from touching the X-axis.
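The behavior listed above can be checked with a small product-limit calculation. The sketch assumes the usual convention that, at a tied time, failures are processed before censored values, so censored subjects at that time are still counted as at risk. The data are hypothetical and only loosely modeled on the figure described. The function name is illustrative.

```python
def km_steps(times, failed):
    """Hypothetical helper: return [(time, survival after time)] for each distinct time.

    failed[i] is True for a failure and False for a censored value.
    """
    at_risk = len(times)
    surv, steps = 1.0, []
    for t in sorted(set(times)):
        d = sum(1 for ti, f in zip(times, failed) if ti == t and f)
        c = sum(1 for ti, f in zip(times, failed) if ti == t and not f)
        if d:
            # Failures (including ties) lower the curve by the fraction d / at_risk.
            surv *= 1.0 - d / at_risk
        # Censored values only shrink the risk set; the curve stays level here.
        steps.append((t, surv))
        at_risk -= d + c
    return steps


times = [1.0, 2.0, 2.0, 3.5, 5.0, 5.0, 7.0, 8.0, 8.0, 8.0, 8.0, 19.0]
failed = [True, True, True, False, False, False, False, True, True, False, False, False]
for t, s in km_steps(times, failed):
    print(t, round(s, 4))
```

Here the single failure at 1.0 lowers the curve by about 0.083, and the two tied failures at 2.0 lower it by about 0.167, twice as much. The censored values leave it level, and the final censored value at 19.0 keeps it above zero. If the largest time were a failure instead, the last step would reach zero.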
10 Computing Power and Sample Size
Topics Covered in this Chapter
♦ About Power
♦ About Sample Size
♦ Determining the Power of a t-Test
♦ Determining the Power of a Paired t-Test
♦ Determining the Power of a z-Test Proportions Comparison
♦ Determining the Power of a One Way ANOVA
♦ Determining the Power of a Chi-Square Test
♦ Determining the Power to Detect a Specified Correlation
♦ Determining the Minimum Sample Size for a t-Test
♦ Determining the Minimum Sample Size for a Paired t-Test
♦ Determining the Minimum Sample Size for a Proportions Comparison
♦ Determining the Minimum Sample Size for a One Way ANOVA
♦ Determining the Minimum Sample Size for a Chi-Square Test
♦ Determining the Minimum Sample Size to Detect a Specified Correlation
SigmaPlot provides two experimental design aids: power and sample size computations. Use
these procedures to determine the power of an intended test or to determine
the minimum sample size required to achieve a desired level of power.
Power and sample size computations are available for:
• Unpaired and Paired t-tests
• A z-test comparison of proportions
• One way ANOVAs
• Chi-Square Analysis of Contingency Tables
• Correlation Coefficient
You can determine the power of an intended t-test. Use unpaired t-tests to compare two
different samples from populations that are normally distributed with equal variances among
the individuals. For more information, see 5.3 Unpaired t-Test.
To determine the power for a t-test, you need to set the:
• Expected difference of the means of the groups you want to detect.
• Expected standard deviation of the groups.
• Expected sizes of the two groups.
• Alpha (α) used for power computations.
To find the power of a t-test:
10.3 Determining the Power of a t-Test
3. Enter the size of the difference between the means of the two groups you want to be able
to detect in the Expected Difference of Means box. This can be the size you expect to
see, as determined from previous samples or experiments, or just an estimate.
4. Enter the estimated size of the standard deviation for the population your data will be
drawn from in the Expected Standard Deviation box. This can be the size you expect to
see, as determined from previous samples or experiments, or just an estimate.
Note
t-Tests assume that the standard deviations of the underlying normally distributed
populations are equal.
5. Enter the expected sizes of each group in the Group 1 Size and Group 2 Size boxes.
6. If desired, change the alpha level in the Alpha box. Alpha (α) is the acceptable probability
of incorrectly concluding that there is a difference. An α error is also called a Type I error
(a Type I error is when you reject the hypothesis of no effect when this hypothesis is
true). The traditional α value used is 0.05. This indicates that a one in twenty chance of
error is acceptable, or that you are willing to conclude there is a significant difference
when P < 0.05.
7. Click = to see the power of a t-test at the specified conditions. The power calculation
appears at the top of the dialog box. If desired, you can change any of the settings and
click = again to view the new power as many times as desired.
8. Click Save to Report to save the power computation settings and resulting power to the
current report and click Close to exit from t-test power computation.
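SigmaPlot performs this computation in the dialog box. To check a result independently, the Python sketch below uses the common normal approximation to the power of a two-sided unpaired t-test. It is an assumption that this is close to what SigmaPlot computes; an exact calculation uses the noncentral t distribution, so expect small differences, especially with small groups. The function name and parameters are hypothetical.

```python
from statistics import NormalDist

def t_test_power(diff, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided unpaired t-test (normal approximation).

    Uses the z distribution in place of the noncentral t and ignores the
    negligible chance of rejecting in the wrong tail, so it slightly
    overstates power for small groups.
    """
    z = NormalDist()
    se = sd * (1 / n1 + 1 / n2) ** 0.5      # standard error of the difference of means
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    return z.cdf(abs(diff) / se - z_crit)
```

For example, t_test_power(5, 10, 64, 64) returns about 0.81: the power to detect a difference of half a standard deviation with 64 subjects per group at α = 0.05.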
You can determine the power of a Paired t-test. Use Paired t-tests to see if there is a change
in the same individuals before and after a single treatment or change in condition. The sizes
of the treatment effects are assumed to be normally distributed. For more information, see
6.3 Paired t-Test.
To determine the power for a Paired t-test, you need to set the:
• Expected change before and after treatment you want to detect.
• Expected standard deviation of the changes.
• Number of subjects.
• Alpha (α) used for power computations.
To find the power of a Paired t-test:
10.4 Determining the Power of a Paired t-Test
3. Enter the size of the change before and after the treatment in the Change to be Detected
box. The size of the change is determined by the difference of the means. This can be the
size of the treatment effect you expect to see, as determined from previous experiments,
or just an estimate.
4. Enter the size of the standard deviation of the change in the Expected Standard Deviation
of Change box. This can be the size you expect to see, as determined from previous
experiments, or just an estimate.
5. Enter the expected (or estimated) number of subjects in the Desired Sample Size box.
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there is a
significant treatment difference when P < 0.05.
Figure 10.3 The Paired t-test Power Computation Results Viewed in the Report
9. Click Close.
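A paired t-test works on the within-subject changes, so its power depends on the expected change, the standard deviation of the changes, and the number of subjects. The Python sketch below uses the normal approximation to the noncentral t distribution; it is an assumption that SigmaPlot's exact result is close to this, not a description of its method. The function name is hypothetical.

```python
from statistics import NormalDist

def paired_t_power(change, sd_change, n, alpha=0.05):
    """Approximate power of a two-sided paired t-test (normal approximation).

    The standard error of the mean change is sd_change / sqrt(n).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    return z.cdf(abs(change) * n ** 0.5 / sd_change - z_crit)
```

For example, paired_t_power(5, 10, 32) returns about 0.81: a change of half the standard deviation of the changes, measured in 32 subjects, at α = 0.05.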
10.5 Determining the Power of a z-Test Proportions Comparison
3. Enter the expected proportions that fall into the category for each group. This can be
the distribution you expect to see, as determined from previous experiments, or just an
estimate.
4. Enter the size of each group. This can be the sample sizes you expect to obtain, or just an
estimate.
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there is a
significant distribution difference when P < 0.05.
7. Click Save to Report to save the power computation settings and resulting power to
the current report.
Figure 10.5 The Proportion Power Computation Results Viewed in the Report
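The power of a comparison of proportions depends on the two expected proportions and the group sizes. The Python sketch below uses a standard normal-approximation formula without the Yates continuity correction (pooled standard error under the null hypothesis, unpooled under the alternative). Whether SigmaPlot uses exactly this formula is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist

def proportions_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation without the Yates continuity correction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)                      # pooled proportion under H0
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5    # standard error if p1 != p2
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)
```

For example, proportions_power(0.3, 0.5, 100, 100) returns about 0.83.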
You can determine the power of a One Way ANOVA (analysis of variance). Use One Way
ANOVAs to see if there is a difference among two or more samples taken from populations that
are normally distributed with equal variances among the individuals. For more information,
see 5.5 One Way Analysis of Variance (ANOVA).
To determine the power for a One Way ANOVA, you need to specify the:
• Minimum difference between group means you want to detect.
• Standard deviation of the population from which the samples were drawn.
• Estimated number of groups.
• Estimated size of a group.
• Alpha (α) used for power computations.
10.6 Determining the Power of a One Way ANOVA
3. Enter the minimum size of the expected difference of group means in the Minimum
Difference in Group Means to be Detected box. This can be the size of a difference you
expect to see, as determined from previous experiments, or just an estimate.
The minimum detectable difference is the minimum difference between the largest and
smallest means.
4. Enter the estimated standard deviation of the population from which the samples will be
drawn. This can be the size you expect to see, as determined from previous experiments,
or just an estimate.
5. Enter the expected number of groups and the expected size of each group.
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there is
a significant difference when P < 0.05.
8. Click Save to Report to save the power computation settings and resulting power to
the current report.
Figure 10.7 The ANOVA Power Computation Results Viewed in the Report
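An exact ANOVA power calculation uses the noncentral F distribution. As an independent check that needs only the Python standard library, the sketch below estimates power by simulation instead. It assumes the least favorable arrangement of means for a given minimum detectable difference (the largest and smallest means differ by that amount and the remaining means lie midway between them), which is an assumption about how the minimum difference is interpreted. Function names and the simulation settings (reps, seed) are hypothetical, and results carry Monte Carlo noise.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def anova_power_mc(min_diff, sd, k, n, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimate of one way ANOVA power with k groups of size n."""
    rng = random.Random(seed)

    def simulate(mus):
        # F statistics from reps simulated experiments with normal data
        return [f_statistic([[rng.gauss(mu, sd) for _ in range(n)] for mu in mus])
                for _ in range(reps)]

    null_f = sorted(simulate([0.0] * k))
    f_crit = null_f[int((1 - alpha) * reps)]        # simulated critical F value
    # Least favorable means: extremes min_diff apart, the rest midway between them
    mus = [-min_diff / 2, min_diff / 2] + [0.0] * (k - 2)
    return sum(f > f_crit for f in simulate(mus)) / reps
```

Raising reps reduces the simulation noise at the cost of run time.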
You can determine the power of a chi-square (χ²) analysis of a contingency table. A χ² test
compares the difference between the expected and observed number of individuals of two or
more different groups that fall within two or more categories. For more information, see 7.4
Chi-square Analysis of Contingency Tables.
The power of a χ² analysis of a contingency table is determined by the estimated relative
proportions in each category for each group. Because SigmaPlot uses numbers of observations
to compute the estimated proportions, you need to enter a contingency table in the worksheet
containing the estimated pattern in the observations before you can compute the estimated
proportions.
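To illustrate how a pattern of counts becomes estimated proportions and then a power value, the Python sketch below converts the pattern to cell proportions, builds the matching "no association" proportions from the row and column margins, and estimates the power of the Pearson chi-square test by simulation. Simulation is used here only to stay within the standard library; SigmaPlot presumably uses an analytical method, and the function names, total sample size parameter, reps, and seed are hypothetical.

```python
import random

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, r_tot in enumerate(row_tot):
        for j, c_tot in enumerate(col_tot):
            expected = r_tot * c_tot / total
            if expected > 0:                        # skip empty rows or columns
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def chi2_power_mc(pattern, total, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power of a chi-square test with `total` observations.

    pattern -- estimated counts; only their relative sizes are used.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(pattern), len(pattern[0])
    grand = sum(map(sum, pattern))
    alt = [x / grand for row in pattern for x in row]      # estimated cell proportions
    row_p = [sum(r) / grand for r in pattern]
    col_p = [sum(c) / grand for c in zip(*pattern)]
    null = [rp * cp for rp in row_p for cp in col_p]       # same margins, no association
    cells = range(n_rows * n_cols)

    def simulate(probs):
        stats = []
        for _ in range(reps):
            counts = [0] * len(cells)
            for cell in rng.choices(cells, weights=probs, k=total):
                counts[cell] += 1
            table = [counts[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
            stats.append(pearson_chi2(table))
        return stats

    crit = sorted(simulate(null))[int((1 - alpha) * reps)]  # simulated critical value
    return sum(s > crit for s in simulate(alt)) / reps
```

Because only the proportions enter the calculation, the patterns [[3, 1], [1, 3]] and [[30, 10], [10, 30]] give the same power for the same total number of observations.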
10.7 Determining the Power of a Chi-Square Test
Tip
You only need to specify the pattern (distribution) of the number of observations. The
absolute numbers in the cells do not matter, only their relative values.
To find the power of a chi-square test:
1. Enter a contingency table into the worksheet by placing the estimated number of
observations for each table cell in a corresponding worksheet cell. These observations are
used to compute the estimated proportions.
The worksheet rows and columns correspond to the groups and categories. The number of
observations must always be an integer.
Tip
The order and location of the rows or columns corresponding to the groups and
categories is not important.
4. Select the columns of the contingency table from the worksheet as prompted.
5. Click Finish when you’ve selected the desired columns.
10.8 Determining the Power to Detect a Specified Correlation
Figure 10.11 The Chi-square Power Computation Results Viewed in the Report
You can determine the power to detect a given Pearson Product Moment Correlation
Coefficient R. A correlation coefficient quantifies the strength of association between the
values of two variables. A correlation coefficient of 1 means that as one variable increases,
the other increases exactly linearly. A correlation coefficient of -1 means that as one variable
increases, the other decreases exactly linearly. For more information, see 8.8 Pearson Product
Moment Correlation.
To determine the power to detect a correlation coefficient, you need to specify the:
• Correlation coefficient you want to detect.
• Desired sample size.
• Alpha (α) used for power computations.
To find the power to detect a correlation coefficient:
3. Enter the expected correlation coefficient. This can be the correlation coefficient you
expect to see, as determined from previous experiments, or just an estimate.
4. Enter the desired number of data points. This can be the sample size you expect to obtain,
or just an estimate.
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an association. The traditional α value used is 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing to conclude there
is an association when P < 0.05.
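A common way to approximate the power to detect a correlation uses Fisher's z transformation, atanh(R), whose standard error is about 1/sqrt(n - 3). The Python sketch below applies that approximation; it is an assumption that SigmaPlot's result is close to it, and the function name is hypothetical.

```python
from math import atanh
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05):
    """Approximate power to detect a nonzero Pearson correlation r with n points.

    Fisher's z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    return z.cdf(abs(atanh(r)) * (n - 3) ** 0.5 - z_crit)
```

For example, correlation_power(0.3, 85) returns about 0.80.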
10.9 Determining the Minimum Sample Size for a t-Test
5. Enter the desired power, or test sensitivity, in the Desired Power box. Power is the
probability that the t-test will detect a difference if there really is a difference. The closer
the power is to 1, the more sensitive the test.
Traditionally, you want to achieve a power of 0.80, which means that there is an 80%
chance of detecting a difference with 1– α confidence (for example, a 95% confidence
when α = 0.05).
6. Enter the desired alpha level in the Alpha box. Alpha (α) is the acceptable probability of
incorrectly concluding that there is a difference.
The traditional α value used is 0.05. This indicates that a one in twenty chance of error
is acceptable, or that you are willing to conclude there is a significant difference when
P < 0.05.
Smaller values of α result in stricter requirements before concluding there is a significant
difference, but a greater possibility of concluding there is no difference when one exists (a
Type II error). Larger values of α make it easier to conclude that there is a difference, but
also increase the risk of reporting a false positive (a Type I error).
7. Click = to see the required sample size for a t-test at the specified conditions. The sample
size calculation appears at the top of the dialog. The sample size is the size of each of
the groups. If desired, you can change any of the settings and click = again to view the
new sample size as many times as desired.
8. Click Save to Report to save the sample size computation settings and resulting sample
size to the current report.
Figure 10.14 The t-test Sample Size Results Viewed in the Report
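The required group size can be estimated by inverting the normal approximation to t-test power: each group needs about 2((z for α/2 + z for the desired power) × σ / δ)² subjects. The Python sketch below computes this; a t-based calculation, which SigmaPlot may use, typically gives one or two more subjects per group. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def t_test_sample_size(diff, sd, power=0.80, alpha=0.05):
    """Approximate size of EACH group for a two-sided unpaired t-test."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_sum * sd / diff) ** 2)   # round up to a whole subject
```

For example, t_test_sample_size(5, 10) returns 63 per group for a difference of half a standard deviation, power 0.80, and α = 0.05.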
10.10 Determining the Minimum Sample Size for a Paired t-Test
You can determine the sample size for a Paired t-test. Use Paired t-tests to see if there is a
change in the same individuals before and after a single treatment or change in condition. The
sizes of the treatment effects are assumed to be normally distributed. For more information,
see 6.3 Paired t-Test.
To determine the sample size for a Paired t-test, you need to estimate the:
• Difference of the means you wish to detect.
• Estimated standard deviation of the changes in the underlying population.
• Desired power or sensitivity of the test.
• Alpha (α) used to determine the sample size.
To find the sample size for a Paired t-test:
Figure 10.15 The t-test Sample Size Results Viewed in the Report
3. Enter the size of the change before and after the treatment in the Change to be Detected
box. This can be the size of the treatment effect you expect to see, as determined from
previous experiments, or just an estimate.
4. Enter the size of the standard deviation of the change in the Expected Standard Deviation
of Change box. This can be the size you expect to see, as determined from previous
experiments, or just an estimate.
5. Enter the desired power, or test sensitivity. Power is the probability that the paired t-test
will detect an effect if there really is an effect. The closer the power is to 1, the more
sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that
there is an 80% chance of detecting an effect with 1– α confidence (for example, a 95%
confidence when α = 0.05).
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there is a
significant treatment difference when P < 0.05.
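For a paired design, the normal approximation gives a required number of subjects of about ((z for α/2 + z for the desired power) × standard deviation of the change / change)². The Python sketch below computes this; SigmaPlot's t-based answer may be slightly larger, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def paired_t_sample_size(change, sd_change, power=0.80, alpha=0.05):
    """Approximate number of subjects for a two-sided paired t-test."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum * sd_change / change) ** 2)   # round up to a whole subject
```

For example, paired_t_sample_size(5, 10) returns 32 subjects for a change of half the standard deviation of the changes.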
10.11 Determining the Minimum Sample Size for a Proportions Comparison
You can determine the sample size for a z-test comparison of proportions. A comparison of
proportions compares the difference in the proportion of two different groups that falls within
a single category. For more information, see 7.3 Comparing Proportions Using the z-Test.
To determine the sample size for a proportion comparison, you need to specify the:
• Proportion of each group that falls within the category.
• Desired power or sensitivity of the test.
• Alpha (α) used to determine the sample size.
To find the sample size for a z-test proportion comparison:
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there is a
significant distribution difference when P < 0.05.
Note
The Yates correction factor is used if this option was selected in the Options for
z-Test dialog box. For more information, see 7.3.4 Setting z-test Options.
7. Click Save to Report to save the sample size computation settings and resulting sample
size to the current report. The estimated sample size is the sample size for each group.
Figure 10.18 The Proportions Sample Size Results Viewed in the Report
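The per-group sample size for comparing two proportions can be approximated with a standard normal-theory formula. The Python sketch below computes it and, optionally, applies the Fleiss continuity-correction adjustment, n' = (n/4)(1 + sqrt(1 + 4/(n|p1 - p2|)))². Using that adjustment as a stand-in for the Yates correction option is an assumption, as is the function name.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportions_sample_size(p1, p2, power=0.80, alpha=0.05, continuity=False):
    """Approximate size of EACH group for a two-sided z-test of two proportions."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    d = abs(p1 - p2)
    p_bar = (p1 + p2) / 2                      # pooled proportion for equal groups
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    if continuity:
        # Fleiss continuity-correction adjustment (assumed stand-in for Yates)
        n = n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2
    return ceil(n)
```

For example, proportions_sample_size(0.3, 0.5) returns about 93 per group, and the continuity-corrected value is larger.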
10.12 Determining the Minimum Sample Size for a One Way ANOVA
You can determine the group sample size for a One Way ANOVA (analysis of variance). One
Way ANOVAs are used to see if there is a difference among two or more samples taken from
populations that are normally distributed with equal variances among the individuals. For
more information, see 5.5 One Way Analysis of Variance (ANOVA).
To determine the sample size for a One Way ANOVA, you need to specify the:
• Minimum difference between group means to be detected.
• Estimated standard deviation of the underlying populations.
• Number of groups.
• Desired power or sensitivity of the ANOVA.
• Alpha (α) used to determine the sample size.
To find the sample size for a One Way ANOVA:
3. Enter the size of the minimum expected difference of group means in the Minimum
Detectable Difference box. This can be the size of a difference you expect to see, as
determined from previous experiments, or just an estimate.
The minimum detectable difference is the minimum difference between the largest and
smallest means.
4. Enter the size of the standard deviation of the residuals. This can be the size you expect to
see, as determined from previous experiments, or just an estimate. Note that one way ANOVA
assumes that the standard deviations of the underlying normally distributed populations
are equal. Then enter the expected number of groups.
5. Enter the desired power, or test sensitivity. Power is the probability that the ANOVA will
detect a difference if there really is a difference among the groups. The closer the power
is to 1, the more sensitive the test. Traditionally, you want to achieve a power of 0.80,
which means that there is an 80% chance of detecting a difference with 1– α confidence
(for example, a 95% confidence when α = 0.05).
6. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an effect. The traditional α value used is 0.05. This indicates that
a one in twenty chance of error is acceptable, or that you are willing to conclude there
is a significant difference when P < 0.05.
Figure 10.20 The ANOVA Sample Size Results Viewed in the Report
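To check a group size outside SigmaPlot without the noncentral F distribution, the Python sketch below increases the group size until a simulated power reaches the target. It assumes the least favorable arrangement of means (the largest and smallest means differ by the minimum detectable difference and the other means lie midway between them). Function names and the simulation settings (reps, seed, n_max) are hypothetical; because of Monte Carlo noise the answer can differ from SigmaPlot's by a subject or so.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def anova_sample_size_mc(min_diff, sd, k, power=0.80, alpha=0.05,
                         reps=1000, seed=1, n_max=200):
    """Smallest group size whose simulated power reaches the target, or None."""
    rng = random.Random(seed)
    # Least favorable means: extremes min_diff apart, the rest midway between them
    mus = [-min_diff / 2, min_diff / 2] + [0.0] * (k - 2)

    def simulate(means, n):
        return [f_statistic([[rng.gauss(m, sd) for _ in range(n)] for m in means])
                for _ in range(reps)]

    for n in range(2, n_max + 1):
        f_crit = sorted(simulate([0.0] * k, n))[int((1 - alpha) * reps)]
        achieved = sum(f > f_crit for f in simulate(mus, n)) / reps
        if achieved >= power:
            return n
    return None
```

The linear search keeps the sketch simple; for small effects (large n) a bisection search over n would be faster.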
10.13 Determining the Minimum Sample Size for a Chi-Square Test
You can determine the sample size for a chi-square (χ²) analysis of a contingency table. A
chi-square test compares the difference between the expected and observed number of
individuals of two or more different groups that fall within two or more categories. For more
information, see 7.4 Chi-square Analysis of Contingency Tables.
The sample size for a chi-square analysis of a contingency table is determined by the estimated
relative proportions in each category for each group. Because SigmaPlot uses numbers of
observations to compute these estimated proportions, you need to enter a contingency table
in the worksheet containing the estimated number of observations before you can compute
the estimated proportions.
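One way to see how the estimated proportions determine the sample size is through the effect size w, where w² sums (observed proportion - proportion expected with no association)² / expected proportion over all cells. For a 2 x 2 table (1 degree of freedom), the normal approximation gives a total of about ((z for α/2 + z for the desired power) / w)² observations. The Python sketch below covers only that 2 x 2 case; larger tables need the noncentral chi-square distribution, and whether SigmaPlot uses this approach is an assumption. Function names are hypothetical, and every row and column of the pattern is assumed to have a nonzero total.

```python
from math import ceil
from statistics import NormalDist

def effect_size_w(pattern):
    """Effect size w for a contingency table given as a pattern of counts.

    Only the relative values matter, so scaling every count leaves w unchanged.
    """
    total = sum(map(sum, pattern))
    row_p = [sum(r) / total for r in pattern]
    col_p = [sum(c) / total for c in zip(*pattern)]
    w2 = 0.0
    for i, row in enumerate(pattern):
        for j, count in enumerate(row):
            expected = row_p[i] * col_p[j]      # proportion if there were no association
            w2 += (count / total - expected) ** 2 / expected
    return w2 ** 0.5

def chi2_sample_size_2x2(pattern, power=0.80, alpha=0.05):
    """Approximate total number of observations for a 2 x 2 table."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum / effect_size_w(pattern)) ** 2)
```

For example, the pattern [[30, 10], [10, 30]] has w = 0.5, and chi2_sample_size_2x2 returns 32 observations in total.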
To find the sample size for a Chi-square test:
1. Enter a contingency table into the worksheet by placing the estimated number of
observations for each table cell in a corresponding worksheet cell.
The worksheet rows and columns correspond to the groups and categories. The number of
observations must always be an integer.
Note that the order and location of the rows or columns corresponding to the groups
and categories is unimportant. You can use the rows for category and the columns for
group, or vice versa.
4. Select the columns of the contingency table from the worksheet as prompted.
5. Click Finish when you have selected all the columns of the contingency table.
6. Enter the desired power, or test sensitivity. Power is the probability that the chi-square
test will detect a difference in observed distribution if there really is a difference. The
closer the power is to 1, the more sensitive the test. Traditionally, you want to achieve a
power of 0.80, which means that there is an 80% chance of detecting a difference with
1– α confidence (for example, a 95% confidence when α = 0.05).
7. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is a difference. The traditional α value used is 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing to conclude there
is a significant difference when P < 0.05.
Figure 10.24 The Chi-square Sample Size Computation Results Viewed in the Report
10. Click Close to exit from chi-square test sample size computation.
10.14 Determining the Minimum Sample Size to Detect a Specified Correlation
You can determine the sample size necessary to detect a specified Pearson Product Moment
Correlation Coefficient R. A correlation coefficient quantifies the strength of association
between the values of two variables. A correlation coefficient of 1 means that as one variable
increases, the other increases exactly linearly. A correlation coefficient of -1 means that
as one variable increases, the other decreases exactly linearly. For more information, see
8.8 Pearson Product Moment Correlation.
To determine the sample size necessary to detect a specified correlation coefficient, you need
to specify the:
• Expected value of the correlation coefficient.
• Desired power or sensitivity of the test.
• Alpha (α) used to determine the sample size.
To find the sample size required for a specific correlation coefficient:
3. Enter the expected correlation coefficient in the Correlation Coefficient box. This can be
the correlation coefficient you expect to see, as determined from previous experiments,
or just an estimate.
4. Enter the desired power, or test sensitivity. Power is the probability that the test will
detect an association if one really exists. The closer the power is to 1, the more
sensitive the test. Traditionally, you want to achieve a power of 0.80, which means that
there is an 80% chance of detecting an association with 1– α confidence (for example, a
95% confidence when α = 0.05).
5. Enter the desired alpha level. Alpha (α) is the acceptable probability of incorrectly
concluding that there is an association. The traditional α value used is 0.05. This indicates
that a one in twenty chance of error is acceptable, or that you are willing to conclude there
is an association when P < 0.05.
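Inverting the Fisher z approximation gives the number of data points needed to detect a correlation R: about ((z for α/2 + z for the desired power) / atanh(R))² + 3. The Python sketch below computes this; it is an assumption that SigmaPlot's result agrees closely, and the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist

def correlation_sample_size(r, power=0.80, alpha=0.05):
    """Approximate number of data points needed to detect correlation r."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum / atanh(abs(r))) ** 2 + 3)   # Fisher z has SE 1 / sqrt(n - 3)
```

For example, correlation_sample_size(0.3) returns 85 data points.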
Figure 10.26 The Correlation Coefficient Sample Size Results Viewed in the
Report
11 Report Graphs
Topics Covered in this Chapter
♦ Generating Report Graphs
You can generate graphs for all test reports except rates and proportions tests, Best Subset and
Incremental Polynomial Regression, and Multiple Logistic reports.
1. Click the Report tab and then in the Result Graphs group, click Create Result Graph.
The Create Result Graph dialog box appears displaying the available graphs for the
selected report.
Note
Create Result Graph is dimmed if no report is selected or if the selected report does not
generate a graph.
2. Select the report graph you want to create, then click OK, or double-click the graph
in the list.
Figure 11.1 The Create Graph Dialog Box for a Report Graph
The selected graph appears in a graph page window with the name of the page in the
window title bar. Graph pages are named according to the type of graph created and are
numbered incrementally. The graph page is assigned to the test section of its associated
report.
11.1.2 Scatter Plot
Figure 11.2 The scatter plot graphs the group means as single points with error
bars indicating the standard deviation.
• ANOVA on Ranks. For more information, see 5.8 Kruskal-Wallis Analysis of Variance
on Ranks.
If the graph data is indexed, the levels in the factor column are used as the tick marks for the
plot points, and the column titles are used as the X and Y axis titles. If the graph data is in
raw or statistical format, the column titles are used as the tick marks for the plot points and
default X Data and Y Data axis titles are assigned to the graph.
Figure 11.3 A Point Plot of the Result Data for an ANOVA on Ranks
Figure 11.4 A Point and Column Means Plot of the Result Data for a Descriptive
Statistics Test
The error bars plot the column means and the standard deviations of the column data.
11.1.5 Box Plot
Figure 11.5 A Box Plot of the Result Data for the Rank Sum Test
11.1.7 Bar Chart of the Standardized Residuals
Figure 11.6 Scatter Plot of the Simple Linear Regression Residuals with
Standard Deviation
11.1.9 Normal Probability Plot
• Multiple Linear Regression. For more information, see 8.3 Multiple Linear Regression.
• Polynomial Regression. For more information, see 8.5 Polynomial Regression.
• Stepwise Regression. For more information, see 8.6 .
• Nonlinear Regression. For more information, see .
• Normality Test. For more information, see 3.9 Testing Normality.
11.1.11 3D Residual Scatter Plot
Figure 11.11 A Multiple Linear Regression 3D Residual Scatter Plot of the Two
Selected Independent Variable Columns
11.1.13 3D Category Scatter Graph
Figure 11.12 A Two Way ANOVA Grouped Bar Chart with Error Bars
11.1.15 Multiple Comparison Graphs
Figure 11.14 A Before and After Plot Displaying Data for a Paired t-Test
11.1.17 Profile Plots
in a Three-Way ANOVA are determined by averaging the cell means over all levels of the
remaining two factors while fixing each level of the given factor.
Profile Plots - Main Effects graphs are available for the following tests:
• Two Way Analysis of Variance (ANOVA). For more information, see 5.6 Two Way
Analysis of Variance (ANOVA).
• Three Way Analysis of Variance (ANOVA). For more information, see 5.7 Three Way
Analysis of Variance (ANOVA).
For Main Effects, there is one plot per graph and the number of graphs equals the number
of factors. For each graph, the levels of one factor are fixed, while cell means are averaged
over all levels of the other factors (one other factor for Two-Way ANOVA, two other factors
for Three-Way ANOVA).
Profile Plots - 2Way Effects graphs are available for the following tests:
• Two Way Analysis of Variance (ANOVA). For more information, see 5.6 Two Way
Analysis of Variance (ANOVA).
• Three Way Analysis of Variance (ANOVA). For more information, see 5.7 Three Way
Analysis of Variance (ANOVA).
For 2-Way Effects, there is one graph for each distinct pairwise combination of factors (so
there will be one graph for Two-Way ANOVA and three graphs for Three-Way ANOVA).
Each of these graphs contains multiple profile plots, one for each level of one of the factors.
For Three-Way ANOVA, cell means are averaged over all levels of the remaining third factor
(whichever factor is not included in the pairwise combination for the given 2-Way Effects graph).
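The averaging used for Main Effects and 2-Way Effects profile plots can be sketched in Python: hold the chosen factors at fixed levels and take the unweighted mean of the cell means over every level of the remaining factors. The function name and data layout below are hypothetical, and the sketch assumes every factor combination has a cell mean.

```python
from itertools import product
from statistics import fmean

def marginal_means(cell_means, levels, keep):
    """Average cell means over every factor NOT listed in keep.

    cell_means -- dict mapping a tuple of factor levels to that cell's mean
    levels     -- list of level lists, one per factor, in the same order
    keep       -- indices of the factors held fixed: (0,) for a main-effects
                  plot of factor 0, (0, 1) for a 2-way effects plot
    """
    result = {}
    for fixed in product(*(levels[i] for i in keep)):
        matching = [m for cell, m in cell_means.items()
                    if tuple(cell[i] for i in keep) == fixed]
        result[fixed] = fmean(matching)   # unweighted average of the cell means
    return result
```

With three factors, keep=(0,) averages over the other two factors (a Main Effects plot), while keep=(0, 1) averages over the third factor only (one 2-Way Effects graph).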
Profile Plots - 3Way Effects graphs are available for the following test:
• Three Way Analysis of Variance (ANOVA). For more information, see 5.7 Three Way
Analysis of Variance (ANOVA).
For 3-Way Effects in Three-Way ANOVA, the number of graphs equals the number of levels
of the third factor (which is the last factor that was selected for running the test). Each graph
for 3-Way Effects contains multiple profile plots, one for each level of the second factor
(which is the factor that was selected second for running the test).
Index
A
advisor ........ 6
Advisor Wizard
calculating power ........ 6
calculating sample size ........ 3, 6
data format ........ 9
defining your goals ........ 3
determining sensitivity ........ 3
independent variables ........ 10
measuring data ........ 4
number of treatments ........ 6
repeated observations ........ 6
starting ........ 3
using ........ 3
viewing ........ 3
alpha value
in power ........ 39
sample size ........ 39
ANOVA ........ 6
ANOVA on ranks
when to use ........ 6, 27
arranging data
descriptive statistics ........ 19
normality test ........ 36
B
backward stepwise regression
when to use ........ 33
bar charts
descriptive statistics results ........ 24
before & after procedures
paired t-test ........ 29
signed rank test ........ 29
best subset regression
when to use ........ 11, 33
box plots
descriptive statistics results ........ 24
C
calculating ........ 23
N statistic ........ 23
power ........ 39
calculating power ........ 3
advisor ........ 6
determining test to use ........ 6
t-test ........ 402
calculating power:
determining test to use ........ 6
calculating sample size ........ 3
determining test to use ........ 6
categories
comparing ........ 32
correlation procedures
Spearman Rank Order ........ 34
chi-square test
when to use ........ 32
chi-Square test
calculating power/sample size ........ 39
choosing
appropriate procedure ........ 17
choosing column data
descriptive statistics ........ 22
coefficients
correlation ........ 34
compare groups procedures
determining test to use ........ 6
compare many groups procedure
ANOVA on ranks ........ 27
one way ANOVA ........ 27–28
two way ANOVA ........ 27–28
Compare many groups procedure
when to use ........ 27
compare two groups procedure
when to use ........ 26
comparing
categories ........ 32
comparing groups
choosing group comparison ........ 26
many ........ 27
same group before and after multiple treatments ........ 30
same group before and after one treatment ........ 29
two groups ........ 26
computing ........ 23
calculating ........ 402
conditions
number of ........ 3, 6
confidence interval
descriptive statistics ........ 20–21
descriptive statistics results ........ 23
for the mean ........ 23
contingency table
data format ........ 9
continuous scale
measuring data ........ 4
correlation ........ 3
correlation coefficient
W
Wilcoxon signed rank test
signed rank test................................... 29
Z
z-test
calculating power/sample size.............. 39
when to use ........................................ 32