Biostatistics: A Refresher

Kevin M. Sowinski, Pharm.D., FCCP


Purdue University College of Pharmacy
Indiana University School of Medicine
West Lafayette and Indianapolis, Indiana
ACCP Updates in Therapeutics® 2019: Pharmacotherapy Preparatory Review and Recertification Course


Learning Objectives

1. Describe differences between descriptive and inferential statistics.
2. Identify different types of data (nominal, ordinal, continuous [ratio and interval]) to determine an appropriate type of statistical test (parametric vs. nonparametric).
3. Describe strengths and limitations of different types of measures of central tendency (mean, median, and mode) and data spread (standard deviation, standard error of the mean, range, and interquartile range).
4. Describe the concepts of normal distribution and the associated parameters that describe the distribution.
5. State the types of decision errors that can occur when using statistical tests and the conditions under which they can occur.
6. Describe hypothesis testing, and state the meaning of and distinguish between p-values and confidence intervals.
7. Describe areas of misuse or misrepresentation that are associated with various statistical methods.
8. Select appropriate statistical tests on the basis of the sample distribution, data type, and study design.
9. Interpret statistical significance for results from commonly used statistical tests.
10. Describe the similarities and differences between statistical tests, and state how to apply them appropriately.
11. Identify the use of survival analysis and different ways to perform and report it.

Self-Assessment Questions
Answers and explanations to these questions may be found at the end of the chapter.

1. A randomized controlled trial assesses the effects of the treatment of heart failure on global functioning in three groups of adults after 6 months of treatment. Investigators wanted to assess global functioning with the New York Heart Association (NYHA) functional classification, an ordered scale from I to IV, and to compare the patient classification after 6 months of treatment. Which statistical test is most appropriate to assess differences in functional classification between the groups?
A. Kruskal-Wallis test.
B. Wilcoxon signed-rank test.
C. Analysis of variance (ANOVA).
D. Analysis of covariance (ANCOVA).

2. You are evaluating a randomized, double-blind, parallel-group controlled trial that compares four antihypertensive drugs for their effect on blood pressure. The authors conclude that hydrochlorothiazide is better than atenolol (p<0.05) and that enalapril is better than hydrochlorothiazide (p<0.01), but no difference is observed between any other drugs. The investigators used an unpaired (independent samples) t-test to test the hypothesis that each drug was equal to the other. Which statement is most appropriate?
A. Investigators used the appropriate statistical test to analyze their data.
B. Enalapril is the most effective of these drugs.
C. ANOVA would have been a more appropriate test.
D. A paired t-test is a more appropriate test.

3. In the results of a randomized, double-blind, controlled clinical trial, the difference in hospital readmission rates between the intervention group and the control group is 6% (p=0.01), and it is concluded that there is a statistically significant difference between the groups. Which statement is most consistent with this finding and conclusions?
A. The chance of making a type I error is 5 in 100.
B. The trial does not have enough power.
C. There is a high likelihood of having made a type II error.
D. The chance of making a type I error is 1 in 100.

4. You are reading a manuscript that evaluates the impact of obesity on enoxaparin pharmacokinetics. The authors used an unpaired t-test to compare the baseline values of body mass index (BMI) in normal subjects and obese subjects. You are evaluating the use of an unpaired t-test to compare the BMI between the two groups. Which choice best represents the most appropriate criteria to be met to use this parametric test?


A. The sample sizes in the normal and obese subjects should be equal to allow the use of a t-test.
B. A t-test is not appropriate because BMI data are ordinal.
C. The variance of the BMI data has to be similar in each group.
D. The pre-study power should be at least 90%.

5. You are evaluating the results and discussion of a journal club article to present to the pharmacy residents at your institution. The randomized, prospective, controlled trial evaluated the efficacy of a new controller drug for asthma. The primary end point was the morning forced expiratory volume in 1 second (FEV1) in two groups of subjects (men and women). The difference in FEV1 between the two groups was 15% (95% confidence interval [CI], 10%–21%). Which statement is most appropriate, given the results?
A. Without the reporting of a p-value, it is not possible to conclude whether these results were statistically significant.
B. There is a statistically significant difference between the men and women (p<0.05).
C. There is a statistically significant difference between the men and women (p<0.01).
D. There is no statistically significant difference between the men and women.

6. An early-phase clinical trial of 40 subjects evaluated a new drug known to increase high-density lipoprotein cholesterol (HDL) concentrations. The objective of the trial was to compare the new drug’s ability to increase HDL with that of lifestyle modifications (active control group). At the beginning of the study, the mean baseline HDL was 37 mg/dL in the active control group and 38 mg/dL in the new drug group. At the end of the 3-month trial, the mean HDL for the control group was 44 mg/dL and for the new drug group, 49 mg/dL. The p-value for the comparison at 3 months was 0.08. Which statement provides the best interpretation of these results?
A. An a priori α of less than 0.10 would have made the study more clinically useful.
B. The new drug and active control appear to be equally efficacious in increasing HDL concentrations.
C. The new drug is better than lifestyle modifications because it increases HDL concentrations to a greater extent.
D. This study is potentially underpowered.

7. Researchers planned a study to evaluate the percentage of subjects who achieved less than a target blood pressure (less than 140/90 mm Hg) when initiating therapy with two different doses of amlodipine. In the study of 100 subjects, the amlodipine 5-mg group (n=50) and the amlodipine 10-mg group (n=50) were compared. The investigators used a blood pressure goal as their primary end point, defined as the percentage of subjects who successfully achieved the blood pressure goal at 3 months. Which is the most appropriate statistical test to answer such a question?
A. Independent samples t-test.
B. Chi-square or Fisher exact test.
C. Wilcoxon signed-rank test.
D. One-sample t-test.

8. An investigational drug is being compared with an existing drug for the treatment of anemia in patients with chronic kidney disease. The study is designed to detect a minimum 20% difference in response rates between the groups, if one exists, with an a priori α of 0.05 or less. The investigators are unclear whether the 20% difference between response rates is too large and think a smaller difference might be more clinically meaningful. In revising their study, they decide they want to be able to detect a minimum 10% difference in response. Which change to the study parameters is most appropriate?
A. Increase the sample size.
B. Select an α of 0.001 as a cutoff for statistical significance.
C. Select an α of 0.10 as a cutoff for statistical significance.
D. Decrease the sample size.


9. You are designing a new computer alert system to
investigate the impact of several factors on the risk
of corrected QT interval (QTc) prolongation. You
want to develop a model to predict which patients
are most likely to experience QTc prolongation
after the administration of certain drugs or the
presence of certain conditions. You plan to assess
the presence or absence of several different vari-
ables. Which technique will be most useful in com-
pleting such an analysis?
A. Correlation.
B. Kaplan-Meier curve.
C. Regression.
D. Confidence intervals.


I. INTRODUCTION TO STATISTICS

A. Method for Collecting, Classifying, Summarizing, and Analyzing Data

B. Useful Tool for Quantifying Clinical and Laboratory Data in a Meaningful Way

C. Assists in Determining Whether and by How Much a Treatment or Procedure Affects a Group of Patients

D. Why Pharmacists Need to Know Statistics

E. How Statistics Pertains to Most of You

1. Pharmacotherapy Specialty Examination content outline, Domain 2: Drug Information and Evidence-Based Medicine (25%)
2. Task statements:
a. Retrieve information that addresses pharmacotherapy-related inquiries in order to optimize patient care.
b. Evaluate pharmacotherapy-related literature, databases, and health information in order to translate findings into practice.
c. Conduct pharmacotherapy-related research using appropriate scientific principles in order to ensure optimal patient care.
d. Disseminate pharmacotherapy-related information or research in order to educate health care professionals and trainees.

F. Examples of Online Statistical and Study Design Tools


1. www.graphpad.com/quickcalcs/
2. http://statpages.org/

G. Several Papers Have Investigated the Types of Statistical Tests Used in the Biomedical Literature; data from one of these papers are illustrated in the text that follows. Tables 1 and 2 are modified from Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA 2007;298:1010-22.


Table 1. Statistical Content of Original Articles in New England Journal of Medicine, 2004–2005
Statistical Procedure | % of Articles Containing Methods
No statistics or descriptive statistics | 13
t-tests | 26
Contingency tables | 53
Nonparametric tests | 27
Epidemiologic statistics | 35
Pearson correlation | 3
Simple linear regression | 6
Analysis of variance | 16
Transformation | 10
Nonparametric correlation | 5
Survival methods | 61
Multiple regression | 51
Multiple comparisons | 23
Adjustment and standardization | 1
Multiway tables | 13
Power analyses | 39
Cost-benefit analysis | <1
Sensitivity analysis | 6
Repeated-measures analysis | 12
Missing-data methods | 8
Noninferiority trials | 4
Receiver operating characteristics | 2
Resampling | 2
Principal component and cluster analyses | 2
Other methods | 4
Multiple comparisons 23

Table 2. Statistical Content of Original Articles from Six Major Medical Journals from January to March 2005 (n=239 articles)(a)
Statistical Test | No. (%)
Descriptive statistics (mean, median, frequency, SD, and IQR) | 219 (91.6)
Simple statistics | 120 (50.2)
  Chi-square analysis | 70 (29.3)
  t-test | 48 (20.1)
  Kaplan-Meier analysis | 48 (20.1)
  Wilcoxon rank sum test | 38 (15.9)
  Fisher exact test | 33 (13.8)
  Analysis of variance | 21 (8.8)
  Correlation | 16 (6.7)
Multivariate analysis | 164 (68.6)
  Cox proportional hazards | 64 (26.8)
  Multiple logistic regression | 54 (22.6)
  Multiple linear regression | 7 (2.9)
  Other regression analysis | 38 (15.9)
None | 5 (2.1)
Others
  Intention-to-treat analysis | 42 (17.6)
  Incidence or prevalence | 39 (16.3)
  Relative risk or risk ratio | 29 (12.2)
  Sensitivity analysis | 21 (8.8)
  Sensitivity or specificity | 15 (6.3)

(a) Articles published in American Journal of Medicine, Annals of Internal Medicine, BMJ, Journal of the American Medical Association, Lancet, and New England Journal of Medicine.
IQR = interquartile range; SD = standard deviation.


II. TYPES OF VARIABLES AND DATA

A. Definition: Random Variables—A variable with observed values that may be considered outcomes of an
experiment and whose values cannot be anticipated with certainty before the experiment is conducted
B. Two Types of Random Variables
1. Discrete variables (e.g., dichotomous, categorical)
2. Continuous variables
C. Discrete Variables
1. Can take only a limited number of values within a given range
2. Nominal: Classified into groups in an unordered manner and with no indication of relative severity
(e.g., male or female sex, mortality [dead or alive], disease presence [yes or no], race, marital status)
3. Ordinal: Ranked in a specific order but with no consistent level of magnitude of difference between
ranks (e.g., NYHA functional class describes the functional status of patients with heart failure, and
subjects are classified in increasing order of symptoms: I, II, III, IV; Likert-type scales)
4. Common error: Measure of central tendency—In most cases, means and standard deviations (SDs)
should not be reported with ordinal data. What is a common incorrect use of means and SDs to show
ordinal data?
D. Continuous Variables, Sometimes Called Counting Variables
1. Continuous variables can take on any value within a given range.
2. Interval: Data are ranked in a specific order with a consistent change in magnitude between units; the
zero point is arbitrary (e.g., degrees Fahrenheit).
3. Ratio: Like “interval” but with an absolute zero (e.g., temperature in kelvins, heart rate, blood pressure, time, distance)

III. TYPES OF STATISTICS

A. Descriptive Statistics: Used to summarize and describe data that are collected or generated in research
studies. This is done both visually and numerically.
1. Visual methods of describing data
a. Frequency distribution
b. Histogram
c. Scatterplot
d. Boxplot
2. Numerical methods of describing data: Measures of central tendency
a. Arithmetic mean (i.e., average)
i. Sum of all values divided by the total number of values
ii. Should generally be used only for continuous and normally distributed data
iii. Very sensitive to outliers; pulled toward the tail that contains the outliers
iv. Most commonly used and most understood measure of central tendency
v. Geometric mean
b. Median
i. Midpoint of the values when placed in order from highest to lowest. Half of the observations
are above and half are below. When there is an even number of observations, it is the mean of
the two middle values.
ii. Also called 50th percentile
iii. Can be used for ordinal or continuous data (especially good for skewed populations)
iv. Insensitive to outliers


c. Mode
i. Most common value in a distribution
ii. Can be used for nominal, ordinal, or continuous data
iii. Sometimes, there may be more than one mode (e.g., bimodal, trimodal).
iv. Does not help describe meaningful distributions with a large range of values, each of which
occurs infrequently
3. Numerical methods of describing data: Measures of data spread or variability
a. Standard deviation
i. Measure of the variability about the mean; most common measure used to describe the spread
of data
ii. Square root of the variance (average squared difference of each observation from the mean);
returns variance back to original units (non-squared)
iii. Appropriately applied only to continuous data that are normally or near normally distributed
or that can be transformed to be normally distributed
iv. By the empirical rule for normal distributions, about 68% of the sample values are found within ±1 SD, about 95% within ±2 SD, and about 99.7% within ±3 SD.
v. The coefficient of variation relates the mean and the SD (SD/mean × 100%).
b. Range
i. Difference between the smallest and largest values in a data set; does not give much information by itself
ii. Easy to compute (simple subtraction)
iii. Size of range is very sensitive to outliers.
iv. Often reported as the actual values rather than the difference between the two extreme values
c. Percentiles
i. The value in a distribution below which a given percentage of the values in the sample fall; calculated by ranking all data in the data set
ii. The 75th percentile lies at a point at which 75% of the other values are smaller.
iii. Does not assume the population has a normal distribution (or any other distribution)
iv. The interquartile range (IQR) is an example of the use of percentiles to describe the middle 50% of values. The IQR spans the 25th to the 75th percentile.
4. Presenting data using only measures of central tendency can be misleading without some idea of data
spread. Studies that report only medians or means without their accompanying measures of data spread
should be closely scrutinized. What are the measures of spread that should be used with means and
medians?
5. Example data set (Table 3)

Table 3. Twenty Baseline HDL Concentrations from an Experiment Evaluating the Impact of Green Tea on HDL
64 60 59 65 64 62 54
54 68 67 79 55 48 65
59 65 87 49 46 46

a. Calculate the mean, median, and mode of the data set given in Table 3.
b. Calculate the range and SD (on examination, you will not have to do this by hand).
c. Evaluate the visual presentation of the data.
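As a sketch of exercise items a and b, Python’s standard `statistics` module can compute these summaries for the 20 HDL values in Table 3. The function names are illustrative only. The SD is the sample SD (n − 1 denominator), which is what `statistics.stdev` returns, and `statistics.quantiles` uses one of several percentile conventions (its default “exclusive” method), so hand-calculated quartiles may differ slightly. The last helper counts how many values fall within ±1 and ±2 SD, for comparison with the empirical rule.

```python
import statistics

# The 20 baseline HDL concentrations (mg/dL) from Table 3
hdl = [64, 60, 59, 65, 64, 62, 54,
       54, 68, 67, 79, 55, 48, 65,
       59, 65, 87, 49, 46, 46]

def describe(values):
    """Return common measures of central tendency and spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return {
        "n": len(values),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # a list, in case of ties
        "range": max(values) - min(values),
        "sd": sd,
        "cv_percent": sd / mean * 100,  # coefficient of variation
        "iqr": (q1, q3),  # 25th and 75th percentiles
    }

def fraction_within(values, k):
    """Share of observations within mean +/- k SD."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return sum(m - k * s <= v <= m + k * s for v in values) / len(values)

if __name__ == "__main__":
    for name, value in describe(hdl).items():
        print(f"{name}: {value}")
    print("within 1 SD:", fraction_within(hdl, 1))
    print("within 2 SD:", fraction_within(hdl, 2))
```

For Table 3 this gives a mean of 60.8 mg/dL, a median of 61 mg/dL, a mode of 65 mg/dL, a range of 41 mg/dL (46 to 87), and an SD of about 10.4 mg/dL; 70% of the values lie within ±1 SD and 95% within ±2 SD. The mean and median are close, which bears on the normality question asked later in this chapter.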


B. Inferential Statistics
1. Conclusions or generalizations made about a population (large group) from the study of a sample of that
population
2. Choosing and evaluating statistical methods depend, in part, on the type of data used.
3. An educated statement about an unknown population is commonly called an inference in statistics.
4. Statistical inference can be made by estimation or hypothesis testing.

IV. POPULATION DISTRIBUTIONS

A. Discrete Distributions
1. Binomial distribution
2. Poisson distribution

B. Normal (Gaussian) Distribution


1. Most common model for population distributions
2. Symmetric or bell-shaped frequency distribution
3. Landmarks for continuous, normally distributed data
a. µ: Population mean (equal to 0 for the standard normal distribution)
b. σ: Population SD (equal to 1 for the standard normal distribution)
c. x̄ and s: These represent the sample mean and SD.
4. When a random variable is measured in a large enough sample of any population, some values will
occur more often than will others.
5. A visual check of a distribution can help determine whether it is normally distributed (whether it appears
symmetric and bell shaped). Need the data to perform these checks.
a. Frequency distribution and histograms (visually look at the data; you should do this anyway)
b. Median and mean will be about equal for normally distributed data (most practical and easiest to
use).
c. Formal test: Kolmogorov-Smirnov test
d. More challenging to evaluate this when we do not have access to the data (when we are reading an
article), because most articles do not present all data or both the mean and median
6. The parameters mean and SD define a normally distributed population.
7. Probability: The likelihood that any one event will occur given all the possible outcomes
8. Estimation and sampling variability
a. One method that can be used to make an inference about a population parameter
b. Separate samples (even of the same size) from a single population will give slightly different
estimates.
c. The distribution of means from random samples approximates a normal distribution.
i. The mean of this “distribution of means” is equal to the unknown population mean, µ.
ii. The SD of the means is estimated by the standard error of the mean (SEM).
iii. As in any normal distribution, 95% of the sample means lie within ±2 SEM of the population
mean.
d. The distribution of means from these random samples is about normal regardless of the underlying
population distribution (central limit theorem). You will get slightly different mean and SD values
each time you repeat this experiment.
e. The SEM is estimated with a single sample by dividing the SD by the square root of the sample
size (n). The SEM quantifies uncertainty in the estimate of the mean, not variability in the sample.
Important for hypothesis testing and 95% CI estimation


f. Why is all this information about the difference between the SEM and SD worth knowing?
i. Calculation of CIs. (95% CI is approximately the mean ± 2 times the SEM.)
ii. Hypothesis testing
iii. Deception (e.g., reporting the SEM in place of the SD makes results look less “variable,” especially when used in graphic format)
9. Recall the previous example about HDL and green tea. From the calculated values in section III,
do these data appear to be normally distributed?
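The sampling-variability points in item 8 can be checked with a small simulation. This is a sketch under stated assumptions: the population is a deliberately skewed exponential distribution whose mean and SD are both 10, samples have n = 25, and the number of samples and random seeds are arbitrary illustrative choices. If the central limit theorem and the SEM formula hold as described, the mean of the sample means should land near 10 and their SD near σ/√n = 10/5 = 2, and the SEM from one sample (SD/√n) estimates that same quantity.

```python
import math
import random
import statistics

def simulate_sample_means(pop_mean=10.0, n=25, n_samples=5000, seed=1):
    """Draw n_samples random samples of size n from an exponential population
    (whose mean and SD both equal pop_mean) and return the sample means."""
    rng = random.Random(seed)
    rate = 1.0 / pop_mean  # expovariate takes the rate parameter, 1/mean
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(n_samples)]

def sem(sample):
    """Standard error of the mean estimated from a single sample: SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

if __name__ == "__main__":
    means = simulate_sample_means()
    # Should be close to the population mean of 10
    print("mean of sample means:", round(statistics.fmean(means), 2))
    # Should be close to sigma / sqrt(n) = 10 / 5 = 2
    print("SD of sample means:", round(statistics.stdev(means), 2))
    # One sample of 25 gives its own estimate of that same quantity
    rng = random.Random(2)
    one_sample = [rng.expovariate(0.1) for _ in range(25)]
    print("SEM from one sample:", round(sem(one_sample), 2))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the underlying population is strongly right-skewed.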

V. CONFIDENCE INTERVALS

A. Commonly Reported as a Way to Estimate a Population Parameter


1. In the medical literature, 95% CIs are the most commonly reported CIs. In repeated samples, 95% of all such CIs would include the true population value; this long-run coverage is what “confidence” refers to. In some cases, 90% or 99% CIs are reported. Why are 95% CIs most often reported?
2. Example
a. Assume a baseline birth weight in a group (n=51) with a mean ± SD of 1.18 ± 0.4 kg.
b. 95% CI is about equal to the mean ± 1.96 × SEM (or 2 × SEM). In reality, it depends on the distribution being used and is a bit more complicated.
c. What is the 95% CI? The 95% CI is calculated to be (1.07, 1.29). Strictly, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population mean; informally, we are 95% confident that the true mean of the entire population studied lies between 1.07 and 1.29 kg.
d. What is the 90% CI? The 90% CI is calculated to be (1.09, 1.27). The 95% CI will always be wider
than the 90% CI for any given sample. Therefore, the wider the CI, the more likely it is to encom-
pass the true population mean.
3. The differences between the SD, SEM, and CIs should be noted when interpreting the literature because
they are often used interchangeably. Although it is common for CIs to be confused with SDs, the infor-
mation each provides is quite different and has to be assessed correctly.
4. Recall the previous example about HDL and green tea. What is the 95% CI of the data set, and what
does that mean?
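The birth weight example can be reproduced with a short sketch that uses the normal (z) critical value, matching the “mean ± 1.96 × SEM” approximation in item 2b. The function name is hypothetical, and it relies on `statistics.NormalDist` from the Python standard library; as the text notes, an exact calculation may use a t distribution with n − 1 degrees of freedom, which gives a slightly wider interval.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=0.95):
    """Approximate CI for a mean: mean +/- z * SEM, where SEM = SD / sqrt(n)
    and z is the standard normal critical value for the chosen level."""
    sem = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return mean - z * sem, mean + z * sem

if __name__ == "__main__":
    # Birth weight example: n = 51, mean 1.18 kg, SD 0.4 kg
    for level in (0.90, 0.95, 0.99):
        low, high = z_confidence_interval(1.18, 0.4, 51, level)
        print(f"{level:.0%} CI: {low:.2f} to {high:.2f} kg")
```

This reproduces the (1.07, 1.29) and (1.09, 1.27) intervals above and shows the 99% interval is wider still. The same function can be applied to the Table 3 HDL data in item 4 by passing that sample’s mean, SD, and n of 20; with only 20 observations, a t-based interval would be somewhat wider than this z approximation.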

B. CIs Can Also Be Used for Any Sample Estimate. Estimates derived from categorical data such as risk, risk
differences, and risk ratios are often presented with the CI and will be discussed in the text that follows.

C. CIs Instead of Hypothesis Testing


1. Hypothesis testing and calculation of p-values tell us (ideally) whether there is or is not a statistically
significant difference between groups, but they do not tell us anything about the magnitude of the
difference.
2. CIs help us determine the importance of a finding or findings, which we can apply to a situation.
3. CIs give us an idea of the magnitude of the difference between groups and the statistical significance.
4. A CI is a range of plausible values for the population parameter, reported together with a point estimate (e.g., of the difference between groups).
5. Wide CIs
a. Many results are possible, either larger or smaller than the point estimate provided by the study.
b. All values contained in the CI are statistically plausible.
6. If the estimate is the difference between two means (continuous variables), a CI that includes zero (no difference between the groups) indicates that the result is not statistically significant at the corresponding α (for a 95% CI, a two-sided p-value of 0.05 or greater). There is no need to show both the 95% CI and the p-value.


7. The interpretation of CIs for odds ratios and relative risks is somewhat different. In that case, a value
of 1 indicates no difference in risk, and if the CI includes 1, there is no statistical difference. (See the
discussion of case-control/cohort in other sections for how to interpret CIs for odds ratios and relative
risks.)
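The two rules above (null value 0 for differences, 1 for ratios) can be written as one check. This is a minimal sketch with a hypothetical function name; the intervals in the usage lines are made-up numbers for illustration, not data from any study.

```python
def ci_excludes_null(lower, upper, null_value):
    """True if the CI excludes the null value, i.e., the result is statistically
    significant at the alpha matching the CI level (95% CI <-> two-sided 0.05)."""
    return not (lower <= null_value <= upper)

NULL_FOR_DIFFERENCE = 0.0  # differences between means or proportions
NULL_FOR_RATIO = 1.0       # relative risks and odds ratios

if __name__ == "__main__":
    # Hypothetical mean difference of 4 mm Hg (95% CI, -1 to 9): includes 0
    print(ci_excludes_null(-1, 9, NULL_FOR_DIFFERENCE))   # False: not significant
    # Hypothetical relative risk of 0.80 (95% CI, 0.68 to 0.94): excludes 1
    print(ci_excludes_null(0.68, 0.94, NULL_FOR_RATIO))   # True: significant
```

Note that the check only answers the significance question; the width and location of the interval still have to be judged for clinical importance.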

VI. HYPOTHESIS TESTING

A. Null and Alternative Hypotheses (see Table 4 for other types of examples)
1. Null hypothesis (H0): Example: No difference between groups being compared (treatment A equals treatment B)
2. Alternative hypothesis (HA): Example: Opposite of null hypothesis; states that there is a difference (treatment A does not equal treatment B)
3. The structure or the manner in which the hypothesis is written dictates which statistical test is used (e.g., two-sample t-test: H0: Mean1 = Mean2).
4. Used to assist in determining whether any observed differences between groups can be explained by
chance
5. Tests for statistical significance (hypothesis testing) determine whether the data are consistent with H0
(no difference).
6. The results of the hypothesis testing will indicate whether enough evidence exists for H0 to be rejected.
a. If H0 is rejected: Statistically significant difference between groups (unlikely attributable to chance)
b. If H0 is not rejected: No statistically significant difference between groups (any apparent differ-
ences may be attributable to chance). Note that we are not concluding that the treatments are equal.
7. Types of hypothesis testing. These are situations in which two groups are being compared. There are
numerous other examples of situations these procedures could be applied to (Table 4).

Table 4. Types of Hypothesis Testing

Nondirectional
  Difference
    Question: Are the means different?
    Hypothesis: H0: Mean1 = Mean2 and HA: Mean1 ≠ Mean2 (or H0: Mean1 − Mean2 = 0 and HA: Mean1 − Mean2 ≠ 0)
    Method: Traditional 2-sided t-test; confidence intervals
  Equivalence
    Question: Are the means practically equivalent?
    Hypothesis: H0: |Mean1 − Mean2| ≥ Δ and HA: |Mean1 − Mean2| < Δ
    Method: Two 1-sided t-test (TOST) procedure; confidence intervals
Directional
  Superiority
    Question: Is mean 1 > mean 2? (or some other similarly worded question)
    Hypothesis: H0: Mean1 ≤ Mean2 and HA: Mean1 > Mean2 (or H0: Mean1 − Mean2 ≤ 0 and HA: Mean1 − Mean2 > 0)
    Method: Traditional 1-sided t-test; confidence intervals
  Noninferiority
    Question: Is mean 1 no more than a certain amount lower than mean 2?
    Hypothesis: H0: Mean1 − Mean2 ≤ −Δ and HA: Mean1 − Mean2 > −Δ
    Method: Confidence intervals

Δ = equivalence or noninferiority margin (Δ > 0); H0 = null hypothesis; HA = alternative hypothesis.
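The confidence-interval approach listed in Table 4 can be sketched as a comparison of a CI for (Mean1 − Mean2) against 0 and the margin Δ. This sketch assumes larger values are better and Δ > 0, and the function name and example intervals are hypothetical. It simply uses whatever interval is passed in; conventions for the CI level differ (for example, TOST at α = 0.05 corresponds to checking whether a 90% CI lies within −Δ to +Δ).

```python
def classify_difference(lower, upper, margin):
    """Interpret a CI (lower, upper) for Mean1 - Mean2, assuming larger values
    are better and margin (Delta) > 0. Returns the conclusions it supports."""
    conclusions = []
    if lower > 0:
        conclusions.append("superior")      # whole CI above 0
    if lower > -margin:
        conclusions.append("noninferior")   # whole CI above -Delta
    if -margin < lower and upper < margin:
        conclusions.append("equivalent")    # whole CI inside (-Delta, +Delta)
    return conclusions

if __name__ == "__main__":
    delta = 3.0  # hypothetical margin
    for ci in [(-1.5, 2.0), (0.5, 4.0), (-4.0, 1.0)]:
        print(ci, classify_difference(*ci, delta) or ["no conclusion"])
```

The three hypothetical intervals illustrate that a result can be noninferior without being superior, superior without being equivalent, or inconclusive when the interval crosses −Δ.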


B. To Determine What Is Sufficient Evidence to Reject H0: Set the a priori significance level (α) and generate
the decision rule.
1. Developed after the research question has been stated in hypothesis form
2. Used to determine the level of acceptable error caused by a false positive (also known as level of
significance)
a. Convention: A priori α is usually 0.05.
b. Critical value is calculated, capturing how extreme the sample data must be to reject H0.

C. Perform the Experiment and Estimate the Test Statistic.


1. A test statistic is calculated from the observed data in the study, which is compared with the critical
value.
2. Depending on this test statistic’s value, H0 is not rejected (often called fail to reject) or rejected.
3. In general, the test statistic and critical value are not presented in the literature; instead, p-values are generally reported and compared with a priori α values to assess statistical significance. p-value: Probability, assuming H0 is true, of obtaining a test statistic as extreme as or more extreme than the one actually obtained
4. Because computers are used in these tests, this step is often transparent; the p-value estimated in the
statistical test is compared with the a priori α (usually 0.05), and the decision is made.
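Because software usually hides this step, a sketch of sections B and C may help: set α, derive the critical value, compute a test statistic, and compare either the statistic with the critical value or the p-value with α (the two comparisons always agree). This sketch substitutes a large-sample two-sample z-test for the t-test, because Python’s standard library has no t distribution; the function name and the blood pressure summary statistics are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Large-sample two-sided z-test of H0: Mean1 = Mean2.
    Returns (test statistic, critical value, p-value, reject H0?)."""
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # SE of the difference
    z = (mean1 - mean2) / se_diff                    # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
    reject = abs(z) > critical                       # same decision as p_value < alpha
    return z, critical, p_value, reject

if __name__ == "__main__":
    # Hypothetical systolic BP: 130 +/- 15 vs. 135 +/- 15 mm Hg, 100 per group
    z, crit, p, reject = two_sample_z_test(130, 15, 100, 135, 15, 100)
    print(f"z = {z:.2f}, critical = {crit:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With these made-up numbers, z is about −2.36 and p is about 0.018, so H0 would be rejected at an a priori α of 0.05.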

VII. STATISTICAL TESTS AND CHOOSING A STATISTICAL TEST

A. Which Tests Do You Need to Know?

B. Choosing the Appropriate Statistical Test Depends on the Following:


1. Type of data (nominal, ordinal, or continuous)
2. Distribution of data (e.g., normal)
3. Number of groups
4. Study design (e.g., parallel, crossover)
5. Presence of confounding variables
6. One-tailed versus two-tailed
7. Parametric versus nonparametric tests
a. Parametric tests assume the following:
i. Data being investigated have an underlying distribution that is normal or close to normal
or, more correctly, randomly drawn from a parent population with a normal distribution.
Remember how to estimate this (mean ~ median)?
ii. Data measured are continuous data, measured on either an interval or a ratio scale.
iii. Data being investigated have variances that are homogeneous between the groups investigated. This is often called homoscedasticity.
b. Nonparametric tests are used when data are not normally distributed or do not meet other criteria
for parametric tests (e.g., discrete data).
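The informal checks in this section (mean roughly equal to median; homogeneous variances, using the ratio-greater-than-2 rule of thumb given with the t-test below) can be sketched as follows. These are screens, not the formal tests named in the text (Kolmogorov-Smirnov, F test); the function names and both data sets are hypothetical.

```python
import statistics

def mean_and_median(values):
    """Return (mean, median); a large gap between them suggests skewness."""
    return statistics.mean(values), statistics.median(values)

def variance_ratio(group1, group2):
    """Ratio of the larger sample variance to the smaller one."""
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return max(v1, v2) / min(v1, v2)

def screen_for_t_test(group1, group2, ratio_cutoff=2.0):
    """Informal screen before an equal-variance t-test (not a formal test)."""
    ratio = variance_ratio(group1, group2)
    return {
        "group1_mean_median": mean_and_median(group1),
        "group2_mean_median": mean_and_median(group2),
        "variance_ratio": ratio,
        "variances_look_different": ratio > ratio_cutoff,
    }

if __name__ == "__main__":
    # Hypothetical data for illustration only
    group_a = [22, 25, 27, 24, 30, 26, 23, 28]
    group_b = [20, 35, 18, 40, 22, 31, 27, 45]
    print(screen_for_t_test(group_a, group_b))
    # A right-skewed hypothetical sample: the mean is pulled above the median
    print(mean_and_median([1, 2, 2, 3, 3, 3, 4, 20]))
```

Here the variance ratio is well above 2, which would argue for an unequal-variance t-test or a nonparametric alternative rather than the equal-variance t-test.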


C. Parametric Tests
1. Student t-test: Several different types
a. One-sample test: Compares the mean of the study sample with the population mean

Design: Group 1 vs. known population mean

b. Two-sample, independent samples, or unpaired test: Compares the means of two independent
samples. This is an independent samples test.

Design: Group 1 vs. Group 2 (independent groups)

i. Equal variance test


(a) Rule for variances: If the ratio of larger variance to smaller variance is greater than 2,
we generally conclude the variances are different.
(b) Formal test for differences in variances: F test
(c) Adjustments can be made for cases of unequal variance.
ii. Unequal variance test (e.g., Welch t-test)
c. Paired test: Compares the mean difference of paired or matched samples. This is a related samples
test.

(Schematic: Group 1, Measurement 1 vs. Measurement 2)

d. Common error: Using multiple t-tests to compare more than two groups, which inflates the overall type I error rate
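The two independent-samples t statistics described above (equal variance, which pools the two variances, and unequal variance, the Welch version) can be computed from summary statistics, as in this sketch. It returns only the t statistic and its degrees of freedom. A p-value requires the t distribution, which the Python standard library does not provide, so a table or a statistics package would be needed. The function names are illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (Student) two-sample t statistic and degrees of freedom."""
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_eq, df_eq = pooled_t(10.0, 2.0, 10, 8.0, 2.0, 10)
t_w, df_w = welch_t(10.0, 2.0, 10, 8.0, 2.0, 10)
```

With equal SDs and equal group sizes, the two versions give the same t and df. They diverge as the variances and sample sizes become unequal, and the Welch df then falls below n1 + n2 − 2.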
2. Analysis of variance (ANOVA): A more generalized version of the t-test that can apply to more than two
groups
a. One-way ANOVA: Compares the means of three or more groups in a study; also known as
single-factor ANOVA. This is an independent samples test.

(Schematic: Group 1 vs. Group 2 vs. Group 3)

b. Two-way ANOVA: Additional factor (e.g., age) added

(Schematic: Young groups: Group 1, Group 2, Group 3; Old groups: Group 1, Group 2, Group 3)

c. Repeated-measures ANOVA: This is a related samples test.

(Schematic: Group 1, related measurements: Measurement 1, Measurement 2, Measurement 3)

d. Several more complex factorial ANOVAs can be used.


e. Multiple-comparison procedures (post hoc tests) are used to determine which groups actually differ from each other: Tukey HSD (Honestly Significant Difference), Bonferroni, Scheffé, Newman-Keuls.
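One-way ANOVA compares the variability between group means with the variability within groups. The sketch below computes the F statistic and its degrees of freedom only. The p-value needs the F distribution, which is not in the Python standard library, and the post hoc comparisons listed above are not performed. The function name is illustrative.

```python
import statistics


def one_way_anova(*groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

With exactly two groups, this F equals the square of the equal-variance t statistic. This is one sense in which ANOVA generalizes the t-test.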
3. Analysis of covariance (ANCOVA): Provides a method to explain the influence of a categorical variable
(independent variable) on a continuous variable (dependent variable) while statistically controlling for
other variables (confounding)


D. Nonparametric Tests
1. These tests may also be used for continuous data that do not meet the assumptions of the t-test or
ANOVA.
2. Tests for independent samples
a. Wilcoxon rank sum test, Mann-Whitney U test, or Wilcoxon-Mann-Whitney test: Compares two independent samples (nonparametric analog of the unpaired t-test)
b. Kruskal-Wallis one-way ANOVA by ranks
i. Compares three or more independent groups (related to one-way ANOVA)
ii. Requires post hoc testing to identify which groups differ
3. Tests for related or paired samples
a. Sign test and Wilcoxon signed-rank test: Compare two matched or paired samples (related to a paired t-test)
b. Friedman ANOVA by ranks: Compares three or more matched or paired groups (related to repeated-measures ANOVA)
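The rank-based logic of the Wilcoxon rank sum (Mann-Whitney U) test can be sketched as follows. Pool both samples and rank them, giving tied values the average of their ranks. Then sum the ranks of one sample and convert that rank sum to U. The sketch returns U for each sample. Usually the smaller U is compared with a table or a normal approximation to obtain a p-value, which is not shown here. Function names are illustrative.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2), derived from the Wilcoxon rank sum of sample1."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(ranks[:n1])  # Wilcoxon rank sum for sample1
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


print(mann_whitney_u([3, 5, 8], [1, 4, 9, 10]))
```

The rank sum and U carry the same information, which is why the test goes by both names.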

E. Nominal Data
1. Chi-square (χ²) test: Compares expected and observed proportions between two or more groups
a. Test of independence
b. Test of goodness of fit
2. Fisher exact test: Specialized alternative to the chi-square test for small groups (cells) containing fewer than five expected observations
3. McNemar test: Paired samples
4. Mantel-Haenszel: Controls for the influence of confounders
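For a 2 × 2 contingency table, the chi-square test of independence compares observed counts with the counts expected if group and outcome were unrelated (row total × column total ÷ grand total). The sketch below computes the Pearson statistic without a continuity correction; some software applies the Yates correction to 2 × 2 tables by default. It derives the p-value from the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. It also returns the smallest expected count so the fewer-than-five rule for preferring the Fisher exact test can be checked. The function name is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2 x 2 table [[a, b], [c, d]] of counts.

    Returns (chi-square statistic, p-value with 1 df, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


chi2, p, smallest = chi_square_2x2([[3, 1], [1, 3]])
if smallest < 5:
    print("Expected count below 5: consider the Fisher exact test instead.")
```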

F. Correlation and Regression (see section IX)

G. Choosing the Most Appropriate Statistical Test: Example 1


1. A trial was conducted to determine the efficacy and safety of alirocumab in reducing lipids and cardiovascular events. Alirocumab plus statins was compared with placebo plus statins regarding their
effect on low-density lipoprotein cholesterol (LDL) concentrations. The trial was designed such that the
subjects’ baseline characteristics were as comparable as possible with each other. The intended primary
end point for this trial was the difference in LDL between the two treatments at week 24. The full trial
is published: N Engl J Med 2015;372:1489-99. Note that only partial results are presented. The results of
the trial are reported as follows:

Table 5. Baseline Characteristics and Alirocumab and Placebo Effect on LDLᵃ

Characteristic | Alirocumab plus Statins (n=1553) | Placebo plus Statins (n=788)
Men/women, n (% men) | 983/570 (63.3%) | 474/314 (60.2%)
Smokers/nonsmokers, n (% smokers) | 325/1228 (20.9%) | 159/629 (20.2%)
Baseline LDL, mg/dL | 122.8 ± 42.7 | 122.0 ± 41.6
Final LDL, mg/dL | 48.3 ± 35.2 | 118.9 ± 33.5

ᵃData are presented as mean ± SD.


2. Which is the appropriate statistical test to determine baseline differences in the following:
a. Sex distribution?
b. LDL?
c. Percentage of smokers and nonsmokers?
3. Which is the appropriate statistical test to determine the following:
a. The effect of alirocumab plus statins on LDL?
b. The primary end point?

VIII. DECISION ERRORS

Table 6. Summary of Decision Errors

Columns: underlying truth or reality; rows: test result
Test Result | H0 is true (no difference) | H0 is false (difference)
Do not reject H0 (no difference) | No error (correct decision) | Type II error (β error)
Reject H0 (difference) | Type I error (α error) | No error (correct decision)
H0 = null hypothesis.

A. Type I Error: The probability of making this error is defined as the significance level α.
1. Convention is to set α to 0.05, meaning that when H0 is true, it will be incorrectly rejected (a type I error) 1 in 20 times. Thus, when no true difference exists, a researcher will conclude 5.0% of the time that there is a statistically significant difference.
2. The p-value is calculated from the study data and compared with α; it is commonly described as the calculated chance that a type I error has occurred.
3. The p-value tells us the likelihood of obtaining a given (or a more extreme) test result if H0 is true. When the α level is set a priori, H0 is rejected when p is less than α. Strictly, because the p-value is calculated assuming that H0 is true, it is not the probability that H0 is true; informally, it is read as the risk of a false-positive conclusion when we conclude that a true difference exists.
4. A lower p-value does not mean the result is more important or more meaningful, only that the result is statistically significant and less likely to be attributable to chance alone.
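The meaning of α = 0.05 can be checked by simulation. If both groups are drawn from the same population (so H0 is true) and each simulated trial is tested at α = 0.05, about 5% of trials should reject H0, and every one of those rejections is a type I error. The sketch assumes normally distributed data with a known SD of 1 so that a simple two-sample z test can be run with only the standard library. The trial count, group size, seed, and function name are arbitrary, illustrative choices.

```python
import random
import statistics


def simulated_type_i_error_rate(n_trials=4000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated trials that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    rejections = 0
    for _ in range(n_trials):
        # Both groups come from the SAME population (mean 0, SD 1), so H0 is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Two-sample z test with the population SD known to be 1.
        z = (statistics.fmean(a) - statistics.fmean(b)) / (2 / n_per_group) ** 0.5
        if abs(z) > z_crit:
            rejections += 1  # a false positive: a type I error
    return rejections / n_trials


print(simulated_type_i_error_rate())
```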

B. Type II Error: The probability of making this error is called beta.


1. Concluding that no difference exists when one truly does (not rejecting H0 when it should be rejected)
2. By convention, β is set at 0.10–0.20 (i.e., power of 80%–90%).

C. Power (1 − β)
1. The probability of making a correct decision when H0 is false; the ability to detect differences between
groups if one actually exists
2. Dependent on the following factors:
a. Predetermined α
b. Sample size
c. The size of the difference between the outcomes you want to detect. This is often not known before conducting the experiment, so to estimate the power of your test, you will have to specify how large a change is worth detecting.
d. The variability of the outcomes that are being measured
e. Items c and d are generally estimated from previous data or the literature.


3. Power is decreased by the following (in addition to the earlier criteria):


a. Poor study design
b. Incorrect statistical tests (use of nonparametric tests when parametric tests are appropriate)
4. Statistical power analysis and sample size calculation
a. Related to the previous discussion of power and sample size
b. Sample size estimates should be performed in all studies a priori.
c. Necessary components for estimating appropriate sample size
i. Acceptable type II error rate (usually 0.10–0.20)
ii. Difference in study outcomes that is considered clinically significant (the smallest difference worth detecting)
iii. The expected variability in item ii
iv. Acceptable type I error rate (usually 0.05)
v. Statistical test that will be used for primary end point
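The components listed above combine in the common normal-approximation formula for comparing two means: n per group = 2 × [z(1 − α/2) + z(1 − β)]² × σ² / Δ². Here Δ is the clinically significant difference and σ is the expected SD. The sketch assumes a two-tailed test, two equal-sized groups, and a common, known SD. t-based calculations and dedicated software give slightly larger numbers, and other end points (proportions, time to event) use different formulas. Function names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta` (two-tailed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # type I error rate, split across two tails
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole subjects


def approximate_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z test (ignores the far tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    return z.cdf(abs(delta) / se - z_alpha)


print(sample_size_per_group(delta=0.5, sd=1.0))
```

For a difference of half an SD at α = 0.05 and 80% power, this gives 63 per group. Halving the difference to be detected roughly quadruples the required sample size.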
5. Statistical significance versus clinical significance
a. As stated earlier, the size of the p-value is not necessarily related to the clinical importance of the
result. Smaller values mean only that chance is less likely to explain observed differences.
b. Statistically significant does not necessarily mean clinically significant.
c. Lack of statistical significance does not mean that results are not clinically important.
d. When considering nonsignificant findings, consider sample size, estimated power, and observed
variability.
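To show why statistical and clinical significance can diverge, the sketch below computes a two-tailed p-value for the same small difference in means at two sample sizes. The numbers are hypothetical: a 0.5-mm Hg difference in blood pressure with an SD of 10 mm Hg. A z test with a known common SD is assumed for simplicity, and the function name is illustrative.

```python
import math


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-tailed p-value for a difference in means, assuming a known common SD."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z


# Hypothetical example: identical 0.5-mm Hg difference, very different sample sizes.
for n in (50, 100_000):
    print(n, two_sample_z_p_value(0.5, 10.0, n))
```

With 50 subjects per group, the p-value is about 0.8. With 100,000 per group, it is far below 0.05, even though a 0.5-mm Hg difference is unlikely to matter clinically.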

IX. CORRELATION AND REGRESSION

A. Introduction: Correlation vs. Regression


1. Correlation examines the strength of the association between two variables. It does not necessarily
assume that one variable is useful in predicting the other.
2. Regression examines the ability of one or more variables to predict another variable.

B. Pearson Correlation
1. The strength of the relationship between two variables that are normally distributed, ratio or interval
scaled, and linearly related is measured with a correlation coefficient.
2. Often called the degree of association between the two variables
3. Does not necessarily imply that one variable is dependent on the other (regression analysis, by contrast, models one variable as dependent on another)
4. Pearson correlation (r) ranges from −1 to +1 and can take any value in between:
a. r = −1: Perfect negative linear relationship
b. r = 0: No linear relationship
c. r = +1: Perfect positive linear relationship

5. Hypothesis testing is performed to determine whether the correlation coefficient is different from zero.
This test is highly influenced by sample size.

C. Pearls About Correlation


1. The closer the magnitude of r to 1 (either + or −), the more highly correlated the two variables. The weaker the relationship between the two variables, the closer r is to 0.
2. There is no agreed-on or consistent interpretation of the value of the correlation coefficient. It is dependent on the environment of the investigation (laboratory vs. clinical experiment).


3. Pay more attention to the magnitude of the correlation than to the p-value because the p-value is heavily influenced by sample size.
4. Crucial to the proper use of correlation analysis is the interpretation of the graphic representation of
the two variables. Before using correlation analysis, it is essential to generate a scatterplot of the two
variables to visually examine the relationship.
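Pearls 1 and 3 can be made concrete. Pearson r is the covariance of the two variables scaled by their SDs. The usual test of the hypothesis that the population correlation is zero uses t = r × √[(n − 2)/(1 − r²)] with n − 2 degrees of freedom. Because n appears in that statistic, the same r becomes "more significant" as the sample grows. The function names below are illustrative. As pearl 4 notes, the calculation does not replace looking at a scatterplot.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing H0: population correlation = 0.

    Undefined when r is exactly +1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


for n in (20, 500):
    print(n, round(t_for_r(0.3, n), 2))
```

For r = 0.3, n = 20 gives t of about 1.33, below the two-tailed 0.05 critical value of about 2.10 for 18 df. n = 500 gives t of about 7.0, so the same weak correlation is highly "significant."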

D. Spearman Rank Correlation: Nonparametric test that quantifies the strength of an association between two
variables but does not assume a normal distribution of continuous data. Can be used for ordinal data or
nonnormally distributed continuous data
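One common way to compute the Spearman rank correlation is to replace each variable's values with their ranks (ties receiving the average rank) and then compute the Pearson correlation on those ranks. The sketch follows that approach, and the helper names are illustrative. For a relationship that is perfectly monotonic but not linear, Spearman's coefficient is 1 while Pearson r is below 1.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend across tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but curved
print(spearman_rho(x, y), pearson_r(x, y))
```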

E. Regression
1. A statistical technique related to correlation. There are many different types. Simple linear regression has one continuous outcome (dependent) variable and one continuous independent (predictor) variable.
2. Two main purposes of regression: Development of prediction model and accuracy of prediction
3. Prediction model: Making predictions of the dependent variable from the independent variable;
Y = mX + b (dependent variable = slope × independent variable + intercept)
4. Accuracy of prediction: How well the independent variable predicts the dependent variable. Regression
analysis determines the extent of variability in the dependent variable that can be explained by the
independent variable.
a. The coefficient of determination (r²) describes this relationship. Values of r² can range from 0 to 1.
b. An r² of 0.80 could be interpreted as saying that 80% of the variability in Y is explained by the variability in X.
c. This does not provide a mechanistic understanding of the relationship between X and Y but rather
a description of how clearly such a model (linear or otherwise) describes the relationship between
the two variables.
d. Like the interpretation of r, the interpretation of r² is dependent on the scientific arena (e.g., clinical research, basic research, social science research) to which it is applied.
5. For simple linear regression, two statistical tests can be used.
a. To test the hypothesis that the y-intercept differs from zero
b. To test the hypothesis that the slope of the line is different from zero
6. Regression is useful in constructing predictive models. The literature is full of examples of predictions.
The process involves developing a formula for a regression line that best fits the observed data.
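Fitting the regression line Y = mX + b by ordinary least squares, and computing r² as the share of the variability in Y explained by the line, can be sketched as follows. The function name is illustrative. The hypothesis tests on the slope and intercept, and any model diagnostics, are not included.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of Y = mX + b; returns (m, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    # r^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


print(simple_linear_regression([1, 2, 3, 4], [2, 1, 4, 3]))
```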
7. Like correlation, there are many different types of regression analysis.
a. Multiple linear regression: One continuous dependent variable and two or more independent variables
b. Simple logistic regression: One categorical response (dependent) variable and one continuous or
categorical explanatory (independent) variable
c. Multiple logistic regression: One categorical response (dependent) variable and two or more continuous or categorical explanatory (independent) variables
d. Nonlinear regression: Variables are not linearly related (or cannot be transformed into a linear
relationship). This is where our pharmacokinetic equations come from.
e. Polynomial regression: One response variable and one or more continuous explanatory variables with a curvilinear relationship (e.g., squared or cubed terms)


8. Example of regression
a. The following data are taken from a study evaluating enoxaparin use. The authors were interested
in predicting patient response (measured as anti-factor Xa concentrations) from the enoxaparin
dose in the 75 subjects who were studied.

[Plot: antifactor Xa concentrations (U/mL; y-axis, 0.00–1.20) versus enoxaparin dose (mg/kg; x-axis, 0.00–4.00)]

Figure 1. Relationship between antifactor Xa concentrations and enoxaparin dose.

b. The authors performed regression analysis and reported the following: slope, 0.227; y-intercept, 0.097; p<0.05; r² = 0.31.
c. Answer the following questions:
i. What are the assumptions necessary to use regression analysis?
ii. Provide an interpretation of the coefficient of determination.
iii. Predict anti-factor Xa concentrations at enoxaparin doses of 2 and 3.75 mg/kg.
iv. What does the p<0.05 value indicate?
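Question iii is a direct application of the prediction model Y = mX + b using the reported slope (0.227) and y-intercept (0.097). A minimal sketch, with a hypothetical function name:

```python
def predict_anti_xa(dose_mg_per_kg, slope=0.227, intercept=0.097):
    """Predicted anti-factor Xa concentration (U/mL) from Y = mX + b."""
    return slope * dose_mg_per_kg + intercept


for dose in (2.0, 3.75):
    print(f"{dose} mg/kg: {predict_anti_xa(dose):.2f} U/mL")
```

This gives about 0.55 U/mL at 2 mg/kg and about 0.95 U/mL at 3.75 mg/kg.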

X. SURVIVAL ANALYSIS

A. Studies the Time Between Entry in a Study and Some Event (e.g., death, myocardial infarction)
1. Censoring makes survival methods unique; considers that some subjects leave the study for reasons
other than the event (e.g., lost to follow-up, end of study period)
2. Accounts for subjects entering the study at different times
3. Standard methods of statistical analysis such as t-tests and linear or logistic regression may not be
appropriately applied to survival data because of censoring.


B. Estimating the Survival Function


1. Kaplan-Meier method
a. Uses survival times (or censored survival times) to estimate the proportion of people who would
survive a given length of time under the same circumstances
b. Allows the production of a table (life table) and a graph (survival curve)
c. We can visually evaluate the curves, but we need a test to evaluate them formally.
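The Kaplan-Meier estimate multiplies, at each time an event occurs, the proportion of at-risk subjects who survive past that time. Censored subjects count as at risk until they leave the study and then drop out of the risk set without being counted as events. The sketch below uses a hypothetical function name and input format. It treats subjects censored at the same time as an event as still at risk at that time, which is a common convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (hypothetical helper).

    times: follow-up time for each subject.
    events: 1 if the event occurred at that time, 0 if the subject was censored.
    Returns a list of (event time, estimated survival probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Tally every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censorings at t leave the risk set afterward.
        at_risk -= leaving
    return curve


steps = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
print(steps)
```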
2. Log-rank test: Compares the survival distributions between two or more groups.
a. This test does not allow an analysis of the effects of several variables, nor does it estimate the magnitude of the difference between groups or its CI (see the text that follows for the Cox proportional hazards model).
b. H0: No difference in survival between the two populations
c. Log-rank test uses several assumptions.
i. Random sampling and subjects chosen independently
ii. Consistent criteria for entry or end point
iii. Baseline survival rate does not change as time progresses.
iv. Censored subjects have the same average survival time as uncensored subjects.
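At each event time, the log-rank test compares the number of events observed in one group with the number expected if survival were the same in both groups. For two groups, it combines these differences into a chi-square statistic with 1 degree of freedom. The sketch is a minimal two-group version. The function name, the 0/1 group coding, and the input format are assumptions, and it does not handle edge cases such as a zero variance.

```python
import math


def log_rank_test(times, events, groups):
    """Two-group log-rank test (hypothetical helper).

    times: follow-up time per subject; events: 1 = event, 0 = censored;
    groups: 0 or 1 per subject. Returns (chi-square with 1 df, p-value).
    """
    subjects = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in subjects if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        # Risk set: everyone whose follow-up has not ended before time t.
        at_risk = [g for ti, _, g in subjects if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e, _ in subjects if ti == t and e)
        d1 = sum(1 for ti, e, g in subjects if ti == t and e and g == 1)
        # Expected group-1 events if both groups share the same survival.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return chi2, p_value


print(log_rank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1]))
```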
3. Cox proportional hazards model
a. Most popular method to evaluate the impact of covariates; reported (graphically) like Kaplan-Meier
b. Investigates several variables at a time
c. Actual method of construction and calculation is complex.
d. Compares survival in two or more groups after adjusting for other variables
e. Allows calculation of a hazard ratio (and CI)

XI. SELECTED REPRESENTATIVE STATISTICAL TESTS

Table 7. Representative Statistical Testsᵃ

Type of Variable | 2 Groups (independent) | 2 Groups (related) | > 2 Groups (independent) | > 2 Groups (related)
Nominal | χ² or Fisher exact test | McNemar test | χ² | Cochran Q
Ordinal | Wilcoxon rank sum, Mann-Whitney U test, or Wilcoxon-Mann-Whitney | Wilcoxon signed-rank or sign test | Kruskal-Wallis (MCP) | Friedman ANOVA
Continuous, no factors | Equal variance t-test or unequal variance t-test | Paired t-test | One-way ANOVA (MCP) | Repeated-measures ANOVA
Continuous, 1 factor | ANCOVA | Two-way repeated-measures ANOVA | Two-way ANOVA (MCP) | Two-way repeated-measures ANOVA

ᵃFor expanded table, see: DiCenzo R, ed. Clinical Pharmacist’s Guide to Biostatistics and Literature Evaluation. Lenexa, KS: ACCP, 2015.
ANCOVA = analysis of covariance; ANOVA = analysis of variance; MCP = multiple-comparisons procedure.
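Table 7 can also be expressed as a simple lookup, which may help when reviewing. The sketch covers only the rows without additional factors, and the dictionary and function names are hypothetical. It is a memory aid rather than a decision rule. Choosing a test also depends on the distribution of the data, study design, confounders, and the other considerations listed in section VII.B.

```python
# Representative tests from Table 7 (rows without additional factors).
# Keys: (variable type, "2" or ">2" groups, related samples?)
REPRESENTATIVE_TESTS = {
    ("nominal", "2", False): "Chi-square or Fisher exact test",
    ("nominal", "2", True): "McNemar test",
    ("nominal", ">2", False): "Chi-square",
    ("nominal", ">2", True): "Cochran Q",
    ("ordinal", "2", False): "Wilcoxon rank sum (Mann-Whitney U) test",
    ("ordinal", "2", True): "Wilcoxon signed-rank or sign test",
    ("ordinal", ">2", False): "Kruskal-Wallis (MCP)",
    ("ordinal", ">2", True): "Friedman ANOVA",
    ("continuous", "2", False): "Equal or unequal variance t-test",
    ("continuous", "2", True): "Paired t-test",
    ("continuous", ">2", False): "One-way ANOVA (MCP)",
    ("continuous", ">2", True): "Repeated-measures ANOVA",
}


def suggest_test(variable_type, n_groups, related):
    """Look up the representative test for a variable type and study layout."""
    if n_groups < 2:
        raise ValueError("Table 7 covers comparisons of two or more groups")
    size = "2" if n_groups == 2 else ">2"
    return REPRESENTATIVE_TESTS[(variable_type.lower(), size, related)]


print(suggest_test("continuous", 2, related=True))
```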


REFERENCES

1. Crawford SL. Correlation and regression. Circulation 2006;114:2083-8.
2. Davis RB, Mukamal KJ. Hypothesis testing: means. Circulation 2006;114:1078-82.
3. DeYoung GR. Understanding biostatistics: an approach for the clinician. In: Zarowitz B, Shumock G, Dunsworth T, et al., eds. Pharmacotherapy Self-Assessment Program, 5th ed. Kansas City, MO: ACCP, 2005:1-20.
4. DiCenzo R, ed. Clinical Pharmacist’s Guide to Biostatistics and Literature Evaluation. Lenexa, KS: ACCP, 2015.
5. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 1: basic concepts. Ann Emerg Med 1990;19:86-9.
6. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 2: descriptive statistics. Ann Emerg Med 1990;19:309-15.
7. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 3: sensitivity, specificity, predictive value, and hypothesis testing. Ann Emerg Med 1990;19:591-7.
8. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 4: statistical inference techniques in hypothesis testing. Ann Emerg Med 1990;19:820-5.
9. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 5: statistical inference techniques for hypothesis testing with nonparametric data. Ann Emerg Med 1990;19:1054-9.
10. Gaddis ML, Gaddis GM. Introduction to biostatistics, part 6: correlation and regression. Ann Emerg Med 1990;19:1462-8.
11. Harper ML. Biostatistics for the clinician. In: Zarowitz B, Shumock G, Dunsworth T, et al., eds. Pharmacotherapy Self-Assessment Program, 4th ed. Kansas City, MO: ACCP, 2002:183-200.
12. Hayney MS, Meek PD. Essential clinical concepts of biostatistics. In: Carter BL, Lake KD, Raebel MA, et al., eds. Pharmacotherapy Self-Assessment Program, 3rd ed. Kansas City, MO: ACCP, 1999:19-46.
13. Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emerg Med J 2003;20:453-8.
14. Kier KL. Biostatistical methods in epidemiology. Pharmacotherapy 2011;31:9-22.
15. Kusuoka H, Hoffman JIE. Advice on statistical analysis for circulation research. Circ Res 2002;91:662-71.
16. Larson MG. Analysis of variance. Circulation 2008;117:115-21.
17. Larson MG. Descriptive statistics and graphical displays. Circulation 2006;114:76-81.
18. Overholser BR, Sowinski KM. Biostatistics primer, part 1. Nutr Clin Pract 2007;22:629-35.
19. Overholser BR, Sowinski KM. Biostatistics primer, part 2. Nutr Clin Pract 2008;23:76-84.
20. Rao SR, Schoenfeld DA. Survival methods. Circulation 2007;115:109-13.
21. Rector TS, Hatton RC. Statistical concepts and methods used to evaluate pharmacotherapy. In: Zarowitz B, Shumock G, Dunsworth T, et al., eds. Pharmacotherapy Self-Assessment Program, 2nd ed. Kansas City, MO: ACCP, 1997:130-61.
22. Strassels SA. Biostatistics. In: Dunsworth TS, Richardson MM, Chant C, et al., eds. Pharmacotherapy Self-Assessment Program, 6th ed. Lenexa, KS: ACCP, 2007:1-16.
23. Sullivan LM. Estimation from samples. Circulation 2006;114:445-9.
24. Tsuyuki RT, Garg S. Interpreting data in cardiovascular disease clinical trials: a biostatistical toolbox. In: Richardson MM, Chant C, Cheng JWM, et al., eds. Pharmacotherapy Self-Assessment Program, 7th ed. Lenexa, KS: ACCP, 2010:241-55.
25. Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA 2007;298:1010-22.


ANSWERS AND EXPLANATIONS TO SELF-ASSESSMENT QUESTIONS

1. Answer: A
The NYHA functional class is an ordinal scale from I (no symptoms) to IV (severe symptoms). Neither ANOVA nor ANCOVA is appropriate for ordinal or noncontinuous data (Answers C and D are incorrect). The Wilcoxon signed-rank test is an appropriate nonparametric test to use for paired ordinal data, such as the change in NYHA functional class over time in the same person (Answer B is incorrect). The Kruskal-Wallis test is the nonparametric analog of a one-way ANOVA and is appropriate for this analysis (Answer A is correct).

2. Answer: C
You cannot determine which finding is more important (in this case, the best drug) on the basis of the p-value (i.e., a lower p-value does not mean more important) (Answer B is incorrect). All statistically significant results are interpreted as significant without respect to the size of the p-value. This trial had four independent samples, and use of the unpaired (independent samples) t-test is not appropriate because it requires several unnecessary tests and increases the chances of making a type I error (Answer A is incorrect). In this setting, ANOVA is the correct test (Answer C is correct), followed by a multiple-comparisons procedure to determine where the actual differences between groups lie. A paired t-test is inappropriate because this is a parallel-group trial (Answer D is incorrect). Use of ANOVA in this case assumes a normal distribution and equal variance in each of the four groups.

3. Answer: D
The typical a priori α error (type I) rate is 5% (i.e., when the study was designed, the error rate was designed to be 5% or less). The actual type I error rate is reported in the question as 0.01 (1%) (Answer A is incorrect). Answers B and C are related; the study did have enough power because a statistically significant difference was observed. Similarly, a type II error was not made because this error has to do with not finding a difference when one truly exists. In this question, the type I error rate is 1%, the value of the p-value (Answer D is correct).

4. Answer: C
Sample sizes need not be equal to use a t-test (Answer A is incorrect). Body mass index data are not ordinal but continuous; thus, a t-test is appropriate (Answer B is incorrect). Equal variances are an assumption of parametric tests such as the standard (equal variance) t-test (Answer C is correct). A specific value for power is not required to use a test (Answer D is incorrect).

5. Answer: B
The reporting of the mean difference and CI is thought by many to be a superior means of presenting the results from a clinical trial because it describes both precision and statistical significance, as compared with a p-value, which distills everything into one value, making Answer A incorrect. The presentation of the data in this manner clearly shows all the necessary information for making the appropriate conclusion. To show statistical significance by use of CIs, the 95% CI (corresponding to the 5% type I error rate used in most studies) for the mean difference must not contain zero (signifying no difference between men and women), making Answer D incorrect. Answer B is correct because the p-value of less than 0.05 corresponds to the 95% CI in that item. To evaluate Answer C, we would need to know the 99% CI.

6. Answer: D
Answer A is incorrect because it uses unconventional approaches to determine statistical significance. Although this can be done, it is unlikely to be accepted by other readers and investigators. This study observed a nonsignificant difference in the increase in HDL concentration between the two groups. With a small sample size, such as the one used in this study, there is always concern about adequate power to observe a difference between the two treatments. A difference may exist between these two drugs, but the number of subjects studied may be too small to detect it statistically. Answer D is correct because, with the lack of information provided in this narrative, it is not possible to estimate power; thus, more information is needed. Answer B may be correct, but without first addressing the question of adequate power, it would be an inappropriate conclusion to draw. Answer C is incorrect because even though the new drug increased HDL concentration more than the other treatment, it is inappropriate to conclude that it is better because the difference was not statistically significant.


7. Answer: B
The primary end point in this study, the percentage of subjects at or below the target blood pressure, is nominal data. Subjects at target blood pressure (less than 140/90 mm Hg) are defined as having reached the target. This type of data requires either a chi-square test or a Fisher exact test (depending on the sample size or, more accurately, the expected counts in the individual contingency table cells) (Answer B is correct). An independent samples t-test is not appropriate because actual blood pressure values are not being compared (at least not in this question or this end point) (Answer A is incorrect). If we were comparing the actual blood pressure between the two groups, the test might be appropriate if parametric assumptions were met. The Wilcoxon signed-rank test is the appropriate nonparametric test for comparing paired samples (usually in a crossover trial) (Answer C is incorrect). Finally, a one-sample t-test is used to compare the mean of a single group with a known reference (population) mean. This is also incorrect in this situation because two groups are being compared (Answer D is incorrect).

8. Answer: A
Detecting the smaller difference between the treatments requires more power. Power can be increased in several different ways. Answer A is correct because the most common approach is to increase the sample size, although this is expensive for the researchers. Answer D is incorrect because smaller sample sizes diminish a study’s ability to detect differences between groups. Power can also be increased by increasing α, but doing so increases the chances of a type I error. Answer B decreases α, thus making it more difficult to detect differences between groups. Answer C certainly makes it easier to detect a difference between the two groups, but it uses an unconventional α value and is thus not the most appropriate technique.

9. Answer: C
Regression analysis is the most effective way to develop models to predict outcomes or variables (Answer C is correct). There are many different types of regression, but all share the ability to evaluate the impact of multiple variables simultaneously on an outcome variable. Correlation analysis is used to assess the association between two (or more) variables, not to make predictions (Answer A is incorrect). Kaplan-Meier curves are used to graphically depict survival curves or time to an event (Answer B is incorrect). Confidence intervals are not used to make predictions (Answer D is incorrect).
