Statistics Glossary
A
Addition rule (for mutually exclusive events): The concept that if the events A and B are
mutually exclusive, then the probability of A or B happening is the sum of the probabilities of A
and B
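
To illustrate the addition rule, here is a minimal Python sketch. The fair six-sided die, the two events, and the helper name probability are assumptions chosen for the example; they are not part of the glossary.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = range(1, 7)

def probability(event):
    """Classical probability: favorable outcomes divided by all outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_a = probability(lambda o: o == 1)            # event A: roll a 1
p_b = probability(lambda o: o == 2)            # event B: roll a 2
p_a_or_b = probability(lambda o: o in (1, 2))  # "A or B", counted directly

# A and B cannot both happen on one roll, so the addition rule applies.
print(p_a + p_b, p_a_or_b)
```

Both printed values should agree, because the sum of the two probabilities and the direct count describe the same event.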
B
Bayes’ rule: (Refer to Bayes’ theorem)
Bayes’ theorem: A mathematical formula stating that for any two events A and B, the probability
of A given B equals the probability of A multiplied by the probability of B given A, divided by
the probability of B (which must not be zero); also referred to as Bayes’ rule
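
The formula can be sketched in a few lines of Python. The function name bayes_posterior and the screening-test numbers below are made up for illustration only; the probability of B is computed from the two conditional probabilities, which is one common way to obtain it.

```python
def bayes_posterior(p_a, p_b_given_a, p_b):
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a * p_b_given_a / p_b

# Hypothetical screening test (all three numbers are invented).
p_condition = 0.01             # P(A): the condition is present
p_pos_given_condition = 0.95   # P(B|A): positive test when the condition is present
p_pos_given_healthy = 0.05     # false positive rate

# P(B): overall probability of a positive test, summed over both cases.
p_positive = (p_condition * p_pos_given_condition
              + (1 - p_condition) * p_pos_given_healthy)

posterior = bayes_posterior(p_condition, p_pos_given_condition, p_positive)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed inputs, most positive results are false positives, because the condition is rare compared with the false positive rate.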
Bayesian statistics: An approach to analyzing and interpreting data in which probabilities
express degrees of belief that are updated with Bayes’ theorem as new data become available;
also referred to as Bayesian inference
Binomial distribution: A discrete distribution that models the number of successes in a fixed
number of independent trials, each with only two possible outcomes: success or failure
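
As a rough sketch, the binomial probability of exactly k successes in n independent trials can be computed as follows. The function name binomial_pmf and the coin-flip example are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 4 fair coin flips, exactly 2 heads.
print(binomial_pmf(2, 4, 0.5))
```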
C
Central Limit Theorem: The idea that the sampling distribution of the mean approaches a
normal distribution as the sample size increases
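
A small simulation can make the theorem concrete. This is only a sketch under assumed settings: the exponential population (mean 1, standard deviation 1), the sample size of 50, and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, std dev 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

# The population is skewed, but the sample means cluster around 1
# with a spread close to 1 / sqrt(n).
print("mean of sample means:   ", statistics.fmean(means))
print("std dev of sample means:", statistics.stdev(means))
print("1 / sqrt(n):            ", 1 / n**0.5)
```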
Classical probability: A type of probability based on formal reasoning about events with
equally likely outcomes
Cluster random sample: A probability sampling method that divides a population into
clusters, randomly selects certain clusters, and includes all members from the chosen clusters
in the sample
Complement rule: A concept stating that the probability that event A does not occur is one
minus the probability of A
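
The complement rule is often used for "at least one" questions. The sketch below assumes a fair die rolled four times, with independent rolls; the variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical example: roll a fair six-sided die four times.
p_no_six_one_roll = Fraction(5, 6)

# The rolls are independent, so the probability of no six in four rolls
# is the product of the single-roll probabilities.
p_no_six = p_no_six_one_roll ** 4

# Complement rule: P(at least one six) = 1 - P(no six at all).
p_at_least_one_six = 1 - p_no_six
print(p_at_least_one_six, float(p_at_least_one_six))
```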
Conditional probability: The probability of an event occurring given that another event has
already occurred
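Confidence level: A measure that expresses the uncertainty of the estimation process, stated as
the percentage of intervals, built the same way from repeated samples, that would be expected
to contain the population parameter (for example, 95%)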
Continuous random variable: A variable that takes all the possible values in some range of
numbers
D
Dependent events: The concept that two events are dependent if the occurrence of one event
changes the probability of the other event
Descriptive statistics: A type of statistics that summarizes the main features of a dataset
Discrete random variable: A variable that has a countable number of possible values
E
Econometrics: A branch of economics that uses statistics to analyze economic problems
Empirical rule: A concept stating that the values on a normal curve are distributed in a regular
pattern, based on their distance from the mean: about 68% fall within one standard deviation,
about 95% within two, and about 99.7% within three
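
The pattern described by the empirical rule can be checked numerically. This sketch uses NormalDist from Python's standard statistics module; the helper name share_within is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```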
F
False positive: A test result that indicates something is present when it really is not
I
Independent events: The concept that two events are independent if the occurrence of one
event does not change the probability of the other event
Inferential statistics: A type of statistics that uses sample data to draw conclusions about a
larger population
Interquartile range: The distance between the first quartile (Q1) and the third quartile (Q3)
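
A minimal sketch of the calculation, assuming the default quartile method of Python's statistics.quantiles. Quartile conventions differ between tools, so other software may give slightly different values for the same data; the data and function name are hypothetical.

```python
import statistics

def interquartile_range(data):
    """Return Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical data.
values = [1, 2, 3, 4, 5, 6, 7, 8]
print(interquartile_range(values))
```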
L
Literacy rate: The percentage of the population in a given age group that can read and write
Lower limit: When constructing an interval, the sample mean minus the margin of error
M
Margin of error: The maximum expected difference between a population parameter and a
sample estimate
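
The sketch below shows one common way the margin of error is used to build an interval around a sample mean. It assumes a z-score based interval (for small samples a t-based interval is often preferred); the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence_level=0.95):
    """Return (lower limit, upper limit) for the mean, using a z-score."""
    n = len(sample)
    sample_mean = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # z-score that leaves (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = mean_confidence_interval(sample)
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```

The lower limit is the sample mean minus the margin of error, and the upper limit is the sample mean plus the margin of error.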
Measure of dispersion: A value that represents the spread of a dataset, or the amount of
variation in data points
Measure of position: A method by which the position of a value in relation to other values in a
dataset is determined
Multiplication rule (for independent events): The concept that if the events A and B are
independent, then the probability of both A and B happening is the probability of A multiplied
by the probability of B
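
The multiplication rule can be checked by counting outcomes. The two-dice setup and the events below are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: roll two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely pairs

p_a = Fraction(1, 2)  # event A: the first die is even
p_b = Fraction(1, 6)  # event B: the second die shows a 6

# Count the pairs in which both events happen.
both = sum(1 for first, second in rolls if first % 2 == 0 and second == 6)
p_a_and_b = Fraction(both, len(rolls))

# The product of the two probabilities matches the direct count.
print(p_a * p_b, p_a_and_b)
```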
Mutually exclusive: The concept that two events are mutually exclusive if they cannot occur
at the same time
N
Non-probability sampling: A sampling method that is based on convenience or the personal
preferences of the researcher, rather than random selection
Nonresponse bias: When certain groups of people are less likely to provide responses
O
Objective probability: A type of probability based on statistics, experiments, and
mathematical measurements
P
Parameter: A characteristic of a population
Point estimate: A calculation that uses a single value to estimate a population parameter
Poisson distribution: A probability distribution that models the probability that a certain
number of events will occur during a specific time period
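
As a sketch, the Poisson probability of exactly k events can be computed from the average number of events per period. The function name poisson_pmf and the rate of 2 calls per hour are assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count per period is rate."""
    return rate**k * exp(-rate) / factorial(k)

# Hypothetical example: an average of 2 calls per hour.
for k in range(4):
    print(k, round(poisson_pmf(k, 2), 4))
```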
Probability: The branch of mathematics that deals with measuring and quantifying
uncertainty
Probability distribution: A function that describes the likelihood of the possible outcomes of
a random event
Probability sampling: A sampling method that uses random selection to generate a sample
R
Random experiment: A process whose outcome cannot be predicted with certainty
Random variable: A variable that represents the values for the possible outcomes of a
random event
Range: The difference between the largest and smallest value in a dataset
S
Sample: A subset of a population
Sample size: The number of individuals or items chosen for a study or experiment
Sample space: The set of all possible outcomes of a random experiment
Sampling with replacement: When a population element can be selected more than one
time
Sampling without replacement: When a population element can be selected only one time
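
The difference between the two sampling schemes can be seen with Python's random module. The population of ten labeled items and the fixed seed are assumptions for the example.

```python
import random

rng = random.Random(42)          # fixed seed so the run is repeatable
population = list(range(1, 11))  # hypothetical population of 10 labeled items

# With replacement: the same element may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each element can appear at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```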
Standard error of the mean: The sample standard deviation divided by the square root of the
sample size
Standard error of the proportion: The square root of the following quantity: the sample
proportion times (one minus the sample proportion), divided by the sample size
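
Written as code, the order of operations is unambiguous. The function name and the survey numbers are hypothetical.

```python
from math import sqrt

def standard_error_of_proportion(p_hat, n):
    """sqrt(p_hat * (1 - p_hat) / n) for sample proportion p_hat and sample size n."""
    return sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 50% of 100 respondents answered "yes".
print(standard_error_of_proportion(0.5, 100))
```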
Statistical significance: The claim that the results of a test or experiment are unlikely to be
explained by chance alone
Stratified random sample: A probability sampling method that divides a population into
groups and randomly selects some members from each group to be in the sample
Systematic random sample: A probability sampling method that puts every member of a
population into an ordered sequence, chooses a random starting point in the sequence, and
selects members for the sample at regular intervals
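
The three steps of systematic sampling can be sketched as follows. The function name systematic_sample, the population of 100 members, and the interval of 10 are assumptions.

```python
import random

def systematic_sample(ordered_population, interval, rng):
    """Pick a random start within the first interval, then take every
    interval-th member from that point on."""
    start = rng.randrange(interval)
    return ordered_population[start::interval]

rng = random.Random(7)         # fixed seed so the run is repeatable
population = list(range(100))  # hypothetical ordered population of 100 members
sample = systematic_sample(population, interval=10, rng=rng)
print(sample)
```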
T
Target population: The complete set of elements that someone is interested in knowing
more about
U
Undercoverage bias: When some members of a population are inadequately represented in a
sample
Upper limit: When constructing an interval, the sample mean plus the margin of error
V
Variance: The average of the squared differences between each data point and the mean
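
A minimal sketch of the calculation, next to the standard-library equivalents. Dividing by the number of data points gives the population variance; the sample variance divides by one less than that. The data and the function name population_variance are hypothetical.

```python
import statistics

def population_variance(data):
    """Average of the squared differences between each data point and the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

# Hypothetical data.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(values))   # divides by n
print(statistics.pvariance(values))  # standard-library population variance
print(statistics.variance(values))   # sample variance, divides by n - 1
```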
Z
Z-score: A measure of how many standard deviations below or above the population mean a
data point is
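
As a short sketch, a z-score is the distance from the population mean measured in standard deviations. The function name z_score and the exam numbers are assumptions.

```python
def z_score(value, population_mean, population_std_dev):
    """Number of standard deviations that value lies above (positive)
    or below (negative) the population mean."""
    return (value - population_mean) / population_std_dev

# Hypothetical exam scores: population mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # above the mean
print(z_score(60, 70, 10))  # below the mean
```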