Intraclass Correlation
Consider a data set consisting of N paired data values $(x_{n,1}, x_{n,2})$, for $n = 1, \dots, N$. An early form of the intraclass correlation for such data is

$$ r = \frac{1}{N s^2} \sum_{n=1}^{N} (x_{n,1} - \bar{x})(x_{n,2} - \bar{x}), $$

where

$$ \bar{x} = \frac{1}{2N} \sum_{n=1}^{N} (x_{n,1} + x_{n,2}), \qquad s^2 = \frac{1}{2N} \left\{ \sum_{n=1}^{N} (x_{n,1} - \bar{x})^2 + \sum_{n=1}^{N} (x_{n,2} - \bar{x})^2 \right\}. $$
Later versions of this statistic [3] used the degrees of freedom 2N − 1 in the denominator for calculating s² and N − 1 in the denominator for calculating r, so that s² becomes unbiased, and r becomes unbiased if s is known.
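To make the paired-data formula concrete, the following is a minimal NumPy sketch (the function name and the bias_corrected switch are illustrative, not from the original source). It pools both members of each pair into a single mean and variance exactly as described above.

```python
import numpy as np

def fisher_paired_icc(x1, x2, bias_corrected=False):
    """Early intraclass correlation for N unordered pairs.

    Both members of each pair are pooled when estimating the mean and the
    variance.  With bias_corrected=True, s^2 uses 2N - 1 in its denominator
    and r uses N - 1, as in the later versions of the statistic.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    xbar = (x1.sum() + x2.sum()) / (2 * n)                     # pooled mean
    ss = ((x1 - xbar) ** 2).sum() + ((x2 - xbar) ** 2).sum()   # pooled sum of squares
    s2 = ss / (2 * n - 1) if bias_corrected else ss / (2 * n)
    cross = ((x1 - xbar) * (x2 - xbar)).sum()
    return cross / (((n - 1) if bias_corrected else n) * s2)

# Toy check: pairs sharing a common component, e.g. simulated twin data
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
a = shared + rng.normal(scale=0.5, size=500)
b = shared + rng.normal(scale=0.5, size=500)
print(fisher_paired_icc(a, b))   # close to 1 / (1 + 0.25) = 0.8
```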
The key difference between this ICC and the interclass (Pearson) correlation is that the data are pooled to
estimate the mean and variance. The reason for this is that in the setting where an intraclass correlation is
desired, the pairs are considered to be unordered. For example, if we are studying the resemblance of twins,
there is usually no meaningful way to order the values for the two individuals within a twin pair. Like the
interclass correlation, the intraclass correlation for paired data will be confined to the interval [−1, +1].
The intraclass correlation is also defined for data sets with groups having more than 2 values. For groups
consisting of three values, it is defined as[3]

$$ r = \frac{1}{3 N s^2} \sum_{n=1}^{N} \left\{ (x_{n,1} - \bar{x})(x_{n,2} - \bar{x}) + (x_{n,1} - \bar{x})(x_{n,3} - \bar{x}) + (x_{n,2} - \bar{x})(x_{n,3} - \bar{x}) \right\}, $$

where

$$ \bar{x} = \frac{1}{3N} \sum_{n=1}^{N} (x_{n,1} + x_{n,2} + x_{n,3}), \qquad s^2 = \frac{1}{3N} \left\{ \sum_{n=1}^{N} (x_{n,1} - \bar{x})^2 + \sum_{n=1}^{N} (x_{n,2} - \bar{x})^2 + \sum_{n=1}^{N} (x_{n,3} - \bar{x})^2 \right\}. $$
As the number of items per group grows, so does the number of cross-product terms in this expression. The following equivalent form is simpler to calculate:
$$ r = \frac{K}{K-1} \cdot \frac{1}{N s^2} \sum_{n=1}^{N} (\bar{x}_n - \bar{x})^2 - \frac{1}{K-1}, $$

where K is the number of data values per group, and $\bar{x}_n$ is the sample mean of the nth group.[3] This form is usually attributed to Harris.[4] The left term is non-negative; consequently the intraclass correlation must satisfy

$$ r \geq -\frac{1}{K-1}. $$

For large K, this ICC is nearly equal to

$$ \frac{1}{N s^2} \sum_{n=1}^{N} (\bar{x}_n - \bar{x})^2, $$
which can be interpreted as the fraction of the total variance that is due to variation between groups. Ronald
Fisher devotes an entire chapter to intraclass correlation in his classic book Statistical Methods for Research
Workers.[3]
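The Harris form translates directly into code. The following NumPy sketch (function name illustrative) computes the ICC for N groups of K values each from the group means and the pooled variance, and makes the lower bound of -1/(K-1) easy to verify numerically.

```python
import numpy as np

def fisher_group_icc(data):
    """ICC for a (N, K) array: N groups, K values per group.

    Uses the pooled grand mean and the pooled (1/(N*K)) variance, then
    applies the Harris form: K/(K-1) * between / (N * s^2) - 1/(K-1).
    """
    data = np.asarray(data, dtype=float)
    n_groups, k = data.shape
    grand_mean = data.mean()
    s2 = ((data - grand_mean) ** 2).sum() / (n_groups * k)     # pooled variance
    between = ((data.mean(axis=1) - grand_mean) ** 2).sum()    # sum over group means
    return k / (k - 1) * between / (n_groups * s2) - 1.0 / (k - 1)
```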
For data that is pure noise (a population ICC of 0), Fisher's formula produces sample ICC values that are distributed about 0, i.e. sometimes negative. This is because Fisher designed the formula to be unbiased, so its estimates are sometimes overestimates and sometimes underestimates. For small or zero underlying values in the population, the ICC calculated from a sample may be negative.
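A quick simulation illustrates the point: for pure-noise pairs (population ICC of 0), the sample values from the formula above scatter around zero, and a sizeable fraction of them come out negative. This is only an illustrative sketch; the sample size and number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
negatives = 0
for _ in range(1000):
    x = rng.normal(size=(30, 2))                 # 30 pure-noise pairs, true ICC = 0
    xbar = x.mean()
    s2 = ((x - xbar) ** 2).sum() / x.size        # pooled variance
    r = ((x[:, 0] - xbar) * (x[:, 1] - xbar)).sum() / (len(x) * s2)
    negatives += r < 0
print(negatives / 1000)                          # a large share of sample ICCs are negative
```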
More recent ICC statistics are defined in terms of the one-way random effects model

$$ Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, $$

where $Y_{ij}$ is the ith observation in the jth group, $\mu$ is an unobserved overall mean, $\alpha_j$ is an unobserved random effect shared by all values in group j, and $\varepsilon_{ij}$ is an unobserved noise term.[5] For the model to be identified, the $\alpha_j$ and $\varepsilon_{ij}$ are assumed to have expected value zero and to be uncorrelated with each other. Also, the $\alpha_j$ are assumed to be identically distributed, and the $\varepsilon_{ij}$ are assumed to be identically distributed. The variance of $\alpha_j$ is denoted $\sigma_\alpha^2$ and the variance of $\varepsilon_{ij}$ is denoted $\sigma_\varepsilon^2$.
With this framework, the ICC is the correlation of two observations from the same group, which in terms of the variance components equals

$$ \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}. $$
[Proof]
For two distinct observations $Y_{ij}$ and $Y_{kj}$ from the same group j,

$$ \operatorname{cov}(Y_{ij}, Y_{kj}) = \operatorname{cov}(\alpha_j + \varepsilon_{ij},\, \alpha_j + \varepsilon_{kj}) = \operatorname{var}(\alpha_j) = \sigma_\alpha^2, $$

since the noise terms are uncorrelated with each other and with $\alpha_j$, and all have mean zero. Each observation has variance $\sigma_\alpha^2 + \sigma_\varepsilon^2$, so

$$ \operatorname{corr}(Y_{ij}, Y_{kj}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}, $$

as required.[6][7]
An advantage of this ANOVA framework is that different groups can have different numbers of data
values, which is difficult to handle using the earlier ICC statistics. This ICC is always non-negative,
allowing it to be interpreted as the proportion of total variance that is "between groups." This ICC can be
generalized to allow for covariate effects, in which case the ICC is interpreted as capturing the within-class
similarity of the covariate-adjusted data values.[8]
This expression can never be negative (unlike Fisher's original formula), and therefore, in samples from a population with an ICC of 0, the sample ICCs will tend to be higher than the population ICC.
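The exact estimator the text refers to is not reproduced above, but a standard way to obtain a non-negative ICC from the one-way random effects model is the ANOVA method of moments with the between-group variance component truncated at zero. The sketch below (the function name and the truncation choice are assumptions, not the source's prescription) also handles groups of unequal size.

```python
import numpy as np

def anova_icc(groups):
    """One-way random-effects ICC from ANOVA variance components.

    groups: list of 1-D arrays, one per group (sizes may differ).
    sigma_alpha^2 is estimated by the method of moments and truncated at
    zero, so the returned ICC is always non-negative.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = np.array([len(g) for g in groups])
    n_total, n_groups = sizes.sum(), len(groups)
    grand_mean = np.concatenate(groups).mean()
    group_means = np.array([g.mean() for g in groups])

    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n_total - n_groups)
    ms_between = (sizes * (group_means - grand_mean) ** 2).sum() / (n_groups - 1)

    # effective group size for unbalanced designs
    k0 = (n_total - (sizes ** 2).sum() / n_total) / (n_groups - 1)
    sigma_alpha2 = max(0.0, (ms_between - ms_within) / k0)
    return sigma_alpha2 / (sigma_alpha2 + ms_within)

# Simulated data from Y_ij = mu + alpha_j + eps_ij with population ICC = 0.5
rng = np.random.default_rng(2)
groups = [rng.normal(rng.normal(scale=1.0), 1.0, size=n)
          for n in rng.integers(3, 8, size=300)]
print(anova_icc(groups))   # close to 0.5
```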
A number of different ICC statistics have been proposed, not all of which estimate the same population
parameter. There has been considerable debate about which ICC statistics are appropriate for a given use,
since they may produce markedly different results for the same data.[9][10]
An important property of the Pearson correlation is that it is invariant to application of separate linear
transformations to the two variables being compared. Thus, if we are correlating X and Y, where, say,
Y = 2X + 1, the Pearson correlation between X and Y is 1 — a perfect correlation. This property does not
make sense for the ICC, since there is no basis for deciding which transformation is applied to each value in
a group. However, if all the data in all groups are subjected to the same linear transformation, the ICC does
not change.
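The invariance property is easy to check numerically. In the sketch below (variable names arbitrary), applying the same linear transformation to every value leaves the paired ICC unchanged, while transforming only one member of each pair changes it, even though the Pearson correlation is unaffected in both cases.

```python
import numpy as np

def paired_icc(d):
    """Pooled-mean ICC for an (N, 2) array of unordered pairs."""
    m = d.mean()
    s2 = ((d - m) ** 2).sum() / d.size
    return ((d[:, 0] - m) * (d[:, 1] - m)).sum() / (len(d) * s2)

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 2)) + rng.normal(size=(200, 1))      # correlated pairs

print(paired_icc(x), paired_icc(2 * x + 1))                    # identical values
mixed = np.column_stack([x[:, 0], 2 * x[:, 1] + 1])            # transform one member only
print(paired_icc(mixed))                                       # differs from the above
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1],
      np.corrcoef(mixed[:, 0], mixed[:, 1])[0, 1])             # Pearson r is unchanged
```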
Since the intraclass correlation coefficient gives a composite of intra-observer and inter-observer variability,
its results are sometimes considered difficult to interpret when the observers are not exchangeable.
Alternative measures such as Cohen's kappa statistic, the Fleiss kappa, and the concordance correlation
coefficient[12] have been proposed as more suitable measures of agreement among non-exchangeable
observers.
Model:
One-way random effects: each subject is measured by a different set of k randomly selected raters;
Two-way random: k raters are randomly selected, then each subject is measured by the same set of k raters;
Two-way mixed: k fixed raters are defined. Each subject is measured by the k raters.
Number of measurements:
Single measures: even though more than one measure is taken in the experiment, reliability
is applied to a context where a single measure of a single rater will be performed;
Average measures: the reliability is applied to a context where measures of k raters will be
averaged for each subject.
Definition of relationship:
Absolute agreement: the agreement between two raters is of interest, including systematic errors of both raters and random residual errors;
Consistency: in the context of repeated measurements by the same rater, systematic errors
of the rater are canceled and only the random residual error is kept.
The consistency ICC cannot be estimated in the one-way random effects model, as there is no way to
separate the inter-rater and residual variances.
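For a fully crossed table of n subjects rated by the same k raters, the consistency and absolute-agreement forms of the single-measure ICC can both be computed from the two-way ANOVA mean squares. The sketch below uses the standard McGraw and Wong expressions; the function name is illustrative, and for real analyses a vetted routine in a statistics package would normally be preferred.

```python
import numpy as np

def two_way_single_icc(ratings):
    """Consistency and absolute-agreement single-measure ICCs.

    ratings: array of shape (n_subjects, k_raters), every subject rated by
    the same k raters (two-way model without replication).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    msr = k * ((y.mean(axis=1) - grand) ** 2).sum() / (n - 1)     # subjects (rows)
    msc = n * ((y.mean(axis=0) - grand) ** 2).sum() / (k - 1)     # raters (columns)
    resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))                # residual
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement
```

The consistency form ignores systematic differences between raters (the rater mean square does not appear in its denominator), whereas the absolute-agreement form penalizes them.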
An overview and re-analysis of the three models for the single measures ICC, with an alternative recipe for
their use, has also been presented by Liljequist et al. (2019).[18]
Interpretation
Cicchetti (1994)[19] gives the following often-quoted guidelines for interpretation of kappa or ICC inter-rater agreement measures:

Less than 0.40: poor
Between 0.40 and 0.59: fair
Between 0.60 and 0.74: good
Between 0.75 and 1.00: excellent
See also
Correlation ratio
Design effect
Effect size#Eta-squared (η²)
References
1. Koch GG (1982). "Intraclass correlation coefficient". In Samuel Kotz and Norman L. Johnson
(ed.). Encyclopedia of Statistical Sciences. Vol. 4. New York: John Wiley & Sons. pp. 213–
217.
2. Bartko JJ (August 1966). "The intraclass correlation coefficient as a measure of reliability".
Psychological Reports. 19 (1): 3–11. doi:10.2466/pr0.1966.19.1.3 (https://doi.org/10.2466%2
Fpr0.1966.19.1.3). PMID 5942109 (https://pubmed.ncbi.nlm.nih.gov/5942109).
S2CID 145480729 (https://api.semanticscholar.org/CorpusID:145480729).
3. Fisher RA (1954). Statistical Methods for Research Workers (https://archive.org/details/statist
icalmethoe7fish) (Twelfth ed.). Edinburgh: Oliver and Boyd. ISBN 978-0-05-002170-5.
4. Harris JA (October 1913). "On the Calculation of Intra-Class and Inter-Class Coefficients of
Correlation from Class Moments when the Number of Possible Combinations is Large".
Biometrika. 9 (3/4): 446–472. doi:10.1093/biomet/9.3-4.446 (https://doi.org/10.1093%2Fbiom
et%2F9.3-4.446). JSTOR 2331901 (https://www.jstor.org/stable/2331901).
5. Donner A, Koval JJ (March 1980). "The estimation of intraclass correlation in the analysis of
family data". Biometrics. 36 (1): 19–25. doi:10.2307/2530491 (https://doi.org/10.2307%2F25
30491). JSTOR 2530491 (https://www.jstor.org/stable/2530491). PMID 7370372 (https://pub
med.ncbi.nlm.nih.gov/7370372).
6. Proof that ICC in the anova model is the correlation of two items: ocram [1] (https://stats.stack
exchange.com/users/3019/ocram), Understanding the intra-class correlation coefficient, URL
(version: 2012-12-05): [2] (https://stats.stackexchange.com/q/45201)
7. dsaxton (https://stats.stackexchange.com/users/78861/dsaxton), Random effects model:
Observations from the same level have covariance $\sigma^2$?, URL (version: 2016-03-22)
link (https://stats.stackexchange.com/a/203052/253)
8. Stanish W, Taylor N (1983). "Estimation of the Intraclass Correlation Coefficient for the
Analysis of Covariance Model". The American Statistician. 37 (3): 221–224.
doi:10.2307/2683375 (https://doi.org/10.2307%2F2683375). JSTOR 2683375 (https://www.j
stor.org/stable/2683375).
9. Müller R, Büttner P (December 1994). "A critical discussion of intraclass correlation
coefficients". Statistics in Medicine. 13 (23–24): 2465–76. doi:10.1002/sim.4780132310 (http
s://doi.org/10.1002%2Fsim.4780132310). PMID 7701147 (https://pubmed.ncbi.nlm.nih.gov/7
701147). See also comment:
Vargha P (1997). "Letter to the Editor". Statistics in Medicine. 16 (7): 821–823.
doi:10.1002/(SICI)1097-0258(19970415)16:7<821::AID-SIM558>3.0.CO;2-B (https://doi.
org/10.1002%2F%28SICI%291097-0258%2819970415%2916%3A7%3C821%3A%3A
AID-SIM558%3E3.0.CO%3B2-B).
10. McGraw KO, Wong SP (1996). "Forming inferences about some intraclass correlation
coefficients". Psychological Methods. 1: 30–46. doi:10.1037/1082-989X.1.1.30 (https://doi.or
g/10.1037%2F1082-989X.1.1.30). There are several errors in the article:
McGraw KO, Wong SP (1996). "Correction to McGraw and Wong (1996)". Psychological
Methods. 1 (4): 390. doi:10.1037/1082-989x.1.4.390 (https://doi.org/10.1037%2F1082-98
9x.1.4.390).
11. Shrout PE, Fleiss JL (March 1979). "Intraclass correlations: uses in assessing rater
reliability". Psychological Bulletin. 86 (2): 420–8. doi:10.1037/0033-2909.86.2.420 (https://do
i.org/10.1037%2F0033-2909.86.2.420). PMID 18839484 (https://pubmed.ncbi.nlm.nih.gov/1
8839484).
12. Nickerson CA (December 1997). "A Note on 'A Concordance Correlation Coefficient to
Evaluate Reproducibility' ". Biometrics. 53 (4): 1503–1507. doi:10.2307/2533516 (https://doi.
org/10.2307%2F2533516). JSTOR 2533516 (https://www.jstor.org/stable/2533516).
13. Stoffel MA, Nakagawa S, Schielzeth J (2017). "rptR: repeatability estimation and variance
decomposition by generalized linear mixed-effects models" (https://doi.org/10.1111%2F204
1-210x.12797). Methods in Ecology and Evolution. 8 (11): 1639–1644. doi:10.1111/2041-
210x.12797 (https://doi.org/10.1111%2F2041-210x.12797). ISSN 2041-210X (https://www.w
orldcat.org/issn/2041-210X).
14. MacLennan RN (November 1993). "Interrater Reliability with SPSS for Windows 5.0". The
American Statistician. 47 (4): 292–296. doi:10.2307/2685289 (https://doi.org/10.2307%2F26
85289). JSTOR 2685289 (https://www.jstor.org/stable/2685289).
15. McGraw KO, Wong SP (1996). "Forming Inferences About Some Intraclass Correlation
Coefficients". Psychological Methods. 1 (1): 30–40. doi:10.1037/1082-989X.1.1.30 (https://d
oi.org/10.1037%2F1082-989X.1.1.30).
16. Stata user's guide release 15 (https://www.stata.com/manuals/r.pdf) (PDF). College Station,
Texas: Stata Press. 2017. pp. 1101–1123. ISBN 978-1-59718-249-2.
17. Howell DC. "Intra-class correlation coefficients" (https://www.uvm.edu/~dhowell/methods9/S
upplements/icc/More%20on%20ICCs.pdf) (PDF).
18. Liljequist D, Elfving B, Skavberg Roaldsen K (2019). "Intraclass correlation - A discussion
and demonstration of basic features" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC664548
5). PLOS ONE. 14 (7): e0219854. doi:10.1371/journal.pone.0219854 (https://doi.org/10.137
1%2Fjournal.pone.0219854). PMC 6645485 (https://www.ncbi.nlm.nih.gov/pmc/articles/PM
C6645485). PMID 31329615 (https://pubmed.ncbi.nlm.nih.gov/31329615).
19. Cicchetti DV (1994). "Guidelines, criteria, and rules of thumb for evaluating normed and
standardized assessment instruments in psychology". Psychological Assessment. 6 (4):
284–290. doi:10.1037/1040-3590.6.4.284 (https://doi.org/10.1037%2F1040-3590.6.4.284).
20. Koo TK, Li MY (June 2016). "A Guideline of Selecting and Reporting Intraclass Correlation
Coefficients for Reliability Research" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC49131
18). Journal of Chiropractic Medicine. 15 (2): 155–63. doi:10.1016/j.jcm.2016.02.012 (https://
doi.org/10.1016%2Fj.jcm.2016.02.012). PMC 4913118 (https://www.ncbi.nlm.nih.gov/pmc/ar
ticles/PMC4913118). PMID 27330520 (https://pubmed.ncbi.nlm.nih.gov/27330520).
Others
A comparison of two indices for the intraclass correlation coefficient (https://pubmed.ncbi.nl
m.nih.gov/22396135/)
External links
AgreeStat 360: cloud-based inter-rater reliability analysis, Cohen's kappa, Gwet's AC1/AC2,
Krippendorff's alpha, Brennan-Prediger, Fleiss generalized kappa, intraclass correlation
coefficients (https://agreestat360.com/)
A useful online tool that allows calculation of the different types of ICC (http://department.ob
g.cuhk.edu.hk/researchsupport/IntraClass_correlation.asp)