A measure of interrater absolute agreement for ordinal scales is proposed, building on the dispersion index for ordinal variables introduced by Giuseppe Leti. The procedure makes it possible to overcome the limitations of traditional measures of interrater agreement in different fields of application. An unbiased estimator of the proposed measure is introduced and its sampling properties are investigated. To construct confidence intervals for interrater absolute agreement, both asymptotic results and bootstrap methods are used, and their performance is evaluated. Simulated data are employed to demonstrate the accuracy and practical utility of the new procedure for assessing agreement. Finally, an application to a real case is provided.
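As a rough illustration of the idea only, the sketch below computes Leti's dispersion index for the ratings given to a single target, D = 2 Σ F_k (1 − F_k) over the first K − 1 cumulative relative frequencies F_k, and turns it into an agreement score by assuming agreement equals one minus D divided by its maximum (K − 1)/2. That normalization, the function names, and the plain percentile bootstrap are assumptions made for this example; the abstract does not give the paper's exact definition, its unbiased estimator, or its bootstrap scheme, so this plug-in version should not be read as the authors' procedure.

```python
import random
from collections import Counter


def leti_dispersion(ratings, n_categories):
    """Leti's index D = 2 * sum_{k=1}^{K-1} F_k * (1 - F_k).

    Ratings are assumed to be integers coded 1..K (K = n_categories).
    """
    n = len(ratings)
    counts = Counter(ratings)
    cumulative = 0.0
    total = 0.0
    for k in range(1, n_categories):  # categories 1 .. K-1
        cumulative += counts.get(k, 0) / n
        total += cumulative * (1.0 - cumulative)
    return 2.0 * total


def agreement(ratings, n_categories):
    """Assumed agreement score: 1 - D / D_max, with D_max = (K - 1) / 2."""
    d_max = (n_categories - 1) / 2.0
    return 1.0 - leti_dispersion(ratings, n_categories) / d_max


def bootstrap_ci(ratings, n_categories, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling raters with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    stats = sorted(
        agreement([rng.choice(ratings) for _ in range(n)], n_categories)
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratings from eight raters on a 5-point ordinal scale.
    sample = [2, 3, 3, 4, 3, 2, 3, 3]
    print(agreement(sample, 5))
    print(bootstrap_ci(sample, 5))
```

Under this assumed normalization the score is 1 when all raters choose the same category and 0 when they split evenly between the two extreme categories.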
Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Amin MB (2001) Interobserver reproducibility of Gleason grading of prostatic carcinoma: urologic pathologists. Hum Pathol 32(1):74–80
Billingsley P (1995) Probability and measure, 3rd edn. Wiley, New York
Booth JG, Butler RW, Hall P (1994) Bootstrap methods for finite populations. J Am Stat Assoc 89(428):1282–1289
Bove G, Nuzzo E, Serafini A (2018) Measurement of interrater agreement for the assessment of language proficiency. In: Capecchi S, Di Iorio F, Simone R (eds) ASMOD 2018: proceedings of the advanced statistical modelling for ordinal data conference. Università Federico II di Napoli. FedOAPress, Napoli, pp 61–68
Burke MJ, Finkelstein LM, Dusig MS (1999) On average deviation indices for estimating interrater agreement. Organ Res Methods 2:49–68
Cohen A, Doveh E, Eick U (2001) Statistical properties of the rwg index of agreement. Psychol Methods 6(3):297–310
Cumming A, Kantor R, Powers DE (2002) Decision making while rating ESL/EFL writing tasks: a descriptive framework. Mod Lang J 86:67–96
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26
Grilli L, Rampichini C (2002) Scomposizione della dispersione per variabili statistiche ordinali [Dispersion decomposition for ordinal variables]. Statistica 62:111–116
Gross S (1980) Median estimation in sample surveys. In: Proceedings of the section on survey research methods. American Statistical Association, pp 181–184
James LR, Demaree RG, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69:85–98
James LR, Demaree RG, Wolf G (1993) rwg: an assessment of within-group interrater agreement. J Appl Psychol 78:306–309
Kuiken F, Vedder I (2014) Rating written performance: What do raters do and why? Lang Test 31(3):329–348
Kuiken F, Vedder I (2017) Functional adequacy in L2 writing: towards a new rating scale. Lang Test 34:321–336
LeBreton JM, Burgess JRD, Kaiser RB, Atchley EK, James LR (2003) The restriction of variance hypothesis and interrater reliability and agreement: are ratings from multiple sources really dissimilar? Organ Res Methods 6:80–128
LeBreton JM, James LR, Lindell MK (2005) Recent issues regarding rwg, r*wg, rwg(j), and r*wg(j). Organ Res Methods 8(1):128–138
LeBreton JM, Senter JL (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852
Leti G (1983) Statistica descrittiva. Il Mulino, Bologna
Lindell MK, Brandt CJ (1997) Measuring interrater agreement for ratings of a single target. Appl Psychol Meas 21:271–278
Lomnicki ZA (1952) The standard error of Gini’s mean difference. Ann Math Stat 23(4):635–637
Mashreghi Z, Haziza D, Léger C (2016) A survey of bootstrap methods in finite population sampling. Stat Surv 10:1–52
McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychol Methods 1:30–46
Nuzzo E, Bove G (2020) Assessing functional adequacy across tasks: a comparison of learners and native speakers’ written texts. Euro Am J Appl Linguist Lang 7(2):9–27
Piccarreta R (2001) A new measure of nominal-ordinal association. J Appl Stat 28(1):107–120
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53:683–690
Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing reliability. Psychol Bull 86:420–428
Thompson I (1991) Foreign accents revisited: factors relating to transfer of accent from the first language to a second language. Lang Speech 24(3):265–272
von Eye A, Mun EY (2005) Analyzing rater agreement. Manifest variable methods. Lawrence Erlbaum Associates, Mahwah, New Jersey
The authors are grateful to the referees for very careful reading of the manuscript and thoughtful comments. We dedicate this paper to the memory of Professor Giovanni Battista Tranquilli for being a source of motivation with his scientific and human support.
About this article
Cite this article
Bove, G., Conti, P.L. & Marella, D. A measure of interrater absolute agreement for ordinal categorical data. Stat Methods Appl (2020). https://doi.org/10.1007/s10260-020-00551-5
- Ordinal data
- Interrater agreement