A measure of interrater absolute agreement for ordinal categorical data

Abstract

A measure of interrater absolute agreement for ordinal scales is proposed, building on the dispersion index for ordinal variables introduced by Giuseppe Leti. The procedure makes it possible to overcome the limitations that affect traditional measures of interrater agreement in different fields of application. An unbiased estimator of the proposed measure is introduced and its sampling properties are investigated. Confidence intervals for interrater absolute agreement are constructed using both asymptotic results and bootstrap methods, and their performance is evaluated. Simulated data are used to demonstrate the accuracy and practical utility of the new procedure for assessing agreement. Finally, an application to a real case is provided.
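The abstract does not state the formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the commonly cited form of Leti's index, D = 2 Σ_{k=1}^{K-1} F_k (1 − F_k), where F_k is the cumulative proportion of ratings up to category k, with maximum (K − 1)/2, and it normalizes agreement as 1 − D/D_max. The function names, the plug-in (not unbiased) estimate, and the naive percentile bootstrap over raters are illustrative choices, not the authors' estimator or resampling scheme.

```python
import random
from collections import Counter


def leti_dispersion(ratings, n_categories):
    """Assumed form of Leti's index: D = 2 * sum_{k=1}^{K-1} F_k * (1 - F_k)."""
    n = len(ratings)
    counts = Counter(ratings)
    cumulative = 0.0
    total = 0.0
    # Categories are coded 1..K; F_K = 1 contributes nothing, so stop at K-1.
    for k in range(1, n_categories):
        cumulative += counts.get(k, 0) / n
        total += cumulative * (1.0 - cumulative)
    return 2.0 * total


def agreement(ratings, n_categories):
    """Plug-in agreement 1 - D / D_max, assuming D_max = (K - 1) / 2."""
    d_max = (n_categories - 1) / 2.0
    return 1.0 - leti_dispersion(ratings, n_categories) / d_max


def bootstrap_ci(ratings, n_categories, n_boot=2000, alpha=0.05, seed=0):
    """Naive percentile bootstrap over raters (an illustrative assumption)."""
    rng = random.Random(seed)
    n = len(ratings)
    stats = sorted(
        agreement([rng.choice(ratings) for _ in range(n)], n_categories)
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratings of one target by eight raters on a 5-point scale.
    example = [3, 3, 4, 3, 2, 3, 3, 4]
    print(agreement(example, 5))
    print(bootstrap_ci(example, 5))
```

Under these assumptions the value is 1 when all raters choose the same category and 0 when the ratings are split evenly between the two extreme categories.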

References

  1. Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Amin MB (2001) Interobserver reproducibility of Gleason grading of prostatic carcinoma: urologic pathologists. Hum Pathol 32(1):74–80

  2. Billingsley P (1995) Probability and measure, 3rd edn. Wiley, New York

  3. Booth JG, Butler RW, Hall P (1994) Bootstrap methods for finite populations. J Am Stat Assoc 89(428):1282–1289

  4. Bove G, Nuzzo E, Serafini A (2018) Measurement of interrater agreement for the assessment of language proficiency. In: Capecchi S, Di Iorio F, Simone R (eds) ASMOD 2018: proceedings of the advanced statistical modelling for ordinal data conference. Università Federico II di Napoli. FedOAPress, Napoli, pp 61–68

  5. Burke MJ, Finkelstein LM, Dusig MS (1999) On average deviation indices for estimating interrater agreement. Organ Res Methods 2:49–68

  6. Cohen A, Doveh E, Eick U (2001) Statistical properties of the rwg index of agreement. Psychol Methods 6(3):297–310

  7. Cumming A, Kantor R, Powers DE (2002) Decision making while rating ESL/EFL writing tasks: a descriptive framework. Mod Lang J 86:67–96

  8. Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26

  9. Grilli L, Rampichini C (2002) Scomposizione della dispersione per variabili statistiche ordinali [Dispersion decomposition for ordinal variables]. Statistica 62:111–116

  10. Gross S (1980) Median estimation in sample surveys. In: Proceedings of the section on survey research methods. American Statistical Association, pp 181–184

  11. James LR, Demaree RG, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69:85–98

  12. James LR, Demaree RG, Wolf G (1993) rwg: an assessment of within-group interrater agreement. J Appl Psychol 78:306–309

  13. James LR, Demaree RG, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69:85–98

  14. Kuiken F, Vedder I (2014) Rating written performance: What do raters do and why? Lang Test 31(3):329–348

  15. Kuiken F, Vedder I (2017) Functional adequacy in L2 writing: towards a new rating scale. Lang Test 34:321–336

  16. LeBreton JM, Burgess JRD, Kaiser RB, Atchley EK, James LR (2003) The restriction of variance hypothesis and interrater reliability and agreement: are ratings from multiple sources really dissimilar? Organ Res Methods 6:80–128

  17. LeBreton JM, James LR, Lindell MK (2005) Recent issues regarding rwg, r*wg, rwg(j), and r*wg(j). Organ Res Methods 8(1):128–138

  18. LeBreton JM, Senter JL (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852

  19. Leti G (1983) Statistica descrittiva. Il Mulino, Bologna

  20. Lindell MK, Brandt CJ (1997) Measuring interrater agreement for ratings of a single target. Appl Psychol Meas 21:271–278

  21. Lomnicki ZA (1952) The standard error of Gini’s mean difference. Ann Math Stat 23(14):635–637

  22. Mashreghi Z, Haziza D, Léger C (2016) A survey of bootstrap methods in finite population sampling. Stat Surv 10:1–52

  23. McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychol Methods 1:30–46

  24. Nuzzo E, Bove G (2020) Assessing functional adequacy across tasks: a comparison of learners and native speakers’ written texts. Euro Am J Appl Linguist Lang 7(2):9–27

  25. Piccarreta R (2001) A new measure of nominal-ordinal association. J Appl Stat 28(1):107–120

  26. Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53:683–690

  27. Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing reliability. Psychol Bull 86:420–428

  28. Thompson I (1991) Foreign accents revisited: factors relating to transfer of accent from the first language to a second language. Lang Speech 24(3):265–272

  29. von Eye A, Mun EY (2005) Analyzing rater agreement. Manifest variable methods. Lawrence Erlbaum Associates, Mahwah, New Jersey

Acknowledgements

The authors are grateful to the referees for very careful reading of the manuscript and thoughtful comments. We dedicate this paper to the memory of Professor Giovanni Battista Tranquilli for being a source of motivation with his scientific and human support.

Author information

Corresponding author

Correspondence to Giuseppe Bove.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bove, G., Conti, P.L. & Marella, D. A measure of interrater absolute agreement for ordinal categorical data. Stat Methods Appl (2020). https://doi.org/10.1007/s10260-020-00551-5

Keywords

  • Ordinal data
  • Interrater agreement
  • Resampling