Quality & Quantity, Volume 48, Issue 3, pp 1803–1815

Intercoder reliability indices: disuse, misuse, and abuse


Abstract

Although intercoder reliability is considered crucial to the validity of a content study, the choice among the many available indices has been controversial. This study analyzed all content studies reporting intercoder reliability that were published in two major communication journals, with the aim of finding out how scholars conduct intercoder reliability tests. The results revealed that over the past 30 years some intercoder reliability indices have been persistently misused with respect to the level of measurement, the number of coders, and the way reliability is reported. The implications of misuse, disuse, and abuse are discussed, and suggestions are offered for choosing appropriate indices in various situations.
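As an illustration of why the choice of index matters, the following minimal Python sketch (not taken from the article; the data and function names are hypothetical) reproduces the well-known "high agreement, low kappa" situation discussed by Feinstein and Cicchetti (1990): two coders agree on 90% of items, yet Cohen's kappa is near zero because one category dominates the marginals.

```python
# Hypothetical example: percent agreement vs. Cohen's kappa for two coders
# on nominal data. All data below are made up for illustration only.
from collections import Counter


def percent_agreement(c1, c2):
    """Proportion of items on which the two coders assign the same category."""
    return sum(a == b for a, b in zip(c1, c2)) / len(c1)


def cohens_kappa(c1, c2):
    """Cohen's (1960) kappa for two coders and nominal categories."""
    n = len(c1)
    p_o = percent_agreement(c1, c2)
    # Expected chance agreement from each coder's marginal category proportions.
    m1, m2 = Counter(c1), Counter(c2)
    p_e = sum((m1[k] / n) * (m2[k] / n) for k in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)


# 100 hypothetical items; category "A" is highly prevalent.
# Both coders agree on 90 "A" items, then each sees "B" where the other sees "A".
coder1 = ["A"] * 95 + ["B"] * 5
coder2 = ["A"] * 90 + ["B"] * 5 + ["A"] * 5

p_o = percent_agreement(coder1, coder2)
kappa = cohens_kappa(coder1, coder2)
print(f"percent agreement = {p_o:.2f}, Cohen's kappa = {kappa:.2f}")
# -> percent agreement = 0.90, Cohen's kappa = -0.05
```

Under these made-up marginals, raw agreement looks excellent while kappa suggests worse-than-chance reliability, which is one reason the article's question of when each index is appropriate is not merely cosmetic.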

Keywords

Intercoder reliability · Content analysis · Misuse


Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

School of Journalism and Communication, Jinan University, Guangzhou, China
