Abstract
Intercoder reliability is usually estimated with a summary index, yet the limitations of the index-based approach are well documented. This study critically reviewed the major existing modeling approaches to estimating intercoder reliability, tested them empirically, and compared them. Latent variable modeling, also called second-generation SEM, generally performed better than log-linear modeling; it was also able to explain the paradox haunting some indices and to identify the sources of disagreement among coders. Implications are discussed.
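The paradox mentioned above is not spelled out in the abstract. Assuming it refers to the familiar case in which a chance-corrected index such as Cohen's \(\kappa\) is low despite high raw agreement, a minimal sketch can make the problem concrete. The two contingency tables below are hypothetical illustrations, not data from the study, and `cohen_kappa` is a helper written for this example.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square coder-by-coder table of counts."""
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of units on the main diagonal.
    p_o = sum(table[i][i] for i in range(k)) / total
    # Marginal proportions for coder A (rows) and coder B (columns).
    row_m = [sum(table[i]) / total for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    # Agreement expected by chance under independent marginals.
    p_e = sum(r * c for r, c in zip(row_m, col_m))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tables: both show 90% raw agreement on 100 units.
balanced = [[45, 5], [5, 45]]  # categories used about equally often
skewed = [[90, 5], [5, 0]]     # one category dominates

kappa_balanced = cohen_kappa(balanced)
kappa_skewed = cohen_kappa(skewed)
print(f"balanced: {kappa_balanced:.3f}, skewed: {kappa_skewed:.3f}")
```

Both tables have the same observed agreement, but the skewed marginals push the chance-agreement term above the observed agreement, so the index drops below zero. A summary index alone cannot show why this happens, which is the kind of question a modeling approach is meant to address.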
Notes
Traditional SEM rests on several restrictive assumptions, namely multivariate normality, data missing completely at random, homogeneity of the population, and correct model specification (cf. Kline 2011).
This is also called “local or conditional independence” in the literature.
Hagen (2003) estimated the membership of objects by proposing an index called fuzzy \(\kappa \).
Measures and items are used interchangeably here. They may, however, differ from context to context.
Final class counts and proportions for the latent class patterns based on estimated posterior probabilities are very close.
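The note on "local or conditional independence" can be illustrated with a small sketch. Assuming it refers to the usual latent class assumption that coders' judgments are independent of one another once the latent class of a unit is known, the probability of a response pattern is a class-weighted sum of products of per-coder probabilities. All parameter values and names below (`class_weights`, `cond_probs`, `pattern_probability`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def pattern_probability(pattern, class_weights, cond_probs):
    """P(pattern) = sum over classes of weight * product of per-coder probabilities."""
    total = 0.0
    for weight, coder_probs in zip(class_weights, cond_probs):
        joint = weight
        # Local independence: within a class, multiply each coder's
        # probability of giving the observed code.
        for coder, code in enumerate(pattern):
            joint *= coder_probs[coder][code]
        total += joint
    return total


# Hypothetical two-class model with three coders and two codes (0 and 1).
class_weights = [0.6, 0.4]
cond_probs = [
    # Class 0: every coder mostly assigns code 0.
    [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]],
    # Class 1: every coder mostly assigns code 1.
    [[0.20, 0.80], [0.10, 0.90], [0.15, 0.85]],
]

patterns = list(product([0, 1], repeat=3))
probs = {p: pattern_probability(p, class_weights, cond_probs) for p in patterns}
for p in patterns:
    print(p, round(probs[p], 4))
```

The probabilities over all eight patterns sum to one. Marginally the coders' codes are correlated, but that association is carried entirely by the latent class, which is what the independence assumption states.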
References
Agresti, A.: An agreement model with kappa as parameter. Stat. Probab. Lett. 7(4), 271–273 (1989)
Agresti, A.: Modelling patterns of agreement and disagreement. Stat. Methods Med. Res. 1(2), 201–218 (1992). doi:10.1177/096228029200100205
Agresti, A.: Categorical Data Analysis. Wiley, New York (2002)
Agresti, A., Lang, J.B.: Quasi-symmetric latent class models, with application to rater agreement. Biometrics 49(1), 131–139 (1993) http://www.jstor.org/stable/2532608
Aickin, M.: Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen’s kappa. Biometrics 46(2), 293–302 (1990) http://www.jstor.org/stable/2531434
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974). doi:10.1109/TAC.1974.1100705
Anisimova, M., Gascuel, O.: Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst. Biol. 55(4), 539–552 (2006). doi:10.1080/10635150600755453
Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D.: Beyond kappa: a review of interrater agreement measures. Can. J. Stat. 27(1), 3–23 (1999)
Bishop, Y., Fienberg, S., Holland, P.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)
Caussinus, H.: Contribution à l’analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse, Sér. 4, 29, 77–183 (1965). doi:10.5802/afst.519
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960). doi:10.1177/001316446002000104
Conger, A.: Integration and generalization of kappas for multiple raters. Psychol. Bull. 88(2), 322–328 (1980). doi:10.1037/0033-2909.88.2.322
Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Cengage Learning, Mason, OH (2008)
De Ayala, R.: The Theory and Practice of Item Response Theory. The Guilford Press, New York (2009)
De Gruijter, D., van der Kamp, L.: Statistical Test Theory for the Behavioral Sciences, vol. 2. Chapman & Hall/CRC, Boca Raton (2007)
DeCarlo, L.T.: A latent class extension of signal detection theory, with applications. Multivar. Behav. Res. 37(4), 423–451 (2002). doi:10.1207/S15327906MBR3704_01
Dumenci, L.: The psychometric latent agreement model (PLAM) for discrete latent variables measured by multiple items. Organ. Res. Methods 14(1), 91–115 (2011). doi:10.1177/1094428110374649
Feng, G.C.: Factors affecting intercoder reliability: a Monte Carlo experiment. Qual. Quant. 47(5), 2959–2982 (2013a)
Feng, G.C.: Intercoder reliability indices: disuse, misuse, and abuse. Qual. Quant. 1–13 (2013b). doi:10.1007/s11135-013-9956-8
Feng, G.C.: Underlying determinants driving agreement among coders. Qual. Quant. 47(5), 2983–2997 (2013c)
Fleiss, J.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)
Goodman, L.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2), 215–231 (1974). doi:10.1093/biomet/61.2.215
Goodman, L., Magidson, J.: Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent-Structure Analysis. Abt Books, Cambridge, MA (1978)
Guggenmoos-Holzmann, I.: How reliable are chance-corrected measures of agreement? Stat. Med. 12(23), 2191–2205 (1993). doi:10.1002/sim.4780122305
Guggenmoos-Holzmann, I., Vonk, R.: Kappa-like indices of observer agreement viewed from a latent class perspective. Stat. Med. 17(8), 797–812 (1998)
Gwet, K.: Handbook of Inter-Rater Reliability: A Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters. Advanced Analytics, LLC, Gaithersburg, MD (2010)
Haberman, S.: Analysis of Qualitative Data, vol. 2. Academic Press, New York (1979)
Hagen, A.: Fuzzy set approach to assessing similarity of categorical maps. Int. J. Geogr. Inform. Sci. 17(3), 235 (2003)
Hallquist, M.: MplusAutomation: Automating Mplus Model Estimation and Interpretation (2011) http://cran.r-project.org/web/packages/MplusAutomation/MplusAutomation.pdf
Holmquist, N., McMahan, C., Williams, O., et al.: Variability in classification of carcinoma in situ of the uterine cervix. Arch. Pathol. 84(4), 334 (1967)
Kline, R.B.: Principles and Practice of Structural Equation Modeling. Guilford Press, New York (2011)
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications Inc, Thousand Oaks (2004)
Krippendorff, K.: Agreement and information in the reliability of coding. Commun. Methods Meas. 5(2), 93–112 (2011). doi:10.1080/19312458.2011.568376
Landis, J., Koch, G.: An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33(2), 363–374 (1977)
Lazarsfeld, P., Henry, N.: Latent Structure Analysis. Houghton Mifflin, Boston (1968)
Light, R.J.: Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol. Bull. 76(5), 365–377 (1971)
Linzer, D.A., Lewis, J.B.: poLCA: An R package for polytomous variable latent class analysis. J. Stat. Softw. 42(10), 1–29 (2011) http://www.jstatsoft.org/v42/i10/
Lo, Y., Mendell, N., Rubin, D.: Testing the number of components in a normal mixture. Biometrika 88(3), 767–778 (2001). doi:10.1093/biomet/88.3.767
Lombard, M., Snyder Duch, J.: Content analysis in mass communication: assessment and reporting of intercoder reliability. Hum. Commun. Res. 28(4), 587–604 (2002)
Lord, F., Novick, M., Birnbaum, A.: Statistical Theories of Mental Test Scores. Addison-Wesley, Don Mills (1968)
McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J. R. Stat. Soc. Ser. C (Appl. Stat.) 36(3), 318–324 (1987) http://www.jstor.org/stable/2347790
Muthén, B.: Latent variable mixture modeling. In: New Developments and Techniques in Structural Equation Modeling, chap. 1, pp. 1–33. Lawrence Erlbaum Associates, Mahwah, NJ (2001)
Muthén, B.: Second-generation structural equation modeling with a combination of categorical and continuous latent variables: new opportunities for latent class-latent growth modeling. In: Sayer, A.G. (ed.) New Methods for the Analysis of Change. Decade of Behavior, pp. 291–322. American Psychological Association, Washington, DC (2001b). doi:10.1037/10409-010
Muthén, B.: Beyond SEM: general latent variable modeling. Behaviormetrika 29(1), 81–118 (2002)
Muthén, B., Muthén, L.: Mplus version 6.1 [software] (2010)
Nelson, J.C., Pepe, M.S.: Statistical description of interrater variability in ordinal ratings. Stat. Methods Med. Res. 9(5), 475–496 (2000). doi:10.1177/096228020000900505
R Development Core Team: R: A language and environment for statistical computing (2011) http://www.R-project.org/, ISBN 3-900051-07-0
Raykov, T., Dimitrov, D.M., von Eye, A., Marcoulides, G.A.: Interrater agreement evaluation: a latent variable modeling approach. Educ. Psychol. Meas. 73(3), 512–531 (2013). doi:10.1177/0013164412449016
Reeve, B.: An introduction to modern measurement theory (2002) http://faculty.ksu.edu.sa/darandari/spss/IRT.pdf
Rost, J.: A logistic mixture distribution model for polychotomous item responses. Br. J. Math. Stat. Psychol. 44(1), 75–92 (1991). doi:10.1111/j.2044-8317.1991.tb00951.x
Schuster, C.: A mixture model approach to indexing rater agreement. Br. J. Math. Stat. Psychol. 55(2), 289–303 (2002). doi:10.1348/000711002760554598
Schuster, C., Smith, D.A.: Indexing systematic rater agreement with a latent-class model. Psychol. Methods 7(3), 384–395 (2002) http://www.sciencedirect.com/science/article/pii/S1082989X02001900
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978) http://www.jstor.org/stable/2958889
Sclove, S.: Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3), 333–343 (1987)
Shrout, P.: Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 7(3), 301–317 (1998)
Tanner, M.A., Young, M.A.: Modeling agreement among raters. J. Am. Stat. Assoc. 80(389), 175–180 (1985) http://www.jstor.org/stable/2288068
Uebersax, J.: Modeling approaches for the analysis of observer agreement. Investig. Radiol. 27(9), 738–743 (1992)
Uebersax, J., Grove, W.: A latent trait finite mixture model for the analysis of rating agreement. Biometrics 49(3), 823–835 (1993)
Uebersax, J.S.: Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Appl. Psychol. Meas. 23(4), 283–297 (1999). doi:10.1177/01466219922031400
Varki, S., Cooil, B., Rust, R.T.: Modeling fuzzy data in qualitative marketing research. J. Mark. Res. 37(4), 480–489 (2000) http://www.jstor.org/stable/1558516
Cite this article
Feng, G.C. Estimating intercoder reliability: a structural equation modeling approach. Qual Quant 48, 2355–2369 (2014). https://doi.org/10.1007/s11135-014-0034-7
DOI: https://doi.org/10.1007/s11135-014-0034-7