Estimating intercoder reliability: a structural equation modeling approach

Quality & Quantity

Abstract

Intercoder reliability is usually estimated with a summary index, yet the limitations of the indexing approach are well documented. This study critically reviewed all major existing modeling approaches to estimating intercoder reliability, and empirically tested and compared them. Latent variable modeling, also called second-generation SEM, was found to perform generally better than log-linear modeling, to explain the paradox haunting some indices, and to identify the sources of disagreement among coders. Implications are discussed in closing.
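The abstract does not say which paradox it means. As an illustration only, the sketch below assumes it is the familiar one in which Cohen's (1960) kappa drops sharply under skewed category prevalence even though raw agreement is unchanged. The function names and the two 2x2 tables are hypothetical and are not taken from the article.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Return (observed agreement, Cohen's kappa) for two coders' nominal codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty code lists")
    n = len(coder_a)
    # Observed agreement: share of units that both coders coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_e == 1:
        # Both coders used a single identical category; kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


def codes_from_table(both_yes, a_only, b_only, both_no):
    """Expand a 2x2 agreement table (hypothetical counts) into two code lists."""
    a = ["yes"] * (both_yes + a_only) + ["no"] * (b_only + both_no)
    b = ["yes"] * both_yes + ["no"] * a_only + ["yes"] * b_only + ["no"] * both_no
    return a, b


# Two made-up tables with the same raw agreement (90 of 100 units).
balanced = codes_from_table(45, 5, 5, 45)  # categories equally prevalent
skewed = codes_from_table(85, 5, 5, 5)     # "yes" dominates

for label, (a, b) in [("balanced", balanced), ("skewed", skewed)]:
    p_o, kappa = cohen_kappa(a, b)
    print(f"{label}: agreement={p_o:.2f}, kappa={kappa:.2f}")
# balanced: agreement=0.90, kappa=0.80
# skewed: agreement=0.90, kappa=0.44
```

Both tables show 90% raw agreement, but the skewed marginals raise the chance-agreement term from 0.50 to 0.82, which pulls kappa down from 0.80 to about 0.44. A summary index cannot show why this happens, which is the kind of limitation the modeling approaches reviewed here are meant to address.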

Notes

  1. Traditional SEM rests on many restrictive assumptions, such as multivariate normality, data missing completely at random, population homogeneity, and correct model specification (cf. Kline 2011).

  2. This is also called “local or conditional independence” in the literature.

  3. Hagen (2003) proposed an index called fuzzy \(\kappa\) to estimate the membership of objects.

  4. The terms "measures" and "items" are used interchangeably here, although they may differ in other contexts.

  5. Final class counts and proportions for the latent class patterns based on estimated posterior probabilities are very close.

  6. Aickin (1990), Gwet (2010), and others have tried to incorporate the difficulty of coding tasks into the calculation of intercoder reliability under the indexing approach.

References

  • Agresti, A.: An agreement model with kappa as parameter. Stat. Probab. Lett. 7(4), 271–273 (1989)

  • Agresti, A.: Modelling patterns of agreement and disagreement. Stat. Methods Med. Res. 1(2), 201–218 (1992). doi:10.1177/096228029200100205

  • Agresti, A.: Categorical Data Analysis. Wiley, New York (2002)

  • Agresti, A., Lang, J.B.: Quasi-symmetric latent class models, with application to rater agreement. Biometrics 49(1), 131–139 (1993) http://0-www.jstor.org/stable/2532608

  • Aickin, M.: Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen’s kappa. Biometrics 46(2), 293–302 (1990) http://www.jstor.org/stable/2531434

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974). doi:10.1109/TAC.1974.1100705

  • Anisimova, M., Gascuel, O.: Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst. Biol. 55(4), 539–552 (2006). doi:10.1080/10635150600755453

  • Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D.: Beyond kappa: a review of interrater agreement measures. Can. J. Stat. 27(1), 3–23 (1999)

  • Bishop, Y., Fienberg, S., Holland, P.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)

  • Caussinus, H.: Contribution à l’analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse, Sér. 4, 29, 77–183 (1965). doi:10.5802/afst.519

  • Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960). doi:10.1177/001316446002000104

  • Conger, A.: Integration and generalization of kappas for multiple raters. Psychol. Bull. 88(2), 322–328 (1980). doi:10.1037/0033-2909.88.2.322

  • Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Cengage Learning, Mason, OH (2008)

  • De Ayala, R.: The Theory and Practice of Item Response Theory. The Guilford Press, New York (2009)

  • De Gruijter, D.N.M., van der Kamp, L.J.T.: Statistical Test Theory for the Behavioral Sciences, vol. 2. Chapman & Hall/CRC, Boca Raton (2007)

  • DeCarlo, L.T.: A latent class extension of signal detection theory, with applications. Multivar. Behav. Res. 37(4), 423–451 (2002). doi:10.1207/S15327906MBR3704_01

  • Dumenci, L.: The psychometric latent agreement model (PLAM) for discrete latent variables measured by multiple items. Organ. Res. Methods 14(1), 91–115 (2011). doi:10.1177/1094428110374649

  • Feng, G.C.: Factors affecting intercoder reliability: a Monte Carlo experiment. Qual. Quant. 47(5), 2959–2982 (2013a)

  • Feng, G.C.: Intercoder reliability indices: disuse, misuse, and abuse. Qual. Quant. 1–13 (2013b). doi:10.1007/s11135-013-9956-8

  • Feng, G.C.: Underlying determinants driving agreement among coders. Qual. Quant. 47(5), 2983–2997 (2013c)

  • Fleiss, J.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)

  • Goodman, L.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2), 215–231 (1974). doi:10.1093/biomet/61.2.215

  • Goodman, L., Magidson, J.: Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent-Structure Analysis. Abt Books, Cambridge, MA (1978)

  • Guggenmoos-Holzmann, I.: How reliable are chance-corrected measures of agreement? Stat. Med. 12(23), 2191–2205 (1993). doi:10.1002/sim.4780122305

  • Guggenmoos-Holzmann, I., Vonk, R.: Kappa-like indices of observer agreement viewed from a latent class perspective. Stat. Med. 17(8), 797–812 (1998)

  • Gwet, K.: Handbook of Inter-Rater Reliability-A Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters. Advanced Analytics, LLC, Gaithersburg, MD (2010)

  • Haberman, S.: Analysis of Qualitative Data, vol. 2. Academic Press, New York (1979)

  • Hagen, A.: Fuzzy set approach to assessing similarity of categorical maps. Int. J. Geogr. Inform. Sci. 17(3), 235 (2003)

  • Hallquist, M.: MplusAutomation: Automating Mplus Model Estimation and Interpretation (2011) http://cran.r-project.org/web/packages/MplusAutomation/MplusAutomation.pdf

  • Holmquist, N., McMahan, C., Williams, O., et al.: Variability in classification of carcinoma in situ of the uterine cervix. Arch. Pathol. 84(4), 334 (1967)

  • Kline, R.B.: Principles and Practice of Structural Equation Modeling. Guilford Press, New York (2011)

  • Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications Inc, Thousand Oaks (2004)

  • Krippendorff, K.: Agreement and information in the reliability of coding. Commun. Methods Meas. 5(2), 93–112 (2011). doi:10.1080/19312458.2011.568376

  • Landis, J., Koch, G.: An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33(2), 363–374 (1977)

  • Lazarsfeld, P., Henry, N.: Latent Structure Analysis. Houghton Mifflin, Boston (1968)

  • Light, R.J.: Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol. Bull. 76(5), 365–377 (1971)

  • Linzer, D.A., Lewis, J.B.: poLCA: An R package for polytomous variable latent class analysis. J. Stat. Softw. 42(10), 1–29 (2011) http://www.jstatsoft.org/v42/i10/

  • Lo, Y., Mendell, N., Rubin, D.: Testing the number of components in a normal mixture. Biometrika 88(3), 767–778 (2001). doi:10.1093/biomet/88.3.767

  • Lombard, M., Snyder-Duch, J.: Content analysis in mass communication: assessment and reporting of intercoder reliability. Hum. Commun. Res. 28(4), 587–604 (2002)

  • Lord, F., Novick, M., Birnbaum, A.: Statistical Theories of Mental Test Scores. Addison-Wesley, Don Mills (1968)

  • McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J. R. Stat. Soc. Ser. C (Appl. Stat.) 36(3), 318–324 (1987) http://www.jstor.org/stable/2347790

  • Muthén, B.: Latent variable mixture modeling. In: New Developments and Techniques in Structural Equation Modeling, chap. 1, pp. 1–33. Lawrence Erlbaum Associates, Mahwah, NJ (2001a)

  • Muthén, B.: Second-generation structural equation modeling with a combination of categorical and continuous latent variables: new opportunities for latent class-latent growth modeling. In: Sayer, A.G. (ed.) New Methods for the Analysis of Change. Decade of Behavior, pp. 291–322. American Psychological Association, Washington, DC (2001b). doi:10.1037/10409-010

  • Muthén, B.: Beyond SEM: general latent variable modeling. Behaviormetrika 29(1), 81–118 (2002)

  • Muthén, B., Muthén, L.: Mplus version 6.1 [software] (2010)

  • Nelson, J.C., Pepe, M.S.: Statistical description of interrater variability in ordinal ratings. Stat. Methods Med. Res. 9(5), 475–496 (2000). doi:10.1177/096228020000900505

  • R Development Core Team: R: A language and environment for statistical computing (2011) http://www.R-project.org/, ISBN 3-900051-07-0

  • Raykov, T., Dimitrov, D.M., von Eye, A., Marcoulides, G.A.: Interrater agreement evaluation: a latent variable modeling approach. Educ. Psychol. Meas. 73(3), 512–531 (2013). doi:10.1177/0013164412449016

  • Reeve, B.: An introduction to modern measurement theory (2002) http://faculty.ksu.edu.sa/darandari/spss/IRT.pdf

  • Rost, J.: A logistic mixture distribution model for polychotomous item responses. Br. J. Math. Stat. Psychol. 44(1), 75–92 (1991). doi:10.1111/j.2044-8317.1991.tb00951.x

  • Schuster, C.: A mixture model approach to indexing rater agreement. Br. J. Math. Stat. Psychol. 55(2), 289–303 (2002). doi:10.1348/000711002760554598

  • Schuster, C., Smith, D.A.: Indexing systematic rater agreement with a latent-class model. Psychol. Methods 7(3), 384–395 (2002) http://www.sciencedirect.com/science/article/pii/S1082989X02001900

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978) http://www.jstor.org/stable/2958889

  • Sclove, S.: Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3), 333–343 (1987)

  • Shrout, P.: Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 7(3), 301–317 (1998)

  • Tanner, M.A., Young, M.A.: Modeling agreement among raters. J. Am. Stat. Assoc. 80(389), 175–180 (1985) http://www.jstor.org/stable/2288068

  • Uebersax, J.: Modeling approaches for the analysis of observer agreement. Investig. Radiol. 27(9), 738–743 (1992)

  • Uebersax, J., Grove, W.: A latent trait finite mixture model for the analysis of rating agreement. Biometrics 49(3), 823–835 (1993)

  • Uebersax, J.S.: Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Appl. Psychol. Meas. 23(4), 283–297 (1999). doi:10.1177/01466219922031400

  • Varki, S., Cooil, B., Rust, R.T.: Modeling fuzzy data in qualitative marketing research. J. Mark. Res. 37(4), 480–489 (2000) http://www.jstor.org/stable/1558516

Author information

Corresponding author

Correspondence to Guangchao Charles Feng.

About this article

Cite this article

Feng, G.C. Estimating intercoder reliability: a structural equation modeling approach. Qual Quant 48, 2355–2369 (2014). https://doi.org/10.1007/s11135-014-0034-7

  • DOI: https://doi.org/10.1007/s11135-014-0034-7
