Estimating intercoder reliability: a structural equation modeling approach

Feng, Guangchao Charles

doi:10.1007/s11135-014-0034-7

Published: 20 May 2014

Quality & Quantity, volume 48, pages 2355–2369 (2014)

Guangchao Charles Feng¹

Abstract

Intercoder reliability is usually estimated with a summary index, yet the limitations of the indexing approach have been well noted. This study critically reviewed all the major existing modeling approaches to estimating intercoder reliability, and empirically tested and compared them. It was found that latent variable modeling, also called second-generation SEM, generally performs better than log-linear modeling, can explain the paradox haunting some indices, and can identify the sources of disagreement among coders. Finally, implications were discussed.
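As a rough illustration of why a summary index can mislead, the sketch below computes percent agreement and Cohen's kappa (Cohen 1960) for two made-up 2×2 coding tables. It assumes that the paradox mentioned above is the familiar one in which kappa drops sharply when the category marginals are skewed, even though observed agreement is unchanged. The tables and function names are hypothetical and are not taken from the article.

```python
def percent_agreement(table):
    """Share of units on which the two coders agree (diagonal of the table)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohens_kappa(table):
    """Cohen's kappa for a square table (rows: coder A, columns: coder B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = percent_agreement(table)
    # Marginal proportions of each category for each coder.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance under independent coders.
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: both tables have 90 agreements out of 100 units.
balanced = [[45, 5], [5, 45]]   # categories used about equally often
skewed = [[85, 5], [5, 5]]      # one category dominates

for name, table in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(percent_agreement(table), 3), round(cohens_kappa(table), 3))
```

Both tables show 90% raw agreement, yet kappa is much lower for the skewed table because the chance-agreement term grows with the imbalance of the marginals.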

Notes

Traditional SEM rests on several restrictive assumptions, namely multivariate normality, data missing completely at random, population homogeneity, and correct model specification (cf. Kline 2011).
This is also called “local or conditional independence” in the literature.
Hagen (2003) proposed an index called fuzzy \(\kappa\) to estimate the membership of objects.
Measures and items were used interchangeably here. They may, however, differ from context to context.
Final class counts and proportions for the latent class patterns based on estimated posterior probabilities are very close.
Aickin (1990), Gwet (2010) and some others have tried to incorporate the factor of the difficulty level of coding tasks into the calculation of intercoder reliability with the indexing approach.
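
The "local or conditional independence" assumption mentioned in the notes above can be written out for a generic latent class model. The notation here is an assumption made for illustration and is not taken from the article: \(Y_j\) is the code assigned by coder \(j\) (\(j = 1, \dots, J\)), \(X\) is a latent class with \(C\) categories, and \(\pi_c = P(X = c)\).

\[
P(Y_1 = y_1, \dots, Y_J = y_J) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} P(Y_j = y_j \mid X = c)
\]

Under this assumption the coders' judgments are independent of one another once the latent class of the coded unit is known, so any association among coders is attributed to the latent variable.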

References

Agresti, A.: An agreement model with kappa as parameter. Stat. Probab. Lett. 7(4), 271–273 (1989)
Agresti, A.: Modelling patterns of agreement and disagreement. Stat. Methods Med. Res. 1(2), 201–218 (1992). doi:10.1177/096228029200100205
Agresti, A.: Categorical Data Analysis. Wiley, New York (2002)
Agresti, A., Lang, J.B.: Quasi-symmetric latent class models, with application to rater agreement. Biometrics 49(1), 131–139 (1993) http://0-www.jstor.org/stable/2532608
Aickin, M.: Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen’s kappa. Biometrics 46(2), 293–302 (1990) http://www.jstor.org/stable/2531434
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974). doi:10.1109/TAC.1974.1100705
Anisimova, M., Gascuel, O.: Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst. Biol. 55(4), 539–552 (2006). doi:10.1080/10635150600755453
Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D.: Beyond kappa: a review of interrater agreement measures. Can. J. Stat. 27(1), 3–23 (1999)
Bishop, Y., Fienberg, S., Holland, P.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)
Caussinus, H.: Contribution à l’analyse statistique des tableaux de corrélation. Annales de la faculté des sciences de Toulouse Sér. 4(29), 77–183 (1965). doi:10.5802/afst.519
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960). doi:10.1177/001316446002000104
Conger, A.: Integration and generalization of kappas for multiple raters. Psychol. Bull. 88(2), 322–328 (1980). doi:10.1037/0033-2909.88.2.322
Crocker, L., Algina, J.: Introduction to Classical and Modern Test Theory. Cengage Learning, Mason, OH (2008)
De Ayala, R.: The Theory and Practice of Item Response Theory. The Guilford Press, New York (2009)
De Gruijter, D.N.M., van der Kamp, L.J.T.: Statistical Test Theory for the Behavioral Sciences, vol. 2. Chapman & Hall/CRC, Boca Raton (2007)
DeCarlo, L.T.: A latent class extension of signal detection theory, with applications. Multivar. Behav. Res. 37(4), 423–451 (2002). doi:10.1207/S15327906MBR3704_01
Dumenci, L.: The psychometric latent agreement model (PLAM) for discrete latent variables measured by multiple items. Organ. Res. Methods 14(1), 91–115 (2011). doi:10.1177/1094428110374649
Feng, G.C.: Factors affecting intercoder reliability: a Monte Carlo experiment. Qual. Quant. 47(5), 2959–2982 (2013a)
Feng, G.C.: Intercoder reliability indices: disuse, misuse, and abuse. Qual. Quant. 1–13 (2013b). doi:10.1007/s11135-013-9956-8
Feng, G.C.: Underlying determinants driving agreement among coders. Qual. Quant. 47(5), 2983–2997 (2013c)
Fleiss, J.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)
Goodman, L.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2), 215–231 (1974). doi:10.1093/biomet/61.2.215
Goodman, L., Magidson, J.: Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent-Structure Analysis. Abt Books, Cambridge, MA (1978)
Guggenmoos-Holzmann, I.: How reliable are chance-corrected measures of agreement? Stat. Med. 12(23), 2191–2205 (1993). doi:10.1002/sim.4780122305
Guggenmoos-Holzmann, I., Vonk, R.: Kappa-like indices of observer agreement viewed from a latent class perspective. Stat. Med. 17(8), 797–812 (1998)
Gwet, K.: Handbook of Inter-Rater Reliability: A Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters. Advanced Analytics, LLC, Gaithersburg, MD (2010)
Haberman, S.: Analysis of Qualitative Data, vol. 2. Academic Press, New York (1979)
Hagen, A.: Fuzzy set approach to assessing similarity of categorical maps. Int. J. Geogr. Inform. Sci. 17(3), 235 (2003)
Hallquist, M.: MplusAutomation: Automating Mplus Model Estimation and Interpretation (2011) http://cran.r-project.org/web/packages/MplusAutomation/MplusAutomation.pdf
Holmquist, N., McMahan, C., Williams, O., et al.: Variability in classification of carcinoma in situ of the uterine cervix. Arch. Pathol. 84(4), 334 (1967)
Kline, R.B.: Principles and Practice of Structural Equation Modeling. Guilford Press, New York (2011)
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications Inc, Thousand Oaks (2004)
Krippendorff, K.: Agreement and information in the reliability of coding. Commun. Methods Meas. 5(2), 93–112 (2011). doi:10.1080/19312458.2011.568376
Landis, J., Koch, G.: An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33(2), 363–374 (1977)
Lazarsfeld, P., Henry, N.: Latent Structure Analysis. Houghton Mifflin (1968)
Light, R.J.: Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol. Bull. 76(5), 365–377 (1971)
Linzer, D.A., Lewis, J.B.: poLCA: An R package for polytomous variable latent class analysis. J. Stat. Softw. 42(10), 1–29 (2011) http://www.jstatsoft.org/v42/i10/
Lo, Y., Mendell, N., Rubin, D.: Testing the number of components in a normal mixture. Biometrika 88(3), 767–778 (2001). doi:10.1093/biomet/88.3.767
Lombard, M., Snyder-Duch, J.: Content analysis in mass communication: assessment and reporting of intercoder reliability. Hum. Commun. Res. 28(4), 587–604 (2002)
Lord, F., Novick, M., Birnbaum, A.: Statistical Theories of Mental Test Scores. Addison-Wesley, Don Mills (1968)
McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J. R. Stat. Soc. Ser. C (Appl. Stat.) 36(3), 318–324 (1987) http://www.jstor.org/stable/2347790
Muthén, B.: Latent variable mixture modeling. In: New Developments and Techniques in Structural Equation Modeling, chap. 1, pp. 1–33. Lawrence Erlbaum Associates, Mahwah, NJ (2001a)
Muthén, B.: Second-generation structural equation modeling with a combination of categorical and continuous latent variables: new opportunities for latent class-latent growth modeling. In: Sayer, A.G. (ed.) New Methods for the Analysis of Change. Decade of Behavior, pp. 291–322. American Psychological Association, Washington, DC (2001b). doi:10.1037/10409-010
Muthén, B.: Beyond SEM: general latent variable modeling. Behaviormetrika 29(1), 81–118 (2002)
Muthén, B., Muthén, L.: Mplus version 6.1 [software] (2010)
Nelson, J.C., Pepe, M.S.: Statistical description of interrater variability in ordinal ratings. Stat. Methods Med. Res. 9(5), 475–496 (2000). doi:10.1177/096228020000900505
R Development Core Team: R: A language and environment for statistical computing (2011) http://www.R-project.org/, ISBN 3-900051-07-0
Raykov, T., Dimitrov, D.M., von Eye, A., Marcoulides, G.A.: Interrater agreement evaluation: a latent variable modeling approach. Educ. Psychol. Meas. 73(3), 512–531 (2013). doi:10.1177/0013164412449016
Reeve, B.: An introduction to modern measurement theory (2002) http://faculty.ksu.edu.sa/darandari/spss/IRT.pdf
Rost, J.: A logistic mixture distribution model for polychotomous item responses. Br. J. Math. Stat. Psychol. 44(1), 75–92 (1991). doi:10.1111/j.2044-8317.1991.tb00951.x
Schuster, C.: A mixture model approach to indexing rater agreement. Br. J. Math. Stat. Psychol. 55(2), 289–303 (2002). doi:10.1348/000711002760554598
Schuster, C., Smith, D.A.: Indexing systematic rater agreement with a latent-class model. Psychol. Methods 7(3), 384–395 (2002) http://www.sciencedirect.com/science/article/pii/S1082989X02001900
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978) http://www.jstor.org/stable/2958889
Sclove, S.: Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3), 333–343 (1987)
Shrout, P.: Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 7(3), 301–317 (1998)
Tanner, M.A., Young, M.A.: Modeling agreement among raters. J. Am. Stat. Assoc. 80(389), 175–180 (1985) http://www.jstor.org/stable/2288068
Uebersax, J.: Modeling approaches for the analysis of observer agreement. Investig. Radiol. 27(9), 738–743 (1992)
Uebersax, J., Grove, W.: A latent trait finite mixture model for the analysis of rating agreement. Biometrics 49(3), 823–835 (1993)
Uebersax, J.S.: Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Appl. Psychol. Meas. 23(4), 283–297 (1999). doi:10.1177/01466219922031400
Varki, S., Cooil, B., Rust, R.T.: Modeling fuzzy data in qualitative marketing research. J. Mark. Res. 37(4), 480–489 (2000) http://www.jstor.org/stable/1558516

Author information

Authors and Affiliations

School of Journalism & Communication, Jinan University, 601 West Huangpu Avenue, Tianhe District, Guangzhou, Guangdong, China
Guangchao Charles Feng

Corresponding author

Correspondence to Guangchao Charles Feng.

About this article

Cite this article

Feng, G.C. Estimating intercoder reliability: a structural equation modeling approach. Qual Quant 48, 2355–2369 (2014). https://doi.org/10.1007/s11135-014-0034-7

Published: 20 May 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11135-014-0034-7
