Abstract
A statistical model is proposed for analyzing the structure of agreement in a multidimensional agreement matrix that cross-tabulates the judgments of two raters on several nominal variables. The model is derived from the measurement error model of test theory and is expressed as a tree with a single divergence. Its parameters are the probabilities of agreement, a true score, and the errors of the raters. It is shown that the parameters can be estimated by the usual maximum likelihood approach and that the agreement probabilities serve as reliability measures. The model is extended to cases with more than three variables and three raters. Partial modifications of the model are discussed for ordered categorical variables and for the consistency matrix generated by two sets of answers to a questionnaire. An application example is presented for the consistency matrix.
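The abstract does not state the model's equations, so the following is only a rough illustration of what "a tree with a single divergence fitted by maximum likelihood" can look like. It assumes a deliberately simplified, hypothetical version with one nominal variable and two raters, A and B. With probability α both raters report the true category, drawn from a true-score distribution π. Otherwise the tree diverges and each rater reports a category independently from its own error distribution, e1 for A and e2 for B. The cell probabilities are then P(i, j) = α·π_i·[i = j] + (1 − α)·e1_i·e2_j. The branch taken for each subject is unobserved, so this sketch maximizes the likelihood with the EM algorithm, a common choice for processing tree models. This sketch uses EM; it is not necessarily the estimation procedure used in the article. The function names `cell_probs` and `fit_em` are illustrative only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cell_probs(alpha: float, pi: List[float], e1: List[float], e2: List[float]) -> Matrix:
    """Hypothetical single-divergence model:
    P(i, j) = alpha * pi_i * [i == j] + (1 - alpha) * e1_i * e2_j."""
    k = len(pi)
    return [[(alpha * pi[i] if i == j else 0.0) + (1.0 - alpha) * e1[i] * e2[j]
             for j in range(k)] for i in range(k)]


def log_likelihood(counts: Matrix, probs: Matrix) -> float:
    """Multinomial log-likelihood, omitting the constant term."""
    return sum(n * math.log(p)
               for row_n, row_p in zip(counts, probs)
               for n, p in zip(row_n, row_p) if n > 0)


def fit_em(counts: Matrix, n_iter: int = 5000) -> Tuple[float, List[float], List[float], List[float], List[float]]:
    """Fit the assumed model to a K x K agreement table by EM (K >= 3 for identifiability)."""
    k = len(counts)
    total = sum(map(sum, counts))
    # Uninformative starting values.
    alpha, pi = 0.5, [1.0 / k] * k
    e1, e2 = [1.0 / k] * k, [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        probs = cell_probs(alpha, pi, e1, e2)
        history.append(log_likelihood(counts, probs))
        # E-step: expected diagonal counts that came through the agreement branch.
        # Off-diagonal cells can only arise from the divergence branch.
        agree = [counts[i][i] * alpha * pi[i] / probs[i][i] for i in range(k)]
        diverge = [[counts[i][j] - (agree[i] if i == j else 0.0) for j in range(k)]
                   for i in range(k)]
        # M-step: closed-form maximizers of the expected complete-data likelihood.
        a_total = sum(agree)
        d_total = total - a_total
        alpha = a_total / total
        pi = [a / a_total for a in agree]
        e1 = [sum(diverge[i]) / d_total for i in range(k)]                      # rater A's margins
        e2 = [sum(diverge[i][j] for i in range(k)) / d_total for j in range(k)]  # rater B's margins
    return alpha, pi, e1, e2, history


if __name__ == "__main__":
    # Synthetic expected counts generated from known (made-up) parameters.
    true = (0.6, [0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.3, 0.3, 0.4])
    counts = [[1000 * p for p in row] for row in cell_probs(*true)]
    alpha, pi, e1, e2, _ = fit_em(counts)
    print(round(alpha, 3), [round(p, 3) for p in pi])
```

In this simplified setting the fitted α plays the role of an agreement probability. The off-diagonal cells have a rank-one structure that pins down e1 and e2, and the diagonal excess then determines α and π. That is why the sketch needs at least three categories.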
About this article
Cite this article
Kiyomi, F. Multivariate Structural Analysis of Agreement for Categorical Data. Behaviormetrika 30, 195–218 (2003). https://doi.org/10.2333/bhmk.30.195
DOI: https://doi.org/10.2333/bhmk.30.195