Abstract
Cohen's kappa coefficient is the most popular criterion for measuring the overall agreement between two raters who classify a set of subjects on a binary nominal scale. In this paper, we consider a unified Bayesian approach for testing hypotheses about kappa coefficients under order constraints, for ratings from more than two studies with binary responses. A Markov chain Monte Carlo (MCMC) approach is used for model implementation. The approach is illustrated with simulation studies and applied to a real data set.
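For readers unfamiliar with the agreement criterion the paper builds on, the following is a minimal sketch of Cohen's kappa for two raters and a binary rating, computed from a 2×2 contingency table. This illustrates only the classical point estimate; the paper's contribution, Bayesian testing of order-constrained hypotheses on several kappas via MCMC, goes beyond it. The function name and the example counts are illustrative, not from the paper.

```python
def cohens_kappa(table):
    """Cohen's kappa from a 2x2 table.

    table[i][j] = number of subjects rated category i by rater 1
    and category j by rater 2.
    """
    n = sum(sum(row) for row in table)
    # Observed proportion of agreement (diagonal of the table).
    p_o = sum(table[i][i] for i in range(2)) / n
    # Marginal rating proportions for each rater.
    row = [sum(table[i]) / n for i in range(2)]
    col = [sum(table[i][j] for i in range(2)) / n for j in range(2)]
    # Agreement expected by chance under independent raters.
    p_e = sum(row[i] * col[i] for i in range(2))
    # Kappa: chance-corrected agreement.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: 100 subjects, 85 rated identically by both raters.
kappa = cohens_kappa([[40, 5], [10, 45]])  # -> 0.7
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the order-constrained hypotheses in the paper compare such coefficients across multiple studies.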
Ganjali, M., Moradzadeh, N. & Baghfalaki, T. Bayesian testing of agreement criteria under order constraints. J. Korean Stat. Soc. 46, 78–87 (2017). https://doi.org/10.1016/j.jkss.2016.06.004