Sklar’s Omega: A Gaussian copula-based framework for assessing agreement

Statistics and Computing

Abstract

The statistical measurement of agreement—the most commonly used form of which is inter-coder agreement (also called inter-rater reliability), i.e., consistency of scoring among two or more coders for the same units of analysis—is important in a number of fields, e.g., content analysis, education, computational linguistics, sports. We propose Sklar’s Omega, a Gaussian copula-based framework for measuring not only inter-coder agreement but also intra-coder agreement, inter-method agreement, and agreement relative to a gold standard. We demonstrate the efficacy and advantages of our approach by applying both Sklar’s Omega and Krippendorff’s Alpha (a well-established nonparametric agreement coefficient) to simulated data, to nominal data previously analyzed by Krippendorff, and to continuous data from an imaging study of hip cartilage in femoroacetabular impingement. Application of our proposed methodology is supported by our open-source R package, sklarsomega, which is available for download from the Comprehensive R Archive Network. The package permits users to apply the Omega methodology to nominal scores, ordinal scores, percentages, counts, amounts (i.e., non-negative real numbers), and balances (i.e., any real number); and can accommodate any number of units, any number of coders, and missingness. Classical inference is available for all levels of measurement while Bayesian inference is available for continuous outcomes only.

References

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)

  • Altman, D.G., Bland, J.M.: Measurement in medicine: The analysis of method comparison studies. The Statistician 32(3), 307–317 (1983)

  • Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)

  • Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D.: Beyond kappa: A review of interrater agreement measures. Canadian Journal of Statistics 27(1), 3–23 (1999)

  • Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications Through Limited-Response Questioning. Public Opin. Q. 18(3), 303–308 (1954)

  • Burgert, C., Rüschendorf, L.: On the optimal risk allocation problem. Statistics & Decisions 24(1/2006), 153–171 (2006)

  • Burnham, K.P., Anderson, D.R., Huyvaert, K.P.: AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behav. Ecol. Sociobiol. 65(1), 23–35 (2011)

  • Byrd, R., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16(5), 1190–1208 (1995)

  • Chen, X., Fan, Y., Tsyrennikov, V.: Efficient estimation of semiparametric multivariate copula models. Technical Report 04-W20. Vanderbilt University, Nashville, TN (2004)

  • Chrisman, N.R.: Rethinking levels of measurement for cartography. Cartography and Geographic Information Systems 25(4), 231–242 (1998)

  • Cicchetti, D.V., Feinstein, A.R.: High agreement but low kappa: II. resolving the paradoxes. J. Clin. Epidemiol. 43(6), 551–558 (1990)

  • Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Measur. 20(1), 37–46 (1960)

  • Cohen, J.: Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70(4), 213–220 (1968)

  • Conger, A.J.: Integration and generalization of kappas for multiple raters. Psychol. Bull. 88(2), 322 (1980)

  • Conway, R.W., Maxwell, W.L.: Network dispatching by the shortest-operation discipline. Oper. Res. 10(1), 51–73 (1962)

  • Davies, M., Fleiss, J.L.: Measuring agreement for multinomial data. Biometrics, pp. 1047–1051 (1982)

  • Davison, A.C., Hinkley, D.V.: Bootstrap Methods and their Application, vol. 1. Cambridge University Press, Cambridge (1997)

  • Eddelbuettel, D., Francois, R.: Rcpp: Seamless R and C++ integration. J. Stat. Softw. 40(8), 1–18 (2011)

  • Feinstein, A.R., Cicchetti, D.V.: High agreement but low kappa: I. the problems of two paradoxes. J. Clin. Epidemiol. 43(6), 543–549 (1990)

  • Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York (1967)

  • Fernholz, L.T.: Almost sure convergence of smoothed empirical distribution functions. Scand. J. Stat. 18(3), 255–262 (1991)

  • Flegal, J.M., Haran, M., Jones, G.L.: Markov chain Monte Carlo: Can we trust the third significant figure? Stat. Sci. 23(2), 250–260 (2008)

  • Flegal, J.M., Hughes, J., Vats, D., Dai, N., Gupta, K., Maji, U.: mcmcse: Monte Carlo Standard Errors for MCMC. Riverside, CA, Kanpur, India (2021). (R package version 1.5-0)

  • Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)

  • Furrer, R., Sain, S.R.: spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields. J. Stat. Softw. 36(10), 1–25 (2010)

  • Genest, C., Neslehova, J.: A primer on copulas for count data. Astin Bulletin 37(2), 475 (2007)

  • Genz, A.: Numerical computation of multivariate normal probabilities. J. Comput. Graph. Stat. 1(2), 141–149 (1992)

  • Geyer, C.J.: Le Cam made simple: Asymptotics of maximum likelihood without the LLN or CLT or sample size going to infinity. In: Jones, G.L., Shen, X. (eds.) Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton, Institute of Mathematical Statistics, Beachwood, Ohio, USA (2013)

  • Gilbert, P., Varadhan, R.: numDeriv: Accurate Numerical Derivatives. R package version 2016.8-1.1 (2019)

  • Godambe, V.: An optimum property of regular maximum likelihood estimation. Ann. Math. Stat. 31(4), 1208–1211 (1960)

  • Gwet, K.L.: Computing inter-rater reliability and its variance in the presence of high agreement. Br. J. Math. Stat. Psychol. 61(1), 29–48 (2008)

  • Gwet, K.L.: Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 4th edn. Advanced Analytics, LLC, Gaithersburg, MD (2014)

  • Gwet, K.L.: Testing the difference of correlated agreement coefficients for statistical significance. Educ. Psychol. Measur. 76(4), 609–637 (2016)

  • Han, Z., De Oliveira, V.: On the correlation structure of Gaussian copula models for geostatistical count data. Australian & New Zealand Journal of Statistics 58(1), 47–69 (2016)

  • Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)

  • Henn, L.L.: Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data. Computational Statistics, pp. 1–38 (2021)

  • Henn, L.L., Hughes, J., Iisakka, E., Ellermann, J., Mortazavi, S., Ziegler, C., Nissi, M.J., Morgan, P.: Disease severity classification using quantitative magnetic resonance imaging data of cartilage in femoroacetabular impingement. Stat. Med. 36(9), 1491–1505 (2017)

  • Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)

  • Hooke, R., Jeeves, T.A.: Direct search solution of numerical and statistical problems. J. ACM 8(2), 212–229 (1961)

  • Huang, A.: Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts. Stat. Model. 17(6), 359–380 (2017)

  • Hughes, J.: krippendorffsalpha: An R package for measuring agreement using Krippendorff’s Alpha coefficient. The R Journal 13(1), 413–425 (2021)

  • Hughes, J.: On the occasional exactness of the distributional transform approximation for direct Gaussian copula models with discrete margins. Statistics & Probability Letters 177, 109159 (2021)

  • Ihaka, R., Gentleman, R.: R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996)

  • Kazianka, H.: Approximate copula-based estimation and prediction of discrete spatial data. Stoch. Env. Res. Risk Assess. 27(8), 2015–2026 (2013)

  • Kazianka, H., Pilz, J.: Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch. Env. Res. Risk Assess. 24(5), 661–673 (2010)

  • Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)

  • Klaassen, C.A., Wellner, J.A.: Efficient estimation in the bivariate normal copula model: Normal margins are least favourable. Bernoulli 3(1), 55–77 (1997)

  • Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. Sage, Los Angeles (2012)

  • Krippendorff, K.: Computing Krippendorff’s alpha-reliability. Technical report, University of Pennsylvania (2013)

  • Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, pp. 159–174 (1977)

  • Lindsay, B.: Composite likelihood methods. Contemp. Math. 80(1), 221–239 (1988)

  • Liu, H., Lafferty, J., Wasserman, L.: The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10(Oct), 2295–2328 (2009)

  • Morgan, P., Nissi, M.J., Hughes, J., Mortazavi, S., Ellermann, J.: T2* mapping provides information that is statistically comparable to an arthroscopic evaluation of acetabular cartilage. Cartilage 9(3), 237–240 (2018)

  • Mosteller, F., Tukey, J.: Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley series in behavioral science, Addison-Wesley Publishing Company (1977)

  • Musgrove, D., Hughes, J., Eberly, L.: Hierarchical copula regression models for areal data. Spatial Statistics 17, 38–49 (2016)

  • Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)

  • Nissi, M.J., Mortazavi, S., Hughes, J., Morgan, P., Ellermann, J.: T2* relaxation time of acetabular and femoral cartilage with and without intra-articular Gd-DTPA2 in patients with femoroacetabular impingement. Am. J. Roentgenol. 204(6), W695 (2015)

  • Prentice, R.L.: Correlated binary regression with covariates specific to each binary observation. Biometrics, pp. 1033–1048 (1988)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2021)

  • Ribatet, M., Cooley, D., Davison, A.C.: Bayesian inference from composite likelihoods, with an application to spatial extremes. Statistica Sinica, pp. 813–845 (2012)

  • Rüschendorf, L.: Stochastically ordered distributions and monotonicity of the OC-function of sequential probability ratio tests. Statistics 12(3), 327–338 (1981)

  • Rüschendorf, L.: On the distributional transform, Sklar’s theorem, and the empirical copula process. J. Stat. Planning Inference 139(11), 3921–3927 (2009)

  • Scott, W.A.: Reliability of content analysis: The case of nominal scale coding. Public Opin. Q. 19, 321–325 (1955)

  • Sellers, K.F., Borle, S., Shmueli, G.: The COM-Poisson model for count data: a survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 28(2), 104–116 (2012)

  • Serfling, R., Mazumder, S.: Exponential probability inequality and convergence results for the median absolute deviation and its modifications. Statistics & Probability Letters 79(16), 1767–1773 (2009)

  • Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. J. Roy. Stat. Soc.: Ser. C (Appl. Stat.) 54(1), 127–142 (2005)

  • Singh, S., Póczos, B.: Nonparanormal information estimation. In: Precup, D., Teh, Y.W., (eds), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3210–3219. PMLR (2017)

  • Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229–231 (1959)

  • Smeeton, N.C.: Early history of the kappa statistic. Biometrics 41(3), 795–795 (1985)

  • Spearman, C.E.: The proof and measurement of association between two things. Am. J. Psychol. 15, 72–101 (1904)

  • Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A.: Bayesian measures of model complexity and fit. J. Royal Stat. Society: Series B (Statistical Methodology) 64(4), 583–639 (2002)

  • Stevens, S.S.: On the theory of scales of measurement. Science 103(2684), 677–680 (1946)

  • Szabó, Z., Póczos, B., Szirtes, G., Lőrincz, A.: Post nonlinear independent subspace analysis. In: International Conference on Artificial Neural Networks, pp. 677–686. Springer (2007)

  • Tierney, L., Rossini, A.J., Li, N., Sevcikova, H.: snow: Simple Network of Workstations. R package version 0.4-3 (2018)

  • Varadhan, R., Borchers, H.W., Bechard, V.: dfoptim: Derivative-Free Optimization. R package version 2020.10-1 (2020)

  • Varin, C.: On composite marginal likelihoods. AStA Advances in Statistical Analysis 92(1), 1–28 (2008)

  • Xue-Kun Song, P.: Multivariate dispersion models generated from Gaussian copula. Scand. J. Stat. 27(2), 305–320 (2000)

Author information

Corresponding author

Correspondence to John Hughes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Here we briefly introduce our R package, sklarsomega, version 3.0 of which is available for download from the Comprehensive R Archive Network.

R package sklarsomega

We introduce our R package by way of a brief usage example. Additional examples are provided in the package documentation.

We apply our Bayesian methodology to a subset of the cartilage data, assuming first a \(\textsc {Laplace}(\mu ,\sigma )\) and then a \(\textsc {T}(\nu ,\mu )\) marginal distribution. First we load the cartilage data, which are included in the package.

[Listing a: R code and output not reproduced]

We see that sampling terminated when 4,000 samples had been drawn, since that sample size yielded \(\widehat{\text {cv}}_j<0.01\) for \(j\in \{1,2,3\}\). As a second check we examine the plot given in Fig. 4, which shows the estimated posterior mean for \(\omega \) as a function of sample size. The estimate evidently stabilized after approximately 2,500 samples had been drawn.
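The stopping rule described above can be illustrated with a minimal sketch. It assumes that \(\widehat{\text {cv}}_j\) is the batch-means Monte Carlo standard error of the posterior-mean estimate divided by the absolute value of that estimate, and that the rule is checked after every 1,000 draws. Both are assumptions made for illustration, as are the names `batch_means_mcse`, `cv`, and `sample_until_stable`. The sketch is written in Python, uses independent draws as a stand-in for a sampler, and does not call sklarsomega.

```python
import math
import random
import statistics


def batch_means_mcse(chain):
    """Monte Carlo standard error of the chain mean via non-overlapping batch means."""
    n = len(chain)
    b = int(math.sqrt(n))   # batch size, roughly sqrt(n)
    a = n // b              # number of complete batches
    means = [statistics.fmean(chain[k * b:(k + 1) * b]) for k in range(a)]
    # b times the sample variance of the batch means estimates the
    # asymptotic variance of the chain mean.
    sigma2 = b * statistics.variance(means)
    return math.sqrt(sigma2 / (a * b))


def cv(chain):
    """Estimated coefficient of variation of the posterior-mean estimate.

    Assumes the posterior mean is not (numerically) zero.
    """
    return batch_means_mcse(chain) / abs(statistics.fmean(chain))


def sample_until_stable(draw, threshold=0.01, step=1000, max_n=100_000):
    """Draw in blocks of `step` until cv < threshold or max_n draws are reached."""
    chain = []
    while len(chain) < max_n:
        chain.extend(draw() for _ in range(step))
        if cv(chain) < threshold:
            break
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for posterior draws of one parameter.
    chain = sample_until_stable(lambda: rng.gauss(0.9, 0.05))
    print(len(chain), cv(chain))
```

With several parameters, the same check would be applied to each chain and sampling would stop only once every estimated coefficient of variation falls below the threshold.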

Fig. 4 A plot of estimated posterior mean versus sample size for \(\omega \), having assumed a Laplace marginal distribution

The proposal standard deviations (1 for \(\mu \), 0.1 for \(\sigma \), and 0.2 for \(\omega \)) led to sensible acceptance rates of 40%, 60%, and 67%.
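How a proposal standard deviation drives the acceptance rate can be seen in a small sketch. It assumes a Gaussian random-walk Metropolis update for a single parameter, which the mention of proposal standard deviations suggests but this appendix does not spell out. The standard normal target and the function name `rw_metropolis` are hypothetical, and the code is Python that does not use sklarsomega.

```python
import math
import random


def rw_metropolis(log_target, init, proposal_sd, n_iter, rng):
    """Gaussian random-walk Metropolis; returns the draws and the acceptance rate."""
    x = init
    lp = log_target(x)
    draws = []
    accepted = 0
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, proposal_sd)
        lp_cand = log_target(cand)
        # Accept with probability min(1, target(cand) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
            accepted += 1
        draws.append(x)
    return draws, accepted / n_iter


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant (toy target)."""
    return -0.5 * x * x


if __name__ == "__main__":
    rng = random.Random(0)
    for sd in (0.1, 1.0, 10.0):
        _, rate = rw_metropolis(log_std_normal, 0.0, sd, 5000, rng)
        print(sd, rate)
```

Small proposal standard deviations give high acceptance rates but slow exploration, and large ones give the reverse, so in practice the standard deviations are tuned until the rates are moderate.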

[Listing b: R code and output not reproduced]

For a t marginal distribution only 3,000 samples were required.

[Listing c: R code and output not reproduced]

Note that the Laplace model yielded a much smaller value of DIC, and hence a very small relative likelihood for the t model.
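The link between a DIC difference and a relative likelihood can be made concrete with a short sketch. It assumes the relative likelihood of model \(i\) is \(\exp \{(\text {DIC}_{\min }-\text {DIC}_i)/2\}\), by analogy with the AIC-based quantity discussed by Burnham et al. (2011). The function name `relative_likelihoods` and the DIC values in the usage lines are hypothetical and are not the values obtained for the cartilage data.

```python
import math


def relative_likelihoods(dic):
    """Map each model name to exp((DIC_min - DIC_i) / 2)."""
    best = min(dic.values())
    return {name: math.exp((best - value) / 2.0) for name, value in dic.items()}


if __name__ == "__main__":
    # Hypothetical DIC values, for illustration only.
    print(relative_likelihoods({"laplace": 100.0, "t": 110.0}))
```

Under this definition the best model has relative likelihood 1, and the value decays exponentially as a competing model's DIC exceeds the minimum.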

[Listing d: R code and output not reproduced]

Package sklarsomega supports much additional functionality, e.g., plotting, simulation, and influence statistics. Computational efficiency comes from our use of sparse-matrix routines (Furrer and Sain 2010) and, for the CML method, Fortran code for computing multivariate normal probabilities (Genz 1992). Future versions of the package will employ C++ (Eddelbuettel and Francois 2011).

About this article

Cite this article

Hughes, J. Sklar’s Omega: A Gaussian copula-based framework for assessing agreement. Stat Comput 32, 46 (2022). https://doi.org/10.1007/s11222-022-10105-2

  • DOI: https://doi.org/10.1007/s11222-022-10105-2
