A copula-based method of classifying individuals into binary disease categories using dependent biomarkers

Abstract

Classification of a disease often depends on more than one test, and the tests can be interrelated. Under the incorrect assumption of independence, test results based on dependent biomarkers can lead to conflicting disease classifications. We develop a copula-based classification method that takes dependency into account and leads to a unique decision. We first construct the joint probability distribution of the biomarkers using Frank’s, Clayton’s and Gumbel’s copulas. We then develop the classification method and perform a comprehensive simulation. Using simulated data sets, we study the statistical properties of the joint probability distributions and determine the joint threshold with maximum classification accuracy. Our simulation results show that parameter estimates for the copula-based bivariate distributions are not biased. We observe that the thresholds for disease classification converge to a stationary distribution across different choices of copulas. We also observe that the classification accuracy decreases as the dependence parameter of the copula increases. Finally, we illustrate our method with a real data example, in which we identify the joint threshold of the Apolipoprotein B to Apolipoprotein A1 ratio and the total cholesterol to high-density lipoprotein ratio for the classification of myocardial infarction. We conclude that the copula-based method works well in identifying the joint threshold of two dependent biomarkers for an outcome classification. Our method is flexible and allows modeling of broad classes of bivariate distributions that take dependency into account. The threshold may allow clinicians to uniquely classify individuals at risk of developing the disease and to plan for early intervention.
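The workflow summarized in the abstract (build a dependent bivariate distribution through a copula, simulate cases and controls, then search for the joint threshold with maximum classification accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: it uses only the Python standard library, it samples from Frank's copula by conditional inversion, it uses exponential marginals (the Gamma distribution with shape 1) because their quantile function has a closed form, and it classifies an individual as diseased only when both markers exceed their thresholds. The function names, parameter values and threshold grid are hypothetical.

```python
import math
import random


def frank_pair(theta, rng):
    """Draw (u, v) from Frank's copula by conditional inversion (theta != 0)."""
    u, w = rng.random(), rng.random()
    d = math.exp(-theta) - 1.0
    a = math.exp(-theta * u)
    # Solve dC(u, v)/du = w for v.
    v = -math.log(1.0 + w * d / (w + (1.0 - w) * a)) / theta
    # Clamp to [0, 1) so the marginal quantile below never sees log(0).
    return u, min(max(v, 0.0), 1.0 - 1e-12)


def exp_quantile(p, scale):
    """Inverse CDF of Gamma(shape=1, scale), i.e. the exponential distribution."""
    return -scale * math.log(1.0 - p)


def simulate(n, theta, scales, rng):
    """Simulate n dependent biomarker pairs with the given marginal scales."""
    sample = []
    for _ in range(n):
        u, v = frank_pair(theta, rng)
        sample.append((exp_quantile(u, scales[0]), exp_quantile(v, scales[1])))
    return sample


def accuracy(cases, controls, c1, c2):
    """Accuracy of the assumed rule: diseased if x > c1 and y > c2."""
    true_pos = sum(1 for x, y in cases if x > c1 and y > c2)
    true_neg = sum(1 for x, y in controls if not (x > c1 and y > c2))
    return (true_pos + true_neg) / (len(cases) + len(controls))


def best_joint_threshold(cases, controls, grid1, grid2):
    """Grid search; returns (accuracy, c1, c2) with maximum accuracy."""
    return max(
        (accuracy(cases, controls, c1, c2), c1, c2)
        for c1 in grid1
        for c2 in grid2
    )


# Hypothetical settings: same dependence in both groups, larger scales in cases.
rng = random.Random(2020)
controls = simulate(500, 5.0, (1.0, 1.0), rng)
cases = simulate(500, 5.0, (4.0, 4.0), rng)
grid = [0.25 * k for k in range(1, 25)]
acc, c1, c2 = best_joint_threshold(cases, controls, grid, grid)
print(f"joint threshold = ({c1:.2f}, {c2:.2f}), accuracy = {acc:.3f}")
```

Re-running the sketch with a different value of `theta` gives a rough feel for how the strength of dependence interacts with the achievable accuracy under this particular rule. A fuller treatment would use general gamma marginals and estimate the copula parameter from data.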

Acknowledgements

SI would like to acknowledge help and support received from the Population Health Research Institute (PHRI) and the Population Genomic Program (PGP) at McMaster University. JB holds the John D. Cameron Endowed Chair in the Genetic Determinants of Chronic Diseases, McMaster University. JB would like to acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR). The authors are grateful to two anonymous reviewers for their helpful comments that greatly contributed to improving the final version of the article.

Author information

Corresponding author

Correspondence to Joseph Beyene.

Electronic supplementary material

Supplementary material 1 (txt 20 KB)

Appendix

See Tables 7 and 8 and Figs. 6, 7, 8 and 9.

Table 7 Relative bias (in percent) and MSE based on 1000 simulated samples, each of size 1000, using Frank’s, Clayton’s and Gumbel’s copulas with gamma marginals

Table 8 Degree of dependence based on 1000 samples, each of size 1000, using Frank’s, Clayton’s and Gumbel’s copulas with gamma marginals

Fig. 6 Univariate density plots of a simulated sample for CK and cTn, shown separately for cases and controls

Fig. 7 Bivariate density plot of CK and cTn based on a simulated sample using Frank’s copula, by case and control status

Fig. 8 Contour plot of the joint probability distribution of CK and cTn based on a simulated sample using Frank’s copula, by case and control status

Fig. 9 Bivariate density plot of the ApoB to ApoA1 ratio and the TC to HDL ratio using Frank’s and Gumbel’s copulas, by case or control status. The arrow in each plot indicates the joint threshold identified with maximum classification accuracy
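For readers without access to the table bodies, the summary measures named in the captions of Tables 7 and 8 can be written out as follows. This is a sketch under the assumption that the usual simulation-study definitions apply. The symbols (true parameter \theta, estimate \hat\theta_r from the r-th simulated sample, R = 1000 samples) are introduced here for illustration, and the use of Kendall's \tau as the degree of dependence is likewise an assumption rather than something stated in the captions.

```latex
% Relative bias (in percent) and mean squared error over R simulated samples
\[
\text{Relative bias (\%)} = 100 \times \frac{\bar{\hat\theta} - \theta}{\theta},
\qquad
\bar{\hat\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat\theta_r,
\qquad
\text{MSE} = \frac{1}{R} \sum_{r=1}^{R} \bigl(\hat\theta_r - \theta\bigr)^2 .
\]

% Kendall's tau as a function of the copula parameter theta
\[
\tau_{\text{Clayton}} = \frac{\theta}{\theta + 2},
\qquad
\tau_{\text{Gumbel}} = 1 - \frac{1}{\theta},
\qquad
\tau_{\text{Frank}} = 1 - \frac{4}{\theta}\bigl[1 - D_1(\theta)\bigr],
\quad
D_1(\theta) = \frac{1}{\theta} \int_0^{\theta} \frac{t}{e^{t} - 1}\, dt .
\]
```

Under these definitions, a relative bias near zero together with a small MSE would correspond to the abstract's statement that the parameter estimates are not biased.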

About this article

Cite this article

Islam, S., Anand, S., Hamid, J. et al. A copula-based method of classifying individuals into binary disease categories using dependent biomarkers. Stat Methods Appl 29, 871–897 (2020). https://doi.org/10.1007/s10260-020-00507-9

Keywords

  • Copula
  • Gamma distribution
  • Biomarker
  • Sensitivity
  • Specificity
  • AUC