Classification of a disease often depends on more than one test, and the tests can be interrelated. Under the incorrect assumption of independence, test results based on dependent biomarkers can lead to conflicting disease classifications. We develop a copula-based classification method that takes dependency into account and leads to a unique decision. We first construct the joint probability distribution of the biomarkers using Frank’s, Clayton’s and Gumbel’s copulas. We then develop the classification method and perform a comprehensive simulation. Using simulated data sets, we study the statistical properties of the joint probability distributions and determine the joint threshold with maximum classification accuracy. Our simulation results show that parameter estimates for the copula-based bivariate distributions are not biased. We observe that the thresholds for disease classification converge to a stationary distribution across different choices of copulas. We also observe that the classification accuracy decreases as the dependence parameter of the copula increases. Finally, we illustrate our method with a real data example, where we identify the joint threshold of the Apolipoprotein B to Apolipoprotein A1 ratio and the total cholesterol to high-density lipoprotein ratio for the classification of myocardial infarction. We conclude that the copula-based method works well in identifying the joint threshold of two dependent biomarkers for an outcome classification. Our method is flexible and allows modeling broad classes of bivariate distributions that take dependency into account. The threshold may allow clinicians to uniquely classify individuals at risk of developing the disease and to plan for early intervention.
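To make the idea of a joint threshold concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Clayton copula with a fixed dependence parameter, exponential margins (a gamma distribution with shape 1, chosen only because its quantile function is closed form), and a hypothetical decision rule that labels an individual as diseased when both biomarkers exceed their cutoffs. The function names, group means, sample sizes, and grid are all illustrative assumptions; the joint threshold is simply the grid point with the highest classification accuracy on the simulated data.

```python
import math
import random

EPS = 1e-12  # keeps uniforms strictly inside (0, 1)

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_pair(theta, rng):
    """Draw one (u, v) pair from a Clayton copula by conditional inversion."""
    u = min(max(rng.random(), EPS), 1.0 - EPS)
    w = min(max(rng.random(), EPS), 1.0 - EPS)
    # Solve dC/du (u, v) = w for v.
    v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def exp_quantile(p, mean):
    """Inverse CDF of an exponential margin (assumed margin, gamma with shape 1)."""
    return -mean * math.log1p(-min(p, 1.0 - EPS))

def simulate(n, theta, means, rng):
    """Simulate n dependent biomarker pairs with the given marginal means."""
    pairs = []
    for _ in range(n):
        u, v = clayton_pair(theta, rng)
        pairs.append((exp_quantile(u, means[0]), exp_quantile(v, means[1])))
    return pairs

def best_joint_threshold(healthy, diseased, grid):
    """Grid search for the cutoff pair (c1, c2) with maximum accuracy.

    Hypothetical rule: classify as diseased when x > c1 and y > c2.
    """
    best = (None, -1.0)
    total = len(healthy) + len(diseased)
    for c1 in grid:
        for c2 in grid:
            tp = sum(1 for x, y in diseased if x > c1 and y > c2)
            tn = sum(1 for x, y in healthy if not (x > c1 and y > c2))
            acc = (tp + tn) / total
            if acc > best[1]:
                best = ((c1, c2), acc)
    return best

rng = random.Random(0)  # fixed seed so the sketch is reproducible
healthy = simulate(1000, 2.0, (1.0, 1.0), rng)   # illustrative means
diseased = simulate(1000, 2.0, (3.0, 3.0), rng)  # illustrative means
grid = [0.25 * k for k in range(1, 21)]
(c1, c2), accuracy = best_joint_threshold(healthy, diseased, grid)
print(f"joint threshold = ({c1:.2f}, {c2:.2f}), accuracy = {accuracy:.3f}")
```

Re-running the sketch with a different `theta` shows how the selected cutoffs and the achieved accuracy react to the strength of dependence. Swapping `clayton_pair` for a Frank or Gumbel sampler would be the analogous experiment for the other two copula families named above.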
SI would like to acknowledge help and support received from the Population Health Research Institute (PHRI) and the Population Genomic Program (PGP) at McMaster University. JB holds the John D. Cameron Endowed Chair in the Genetic Determinants of Chronic Diseases, McMaster University. JB would like to acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR). The authors are grateful to two anonymous reviewers for their helpful comments, which greatly contributed to improving the final version of the article.
Cite this article
Islam, S., Anand, S., Hamid, J. et al. A copula-based method of classifying individuals into binary disease categories using dependent biomarkers. Stat Methods Appl 29, 871–897 (2020). https://doi.org/10.1007/s10260-020-00507-9
- Gamma distribution