Abstract
The choice of variable-selection method for identifying important variables in binary classification modeling is critical for producing stable statistical models that are interpretable, generate accurate predictions, and have minimal bias. This work is motivated by the availability of data on clinical and laboratory features of dengue fever infections obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. We use an objective Bayesian method to identify important variables for dengue hemorrhagic fever (DHF) in the dengue data set. With the important variables selected by the objective Bayesian method, we employ a Gaussian copula marginal regression model that accounts for a correlated error structure, together with a general method of semi-parametric Bayesian inference for the Gaussian copula model, to estimate the marginal distributions and the dependence structure separately. We also carry out a receiver operating characteristic (ROC) analysis of the predictive model for DHF and, on the basis of that analysis, compare our proposed model with the models of Ju and Brasier (Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever. BMC Res Notes 6:365, 2013). Our results extend the previous models of DHF by suggesting that IL-10, Days Fever, Sex and Lymphocytes are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements. In addition, the dependence structure among these features (Days Fever, Lymphocytes, IL-10 and Sex), as associated with disease outcomes, was identified using the semi-parametric Bayesian Gaussian copula model and the Gaussian partial correlation method.
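As a rough illustration of the ROC-based model comparison mentioned in the abstract, the following minimal sketch computes the area under the ROC curve (AUC) as the proportion of case/control pairs in which the case receives the higher predicted score, with ties counted as one half. The function name `roc_auc`, the labels, and the two score vectors are hypothetical and made up for illustration; they are not the study data or the authors' code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 1/2.
    Labels are assumed to be coded 1 (e.g. DHF) and 0 (non-DHF).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities from two candidate models
# (illustrative numbers only, not taken from the dengue data set).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]
model_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7, 0.2, 0.8]

auc_a = roc_auc(model_a, labels)
auc_b = roc_auc(model_b, labels)
print(f"AUC model A: {auc_a:.4f}, AUC model B: {auc_b:.4f}")
```

Under this pairwise definition a model with a larger AUC ranks cases above controls more often; with only 51 individuals, as in the study, any such comparison would carry considerable sampling uncertainty.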
References
Aas K, Czado C, Frigessi A, Bakken H (2009) Pair-copula constructions of multiple dependence. Insur Math Econ 44(2):182–198
Ahmad Z (2019) The hyperbolic Sine Rayleigh distribution with application to bladder cancer susceptibility. Ann Data Sci 6:211–222. https://doi.org/10.1007/s40745-018-0165-0
Bayarri MJ, Berger JO, Forte A, Garcia-Donato G (2012) Criteria for Bayesian model choice with application to variable selection. Ann Stat 40:1550–1577
Brasier AR, Ju H, Garcia J, Spratt HM, Victor SS, Forshey BM, Halsey ES, Comach G, Sierra G, Blair PJ, Rocha C, Morrison AC, Scott TW, Bazan I, Kochel TJ, Venezuelan Dengue Fever Working Group (2012) A three-component biomarker panel for prediction of dengue hemorrhagic fever. Am J Trop Med Hyg 86(2):341–348
Denuit M, Lambert P (2005) Constraints on concordance measures in bivariate discrete data. J Multivar Anal 93(1):40–57
Garcia-Donato G, Forte A (2015) R package BayesVarSel. R Foundation for Statistical Computing, Vienna
Genest C, Nešlehová J (2007) A primer on copulas for count data. ASTIN Bull 37(2):475–515. https://doi.org/10.1017/S0515036100014963
Genest C, Ghoudi K, Rivest LP (1995) A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82(3):543–552
Hoff PD (2007) Extending the rank likelihood for semiparametric copula estimation. Ann Appl Stat 1(1):265–283
Joe H (1997) Multivariate models and dependence concepts. Chapman and Hall, London
Ju H, Brasier AR (2013) Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever. BMC Res Notes 6:365
Kim D, Kim J-M (2014) Analysis of directional dependence using asymmetric copula-based regression models. J Stat Comput Simul 84(9):1990–2010
Kim J-M, Jung Y-S, Sungur EA, Han K, Park C, Sohn I (2008) A copula method for modeling directional dependence of genes. BMC Bioinform 9:225
Kim J-M, Jung Y-S, Choi T, Sungur EA (2011) Partial correlation with copula modeling. Comput Stat Data Anal 55(3):1357–1366
Kojadinovic I, Yan J (2010) Modeling multivariate distributions with continuous margins using the copula R Package. J Stat Softw 34(9):1–20
Madsen L, Fang Y (2011) Joint regression analysis for discrete longitudinal data. Biometrics 67(3):1171–1175
Masarotto G, Varin C (2012) Gaussian copula marginal regression. Electron J Stat 6:1517–1549
Nelsen R (2006) An introduction to copulas, 2nd edn. Springer, New York
Olson D, Shi Y (2007) Introduction to business data mining. McGraw-Hill/Irwin, Boston
Shi Y, Tian YJ, Kou G, Peng Y, Li JP (2011) Optimization based data mining: theory and applications. Springer, London
Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges (in French). Publ Inst Stat Univ Paris 8:229–231
Song PX-K (2000) Multivariate dispersion models generated from Gaussian copula. Scand J Stat 27:305–320
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Zellner A (ed) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. Edward Elgar Publishing Limited, Cheltenham, pp 389–399
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Kim, JM., Ju, H. & Jung, Y. Copula Approach for Developing a Biomarker Panel for Prediction of Dengue Hemorrhagic Fever. Ann. Data. Sci. 7, 697–712 (2020). https://doi.org/10.1007/s40745-020-00293-x
DOI: https://doi.org/10.1007/s40745-020-00293-x