Abstract
Ridge regression estimators can be interpreted as a Bayesian posterior mean (or mode) when the regression coefficients follow a multivariate normal prior. However, the multivariate normal prior may not yield efficient posterior estimates of the regression coefficients, especially in the presence of interaction terms. In this paper, vine copula-based priors are proposed for Bayesian ridge estimators under the Cox proportional hazards model. The semiparametric Cox models are built on the posterior density under two likelihoods: Cox’s partial likelihood and the full likelihood under the gamma process prior. Simulations show that the full likelihood is generally more efficient and stable than the partial likelihood for estimating the regression coefficients. We also show via simulations and a data example that the Archimedean copula priors (the Clayton and Gumbel copulas) are superior to the multivariate normal prior and the Gaussian copula prior.
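The ridge-as-posterior interpretation in the opening sentence can be sketched in the ordinary linear model, which is simpler than the Cox model studied in the paper. The symbols y, X, σ², τ², and λ are assumptions introduced only for this illustration and are not taken from the paper.

```latex
% Illustrative linear-model setting (an assumption; the paper works with the Cox model)
y \mid \beta \sim N(X\beta, \sigma^2 I_n), \qquad \beta \sim N(0, \tau^2 I_p)

% Log-posterior, up to an additive constant
\log \pi(\beta \mid y) = -\frac{1}{2\sigma^2}\lVert y - X\beta \rVert^2 - \frac{1}{2\tau^2}\lVert \beta \rVert^2 + \text{const}

% The posterior is normal, so its mean and mode coincide with the ridge estimator
\hat{\beta} = (X^\top X + \lambda I_p)^{-1} X^\top y, \qquad \lambda = \sigma^2 / \tau^2
```

In this sketch the ridge penalty λ is the ratio of the noise variance to the prior variance, so a tighter prior gives stronger shrinkage. Under a Cox likelihood the posterior is typically not normal, so the posterior mean and mode need not coincide.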
Change history
14 February 2023
A Correction to this paper has been published: https://doi.org/10.1007/s10182-023-00470-2
Acknowledgements
The authors thank the Editor, the Associate Editor, and two referees for their valuable suggestions, which improved the paper. This work was supported by JSPS KAKENHI Grant Number JP21K12127.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
The original online version of this article was revised: In Table 3 of this article, the first two values in the third column headed "n" were incorrect. The first value 200 should be 20, the second value 1.00 should be 200. Equation 12 has been corrected.
About this article
Cite this article
Michimae, H., Emura, T. Bayesian ridge regression for survival data based on a vine copula-based prior. AStA Adv Stat Anal 107, 755–784 (2023). https://doi.org/10.1007/s10182-022-00466-4
DOI: https://doi.org/10.1007/s10182-022-00466-4