Abstract
Statistical models with unobservable random variables, such as random-effect models, have recently been studied for analyzing complex types of data (e.g. longitudinal and time-to-event data) in various areas. The hierarchical likelihood [h-likelihood; Lee and Nelder (Journal of the Royal Statistical Society 58, 619–678, 1996)] provides a unified framework for inference in such models. In this paper, we review the h-likelihood framework for survival analysis. We also demonstrate how to analyze survival data with the web-based software Albatross Analytics, which was recently developed based on the h-likelihood procedure. Furthermore, we discuss recent extensions of the h-likelihood.
References
Aitkin, M., & Foxall, R. (2003). Statistical modelling of artificial neural networks using the multi-layer perceptron. Statistics and Computing, 13, 227–239.
Austin, P. C. (2017). A tutorial on multilevel survival analysis: Methods, models and applications. International Statistical Review, 85, 185–203.
Balan, T. A., & Putter, H. (2020). A tutorial on frailty models. Statistical Methods in Medical Research, 29, 3424–3454.
Breslow, N. E. (1972). Discussion of Professor Cox’s paper. Journal of the Royal Statistical Society: Series B, 34, 216–217.
Breslow, N. E. (1974). Covariance analysis of censored survival data. Biometrics, 30, 89–99.
Chee, C.-S., Ha, I.D., Seo, B., & Lee, Y. (2021). Semiparametric estimation for nonparametric frailty models using nonparametric maximum likelihood approach. revision submitted to Statistical Methods in Medical Research.
Ching, T., Zhu, X., & Garmire, L. X. (2018). Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Computational Biology, 14(4), e1006076.
Christian, N. J., Ha, I. D., & Jeong, J.-H. (2016). Hierarchical likelihood inference on clustered competing risks data. Statistics in Medicine, 35, 251–267.
Duchateau, L., & Janssen, P. (2008). The frailty model. Berlin: Springer.
Efron, B. (2020). Prediction, estimation, and attribution. Journal of the American Statistical Association, 115(530), 636–655.
Elbers, C., & Ridder, G. (1982). True and spurious duration dependence: the identifiability of the proportional hazard model. Review of Economic Studies, 49, 403–409.
Elghafghuf, A., & Stryhn, H. (2017). Robust Poisson likelihood estimation for frailty cox models: a simulation study. Communications in Statistics—Simulation and Computation, 46, 2907–2923.
Emura, T., Nakatochi, M., Murotani, K., et al. (2017). A joint frailty-copula model between tumour progression and death for meta-analysis. Statistical Methods in Medical Research, 26, 2649–2666.
Emura, T., Matsui, S., & Rondeau, V. (2019). Survival analysis with correlated endpoints, joint frailty-copula models. JSS Research series in statistics. Singapore: Springer.
Emura, T., Shih, J.-H., Ha, I. D., & Wilke, R. A. (2020). Comparison of the marginal hazard model and the sub-distribution hazard model for competing risks under an assumed copula. Statistical Methods in Medical Research, 29, 2307–2327.
Fan, J., Ma, C., & Zhong, Y. (2019). A selective overview of deep learning. [stat.ML], 14 2019.
Fine, J. P., & Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94, 548–560.
Goethals, K., Janssen, P., & Duchateau, L. (2008). Frailty models and copulas: similarities and differences. Journal of Applied Statistics, 35, 1071–1079.
Guo, X., & Carlin, B. P. (2004). Separate and joint modeling of longitudinal and event time data using standard computer packages. American Statistician, 58, 16–24.
Ha, I. D., Lee, Y., & Song, J.-K. (2001). Hierarchical likelihood approach for frailty models. Biometrika, 88, 233–243.
Ha, I. D., Lee, Y., & Song, J.-K. (2002). Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Analysis, 8, 163–176.
Ha, I. D., & Lee, Y. (2003). Estimating frailty models via Poisson hierarchical generalized linear models. Journal of Computational and Graphical Statistics, 12, 663–681.
Ha, I. D., Park, T., & Lee, Y. (2003). Joint modelling of repeated measures and survival time data. Biometrical Journal, 45, 647–658.
Ha, I. D., & Lee, Y. (2005a). Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models. Biometrika, 92, 717–723.
Ha, I. D., & Lee, Y. (2005b). Multilevel mixed linear models for survival data. Lifetime Data Analysis, 11, 131–142.
Ha, I. D., Lee, Y., & MacKenzie, G. (2007a). Model selection for multi-component frailty models. Statistics in Medicine, 26, 4790–4807.
Ha, I. D., Lee, Y., & Pawitan, Y. (2007b). Genetic mixed linear models for twin survival data. Behavior Genetics, 37, 621–630.
Ha, I. D., Noh, M., & Lee, Y. (2010). Bias reduction of likelihood estimators in semi-parametric frailty models. Scandinavian Journal of Statistics, 37, 307–320.
Ha, I. D., Sylvester, R., Legrand, C., & MacKenzie, G. (2011). Frailty modelling for survival data from multi-centre clinical trials. Statistics in Medicine, 30, 28–37.
Ha, I. D., Noh, M., & Lee, Y. (2012). frailtyHL: a package for fitting frailty models with h-likelihood. R Journal, 4, 307–320.
Ha, I. D., Pan, J., Oh, S., & Lee, Y. (2014a). Variable selection in general frailty models using penalized h-likelihood. Journal of Computational and Graphical Statistics, 23, 1044–1060.
Ha, I. D., Lee, M., Oh, S., Jeong, J.-H., Sylvester, R., & Lee, Y. (2014b). Variable selection in subdistribution hazard frailty models with competing risks data. Statistics in Medicine, 33, 4590–4604.
Ha, I. D., Christian, N. J., Jeong, J.-H., Park, J., & Lee, Y. (2016a). Analysis of clustered competing risks data using subdistribution hazard models with multivariate frailties. Statistical Methods in Medical Research, 25, 2488–2505.
Ha, I. D., Vaida, F., & Lee, Y. (2016b). Interval estimation of random effects in proportional hazards models with frailties. Statistical Methods in Medical Research, 25, 936–953.
Ha, I. D., Jeong, J.-H., & Lee, Y. (2017a). Statistical modelling of survival data with random effects: h-likelihood approach. Singapore: Springer.
Ha, I. D., Noh, M., & Lee, Y. (2017b). H-likelihood approach for joint modelling of longitudinal outcomes and time-to-event data. Biometrical Journal, 59, 1122–1143.
Ha, I.D., Noh, M., Kim, J., & Lee, Y. (2018). FrailtyHL: frailty models using h-likelihood. R package version 2.1. http://CRAN.Rproject.org/package=frailtyHL
Ha, I. D., Kim, J., & Emura, T. (2019). Profile likelihood approaches for semiparametric copula and frailty models for clustered survival data. Journal of Applied Statistics, 46, 2553–2571.
Ha, I. D., Lee, Y., Xiang, L., Peng, M., & Jeong, J.-H. (2020). Frailty modelling approaches for semi-competing risks data. Lifetime Data Analysis, 26, 109–133.
Hao, L., Kim, J., Kwon, S., & Ha, I.D. (2021). Deep learning-based survival analysis for high-dimensional survival data. submitted to Mathematics, in press.
Hougaard, P. (2000). Analysis of multivariate survival data. New York: Springer.
Huang, X., & Wolfe, R. (2002). A frailty model for informative censoring. Biometrics, 58, 510–520.
Huang, R., Xiang, L., & Ha, I. D. (2019). Frailty proportional mean residual life regression for clustered survival data: A hierarchical quasi-likelihood method. Statistics in Medicine, 38, 4854–4870.
Jin, S., & Lee, Y. (2020). A review of h-likelihood and hierarchical generalized linear model. WIREs Computational Statistics, in press.
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: Wiley.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1), 24.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.
Lee, M., Ha, I. D., & Lee, Y. (2017). Frailty modeling for clustered competing risks data with missing cause of failure. Statistical Methods in Medical Research, 26, 356–373.
Lee, Y., & Nelder, J. A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series B, 58, 619–678.
Lee, Y., & Nelder, J. A. (2001). Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987–1006.
Lee, Y., Nelder, J. A., & Pawitan, Y. (2017). Generalised linear models with random effects: unified analysis via h-likelihood (2nd ed.). Boca Raton: Chapman and Hall.
Lee, Y., & Noh, M. (2018). dhglm: Double Hierarchical Generalized Linear Models. R package version 2.0. Retrieved from https://CRAN.R-project.org/package=dhglm.
Lee, Y., & Kim, G. (2016). H-likelihood predictive intervals for unobservables. International Statistical Review, 84, 487–505.
Lee, Y., & Kim, G. (2020). Properties of h-likelihood estimators in clustered data. International Statistical Review, 88, 380–395.
Liu, L., Wolfe, R. A., & Huang, X. (2004). Shared Frailty Models for Recurrent Events and a Terminal Event. Biometrics, 60, 747–756.
Paik, M. C., Lee, Y., & Ha, I. D. (2015). Frequentist inference on random effects based on summarizability. Statistica Sinica, 25, 1107–1132.
Park, E., & Ha, I. D. (2019). Penalized variable selection for accelerated failure time models with random effects. Statistics in Medicine, 38, 878–892.
Rakhmawati, T.W., Ha, I.D., Lee, H., & Lee, Y. (2021). Penalized variable selection for cause-specific frailty models with clustered competing-risks data. revision submitted to Statistics in Medicine.
Ripatti, S., & Palmgren, J. (2000). Estimation of multivariate frailty models using penalized partial likelihood. Biometrics, 56, 1016–1022.
Rizopoulos, D. (2012). Joint models for longitudinal and time-to-event data, with applications in R. Boca Raton: Chapman and Hall.
Rondeau, V., Filleul, L., & Joly, P. (2006). Nested frailty models using maximum penalized likelihood estimation. Statistics in Medicine, 25, 4036–4052.
Rondeau, V., Schaffner, E., Corbiere, F., Gonzalez, J. R., & Mathoulin-Pelissier, S. (2013). Cure frailty models for survival data: application to recurrences for breast cancer and to hospital readmissions for colorectal cancer. Statistical Methods in Medical Research, 22, 243–260.
Sylvester, R. J., van der Meijden, A. P., Oosterlinck, W., et al. (2006). Predicting recurrence and progression in individual patients with stage Ta T1 bladder cancer using EORTC risk tables: a combined analysis of 2596 patients from seven EORTC trials. European Urology, 49, 466–477.
Tawiah, R., McLachlan, G. J., & Ng, S. K. (2020a). Mixture cure models with time-varying and multilevel frailties for recurrent event data. Statistical Methods in Medical Research, 29, 1368–1385.
Tawiah, R., McLachlan, G. J., & Ng, S. K. (2020b). A bivariate joint frailty model with mixture framework for survival analysis of recurrent events with dependent censoring and cure fraction. Biometrics, 76, 753–766.
Therneau, T. M., & Grambsch, P. M. (2000). Modeling survival data: extending the Cox model. New York: Springer.
Vaida, F., & Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika, 92, 351–370.
Zheng, M., & Klein, J. P. (1995). Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika, 82, 127–138.
Zhou, B., & Latouche, A. (2015). crrSC: Competing risks regression for stratified and clustered data. R package version 1.1.
Acknowledgements
The research of Dr. Il Do Ha was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2020R1F1A1A01056987). The research of Dr. Youngjo Lee was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1002408).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
Appendix A: Derivation of h-likelihood (1)
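Under the assumptions of conditional independence and non-informative censoring, the h-(log)likelihood based on the (i, j)th observation is defined through the joint density function of \((Y_{ij}, \Delta _{ij}, V_{i})\) as follows:
\[ h_{ij} = \log f(y_{ij}, \delta_{ij}, v_{i}) = \log f(y_{ij}, \delta_{ij}|u_{i}) + \log g(v_{i}), \]
where the first term of \(h_{ij}\) can be expressed as
\[ \log f(y_{ij}, \delta_{ij}|u_{i}) = \delta_{ij} \log \lambda(y_{ij}|u_{i}) + \log S(y_{ij}|u_{i}) = \delta_{ij} \log \lambda(y_{ij}|u_{i}) - \Lambda(y_{ij}|u_{i}). \]
Here \(S(\cdot |u_{i})=\exp \{- \Lambda (\cdot |u_{i})\}\), \(\Lambda (\cdot |u_{i})\) and \(\lambda (\cdot |u_{i})\) are the conditional survival function, cumulative hazard function and hazard function of \(T_{ij}|u_{i}\), respectively. Thus, by conditional independence, with the term for \(v_{i}\) entering once per cluster, the h-likelihood (1) of all observations becomes
\[ h = \sum_{i,j} \ell_{1ij} + \sum_{i} \ell_{2i}, \]
where \(\ell _{1ij}= \log f(y_{ij}, \delta _{ij}|u_{i})\) and \(\ell _{2i}= \log g(v_{i})\). Note here that for \(v_{i}=\nu (u_{i})\), a strictly monotone function of \(u_{i}\),
\[ g(v_{i}) = g_{u}(u_{i}) \left| \frac{du_{i}}{dv_{i}} \right|, \]
where \(g_{u}(\cdot )\) denotes the density of \(u_{i}\). That is, the Jacobian is involved when we transform from the u-scale to the \(\nu \)-scale in constructing the h-likelihood.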
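The decomposition above can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes a constant baseline hazard (`base_rate`), a conditional hazard of the form `base_rate * exp(eta_ij + v_i)`, and \(v_{i}=\log u_{i}\sim N(0,\alpha )\). The function names `h_loglik`, `log_density_v` and `lognormal_logpdf`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math


def h_loglik(y, delta, cluster, eta, v, base_rate, alpha):
    """Return h = sum_ij l_1ij + sum_i l_2i for a shared frailty model.

    Assumed for illustration only: constant baseline hazard `base_rate`,
    lambda(t | u_i) = base_rate * exp(eta_ij + v_i), and v_i ~ N(0, alpha).
    """
    h = 0.0
    for y_ij, d_ij, i, eta_ij in zip(y, delta, cluster, eta):
        hazard = base_rate * math.exp(eta_ij + v[i])  # lambda(y_ij | u_i)
        cum_hazard = hazard * y_ij                    # Lambda(y_ij | u_i)
        h += d_ij * math.log(hazard) - cum_hazard     # l_1ij
    for v_i in v:
        # l_2i: normal log-density of v_i, counted once per cluster
        h += -0.5 * math.log(2.0 * math.pi * alpha) - v_i ** 2 / (2.0 * alpha)
    return h


def log_density_v(log_density_u, v_i):
    """log g(v_i) for v_i = log(u_i): log-density of u_i plus the
    log-Jacobian log|du_i/dv_i|, which equals v_i on this scale."""
    u_i = math.exp(v_i)
    return log_density_u(u_i) + v_i


def lognormal_logpdf(u, alpha=1.0):
    """Log-density of u when log(u) ~ N(0, alpha)."""
    log_u = math.log(u)
    return -log_u - 0.5 * math.log(2.0 * math.pi * alpha) - log_u ** 2 / (2.0 * alpha)


# Two clusters, three observations (toy values, not taken from the paper).
h_value = h_loglik(
    y=[1.0, 2.0, 0.5], delta=[1, 0, 1], cluster=[0, 0, 1],
    eta=[0.0, 0.0, 0.0], v=[0.0, 0.0], base_rate=1.0, alpha=1.0,
)
```

Passing `lognormal_logpdf` to `log_density_v` recovers the normal log-density of \(v_{i}\), which is the role the Jacobian plays when moving from the u-scale to the \(\nu \)-scale.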
Appendix B: Derivation of the PHL under the joint model
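Let \(C_{i}\) denote the independent censoring time. We assume that given \(v_{i}\), \(C_{i}\) is independent of \((T_{ik},\Delta _{ik})\) for \(k=1,2\). Then the observed event time and event indicator are, respectively, given by
\[ T_{i}^{*} = \min (T_{i1}, T_{i2}, C_{i}) \quad \text{and} \quad \Delta_{ik} = I(T_{i}^{*} = T_{ik}), \quad k=1,2. \]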
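As a minimal sketch of how the observed pair is formed, the Python snippet below assumes the usual competing-risks convention that the observed time is the earliest of the two latent event times and the censoring time, and that the indicator for Type k is 1 only when that earliest time is the Type k event time. The function name `observe` and the numeric inputs are hypothetical.

```python
def observe(t1, t2, c):
    """Return (t_star, delta_1, delta_2) from latent times t1, t2 and censoring time c.

    Assumed convention: t_star = min(t1, t2, c) and delta_k = 1 if t_star == t_k.
    With continuous times, ties between t1 and t2 occur with probability zero.
    """
    t_star = min(t1, t2, c)
    delta_1 = int(t_star == t1)
    delta_2 = int(t_star == t2)
    return t_star, delta_1, delta_2


type1_first = observe(2.0, 3.0, 5.0)   # Type 1 event observed
type2_first = observe(4.0, 3.0, 5.0)   # Type 2 event observed
censored = observe(4.0, 3.0, 1.0)      # censored before either event
```

A censored subject has both indicators equal to zero, so it contributes only through the risk sets and the cumulative hazard terms.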
Thus, all observable random variables are \((Y_{ij},T_{i}^{*},\Delta _{ik})\) with their observed values \((y_{ij},t_{i}^{*},\delta _{ik})\) \((i=1,\ldots ,q;j=1,\ldots ,n_{i};k=1,2)\). Here the h-likelihood for the general joint model with competing-risk data becomes
\[ h = \sum_{i=1}^{q} \sum_{j=1}^{n_{i}} \ell_{1ij} + \sum_{i=1}^{q} \sum_{k=1}^{2} \ell_{2ik} + \sum_{i=1}^{q} \ell_{3i}, \]
where \(\ell _{1ij}\) is the conditional normal log-likelihood for \(y_{ij}\) given \(v_{i1}\), and for \(k=1,2\)
where \(\eta _{2ik}=x_{i2}^{T}\beta _{k+1}+ v_{i,k+1}\), and \(\ell _{3i}\) is the log-likelihood for \(v_{i}=(v_{i1}, v_{i2}, v_{i3})^T\) with trivariate normal distribution, given by
Following (5) and (6), it can be easily shown that the corresponding PHL \(h_{p}\) is given by
where \(d_{(kr)}\) is the number of events at time \(t_{(kr)}\) and \( R_{(kr)}=\{i:t_{i}^{*}\ge t_{(kr)}\}\) is the risk set at \(t_{(kr)}\), the rth (\(r=1,\ldots ,D_{k}\)) smallest distinct event time for Type k events among the \(t_{i}^{*}\)'s.
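To make the roles of \(d_{(kr)}\) and \(R_{(kr)}\) concrete, the Python sketch below evaluates a Breslow-type profiled term for one event type, \(\sum _{i}\delta _{ik}\eta _{2ik}-\sum _{r}d_{(kr)}\log \{\sum _{i\in R_{(kr)}}\exp (\eta _{2ik})\}\). This form is an assumption based on the standard result of profiling out a nonparametric baseline hazard; it is not copied from the paper and omits any additive constants and the other components of \(h_{p}\). The function name `partial_term` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def partial_term(t_star, delta_k, eta_k):
    """Breslow-type profiled term for event type k (assumed form, see lead-in).

    t_star  : observed times t_i^*
    delta_k : indicators delta_ik for Type k events
    eta_k   : linear predictors eta_2ik
    """
    # d_(kr): number of Type k events at each distinct Type k event time t_(kr)
    event_counts = Counter(t for t, d in zip(t_star, delta_k) if d == 1)
    value = sum(d * e for d, e in zip(delta_k, eta_k))
    for t_kr, d_kr in event_counts.items():
        # Risk set R_(kr) = {i : t_i^* >= t_(kr)}
        risk_sum = sum(math.exp(e) for t, e in zip(t_star, eta_k) if t >= t_kr)
        value -= d_kr * math.log(risk_sum)
    return value


no_ties = partial_term([1.0, 2.0, 3.0], [1, 0, 1], [0.0, 0.0, 0.0])
with_ties = partial_term([1.0, 1.0, 2.0], [1, 1, 0], [0.0, 0.0, 0.0])
```

Subjects whose observed time is a censoring time or an event of the other type still appear in the risk sets for Type k, which is why the risk set is defined from \(t_{i}^{*}\) rather than from the Type k event times alone.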
Cite this article
Ha, I.D., Lee, Y. A review of h-likelihood for survival analysis. Jpn J Stat Data Sci 4, 1157–1178 (2021). https://doi.org/10.1007/s42081-021-00125-z
DOI: https://doi.org/10.1007/s42081-021-00125-z