Abstract
Statisticians are frequently confronted with highly complex data such as clustered data, missing data or censored data. In this manuscript, we consider hierarchically clustered survival data. This type of data arises when a sample consists of clusters, and each cluster has several, correlated sub-clusters containing various, dependent survival times. Two approaches are commonly used to analysis such data and estimate the association between the survival times within a cluster and/or sub-cluster. The first approach is by using random effects in a frailty model while a second approach is by using copula models. Hereby we assume that the joint survival function is described by a copula function evaluated in the marginal survival functions of the different individuals within a cluster. In this manuscript, we introduce a copula model based on a nested Archimedean copula function for hierarchical survival data, where both the clusters and sub-clusters are allowed to be moderate to large and varying in size. We investigate one-stage, two-stage and three-stage parametric estimation procedures for the association parameters in this model. In a simulation study we check the finite sample properties of these estimators. Furthermore we illustrate the methods on a real life data-set on Chronic Granulomatous Disease.
Similar content being viewed by others
Code availability
The computational code added as a supplementary text.
References
Bandeen-Roche KJ, Liang KY (1996) Modelling failure-time associations in data with multiple levels of clustering. Biometrika 83:29–39
Cox DR, Hinkley D (1974) Theoretical statistics. Chapman and Hall, London
Craik ADD (2005) Prehistory of Faà di Bruno’s formula. Am Math Mon 112(2):119–130
Duchateau L, Janssen P (2008) The frailty model. Springer, New York
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
Glidden DV (2000) A two-stage estimator of the dependence parameter for the Clayton–Oakes model. Lifetime Data Anal 6:141–156
Hofert M (2011) Efficiently sampling nested Archimedean copulas. Comput Stat Data Anal 55:57–70
Hofert M, Pham D (2013) Densities of nested Archimedean copulas. J Multivar Anal 118:37–52
Hoffman EB, Sen PK, Weinberg CR (2001) Within-cluster resampling. Biometrika 88:1121–1134
Joe H (1993) Parametric families of multivariate distributions with given margins. J Multivar Anal 46:262–282
Joe H (1997) Multivariate models and dependence concepts. Chapman and Hall, New York
Li Y, Lin X (2006) Semiparametric normal transformation models for spatially correlated survival data. J Am Stat Assoc 101:591–603
Li Y, Prentice R, Lin X (2008) Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data. Biometrika 95(4):947–960
Ma R, Krewski R, Burnett RT (2003) Random effects Cox models: a Poisson modelling approach. Biometrika 90:157–169
McNeil AJ (2008) Sampling nested Archimedean copulas. J Stat Comput Simul 78(6):567–581
Nelsen RB (2006) An introduction to copulas. Springer, New York
Othus M, Li Y (2010) A Gaussian copula model for multivariate survival data. Stat Biosci 2:154–179
Prenen L, Braekers R, Duchateau L (2017) Extending the Archimedean copula methodology to model multivariate survival data grouped in clusters of variable size. J R Stat Soc B 79:483–505
Rondeau V, Filleul L, Joly P (2006) Nested frailty models using maximum penalized likelihood estimation. Stat Med 25:4036–4052
Sastry N (1997) A nested frailty model for survival data, with an application to the study of child survival in northeast Brazil. J Am Stat Assoc 92:426–435
Shih JH, Louis TA (1995) Inferences on the association parameter in copula models for bivariate survival data. Biometrics 51:1384–1399
Shih JH, Lu S (2007) Analysis of failure time data with multilevel clustering, with application to the child vitamin A intervention trial in Nepal. Biometrics 63:673–680
Wienke A (2011) Frailty models in survival analysis. Chapman and Hall, New York
Zhao Y, Joe H (2005) Composite likelihood estimation in multivariate data analysis. Can J Stat 33:335–356
Acknowledgements
We would like to thank the editors and anonymous referees for valuable comments and insightful suggestions, which helped us to improve the manuscript. For the simulations we used the infrastructure of the Flemish Supercomputer Center, funded by the Hercules Foundation and the Flemish Government-department Economics, Science and Innovation.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no potential conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Appendices
Appendix A: Proof of Theorem 2
Let \(\varvec{\beta _0}\) denote the true parameter vector for the margins. Expanding the score function \(\varvec{U^*_\beta }\) in a Taylor series around \(\varvec{\beta _0}\) and evaluating it at \(\varvec{\beta }=\varvec{\bar{\beta }}\), we get under regularity conditions (stated in the supplementary material) of maximum likelihood theory
The \(o_p\)-notation stands for convergence in probability, i.e., \(Y_L=o_p(\sqrt{L})\) is defined as \(\lim _{L\rightarrow \infty }P(|Y_L/\sqrt{L}|\ge \varepsilon )=0\) for every positive \(\varepsilon \). Similarly,
By the law of large numbers, as \(L\rightarrow \infty \),
Hence
By the central limit theorem, \(\frac{1}{\sqrt{L}}\begin{pmatrix}\varvec{U^*_\beta (\beta _0)} \\ \varvec{U_\theta (\beta _0,\theta _0)}\end{pmatrix}\) converges to multivariate normal with mean \(\begin{pmatrix} {\mathbf {0}}\\ {\mathbf {0}} \end{pmatrix}\) and variance-covariance matrix \(\begin{pmatrix} {\mathbf {V}} &{} {\mathbf {0}}\\ {\mathbf {0}} &{} {\mathbf {I}_{\theta \theta }} \end{pmatrix}\) with \({\mathbf {V}}=\text {Var}\Big (\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0})\Big )=E\Big [\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0})(\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0}))'\Big ]\).
Thus, \(\sqrt{L}\begin{pmatrix}\varvec{\bar{\beta }}-\varvec{\beta _0}\\ \varvec{\bar{\theta }}-\varvec{\theta _0} \end{pmatrix}\) converges to multivariate normal with mean vector zero and variance-covariance matrix
The lower right element of this matrix is the asymptotic variance of \(\sqrt{L}(\varvec{\bar{\theta }}-\varvec{\theta _0})\),which is given by
[Since \(({\mathbf {I}_{\theta \theta }})^{-1^T}=({\mathbf {I}_{\theta \theta }})^{-1}\) and \(({\mathbf {I}}^*)^{-1^T}=({\mathbf {I}}^*)^{-1}\)].
Appendix B: Compute \(s_{nk}(\alpha _1)\) for Clayton copula
The nth and \((n+1)\)th derivatives of the inner generator for the nested Clayton copula function are as follows,
We also get the \((n+1)\)th derivative of the inner generator by differentiating Eq. (14), which gives,
Now, by comparing Eqs. (15) and (16), we get,
Using that for \(n=0\), \(s_{0,0}(\alpha _1)=1\) we get that
Therefore, \(\log \Big \{(-1)^{n-n} s_{n,n}(\alpha _1)\Big \}= (-1)^{n-n} n \log (\alpha _1)= n \log (\alpha _1)\) for all \(n\in ~{\mathbb {N}}\). Furthermore we get that
Hence, we get that \(\log \Big \{(-1)^{n-1} s_{n,1}(\alpha _1)\Big \}= \log \Big \{(-1)^{n-1} (\alpha _1)_n\Big \}\) in which \((-1)^{n-1} (\alpha _1)_n > 0\) and \(\log \Big \{(-1)^{n+1-k}s_{n+1,k}(\alpha _1)\Big \} = \log \Big \{(-1)^{n+1-k}(\alpha _1k-n) s_{n,k}(\alpha _1)+(-1)^{n+1-k}\alpha _1s_{n,k-1}(\alpha _1)\Big \}\) for all \(n\in {\mathbb {N}}\).
Rights and permissions
About this article
Cite this article
Hasan, M.N., Braekers, R. Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions. Comput Stat 36, 2755–2787 (2021). https://doi.org/10.1007/s00180-021-01094-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-021-01094-3