Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions

Hasan, Mirza Nazmul; Braekers, Roel

doi:10.1007/s00180-021-01094-3

Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions

Original paper
Published: 26 March 2021

Volume 36, pages 2755–2787, (2021)
Cite this article

Computational Statistics Aims and scope Submit manuscript

287 Accesses
1 Citation
Explore all metrics

Abstract

Statisticians are frequently confronted with highly complex data such as clustered data, missing data or censored data. In this manuscript, we consider hierarchically clustered survival data. This type of data arises when a sample consists of clusters, and each cluster has several, correlated sub-clusters containing various, dependent survival times. Two approaches are commonly used to analysis such data and estimate the association between the survival times within a cluster and/or sub-cluster. The first approach is by using random effects in a frailty model while a second approach is by using copula models. Hereby we assume that the joint survival function is described by a copula function evaluated in the marginal survival functions of the different individuals within a cluster. In this manuscript, we introduce a copula model based on a nested Archimedean copula function for hierarchical survival data, where both the clusters and sub-clusters are allowed to be moderate to large and varying in size. We investigate one-stage, two-stage and three-stage parametric estimation procedures for the association parameters in this model. In a simulation study we check the finite sample properties of these estimators. Furthermore we illustrate the methods on a real life data-set on Chronic Granulomatous Disease.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Modelling the association in bivariate survival data by using a Bernstein copula

Article 17 November 2021

Integrated likelihoods in parametric survival models for highly clustered censored data

Article 26 July 2015

Flexible Modeling of Frailty Effects in Clustered Survival Data

Code availability

The computational code added as a supplementary text.

References

Bandeen-Roche KJ, Liang KY (1996) Modelling failure-time associations in data with multiple levels of clustering. Biometrika 83:29–39
Article MathSciNet Google Scholar
Cox DR, Hinkley D (1974) Theoretical statistics. Chapman and Hall, London
Book Google Scholar
Craik ADD (2005) Prehistory of Faà di Bruno’s formula. Am Math Mon 112(2):119–130
MathSciNet MATH Google Scholar
Duchateau L, Janssen P (2008) The frailty model. Springer, New York
MATH Google Scholar
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
MATH Google Scholar
Glidden DV (2000) A two-stage estimator of the dependence parameter for the Clayton–Oakes model. Lifetime Data Anal 6:141–156
Article MathSciNet Google Scholar
Hofert M (2011) Efficiently sampling nested Archimedean copulas. Comput Stat Data Anal 55:57–70
Article MathSciNet Google Scholar
Hofert M, Pham D (2013) Densities of nested Archimedean copulas. J Multivar Anal 118:37–52
Article MathSciNet Google Scholar
Hoffman EB, Sen PK, Weinberg CR (2001) Within-cluster resampling. Biometrika 88:1121–1134
Article MathSciNet Google Scholar
Joe H (1993) Parametric families of multivariate distributions with given margins. J Multivar Anal 46:262–282
Article MathSciNet Google Scholar
Joe H (1997) Multivariate models and dependence concepts. Chapman and Hall, New York
Book Google Scholar
Li Y, Lin X (2006) Semiparametric normal transformation models for spatially correlated survival data. J Am Stat Assoc 101:591–603
Article MathSciNet Google Scholar
Li Y, Prentice R, Lin X (2008) Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data. Biometrika 95(4):947–960
Article MathSciNet Google Scholar
Ma R, Krewski R, Burnett RT (2003) Random effects Cox models: a Poisson modelling approach. Biometrika 90:157–169
Article MathSciNet Google Scholar
McNeil AJ (2008) Sampling nested Archimedean copulas. J Stat Comput Simul 78(6):567–581
Article MathSciNet Google Scholar
Nelsen RB (2006) An introduction to copulas. Springer, New York
MATH Google Scholar
Othus M, Li Y (2010) A Gaussian copula model for multivariate survival data. Stat Biosci 2:154–179
Article Google Scholar
Prenen L, Braekers R, Duchateau L (2017) Extending the Archimedean copula methodology to model multivariate survival data grouped in clusters of variable size. J R Stat Soc B 79:483–505
Article MathSciNet Google Scholar
Rondeau V, Filleul L, Joly P (2006) Nested frailty models using maximum penalized likelihood estimation. Stat Med 25:4036–4052
Article MathSciNet Google Scholar
Sastry N (1997) A nested frailty model for survival data, with an application to the study of child survival in northeast Brazil. J Am Stat Assoc 92:426–435
Article Google Scholar
Shih JH, Louis TA (1995) Inferences on the association parameter in copula models for bivariate survival data. Biometrics 51:1384–1399
Article MathSciNet Google Scholar
Shih JH, Lu S (2007) Analysis of failure time data with multilevel clustering, with application to the child vitamin A intervention trial in Nepal. Biometrics 63:673–680
Article MathSciNet Google Scholar
Wienke A (2011) Frailty models in survival analysis. Chapman and Hall, New York
Google Scholar
Zhao Y, Joe H (2005) Composite likelihood estimation in multivariate data analysis. Can J Stat 33:335–356
Article MathSciNet Google Scholar

Download references

Acknowledgements

We would like to thank the editors and anonymous referees for valuable comments and insightful suggestions, which helped us to improve the manuscript. For the simulations we used the infrastructure of the Flemish Supercomputer Center, funded by the Hercules Foundation and the Flemish Government-department Economics, Science and Innovation.

Author information

Authors and Affiliations

Data Science Institute, I-BioStat, Universiteit Hasselt, Martelarenlaan 42, 3500, Hasselt, Belgium
Mirza Nazmul Hasan & Roel Braekers
Department of Statistics, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
Mirza Nazmul Hasan

Authors

Mirza Nazmul Hasan
View author publications
You can also search for this author in PubMed Google Scholar
Roel Braekers
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mirza Nazmul Hasan.

Ethics declarations

Conflict of interest

The authors declare that they have no potential conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 225 KB)

Appendices

Appendix A: Proof of Theorem 2

Let $\varvec{\beta _0}$ denote the true parameter vector for the margins. Expanding the score function $\varvec{U^*_\beta }$ in a Taylor series around $\varvec{\beta _0}$ and evaluating it at $\varvec{\beta }=\varvec{\bar{\beta }}$, we get under regularity conditions (stated in the supplementary material) of maximum likelihood theory

$$\begin{aligned} \varvec{U^*_\beta (\bar{\beta })}={\mathbf {0}}=\varvec{U^*_\beta (\beta _0)}+\frac{\partial \varvec{U^*_\beta }}{\partial \varvec{\beta }}\Bigg |_{\varvec{\beta }=\varvec{\beta _0}}(\varvec{\bar{\beta }}-\varvec{\beta _0})+o_p(\sqrt{L}). \end{aligned}$$

The $o_p$-notation stands for convergence in probability, i.e., $Y_L=o_p(\sqrt{L})$ is defined as $\lim _{L\rightarrow \infty }P(|Y_L/\sqrt{L}|\ge \varepsilon )=0$ for every positive $\varepsilon $. Similarly,

$$\begin{aligned} \varvec{U_\theta (\bar{\beta },\bar{\theta })}={\mathbf {0}}=\varvec{U_\theta (\beta _0,\theta _0)}&+\frac{\partial \varvec{U_\theta }}{\partial \varvec{\beta }}\Bigg |_{(\varvec{\beta ,\theta })=(\varvec{\beta _0,\theta _0})}(\varvec{\bar{\beta }}-\varvec{\beta _0})\\&+\frac{\partial \varvec{U_\theta }}{\partial \varvec{\theta }}\Bigg |_{(\varvec{\beta ,\theta })=(\varvec{\beta _0,\theta _0})}(\varvec{\bar{\theta }}-\varvec{\theta _0})+o_p(\sqrt{L}). \end{aligned}$$

By the law of large numbers, as $L\rightarrow \infty $,

$$\begin{aligned}&-\frac{1}{L}\frac{\partial \varvec{U^*_\beta }}{\partial \varvec{\beta }}\Bigg |_{\varvec{\beta }=\varvec{\beta _0}}= \frac{1}{L}\sum _{l=1}^{L}\Big [-\frac{\partial }{\partial \varvec{\beta }}\varvec{U^*}_{l,\varvec{\beta }}(\varvec{\beta _0})\Big ]\rightarrow \mathbf {I^*}\\&-\frac{1}{L}\frac{\partial \varvec{U_\theta }}{\partial \varvec{\beta }}\Bigg |_{(\varvec{\beta ,\theta })=(\varvec{\beta _0,\theta _0})}= \frac{1}{L}\sum _{l=1}^{L}\Big [-\frac{\partial }{\partial \varvec{\beta }}{\varvec{U}}_{l,\varvec{\theta }}(\varvec{\beta _0,\theta _0})\Big ]\rightarrow \mathbf {I}_{\theta \beta }\\&-\frac{1}{L}\frac{\partial \varvec{U_\theta }}{\partial \varvec{\theta }}\Bigg |_{(\varvec{\beta ,\theta })=(\varvec{\beta _0,\theta _0})}= \frac{1}{L}\sum _{l=1}^{L}\Big [-\frac{\partial }{\partial \varvec{\theta }}{\varvec{U}}_{l,\varvec{\theta }}(\varvec{\beta _0,\theta _0})\Big ]\rightarrow {\mathbf {I}_{\theta \theta }} \end{aligned}$$

Hence

$$\begin{aligned} \frac{1}{\sqrt{L}}\begin{pmatrix}\varvec{U^*_\beta (\beta _0)} \\ \varvec{U_\theta (\beta _0,\theta _0)}\end{pmatrix}\rightarrow \sqrt{L}\begin{pmatrix}\mathbf {I^*} &{} {\mathbf {0}}\\ \mathbf {I}_{\theta \beta } &{} {\mathbf {I}_{\theta \theta }}\end{pmatrix}\begin{pmatrix}\varvec{\bar{\beta }}-\varvec{\beta _0}\\ \varvec{\bar{\theta }}-\varvec{\theta _0} \end{pmatrix} \end{aligned}$$

By the central limit theorem, $\frac{1}{\sqrt{L}}\begin{pmatrix}\varvec{U^*_\beta (\beta _0)} \\ \varvec{U_\theta (\beta _0,\theta _0)}\end{pmatrix}$ converges to multivariate normal with mean $\begin{pmatrix} {\mathbf {0}}\\ {\mathbf {0}} \end{pmatrix}$ and variance-covariance matrix $\begin{pmatrix} {\mathbf {V}} &{} {\mathbf {0}}\\ {\mathbf {0}} &{} {\mathbf {I}_{\theta \theta }} \end{pmatrix}$ with ${\mathbf {V}}=\text {Var}\Big (\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0})\Big )=E\Big [\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0})(\varvec{U^*}_{1,\varvec{\beta }}(\varvec{\beta _0}))'\Big ]$.

Thus, $\sqrt{L}\begin{pmatrix}\varvec{\bar{\beta }}-\varvec{\beta _0}\\ \varvec{\bar{\theta }}-\varvec{\theta _0} \end{pmatrix}$ converges to multivariate normal with mean vector zero and variance-covariance matrix

$$\begin{aligned}&\begin{pmatrix}{\mathbf {I}}^* &{}\quad {\mathbf {0}}\\ \mathbf {I}_{\theta \beta } &{}\quad {\mathbf {I}_{\theta \theta }}\end{pmatrix}^{-1}\begin{pmatrix} {\mathbf {V}} &{}\quad {\mathbf {0}}\\ {\mathbf {0}} &{}\quad {\mathbf {I}_{\theta \theta }} \end{pmatrix}\begin{pmatrix}\mathbf {I^*} &{}\quad {\mathbf {0}}\\ \mathbf {I}_{\theta \beta } &{}\quad {\mathbf {I}_{\theta \theta }}\end{pmatrix}^{-1^T}\\&\quad =\begin{pmatrix}({\mathbf {I}}^*)^{-1} &{}\quad {\mathbf {0}}\\ -\mathbf {I}_{\theta \beta }({\mathbf {I}}^*)^{-1}({\mathbf {I}_{\theta \theta }})^{-1} &{}\quad ({\mathbf {I}_{\theta \theta }})^{-1}\end{pmatrix} \begin{pmatrix} {\mathbf {V}} &{}\quad {\mathbf {0}}\\ {\mathbf {0}} &{}\quad \mathbf {I}_{{\theta \theta }} \end{pmatrix} \begin{pmatrix}({\mathbf {I}}^*)^{-1^T} &{}\quad -({\mathbf {I}_{\theta \theta }})^{-1^T}({\mathbf {I}}^*)^{-1^T}{\mathbf {I}}^T_{{\theta \beta }}\\ {\mathbf {0}} &{}\quad ({\mathbf {I}_{\theta \theta }})^{-1^T}\end{pmatrix} \\&\quad =\begin{pmatrix} ({\mathbf {I}}^*)^{-1}{\mathbf {V}}({\mathbf {I}}^*)^{-1^T} &{}\quad -({\mathbf {I}_{\theta \theta }})^{-1^T}({\mathbf {I}}^*)^{-1}{\mathbf {V}}({\mathbf {I}}^*)^{-1^T}{\mathbf {I}_{\beta \theta }}\\ -\mathbf {I}_{\theta \beta }({\mathbf {I}}^*)^{-1}{\mathbf {V}}({\mathbf {I}}^*)^{-1^T} ({\mathbf {I}_{\theta \theta }})^{-1} &{}\quad V^* \end{pmatrix} \end{aligned}$$

$$\begin{aligned} \text {with}\;\;V^*= ({\mathbf {I}_{\theta \theta }})^{-1}+({\mathbf {I}_{\theta \theta }})^{-1^T}\mathbf {I}_{\theta \beta }({\mathbf {I}}^*)^{-1}{\mathbf {V}}({\mathbf {I}}^*)^{-1^T}{\mathbf {I}_{\beta \theta }} ({\mathbf {I}_{\theta \theta }})^{-1} \end{aligned}$$

The lower right element of this matrix is the asymptotic variance of $\sqrt{L}(\varvec{\bar{\theta }}-\varvec{\theta _0})$,which is given by

$$\begin{aligned} \text {Var-Cov}(\varvec{\bar{\theta }})=V^*=({\mathbf {I}_{\theta \theta }})^{-1}+({\mathbf {I}_{\theta \theta }})^{-1}\mathbf {I}_{\theta \beta }({\mathbf {I}}^*)^{-1}{\mathbf {V}}({\mathbf {I}}^*)^{-1}{\mathbf {I}_{\beta \theta }} ({\mathbf {I}_{\theta \theta }})^{-1} \end{aligned}$$

[Since $({\mathbf {I}_{\theta \theta }})^{-1^T}=({\mathbf {I}_{\theta \theta }})^{-1}$ and $({\mathbf {I}}^*)^{-1^T}=({\mathbf {I}}^*)^{-1}$].

Appendix B: Compute $s_{nk}(\alpha _1)$ for Clayton copula

The nth and $(n+1)$th derivatives of the inner generator for the nested Clayton copula function are as follows,

$$\begin{aligned} \psi _{01}^{(n)}(t;x_0)&=\psi _{01}(t;x_0)\sum _{k=1}^{n}(1+t)^{\alpha _1k-n}s_{nk}(\alpha _1)(-x_0)^k \end{aligned}$$

(14)

$$\begin{aligned} \psi _{01}^{(n+1)}(t;x_0)&=\psi _{01}(t;x_0)\sum _{k=1}^{n+1}(1+t)^{\alpha _1k-(n+1)}s_{n+1,k}(\alpha _1)(-x_0)^k \end{aligned}$$

(15)

We also get the $(n+1)$th derivative of the inner generator by differentiating Eq. (14), which gives,

$$\begin{aligned} \psi _{01}^{(n+1)}(t;x_0)&=\psi _{01}^{(1)}(t;x_0)\sum _{k=1}^{n}(1+t)^{\alpha _1k-n}s_{nk}(\alpha _1)(-x_0)^k \nonumber \\&\quad + \psi _{01}(t;x_0)\sum _{k=1}^{n}(\alpha _1k-n)(1+t)^{\alpha _1k-n-1}s_{nk}(\alpha _1)(-x_0)^k \nonumber \\&=\psi _{01}(t;x_0)\alpha _1 (1+t)^{\alpha _1-1} (-x_0)\sum _{k=1}^{n}(1+t)^{\alpha _1k-n}s_{nk}(\alpha _1)(-x_0)^k \nonumber \\&\quad +\psi _{01}(t;x_0)\sum _{k=1}^{n}(\alpha _1k-n)(1+t)^{\alpha _1k-(n+1)}s_{nk}(\alpha _1)(-x_0)^k \nonumber \\&=\psi _{01}(t;x_0)\sum _{k=1}^{n}(1+t)^{\alpha _1(k+1)-(n+1)}\alpha _1 s_{nk}(\alpha _1)(-x_0)^{(k+1)} \nonumber \\&\quad +\psi _{01}(t;x_0)\sum _{k=1}^{n}(\alpha _1k-n)(1+t)^{\alpha _1k-(n+1)}s_{nk}(\alpha _1)(-x_0)^k \nonumber \\&= \psi _{01}(t;x_0)\sum _{k^*=2}^{n+1}(1+t)^{\alpha _1k^*-(n+1)}\alpha _1 s_{n,k^*-1}(\alpha _1)(-x_0)^{k^*}\nonumber \\&\quad +\psi _{01}(t;x_0)\sum _{k=1}^{n}(\alpha _1k-n)(1+t)^{\alpha _1k-(n+1)}s_{nk}(\alpha _1)(-x_0)^k \end{aligned}$$

(16)

Now, by comparing Eqs. (15) and (16), we get,

$$\begin{aligned} s_{n+1,1}(\alpha _1)&= (\alpha _1-n) s_{n,1}(\alpha _1)\quad \text {for} \quad k=1 \\ s_{n+1,n+1}(\alpha _1)&= \alpha _1 s_{n,n}(\alpha _1)\quad \text {for} \quad k=n+1 \\ s_{n+1,k}(\alpha _1)&= (\alpha _1k-n) s_{n,k}(\alpha _1)+\alpha _1s_{n,k-1}(\alpha _1)\quad \text {for} \quad k=2,3,\ldots ,n \end{aligned}$$

Using that for $n=0$, $s_{0,0}(\alpha _1)=1$ we get that

$$\begin{aligned}&s_{1,1}(\alpha _1) = \alpha _1 s_{0,0}(\alpha _1)= \alpha _1\\&\Rightarrow s_{2,2}(\alpha _1) =\alpha _1 s_{1,1}(\alpha _1)= \alpha _1^2\\&\Rightarrow \ldots \Rightarrow s_{n,n}(\alpha _1) =\alpha _1 s_{n-1,n-1}(\alpha _1)= \alpha _1^n \end{aligned}$$

Therefore, $\log \Big \{(-1)^{n-n} s_{n,n}(\alpha _1)\Big \}= (-1)^{n-n} n \log (\alpha _1)= n \log (\alpha _1)$ for all $n\in ~{\mathbb {N}}$. Furthermore we get that

$$\begin{aligned} s_{2,1}(\alpha _1)&= (\alpha _1-1) s_{1,1}(\alpha _1)=\alpha _1(\alpha _1-1)= (\alpha _1)_2 \\ \Rightarrow s_{3,1}(\alpha _1)&= (\alpha _1-2) s_{2,1}(\alpha _1)= (\alpha _1)_3\\ \ldots \\ \Rightarrow s_{n,1}(\alpha _1)&= (\alpha _1)_n= \alpha _1(\alpha _1-1)(\alpha _1-2) \ldots (\alpha _1-n+1) \end{aligned}$$

Hence, we get that $\log \Big \{(-1)^{n-1} s_{n,1}(\alpha _1)\Big \}= \log \Big \{(-1)^{n-1} (\alpha _1)_n\Big \}$ in which $(-1)^{n-1} (\alpha _1)_n > 0$ and $\log \Big \{(-1)^{n+1-k}s_{n+1,k}(\alpha _1)\Big \} = \log \Big \{(-1)^{n+1-k}(\alpha _1k-n) s_{n,k}(\alpha _1)+(-1)^{n+1-k}\alpha _1s_{n,k-1}(\alpha _1)\Big \}$ for all $n\in {\mathbb {N}}$.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hasan, M.N., Braekers, R. Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions. Comput Stat 36, 2755–2787 (2021). https://doi.org/10.1007/s00180-021-01094-3

Download citation

Received: 15 September 2020
Accepted: 02 March 2021
Published: 26 March 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00180-021-01094-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions

Abstract

Access this article

Similar content being viewed by others

Modelling the association in bivariate survival data by using a Bernstein copula

Integrated likelihoods in parametric survival models for highly clustered censored data

Flexible Modeling of Frailty Effects in Clustered Survival Data

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary material 1 (pdf 225 KB)

Appendices

Appendix A: Proof of Theorem 2

Appendix B: Compute \(s_{nk}(\alpha _1)\) for Clayton copula

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Estimation of the association parameters in hierarchically clustered survival data by nested Archimedean copula functions

Abstract

Access this article

Similar content being viewed by others

Modelling the association in bivariate survival data by using a Bernstein copula

Integrated likelihoods in parametric survival models for highly clustered censored data

Flexible Modeling of Frailty Effects in Clustered Survival Data

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary material 1 (pdf 225 KB)

Appendices

Appendix A: Proof of Theorem 2

Appendix B: Compute \(s_{nk}(\alpha _1)\) for Clayton copula

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation