A flexible semiparametric transformation model for survival data

Lifetime Data Analysis

Abstract

I suggest an extension of the semiparametric transformation model that specifies a time-varying regression structure for the transformation, and thus allows time-varying structure in the data. Special cases include a stratified version of the usual semiparametric transformation model. The model can be thought of as specifying a first order Taylor expansion of a completely flexible baseline. Large sample properties are derived and estimators of the asymptotic variances of the regression coefficients are given. The method is illustrated by a worked example and a small simulation study. A goodness of fit procedure for testing if the regression effects lead to a satisfactory fit is also suggested.

References

  • Bagdonavicius V, Nikulin M (1999) Generalised proportional hazards model based on modified partial likelihood. Lifetime Data Anal 5:329–350

  • Bagdonavicius V, Nikulin M (2001) Accelerated life models: modelling and statistical analysis. Chapman and Hall, London

  • Bennett S (1983) Analysis of survival data by the proportional odds model. Statist Med 2:273–277

  • Cai T, Wei LJ, Wilcox M (2000) Semiparametric regression analysis for clustered failure time data. Biometrika 87:867–878

  • Chen K, Jin Z, Ying Z (2002) Semiparametric analysis of transformation models with censored data. Biometrika 89:659–668

  • Cheng SC, Wei LJ, Ying Z (1995) Analysis of transformation models with censored data. Biometrika 82:835–845

  • Cheng SC, Wei LJ, Ying Z (1997) Prediction of survival probabilities with semi-parametric transformation models. J Am Statist Assoc 92:227–235

  • Cox DR (1972) Regression models and life tables (with discussion). J Roy Statist Soc Ser B 34:187–220

  • Dabrowska DM (1997) Smoothed Cox regression. Ann Statist 25:1510–1540

  • Dabrowska DM (2005) Quantile regression in transformation models. arXiv:math.ST 05115082v1:1–34

  • Dabrowska DM (2006) Estimation in a class of semiparametric transformation models. arXiv:math.ST 0511506v2:1–48

  • Dabrowska DM, Doksum KA (1988) Partial likelihood in transformation models with censored data. Scand J Statist 15:1–24

  • Fine J, Ying Z, Wei LJ (1998) On the linear transformation model with censored data. Biometrika 85:980–986

  • Gill RD, Johansen S (1990) A survey of product-integration with a view towards application in survival analysis. Ann Statist 18:1501–1555

  • Jensen GV, Torp-Pedersen C, Hildebrandt P, Kober L, Nielsen FE, Melchior T, Joen T, Andersen PK (1997) Does in-hospital ventricular fibrillation affect prognosis after myocardial infarction? Eur Heart J 18:919–924

  • Kosorok MR, Lee BL, Fine JP (2004) Robust inference for univariate proportional hazards frailty regression models. Ann Statist 32:1448–1491

  • Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572

  • Murphy S, Rossini A, Van Der Vaart A (1997) Maximum likelihood estimation in the proportional odds model. J Am Statist Assoc 92:968–976

  • Scheike TH, Zhang MJ (2002) An additive-multiplicative Cox–Aalen model. Scand J Statist 28:75–88

  • Scheike TH, Zhang MJ (2003) Extensions and applications of the Cox–Aalen survival model. Biometrics 59:1033–1045

Acknowledgements

I would like to thank Martin Jacobsen for pointing me in the direction of the product integration formulae. I also appreciate discussions with Torben Martinussen. Part of this work was done while the author visited the Center for Advanced Study in Oslo, and was partly supported by an NIH grant. I would also like to thank two referees and the associate editor: one for a detailed reading and several suggestions for an improved presentation, and one for providing me with several recent references that deal with the asymptotic analysis. Their comments helped improve the manuscript.

Author information

Correspondence to Thomas H. Scheike.

Appendix

I here sketch the main arguments of the proof that establishes the asymptotic variances, extending those of Bagdonavicius and Nikulin (1999, 2001) and Chen et al. (2002) to more than one dimension. A detailed consistency proof for the standard transformation model can be found in Dabrowska (2005), and it appears that these arguments can be extended to our setting, although this is no trivial exercise. The key to the multivariate extension is the use of product-limit integration formulae, and I here focus on establishing the key formulae that give expressions for the standard deviations of the estimated quantities.

Assume that subjects are i.i.d. with covariates that are uniformly bounded. Define

$$ \begin{aligned} {\varvec D}^{**}(\beta,A) &= \hbox{diag}\left(\exp(2 Z_i^{T}\beta)\,\dot{\lambda}_0\left(H({t-}|X_i)\exp(Z_i^{T}\beta)\right)\right), \\ {\varvec S}^{**}(\beta, A) &= n^{-1}{\varvec X}^T(t){\varvec D}^{**}(t,\beta,A) {\varvec X}(t), \\ {\varvec S}_j^{**}(\beta, A) &= n^{-1}{\varvec X}^{T}(t){\varvec D}(X_{ij}){\varvec D}^{**}(t,\beta,A){\varvec X}(t), \\ {\varvec S}^{Z}(\beta, A) &= n^{-1}{\varvec Z}^T(t){\varvec D}(t,\beta,A) {\varvec X}(t), \\ {\varvec S}_j^{**Z}(\beta, A) &= n^{-1}{\varvec Z}^{T}(t) {\varvec D}(X_{ij}){\varvec D}^{**}(t,\beta,A){\varvec X}(t), \end{aligned} $$

where the time argument t is suppressed on the left-hand sides.

One of the needed assumptions is that all S matrices converge uniformly, in both time and the parameter space for β, to deterministic matrices. Let Θ denote an open ball around the true parameter value β0. It is assumed that the limits of all upper-case S’s exist; they are denoted by the corresponding lower-case s’s, so that, for example, \(\lim_p {\varvec S}_j^{**Z}(\beta, A) = s_j^{**Z}\) uniformly in \(\Theta\times [0,\tau]\). Define the limit of \(E_j^{**}(t) = {\varvec S}_0^{-1}(t,\beta){\varvec S}_j^{**}(t,\beta)\) as \(e_j^{**}(t,\beta)\).

I start by making the observation that (up to \(o_p(n^{-1})\))

$$ \begin{aligned} \tilde{A}(\beta_0,t) - A_0(t) &= \int_{0}^{t}{\varvec X}^{-}(\beta_0,\tilde{A})\,\hbox{d}{\varvec M} \\ &- \int_{0}^{t} \left\{{\varvec X}^{-}(\beta_0,A_0) - {\varvec X}^{-}(\beta_0,\tilde{A})\right\} {\varvec D}(\beta_0,A_0) {\varvec X}\,\hbox{d}A_0(s) \end{aligned} $$

where the time arguments were omitted for notational simplicity. Note that the first term, \(V_n\), by the martingale CLT, can be shown to converge to a Gaussian martingale process V(t) with variance \(\sigma^2(t) = \int_{0}^{t}s_{0}^{-1}\,\hbox{d}A_0\). Note also that \(V_n(t)\) can be written as

$$ V_n(t) = \sum_i\int_{0}^{t}{\varvec S}_0^{-1}(t,\beta_0) X_i\,\hbox{d}M_{i}. $$

The integrand in the last integral above can be Taylor expanded by the use of matrix derivatives thus leading to (up to \(o_p(n^{-1/2})\))

$$ \begin{aligned} &\left({\varvec S}_0^{-1}(t,\beta_0,\tilde{A}) - {\varvec S}_0^{-1}(t,\beta_0,A_0)\right) {\varvec S}_0(\beta_0,A_0) \\ &\quad = - \sum_{j=1}^p {\varvec S}_0^{-1}(t,\beta_0,A_0){\varvec S}_j^{**}(t,\beta_0,A_0)(\tilde{A} _j({t-}) - A_{0j}({t-})). \end{aligned} $$
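
The step behind this expansion can be sketched as follows. Write \(\tilde{{\varvec S}}_0 = {\varvec S}_0(t,\beta_0,\tilde{A})\) and \({\varvec S}_0 = {\varvec S}_0(t,\beta_0,A_0)\); this shorthand is introduced here only for illustration. The first line is the exact identity for the difference of two matrix inverses. The second line assumes that \({\varvec S}_j^{**}\) acts as the partial derivative of \({\varvec S}_0\) with respect to \(A_j\), which the definitions above suggest but which is not stated explicitly in this appendix.

$$ \begin{aligned} \left(\tilde{{\varvec S}}_0^{-1} - {\varvec S}_0^{-1}\right){\varvec S}_0 &= \tilde{{\varvec S}}_0^{-1}\left({\varvec S}_0 - \tilde{{\varvec S}}_0\right) \\ &\approx -{\varvec S}_0^{-1}\sum_{j=1}^p {\varvec S}_j^{**}\,(\tilde{A}_j({t-}) - A_{0j}({t-})), \end{aligned} $$

where replacing \(\tilde{{\varvec S}}_0^{-1}\) by \({\varvec S}_0^{-1}\) in the leading factor only changes terms of second order in \(\tilde{A} - A_0\).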

This implies that \(\sqrt{n}(\tilde{A}(\beta_0,t) - A_0(t))\) converges in distribution to a p-dimensional process W that satisfies the integral equation

$$W(t) = \sum_{j=1}^p \int_{0}^{t} - W_j(s-) e_j^{**}(s,\beta_0)\,\hbox{d}A_0(s) + V(t).$$

Denote the elements of \(e_{j}^{**}(t)\) as \(e_{j,{km}}^{**}(t)\). With the p × p matrix F(t) with elements \(F_{jk}(t) = \sum_{m=1}^p e_{j,{km}}^{**}(t)\alpha_{0,m}(t)\), the integral equation can be written in differential form as

$$\hbox{d}W(t) = -W({t-})F\,\hbox{d}t +\hbox{d}V(t). $$

Using the product integration formulae (Gill and Johansen 1990) leads to the solution

$$\begin{aligned} W(t) &= \int_{0}^{t}\,\hbox{d}V(s) {\cal F}(s,t) \\ {\cal F}(s,t) &= \prod_{{]}s,t{]}} (I - F\,\hbox{d}t) \end{aligned}$$
(13)

and where \(\mathcal{F}(s,t)\) is the product integral of F. In the one-dimensional (commutative) case this equals \(\exp(-\int_{s}^{t} e^{**}(u,\beta_0)\,\hbox{d}A_0(u))\). Note that the matrix \(\mathcal{F}(s,t)\) is estimated consistently by the product of (the atoms)

$$ \hat{\mathcal{F}}(s,t) = \prod_{]s,t]}(I - \,\hbox{d}\hat{F}) $$
(14)

where \(\hbox{d}\hat{F}_{jk}(t) = \sum_{m=1}^p E_{j,{km}}^{**}(t)\,\hbox{d}\hat{A}_m(t).\) Using the product integral rather than the exponential in the one-dimensional case yields an estimator with the same asymptotic properties.
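
As a rough numerical illustration of the product-integral solution (13) and the atom-wise product in (14), the following Python sketch discretises time on a grid. Everything in it is an assumption made for illustration: the function name `product_integral`, the constant 2 × 2 matrix `F`, the Gaussian increments `dV` and the scalar value `f` are hypothetical and do not come from the paper. It checks that the recursion \(W_k = W_{k-1}(I - \hbox{d}F_k) + \hbox{d}V_k\) agrees with the sum \(\sum_k \hbox{d}V_k \prod_{l>k}(I - \hbox{d}F_l)\), and that in the scalar case the product is close to the exponential.

```python
import numpy as np

def product_integral(dF):
    """Ordered product prod_k (I - dF[k]) over increments dF[k] (shape (K, p, p))."""
    p = dF.shape[1]
    out = np.eye(p)
    for inc in dF:                       # increments are ordered in time
        out = out @ (np.eye(p) - inc)    # later factors multiply from the right
    return out

rng = np.random.default_rng(0)
p, K = 2, 200
dt = 1.0 / K

# Hypothetical constant F(t) and toy martingale increments dV (illustration only).
F = np.array([[0.5, 0.2], [-0.1, 0.8]])
dF = np.repeat((F * dt)[None, :, :], K, axis=0)
dV = rng.normal(scale=np.sqrt(dt), size=(K, p))

# Discretised equation dW = -W(t-) F dt + dV, with W treated as a row vector.
W = np.zeros(p)
for k in range(K):
    W = W @ (np.eye(p) - dF[k]) + dV[k]

# Discretised solution: W(t) = sum_k dV_k * prod_{l > k} (I - dF_l).
W_closed = np.zeros(p)
for k in range(K):
    W_closed += dV[k] @ product_integral(dF[k + 1:])

# Scalar (commutative) case: the product is close to exp(-integral of f).
f = 0.7
scalar_prod = product_integral(np.full((K, 1, 1), f * dt))[0, 0]

print(np.allclose(W, W_closed), scalar_prod, np.exp(-f))
```

On the grid the recursion and the summed form agree up to floating-point error, while the scalar product and the exponential differ only by the discretisation error, which shrinks as the grid is refined.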

I now turn to the estimating function for \(\beta\), \(\tilde{U}(\beta)\). By the martingale decomposition

$$ \begin{aligned} \tilde{U}(\beta_0) &= \int{\left\{{\varvec Z}^T -{\varvec Z}^{T}{\varvec D}(\beta_0,\tilde{A}){\varvec X}S_0^{-1}(\beta_0,\tilde{A}) {\varvec X}^T \right\}}\,\hbox{d}M \\ &+ \int{\left\{{\varvec Z}^{T} D(\beta_0, A_0){\varvec X} - {\varvec Z}^{T}{\varvec D}(\beta_0, \tilde{A}){\varvec X}\right\}}\,\hbox{d}A_0 \\ &+ \int{{\varvec Z}^{T}{\varvec D}(\beta_0,\tilde{A}){\varvec X}\left\{{\varvec S}_0^{-1}(\beta_0,A_0) - {\varvec S}_0^{-1}(\beta_0,\tilde{A})\right\} {\varvec S}_0(\beta_0,A_0)}\,\hbox{d}A_0. \end{aligned} $$

By a Taylor expansion, as before, it may be shown that (up to \(o_p(n^{-1/2})\))

$$ {\varvec Z}^{T} {\varvec D}(\beta_0, A_0) {\varvec X} - {\varvec Z}^{T} {\varvec D}(\beta_0, \tilde{A}) {\varvec X} = -\sum_{j=1}^p {\varvec S}_j^{**Z}(\beta_0, A_0)(\tilde{A}_j({t-}) - A_{0j}({t-})). $$

The last two terms therefore equal (up to \(o_p(n^{-1/2})\))

$$ \sum_{j=1}^p \int{\left\{s_j^{**z} - s^z s_0^{-1} s_j^{**}\right\}}W_j\,\hbox{d}A_0$$

that can be written as

$$ \int{\left\{G_1(t) - G_2(t)\right\}} W(t)\,\hbox{d}t$$

with \(G_1(t)\) and \(G_2(t)\) q × p matrices with elements \(G_{1,kj}(t) = \sum_{m=1}^p s_{j,km}^{**z}\alpha_{0,m}(t)\) and \(G_{2,kj}(t) = \sum_{m=1}^p \left\{s^z s_0^{-1}s_j^{**}\right\}_{km} \alpha_{0,m}(t)\). Now changing the order of integration and using the structure of W(t) one gets

$$ \begin{aligned} & \int_{0}^{\tau}\left\{G_1(t)-G_2(t)\right\}W(t)\,\hbox{d}t = \int_{0}^{\tau}\left\{G_1(t)-G_2(t)\right\}\int_{0}^{t}\mathcal{F} (s,t)\,\hbox{d}V(s)\,\hbox{d}t \\ &= \int_{0}^{\tau} g(s)\,\hbox{d}V(s) = \sum_{i}\int_{0}^{\tau} g(s) s_0^{-1}(s,\beta) X_i \,\hbox{d}M_i(s), \end{aligned} $$

where

$$ g(s) = \int_{s}^{\tau}\left\{G_1(t)-G_2(t)\right\}\mathcal{F}(s,t)\,\hbox{d}t. $$

Define

$$ \begin{aligned} q_{1i}(t,\beta) &= Z_i - s_z(t,\beta)s_0^{-1}(t,\beta)X_i \\ q_{2i}(t,\beta) &= g(t) s_0^{-1}(t,\beta)X_i \\ q_i(t) &= q_{1i}(t,\beta) + q_{2i}(t,\beta). \end{aligned} $$
(15)

Then one can write the estimating function as follows (up to an \(o_p(1)\) term)

$$ n^{-1/2}\tilde{U}(\beta_0) = n^{-1/2}\sum_{i}\int_{0}^{\tau}\left\{q_{1i}(t,\beta_0) +q_{2i}(t,\beta_0)\right\}\,\hbox{d}M_i(t). $$

This is a sum of i.i.d. terms (or a martingale) and therefore converges to a normal distribution with variance that is estimated by the robust estimator given in (11).
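
The robust estimator (11) is not reproduced in this appendix, so the sketch below only illustrates the general idea it relies on: for a normalised sum of i.i.d. mean-zero terms, the variance can be estimated by the empirical second moment of the individual terms. The function name `second_moment_variance`, the simulated contributions `U` and the matrix `true_cov` are hypothetical stand-ins for \(\int q_i\,\hbox{d}M_i\), not the paper's quantities.

```python
import numpy as np

def second_moment_variance(U):
    """Empirical second moment n^{-1} sum_i U_i U_i^T of per-subject terms U_i.

    U has shape (n, q); row i is a hypothetical stand-in for the
    contribution of subject i to the estimating function.
    """
    U = np.asarray(U, dtype=float)
    n = U.shape[0]
    return U.T @ U / n

# Simulated mean-zero contributions with a known covariance (illustration only).
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.3], [0.3, 2.0]])
U = rng.multivariate_normal(mean=np.zeros(2), cov=true_cov, size=5000)

V_hat = second_moment_variance(U)
print(V_hat)
```

With mean-zero i.i.d. rows, `V_hat` estimates the variance of \(n^{-1/2}\sum_i U_i\); whether (11) has exactly this form cannot be confirmed from this appendix alone.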

Based on a Taylor series expansion (up to an \(o_p(1)\) term) we find that

$$ \begin{aligned} \sqrt{n}(\hat{A}(t) - A_0(t)) &= n^{1/2} \int_{0}^{t} \frac{\partial}{\partial\beta}X^{-}(t,\beta_0)\,\hbox{d}{\varvec N}(\hat{\beta} - \beta_0) + n^{-1/2}\sum_{i}\int_{0}^{t}\mathcal{F}(s,t) s_0^{-1}(s,\beta_0) X_i(s)\,\hbox{d}M_i(s) \\ &= n^{-1/2} \sum_{i}L_i(t,\beta_0) \end{aligned} $$

where

$$ \begin{aligned} L_i(t,\beta) &= P(t,\beta_0)I^{-1}(\tau)\int_{0}^{\tau} Y_i(t) q_i(t,\beta_0)\,\hbox{d}M_i(t) \\ &+ \int_{0}^{t} Y_i(s) \mathcal{F}(s,t) s_0^{-1}(s,\beta_0) X_i(s)\,\hbox{d}M_i(s), \end{aligned} $$
(16)

and with P(t,β) the limit of \(n^{-1}\tilde{P}(t,\hat{\beta})\) defined in (8), with I(τ) the limit of \(n^{-1} \mathcal{I}(\tau,\beta_0)\), and with the product integral \(\mathcal{F}(s,t)\) defined in (13).

Note that an estimator of \(\mathcal{F}(s,t)\) is given in (14).

Cite this article

Scheike, T. A flexible semiparametric transformation model for survival data. Lifetime Data Anal 12, 461–480 (2006). https://doi.org/10.1007/s10985-006-9021-1
