Abstract
When the distribution of the truncation time is known up to a finite-dimensional parameter vector, much research has aimed to improve the efficiency of estimation for nonparametric or semiparametric models with left-truncated and right-censored (LTRC) data. When the distribution of the truncation time is unspecified, one approach is to use the conditional maximum likelihood estimator (cMLE) (Chen and Shen in Lifetime Data Anal https://doi.org/10.1007/s10985-016-9385-9, 2017). Although the cMLE has nice asymptotic properties, it is not efficient, since the conditional likelihood function does not incorporate information on the distribution of the truncation time. In this article, we aim to develop a more efficient estimator by considering the full likelihood function. Following Turnbull (J R Stat Soc B 38:290–295, 1976) and Qin et al. (J Am Stat Assoc 106:1434–1449, 2011), we treat the unobserved (left-truncated) subpopulation as missing data and propose a two-stage approach for obtaining the pseudo maximum likelihood estimator (PMLE) of the regression parameters. In the first stage, the distribution of the left truncation time is estimated by the inverse-probability-weighted (IPW) estimator (Wang in J Am Stat Assoc 86:130–143, 1991). In the second stage, we obtain the pseudo complete-data likelihood function by replacing the distribution of the truncation time with the IPW estimator in the full likelihood. We propose an expectation–maximization algorithm for obtaining the PMLE and establish the consistency of the PMLE. Simulation results show that the PMLE outperforms the cMLE in terms of mean squared error. The PMLE can also be used to analyze length-biased data, where the truncation time is uniformly distributed. We demonstrate that, for length-biased data, the PMLE is more robust to the support assumption on the truncation time than the MLE proposed by Qin et al. (2011). We apply the proposed method to the Channing House data.
While the PMLE is quite appealing in specific cases with independent censoring and time-invariant covariates, its applicability, as shown in the simulation study, can be rather restricted in more general settings.
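To make the first stage concrete, the following is a minimal Python sketch of an IPW estimate of the truncation-time distribution. It is an illustration under stated assumptions, not the authors' implementation: the function names `product_limit_survival` and `ipw_truncation_cdf` are hypothetical, the weight for each subject is taken to be the reciprocal of a left-truncation-adjusted product-limit estimate of the survival function just before that subject's truncation time, and the risk sets are assumed never to empty out before the largest truncation time (otherwise a weight becomes infinite).

```python
import numpy as np

def product_limit_survival(trunc, obs, event):
    """Truncation-adjusted product-limit estimate (hypothetical helper).

    Returns the distinct event times and the estimated survival
    probability just after each of them.
    """
    trunc = np.asarray(trunc, dtype=float)
    obs = np.asarray(obs, dtype=float)
    event = np.asarray(event)
    times = np.unique(obs[event == 1])
    surv = np.empty(len(times))
    s = 1.0
    for k, t in enumerate(times):
        # Risk set under left truncation: already entered and still under observation.
        at_risk = np.sum((trunc <= t) & (obs >= t))
        deaths = np.sum((obs == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv[k] = s
    return times, surv

def ipw_truncation_cdf(trunc, obs, event):
    """IPW estimate of the truncation-time CDF (assumed form, see lead-in).

    Each subject is weighted by 1 / S(T_i-), the reciprocal of the estimated
    probability of being observed given truncation time T_i.
    """
    trunc = np.asarray(trunc, dtype=float)
    times, surv = product_limit_survival(trunc, obs, event)
    if len(times) == 0:
        s_minus = np.ones(len(trunc))
    else:
        # Number of event times strictly before each truncation time.
        idx = np.searchsorted(times, trunc, side="left")
        s_minus = np.where(idx > 0, surv[np.maximum(idx - 1, 0)], 1.0)
    weights = 1.0 / s_minus
    order = np.argsort(trunc)
    grid = trunc[order]
    cdf = np.cumsum(weights[order]) / weights.sum()
    return grid, cdf
```

Calling `ipw_truncation_cdf(trunc, obs, event)` with arrays of truncation times, observed times and event indicators returns the sorted truncation times and the estimated distribution function evaluated at them. In the second stage this estimate would be plugged into the full likelihood in place of the unknown truncation distribution.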
References
Asgharian M, Wolfson DB (2005) Asymptotic behavior of the unconditional NPMLE of the length-biased survivor function from right censored prevalent cohort data. Ann Stat 33:2109–2131
Asgharian M, M’Lan CE, Wolfson DB (2002) Length-biased sampling with right censoring: an unconditional approach. J Am Stat Assoc 97:201–209
Asgharian M, Wolfson DB, Zhang X (2006) Checking stationarity of the incidence rate using prevalent cohort survival data. Stat Med 25:1751–1767
Bennett S (1983) Analysis of survival data by the proportional odds model. Stat Med 2:273–277
Chen Y-H (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96:591–600
Chen C-M, Shen PS (2017) Conditional maximum likelihood estimation for LTRC data. Lifetime Data Anal. https://doi.org/10.1007/s10985-016-9385-9
Chen K, Jin Z, Ying Z (2002) Semiparametric analysis of transformation models with censored data. Biometrika 89:659–668
Chen L, Lin DY, Zeng D (2012) Checking semiparametric transformation models with censored data. Biostatistics 13:18–31
Cheng Y-J, Huang C-Y (2014) Combined estimating equation approaches for semiparametric transformation models with length-biased survival data. Biometrics 70:608–618
Cheng SC, Wei LJ, Ying Z (1995) Analysis of transformation models with censored data. Biometrika 82:835–845
Cox D (1972) Regression models and life tables (with Discussion). J R Stat Soc Ser B 34:187–220
Dabrowska DM, Doksum KA (1988) Estimation and testing in the two-sample generalized odds-rate model. J Am Stat Assoc 83:744–749
Huang C-Y, Qin J (2013) Semiparametric estimation for the additive hazards model with left-truncated and right-censored data. Biometrika 100:877–888
Huang C-Y, Ning J, Qin J (2015) Semiparametric likelihood inference for left-truncated and right-censored data. Biostatistics 16:785–798
Hyde J (1977) Testing survival under right censoring and left truncation. Biometrika 64:225–230
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Kim JP, Lu W, Sit T, Ying Z (2013) A unified approach to semiparametric transformation models under general biased sampling schemes. J Am Stat Assoc 108:217–227
Klein JP, Moeschberger ML (1997) Survival analysis: techniques for censored and truncated data. Springer, Berlin
Lai TL, Ying Z (1991) Estimating a distribution function with truncated and censored data. Ann Stat 19:417–442
Liu H, Ning J, Qin J, Shen Y (2016) Semiparametric maximum likelihood inference for truncated or biased-sampling data. Stat Sin 26:1087–1115
Mandel M, Betensky RA (2007) Testing goodness of fit of a uniform truncation model. Biometrics 63:405–412
Murphy SA (1994) Consistency in a proportional hazards model incorporating a random effect. Ann Stat 22:712–731
Murphy SA (1995) Asymptotic theory for the frailty model. Ann Stat 23:182–198
Murphy SA, Rossini AJ, van der Vaart AW (1997) Maximum likelihood estimation in the proportional odds model. J Am Stat Assoc 92:968–976
Parner E (1998) Asymptotic theory for the correlated gamma-frailty models. Ann Stat 26:183–214
Qin J, Shen Y (2010) Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 66:382–392
Qin J, Ning J, Liu H, Shen Y (2011) Maximum likelihood estimations and EM algorithms with length-biased data. J Am Stat Assoc 106:1434–1449
Shen PS (2011) Semiparametric analysis of transformation models with left-truncated and right-censored data. Comput Stat 26:521–537
Shen PS, Liu Y (2017) Pseudo maximum likelihood estimation for the Cox model with doubly truncated data. Stat Papers https://doi.org/10.1007/s00362-016-0870-8
Tsai W-Y (2009) Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika 96:601–615
Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc B 38:290–295
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Vardi Y (1989) Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 76:751–761
Wang M-C (1989) A semiparametric model for randomly truncated data. J Am Stat Assoc 84:742–748
Wang M-C (1991) Nonparametric estimation from cross-sectional survival data. J Am Stat Assoc 86:130–143
Woodroofe M (1985) Estimating a distribution function with truncated data. Ann Stat 13:163–167
Zeng D, Lin DY (2006) Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93:627–640
Zeng D, Lin DY (2007) Maximum likelihood estimation in semiparametric regression models with censored data (with discussion). J R Stat Soc Ser B 69:507–564
Zeng D, Lin DY (2010) A general theory for maximum likelihood estimation in semiparametric regression models with censored data. Stat Sin 20:871–910
Acknowledgements
The author would like to thank the associate editor and referees for their helpful and valuable comments and suggestions.
Appendices
Appendix 1: Proof of Theorem 1
Our proofs follow essentially the same steps as Zeng and Lin (2006) (also see Murphy 1994, 1995; Parner 1998).
By condition (C2), there exists a constant M such that \(\sup _{\beta \in {{\mathcal {B}}}} |\beta ^T Z^{*}|\le M\) with probability one. Hence, the \(i^{th}\) term in (3) satisfies
Under condition (C3), this quantity diverges to \(-\infty \) if \(R\{X_i\}\) tends to \(\infty \) for some \(X_i\). Hence, the jump sizes of R must be finite. Since \({\hat{\zeta }}_n\) maximizes the likelihood function \(l_n(\zeta ,{\hat{K}}_n)\) over \(\zeta \), the following inequality holds: \(l_n({\hat{\zeta }}_n,{\hat{K}}_n)-l_n({\tilde{\zeta }}_n,{\hat{K}}_n)\ge 0\), where \({\tilde{\zeta }}_n=({\tilde{R}}_n,{\hat{\beta }}_n)\), \({\tilde{R}}_n(t)={\hat{R}}_n(t)/\xi _n\), and \(\xi _n={\hat{R}}_n(\tau _c)\). Since \({\hat{K}}_n\) is a consistent estimator of \(K_0\) (Wang 1991), \(l_n({\hat{\zeta }}_n,K_0)-l_n({\tilde{\zeta }}_n,K_0)\) is asymptotically nonnegative. Using the approach of Zeng and Lin (2006), we first show by contradiction that \({\hat{R}}_n(\tau _c)\) is bounded almost surely. From (3) and \(n^{-1}[l_n({\hat{\zeta }}_n,K_0)-l_n({\tilde{\zeta }}_n,K_0)]\ge 0\), we obtain
Note that the right-hand side is bounded from below by
The left-hand side is bounded from above by
Under condition (C3), \(\log \xi _n\sup _{y\le \xi _n e^{M}}g(y)\le \epsilon G(\xi _n e^{M})\) for any \(\epsilon >0\) when n is large enough. It follows that if we choose \(\epsilon \) such that \(\epsilon E[N_i(\tau _c)]\le P(X_i\ge \tau _c)/2\), the left-hand side diverges to \(-\infty \) when \(\xi _n\rightarrow \infty \). This is a contradiction. Thus, \({\hat{R}}_n\) is bounded on \([0,\tau _c]\) with probability one. By Helly's selection theorem, we may assume that, along a subsequence, \({\hat{\zeta }}_n\) converges to \(\zeta ^{*}=(R^{*},\beta ^{*})\).
Next, by differentiating \(l_n(R,\beta ,K_0)\) with respect to \(R\{X_i\}\) and setting the derivative to zero, we obtain \([n{\hat{R}}_n\{X_i\}]^{-1}=\phi _n(X_i;{\hat{R}}_n,{\hat{\beta }}_n,K_0),\) where
where \(\dot{g}(x)=dg(x)/dx\). It follows that
By the Glivenko–Cantelli theorem, \(\phi _n(s;{\hat{R}}_n,{\hat{\beta }}_n,K_0)\) converges uniformly to a continuously differentiable function \(\phi (s;R^{*},\beta ^{*},K_0)\). Similar to the arguments of Zeng and Lin (2006), it follows that, when n is large enough, \(|\phi _n(t;{\hat{R}}_n,{\hat{\beta }}_n,K_0)|>\epsilon _0\) for some \(\epsilon _0>0\). Let
By the Glivenko–Cantelli theorem, \({\hat{R}}_n^{0}(t)\) converges uniformly to \(R_0(t)\) almost surely. By the lower bound on \(|\phi _n(t)|\), \({\hat{R}}_n(t)\) is absolutely continuous with respect to \({\hat{R}}_n^{0}(t)\) and \(d{\hat{R}}_n/d{\hat{R}}_n^{0}\) converges to a bounded measurable function b, i.e., \(R^{*}(t)=\int _{0}^{t}b(s)dR_0(s)\). Thus, \(R^{*}\) is absolutely continuous with derivative \(r^{*}(t)\) and \(b(t)=r^{*}(t)/r_0(t)\). Since \(l_n(R,\beta ,K_0)\) is maximized at \(({\hat{R}}_n,{\hat{\beta }}_n)\), we have
We take the limits on both sides. Then, by the Glivenko–Cantelli theorem and the fact that \({\hat{R}}_n\{t\}/{\hat{R}}_n^{0}\{t\}\) converges uniformly to \(r^{*}(t)/r_0(t)\), the Kullback–Leibler information between the density indexed by \((R^{*},\beta ^{*})\) and that indexed by \((R_0,\beta _0)\) is nonpositive. Since the Kullback–Leibler information is always nonnegative, it must equal zero. Thus, with probability one
This equality holds for the case \(X\ge \tau _c\), \(\delta _i=0\) and also holds for the case where \(X\ge \tau _c\) and \(N(t-)=1\) for \(t\in [0,\tau _c]\), \(N(\tau _c)=1\). The difference between the equalities from the two cases entails that
Under (C2), it follows that \(\beta ^{*}=\beta _0\) and \(R^{*}=R_0\). Hence, \({\hat{\beta }}_n\) converges almost surely to \(\beta _0\) and, by (C1), \({\hat{R}}_n(t)\) converges to \(R_0(t)\) uniformly in t for \(t\in [0,\tau _c]\).
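The identification step in this proof rests on a standard property of the Kullback–Leibler information, which we recall here in generic notation. The symbols \(p_0\) and \(p^{*}\) are shorthand introduced only for this remark (they stand for the observed-data densities indexed by \((R_0,\beta _0)\) and \((R^{*},\beta ^{*})\)), and O denotes a generic observation. By Jensen's inequality,

\[
E_0\left[ \log \frac{p_0(O)}{p^{*}(O)}\right] =-E_0\left[ \log \frac{p^{*}(O)}{p_0(O)}\right] \ge -\log E_0\left[ \frac{p^{*}(O)}{p_0(O)}\right] \ge -\log 1=0,
\]

with equality if and only if \(p^{*}(O)=p_0(O)\) with probability one under the true model. Hence, once the limit argument shows that this quantity cannot be positive, the two densities must coincide almost surely, and the identifiability argument can proceed from that equality.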
Appendix 2: Proof of the asymptotic distribution based on \(l_n(R,\beta ,K_0)\)
In addition to conditions (C1)–(C3), we need the following condition:
(C4) Let \({\hat{R}}_n(\cdot ,\beta )\) be the maximizer of \(l_n(R,\beta ,K_0)\) for given \(\beta \). The information matrix \(-\partial ^2E[l_n({\hat{R}}_n(\cdot ,\beta ),\beta ,K_0)]/\partial \beta \partial \beta ^T\) evaluated at the true value \(\beta _0\) is positive definite.
The proof of the asymptotic normality is similar to that of Zeng and Lin (2006), so we provide only a sketch. Let \({{\mathcal {P}}}_n\) denote the empirical measure determined by n i.i.d. observations and \({{\mathcal {P}}}_0\) denote its expectation. Furthermore, we define \(l(R,\beta ,K_0)\) as the logarithm of the observed likelihood function from a single subject. Define the derivative of \(l(R,\beta ,K_0)\) with respect to R as
and also define
Let \(l_{\beta }(R,\beta ,K_0)\) denote the score vector of \(l(R,\beta ,K_0)\) for \(\beta \) and \(l_{\beta \beta }(R,\beta ,K_0)\) the Hessian matrix of \(l(R,\beta ,K_0)\) with respect to \(\beta \). Define \(\psi (t;R,\beta )=\dot{g}\{R(X\wedge t)e^{\beta ^T Z}\}/g\{R(X\wedge t)e^{\beta ^T Z}\}.\) We choose \(\epsilon _0\) small enough and define a map \(U_n :=(U_{1n},U_{2n})\) from \({{\mathcal {S}}}=\{(R,\beta ): ||R-R_0||_{l^{\infty }[0,\tau _c]}<\epsilon _0,|\beta -\beta _0| <\epsilon _0 \}\subset l^{\infty }({{\mathcal {D}}})\times {{\mathcal {R}}}^{p}\) to \(l^{\infty }({{\mathcal {D}}}) \times {{\mathcal {R}}}^{p}\) as follows: for any \(q(t)\in {{\mathcal {D}}}\),
and
Similarly, we can define the limit version of \(U_n\) as \(U_0 :=(U_{10},U_{20})\) by replacing \({{\mathcal {P}}}_n\) by \({{\mathcal {P}}}_0\), i.e., \(U_{10}(R,\beta )[q]=E_0[U_{1n}(R,\beta )[q]]\) and \(U_{20}(R,\beta )=E_0[U_{2n}(R,\beta )]\). Clearly, \(U_n({\hat{R}}_n,{\hat{\beta }}_n)\) is asymptotically equal to zero and \(U_0(R_0,\beta _0)=0\). By conditions (C1) and (C2) and the Donsker theorem, we can show that \(\sqrt{n}(U_n-U_0)({\hat{R}}_n,{\hat{\beta }}_n)-\sqrt{n}(U_n-U_0)(R_0,\beta _0)=o_p(1)\) in the metric space \(l^{\infty }({{\mathcal {D}}})\times {{\mathcal {R}}}^{p}\). Since \(\sqrt{n}(U_n-U_0)(R_0,\beta _0)\) is a sum of i.i.d. random quantities, by empirical process theory it converges weakly to \({{\mathcal {W}}}=({{\mathcal {W}}}_1,{{\mathcal {W}}}_2)\), where \({{\mathcal {W}}}_1\) is a tight Gaussian process and \({{\mathcal {W}}}_2\) is a Gaussian random vector. Furthermore, the covariance matrix of \({{\mathcal {W}}}_2\) is \({\varSigma }_{22}=E_0[U_{2n}(R_0,\beta _0)^{\otimes 2}]\) and the covariance between \({{\mathcal {W}}}_1(s)\) and \({{\mathcal {W}}}_1(t)\) is \({\varSigma }_{11}(s,t)=E_0[U_{1n}(R_0,\beta _0)[q_1]U_{1n}(R_0,\beta _0)[q_2]]\), where \(q_1(\cdot )=I_{[\cdot \le s]}\) and \(q_2(\cdot )=I_{[\cdot \le t]}\). By Theorem 3.3.1 of van der Vaart and Wellner (1996), it remains to show that \(U_0\) is Fréchet-differentiable at \(\zeta _0=(R_0,\beta _0)\) and that the derivative is continuously invertible in the set \({{\mathcal {S}}}\). The Fréchet-differentiability can be verified directly. The derivative of \(U_0\) maps \({{\mathcal {S}}}\) to \(l^{\infty }({{\mathcal {D}}})\times {{\mathcal {R}}}^p\) and has the form
where \(U_{11}(R-R_0)[q]=\int (-p(t)I+D)[q]d(R-R_0)\), \(U_{12}(\beta -\beta _0)[q]=A[\int q dR_0](\beta -\beta _0)\), \(U_{21}(R-R_0)=A^{*}[R-R_0]\), and \(U_{22}(\beta -\beta _0)=B(\beta -\beta _0)\). Here \(p(t) > 0\), I is the identity operator, A and D are both linear operators, \(A^{*}\) is the dual operator of A, and B is a \(p\times p\) matrix. Specifically,
It suffices to show that \(U_{22}\) and \(L:=U_{11}-U_{12}U_{22}^{-1}U_{21}\) are continuously invertible. Note \(-U_{22}\) is the information at \(\beta _0\). By condition (C2), it follows that \(U_{22}\) is continuously invertible. Furthermore, based on the following equality
and the arguments given in Zeng and Lin (2006), we can show that L is continuously invertible in the set \({{\mathcal {S}}}\). Let \(\dot{U}_{\zeta _0}\) denote the Fréchet derivative of the map \(U_0\) at \(\zeta _0\). Thus, we have \(\dot{U}_{\zeta _0}[\sqrt{n}({\hat{\zeta }}_n-\zeta _0)] =-\sqrt{n}(U_n-U_0)(R_0,\beta _0)+o_p(1)\). It follows that \(\sqrt{n}[{\hat{\zeta }}_n-\zeta _0]\) converges weakly to a mean-zero Gaussian process \(-\dot{U}^{-1}_{\zeta _0}({{\mathcal {W}}})\).
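For readers who want the final step spelled out, the following is a sketch of the standard Z-estimator linearization behind Theorem 3.3.1 of van der Vaart and Wellner (1996), written in the notation above. It is an outline rather than a full argument: we assume that \(\sqrt{n}U_n({\hat{\zeta }}_n)=o_p(1)\), which is the \(\sqrt{n}\)-scale version of the statement that \(U_n({\hat{R}}_n,{\hat{\beta }}_n)\) is asymptotically zero, and the remainder terms are stated loosely. Using \(U_0(\zeta _0)=0\), the asymptotic equicontinuity of \(\sqrt{n}(U_n-U_0)\), and the Fréchet differentiability of \(U_0\) at \(\zeta _0\),

\[
\begin{aligned}
o_p(1)=\sqrt{n}\,U_n({\hat{\zeta }}_n)&=\sqrt{n}(U_n-U_0)({\hat{\zeta }}_n)+\sqrt{n}\{U_0({\hat{\zeta }}_n)-U_0(\zeta _0)\}\\
&=\sqrt{n}(U_n-U_0)(\zeta _0)+\dot{U}_{\zeta _0}[\sqrt{n}({\hat{\zeta }}_n-\zeta _0)]+o_p(1+\sqrt{n}\Vert {\hat{\zeta }}_n-\zeta _0\Vert ).
\end{aligned}
\]

Continuous invertibility of \(\dot{U}_{\zeta _0}\) then gives \(\sqrt{n}\Vert {\hat{\zeta }}_n-\zeta _0\Vert =O_p(1)\), so the remainder is \(o_p(1)\) and the stated expansion follows. The reduction of invertibility to \(U_{22}\) and \(L=U_{11}-U_{12}U_{22}^{-1}U_{21}\) reflects the usual block (Schur complement) inverse, valid whenever both are invertible:

\[
\begin{pmatrix} U_{11} &{} U_{12}\\ U_{21} &{} U_{22}\end{pmatrix}^{-1}
=\begin{pmatrix} L^{-1} &{} -L^{-1}U_{12}U_{22}^{-1}\\ -U_{22}^{-1}U_{21}L^{-1} &{} U_{22}^{-1}+U_{22}^{-1}U_{21}L^{-1}U_{12}U_{22}^{-1}\end{pmatrix}.
\]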
Cite this article
Chen, CM., Shen, Ps. & Liu, Y. On semiparametric transformation model with LTRC data: pseudo likelihood approach. Stat Papers 62, 3–30 (2021). https://doi.org/10.1007/s00362-018-01080-w
DOI: https://doi.org/10.1007/s00362-018-01080-w