Abstract
Interval-censored data arise naturally in medical, biological, and demographic studies. The Cox proportional hazards model is routinely used to fit such data, whereas the corresponding methodology for additive hazards regression, often regarded as a promising alternative, remains to be investigated. We propose a sieve maximum likelihood method for estimating the regression parameters in additive hazards regression with case II interval-censored data, which consist of right-, left- and interval-censored observations. We establish the consistency and asymptotic normality of the proposed estimator and show that it attains the semiparametric efficiency bound. The finite-sample performance of the proposed method is assessed via comprehensive simulation studies, and the method is illustrated with a real clinical example involving patients with hemophilia.
References
Betensky RA, Rabinowitz D, Tsiatis AA (2001) Computationally simple accelerated failure time regression for interval censored data. Biometrika 88:703–711
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore
Cox DR (1972) Regression models and life tables (with discussion). J R Stat Soc B 34:187–220
Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42:845–854
Geman S, Hwang C-R (1982) Nonparametric maximum likelihood estimation by the method of sieves. Ann Stat 10:401–414
Goggins WB, Finkelstein DM (2000) A proportional hazards model for multivariate interval-censored failure time data. Biometrics 56:940–943
Gómez G, Espinal A, Lagakos SW (2003) Inference for a linear regression model with an interval-censored covariate. Stat Med 22:409–425
Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24:540–568
Huang J, Wellner JA (1997) Interval censored survival data: a review of recent progress. In: Lin DY, Fleming TR (eds) Proceedings of the first Seattle symposium in biostatistics: survival analysis. Springer, New York, pp 123–169
Huang J, Zhang Y, Hua L (2008) A least-squares approach to consistent information estimation in semiparametric models. Technical Report, Department of Biostatistics, University of Iowa
Kim M, Xue X (2002) The analysis of multivariate interval-censored survival data. Stat Med 21:3715–3726
Kroner BL, Rosenberg PS, Aledort LM, Alvord WG, Goedert JJ (1994) HIV-1 infection incidence among persons with hemophilia in the United States and Western Europe, 1978–1990. J Acquir Immune Defic Syndr 7:279–286
Li G, Zhang C-H (1998) Linear regression with interval censored data. Ann Stat 26:1306–1327
Li L, Pu Z (2003) Rank estimation of log-linear regression with interval-censored data. Lifetime Data Anal 9:57–70
Martinussen T, Scheike TH (2002) Efficient estimation in additive hazards regression with current status data. Biometrika 89:649–658
Rabinowitz D, Tsiatis A, Aragon J (1995) Regression with interval-censored data. Biometrika 82:501–513
Satten GA, Datta S, Williamson JM (1998) Inference based on imputed failure times for the proportional hazards model with interval-censored data. J Am Stat Assoc 93:318–327
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Seaman SR, Bird SM (2001) Proportional hazards model for interval-censored failure times and time-dependent covariates: application to hazard of HIV infection of injecting drug users in prison. Stat Med 20:1855–1870
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615
Song X, Ma S (2008) Multiple augmentation for interval-censored data with measurement error. Stat Med 27:3178–3190
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Sun J, Shen J (2009) Efficient estimation for the proportional hazards model with competing risks and current status data. Can J Stat 37:592–606
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Wang L, Sun J, Tong X (2008) Efficient estimation for the proportional hazards model with bivariate current status data. Lifetime Data Anal 14:134–153
Wang L, Sun J, Tong X (2010) Regression analysis of case II interval-censored failure time data with the additive hazards model. Stat Sin 20:1709–1723
Xue H, Lam KF, Li G (2004) Sieve maximum likelihood estimator for semiparametric regression models with current status data. J Am Stat Assoc 99:346–356
Zeng D, Cai J, Shen Y (2006) Semiparametric additive risks model for interval-censored data. Stat Sin 16:287–302
Zhang W, Zhang Y, Chaloner K, Stapleton JT (2009) Imputation methods for doubly censored HIV data. J Stat Comput Sim 79:1245–1257
Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand J Stat 37:338–354
Zhang Y, Liu W, Zhan Y (2001) A nonparametric two-sample test of the failure function with interval censoring case 2. Biometrika 88:677–686
Zhao X, Lim H, Sun J (2005) Estimating equation approach for regression analysis of failure time data in the presence of interval-censoring. J Stat Plan Inference 129:145–157
Zhou Q, Hu T, Sun J (2017) A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. J Am Stat Assoc 112:664–672
Acknowledgements
The authors would like to thank the Editor, the Associate Editor and the two reviewers for their constructive and insightful comments and suggestions, which greatly improved the paper. This research is partly supported by the National Natural Science Foundation of China (Nos. 11571263, 11671311, 11771366) and the Research Grants Council of Hong Kong (15301218, 15303319).
Appendix: Proofs of Theorems
First we derive the integral equation for the least favorable direction. Denote
where \(E_\mathbf{z}\) denotes expectation with respect to \(\mathbf{Z}\). Following steps similar to those in Huang et al. (2008), define the function
Then we obtain (4).
Next we present the proofs of Theorems 1 and 2. Throughout the following proofs, for notational simplicity, we write \(P_nf= \frac{1}{n}\sum _{i=1}^n f(\mathbf{O}_i)\), \(M(\tau )=P\ell (\tau ;\mathbf{O}) =P\ell ({\varvec{\theta }},g; \mathbf{O})\) and \(M_n(\tau )=P_n\ell (\tau ; \mathbf{O})= P_n\ell ({\varvec{\theta }},g;\mathbf{O})\), and we let C denote a generic constant that may vary from place to place.
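To make the empirical criterion \(M_n(\tau )=P_n\ell (\tau ;\mathbf{O})\) concrete, here is a minimal sketch of how it could be evaluated for case II interval-censored data. It assumes an additive hazards model with time-independent covariates, so that the conditional survival function is \(\exp \{-\Lambda _0(t)-{\varvec{\theta }}^\top \mathbf{z}\,t\}\). The function names, the data layout `(u, v, delta1, delta2, z)` and the baseline `cum_baseline` are hypothetical and only illustrate the structure; in particular, `cum_baseline` stands in for whatever role the sieve function \(g\) plays in the paper's parametrization.

```python
import math

def survival(t, z, theta, cum_baseline):
    """S(t | z) = exp{-Lambda0(t) - theta'z * t} for an additive hazards model
    with time-independent covariates (assumed model form)."""
    lin = sum(th * zj for th, zj in zip(theta, z))
    return math.exp(-cum_baseline(t) - lin * t)

def loglik_one(obs, theta, cum_baseline):
    """Log-likelihood contribution of one case II observation.

    obs = (u, v, delta1, delta2, z) with u < v the two examination times:
      delta1 = 1 if T <= u          (left-censored)
      delta2 = 1 if u < T <= v      (interval-censored)
      both 0   if T > v             (right-censored)
    """
    u, v, d1, d2, z = obs
    s_u = survival(u, z, theta, cum_baseline)
    s_v = survival(v, z, theta, cum_baseline)
    if d1:
        return math.log(1.0 - s_u)
    if d2:
        return math.log(s_u - s_v)
    return math.log(s_v)

def empirical_criterion(data, theta, cum_baseline):
    """M_n(tau) = P_n l(tau; O): sample average of the log-likelihood."""
    return sum(loglik_one(o, theta, cum_baseline) for o in data) / len(data)

# Toy data and a hypothetical baseline Lambda0(t) = 0.5 * t.
baseline = lambda t: 0.5 * t
data = [
    (1.0, 2.0, 1, 0, [1.0]),  # left-censored
    (1.0, 2.0, 0, 1, [0.0]),  # interval-censored
    (1.0, 2.0, 0, 0, [1.0]),  # right-censored
]
m_n = empirical_criterion(data, theta=[0.2], cum_baseline=baseline)
```

A sieve estimator would maximize `empirical_criterion` over \({\varvec{\theta }}\) and over a finite-dimensional spline family for the baseline; the optimizer is omitted here.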
Proof of Theorem 1
To show consistency and derive the convergence rate, it suffices to verify conditions C1–C3 of Theorem 1 in Shen and Wong (1994), which are stated as follows:
- C1. \(\inf _{\{d(\tau ,\tau _0)\ge \epsilon , \tau \in \varTheta \times \mathbb {G}_n\}} M(\tau _0)-M(\tau )\ge C \inf _{\{d(\tau ,\tau _0)\ge \epsilon , \tau \in \varTheta \times \mathbb {G}_n\}}d^2(\tau ,\tau _0)\), where \(\tau _0=({\varvec{\theta }}_0,g_0)\); that is, C1 holds with \(\alpha =1\).
- C2. \(\sup _{\{d(\tau ,\tau _0)\le \epsilon , \tau \in \varTheta \times \mathbb {G}_n\}} \text{ var }( \ell (\tau _0;\mathbf{O})-\ell (\tau ;\mathbf{O}))\le \sup _{\{d(\tau ,\tau _0)\le \epsilon , \tau \in \varTheta \times \mathbb {G}_n\}}d^2(\tau ,\tau _0)\); that is, C2 holds with \(\beta =1\).
- C3. Let \(\mathcal {F}_n=\{\ell (\tau ;\cdot ): \tau \in \varTheta \times \mathbb {G}_n\}\). Then \(H(\epsilon , \mathcal {F}_n)\le C n^{2\gamma _0}\log (1/\epsilon )\), where \(H(\epsilon , \mathcal {F}_n)\) is the \(L_{\infty }\)-metric entropy of the space \(\mathcal {F}_n\); that is, C3 holds with \(2\gamma _0=\nu \) and \(r=0^{+}\).
Condition C1 with \(\alpha =1\) can be verified by arguments similar to those in Zhang et al. (2010). Condition C2 can be obtained through a Taylor expansion combined with conditions A1–A5. By the inequality \(\log (x)\le x-1\), we have the following for \(\tau \in \varTheta \times {\mathbb {G}_n}\),
where the second and fourth inequalities follow from the inequality \((a+b)^2\le C(a^2+b^2)\), the sixth inequality follows from the Cauchy–Schwarz inequality, and \(g^\star (s)\) is a value between \(g_0(s)\) and g(s). Together with condition C1, which has already been shown, this verifies condition C2 with \(\beta =1\).
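As a brief aside, the two elementary inequalities invoked in this step can be checked directly. This is a standard verification rather than part of the original argument. For real \(a, b\) and \(x>0\),

\[
(a+b)^2 = a^2+2ab+b^2 \le 2(a^2+b^2) \quad \text{since } 2ab\le a^2+b^2, \qquad \log x \le x-1 \ \text{ with equality if and only if } x=1,
\]

so the generic constant in \((a+b)^2\le C(a^2+b^2)\) can be taken to be \(C=2\).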
Next we verify condition C3. Let \(L_1=\{\ell (\tau ;\mathbf{O}): \tau \in \varTheta \times \mathbb {G}_n\}\). We can construct a set of brackets \(\{[\ell _{s,i}^{L}(\mathbf{O}),\ell _{s,i}^{U}(\mathbf{O})]: s=1,2, \ldots , [C(1/\epsilon )^d]; i=1, 2, \ldots , [C(1/\epsilon )^{Cq_n}]\}\) for any \(\ell (\tau ;\mathbf{O}) \in L_1\). Specifically,
and
where \(\{[g_i^L,g_i^U]: i=1,\ldots ,[C(1/\epsilon )^{Cq_n}]\}\) is the set of brackets for any \(g \in S_n\). Then, using a Taylor expansion along with conditions A1–A3, we can conclude that the \(\epsilon \)-bracketing number for \(L_1\) with the \(L_1(P)\)-norm is bounded by \(C(1/\epsilon )^{Cq_n+d}\) and \(H(\epsilon , L_1)\le C n^{\nu }\log (1/\epsilon )\). Hence, condition C3 in Theorem 1 of Shen and Wong (1994) holds with \(2\gamma _0=\nu \) and \(r=0^{+}\).
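The passage from the bracketing number to the entropy bound can be spelled out as follows. This is a sketch that assumes the number of spline basis functions satisfies \(1\le q_n=O(n^{\nu })\), which is not stated in this appendix but is consistent with the approximation rate \(O(n^{-p\nu })\) used for \(g_{0n}\); the constants \(C'\) and \(C''\) are introduced here only for illustration. For \(0<\epsilon \le 1/2\) and large \(n\),

\[
H(\epsilon , L_1) \le \log \{C(1/\epsilon )^{Cq_n+d}\} = \log C + (Cq_n+d)\log (1/\epsilon ) \le C' q_n \log (1/\epsilon ) \le C'' n^{\nu }\log (1/\epsilon ),
\]

which has the form required by C3 with \(2\gamma _0=\nu \).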
Under condition A4, for \(g_0 \in \mathbb {G}\), Corollary 6.21 in Schumaker (1981) implies that there exists a function \(g_{0n}\in S_n\) of order \(m\ge p+2\) such that \(\Vert g_{0n}-g_0\Vert _\infty =O(n^{-p \nu })\), where \(\Vert \cdot \Vert _{\infty }\) is the sup-norm, which also implies \(\Vert g_{0n}-g_0\Vert _{\mathbb {G}}=O(n^{-p \nu })\). Now denote \(\tau _{0,n}=({\varvec{\theta }}_0,g_{0n})\). Then we have
Similarly to Zhang et al. (2010), we can conclude that
and then \(\widehat{\tau }_n\) satisfies inequality (1.1) in Shen and Wong (1994).
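The spline approximation step above relies on the sup-norm error shrinking polynomially in the number of knots. The following minimal sketch illustrates that behaviour numerically. It uses piecewise-linear interpolation on equally spaced knots as a simple stand-in for the B-spline sieve \(S_n\), which is not the construction used in the paper; the function name `linear_spline_sup_error`, the target `math.sin` and the interval \([0,1]\) are illustrative assumptions. For a twice-differentiable target, the sup-norm error of this interpolant behaves like \(q^{-2}\), so doubling the number of subintervals should divide the error by roughly four.

```python
import math

def linear_spline_sup_error(g, q, a=0.0, b=1.0, grid=2000):
    """Approximate sup-norm error of the piecewise-linear interpolant of g
    built on q equal subintervals of [a, b], evaluated on a fine grid."""
    h = (b - a) / q
    knots = [a + i * h for i in range(q + 1)]
    vals = [g(k) for k in knots]
    worst = 0.0
    for j in range(grid + 1):
        t = a + (b - a) * j / grid
        i = min(int((t - a) / h), q - 1)   # index of the subinterval containing t
        w = (t - knots[i]) / h             # relative position inside it
        approx = (1.0 - w) * vals[i] + w * vals[i + 1]
        worst = max(worst, abs(approx - g(t)))
    return worst

# Error for an increasing number of subintervals, and the successive error ratios.
errors = {q: linear_spline_sup_error(math.sin, q) for q in (4, 8, 16, 32)}
ratios = [errors[q] / errors[2 * q] for q in (4, 8, 16)]
```

With smoother targets and higher-order splines the exponent improves, which is the role played by the smoothness index \(p\) in the rate \(O(n^{-p\nu })\).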
Next, we derive the convergence rate. We have shown that condition C3 in Theorem 1 of Shen and Wong (1994) holds with constants \(2\gamma _0=\nu \) and \(r=0^{+}\) in their notation. Furthermore, the constant \(\tau \) in Theorem 1 of Shen and Wong (1994), which should not be confused with the parameter \(\tau =({\varvec{\theta }},g)\), is \((1-\nu )/2-(\log \log n)/( 2\log n)\). We can pick a \({\bar{\nu }}\) slightly greater than \(\nu \) such that \((1-{\bar{\nu }})/2 \le (1-\nu )/2-(\log \log n)/(2\log n)\) for large n. We still denote \({\bar{\nu }}\) by \(\nu \), so that this constant becomes \((1-\nu )/2\). The Kullback–Leibler distance between \(\tau _0=({\varvec{\theta }}_0,g_0)\) and \(\tau _{0,n}=({\varvec{\theta }}_0,g_{0n})\) is given by
where \(m(x)=x \log x-x+1\le x(x-1)-x+1=(x-1)^2\), using \(\log x\le x-1\). Then we obtain \(K^{\frac{1}{2}}(\tau _0,\tau _{0,n})=O(n^{-p \nu })\). Following Theorem 1 of Shen and Wong (1994), we have \(d(\widehat{\tau }_n,\tau _0)=O_p\{n^{-\min (p \nu ,(1-\nu )/2)}\}\), which completes the proof of Theorem 1. \(\square \)
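The rate \(n^{-\min (p \nu ,(1-\nu )/2)}\) trades off the spline approximation error against the estimation error. The short sketch below computes the exponent for given \(p\) and \(\nu \), together with the value of \(\nu \) that balances the two terms. The function names are hypothetical, and any restrictions that the paper's conditions place on the admissible range of \(\nu \) are not reflected here.

```python
def rate_exponent(p, nu):
    """Exponent r such that d(tau_hat, tau_0) = O_p(n^{-r}),
    with r = min(p * nu, (1 - nu) / 2)."""
    return min(p * nu, (1.0 - nu) / 2.0)

def balancing_nu(p):
    """Value of nu solving p * nu = (1 - nu) / 2, i.e. nu = 1 / (1 + 2p)."""
    return 1.0 / (1.0 + 2.0 * p)

# Exponents at the balancing nu for a few smoothness levels p.
table = {p: (balancing_nu(p), rate_exponent(p, balancing_nu(p))) for p in (1, 2, 3)}
```

At the balancing value the exponent equals \(p/(1+2p)\), which is strictly below \(1/2\) for every finite \(p\), so the overall rate for \(\widehat{\tau }_n\) is slower than \(n^{-1/2}\).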
Proof of Theorem 2
Following Zhang et al. (2010), to derive the asymptotic normality of \(\widehat{{\varvec{\theta }}}_n\) it suffices to verify the following conditions.
- B1. \(P_n \dot{\ell }_1(\widehat{\tau }_n; \mathbf{O})= o_p(n^{-1/2})\) and \(P_n\dot{\ell }_2(\widehat{\tau }_n; \mathbf{O})[h_0]=o_p(n^{-1/2})\).
- B2. \((P_n-P)\{{\ell }^*(\widehat{\tau }_n; \mathbf{O})-{\ell }^*(\tau _0; \mathbf{O})\}=o_p(n^{-1/2})\).
- B3. \(P\{{\ell }^*(\widehat{\tau }_n; \mathbf{O}) -{\ell }^*(\tau _0; \mathbf{O})\}=-I({\varvec{\theta }}_0)(\widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0)+ o_p(\Vert \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0\Vert )+o_p(n^{-1/2})\).
Conditions B1 and B2 can be verified by arguments similar to those in Zhang et al. (2010). As for condition B3, subtracting (a2) from (a1) in conditions A7 and A8, we have
By Theorem 1 and the fact that \(\alpha p \nu > \frac{1}{2}\), we have \(O(\Vert \widehat{g}_n-g_0\Vert ^\alpha )=o_p(n^{-1/2})\), so B3 holds. Theorem 2 then follows from the general procedure stated in Zhang et al. (2010). \(\square \)
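The rate comparison used for B3 can be written out explicitly. The following is a sketch under two assumptions that are not stated in this appendix: that \(p\nu \le (1-\nu )/2\), so the rate in Theorem 1 is \(n^{-p\nu }\), and that \(\Vert \widehat{g}_n-g_0\Vert \) is bounded by \(d(\widehat{\tau }_n,\tau _0)\). Under these assumptions,

\[
\Vert \widehat{g}_n-g_0\Vert ^{\alpha } = O_p(n^{-\alpha p \nu }), \qquad n^{1/2}\, n^{-\alpha p \nu } = n^{-(\alpha p \nu -1/2)} \rightarrow 0 \ \text{ when } \alpha p \nu > \tfrac{1}{2},
\]

so the remainder term is \(o_p(n^{-1/2})\).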
Cite this article
He, B., Liu, Y., Wu, Y. et al. Semiparametric efficient estimation for additive hazards regression with case II interval-censored survival data. Lifetime Data Anal 26, 708–730 (2020). https://doi.org/10.1007/s10985-020-09496-z
DOI: https://doi.org/10.1007/s10985-020-09496-z