Appendix A: Proofs
A.1 The existence of NPMLE
Proof of Theorem 1
Let \(\theta _B\) be the maximizer of the log-likelihood over the complement of the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B\}\). We show that \(l(\theta _B) \rightarrow -\infty \) as \(B \rightarrow \infty \).
By Assumptions 1 and 2, we have the bound (17).
All terms in the log-likelihood are bounded except for
$$\begin{aligned} \sum _{i=1}^{n}\Big \{\delta ^1_i\log \lambda (X_i)-\delta ^1_i e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \Lambda (X_i)\Big \}. \end{aligned}$$
Let \(\lambda _{\max }\) be the largest element in \({\varvec{\lambda }}\). The expression above has the upper bound
$$\begin{aligned} \log ( \lambda _{\max }/m)- \lambda _{\max }/m-K\log m, \end{aligned}$$
which diverges to \(-\infty \) as \(B \rightarrow \infty \).
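Though not part of the proof, the divergence of this scalar bound is easy to check numerically; the values \(m=2\) and \(K=5\) below are hypothetical stand-ins for the constant in (17) and the number of distinct event times.

```python
import math

def upper_bound(lam_max, m=2.0, K=5):
    """The bound log(lam_max/m) - lam_max/m - K*log(m) from the proof."""
    return math.log(lam_max / m) - lam_max / m - K * math.log(m)

# As lam_max (and hence B) grows, the bound decreases without limit,
# forcing the log-likelihood on the complement to -infinity.
values = [upper_bound(10.0 ** p) for p in range(1, 6)]
assert all(later < earlier for earlier, later in zip(values, values[1:]))
assert values[-1] < -1e4
```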
Then, the global maximizer must lie in the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B^*\}\) for some \(B^*>0\). \(\square \)
Let \(W_i^{\varvec{\theta }}(t)\) be defined as in (21). We state a generic inequality, to be referenced later, which holds for any \(\varvec{\theta }= (\varvec{\alpha },\varvec{\beta }, \Lambda )\) in the parameter space whose baseline cumulative hazard \(\Lambda \) is a step function jumping only at the observed event times \(t_1, \ldots , t_K\):
$$\begin{aligned} 0 < d\Lambda (t_k) \le \left( \sum _{j=1}^nW_j^{\varvec{\theta }} (t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\right) ^{-1}d\bar{N}(t_k), \quad k = 1, \ldots , K. \end{aligned}$$
(A1)
The conclusion of the following Lemma is used in the proofs of both Lemma 1 and Theorem 3.
Lemma A1
Let \(\varvec{\theta }_{(n)} = \left( \varvec{\alpha }_{(n)}, \varvec{\beta }_{(n)}, \Lambda _{(n)}\right) \) be a sequence in the parameter space where \(\Lambda _{(n)}\) is a non-decreasing step function with jumps only at the observed event times. Suppose that \(\varvec{\theta }_{(n)}\) satisfies (A1) and has a subsequence \(\varvec{\theta }_{(n_k)}\) converging to a limiting point \({\varvec{\theta }}^* = (\varvec{\alpha }^*, \varvec{\beta }^*, \Lambda ^*)\) a.s.:
$$\begin{aligned} \varvec{\alpha }_{(n_k)}-\varvec{\alpha }^* \rightarrow 0, \quad \varvec{\beta }_{(n_k)}-\varvec{\beta }^* \rightarrow 0, \quad \sup _{t\in [0,\tau ]}|e^{-\Lambda _{(n_k)}(t)}-e^{-\Lambda ^*(t)}| \rightarrow 0, \quad a.s..\qquad \end{aligned}$$
(A2)
Under Assumptions 1–4,
a) \(\Lambda ^*(t)< \infty \text { for all } t<\tau \);

b) \(\inf _{t\in [0,\zeta ]}E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]>C_w, \text { for some } C_w>0\).
Proof of Lemma A1
By checking the uniform continuity of \(W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) in \((\varvec{\alpha },\varvec{\beta },e^{-\Lambda (t)})\), we may establish
$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| W_i^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }_i}- W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\right| \rightarrow 0, \quad a.s.. \end{aligned}$$
\(W_i^{\varvec{\theta }}(t)\), as a function of the observed random variables, belongs to a Glivenko-Cantelli class of uniformly bounded functions with uniformly bounded variation. Thus, the pointwise convergence can be strengthened to uniform convergence,
$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| \frac{1}{n_k}\sum _{i=1}^{n_k} W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i} -E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \right| {\mathop {\longrightarrow }\limits ^{a.s.}}0. \end{aligned}$$
Note that \(n_k^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }_{(n_k)}} (t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\) is càglàd, so its limit \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} ]\) must also be càglàd.
a) Let \(\tau ^*=\inf \{t\in [0,\tau ]: e^{-\Lambda ^*(t)}=0\}\), with the convention \(\inf \emptyset = \tau \). We shall prove that \(\tau ^*=\tau \).
Suppose that \(\tau ^*\) is an interior point of \([0,\tau ]\). From Assumption 4, \(d\Lambda _0([s,t]) = \Lambda _0(t) -\Lambda _0(s) >0\) for any \(s<t\) in \([0,\tau ]\). By the definition of \(\tau ^*\), \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [\tau ^*,\tau ]\), so we have
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\tau ^*)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$
By the left continuity of \(W_i^{\varvec{\theta }}(t)\), \(\exists \ s < \tau ^*\), s.t.
$$\begin{aligned} \inf _{t\in [s,\tau ^*]}E\left[ W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \ge \frac{1}{2}E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] . \end{aligned}$$
The total increment of \(\Lambda _{(n_k)}\) in \([s,\tau ^*]\) must be bounded almost surely according to (A1). By the definition of \(\tau ^*\), \(\Lambda ^*(s)<\infty \). Putting these together, we reach the contradiction,
$$\begin{aligned} \Lambda ^*(\tau ^*) \le \varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(\tau ^*) \le&\varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(s)+ \int _{s_+}^{\tau ^*} \frac{d \bar{N}(u)}{\sum _{i=1}^{n_k}W^{\varvec{\theta }_{(n_k)}}_i(u)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}} \\ \le&\Lambda ^*(s)+ \frac{\tau ^*-s}{\inf _{t\in [s,\tau ^*]}E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}<\infty . \end{aligned}$$
The other case is \(\tau ^* = 0\). Then, \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [0,\tau ]\). The contradiction is easily established as
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(0)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _0^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$
b) Since \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is càglàd, and \(\varvec{\theta }_{(n_k)}\) satisfies (A1) and converges uniformly to \(\varvec{\theta }^*\), it can be seen that \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}] \ge 0\) over the interior of \([0,\zeta ]\).
Write \(n_k^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) as
$$\begin{aligned}&n_k^{-1}\sum _{i=1}^{n_k} \int _{t-}^\tau \big \{1-\phi _i^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad +\int _t^\tau Y_i(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} d \phi _i^{\varvec{\theta }}(u)+Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \nonumber \\&=n_k^{-1}\sum _{i=1}^{n_k} \int _{t+}^\tau \left[ 1-\phi _i^{\varvec{\theta }}(u) -\frac{\sum _{j=1}^{n_k}Y_j(u)\phi _j^{\varvec{\theta }}(u)\big \{1-\phi _j^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}\right] e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad + \big \{1-\phi _i^{\varvec{\theta }}(t)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(t)+ Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} . \end{aligned}$$
(A3)
By Assumption 4, all \(Q_i < \zeta \) a.s.. Thus,
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\zeta ) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =\,&E\left[ \big \{\delta ^1+\delta ^c \phi ^{\varvec{\theta }^*}(X)\big \} I\{\zeta \le X\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \\ \ge \,&\,E\left[ \int _\zeta ^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u) \right] >0. \end{aligned}$$
For \(t<\zeta \), the difference \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]-E[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is the limit of an integral like that in (A3), where the integrand has \(\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\) in the denominator. So it has potential singularities at the zeros of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) for \( u \in [t,\zeta ]\). We shall show that \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is differentiable with respect to \(d\Lambda _0(u)\) in \([0,\zeta ]\), so that its zero \(u_0\) leads to the divergent form \( - \int _t ^\zeta |u-u_0|^{-1} du. \) We will then reach the contradiction that \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]=-\infty \), as seen below.
Let \(R_0\) denote the set of zeros, and of right-hand limits of zeros, of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\). Let \(R_{\triangle u}\) be the \(\triangle u\)-neighborhood of \(R_0\) and \(\Omega ^t_{\triangle u}=[t,\zeta ] \setminus R_{\triangle u}\). Then \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is bounded away from zero on \(\Omega ^t_{\triangle u}\). Through (A3),
$$\begin{aligned} E&\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] -E\left[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \nonumber \\&\le -\int _{\Omega ^t_{\triangle u}}\frac{E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\big \{1-\phi ^{\varvec{\theta }^*}(u)\big \} e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] }{E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}E\left[ e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u)\right] \nonumber \\&\quad + E\left[ \int _{t+}^\zeta \{1-\phi ^{\varvec{\theta }^*}(u)\}dN(u) + \big \{1-\phi ^{\varvec{\theta }^*}(t)\big \}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(t)+ Y(t)\phi ^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] . \end{aligned}$$
(A4)
From part a), \(e^{-\Lambda ^*(\zeta )}>0\). For any \( u<\zeta \),
$$\begin{aligned} \phi ^{\varvec{\theta }^*}_i(u) \ge \phi ^{\varvec{\theta }^*}_i(\zeta ) \ge \frac{m^{-1}e^{-m\Lambda ^*(\zeta )}}{1+m^{-1}e^{-m\Lambda ^*(\zeta )}}>0. \end{aligned}$$
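This lower bound can also be checked numerically. Under Assumptions 1 and 2, \(\phi \) has the logistic form \(re^{-c\Lambda }/(1+re^{-c\Lambda })\) with odds \(r = p/(1-p) \ge m^{-1}\) and effect \(c = e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \le m\); since \(x \mapsto x/(1+x)\) is increasing, replacing \(r\) by \(m^{-1}\) and \(c\) by \(m\) can only decrease the value. A sketch with a hypothetical \(m\):

```python
import itertools
import math

m = 3.0  # hypothetical constant from (17)

def phi(r, c, lam):
    """Logistic-form weight r*exp(-c*lam) / (1 + r*exp(-c*lam))."""
    x = r * math.exp(-c * lam)
    return x / (1.0 + x)

def lower_bound(lam):
    """The display's bound m^{-1}e^{-m*lam} / (1 + m^{-1}e^{-m*lam})."""
    return phi(1.0 / m, m, lam)

# phi dominates the bound for every admissible odds r >= 1/m, effect c <= m.
for r, c, lam in itertools.product([1.0 / m, 1.0, m], [0.5, 1.0, m], [0.0, 0.7, 2.5]):
    assert phi(r, c, lam) >= lower_bound(lam) > 0.0
```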
So the numerator term \(E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\{1-\phi ^{\varvec{\theta }^*}(u)\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \) is bounded away from zero in the limit. Moreover, for all \(u \in [0,\zeta ]\),
$$\begin{aligned} \left| \frac{dEW^{\varvec{\theta }^*}(u)}{d\Lambda _0(u)}\right| =&\left| E\left[ \big \{1-\phi ^{\varvec{\theta }^*}(u)\big \}Y(u)\phi ^{\varvec{\theta }_0}(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} -\phi ^{\varvec{\theta }^*}(u)\frac{dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]}{d\Lambda _0(u)}\right] \right| \\ \le&m+\mathcal {L} <\infty . \end{aligned}$$
The first term in (A4) diverges to \(-\infty \) when \(\triangle u \rightarrow 0\). The other terms are bounded, so this is the desired contradiction. \(\square \)
Proof of Lemma 1
a) Define the marginal of the complete data likelihood
$$\begin{aligned} \tilde{L}(\varvec{\theta })=&\sum _{A_i=0,1}\sum _{M_i=0}^\infty \sum _{\widetilde{T}_{i1}=t_k: t_k\le Q_i}\dots \sum _{\widetilde{T}_{iM_i}=t_k: t_k\le Q_i} L^c_i(\varvec{\theta }) \\ =&\prod _{i=1}^n\frac{\big \{p_i\lambda _i(X_i)S_i(X_i)\big \}^{\delta ^1_i} (1-p_i)^{\delta ^0_i}\big \{p_iS_i(X_i)+1-p_i\big \}^{\delta ^c_i}}{1-p_i\sum _{k: t_k\le Q_i}\lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)}. \end{aligned}$$
From (5), the complete data likelihood \(L^c(\varvec{\theta })\) decomposes into the product of a logistic part and a Cox part. Assumptions 1–3 contain the regularity conditions for these two parts. The event rate \(P(A_i=1)\) is bounded away from both zero and one,
$$\begin{aligned} 0<\frac{m^{-1}}{m^{-1}+1} \le P(A_i=1) \le \frac{m}{m+1} < 1. \end{aligned}$$
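As a numerical aside (with a hypothetical value of \(m\)), these two-sided bounds follow from the monotonicity of \(x \mapsto x/(1+x)\) once Assumption 1 confines the odds \(e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }}\) to \([m^{-1}, m]\):

```python
m = 4.0  # hypothetical bound on the odds from Assumption 1

def event_rate(odds):
    """P(A=1) = odds / (1 + odds) under the logistic incidence model."""
    return odds / (1.0 + odds)

lo, hi = (1.0 / m) / (1.0 / m + 1.0), m / (m + 1.0)
# Every admissible odds value keeps the event rate inside [lo, hi], a
# compact subset of (0, 1), matching the display above.
for odds in [1.0 / m, 0.5, 1.0, 2.0, m]:
    assert 0.0 < lo <= event_rate(odds) <= hi < 1.0
```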
The average at-risk process \(E[Y_i(t)]\) is bounded away from zero almost surely. The matrices \({\mathbf { Z}_1 }\) and \({\mathbf { Z}_2 }\) are almost surely of full rank, as \(\text {Var}({\mathbf { Z}_1 })\) and \(\text {Var}({\mathbf { Z}_2 })\) are positive definite. Under these conditions, both parts of the likelihood are concave in the associated sets of parameters, \(\varvec{\alpha }\) and \((\varvec{\beta },{\varvec{\lambda }})\), respectively. Thus, \(L^c(\varvec{\theta })\) is almost surely concave in \(\varvec{\theta }\), and \(\tilde{L}(\varvec{\theta })\) is also concave as a sum of concave functions. The almost sure convergence of the EM algorithm is then guaranteed by the almost sure concavity of this marginal of the complete data likelihood (Dempster et al. 1977).

b) To prove the second result, we take the following strategy. For any \(\varvec{\theta }\), denote \(\lambda _{\max ,\zeta }=\max \{\lambda _k: t_k \le \zeta \}\), where \(\zeta \) is the upper bound of the truncation time defined in Assumption 4. Define a set in the parameter space:
$$\begin{aligned} {\Theta } = \left\{ \varvec{\theta }=(\varvec{\alpha },\varvec{\beta },\Lambda ) | \lambda _{\max ,\zeta } \le n^{-1}2/C_w\right\} , \end{aligned}$$
(A5)
with \(C_w\) defined in Lemma A1. We would like to show that
$$\begin{aligned} \lim _{n\rightarrow \infty }P(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }) = 1. \end{aligned}$$
(A6)
This is done by applying Lemma A1, so we need to verify condition (A1) for \(\tilde{\varvec{\theta }}\) and \(\hat{\varvec{\theta }}\). For \(\tilde{\varvec{\theta }}\), we rely on the convergence of the EM algorithm obtained in part a).
First, we show that the EM algorithm finds the unique stationary point of \(\tilde{L}(\varvec{\theta })\), which then must be the global maximizer since \(\tilde{L}\) is concave by the proof of part a). Consider the conditional expectation given the observed data as in (8)–(10). It can be verified directly (we skip the algebraic details here) that:
$$\begin{aligned} \nabla \log \tilde{L}(\varvec{\theta })=E_{\varvec{\theta }}[ \nabla \log L^c(\varvec{\theta }) |\mathcal {O}]. \end{aligned}$$
The estimator \(\tilde{\varvec{\theta }}\) is by definition the solution to the left-hand side of the above being zero, hence also the stationary point of \(\tilde{L}(\varvec{\theta })\).
We write down the stationary equation \(\varvec{\theta }^{(l)}=\varvec{\theta }^{(l+1)} = \tilde{\varvec{\theta }}\) for \(\tilde{\lambda }_k\)’s at convergence,
$$\begin{aligned} \tilde{\lambda }_k=\frac{1+\tilde{\lambda }_k\sum _{i=1}^n\frac{\tilde{p}_ie^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i} \tilde{S}_i(t_k)I(Q_i \ge t_k)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)}}{\sum _{i=1}^n\left\{ \delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k) +\sum _{j\ge k}\frac{\tilde{p}_i \tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \right\} e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}}, \end{aligned}$$
(A7)
where \(f_i\) was previously defined just above (6). Combining \(\tilde{\lambda }_k\) terms leads to
$$\begin{aligned} \tilde{\lambda }_k^{-1}=\sum _{i=1}^n&\bigg \{\delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k)\nonumber \\&-\tilde{p}_i \frac{\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \bigg \} e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}. \end{aligned}$$
(A8)
By the mean value theorem,
$$\begin{aligned} 0 \le e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\le \frac{1}{2}\left( \lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) ^2 e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}} \le \frac{1}{2}m^2\lambda _k^2e^{\lambda _km}, \end{aligned}$$
(A9)
where \( m\) is defined in (17). Applying (A9) to the denominator in (A8), we get
$$\begin{aligned} 1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h) \ge 1-\tilde{p}_i\{1-\tilde{S}_i(Q_i)\}. \end{aligned}$$
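Inequality (A9) itself is straightforward to verify numerically (a sanity check with a hypothetical \(m\), not part of the argument): for \(x = \lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \ge 0\), the remainder \(e^x-1-x\) is non-negative and dominated by \(\frac{1}{2}x^2e^x\), and hence by the uniform bound \(\frac{1}{2}m^2\lambda _k^2e^{\lambda _km}\):

```python
import math

m = 2.5  # hypothetical bound from (17): e^{beta^T Z_2} <= m

def gap(x):
    """e^x - 1 - x, the remainder bounded in (A9)."""
    return math.exp(x) - 1.0 - x

# Verify 0 <= e^x - 1 - x <= x^2 e^x / 2 on a grid of x = lam * e^{beta^T Z_2}.
for lam in [0.0, 0.01, 0.3, 1.0]:
    for effect in [1.0 / m, 1.0, m]:
        x = lam * effect
        assert 0.0 <= gap(x) <= 0.5 * x**2 * math.exp(x)
        # ... and the final uniform bound, using effect <= m:
        assert gap(x) <= 0.5 * m**2 * lam**2 * math.exp(lam * m)
```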
By a similar argument, we have almost surely
$$\begin{aligned}&\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j) \\&\quad = \tilde{S}_i(Q_i)I(Q_i \ge t_k)+\sum _{j\ge k}\left\{ 1-e^{-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}}-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}\right\} \tilde{S}_i(t_j)I(Q_i > t_j)\\&\quad \le \tilde{S}_i(Q_i)I(Q_i \ge t_k). \end{aligned}$$
Then, \(\tilde{\varvec{\theta }}\) satisfies (A1).
For \(\hat{\varvec{\theta }}\), it must satisfy the score equation for \(\lambda _k\)’s:
$$\begin{aligned} \frac{\partial l(\varvec{\theta })}{\partial \lambda _k} = \sum _{i=1}^n \left\{ \frac{d N_i(t_k)}{\lambda _k} -W^{\varvec{\theta }}_i(t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} = 0, \quad \forall k=1,\ldots ,K. \end{aligned}$$
This is the equation version of (A1) after rearrangement.
Now let \(\hat{\lambda }_{\max ,\zeta }\) and \(\tilde{\lambda }_{\max ,\zeta }\) be the largest jumps of \(\hat{\Lambda }\) and \(\tilde{\Lambda }\) on \([0,\zeta ]\), respectively. By Lemma A1 part b), we have
$$\begin{aligned} \limsup _{n\rightarrow \infty }n\hat{\lambda }_{\max ,\zeta } \le C_w^{-1}, \quad \limsup _{n\rightarrow \infty }n\tilde{\lambda }_{\max ,\zeta } \le C_w^{-1}, a.s.. \end{aligned}$$
Hence (A6) is established.
In the set \(\Theta \), we evaluate the discrepancy between \(\log \tilde{L}(\varvec{\theta })\) and \(\log L (\varvec{\theta })\), which can be bounded as follows:
$$\begin{aligned} 1-S_i(Q_i)-\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k) =\sum _{k:t_k<Q_i} S_i(t_k) \left( e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) . \end{aligned}$$
(A10)
Applying (A9) to \(|\log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })|\), we have the bound
$$\begin{aligned}&\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \\&\quad \le \sum _{i=1}^n\left| \log \left\{ 1-p_i+p_iS_i(Q_i)\right\} -\log \left\{ 1-p_i\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)\right\} \right| \\&\quad \le \sum _{i=1}^n\left| \frac{p_i}{1-p_i}\frac{n}{2}m^2\lambda _{\max ,\zeta }^2e^{\lambda _{\max ,\zeta }m}\right| \le \frac{1}{2}n^2 e^{m\lambda _{\max ,\zeta }} m^3\lambda _{\max ,\zeta }^2. \end{aligned}$$
Using the upper bound for \(\lambda _{\max ,\zeta }\) in \(\Theta \), we can bound
$$\begin{aligned} \sup _{\varvec{\theta }\in \Theta }\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \le e^{\frac{2m}{C_w}}\frac{2m^3}{C_w^2}. \end{aligned}$$
(A11)
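The constant in (A11) can be double checked numerically (with hypothetical values of \(m\) and \(C_w\); this is arithmetic verification, not part of the proof): substituting \(\lambda _{\max ,\zeta } \le 2/(nC_w)\) into \(\frac{1}{2}n^2 e^{m\lambda } m^3\lambda ^2\) gives \((2m^3/C_w^2)e^{2m/(nC_w)}\), which is at most \((2m^3/C_w^2)e^{2m/C_w}\) for every \(n \ge 1\).

```python
import math

def raw_bound(n, lam, m):
    """The bound (1/2) n^2 e^{m*lam} m^3 lam^2 from the previous display."""
    return 0.5 * n**2 * math.exp(m * lam) * m**3 * lam**2

def theta_bound(m, C_w):
    """The uniform bound e^{2m/C_w} * 2 m^3 / C_w^2 in (A11)."""
    return math.exp(2.0 * m / C_w) * 2.0 * m**3 / C_w**2

# With lam at its largest admissible value 2/(n*C_w) in Theta, the raw
# bound never exceeds the uniform bound, whatever the sample size n.
for m, C_w in [(1.5, 0.5), (3.0, 1.0), (5.0, 0.2)]:
    for n in [1, 10, 1000]:
        lam = 2.0 / (n * C_w)
        assert raw_bound(n, lam, m) <= theta_bound(m, C_w) * (1.0 + 1e-9)
```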
In summary, whenever \(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }\), we have
$$\begin{aligned} 0 \le \log L(\hat{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) \le \log L(\hat{\varvec{\theta }})-\log \tilde{L}(\hat{\varvec{\theta }}) + \log \tilde{L}(\tilde{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) <e^{\frac{2m}{C_w}}\frac{4m^3}{C_w^2}.\nonumber \\ \end{aligned}$$
(A12)
Combining (A12) and (A6) completes the proof. \(\square \)
Proof of Theorems 2 and 2’
From Lemma 1, we only need to establish the following two facts: (1) \( E[l_1(\varvec{\theta })]\) exists with a unique maximum, and (2) it is locally invertible at the maximum. Fact (1) is verified through the proof of Theorem 3, and fact (2) through the proof of Theorem 4. \(\square \)
A.2 Consistency of NPMLE
Proof of Theorem 3
The constants \(m\), c, \(\varepsilon \) and \(\mathcal {L}\) are defined in (17), (18) and (19).
First, we show that the “bridge” \(\bar{\Lambda }\) defined in (22) converges to the true \(\Lambda _0\) in the following sense:
$$\begin{aligned} \sup _{t\in [0,\tau ]}\left| e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}\right| \rightarrow 0, a.s. \end{aligned}$$
(A13)
as \(n\rightarrow \infty \). For all \(t \in (0,\tau )\), we have the bound
$$\begin{aligned} m\ge \frac{E\left[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] }{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$
(A14)
For any rational \(\tau ^*<\tau \), \(E[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} ]\) is bounded away from zero over \([0, \tau ^*]\). The uniform convergence of \(\bar{\Lambda }\) to \(\Lambda _0\) over any such \([0, \tau ^*]\) can be obtained as in Murphy (1994). To extend the result to (A13), we use a trick described in (A15)–(A18). By Assumption 3, \(\Lambda _0\) is non-decreasing and diverges to \(\infty \) at \(\tau \). Therefore,
$$\begin{aligned} \forall \epsilon >0, \, \exists \tau ^* \in (0,\tau ) \cap \mathbb {Q}, \, s.t. \, e^{-\Lambda _0(\tau ^*)}<\epsilon /3. \end{aligned}$$
(A15)
Through Rao’s law of large numbers and a Helly-Bray argument, we have
$$\begin{aligned} \sup _{t\in [0,\tau ^*]}|\bar{\Lambda }(t)-\Lambda _0(t)| \rightarrow 0, \quad a.s. . \end{aligned}$$
(A16)
By continuity of the exponential function,
$$\begin{aligned} \exists N, \, \forall n>N, \, \sup _{t\in [0,\tau ^*]}|e^{-\bar{\Lambda }(t)} -e^{-\Lambda _0(t)}|<\epsilon /3. \end{aligned}$$
(A17)
Then,
$$\begin{aligned} \forall n>N, \, \sup _{t\in [\tau ^*,\tau ]}|e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}| \le 2e^{-\Lambda _0(\tau ^*)}+|e^{-\bar{\Lambda }(\tau ^*)}-e^{-\Lambda _0(\tau ^*)}| <\epsilon .\qquad \end{aligned}$$
(A18)
Therefore, we have proved (A13).
Next, we evaluate the difference between the limits of \(\hat{\Lambda }\) and \(\bar{\Lambda }\). By Assumption 1 and \(e^{-\hat{\Lambda }(t)} \in [0,1]\), \((\hat{\varvec{\alpha }},\hat{\varvec{\beta }},e^{-\hat{\Lambda }(t)})\) is bounded. \(\hat{\Lambda }(t)\) is càdlàg, and so is \(e^{-\hat{\Lambda }(t)}\). By Helly’s selection theorem, there is a subsequence converging uniformly almost surely to some \(\varvec{\theta }^*=(\varvec{\alpha }^*, \varvec{\beta }^*, e^{-\Lambda ^*})\). Lemma A1 part b) gives the bound for \(E\{ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \}\) over \([0,\zeta ]\). We only need to find its bound on \([\zeta ,\tau ]\) in order to mimic the proof of Lemma 1 of Murphy (1994). Note that
$$\begin{aligned} E\left[ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \right] =&E\left[ \int _{t-}^\tau \big \{1-\phi ^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dN(u) \right] \\&-E\left[ \int _t^\tau \phi ^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }] \right] . \end{aligned}$$
By Assumption 4, \(P(Q_i \le \zeta )=1\), so \(E[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]\) is decreasing on \([\zeta ,\tau ]\). Along with the Lipschitz continuity, we have for all \(t \in [\zeta ,\tau )\)
$$\begin{aligned} M\mathcal {L} \ge \frac{E[W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}]}{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$
Therefore, \(\gamma (t)=\frac{E\left[ W^{\varvec{\theta }_0}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right] }{E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] } \) is bounded away from both \(\infty \) and zero, and
$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| \frac{d\hat{\Lambda }}{d\bar{\Lambda }}(t)-\gamma (t) \right| \rightarrow 0 \text { and } \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\int _0^t\gamma d\Lambda _0 \right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$
(A19)
After all these preparations, we can use the semi-parametric Kullback-Leibler divergence argument from Murphy (1994). We have
$$\begin{aligned} 0 \le&\frac{1}{n} \big \{ l_n(\hat{\varvec{\alpha }},\hat{\varvec{\beta }},\hat{\Lambda }) -l_n(\varvec{\alpha }_0,\varvec{\beta }_0,\bar{\Lambda }) \big \} \nonumber \\ \nonumber =&\frac{1}{n}\sum _{i=1}^n \int _0^\tau \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} \bigg \{ dN_i(u)- \phi _i^{\varvec{\theta }_0}(u) Y_i(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\bar{\Lambda }(u)\bigg \}\nonumber \\&+ \int _0^\tau \left[ \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} - \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d\hat{\Lambda }(u)}{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d\bar{\Lambda }(u)}-1\bigg \} \right] \nonumber \\&\times \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u). \end{aligned}$$
(A20)
Denote the function in the logarithm above as \(\psi _i(u)\). Using the definition of \(\bar{\Lambda }\), we can rewrite the first term in (A20) as
$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dN_i(u) \nonumber \\&\quad = \frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dM_i(u) \end{aligned}$$
(A21)
Inside \(\psi _i(u)\), the ratio \(d\hat{\Lambda }/d\bar{\Lambda }\) is bounded away from 0 and \(\infty \) according to (A19). Denote the range of the ratio by [1 / R, R]. The \(\phi _i^{\varvec{\theta }_0}(u)\) and \(\phi _i^{\hat{\varvec{\theta }}}(u)\) terms in \(\psi _i(u)\) create a potential singularity for (A21) at \(\tau \), but the decay rate is bounded by \(e^{-mR \Lambda _0(u)}\) by Assumptions 1 and 2. The integrands of the martingale integral (A21) are thus all bounded a.s., and the quadratic variation of (A21) is bounded a.s. by
$$\begin{aligned} \frac{1}{n^2}\sum _{i=1}^n \int _0^\tau 4\big \{mR \Lambda _0(u) + \log (R) \big \}^2 \phi _i^{\varvec{\theta }_0}(u) Y_i(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\Lambda _0(u). \end{aligned}$$
It is of order \(O_p(1/n)\), so the limit of (A21) is zero almost surely.
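The \(O_p(1/n)\) rate can be illustrated by a toy Monte Carlo (purely illustrative, with made-up bounded summands rather than the actual martingale integrands): the variance of an average of \(n\) iid centered bounded terms shrinks like \(1/n\), so the corresponding term vanishes in the limit.

```python
import random
import statistics

random.seed(0)

def centered_average(n):
    """Average of n iid centered bounded terms, mimicking the 1/n sum in (A21)."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n

def mc_variance(n, reps=2000):
    """Monte Carlo estimate of Var of the average at sample size n."""
    return statistics.pvariance([centered_average(n) for _ in range(reps)])

v100, v1600 = mc_variance(100), mc_variance(1600)
# Var of the average is Var(term)/n = (1/3)/n, so the ratio should be near 16.
assert 8.0 < v100 / v1600 < 32.0
assert v1600 < 1e-3
```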
The integrands in the second term of (A20) are of the form \(\log (x)-(x-1) \le 0\). In order to satisfy the inequality in (A20), we must have
$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n}\sum _{i=1}^n \int _0^\tau \big \{\log \big (\psi _i(u)\big ) - \big (\psi _i(u) -1 \big )\big \} \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u)= 0. \end{aligned}$$
Applying the same argument as in Murphy (1994), we get
$$\begin{aligned} E\left( \int _0^\tau \left| \phi ^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top } {\mathbf { Z}_2 }}\gamma (u) - \phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} \right| Y(u)d\Lambda _0(u)\right) =0 \end{aligned}$$
(A22)
in the almost sure set. The identifiability of our model is verified in Li et al. (2001) Theorem 2. Along with our regularity conditions in Assumptions 2 and 3, (A22) leads to \(\varvec{\alpha }^*=\varvec{\alpha }_0\), \(\varvec{\beta }^*=\varvec{\beta }_0\) and \(\gamma (t)=1\). This implies that
$$\begin{aligned} \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\Lambda _0 (t)\right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$
Repeating the trick in (A15)-(A18), we have
$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| e^{-\hat{\Lambda }(t)}-e^{-\Lambda _0 (t)}\right| \rightarrow 0 \; a.s.. \end{aligned}$$
Finally, we summarize all uses of almost sure arguments to ensure that the intersection of all almost sure sets still has probability one under \(\sigma \)-additivity. The steps (A15)–(A18) involve one almost sure argument for each choice of \(\tau ^*\); we preserve the almost sure property by restricting \(\tau ^*\) to the countable set \(\mathbb {Q}\). One almost sure argument is made for Helly’s selection theorem. In Lemma A1, we use the Glivenko-Cantelli theorem to avoid dependence on the choice of \(\varvec{\theta }^*\), so the almost sure argument is applied only once. Two more almost sure arguments are used in calculating the limits of the terms in (A20). \(\square \)
Proof of Theorem 3’
The proof is essentially the same as that of Theorem 3, so the details are omitted. In fact, it is less technical due to the boundedness of \(\Lambda _0\) over \([0, \tau ']\). \(\square \)
A.3 Asymptotic normality
First, we provide the definitions of several quantities below. In Theorem 4, \(\sigma (\mathbf {h})=\Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) is given by
$$\begin{aligned} \varvec{\sigma }_a(\mathbf {h})= E\Bigg [&{\mathbf { Z}_1 }\bigg \{ -\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)d\phi ^{\varvec{\theta }_0}(u) \nonumber \\&+ K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \bigg \}\Bigg ], \nonumber \\ \varvec{\sigma }_b(\mathbf {h})= E\Bigg [&{\mathbf { Z}_2 }\bigg \{\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Big [\Lambda _0(u)\phi ^{\varvec{\theta }_0}(u)\Big ]\nonumber \\&- K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Lambda _0(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big )\bigg \}\Bigg ], \nonumber \\ \sigma _\eta (\mathbf {h})=E\Bigg [&e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\bigg \{ K_1^{\varvec{\theta }_0}(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)Y(u) - K_2^{\varvec{\theta }_0}(\mathbf {h})Y(\tau ') \phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \nonumber \\&-\int _u^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(s)\phi ^{\varvec{\theta }_0}(s)\Big (1-\phi ^{\varvec{\theta }_0}(s)\Big )Y(s)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(s)\bigg \}\Bigg ], \end{aligned}$$
(A23)
where
$$\begin{aligned} K_1^{\varvec{\theta }}(\mathbf {h})(u)=\,&\mathbf {a}^\top {\mathbf { Z}_1 }\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) +\mathbf {b}^\top {\mathbf { Z}_2 }\left\{ 1-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right\} \nonumber \\&+\eta (u)-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \int _0^u \eta d\Lambda , \nonumber \\ K_2^{\varvec{\theta }}(\mathbf {h})=&\,\Big \{\mathbf {a}^\top {\mathbf { Z}_1 }-\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda (\tau ') e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} -\int _0^{\tau '}\eta e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} d\Lambda \Big \}. \end{aligned}$$
(A24)
Let \( \varvec{\theta }+t\mathbf {h}=\Big (\varvec{\alpha }+t\mathbf {a},\varvec{\beta }+t\mathbf {b},\int _0^\cdot (1+t\eta )d\Lambda \Big ) \). Define the directional derivatives
$$\begin{aligned} \lim _{t\rightarrow 0}\frac{l^I_n(\varvec{\theta }+t\mathbf {h})-l^I_n(\varvec{\theta })}{t} =S^{\varvec{\theta }}_n=S^{\varvec{\theta }}_{n,a}+S^{\varvec{\theta }}_{n,b}+S^{\varvec{\theta }}_{n,\eta }, \end{aligned}$$
where
$$\begin{aligned} S^{\varvec{\theta }}_{n,a}=&\frac{1}{n}\sum _{i=1}^n \mathbf {a}^\top {\mathbf { Z}_1 }_i \bigg \{\int _0^{\tau '} \Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) dN_i(u)\\&-\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}d\Lambda (u) \\&+\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big ) -Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\bigg \}\\ S_{n,b}^{\varvec{\theta }}=&\frac{1}{n}\sum _{i=1}^n \mathbf {b}^\top {\mathbf { Z}_2 }_i \bigg [ \int _0^{\tau '} \left\{ 1-\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} dN_i(u)\\&+\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left\{ \Big (1-\phi _i^{\varvec{\theta }}(u)\Big )\Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} -1 \right\} d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }} (\tau ')\Big )\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg ]\\ S_{n,\eta }^{\varvec{\theta }} =&\frac{1}{n}\sum _{i=1}^n \int _0^{\tau '} \left[ \eta (u)-\Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda \right] dN_i(u) \\&+ \int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left[ \Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda -\eta (u) \right] d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big )\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}. \end{aligned}$$
Their expectations are denoted as
$$\begin{aligned} S^{\varvec{\theta }}=S^{\varvec{\theta }}_a+S^{\varvec{\theta }}_b+S^{\varvec{\theta }}_\eta =E\left( S^{\varvec{\theta }}_{n,a}\right) +E\left( S^{\varvec{\theta }}_{n,b}\right) +E\left( S^{\varvec{\theta }}_{n,\eta }\right) . \end{aligned}$$
Again let \(\varvec{\theta }_0\) be the true parameter and \(\varvec{\theta }\) another element of the parameter space. Define \(\triangle \varvec{\theta }=\varvec{\theta }-\varvec{\theta }_0\) with
$$\begin{aligned} \triangle \varvec{\alpha }=\varvec{\alpha }-\varvec{\alpha }_0, \, \triangle \varvec{\beta }=\varvec{\beta }-\varvec{\beta }_0 \text { and } \triangle \Lambda (\cdot )=\Big \{\Lambda (\cdot )-\Lambda _0(\cdot )\Big \}. \end{aligned}$$
Define \(lin \Theta \) to be the linear space spanned by \(\{ \varvec{\theta }-\varvec{\theta }_0 : \varvec{\theta }\text { in the parameter space}\}\). Let \(\varvec{\theta }_t = \varvec{\theta }_0+t\triangle \varvec{\theta }\). The functional Hessian is the linear operator from \(lin \Theta \) to \(l^\infty (H_p)\) defined as
$$\begin{aligned} \dot{S}^{\varvec{\theta }_0}(\triangle \varvec{\theta })(\mathbf {h}) =&\lim _{t\rightarrow 0}\frac{S^{\varvec{\theta }_t }(\mathbf {h})-S^{\varvec{\theta }_0}(\mathbf {h})}{t} \nonumber \\ =&-\triangle \varvec{\alpha }^\top \varvec{\sigma }_a(\mathbf {h}) -\triangle \varvec{\beta }^\top \varvec{\sigma }_b (\mathbf {h}) -\int _0^{\tau '} \sigma _\eta (\mathbf {h})(u)d\triangle \Lambda (u) \end{aligned}$$
(A25)
with \(\sigma \) defined in (A23).
The following Lemma A2 is used in the proofs of Theorems 4 and 5. It establishes the key property of \(\sigma \), the essential component of the functional Hessian.
Lemma A2
Let the operator \(\sigma : (\mathbf {a},\mathbf {b},\eta ) \mapsto \Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) be defined as in (A23). Under the conditions of Theorem 4, \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \).
Proof of Lemma A2
First we prove that \(\sigma \) is an injection via an identifiability argument. Define an inner product between \(\sigma (\mathbf {h})\) and \(\mathbf {h}\) as
$$\begin{aligned} \Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=&\, \mathbf {a}^\top \varvec{\sigma }_a(\mathbf {h})+\mathbf {b}^\top \varvec{\sigma }_b(\mathbf {h})+\int _0^{\tau '}\sigma _\eta (\mathbf {h})(u)\eta (u) d\Lambda _0(u) \\ =&\int _0^{\tau '}E\left[ \big \{K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\big \}^2Y(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] d\Lambda _0(u)\\&+E\left[ \big \{ K^{\varvec{\theta }_0}_2(\mathbf {h})\big \}^2Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \right] . \end{aligned}$$
If \(\Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=0\), we have almost surely \(K^{\varvec{\theta }_0}_2(\mathbf {h})=0\) and \(K^{\varvec{\theta }_0}_1(\mathbf {h})(u)=0\) a.e. \(u \in [0, \tau ']\). Therefore,
$$\begin{aligned} \int _0^t K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u)=0, \quad \forall t\in [0,\tau '], \text { a.s.} \end{aligned}$$
Calculating the integral, we have for any \(t\in [0,\tau ']\) a.s.
$$\begin{aligned} -\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(t)+\mathbf {b}^\top {\mathbf { Z}_2 }\phi ^{\varvec{\theta }_0}(t)\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}+\int _0^t\eta (u)d\Lambda _0(u)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}=0. \end{aligned}$$
Setting \(t=0\), we have \(-\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(0)=0\), so \(\mathbf {a}^\top {\mathbf { Z}_1 }=0\). By Assumption 2, \(\mathbf {a}=0\). Plugging \(\mathbf {a}=0\) into \(K^{\varvec{\theta }_0}_2\) yields
$$\begin{aligned} K^{\varvec{\theta }_0}_2(\mathbf {h})= e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Big \{\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda _0(\tau ')-\int _0^{\tau '}\eta (u)d\Lambda _0(u)\Big \}=0, \quad \text {a.s.} \end{aligned}$$
Again, \(\mathbf {b}^\top {\mathbf { Z}_2 }= \int _0^{\tau '}\eta (u)d\Lambda _0(u)/\Lambda _0(\tau ')\) is deterministic, so \(\mathbf {b}=0\). Hence \(\eta \) must also be identically zero. As a result, \(\sigma (\mathbf {h})=\sigma (\mathbf {h}') \Rightarrow \Big <\sigma (\mathbf {h}-\mathbf {h}'),\mathbf {h}-\mathbf {h}'\Big >=0 \Rightarrow \mathbf {h}=\mathbf {h}'\).
To show that it is a bijection, we apply Theorem 3.11 in Conway (1990). It suffices to decompose \(\sigma \) into the sum of an invertible operator and a compact operator. The invertible operator is defined as
$$\begin{aligned} \Sigma (\mathbf {h})=\Big (E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \mathbf {a},E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \mathbf {b}, \eta (t)E\left\{ e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\right\} \Big ). \end{aligned}$$
Since \(E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \), \(E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \) are both positive definite, and \(\inf _{t\in [0,\tau ']}Ee^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)>0\), the inverse exists as
$$\begin{aligned} \Sigma ^{-1}(\mathbf {h})=\Big (\left[ E\big \{{\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \big \}\right] ^{-1}\mathbf {a},\left[ E\big \{{\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \big \}\right] ^{-1} \mathbf {b}, \eta (t)\left[ E\big \{e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\big \}\right] ^{-1}\Big ). \end{aligned}$$
For the compactness of \(\sigma (\mathbf {h})-\Sigma (\mathbf {h})\), the classical Helly selection argument combined with dominated convergence applies, since all the terms involved are uniformly bounded. \(\square \)
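Since \(\Sigma \) acts componentwise, its inverse can be computed component by component. The following sketch is purely illustrative: it uses made-up positive-definite matrices in place of \(E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \) and \(E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \), and an arbitrary positive weight function in place of \(E\big \{e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\big \}\), then checks numerically that \(\Sigma ^{-1}\circ \Sigma \) is the identity.

```python
import numpy as np

# Assumed stand-ins for illustration: positive-definite Gram matrices and a
# weight function w(t) bounded away from zero on [0, tau'].
A = np.array([[2.0, 0.3], [0.3, 1.5]])    # plays E(Z1 Z1^T)
B = np.array([[1.2, -0.2], [-0.2, 2.1]])  # plays E(Z2 Z2^T)
ts = np.linspace(0.0, 1.0, 50)            # grid on [0, tau']
w = 0.5 + 0.4 * np.exp(-ts)               # plays E{exp(b'Z2) phi(t) Y(t)}, inf w > 0

def Sigma(a, b, eta):
    """Componentwise action of the invertible part: (A a, B b, w * eta)."""
    return A @ a, B @ b, w * eta

def Sigma_inv(sa, sb, seta):
    """Componentwise inverse, valid since A, B are PD and inf w > 0."""
    return np.linalg.solve(A, sa), np.linalg.solve(B, sb), seta / w

a = np.array([1.0, -2.0])
b = np.array([0.5, 3.0])
eta = np.sin(2 * np.pi * ts)

ra, rb, reta = Sigma_inv(*Sigma(a, b, eta))
print(np.allclose(ra, a), np.allclose(rb, b), np.allclose(reta, eta))
```

The block-diagonal structure is what makes continuous invertibility reduce to the positive definiteness of the two Gram matrices and the lower bound on the weight function.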
The proof of Theorem 4 is an application of Theorem 3.3.1 of Van der Vaart and Wellner (1996). We verify each of the required conditions of that theorem.
Proof of Theorem 4
Since we work under the modified Assumption 3’ now, the martingale representation in (15) needs to change accordingly beyond \(\tau '\). We still use \(M_i(t)\) as the notation. Define the filtration \(\big \{\mathcal {F}_t: t \in [0,\tau ] \big \}\) as follows. On \([0,\tau ']\), \(\mathcal {F}_t\) is the natural \(\sigma \)-algebra generated by \(\{N_i(t), Y_i(t), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\). Since there is no extra information in the tail window \((\tau ', \tau )\), we set \(\mathcal {F}_t =\mathcal {F}_{\tau '}\) for \(t \in (\tau ', \tau )\). Finally, \(\mathcal {F}_\tau \) is the \(\sigma \)-algebra generated by \(\{N_i(\tau )-N_i(\tau '), Y_i(\tau ), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\), where \(Y_i(\tau ) = Y_i(\tau ') - dN_i(\tau ')\) is \(\mathcal {F}_{\tau '}\)-measurable. The filtration on \([0,\tau ']\) stays the same, so \(M_i(t)\) defined in (15) remains a martingale up to time \(\tau '\). In the tail window \((\tau ', \tau )\), we set \(M_i(t)\) constant at \(M_i(\tau ')\). To extend its definition to time \(\tau \), we define
$$\begin{aligned} d M_i(\tau ) = M_i(\tau ) - M_i(\tau ') = \big \{N_i(\tau )-N_i(\tau ')\big \} - Y_i(\tau ) \phi ^{\varvec{\theta }_0}_i(\tau '). \end{aligned}$$
(A26)
It is easy to verify that \(E[ M_i(\tau )| \mathcal {F}_{\tau '}] = M_i(\tau ')\), so \(M_i(t)\) thus defined is a martingale with respect to the new filtration \(\big \{\mathcal {F}_t: t \in [0,\tau '] \cup \{\tau \}\big \}\). Analogously, we define the process \(M^{\varvec{\theta }}_i(\cdot )\) by replacing the true parameter \(\varvec{\theta }_0\) in \(M_i(\cdot )\) with an arbitrary \(\varvec{\theta }\) in the parameter space; clearly \(M^{\varvec{\theta }_0}_i(\cdot ) = M_i(\cdot )\). From here, we establish the needed results using martingale theory.
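The mean-zero property of the tail increment (A26) can be checked by simulation. The sketch below assumes a hypothetical setup in which \(Y(\tau )\) is the \(\mathcal {F}_{\tau '}\)-measurable at-risk indicator and, conditional on being at risk, the tail event \(N(\tau )-N(\tau ')\) occurs with probability \(\phi ^{\varvec{\theta }_0}(\tau ')\) generated from a made-up covariate model; the empirical mean of \(dM(\tau )\) should be near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical generative model (all values assumed): phi(tau') varies with a
# covariate Z, and Y marks being at risk at tau'.
Z = rng.normal(size=n)
phi = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * Z)))   # assumed phi_i(tau')
Y = rng.random(n) < 0.7                          # at risk at tau'
dN_tail = (rng.random(n) < phi) & Y              # tail event only if at risk

# Martingale increment (A26): dM(tau) = {N(tau) - N(tau')} - Y(tau) phi(tau')
dM = dN_tail.astype(float) - Y * phi
print(abs(dM.mean()))   # near zero, within Monte Carlo error
```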
First, we prove weak convergence of the empirical score
$$\begin{aligned} \sqrt{n}(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}){\mathop {\longrightarrow }\limits ^{l^\infty (H_p)}} \mathcal {W}. \end{aligned}$$
(A27)
Notice that \(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}\) is an average of integrals with respect to the martingales \(M_i\), extended to time \(\tau \) via (A26). The weak convergence then follows from the martingale central limit theorem. The covariance process is given by the expectation of the quadratic variation:
$$\begin{aligned}&\text {Cov}\big (\mathscr {G}(\mathbf {h}),\mathscr {G}(\mathbf {h}^*)\big )=E\Big [ \int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h}) K^{\varvec{\theta }_0}_1(\mathbf {h}^*) Y(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u) \\&\qquad \qquad \qquad \qquad \qquad + K^{\varvec{\theta }_0}_2(\mathbf {h}) K^{\varvec{\theta }_0}_2(\mathbf {h}^*)Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\big \{1-\phi ^{\varvec{\theta }_0}(\tau ')\big \} \Big ], \end{aligned}$$
where \(K_1\) and \(K_2\) are defined as in (A24).
Next, we verify the approximation condition
$$\begin{aligned} \sqrt{n}\left( S_n^{\hat{\varvec{\theta }}}-S^{\hat{\varvec{\theta }}} - S_n^{\varvec{\theta }_0}+S^{\varvec{\theta }_0}\right) = o_p(1). \end{aligned}$$
(A28)
Consider the class \(\{S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h}): \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \le \varepsilon , \mathbf {h}\in H_p \}\). All terms involved in this class are uniformly bounded and of uniformly bounded variation, so it is a Donsker class with respect to the observable random variables. Checking that \(\phi _i^{\varvec{\theta }}\) is Lipschitz in \(\varvec{\theta }\) under the \(l^\infty (H_p)\) norm, we have almost surely
$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t)-\phi _i^{\varvec{\theta }_0}(t)| = O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) , \end{aligned}$$
and similarly
$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t) {\Lambda }(t)-\phi _i^{\varvec{\theta }_0}(t)\Lambda _0(t)| =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) . \end{aligned}$$
For a single summand in the score,
$$\begin{aligned} \sup _{h\in H_p}E[S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h})]^2 =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert ^2\right) . \end{aligned}$$
Plugging \(\hat{\varvec{\theta }}\) into the expression above, the variance of the process in (A28) is \(o(1)\) by the consistency of \(\hat{\varvec{\theta }}\) from Theorem 3’, so the process itself is \(o_p(1)\).
We then show the Fréchet differentiability of the expected score \(S\) at \(\varvec{\theta }_0\) in the direction of \(\hat{\varvec{\theta }}-\varvec{\theta }_0\):
$$\begin{aligned} S^{\hat{\varvec{\theta }}_t}-S^{\varvec{\theta }_0}=t\dot{S}^{\varvec{\theta }_0} (\hat{\varvec{\theta }}-\varvec{\theta }_0)+o_p(t\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert ). \end{aligned}$$
(A29)
We use a shorthand notation for the expected score at \(\varvec{\theta }\):
$$\begin{aligned} S^{\varvec{\theta }}(\mathbf {h})&= E\left[ \int _0^{\tau '} K_1^{\varvec{\theta }}(\mathbf {h})(u)dM^{\varvec{\theta }}(u) + K_2^{\varvec{\theta }}(\mathbf {h}) d M^{\varvec{\theta }}(\tau )\right] \\&= E\left[ \int _0^{\tau } V^{\varvec{\theta }}(\mathbf {h})(u) d M^{\varvec{\theta }}(u)\right] , \end{aligned}$$
by setting
$$\begin{aligned} V^{\varvec{\theta }}(\mathbf {h})(t) = I(t \le \tau ')K_1^{\varvec{\theta }}(\mathbf {h})(t) + I(t=\tau ) K_2^{\varvec{\theta }}(\mathbf {h}). \end{aligned}$$
Since all the terms involved, namely \( K_1^{\varvec{\theta }}(\mathbf {h})\), \(K_2^{\varvec{\theta }}(\mathbf {h})\) and \(dM^{\varvec{\theta }}\), are Lipschitz continuous in \(\varvec{\theta }\),
$$\begin{aligned}&S^{ {\varvec{\theta }}_t}(\mathbf {h})-S^{\varvec{\theta }}(\mathbf {h}) \\&\quad = E\left[ \int _0^{\tau '} V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{ {\varvec{\theta }}_t}(u) \right] \\&\quad = E\left[ \int _0^{\tau '} V^{\varvec{\theta }_0} (\mathbf {h})(u)d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \}\right] +E\left[ \int _0^{\tau '}V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{\varvec{\theta }_0}(u)\right] \\&\quad \quad + E\left[ \int _0^{\tau '}\big \{V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)- V^{\varvec{\theta }_0}(\mathbf {h})(u)\big \}d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \} \right] \\&\quad = t\dot{S}^{\varvec{\theta }_0}( {\varvec{\theta }}-\varvec{\theta }_0)(\mathbf {h})+0+O_p(t^2\Vert {\varvec{\theta }}-\varvec{\theta }_0\Vert ^2). \end{aligned}$$
Again, we plug in \(\hat{\varvec{\theta }}\) and use the consistency result to verify condition (A29).
Afterwards, we establish the continuous invertibility of the functional Hessian in (A25). We have shown in Lemma A2 that the operator \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \). The invertibility of \(\dot{S}^{\varvec{\theta }_0}\) on \(H_p\) then follows: by the continuous invertibility of \(\sigma \), there is some \(q>0\) such that \(\sigma ^{-1}(H_q) \subseteq H_p\), and
$$\begin{aligned}&\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_p}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ \Vert \triangle \varvec{\theta }\Vert _{l^\infty (H_p)}\ } \nonumber \\&\quad \ge \inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in \sigma ^{-1}(H_q)}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ p\Vert \triangle \varvec{\theta }\Vert } \nonumber \\&\quad =\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_q}| \triangle \varvec{\theta }(\mathbf {h}) |}{ p\Vert \triangle \varvec{\theta }\Vert } > \frac{q }{2p}. \end{aligned}$$
(A30)
Finally, we put everything together. The NPMLE \(\hat{\varvec{\theta }}\) is consistent by Theorem 3’, and (A27), (A28), (A29) and (A30) verify the conditions of Theorem 3.3.1 of Van der Vaart and Wellner (1996). \(\square \)
Proof of Theorem 5
The proof of the continuous invertibility of \(\hat{\sigma }\) is similar to that of Lemma A2. The approximation error between the natural estimator \(\hat{\sigma }\) and the Louis’ formula variance estimator using (14) again comes from the “ghost copies,” as in Lemma 1, so the same argument applies to show their asymptotic equivalence. \(\square \)
Appendix B: Variance Estimator
B.1 Derivatives of the log-likelihood
Let \(l^c(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})=\sum _{i=1}^n l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})\) be the complete data log-likelihood,
$$\begin{aligned} l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }}) =&\, (A_i+M_i) \varvec{\alpha }^\top {\mathbf { Z}_1 }_i -(1+M_i)\log (1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})\\&+\delta ^1_i A_i \sum _{k=1}^K I\{X_i=t_k\}(\log \lambda _k +\varvec{\beta }^\top {\mathbf { Z}_2 }_i) - A_i \sum _{k:t_k \le X_i} \lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+M_i\sum _{k:t_k<Q_i} I\{\kappa _i=k\}\Big (\log \lambda _k+\varvec{\beta }^\top {\mathbf { Z}_2 }_i-\sum _{h=1}^k \lambda _h e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$
Its gradient is given by
$$\begin{aligned} \nabla l^c_i=\left( \frac{\partial l^c_i}{\partial \varvec{\alpha }}, \frac{\partial l^c_i}{\partial \varvec{\beta }}, \frac{\partial l^c_i}{\partial {\varvec{\lambda }}}\right) ^\top , \end{aligned}$$
where
$$\begin{aligned} \frac{\partial l^c_i}{\partial \varvec{\alpha }} =&\, {\mathbf { Z}_1 }_i\Big \{A_i+M_i-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}\Big \} = {\mathbf { Z}_1 }_i\big \{A_i-p_i+M_i(1-p_i)\big \}, \\ \frac{\partial l^c_i}{\partial \varvec{\beta }} =&\, {\mathbf { Z}_2 }_i \bigg \{A_i \delta ^1_i +M_i-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \} \\ =&\, {\mathbf { Z}_2 }_i \Big \{A_i \delta ^1_i +M_i-A_i \Lambda _i(X_i) -M_i \Lambda _i(\kappa _i)\Big \}, \\ \frac{\partial l^c_i}{\partial \lambda _k} =&\, \Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k}-\Big (A_i I\{t_k \le X_i\}+M_iI\{\kappa _i \ge t_k\}\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\ =&\, A_i\Big ( \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ) + M_i\Big ( \frac{I\{\kappa _i=k\}}{\lambda _k}- I\{\kappa _i \ge t_k\} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$
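The gradient formulas above can be checked against numerical differentiation. The sketch below implements \(l^c_i\) and its \(\varvec{\lambda }\)-gradient for a single hypothetical subject (all data values are made up for illustration) and compares the analytic \(\partial l^c_i/\partial \lambda _k\) with a central finite difference.

```python
import numpy as np

# Hypothetical single-subject data (all values are illustrative assumptions):
t = np.array([0.5, 1.0, 1.5, 2.0])   # ordered event times t_1, ..., t_K
A, M, delta = 1.0, 2.0, 1.0          # latent indicator A_i, count M_i, delta^1_i
X_idx = 2                            # X_i = t_3 (0-based index 2)
kappa = 1                            # ghost-copy event index kappa_i (0-based)
Z1, Z2 = np.array([1.0, -0.5]), np.array([0.3, 2.0])

def lc(alpha, beta, lam):
    """Complete-data log-likelihood l^c_i as displayed above."""
    bz = beta @ Z2
    out = (A + M) * (alpha @ Z1) - (1 + M) * np.log1p(np.exp(alpha @ Z1))
    out += delta * A * (np.log(lam[X_idx]) + bz)      # event term at X_i
    out -= A * lam[: X_idx + 1].sum() * np.exp(bz)    # cumulative hazard up to X_i
    out += M * (np.log(lam[kappa]) + bz - lam[: kappa + 1].sum() * np.exp(bz))
    return out

def grad_lam(alpha, beta, lam):
    """Analytic d l^c_i / d lambda_k from the displayed gradient."""
    ebz = np.exp(beta @ Z2)
    k = np.arange(len(lam))
    g = A * (delta * (k == X_idx) / lam - (k <= X_idx) * ebz)
    g += M * ((k == kappa) / lam - (k <= kappa) * ebz)
    return g

alpha, beta = np.array([0.2, -0.1]), np.array([0.1, 0.3])
lam = np.array([0.2, 0.3, 0.25, 0.4])

# Central finite-difference check of the lambda-gradient
eps = 1e-6
num = np.array([(lc(alpha, beta, lam + eps * np.eye(4)[k])
                 - lc(alpha, beta, lam - eps * np.eye(4)[k])) / (2 * eps)
                for k in range(4)])
print(np.allclose(num, grad_lam(alpha, beta, lam), atol=1e-4))
```

The same finite-difference device applies to the \(\varvec{\alpha }\)- and \(\varvec{\beta }\)-components and to the Hessian entries below.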
Its Hessian is given by
$$\begin{aligned} \nabla ^2 l^c_i=\left( \begin{array}{ccc} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } &{} 0 &{} 0 \\ 0 &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top } \\ 0 &{} \left[ {\frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top }} \right] ^\top &{} \text {diag}(\frac{\partial ^2 l^c_i}{\partial \lambda _k^2 }) \end{array}\right) , \end{aligned}$$
where
$$\begin{aligned} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } =&\, {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \Big \{-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{(1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})^2}\Big \} = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+M_i)p_i(1-p_i), \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } =&\, {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg \{-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} =&\, {\mathbf { Z}_2 }_i \bigg \{-\Big (A_i I\{t_k \le X_i\} +M_i I\{t_k \le \kappa _i\} \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } =&-\Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k^2}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\beta }^\top } =&\frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\lambda }^\top } = \frac{\partial ^2 l^c_i}{\partial \lambda _k \partial \lambda _h }=0, \ \ \ \ k\ne h. \end{aligned}$$
B.2 Conditional expectations
By the conditional expectations (8)–(10), we are able to calculate the ‘first order’ conditional expectations, \(E[\nabla l^c_i|\mathcal {O}]\) and \(E[\nabla ^2 l^c_i|\mathcal {O}]\):
$$\begin{aligned} E&\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] = {\mathbf { Z}_1 }_i\Big \{E(A_i)-p_i+E(M_i)(1-p_i)\Big \}, \\ E&\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] = {\mathbf { Z}_2 }_i \bigg [E(A_i) \Big \{\delta ^1_i+\log S_i(X_i)\Big \} \\&\qquad \qquad \quad +E(M_i) \Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}\bigg ], \\ E&\left[ \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i)\Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \quad +E(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } \right] = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+E(M_i))p_i(1-p_i), \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } \right] = {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \Big \{ E(A_i) \log S_i(X_i) +E(M_i) \sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} \right] = -{\mathbf { Z}_2 }_i \Big \{E(A_i) I\{t_k \le X_i\} +E(M_i) P(t_k \le \kappa _i) \Big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } \right] = -\Big \{E(A_i) \delta ^1_i I\{X_i=t_k\}+E(M_i) P(\tilde{T}_{ij}=t_k)\Big \}\frac{1}{\lambda _k^2}. \end{aligned}$$
To calculate ‘second order’ expectation \(E[\nabla l^c_i{\nabla l^c_i}^\top |\mathcal {O}]\), we first compute the conditional variances:
$$\begin{aligned} \text {Var}&[A_i|\mathcal {O}] = \delta ^c_i\frac{p_i(1-p_i)S_i(X_i)}{\big \{1-p_i+p_iS_i(X_i)\big \}^2}, \\ \text {Var}&[M_i|\mathcal {O}] = \frac{p_i\big \{1-S_i(Q_i)\big \}}{\big \{1-p_i+p_iS_i(Q_i)\big \}^2}. \end{aligned}$$
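Note that \(\text {Var}[A_i|\mathcal {O}]\) for a censored subject is simply the Bernoulli posterior variance \(q_i(1-q_i)\), writing \(q_i = E[A_i|\mathcal {O}] = p_iS_i(X_i)/\{1-p_i+p_iS_i(X_i)\}\) as shorthand. A quick numerical check over an assumed grid of values:

```python
import numpy as np

# The displayed Var[A_i | O] equals the Bernoulli posterior variance q(1-q),
# with q = pS / (1 - p + pS). Grid values below are assumed for illustration.
p = np.linspace(0.05, 0.95, 19)[:, None]   # incidence probabilities p_i
S = np.linspace(0.05, 0.95, 19)[None, :]   # survival probabilities S_i(X_i)

q = p * S / (1 - p + p * S)                # posterior P(A_i = 1 | O)
var_formula = p * (1 - p) * S / (1 - p + p * S) ** 2
print(np.allclose(var_formula, q * (1 - q)))
```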
Then,
$$\begin{aligned}&E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\alpha }}}^\top \right] = E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \big \{(1-p_i)^2 \text {Var}(M_i)+ \text {Var}(A_i)\big \},\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)(1-p_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} {\frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\Big \{\delta ^1_i+\log S_i(X_i)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad +E(M_i)\Big \{\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\big \{\log S_i(t_k)\big \}^2\\&\qquad \qquad \qquad \qquad \qquad -\Big (\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big )^2\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad + {\mathbf { Z}_1 }_i \bigg [\text {Var}(A_i) \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad +\text {Var}(M_i)(1-p_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad +{\mathbf { Z}_2 }_i \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad + \text {Var}(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{1+\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\\&\qquad \qquad \qquad \qquad \quad - E(M_i)\Big \{\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -\frac{P(\tilde{T}_{ij}=t_k)\log S_i(t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -P\{\tilde{T}_{ij} \ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\\&\qquad \qquad \qquad \qquad \quad +e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\sum _{h\ge k:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\bigg ], \\ \end{aligned}$$
$$\begin{aligned}&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _h} \right] =E(A_i) \left\{ -\frac{\delta ^1_i I\{X_i =t_{k\vee h}\}}{\lambda _{k\vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+ I\{X_i\ge t_{k\vee h}\}e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_h\}}{\lambda _h}- I\{X_i\ge t_h\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ -\frac{P(\tilde{T}_{ij} =t_{k\vee h})}{\lambda _{k \vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} +P(\tilde{T}_{ij}\ge t_{k \vee h})e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} ,\\&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} ^2\\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda ^2_k}- 2\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right. \\&\qquad \qquad \qquad \qquad \quad \left. +P(\tilde{T}_{ij} \ge t_k)e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} . \end{aligned}$$
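The conditional expectations in this appendix are the ingredients of Louis’ formula for the observed information, \(E[-\nabla ^2 l^c|\mathcal {O}] - E[\nabla l^c {\nabla l^c}^\top |\mathcal {O}] + E[\nabla l^c|\mathcal {O}]E[\nabla l^c|\mathcal {O}]^\top \). As a sanity check of the missing-information principle behind it, the sketch below works through a one-parameter toy version (all values assumed, not the model of this paper) and compares the Louis-type expression with a direct finite-difference second derivative of the observed log-likelihood.

```python
import numpy as np

# Toy one-parameter illustration (assumed values): for a censored subject,
# p = sigmoid(alpha) is the incidence probability and S = S_i(X_i) is fixed.
# Complete data: l^c = A log p + (1 - A) log(1 - p) + A log S, so
#   dl^c/da = A - p  and  d^2 l^c/da^2 = -p(1 - p),
# while the posterior probability of A = 1 is q = pS / (1 - p + pS).
def louis_info(alpha, S):
    # Missing-information principle: -l_obs'' = E[-l^c'' | O] - Var(l^c' | O)
    p = 1.0 / (1.0 + np.exp(-alpha))
    q = p * S / (1.0 - p + p * S)
    return p * (1.0 - p) - q * (1.0 - q)

def obs_info_fd(alpha, S, eps=1e-4):
    # Direct finite-difference second derivative of log(1 - p + pS)
    l = lambda a: np.log(1.0 - (1.0 - S) / (1.0 + np.exp(-a)))
    return -(l(alpha + eps) - 2.0 * l(alpha) + l(alpha - eps)) / eps**2

print(np.isclose(louis_info(0.3, 0.6), obs_info_fd(0.3, 0.6), atol=1e-5))
```

The identity holds at any parameter value, not only at the MLE; at the NPMLE the observed-score term of Louis’ formula vanishes as well.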