
A nonparametric maximum likelihood approach for survival data with observed cured subjects, left truncation and right-censoring

Published in: Lifetime Data Analysis

Abstract

We consider observational studies in pregnancy where the outcome of interest is spontaneous abortion (SAB). At first sight this is a binary ‘yes’ or ‘no’ outcome, although the data are subject to left truncation as well as right-censoring. Women who do not experience SAB by gestational week 20 are ‘cured’ of SAB by definition, that is, they are no longer at risk. Our data differ from the common cure data in the literature, where the cured subjects are always right-censored and never actually observed to be cured. We consider a commonly used cure rate model, with the likelihood function tailored specifically to our data. We develop a conditional nonparametric maximum likelihood approach. To tackle the computational challenge we adopt an EM algorithm that makes use of “ghost copies” of the data, and a closed-form variance estimator is derived. Under suitable assumptions, we prove the consistency of the resulting estimator, which involves an unbounded cumulative baseline hazard function, as well as its asymptotic normality. Simulation studies are carried out to evaluate the finite sample performance. We present the analysis of the motivating SAB study to illustrate the advantages of our model, which addresses both the occurrence and the timing of SAB, as compared to existing approaches in practice.
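As a rough illustration of the data structure described above (not the authors' model fitting), the following sketch simulates outcomes under a mixture cure mechanism with left truncation, right-censoring, and observed cure at gestational week 20; all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, cure_week = 1000, 20

# Hypothetical mixture cure mechanism: a logistic model decides whether a
# subject is susceptible to SAB; susceptible subjects get an exponential
# event time, and anyone still at risk at week 20 is observed to be cured.
z = rng.normal(size=n)
p_susceptible = 1 / (1 + np.exp(-(-1.0 + 0.5 * z)))
susceptible = rng.random(n) < p_susceptible
event_time = np.where(susceptible, rng.exponential(10.0, n), np.inf)

entry = rng.uniform(4, 12, n)    # left truncation: gestational week at study entry
censor = rng.uniform(8, 30, n)   # right-censoring time (e.g. loss to follow-up)

observed = event_time > entry    # only subjects still at risk at entry enroll
x = np.minimum.reduce([event_time, censor, np.full(n, float(cure_week))])
status = np.select(
    [event_time <= np.minimum(censor, cure_week),  # SAB observed
     censor < np.minimum(event_time, cure_week)],  # right-censored
    ["event", "censored"],
    default="cured",                               # reached week 20: observed cured
)
```

Unlike standard cure data, the `"cured"` subjects here are actually observed to be cured rather than right-censored.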

Fig. 1
Fig. 2


References

  • Andersen PK, Borgan O, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York

  • Asgharian M, Wolfson DB, Zhang X (2006) Checking stationarity of the incidence rate using prevalent cohort survival data. Stat Med 25(10):1751–1767

  • Chambers CD, Braddock SR, Briggs GG, Einarson A, Johnson YR, Miller RK, Polifka JE, Robinson LK, Stepanuk K, Jones KL (2001) Postmarketing surveillance for human teratogenicity: a model approach. Teratology 64:252–261

  • Chambers CD, Johnson D, Xu R, Jones KL (2011) Challenges and design of a prospective, observational cohort study to assess the risk of spontaneous abortion following administration of human papillomavirus (HPV) bivalent (types 16 and 18) recombinant vaccine. In: The 27th international conference on pharmacoepidemiology and therapeutic risk management, Chicago, IL, USA

  • Chambers CD, Johnson D, Xu R, Luo Y, Louik C, Mitchell AA, Schatz M, Jones KL (2013) Risks and safety of pandemic H1N1 influenza vaccine in pregnancy: birth defects, spontaneous abortion, preterm delivery, and small for gestational age infants. Vaccine 31(44):5026–5032

  • Chen M-H, Ibrahim JG, Sinha D (1999) A new Bayesian model for survival data with a surviving fraction. J Am Stat Assoc 94(447):909–919

  • Chen C-M, Shen P-S, Wei JC-C, Lin L (2017) A semiparametric mixture cure survival model for left-truncated and right-censored data. Biom J 59:270–290

  • Conway JB (1990) A course in functional analysis, 2nd edn. Springer, New York

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  • Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38:1041–1046

  • Farewell VT (1986) Mixture models in survival analysis: are they worth the risk? Can J Stat 14(3):257–262

  • Gamst A, Donohue M, Xu R (2009) Asymptotic properties and empirical evaluation of the NPMLE in the proportional hazards mixed-effects model. Stat Sin 19:997–1011

  • Gross ST, Lai TL (1996) Nonparametric estimation and regression analysis with left-truncated and right-censored data. J Am Stat Assoc 91:1166–1180

  • Hanson T, Bedrick EJ, Johnson WO, Thurmond MC (2003) A mixture model for bovine abortion and foetal survival. Stat Med 22(10):1725–1739

  • Johansen S (1983) An extension of Cox’s regression model. Int Stat Rev 51:165–174

  • Kim Y-J, Jhun M (2008) Cure rate model with interval censored data. Stat Med 27(1):3–14

  • Kuk AY, Chen C-H (1992) A mixture model combining logistic regression with proportional hazards regression. Biometrika 79(3):531–541

  • Lagakos SW, Barraj LM, De Gruttola V (1988) Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika 75:515–523

  • Lai TL, Ying Z (1991) Estimating a distribution function with truncated and censored data. Ann Stat 19:417–442

  • Li C-S, Taylor JM, Sy JP (2001) Identifiability of cure models. Stat Probab Lett 54(4):389–395

  • Louis T (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44(2):226–233

  • Lu W, Ying Z (2004) On semiparametric transformation cure models. Biometrika 91(2):331–343

  • Meister R, Schaefer C (2008) Statistical methods for estimating the probability of spontaneous abortion in observational studies—analyzing pregnancies exposed to coumarin derivatives. Reprod Toxicol 26:31–35

  • Murphy SA (1994) Consistency in a proportional hazards model incorporating a random effect. Ann Stat 22(2):712–731

  • Murphy SA (1995) Asymptotic theory for the frailty model. Ann Stat 23(1):182–198

  • Ning J, Qin J, Shen Y (2010) Non-parametric tests for right-censored data with biased sampling. J R Stat Soc Ser B 72:609–630

  • Pan W (2000) A multiple imputation approach to Cox regression with interval-censored data. Biometrics 56(1):199–203

  • Qin J, Ning J, Liu H, Shen Y (2011) Maximum likelihood estimations and EM algorithms with length-biased data. J Am Stat Assoc 106(496):1434–1449

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York

  • Sy JP, Taylor JM (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56(1):227–236

  • Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc Ser B 38(3):290–295

  • Vaida F, Xu R (2000) Proportional hazards model with random effects. Stat Med 19:3309–3324

  • Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York

  • Vardi Y (1985) Empirical distributions in selection bias models. Ann Stat 13(1):178–203

  • Wilcox AJ, Weinberg CR, O’Connor JF, Baird DD, Schlatterer JP, Canfield RE, Armstrong EG, Nisula BC (1988) Incidence of early loss of pregnancy. N Engl J Med 319(4):189–194

  • Xu R, Chambers C (2011) A sample size calculation for spontaneous abortion in observational studies. Reprod Toxicol 32(4):490–493

  • Zeng D, Lin DY (2007) Maximum likelihood estimation in semiparametric regression models with censored data. J R Stat Soc Ser B 69:507–564

  • Zeng D, Yin G, Ibrahim JG (2006) Semiparametric transformation models for survival data with a cure fraction. J Am Stat Assoc 101:670–684


Author information

Correspondence to Ronghui Xu.

Appendices

Appendix A: Proofs

A.1 The existence of the NPMLE

Proof of Theorem 1

Let \(\theta _B\) be the maximizer on the complement of the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B\}\). We show that \(l(\theta _B) \rightarrow -\infty \) as \(B \rightarrow \infty \).

By Assumptions 1 and 2, we have the bound (17).

All terms in the log-likelihood are bounded except for

$$\begin{aligned} \sum _{i=1}^{n}\Big \{\delta ^1_i\log \lambda (X_i)-\delta ^1_i e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \Lambda (X_i)\Big \}. \end{aligned}$$

Let \(\lambda _{\max }\) be the largest element in \({\varvec{\lambda }}\). The expression above has the upper bound

$$\begin{aligned} \log ( \lambda _{\max }/m)- \lambda _{\max }/m-K\log m, \end{aligned}$$

which diverges to \(-\infty \) when we set \(B \rightarrow \infty \).

Then, the global maximizer must lie in the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B^*\}\) for some \(B^*>0\). \(\square \)
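The divergence of this upper bound is elementary; a quick numerical check (with hypothetical values of \(m\) and \(K\)):

```python
import math

# The bound log(lam_max/m) - lam_max/m - K*log(m) from the proof of
# Theorem 1 diverges to -infinity as lam_max grows; m and K hypothetical.
m, K = 5.0, 20

def bound(lam):
    return math.log(lam / m) - lam / m - K * math.log(m)

vals = [bound(10.0 ** j) for j in range(1, 7)]
```

The linear term \(-\lambda _{\max }/m\) dominates the logarithm, so the bound decreases without limit.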

Let \(W_i^{\varvec{\theta }}(t)\) be defined as in (21). We define a generic inequality to be referenced later, for any \(\varvec{\theta }= (\varvec{\alpha },\varvec{\beta }, \Lambda )\) in the parameter space whose baseline cumulative hazard \(\Lambda \) is a step function jumping only at the observed event times, \(t_1, \ldots , t_K\):

$$\begin{aligned} 0 < d\Lambda (t_k) \le \left( \sum _{j=1}^nW_j^{\varvec{\theta }} (t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\right) ^{-1}d\bar{N}(t_k), \quad k = 1, \ldots , K. \end{aligned}$$
(A1)
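The right-hand side of (A1) is a Breslow-type ratio: the number of events at \(t_k\) over a weighted at-risk sum. A minimal sketch of evaluating such a bound, with simulated placeholders standing in for \(W_j^{\varvec{\theta }}(t_k)\) and \(e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\):

```python
import numpy as np

# Sketch of the Breslow-type bound in (A1): at each event time t_k, the
# jump of Lambda is at most dN_bar(t_k) divided by the weighted risk sum.
# Weights, multiplicities, and risk scores are all simulated placeholders.
rng = np.random.default_rng(1)
n, K = 50, 8
dN_bar = rng.integers(1, 4, K)              # event counts at t_1 < ... < t_K
W = rng.uniform(0.2, 1.0, (n, K))           # stand-in for W_j^theta(t_k)
risk_score = np.exp(rng.normal(0, 0.3, n))  # stand-in for exp(beta' Z_2j)

jump_bound = dN_bar / (W * risk_score[:, None]).sum(axis=0)
```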

The conclusion of the following Lemma is used in the proofs of both Lemma 1 and Theorem 3.

Lemma A1

Let \(\varvec{\theta }_{(n)} = \left( \varvec{\alpha }_{(n)}, \varvec{\beta }_{(n)}, \Lambda _{(n)}\right) \) be a sequence in the parameter space where \(\Lambda _{(n)}\) is a non-decreasing step function with jumps only at the observed event times. Suppose that \(\varvec{\theta }_{(n)}\) satisfies (A1) and has a subsequence \(\varvec{\theta }_{(n_k)}\) converging to a limiting point \({\varvec{\theta }}^* = (\varvec{\alpha }^*, \varvec{\beta }^*, \Lambda ^*)\) a.s.:

$$\begin{aligned} \varvec{\alpha }_{(n_k)}-\varvec{\alpha }^* \rightarrow 0, \quad \varvec{\beta }_{(n_k)}-\varvec{\beta }^* \rightarrow 0, \quad \sup _{t\in [0,\tau ]}|e^{-\Lambda _{(n_k)}(t)}-e^{-\Lambda ^*(t)}| \rightarrow 0, \quad a.s..\qquad \end{aligned}$$
(A2)

Under Assumptions 1–4,

a) :

\(\Lambda ^*(t)< \infty \text { for all } t<\tau \);

b) :

\(\inf _{t\in [0,\zeta ]}E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]>C_w, \text { for some } C_w>0\).

Proof of Lemma A1

By checking the uniform continuity of \(W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) in \((\varvec{\alpha },\varvec{\beta },e^{-\Lambda (t)})\), we may establish

$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| W_i^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }_i}- W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\right| \rightarrow 0, \quad a.s.. \end{aligned}$$

\(W_i^{\varvec{\theta }}(t)\), as a function of the observed random variables, belongs to a Glivenko–Cantelli class of uniformly bounded functions with uniformly bounded variation. Thus, the pointwise convergence can be strengthened to uniform convergence,

$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| \frac{1}{n}\sum _{i=1}^{n_k} W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i} -E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \right| {\mathop {\longrightarrow }\limits ^{a.s.}}0. \end{aligned}$$

Note that \(n^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }_{(n_k)}} (t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\) is càglàd, so its limit \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} ]\) must also be càglàd.

a) Let \(\tau ^*=\inf \{t\in [0,\tau ]: e^{-\Lambda ^*(t)}=0\}\). We shall prove that \(\tau ^*=\tau \).

Suppose that \(\tau ^*\) is an interior point of \([0,\tau ]\). From Assumption 4, \(d\Lambda _0([s,t]) = \Lambda _0(t) -\Lambda _0(s) >0\) for any \(s<t\) in \([0,\tau ]\). By the definition of \(\tau ^*\), \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [\tau ^*,\tau ]\), so we have

$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\tau ^*)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$

By the left continuity of \(W_i^{\varvec{\theta }}(t)\), \(\exists \ s < \tau ^*\), s.t.

$$\begin{aligned} \inf _{t\in [s,\tau ^*]}E\left[ W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \ge \frac{1}{2}E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] . \end{aligned}$$

The total increment of \(\Lambda _{(n_k)}\) in \([s,\tau ^*]\) must be bounded almost surely according to (A1). By the definition of \(\tau ^*\), \(\Lambda ^*(s)<\infty \). Putting these together, we reach the contradiction,

$$\begin{aligned} \Lambda ^*(\tau ^*) \le \varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(\tau ^*) \le&\varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(s)+ \int _{s_+}^{\tau ^*} \frac{d \bar{N}(u)}{\sum _{i=1}^{n_k}W^{\varvec{\theta }_{(n_k)}}_i(u)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}} \\ \le&\Lambda ^*(s)+ \frac{\tau ^*-s}{\inf _{t\in [s,\tau ^*]}E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}<\infty . \end{aligned}$$

The other case is \(\tau ^* = 0\). Then, \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [0,\tau ]\). The contradiction is easily established as

$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(0)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _0^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$

b) Since \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is càglàd, and \(\varvec{\theta }_{(n_k)}\) satisfies (A1) and converges uniformly to \(\varvec{\theta }^*\), it can be seen that \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}] \ge 0\) over the interior of \([0,\zeta ]\).

Write \(n_k^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) as

$$\begin{aligned}&n_k^{-1}\sum _{i=1}^{n_k} \int _{t-}^\tau \big \{1-\phi _i^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad +\int _t^\tau Y_i(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} d \phi _i^{\varvec{\theta }}(u)+Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \nonumber \\&=n_k^{-1}\sum _{i=1}^{n_k} \int _{t+}^\tau \left[ 1-\phi _i^{\varvec{\theta }}(u) -\frac{\sum _{j=1}^{n_k}Y_j(u)\phi _j^{\varvec{\theta }}(u)\big \{1-\phi _j^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}\right] e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad + \big \{1-\phi _i^{\varvec{\theta }}(t)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(t)+ Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} . \end{aligned}$$
(A3)

By Assumption 4, all \(Q_i < \zeta \) a.s.. Thus,

$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\zeta ) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =\,&E\left[ \big \{\delta ^1+\delta ^c \phi ^{\varvec{\theta }^*}(X)\big \} I\{\zeta \le X\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \\ \ge \,&\,E\left[ \int _\zeta ^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u) \right] >0. \end{aligned}$$

For \(t<\zeta \), the difference \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]-E[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is the limit of an integral like that in (A3), where the integrand has \(\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\) in the denominator. So it has potential singularities at the zeros of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) for \( u \in [t,\zeta ]\). We shall show that \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is differentiable with respect to \(d\Lambda _0(u)\) in \([0,\zeta ]\), so that its zero \(u_0\) leads to the divergent form \( - \int _t ^\zeta |u-u_0|^{-1} du. \) We will then reach the contradiction that \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]=-\infty \), as seen below.

Denote by \(R_0\) the set of zeros, and right-hand limit points of zeros, of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\). Let \(R_{\triangle u}\) be the \(\triangle u\) neighborhood of \(R_0\) and \(\Omega ^t_{\triangle u}=[t,\zeta ] \setminus R_{\triangle u}\). \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is bounded away from zero on \(\Omega ^t_{\triangle u}\). Through (A3),

$$\begin{aligned} E&\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] -E\left[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \nonumber \\&\le -\int _{\Omega ^t_{\triangle u}}\frac{E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\big \{1-\phi ^{\varvec{\theta }^*}(u)\big \} e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] }{E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}E\left[ e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u)\right] \nonumber \\&\quad + E\left[ \int _{t+}^\zeta \{1-\phi ^{\varvec{\theta }^*}(u)\}dN(u) + \big \{1-\phi ^{\varvec{\theta }^*}(t)\big \}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(t)+ Y(t)\phi ^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] . \end{aligned}$$
(A4)

From part a), \(e^{-\Lambda ^*(\zeta )}>0\). For any \( u<\zeta \),

$$\begin{aligned} \phi ^{\varvec{\theta }^*}_i(u) \ge \phi ^{\varvec{\theta }^*}_i(\zeta ) \ge \frac{m^{-1}e^{-m\Lambda ^*(\zeta )}}{1+m^{-1}e^{-m\Lambda ^*(\zeta )}}>0. \end{aligned}$$

So the limit of the numerator term \(E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\{1-\phi ^{\varvec{\theta }^*}(u)\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \) is bounded away from zero. Moreover, for all \(u \in [0,\zeta ]\),

$$\begin{aligned} \left| \frac{dEW^{\varvec{\theta }^*}(u)}{d\Lambda _0(u)}\right| =&\left| E\left[ \big \{1-\phi ^{\varvec{\theta }^*}(u)\big \}Y(u)\phi ^{\varvec{\theta }_0}(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} -\phi ^{\varvec{\theta }^*}(u)\frac{dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]}{d\Lambda _0(u)}\right] \right| \\ \le&m+\mathcal {L} <\infty . \end{aligned}$$

The first term in (A4) diverges to \(-\infty \) when \(\triangle u \rightarrow 0\). The other terms are bounded, so this is the desired contradiction. \(\square \)

Proof of Lemma 1

a) Define the marginal of the complete data likelihood

$$\begin{aligned} \tilde{L}(\varvec{\theta })=&\sum _{A_i=0,1}\sum _{M_i=0}^\infty \sum _{\widetilde{T}_{i1}=t_k: t_k\le Q_i}\dots \sum _{\widetilde{T}_{iM_i}=t_k: t_k\le Q_i} L^c_i(\varvec{\theta }) \\ =&\prod _{i=1}^n\frac{\big \{p_i\lambda _i(X_i)S_i(X_i)\big \}^{\delta ^1_i} (1-p_i)^{\delta ^0_i}\big \{p_iS_i(X_i)+1-p_i\big \}^{\delta ^c_i}}{1-p_i\sum _{k: t_k\le Q_i}\lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)}. \end{aligned}$$

From (5) it can be seen that the complete data likelihood \(L^c(\varvec{\theta })\) can be decomposed into the product of a logistic part and a Cox part. Assumptions 1–3 contain the regularity conditions for these two parts. The event rate \(P(A_i=1)\) is bounded away from both zero and one,

$$\begin{aligned} 0<\frac{m^{-1}}{m^{-1}+1} \le P(A_i=1) \le \frac{m}{m+1} < 1. \end{aligned}$$
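This bound simply reflects that a logistic probability with linear predictor bounded in \([-\log m, \log m]\) must lie in \([m^{-1}/(m^{-1}+1),\, m/(m+1)]\); a numerical check with a hypothetical \(m\):

```python
import math

# With the linear predictor bounded, |alpha' Z1| <= log(m), the logistic
# event probability stays in [m^{-1}/(m^{-1}+1), m/(m+1)]; m hypothetical.
m = 5.0
lo, hi = (1 / m) / (1 / m + 1), m / (m + 1)
etas = [-math.log(m), -1.0, 0.0, 1.0, math.log(m)]
probs = [1 / (1 + math.exp(-eta)) for eta in etas]
```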

The average at-risk process \(E[Y_i(t)]\) is bounded away from zero almost surely. The matrices \({\mathbf { Z}_1 }\) and \({\mathbf { Z}_2 }\) are almost surely of full rank, as \(\text {Var}({\mathbf { Z}_1 })\) and \(\text {Var}({\mathbf { Z}_2 })\) are positive definite. Under these conditions, both parts of the likelihood are concave in the associated sets of parameters, \(\varvec{\alpha }\) and \((\varvec{\beta },{\varvec{\lambda }})\), respectively. Thus, \(L^c(\varvec{\theta })\) is almost surely concave in \(\varvec{\theta }\). \(\tilde{L}(\varvec{\theta })\) is also concave, as a sum of concave functions. The almost sure convergence of the EM algorithm is guaranteed by the almost sure concavity of the marginal of the complete data likelihood (Dempster et al. 1977).

b) To prove the second result, we take the following strategy. For any \(\varvec{\theta }\) denote \(\lambda _{\max ,\zeta }=\max \{\lambda _k: t_k \le \zeta \}\), where \(\zeta \) is the upper bound of the truncation time defined in Assumption 4. Define a set in the parameter space:

$$\begin{aligned} {\Theta } = \left\{ \varvec{\theta }=(\varvec{\alpha },\varvec{\beta },\Lambda ) | \lambda _{\max ,\zeta } \le n^{-1}2/C_w\right\} , \end{aligned}$$
(A5)

with \(C_w\) defined in Lemma A1. We would like to show that

$$\begin{aligned} \lim _{n\rightarrow \infty }P(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }) = 1. \end{aligned}$$
(A6)

This is done by applying Lemma A1, so we will need to verify condition (A1) for \(\tilde{\varvec{\theta }}\) and \(\hat{\varvec{\theta }}\). The convergence of the EM algorithm was established in part a).

First, we show that the EM finds the unique stationary point of \(\tilde{L}(\varvec{\theta })\), which then must be the global maximizer since \(\tilde{L}\) is concave by the proof of part a). Consider the conditional expectation given the observed data as in (8)–(10). It can be verified directly (we skip the algebraic details here) that:

$$\begin{aligned} \nabla \log \tilde{L}(\varvec{\theta })=E_{\varvec{\theta }}[ \nabla \log L^c(\varvec{\theta }) |\mathcal {O}]. \end{aligned}$$

The estimator \(\tilde{\varvec{\theta }}\) is by definition the solution to the left-hand side of the above being zero, hence also the stationary point of \(\tilde{L}(\varvec{\theta })\).

We write down the stationary equation \(\varvec{\theta }^{(l)}=\varvec{\theta }^{(l+1)} = \tilde{\varvec{\theta }}\) for \(\tilde{\lambda }_k\)’s at convergence,

$$\begin{aligned} \tilde{\lambda }_k=\frac{1+\tilde{\lambda }_k\sum _{i=1}^n\frac{\tilde{p}_ie^{\tilde{\varvec{\beta }}\top {\mathbf { Z}_2 }_i} \tilde{S}_i(t_k)I(Q_i \ge t_k)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)}}{\sum _{i=1}^n\left\{ \delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k) +\sum _{j\ge k}\frac{\tilde{p}_i \tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \right\} e^{\tilde{\varvec{\beta }}\top {\mathbf { Z}_2 }_i}}, \end{aligned}$$
(A7)

where \(f_i\) was previously defined just above (6). Combining \(\tilde{\lambda }_k\) terms leads to

$$\begin{aligned} \tilde{\lambda }_k^{-1}=\sum _{i=1}^n&\bigg \{\delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k)\nonumber \\&-\tilde{p}_i \frac{\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \bigg \} e^{\tilde{\varvec{\beta }}\top {\mathbf { Z}_2 }_i}. \end{aligned}$$
(A8)
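Equation (A8) is a self-consistency (fixed-point) equation for the hazard jumps. A much simplified EM sketch in the same spirit, for a mixture cure model without covariates or truncation (so the ghost-copy terms drop out; all data are simulated):

```python
import numpy as np

# Highly simplified EM in the spirit of the self-consistency equation (A8):
# mixture cure model with no covariates and no left truncation. phi is the
# posterior susceptibility of a censored subject; the hazard jump update
# is a weighted Breslow-type ratio.
rng = np.random.default_rng(2)
n = 400
cured = rng.random(n) < 0.4
t = np.where(cured, np.inf, rng.exponential(1.0, n))
c = rng.uniform(0.5, 3.0, n)
x = np.minimum(t, c)
delta = (t <= c).astype(float)

times = np.sort(np.unique(x[delta == 1]))    # observed event times t_k
p = 0.5                                      # P(susceptible), initial guess
lam = np.full(times.size, 1.0 / times.size)  # baseline hazard jumps

for _ in range(200):
    Lam_x = np.array([lam[times <= xi].sum() for xi in x])
    S = np.exp(-Lam_x)
    # E-step: events are known susceptible; censored subjects get a weight
    phi = np.where(delta == 1, 1.0, p * S / (p * S + 1 - p))
    # M-step: cure fraction and Breslow-type hazard update
    p = phi.mean()
    at_risk = x[:, None] >= times[None, :]
    W = np.where(delta[:, None] == 1, at_risk, at_risk * phi[:, None])
    lam = (x[delta == 1][:, None] == times[None, :]).sum(axis=0) / W.sum(axis=0)
```

In the paper's setting the weights additionally carry the ghost-copy correction terms for left truncation, as in (A7).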

By the mean value theorem,

$$\begin{aligned} 0 \le e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\le \frac{1}{2}\left( \lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) ^2 e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}} \le \frac{1}{2}m^2\lambda _k^2e^{\lambda _km}. \end{aligned}$$
(A9)

where \( m\) is defined in (17). Applying (A9) to the denominator in (A8), we get

$$\begin{aligned} 1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h) \ge 1-\tilde{p}_i\{1-\tilde{S}_i(Q_i)\}. \end{aligned}$$
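The elementary bound (A9), a second-order Taylor expansion of \(e^x\) at zero with \(x=\lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\), can be checked numerically:

```python
import math

# Numeric check of (A9): 0 <= e^x - 1 - x <= (1/2) x^2 e^x for x >= 0,
# where x plays the role of lambda_k * exp(beta' Z_2i).
xs = [0.0, 0.01, 0.5, 1.0, 3.0]
gaps = [math.exp(x) - 1 - x for x in xs]
caps = [0.5 * x * x * math.exp(x) for x in xs]
```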

By a similar argument, we have almost surely

$$\begin{aligned}&\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j) \\&\quad = \tilde{S}_i(Q_i)I(Q_i \ge t_k)+\sum _{j\ge k}\left\{ 1-e^{-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}\top {\mathbf { Z}_2 }_i}}-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}\top {\mathbf { Z}_2 }_i}\right\} \tilde{S}_i(t_j)I(Q_i > t_j)\\&\quad \le \tilde{S}_i(Q_i)I(Q_i \ge t_k). \end{aligned}$$

Then, \(\tilde{\varvec{\theta }}\) satisfies (A1).

For \(\hat{\varvec{\theta }}\), it must satisfy the score equation for \(\lambda _k\)’s:

$$\begin{aligned} \frac{\partial l(\varvec{\theta })}{\partial \lambda _k} = \sum _{i=1}^n \left\{ \frac{d N_i(t_k)}{\lambda _k} -W^{\varvec{\theta }}_i(t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} = 0, \quad \forall k=1,\ldots ,K. \end{aligned}$$

This is the equation version of (A1) after rearrangement.

Now let \(\hat{\lambda }_{\max ,\zeta }\) and \(\tilde{\lambda }_{\max ,\zeta }\) be the largest jumps of \(\hat{\Lambda }\) and \(\tilde{\Lambda }\) on \([0,\zeta ]\), respectively. By Lemma A1 part b), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }n\hat{\lambda }_{\max ,\zeta } \le C_w^{-1}, \quad \limsup _{n\rightarrow \infty }n\tilde{\lambda }_{\max ,\zeta } \le C_w^{-1}, a.s.. \end{aligned}$$

Hence (A6) is established.

In the set \(\Theta \), we evaluate the discrepancy between \(\log \tilde{L}(\varvec{\theta })\) and \(\log L (\varvec{\theta })\), which can be bounded as follows:

$$\begin{aligned} 1-S_i(Q_i)-\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k) =\sum _{k:t_k<Q_i} S_i(t_k) \left( e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) . \end{aligned}$$
(A10)

Applying (A9) to \(|\log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })|\), we have the bound

$$\begin{aligned}&\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \\&\quad \le \sum _{i=1}^n\left| \log \left\{ 1-p_i+p_iS_i(Q_i)\right\} -\log \left\{ 1-p_i\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)\right\} \right| \\&\quad \le \sum _{i=1}^n\left| \frac{p_i}{1-p_i}\frac{n}{2}m^2\lambda _k^2e^{\lambda _km}\right| \le \frac{1}{2}n^2 e^{m\lambda _{\max ,\zeta }} m^3\lambda _{\max ,\zeta }^2. \end{aligned}$$

Using the upper bound for \(\lambda _{\max ,\zeta }\) in \(\Theta \), we can bound

$$\begin{aligned} \sup _{\varvec{\theta }\in \Theta }\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \le e^{\frac{2m}{C_w}}\frac{2m^3}{C_w^2}. \end{aligned}$$
(A11)

In summary, whenever \(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }\), we have

$$\begin{aligned} 0 \le \log L(\hat{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) \le \log L(\hat{\varvec{\theta }})-\log \tilde{L}(\hat{\varvec{\theta }}) + \log \tilde{L}(\tilde{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) <e^{\frac{2m}{C_w}}\frac{4m^3}{C_w^2}.\nonumber \\ \end{aligned}$$
(A12)

Combining (A12) and (A6) completes the proof. \(\square \)

Proof of Theorems 2 and 2’

From Lemma 1, we only need to establish the following two facts: (1) \( E[l_1(\varvec{\theta })]\) exists with a unique maximum, and (2) it is locally invertible at the maximum. We will see that (1) is verified through the proof of Theorem 3, and (2) is verified through the proof of Theorem 4. \(\square \)

A.2 Consistency of the NPMLE

Proof of Theorem 3

The constants \(m\), c, \(\varepsilon \) and \(\mathcal {L}\) are defined in (17), (18) and (19).

First, we show that the “bridge” \(\bar{\Lambda }\) defined in (22) converges to the true \(\Lambda _0\) in the following sense:

$$\begin{aligned} \sup _{t\in [0,\tau ]}\left| e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}\right| \rightarrow 0, a.s. \end{aligned}$$
(A13)

as \(n\rightarrow \infty \). We have, for all \(t \in (0,\tau )\), the bound

$$\begin{aligned} m\ge \frac{E\left[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] }{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$
(A14)

For any \(\tau ^*<\tau \) in the set of rational numbers \(\mathbb {Q}\), \(E[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} ]\) is bounded away from zero over \([0, \tau ^*]\). The uniform convergence of \(\bar{\Lambda }\) to \(\Lambda _0\) over any such \([0, \tau ^*]\) can be obtained as in Murphy (1994). To extend the result to (A13), we use a trick described in (A15)–(A18). By Assumption 3, \(\Lambda _0\) is non-decreasing and diverges to \(\infty \) at \(\tau \). Therefore,

$$\begin{aligned} \forall \epsilon >0, \, \exists \tau ^* \in (0,\tau ) \cap \mathbb {Q}, \, s.t. \, e^{-\Lambda _0(\tau ^*)}<\epsilon /3. \end{aligned}$$
(A15)

Through Rao's law of large numbers and a Helly–Bray argument, we have

$$\begin{aligned} \sup _{t\in [0,\tau ^*]}|\bar{\Lambda }(t)-\Lambda _0(t)| \rightarrow 0, \quad a.s. . \end{aligned}$$
(A16)

By continuity of the exponential function,

$$\begin{aligned} \exists N, \, \forall n>N, \, \sup _{t\in [0,\tau ^*]}|e^{-\bar{\Lambda }(t)} -e^{-\Lambda _0(t)}|<\epsilon /3. \end{aligned}$$
(A17)

Then,

$$\begin{aligned} \forall n>N, \, \sup _{t\in [\tau ^*,\tau ]}|e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}| \le 2e^{-\Lambda _0(\tau ^*)}+|e^{-\bar{\Lambda }(\tau ^*)}-e^{-\Lambda _0(\tau ^*)}| <\epsilon .\qquad \end{aligned}$$
(A18)

Therefore, we have proved (A13).

Next, we evaluate the difference between the limits of \(\hat{\Lambda }\) and \(\bar{\Lambda }\). According to Assumption 1 and \(e^{-\hat{\Lambda }(t)} \in [0,1]\), \((\hat{\varvec{\alpha }},\hat{\varvec{\beta }},e^{-\hat{\Lambda }(t)})\) is bounded. \(\hat{\Lambda }(t)\) is càdlàg, and so is \(e^{-\hat{\Lambda }(t)}\). By Helly's selection theorem, there is a subsequence converging uniformly almost surely to some \(\varvec{\theta }^*=(\varvec{\alpha }^*, \varvec{\beta }^*, e^{-\Lambda ^*})\). Lemma A1 part b) gives the bound for \(E\{ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \}\) over \([0,\zeta ]\). We only need to find its bound on \([\zeta ,\tau ]\) in order to mimic the proof of Lemma 1 of Murphy (1994). Note that

$$\begin{aligned} E\left[ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \right] =&E\left[ \int _{t-}^\tau \big \{1-\phi ^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dN(u) \right] \\&-E\left[ \int _t^\tau \phi ^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }] \right] . \end{aligned}$$

By Assumption 4, \(P(Q_i \le \zeta )=1\), so \(E[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]\) is decreasing on \([\zeta ,\tau ]\). Along with the Lipschitz continuity, we have for all \(t \in [\zeta ,\tau )\)

$$\begin{aligned} M\mathcal {L} \ge \frac{E[W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}]}{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$

Therefore, \(\gamma (t)=\frac{E\left[ W^{\varvec{\theta }_0}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right] }{E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] } \) is bounded away from both \(\infty \) and zero, and

$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| \frac{d\hat{\Lambda }}{d\bar{\Lambda }}(t)-\gamma (t) \right| \rightarrow 0 \quad \text {and} \quad \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\int _0^t\gamma d\Lambda _0 \right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$
(A19)

With these preparations in place, we can use the semiparametric Kullback–Leibler divergence argument from Murphy (1994). We have

$$\begin{aligned} 0 \le&\frac{1}{n} \big \{ l_n(\hat{\varvec{\alpha }},\hat{\varvec{\beta }},\hat{\Lambda }) -l_n(\varvec{\alpha }_0,\varvec{\beta }_0,\bar{\Lambda }) \big \} \nonumber \\ \nonumber =&\frac{1}{n}\sum _{i=1}^n \int _0^\tau \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} \bigg \{ dN_i(u)- \phi _i^{\varvec{\theta }_0}(u) Y_i(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\bar{\Lambda }(u)\bigg \}\nonumber \\&+ \int _0^\tau \left[ \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} - \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d\hat{\Lambda }(u)}{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d\bar{\Lambda }(u)}-1\bigg \} \right] \nonumber \\&\times \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u). \end{aligned}$$
(A20)

Denote the function in the logarithm above as \(\psi _i(u)\). Using the definition of \(\bar{\Lambda }\), we can rewrite the first term in (A20) as

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dN_i(u) \nonumber \\&\quad = \frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dM_i(u) \end{aligned}$$
(A21)

Inside \(\psi _i(u)\), the ratio \(d\hat{\Lambda }/d\bar{\Lambda }\) is bounded away from 0 and \(\infty \) according to (A19). Denote the range of the ratio as \([1/R, R]\). The \(\phi _i^{\varvec{\theta }_0}(u)\) and \(\phi _i^{\hat{\varvec{\theta }}}(u)\) terms in \(\psi _i(u)\) create a potential singularity for (A21) at \(\tau \), but their decay rate is bounded by \(e^{-mR \Lambda _0(u)}\) by Assumptions 1 and 2. The integrands of the martingale integral (A21) are thus all bounded a.s., and the quadratic variation of (A21) is bounded a.s. by

$$\begin{aligned} \frac{1}{n^2}\sum _{i=1}^n \int _0^\tau 4\big \{mR \Lambda _0(u) + \log (R) \big \}^2 \phi _i^{\varvec{\theta }_0}(u) Y_i(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\Lambda _0(u). \end{aligned}$$

It is of order \(O_p(1/n)\), so the limit of (A21) is zero almost surely.
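This is the generic phenomenon that an averaged martingale integral with bounded integrand has variance of order \(1/n\). As a toy illustration (our own construction, not the paper’s model: i.i.d. unit-exponential event times, no censoring, no cure fraction, and the bounded integrand \(H(u)=e^{-u}\)), a simulation shows the variance shrinking at the \(1/n\) rate:

```python
import numpy as np

def avg_martingale_integral(n, rng, tau=2.0):
    """(1/n) * sum_i of integral_0^tau H(u) dM_i(u), where
    M_i(t) = N_i(t) - min(t, T_i) is the counting-process martingale for
    T_i ~ Exp(1) and H(u) = exp(-u).  A toy stand-in for the term in (A21)."""
    T = rng.exponential(size=n)
    jump = np.where(T <= tau, np.exp(-T), 0.0)        # integral of H dN_i
    comp = 1.0 - np.exp(-np.minimum(T, tau))          # compensator part: integral_0^{t ^ T_i} e^{-u} du
    return (jump - comp).mean()

rng = np.random.default_rng(1)
reps = 2000
v_small = np.var([avg_martingale_integral(200, rng) for _ in range(reps)])
v_large = np.var([avg_martingale_integral(1800, rng) for _ in range(reps)])
ratio = v_small / v_large   # should be close to 1800/200 = 9
```

With the sample size multiplied by 9, the Monte Carlo variance of the averaged integral drops by roughly a factor of 9, consistent with the \(O_p(1/n)\) claim.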

The integrand in the second term of (A20) is of the form \(\log (x)-(x-1) \le 0\). For the inequality in (A20) to hold, we must therefore have

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n}\sum _{i=1}^n \int _0^\tau \big \{\log \big (\psi _i(u)\big ) - \big (\psi _i(u) -1 \big )\big \} \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u)= 0. \end{aligned}$$

Applying the same argument as in Murphy (1994), we get

$$\begin{aligned} E\left( \int _0^\tau \left| \phi ^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top } {\mathbf { Z}_2 }}\gamma (u) - \phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} \right| Y(u)d\Lambda _0(u)\right) =0 \end{aligned}$$
(A22)

in the almost sure set. The identifiability of our model is verified in Li et al. (2001) Theorem 2. Along with our regularity conditions in Assumptions 2 and 3, (A22) leads to \(\varvec{\alpha }^*=\varvec{\alpha }_0\), \(\varvec{\beta }^*=\varvec{\beta }_0\) and \(\gamma (t)=1\). This implies that

$$\begin{aligned} \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\Lambda _0 (t)\right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$

Repeating the trick in (A15)-(A18), we have

$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| e^{-\hat{\Lambda }(t)}-e^{-\Lambda _0 (t)}\right| \rightarrow 0 \; a.s. \end{aligned}$$

Finally, we summarize all uses of almost sure arguments to ensure that the intersection of all almost sure sets still has probability one under \(\sigma \)-additivity. The steps (A15)–(A18) involve one almost sure argument for each choice of \(\tau ^*\); we preserve the almost sure property by restricting \(\tau ^*\) to the countable set \(\mathbb {Q}\). One almost sure argument is made for Helly’s selection theorem. In Lemma A1, we use the Glivenko–Cantelli theorem to avoid dependence on the choice of \(\varvec{\theta }^*\), so that almost sure argument is only applied once. Two more almost sure arguments are used in calculating the limits of the terms in (A20). \(\square \)

Proof of Theorem 3’

The proof is essentially the same as the Proof of Theorem 3, so the details are omitted. In fact, it is less technical due to the boundedness of \(\Lambda _0\) over \([0, \tau ']\). \(\square \)

1.3 A.3 Asymptotic normality

First, we define several quantities used below. In Theorem 4, \(\sigma (\mathbf {h})=\Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) is given by

$$\begin{aligned} \varvec{\sigma }_a(\mathbf {h})= E\Bigg [&{\mathbf { Z}_1 }\bigg \{ -\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)d\phi ^{\varvec{\theta }_0}(u) \nonumber \\&+ K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \bigg \}\Bigg ], \nonumber \\ \varvec{\sigma }_b(\mathbf {h})= E\Bigg [&{\mathbf { Z}_2 }\bigg \{\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Big [\Lambda _0(u)\phi ^{\varvec{\theta }_0}(u)\Big ]\nonumber \\&- K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Lambda _0(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big )\bigg \}\Bigg ], \nonumber \\ \sigma _\eta (\mathbf {h})=E\Bigg [&e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\bigg \{ K_1^{\varvec{\theta }_0}(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)Y(u) - K_2^{\varvec{\theta }_0}(\mathbf {h})Y(\tau ') \phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \nonumber \\&-\int _u^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(s)\phi ^{\varvec{\theta }_0}(s)\Big (1-\phi ^{\varvec{\theta }_0}(s)\Big )Y(s)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(s)\bigg \}\Bigg ], \end{aligned}$$
(A23)

where

$$\begin{aligned} K_1^{\varvec{\theta }}(\mathbf {h})(u)=\,&\mathbf {a}^\top {\mathbf { Z}_1 }\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) +\mathbf {b}^\top {\mathbf { Z}_2 }\left\{ 1-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right\} \nonumber \\&+\eta (u)-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \int _0^u \eta d\Lambda , \nonumber \\ K_2^{\varvec{\theta }}(\mathbf {h})=&\,\Big \{\mathbf {a}^\top {\mathbf { Z}_1 }-\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda (\tau ') e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} -\int _0^{\tau '}\eta e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} d\Lambda \Big \}. \end{aligned}$$
(A24)

Let \( \varvec{\theta }+t\mathbf {h}=\Big (\varvec{\alpha }+t\mathbf {a},\varvec{\beta }+t\mathbf {b},\int _0^\cdot (1+t\eta )d\Lambda \Big ) \). Define the directional derivatives

$$\begin{aligned} \lim _{t\rightarrow 0}\frac{l^I_n(\varvec{\theta }+t\mathbf {h})-l^I_n(\varvec{\theta })}{t} =S^{\varvec{\theta }}_n=S^{\varvec{\theta }}_{n,a}+S^{\varvec{\theta }}_{n,b}+S^{\varvec{\theta }}_{n,\eta }, \end{aligned}$$

where

$$\begin{aligned} S^{\varvec{\theta }}_{n,a}=&\frac{1}{n}\sum _{i=1}^n \mathbf {a}^\top {\mathbf { Z}_1 }_i \bigg \{\int _0^{\tau '} \Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) dN_i(u)\\&-\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}d\Lambda (u) \\&+\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big ) -Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\bigg \}\\ S_{n,b}^{\varvec{\theta }}=&\frac{1}{n}\sum _{i=1}^n \mathbf {b}^\top {\mathbf { Z}_2 }_i \bigg [ \int _0^{\tau '} \left\{ 1-\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} dN_i(u)\\&+\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left\{ \Big (1-\phi _i^{\varvec{\theta }}(u)\Big )\Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} -1 \right\} d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }} (\tau ')\Big )\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg ]\\ S_{n,\eta }^{\varvec{\theta }} =&\frac{1}{n}\sum _{i=1}^n \int _0^{\tau '} \left[ \eta (u)-\Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda \right] dN_i(u) \\&+ \int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left[ \Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda -\eta (u) \right] d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big )\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}. \end{aligned}$$

Their expectations are denoted as

$$\begin{aligned} S^{\varvec{\theta }}=S^{\varvec{\theta }}_a+S^{\varvec{\theta }}_b+S^{\varvec{\theta }}_\eta =E\left( S^{\varvec{\theta }}_{n,a}\right) +E\left( S^{\varvec{\theta }}_{n,b}\right) +E\left( S^{\varvec{\theta }}_{n,\eta }\right) . \end{aligned}$$

Again let \(\varvec{\theta }_0\) be the true parameter and \(\varvec{\theta }\) another element in the parameter space. Define \(\triangle \varvec{\theta }=\varvec{\theta }-\varvec{\theta }_0\) with

$$\begin{aligned} \triangle \varvec{\alpha }=\varvec{\alpha }-\varvec{\alpha }_0, \, \triangle \varvec{\beta }=\varvec{\beta }-\varvec{\beta }_0 \text { and } \triangle \Lambda (\cdot )=\Big \{\Lambda (\cdot )-\Lambda _0(\cdot )\Big \}. \end{aligned}$$

Define \(lin \Theta \) to be the linear space spanned by \(\{ \varvec{\theta }-\varvec{\theta }_0 : \varvec{\theta }\text { in the parameter space}\}\). Let \(\varvec{\theta }_t = \varvec{\theta }_0+t\triangle \varvec{\theta }\). The functional Hessian is a linear operator from \(lin \Theta \) to \(l^\infty (H_p)\), defined as

$$\begin{aligned} \dot{S}^{\varvec{\theta }_0}(\triangle \varvec{\theta })(\mathbf {h}) =&\lim _{t\rightarrow 0}\frac{S^{\varvec{\theta }_t }(\mathbf {h})-S^{\varvec{\theta }_0}(\mathbf {h})}{t} \nonumber \\ =&-\triangle \varvec{\alpha }^\top \varvec{\sigma }_a(\mathbf {h}) -\triangle \varvec{\beta }^\top \varvec{\sigma }_b (\mathbf {h}) -\int _0^{\tau '} \sigma _\eta (\mathbf {h})(u)d\triangle \Lambda (u) \end{aligned}$$
(A25)

with \(\sigma \) defined in (A23).

The following Lemma A2 is used in the proofs of Theorems 4 and 5. It establishes the key property of \(\sigma \), the essential element of the functional Hessian.

Lemma A2

Let the operator \(\sigma : (\mathbf {a},\mathbf {b},\eta ) \mapsto \Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) be defined as in (A23). Under the conditions of Theorem 4, \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \).

Proof of Lemma A2

First we prove that \(\sigma \) is an injection by an identifiability argument. Define an inner product between \(\sigma (\mathbf {h})\) and \(\mathbf {h}\) as

$$\begin{aligned} \Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=&\, \mathbf {a}^\top \varvec{\sigma }_a(\mathbf {h})+\mathbf {b}^\top \varvec{\sigma }_b(\mathbf {h})+\int _0^{\tau '}\sigma _\eta (\mathbf {h})(u)\eta (u) d\Lambda _0(u) \\ =&\int _0^{\tau '}E\left[ \big \{K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\big \}^2Y(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] d\Lambda _0(u)\\&+E\left[ \big \{ K^{\varvec{\theta }_0}_2(\mathbf {h})\big \}^2Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \right] . \end{aligned}$$

If \(\Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=0\), we have almost surely \(K^{\varvec{\theta }_0}_2(\mathbf {h})=0\) and \(K^{\varvec{\theta }_0}_1(\mathbf {h})(u)=0\) a.e. \(u \in [0, \tau ']\). Therefore,

$$\begin{aligned} \int _0^t K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u)=0, \quad \forall t\in [0,\tau '], \; a.s. \end{aligned}$$

Calculating the integral, we have, for any \(t\in [0,\tau ']\) a.s.,

$$\begin{aligned} -\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(t)+\mathbf {b}^\top {\mathbf { Z}_2 }\phi ^{\varvec{\theta }_0}(t)\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}+\int _0^t\eta (u)d\Lambda _0(u)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}=0. \end{aligned}$$

Setting \(t=0\), we have \(-\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(0)=0\), so \(\mathbf {a}^\top {\mathbf { Z}_1 }=0\). By Assumption 2, \(\mathbf {a}=0\). Plugging \(\mathbf {a}=0\) into \(K^{\varvec{\theta }_0}_2\) yields

$$\begin{aligned} K^{\varvec{\theta }_0}_2(\mathbf {h})= e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Big \{\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda _0(\tau ')-\int _0^{\tau '}\eta (u)d\Lambda _0(u)\Big \}=0, \quad a.s. \end{aligned}$$

Again, \(\mathbf {b}^\top {\mathbf { Z}_2 }= \int _0^{\tau '}\eta (u)d\Lambda _0(u)/\Lambda _0(\tau ')\) is deterministic, so \(\mathbf {b}=0\). It follows that \(\eta \) must also be identically zero. As a result, \(\sigma (\mathbf {h})=\sigma (\mathbf {h}') \Rightarrow \Big <\sigma (\mathbf {h}-\mathbf {h}'),\mathbf {h}-\mathbf {h}'\Big >=0 \Rightarrow \mathbf {h}=\mathbf {h}'\).

To show it is a bijection, we apply Theorem 3.11 in Conway (1990). It suffices to decompose \(\sigma \) as the sum of one invertible operator and one compact operator. The invertible operator is defined as

$$\begin{aligned} \Sigma (\mathbf {h})=\Big (E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \mathbf {a},E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \mathbf {b}, \eta (t)E\left\{ e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\right\} \Big ). \end{aligned}$$

Since \(E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \), \(E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \) are both positive definite, and \(\inf _{t\in [0,\tau ']}Ee^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)>0\), the inverse exists as

$$\begin{aligned} \Sigma ^{-1}(\mathbf {h})=\Big (\left[ E\big \{{\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \big \}\right] ^{-1}\mathbf {a},\left[ E\big \{{\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \big \}\right] ^{-1} \mathbf {b}, \eta (t)\left[ E\big \{e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\big \}\right] ^{-1}\Big ). \end{aligned}$$

For the compactness of \(\sigma (\mathbf {h})-\Sigma (\mathbf {h})\), the classical Helly selection plus dominated convergence argument applies, as all terms are conveniently bounded. \(\square \)

The Proof of Theorem 4 is an application of Theorem 3.3.1 from Van der Vaart and Wellner (1996). We shall verify all the required conditions of that theorem.

Proof of Theorem 4

Since we now work under the modified Assumption 3’, the martingale representation in (15) needs to change accordingly beyond \(\tau '\); we still use \(M_i(t)\) as the notation. Define the filtration \(\big \{\mathcal {F}_t: t \in [0,\tau ] \big \}\) as follows. On \([0,\tau ']\), \(\mathcal {F}_t\) is the natural \(\sigma \)-algebra generated by \(\{N_i(t), Y_i(t), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\). Since there is no extra information in the tail window \((\tau ', \tau )\), we set \(\mathcal {F}_t =\mathcal {F}_{\tau '}\) for \(t \in (\tau ', \tau )\). \(\mathcal {F}_\tau \) is the \(\sigma \)-algebra generated by \(\{N_i(\tau )-N_i(\tau '), Y_i(\tau ), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\), where \(Y_i(\tau ) = Y_i(\tau ') - dN_i(\tau ')\) is measurable in \(\mathcal {F}_{\tau '}\). The filtration on \([0,\tau ']\) stays the same, so \(M_i(t)\) defined in (15) is still a martingale up to time \(\tau '\). In the tail window \((\tau ', \tau )\), we set \(M_i(t)\) to be constant and equal to \(M_i(\tau ')\). To extend its definition to time \(\tau \), we define

$$\begin{aligned} d M_i(\tau ) = M_i(\tau ) - M_i(\tau ') = \big \{N_i(\tau )-N_i(\tau ')\big \} - Y_i(\tau ) \phi ^{\varvec{\theta }_0}_i(\tau '). \end{aligned}$$
(A26)

It is easy to verify that \(E[ M_i(\tau )| \mathcal {F}_{\tau '}] = M_i(\tau ')\), so \(M_i(t)\) thus defined is a martingale with respect to the new filtration \(\big \{\mathcal {F}_t: t \in [0,\tau '] \cup \{\tau \}\big \}\). Analogously, we define the process \(M^{\varvec{\theta }}_i(\cdot )\), which replaces the true parameter \(\varvec{\theta }_0\) in \(M_i(\cdot )\) by an arbitrary \(\varvec{\theta }\) in the parameter space. Clearly, \(M^{\varvec{\theta }_0}_i(\cdot ) = M_i(\cdot )\). From here, we establish the needed results based on martingale theory.
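The mean-zero property of the tail increment (A26) can be made concrete. In mixture cure models of this type, a subject still at risk just after \(\tau '\) has an event in the tail window exactly when it is susceptible, which given survival to \(\tau '\) has probability \(p\,S_0(\tau ')/\{1-p+p\,S_0(\tau ')\}\), the usual form of \(\phi ^{\varvec{\theta }_0}(\tau ')\). A small simulation sketch checks this (our own toy setup, not the paper’s data: no covariates, no censoring or truncation, susceptible event times uniform on \((0,\tau )\) so that all susceptible events occur by \(\tau \)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, tau, tau_p = 200_000, 0.4, 1.0, 0.8     # tau_p plays the role of tau'

susceptible = rng.random(n) < p
# susceptible subjects: event time uniform on (0, tau), so S0(t) = 1 - t/tau
# and every susceptible subject events by tau; cured subjects never event
T = np.where(susceptible, rng.uniform(0.0, tau, n), np.inf)

at_risk = T > tau_p                        # Y_i just after tau'
tail_event = (T > tau_p) & (T <= tau)      # N_i(tau) - N_i(tau')

phi = p * (1 - tau_p / tau) / (1 - p + p * (1 - tau_p / tau))
emp = tail_event[at_risk].mean()           # empirical P(tail event | at risk)
```

Here `emp` agrees with `phi` up to Monte Carlo error, so the increment \(\{N_i(\tau )-N_i(\tau ')\} - Y_i(\tau )\phi ^{\varvec{\theta }_0}_i(\tau ')\) averages to zero among at-risk subjects.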

First, we prove weak convergence of the empirical score

$$\begin{aligned} \sqrt{n}(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}){\mathop {\longrightarrow }\limits ^{l^\infty (H_p)}} \mathcal {W}. \end{aligned}$$
(A27)

Notice that \(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}\) is a martingale integral with respect to the martingales \(M_i\) extended via (A26). The weak convergence then follows from the martingale central limit theorem. The covariance process is given by the expectation of its quadratic variation:

$$\begin{aligned}&\text {Cov}\big (\mathcal {W}(\mathbf {h}),\mathcal {W}(\mathbf {h}^*)\big )=E\Big [ \int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h}) K^{\varvec{\theta }_0}_1(\mathbf {h}^*) Y(u)\phi _0(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u) \\&\qquad \qquad \quad \quad \quad \quad \quad \quad \quad + K^{\varvec{\theta }_0}_2(\mathbf {h}) K^{\varvec{\theta }_0}_2(\mathbf {h}^*)Y(\tau ')\phi _0(\tau ')\big \{1-\phi _0(\tau ')\big \} \Big ], \end{aligned}$$

where \(K_1\) and \(K_2\) are defined as in (A24).

Next, we verify the approximation condition

$$\begin{aligned} \sqrt{n}\left\{ \left( S_n^{\hat{\varvec{\theta }}}-S^{\hat{\varvec{\theta }}}\right) - \left( S_n^{\varvec{\theta }_0}-S^{\varvec{\theta }_0}\right) \right\} = o_p(1). \end{aligned}$$
(A28)

Consider the class \(\{S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h}): \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \le \varepsilon , \mathbf {h}\in H_p \}\). All terms involved in this class are uniformly bounded with uniformly bounded variation, so it is a Donsker class for the set of observable random variables. By checking that \(\phi _i^{\varvec{\theta }}\) is Lipschitz in \(\varvec{\theta }\) under the \(l^\infty (H_p)\) norm, we have almost surely

$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t)-\phi _i^{\varvec{\theta }_0}(t)| = O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) , \end{aligned}$$

and similarly

$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t) {\Lambda }(t)-\phi _i^{\varvec{\theta }_0}(t)\Lambda _0(t)| =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) . \end{aligned}$$

For a single summand in the score,

$$\begin{aligned} \sup _{h\in H_p}E[S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h})]^2 =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert ^2\right) . \end{aligned}$$

We plug \(\hat{\varvec{\theta }}\) into the expression above. Thus, the variance of the limiting process of (A28) is o(1) by the consistency of \(\hat{\varvec{\theta }}\) from Theorem 3’, so the process itself is \(o_p(1)\).

We then show the Fréchet differentiability of the expected score \(S\) at \(\varvec{\theta }_0\) in the direction of \(\hat{\varvec{\theta }}-\varvec{\theta }_0\),

$$\begin{aligned} S^{\hat{\varvec{\theta }}_t}-S^{\varvec{\theta }_0}=t\dot{S}^{\varvec{\theta }_0} (\hat{\varvec{\theta }}-\varvec{\theta }_0)+o_p(t\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert ). \end{aligned}$$
(A29)

We use a shorthand notation for the expected score at \(\varvec{\theta }\):

$$\begin{aligned} S^{\varvec{\theta }}(\mathbf {h})&= E\left[ \int _0^{\tau '} K_1^{\varvec{\theta }}(\mathbf {h})(u)dM^{\varvec{\theta }}(u) + K_2^{\varvec{\theta }}(\mathbf {h}) d M^{\varvec{\theta }}(\tau )\right] \\&= E\left[ \int _0^{\tau } V^{\varvec{\theta }}(\mathbf {h})(u) d M^{\varvec{\theta }}(u)\right] , \end{aligned}$$

by setting

$$\begin{aligned} V^{\varvec{\theta }}(\mathbf {h})(t) = I(t \le \tau ')K_1^{\varvec{\theta }}(\mathbf {h})(t) + I(t=\tau ) K_2^{\varvec{\theta }}(\mathbf {h}). \end{aligned}$$

By the Lipschitz continuity in \(\varvec{\theta }\) of all terms involved, namely \( K_1^{\varvec{\theta }}(\mathbf {h})\), \(K_2^{\varvec{\theta }}(\mathbf {h})\) and \(dM^{\varvec{\theta }}\),

$$\begin{aligned}&S^{ {\varvec{\theta }}_t}(\mathbf {h})-S^{\varvec{\theta }_0}(\mathbf {h}) \\&\quad = E\left[ \int _0^{\tau '} V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{ {\varvec{\theta }}_t}(u) \right] \\&\quad = E\left[ \int _0^{\tau '} V^{\varvec{\theta }_0} (\mathbf {h})(u)d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \}\right] +E\left[ \int _0^{\tau '}V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{\varvec{\theta }_0}(u)\right] \\&\quad \quad + E\left[ \int _0^{\tau '}\big \{V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)- V^{\varvec{\theta }_0}(\mathbf {h})(u)\big \}d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \} \right] \\&\quad = t\dot{S}^{\varvec{\theta }_0}( {\varvec{\theta }}-\varvec{\theta }_0)(\mathbf {h})+0+O_p(t^2\Vert {\varvec{\theta }}-\varvec{\theta }_0\Vert ^2). \end{aligned}$$

Again, we plug in \(\hat{\varvec{\theta }}\) and use the consistency result to verify condition (A29).

Afterwards, we find the local inverse of the functional Hessian in (A25). We have shown in Lemma A2 that the functional operator \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \). The invertibility of \(\dot{S}^{\varvec{\theta }_0}\) in \(H_p\) follows from the following argument. By the continuous invertibility of \(\sigma \), there is some q so that \(\sigma ^{-1}(H_q) \subseteq H_p\), and

$$\begin{aligned}&\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_p}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ \Vert \triangle \varvec{\theta }\Vert _{l^\infty (H_p)}\ } \nonumber \\&\quad \ge \inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in \sigma ^{-1}(H_q)}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ p\Vert \triangle \varvec{\theta }\Vert } \nonumber \\&\quad =\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_q}| \triangle \varvec{\theta }(\mathbf {h}) |}{ p\Vert \triangle \varvec{\theta }\Vert } > \frac{q }{2p}. \end{aligned}$$
(A30)

Finally, let us put everything together. The NPMLE \(\hat{\varvec{\theta }}\) is shown to be consistent in Theorem 3’, and (A27), (A28), (A29) and (A30) verify the conditions of Theorem 3.3.1 from Van der Vaart and Wellner (1996). \(\square \)

Proof of Theorem 5

The proof for the continuous invertibility of \(\hat{\sigma }\) is similar to the Proof of Lemma A2. The approximation error between the natural estimator \(\hat{\sigma }\) and Louis’ formula variance estimator using (14) again comes from the “ghost copies” like the case in Lemma 1, so the same argument applies to show their asymptotic equivalence. \(\square \)

Appendix B: Variance Estimator

1.1 B.1 Derivatives of log-likelihood

Let \(l^c(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})=\sum _{i=1}^n l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})\) be the complete data log-likelihood,

$$\begin{aligned} l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }}) =&\, (A_i+M_i) \varvec{\alpha }^\top {\mathbf { Z}_1 }_i -(1+M_i)\log (1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})\\&+\delta ^1_i A_i \sum _{k=1}^K I\{X_i=t_k\}(\log \lambda _k +\varvec{\beta }^\top {\mathbf { Z}_2 }_i) - A_i \sum _{k:t_k \le X_i} \lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+M_i\sum _{k:t_k<Q_i} I\{\kappa _i=k\}\Big (\log \lambda _k+\varvec{\beta }^\top {\mathbf { Z}_2 }_i-\sum _{h=1}^k \lambda _h e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$

Its gradient is given by

$$\begin{aligned} \nabla l^c_i=\left( \frac{\partial l^c_i}{\partial \varvec{\alpha }}, \frac{\partial l^c_i}{\partial \varvec{\beta }}, \frac{\partial l^c_i}{\partial {\varvec{\lambda }}}\right) ^\top , \end{aligned}$$

where

$$\begin{aligned} \frac{\partial l^c_i}{\partial \varvec{\alpha }} =&\, {\mathbf { Z}_1 }_i\Big \{A_i+M_i-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}\Big \} = {\mathbf { Z}_1 }_i\big \{A_i-p_i+M_i(1-p_i)\big \}, \\ \frac{\partial l^c_i}{\partial \varvec{\beta }} =&\, {\mathbf { Z}_2 }_i \bigg \{A_i \delta ^1_i +M_i-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \} \\ =&\, {\mathbf { Z}_2 }_i \Big \{A_i \delta ^1_i +M_i-A_i \Lambda _i(X_i) -M_i \Lambda _i(\kappa _i)\Big \}, \\ \frac{\partial l^c_i}{\partial \lambda _k} =&\, \Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k}-\Big (A_i I\{t_k \le X_i\}+M_iI\{k \le \kappa _i\}\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\ =&\, A_i\Big ( \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ) + M_i\Big ( \frac{I\{\kappa _i=k\}}{\lambda _k}- I\{k \le \kappa _i\} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$
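The complete-data log-likelihood and its gradient are straightforward to implement and check numerically. Below is a minimal sketch (variable names and toy values are our own, not the paper’s code; `X_idx` and `kappa` denote the grid index with \(t_k=X_i\) and the index of the ghost-copy event time) that transcribes \(l^c_i\) and \(\nabla l^c_i\) and verifies the analytic gradient against central finite differences:

```python
import numpy as np

def loglik_c(alpha, beta, lam, Z1, Z2, A, M, delta1, X_idx, kappa):
    """Complete-data log-likelihood l^c_i for one subject (illustrative
    sketch).  lam[k-1] is the hazard increment lambda_k at grid time t_k."""
    lin1 = alpha @ Z1
    eb = np.exp(beta @ Z2)
    ll = (A + M) * lin1 - (1 + M) * np.log1p(np.exp(lin1))
    # observed-subject contribution: event term and cumulative hazard
    ll += A * (delta1 * (np.log(lam[X_idx - 1]) + beta @ Z2)
               - lam[:X_idx].sum() * eb)
    # ghost-copy contribution
    ll += M * (np.log(lam[kappa - 1]) + beta @ Z2 - lam[:kappa].sum() * eb)
    return ll

def grad_c(alpha, beta, lam, Z1, Z2, A, M, delta1, X_idx, kappa):
    """Analytic gradient, transcribing the three displayed partials."""
    p = 1.0 / (1.0 + np.exp(-(alpha @ Z1)))            # p_i
    eb = np.exp(beta @ Z2)
    ks = np.arange(1, len(lam) + 1)
    g_a = Z1 * (A - p + M * (1 - p))
    g_b = Z2 * (A * delta1 + M
                - (A * lam[:X_idx].sum() + M * lam[:kappa].sum()) * eb)
    g_l = (A * (delta1 * (ks == X_idx) / lam - (ks <= X_idx) * eb)
           + M * ((ks == kappa) / lam - (ks <= kappa) * eb))
    return np.concatenate([g_a, g_b, g_l])

# finite-difference check at an arbitrary point
Z1, Z2 = np.array([1.0, 0.5]), np.array([0.3, -0.2])
extra = (Z1, Z2, 1, 2, 1, 3, 2)       # A=1, M=2, delta1=1, X_idx=3, kappa=2
theta = np.array([0.2, -0.1, 0.4, 0.1, 0.1, 0.2, 0.15, 0.05])
f = lambda th: loglik_c(th[:2], th[2:4], th[4:], *extra)
eps = 1e-6
num = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
                for e in np.eye(len(theta))])
ana = grad_c(theta[:2], theta[2:4], theta[4:], *extra)
```

Here `num` and `ana` agree to numerical precision, confirming the displayed partial derivatives.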

Its Hessian is given by

$$\begin{aligned} \nabla ^2 l^c_i=\left( \begin{array}{ccc} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } &{} 0 &{} 0 \\ 0 &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top } \\ 0 &{} \left[ {\frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top }} \right] ^\top &{} \text {diag}(\frac{\partial ^2 l^c_i}{\partial \lambda _k^2 }) \end{array}\right) , \end{aligned}$$

where

$$\begin{aligned} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } =&\, {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \Big \{-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{(1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})^2}\Big \} = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+M_i)p_i(1-p_i), \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } =&\, {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg \{-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} =&\, {\mathbf { Z}_2 }_i \bigg \{-\Big (A_i I\{t_k \le X_i\} +M_i I\{t_k \le \kappa _i\} \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } =&-\Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k^2}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\beta }^\top } =&\frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\lambda }^\top } = \frac{\partial ^2 l^c_i}{\partial \lambda _k \partial \lambda _h }=0, \ \ \ \ k\ne h. \end{aligned}$$

1.2 B.2 Conditional expectations

By the conditional expectations (8)–(10), we can calculate the ‘first order’ conditional expectations \(E[\nabla l^c_i|\mathcal {O}]\) and \(E[\nabla ^2 l^c_i|\mathcal {O}]\):

$$\begin{aligned} E&\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] = {\mathbf { Z}_1 }_i\Big \{E(A_i)-p_i+E(M_i)(1-p_i)\Big \}, \\ E&\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] = {\mathbf { Z}_2 }_i \bigg [E(A_i) \Big \{\delta ^1_i+\log S_i(X_i)\Big \} \\&\qquad \qquad \quad +E(M_i) \Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}\bigg ], \\ E&\left[ \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i)\Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \quad +E(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}. \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } \right] = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+E(M_i))p_i(1-p_i), \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } \right] = {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \Big \{ E(A_i) \log S_i(X_i) +E(M_i) \sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} \right] = -{\mathbf { Z}_2 }_i \Big \{E(A_i) I\{t_k \le X_i\} +E(M_i) P(\tilde{T}_{ij} \ge t_k) \Big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } \right] = -\Big \{E(A_i) \delta ^1_i I\{X_i=t_k\}+E(M_i) P(\tilde{T}_{ij}=t_k)\Big \}\frac{1}{\lambda _k^2}. \end{aligned}$$

To calculate the ‘second order’ expectation \(E[\nabla l^c_i{\nabla l^c_i}^\top |\mathcal {O}]\), we first compute the conditional variances:

$$\begin{aligned} \text {Var}&[A_i|\mathcal {O}] = \delta ^c_i\frac{p_i(1-p_i)S_i(X_i)}{\big \{1-p_i+p_iS_i(X_i)\big \}^2}, \\ \text {Var}&[M_i|\mathcal {O}] = \frac{p_i\big \{1-S_i(Q_i)\big \}}{\big \{1-p_i+p_iS_i(Q_i)\big \}^2}. \end{aligned}$$
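These closed forms can be sanity-checked directly. \(\text {Var}[M_i|\mathcal {O}]\) equals \(r/(1-r)^2\) with \(r=p_i\{1-S_i(Q_i)\}\), the variance of a geometric count of ghost copies on \(\{0,1,2,\dots \}\) with failure probability \(r\); and for a censored subject \(\text {Var}[A_i|\mathcal {O}]\) is the Bernoulli variance \(q(1-q)\) with posterior susceptibility \(q=p_iS_i(X_i)/\{1-p_i+p_iS_i(X_i)\}\). A quick numerical check of both identities (values of \(p_i\), \(S_i(X_i)\), \(S_i(Q_i)\) are arbitrary):

```python
import numpy as np

p, S_X, S_Q = 0.35, 0.6, 0.7      # arbitrary p_i, S_i(X_i), S_i(Q_i)

# Var[A_i | O] for a censored subject: Bernoulli with posterior success
# probability q = p*S_X / (1 - p + p*S_X)
q = p * S_X / (1 - p + p * S_X)
var_A_bern = q * (1 - q)
var_A_formula = p * (1 - p) * S_X / (1 - p + p * S_X) ** 2

# Var[M_i | O]: geometric on {0,1,...} with failure probability r
r = p * (1 - S_Q)
m = np.arange(2000)                       # r^2000 is negligible, sum is exact
pm = (1 - r) * r ** m                     # P(M_i = m | O)
var_M_enum = (m ** 2 * pm).sum() - ((m * pm).sum()) ** 2
var_M_formula = r / (1 - r) ** 2          # = p*(1-S_Q)/(1-p+p*S_Q)**2
```

Both pairs agree to machine precision, matching the displayed variances.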

Then,

$$\begin{aligned}&E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\alpha }}}^\top \right] = E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \big \{(1-p_i)^2 \text {Var}(M_i)+ \text {Var}(A_i)\big \},\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)(1-p_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} {\frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\Big \{\delta ^1_i+\log S_i(X_i)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad +E(M_i)\Big \{\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\big \{\log S_i(t_k)\big \}^2\\&\qquad \qquad \qquad \qquad \qquad -\Big (\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big )^2\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad + {\mathbf { Z}_1 }_i \bigg [\text {Var}(A_i) \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad +\text {Var}(M_i)(1-p_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad +{\mathbf { Z}_2 }_i \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad + \text {Var}(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{1+\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\\&\qquad \qquad \qquad \qquad \quad - E(M_i)\Big \{\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -\frac{P(\tilde{T}_{ij}=t_k)\log S_i(t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -P\{\tilde{T}_{ij} \ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\\&\qquad \qquad \qquad \qquad \quad +e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\sum _{h\ge k:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\bigg ], \\ \end{aligned}$$
$$\begin{aligned}&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _h} \right] =E(A_i) \left\{ -\frac{\delta ^1_i I\{X_i =t_{k\vee h}\}}{\lambda _{k\vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+ I\{X_i\ge t_{k\vee h}\}e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_h\}}{\lambda _h}- I\{X_i\ge t_h\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ -\frac{P(\tilde{T}_{ij} =t_{k\vee h})}{\lambda _{k \vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} +P(\kappa _i\ge t_{k \vee h})e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} ,\\&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i) \left\{ \frac{\delta ^1_i I\{X_i =t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} ^2\\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} ^2\\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda ^2_k}- 2\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right. \\&\qquad \qquad \qquad \qquad \quad \left. +P(\tilde{T}_{ij} \ge t_k)e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} . \end{aligned}$$
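Each of the ‘second order’ displays above follows the same bookkeeping (our summary, not part of the original derivation): for components \(U\) and \(V\) of \(\nabla l^c_i\),

$$\begin{aligned} E[UV\mid \mathcal {O}] = E[U\mid \mathcal {O}]\,E[V\mid \mathcal {O}]+\text {Cov}(U,V\mid \mathcal {O}), \end{aligned}$$

and since each component is linear in the latent quantities \(A_i\) and \(M_i\), the covariance term reduces to expressions in \(\text {Var}(A_i)\), \(\text {Var}(M_i)\) and \(E[M_i^2-M_i]\), which is why only those conditional moments are needed beyond the ‘first order’ expectations.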

Cite this article

Hou, J., Chambers, C.D. & Xu, R. A nonparametric maximum likelihood approach for survival data with observed cured subjects, left truncation and right-censoring. Lifetime Data Anal 24, 612–651 (2018). https://doi.org/10.1007/s10985-017-9415-2
