Appendix A: Proofs
A.1 The existence of NPMLE
Proof of Theorem 1
Let \(\theta _B\) be the maximizer of the log-likelihood over the complement of the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B\}\). We show that \(l(\theta _B) \rightarrow -\infty \) as \(B \rightarrow \infty \).
By Assumptions 1 and 2, we have the bound (17).
All terms in the log-likelihood are bounded except for
$$\begin{aligned} \sum _{i=1}^{n}\Big \{\delta ^1_i\log \lambda (X_i)-\delta ^1_i e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \Lambda (X_i)\Big \}. \end{aligned}$$
Let \(\lambda _{\max }\) be the largest element in \({\varvec{\lambda }}\). The expression above has the upper bound
$$\begin{aligned} \log ( \lambda _{\max }/m)- \lambda _{\max }/m-K\log m, \end{aligned}$$
which diverges to \(-\infty \) as \(B \rightarrow \infty \).
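Though not part of the proof, the divergence of this scalar bound is easy to check numerically; the values \(m=2\) and \(K=5\) below are hypothetical stand-ins for the constant in (17) and the number of distinct event times.

```python
import math

def upper_bound(lam_max, m=2.0, K=5):
    """The bound log(lam_max/m) - lam_max/m - K*log(m) from the proof."""
    return math.log(lam_max / m) - lam_max / m - K * math.log(m)

# As lam_max (and hence B) grows, the bound decreases without limit,
# forcing the log-likelihood on the complement to -infinity.
values = [upper_bound(10.0 ** p) for p in range(1, 6)]
assert all(later < earlier for earlier, later in zip(values, values[1:]))
assert values[-1] < -1e4
```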
Then, the global maximizer must lie in the compact set \(\{\Vert \varvec{\alpha }\Vert \vee \Vert \varvec{\beta }\Vert \vee \Vert {\varvec{\lambda }}\Vert \le B^*\}\) for some \(B^*>0\). \(\square \)
Let \(W_i^{\varvec{\theta }}(t)\) be defined as in (21). We state a generic inequality, to be referenced later, which holds for any \(\varvec{\theta }= (\varvec{\alpha },\varvec{\beta }, \Lambda )\) in the parameter space whose baseline cumulative hazard \(\Lambda \) is a step function jumping only at the observed event times \(t_1, \ldots , t_K\):
$$\begin{aligned} 0 < d\Lambda (t_k) \le \left( \sum _{j=1}^nW_j^{\varvec{\theta }} (t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\right) ^{-1}d\bar{N}(t_k), \quad k = 1, \ldots , K. \end{aligned}$$
(A1)
The conclusion of the following Lemma is used in the proofs of both Lemma 1 and Theorem 3.
Lemma A1
Let \(\varvec{\theta }_{(n)} = \left( \varvec{\alpha }_{(n)}, \varvec{\beta }_{(n)}, \Lambda _{(n)}\right) \) be a sequence in the parameter space where \(\Lambda _{(n)}\) is a non-decreasing step function with jumps only at the observed event times. Suppose that \(\varvec{\theta }_{(n)}\) satisfies (A1) and has a subsequence \(\varvec{\theta }_{(n_k)}\) converging to a limiting point \({\varvec{\theta }}^* = (\varvec{\alpha }^*, \varvec{\beta }^*, \Lambda ^*)\) a.s.:
$$\begin{aligned} \varvec{\alpha }_{(n_k)}-\varvec{\alpha }^* \rightarrow 0, \quad \varvec{\beta }_{(n_k)}-\varvec{\beta }^* \rightarrow 0, \quad \sup _{t\in [0,\tau ]}|e^{-\Lambda _{(n_k)}(t)}-e^{-\Lambda ^*(t)}| \rightarrow 0, \quad a.s..\qquad \end{aligned}$$
(A2)
Under Assumptions 1–4,
a) \(\Lambda ^*(t)< \infty \text { for all } t<\tau \);

b) \(\inf _{t\in [0,\zeta ]}E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]>C_w, \text { for some } C_w>0\).
Proof of Lemma A1
By checking the uniform continuity of \(W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) in \((\varvec{\alpha },\varvec{\beta },e^{-\Lambda (t)})\), we may establish
$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| W_i^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }_i}- W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\right| \rightarrow 0, \quad a.s.. \end{aligned}$$
\(W_i^{\varvec{\theta }}(t)\), as a function of the observed random variables, belongs to a Glivenko-Cantelli class of uniformly bounded functions with uniformly bounded variation. Thus, the pointwise convergence can be strengthened to uniform convergence,
$$\begin{aligned} \sup _{t \in [0,\tau ]} \left| \frac{1}{n_k}\sum _{i=1}^{n_k} W_i^{\varvec{\theta }_{(n_k)}}(t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i} -E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \right| {\mathop {\longrightarrow }\limits ^{a.s.}}0. \end{aligned}$$
Note that \(n_k^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }_{(n_k)}} (t)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}\) is càglàd, so its limit \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} ]\) must also be càglàd.
a) Let \(\tau ^*=\inf \{t\in [0,\tau ]: e^{-\Lambda ^*(t)}=0\}\), with the convention \(\inf \emptyset = \tau \). We shall prove that \(\tau ^*=\tau \).
Suppose that \(\tau ^*\) is an interior point of \([0,\tau ]\). From Assumption 4, \(d\Lambda _0([s,t]) = \Lambda _0(t) -\Lambda _0(s) >0\) for any \(s<t\) in \([0,\tau ]\). By the definition of \(\tau ^*\), \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [\tau ^*,\tau ]\), so we have
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\tau ^*)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$
By the left continuity of \(W_i^{\varvec{\theta }}(t)\), \(\exists \ s < \tau ^*\), s.t.
$$\begin{aligned} \inf _{t\in [s,\tau ^*]}E\left[ W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \ge \frac{1}{2}E\left[ \int _{\tau ^*_-}^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] . \end{aligned}$$
The total increment of \(\Lambda _{(n_k)}\) in \([s,\tau ^*]\) must be bounded almost surely according to (A1). By the definition of \(\tau ^*\), \(\Lambda ^*(s)<\infty \). Putting these together, we reach the contradiction,
$$\begin{aligned} \Lambda ^*(\tau ^*) \le \varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(\tau ^*) \le&\varlimsup _{k \rightarrow \infty }\Lambda _{(n_k)}(s)+ \int _{s_+}^{\tau ^*} \frac{d \bar{N}(u)}{\sum _{i=1}^{n_k}W^{\varvec{\theta }_{(n_k)}}_i(u)e^{\varvec{\beta }_{(n_k)}^\top {\mathbf { Z}_2 }_i}} \\ \le&\Lambda ^*(s)+ \frac{\tau ^*-s}{\inf _{t\in [s,\tau ^*]}E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}<\infty . \end{aligned}$$
The other case is \(\tau ^* = 0\). Then, \(\Lambda ^*(t)=\infty \) and \(\phi ^{\varvec{\theta }^*}(t) = 0\) for \(t \in [0,\tau ]\). The contradiction is easily established as
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(0)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =E\left[ \int _0^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} dN(u) \right] >0. \end{aligned}$$
b) Since \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is càglàd, and \(\varvec{\theta }_{(n_k)}\) satisfies (A1) and converges uniformly to \(\varvec{\theta }^*\), it can be seen that \(E[W^{\varvec{\theta }^*}(t) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}] \ge 0\) over the interior of \([0,\zeta ]\).
Write \(n_k^{-1}\sum _{i=1}^{n_k}W_i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\) as
$$\begin{aligned}&n_k^{-1}\sum _{i=1}^{n_k} \int _{t-}^\tau \big \{1-\phi _i^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad +\int _t^\tau Y_i(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} d \phi _i^{\varvec{\theta }}(u)+Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \nonumber \\&=n_k^{-1}\sum _{i=1}^{n_k} \int _{t+}^\tau \left[ 1-\phi _i^{\varvec{\theta }}(u) -\frac{\sum _{j=1}^{n_k}Y_j(u)\phi _j^{\varvec{\theta }}(u)\big \{1-\phi _j^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}}\right] e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(u)\nonumber \\&\quad + \big \{1-\phi _i^{\varvec{\theta }}(t)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}dN_i(t)+ Y_i(t)\phi _i^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} . \end{aligned}$$
(A3)
By Assumption 4, all \(Q_i < \zeta \) a.s.. Thus,
$$\begin{aligned} E\left[ W^{\varvec{\theta }^*}(\zeta ) e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] =\,&E\left[ \big \{\delta ^1+\delta ^c \phi ^{\varvec{\theta }^*}(X)\big \} I\{\zeta \le X\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \right] \\ \ge \,&\,E\left[ \int _\zeta ^\tau e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u) \right] >0. \end{aligned}$$
For \(t<\zeta \), the difference \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]-E[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is the limit of an integral like that in (A3), where the integrand has \(\sum _{j=1}^{n_k}W_j^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_j}\) in the denominator. So it has potential singularities at the zeros of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) for \( u \in [t,\zeta ]\). We shall show that \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is differentiable with respect to \(d\Lambda _0(u)\) in \([0,\zeta ]\), so that its zero \(u_0\) leads to the divergent form \( - \int _t ^\zeta |u-u_0|^{-1} du. \) We will then reach the contradiction that \(E[W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]=-\infty \), as seen below.
Let \(R_0\) denote the set of zeros, and of right-hand limits of zeros, of \(E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\). Let \(R_{\triangle u}\) be the \(\triangle u\)-neighborhood of \(R_0\) and \(\Omega ^t_{\triangle u}=[t,\zeta ] \setminus R_{\triangle u}\). Then \(E[W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]\) is bounded away from zero on \(\Omega ^t_{\triangle u}\). Through (A3),
$$\begin{aligned} E&\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] -E\left[ W^{\varvec{\theta }^*}(\zeta )e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \nonumber \\&\le -\int _{\Omega ^t_{\triangle u}}\frac{E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\big \{1-\phi ^{\varvec{\theta }^*}(u)\big \} e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] }{E[ W^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}]}E\left[ e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(u)\right] \nonumber \\&\quad + E\left[ \int _{t+}^\zeta \{1-\phi ^{\varvec{\theta }^*}(u)\}dN(u) + \big \{1-\phi ^{\varvec{\theta }^*}(t)\big \}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}dN(t)+ Y(t)\phi ^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] . \end{aligned}$$
(A4)
From part a), \(e^{-\Lambda ^*(\zeta )}>0\). For any \( u<\zeta \),
$$\begin{aligned} \phi ^{\varvec{\theta }^*}_i(u) \ge \phi ^{\varvec{\theta }^*}_i(\zeta ) \ge \frac{m^{-1}e^{-m\Lambda ^*(\zeta )}}{1+m^{-1}e^{-m\Lambda ^*(\zeta )}}>0. \end{aligned}$$
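This lower bound can also be checked numerically. Under Assumptions 1 and 2, \(\phi \) has the logistic form \(re^{-c\Lambda }/(1+re^{-c\Lambda })\) with odds \(r = p/(1-p) \ge m^{-1}\) and effect \(c = e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }} \le m\); since \(x \mapsto x/(1+x)\) is increasing, replacing \(r\) by \(m^{-1}\) and \(c\) by \(m\) can only decrease the value. A sketch with a hypothetical \(m\):

```python
import itertools
import math

m = 3.0  # hypothetical constant from (17)

def phi(r, c, lam):
    """Logistic-form weight r*exp(-c*lam) / (1 + r*exp(-c*lam))."""
    x = r * math.exp(-c * lam)
    return x / (1.0 + x)

def lower_bound(lam):
    """The display's bound m^{-1}e^{-m*lam} / (1 + m^{-1}e^{-m*lam})."""
    return phi(1.0 / m, m, lam)

# phi dominates the bound for every admissible odds r >= 1/m, effect c <= m.
for r, c, lam in itertools.product([1.0 / m, 1.0, m], [0.5, 1.0, m], [0.0, 0.7, 2.5]):
    assert phi(r, c, lam) >= lower_bound(lam) > 0.0
```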
So the numerator term \(E\left[ Y(u)\phi ^{\varvec{\theta }^*}(u)\{1-\phi ^{\varvec{\theta }^*}(u)\}e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] \) is bounded away from zero in the limit. Moreover, for all \(u \in [0,\zeta ]\),
$$\begin{aligned} \left| \frac{dEW^{\varvec{\theta }^*}(u)}{d\Lambda _0(u)}\right| =&\left| E\left[ \big \{1-\phi ^{\varvec{\theta }^*}(u)\big \}Y(u)\phi ^{\varvec{\theta }_0}(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} -\phi ^{\varvec{\theta }^*}(u)\frac{dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]}{d\Lambda _0(u)}\right] \right| \\ \le&m+\mathcal {L} <\infty . \end{aligned}$$
The first term in (A4) diverges to \(-\infty \) when \(\triangle u \rightarrow 0\). The other terms are bounded, so this is the desired contradiction. \(\square \)
Proof of Lemma 1
a) Define the marginal of the complete data likelihood
$$\begin{aligned} \tilde{L}(\varvec{\theta })=&\sum _{A_i=0,1}\sum _{M_i=0}^\infty \sum _{\widetilde{T}_{i1}=t_k: t_k\le Q_i}\dots \sum _{\widetilde{T}_{iM_i}=t_k: t_k\le Q_i} L^c_i(\varvec{\theta }) \\ =&\prod _{i=1}^n\frac{\big \{p_i\lambda _i(X_i)S_i(X_i)\big \}^{\delta ^1_i} (1-p_i)^{\delta ^0_i}\big \{p_iS_i(X_i)+1-p_i\big \}^{\delta ^c_i}}{1-p_i\sum _{k: t_k\le Q_i}\lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)}. \end{aligned}$$
From (5), the complete data likelihood \(L^c(\varvec{\theta })\) decomposes into the product of a logistic part and a Cox part. Assumptions 1–3 contain the regularity conditions for these two parts. The event rate \(P(A_i=1)\) is bounded away from both zero and one,
$$\begin{aligned} 0<\frac{m^{-1}}{m^{-1}+1} \le P(A_i=1) \le \frac{m}{m+1} < 1. \end{aligned}$$
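As a numerical aside (with a hypothetical value of \(m\)), these two-sided bounds follow from the monotonicity of \(x \mapsto x/(1+x)\) once Assumption 1 confines the odds \(e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }}\) to \([m^{-1}, m]\):

```python
m = 4.0  # hypothetical bound on the odds from Assumption 1

def event_rate(odds):
    """P(A=1) = odds / (1 + odds) under the logistic incidence model."""
    return odds / (1.0 + odds)

lo, hi = (1.0 / m) / (1.0 / m + 1.0), m / (m + 1.0)
# Every admissible odds value keeps the event rate inside [lo, hi], a
# compact subset of (0, 1), matching the display above.
for odds in [1.0 / m, 0.5, 1.0, 2.0, m]:
    assert 0.0 < lo <= event_rate(odds) <= hi < 1.0
```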
The average at-risk process \(E[Y_i(t)]\) is bounded away from zero almost surely. The matrices \({\mathbf { Z}_1 }\) and \({\mathbf { Z}_2 }\) are almost surely of full rank, as \(\text {Var}({\mathbf { Z}_1 })\) and \(\text {Var}({\mathbf { Z}_2 })\) are positive definite. Under these conditions, both parts of the likelihood are concave in the associated sets of parameters, \(\varvec{\alpha }\) and \((\varvec{\beta },{\varvec{\lambda }})\), respectively. Thus, \(L^c(\varvec{\theta })\) is almost surely concave in \(\varvec{\theta }\), and \(\tilde{L}(\varvec{\theta })\) is also concave as a sum of concave functions. The almost sure convergence of the EM algorithm is then guaranteed by the almost sure concavity of this marginal of the complete data likelihood (Dempster et al. 1977).

b) To prove the second result, we take the following strategy. For any \(\varvec{\theta }\), denote \(\lambda _{\max ,\zeta }=\max \{\lambda _k: t_k \le \zeta \}\), where \(\zeta \) is the upper bound of the truncation time defined in Assumption 4. Define a set in the parameter space:
$$\begin{aligned} {\Theta } = \left\{ \varvec{\theta }=(\varvec{\alpha },\varvec{\beta },\Lambda ) | \lambda _{\max ,\zeta } \le n^{-1}2/C_w\right\} , \end{aligned}$$
(A5)
with \(C_w\) defined in Lemma A1. We would like to show that
$$\begin{aligned} \lim _{n\rightarrow \infty }P(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }) = 1. \end{aligned}$$
(A6)
This is done by applying Lemma A1, so we need to verify condition (A1) for \(\tilde{\varvec{\theta }}\) and \(\hat{\varvec{\theta }}\). For \(\tilde{\varvec{\theta }}\), we rely on the convergence of the EM algorithm obtained in part a).
First, we show that the EM algorithm finds the unique stationary point of \(\tilde{L}(\varvec{\theta })\), which then must be the global maximizer since \(\tilde{L}\) is concave by the proof of part a). Consider the conditional expectation given the observed data as in (8)–(10). It can be verified directly (we skip the algebraic details here) that:
$$\begin{aligned} \nabla \log \tilde{L}(\varvec{\theta })=E_{\varvec{\theta }}[ \nabla \log L^c(\varvec{\theta }) |\mathcal {O}]. \end{aligned}$$
The estimator \(\tilde{\varvec{\theta }}\) is by definition the solution to the left-hand side of the above being zero, hence also the stationary point of \(\tilde{L}(\varvec{\theta })\).
We write down the stationary equation \(\varvec{\theta }^{(l)}=\varvec{\theta }^{(l+1)} = \tilde{\varvec{\theta }}\) for \(\tilde{\lambda }_k\)’s at convergence,
$$\begin{aligned} \tilde{\lambda }_k=\frac{1+\tilde{\lambda }_k\sum _{i=1}^n\frac{\tilde{p}_ie^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i} \tilde{S}_i(t_k)I(Q_i \ge t_k)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)}}{\sum _{i=1}^n\left\{ \delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k) +\sum _{j\ge k}\frac{\tilde{p}_i \tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \right\} e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}}, \end{aligned}$$
(A7)
where \(f_i\) was previously defined just above (6). Combining \(\tilde{\lambda }_k\) terms leads to
$$\begin{aligned} \tilde{\lambda }_k^{-1}=\sum _{i=1}^n&\bigg \{\delta ^1_i I(X_i \ge t_k)+\delta ^c_i\phi ^{\tilde{\varvec{\theta }}}_i(X_i)I(X_i \ge t_k)\nonumber \\&-\tilde{p}_i \frac{\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j)}{1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h)} \bigg \} e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}. \end{aligned}$$
(A8)
By the mean value theorem,
$$\begin{aligned} 0 \le e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\le \frac{1}{2}\left( \lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) ^2 e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}} \le \frac{1}{2}m^2\lambda _k^2e^{\lambda _km}, \end{aligned}$$
(A9)
where \( m\) is defined in (17). Applying (A9) to the denominator in (A8), we get
$$\begin{aligned} 1-\tilde{p}_i\sum _{h:h<Q_i}\tilde{f}_i(t_h) \ge 1-\tilde{p}_i\{1-\tilde{S}_i(Q_i)\}. \end{aligned}$$
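Inequality (A9) itself is straightforward to verify numerically (a sanity check with a hypothetical \(m\), not part of the argument): for \(x = \lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \ge 0\), the remainder \(e^x-1-x\) is non-negative and dominated by \(\frac{1}{2}x^2e^x\), and hence by the uniform bound \(\frac{1}{2}m^2\lambda _k^2e^{\lambda _km}\):

```python
import math

m = 2.5  # hypothetical bound from (17): e^{beta^T Z_2} <= m

def gap(x):
    """e^x - 1 - x, the remainder bounded in (A9)."""
    return math.exp(x) - 1.0 - x

# Verify 0 <= e^x - 1 - x <= x^2 e^x / 2 on a grid of x = lam * e^{beta^T Z_2}.
for lam in [0.0, 0.01, 0.3, 1.0]:
    for effect in [1.0 / m, 1.0, m]:
        x = lam * effect
        assert 0.0 <= gap(x) <= 0.5 * x**2 * math.exp(x)
        # ... and the final uniform bound, using effect <= m:
        assert gap(x) <= 0.5 * m**2 * lam**2 * math.exp(lam * m)
```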
By a similar argument, we have almost surely
$$\begin{aligned}&\tilde{S}_i(t_k)I(Q_i \ge t_k)-\sum _{j\ge k}\tilde{f}_i(t_j)I(Q_i \ge t_j) \\&\quad = \tilde{S}_i(Q_i)I(Q_i \ge t_k)+\sum _{j\ge k}\left\{ 1-e^{-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}}-\tilde{\lambda }_j e^{\tilde{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}\right\} \tilde{S}_i(t_j)I(Q_i > t_j)\\&\quad \le \tilde{S}_i(Q_i)I(Q_i \ge t_k). \end{aligned}$$
Then, \(\tilde{\varvec{\theta }}\) satisfies (A1).
For \(\hat{\varvec{\theta }}\), it must satisfy the score equation for \(\lambda _k\)’s:
$$\begin{aligned} \frac{\partial l(\varvec{\theta })}{\partial \lambda _k} = \sum _{i=1}^n \left\{ \frac{d N_i(t_k)}{\lambda _k} -W^{\varvec{\theta }}_i(t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} = 0, \quad \forall k=1,\ldots ,K. \end{aligned}$$
This is the equation version of (A1) after rearrangement.
Now let \(\hat{\lambda }_{\max ,\zeta }\) and \(\tilde{\lambda }_{\max ,\zeta }\) be the largest jumps of \(\hat{\Lambda }\) and \(\tilde{\Lambda }\) on \([0,\zeta ]\), respectively. By Lemma A1 part b), we have
$$\begin{aligned} \limsup _{n\rightarrow \infty }n\hat{\lambda }_{\max ,\zeta } \le C_w^{-1}, \quad \limsup _{n\rightarrow \infty }n\tilde{\lambda }_{\max ,\zeta } \le C_w^{-1}, a.s.. \end{aligned}$$
Hence (A6) is established.
In the set \(\Theta \), we evaluate the discrepancy between \(\log \tilde{L}(\varvec{\theta })\) and \(\log L (\varvec{\theta })\), which can be bounded as follows:
$$\begin{aligned} 1-S_i(Q_i)-\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k) =\sum _{k:t_k<Q_i} S_i(t_k) \left( e^{\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}}-1-\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right) . \end{aligned}$$
(A10)
Applying (A9) to \(|\log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })|\), we have the bound
$$\begin{aligned}&\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \\&\quad \le \sum _{i=1}^n\left| \log \left\{ 1-p_i+p_iS_i(Q_i)\right\} -\log \left\{ 1-p_i\sum _{k:t_k<Q_i}\lambda _ke^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}S_i(t_k)\right\} \right| \\&\quad \le \sum _{i=1}^n\left| \frac{p_i}{1-p_i}\frac{n}{2}m^2\lambda _{\max ,\zeta }^2e^{\lambda _{\max ,\zeta }m}\right| \le \frac{1}{2}n^2 e^{m\lambda _{\max ,\zeta }} m^3\lambda _{\max ,\zeta }^2. \end{aligned}$$
Using the upper bound for \(\lambda _{\max ,\zeta }\) in \(\Theta \), we can bound
$$\begin{aligned} \sup _{\varvec{\theta }\in \Theta }\left| \log L(\varvec{\theta })-\log \tilde{L}(\varvec{\theta })\right| \le e^{\frac{2m}{C_w}}\frac{2m^3}{C_w^2}. \end{aligned}$$
(A11)
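The constant in (A11) can be double checked numerically (with hypothetical values of \(m\) and \(C_w\); this is arithmetic verification, not part of the proof): substituting \(\lambda _{\max ,\zeta } \le 2/(nC_w)\) into \(\frac{1}{2}n^2 e^{m\lambda } m^3\lambda ^2\) gives \((2m^3/C_w^2)e^{2m/(nC_w)}\), which is at most \((2m^3/C_w^2)e^{2m/C_w}\) for every \(n \ge 1\).

```python
import math

def raw_bound(n, lam, m):
    """The bound (1/2) n^2 e^{m*lam} m^3 lam^2 from the previous display."""
    return 0.5 * n**2 * math.exp(m * lam) * m**3 * lam**2

def theta_bound(m, C_w):
    """The uniform bound e^{2m/C_w} * 2 m^3 / C_w^2 in (A11)."""
    return math.exp(2.0 * m / C_w) * 2.0 * m**3 / C_w**2

# With lam at its largest admissible value 2/(n*C_w) in Theta, the raw
# bound never exceeds the uniform bound, whatever the sample size n.
for m, C_w in [(1.5, 0.5), (3.0, 1.0), (5.0, 0.2)]:
    for n in [1, 10, 1000]:
        lam = 2.0 / (n * C_w)
        assert raw_bound(n, lam, m) <= theta_bound(m, C_w) * (1.0 + 1e-9)
```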
In summary, whenever \(\hat{\varvec{\theta }}, \tilde{\varvec{\theta }}\in {\Theta }\), we have
$$\begin{aligned} 0 \le \log L(\hat{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) \le \log L(\hat{\varvec{\theta }})-\log \tilde{L}(\hat{\varvec{\theta }}) + \log \tilde{L}(\tilde{\varvec{\theta }}) - \log L(\tilde{\varvec{\theta }}) <e^{\frac{2m}{C_w}}\frac{4m^3}{C_w^2}.\nonumber \\ \end{aligned}$$
(A12)
Combining (A12) and (A6) completes the proof. \(\square \)
Proof of Theorems 2 and 2’
From Lemma 1, we only need to establish the following two facts: (1) \( E[l_1(\varvec{\theta })]\) exists with a unique maximum, and (2) it is locally invertible at the maximum. Fact (1) is verified through the proof of Theorem 3, and fact (2) through the proof of Theorem 4. \(\square \)
A.2 Consistency of NPMLE
Proof of Theorem 3
The constants \(m\), c, \(\varepsilon \) and \(\mathcal {L}\) are defined in (17), (18) and (19).
First, we show that the “bridge” \(\bar{\Lambda }\) defined in (22) converges to the true \(\Lambda _0\) in the following sense:
$$\begin{aligned} \sup _{t\in [0,\tau ]}\left| e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}\right| \rightarrow 0, a.s. \end{aligned}$$
(A13)
as \(n\rightarrow \infty \). For all \(t \in (0,\tau )\), we have the bound
$$\begin{aligned} m\ge \frac{E\left[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] }{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$
(A14)
For any rational \(\tau ^*<\tau \), \(E[ Y(t)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} ]\) is bounded away from zero over \([0, \tau ^*]\). The uniform convergence of \(\bar{\Lambda }\) to \(\Lambda _0\) over any such \([0, \tau ^*]\) can be obtained as in Murphy (1994). To extend the result to (A13), we use a trick described in (A15)–(A18). By Assumption 3, \(\Lambda _0\) is non-decreasing and diverges to \(\infty \) at \(\tau \). Therefore,
$$\begin{aligned} \forall \epsilon >0, \, \exists \tau ^* \in (0,\tau ) \cap \mathbb {Q}, \, s.t. \, e^{-\Lambda _0(\tau ^*)}<\epsilon /3. \end{aligned}$$
(A15)
Through Rao’s law of large numbers and a Helly-Bray argument, we have
$$\begin{aligned} \sup _{t\in [0,\tau ^*]}|\bar{\Lambda }(t)-\Lambda _0(t)| \rightarrow 0, \quad a.s. . \end{aligned}$$
(A16)
By continuity of the exponential function,
$$\begin{aligned} \exists N, \, \forall n>N, \, \sup _{t\in [0,\tau ^*]}|e^{-\bar{\Lambda }(t)} -e^{-\Lambda _0(t)}|<\epsilon /3. \end{aligned}$$
(A17)
Then,
$$\begin{aligned} \forall n>N, \, \sup _{t\in [\tau ^*,\tau ]}|e^{-\bar{\Lambda }(t)}-e^{-\Lambda _0(t)}| \le 2e^{-\Lambda _0(\tau ^*)}+|e^{-\bar{\Lambda }(\tau ^*)}-e^{-\Lambda _0(\tau ^*)}| <\epsilon .\qquad \end{aligned}$$
(A18)
Therefore, we have proved (A13).
Next, we evaluate the difference between the limits of \(\hat{\Lambda }\) and \(\bar{\Lambda }\). By Assumption 1 and \(e^{-\hat{\Lambda }(t)} \in [0,1]\), \((\hat{\varvec{\alpha }},\hat{\varvec{\beta }},e^{-\hat{\Lambda }(t)})\) is bounded. \(\hat{\Lambda }(t)\) is càdlàg, and so is \(e^{-\hat{\Lambda }(t)}\). By Helly’s selection theorem, there is a subsequence converging uniformly almost surely to some \(\varvec{\theta }^*=(\varvec{\alpha }^*, \varvec{\beta }^*, e^{-\Lambda ^*})\). Lemma A1 part b) gives the bound for \(E\{ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \}\) over \([0,\zeta ]\). We only need to find its bound on \([\zeta ,\tau ]\) in order to mimic the proof of Lemma 1 of Murphy (1994). Note that
$$\begin{aligned} E\left[ W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \right] =&E\left[ \int _{t-}^\tau \big \{1-\phi ^{\varvec{\theta }}(u)\big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dN(u) \right] \\&-E\left[ \int _t^\tau \phi ^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}dE[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }] \right] . \end{aligned}$$
By Assumption 4, \(P(Q_i \le \zeta )=1\), so \(E[Y(u)|{\mathbf { Z}_1 },{\mathbf { Z}_2 }]\) is decreasing on \([\zeta ,\tau ]\). Along with the Lipschitz continuity, we have for all \(t \in [\zeta ,\tau )\)
$$\begin{aligned} M\mathcal {L} \ge \frac{E[W^{\varvec{\theta }}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}]}{E\left[ \log \left\{ 1+\exp \left( \varvec{\alpha }_0^\top {\mathbf { Z}_1 }-\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}\right) \right\} \right] } \ge \frac{\varepsilon }{m^2+m}. \end{aligned}$$
Therefore, \(\gamma (t)=\frac{E\left[ W^{\varvec{\theta }_0}(t)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right] }{E\left[ W^{\varvec{\theta }^*}(t)e^{\varvec{\beta }^{*\top }{\mathbf { Z}_2 }}\right] } \) is bounded away from both \(\infty \) and zero, and
$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| \frac{d\hat{\Lambda }}{d\bar{\Lambda }}(t)-\gamma (t) \right| \rightarrow 0 \text { and } \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\int _0^t\gamma d\Lambda _0 \right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$
(A19)
After all these preparations, we can use the semi-parametric Kullback-Leibler divergence argument from Murphy (1994). We have
$$\begin{aligned} 0 \le&\frac{1}{n} \big \{ l_n(\hat{\varvec{\alpha }},\hat{\varvec{\beta }},\hat{\Lambda }) -l_n(\varvec{\alpha }_0,\varvec{\beta }_0,\bar{\Lambda }) \big \} \nonumber \\ \nonumber =&\frac{1}{n}\sum _{i=1}^n \int _0^\tau \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} \bigg \{ dN_i(u)- \phi _i^{\varvec{\theta }_0}(u) Y_i(u) e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\bar{\Lambda }(u)\bigg \}\nonumber \\&+ \int _0^\tau \left[ \log \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d \hat{\Lambda }(u) }{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d \bar{\Lambda }(u)} \bigg \} - \bigg \{ \frac{\phi _i^{\hat{\varvec{\theta }}}(u)e^{\hat{\varvec{\beta }}^\top {\mathbf { Z}_2 }_i}d\hat{\Lambda }(u)}{\phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}d\bar{\Lambda }(u)}-1\bigg \} \right] \nonumber \\&\times \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u). \end{aligned}$$
(A20)
Denote the function in the logarithm above as \(\psi _i(u)\). Using the definition of \(\bar{\Lambda }\), we can rewrite the first term in (A20) as
$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dN_i(u) \nonumber \\&\quad = \frac{1}{n}\sum _{i=1}^n \left\{ \int _0^\tau \log \big (\psi _i(u)\big ) - \frac{\sum _{j=1}^n \log \big (\psi _j(u)\big )\phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}}{\sum _{j=1}^n \phi _j^{\varvec{\theta }_0}(u)Y_j(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_j}} \right\} dM_i(u) \end{aligned}$$
(A21)
Inside \(\psi _i(u)\), the ratio \(d\hat{\Lambda }/d\bar{\Lambda }\) is bounded away from 0 and \(\infty \) according to (A19). Denote the range of the ratio by [1 / R, R]. The \(\phi _i^{\varvec{\theta }_0}(u)\) and \(\phi _i^{\hat{\varvec{\theta }}}(u)\) terms in \(\psi _i(u)\) create a potential singularity for (A21) at \(\tau \), but the decay rate is bounded by \(e^{-mR \Lambda _0(u)}\) by Assumptions 1 and 2. The integrands of the martingale integral (A21) are thus all bounded a.s., and the quadratic variation of (A21) is bounded a.s. by
$$\begin{aligned} \frac{1}{n^2}\sum _{i=1}^n \int _0^\tau 4\big \{mR \Lambda _0(u) + \log (R) \big \}^2 \phi _i^{\varvec{\theta }_0}(u) Y_i(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i} d\Lambda _0(u). \end{aligned}$$
It is of order \(O_p(1/n)\), so the limit of (A21) is zero almost surely.
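The \(O_p(1/n)\) rate can be illustrated by a toy Monte Carlo (purely illustrative, with made-up bounded summands rather than the actual martingale integrands): the variance of an average of \(n\) iid centered bounded terms shrinks like \(1/n\), so the corresponding term vanishes in the limit.

```python
import random
import statistics

random.seed(0)

def centered_average(n):
    """Average of n iid centered bounded terms, mimicking the 1/n sum in (A21)."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n

def mc_variance(n, reps=2000):
    """Monte Carlo estimate of Var of the average at sample size n."""
    return statistics.pvariance([centered_average(n) for _ in range(reps)])

v100, v1600 = mc_variance(100), mc_variance(1600)
# Var of the average is Var(term)/n = (1/3)/n, so the ratio should be near 16.
assert 8.0 < v100 / v1600 < 32.0
assert v1600 < 1e-3
```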
The integrands in the second term of (A20) are of the form \(\log (x)-(x-1) \le 0\). In order to satisfy the inequality in (A20), we must have
$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n}\sum _{i=1}^n \int _0^\tau \big \{\log \big (\psi _i(u)\big ) - \big (\psi _i(u) -1 \big )\big \} \phi _i^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }_i}Y_i(u) d\bar{\Lambda }(u)= 0. \end{aligned}$$
Applying the same argument as in Murphy (1994), we get
$$\begin{aligned} E\left( \int _0^\tau \left| \phi ^{\varvec{\theta }^*}(u)e^{\varvec{\beta }^{*\top } {\mathbf { Z}_2 }}\gamma (u) - \phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }} \right| Y(u)d\Lambda _0(u)\right) =0 \end{aligned}$$
(A22)
in the almost sure set. The identifiability of our model is verified in Li et al. (2001) Theorem 2. Along with our regularity conditions in Assumptions 2 and 3, (A22) leads to \(\varvec{\alpha }^*=\varvec{\alpha }_0\), \(\varvec{\beta }^*=\varvec{\beta }_0\) and \(\gamma (t)=1\). This implies that
$$\begin{aligned} \sup _{t \in [0,\tau ^*]}\left| \hat{\Lambda }(t)-\Lambda _0 (t)\right| \rightarrow 0 \; a.s. , \forall \tau ^*<\tau \text { in }\mathbb {Q}. \end{aligned}$$
Repeating the trick in (A15)-(A18), we have
$$\begin{aligned} \sup _{t \in [0,\tau ]}\left| e^{-\hat{\Lambda }(t)}-e^{-\Lambda _0 (t)}\right| \rightarrow 0 \; a.s.. \end{aligned}$$
Finally, we summarize all uses of almost sure arguments to ensure that the intersection of all almost sure sets still has probability one under \(\sigma \)-additivity. The steps (A15)–(A18) involve one almost sure argument for each choice of \(\tau ^*\); we preserve the almost sure property by restricting \(\tau ^*\) to the countable set \(\mathbb {Q}\). One almost sure argument is made for Helly’s selection theorem. In Lemma A1, we use the Glivenko-Cantelli theorem to avoid dependence on the choice of \(\varvec{\theta }^*\), so the almost sure argument is applied only once. Two more almost sure arguments are used in calculating the limits of the terms in (A20). \(\square \)
Proof of Theorem 3’
The proof is essentially the same as that of Theorem 3, so the details are omitted. In fact, it is less technical due to the boundedness of \(\Lambda _0\) over \([0, \tau ']\). \(\square \)
A.3 Asymptotic normality
First, we provide the definitions of several quantities below. In Theorem 4, \(\sigma (\mathbf {h})=\Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) is given by
$$\begin{aligned} \varvec{\sigma }_a(\mathbf {h})= E\Bigg [&{\mathbf { Z}_1 }\bigg \{ -\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)d\phi ^{\varvec{\theta }_0}(u) \nonumber \\&+ K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \bigg \}\Bigg ], \nonumber \\ \varvec{\sigma }_b(\mathbf {h})= E\Bigg [&{\mathbf { Z}_2 }\bigg \{\int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(u)Y(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Big [\Lambda _0(u)\phi ^{\varvec{\theta }_0}(u)\Big ]\nonumber \\&- K^{\varvec{\theta }_0}_2(\mathbf {h})Y(\tau ')e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Lambda _0(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big )\bigg \}\Bigg ], \nonumber \\ \sigma _\eta (\mathbf {h})=E\Bigg [&e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\bigg \{ K_1^{\varvec{\theta }_0}(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)Y(u) - K_2^{\varvec{\theta }_0}(\mathbf {h})Y(\tau ') \phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \nonumber \\&-\int _u^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h})(s)\phi ^{\varvec{\theta }_0}(s)\Big (1-\phi ^{\varvec{\theta }_0}(s)\Big )Y(s)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(s)\bigg \}\Bigg ], \end{aligned}$$
(A23)
where
$$\begin{aligned} K_1^{\varvec{\theta }}(\mathbf {h})(u)=\,&\mathbf {a}^\top {\mathbf { Z}_1 }\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) +\mathbf {b}^\top {\mathbf { Z}_2 }\left\{ 1-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }}\right\} \nonumber \\&+\eta (u)-\Big (1-\phi ^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} \int _0^u \eta d\Lambda , \nonumber \\ K_2^{\varvec{\theta }}(\mathbf {h})=&\,\Big \{\mathbf {a}^\top {\mathbf { Z}_1 }-\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda (\tau ') e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} -\int _0^{\tau '}\eta e^{\varvec{\beta }^\top {\mathbf { Z}_2 }} d\Lambda \Big \}. \end{aligned}$$
(A24)
Let \( \varvec{\theta }+t\mathbf {h}=\Big (\varvec{\alpha }+t\mathbf {a},\varvec{\beta }+t\mathbf {b},\int _0^\cdot (1+t\eta )d\Lambda \Big ) \). Define the directional derivatives
$$\begin{aligned} \lim _{t\rightarrow 0}\frac{l^I_n(\varvec{\theta }+t\mathbf {h})-l^I_n(\varvec{\theta })}{t} =S^{\varvec{\theta }}_n=S^{\varvec{\theta }}_{n,a}+S^{\varvec{\theta }}_{n,b}+S^{\varvec{\theta }}_{n,\eta }, \end{aligned}$$
where
$$\begin{aligned} S^{\varvec{\theta }}_{n,a}=&\frac{1}{n}\sum _{i=1}^n \mathbf {a}^\top {\mathbf { Z}_1 }_i \bigg \{\int _0^{\tau '} \Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) dN_i(u)\\&-\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}d\Lambda (u) \\&+\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big ) -Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\bigg \}\\ S_{n,b}^{\varvec{\theta }}=&\frac{1}{n}\sum _{i=1}^n \mathbf {b}^\top {\mathbf { Z}_2 }_i \bigg [ \int _0^{\tau '} \left\{ 1-\Big (1-\phi _i^{\varvec{\theta }}(u)\Big ) \Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} dN_i(u)\\&+\int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left\{ \Big (1-\phi _i^{\varvec{\theta }}(u)\Big )\Lambda (u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} -1 \right\} d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }} (\tau ')\Big )\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\Lambda (\tau ')e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg ]\\ S_{n,\eta }^{\varvec{\theta }} =&\frac{1}{n}\sum _{i=1}^n \int _0^{\tau '} \left[ \eta (u)-\Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda \right] dN_i(u) \\&+ \int _0^{\tau '} Y_i(u)\phi _i^{\varvec{\theta }}(u)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \left[ \Big \{1-\phi _i^{\varvec{\theta }}(u)\Big \} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \int _0^u \eta d\Lambda -\eta (u) \right] d\Lambda (u)\\&-\Big (N_i(\tau )-N_i(\tau ')\Big )\Big (1-\phi _i^{\varvec{\theta }}(\tau ')\Big )\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+Y_i(\tau )\phi _i^{\varvec{\theta }}(\tau ')\int ^{\tau '}_0\eta d\Lambda e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}. \end{aligned}$$
Their expectations are denoted as
$$\begin{aligned} S^{\varvec{\theta }}=S^{\varvec{\theta }}_a+S^{\varvec{\theta }}_b+S^{\varvec{\theta }}_\eta =E\left( S^{\varvec{\theta }}_{n,a}\right) +E\left( S^{\varvec{\theta }}_{n,b}\right) +E\left( S^{\varvec{\theta }}_{n,\eta }\right) . \end{aligned}$$
Again let \(\varvec{\theta }_0\) be the true parameter and \(\varvec{\theta }\) another element of the parameter space. Define \(\triangle \varvec{\theta }=\varvec{\theta }-\varvec{\theta }_0\) with
$$\begin{aligned} \triangle \varvec{\alpha }=\varvec{\alpha }-\varvec{\alpha }_0, \, \triangle \varvec{\beta }=\varvec{\beta }-\varvec{\beta }_0 \text { and } \triangle \Lambda (\cdot )=\Big \{\Lambda (\cdot )-\Lambda _0(\cdot )\Big \}. \end{aligned}$$
Define \(lin \Theta \) to be the linear space spanned by \(\{ \varvec{\theta }-\varvec{\theta }_0 : \varvec{\theta }\text { in the parameter space}\}\). Let \(\varvec{\theta }_t = \varvec{\theta }_0+t\triangle \varvec{\theta }\). The functional Hessian is the linear operator from \(lin \Theta \) to \(l^\infty (H_p)\) defined as
$$\begin{aligned} \dot{S}^{\varvec{\theta }_0}(\triangle \varvec{\theta })(\mathbf {h}) =&\lim _{t\rightarrow 0}\frac{S^{\varvec{\theta }_t }(\mathbf {h})-S^{\varvec{\theta }_0}(\mathbf {h})}{t} \nonumber \\ =&-\triangle \varvec{\alpha }^\top \varvec{\sigma }_a(\mathbf {h}) -\triangle \varvec{\beta }^\top \varvec{\sigma }_b (\mathbf {h}) -\int _0^{\tau '} \sigma _\eta (\mathbf {h})(u)d\triangle \Lambda (u) \end{aligned}$$
(A25)
with \(\sigma \) defined in (A23).
The following Lemma A2 is used in the proofs of Theorems 4 and 5. It establishes the key property of \(\sigma \), the essential component of the functional Hessian.
Lemma A2
Let the operator \(\sigma : (\mathbf {a},\mathbf {b},\eta ) \mapsto \Big (\varvec{\sigma }_a(\mathbf {h}),\varvec{\sigma }_b(\mathbf {h}),\sigma _\eta (\mathbf {h})\Big )\) be defined as in (A23). Under the conditions of Theorem 4, \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \).
Proof of Lemma A2
First we prove that \(\sigma \) is an injection via an identifiability argument. Define an inner product between \(\sigma (\mathbf {h})\) and \(\mathbf {h}\) as
$$\begin{aligned} \Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=&\, \mathbf {a}^\top \varvec{\sigma }_a(\mathbf {h})+\mathbf {b}^\top \varvec{\sigma }_b(\mathbf {h})+\int _0^{\tau '}\sigma _\eta (\mathbf {h})(u)\eta (u) d\Lambda _0(u) \\ =&\int _0^{\tau '}E\left[ \big \{K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\big \}^2Y(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\right] d\Lambda _0(u)\\&+E\left[ \big \{ K^{\varvec{\theta }_0}_2(\mathbf {h})\big \}^2Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\Big (1-\phi ^{\varvec{\theta }_0}(\tau ')\Big ) \right] . \end{aligned}$$
If \(\Big <\sigma (\mathbf {h}),\mathbf {h}\Big >=0\), we have almost surely \(K^{\varvec{\theta }_0}_2(\mathbf {h})=0\) and \(K^{\varvec{\theta }_0}_1(\mathbf {h})(u)=0\) a.e. \(u \in [0, \tau ']\). Therefore,
$$\begin{aligned} \int _0^t K^{\varvec{\theta }_0}_1(\mathbf {h})(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u)=0, \quad \forall t\in [0,\tau '], \text { a.s.} \end{aligned}$$
Calculating the integral, we have for any \(t\in [0,\tau ']\) a.s.
$$\begin{aligned} -\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(t)+\mathbf {b}^\top {\mathbf { Z}_2 }\phi ^{\varvec{\theta }_0}(t)\Lambda _0(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}+\int _0^t\eta (u)d\Lambda _0(u)\phi ^{\varvec{\theta }_0}(t)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}=0. \end{aligned}$$
Setting \(t=0\), we have \(-\mathbf {a}^\top {\mathbf { Z}_1 }\phi ^{\varvec{\theta }_0}(0)=0\), so \(\mathbf {a}^\top {\mathbf { Z}_1 }=0\). By Assumption 2, \(\mathbf {a}=0\). Plugging \(\mathbf {a}=0\) into \(K^{\varvec{\theta }_0}_2\) yields
$$\begin{aligned} K^{\varvec{\theta }_0}_2(\mathbf {h})= e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\Big \{\mathbf {b}^\top {\mathbf { Z}_2 }\Lambda _0(\tau ')-\int _0^{\tau '}\eta (u)d\Lambda _0(u)\Big \}=0, \quad \text {a.s.} \end{aligned}$$
Again, \(\mathbf {b}^\top {\mathbf { Z}_2 }= \int _0^{\tau '}\eta (u)d\Lambda _0(u)/\Lambda _0(\tau ')\) is deterministic, so \(\mathbf {b}=0\). Hence \(\eta \) must also be identically zero. As a result, \(\sigma (\mathbf {h})=\sigma (\mathbf {h}') \Rightarrow \Big <\sigma (\mathbf {h}-\mathbf {h}'),\mathbf {h}-\mathbf {h}'\Big >=0 \Rightarrow \mathbf {h}=\mathbf {h}'\).
To show that it is a bijection, we apply Theorem 3.11 in Conway (1990). It suffices to decompose \(\sigma \) into the sum of an invertible operator and a compact operator. The invertible operator is defined as
$$\begin{aligned} \Sigma (\mathbf {h})=\Big (E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \mathbf {a},E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \mathbf {b}, \eta (t)E\left\{ e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\right\} \Big ). \end{aligned}$$
Since \(E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \), \(E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \) are both positive definite, and \(\inf _{t\in [0,\tau ']}Ee^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)>0\), the inverse exists as
$$\begin{aligned} \Sigma ^{-1}(\mathbf {h})=\Big (\left[ E\big \{{\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \big \}\right] ^{-1}\mathbf {a},\left[ E\big \{{\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \big \}\right] ^{-1} \mathbf {b}, \eta (t)\left[ E\big \{e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\big \}\right] ^{-1}\Big ). \end{aligned}$$
For the compactness of \(\sigma (\mathbf {h})-\Sigma (\mathbf {h})\), the classical Helly selection argument combined with dominated convergence applies, since all the terms involved are uniformly bounded. \(\square \)
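Since \(\Sigma \) acts componentwise, its inverse can be computed component by component. The following sketch is purely illustrative: it uses made-up positive-definite matrices in place of \(E\left( {\mathbf { Z}_1 }{\mathbf { Z}_1 }^\top \right) \) and \(E\left( {\mathbf { Z}_2 }{\mathbf { Z}_2 }^\top \right) \), and an arbitrary positive weight function in place of \(E\big \{e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}\phi ^{\varvec{\theta }_0}(t)Y(t)\big \}\), then checks numerically that \(\Sigma ^{-1}\circ \Sigma \) is the identity.

```python
import numpy as np

# Assumed stand-ins for illustration: positive-definite Gram matrices and a
# weight function w(t) bounded away from zero on [0, tau'].
A = np.array([[2.0, 0.3], [0.3, 1.5]])    # plays E(Z1 Z1^T)
B = np.array([[1.2, -0.2], [-0.2, 2.1]])  # plays E(Z2 Z2^T)
ts = np.linspace(0.0, 1.0, 50)            # grid on [0, tau']
w = 0.5 + 0.4 * np.exp(-ts)               # plays E{exp(b'Z2) phi(t) Y(t)}, inf w > 0

def Sigma(a, b, eta):
    """Componentwise action of the invertible part: (A a, B b, w * eta)."""
    return A @ a, B @ b, w * eta

def Sigma_inv(sa, sb, seta):
    """Componentwise inverse, valid since A, B are PD and inf w > 0."""
    return np.linalg.solve(A, sa), np.linalg.solve(B, sb), seta / w

a = np.array([1.0, -2.0])
b = np.array([0.5, 3.0])
eta = np.sin(2 * np.pi * ts)

ra, rb, reta = Sigma_inv(*Sigma(a, b, eta))
print(np.allclose(ra, a), np.allclose(rb, b), np.allclose(reta, eta))
```

The block-diagonal structure is what makes continuous invertibility reduce to the positive definiteness of the two Gram matrices and the lower bound on the weight function.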
The proof of Theorem 4 is an application of Theorem 3.3.1 of Van der Vaart and Wellner (1996). We verify each of the required conditions of that theorem.
Proof of Theorem 4
Since we work under the modified Assumption 3’ now, the martingale representation in (15) needs to change accordingly beyond \(\tau '\). We still use \(M_i(t)\) as the notation. Define the filtration \(\big \{\mathcal {F}_t: t \in [0,\tau ] \big \}\) as follows. On \([0,\tau ']\), \(\mathcal {F}_t\) is the natural \(\sigma \)-algebra generated by \(\{N_i(t), Y_i(t), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\). Since there is no extra information in the tail window \((\tau ', \tau )\), we set \(\mathcal {F}_t =\mathcal {F}_{\tau '}\) for \(t \in (\tau ', \tau )\). Finally, \(\mathcal {F}_\tau \) is the \(\sigma \)-algebra generated by \(\{N_i(\tau )-N_i(\tau '), Y_i(\tau ), {\mathbf { Z}_1 }_i, {\mathbf { Z}_2 }_i, i = 1, \ldots , n\}\), where \(Y_i(\tau ) = Y_i(\tau ') - dN_i(\tau ')\) is \(\mathcal {F}_{\tau '}\)-measurable. The filtration on \([0,\tau ']\) stays the same, so \(M_i(t)\) defined in (15) remains a martingale up to time \(\tau '\). In the tail window \((\tau ', \tau )\), we set \(M_i(t)\) constant at \(M_i(\tau ')\). To extend its definition to time \(\tau \), we define
$$\begin{aligned} d M_i(\tau ) = M_i(\tau ) - M_i(\tau ') = \big \{N_i(\tau )-N_i(\tau ')\big \} - Y_i(\tau ) \phi ^{\varvec{\theta }_0}_i(\tau '). \end{aligned}$$
(A26)
It is easy to verify that \(E[ M_i(\tau )| \mathcal {F}_{\tau '}] = M_i(\tau ')\), so \(M_i(t)\) thus defined is a martingale with respect to the new filtration \(\big \{\mathcal {F}_t: t \in [0,\tau '] \cup \{\tau \}\big \}\). Analogously, we define the process \(M^{\varvec{\theta }}_i(\cdot )\) by replacing the true parameter \(\varvec{\theta }_0\) in \(M_i(\cdot )\) with an arbitrary \(\varvec{\theta }\) in the parameter space; clearly \(M^{\varvec{\theta }_0}_i(\cdot ) = M_i(\cdot )\). From here, we establish the needed results using martingale theory.
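The mean-zero property of the tail increment (A26) can be checked by simulation. The sketch below assumes a hypothetical setup in which \(Y(\tau )\) is the \(\mathcal {F}_{\tau '}\)-measurable at-risk indicator and, conditional on being at risk, the tail event \(N(\tau )-N(\tau ')\) occurs with probability \(\phi ^{\varvec{\theta }_0}(\tau ')\) generated from a made-up covariate model; the empirical mean of \(dM(\tau )\) should be near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical generative model (all values assumed): phi(tau') varies with a
# covariate Z, and Y marks being at risk at tau'.
Z = rng.normal(size=n)
phi = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * Z)))   # assumed phi_i(tau')
Y = rng.random(n) < 0.7                          # at risk at tau'
dN_tail = (rng.random(n) < phi) & Y              # tail event only if at risk

# Martingale increment (A26): dM(tau) = {N(tau) - N(tau')} - Y(tau) phi(tau')
dM = dN_tail.astype(float) - Y * phi
print(abs(dM.mean()))   # near zero, within Monte Carlo error
```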
First, we prove weak convergence of the empirical score
$$\begin{aligned} \sqrt{n}(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}){\mathop {\longrightarrow }\limits ^{l^\infty (H_p)}} \mathcal {W}. \end{aligned}$$
(A27)
Notice that \(S^{\varvec{\theta }_0}_n-S^{\varvec{\theta }_0}\) is an average of integrals with respect to the martingales \(M_i\), extended to time \(\tau \) via (A26). The weak convergence then follows from the martingale central limit theorem. The covariance process is given by the expectation of the quadratic variation:
$$\begin{aligned}&\text {Cov}\big (\mathscr {G}(\mathbf {h}),\mathscr {G}(\mathbf {h}^*)\big )=E\Big [ \int _0^{\tau '} K^{\varvec{\theta }_0}_1(\mathbf {h}) K^{\varvec{\theta }_0}_1(\mathbf {h}^*) Y(u)\phi ^{\varvec{\theta }_0}(u)e^{\varvec{\beta }_0^\top {\mathbf { Z}_2 }}d\Lambda _0(u) \\&\qquad \qquad \qquad \qquad \qquad + K^{\varvec{\theta }_0}_2(\mathbf {h}) K^{\varvec{\theta }_0}_2(\mathbf {h}^*)Y(\tau ')\phi ^{\varvec{\theta }_0}(\tau ')\big \{1-\phi ^{\varvec{\theta }_0}(\tau ')\big \} \Big ], \end{aligned}$$
where \(K_1\) and \(K_2\) are defined as in (A24).
Next, we verify the approximation condition
$$\begin{aligned} \sqrt{n}\left( S_n^{\hat{\varvec{\theta }}}-S^{\hat{\varvec{\theta }}} - S_n^{\varvec{\theta }_0}+S^{\varvec{\theta }_0}\right) = o_p(1). \end{aligned}$$
(A28)
Consider the class \(\{S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h}): \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \le \varepsilon , \mathbf {h}\in H_p \}\). All terms involved in this class are uniformly bounded and of uniformly bounded variation, so it is a Donsker class with respect to the observable random variables. Checking that \(\phi _i^{\varvec{\theta }}\) is Lipschitz in \(\varvec{\theta }\) under the \(l^\infty (H_p)\) norm, we have almost surely
$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t)-\phi _i^{\varvec{\theta }_0}(t)| = O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) , \end{aligned}$$
and similarly
$$\begin{aligned} \sup _{t,{\mathbf { Z}_2 },{\mathbf { Z}_1 }} |\phi _i^{\varvec{\theta }}(t) {\Lambda }(t)-\phi _i^{\varvec{\theta }_0}(t)\Lambda _0(t)| =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \right) . \end{aligned}$$
For a single summand in the score,
$$\begin{aligned} \sup _{h\in H_p}E[S_1^{\varvec{\theta }}(\mathbf {h})-S_1^{\varvec{\theta }_0}(\mathbf {h})]^2 =O\left( \Vert \varvec{\theta }-\varvec{\theta }_0\Vert ^2\right) . \end{aligned}$$
Plugging \(\hat{\varvec{\theta }}\) into the expression above, the variance of the process in (A28) is \(o(1)\) by the consistency of \(\hat{\varvec{\theta }}\) from Theorem 3’, so the process itself is \(o_p(1)\).
We then show the Fréchet differentiability of the expected score \(S\) at \(\varvec{\theta }_0\) in the direction of \(\hat{\varvec{\theta }}-\varvec{\theta }_0\):
$$\begin{aligned} S^{\hat{\varvec{\theta }}_t}-S^{\varvec{\theta }_0}=t\dot{S}^{\varvec{\theta }_0} (\hat{\varvec{\theta }}-\varvec{\theta }_0)+o_p(t\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert ). \end{aligned}$$
(A29)
We use a shorthand notation for the expected score at \(\varvec{\theta }\):
$$\begin{aligned} S^{\varvec{\theta }}(\mathbf {h})&= E\left[ \int _0^{\tau '} K_1^{\varvec{\theta }}(\mathbf {h})(u)dM^{\varvec{\theta }}(u) + K_2^{\varvec{\theta }}(\mathbf {h}) d M^{\varvec{\theta }}(\tau )\right] \\&= E\left[ \int _0^{\tau } V^{\varvec{\theta }}(\mathbf {h})(u) d M^{\varvec{\theta }}(u)\right] , \end{aligned}$$
by setting
$$\begin{aligned} V^{\varvec{\theta }}(\mathbf {h})(t) = I(t \le \tau ')K_1^{\varvec{\theta }}(\mathbf {h})(t) + I(t=\tau ) K_2^{\varvec{\theta }}(\mathbf {h}). \end{aligned}$$
Since all the terms involved, namely \( K_1^{\varvec{\theta }}(\mathbf {h})\), \(K_2^{\varvec{\theta }}(\mathbf {h})\) and \(dM^{\varvec{\theta }}\), are Lipschitz continuous in \(\varvec{\theta }\),
$$\begin{aligned}&S^{ {\varvec{\theta }}_t}(\mathbf {h})-S^{\varvec{\theta }}(\mathbf {h}) \\&\quad = E\left[ \int _0^{\tau '} V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{ {\varvec{\theta }}_t}(u) \right] \\&\quad = E\left[ \int _0^{\tau '} V^{\varvec{\theta }_0} (\mathbf {h})(u)d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \}\right] +E\left[ \int _0^{\tau '}V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)dM^{\varvec{\theta }_0}(u)\right] \\&\quad \quad + E\left[ \int _0^{\tau '}\big \{V^{ {\varvec{\theta }}_t}(\mathbf {h})(u)- V^{\varvec{\theta }_0}(\mathbf {h})(u)\big \}d\big \{M^{ {\varvec{\theta }}_t}(u)- M^{\varvec{\theta }_0}(u)\big \} \right] \\&\quad = t\dot{S}^{\varvec{\theta }_0}( {\varvec{\theta }}-\varvec{\theta }_0)(\mathbf {h})+0+O_p(t^2\Vert {\varvec{\theta }}-\varvec{\theta }_0\Vert ^2). \end{aligned}$$
Again, we plug in \(\hat{\varvec{\theta }}\) and use the consistency result to verify condition (A29).
Afterwards, we establish the continuous invertibility of the functional Hessian in (A25). We have shown in Lemma A2 that the operator \(\sigma \) is a continuously invertible bijection from \(H_\infty \) to \(H_\infty \). The invertibility of \(\dot{S}^{\varvec{\theta }_0}\) on \(H_p\) then follows: by the continuous invertibility of \(\sigma \), there is some \(q>0\) such that \(\sigma ^{-1}(H_q) \subseteq H_p\), and
$$\begin{aligned}&\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_p}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ \Vert \triangle \varvec{\theta }\Vert _{l^\infty (H_p)}\ } \nonumber \\&\quad \ge \inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in \sigma ^{-1}(H_q)}|(\varvec{\alpha }-\varvec{\alpha }_0)^\top \varvec{\sigma }_a(\mathbf {h}) +(\varvec{\beta }-\varvec{\beta }_0)^\top \varvec{\sigma }_b (\mathbf {h}) +\int _0^{\tau '}\sigma _\eta (\mathbf {h}) d(\Lambda -\Lambda _0)|}{ p\Vert \triangle \varvec{\theta }\Vert } \nonumber \\&\quad =\inf _{\triangle \varvec{\theta }\in lin \Theta } \frac{\sup _{\mathbf {h}\in H_q}| \triangle \varvec{\theta }(\mathbf {h}) |}{ p\Vert \triangle \varvec{\theta }\Vert } > \frac{q }{2p}. \end{aligned}$$
(A30)
Finally, we put everything together. The NPMLE \(\hat{\varvec{\theta }}\) is consistent by Theorem 3’, and (A27), (A28), (A29) and (A30) verify the conditions of Theorem 3.3.1 of Van der Vaart and Wellner (1996). \(\square \)
Proof of Theorem 5
The proof of the continuous invertibility of \(\hat{\sigma }\) is similar to that of Lemma A2. The approximation error between the natural estimator \(\hat{\sigma }\) and the Louis’ formula variance estimator using (14) again comes from the “ghost copies,” as in Lemma 1, so the same argument applies to show their asymptotic equivalence. \(\square \)
Appendix B: Variance Estimator
B.1 Derivatives of the log-likelihood
Let \(l^c(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})=\sum _{i=1}^n l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }})\) be the complete data log-likelihood,
$$\begin{aligned} l^c_i(\varvec{\alpha },\varvec{\beta },{\varvec{\lambda }}) =&\, (A_i+M_i) \varvec{\alpha }^\top {\mathbf { Z}_1 }_i -(1+M_i)\log (1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})\\&+\delta ^1_i A_i \sum _{k=1}^K I\{X_i=t_k\}(\log \lambda _k +\varvec{\beta }^\top {\mathbf { Z}_2 }_i) - A_i \sum _{k:t_k \le X_i} \lambda _k e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\&+M_i\sum _{k:t_k<Q_i} I\{\kappa _i=k\}\Big (\log \lambda _k+\varvec{\beta }^\top {\mathbf { Z}_2 }_i-\sum _{h=1}^k \lambda _h e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$
Its gradient is given by
$$\begin{aligned} \nabla l^c_i=\left( \frac{\partial l^c_i}{\partial \varvec{\alpha }}, \frac{\partial l^c_i}{\partial \varvec{\beta }}, \frac{\partial l^c_i}{\partial {\varvec{\lambda }}}\right) ^\top , \end{aligned}$$
where
$$\begin{aligned} \frac{\partial l^c_i}{\partial \varvec{\alpha }} =&\, {\mathbf { Z}_1 }_i\Big \{A_i+M_i-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}\Big \} = {\mathbf { Z}_1 }_i\big \{A_i-p_i+M_i(1-p_i)\big \}, \\ \frac{\partial l^c_i}{\partial \varvec{\beta }} =&\, {\mathbf { Z}_2 }_i \bigg \{A_i \delta ^1_i +M_i-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \} \\ =&\, {\mathbf { Z}_2 }_i \Big \{A_i \delta ^1_i +M_i-A_i \Lambda _i(X_i) -M_i \Lambda _i(\kappa _i)\Big \}, \\ \frac{\partial l^c_i}{\partial \lambda _k} =&\, \Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k}-\Big (A_i I\{t_k \le X_i\}+M_iI\{\kappa _i \ge t_k\}\Big ) e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\\ =&\, A_i\Big ( \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ) + M_i\Big ( \frac{I\{\kappa _i=k\}}{\lambda _k}- I\{\kappa _i \ge t_k\} e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big ). \end{aligned}$$
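The gradient formulas above can be checked against numerical differentiation. The sketch below implements \(l^c_i\) and its \(\varvec{\lambda }\)-gradient for a single hypothetical subject (all data values are made up for illustration) and compares the analytic \(\partial l^c_i/\partial \lambda _k\) with a central finite difference.

```python
import numpy as np

# Hypothetical single-subject data (all values are illustrative assumptions):
t = np.array([0.5, 1.0, 1.5, 2.0])   # ordered event times t_1, ..., t_K
A, M, delta = 1.0, 2.0, 1.0          # latent indicator A_i, count M_i, delta^1_i
X_idx = 2                            # X_i = t_3 (0-based index 2)
kappa = 1                            # ghost-copy event index kappa_i (0-based)
Z1, Z2 = np.array([1.0, -0.5]), np.array([0.3, 2.0])

def lc(alpha, beta, lam):
    """Complete-data log-likelihood l^c_i as displayed above."""
    bz = beta @ Z2
    out = (A + M) * (alpha @ Z1) - (1 + M) * np.log1p(np.exp(alpha @ Z1))
    out += delta * A * (np.log(lam[X_idx]) + bz)      # event term at X_i
    out -= A * lam[: X_idx + 1].sum() * np.exp(bz)    # cumulative hazard up to X_i
    out += M * (np.log(lam[kappa]) + bz - lam[: kappa + 1].sum() * np.exp(bz))
    return out

def grad_lam(alpha, beta, lam):
    """Analytic d l^c_i / d lambda_k from the displayed gradient."""
    ebz = np.exp(beta @ Z2)
    k = np.arange(len(lam))
    g = A * (delta * (k == X_idx) / lam - (k <= X_idx) * ebz)
    g += M * ((k == kappa) / lam - (k <= kappa) * ebz)
    return g

alpha, beta = np.array([0.2, -0.1]), np.array([0.1, 0.3])
lam = np.array([0.2, 0.3, 0.25, 0.4])

# Central finite-difference check of the lambda-gradient
eps = 1e-6
num = np.array([(lc(alpha, beta, lam + eps * np.eye(4)[k])
                 - lc(alpha, beta, lam - eps * np.eye(4)[k])) / (2 * eps)
                for k in range(4)])
print(np.allclose(num, grad_lam(alpha, beta, lam), atol=1e-4))
```

The same finite-difference device applies to the \(\varvec{\alpha }\)- and \(\varvec{\beta }\)-components and to the Hessian entries below.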
Its Hessian is given by
$$\begin{aligned} \nabla ^2 l^c_i=\left( \begin{array}{ccc} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } &{} 0 &{} 0 \\ 0 &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } &{} \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top } \\ 0 &{} \left[ {\frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\lambda }^\top }} \right] ^\top &{} \text {diag}(\frac{\partial ^2 l^c_i}{\partial \lambda _k^2 }) \end{array}\right) , \end{aligned}$$
where
$$\begin{aligned} \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } =&\, {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \Big \{-(1+M_i)\frac{e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i}}{(1+e^{\varvec{\alpha }^\top {\mathbf { Z}_1 }_i})^2}\Big \} = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+M_i)p_i(1-p_i), \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } =&\, {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg \{-\Big (A_i \sum _{k:t_k \le X_i}\lambda _k +M_i \sum _{k=1}^{\kappa _i} \lambda _k \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} =&\, {\mathbf { Z}_2 }_i \bigg \{-\Big (A_i I\{t_k \le X_i\} +M_i I\{t_k \le \kappa _i\} \Big )e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\bigg \}, \\ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } =&-\Big (A_i \delta ^1_i I\{X_i=t_k\}+M_i I\{\kappa _i=k\}\Big )\frac{1}{\lambda _k^2}, \\ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\beta }^\top } =&\frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\lambda }^\top } = \frac{\partial ^2 l^c_i}{\partial \lambda _k \partial \lambda _h }=0, \ \ \ \ k\ne h. \end{aligned}$$
B.2 Conditional expectations
By the conditional expectations (8)–(10), we are able to calculate the ‘first order’ conditional expectations, \(E[\nabla l^c_i|\mathcal {O}]\) and \(E[\nabla ^2 l^c_i|\mathcal {O}]\):
$$\begin{aligned} E&\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] = {\mathbf { Z}_1 }_i\Big \{E(A_i)-p_i+E(M_i)(1-p_i)\Big \}, \\ E&\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] = {\mathbf { Z}_2 }_i \bigg [E(A_i) \Big \{\delta ^1_i+\log S_i(X_i)\Big \} \\&\qquad \qquad \quad +E(M_i) \Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}\bigg ], \\ E&\left[ \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i)\Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \quad +E(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\alpha } \partial \varvec{\alpha }^\top } \right] = - {\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top (1+E(M_i))p_i(1-p_i), \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta } \partial \varvec{\beta }^\top } \right] = {\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \Big \{ E(A_i) \log S_i(X_i) +E(M_i) \sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k) \log S_i(t_k)\Big \}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \varvec{\beta }\partial \lambda _k} \right] = -{\mathbf { Z}_2 }_i \Big \{E(A_i) I\{t_k \le X_i\} +E(M_i) P(t_k \le \kappa _i) \Big \}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}, \\ E&\left[ \frac{\partial ^2 l^c_i}{\partial \lambda _k^2 } \right] = -\Big \{E(A_i) \delta ^1_i I\{X_i=t_k\}+E(M_i) P(\tilde{T}_{ij}=t_k)\Big \}\frac{1}{\lambda _k^2}. \end{aligned}$$
To calculate ‘second order’ expectation \(E[\nabla l^c_i{\nabla l^c_i}^\top |\mathcal {O}]\), we first compute the conditional variances:
$$\begin{aligned} \text {Var}&[A_i|\mathcal {O}] = \delta ^c_i\frac{p_i(1-p_i)S_i(X_i)}{\big \{1-p_i+p_iS_i(X_i)\big \}^2}, \\ \text {Var}&[M_i|\mathcal {O}] = \frac{p_i\big \{1-S_i(Q_i)\big \}}{\big \{1-p_i+p_iS_i(Q_i)\big \}^2}. \end{aligned}$$
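Note that \(\text {Var}[A_i|\mathcal {O}]\) for a censored subject is simply the Bernoulli posterior variance \(q_i(1-q_i)\), writing \(q_i = E[A_i|\mathcal {O}] = p_iS_i(X_i)/\{1-p_i+p_iS_i(X_i)\}\) as shorthand. A quick numerical check over an assumed grid of values:

```python
import numpy as np

# The displayed Var[A_i | O] equals the Bernoulli posterior variance q(1-q),
# with q = pS / (1 - p + pS). Grid values below are assumed for illustration.
p = np.linspace(0.05, 0.95, 19)[:, None]   # incidence probabilities p_i
S = np.linspace(0.05, 0.95, 19)[None, :]   # survival probabilities S_i(X_i)

q = p * S / (1 - p + p * S)                # posterior P(A_i = 1 | O)
var_formula = p * (1 - p) * S / (1 - p + p * S) ** 2
print(np.allclose(var_formula, q * (1 - q)))
```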
Then,
$$\begin{aligned}&E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\alpha }}}^\top \right] = E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_1 }_i^\top \big \{(1-p_i)^2 \text {Var}(M_i)+ \text {Var}(A_i)\big \},\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} { \frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_1 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)(1-p_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} {\frac{\partial l^c_i}{\partial \varvec{\beta }}}^\top \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \right] ^\top +{\mathbf { Z}_2 }_i {\mathbf { Z}_2 }_i^\top \bigg [ \text {Var}(A_i)\Big \{\delta ^1_i+\log S_i(X_i)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad + \text {Var}(M_i)\Big \{1+\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big \}^2\\&\qquad \qquad \qquad \qquad \qquad +E(M_i)\Big \{\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\big \{\log S_i(t_k)\big \}^2\\&\qquad \qquad \qquad \qquad \qquad -\Big (\sum _{k:t_k<Q_i}P(\tilde{T}_{ij}=t_k)\log S_i(t_k)\Big )^2\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\alpha }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad + {\mathbf { Z}_1 }_i \bigg [\text {Var}(A_i) \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad +\text {Var}(M_i)(1-p_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\bigg ],\\&\quad E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E\left[ \frac{\partial l^c_i}{\partial \varvec{\beta }}\right] E\left[ \frac{\partial l^c_i}{\partial \lambda _k}\right] \\&\qquad \qquad \qquad \qquad \quad +{\mathbf { Z}_2 }_i \bigg [ \text {Var}(A_i)\big \{\delta ^1_i+\log S_i(X_i)\big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{\frac{\delta ^1_i I\{t_k = X_i\}}{\lambda _k}-I\{t_k \le X_i\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \} \\&\qquad \qquad \qquad \qquad \quad + \text {Var}(M_i)\Big \{\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\Big \}\\&\qquad \qquad \qquad \qquad \qquad \Big \{1+\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\\&\qquad \qquad \qquad \qquad \quad - E(M_i)\Big \{\sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -\frac{P(\tilde{T}_{ij}=t_k)\log S_i(t_k)}{\lambda _k}\\&\qquad \qquad \qquad \qquad \quad -P\{\tilde{T}_{ij} \ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \sum _{h:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\\&\qquad \qquad \qquad \qquad \quad +e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\sum _{h\ge k:t_h<Q_i}P(\tilde{T}_{ij}=t_h)\log S_i(t_h)\Big \}\bigg ], \\ \end{aligned}$$
$$\begin{aligned}&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _h} \right] =E(A_i) \left\{ -\frac{\delta ^1_i I\{X_i =t_{k\vee h}\}}{\lambda _{k\vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}+ I\{X_i\ge t_{k\vee h}\}e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i =t_h\}}{\lambda _h}- I\{X_i\ge t_h\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_h)}{\lambda _h}-P(\tilde{T}_{ij} \ge t_h)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ -\frac{P(\tilde{T}_{ij} =t_{k\vee h})}{\lambda _{k \vee h}}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} +P(\tilde{T}_{ij}\ge t_{k \vee h})e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} ,\\&\quad E\left[ \frac{\partial l^c_i}{\partial \lambda _k} \frac{\partial l^c_i}{\partial \lambda _k} \right] =E(A_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} ^2\\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(A_i) E(M_i) \left\{ \frac{\delta ^1_i I\{X_i=t_k\}}{\lambda _k}- I\{X_i\ge t_k\}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad + E[M_i^2-M_i] \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \qquad \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}-P(\tilde{T}_{ij} \ge t_k)e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} \\&\qquad \qquad \qquad \qquad \quad +E(M_i) \left\{ \frac{P(\tilde{T}_{ij}=t_k)}{\lambda ^2_k}- 2\frac{P(\tilde{T}_{ij}=t_k)}{\lambda _k}e^{\varvec{\beta }^\top {\mathbf { Z}_2 }_i}\right. \\&\qquad \qquad \qquad \qquad \quad \left. +P(\tilde{T}_{ij} \ge t_k)e^{2\varvec{\beta }^\top {\mathbf { Z}_2 }_i} \right\} . \end{aligned}$$
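The conditional expectations in this appendix are the ingredients of Louis’ formula for the observed information, \(E[-\nabla ^2 l^c|\mathcal {O}] - E[\nabla l^c {\nabla l^c}^\top |\mathcal {O}] + E[\nabla l^c|\mathcal {O}]E[\nabla l^c|\mathcal {O}]^\top \). As a sanity check of the missing-information principle behind it, the sketch below works through a one-parameter toy version (all values assumed, not the model of this paper) and compares the Louis-type expression with a direct finite-difference second derivative of the observed log-likelihood.

```python
import numpy as np

# Toy one-parameter illustration (assumed values): for a censored subject,
# p = sigmoid(alpha) is the incidence probability and S = S_i(X_i) is fixed.
# Complete data: l^c = A log p + (1 - A) log(1 - p) + A log S, so
#   dl^c/da = A - p  and  d^2 l^c/da^2 = -p(1 - p),
# while the posterior probability of A = 1 is q = pS / (1 - p + pS).
def louis_info(alpha, S):
    # Missing-information principle: -l_obs'' = E[-l^c'' | O] - Var(l^c' | O)
    p = 1.0 / (1.0 + np.exp(-alpha))
    q = p * S / (1.0 - p + p * S)
    return p * (1.0 - p) - q * (1.0 - q)

def obs_info_fd(alpha, S, eps=1e-4):
    # Direct finite-difference second derivative of log(1 - p + pS)
    l = lambda a: np.log(1.0 - (1.0 - S) / (1.0 + np.exp(-a)))
    return -(l(alpha + eps) - 2.0 * l(alpha) + l(alpha - eps)) / eps**2

print(np.isclose(louis_info(0.3, 0.6), obs_info_fd(0.3, 0.6), atol=1e-5))
```

The identity holds at any parameter value, not only at the MLE; at the NPMLE the observed-score term of Louis’ formula vanishes as well.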