Nonparametric estimation in the illness-death model using prevalent data

Lifetime Data Analysis

Abstract

We study nonparametric estimation of the illness-death model using left-truncated and right-censored data. The general aim is to estimate the multivariate distribution of a progressive multi-state process. Maximum likelihood estimation under censoring suffers from problems of uniqueness and consistency, so instead we review and extend methods that are based on inverse probability weighting. For univariate left-truncated and right-censored data, nonparametric maximum likelihood estimation can be considerably improved by exploiting knowledge of the truncation distribution. We aim to examine the gain in using such knowledge for inverse probability weighting estimators in the illness-death framework. Additionally, we compare the weights that use truncation variables with the weights that integrate them out, showing, by simulation, that the latter perform more stably and efficiently. We apply the methods to intensive care unit data collected in a cross-sectional design, and discuss how the estimators can be easily modified to more general multi-state models.

References

  • Andersen P, Borgan O, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer-Verlag, New York

  • Asgharian M, M’Lan C, Wolfson D (2002) Length-biased sampling with right censoring: an unconditional approach. J Am Stat Assoc 97:201–209

  • Chang S, Tzeng S (2006) Nonparametric estimation of sojourn time distributions for truncated serial event data—a weight-adjusted approach. Lifetime Data Anal 12:53–67

  • Datta S, Satten G (2001) Validity of the Aalen-Johansen estimators of stage occupation probabilities and Nelson-Aalen estimators of integrated transition hazards for non-Markov models. Stat Probab Lett 55:403–411

  • Gill R (1992) Multivariate survival analysis. Theory Probab Appl 37(1):18–31

  • Gill R, van der Laan M, Wellner J (1995) Inefficient estimators of the bivariate survival function for three models. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 31(3):545–597

  • Hougaard P (2000) Analysis of multivariate survival data. Springer, New York

  • Huang Y, Wang M-C (1995) Estimating the occurrence rate for prevalent survival data in competing risks model. J Am Stat Assoc 90(432):1406–1415

  • Kalbfleisch J, Prentice R (2002) The statistical analysis of failure time data. Wiley, Hoboken

  • Keiding N (1991) Age-specific incidence and prevalence: a statistical perspective. J R Stat Soc Ser A 154(3):371–412

  • Kosorok M (2008) Introduction to empirical processes and semiparametric inference. Springer, New York

  • Lin D, Sun W, Ying Z (1999) Nonparametric estimation of the gap time distributions for serial events with censored data. Biometrika 86(1):59–70

  • Mandel M (2010) The competing risks illness-death model under cross-sectional sampling. Biostatistics 11(2):290–303

  • Mandel M, Betensky R (2007) Testing goodness of fit of a uniform truncation model. Biometrics 63(2):405–412

  • Mnatzaganian G, Galai N, Sprung CD, Zitser-Gurevich Y, Mandel M, Ben-Hur D, Gurman G, Klein M, Lev A, Levi L et al (2005) Increased risk of bloodstream and urinary infections in intensive care unit (ICU) patients compared with patients fitting ICU admission criteria treated in regular wards. J Hosp Infect 59:331–342

  • Neuhaus G (1971) On weak convergence of stochastic processes with multidimensional time parameter. Ann Math Stat 42(4):1285–1295

  • Prentice R, Moodie Z, Wu J (2004) Nonparametric estimation of the bivariate survivor function. In Lin D, Heagerty P (eds) Proceedings of the second Seattle symposium in Biostatistics. Lecture notes in statistics, vol. 179. Springer, New York

  • Putter H, Fiocco M, Geskus RB (2007) Tutorial in biostatistics: competing risks and multi-state models. Stat Med 26(11):2389–2430

  • Qin J, Shen Y (2010) Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 66:382–392

  • Robins J, Rotnitzky A et al (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V (eds) AIDS epidemiology—methodological issues. Springer, Boston

  • Rubin D (1981) The Bayesian bootstrap. Ann Stat 9:130–134

  • Tsai W-Y (1990) Testing the assumption of independence of truncation time and failure time. Biometrika 77(1):169–177

  • Vakulenko-Lagun B, Mandel M (2016) Comparing estimation approaches for the illness-death model under left truncation and right censoring. Stat Med 35:1533–1548

  • van der Laan M (1996) Nonparametric estimation of the bivariate survival function with truncated data. J Multivar Anal 58(1):107–131

  • Wang M-C (1989) A semiparametric model for randomly truncated data. J Am Stat Assoc 84:742–748

  • Wang M-C (1991) Nonparametric estimation from cross-sectional survival data. J Am Stat Assoc 86:130–143

  • Wang M-C (1999) Gap time bias in incident and prevalent cohorts. Stat Sin 9:999–1010

  • Wang M-C, Jewell N, Tsai W-Y (1986) Asymptotic properties of the product limit estimate under random truncation. Ann Stat 14(4):1597–1605

  • Wang W, Wells M (1998) Nonparametric estimation of successive duration times under dependent censoring. Biometrika 85(3):561–572


Acknowledgments

We thank the two reviewers for their valuable comments and suggestions. The work was supported by The Israel Science Foundation (Grant No. 519/14) and by NSF grant DMS-1407732.

Author information

Corresponding author

Correspondence to Bella Vakulenko-Lagun.

Appendix

1.1 Convergence to a Gaussian process

Definition 1

Let \(X_1,\ldots ,X_n\) be random variables. For every function f, define \(Pf\equiv E[f(X)]\). Define \(\mathbb {P}_n\) to be the empirical measure, such that \(\mathbb {P}_n f\equiv n^{-1} \sum _{i=1}^{n}f(X_i)\). Define \(\mathbb {P}_n^{(b)}\) to be the bootstrap empirical measure with weights \((M_{n1},M_{n2},\ldots ,M_{nn})\) which are i.i.d., positive random variables with expectation 1 and finite variance, and independent of \(X_1,\ldots ,X_n\), such that \(\mathbb {P}_n^{(b)} f\equiv n^{-1} \sum _{i=1}^{n} (M_{ni}/\bar{M}_n)f(X_i)\) where \(\bar{M}_n=n^{-1}\sum _{i=1}^n M_{ni}\). The convergence type for bootstrap \(\sqrt{n}(\mathbb {P}_n^{(b)}-\mathbb {P}_n)\underset{\text {M}}{\overset{\text {P}}{\rightsquigarrow }}\mathbb {G}\) is defined in Kosorok (2008, pp. 19–20). Finally, for a space \(\mathcal {X}\), define \(\ell ^{\infty }(\mathcal {X})\) to be the space of all uniformly bounded real functions on \(\mathcal {X}\).
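As a rough illustration of Definition 1, the following sketch computes \(\mathbb {P}_n f\) and \(\mathbb {P}_n^{(b)} f\) for a simple indicator function. The data, the function `f`, and the choice of standard exponential weights are assumptions made only for this example; exponential weights satisfy the stated conditions (i.i.d., positive, mean 1, finite variance) and, after normalization, correspond to the Bayesian bootstrap of Rubin (1981).

```python
import numpy as np

def empirical_measure(f, x):
    """P_n f = n^{-1} * sum_i f(X_i)."""
    return np.mean(f(x))

def bootstrap_measure(f, x, weights):
    """P_n^{(b)} f = n^{-1} * sum_i (M_ni / M_bar_n) f(X_i)."""
    normalized = weights / weights.mean()  # M_ni / M_bar_n, averages to 1
    return np.mean(normalized * f(x))

# Hypothetical data and test function, for illustration only.
rng = np.random.default_rng(0)
x = rng.exponential(size=200)
f = lambda v: (v <= 1.0).astype(float)

# Assumed weight distribution: i.i.d. Exp(1), positive with mean 1.
m = rng.exponential(size=x.size)

pn = empirical_measure(f, x)
pnb = bootstrap_measure(f, x, m)
print(pn, pnb)
```

Repeating the last two steps with fresh weights gives bootstrap replicates of \(\mathbb {P}_n^{(b)} f\) whose spread around \(\mathbb {P}_n f\) approximates the sampling variability of \(\mathbb {P}_n f\).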

Let

$$\begin{aligned} \widehat{N}(t,u,l)\equiv \frac{1}{n}\sum _{i=1}^{n}F^*_i I(\widetilde{T}^*_i\le t,\widetilde{U}^*_i\le u,L_i^* \le l)\,. \end{aligned}$$

Let N be the expectation of \(\widehat{N}\), in other words, \(N(t,u,l)\equiv P(T\le t, U\le u,L\le l, C>T+U-L \mid L\le T+U)\). We have \(N(t,u,l)=G^*_{\widetilde{T}^*,\widetilde{U}^*,L^*}(t,u,l)\) as defined in Eq. (1). In particular,

$$\begin{aligned} N(t_0,u_0,l_0)=\beta ^{-1}\int _0^{t_0}\int _0^{u_0}\int _0^{l_0} S_C(t+u-l)I(l\le t+u)F_L(dl)G_{T,U}(dt,du). \end{aligned}$$
(10)
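The estimator \(\widehat{N}\) is a sub-distribution-type empirical function and is simple to compute. The sketch below assumes that \(F^*_i\) is a 0/1 indicator (equal to 1 when the terminal event of subject i is observed, as the expression for N suggests); the array names and toy data are hypothetical.

```python
import numpy as np

def n_hat(t, u, l, T_obs, U_obs, L_obs, F_ind):
    """N_hat(t,u,l) = n^{-1} * sum_i F_i * I(T_i <= t, U_i <= u, L_i <= l)."""
    inside = (T_obs <= t) & (U_obs <= u) & (L_obs <= l)
    return np.mean(F_ind * inside)

# Toy sample of size 4 (illustrative values only).
T_obs = np.array([1.0, 2.0, 0.5, 3.0])   # observed T-tilde*
U_obs = np.array([0.5, 1.0, 2.0, 0.2])   # observed U-tilde*
L_obs = np.array([0.3, 1.5, 1.0, 2.5])   # truncation times L*
F_ind = np.array([1, 0, 1, 1])           # assumed 0/1 indicator F*

val = n_hat(2.0, 1.0, 2.0, T_obs, U_obs, L_obs, F_ind)
print(val)
```

In this toy sample only the first subject is uncensored and falls inside the rectangle, so the value is one quarter.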

Lemma 1

The process

$$\begin{aligned} n^{\frac{1}{2}}\left( \begin{array}{c} \widehat{N}(t,u,l) -N(t,u,l) \\ \widehat{S}_C(s)- S_C(s)\\ \widehat{S}_{T+U}(v)-S_{T+U}(v)\\ \widehat{F}_{L}(w) -F_{L}(w) \end{array}\right) \rightsquigarrow \left( \begin{array}{c} \mathbb {G}_1(t,u,l)\\ \mathbb {G}_2(s) \\ \mathbb {G}_3(v)\\ \mathbb {G}_4(w)\end{array}\right) \end{aligned}$$

where \((\mathbb {G}_1,\ldots ,\mathbb {G}_4)^T \in \ell ^\infty ([0,\tau ]^3)\times (\ell ^\infty [0,\tau ])^3\) is a tight zero-mean Gaussian process with covariance structure that appears in the proof. Moreover, its corresponding bootstrap process

$$\begin{aligned} n^{\frac{1}{2}}\left( \begin{array}{c} \widehat{N}^{(b)}(t,u,l) -\widehat{N}(t,u,l) \\ \widehat{S}_C^{(b)}(s)- \widehat{S}_C(s)\\ \widehat{S}_{T+U}^{(b)}(v)-\widehat{S}_{T+U}(v)\\ \widehat{F}_{L}^{(b)}(w) -\widehat{F}_{L}(w)\end{array}\right) \underset{\text {M}}{\overset{\text {P}}{\rightsquigarrow }}\left( \begin{array}{c} \mathbb {G}_1(t,u,l)\\ \mathbb {G}_2(s) \\ \mathbb {G}_3(v)\\ \mathbb {G}_4(w)\end{array}\right) \,. \end{aligned}$$
(11)

Proof

Write \(\widehat{N}(t,u,l) \equiv \mathbb {P}_n f_{t,u,l}(F^*,\widetilde{T}^*,\widetilde{U}^*,L^*)\), where \(f_{t,u,l}(F^*,\widetilde{T}^*,\widetilde{U}^*,L^*)=F^* I(\widetilde{T}^*\le t,\widetilde{U}^*\le u,L^*\le l)\). Note that the class

$$\begin{aligned} \mathcal F_1\equiv \left\{ f_{t,u,l}(F^*,\widetilde{T}^*,\widetilde{U}^*, L^*): (t,u,l)\in [0,\tau ]^3,l\le t+u\in [0,\tau ] \right\} \end{aligned}$$

is a P-Donsker class. Hence, \(\sqrt{n}(\widehat{N}-N)\rightsquigarrow \mathbb {G}_1\), where \(\mathbb {G}_1\) is a Brownian bridge on \(\ell ^{\infty }([0,\tau ]^3)\) with covariance

$$\begin{aligned}&\mathrm {Cov}\left( \mathbb {G}_1(t_1,u_1,l_1),\mathbb {G}_1(t_2,u_2,l_2)\right) \nonumber \\&\quad =G^*_{\widetilde{T}^*,\widetilde{U}^*,L^*}\big (\min (t_1,t_2),\min (u_1,u_2),\min (l_1,l_2)\big )\nonumber \\&\quad \quad -\,G^*_{\widetilde{T}^*,\widetilde{U}^*,L^*}(t_1,u_1,l_1)G^*_{\widetilde{T}^*,\widetilde{U}^*,L^*}(t_2,u_2,l_2)\,. \end{aligned}$$
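The form of this covariance can be checked in one step. Assuming that \(F^*\) is a 0/1 indicator, so that \((F^*)^2=F^*\), the product of two functions in \(\mathcal F_1\) is again a function in \(\mathcal F_1\) indexed by the componentwise minima, and therefore

$$\begin{aligned} \mathrm {Cov}\left( f_{t_1,u_1,l_1},f_{t_2,u_2,l_2}\right)&=E\left[ (F^*)^2 I\big (\widetilde{T}^*\le \min (t_1,t_2),\widetilde{U}^*\le \min (u_1,u_2),L^*\le \min (l_1,l_2)\big )\right] \\&\quad -E\left[ f_{t_1,u_1,l_1}\right] E\left[ f_{t_2,u_2,l_2}\right] \\&=N\big (\min (t_1,t_2),\min (u_1,u_2),\min (l_1,l_2)\big )-N(t_1,u_1,l_1)N(t_2,u_2,l_2)\,. \end{aligned}$$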

Let \(\Lambda _C(s)\) be the cumulative hazard of C, and recall that C is randomly censored by \(T^*+U^*-L^*\). Let \(\pi (s)=P(\widetilde{T}^*+\widetilde{U}^*-L^*\ge s)\) be the probability of being at risk at residual time s. Write

$$\begin{aligned} \nu (\widetilde{T}^*,\widetilde{U}^*,F^*,L^*,s)&\equiv -S_C(s)\left[ \frac{(1-F^*)I(\widetilde{T}^*+\widetilde{U}^*-L^*\le s)}{\pi (\widetilde{T}^*+\widetilde{U}^*-L^*)}\right. \nonumber \\&\quad \left. -\int _0^s\frac{I(\widetilde{T}^*+\widetilde{U}^*-L^*\ge u)}{\pi (u)}d\Lambda _C(u)\right] \,. \end{aligned}$$

By Kosorok (2008, Chap. 4.3), \(\sqrt{n}(\widehat{S}_C-S_C)=\sqrt{n}(\mathbb {P}_n-P)\nu +o_p(1)\). Note that the random process \(\nu \), as a process in \(s\in [0,\tau ]\), is P-Donsker by Corollary 9.32 combined with Lemma 4.1 of Kosorok (2008). Hence, \(\sqrt{n}(\widehat{S}_C-S_C)\rightsquigarrow \mathbb {G}_2\) where \(\mathbb {G}_2\) is a Brownian bridge on \(\ell ^{\infty }([0,\tau ])\) with covariance

$$\begin{aligned} \mathrm {Cov}\left( \mathbb {G}_2(s_1),\mathbb {G}_2(s_2)\right) =E[\nu (s_1)\nu (s_2)]\,. \end{aligned}$$

For \(\mathbb {G}_3\) and \(\mathbb {G}_4\) we use results from Wang (1991). Let

$$\begin{aligned} K(s)&\equiv P(\widetilde{T}^*+\widetilde{U}^*\le s, F^*=1)\\ R(s)&\equiv P(L^*\le s\le \widetilde{T}^*+\widetilde{U}^*)\\ F_{L^*}(s)&\equiv P(L^*\le s)\\ \xi (\widetilde{T}^*,\widetilde{U}^*,L^*,F^*,s)&\equiv -S_{T+U}(s)\left[ \frac{I(\widetilde{T}^*+\widetilde{U}^*\le s)F^*}{R(s)}\right. \\&\qquad +\left. \int _0^s\frac{I(\widetilde{T}^*+\widetilde{U}^*\le u)F^*}{R(u)^2}dR(u)\right. \nonumber \\&\quad \quad \left. -\int _0^s\frac{I(L^*\le u \le \widetilde{T}^*+\widetilde{U}^*)}{R(u)^2}dK(u) \right] \nonumber \\ \psi (\widetilde{T}^*,\widetilde{U}^*,L^*,F^*,s)&\equiv \int \frac{1}{S_{T+U}(u)^2}\xi (\widetilde{T}^*,\widetilde{U}^*,L^*,F^*,u)\nonumber \\&\quad \times \big (F_L(s)-I(u\le s)\big )dF_{L^{*}}(u) \\ \zeta (L^*,s)&\equiv \frac{ I(L^*\le s)-F_L(s)}{S_{T+U}(L^*)}\\ \vartheta (\widetilde{T}^*,\widetilde{U}^*,L^*,F^*,s)&\equiv \beta \left( \psi (\widetilde{T}^*,\widetilde{U}^*,L^*,F^*,s)+\zeta (L^*,s)\right) \end{aligned}$$

where \(\beta \equiv P(T+U\ge L)\). By Wang (1991, Sect. 4), \(\sqrt{n}(\widehat{S}_{T+U}-S_{T+U})=\sqrt{n}(\mathbb {P}_n-P)\xi +o_p(1)\) and \(\sqrt{n}(\widehat{F}_L-F_L)=\sqrt{n}(\mathbb {P}_n-P)\vartheta +o_p(1) \). Note that the random processes \(\xi \) and \(\psi \), as processes in \(s\in [0,\tau ]\), are P-Donsker by Lemma 4.1 of Kosorok (2008). The process \(\zeta \) is also P-Donsker by the boundedness of \(S_{T+U}\) on \([0,\tau ]\), and therefore so is the sum \(\psi +\zeta \) and hence \(\vartheta \). Hence, \(\sqrt{n}(\widehat{S}_{T+U}-S_{T+U})\rightsquigarrow \mathbb {G}_3\) where \(\mathbb {G}_3\) is a Brownian bridge on \(\ell ^{\infty }([0,\tau ])\) with covariance given in Wang (1991, Lemma 4.1). Similarly, \(\sqrt{n}(\widehat{F}_L-F_L)\rightsquigarrow \mathbb {G}_4\) where \(\mathbb {G}_4\) is a Brownian bridge on \(\ell ^{\infty }([0,\tau ])\) with covariance given in Wang (1991, Theorem 4.1).

Since all four Gaussian processes are tight, by Lemmas 7.12 and 7.14 of Kosorok (2008), the joint process \((\mathbb {G}_1,\ldots ,\mathbb {G}_4)^T\) is also tight in \(\ell ^{\infty }([0,\tau ]^3)\times (\ell ^{\infty }[0,\tau ])^3\). By the Cramér-Wold device it is also zero-mean Gaussian with covariance

$$\begin{aligned}&\mathrm {Cov}\left( \mathbb {G}_1(t,u,l),\mathbb {G}_2(s)\right) =E[f_{t,u,l}\nu (s)],\quad \mathrm {Cov}\left( \mathbb {G}_1(t,u,l),\mathbb {G}_3(s)\right) =E[f_{t,u,l}\xi (s)]\,,\\&\mathrm {Cov}\left( \mathbb {G}_1(t,u,l),\mathbb {G}_4(s)\right) =E[f_{t,u,l}\vartheta (s)],\quad \mathrm {Cov}\left( \mathbb {G}_2(s),\mathbb {G}_3(t)\right) =E[\nu (s)\xi (t)]\,,\\&\mathrm {Cov}\left( \mathbb {G}_2(s),\mathbb {G}_4(t)\right) =E[\nu (s)\vartheta (t)],\quad \mathrm {Cov}\left( \mathbb {G}_3(s),\mathbb {G}_4(t)\right) =E[\xi (s)\vartheta (t)]\,, \end{aligned}$$

where the expectation is taken with respect to the random variables \(\widetilde{T}^*,\widetilde{U}^*,L^*,F^*\). Since all the function classes are Donsker, by Theorem 2.6 of Kosorok (2008), the bootstrap version in (11) also holds.

1.2 Hadamard differentiability

Definition 2

Let \(\mathbb {D}\) and \(\mathbb {E}\) be normed spaces. Then \(\phi :\mathbb {D}\mapsto \mathbb {E}\) is Hadamard differentiable at \(A\in \mathbb {D}\) if there exists a linear and continuous function \(\phi _A':\mathbb {D}\mapsto \mathbb {E}\) such that

$$\begin{aligned} \frac{\phi (A +h_n a_n)-\phi (A)}{h_n}- \phi _A'(a)\rightarrow 0\,, \end{aligned}$$

for all converging sequences \(h_n\rightarrow 0\) and \(a_n\rightarrow a\) with \(h_n\in \mathbb {R}\), \(a_n\in \mathbb {D}\) and \(A+h_na_n\in \mathbb {D}\) (Kosorok 2008, Sect. 2.2.4).

Definition 3

The space \(D[0,\tau ]\) is the space of all cadlag functions (right continuous functions with left limits) from \([0,\tau ]\) to \(\mathbb {R}\) equipped with the sup-norm. Denote by \(BV_M[0,\tau ]\) the space of all functions with total variation bounded by M, that is, all the functions \(A\in D[0,\tau ]\) such that \(\int _0^\tau |dA(t)|\equiv |A(0)|+\int _{(0,\tau ]}|dA(t)| <M\) (see Kosorok 2008, Sect. 12.2.2). Finally, the space \(D([0,\tau ]^p)\) is the space of all cadlag p-variate functions equipped with the sup-norm (see Neuhaus 1971, for details).

Lemma 2

Let \(D_1\equiv \{f\in D[0,\tau ]\, :\, \inf _t|f(t)|>0\} \), \(D_2\equiv BV_M[0,\tau ]\times BV_M[0,\tau ]\), \(D_3\equiv D([0,\tau ]^3)\times D([0,\tau ]^3)\). Then

  (i) The function

    $$\begin{aligned}&H_1: D[0,\tau ]\times D[0,\tau ]\mapsto D([0,\tau ]^3)\quad ;\\&\quad H_1(\, (A,B)\,)(t,u,l)=A(t+u)B(t+u-l) \end{aligned}$$

    is Hadamard differentiable with derivative

    $$\begin{aligned} H_{1,(A,B)}'(a,b)(t,u,l)&=A(t+u)b(t+u-l)+a(t+u)B(t+u-l)\\&=H_1(A,b)(t,u,l)+H_1(a,B)(t,u,l)\,. \end{aligned}$$

  (ii) The function

    $$\begin{aligned} H_2: D_1\mapsto D[0,\tau ] \quad ; \quad H_2(A)(t)=\frac{1}{A(t)} \end{aligned}$$

    is Hadamard differentiable with derivative \(H_{2,(A)}'(a)(t)=-a(t)/A(t)^2\).

  (iii) Let \(C(t_0,u_0)=\{(t,u,l)\in [0,\tau ]^3\,:\, l\le t+u\le \tau ,t\le t_0,u\le u_0\}\). The function

    $$\begin{aligned} H_3: D_3\mapsto D([0,\tau ]^2)\quad ;\quad H_3(\, (A,B)\, )(t_0,u_0)=\int _{C(t_0,u_0)} A(t,u,l)dB(t,u,l) \end{aligned}$$

    is Hadamard differentiable with derivative

    $$\begin{aligned} H_{3,(A,B)}'(a,b)(t_0,u_0)&=\int _{C(t_0,u_0)} A(t,u,l)db(t,u,l)\\&\quad +\int _{C(t_0,u_0)} a(t,u,l)dB(t,u,l)\,. \end{aligned}$$

  (iv) The function

    $$\begin{aligned} H_4: D_2\mapsto D([0,\tau ]^3)\quad ;\quad H_4(\, (A,B)\, )(t,u,l)=\int _{(0,t+u]} A(t+u-s)dB(s) \end{aligned}$$

    is Hadamard differentiable with derivative

    $$\begin{aligned} H_{4,(A,B)}'(a,b)(t,u,l)&=\int _{(0,t+u]} A(t+u-s)db(s)+\int _{(0,t+u]} a(t+u-s)dB(s)\\&= H_4(A,b)(t,u,l)+H_4(a,B)(t,u,l)\,. \end{aligned}$$

Proof

For the proofs of (i) and (iv), let \(h_n\rightarrow 0\) and \((a_n,b_n)\rightarrow (a,b)\) in the appropriate space.

  (i) Write

    $$\begin{aligned}&\frac{H_1(A+h_na_n,B+h_nb_n)(t,u,l)-H_1(A,B)(t,u,l)}{h_n}-H_{1,(A,B)}'(a,b)(t,u,l)\\&\quad =A(t+u)b_n(t+u-l)+a_n(t+u)B(t+u-l)+h_na_n(t+u)b_n(t+u-l)\\&\quad \quad -\left\{ A(t+u)b(t+u-l)+a(t+u)B(t+u-l)\right\} \rightarrow 0\,, \end{aligned}$$

    uniformly in \((t,u,l)\), since \(a_n\rightarrow a\) and \(b_n\rightarrow b\) in the sup-norm and A and B are bounded.

  (ii) The proof appears in Kosorok (2008, Sect. 2.2.4).

  (iii) The proof appears in Gill et al. (1995, Sect. 2, Illustration 1).

  (iv) Write

    $$\begin{aligned}&\frac{H_4( A+h_na_n,B+h_nb_n)(t,u,l)-H_4(A,B)(t,u,l)}{h_n}-H_{4,(A,B)}'(a,b)(t,u,l)\\&\quad =h_n^{-1}\left( \int _0^{t+u} (A+h_na_n)(t+u-s)d(B+h_nb_n)(s)\right. \\&\quad \quad \left. -\int _0^{t+u} A(t+u-s)dB(s)\right) \\&\quad \quad -\left( \int _0^{t+u} A(t+u-s)db(s)+\int _0^{t+u} a(t+u-s)dB(s)\right) \\&\quad =\int _0^{t+u}A(t+u-s)d(b_n-b)(s)+\int _0^{t+u} (a_n-a)(t+u-s)dB(s)\\&\quad \quad +h_n\int _0^{t+u}a_n(t+u-s)db_n(s)\,. \end{aligned}$$

    The first term in the above equation goes to zero using the same arguments as in the proof of Lemma 12.3 of Kosorok (2008, p. 242). The second term goes to zero since \(a_n\rightarrow a\) and B is bounded in total variation. The third term goes to zero since \(h_n\rightarrow 0\) and \(a_n\) and \(b_n\) are bounded, which completes the proof.
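Definition 2 can be checked numerically for the map \(H_2\) of Lemma 2(ii). The sketch below is only a sanity check on a finite grid, not a proof: the functions `A`, `a`, and the perturbation used to build \(a_n\) are hypothetical choices, with `A` bounded away from zero so that it lies in \(D_1\).

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)   # discretization of [0, tau] with tau = 1
A = 1.0 + grid**2                   # inf |A| = 1 > 0, so A belongs to D_1
a = np.cos(3 * grid)                # limiting direction a

def H2(values):
    """H_2(A)(t) = 1 / A(t), evaluated pointwise on the grid."""
    return 1.0 / values

def H2_derivative(A_vals, a_vals):
    """Claimed Hadamard derivative: -a(t) / A(t)^2."""
    return -a_vals / A_vals**2

def sup_error(h):
    """Sup-norm gap between the difference quotient and the derivative."""
    a_n = a + h * np.sin(grid)      # a_n -> a in sup-norm as h -> 0
    quotient = (H2(A + h * a_n) - H2(A)) / h
    return np.max(np.abs(quotient - H2_derivative(A, a)))

errors = [sup_error(h) for h in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The printed sup-norm errors shrink roughly in proportion to \(h_n\), which is the behaviour Definition 2 requires.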

1.3 Computation of the asymptotic variance

Proof of Theorem 1

Let \(C(t_0,u_0)=\{(t,u,l)\in [0,\tau ]^3\,:\, l\le t+u\le \tau ,t\le t_0, u\le u_0\}\). Note that

$$\begin{aligned} G_{T,U}(t_0,u_0)=&\int _0^{t_0}\int _0^{u_0}{I(t+u\le \tau )G_{T,U}(dt,du)}\\ =&\int _0^{t_0}\int _0^{u_0}F_L(t+u) \frac{I(t+u\le \tau )G_{T,U}(dt,du)}{F_L(t+u)} \\ =&\int _0^{t_0}\int _0^{u_0}\int _0^{t+u} \frac{S_C(t+u-l)I(t+u\le \tau )F_L(dl)G_{T,U}(dt,du)}{F_L(t+u)S_C(t+u-l)} \\ =&\,\beta \int _{C(t_0,u_0)} \frac{dN(t,u,l)}{F_L(t+u)S_C(t+u-l)}\,. \end{aligned}$$

Therefore we can write \(G_{T,U}=\beta \phi (N,S_C,F_L)\), where (compare to (3))

$$\begin{aligned} \phi (N,S_C,F_L)(t_0,u_0)\equiv \int _{C(t_0,u_0)} \frac{dN(t,u,l)}{F_L(t+u)S_C(t+u-l)}\,. \end{aligned}$$

The function \(\phi (N,S_C,F_L)\) can be decomposed as a sequence of the following mappings:

$$\begin{aligned} (N,S_C,F_L)\mapsto (N,H_1(F_L,S_C))&\mapsto \left( N,H_2(H_1(F_L,S_C))\right) \\&\mapsto H_3(H_2(H_1(F_L,S_C)),N) \,, \end{aligned}$$

where \(H_1\), \(H_2\), and \(H_3\) are defined in Lemma 2. By (3), using the same mapping for \(t_0\le \tau \), \(u_0\le \tau \),

$$\begin{aligned} \widehat{G}^{NPE-1}_{T,U}(t_0,u_0)=\beta _n\phi (\widehat{N},\widehat{S}_C,\widehat{F}_L)=\frac{\beta _n}{n}\sum _{i=1}^n \frac{F^*_iI(\widetilde{T}^*_i\le t_0,\widetilde{U}^*_i\le u_0,L_i^*\le \tau )}{\widehat{F}_L(\widetilde{T}^*_i+\widetilde{U}^*_i)\widehat{S}_C(\widetilde{T}^*_i+\widetilde{U}^*_i-L^*_i)}\,, \end{aligned}$$

where

$$\begin{aligned} \beta _n&\equiv \Big \{\frac{1}{n}\sum _{j=1}^n\widehat{S}_{T+U}^{-1}(L^*_j)\Big \}^{-1}\,. \end{aligned}$$
(12)
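The plug-in form of \(\widehat{G}^{NPE-1}_{T,U}\) together with (12) translates directly into code. The following is a minimal sketch that takes the estimators \(\widehat{F}_L\), \(\widehat{S}_C\) and \(\widehat{S}_{T+U}\) as given vectorized callables; the function names, the toy data, and the parametric stand-ins used for the three plug-ins are all hypothetical and serve only to make the example run.

```python
import numpy as np

def beta_n(L_obs, S_TU_hat):
    """beta_n = { n^{-1} * sum_j 1 / S_TU_hat(L_j) }^{-1}, as in (12)."""
    return 1.0 / np.mean(1.0 / S_TU_hat(L_obs))

def g_npe1(t0, u0, T_obs, U_obs, L_obs, F_ind,
           F_L_hat, S_C_hat, S_TU_hat, tau=np.inf):
    """Inverse-probability-weighted estimate of G_{T,U}(t0, u0)."""
    total = T_obs + U_obs                       # T-tilde* + U-tilde*
    keep = (T_obs <= t0) & (U_obs <= u0) & (L_obs <= tau)
    # Weight 1 / {F_L(T+U) * S_C(T+U-L)} for uncensored subjects only.
    weights = F_ind / (F_L_hat(total) * S_C_hat(total - L_obs))
    return beta_n(L_obs, S_TU_hat) * np.mean(weights * keep)

# Toy data (illustrative values only); note L_i <= T_i + U_i for all i.
T_obs = np.array([1.0, 2.0, 0.5, 3.0])
U_obs = np.array([0.5, 1.0, 2.0, 0.2])
L_obs = np.array([0.3, 1.5, 1.0, 2.5])
F_ind = np.array([1, 0, 1, 1])

# Hypothetical stand-ins for the nonparametric plug-in estimators.
F_L_hat = lambda x: np.clip(x / 4.0, 0.0, 1.0)
S_C_hat = lambda x: np.exp(-0.1 * x)
S_TU_hat = lambda x: np.exp(-0.5 * x)

estimate = g_npe1(2.0, 1.0, T_obs, U_obs, L_obs, F_ind,
                  F_L_hat, S_C_hat, S_TU_hat)
print(estimate)
```

In an actual analysis the three callables would be replaced by the step-function estimators discussed in the main text; the weighting logic itself would not change.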

The derivative of the map \(\phi \) at \((N,S_C,F_L)\), \(\phi _{(N,S_C,F_L)}'(a,b,c)\) for \((a,b,c)\) in \(D([0,\tau ]^3)\times D[0,\tau ]\times D[0,\tau ] \), can be obtained using the chain rule for Hadamard differentiable functions (Kosorok 2008, Lemma 6.19):

$$\begin{aligned}&(a,b,c)\mapsto (a,H_1(F_L,b)+H_1(c, S_C))\mapsto \left( a,-\frac{H_1(F_L,b)+H_1(c,S_C)}{H_1(F_L,S_C)^2}\right) \\&\quad \mapsto \int _{C(t_0,u_0)}\frac{da(t,u,l)}{H_1(F_L,S_C)(t,u,l)}\\&\quad \quad - \int _{C(t_0,u_0)}\left( \frac{H_1(F_L,b)+H_1(c,S_C)}{H_1(F_L,S_C)^2}\right) (t,u,l)dN(t,u,l)\\&\quad =\int _{C(t_0,u_0)}\frac{da(t,u,l)}{F_L(t+u)S_C(t+u-l)}\\&\quad \quad - \int _{C(t_0,u_0)}\left( \frac{F_L(t+u)b(t+u-l)+c(t+u)S_C(t+u-l)}{\left( F_L(t+u)S_C(t+u-l)\right) ^2}\right) dN(t,u,l)\,. \end{aligned}$$

By the functional delta method (Kosorok 2008, Theorem 2.8), together with Slutsky’s Theorem (Kosorok 2008, Theorem 7.15) applied to the multiplication by \(\beta _n\rightarrow \beta \),

$$\begin{aligned} \sqrt{n}\left( \widehat{G}^{NPE-1}_{T,U}(t,u)-G_{T,U}(t,u)\right) \rightsquigarrow \beta \phi _{(N,S_C,F_L)}'(\mathbb {G}_1,\mathbb {G}_2,\mathbb {G}_4)\,. \end{aligned}$$

By the functional delta method for bootstrap processes (Kosorok 2008, Theorem 12.1), we also have

$$\begin{aligned} \sqrt{n}\left( \widehat{G}^{NPE-1(b)}_{T,U}(t,u)-\widehat{G}^{NPE-1}_{T,U}(t,u)\right) \underset{\text {M}}{\overset{\text {P}}{\rightsquigarrow }}\beta \phi _{(N,S_C,F_L)}'(\mathbb {G}_1,\mathbb {G}_2,\mathbb {G}_4)\,. \end{aligned}$$

This completes the proof for the first estimator. For the second estimator, note the relation \(F_{L^*}(dl)=F_L(dl)S_{T+U}(l)/\int _0^\infty F_L(dy)S_{T+U}(y)\) that follows from the model \(L^* \sim L\mid L\le T+U\), where L and \(T+U\) are independent. Therefore, the inverse formula is \(F_L(dl)=F_{L^*}(dl)S^{-1}_{T+U}(l)/\int _0^\infty F_{L^*}(dy)S^{-1}_{T+U}(y)\). Let \(\beta =\int _0^\infty F_L(dy)S_{T+U}(y) = P(L\le T+U)\) as before and \(\gamma =\int _0^\infty F_{L^*}(dy)S^{-1}_{T+U}(y)\). Let \(C_1(t_0,u_0)=\{(t,u):t+u\le \tau , t\le t_0,u\le u_0\}\).
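The relation between \(F_L\) and \(F_{L^*}\) and its inverse can be verified on a small discrete example. The support points, the masses of \(F_L\), and the exponential form of \(S_{T+U}\) below are hypothetical; the check itself only uses the two displayed formulas.

```python
import numpy as np

# Hypothetical discrete truncation distribution and survival function.
support = np.array([0.5, 1.0, 2.0, 3.0])   # support points of L
f_L = np.array([0.1, 0.2, 0.3, 0.4])       # masses F_L(dl)
S_TU = np.exp(-0.5 * support)              # S_{T+U}(l) at the support points

# Forward relation: F_{L*}(dl) = F_L(dl) S_{T+U}(l) / beta.
beta = np.sum(f_L * S_TU)                  # beta = P(L <= T+U)
f_L_star = f_L * S_TU / beta

# Inverse formula: F_L(dl) = F_{L*}(dl) / S_{T+U}(l) / gamma.
gamma = np.sum(f_L_star / S_TU)
f_L_back = (f_L_star / S_TU) / gamma

print(f_L_back, beta * gamma)
```

The round trip recovers \(F_L\), and the product \(\beta \gamma \) equals one. This identity \(\gamma =\beta ^{-1}\) is what makes \(\beta _n\) in (12), the reciprocal of the empirical version of \(\gamma \), a natural estimator of \(\beta \).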

Note that

$$\begin{aligned} G_{T,U}(t_0,u_0)=&\int _{C_1(t_0,u_0)} G_{T,U}(dt,du) \\ =&\int _{C_1(t_0,u_0)} \frac{G_{T,U}(dt,du) \int _{l=0}^{t+u}S_C(t+u-l)\gamma ^{-1} F_{L^*}(dl)S^{-1}_{T+U}(l) }{\int _{s=0}^{t+u}S^{-1}_{T+U}(s)S_C(t+u-s)\gamma ^{-1}F_{L^*}(ds)}\\ =&\int _{C_1(t_0,u_0)}\int _{0\le l\le t+u} \frac{S_C(t+u-l)F_L(dl)G_{T,U}(dt,du)}{\int _{s=0}^{t+u}S_C(t+u-s)F_{L}(ds)}\\ =&\,\beta \int _{C(t_0,u_0)} \frac{dN(t,u,l)}{\int _{s=0}^{t+u}S_C(t+u-s)F_{L}(ds)}\,, \end{aligned}$$

where the first equality follows from the definition of the density \(G_{T,U}(dt,du)\); the second equality follows by multiplying and dividing by \(\gamma ^{-1} \int _0^{t+u}S_C(t+u-l)F_{L^*}(dl)S_{T+U}^{-1}(l)\); the third equality follows from the inverse formula that relates \(F_{L^*}\) and \(F_L\) above; and the last equality follows from the definition of N in (10).

Write \(G_{T,U}(t_0,u_0)=\beta \psi (N,S_C,F_{L})(t_0,u_0)\), where (compare to (6))

$$\begin{aligned} \psi (N,S_C,F_L)(t_0,u_0)\equiv \int _{C(t_0,u_0)} \frac{dN(t,u,l)}{\int _{s=0}^{t+u}S_C(t+u-s)dF_{L}(s)}\,. \end{aligned}$$

The function \(\psi (N,S_C,F_L)\) can be decomposed as a sequence of the following mappings:

$$\begin{aligned} (N,S_C,F_{L})&\mapsto (N,H_4(S_C,F_{L}))\mapsto \left( N,H_2(H_4(S_C,F_L))\right) \\&\mapsto H_3(H_2(H_4(S_C,F_L)),N)\,, \end{aligned}$$

where \(H_2\), \(H_3\), and \(H_4\) are defined in Lemma 2. By (6), using the same mappings

$$\begin{aligned} \widehat{G}^{NPE-2}_{T,U}(t_0,u_0)=\beta _n \psi (\widehat{N},\widehat{S}_C,\widehat{F}_{L})=\frac{\beta _n}{n}\sum _{i=1}^n \frac{F^*_iI(\widetilde{T}^*_i\le t_0,\widetilde{U}^*_i\le u_0)}{\int _{s=0}^{\widetilde{T}_i^*+\widetilde{U}_i^*}\widehat{S}_C(\widetilde{T}^*_i+\widetilde{U}^*_i-s)d\widehat{F}_{L}(s)} \end{aligned}$$

where \(\beta _n\) is defined in (12). The derivative of the map \(\psi \) at \((N,S_C,F_{L})\), \(\psi _{(N,S_C,F_{L})}'(a,b,c)\) for \((a,b,c)\) in \(D([0,\tau ]^3)\times (D[0,\tau ])^2 \) can be obtained using the chain rule for Hadamard differentiable functions (Kosorok 2008, Lemma 6.19).

$$\begin{aligned}&(a,b,c)\mapsto \left( a,H_4(S_C,c)+H_4(b,F_{L})\right) \mapsto \left( a,-\frac{H_4(S_C,c)+H_4(b,F_{L})}{\left( H_4(S_C,F_{L})\right) ^2} \right) \\&\quad \mapsto \int _{C(t_0,u_0)}\frac{da(t,u,l)}{H_4(S_C,F_L)(t,u,l)}\\&\quad \quad - \int _{C(t_0,u_0)}\left( \frac{H_4(S_C,c)+H_4(b,F_{L})}{\left( H_4(S_C,F_{L})\right) ^2}\right) (t,u,l)dN(t,u,l) \\&\quad =\int _{C(t_0,u_0)}\frac{da(t,u,l)}{\int _0^{t+u}S_C(t+u-s)dF_L(s)} \\&\quad \quad - \int _{C(t_0,u_0)}\left( \frac{\int _0^{t+u}S_C(t+u-s)dc(s)+\int _0^{t+u}b(t+u-s)dF_{L}(s)}{\left( \int _0^{t+u}S_C(t+u-s)dF_{L}(s)\right) ^2}\right) dN(t,u,l)\,. \end{aligned}$$

By the functional delta method (Kosorok 2008, Theorem 2.8), together with Slutsky’s Theorem for the convergence of \(\beta _n\) to \(\beta \) (Kosorok 2008, Theorem 7.15),

$$\begin{aligned} \sqrt{n}\left( \widehat{G}^{NPE-2}_{T,U}(t,u)-G_{T,U}(t,u)\right) \rightsquigarrow \beta \psi _{(N,S_C,F_{L})}'(\mathbb {G}_1,\mathbb {G}_2,\mathbb {G}_4)\,. \end{aligned}$$

By the functional delta method for bootstrap processes (Kosorok 2008, Theorem 12.1) we also have

$$\begin{aligned} \sqrt{n}\left( \widehat{G}^{NPE-2(b)}_{T,U}(t,u)-\widehat{G}^{NPE-2}_{T,U}(t,u)\right) \underset{\text {M}}{\overset{\text {P}}{\rightsquigarrow }}\beta \psi _{(N,S_C,F_{L})}'(\mathbb {G}_1,\mathbb {G}_2,\mathbb {G}_4)\,. \end{aligned}$$

\(\square \)
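For comparison with the first estimator, the plug-in form of \(\widehat{G}^{NPE-2}_{T,U}\) given in the proof can be sketched as follows. Here \(\widehat{F}_L\) is assumed to be a step function described by its jump points and jump masses, so that the integral in the denominator reduces to a finite sum; the function names, the toy data, and the value passed for \(\beta _n\) are hypothetical.

```python
import numpy as np

def denom_npe2(x, jump_points, jump_masses, S_C_hat):
    """int_{(0,x]} S_C_hat(x - s) dF_L_hat(s) for a step function F_L_hat."""
    mask = jump_points <= x
    return np.sum(S_C_hat(x - jump_points[mask]) * jump_masses[mask])

def g_npe2(t0, u0, T_obs, U_obs, F_ind,
           jump_points, jump_masses, S_C_hat, beta_n):
    """Weighted estimate of G_{T,U}(t0, u0) with L integrated out of the weight."""
    total = T_obs + U_obs
    keep = (T_obs <= t0) & (U_obs <= u0)
    denom = np.array([denom_npe2(x, jump_points, jump_masses, S_C_hat)
                      for x in total])
    return beta_n * np.mean(F_ind * keep / denom)

# Toy data and a hypothetical step-function estimate of F_L.
T_obs = np.array([1.0, 2.0, 0.5, 3.0])
U_obs = np.array([0.5, 1.0, 2.0, 0.2])
F_ind = np.array([1, 0, 1, 1])
jump_points = np.array([0.5, 1.0, 2.0, 3.0])
jump_masses = np.array([0.1, 0.2, 0.3, 0.4])
S_C_hat = lambda x: np.exp(-0.1 * x)       # stand-in for the estimator of S_C

estimate = g_npe2(2.0, 1.0, T_obs, U_obs, F_ind,
                  jump_points, jump_masses, S_C_hat, beta_n=1.0)
print(estimate)
```

Unlike the first estimator, the weight of subject i depends on the data only through \(\widetilde{T}^*_i+\widetilde{U}^*_i\) and not on the individual truncation time \(L^*_i\). When \(\widehat{S}_C\equiv 1\) the denominator reduces to \(\widehat{F}_L(\widetilde{T}^*_i+\widetilde{U}^*_i)\), so the two estimators coincide in the absence of censoring.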

Cite this article

Vakulenko-Lagun, B., Mandel, M. & Goldberg, Y. Nonparametric estimation in the illness-death model using prevalent data. Lifetime Data Anal 23, 25–56 (2017). https://doi.org/10.1007/s10985-016-9373-0

  • DOI: https://doi.org/10.1007/s10985-016-9373-0
