EM algorithm for the additive risk mixture cure model with interval-censored data

Lifetime Data Analysis

Abstract

Interval-censored failure time data arise in a number of fields, and their analysis has recently received increasing attention. However, regression analysis of interval-censored data under the additive risk model is challenging because the likelihood is complex to maximize, especially when there is a non-ignorable cure fraction in the population. To address this problem, we develop a sieve maximum likelihood estimation approach based on Bernstein polynomials. To relieve the computational burden, we propose an expectation–maximization algorithm that exploits a Poisson data augmentation. Under some mild conditions, the asymptotic properties of the proposed estimator are established. The finite sample performance of the proposed method is evaluated by extensive simulations, and the method is further illustrated with a real data set from a smoking cessation study.

References

  • Aalen O (1980) A model for nonparametric regression analysis of counting processes. Mathematical statistics and probability theory. Springer, New York, pp 1–25

  • Banerjee S, Carlin BP (2004) Parametric spatial cure rate models for interval-censored time-to-relapse data. Biometrics 60(1):268–275

  • Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515

  • Betensky RA, Rabinowitz D, Tsiatis AA (2001) Computationally simple accelerated failure time regression for interval censored data. Biometrika 88(3):703–711

  • Bickel PJ, Kwon J (2001) Inference for semiparametric models: some questions and an answer. Stat Sin 11(4):863–886

  • Boag JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc B 11(1):15–53

  • Cox D (1972) Regression models and life-tables. J R Stat Soc B 34(2):187–220

  • Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38(4):1041–1046

  • Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42(4):845–854

  • Ghosh D (2001) Efficiency considerations in the additive hazards model with current status data. Stat Neerl 55(3):367–376

  • Györfi L, Kohler M, Krzyzak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer-Verlag, Berlin

  • Hanin L, Huang L (2014) Identifiability of cure models revisited. J Multivar Anal 130:261–274

  • Hu T, Xiang L (2013) Efficient estimation for semiparametric cure models with interval-censored data. J Multivar Anal 121:139–151

  • Hu T, Xiang L (2016) Partially linear transformation cure models for interval-censored data. Comput Stat Data Anal 93:257–269

  • Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24(2):540–568

  • Huang J, Rossini AJ (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92(439):960–967

  • Jewell NP, Laan MV (1995) Generalizations of current status data with applications. Lifetime Data Anal 1(1):101–109

  • Kim YJ, Jhun M (2008) Cure rate model with interval censored data. Stat Med 27(1):3–14

  • Li C, Taylor JMG, Sy JP (2001) Identifiability of cure models. Stat Prob Lett 54(4):389–395

  • Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81(1):61–71

  • Lin DY, Oakes D, Ying Z (1998) Additive hazards regression with current status data. Biometrika 85(2):289–298

  • Liu H, Shen Y (2009) A semiparametric regression cure model for interval-censored data. J Am Stat Assoc 104(487):1168–1178

  • Liu Y, Hu T, Sun J (2017) Regression analysis of current status data in the presence of a cured subgroup and dependent censoring. Lifetime Data Anal 23(4):626–650

  • Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Company, New York

  • Louis T (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc B 44(2):226–233

  • Lu W (2010) Efficient estimation for an accelerated failure time model with a cure fraction. Stat Sin 20:661–674

  • Ma S (2010) Mixed case interval censored data with a cured subgroup. Stat Sin 20:1165–1181

  • Ma S (2011) Additive risk model for current status data with a cured subgroup. Ann Inst Stat Math 63(1):117–134

  • Mao M, Wang JL (2010) Semiparametric efficient estimation for a class of generalized proportional odds cure models. J Am Stat Assoc 105(489):302–311

  • Martinussen T, Scheike TH (2002) Efficient estimation in additive hazards regression with current status data. Biometrika 89(3):649–658

  • McMahan CS, Wang L, Tebbs JM (2013) Regression analysis for current status data using the EM algorithm. Stat Med 32(25):4452–4466

  • Murray RP, Anthonisen NR, Connett JE, Wise RA, Lindgren PG, Greene PG, Nides MA (1998) Effects of multiple attempts to quit smoking and relapses to smoking on pulmonary function. J Clin Epidemiol 51(12):1317–1326

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313

  • Osman M, Ghosh SK (2012) Nonparametric regression models for right-censored data using Bernstein polynomials. Comput Stat Data Anal 56(3):559–573

  • Peng Y, Dear KB (2000) A nonparametric mixture model for cure rate estimation. Biometrics 56(1):237–243

  • Pollard D (1984) Convergence of stochastic processes. Springer, New York

  • Pollard D (1990) Empirical processes: theory and applications. In: NSF-CBMS regional conference series in probability and statistics, pp 1–86. Institute of Mathematical Statistics and the American Statistical Association

  • Rossini AJ, Tsiatis AA (1996) A semiparametric proportional odds regression model for the analysis of current status data. J Am Stat Assoc 91(434):713–721

  • Shen X (1997) On methods of sieves and penalization. Ann Stat 25(6):2555–2591

  • Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22(2):580–615

  • Shen Y, Cheng SC (1999) Confidence bands for cumulative incidence curves under the additive risk model. Biometrics 55(4):1093

  • Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York

  • Sy JP, Taylor JM (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56(1):227–236

  • Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • Wang L, Sun J, Tong X (2010) Regression analysis of case II interval-censored failure time data with the additive hazards model. Stat Sin 20:1709–1723

  • Wang L, McMahan CS, Hudgens MG, Qureshi ZP (2016) A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 71(1):222–231

  • Wu Y, Chambers CD, Xu R (2019) Semiparametric sieve maximum likelihood estimation under cure model with partly interval censored and left truncated data for application to spontaneous abortion. Lifetime Data Anal 25:507–528

  • Xue H, Lam KF, Li G (2004) Sieve maximum likelihood estimator for semiparametric regression models with current status data. J Am Stat Assoc 99(466):346–356

  • Yu B, Peng Y (2008) Mixture cure models for multivariate survival data. Comput Stat Data Anal 52(3):1524–1532

  • Zeng D, Cai J, Shen Y (2006a) Semiparametric additive risks model for interval-censored data. Stat Sin 16:287–302

  • Zeng D, Yin G, Ibrahim JG (2006b) Semiparametric transformation models for survival data with a cure fraction. J Am Stat Assoc 101(474):670–684

  • Zhang J, Peng Y (2009) Accelerated hazards mixture cure model. Lifetime Data Anal 15(4):455–467

  • Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand J Stat 37(2):338–354

  • Zhou J, Zhang J, Lu W (2017) An expectation maximization algorithm for fitting the generalized odds-rate model to interval censored data. Stat Med 36(7):1157–1171


Acknowledgements

The authors would like to thank the Editor-in-Chief, the Associate Editor and two reviewers for their constructive comments and helpful suggestions that greatly improved the paper. Funding was provided by National Natural Science Foundation of China (Grant No. 11471065).

Author information

Corresponding author

Correspondence to Xiaoguang Wang.

Appendix

In this appendix, we present the detailed proofs of Theorems 1–3 and the calculation of \(I(\widehat{\pmb {\theta }})\). Our proofs rely heavily on results from empirical process theory. To establish the consistency of our estimators, we need the following lemma, which plays an important role in the proof of Theorem 1. Lemma 1 bounds the covering number of size \(\epsilon \) needed to cover \(\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}\).

Whether a given class \(\mathcal {L}\) is a Glivenko–Cantelli (GC) or Donsker class depends on the size of the class. A relatively simple way to measure the size of a class \(\mathcal {L}\) is to use entropy numbers. The existing results on known GC or Donsker classes are mostly stated in the parametric or nonparametric framework, while in the semiparametric framework the size of the class \(\mathcal {L}\) is reflected more clearly by the entropy number. Therefore, a technical analysis of the entropy numbers is necessary.

Lemma 1

The covering number of the function class \(\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}\) satisfies

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\le CM_n^{(N+1)}\epsilon ^{-(N+p+q+1)}, \end{aligned}$$

where the degree N of Bernstein polynomials satisfies \(N=o(n^\nu )\) with \(0<\nu <1\) and the size of the sieve space \(\varvec{\Theta }_n\) is controlled by \(M_n=O(n^a)\) with a constant \(a>0\).

Proof of Lemma 1 By a Taylor series expansion, for any \(\pmb {\theta }_1=\big (\pmb {\alpha }_{1},\pmb {\beta }_{1},\Lambda _{1}(\cdot )\big ),\pmb {\theta }_2=\big (\pmb {\alpha }_{2},\pmb {\beta }_{2},\Lambda _{2}(\cdot )\big )\in \varvec{\Theta }_n\), there exists a large enough constant C such that

$$\begin{aligned} \mid l(\pmb {\theta }_1;\text{ O})-l(\pmb {\theta }_2;\text{ O})\mid \le C\bigl (\Vert \pmb {\alpha }_1-\pmb {\alpha }_2\Vert +\Vert \pmb {\beta }_1-\pmb {\beta }_2\Vert +\Vert \Lambda _{1}(\cdot )-\Lambda _{2}(\cdot )\Vert _{\infty }\bigr ). \end{aligned}$$

For any \(\Lambda _1(\cdot ),\Lambda _2(\cdot )\in \mathcal {M}_n,\) let \(\pmb {\gamma }_1=(\gamma _{0,1},\ldots ,\gamma _{N,1})^{\intercal }, \pmb {\gamma }_2=(\gamma _{0,2},\ldots ,\gamma _{N,2})^{\intercal }\) be their Bernstein polynomial coefficient vectors, respectively. Then, we have

$$\begin{aligned} \Vert \Lambda _{1}(\cdot )-\Lambda _{2}(\cdot )\Vert _{\infty }= & {} \sup _t\mid \sum _{l=0}^N\gamma _{l,1}b_{l,N}(t)-\sum _{l=0}^N\gamma _{l,2}b_{l,N}(t) \mid \le \max _{0\le l\le N}\mid \gamma _{l,1}-\gamma _{l,2}\mid \\= & {} \Vert \pmb {\gamma }_1-\pmb {\gamma }_2 \Vert _{\infty }. \end{aligned}$$

Therefore one can write that

$$\begin{aligned} \mid l(\pmb {\theta }_1;\text{ O})-l(\pmb {\theta }_2;\text{ O})\mid \le C(\Vert \pmb {\alpha }_1-\pmb {\alpha }_2\Vert +\Vert \pmb {\beta }_1-\pmb {\beta }_2\Vert +\Vert \pmb {\gamma }_1-\pmb {\gamma }_2 \Vert _{\infty }). \end{aligned}$$
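The step from the sup-norm of \(\Lambda _1-\Lambda _2\) to the max-norm of the coefficient difference rests on the Bernstein basis functions being non-negative and summing to one. The following is a minimal numerical sketch of that fact, not part of the original proof. It assumes the time axis has been rescaled to \([0,1]\) and uses the standard basis \(b_{l,N}(t)=\binom{N}{l}t^l(1-t)^{N-l}\); the function names and the two coefficient vectors are hypothetical and chosen only for illustration.

```python
from math import comb

def bernstein_basis(l, N, t):
    """Standard Bernstein basis b_{l,N}(t) on [0, 1] (assumed rescaled support)."""
    return comb(N, l) * t**l * (1.0 - t)**(N - l)

def bernstein_poly(gamma, t):
    """Evaluate sum_l gamma_l * b_{l,N}(t) with N = len(gamma) - 1."""
    N = len(gamma) - 1
    return sum(g * bernstein_basis(l, N, t) for l, g in enumerate(gamma))

def sup_diff(gamma1, gamma2, grid_size=1001):
    """Grid approximation of sup_t |Lambda_1(t) - Lambda_2(t)| on [0, 1]."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return max(abs(bernstein_poly(gamma1, t) - bernstein_poly(gamma2, t)) for t in grid)

# Illustrative (made-up) coefficient vectors of degree N = 4.
gamma1 = [0.0, 0.3, 0.9, 1.4, 2.0]
gamma2 = [0.1, 0.2, 1.1, 1.3, 2.4]

lhs = sup_diff(gamma1, gamma2)                          # sup-norm of the difference
rhs = max(abs(a - b) for a, b in zip(gamma1, gamma2))   # max-norm of coefficient difference
partition = sum(bernstein_basis(l, 4, 0.37) for l in range(5))  # basis sums to one
print(lhs <= rhs + 1e-12, abs(partition - 1.0) < 1e-12)
```

Because the difference of the two polynomials is a convex combination of the coefficient differences at every \(t\), the grid supremum can never exceed the largest coefficient difference, whatever vectors are used.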

Let \(\mathcal {B}=\{(\pmb {\alpha }^{\intercal },\pmb {\beta }^{\intercal })^{\intercal } \in \mathbb {R}^{p+q}:\Vert \pmb {\alpha }\Vert +\Vert \pmb {\beta }\Vert \le M \}\) and \(\mathcal {C}=\{\pmb {\gamma }\in \mathbb {R}^{N+1}:\sum _{l=0}^N\mid \gamma _l\mid \le M_n \}.\) By Problem 18 in Pollard (1984, p. 40), the product of two \(\epsilon /(2C)\)-balls in \(\mathcal {B}\) and \(\mathcal {C}\), respectively, is contained in an \(\epsilon \)-ball in \(\mathcal {L}_1\), and products of balls covering \(\mathcal {B}\) with balls covering \(\mathcal {C}\) cover the product of \(\mathcal {B}\) and \(\mathcal {C}\), of which \(\mathcal {L}_1\) is a subset. Hence the covering number of \(\mathcal {L}_1\) is controlled by the covering numbers of \(\mathcal {B}\) and \(\mathcal {C}\), and we obtain

$$\begin{aligned} N(\epsilon , \mathcal {L}_1, \parallel \cdot \parallel _{\infty }) \le N(\frac{\epsilon }{2C}, \mathcal {B}, \parallel \cdot \parallel ) N(\frac{\epsilon }{2C}, \mathcal {C}, \parallel \cdot \parallel _{\infty }). \end{aligned}$$

Lemma 4.1 of Pollard (1990) gives a method for bounding the packing numbers of a set and shows that these bounds grow geometrically. Thus the packing numbers of \(\mathcal {B}\) and \(\mathcal {C}\), as defined in Pollard (1990, p. 10), satisfy

$$\begin{aligned} M(\epsilon , \mathcal {B}, \parallel \cdot \parallel ) \le \biggl (\frac{6M}{\epsilon }\biggr )^{p+q} \text { and } \ M(\epsilon , \mathcal {C}, \parallel \cdot \parallel _{\infty }) \le \biggl (\frac{6M_n}{\epsilon }\biggr )^{N+1}. \end{aligned}$$

Next, Lemma 9.2 in Györfi et al. (2002) shows that the \(L_{(p)}\) covering numbers of size \(\epsilon \) are controlled by the \(L_{(p)}\) packing numbers of size \(\epsilon \), where \(p\ge 1\). Hence one can obtain that

$$\begin{aligned} N(\epsilon , \mathcal {B}, \parallel \cdot \parallel ) \le \biggl (\frac{6M}{\epsilon }\biggr )^{p+q} \text { and } \ N(\epsilon , \mathcal {C}, \parallel \cdot \parallel _{\infty }) \le \biggl (\frac{6M_n}{\epsilon }\biggr )^{N+1}. \end{aligned}$$

Combining the above results, we have the conclusion that

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\le CM_n^{(N+1)}\epsilon ^{-(N+p+q+1)}. \end{aligned}$$

\(\square \)
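For readers who want the arithmetic behind the last step, the following is a sketch of how the two covering-number bounds combine; it uses only the displays in the proof above, evaluated at radius \(\epsilon /(2C)\):

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\le & {} N\Bigl (\frac{\epsilon }{2C}, \mathcal {B}, \parallel \cdot \parallel \Bigr ) N\Bigl (\frac{\epsilon }{2C}, \mathcal {C}, \parallel \cdot \parallel _{\infty }\Bigr ) \le \biggl (\frac{12CM}{\epsilon }\biggr )^{p+q}\biggl (\frac{12CM_n}{\epsilon }\biggr )^{N+1} \\= & {} (12C)^{N+p+q+1}M^{p+q}M_n^{N+1}\epsilon ^{-(N+p+q+1)}. \end{aligned}$$

The prefactor \((12C)^{N+p+q+1}M^{p+q}\) is what the statement of Lemma 1 writes as the generic constant C. On the log scale it contributes a term of order N, which is of smaller order than the \(n^{2\phi _1}\log n\) term that drives the exponential bound in the proof of Theorem 1, since \(N=o(n^{2\phi _1})\) there.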

Proof of Theorem 1 Adopting the notation of Pollard (1984), we write

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}P_n l=l_n(\pmb {\theta },\mathcal {O})=\frac{1}{n}\sum _{i=1}^{n}l(\pmb {\theta },\text{ O}_i),\\ &{}Pl=E_0 l(\pmb {\theta },\text{ O}), \end{array}\right. } \end{aligned}$$

where \(E_0\) denotes the expectation with respect to \(\text{ O }\) under true values of parameters.

Proving the consistency of our estimators requires the following steps.

  (a) Calculate the covering number of the function class \(\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}\) by Lemma 1.

  (b) Prove uniform convergence, i.e. \(\sup _{\mathcal {L}_1}\big |P_nl-Pl\big | \rightarrow 0\) a.s.

  (c) Prove \(d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\rightarrow 0\) a.s.

The function class \(\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}\) has an envelope G with \(Pl^2\le PG^2\le \delta ^2 \). Let \(\alpha _n=n^{-1/2+\phi _1}(\log n)^{1/2}\) with \(\frac{\nu }{2}<\phi _1<\frac{1}{2}\) such that \(\alpha _n\) is a non-increasing sequence of positive numbers. For any given \(\epsilon >0\), we choose \(\epsilon _n=\epsilon \delta ^2\alpha _n\). Then for any \(\pmb {\theta }\in \Theta _n\) and a sufficiently large n, we have

$$\begin{aligned} \frac{Var(P_nl)}{(4\epsilon _n)^2} \le \frac{\frac{1}{n}Pl^2}{16\epsilon ^2\delta ^4\alpha _n^2} \le \frac{1}{16\epsilon ^2\delta ^2n^{2\phi _1}\log n} \le \frac{1}{2}. \end{aligned}$$

Let \(P_n^{o}\) denote the signed measure that places mass \(\pm n^{-1}\) at each of the observations \(\{O_1,\ldots ,O_n\}\), with the random ± signs being decided independently of the \(O_i\)’s. Applying the symmetrization inequality (Pollard (1984), II (30)) and Hoeffding’s inequality (Pollard (1984), II (31)), it follows that

$$\begin{aligned} P\big (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n\big )\le & {} 4P\big (\sup _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n\big ) \\= & {} 4E\Bigl \{I_{ \{ \sup \limits _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n\}} \Bigr \} \\= & {} 4E \Bigl \{E\bigl (I_{ \{ \sup \limits _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n \}}|\mathcal {O}\bigr )\Bigr \} \\= & {} 4E\left\{ P(\sup _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n|\mathcal {O})\right\} \\\le & {} 4E\left\{ 2N(\epsilon _n,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\exp \left( \frac{-\frac{1}{2}n\epsilon _n^2}{\max \limits _{j}P_nl_j^2}\right) \right\} \\\le & {} 8C M_n^{N+1}\epsilon _n^{-(N+p+q+1)}\exp \Bigl (\frac{-\frac{1}{2}n\epsilon _n^2}{\max \limits _{j}P_nl_j^2}\Bigr ), \end{aligned}$$

where the maximum runs over all functions \(\{l_j\} \text { in } \mathcal {L}_1\). Using the law of total probability, we have

$$\begin{aligned}&P\Big (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n\Big ) \\&\quad =P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) \\&\qquad + P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ) P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ) \\&\quad \le P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) + P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ). \end{aligned} \tag{12}$$

Note that \(N=o(n^\nu )\), and \(\nu /2<\phi _1\), which indicate \(N=o(n^{2\phi _1})\). Now we study the first term of the inequality (12),

$$\begin{aligned}&P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) \\&\quad \le 8C M_n^{N+1} \epsilon _n^{-(N+p+q+1)} \exp \Bigl (\frac{-n\epsilon _n^2}{128\delta ^2}\Bigr ) \\&\quad \le 8C \exp \Bigl \{ \bigl ( N+1 \bigr ) a\log n - \bigl ( N+p+q+1 \bigr ) \log {(\epsilon \delta ^2\alpha _n)} - \frac{n\epsilon _n^2}{128\delta ^2} \Bigr \} \\&\quad \le 8C \exp \Bigl \{ \bigl ( N+p+q+1 \bigr ) \bigl [ (a+\frac{1}{2}-\phi _1) \log n - \log {(\epsilon \delta ^2)} \\&\qquad -\frac{1}{2}\log {\log n} \bigr ] - \frac{\epsilon ^2\delta ^2 n^{2\phi _1}\log n}{128} \Bigr \} \\&\quad \le 8C \exp \Bigl \{ n^{2\phi _1}\log n \bigl [ \frac{\bigl ( o(n^{2\phi _1})+p+q+1 \bigr ) (a+\frac{1}{2}-\phi _1)}{n^{2\phi _1}} - \frac{\epsilon ^2\delta ^2}{128} \bigr ] \Bigr \} \\&\quad \le 8C \exp \bigl \{ -\tilde{C} n^{2\phi _1}\log n \bigr \}, \end{aligned}$$

where \(\tilde{C}\) is a constant. For the second term of the inequality (12), by Lemma 33 of Pollard (1984) it follows that

$$\begin{aligned} P\left( \sup _{\mathcal {L}_1}\mid P_nl^2 \mid> 64\delta ^2\right)= & {} P\left( \sup _{\mathcal {L}_1}|P_nl^2|^{1/2}> 8\delta \right) \\\le & {} 4E \Bigl \{ N(\delta ,\mathcal {L}_1,P_n) \exp (-n\delta ^2)\wedge 1 \Big \} \\\le & {} 4E \Bigl \{ N(\delta ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty }) \exp (-n\delta ^2)\wedge 1 \Bigr \} \\\le & {} 4 \Bigl \{ C M_n^{N+1} \delta ^{-(N+p+q+1)} \exp (-n\delta ^2)\wedge 1 \Big \} \\\le & {} 4C \exp \Bigl \{ (o(n^\nu )+p+q+1) (a\log n - \log \delta ) -n\delta ^2 \Bigr \}. \end{aligned}$$

This result indicates that the second term converges to zero even faster than the first term. Therefore one can obtain that \(\sum _{n=1}^{\infty } P\big (\sup _{\mathcal {L}_1} |P_nl-Pl| > 8\epsilon _n\big ) < \infty \). By the Borel–Cantelli lemma, it follows that

$$\begin{aligned} \sup _{\mathcal {L}_1}\big |P_nl-Pl\big | \rightarrow 0, \quad a.s. \end{aligned}$$
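The behaviour of the bound on the first term of (12) can be explored numerically. The sketch below evaluates the exponent appearing in the fourth line of that chain of inequalities (without the factor \(8C\)). All parameter values are arbitrary illustrative assumptions chosen to satisfy \(\nu /2<\phi _1<1/2\); the function name is hypothetical, and \(N=\lfloor n^{\nu }\rfloor \) is used only as a stand-in for a degree of order at most \(n^{\nu }\).

```python
import math

def first_term_exponent(n, nu=0.2, phi1=0.45, a=1.0, p=2, q=2, eps=1.0, delta=1.0):
    """Exponent (N+p+q+1)[(a + 1/2 - phi1) log n - log(eps*delta^2) - (1/2) log log n]
    minus eps^2 * delta^2 * n^(2*phi1) * log(n) / 128, for illustrative parameters."""
    N = int(n ** nu)  # assumed stand-in for the Bernstein degree, N <= n^nu
    log_n = math.log(n)
    bracket = ((a + 0.5 - phi1) * log_n
               - math.log(eps * delta ** 2)
               - 0.5 * math.log(log_n))
    decay = (eps ** 2) * (delta ** 2) * n ** (2 * phi1) * log_n / 128
    return (N + p + q + 1) * bracket - decay

# Print the exponent for a few sample sizes; a negative value means the
# exponential bound is below one (up to the constant 8C).
for n in (10**2, 10**4, 10**6):
    print(n, first_term_exponent(n))
```

With these particular values the exponent is still positive at \(n=100\), so the bound is uninformative there, and it is negative and rapidly decreasing at \(n=10^4\) and \(n=10^6\). This matches the asymptotic statement that the \(n^{2\phi _1}\log n\) term eventually dominates.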

Since \(P_nl(\pmb {\theta }_0;\text{ O}) \le P_nl(\widehat{\pmb {\theta }}_n;\text{ O})\), we have

$$\begin{aligned} 0\le & {} Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O}) \\= & {} Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O}) + P_nl(\pmb {\theta }_0;\text{ O}) - P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) + P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O}) \\\le & {} Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O}) + P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O}). \end{aligned}$$

Therefore, we have the conclusion that

$$\begin{aligned}&\mid Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid \\&\quad \le \mid Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\mid + \mid P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid \rightarrow 0, \quad a.s. \end{aligned}$$

Considering the fact that the Kullback–Leibler information is greater than or equal to the square of the Hellinger metric (Xue et al. 2004), one can obtain that

$$\begin{aligned} \mid Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid= & {} E_{\pmb {\theta }_0} \Bigl \{ l(\pmb {\theta }_0;\text{ O}) - l(\widehat{\pmb {\theta }}_n;\text{ O}) \Bigr \} \\\ge & {} \Big \Vert \sqrt{L(\pmb {\theta }_0;\text{ O})} - \sqrt{L(\widehat{\pmb {\theta }}_n;\text{ O})}\Big \Vert _2^2 \\= & {} \Big \Vert \frac{\nabla _{\pmb {\theta }}L(\check{\pmb {\theta }};\text{ O})}{2\sqrt{L(\check{\pmb {\theta }};\text{ O})} } (\pmb {\theta }_0 - \widehat{\pmb {\theta }}_n) \Big \Vert _2^2, \end{aligned}$$

where \(\check{\pmb {\theta }}\) is between \(\pmb {\theta }_0\) and \(\widehat{\pmb {\theta }}_n\), and the derivative of \(L(\pmb {\theta };\text{ O})\) with respect to \(\pmb {\theta }\) is \(\nabla _{\pmb {\theta }}L(\pmb {\theta };\text{ O})\). Note that \(\frac{\nabla _{\pmb {\theta }}L(\check{\pmb {\theta }};\text{ O})}{2\sqrt{L(\check{\pmb {\theta }};\text{ O})}}\) is bounded and not equal to zero. Therefore, \(d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\rightarrow 0\) almost surely. Note that \(\parallel \widehat{\pmb {\alpha }}_n-\pmb {\alpha }_0\parallel \le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\), \(\parallel \widehat{\pmb {\beta }}_n-\pmb {\beta }_0\parallel \le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\) and \(\parallel \widehat{\Lambda }_{n}-\Lambda _{0}\parallel _2\le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\). Then we have

$$\begin{aligned} \parallel \widehat{\pmb {\alpha }}_n-\pmb {\alpha }_0\parallel \rightarrow 0,\quad \parallel \widehat{\pmb {\beta }}_n-\pmb {\beta }_0\parallel \rightarrow 0,\quad \parallel \widehat{\Lambda }_n-\Lambda _0\parallel _2\rightarrow 0,\quad a.s. \end{aligned}$$

\(\square \)

Proof of Theorem 2 One can derive the convergence rate of \(\widehat{\pmb {\theta }}_n\) by verifying the conditions of Theorem 3.2.5 in Van der Vaart and Wellner (1996). First, by the relationship between the Hellinger distance and the Kullback–Leibler information used in the proof of Theorem 1, we obtain

$$\begin{aligned} Pl(\pmb {\theta }_0;\text{ O})-Pl(\pmb {\theta };\text{ O})\ge Cd^2(\pmb {\theta }_0,\pmb {\theta }),\quad \pmb {\theta }\in \varvec{\Theta }_n. \end{aligned}$$

Second, by Theorem 1.6.2 of Lorentz (1986) and the conclusions of Osman and Ghosh (2012), there exists a Bernstein polynomial \(\Lambda _{0,n}(\cdot )\) such that \(\parallel \Lambda _0(\cdot ) - \Lambda _{0,n}(\cdot ) \parallel _{\infty } = O(N^{-r/2})\) with \(r\ge 1\). Thus we obtain \(d(\pmb {\theta }_0,\pmb {\theta }_{0,n}) = O(n^{-{r\nu }/2})\), where \(\pmb {\theta }_{0,n}=(\pmb {\alpha }_{0},\pmb {\beta }_{0},\Lambda _{0,n}(\cdot ))\).

Then, we further explore \(P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\). Since \(\widehat{\pmb {\theta }}_n\) maximizes \(P_nl(\pmb {\theta };\text{ O})\) over \(\varvec{\Theta }_n\) and \(\pmb {\theta }_{0,n}\in \varvec{\Theta }_n\), we know that

$$\begin{aligned} P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\ge & {} (P_n-P)\{l(\pmb {\theta }_{0,n};\text{ O}) - l(\pmb {\theta }_0;\text{ O})\}+P\{l(\pmb {\theta }_{0,n};\text{ O}) - l(\pmb {\theta }_0;\text{ O})\} \\= & {} I_{1n}+I_{2n}. \end{aligned}$$

We can construct a set of brackets for the class \(\mathcal {L}_2=\{l(\vartheta _{0},\Lambda (\cdot ))-l(\vartheta _{0},\Lambda _{0}(\cdot )): \Lambda (\cdot )\in \mathcal {M}_{n}\ \text {and} \parallel \Lambda _0(\cdot ) - \Lambda _{0,n}(\cdot ) \parallel _{\infty } \le Cn^{-r\nu /2}\}\) with the \(\epsilon \)-bracketing number bounded by \((1/\epsilon )^{C(N+1)}\). This yields a finite bracketing integral. Hence the class \(\mathcal {L}_2\) is P-Donsker. Using arguments similar to those in Zhang et al. (2010), we can obtain \(I_{1n}=o_p(n^{-r\nu })\) and \(I_{2n}\ge -O(n^{-r\nu })\). Thus, we have \(P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\ge -O_p(n^{-r\nu })=O_{p}(n^{-2\min \{r\nu /2,(1-\nu )/2\}})\).

Let \(\mathcal {L}_2(\varsigma )=\big \{l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O}): \pmb {\theta }\in \varvec{\Theta }_n \text { and } d(\pmb {\theta },\pmb {\theta }_0)\le \varsigma \big \}\). Moreover, one can obtain \(P\big \{l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})\big \}^2 \le Cd^2(\pmb {\theta },\pmb {\theta }_0)\le C\varsigma ^2\) for any \(l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})\in \mathcal {L}_2(\varsigma )\). Note that \(\mathcal {L}_2(\varsigma )\) is uniformly bounded under conditions (C1)–(C3). Following the calculations in Shen and Wong (1994, p. 597), we can establish that for \(0<\epsilon <\varsigma \), the bracketing entropy \(\log N_{[\,]}(\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P))\) is bounded by \(C\tilde{N}\log (\varsigma /\epsilon )\) with \(\tilde{N}=N+1\), where \(N_{[\,]}(\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P))\) is the \(\epsilon \)-bracketing number of \(\mathcal {L}_2(\varsigma )\) presented in Definition 2.1.6 of Van der Vaart and Wellner (1996).

Therefore, according to Lemma 3.4.2 in Van der Vaart and Wellner (1996), by which the continuity modulus of the empirical process \(\sqrt{n}(P_{n}-P)\) gives an upper bound on the rate, we obtain that

$$\begin{aligned} E_P\parallel n^{1/2}(P_n-P)\parallel _{\mathcal {L}_2(\varsigma )} \le CJ_{[\,]}\big \{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \big \} \biggl \{ 1 + \frac{J_{[\,]}\{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \}}{n^{1/2}\varsigma ^2} \biggr \}, \end{aligned}$$

where \(J_{[\,]}\{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \} = \int _0^{\varsigma }\bigl [ 1 +\log N_{[\,]}\{\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P)\} \bigr ]^{1/2}d\epsilon \le C\tilde{N}^{1/2}\varsigma \). This result implies that the function \(\phi _n(\varsigma )\) of Theorem 3.2.5 in Van der Vaart and Wellner (1996) can be given by \(\phi _n(\varsigma )=\tilde{N}^{1/2}\varsigma +\tilde{N}/n^{1/2}\) and we know that \(\phi _{n}(\varsigma )/\varsigma \) is decreasing in \(\varsigma \). Note that

$$\begin{aligned} n^{r\nu }\phi _n(1/n^{r\nu /2})=n^{r\nu }\{\tilde{N}^{1/2}n^{-{r\nu }/2}+\tilde{N}n^{-1/2}\} \le n^{1/2}\{n^{(\nu -1)/2+{r\nu }/2}+n^{\nu -1+r\nu }\}. \end{aligned}$$

If \(r\nu /2 \le (1-\nu )/2\), then \(n^{r\nu }\phi _n(1/n^{{r\nu }/2})\le n^{1/2}\). Thus \(r_n\) is given by \(r_n=n^{\min \{r\nu /2,(1-\nu )/2\}}\), which leads to \(r_n^2\phi _n(1/r_n)\le n^{1/2}\).
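The bound on the bracketing integral and the resulting form of \(\phi _n(\varsigma )\) can be checked as follows. This is a sketch of the omitted arithmetic, using only the entropy bound \(\log N_{[\,]}\le C\tilde{N}\log (\varsigma /\epsilon )\) stated above and \(\tilde{N}\ge 1\); the constant \(C'\) is introduced here for illustration. Substituting \(\epsilon =\varsigma u\),

$$\begin{aligned} J_{[\,]}\{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \}\le & {} \int _0^{\varsigma }\bigl [ 1 +C\tilde{N}\log (\varsigma /\epsilon ) \bigr ]^{1/2}d\epsilon = \varsigma \int _0^{1}\bigl [ 1 +C\tilde{N}\log (1/u) \bigr ]^{1/2}du \\\le & {} \tilde{N}^{1/2}\varsigma \int _0^{1}\bigl [ 1 +C\log (1/u) \bigr ]^{1/2}du \le C'\tilde{N}^{1/2}\varsigma , \end{aligned}$$

where the last integral is finite because \(\int _0^1\{\log (1/u)\}^{1/2}du=\Gamma (3/2)=\sqrt{\pi }/2\). Inserting \(J_{[\,]}\le C'\tilde{N}^{1/2}\varsigma \) into the maximal inequality then gives, up to constants,

$$\begin{aligned} \tilde{N}^{1/2}\varsigma \biggl \{ 1 + \frac{\tilde{N}^{1/2}\varsigma }{n^{1/2}\varsigma ^2} \biggr \} = \tilde{N}^{1/2}\varsigma +\frac{\tilde{N}}{n^{1/2}}, \end{aligned}$$

which is the expression taken for \(\phi _n(\varsigma )\).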

Finally, the above results verify the conditions of Theorem 3.2.5 in Van der Vaart and Wellner (1996), so we have \(r_nd(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)=O_p(1)\); that is, \(d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)=O_p(n^{-\min \{r\nu /2,(1-\nu )/2\}})\). The proof of Theorem 2 is completed.
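To make the rate \(n^{-\min \{r\nu /2,(1-\nu )/2\}}\) concrete, the short sketch below computes the exponent for a given smoothness \(r\) and sieve growth rate \(\nu \), and finds the value of \(\nu \) that balances the two terms. The function names are hypothetical, and \(r=2\) is an arbitrary illustrative choice, not a value taken from the paper. Exact rational arithmetic is used so that the balance point is not blurred by rounding.

```python
from fractions import Fraction

def rate_exponent(r, nu):
    """Exponent min{r*nu/2, (1 - nu)/2} in the rate n^(-exponent)."""
    return min(r * nu / 2, (1 - nu) / 2)

def balancing_nu(r):
    """Value of nu solving r*nu/2 = (1 - nu)/2, namely nu = 1/(1 + r)."""
    return Fraction(1) / (1 + r)

r = Fraction(2)                       # illustrative smoothness, an assumption
nu_star = balancing_nu(r)             # equals 1/3 when r = 2
best = rate_exponent(r, nu_star)      # equals 1/3, i.e. a rate of n^(-1/3)

# No nu on a fine grid in (0, 1) gives a larger exponent than the balance point.
grid_max = max(rate_exponent(r, Fraction(k, 300)) for k in range(1, 300))
print(nu_star, best, grid_max)
```

The approximation term \(r\nu /2\) increases in \(\nu \) while the estimation term \((1-\nu )/2\) decreases, so the minimum is largest where they cross, at \(\nu =1/(1+r)\), giving the exponent \(r/\{2(1+r)\}\).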

Proof of Theorem 3 For any \(v^*\in \varvec{\Theta }_0\), by Theorem 1.6.2 in Lorentz (1986) and the conclusions of Osman and Ghosh (2012), there exists \(\pi _n v^*\in \varvec{\Theta }_n\) such that \(\parallel \pi _n v^*-v^*\parallel =O(n^{-\frac{r\nu }{2}})\). Note that \(\delta _n\parallel \pi _n v^*-v^*\parallel =o(n^{-1/2})\) with \(r>1\) and \(r\nu >1/2\). Let \(\varepsilon _{n}\) be any positive sequence with \(\varepsilon _{n}=o(n^{-1/2})\) and define \(\rho [\pmb {\theta }-\pmb {\theta }_0;\text{ O}]=l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})-\dot{l}(\pmb {\theta }_0;\text{ O})[\pmb {\theta }-\pmb {\theta }_0]\). Then, since \(P\{\dot{l}(\pmb {\theta }_{0};\text{ O})[\pi _{n}v^{*}]\}=0\), one can obtain that

$$\begin{aligned} \begin{aligned} 0&\le P_n\Bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _{n}\pi _{n}v^*;\text{ O})\Bigr \} \\&= P_n\Bigl \{ [ l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\pmb {\theta }_{0};\text{ O}) ] - [ l(\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _{n}v^{*};\text{ O}) - l(\pmb {\theta }_0;\text{ O}) ] \Bigr \} \\&= \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] + P_n\Bigl \{ \rho [\widehat{\pmb {\theta }}_n-\pmb {\theta }_0;\text{ O}] - \rho [\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _{n}v^{*}-\pmb {\theta }_{0};\text{ O}] \Bigr \} \\&= \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*] \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _{n}v^{*}-v^{*}]\\&\quad + (P_n-P)\Bigl \{\rho [\widehat{\pmb {\theta }}_{n}-\pmb {\theta }_{0};\text{ O}]-\rho [\widehat{\pmb {\theta }}_n\pm \varepsilon _{n}\pi _nv^*-\pmb {\theta }_0;\text{ O}]\Bigr \}\\&\quad + P\Bigl \{ \rho [\widehat{\pmb {\theta }}_{n}-\pmb {\theta }_{0};\text{ O}] - \rho [\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _nv^{*}-\pmb {\theta }_{0};\text{ O}] \Bigr \} \\&:= \mp \varepsilon _{n}P_{n}\dot{l}(\pmb {\theta }_{0};\text{ O})[v^*] + I_1 + I_2 + I_3. \end{aligned} \end{aligned}$$

Next, we will study the asymptotic properties of \(I_1, I_2\) and \(I_3\).

For \(I_1\), by Chebyshev’s inequality, \(P\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]=0\) and \(\parallel \pi _nv^*-v^*\parallel =o(1)\), we obtain

$$\begin{aligned} P\left( \frac{\mid P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*] \mid }{n^{-1/2}} \ge \varepsilon \right)\le & {} \frac{Var(P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{n^{-1}\varepsilon ^2} \\= & {} \frac{ Var(\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{\varepsilon ^{2}} \\= & {} \frac{P(\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{\varepsilon ^2} \\= & {} \frac{\parallel \pi _nv^*-v^* \parallel ^2}{\varepsilon ^2} \rightarrow 0, \end{aligned}$$

which indicates that \(P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]=o_p(n^{-1/2})\). Thus, we have \(I_1=\varepsilon _{n}\times o_p(n^{-1/2})\).

For \(I_2\), by the mean value theorem, we have

$$\begin{aligned} \begin{aligned} I_2&= (P_n-P)\Bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*;\text{ O}) \pm \varepsilon _n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}\\&= (P_n-P)\Bigl \{ \dot{l}(\check{\pmb {\theta }};\text{ O})[\mp \varepsilon _n\pi _nv^*] \pm \varepsilon _n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}\\&= \mp \varepsilon _n(P_n-P)\Bigl \{ \dot{l}(\check{\pmb {\theta }};\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}, \end{aligned} \end{aligned}$$

where \(\check{\pmb {\theta }}\) is between \(\widehat{\pmb {\theta }}_n\) and \(\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*\). Consider the function class \(\mathcal {L}_3=\{\dot{l}(\pmb {\theta };\text{ O})[\pi _nv^*] :\pmb {\theta }\in \varvec{\Theta }_n \text { and } \parallel \pmb {\theta }-\pmb {\theta }_0\parallel = O(\delta _n)\}\). For any \(\dot{l}(\pmb {\theta }_i;\text{ O})[\pi _nv^*]\in \mathcal {L}_3\ (i=1,2)\), we have \(\bigl |\dot{l}(\pmb {\theta }_1;\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_2;\text{ O})[\pi _nv^*]\bigr |\le C\parallel \pmb {\theta }_1-\pmb {\theta }_2\parallel \). Thus it yields that

$$\begin{aligned} N\big (\epsilon ,\mathcal {L}_3,L_2(Q)\big )\le N\big (\epsilon ,\big \{\pmb {\theta }:\pmb {\theta }\in \varvec{\Theta }_n \text { and } \parallel \pmb {\theta }-\pmb {\theta }_0\parallel \le C\delta _n\big \},\parallel \cdot \parallel \big ). \end{aligned}$$

Similar to the proof of Lemma 1, one can obtain that \(N(\epsilon ,\mathcal {L}_3,L_2(Q)) \le (\frac{C\delta _n}{\epsilon })^{N+1}\). Then the uniform entropy integral is finite, i.e. \(\int _0^{\infty }\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}d\epsilon < \infty \) with \(\nu <1/2\) and \(r>1\). Note that \(\dot{l}(\pmb {\theta };\text{ O})[\pi _nv^*]\) is uniformly bounded under conditions (C1)–(C3). Hence \(\mathcal {L}_3\) is a Donsker class by Theorem 2.8.3 of Van der Vaart and Wellner (1996). Applying the relationship between the Donsker property and asymptotic equicontinuity given by Corollary 2.3.12 of Van der Vaart and Wellner (1996), one can obtain that

$$\begin{aligned} (P_n-P)\Bigl \{\dot{l}(\check{\pmb {\theta }};\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*]\Bigr \}=o_p(n^{-1/2}). \end{aligned}$$

Therefore, we have \(I_2=\varepsilon _n\times o_p(n^{-1/2})\).

For \(I_{3}\), note that \(P\big \{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\big \}=-P\big \{\dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\big \}.\) For any \(\pmb {\theta }\in \big \{\pmb {\theta }:d(\pmb {\theta },\pmb {\theta }_0)=O(\delta _n)\big \}\), we have \(P\big \{\ddot{l}({\pmb {\theta }};\text{ O})[\pmb {\theta }-\pmb {\theta }_0,\pmb {\theta }-\pmb {\theta }_0] - \ddot{l}(\pmb {\theta }_0;\text{ O})[\pmb {\theta }-\pmb {\theta }_0,\pmb {\theta }-\pmb {\theta }_0]\big \} = O(\delta _n^3)\), and \(\delta _n^3 = o(n^{-1})\) with \(2/3r<\nu <1/3\) and \(r>2\). Then we have

$$\begin{aligned} P\big (\rho [\widehat{\pmb {\theta }}_n-\pmb {\theta }_0;\text{ O}]\big )= & {} P\bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\pmb {\theta }_0;\text{ O}) - \dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \}\\= & {} \frac{1}{2}P\bigl \{\ddot{l}(\check{\pmb {\theta }};\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0] - \ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \}\\&\quad + \frac{1}{2}P\bigl \{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \} \\= & {} \varepsilon _n \times o_p(n^{-1/2}) + \frac{1}{2}P\{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\} \\= & {} \varepsilon _n \times o_p(n^{-1/2}) - \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel ^2, \end{aligned}$$

where \(\check{\pmb {\theta }}\) is between \(\widehat{\pmb {\theta }}_n\) and \(\pmb {\theta }_0\). By the facts \(\parallel \pi _nv^*\parallel ^2\rightarrow \parallel v^*\parallel ^2<\infty \), the Cauchy-Schwarz inequality, and \(\delta _n\parallel \pi _n v^*-v^*\parallel =o(n^{-1/2})\), we have

$$\begin{aligned} I_3= & {} - \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel ^2 + \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*-\pmb {\theta }_0\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\pi _nv^*\rangle +\frac{1}{2}\parallel \varepsilon _n\pi _nv^*\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle +\frac{1}{2}\varepsilon _n^2\parallel \pi _nv^*\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle + \varepsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
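
As a worked step, the expansion of \(I_3\) rests on the bilinearity of the inner product. For generic elements \(a\) and \(b\) (symbols introduced here only for illustration, with \(a=\widehat{\pmb {\theta }}_n-\pmb {\theta }_0\) and \(b=\pi _nv^*\)),

$$\begin{aligned} \frac{1}{2}\parallel a\pm \varepsilon _n b\parallel ^2-\frac{1}{2}\parallel a\parallel ^2 = \pm \varepsilon _n\langle a,b\rangle +\frac{1}{2}\varepsilon _n^2\parallel b\parallel ^2 . \end{aligned}$$

Since \(\varepsilon _n=o(n^{-1/2})\) and \(\parallel \pi _nv^*\parallel \) is bounded, the quadratic term is \(\varepsilon _n\times o(n^{-1/2})\). Replacing \(\pi _nv^*\) by \(v^*\) in the inner product costs at most \(\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel \,\parallel \pi _nv^*-v^*\parallel \) by the Cauchy-Schwarz inequality, which is \(o_p(n^{-1/2})\) provided \(\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel =O_p(\delta _n)\); this rate is assumed here to come from the earlier convergence-rate result.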

Combining the above results, we can obtain

$$\begin{aligned} 0\le & {} P_n\bigl \{ l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*;\text{ O}) \bigr \} \\= & {} \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*] \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle + \varepsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
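
To spell out how this two-sided display yields an equality: the bound holds for both choices of sign, so dividing by \(\varepsilon _n>0\) gives one inequality from the upper sign and the reverse inequality from the lower sign. Together they give

$$\begin{aligned} \Bigl |\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle - P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]\Bigr | = o_p(n^{-1/2}), \end{aligned}$$

that is, \(\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle = P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]+o_p(n^{-1/2})\).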

Note that \(P\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]=0\) and \(Var(\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]) = \Vert v^{*}\Vert ^2\). Therefore, by the central limit theorem we obtain

$$\begin{aligned} \sqrt{n}\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle = \sqrt{n}(P_n-P)\{\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]\} + o_p(1) \rightarrow N(0,\Vert v^*\Vert ^2) \end{aligned}$$

in distribution. Note that \(G(\pmb {\theta })-G(\pmb {\theta }_0)=\dot{G}(\pmb {\theta }_0)[\pmb {\theta }-\pmb {\theta }_0]\). By the Riesz representation theorem, there exists \(v^{*}\in \mathcal {\bar{V}} \text { such that } \dot{G}(\pmb {\theta }_0)[v]=\langle v,v^*\rangle \) for any \(v\in \mathcal {\bar{V}}\) and \(\Vert v^*\Vert = \Vert \dot{G}(\pmb {\theta }_0)\Vert \). Then we have

$$\begin{aligned} \sqrt{n}\big \{G(\widehat{\pmb {\theta }}_n)-G(\pmb {\theta }_0)\big \}=\sqrt{n}\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle +o_{p}(1)\rightarrow N(0,\Vert \dot{G}(\pmb {\theta }_0)\Vert ^{2}) \end{aligned}$$

in distribution, namely, \( \sqrt{n} \Bigl \{\text{ b}_{1}^{\intercal }(\widehat{\pmb {\alpha }}_{n}-\pmb {\alpha }_{0})+\text{ b}_{2}^{\intercal }(\widehat{\pmb {\beta }}_{n}-\pmb {\beta }_{0})+ \int _0^{\tau } b_3(t)\big (\widehat{\Lambda }_n(t)-\Lambda _0(t)\big )dt \Bigr \}\xrightarrow {L} N(0,\Vert \dot{G}(\pmb {\theta }_0)\Vert ^2)\). Adopting Theorem 4 in Shen (1997) or the conclusions of Bickel and Kwon (2001), we can establish the semiparametric efficiency of the estimators. The proof is completed.

The calculation of \(I(\widehat{\pmb {\theta }})\)

One can differentiate \(Q(\widetilde{\pmb {\theta }}; \widehat{\pmb {\theta }})\) twice with respect to \(\widetilde{\pmb {\theta }}\). Then, the quantities of the first part in \(I(\widehat{\pmb {\theta }})\) are given by

$$\begin{aligned} \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \alpha _{k}\partial \alpha _{k'}}&= \sum _{i=1}^n Z_{ik}Z_{ik'}\widehat{\pi }(\text{ Z}_{i})\Big \{1-\widehat{\pi }(\text{ Z}_{i}) \Big \} , \\ \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \alpha _{k}\partial \eta _{j}}&=\frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \eta _{j}\partial \alpha _{k}} = 0, \\ \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \eta _j\partial \eta _{j'}}&= -\eta _j^{-2}\sum _{i=1}^n \Big \{\delta _{1i} E(Y_{ij}|\mathcal {O},\widehat{\pmb {\theta }})\\&\quad +\delta _{2i} E(W_{ij}|\mathcal {O},\widehat{\pmb {\theta }})\Big \}I_{(j'=j)}. \end{aligned}$$

The second part of the \(I(\widehat{\pmb {\theta }})\) derived from Eq. (6) is listed as follows

$$\begin{aligned}&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k}}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k'}}\right) = \sum _{i=1}^n Z_{ik}Z_{ik'} Var(U_{i}), \\&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k}}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _{j}}\right) = - \sum _{i=1}^n Z_{ik}\delta _{3i}\xi _{j}(L_{i}) Var(U_{i}), \\&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _j}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _{j'}}\right) = (\eta _j\eta _{j'})^{-1} \sum _{i=1}^n \Big \{\delta _{1i} Cov(Y_{ij},Y_{i{j'}})\\&\quad +\delta _{2i} Cov(W_{ij},W_{i{j'}})\Big \}, \end{aligned}$$

where

$$\begin{aligned} Var(U_i)= & {} E(U_{i}^{2})-\{E(U_{i})\}^{2}= E(U_{i})\{1-E(U_{i})\}, \\ Cov(Y_{ij},Y_{ij'})= & {} \frac{\delta _{1i}\widehat{\lambda }_{ij}}{\widehat{c}_{i}^{2}}(\widehat{c}_{i}-\widehat{\lambda }_{ij}+\widehat{c}_{i}\widehat{\lambda }_{ij}), \\ Cov(W_{ij},W_{i{j'}})= & {} \frac{\delta _{2i}\widehat{\omega }_{ij}}{\widehat{d}_{i}^{2}}(\widehat{d}_{i}-\widehat{\omega }_{ij}+\widehat{d}_{i}\widehat{\omega }_{ij}). \end{aligned}$$

Here \(\widehat{c}_{i}=1-\exp {(-\widehat{\lambda }_{i})}\) and \(\widehat{d}_{i}=1-\exp {(-\widehat{\omega }_{i})}\). We can then obtain \(I(\widehat{\pmb {\theta }})\) by Eq. (11), and these closed forms make the variance-covariance matrix easy to compute. \(\square \)
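
As an illustration of how these closed forms translate into computation, the following minimal Python sketch evaluates the two \(\pmb {\alpha }\)-blocks given above, \(\sum _i Z_{ik}Z_{ik'}\widehat{\pi }(\text{ Z}_i)\{1-\widehat{\pi }(\text{ Z}_i)\}\) and \(\sum _i Z_{ik}Z_{ik'}Var(U_i)\) with \(Var(U_i)=E(U_i)\{1-E(U_i)\}\). The function names and the toy inputs are hypothetical, and the sketch does not show how Eq. (11) combines the two parts.

```python
def bernoulli_var(p):
    """Variance p(1 - p) of a 0/1 variable with mean p."""
    return p * (1.0 - p)


def weighted_gram(Z, w):
    """Matrix whose (k, k') entry is sum_i Z[i][k] * Z[i][k'] * w[i]."""
    if len(Z) != len(w):
        raise ValueError("Z and w must have the same number of rows")
    p = len(Z[0])
    G = [[0.0] * p for _ in range(p)]
    for z_i, w_i in zip(Z, w):
        for k in range(p):
            for kp in range(p):
                G[k][kp] += z_i[k] * z_i[kp] * w_i
    return G


# Hypothetical toy inputs (not from the paper): 3 subjects, 2 covariates.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
pi_hat = [0.5, 0.25, 0.75]   # assumed fitted values of pi(Z_i)
EU = [1.0, 0.5, 0.2]         # assumed conditional means E(U_i | O, theta_hat)

# alpha-block of the second derivatives of Q.
alpha_block_Q = weighted_gram(Z, [bernoulli_var(p) for p in pi_hat])
# alpha-block of the covariance of the complete-data score.
alpha_block_cov = weighted_gram(Z, [bernoulli_var(e) for e in EU])
```

Both blocks are symmetric by construction, and a subject with \(E(U_i)=1\) contributes nothing to the covariance block because \(Var(U_i)=0\) for that subject.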

Cite this article

Wang, X., Wang, Z. EM algorithm for the additive risk mixture cure model with interval-censored data. Lifetime Data Anal 27, 91–130 (2021). https://doi.org/10.1007/s10985-020-09507-z
