EM algorithm for the additive risk mixture cure model with interval-censored data

Wang, Xiaoguang; Wang, Ziwen

doi:10.1007/s10985-020-09507-z

Published: 01 October 2020

Lifetime Data Analysis, Volume 27, pages 91–130 (2021)

Abstract

Interval-censored failure time data arise in a number of fields and many authors have recently paid more attention to their analysis. However, regression analysis of interval-censored data under the additive risk model can be challenging in maximizing the complex likelihood, especially when there exists a non-ignorable cure fraction in the population. For the problem, we develop a sieve maximum likelihood estimation approach based on Bernstein polynomials. To relieve the computational burden, an expectation–maximization algorithm by exploiting a Poisson data augmentation is proposed. Under some mild conditions, the asymptotic properties of the proposed estimator are established. The finite sample performance of the proposed method is evaluated by extensive simulations, and is further illustrated through a real data set from the smoking cessation study.

References

Aalen O (1980) A model for nonparametric regression analysis of counting processes. Mathematical statistics and probability theory. Springer, New York, pp 1–25
Banerjee S, Carlin BP (2004) Parametric spatial cure rate models for interval-censored time-to-relapse data. Biometrics 60(1):268–275
Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515
Betensky RA, Rabinowitz D, Tsiatis AA (2001) Computationally simple accelerated failure time regression for interval censored data. Biometrika 88(3):703–711
Bickel PJ, Kwon J (2001) Inference for semiparametric models: some questions and an answer. Stat Sin 11(4):863–886
Boag JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc B 11(1):15–53
Cox D (1972) Regression models and life-tables. J R Stat Soc B 34(2):187–220
Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38(4):1041–1046
Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics 42(4):845–854
Ghosh D (2001) Efficiency considerations in the additive hazards model with current status data. Stat Neerl 55(3):367–376
Györfi L, Kohler M, Krzyzak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer-Verlag, Berlin
Hanin L, Huang L (2014) Identifiability of cure models revisited. J Multivar Anal 130:261–274
Hu T, Xiang L (2013) Efficient estimation for semiparametric cure models with interval-censored data. J Multivar Anal 121:139–151
Hu T, Xiang L (2016) Partially linear transformation cure models for interval-censored data. Comput Stat Data Anal 93:257–269
Huang J (1996) Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 24(2):540–568
Huang J, Rossini AJ (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92(439):960–967
Jewell NP, Laan MV (1995) Generalizations of current status data with applications. Lifetime Data Anal 1(1):101–109
Kim YJ, Jhun M (2008) Cure rate model with interval censored data. Stat Med 27(1):3–14
Li C, Taylor JMG, Sy JP (2001) Identifiability of cure models. Stat Prob Lett 54(4):389–395
Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81(1):61–71
Lin DY, Oakes D, Ying Z (1998) Additive hazards regression with current status data. Biometrika 85(2):289–298
Liu H, Shen Y (2009) A semiparametric regression cure model for interval-censored data. J Am Stat Assoc 104(487):1168–1178
Liu Y, Hu T, Sun J (2017) Regression analysis of current status data in the presence of a cured subgroup and dependent censoring. Lifetime Data Anal 23(4):626–650
Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Company, New York
Louis T (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc B 44(2):226–233
Lu W (2010) Efficient estimation for an accelerated failure time model with a cure fraction. Stat Sin 20:661–674
Ma S (2010) Mixed case interval censored data with a cured subgroup. Stat Sin 20:1165–1181
Ma S (2011) Additive risk model for current status data with a cured subgroup. Ann Inst Stat Math 63(1):117–134
Mao M, Wang JL (2010) Semiparametric efficient estimation for a class of generalized proportional odds cure models. J Am Stat Assoc 105(489):302–311
Martinussen T, Scheike TH (2002) Efficient estimation in additive hazards regression with current status data. Biometrika 89(3):649–658
McMahan CS, Wang L, Tebbs JM (2013) Regression analysis for current status data using the EM algorithm. Stat Med 32(25):4452–4466
Murray RP, Anthonisen NR, Connett JE, Wise RA, Lindgren PG, Greene PG, Nides MA (1998) Effects of multiple attempts to quit smoking and relapses to smoking on pulmonary function. J Clin Epidemiol 51(12):1317–1326
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Osman M, Ghosh SK (2012) Nonparametric regression models for right-censored data using Bernstein polynomials. Comput Stat Data Anal 56(3):559–573
Peng Y, Dear KB (2000) A nonparametric mixture model for cure rate estimation. Biometrics 56(1):237–243
Pollard D (1984) Convergence of stochastic processes. Springer, New York
Pollard D (1990) Empirical processes: theory and applications. In: NSF-CBMS regional conference series in probability and statistics, pp 1–86. Institute of Mathematical Statistics and the American Statistical Association
Rossini AJ, Tsiatis AA (1996) A semiparametric proportional odds regression model for the analysis of current status data. J Am Stat Assoc 91(434):713–721
Shen X (1997) On methods of sieves and penalization. Ann Stat 25(6):2555–2591
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22(2):580–615
Shen Y, Cheng SC (1999) Confidence bands for cumulative incidence curves under the additive risk model. Biometrics 55(4):1093
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Sy JP, Taylor JM (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56(1):227–236
Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang L, Sun J, Tong X (2010) Regression analysis of case II interval-censored failure time data with the additive hazards model. Stat Sin 20:1709–1723
Wang L, McMahan CS, Hudgens MG, Qureshi ZP (2016) A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72(1):222–231
Wu Y, Chambers CD, Xu R (2019) Semiparametric sieve maximum likelihood estimation under cure model with partly interval censored and left truncated data for application to spontaneous abortion. Lifetime Data Anal 25:507–528
Xue H, Lam KF, Li G (2004) Sieve maximum likelihood estimator for semiparametric regression models with current status data. J Am Stat Assoc 99(466):346–356
Yu B, Peng Y (2008) Mixture cure models for multivariate survival data. Comput Stat Data Anal 52(3):1524–1532
Zeng D, Cai J, Shen Y (2006a) Semiparametric additive risks model for interval-censored data. Stat Sin 16:287–302
Zeng D, Yin G, Ibrahim JG (2006b) Semiparametric transformation models for survival data with a cure fraction. J Am Stat Assoc 101(474):670–684
Zhang J, Peng Y (2009) Accelerated hazards mixture cure model. Lifetime Data Anal 15(4):455–467
Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand J Stat 37(2):338–354
Zhou J, Zhang J, Lu W (2017) An expectation maximization algorithm for fitting the generalized odds-rate model to interval censored data. Stat Med 36(7):1157–1171

Acknowledgements

The authors would like to thank the Editor-in-Chief, the Associate Editor and two reviewers for their constructive comments and helpful suggestions that greatly improved the paper. Funding was provided by National Natural Science Foundation of China (Grant No. 11471065).

Author information

Authors and Affiliations

School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, Liaoning, China
Xiaoguang Wang & Ziwen Wang

Corresponding author

Correspondence to Xiaoguang Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this appendix, we present the detailed proofs of Theorems 1–3 and the calculation of $I(\widehat{\pmb {\theta }})$. Our proofs rely heavily on results from empirical process theory. To prove the consistency of our estimators, we need the following lemma, which plays an important role in the proof of Theorem 1. Lemma 1 bounds the covering number of size $\epsilon $ needed to cover $\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}$.

Whether a given class $\mathcal {L}$ is a Glivenko–Cantelli (GC) or Donsker class depends on the size of the class. A relatively simple way to measure the size of a class $\mathcal {L}$ is to use entropy numbers. Existing results on known GC or Donsker classes are mostly stated in the parametric or nonparametric framework, whereas in the semiparametric framework the size of the class $\mathcal {L}$ is reflected more clearly by the entropy number. Therefore, a technical analysis of the entropy numbers is necessary.

Lemma 1

The covering number of the function class $\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}$ satisfies

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\le CM_n^{(N+1)}\epsilon ^{-(N+p+q+1)}, \end{aligned}$$

where the degree N of Bernstein polynomials satisfies $N=o(n^\nu )$ with $0<\nu <1$ and the size of the sieve space $\varvec{\Theta }_n$ is controlled by $M_n=O(n^a)$ with a constant $a>0$.

Proof of Lemma 1 By the Taylor series expansion, for any $\pmb {\theta }_1=\big (\pmb {\alpha }_{1},\pmb {\beta }_{1},\Lambda _{1}(\cdot )\big ),\pmb {\theta }_2=\big (\pmb {\alpha }_{2},\pmb {\beta }_{2},\Lambda _{2}(\cdot )\big )\in \varvec{\Theta }_n$, there exists a large enough constant C such that

$$\begin{aligned} \mid l(\pmb {\theta }_1;\text{ O})-l(\pmb {\theta }_2;\text{ O})\mid \le C\bigl (\Vert \pmb {\alpha }_1-\pmb {\alpha }_2\Vert +\Vert \pmb {\beta }_1-\pmb {\beta }_2\Vert +\Vert \Lambda _{1}(\cdot )-\Lambda _{2}(\cdot )\Vert _{\infty }\bigr ). \end{aligned}$$

For any $\Lambda _1(\cdot ),\Lambda _2(\cdot )\in \mathcal {M}_n,$ let $\pmb {\gamma }_1=(\gamma _{0,1},\ldots ,\gamma _{N,1})^{\intercal }, \pmb {\gamma }_2=(\gamma _{0,2},\ldots ,\gamma _{N,2})^{\intercal }$ be their Bernstein polynomial coefficient vectors, respectively. Since the Bernstein basis polynomials $b_{l,N}(t)$ are nonnegative and sum to one, we have

$$\begin{aligned} \Vert \Lambda _{1}(\cdot )-\Lambda _{2}(\cdot )\Vert _{\infty }= & {} \sup _t\mid \sum _{l=0}^N\gamma _{l,1}b_{l,N}(t)-\sum _{l=0}^N\gamma _{l,2}b_{l,N}(t) \mid \le \max _{0\le l\le N}\mid \gamma _{l,1}-\gamma _{l,2}\mid \\= & {} \Vert \pmb {\gamma }_1-\pmb {\gamma }_2 \Vert _{\infty }. \end{aligned}$$
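The bound above can be checked numerically. The following is a minimal sketch, assuming time has been rescaled to $[0,1]$ so that $b_{l,N}(t)=\binom{N}{l}t^l(1-t)^{N-l}$; the degree, the grid, and the randomly drawn coefficient vectors are illustrative choices and do not come from the paper.

```python
import math
import numpy as np

def bernstein_basis(N, t):
    """Return an array of shape (N+1, len(t)) holding b_{l,N}(t) on [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.array(
        [math.comb(N, l) * t**l * (1.0 - t) ** (N - l) for l in range(N + 1)]
    )

N = 8                                  # illustrative degree
t = np.linspace(0.0, 1.0, 501)         # grid used to approximate the supremum
B = bernstein_basis(N, t)

rng = np.random.default_rng(0)
gamma1 = rng.normal(size=N + 1)        # hypothetical coefficient vectors
gamma2 = rng.normal(size=N + 1)

# Grid approximation of ||Lambda_1 - Lambda_2||_inf and the coefficient bound.
sup_diff = np.max(np.abs((gamma1 - gamma2) @ B))
coef_diff = np.max(np.abs(gamma1 - gamma2))
print(sup_diff, coef_diff)
```

Because each column of `B` is a set of nonnegative weights summing to one, the difference of the two polynomials at any $t$ is a weighted average of the coefficient differences, which is why it cannot exceed the largest one in absolute value.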

Therefore one can write that

$$\begin{aligned} \mid l(\pmb {\theta }_1;\text{ O})-l(\pmb {\theta }_2;\text{ O})\mid \le C(\Vert \pmb {\alpha }_1-\pmb {\alpha }_2\Vert +\Vert \pmb {\beta }_1-\pmb {\beta }_2\Vert +\Vert \pmb {\gamma }_1-\pmb {\gamma }_2 \Vert _{\infty }). \end{aligned}$$

Let $\mathcal {B}=\{(\pmb {\alpha }^{\intercal },\pmb {\beta }^{\intercal })^{\intercal } \in \mathbb {R}^{p+q}:\Vert \pmb {\alpha }\Vert +\Vert \pmb {\beta }\Vert \le M \}$ and $\mathcal {C}=\{\pmb {\gamma }\in \mathbb {R}^{N+1}:\sum _{l=0}^N\mid \gamma _l\mid \le M_n \}.$ By Problem 18 of Pollard (1984, p. 40), the product of two $\epsilon /(2C)$-balls in $\mathcal {B}$ and $\mathcal {C}$, respectively, is contained in an $\epsilon $-ball in $\mathcal {L}_1$, and products of balls covering $\mathcal {B}$ with balls covering $\mathcal {C}$ cover the product of $\mathcal {B}$ and $\mathcal {C}$, of which $\mathcal {L}_1$ is a subset. Hence the covering number of $\mathcal {L}_1$ is controlled by the covering numbers of $\mathcal {B}$ and $\mathcal {C}$, and we obtain that

$$\begin{aligned} N(\epsilon , \mathcal {L}_1, \parallel \cdot \parallel _{\infty }) \le N(\frac{\epsilon }{2C}, \mathcal {B}, \parallel \cdot \parallel ) N(\frac{\epsilon }{2C}, \mathcal {C}, \parallel \cdot \parallel _{\infty }). \end{aligned}$$

Lemma 4.1 of Pollard (1990) gives a method for bounding the packing numbers of a set and shows that these bounds grow geometrically. Thus the packing numbers of $\mathcal {B}$ and $\mathcal {C}$, as defined in Pollard (1990, p. 10), satisfy

$$\begin{aligned} M(\epsilon , \mathcal {B}, \parallel \cdot \parallel ) \le \biggl (\frac{6M}{\epsilon }\biggr )^{p+q} \text { and } \ M(\epsilon , \mathcal {C}, \parallel \cdot \parallel _{\infty }) \le \biggl (\frac{6M_n}{\epsilon }\biggr )^{N+1}. \end{aligned}$$

Next, Lemma 9.2 of Györfi et al. (2002) shows that the $L_{(p)}$ covering numbers of size $\epsilon $ are bounded by the $L_{(p)}$ packing numbers of size $\epsilon $ for $p\ge 1$, so that

$$\begin{aligned} N(\epsilon , \mathcal {B}, \parallel \cdot \parallel ) \le \biggl (\frac{6M}{\epsilon }\biggr )^{p+q} \text { and } \ N(\epsilon , \mathcal {C}, \parallel \cdot \parallel _{\infty }) \le \biggl (\frac{6M_n}{\epsilon }\biggr )^{N+1}. \end{aligned}$$
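The link between packing and covering numbers used here can be illustrated on a finite sample. The sketch below is only an illustration under stated assumptions: it greedily builds a maximal $\epsilon $-separated subset of points sampled from a Euclidean ball of radius $M$ in dimension 2 (the values of `M`, `eps`, `dim` and the sample size are arbitrary), then checks that the selected centres also cover the sample and that their number stays below $(6M/\epsilon )^{2}$. It works with sampled points, not with the full sets $\mathcal {B}$ or $\mathcal {C}$.

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily select a maximal subset whose pairwise distances exceed eps."""
    centers = []
    for x in points:
        if all(np.linalg.norm(x - c) > eps for c in centers):
            centers.append(x)
    return np.array(centers)

rng = np.random.default_rng(1)
M, eps, dim = 1.0, 0.25, 2             # illustrative radius, scale and dimension

# Sample from the Euclidean ball of radius M by rejection from the cube.
cand = rng.uniform(-M, M, size=(4000, dim))
points = cand[np.linalg.norm(cand, axis=1) <= M]

centers = greedy_packing(points, eps)

# Maximality: every sample point lies within eps of some selected centre,
# so the packing centres also form an eps-cover of the sample.
dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
max_gap = dists.min(axis=1).max()
bound = (6 * M / eps) ** dim
print(len(centers), max_gap, bound)
```

A point is rejected by the greedy step only when it is within `eps` of an existing centre, which is the mechanism behind bounding a covering number by a packing number.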

Combining the above results, we have the conclusion that

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\le CM_n^{(N+1)}\epsilon ^{-(N+p+q+1)}. \end{aligned}$$

$\square $
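For readers who want the final combination step of Lemma 1 spelled out, substituting $\epsilon /(2C)$ for $\epsilon $ in the two covering-number bounds and multiplying gives the display below. This is our reading of the step, not a formula stated in the proof; $C$ on the right-hand side is the Lipschitz constant from the Taylor expansion.

$$\begin{aligned} N(\epsilon ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty }) \le \biggl (\frac{12CM}{\epsilon }\biggr )^{p+q} \biggl (\frac{12CM_n}{\epsilon }\biggr )^{N+1} = (12C)^{N+p+q+1}M^{p+q}\,M_n^{N+1}\,\epsilon ^{-(N+p+q+1)}. \end{aligned}$$

The leading factor $(12C)^{N+p+q+1}M^{p+q}$ is what the statement of Lemma 1 writes as the generic constant $C$.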

Proof of Theorem 1 By adopting the notations of Pollard (1984), we write

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}P_n l=l_n(\pmb {\theta },\mathcal {O})=\frac{1}{n}\sum _{i=1}^{n}l(\pmb {\theta },\text{ O}_i),\\ &{}Pl=E_0 l(\pmb {\theta },\text{ O}), \end{array}\right. } \end{aligned}$$

where $E_0$ denotes the expectation with respect to $\text{ O }$ under true values of parameters.

Proving the consistency of our estimators requires the following steps.

(a) Calculate the covering number of the function class $\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}$ by Lemma 1.
(b) Prove uniform convergence, i.e. $\sup _{\mathcal {L}_1}\big |P_nl-Pl\big | \rightarrow 0$ a.s.
(c) Prove $d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\rightarrow 0$ a.s.

The function class $\mathcal {L}_1=\{ l(\pmb {\theta };\text{ O}):\pmb {\theta }\in \varvec{\Theta }_n\}$ has an envelope G with $Pl^2\le PG^2\le \delta ^2 $. Let $\alpha _n=n^{-1/2+\phi _1}(\log n)^{1/2}$ with $\frac{\nu }{2}<\phi _1<\frac{1}{2}$ such that $\alpha _n$ is a non-increasing sequence of positive numbers. For any given $\epsilon >0$, we choose $\epsilon _n=\epsilon \delta ^2\alpha _n$. Then for any $\pmb {\theta }\in \Theta _n$ and a sufficiently large n, we have

$$\begin{aligned} \frac{Var(P_nl)}{(4\epsilon _n)^2} \le \frac{\frac{1}{n}Pl^2}{16\epsilon ^2\delta ^4\alpha _n^2} \le \frac{1}{16\epsilon ^2\delta ^2n^{2\phi _1}\log n} \le \frac{1}{2}. \end{aligned}$$

Let $P_n^{o}$ denote the signed measure that places mass $\pm n^{-1}$ at each of the observations $\{O_1,\ldots ,O_n\}$, with the random ± signs being decided independently of the $O_i$’s. Applying the symmetrization inequality (Pollard (1984), II (30)) and Hoeffding’s inequality (Pollard (1984), II (31)), it follows that

$$\begin{aligned} P\big (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n\big )\le & {} 4P\big (\sup _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n\big ) \\= & {} 4E\Bigl \{I_{ \{ \sup \limits _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n\}} \Bigr \} \\= & {} 4E \Bigl \{E\bigl (I_{ \{ \sup \limits _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n \}}|\mathcal {O}\bigr )\Bigr \} \\= & {} 4E\left\{ P(\sup _{\mathcal {L}_1}|P_n^ol|>2\epsilon _n|\mathcal {O})\right\} \\\le & {} 4E\left\{ 2N(\epsilon _n,\mathcal {L}_1,\parallel \cdot \parallel _{\infty })\exp \left( \frac{-\frac{1}{2}n\epsilon _n^2}{\max \limits _{j}P_nl_j^2}\right) \right\} \\\le & {} 8C M_n^{N+1}\epsilon _n^{-(N+p+q+1)}\exp \Bigl (\frac{-\frac{1}{2}n\epsilon _n^2}{\max \limits _{j}P_nl_j^2}\Bigr ), \end{aligned}$$

where the maximum runs over all functions $\{l_j\} \text { in } \mathcal {L}_1$. Using the law of total probability, we have

$$\begin{aligned}&P\Big (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n\Big ) \\&\quad =P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) \\&\qquad + P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ) P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ) \\&\quad \le P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) + P\Bigl (\sup _{\mathcal {L}_1}|P_nl^2|> 64\delta ^2\Bigr ). \end{aligned} \tag{12}$$

Note that $N=o(n^\nu )$ and $\nu /2<\phi _1$, which together imply $N=o(n^{2\phi _1})$. Now we study the first term of the inequality (12),

$$\begin{aligned}&P\Bigl (\sup _{\mathcal {L}_1}|P_nl-Pl|>8\epsilon _n \Big | \sup _{\mathcal {L}_1}|P_nl^2|\le 64\delta ^2\Bigr ) \\&\quad \le 8C M_n^{N+1} \epsilon _n^{-(N+p+q+1)} \exp \Bigl (\frac{-n\epsilon _n^2}{128\delta ^2}\Bigr ) \\&\quad \le 8C \exp \Bigl \{ \bigl ( N+1 \bigr ) a\log n - \bigl ( N+p+q+1 \bigr ) \log {(\epsilon \delta ^2\alpha _n)} - \frac{n\epsilon _n^2}{128\delta ^2} \Bigr \} \\&\quad \le 8C \exp \Bigl \{ \bigl ( N+p+q+1 \bigr ) \bigl [ (a+\frac{1}{2}-\phi _1) \log n - \log {(\epsilon \delta ^2)} \\&\qquad -\frac{1}{2}\log {\log n} \bigr ] - \frac{\epsilon ^2\delta ^2 n^{2\phi _1}\log n}{128} \Bigr \} \\&\quad \le 8C \exp \Bigl \{ n^{2\phi _1}\log n \bigl [ \frac{\bigl ( o(n^{2\phi _1})+p+q+1 \bigr ) (a+\frac{1}{2}-\phi _1)}{n^{2\phi _1}} - \frac{\epsilon ^2\delta ^2}{128} \bigr ] \Bigr \} \\&\quad \le 8C \exp \bigl \{ -\tilde{C} n^{2\phi _1}\log n \bigr \}, \end{aligned}$$

where $\tilde{C}$ is a constant. For the second term of the inequality (12), by Lemma 33 of Pollard (1984) it follows that

$$\begin{aligned} P\left( \sup _{\mathcal {L}_1}\mid P_nl^2 \mid> 64\delta ^2\right)= & {} P\left( \sup _{\mathcal {L}_1}|P_nl^2|^{1/2}> 8\delta \right) \\\le & {} 4E \Bigl \{ N(\delta ,\mathcal {L}_1,P_n) \exp (-n\delta ^2)\wedge 1 \Big \} \\\le & {} 4E \Bigl \{ N(\delta ,\mathcal {L}_1,\parallel \cdot \parallel _{\infty }) \exp (-n\delta ^2)\wedge 1 \Bigr \} \\\le & {} 4 \Bigl \{ C M_n^{N+1} \delta ^{-(N+p+q+1)} \exp (-n\delta ^2)\wedge 1 \Big \} \\\le & {} 4C \exp \Bigl \{ (o(n^\nu )+p+q+1) (a\log n - \log \delta ) -n\delta ^2 \Bigr \}. \end{aligned}$$

This result indicates that the second term converges to zero even faster than the first term. Therefore one can obtain that $\sum _{n=1}^{\infty } P\big (\sup _{\mathcal {L}_1} |P_nl-Pl| > 8\epsilon _n\big ) < \infty $. By the Borel–Cantelli lemma, it follows that

$$\begin{aligned} \sup _{\mathcal {L}_1}\big |P_nl-Pl\big | \rightarrow 0, \quad a.s. \end{aligned}$$
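To see why the bound on the first term of (12) decays, one can evaluate the bracketed factor that multiplies $n^{2\phi _1}\log n$ in that bound. The sketch below is a rough numerical illustration only: the values of $\nu $, $\phi _1$, $a$, $p$, $q$, $\epsilon $ and $\delta $ are arbitrary, and the degree is taken as $N=\lfloor n^{\nu }\rfloor $, which grows slightly faster than the paper's $N=o(n^{\nu })$ and therefore makes the check conservative.

```python
import math

def first_term_bracket(n, nu=0.3, phi1=0.4, a=1.0, p=2, q=2, eps=1.0, delta=1.0):
    """Bracket multiplying n^{2 phi1} log n in the bound on the first term of (12).

    N = floor(n^nu) is an illustrative choice of the Bernstein degree.
    A negative value means the exponential bound decays in n.
    """
    N = math.floor(n ** nu)
    growth = (N + p + q + 1) * (a + 0.5 - phi1) / n ** (2 * phi1)
    return growth - (eps ** 2) * (delta ** 2) / 128.0

small_n = first_term_bracket(100)       # growth term still dominates
large_n = first_term_bracket(10 ** 6)   # bracket has turned negative
print(small_n, large_n)
```

With these illustrative values the bracket is positive at $n=100$ and negative at $n=10^{6}$, matching the requirement that the bound hold only for sufficiently large $n$.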

Since $P_nl(\pmb {\theta }_0;\text{ O}) \le P_nl(\widehat{\pmb {\theta }}_n;\text{ O})$, we have

$$\begin{aligned} 0\le & {} Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O}) \\= & {} Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O}) + P_nl(\pmb {\theta }_0;\text{ O}) - P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) + P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O}) \\\le & {} Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O}) + P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O}). \end{aligned}$$

Therefore, we have the conclusion that

$$\begin{aligned}&\mid Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid \\&\quad \le \mid Pl(\pmb {\theta }_0;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\mid + \mid P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid \rightarrow 0, \quad a.s. \end{aligned}$$

Considering the fact that the Kullback–Leibler information is greater than or equal to the square of the Hellinger metric (Xue et al. 2004), one can obtain that

$$\begin{aligned} \mid Pl(\pmb {\theta }_0;\text{ O})-Pl(\widehat{\pmb {\theta }}_n;\text{ O})\mid= & {} E_{\pmb {\theta }_0} \Bigl \{ l(\pmb {\theta }_0;\text{ O}) - l(\widehat{\pmb {\theta }}_n;\text{ O}) \Bigr \} \\\ge & {} \Big \Vert \sqrt{L(\pmb {\theta }_0;\text{ O})} - \sqrt{L(\widehat{\pmb {\theta }}_n;\text{ O})}\Big \Vert _2^2 \\= & {} \Big \Vert \frac{\nabla _{\pmb {\theta }}L(\check{\pmb {\theta }};\text{ O})}{2\sqrt{L(\check{\pmb {\theta }};\text{ O})} } (\pmb {\theta }_0 - \widehat{\pmb {\theta }}_n) \Big \Vert _2^2, \end{aligned}$$

where $\check{\pmb {\theta }}$ is between $\pmb {\theta }_0$ and $\widehat{\pmb {\theta }}_n$, and $\nabla _{\pmb {\theta }}L(\pmb {\theta };\text{ O})$ denotes the derivative of $L(\pmb {\theta };\text{ O})$ with respect to $\pmb {\theta }$. Note that $\frac{\nabla _{\pmb {\theta }}L(\check{\pmb {\theta }};\text{ O})}{2\sqrt{L(\check{\pmb {\theta }};\text{ O})}}$ is bounded and not equal to zero. Therefore, $d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)\rightarrow 0$ almost surely. Note that $\parallel \widehat{\pmb {\alpha }}_n-\pmb {\alpha }_0\parallel \le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)$, $\parallel \widehat{\pmb {\beta }}_n-\pmb {\beta }_0\parallel \le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)$ and $\parallel \widehat{\Lambda }_{n}-\Lambda _{0}\parallel _2\le d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)$. Then we have

$$\begin{aligned} \parallel \widehat{\pmb {\alpha }}_n-\pmb {\alpha }_0\parallel \rightarrow 0,\quad \parallel \widehat{\pmb {\beta }}_n-\pmb {\beta }_0\parallel \rightarrow 0,\quad \parallel \widehat{\Lambda }_n-\Lambda _0\parallel _2\rightarrow 0,\quad a.s. \end{aligned}$$

$\square $
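The inequality between the Kullback–Leibler information and the squared Hellinger metric used in the proof of Theorem 1 can be checked in a simple discrete setting. The sketch below is an illustration under our own assumptions: it uses random probability vectors on five points and the unnormalised squared Hellinger distance $\sum _k(\sqrt{p_k}-\sqrt{q_k})^2$, as a discrete stand-in for the likelihood-based quantities in the proof.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum p * log(p / q) for positive vectors."""
    return float(np.sum(p * np.log(p / q)))

def squared_hellinger(p, q):
    """Unnormalised squared Hellinger distance: sum (sqrt p - sqrt q)^2."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(2)
gaps = []
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))      # random discrete distributions
    q = rng.dirichlet(np.ones(5))
    gaps.append(kl_divergence(p, q) - squared_hellinger(p, q))

min_gap = min(gaps)                    # should never be negative
print(min_gap)
```

The smallest gap over the random draws stays nonnegative, consistent with the Kullback–Leibler information dominating the squared Hellinger distance.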

Proof of Theorem 2 One can derive the convergence rate of $\widehat{\pmb {\theta }}_n$ by verifying the conditions of Theorem 3.2.5 in Van der Vaart and Wellner (1996). First, by the relationship between the Hellinger distance and the Kullback–Leibler information in the proof of Theorem 1, we obtain

$$\begin{aligned} Pl(\pmb {\theta }_0;\text{ O})-Pl(\pmb {\theta };\text{ O})\ge Cd^2(\pmb {\theta }_0,\pmb {\theta }),\quad \pmb {\theta }\in \varvec{\Theta }_n. \end{aligned}$$

Second, by Theorem 1.6.2 of Lorentz (1986) and the conclusions of Osman and Ghosh (2012), there exists a Bernstein polynomial $\Lambda _{0,n}(\cdot )$ such that $\parallel \Lambda _0(\cdot ) - \Lambda _{0,n}(\cdot ) \parallel _{\infty } = O(N^{-r/2})$ with $r\ge 1$. Thus we obtain $d(\pmb {\theta }_0,\pmb {\theta }_{0,n}) = O(n^{-{r\nu }/2})$, where $\pmb {\theta }_{0,n}=(\pmb {\alpha }_{0},\pmb {\beta }_{0},\Lambda _{0,n}(\cdot ))$.
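The way the sup-norm approximation error shrinks with the degree $N$ can be illustrated with the classical Bernstein operator. This is a sketch under our own assumptions: the function $t^2$ on $[0,1]$ is a hypothetical stand-in for $\Lambda _0$, and the classical operator is only one possible choice of coefficients, not necessarily the $\Lambda _{0,n}$ of the proof. For $t^2$ the error is known in closed form, $t(1-t)/N$, so its supremum is $1/(4N)$.

```python
import math
import numpy as np

def bernstein_approx(f, N, t):
    """Classical Bernstein polynomial of degree N for f on [0, 1], evaluated at t."""
    t = np.asarray(t, dtype=float)
    return sum(
        f(l / N) * math.comb(N, l) * t**l * (1.0 - t) ** (N - l)
        for l in range(N + 1)
    )

def sup_error(f, N, grid):
    """Grid approximation of the sup-norm error of the degree-N approximation."""
    return float(np.max(np.abs(bernstein_approx(f, N, grid) - f(grid))))

f = lambda t: t**2                     # hypothetical smooth, increasing target
grid = np.linspace(0.0, 1.0, 201)      # includes t = 0.5, where the error peaks
errors = {N: sup_error(f, N, grid) for N in (5, 10, 20, 40, 80)}
for N, err in errors.items():
    print(N, err, 1.0 / (4 * N))
```

The computed errors match $1/(4N)$, so doubling the degree halves the sup-norm error for this particular target.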

Then, we further explore $P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})$. Since $\widehat{\pmb {\theta }}_n$ maximizes $P_nl(\pmb {\theta };\text{ O})$ over $\varvec{\Theta }_n$ and $\pmb {\theta }_{0,n}\in \varvec{\Theta }_n$, we know that

$$\begin{aligned} P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\ge & {} P_nl(\pmb {\theta }_{0,n};\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O}) \\= & {} (P_n-P)\{l(\pmb {\theta }_{0,n};\text{ O}) - l(\pmb {\theta }_0;\text{ O})\}+P\{l(\pmb {\theta }_{0,n};\text{ O}) - l(\pmb {\theta }_0;\text{ O})\} \\= & {} I_{1n}+I_{2n}. \end{aligned}$$

We can construct a set of brackets for the class $\mathcal {L}_2=\{l(\vartheta _{0},\Lambda (\cdot ))-l(\vartheta _{0},\Lambda _{0}(\cdot )): \Lambda (\cdot )\in \mathcal {M}_{n}\ \text {and} \parallel \Lambda (\cdot ) - \Lambda _{0}(\cdot ) \parallel _{\infty } \le Cn^{-r\nu /2}\}$ with the $\epsilon $-bracketing number bounded by $(1/\epsilon )^{C(N+1)}$. This yields a finite bracketing integral, and hence the class $\mathcal {L}_2$ is P-Donsker. Using arguments similar to those in Zhang et al. (2010), we can obtain $I_{1n}=o_p(n^{-r\nu })$ and $I_{2n}\ge -O(n^{-r\nu })$. Thus, we have $P_nl(\widehat{\pmb {\theta }}_n;\text{ O}) - P_nl(\pmb {\theta }_0;\text{ O})\ge -O_p(n^{-r\nu })=-O_{p}(n^{-2\min \{r\nu /2,(1-\nu )/2\}})$.

Let $\mathcal {L}_2(\varsigma )=\big \{l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O}): \pmb {\theta }\in \varvec{\Theta }_n \text { and } d(\pmb {\theta },\pmb {\theta }_0)\le \varsigma \big \}$. Moreover, one can obtain $P\big \{l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})\big \}^2 \le Cd^2(\pmb {\theta },\pmb {\theta }_0)\le C\varsigma ^2$ for any $l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})\in \mathcal {L}_2(\varsigma )$. Note that $\mathcal {L}_2(\varsigma )$ is uniformly bounded under conditions (C1)–(C3). Following the calculations in Shen and Wong (1994, p. 597), we can establish that for $0<\epsilon <\varsigma $, the bracketing entropy $\log N_{[\,]}(\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P))$ is bounded by $C\tilde{N}\log (\varsigma /\epsilon )$ with $\tilde{N}=N+1$, where $N_{[\,]}(\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P))$ is the $\epsilon $-bracketing number of $\mathcal {L}_2(\varsigma )$ presented in Definition 2.1.6 of Van der Vaart and Wellner (1996).

Therefore, by Lemma 3.4.2 in Van der Vaart and Wellner (1996), which states that the continuity modulus of the empirical process $\sqrt{n}(P_{n}-P)$ gives an upper bound on the rate, we obtain that

$$\begin{aligned} E_P\parallel n^{1/2}(P_n-P)\parallel _{\mathcal {L}_2(\varsigma )} \le CJ_{[\,]}\big \{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \big \} \biggl \{ 1 + \frac{J_{[\,]}\{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \}}{n^{1/2}\varsigma ^2} \biggr \}, \end{aligned}$$

where $J_{[\,]}\{\varsigma ,\mathcal {L}_2(\varsigma ),L_2(P) \} = \int _0^{\varsigma }\bigl [ 1 +\log N_{[\,]}\{\epsilon ,\mathcal {L}_2(\varsigma ),L_2(P)\} \bigr ]^{1/2}d\epsilon \le C\tilde{N}^{1/2}\varsigma $. This result implies that the function $\phi _n(\varsigma )$ of Theorem 3.2.5 in Van der Vaart and Wellner (1996) can be given by $\phi _n(\varsigma )=\tilde{N}^{1/2}\varsigma +\tilde{N}/n^{1/2}$ and we know that $\phi _{n}(\varsigma )/\varsigma $ is decreasing in $\varsigma $. Note that

$$\begin{aligned} n^{r\nu }\phi _n(1/n^{r\nu /2})=n^{r\nu }\{\tilde{N}^{1/2}n^{-{r\nu }/2}+\tilde{N}n^{-1/2}\} \le n^{1/2}\{n^{(\nu -1)/2+{r\nu }/2}+n^{\nu -1+r\nu }\}. \end{aligned}$$

If $r\nu /2 \le (1-\nu )/2$, then $n^{r\nu }\phi _n(1/n^{{r\nu }/2})\le n^{1/2}$. Thus $r_n$ is given by $r_n=n^{\min \{r\nu /2,(1-\nu )/2\}}$, which leads to $r_n^2\phi _n(1/r_n)\le n^{1/2}$.

Finally, combining the above results, the conditions of Theorem 3.2.5 in Van der Vaart and Wellner (1996) are satisfied. We therefore have $r_nd(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)=O_p(1)$; that is, $d(\widehat{\pmb {\theta }}_n,\pmb {\theta }_0)=O_p(n^{-\min \{r\nu /2,(1-\nu )/2\}})$. The proof of Theorem 2 is completed.
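The rate in Theorem 2 and the condition $r_n^2\phi _n(1/r_n)\le n^{1/2}$ can be explored numerically. The sketch below rests on our own illustrative assumptions: it takes $\tilde{N}=n^{\nu }$ exactly and ignores constants, so the ratio $r_n^2\phi _n(1/r_n)/n^{1/2}$ is only expected to stay bounded (by 2 here, one unit per term of $\phi _n$). It also shows that the two exponents balance at $\nu =1/(1+r)$, which gives the exponent $r/\{2(1+r)\}$.

```python
def rate_exponent(r, nu):
    """Exponent k in d(theta_hat, theta_0) = O_p(n^{-k}), k = min{r nu/2, (1 - nu)/2}."""
    return min(r * nu / 2.0, (1.0 - nu) / 2.0)

def rate_condition_ratio(n, r, nu):
    """r_n^2 * phi_n(1 / r_n) / sqrt(n), with N_tilde = n^nu as an illustrative degree."""
    r_n = n ** rate_exponent(r, nu)
    n_tilde = n ** nu
    phi = n_tilde ** 0.5 / r_n + n_tilde / n ** 0.5
    return r_n ** 2 * phi / n ** 0.5

r = 2                                         # illustrative smoothness
best_nu = 1.0 / (1.0 + r)                     # balances r*nu/2 against (1 - nu)/2
best_exponent = rate_exponent(r, best_nu)     # equals r / (2 * (1 + r))
ratios = [
    rate_condition_ratio(n, r, nu)
    for n in (10 ** 2, 10 ** 4, 10 ** 6)
    for nu in (0.2, best_nu, 0.6)
]
print(best_exponent, max(ratios))
```

For $r=2$ the balanced choice is $\nu =1/3$ with exponent $1/3$; choosing $\nu $ on either side of this value lowers the exponent, because one of the two terms in the minimum becomes binding.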

Proof of Theorem 3 For any $v^*\in \varvec{\Theta }_0$, by Theorem 1.6.2 in Lorentz (1986) and the conclusions of Osman and Ghosh (2012), there exists $\pi _n v^*\in \varvec{\Theta }_n$ such that $\parallel \pi _n v^*-v^*\parallel =O(n^{-\frac{r\nu }{2}})$. Note that $\delta _n\parallel \pi _n v^*-v^*\parallel =o(n^{-1/2})$ with $r>1$ and $r\nu >1/2$. Let $\varepsilon _{n}$ be any positive sequence with $\varepsilon _{n}=o(n^{-1/2})$ and define $\rho [\pmb {\theta }-\pmb {\theta }_0;\text{ O}]=l(\pmb {\theta };\text{ O})-l(\pmb {\theta }_0;\text{ O})-\dot{l}(\pmb {\theta }_0;\text{ O})[\pmb {\theta }-\pmb {\theta }_0]$. Then, since $P\{\dot{l}(\pmb {\theta }_{0};\text{ O})[\pi _{n}v^{*}]\}=0$, one can obtain that

$$\begin{aligned} \begin{aligned} 0&\le P_n\Bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _{n}\pi _{n}v^*;\text{ O})\Bigr \} \\&= P_n\Bigl \{ [ l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\pmb {\theta }_{0};\text{ O}) ] - [ l(\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _{n}v^{*};\text{ O}) - l(\pmb {\theta }_0;\text{ O}) ] \Bigr \} \\&= \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] + P_n\Bigl \{ \rho [\widehat{\pmb {\theta }}_n-\pmb {\theta }_0;\text{ O}] - \rho [\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _{n}v^{*}-\pmb {\theta }_{0};\text{ O}] \Bigr \} \\&= \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*] \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _{n}v^{*}-v^{*}]\\&\quad + (P_n-P)\Bigl \{\rho [\widehat{\pmb {\theta }}_{n}-\pmb {\theta }_{0};\text{ O}]-\rho [\widehat{\pmb {\theta }}_n\pm \varepsilon _{n}\pi _nv^*-\pmb {\theta }_0;\text{ O}]\Bigr \}\\&\quad + P\Bigl \{ \rho [\widehat{\pmb {\theta }}_{n}-\pmb {\theta }_{0};\text{ O}] - \rho [\widehat{\pmb {\theta }}_{n}\pm \varepsilon _{n}\pi _nv^{*}-\pmb {\theta }_{0};\text{ O}] \Bigr \} \\&:= \mp \varepsilon _{n}P_{n}\dot{l}(\pmb {\theta }_{0};\text{ O})[v^*] + I_1 + I_2 + I_3. \end{aligned} \end{aligned}$$

Next, we will study the asymptotic properties of $I_1, I_2$ and $I_3$.

For $I_1$, by Chebyshev’s inequality, $P\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]=0$ and $\parallel \pi _nv^*-v^*\parallel =o(1)$, we obtain

$$\begin{aligned} P\left( \frac{\mid P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*] \mid }{n^{-1/2}} \ge \varepsilon \right)\le & {} \frac{Var(P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{n^{-1}\varepsilon ^2} \\= & {} \frac{ Var(\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{\varepsilon ^{2}} \\= & {} \frac{P(\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*])}{\varepsilon ^2} \\= & {} \frac{\parallel \pi _nv^*-v^* \parallel ^2}{\varepsilon ^2} \rightarrow 0, \end{aligned}$$

which indicates that $P_n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]=o_p(n^{-1/2})$. Thus, we have $I_1=\varepsilon _{n}\times o_p(n^{-1/2})$.
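The Chebyshev step above can be illustrated numerically. The following is a minimal sketch, not part of the original proof: it replaces $\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*-v^*]$ by a hypothetical mean-zero uniform variable whose variance plays the role of $\parallel \pi _nv^*-v^*\parallel ^2$, and compares the simulated tail frequency of the scaled sample mean with the bound $Var/\varepsilon ^2$. The function names, the uniform law, and all numeric settings are assumptions made for illustration.

```python
import random
import statistics


def scaled_mean_tail(n, half_width, eps, reps, seed=0):
    """Estimate P(|P_n f| / n^{-1/2} >= eps) by simulation.

    f is a stand-in variable, Uniform(-half_width, half_width), so P f = 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = statistics.fmean(
            rng.uniform(-half_width, half_width) for _ in range(n)
        )
        if abs(mean) * n ** 0.5 >= eps:
            hits += 1
    return hits / reps


def chebyshev_bound(half_width, eps):
    """Var(f) / eps^2, where Var(f) = half_width^2 / 3 for the uniform law."""
    return (half_width ** 2 / 3.0) / eps ** 2


eps = 0.5
for a in (0.6, 0.3, 0.1):  # shrinking variance mimics ||pi_n v* - v*|| -> 0
    freq = scaled_mean_tail(n=200, half_width=a, eps=eps, reps=500)
    print(f"half_width={a}: frequency={freq:.3f}, bound={chebyshev_bound(a, eps):.4f}")
```

As the variance of the stand-in variable shrinks, the bound tends to zero for any fixed $\varepsilon $, which is the mechanism behind the $o_p(n^{-1/2})$ rate.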

For $I_2$, by the mean value theorem, we have

$$\begin{aligned} \begin{aligned} I_2&= (P_n-P)\Bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*;\text{ O}) \pm \varepsilon _n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}\\&= (P_n-P)\Bigl \{ \dot{l}(\check{\pmb {\theta }};\text{ O})[\mp \varepsilon _n\pi _nv^*] \pm \varepsilon _n\dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}\\&= \mp \varepsilon _n(P_n-P)\Bigl \{ \dot{l}(\check{\pmb {\theta }};\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*] \Bigr \}, \end{aligned} \end{aligned}$$

where $\check{\pmb {\theta }}$ is between $\widehat{\pmb {\theta }}_n$ and $\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*$. Consider the function class $\mathcal {L}_3=\{\dot{l}(\pmb {\theta };\text{ O})[\pi _nv^*] :\pmb {\theta }\in \varvec{\Theta }_n \text { and } \parallel \pmb {\theta }-\pmb {\theta }_0\parallel = O(\delta _n)\}$. For any $\dot{l}(\pmb {\theta }_i;\text{ O})[\pi _nv^*]\in \mathcal {L}_3\ (i=1,2)$, we have $\bigl |\dot{l}(\pmb {\theta }_1;\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_2;\text{ O})[\pi _nv^*]\bigr |\le C\parallel \pmb {\theta }_1-\pmb {\theta }_2\parallel $. Thus it yields that

$$\begin{aligned} N\big (\epsilon ,\mathcal {L}_3,L_2(Q)\big )\le N\big (\epsilon ,\big \{\pmb {\theta }:\pmb {\theta }\in \varvec{\Theta }_n \text { and } \parallel \pmb {\theta }-\pmb {\theta }_0\parallel \le C\delta _n\big \},\parallel \cdot \parallel \big ). \end{aligned}$$

Similar to the proof of Lemma 1, one can obtain that $N(\epsilon ,\mathcal {L}_3,L_2(Q)) \le (\frac{C\delta _n}{\epsilon })^{N+1}$. It follows that the uniform entropy integral is finite, i.e. $\int _0^{\infty }\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}d\epsilon < \infty $ with $\nu <1/2$ and $r>1$. Note that $\dot{l}(\pmb {\theta };\text{ O})[\pi _nv^*]$ is uniformly bounded under conditions (C1)–(C3). Hence $\mathcal {L}_3$ is a Donsker class by Theorem 2.8.3 of Van der Vaart and Wellner (1996). Applying the relationship between the Donsker property and asymptotic equicontinuity given by Corollary 2.3.12 of Van der Vaart and Wellner (1996), one can obtain that

$$\begin{aligned} (P_n-P)\Bigl \{\dot{l}(\check{\pmb {\theta }};\text{ O})[\pi _nv^*] - \dot{l}(\pmb {\theta }_0;\text{ O})[\pi _nv^*]\Bigr \}=o_p(n^{-1/2}). \end{aligned}$$

Therefore, we have $I_2=\varepsilon _n\times o_p(n^{-1/2})$.
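
As a check on the entropy step for $I_2$, the covering-number bound can be integrated in closed form. The following sketch assumes that the uniform entropy integral is taken with the logarithm of the covering number, writes $d$ for the exponent $N+1$ (a label introduced here only to avoid a clash with the covering number $N(\cdot )$), and uses that the covering number equals 1 once $\epsilon \ge C\delta _n$, so the integrand vanishes there:

$$\begin{aligned} \int _0^{C\delta _n}\sqrt{d\log \Bigl (\frac{C\delta _n}{\epsilon }\Bigr )}\,d\epsilon&= C\delta _n\sqrt{d}\int _0^1\sqrt{\log (1/u)}\,du \qquad (\epsilon = C\delta _n u) \\&= C\delta _n\sqrt{d}\int _0^{\infty }t^{1/2}e^{-t}\,dt \qquad (u = e^{-t}) \\&= C\delta _n\sqrt{d}\,\Gamma (3/2) = \frac{\sqrt{\pi }}{2}\,C\delta _n\sqrt{d} < \infty . \end{aligned}$$

The integral is therefore finite for each fixed $n$; how it behaves as $d$ grows with $n$ depends on the rate conditions on $\nu $ and $r$ stated in the text.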

For $I_{3}$, note that $P\big \{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\big \}=-P\big \{\dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\big \}.$ For any $\pmb {\theta }\in \big \{\pmb {\theta }:d(\pmb {\theta },\pmb {\theta }_0)=O(\delta _n)\big \}$, $P\big \{\ddot{l}({\pmb {\theta }};\text{ O})[\pmb {\theta }-\pmb {\theta }_0,\pmb {\theta }-\pmb {\theta }_0] - \ddot{l}(\pmb {\theta }_0;\text{ O})[\pmb {\theta }-\pmb {\theta }_0,\pmb {\theta }-\pmb {\theta }_0]\big \} = O(\delta _n^3)$, $\delta _n^3 = o(n^{-1})$ with $2/3r<\nu <1/3$ and $r>2$. Then we have

$$\begin{aligned} P\big (\rho [\widehat{\pmb {\theta }}_n-\pmb {\theta }_0;\text{ O}]\big )= & {} P\bigl \{l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\pmb {\theta }_0;\text{ O}) - \dot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \}\\= & {} \frac{1}{2}P\bigl \{\ddot{l}(\check{\pmb {\theta }};\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0] - \ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \}\\&\quad + \frac{1}{2}P\bigl \{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\bigr \} \\= & {} \varepsilon _n \times o_p(n^{-1/2}) + \frac{1}{2}P\{\ddot{l}(\pmb {\theta }_0;\text{ O})[\widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\widehat{\pmb {\theta }}_n-\pmb {\theta }_0]\} \\= & {} \varepsilon _n \times o_p(n^{-1/2}) - \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel ^2, \end{aligned}$$

where $\check{\pmb {\theta }}$ is between $\widehat{\pmb {\theta }}_n$ and $\pmb {\theta }_0$. Using the fact that $\parallel \pi _nv^*\parallel ^2\rightarrow \parallel v^*\parallel ^2<\infty $, the Cauchy-Schwarz inequality, and $\delta _n\parallel \pi _n v^*-v^*\parallel =o(n^{-1/2})$, we have

$$\begin{aligned} I_3= & {} - \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel ^2 + \frac{1}{2}\parallel \widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*-\pmb {\theta }_0\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,\pi _nv^*\rangle +\frac{1}{2}\parallel \varepsilon _n\pi _nv^*\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle +\frac{1}{2}\varepsilon _n^2\parallel \pi _nv^*\parallel ^2 + \varepsilon _n\times o_p(n^{-1/2})\\= & {} \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle + \varepsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
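
The three simplifications in the display for $I_3$ can be spelled out as follows. This is a sketch under two assumptions that the final line implicitly requires, namely $\parallel \widehat{\pmb {\theta }}_n-\pmb {\theta }_0\parallel =O_p(\delta _n)$ and $\varepsilon _n=o(n^{-1/2})$; the shorthand $a=\widehat{\pmb {\theta }}_n-\pmb {\theta }_0$ and $b=\pi _nv^*$ is introduced here for brevity.

$$\begin{aligned} \frac{1}{2}\parallel a\pm \varepsilon _nb\parallel ^2-\frac{1}{2}\parallel a\parallel ^2&= \pm \varepsilon _n\langle a,b\rangle +\frac{1}{2}\varepsilon _n^2\parallel b\parallel ^2, \\ \bigl |\langle a,\pi _nv^*-v^*\rangle \bigr |&\le \parallel a\parallel \,\parallel \pi _nv^*-v^*\parallel = O_p(\delta _n)\parallel \pi _nv^*-v^*\parallel = o_p(n^{-1/2}), \\ \frac{1}{2}\varepsilon _n^2\parallel b\parallel ^2&= \varepsilon _n\times O(\varepsilon _n) = \varepsilon _n\times o(n^{-1/2}). \end{aligned}$$

The first identity is bilinearity of the inner product, the second is the Cauchy-Schwarz inequality combined with $\delta _n\parallel \pi _nv^*-v^*\parallel =o(n^{-1/2})$, and the third uses the boundedness of $\parallel \pi _nv^*\parallel $.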

Combining the above results, we can obtain

$$\begin{aligned} 0\le & {} P_n\bigl \{ l(\widehat{\pmb {\theta }}_n;\text{ O}) - l(\widehat{\pmb {\theta }}_n\pm \varepsilon _n\pi _nv^*;\text{ O}) \bigr \} \\= & {} \mp \varepsilon _{n}P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*] \pm \varepsilon _n\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle + \varepsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
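
Because the preceding display holds for both choices of sign, it determines the inner product up to a negligible term. A sketch of this step, assuming $\varepsilon _n>0$ so that one may divide through, and writing $A_n=P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]$ and $B_n=\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle $ (shorthand introduced here):

$$\begin{aligned} \text {upper sign:}\quad&0\le -A_n+B_n+o_p(n^{-1/2}) \;\Longrightarrow \; A_n-B_n\le o_p(n^{-1/2}), \\ \text {lower sign:}\quad&0\le A_n-B_n+o_p(n^{-1/2}) \;\Longrightarrow \; B_n-A_n\le o_p(n^{-1/2}), \end{aligned}$$

so that $|B_n-A_n|=o_p(n^{-1/2})$, that is, $\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle =P_n\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]+o_p(n^{-1/2})$.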

Note that $P\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]=0$ and $Var(\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]) = \Vert v^{*}\Vert ^2$. Therefore, by the central limit theorem we obtain

$$\begin{aligned} \sqrt{n}\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle = \sqrt{n}(P_n-P)\{\dot{l}(\pmb {\theta }_0;\text{ O})[v^*]\} + o_p(1) \rightarrow N(0,\Vert v^*\Vert ^2) \end{aligned}$$

in distribution. Note that $G(\pmb {\theta })-G(\pmb {\theta }_0)=\dot{G}(\pmb {\theta }_0)[\pmb {\theta }-\pmb {\theta }_0]$. By the Riesz representation theorem, there exists $v^{*}\in \mathcal {\bar{V}} \text { such that } \dot{G}(\pmb {\theta }_0)[v]=\langle v,v^*\rangle $ for any $v\in \mathcal {\bar{V}}$ and $\Vert v^*\Vert = \Vert \dot{G}(\pmb {\theta }_0)\Vert $. Then we have

$$\begin{aligned} \sqrt{n}\big \{G(\widehat{\pmb {\theta }}_n)-G(\pmb {\theta }_0)\big \}=\sqrt{n}\langle \widehat{\pmb {\theta }}_n-\pmb {\theta }_0,v^*\rangle +o_{p}(1)\rightarrow N(0,\Vert \dot{G}(\pmb {\theta }_0)\Vert ^{2}) \end{aligned}$$

in distribution, namely, $ \sqrt{n} \Bigl \{\text{ b}_{1}^{\intercal }(\widehat{\pmb {\alpha }}_{n}-\pmb {\alpha }_{0})+\text{ b}_{2}^{\intercal }(\widehat{\pmb {\beta }}_{n}-\pmb {\beta }_{0})+ \int _0^{\tau } b_3(t)\big (\widehat{\Lambda }_n(t)-\Lambda _0(t)\big )dt \Bigr \}\xrightarrow {L} N(0,\Vert \dot{G}(\pmb {\theta }_0)\Vert ^2)$. Adopting Theorem 4 in Shen (1997) or the conclusions of Bickel and Kwon (2001), we can establish the semiparametric efficiency of the estimators. The proof is completed.

The calculation of $I(\widehat{\pmb {\theta }})$

One can obtain the second-order derivatives of $Q(\widetilde{\pmb {\theta }}; \widehat{\pmb {\theta }})$ with respect to $\widetilde{\pmb {\theta }}$. The quantities in the first part of $I(\widehat{\pmb {\theta }})$ are then given by

$$\begin{aligned} \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \alpha _{k}\partial \alpha _{k'}}&= \sum _{i=1}^n Z_{ik}Z_{ik'}\widehat{\pi }(\text{ Z}_{i})\Big \{1-\widehat{\pi }(\text{ Z}_{i}) \Big \} , \\ \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \alpha _{k}\partial \eta _{j}}&=\frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \eta _{j}\partial \alpha _{k}} = 0, \\ \frac{\partial ^2Q(\widetilde{\pmb {\theta }};\widehat{\pmb {\theta }})}{\partial \eta _j\partial \eta _{j'}}&= -\eta _j^{-2}\sum _{i=1}^n \Big \{\delta _{1i} E(Y_{ij}|\mathcal {O},\widehat{\pmb {\theta }})\\&\quad +\delta _{2i} E(W_{ij}|\mathcal {O},\widehat{\pmb {\theta }})\Big \}I_{(j'=j)}. \end{aligned}$$

The second part of $I(\widehat{\pmb {\theta }})$, derived from Eq. (6), is as follows:

$$\begin{aligned}&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k}}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k'}}\right) = \sum _{i=1}^n Z_{ik}Z_{ik'} Var(U_{i}), \\&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \alpha _{k}}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _{j}}\right) = - \sum _{i=1}^n Z_{ik}\delta _{3i}\xi _{j}(L_{i}) Var(U_{i}), \\&Cov\left( \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _j}, \frac{\partial \log {L_c(\widetilde{\pmb {\theta }})}}{\partial \eta _{j'}}\right) = (\eta _j\eta _{j'})^{-1} \sum _{i=1}^n \Big \{\delta _{1i} Cov(Y_{ij},Y_{i{j'}})\\&\quad +\delta _{2i} Cov(W_{ij},W_{i{j'}})\Big \}, \end{aligned}$$

where

$$\begin{aligned} Var(U_i)= & {} E(U_{i}^{2})-\{E(U_{i})\}^{2}= E(U_{i})\{1-E(U_{i})\}, \\ Cov(Y_{ij},Y_{ij'})= & {} \frac{\delta _{1i}\widehat{\lambda }_{ij}}{\widehat{c}_{i}^{2}}(\widehat{c}_{i}-\widehat{\lambda }_{ij}+\widehat{c}_{i}\widehat{\lambda }_{ij}), \\ Cov(W_{ij},W_{i{j'}})= & {} \frac{\delta _{2i}\widehat{\omega }_{ij}}{\widehat{d}_{i}^{2}}(\widehat{d}_{i}-\widehat{\omega }_{ij}+\widehat{d}_{i}\widehat{\omega }_{ij}). \end{aligned}$$

Here $\widehat{c}_{i}=1-\exp {(-\widehat{\lambda }_{i})}$ and $\widehat{d}_{i}=1-\exp {(-\widehat{\omega }_{i})}$. We can then obtain $I(\widehat{\pmb {\theta }})$ by Eq. (11), and these closed forms make the variance-covariance matrix easy to compute. $\square $
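
To show how the closed forms translate into computation, the following is a minimal sketch of two of the blocks above: the $\alpha $-$\alpha $ block, which has the same weighted cross-product shape in both parts (with weights $\widehat{\pi }(\text{ Z}_i)\{1-\widehat{\pi }(\text{ Z}_i)\}$ or $Var(U_i)$), and the diagonal $\eta $-$\eta $ block of the first part. The function names and the toy inputs are hypothetical; the conditional expectations are assumed to be supplied by the E-step, and the way the blocks are combined into $I(\widehat{\pmb {\theta }})$ follows Eq. (11), which is not reproduced here.

```python
def bernoulli_var(p):
    """Var(U) = E(U){1 - E(U)} for a 0/1 variable with E(U) = p."""
    return p * (1.0 - p)


def weighted_gram(Z, w):
    """Matrix whose (k, k') entry is sum_i Z[i][k] * Z[i][k'] * w[i]."""
    q = len(Z[0])
    return [
        [sum(row[k] * row[kp] * wi for row, wi in zip(Z, w)) for kp in range(q)]
        for k in range(q)
    ]


def eta_diagonal(eta, delta1, delta2, EY, EW):
    """Diagonal entries -eta_j^{-2} * sum_i {delta1_i E(Y_ij) + delta2_i E(W_ij)}."""
    return [
        -sum(
            d1 * ey[j] + d2 * ew[j]
            for d1, d2, ey, ew in zip(delta1, delta2, EY, EW)
        )
        / eta[j] ** 2
        for j in range(len(eta))
    ]


# Toy inputs, made up purely for illustration.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]          # covariates Z_ik
pi_hat = [0.5, 0.2, 0.8]                          # fitted pi(Z_i)
eta = [2.0, 0.5]                                  # spline coefficients eta_j
delta1, delta2 = [1, 0, 0], [0, 1, 0]             # censoring indicators
EY = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]         # E(Y_ij | O) from the E-step
EW = [[0.0, 0.0], [3.0, 1.0], [0.0, 0.0]]         # E(W_ij | O) from the E-step

alpha_block = weighted_gram(Z, [bernoulli_var(p) for p in pi_hat])
eta_block = eta_diagonal(eta, delta1, delta2, EY, EW)
print(alpha_block)
print(eta_block)
```

Keeping the weights as an argument lets the same routine serve both the second-derivative term and the covariance term for $\pmb {\alpha }$.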

About this article

Cite this article

Wang, X., Wang, Z. EM algorithm for the additive risk mixture cure model with interval-censored data. Lifetime Data Anal 27, 91–130 (2021). https://doi.org/10.1007/s10985-020-09507-z

Received: 06 September 2018
Accepted: 19 September 2020
Published: 01 October 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s10985-020-09507-z
