A new approach to estimation of the proportional hazards model based on interval-censored data with missing covariates


This paper discusses fitting the proportional hazards model to interval-censored failure time data with missing covariates. Many authors have discussed the problem when complete covariate information is available or when covariates are missing completely at random. In contrast, we focus on the situation where covariates are missing at random. For this problem, we propose a sieve maximum likelihood estimation approach that uses I-spline functions to approximate the unknown cumulative baseline hazard function in the model. To implement the proposed method, we develop an EM algorithm based on a two-stage data augmentation. Furthermore, we show that the proposed estimators of the regression parameters are consistent and asymptotically normal. The proposed approach is then applied to the Alzheimer's disease data set that motivated this study.


  • Chen K, Jin Z, Ying Z (2002) Semiparametric analysis of transformation models with censored data. Biometrika 89:659–668

  • Chen HY, Little RJ (1999) Proportional hazards regression with missing covariates. J Am Stat Assoc 94(447):896–908

  • Chang IS, Wen CC, Wu YJ (2007) A profile likelihood theory for the correlated gamma-frailty model with current status family data. Stat Sin 17:1023–1046

  • Du MY, Li HQ, Sun JG (2021) Regression analysis of censored data with nonignorable missing covariates and application to Alzheimer Disease. Comput Stat Data Anal 157:1–15

  • Efron B (1981) Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68(3):589–599

  • Gilks WR, Wild P (1992) Adaptive rejection sampling for Gibbs sampling. J R Stat Soc Ser C (Appl Stat) 41:337–348

  • Herring AH, Ibrahim JG (2001) Likelihood-based methods for missing covariates in the Cox proportional hazards model. J Am Stat Assoc 96:292–302

  • Horton NJ, Kleinman KP (2007) Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat 61(1):79–90

  • Hu T, Zhou Q, Sun J (2017) Regression analysis of bivariate current status data under the proportional hazards model. Can J Stat 45:410–424

  • Ibrahim JG, Lipsitz SR, Chen MH (1999) Missing covariates in generalized linear models when the missing data mechanism is nonignorable. J R Stat Soc Ser B (Stat Methodol) 61(1):173–190

  • Li S, Hu T, Wang P et al (2017) Regression analysis of current status data in the presence of dependent censoring with applications to tumorigenicity experiments. Comput Stat Data Anal 110:75–86

  • Lipsitz SR, Ibrahim JG, Zhao LP (1994) A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. J Am Stat Assoc 94:1147–1160

  • Little RJ, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York

  • Li S, Wu Q, Sun J (2020) Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer disease. Stat Methods Med Res 29(8):2151–2166

  • Ma L, Hu T, Sun J (2015) Sieve maximum likelihood regression analysis of dependent current status data. Biom J 102:731–738

  • McMahan CS, Wang L, Tebbs JM (2013) Regression analysis for current status data using the EM algorithm. Stat Med 32:4452–4466

  • Qi L, Wang CY, Prentice RL (2005) Weighted estimators for proportional hazards regression with missing covariates. J Am Stat Assoc 100:1250–1263

  • Ramsay JO (1988) Monotone regression splines in action. Stat Sci 3(4):425–441

  • Schomaker M, Heumann C (2018) Bootstrap inference when using multiple imputations. Stat Med 37(14):2252–2266

  • Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York

  • Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615

  • Su YR, Wang JL (2016) Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data. Ann Stat 44(3):1298–1331

  • Van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Van Der Vaart A, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York

  • Wen CC, Lin CT (2011) Analysis of current status data with missing covariates. Biometrics 67:760–769

  • Wang L, McMahan CS, Hudgens MG et al (2016) A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72:222–231

  • Zhao S, Hu T, Ma L et al (2015) Regression analysis of informative current status data with the additive hazards model. Lifetime Data Anal 21:241–258

  • Zeng D, Mao L, Lin DY (2016) Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103:253–271

  • Zhou H, Pepe MS (1995) Auxiliary covariate data in failure time regression. Biometrika 82(1):139–149


The authors wish to thank the Editor-in-Chief, Dr. Mei-Ling Ting Lee, the Associate Editor and two reviewers for their many helpful and insightful comments and suggestions, which greatly improved the paper. The research was partially supported by a grant from the Natural Science Foundation of China [Grant Number 11731011] and a grant from a key project of the Yunnan Province Foundation, China [Grant Number 202001BB050049]. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database ( As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:

Author information

Corresponding author

Correspondence to Huiqiong Li.



A.1. E-step of the EM algorithm for continuous covariates

In the E-step of the EM algorithm developed in Sect. 3, we need to calculate the expectations \(E(Z_i|\mathbf{O}_i,\theta ^{(d)})\) and \(E(W_i|\mathbf{O}_i,\theta ^{(d)})\). As described there, when the missing covariates are categorical, these expectations are finite sums and can be expressed in closed form. For continuous covariates, however, this is not the case, and we instead have to deal with integrals that have no closed form. More specifically, we have that

$$\begin{aligned} E(Z_i|\mathbf{O}_i,\theta ^{(d)})= & {} \int _{\mathbf{X}_i^{mis}}\frac{\Lambda ^{(d)}(V_i)\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}\mathbf{X}_i^{mis})\,\delta _{1i}}{1-\exp \{-\Lambda ^{(d)}(V_i)\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}\mathbf{X}_i^{mis})\}}\\&\times p(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})\,d\mathbf{X}_i^{mis}, \end{aligned}$$


$$\begin{aligned} E(W_i|\mathbf{O}_i,\theta ^{(d)})= & {} \int _{\mathbf{X}_i^{mis}}\frac{\{\Lambda ^{(d)}(U_i)-\Lambda ^{(d)}(V_i)\}\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}\mathbf{X}_i^{mis})\,\delta _{2i}}{1-\exp [-\{\Lambda ^{(d)}(U_i)-\Lambda ^{(d)}(V_i)\}\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}\mathbf{X}_i^{mis})]}\\&\times p(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})\,d\mathbf{X}_i^{mis} \end{aligned}$$

by using the notation defined before.
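Each integrand above has the form \(\lambda /(1-e^{-\lambda })\) multiplied by a censoring indicator, which is the mean of a Poisson variable with rate \(\lambda \) conditioned on being positive. The following minimal Python sketch evaluates that quantity for a scalar linear predictor and checks it against a direct summation of the truncated Poisson mass function. The function names and the numerical inputs are illustrative assumptions, not part of the paper.

```python
import math


def truncated_poisson_mean(lam):
    """Mean of a Poisson(lam) variable conditioned on being >= 1."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # -expm1(-lam) equals 1 - exp(-lam) and stays accurate for small lam.
    return lam / -math.expm1(-lam)


def e_step_integrand(cum_hazard, linear_predictor, delta):
    """Hypothetical helper: integrand of the E-step for fixed covariates.

    cum_hazard       value of the cumulative baseline hazard (or a difference)
    linear_predictor beta_1' x_obs + beta_2' x_mis, already summed to a scalar
    delta            censoring indicator (0 or 1)
    """
    lam = cum_hazard * math.exp(linear_predictor)
    return delta * truncated_poisson_mean(lam)


def truncated_mean_by_summation(lam, terms=200):
    """Direct check: sum of k * P(K = k) over k >= 1, divided by P(K >= 1)."""
    pmf = math.exp(-lam)  # P(K = 0)
    total = 0.0
    for k in range(1, terms):
        pmf *= lam / k  # P(K = k) from P(K = k - 1)
        total += k * pmf
    return total / (1.0 - math.exp(-lam))
```

For a subject whose indicator is zero the helper returns zero, so only subjects of the corresponding censoring type contribute to the expectation.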

To calculate the integrals above, following Herring and Ibrahim (2001), one can employ a Monte Carlo estimation approach, which draws a sample from

$$\begin{aligned} p_{ij}= & {} P(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})=\frac{f(U_i, V_i, \delta _{1i}, \delta _{2i}, \delta _{3i}|\mathbf{X}_i^{obs},\mathbf{X}_i^{mis})\,f(\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis}; \gamma ^{(d)})}{\int _{\mathbf{X}_i^{mis}} f(U_i, V_i, \delta _{1i}, \delta _{2i}, \delta _{3i}|\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis})\,f(\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis}; \gamma ^{(d)})\,d\mathbf{X}_i^{mis}}\\\propto & {} f(U_i, V_i, \delta _{1i}, \delta _{2i}, \delta _{3i}|\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis})\,f(\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis}; \gamma ^{(d)}) . \end{aligned}$$

Note that \(f(U_i, V_i, \delta _{1i}, \delta _{2i}, \delta _{3i}|\mathbf{X}_i^{obs}, \mathbf{X}_i^{mis})\) is log-concave in \(\mathbf{X}_i^{mis}\) (Ibrahim et al. 1999), so if \(f(\mathbf{X}_i^{obs},\mathbf{X}_i^{mis};\gamma ^{(d)})\) belongs to the exponential family, the logarithm of \(P(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})\) is concave. It follows that one can use the Gibbs sampler together with the adaptive rejection sampling algorithm (Gilks and Wild 1992) to sample from \(P(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})\).
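The log-concavity claim can be checked numerically in a simple case. The sketch below assumes a single scalar missing covariate, a subject whose likelihood contribution is \(1-\exp \{-\Lambda e^{\beta x}\}\), and a normal covariate model; these choices, the parameter values, and the function names are all hypothetical and serve only to illustrate why adaptive rejection sampling is applicable. It evaluates the unnormalised log density on a grid and confirms that every second difference is non-positive.

```python
import math


def log_target(x, cum_hazard=1.0, beta=0.8, mu=0.0, sigma=1.0):
    """Unnormalised log density of a scalar missing covariate x.

    Assumed setup (hypothetical): likelihood term 1 - exp(-cum_hazard * exp(beta * x))
    times a N(mu, sigma^2) covariate density with constants dropped.
    """
    lam = cum_hazard * math.exp(beta * x)
    log_lik = math.log(-math.expm1(-lam))       # log(1 - exp(-lam))
    log_cov = -0.5 * ((x - mu) / sigma) ** 2    # normal kernel
    return log_lik + log_cov


def is_concave_on_grid(f, lo, hi, n=401, tol=1e-9):
    """Return True if all second differences of f on an even grid are <= tol."""
    h = (hi - lo) / (n - 1)
    values = [f(lo + i * h) for i in range(n)]
    return all(
        values[i - 1] - 2.0 * values[i] + values[i + 1] <= tol
        for i in range(1, n - 1)
    )


# The target passes the check; a convex function such as x^2 does not.
target_is_concave = is_concave_on_grid(log_target, -4.0, 4.0)
parabola_is_concave = is_concave_on_grid(lambda x: x * x, -1.0, 1.0)
```

A grid check of this kind is only a sanity test on a bounded range and does not replace the analytical argument cited in the text.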

More specifically, to determine \(E(Z_i|\mathbf{O}_i,\theta ^{(d)})\) for each subject with missing covariate \(\mathbf{X}_i^{mis}\), we first apply the Gibbs sampler and the adaptive rejection sampling algorithm to draw a sample \(s_{i,1},\ldots ,s_{i,n_{i}}\) of size \(n_i\) from \(p(\mathbf{X}_i^{mis}|\mathbf{O}_i,\theta ^{(d)})\). Then the conditional expectation can be approximated by

$$\begin{aligned} E(Z_i|\mathbf{O}_i,\theta ^{(d)})\approx \frac{1}{n_{i}}\sum _{k=1}^{n_{i}}\frac{\Lambda ^{(d)}(V_i)\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}s_{i,k})\,\delta _{1i}}{1-\exp \{-\Lambda ^{(d)}(V_i)\exp (\beta _1^{(d)'}\mathbf{X}_i^{obs}+\beta _2^{(d)'}s_{i,k})\}} . \end{aligned}$$

In comparison to the categorical covariate situation, the above operation can be regarded as replacing each \(\mathbf{X}_i^{mis}\) by \(n_{i}\) sampled values with equal weights. The expectation \(E(W_i|\mathbf{O}_i,\theta ^{(d)})\) can be calculated similarly.
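The equal-weight average can be sketched in a few lines of Python. To keep the example short and self-contained, the draws come from a random-walk Metropolis sampler, which is a stand-in for the Gibbs sampler with adaptive rejection sampling described above and is not the authors' implementation. The scalar missing covariate, the standard normal covariate model, the parameter values, and all function names are assumptions. A grid quadrature of the same integral serves as a reference value.

```python
import math
import random


def truncated_poisson_mean(lam):
    # lam / (1 - exp(-lam)), the form of the integrand in the E-step
    return lam / -math.expm1(-lam)


def log_target(x, cum_hazard_v, lin_obs, beta2):
    # Assumed unnormalised log density of x given the data for a subject with
    # delta_1 = 1: log(1 - exp(-lam)) plus a standard normal log kernel.
    lam = cum_hazard_v * math.exp(lin_obs + beta2 * x)
    return math.log(-math.expm1(-lam)) - 0.5 * x * x


def metropolis_draws(log_density, n_draws, step=1.0, burn_in=500, seed=0):
    # Random-walk Metropolis, used here in place of Gibbs/ARS for brevity.
    rng = random.Random(seed)
    x, logp = 0.0, log_density(0.0)
    draws = []
    for it in range(burn_in + n_draws):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        if it >= burn_in:
            draws.append(x)
    return draws


def mc_expectation_z(draws, cum_hazard_v, lin_obs, beta2, delta1):
    # Equal-weight average of the integrand over the sampled covariate values.
    if delta1 == 0:
        return 0.0
    total = 0.0
    for s in draws:
        total += truncated_poisson_mean(cum_hazard_v * math.exp(lin_obs + beta2 * s))
    return total / len(draws)


def quadrature_expectation_z(cum_hazard_v, lin_obs, beta2, lo=-8.0, hi=8.0, n=4001):
    # Reference value: ratio of two Riemann sums on an even grid.
    h = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = lo + i * h
        w = math.exp(log_target(x, cum_hazard_v, lin_obs, beta2))
        num += w * truncated_poisson_mean(cum_hazard_v * math.exp(lin_obs + beta2 * x))
        den += w
    return num / den


# Hypothetical inputs: Lambda(V_i) = 1.0, beta_1' x_obs = 0.2, beta_2 = 0.5.
draws = metropolis_draws(lambda x: log_target(x, 1.0, 0.2, 0.5), n_draws=20000)
mc_value = mc_expectation_z(draws, 1.0, 0.2, 0.5, delta1=1)
reference = quadrature_expectation_z(1.0, 0.2, 0.5)
```

In one dimension the quadrature is the cheaper option; the Monte Carlo average becomes the practical choice when several covariates are missing at once.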

A.2. Proofs of the asymptotic properties

In this Appendix, we sketch the proof of the consistency and asymptotic normality of the proposed estimators given in Theorem 1 by employing empirical process theory and nonparametric techniques. Define \({P}f=\int f(x)dP(x)\) and \({P}_n f = n^{-1} \sum \limits _{i=1}^{n} f(X_i)\) for a function f, a probability measure P and a sample \(X_1, \ldots , X_n\). For the proof, we need the following regularity conditions.

  1. (A1)

    Assume that \(\Lambda (\tau _1)<\infty \), \(\Lambda (\tau _2)<\infty \), and there exists a positive constant a such that \(P ( V - U> a ) > 0\). Also the union of the supports of U and V is contained in the interval \([r_1, r_2]\) with \(0<r_1<r_2< +\infty \).

  2. (A2)

    The function \(\Lambda _0\) is continuously differentiable on \([r_1, r_2]\), and satisfies \( M^{-1}<\Lambda _0(r_1)<\Lambda _0(r_2)< M\) for some positive constant M.

  3. (A3)

    The set of covariates (X, Z) has bounded support.

  4. (A4)

    The conditional distribution \(f(\mathbf{X}_i^{mis}|\mathbf{X}_i^{obs}; \gamma )\) is identifiable and has continuous second-order derivatives with respect to \(\gamma \), and \(-E_0[(\partial ^2/\partial \gamma ^2)\log f(\mathbf{X}_i^{mis}|\mathbf{X}_i^{obs}; \gamma _0)]\) is positive definite.

  5. (A5)

    For any \((\theta , \Lambda )\) near \((\theta _0, \Lambda _0)\), \({P}_0\{\log L(\theta , \Lambda )-\log L(\theta _0, \Lambda _0)\}\leqslant -K(\Vert \theta -\theta _0\Vert ^2+\Vert \Lambda -\Lambda _0\Vert ^2)\) for a fixed constant \(K>0\).
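The empirical-measure notation \({P}_n f\) introduced before the regularity conditions is simply a sample average. As a small illustration, the Python sketch below computes it for a given function and sample; the function name and the evenly spaced stand-in for a Uniform(0, 1) sample are assumptions chosen so that the result can be compared with \({P}f=1/3\) for \(f(x)=x^2\).

```python
def empirical_measure(f, sample):
    """P_n f: the average of f over the observed sample."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(f(x) for x in sample) / len(sample)


# Evenly spaced midpoints on (0, 1) stand in for a Uniform(0, 1) sample.
n = 1000
midpoints = [(i + 0.5) / n for i in range(n)]
pn_f = empirical_measure(lambda x: x * x, midpoints)  # close to P f = 1/3
```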

First we prove the consistency, and for this we verify the conditions of Theorem 5.7 of Van der Vaart (1998). Let \(BV_\omega [r_1, r_2]\) denote the class of functions whose total variation on \([r_1, r_2]\) is bounded by a given constant \(\omega \). Then the class of functions

$$\begin{aligned} F_\omega =\left\{ \int \limits _{0}^{U_k}\text{ exp }\{\beta ^{T}X_i\}d\Lambda (s): \Lambda \in BV_\omega [r_1, r_2]\right\} \end{aligned}$$

is a convex hull of the functions \(\{I(U_k\geqslant s)\exp \{\beta ^{T}X_i\}\}\) and thus it is a Donsker class. Furthermore,

$$\begin{aligned} \text{ exp }\left( -\int \limits _{0}^{U_k}\text{ exp }\{\beta ^{T}X_i\}d\Lambda (s)\right) -\text{ exp }\left( -\int \limits _{0}^{U_{k+1}}\text{ exp }\{\beta ^{T}X_i\}d\Lambda (s)\right) \end{aligned}$$

is bounded away from zero. Therefore, \(l(\theta , {\hat{\alpha }}|\mathbf{O})=\log L(\theta , {\hat{\alpha }}|\mathbf{O})\) belongs to some Donsker class due to the preservation property of the Donsker class under Lipschitz-continuous transformations. Then we can conclude that \(\sup _{\theta \in \Theta _n}|{P}_nl(\theta , {\hat{\alpha }}|\mathbf{O})-{P}_nl(\theta _0, {\hat{\alpha }}|\mathbf{O})|\) converges in probability to 0 as \(n\rightarrow \infty \).

Now we verify that another condition of Theorem 5.7 of Van der Vaart (1998) also holds. That is, for any \(\varepsilon >0\), we have

$$\begin{aligned} \sup _{d(\theta , \theta _0)>\varepsilon }Pl(\theta ,{\hat{\alpha }}|\mathbf{O}) <Pl(\theta _0, {\hat{\alpha }}|\mathbf{O}) . \end{aligned}$$

Note that this condition is satisfied if we can prove that the model is identifiable. By condition (A4) and arguments similar to those in the proof of Theorem 2.1 of Chang et al. (2007), we can show the identifiability of the model parameters. Now, by Theorem 5.7 of Van der Vaart (1998), we have \(d({\hat{\theta }}_n, \theta _0)= o_p(1)\), which completes the proof of consistency.

Before proving the asymptotic normality, we will need to establish the convergence rate. For this, we will first define the covering number of the class \({{\mathcal {L}}}=\{l(\theta ,{\hat{\alpha }}|\mathbf{O}):\theta \in \Theta \}\) and establish a needed lemma.

Lemma 1

Assume that Conditions (A1), (A3)–(A4) hold. Then the covering number of the class \({{\mathcal {L}}} = \{l(\theta ,{\hat{\alpha }}|\mathbf{O}): \theta \in \Theta \}\) satisfies

$$\begin{aligned} N(\epsilon , {{\mathcal {L}}}, L_2(P))=O(\epsilon ^{-1}). \end{aligned}$$

Proof of Lemma 1

The proof is similar to those in Zeng et al. (2016) and Hu et al. (2017) and is thus omitted.

To establish the convergence rate, for any \(\eta >0\), define the class \({{\mathcal {F}}}_\eta =\{l(\theta _{n0}, {\hat{\alpha }}|\mathbf{O})-l(\theta , {\hat{\alpha }}|\mathbf{O}): \theta \in \Theta , d(\theta , \theta _{n0})\leqslant \eta \}\) with \(\theta _{n0}=(\beta _0,\Lambda _{n0})\). Following the calculation of Shen and Wong (1994, p. 597), we can establish that \(\log N_{[]}(\epsilon , {{\mathcal {F}}}_{\eta }, \Vert \cdot \Vert _{2})\leqslant CN \log (\eta /\epsilon )\) with \(N=m+1\), where \(N_{[]}(\epsilon , {{\mathcal {F}}}, d)\) denotes the bracketing number (see Definition 2.1.6 in Van Der Vaart and Wellner 1996) of a function class \({{\mathcal {F}}}\) with respect to the metric or semi-metric d. Moreover, some algebraic calculations lead to \(\Vert l(\theta _{n0},{\hat{\alpha }}|\mathbf{O})-l(\theta , {\hat{\alpha }}|\mathbf{O})\Vert _{2}^2\leqslant C\eta ^2\) for any \(l(\theta _{n0}, {\hat{\alpha }}|\mathbf{O})-l(\theta , {\hat{\alpha }}|\mathbf{O})\in {{\mathcal {F}}}_\eta \). Therefore, by Lemma 3.4.2 of Van Der Vaart and Wellner (1996), we obtain

$$\begin{aligned} E_P\Vert n^{1/2}(P_n-P)\Vert _{{\mathcal {F}}_{\eta }}\leqslant CJ_{[]}(\eta , {{\mathcal {F}}}_\eta , \Vert \cdot \Vert _{2})\left\{ 1+\frac{J_{[]}(\eta ,{{\mathcal {F}}}_\eta , \Vert \cdot \Vert _{2})}{\eta ^2n^{1/2}}\right\} , \qquad (S) \end{aligned}$$

where \(J_{[]}(\eta , {{\mathcal {F}}}_\eta , \Vert \cdot \Vert _{2})=\int _{0}^\eta \{\log N_{[]}(\epsilon , {{\mathcal {F}}}_{\eta }, \Vert \cdot \Vert _{2})\}^{1/2}d\epsilon \). The right-hand side of (S) yields \(\phi _n(\eta )=C\eta ^{1/2}(1+\frac{\eta ^{1/2}}{\eta ^{2} n^{1/2}}M_1),\) where \(M_1\) is a positive constant. Then \(\phi _n(\eta )/\eta \) is a decreasing function of \(\eta \), and \(n^{2/3}\phi _n(n^{-1/3})=O(n^{1/2})\). According to Theorem 3.4.1 of Van Der Vaart and Wellner (1996), we can conclude that \(d({\hat{\theta }}, \theta _0)=O_p(n^{-1/3})\).
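As a check of the rate computation, the order of \(\phi _n\) at \(\eta =n^{-1/3}\) can be written out explicitly. The display below is only algebra on the expression for \(\phi _n(\eta )\) given above, with \(r_n=n^{1/3}\) denoting the candidate rate, and is not an additional result from the paper.

$$\begin{aligned} \phi _n(\eta )= & {} C\eta ^{1/2}\left( 1+M_1\eta ^{-3/2}n^{-1/2}\right) ,\\ \phi _n(n^{-1/3})= & {} Cn^{-1/6}\left( 1+M_1n^{1/2}n^{-1/2}\right) =C(1+M_1)n^{-1/6},\\ r_n^2\,\phi _n(1/r_n)= & {} n^{2/3}\cdot C(1+M_1)n^{-1/6}=C(1+M_1)n^{1/2}. \end{aligned}$$

Hence \(r_n^2\phi _n(1/r_n)\) is of order \(n^{1/2}\), which is the rate condition used with \(r_n=n^{1/3}\).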

Now we prove the asymptotic normality of \({\hat{\beta }}_n\). Following the proof of Theorem 2 in Zeng et al. (2016), one can obtain that

$$\begin{aligned} \sqrt{n} ( {{\hat{\beta }}}_n - \beta _0 )=(E[\{l_\beta -l_\Lambda (s^*)\}\{l_\beta -l_\Lambda (s^*)\}^{T}])^{-1}G_n\{l_\beta -l_\Lambda (s^*)\}+o_p(1), \end{aligned}$$

where \(l_\beta \) is the score function for \(\beta \) and \( l_\Lambda (s^*)\) is the score function along the submodel \(d\Lambda _{\epsilon , s^*}=(1+\epsilon s^*)d\Lambda \). This implies that the influence function for \({\hat{\beta }}_n\) is exactly the efficient influence function, so that \(\sqrt{n} ( {{\hat{\beta }}}_n - \beta _0 )\) converges to a zero-mean normal random vector whose covariance matrix attains the semiparametric efficiency bound. \(\square \)

Cite this article

Zhou, R., Li, H., Sun, J. et al. A new approach to estimation of the proportional hazards model based on interval-censored data with missing covariates. Lifetime Data Anal 28, 335–355 (2022).

  • Case II interval-censored data
  • EM algorithm
  • Missing at random
  • Sieve approach