Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error

Annals of the Institute of Statistical Mathematics

Abstract

Many methods have been developed for analyzing survival data, which are commonly right-censored. These methods, however, are challenged by complex features pertinent to data collection as well as to the nature of the data themselves. In particular, biased samples caused by left-truncation (or length-biased sampling) and measurement error often accompany survival analysis. Although such data frequently arise in practice, little work has addressed these features simultaneously. In this paper, we explore valid inference methods for handling left-truncated and right-censored survival data with measurement error under the widely used Cox model. We first propose a flexible estimator for the survival model parameters that does not require specification of the baseline hazard function. To improve efficiency, we further develop an augmented nonparametric maximum likelihood estimator. We establish asymptotic results and examine the efficiency and robustness of the proposed estimators. An appealing feature of the proposed methods is that the distributions of the covariates and of the truncation times are left unspecified. Numerical studies are reported to assess the finite sample performance of the proposed methods.


Fig. 1
Fig. 2
Fig. 3


References

  • Andersen, P. K., Gill, R. D. (1982). Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10, 1100–1120.

  • Augustin, T. (2004). An exact corrected log-likelihood function for Cox’s proportional hazards model under measurement error and some extensions. Scandinavian Journal of Statistics, 31, 43–50.

  • Buzas, J. S. (1998). Unbiased scores in proportional hazards regression with covariate measurement error. Journal of Statistical Planning and Inference, 67, 247–257.

  • Carroll, R. J., Li, K.-C. (1992). Measurement error regression with unknown link: Dimension reduction and data visualization. Journal of the American Statistical Association, 87, 1040–1050.

  • Carroll, R. J., Ruppert, D., Stefanski, L. A., Crainiceanu, C. M. (2006). Measurement error in nonlinear models. New York: Chapman & Hall/CRC.

  • Goutis, C., Casella, G. (1999). Explaining the saddlepoint approximation. The American Statistician, 53, 216–224.

  • Greene, W. F., Cai, J. (2004). Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics, 60, 987–996.

  • Henmi, M., Eguchi, S. (2004). A paradox concerning nuisance parameters and projected estimating functions. Biometrika, 91, 929–941.

  • Hosmer, D. W., Lemeshow, S., May, S. (2008). Applied survival analysis: Regression modeling of time to event data. New York: Wiley.

  • Hu, C., Lin, D. Y. (2002). Cox regression with covariate measurement error. Scandinavian Journal of Statistics, 29, 637–655.

  • Huang, C. Y., Qin, J., Follmann, D. A. (2012). A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling. Biometrika, 99, 199–210.

  • Huang, Y., Wang, C. Y. (2000). Cox regression with accurate covariates unascertainable: A nonparametric correction approach. Journal of the American Statistical Association, 95, 1209–1219.

  • Kalbfleisch, J. D., Prentice, R. L. (2002). The statistical analysis of failure time data. New York: Wiley.

  • Kong, F. H., Gu, M. (1999). Consistent estimation in Cox proportional hazards model with covariate measurement errors. Statistica Sinica, 9, 953–969.

  • Küchenhoff, H., Bender, R., Langner, I. (2007). Effect of Berkson measurement error on parameter estimates in Cox regression models. Lifetime Data Analysis, 13, 261–272.

  • Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: Wiley.

  • Li, Y., Ryan, L. (2006). Inference on survival data with covariate measurement error: An imputation-based approach. Scandinavian Journal of Statistics, 33, 169–190.

  • Nakamura, T. (1992). Proportional hazards model with covariates subject to measurement error. Biometrics, 48, 829–838.

  • Ning, Y., Yi, G. Y., Reid, N. (2018). A class of weighted estimating equations for semiparametric transformation models with missing covariates. Scandinavian Journal of Statistics, 45, 87–109.

  • Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69, 331–342.

  • Qin, J., Shen, Y. (2010). Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics, 66, 382–392.

  • Qin, J., Ning, J., Liu, H., Shen, Y. (2011). Maximum likelihood estimations and EM algorithms with length-biased data. Journal of the American Statistical Association, 106, 1434–1449.

  • Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.

  • Rothman, K. J. (2008). BMI-related errors in the measurement of obesity. International Journal of Obesity, 32, 56–59.

  • Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivative. The Annals of Statistics, 6, 177–184.

  • Song, X., Huang, Y. (2005). On corrected score approach for proportional hazards model with covariate measurement error. Biometrics, 61, 702–714.

  • Su, Y., Wang, J. (2012). Modeling left-truncated and right-censored survival data with longitudinal covariates. The Annals of Statistics, 40, 1465–1488.

  • van der Vaart, A. W. (1998). Asymptotic statistics. New York: Cambridge University Press.

  • Wang, C. Y. (1999). Robust sandwich covariance estimation for regression calibration estimator in Cox regression with measurement error. Statistics & Probability Letters, 45, 371–378.

  • Wang, C. Y. (2000). Flexible regression calibration for covariate measurement error with longitudinal surrogate variables. Statistica Sinica, 10, 905–921.

  • Wang, M. C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association, 86, 130–143.

  • Wu, F., Kim, S., Qin, J., Saran, R., Li, Y. (2018). A pairwise likelihood augmented Cox estimator for left-truncated data. Biometrics, 74, 100–108.

  • Xie, S. H., Wang, C. Y., Prentice, R. L. (2001). A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society, Series B, 63, 855–870.

  • Xu, Y., Kim, J. K., Li, Y. (2017). Semiparametric estimation for measurement error models with validation data. The Canadian Journal of Statistics, 45, 185–201.

  • Yan, Y., Yi, G. Y. (2015). A corrected profile likelihood method for survival data with covariate measurement error under the Cox model. The Canadian Journal of Statistics, 43, 454–480.

  • Yan, Y., Yi, G. Y. (2016). A class of functional methods for error-contaminated survival data under additive hazards models with replicate measurements. Journal of the American Statistical Association, 111, 684–695.

  • Yi, G. Y. (2017). Statistical analysis with measurement error or misclassification: Strategy, method and application. New York: Springer.

  • Yi, G. Y., Lawless, J. F. (2007). A corrected likelihood method for the proportional hazards model with covariates subject to measurement error. Journal of Statistical Planning and Inference, 137, 1816–1828.

  • Yi, G. Y., Ma, Y., Spiegelman, D., Carroll, R. J. (2015). Functional and structural methods with mixed measurement error and misclassification in covariates. Journal of the American Statistical Association, 110, 681–696.

  • Zhao, S., Prentice, R. L. (2014). Covariate measurement error correction methods in mediation analysis with failure time data. Biometrics, 70, 835–844.


Acknowledgements

The authors thank the review team for the comments on the initial submission. This research was supported by the Natural Sciences and Engineering Research Council of Canada and partially supported by a Collaborative Research Team Project of the Canadian Statistical Sciences Institute. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program.

Author information


Correspondence to Grace Y. Yi.

Electronic supplementary material

10463_2020_755_MOESM1_ESM.pdf (PDF, 410 KB): The proofs of the theorems and additional numerical results are included in the online Supplementary Material.

Appendix: Regularity conditions

Appendix: Regularity conditions

As with any asymptotic result, the validity of our results requires regularity conditions on the survival, censoring, measurement error and covariate processes, as well as on the sampling scheme. Our regularity conditions are essentially those of Andersen and Gill (1982), Huang et al. (2012), and Yan and Yi (2016), and include the following assumptions:

  • (C1) \(\varTheta \) is a compact set, and the true parameter value \(\beta _0\) is an interior point of \(\varTheta \).

  • (C2) \(\int _0^\tau \lambda _0(t)\mathrm{d}t < \infty \), where \(\tau \) is the finite upper endpoint of the support of the failure time.

  • (C3) The \(\left\{ N_i(t), Y_i(t),Z_i,X_i \right\} \) are independent and identically distributed for \(i=1,\ldots ,n\).

  • (C4) The covariates \(Z_i\) and \(X_i\) are bounded.

  • (C5) Conditional on \(V_i^*\), \(\left( T_i^*, V_i^*\right) \) are independent of \(A_i^*\).

  • (C6) The censoring time \(C_i\) is non-informative; that is, the failure time \(T_i\) and the censoring time \(C_i\) are independent given the covariates \(\{Z_i, X_i\}\).

  • (C7) The matrices \(E \left( -\frac{1}{n} \frac{\partial ^2 \ell _C^*}{\partial \beta \partial \beta ^\top } \right) \) and \(E \left( - \frac{1}{n}\frac{\partial ^2 \ell _M^*}{\partial \beta \partial \beta ^\top } \right) \) are positive definite, where \(\ell _C^*\) is defined in (10) and \(\ell _M^*\) is the logarithm of the likelihood function (19).

  • (C8) The operations of differentiation and integration are interchangeable.

Condition (C1) is a basic condition used in deriving the maximizer of the target function (e.g., Huang et al. 2012, p.203). Conditions (C2) to (C6) are standard in survival analysis; they allow us to express the relevant quantities as sums of independent and identically distributed random variables and hence to derive the asymptotic properties of the estimators (e.g., Andersen and Gill 1982). The positive definiteness required in Condition (C7) is standard and ensures that the asymptotic covariance matrices associated with \(\ell _C^*\) and \(\ell _M^*\) are well defined. Condition (C8) is a routine requirement for deriving asymptotic results.

Lemma 1

Let

$$ {\widehat{\ell }}_P^*= \sum _{i=1}^{n} \int \nolimits _{0}^{\tau } \left[ {\widetilde{v}}_i^\top \beta + \frac{1}{2} \beta _{x}^\top {\varSigma_{\epsilon}} \beta _{x} - \log \left\{ \sum _{j=1}^{n} \exp ({\widetilde{v}}_j^\top \beta ) I(a_j \le u \le y_j) \right\} \right] \mathrm{d}N_i (u). $$
(40)

Then (10) and (40) yield the same maximum likelihood estimator of \(\beta \).
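To make the structure of (40) concrete, the Python sketch below evaluates \({\widehat{\ell }}_P^*\) for a given data set. It assumes the usual counting process \(N_i(u) = I(y_i \le u, \delta _i = 1)\). Under that assumption the integral reduces to a sum over uncensored subjects, evaluated at \(u = y_i\), with left-truncated risk set \(\{j: a_j \le y_i \le y_j\}\). Several names are illustrative assumptions and are not taken from the paper: the function name `corrected_log_pl`, the event indicator `delta`, and the index argument `x_idx`, which locates \(\beta _x\) inside \(\beta \). The covariates \({\widetilde{v}}_i\) are taken as given rather than constructed as in the paper.

```python
import numpy as np


def corrected_log_pl(beta, v_tilde, a, y, delta, sigma_eps, x_idx):
    """Evaluate the objective in (40) for one data set (illustrative sketch).

    beta      : (p,) regression coefficients
    v_tilde   : (n, p) covariates v~_i, assumed already constructed
    a, y      : (n,) truncation times a_i and observed times y_i
    delta     : (n,) event indicators (assumed); N_i jumps at y_i only if delta_i = 1
    sigma_eps : (q, q) measurement error covariance Sigma_epsilon
    x_idx     : indices (hypothetical argument) of the q components beta_x in beta
    """
    beta = np.asarray(beta, dtype=float)
    v_tilde = np.asarray(v_tilde, dtype=float)
    a, y, delta = (np.asarray(arr) for arr in (a, y, delta))
    beta_x = beta[x_idx]
    # (1/2) beta_x' Sigma_eps beta_x, the same for every event term
    correction = 0.5 * beta_x @ sigma_eps @ beta_x
    eta = v_tilde @ beta  # linear predictors v~_j' beta
    total = 0.0
    for i in np.flatnonzero(delta == 1):  # dN_i(u) is nonzero only at u = y_i
        u = y[i]
        at_risk = (a <= u) & (u <= y)  # indicator I(a_j <= u <= y_j)
        total += eta[i] + correction - np.log(np.sum(np.exp(eta[at_risk])))
    return total
```

The loop favours readability over speed. Maximizing this function over \(\beta \), for instance by passing its negative to a generic numerical optimizer, would give a maximizer of (40). By Lemma 1, this maximizer coincides with that of (10).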

The proof is given in Appendix B of the Supplementary Material. The following lemma is used to establish the consistency of the estimators \({\widehat{\beta }}\) and \({\widetilde{\beta }}\) given in Theorems 2 and 3, respectively.

Lemma 2

Define

$$\begin{aligned} \kappa _P = E\left( \frac{1}{n} {\widehat{\ell }}_P^*\right) \end{aligned}$$

and let

$$ \kappa= \kappa _P + E\left\{ \frac{1}{n} \log \left( L_M^*\right) \right\} , $$

where \({\widehat{\ell }}_P^*\) and \(L_M^*\) are determined by (40) and (19), respectively, with the data \(\{ {\widetilde{v}}_i, a_i, y_i,z_i \}\) replaced by the corresponding random variables \(\{ {\widetilde{V}}_i, A_i, Y_i,Z_i \}\). Then \(\beta _0\) is the unique maximizer of \(\kappa _P\) and \(\kappa \).

Proof

Part 1: We show that \(\beta _0\) is the unique maximizer of \(\kappa _P\).

Recall that \(\ell _C\) is the logarithm of the likelihood function (2) based on the true covariates X. In the absence of measurement error, Huang et al. (2012, p.208) showed that the true value \(\beta _0\) is the unique maximizer of \(E(\ell _C)\). By (9), \(\ell _C\) and \(\ell _C^*\), defined in (10), satisfy

$$\begin{aligned} E(\ell _C^*) = E(\ell _C). \end{aligned}$$
(41)

Hence \(\beta _0\) is also the unique maximizer of \(E(\ell _C^*)\). Since, by Lemma 1, (10) and (40) yield the same estimator of \(\beta \), \(\beta _0\) is also the unique maximizer of \(\kappa _P\). Under the regularity conditions, including (C8),

$$\begin{aligned} \beta _0 \ \ \text {is the unique solution of} \ \ E\left( \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right) = 0, \end{aligned}$$
(42)

and

$$\begin{aligned} \left. \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } \right| _{\beta =\beta _0} \ \ \text {is negative definite.} \end{aligned}$$
(43)

Part 2: We show that \(\beta _0\) is the unique maximizer of \(\kappa \).

Let \(\ell _M(\beta ;X,Z)\) denote the logarithm of the likelihood function (3) based on the true covariates X, and let \(\ell _M(\beta ;{\widetilde{X}},Z)\) be \(\ell _M(\beta ;X,Z)\) with X replaced by \({\widetilde{X}} = E\left( X | W^*\right) \), where \(W^*= W-{\varSigma_{\epsilon}} \alpha \) as defined before (7). Define \(U_M(\beta ;X,Z) = \frac{\partial \ell _M(\beta ;X,Z)}{\partial \beta }\) and let \({\mathcal{U}}_M(\beta ;X,Z) = E\left\{ {\frac{1}{n}} U_M(\beta ;X,Z) \right\} \).
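The identity \(E({\widetilde{X}}) = \mu _X\) used in the next paragraph is an instance of the law of iterated expectations, \(E\{E(X | W^*)\} = E(X)\). The Monte Carlo sketch below illustrates it in a deliberately simplified setting. This setting is an assumption and is not the paper's model. It uses a scalar \(X \sim N(\mu _X, \sigma _X^2)\) and classical additive error \(W = X + \epsilon \) with \(\epsilon \sim N(0, \sigma _\epsilon ^2)\). It also conditions on W itself rather than on the shifted \(W^*\), so that \(E(X | W) = \mu _X + \lambda (W - \mu _X)\) with \(\lambda = \sigma _X^2/(\sigma _X^2 + \sigma _\epsilon ^2)\). The function name and default values are hypothetical.

```python
import numpy as np


def calibrated_mean_check(mu_x=1.0, sd_x=1.0, sd_eps=0.5, n=200_000, seed=0):
    """Monte Carlo check that X~ = E(X | W) has the same mean as X.

    Simplified setting (assumption): X ~ N(mu_x, sd_x^2) and W = X + eps,
    with eps ~ N(0, sd_eps^2) independent of X.
    Returns (mean of X, mean of X~, variance of X, variance of X~).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sd_x, size=n)
    w = x + rng.normal(0.0, sd_eps, size=n)
    lam = sd_x**2 / (sd_x**2 + sd_eps**2)  # reliability ratio lambda
    x_tilde = mu_x + lam * (w - mu_x)      # E(X | W) under joint normality
    return x.mean(), x_tilde.mean(), x.var(), x_tilde.var()
```

With the defaults, both sample means should be close to 1.0. The variance of \({\widetilde{X}}\) should be close to \(\lambda \sigma _X^2 = 0.8\) rather than \(\sigma _X^2 = 1\). Only the first moment is preserved, and that is all the cancellation leading to (47) requires.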

Recall that \(\mu _X = E(X)\), as defined before (17); by (7), \(E({\widetilde{X}}) = \mu _X\). Let \(\mu _Z = E(Z)\). Using first-order Taylor expansions around \(\mu _X\) and \(\mu _Z\), we approximate \(U_M(\beta ;{\widetilde{X}},Z)\) and \(U_M(\beta ;X,Z)\), respectively, as

$$\begin{aligned} U_M(\beta ;{\widetilde{X}},Z)& \approx U_M(\beta ;\mu _X,\mu _Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - \mu _X) \\&\quad + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z} (Z - \mu _Z) \end{aligned}$$
(44)

and

$$\begin{aligned} U_M(\beta ;X,Z)& \approx U_M(\beta ;\mu _X,\mu _Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} (X - \mu _X) \\&\quad + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z} (Z - \mu _Z), \end{aligned}$$
(45)

where \(\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X}\) denotes the partial derivative \(\frac{\partial U_M(\beta ;a,b)}{\partial a}\) evaluated at \((a,b) = (\mu _X,\mu _Z)\), and \(\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z}\) denotes the partial derivative \(\frac{\partial U_M(\beta ;a,b)}{\partial b}\) evaluated at \((a,b) = (\mu _X,\mu _Z)\). Here \(U_M(\beta ;a,b)\) has the same functional form as \(U_M(\beta ;X,Z)\), except that the former is a deterministic function of \(\beta \), a and b, whereas the latter is a function of \(\beta \) and the random variables X and Z.

Subtracting (45) from (44) gives

$$\begin{aligned} U_M(\beta ;{\widetilde{X}},Z) \approx U_M(\beta; X,Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - X). \end{aligned}$$
(46)
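The cancellation behind (46) can be written out explicitly. Subtracting (45) from (44), the terms \(U_M(\beta ;\mu _X,\mu _Z)\) and \(\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z} (Z - \mu _Z)\) appear in both expansions and cancel, leaving

$$\begin{aligned} U_M(\beta ;{\widetilde{X}},Z) - U_M(\beta ;X,Z) &\approx \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - \mu _X) - \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} (X - \mu _X) \\ &= \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - X), \end{aligned}$$

which rearranges to (46).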

Therefore, multiplying both sides of (46) by \(1/n\), taking expectations, and setting \(\beta = \beta _0\) gives

$$\begin{aligned} {\mathcal{U}}_M(\beta _0; {\widetilde{X}},Z) \approx 0 \end{aligned}$$
(47)

because \(E( {\widetilde{X}}) - E(X) = \mu _X - \mu _X = 0\) and \({\mathcal{U}}_M(\beta _0; X,Z) = 0\) (e.g., Huang et al. 2012, p.208).
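Spelled out, this step uses the definition \({\mathcal{U}}_M(\beta ;X,Z) = E\{ \frac{1}{n} U_M(\beta ;X,Z)\} \). It also uses the fact that the coefficient \(\frac{\partial U_M(\beta _0;\mu _X,\mu _Z)}{\partial \mu _X}\) is evaluated at the fixed point \((\mu _X,\mu _Z)\), so it can be taken outside the expectation. Multiplying (46) by \(1/n\), taking expectations, and setting \(\beta = \beta _0\) gives

$$\begin{aligned} {\mathcal{U}}_M(\beta _0;{\widetilde{X}},Z) &\approx {\mathcal{U}}_M(\beta _0;X,Z) + \frac{1}{n}\frac{\partial U_M(\beta _0;\mu _X,\mu _Z)}{\partial \mu _X} \left\{ E({\widetilde{X}}) - E(X)\right\} \\ &= 0 + \frac{1}{n}\frac{\partial U_M(\beta _0;\mu _X,\mu _Z)}{\partial \mu _X} \cdot 0 = 0. \end{aligned}$$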

By (42) and (47),

$$\begin{aligned} E\left( \left. \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right| _{\beta =\beta _0} \right) + {\mathcal{U}}_M(\beta _0; {\widetilde{X}},Z)\approx 0. \end{aligned}$$
(48)

By the definition of \(U_M(\beta ;X,Z)\), together with (17) and (19), we have \(U_M(\beta ;{\widetilde{X}},Z) = \frac{\partial \log (L_M^*)}{\partial \beta }\). Thus, \({\mathcal{U}}_M(\beta ;{\widetilde{X}},Z) = E\left\{ \frac{1}{n} \frac{\partial \log (L_M^*)}{\partial \beta } \right\} \) and \(\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta } = E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} \). Applying (48) then gives

$$\begin{aligned} \left. \frac{\partial \kappa }{\partial \beta } \right| _{\beta =\beta _0} \approx 0. \end{aligned}$$
(49)
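For reference, the link between (48) and (49) is the following identity. It is obtained from the definitions of \(\kappa _P\) and \(\kappa \) in Lemma 2 by exchanging differentiation and expectation, as permitted by (C8):

$$\begin{aligned} \frac{\partial \kappa }{\partial \beta } = E\left( \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right) + E\left\{ \frac{1}{n} \frac{\partial \log (L_M^*)}{\partial \beta } \right\} = E\left( \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right) + {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z). \end{aligned}$$

At \(\beta = \beta _0\), the right-hand side is the left-hand side of (48), which is approximately zero.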

Next, taking expectations in (46) and then differentiating with respect to \(\beta \) gives

$$\begin{aligned} \frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta } \approx \frac{\partial {\mathcal{U}}_M(\beta ;X,Z)}{\partial \beta }. \end{aligned}$$
(50)

By derivations similar to those in Huang et al. (2012, p.208), \(\frac{\partial {\mathcal{U}}_M(\beta ;X,Z)}{\partial \beta }\) is negative definite at \(\beta =\beta _0\). By (50), \(\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta }\) is then also negative definite at \(\beta =\beta _0\). Combining this with (43) shows that \( \frac{\partial ^2 \kappa }{\partial \beta \partial \beta ^\top } = \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } + E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} \) is negative definite at \(\beta =\beta _0\). Together with (49), we conclude that \(\beta _0\) is approximately the maximizer of \(\kappa \). \(\square \)
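The last step relies on the fact that the sum of two negative definite matrices is negative definite. For any nonzero u, \(u^\top (A+B)u = u^\top A u + u^\top B u < 0\). In numerical work, one way to check this property for an estimated Hessian is to attempt a Cholesky factorization of its negative; a sample analogue of \(\partial ^2 \kappa /\partial \beta \partial \beta ^\top \) is one such Hessian. The helper below is an illustrative sketch: its name is hypothetical, and the two example matrices are arbitrary stand-ins for the two Hessian pieces, not quantities from the paper.

```python
import numpy as np


def is_negative_definite(h):
    """Return True if the quadratic form of h is negative definite.

    The sign of u' h u depends only on the symmetric part of h, and the
    negative of that part is positive definite exactly when its Cholesky
    factorization exists.
    """
    h = np.asarray(h, dtype=float)
    sym = 0.5 * (h + h.T)  # symmetric part, guarding against numerical asymmetry
    try:
        np.linalg.cholesky(-sym)
        return True
    except np.linalg.LinAlgError:
        return False


# Arbitrary negative definite matrices standing in for the two Hessian pieces.
h_p = np.array([[-2.0, 0.5], [0.5, -1.0]])
h_m = np.array([[-1.0, -0.3], [-0.3, -0.5]])
print(is_negative_definite(h_p), is_negative_definite(h_m), is_negative_definite(h_p + h_m))
# prints: True True True
```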

About this article


Cite this article

Chen, LP., Yi, G.Y. Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Ann Inst Stat Math 73, 481–517 (2021). https://doi.org/10.1007/s10463-020-00755-2


  • DOI: https://doi.org/10.1007/s10463-020-00755-2
