Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error

Chen, Li-Pang; Yi, Grace Y.

doi:10.1007/s10463-020-00755-2

Published: 02 June 2020

Volume 73, pages 481–517, (2021)

Annals of the Institute of Statistical Mathematics

Li-Pang Chen¹ & Grace Y. Yi¹,²

Abstract

Many methods have been developed for analyzing survival data, which are commonly right-censored. These methods, however, are challenged by complex features of the data collection and of the data themselves. In particular, survival data are often subject to biased sampling caused by left-truncation (or length-biased sampling) and to covariate measurement error. While such data frequently arise in practice, little work is available that addresses these features simultaneously. In this paper, we explore valid inference methods for handling left-truncated and right-censored survival data with measurement error under the widely used Cox model. We first exploit a flexible estimator for the survival model parameters which does not require specification of the baseline hazard function. To improve efficiency, we further develop an augmented nonparametric maximum likelihood estimator. We establish asymptotic results and examine the efficiency and robustness of the proposed estimators. The proposed methods have the appealing feature that the distributions of the covariates and of the truncation times are left unspecified. Numerical studies are reported to assess the finite sample performance of the proposed methods.

References

Andersen, P. K., Gill, R. D. (1982). Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10, 1100–1120.
Augustin, T. (2004). An exact corrected log-likelihood function for Cox’s proportional hazards model under measurement error and some extensions. Scandinavian Journal of Statistics, 31, 43–50.
Buzas, J. F. (1998). Unbiased scores in proportional hazards regression with covariate measurement error. Journal of Statistical Planning and Inference, 67, 247–257.
Carroll, R. J., Li, K.-C. (1992). Measurement error regression with unknown link: Dimension reduction and data visualization. Journal of the American Statistical Association, 87, 1040–1050.
Carroll, R. J., Ruppert, D., Stefanski, L. A., Crainiceanu, C. M. (2006). Measurement error in nonlinear models. New York: Chapman & Hall/CRC.
Goutis, C., Casella, G. (1999). Explaining the saddlepoint approximation. The American Statistician, 53, 216–224.
Greene, W. F., Cai, J. (2004). Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics, 60, 987–996.
Henmi, M., Eguchi, S. (2004). A paradox concerning nuisance parameters and projected estimating functions. Biometrika, 91, 929–941.
Hosmer, D. W., Lemeshow, S., May, S. (2008). Applied survival analysis: Regression modeling of time to event data. New York: Wiley.
Hu, C., Lin, D. Y. (2002). Cox regression with covariate measurement error. Scandinavian Journal of Statistics, 29, 637–655.
Huang, C. Y., Qin, J., Follmann, D. A. (2012). A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling. Biometrika, 99, 199–210.
Huang, Y., Wang, C. Y. (2000). Cox regression with accurate covariates unascertainable: A nonparametric correction approach. Journal of the American Statistical Association, 95, 1209–1219.
Kalbfleisch, J. D., Prentice, R. L. (2002). The statistical analysis of failure time data. New York: Wiley.
Kong, F. H., Gu, M. (1999). Consistent estimation in Cox proportional hazards model with covariate measurement errors. Statistica Sinica, 9, 953–969.
Küchenhoff, H., Bender, R., Langner, I. (2007). Effect of Berkson measurement error on parameter estimates in Cox regression models. Lifetime Data Analysis, 13, 261–272.
Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: Wiley.
Li, Y., Ryan, L. (2006). Inference on survival data with covariate measurement error: An imputation-based approach. Scandinavian Journal of Statistics, 33, 169–190.
Nakamura, T. (1992). Proportional hazards model with covariates subject to measurement error. Biometrics, 48, 829–838.
Ning, Y., Yi, G. Y., Reid, N. (2018). A class of weighted estimating equations for semiparametric transformation models with missing covariates. Scandinavian Journal of Statistics, 45, 87–109.
Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69, 331–342.
Qin, J., Shen, Y. (2010). Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics, 66, 382–392.
Qin, J., Ning, J., Liu, H., Shen, Y. (2011). Maximum likelihood estimations and EM algorithms with length-biased data. Journal of the American Statistical Association, 106, 1434–1449.
Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.
Rothman, K. J. (2008). BMI-related errors in the measurement of obesity. International Journal of Obesity, 32, 56–59.
Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivative. The Annals of Statistics, 6, 177–184.
Song, X., Huang, Y. (2005). On corrected score approach for proportional hazards model with covariate measurement error. Biometrics, 61, 702–714.
Su, Y., Wang, J. (2012). Modeling left-truncated and right-censored survival data with longitudinal covariates. The Annals of Statistics, 40, 1465–1488.
van der Vaart, A. W. (1998). Asymptotic statistics. New York: Cambridge University Press.
Wang, C. Y. (1999). Robust sandwich covariance estimation for regression calibration estimator in Cox regression with measurement error. Statistics & Probability Letters, 45, 371–378.
Wang, C. Y. (2000). Flexible regression calibration for covariate measurement error with longitudinal surrogate variables. Statistica Sinica, 10, 905–921.
Wang, M. C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association, 86, 130–143.
Wu, F., Kim, S., Qin, J., Saran, R., Li, Y. (2018). A pairwise likelihood augmented Cox estimator for left-truncated data. Biometrics, 74, 100–108.
Xie, S. H., Wang, C. Y., Prentice, R. L. (2001). A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society, Series B, 63, 855–870.
Xu, Y., Kim, J. K., Li, Y. (2017). Semiparametric estimation for measurement error models with validation data. The Canadian Journal of Statistics, 45, 185–201.
Yan, Y., Yi, G. Y. (2015). A corrected profile likelihood method for survival data with covariate measurement error under the Cox model. The Canadian Journal of Statistics, 43, 454–480.
Yan, Y., Yi, G. Y. (2016). A class of functional methods for error-contaminated survival data under additive hazards models with replicate measurements. Journal of the American Statistical Association, 111, 684–695.
Yi, G. Y. (2017). Statistical analysis with measurement error and misclassification: Strategy, method and application. New York: Springer.
Yi, G. Y., Lawless, J. F. (2007). A corrected likelihood method for the proportional hazards model with covariates subject to measurement error. Journal of Statistical Planning and Inference, 137, 1816–1828.
Yi, G. Y., Ma, Y., Spiegelman, D., Carroll, R. J. (2015). Functional and structural methods with mixed measurement error and misclassification in covariates. Journal of the American Statistical Association, 110, 681–696.
Zhao, S., Prentice, R. L. (2014). Covariate measurement error correction methods in mediation analysis with failure time data. Biometrics, 70, 835–844.

Acknowledgements

The authors thank the review team for the comments on the initial submission. This research was supported by the Natural Sciences and Engineering Research Council of Canada and partially supported by a Collaborative Research Team Project of the Canadian Statistical Sciences Institute. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program.

Author information

Authors and Affiliations

Department of Statistical and Actuarial Sciences, University of Western Ontario, 1151 Richmond St, London, ON, N6A 3K7, Canada
Li-Pang Chen & Grace Y. Yi
Department of Computer Science, University of Western Ontario, 1151 Richmond St, London, ON, N6A 3K7, Canada
Grace Y. Yi

Corresponding author

Correspondence to Grace Y. Yi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10463_2020_755_MOESM1_ESM.pdf

The proofs of the theorems and additional numerical results are included in the online Supplementary Material. (pdf 410 KB)

Appendix: Regularity conditions

As with any asymptotic results, the validity of our results requires regularity conditions on the survival, censoring, measurement error and covariate processes as well as on the sampling scheme. Our regularity conditions are essentially those of Andersen and Gill (1982), Huang et al. (2012), and Yan and Yi (2016), and include the following assumptions:

(C1) $\varTheta $ is a compact set, and the true parameter value $\beta _0$ is an interior point of $\varTheta $.
(C2) $\int _0^\tau \lambda _0(t)\mathrm{d}t < \infty $, where $\tau $ is the finite maximum support of the failure time.
(C3) The $\left\{ N_i(t), Y_i(t),Z_i,X_i \right\} $ are independent and identically distributed for $i=1,\ldots ,n$.
(C4) The covariates $Z_i$ and $X_i$ are bounded.
(C5) Conditional on $V_i^*$, $\left( T_i^*, V_i^*\right) $ are independent of $A_i^*$.
(C6) Censoring time $C_i$ is non-informative. That is, the failure time $T_i$ and the censoring time $C_i$ are independent, given the covariates $\{Z_i, X_i\}$.
(C7) Matrices $E \left( -\frac{1}{n} \frac{\partial ^2 \ell _C^*}{\partial \beta \partial \beta ^\top } \right) $ and $E \left( - \frac{1}{n}\frac{\partial ^2 \ell _M^*}{\partial \beta \partial \beta ^\top } \right) $ are positive definite, where $\ell _C^*$ is defined in (10) and $\ell _M^*$ is the logarithm of the likelihood function (19).
(C8) The operations of differentiation and integration are exchangeable.

Condition (C1) is a basic condition used to derive the maximizer of the target function (e.g., Huang et al. 2012, p. 203). Conditions (C2) to (C6) are standard in survival analysis; they allow us to work with sums of independent and identically distributed random variables and hence to derive the asymptotic properties of the estimators (e.g., Andersen and Gill 1982). The positive definiteness required in Condition (C7) is standard and ensures that the asymptotic covariance matrices associated with $\ell _C^*$ and $\ell _M^*$ are well defined. Condition (C8) is a routine requirement for deriving asymptotic results.

Lemma 1

Let

$$ {\widehat{\ell }}_P^*= \sum _{i=1}^{n} \int \nolimits _{0}^{\tau } \left[ {\widetilde{v}}_i^\top \beta + \frac{1}{2} \beta _{x}^\top {\varSigma_{\epsilon}} \beta _{x} - \log \left\{ \sum _{j=1}^{n} \exp ({\widetilde{v}}_j^\top \beta ) I(a_j \le u \le y_j) \right\} \right] \mathrm{d}N_i (u). $$

(40)

Then (10) and (40) yield the same maximum likelihood estimator of $\beta $.
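
As an illustration only, the objective in (40) can be evaluated numerically along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that $N_i$ jumps once at $y_i$ when subject $i$ has an observed event (indicator `delta`, a name introduced here), that ${\widetilde{v}}_i$ is stored as row `i` of an array, and that `x_idx` (also hypothetical) marks the components of $\beta $ that form $\beta _x$.

```python
import numpy as np

def corrected_log_partial_likelihood(beta, v_tilde, a, y, delta, sigma_eps, x_idx):
    """Evaluate the objective in (40) for left-truncated, right-censored data.

    Assumptions (not taken from the source): N_i jumps at y_i iff delta[i] is
    true, and x_idx selects the entries of beta that make up beta_x.
    """
    beta = np.asarray(beta, dtype=float)
    v_tilde = np.asarray(v_tilde, dtype=float)
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    sigma_eps = np.asarray(sigma_eps, dtype=float)

    lin = v_tilde @ beta                      # v_tilde_j^T beta for every subject j
    beta_x = beta[np.asarray(x_idx)]
    correction = 0.5 * beta_x @ sigma_eps @ beta_x   # (1/2) beta_x^T Sigma_eps beta_x

    total = 0.0
    for i in np.flatnonzero(delta):           # only observed events contribute
        u = y[i]
        at_risk = (a <= u) & (u <= y)         # indicator I(a_j <= u <= y_j)
        total += lin[i] + correction - np.log(np.sum(np.exp(lin[at_risk])))
    return total

# Hypothetical toy data: three subjects, one error-prone covariate.
value = corrected_log_partial_likelihood(
    beta=[0.5],
    v_tilde=[[1.0], [0.0], [2.0]],
    a=[0.0, 0.0, 1.0],
    y=[2.0, 3.0, 4.0],
    delta=[1, 0, 1],
    sigma_eps=[[0.2]],
    x_idx=[0],
)
print(value)
```

The explicit loop over events keeps the risk-set indicator visible; for large samples one would typically sort by $y_i$ and vectorize, which is a design choice outside the scope of this sketch.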

The proof is given in Appendix B of the Supplementary Material. The following lemma is used to establish the consistency of the estimators ${\widehat{\beta }}$ and ${\widetilde{\beta }}$ given in Theorems 2 and 3, respectively.

Lemma 2

Define

$$\begin{aligned} \kappa _P = E\left( \frac{1}{n} {\widehat{\ell }}_P^*\right) \end{aligned}$$

and let

$$ \kappa= \kappa _P + E\left\{ \frac{1}{n} \log \left( L_M^*\right) \right\} , $$

where ${\widehat{\ell }}_P^*$ and $L_M^*$ are determined by (40) and (19), respectively, with the data $\{ {\widetilde{v}}_i, a_i, y_i,z_i \}$ replaced by the corresponding random variables $\{ {\widetilde{V}}_i, A_i, Y_i,Z_i \}$. Then $\beta _0$ is the unique maximizer of $\kappa _P$ and $\kappa $.

Proof

Part 1: We show that $\beta _0$ is the unique maximizer of $\kappa _P$.

Recall that $\ell _C$ is the logarithm of the likelihood function (2) based on the true covariates X. In the absence of measurement error, i.e., based on the true covariates X, Huang et al. (2012, p.208) showed that the true value $\beta _0$ is the unique maximizer of $E(\ell _C)$. By (9), $\ell _C$ and $\ell _C^*$, defined in (10), satisfy

$$\begin{aligned} E(\ell _C^*) = E(\ell _C). \end{aligned}$$

(41)

Hence $\beta _0$ is also the unique maximizer of $E(\ell _C^*)$, and by Lemma 1, $\beta _0$ is the unique maximizer of $\kappa _P$. Under the regularity conditions, including (C8),

$$\begin{aligned} \beta _0 \ \ \text {is the unique solution of} \ \ E\left( \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right) = 0, \end{aligned}$$

(42)

and

$$\begin{aligned} \left. \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } \right| _{\beta =\beta _0} \ \ \text {is negative definite.} \end{aligned}$$

(43)

Part 2: We show that $\beta _0$ is the unique maximizer of $\kappa $.

Let $\ell _M(\beta ;X,Z)$ denote the logarithm of the likelihood function (3) based on the true covariates X, and let $\ell _M(\beta ;{\widetilde{X}},Z)$ be $\ell _M(\beta ;X,Z)$ with X replaced by ${\widetilde{X}} = E\left( X | W^*\right) $, where $W^*= W-{\varSigma_{\epsilon}} \alpha $ as defined before (7). Define $U_M(\beta ;X,Z) = \frac{\partial \ell _M(\beta ;X,Z)}{\partial \beta }$ and let ${\mathcal{U}}_M(\beta ;X,Z) = E\left\{ {\frac{1}{n}} U_M(\beta ;X,Z) \right\} $.

Recall that $\mu _X = E(X)$, as defined before (17). By (7), we have $E({\widetilde{X}}) = \mu _X$. Let $\mu _Z = E(Z)$. Then, by a first-order (linear) approximation around $\mu _X$ and $\mu _Z$, we express $U_M(\beta ;{\widetilde{X}},Z)$ and $U_M(\beta ;X,Z)$, respectively, as

$$\begin{aligned} U_M(\beta ;{\widetilde{X}},Z)&\approx U_M(\beta ;\mu _X,\mu _Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - \mu _X) \\&\quad + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z} (Z - \mu _Z) \end{aligned}$$

(44)

and

$$\begin{aligned} U_M(\beta ;X,Z)&\approx U_M(\beta ;\mu _X,\mu _Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} (X - \mu _X) \\&\quad + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z} (Z - \mu _Z), \end{aligned}$$

(45)

where $\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X}$ represents the partial derivative $\frac{\partial U_M(\beta ;a,b)}{\partial a}$ evaluated at $(a,b) = (\mu _X,\mu _Z)$, and $\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z}$ represents the partial derivative $\frac{\partial U_M(\beta ;a,b)}{\partial b}$ evaluated at $(a,b) = (\mu _X,\mu _Z)$. Here $U_M(\beta ;a,b)$ has the same functional form as $U_M(\beta ;X,Z)$, except that the former is a function of the nonrandom arguments $\beta $, a and b, while the latter is a function of the random variables X and Z together with $\beta $.

Subtracting (45) from (44) gives

$$\begin{aligned} U_M(\beta ;{\widetilde{X}},Z) \approx U_M(\beta; X,Z) + \frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X} ({\widetilde{X}} - X). \end{aligned}$$

(46)

Therefore, taking expectations on both sides of (46) and replacing $\beta $ by $\beta _0$ gives

$$\begin{aligned} {\mathcal{U}}_M(\beta _0; {\widetilde{X}},Z) \approx 0 \end{aligned}$$

(47)

because $E( {\widetilde{X}}) - E(X) = \mu _X - \mu _X = 0$ and ${\mathcal{U}}_M(\beta _0; X,Z) = 0$ (e.g., Huang et al. 2012, p.208).
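
Written out, the step from (46) to (47) can be sketched as follows. The display assumes, as the argument implicitly does, that the derivative evaluated at the constants $(\mu _X,\mu _Z)$ is treated as nonrandom, so that it can be taken outside the expectation; the factor $\frac{1}{n}$ comes from the definition of ${\mathcal{U}}_M$.

$$\begin{aligned} {\mathcal{U}}_M(\beta _0;{\widetilde{X}},Z)&= E\left\{ \frac{1}{n} U_M(\beta _0;{\widetilde{X}},Z) \right\} \\&\approx {\mathcal{U}}_M(\beta _0;X,Z) + \frac{1}{n} \frac{\partial U_M(\beta _0;\mu _X,\mu _Z)}{\partial \mu _X} \left\{ E({\widetilde{X}}) - E(X) \right\} \\&= 0 + \frac{1}{n} \frac{\partial U_M(\beta _0;\mu _X,\mu _Z)}{\partial \mu _X} (\mu _X - \mu _X) = 0. \end{aligned}$$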

By (42) and (47),

$$\begin{aligned} E\left( \left. \frac{1}{n} \frac{\partial {\widehat{\ell }}_P^*}{\partial \beta } \right| _{\beta =\beta _0} \right) + {\mathcal{U}}_M(\beta _0; {\widetilde{X}},Z)\approx 0. \end{aligned}$$

(48)

By definition of $U_M(\beta ;X,Z)$ and (19) together with (17), we have $U_M(\beta ;{\widetilde{X}},Z) = \frac{\partial \log (L_M^*)}{\partial \beta }$, and thus, ${\mathcal{U}}_M(\beta ;{\widetilde{X}},Z) = E\left\{ \frac{1}{n} \frac{\partial \log (L_M^*)}{\partial \beta } \right\} $ and $\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta } = E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} $. Then applying (48) gives

$$\begin{aligned} \left. \frac{\partial \kappa }{\partial \beta } \right| _{\beta =\beta _0} \approx 0. \end{aligned}$$

(49)

Next, taking expectations in (46) and then differentiating with respect to $\beta $ gives

$$\begin{aligned} \frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta } \approx \frac{\partial {\mathcal{U}}_M(\beta ;X,Z)}{\partial \beta }. \end{aligned}$$

(50)

By derivations similar to those in Huang et al. (2012, p.208), $\frac{\partial {\mathcal{U}}_M(\beta ;X,Z)}{\partial \beta }$ is negative definite at $\beta =\beta _0$, and thus, by (50), $\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta }$ is also negative definite at $\beta =\beta _0$. Combining this with (43) shows that $ \frac{\partial ^2 \kappa }{\partial \beta \partial \beta ^\top } = \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } + E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} $ is negative definite at $\beta =\beta _0$. Therefore, together with (49), we conclude that $\beta _0$ is approximately the maximizer of $\kappa $. $\square $
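
The negative definiteness of the Hessian of $\kappa $ used in the last step rests on the elementary fact that a sum of two negative definite matrices is negative definite. As a brief sketch, with the shorthand $A$, $B$ and $c$ introduced here only for illustration, write $A = \left. \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } \right| _{\beta =\beta _0}$ and $B = \left. E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} \right| _{\beta =\beta _0}$. Then for any nonzero vector $c$ of the same dimension as $\beta $,

$$\begin{aligned} c^\top (A + B) c = c^\top A c + c^\top B c < 0, \end{aligned}$$

since each term on the right-hand side is strictly negative.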

About this article

Cite this article

Chen, LP., Yi, G.Y. Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Ann Inst Stat Math 73, 481–517 (2021). https://doi.org/10.1007/s10463-020-00755-2

Received: 09 October 2019
Revised: 25 February 2020
Published: 02 June 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10463-020-00755-2
