Abstract
Many methods have been developed for analyzing survival data, which are commonly right-censored. These methods, however, are challenged by complex features arising from the data collection process as well as from the nature of the data themselves. In particular, biased sampling caused by left-truncation (or length-biased sampling) and measurement error often arise in survival analysis. Although such data are common in practice, little work is available that addresses these features simultaneously. In this paper, we explore valid inference methods for handling left-truncated and right-censored survival data with measurement error under the widely used Cox model. We first exploit a flexible estimator for the survival model parameters which does not require specification of the baseline hazard function. To improve efficiency, we further develop an augmented nonparametric maximum likelihood estimator. We establish asymptotic results and examine the efficiency and robustness of the proposed estimators. The proposed methods have the appealing feature that the distributions of the covariates and of the truncation times are left unspecified. Numerical studies are reported to assess the finite-sample performance of the proposed methods.
References
Andersen, P. K., Gill, R. D. (1982). Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10, 1100–1120.
Augustin, T. (2004). An exact corrected log-likelihood function for Cox’s proportional hazards model under measurement error and some extensions. Scandinavian Journal of Statistics, 31, 43–50.
Buzas, J. F. (1998). Unbiased scores in proportional hazards regression with covariate measurement error. Journal of Statistical Planning and Inference, 67, 247–257.
Carroll, R. J., Li, K.-C. (1992). Measurement error regression with unknown link: Dimension reduction and data visualization. Journal of the American Statistical Association, 87, 1040–1050.
Carroll, R. J., Ruppert, D., Stefanski, L. A., Crainiceanu, C. M. (2006). Measurement error in nonlinear models. New York: Chapman & Hall/CRC.
Goutis, C., Casella, G. (1999). Explaining the saddlepoint approximation. The American Statistician, 53, 216–224.
Greene, W. F., Cai, J. (2004). Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics, 60, 987–996.
Henmi, M., Eguchi, S. (2004). A paradox concerning nuisance parameters and projected estimating functions. Biometrika, 91, 929–941.
Hosmer, D. W., Lemeshow, S., May, S. (2008). Applied survival analysis: Regression modeling of time to event data. New York: Wiley.
Hu, C., Lin, D. Y. (2002). Cox regression with covariate measurement error. Scandinavian Journal of Statistics, 29, 637–655.
Huang, C. Y., Qin, J., Follmann, D. A. (2012). A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling. Biometrika, 99, 199–210.
Huang, Y., Wang, C. Y. (2000). Cox regression with accurate covariates unascertainable: A nonparametric correction approach. Journal of the American Statistical Association, 95, 1209–1219.
Kalbfleisch, J. D., Prentice, R. L. (2002). The statistical analysis of failure time data. New York: Wiley.
Kong, F. H., Gu, M. (1999). Consistent estimation in Cox proportional hazards model with covariate measurement errors. Statistica Sinica, 9, 953–969.
Küchenhoff, H., Bender, R., Langner, I. (2007). Effect of Berkson measurement error on parameter estimates in Cox regression models. Lifetime Data Analysis, 13, 261–272.
Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: Wiley.
Li, Y., Ryan, L. (2006). Inference on survival data with covariate measurement error: An imputation-based approach. Scandinavian Journal of Statistics, 33, 169–190.
Nakamura, T. (1992). Proportional hazards model with covariates subject to measurement error. Biometrics, 48, 829–838.
Ning, Y., Yi, G. Y., Reid, N. (2018). A class of weighted estimating equations for semiparametric transformation models with missing covariates. Scandinavian Journal of Statistics, 45, 87–109.
Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69, 331–342.
Qin, J., Shen, Y. (2010). Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics, 66, 382–392.
Qin, J., Ning, J., Liu, H., Shen, Y. (2011). Maximum likelihood estimations and EM algorithms with length-biased data. Journal of the American Statistical Association, 106, 1434–1449.
Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.
Rothman, K. J. (2008). BMI-related errors in the measurement of obesity. International Journal of Obesity, 32, 56–59.
Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivative. The Annals of Statistics, 6, 177–184.
Song, X., Huang, Y. (2005). On corrected score approach for proportional hazards model with covariate measurement error. Biometrics, 61, 702–714.
Su, Y., Wang, J. (2012). Modeling left-truncated and right-censored survival data with longitudinal covariates. The Annals of Statistics, 40, 1465–1488.
van der Vaart, A. W. (1998). Asymptotic statistics. New York: Cambridge University Press.
Wang, C. Y. (1999). Robust sandwich covariance estimation for regression calibration estimator in Cox regression with measurement error. Statistics & Probability Letters, 45, 371–378.
Wang, C. Y. (2000). Flexible regression calibration for covariate measurement error with longitudinal surrogate variables. Statistica Sinica, 10, 905–921.
Wang, M. C. (1991). Nonparametric estimation from cross-sectional survival data. Journal of the American Statistical Association, 86, 130–143.
Wu, F., Kim, S., Qin, J., Saran, R., Li, Y. (2018). A pairwise likelihood augmented Cox estimator for left-truncated data. Biometrics, 74, 100–108.
Xie, S. H., Wang, C. Y., Prentice, R. L. (2001). A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society, Series B, 63, 855–870.
Xu, Y., Kim, J. K., Li, Y. (2017). Semiparametric estimation for measurement error models with validation data. The Canadian Journal of Statistics, 45, 185–201.
Yan, Y., Yi, G. Y. (2015). A corrected profile likelihood method for survival data with covariate measurement error under the Cox model. The Canadian Journal of Statistics, 43, 454–480.
Yan, Y., Yi, G. Y. (2016). A class of functional methods for error-contaminated survival data under additive hazards models with replicate measurements. Journal of the American Statistical Association, 111, 684–695.
Yi, G. Y. (2017). Statistical analysis with measurement error and misclassification: Strategy, method and application. New York: Springer.
Yi, G. Y., Lawless, J. F. (2007). A corrected likelihood method for the proportional hazards model with covariates subject to measurement error. Journal of Statistical Planning and Inference, 137, 1816–1828.
Yi, G. Y., Ma, Y., Spiegelman, D., Carroll, R. J. (2015). Functional and structural methods with mixed measurement error and misclassification in covariates. Journal of the American Statistical Association, 110, 681–696.
Zhao, S., Prentice, R. L. (2014). Covariate measurement error correction methods in mediation analysis with failure time data. Biometrics, 70, 835–844.
Acknowledgements
The authors thank the review team for the comments on the initial submission. This research was supported by the Natural Sciences and Engineering Research Council of Canada and partially supported by a Collaborative Research Team Project of the Canadian Statistical Sciences Institute. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program.
Electronic supplementary material
Below is the link to the electronic supplementary material.
10463_2020_755_MOESM1_ESM.pdf
The proofs of the theorems and additional numerical results are included in the online Supplementary Material. (pdf 410 KB)
Appendix: Regularity conditions
As with any asymptotic results, the validity of our results requires regularity conditions on the survival, censoring, measurement error and covariate processes as well as on the sampling scheme. Our regularity conditions follow those in Andersen and Gill (1982), Huang et al. (2012), and Yan and Yi (2016), and include the following assumptions:
(C1) \(\varTheta \) is a compact set, and the true parameter value \(\beta _0\) is an interior point of \(\varTheta \).
(C2) \(\int _0^\tau \lambda _0(t)\mathrm{d}t < \infty \), where \(\tau \) is the finite maximum support of the failure time.
(C3) The \(\left\{ N_i(t), Y_i(t),Z_i,X_i \right\} \) are independent and identically distributed for \(i=1,\ldots ,n\).
(C4) The covariates \(Z_i\) and \(X_i\) are bounded.
(C5) Conditional on \(V_i^*\), \(\left( T_i^*, V_i^*\right) \) are independent of \(A_i^*\).
(C6) Censoring time \(C_i\) is non-informative. That is, the failure time \(T_i\) and the censoring time \(C_i\) are independent, given the covariates \(\{Z_i, X_i\}\).
(C7) Matrices \(E \left( -\frac{1}{n} \frac{\partial ^2 \ell _C^*}{\partial \beta \partial \beta ^\top } \right) \) and \(E \left( - \frac{1}{n}\frac{\partial ^2 \ell _M^*}{\partial \beta \partial \beta ^\top } \right) \) are positive definite, where \(\ell _C^*\) is defined in (10) and \(\ell _M^*\) is the logarithm of the likelihood function (19).
(C8) The operations of differentiation and integration are exchangeable.
Condition (C1) is a basic condition that is used to derive the maximizer of the target function (e.g., Huang et al. 2012, p.203). Conditions (C2) to (C6) are standard in survival analysis; they allow us to work with sums of independent and identically distributed random variables and hence to derive the asymptotic properties of the estimators (e.g., Andersen and Gill 1982). The positive definiteness required in Condition (C7) is standard and ensures that the asymptotic covariance matrices associated with \(\ell _C^*\) and \(\ell _M^*\) are well defined. Condition (C8) is a routine requirement for deriving asymptotic results.
Lemma 1
Let
Then (10) and (40) yield the same maximum likelihood estimator of \(\beta \).
The proof is given in Appendix B of the Supplementary Material. The following lemma is used to establish the consistency of the estimators \({\widehat{\beta }}\) and \({\widetilde{\beta }}\), respectively, given in Theorems 2 and 3.
Lemma 2
Define
and let
where \({\widehat{\ell }}_P^*\) and \(L_M^*\) are determined by (40) and (19), respectively, with the data \(\{ {\widetilde{v}}_i, a_i, y_i,z_i \}\) replaced by the corresponding random variables \(\{ {\widetilde{V}}_i, A_i, Y_i,Z_i \}\). Then \(\beta _0\) is the unique maximizer of \(\kappa _P\) and \(\kappa \).
Proof
Part 1: We show that \(\beta _0\) is the unique maximizer of \(\kappa _P\).
Recall that \(\ell _C\) is the logarithm of the likelihood function (2) based on the true covariates X. In the absence of measurement error, i.e., based on the true covariates X, Huang et al. (2012, p.208) showed that the true value \(\beta _0\) is the unique maximizer of \(E(\ell _C)\). Noting that by (9), \(\ell _C\) and \(\ell _C^*\), defined in (10), have the relationship
We conclude that \(\beta _0\) is also the unique maximizer of \(E(\ell _C^*)\). By Lemma 1, we conclude that \(\beta _0\) is the unique maximizer of \(\kappa _P\). With regularity conditions including (C8),
and
Part 2: We show that \(\beta _0\) is the unique maximizer of \(\kappa \).
Let \(\ell _M(\beta ;X,Z)\) denote the logarithm of the likelihood function (3) based on the true covariates X, and let \(\ell _M(\beta ;{\widetilde{X}},Z)\) be \(\ell _M(\beta ;X,Z)\) with X replaced by \({\widetilde{X}} = E\left( X | W^*\right) \), where \(W^*= W-{\varSigma_{\epsilon}} \alpha \) as defined before (7). Define \(U_M(\beta ;X,Z) = \frac{\partial \ell _M(\beta ;X,Z)}{\partial \beta }\) and let \({\mathcal{U}}_M(\beta ;X,Z) = E\left\{ {\frac{1}{n}} U_M(\beta ;X,Z) \right\} \).
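The calibrated covariate \({\widetilde{X}} = E(X | W^*)\) has the same mean as \(X\) by the tower property of conditional expectation, while being less variable than \(X\). The following is a minimal simulation sketch of that property, not the paper's procedure. It assumes a scalar covariate, a classical additive error model \(W = X + \epsilon \) with independent normal \(X\) and \(\epsilon \), and known moments, and it conditions on \(W\) directly rather than on the adjusted surrogate \(W^*\). The function name `calibrated_summary` and all parameter values are hypothetical.

```python
import random
import statistics


def calibrated_summary(n=20000, mu_x=1.0, sd_x=1.0, sd_eps=0.5, seed=0):
    """Simulate W = X + eps and summarize X and X_tilde = E(X | W).

    Assumption (not from the paper): X and eps are independent normals,
    so E(X | W) = mu_x + lam * (W - mu_x), where
    lam = sd_x^2 / (sd_x^2 + sd_eps^2) is the reliability ratio.
    """
    rng = random.Random(seed)
    lam = sd_x ** 2 / (sd_x ** 2 + sd_eps ** 2)
    xs, x_tildes = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sd_x)          # true covariate
        w = x + rng.gauss(0.0, sd_eps)     # error-prone surrogate
        xs.append(x)
        x_tildes.append(mu_x + lam * (w - mu_x))  # calibrated value
    return {
        "mean_x": statistics.fmean(xs),
        "mean_x_tilde": statistics.fmean(x_tildes),
        "var_x": statistics.pvariance(xs),
        "var_x_tilde": statistics.pvariance(x_tildes),
    }


if __name__ == "__main__":
    for name, value in calibrated_summary().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the two sample means should agree up to Monte Carlo error, and the variance of the calibrated values should be close to the reliability ratio times the variance of \(X\).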
Recall that \(\mu _X = E(X)\), as defined before (17); then by (7), we have \(E({\widetilde{X}}) = \mu _X\). Let \(\mu _Z = E(Z)\). Then by a linear approximation around \(\mu _X\) and \(\mu _Z\), we express \(U_M(\beta ;{\widetilde{X}},Z)\) and \(U_M(\beta ;X,Z)\), respectively, as
and
where \(\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _X}\) represents the partial derivative \(\frac{\partial U_M(\beta ;a,b)}{\partial a}\) evaluated at \((a,b) = (\mu _X,\mu _Z)\), and \(\frac{\partial U_M(\beta ;\mu _X,\mu _Z)}{\partial \mu _Z}\) represents the partial derivative \(\frac{\partial U_M(\beta ;a,b)}{\partial b}\) evaluated at \((a,b) = (\mu _X,\mu _Z)\). Here \(U_M(\beta ;a,b)\) has the same functional form as \(U_M(\beta ;X,Z)\), except that the former is a real-valued function with arguments \(\beta \), a and b, while the latter is a function of the random variables X and Z together with \(\beta \).
Combining (44) and (45) gives that
Therefore, taking the expectation of both sides of (46) and replacing \(\beta \) by \(\beta _0\) gives
because \(E( {\widetilde{X}}) - E(X) = \mu _X - \mu _X = 0\) and \({\mathcal{U}}_M(\beta _0; X,Z) = 0\) (e.g., Huang et al. 2012, p.208).
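The chain of steps in this part of the argument can be sketched as follows. This display is a reconstruction from the surrounding definitions rather than a quotation of the paper's numbered equations: \(\approx \) denotes a first-order approximation with remainder terms ignored, the derivative of \(U_M\) at \((\mu _X,\mu _Z)\) is treated as nonrandom as described in the text, and \({\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)\) is taken to mean \(E\{ \frac{1}{n} U_M(\beta ;{\widetilde{X}},Z)\}\) by analogy with \({\mathcal{U}}_M(\beta ;X,Z)\).

```latex
\begin{align*}
U_M(\beta;\widetilde{X},Z) &\approx U_M(\beta;\mu_X,\mu_Z)
  + \frac{\partial U_M(\beta;\mu_X,\mu_Z)}{\partial \mu_X}\,(\widetilde{X}-\mu_X)
  + \frac{\partial U_M(\beta;\mu_X,\mu_Z)}{\partial \mu_Z}\,(Z-\mu_Z), \\
U_M(\beta;X,Z) &\approx U_M(\beta;\mu_X,\mu_Z)
  + \frac{\partial U_M(\beta;\mu_X,\mu_Z)}{\partial \mu_X}\,(X-\mu_X)
  + \frac{\partial U_M(\beta;\mu_X,\mu_Z)}{\partial \mu_Z}\,(Z-\mu_Z), \\
U_M(\beta;\widetilde{X},Z) &\approx U_M(\beta;X,Z)
  + \frac{\partial U_M(\beta;\mu_X,\mu_Z)}{\partial \mu_X}\,(\widetilde{X}-X), \\
\mathcal{U}_M(\beta_0;\widetilde{X},Z) &\approx \mathcal{U}_M(\beta_0;X,Z)
  + \frac{1}{n}\,\frac{\partial U_M(\beta_0;\mu_X,\mu_Z)}{\partial \mu_X}\,
    \bigl\{E(\widetilde{X})-E(X)\bigr\} = 0 .
\end{align*}
```

The third line is the difference of the first two, in which the terms involving \(U_M(\beta ;\mu _X,\mu _Z)\) and \(Z-\mu _Z\) cancel. The last line uses \(E({\widetilde{X}}) = E(X) = \mu _X\) and \({\mathcal{U}}_M(\beta _0;X,Z) = 0\).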
By the definition of \(U_M(\beta ;X,Z)\) and (19) together with (17), we have \(U_M(\beta ;{\widetilde{X}},Z) = \frac{\partial \log (L_M^*)}{\partial \beta }\), and thus, \({\mathcal{U}}_M(\beta ;{\widetilde{X}},Z) = E\left\{ \frac{1}{n} \frac{\partial \log (L_M^*)}{\partial \beta } \right\} \) and \(\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta } = E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} \). Then applying (48) gives
Next, taking the expectation of (46) and then the partial derivative with respect to \(\beta \) gives
By derivations similar to those in Huang et al. (2012, p.208), \(\frac{\partial {\mathcal{U}}_M(\beta ;X,Z)}{\partial \beta }\) is negative definite at \(\beta =\beta _0\), and thus, by (50), \(\frac{\partial {\mathcal{U}}_M(\beta ;{\widetilde{X}},Z)}{\partial \beta }\) is also negative definite at \(\beta =\beta _0\). Combining this with (43) shows that \( \frac{\partial ^2 \kappa }{\partial \beta \partial \beta ^\top } = \frac{\partial ^2 \kappa _P}{\partial \beta \partial \beta ^\top } + E\left\{ \frac{1}{n} \frac{\partial ^2 \log (L_M^*)}{\partial \beta \partial \beta ^\top } \right\} \) is negative definite at \(\beta =\beta _0\). Therefore, together with (49), we conclude that \(\beta _0\) is approximately the maximizer of \(\kappa \). \(\square \)
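The last step of the proof rests on two elementary facts, sketched here under stated assumptions. The symbols \(A\), \(B\) and \(v\) are illustrative names that do not appear in the paper. \(A\) is assumed to be negative definite, which is consistent with Condition (C7) and Part 1, and the gradient of \(\kappa \) is assumed to vanish (approximately) at \(\beta _0\).

```latex
\[
A = \left.\frac{\partial^2 \kappa_P}{\partial\beta\,\partial\beta^\top}\right|_{\beta=\beta_0},
\qquad
B = \left. E\left\{\frac{1}{n}\,
      \frac{\partial^2 \log(L_M^*)}{\partial\beta\,\partial\beta^\top}\right\}\right|_{\beta=\beta_0},
\qquad
v^\top (A+B)\,v = v^\top A\,v + v^\top B\,v < 0
\quad \text{for all } v \neq 0 ,
\]
\[
\kappa(\beta) \approx \kappa(\beta_0)
  + \tfrac{1}{2}\,(\beta-\beta_0)^\top (A+B)\,(\beta-\beta_0)
  < \kappa(\beta_0)
\quad \text{for } \beta \neq \beta_0 \text{ close to } \beta_0 .
\]
```

The first display shows that a sum of negative definite matrices is negative definite. The second is a second-order expansion of \(\kappa \) around \(\beta _0\) and indicates why a vanishing gradient together with a negative definite Hessian makes \(\beta _0\) a local maximizer.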
About this article
Cite this article
Chen, LP., Yi, G.Y. Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Ann Inst Stat Math 73, 481–517 (2021). https://doi.org/10.1007/s10463-020-00755-2
DOI: https://doi.org/10.1007/s10463-020-00755-2