Abstract
This study focuses on estimating a nonparametric regression model with right-censored data when the covariate is subject to measurement error. Achieving this goal requires solving the problems of censoring and measurement error, which many researchers ignore. Measurement error leads to biased and inconsistent parameter estimates, and nonparametric regression techniques cannot be applied directly to right-censored observations. In this context, we solve the censoring problem by using an updated response variable obtained from the Buckley–James method (BJM), which is essentially based on the Kaplan–Meier estimator. The measurement error problem is then handled with the kernel deconvolution method, a tool designed specifically for this problem. Accordingly, three deconvoluted estimators based on BJM are introduced using kernel smoothing, local polynomial smoothing, and B-spline techniques that incorporate both the updated response variable and kernel deconvolution. The performances of these estimators are compared in a detailed simulation study. In addition, a real-world data example is presented using the Covid-19 dataset.
References
Afshin A, Jorge MB (2020) COVID-19 data set resulted from a study on the quality of Novel Corona-virus official datasets. Mendeley Data, V1 https://doi.org/10.17632/nw5m4hs3jr.1
Aydin D, Yilmaz E (2018) Modified spline regression based on randomly right-censored data: a comparative study. Commun Stat Simul Comput 47(9):2587–2611
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66(3):429–436
Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models. Ann Stat 17(2):453–510
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC, New York
Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31(4):377–403
Delaigle A, Meister A (2007) Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. J Am Stat Assoc 102(480):1416–1426
Delecroix M, Lopez O, Patilea V (2008) Nonlinear censored regression using synthetic data. Scand J Stat 35(2):248–265
De Boor C (1978) A practical guide to splines, vol 27. Springer, New York, p 325
Fan J (1991) On the optimal rates of convergence for nonparametric deconvolution problems. Ann Stat 19(3):1257–1272
Belomestny D, Goldenshluger A (2021) Density deconvolution under general assumptions on the distribution of measurement errors. Ann Stat 49(2):615–649
Fan J, Gijbels I (1994) Censored regression: local linear approximations and their applications. J Am Stat Assoc 89(426):560–570
Fan J, Truong YK (1993) Nonparametric regression with errors in variables. Ann Stat 21(4):1900–1925
Fan J, Gijbels I, Hu TC, Huang LS (1996) A study of variable bandwidth selection for local polynomial regression. Stat Sin 6(1):113–127
Ghouch AE, Keilegom IV (2008) Non-parametric regression with dependent censored data. Scand J Stat 35(2):228–247
Glasson S (2007) Censored regression techniques for credit scoring. Doctoral dissertation, RMIT University
Han K, Park BU (2018) Smooth backfitting for errors-in-variables additive models. Ann Stat 46:2216–2250
Hazelton ML, Turlach BA (2009) Nonparametric density deconvolution by weighted kernel estimators. Stat Comput 19(3):217–228
James IR, Smith PJ (1984) Consistency results for linear regression with censored data. Ann Stat 12(2):590–600
Khardani S, Lemdani M, Said EO (2012) On the strong uniform consistency of the mode estimator for censored time series. Metrika 75(2):229–241
Koul H, Susarla V, Van Ryzin J (1981) Regression analysis with randomly right-censored data. Ann Stat 9(6):1276–1288
Lee YK, Mammen E, Park BU (2010) Backfitting and smooth backfitting for additive quantile models. Ann Stat 38:2857–2883
Liang H, Wang N (2005) Partially linear single-index measurement error models. Stat Sin 15(1):99–116
Li T, Vuong Q (1998) Nonparametric estimation of the measurement error model using multiple indicators. J Multivar Anal 65(2):139–165
Meier P (1975) Estimation of a distribution function from incomplete observations. J Appl Probab 12(S1):67–87
Miller RG (1976) Least squares regression with censored data. Biometrika 63(3):449–464
Moffatt JL, Scarf P (2016) Sequential regression measurement error models with application. Stat Model 16(6):454–476
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142
Osman M, Ghosh SK (2012) Nonparametric regression models for right-censored data using Bernstein polynomials. Comput Stat Data Anal 56(3):559–573
Orbe J, Ferreira E, Núñez-Antón V (2003) Censored partial regression. Biostatistics 4(1):109–121
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression, vol 12. Cambridge University Press, Cambridge
Stefanski LA, Carroll RJ (1990) Deconvolving kernel density estimators. Statistics 21(2):169–184
Stute W (1999) Nonlinear censored regression. Stat Sin 9(4):140–159
Tekwe CD, Carter RL, Cullings HM (2016) Generalized multiple indicators, multiple causes measurement error models. Stat Model 16(2):140–159
Wang XF, Wang B (2011) Deconvolution estimation in measurement error models: the R package decon. J Stat Softw 39(10):i10
Watson GS (1964) Smooth regression analysis. Sankhya Indian J Stat Ser A 26(4):359–372
Aydin D, Ahmed SE, Yilmaz E (2021) Right-censored time series modeling by modified semi-parametric A-spline estimator. Entropy 23(12):1586
Zhang S, Karunamuni RJ (2009) Deconvolution boundary kernel method in nonparametric density estimation. J Stat Plan Inference 139(7):2269–2283
Acknowledgements
We express our sincere gratitude to the Editor and anonymous reviewers whose meticulous and insightful feedback significantly contributed to the refinement and enhancement of this paper.
Ethics declarations
Conflicts of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Appendices
Appendix A1. Proof of Lemma 4.1
Differentiating \({\text {AMISE}}\left\{ {\hat{m}}_{h}(x)\right\} \) with respect to h yields
Setting Eq. (A2.1) equal to zero, we obtain the following equation:
Simple algebra then shows that the optimal value of the parameter h is
as claimed.
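As a hedged illustration of this kind of calculation (the exact AMISE expression of Lemma 4.1 is not reproduced in this appendix), suppose the AMISE takes the generic bias-variance form often encountered in deconvolution problems with ordinary smooth errors. The constants \(a>0\), \(b>0\) and the smoothness order \(\beta \ge 0\) are assumptions introduced only for this example and are not taken from the lemma:

\[
{\text {AMISE}}(h)=a h^{4}+\frac{b}{n h^{2\beta +1}}, \qquad
\frac{d}{dh}{\text {AMISE}}(h)=4 a h^{3}-\frac{(2\beta +1)\, b}{n h^{2\beta +2}} .
\]

Setting the derivative equal to zero gives \(4 a h^{2\beta +5}=(2\beta +1) b/n\), so that

\[
h_{opt}=\left\{ \frac{(2\beta +1)\, b}{4 a n}\right\} ^{1/(2\beta +5)} .
\]

Under these assumed constants, the bandwidth shrinks at the rate \(n^{-1/(2\beta +5)}\), which reduces to the familiar \(n^{-1/5}\) rate when \(\beta =0\), that is, when there is no measurement error.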
Appendix A2. Proof of Lemma 4.2
To prove Lemma 4.2, one needs strong assumptions and conditions regarding nonparametric smoothing, measurement errors, censoring, and the Buckley–James procedure. These restrictions are as follows:
Conditions for the Buckley–James (BJ) Method
Following James and Smith (1984) and Meier (1975), let \(\Xi =\sup \{\xi :F(\xi )<1\}<\infty \) and suppose that \(N(\xi )\) is the expected number of data points (censored or not) for which \(\varepsilon ^*=Y^*-m(W)>\xi \). Finally, let \(\varepsilon _C=C-m(W)\). To obtain the variance equation given in Lemma 4.2, the following assumptions must hold:
BJ1. \(N(\xi ) \rightarrow \infty \) when \(n \rightarrow \infty \) for all \(\xi <\Xi \).
BJ2. \(\sum _{i=1}^{n}(W_i-{\bar{W}})^2\rightarrow \infty \) when \(n \rightarrow \infty \).
BJ3. \(\limsup _{n\rightarrow \infty }\left\{ \frac{\sum _{i=1}^{n} F(\varepsilon _{C_i}) \left| W_i-{\bar{W}}\right| }{\sum _{i=1}^{n} (W_i-{\bar{W}})^2}\right\} <\infty \).
BJ4. To ensure BJ3, the following condition is needed: \(\liminf _{n\rightarrow \infty } (1/n_c)\sum _{i=1}^{n}(W_i-{\bar{W}})^2>0\), where \(n_c\) is the number of censored observations.
In addition to these conditions, assumptions (A4) and (B1–B5) must hold regarding \(\hat{{\textbf{m}}}_h(W)\rightarrow \hat{{\textbf{m}}}_h(X)\) based on the corresponding smoothing matrix. Note that assumptions (A1–A3) are also needed to account for the censoring. Under these restrictions, the following proof can be written for \(MSSE({\hat{\textbf{m}}})\):
Because of the censoring and the transformed response variable \({\textbf{Y}}^*\), the covariance is written as in (4.20), \({\text {Cov}}({\textbf{Y}}^{*})=\sigma _{*}^{2} {\textbf{I}}\) with \(\sigma _{*}^{2}=\sigma ^2_\varepsilon -Var(Y \mid Y>C){\bar{F}}(C)\), which produces
This proves Eq. (4.19), as claimed.
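To make the Buckley–James update that underlies conditions BJ1–BJ4 more concrete, the following Python sketch shows a single update of the response variable for given fitted values. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `km_jumps` and `buckley_james_response` are hypothetical, the fitted values may come from any smoother, and a censored observation with no larger uncensored residual is simply left at its observed value.

```python
import numpy as np


def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each residual (zero at censored ones).

    resid : residuals Z_i - m(W_i), with Z_i = min(Y_i, C_i)
    delta : 1 if the observation is uncensored, 0 if right-censored
    """
    resid = np.asarray(resid, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = resid.size
    # Sort by residual; at ties, uncensored points come before censored ones.
    order = np.lexsort((1 - delta, resid))
    d = delta[order].astype(float)
    at_risk = n - np.arange(n)              # points still at risk at each step
    surv = np.cumprod(1.0 - d / at_risk)    # product-limit survival estimate
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    jumps = np.empty(n)
    jumps[order] = surv_prev - surv         # mass placed on each residual
    return jumps


def buckley_james_response(z, delta, fitted):
    """One Buckley-James update of the response for given fitted values.

    Uncensored responses are kept. A censored response is replaced by the
    fitted value plus the Kaplan-Meier weighted mean of the larger residuals.
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    fitted = np.asarray(fitted, dtype=float)
    resid = z - fitted
    w = km_jumps(resid, delta)
    y_star = z.copy()
    for i in np.flatnonzero(delta == 0):
        larger = resid > resid[i]
        mass = w[larger].sum()
        if mass > 0:  # assumption: otherwise keep the observed value
            y_star[i] = fitted[i] + np.sum(w[larger] * resid[larger]) / mass
    return y_star


# Toy usage: the censored value 2.0 is replaced by 3.5, the Kaplan-Meier
# weighted mean of the larger uncensored residuals 3.0 and 4.0.
z = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])
y_star = buckley_james_response(z, delta, fitted=np.zeros(4))
```

In practice the update is iterated: the smoother is refitted to the updated responses and the residuals are recomputed until the fitted values stabilise. In the setting of this paper, the smoother would additionally use a deconvolution kernel to account for the measurement error in the covariate, which this sketch does not attempt.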
About this article
Cite this article
Aydın, D., Yılmaz, E., Chamidah, N. et al. Right-censored nonparametric regression with measurement error. Metrika (2024). https://doi.org/10.1007/s00184-024-00953-5
DOI: https://doi.org/10.1007/s00184-024-00953-5