Right-censored nonparametric regression with measurement error

Metrika

Abstract

This study focuses on estimating a nonparametric regression model from right-censored data when the covariate is subject to measurement error. Doing so requires addressing both censoring and measurement error, a combination that many studies ignore. Measurement error in the covariate leads to biased and inconsistent estimates, and nonparametric regression techniques cannot be applied directly to right-censored observations. To handle censoring, we replace the response with an updated response variable obtained by the Buckley–James method (BJM), which is based on the Kaplan–Meier estimator. The measurement error is then handled by kernel deconvolution, a tool designed specifically for this problem. Accordingly, three deconvoluted estimators based on BJM are introduced using kernel smoothing, local polynomial smoothing, and B-spline techniques, each incorporating both the updated response variable and kernel deconvolution. The performance of these estimators is compared in a detailed simulation study. In addition, a real-data example is presented using a COVID-19 dataset.

References

  • Afshin A, Jorge MB (2020) COVID-19 data set resulted from a study on the quality of Novel Corona-virus official datasets. Mendeley Data, V1 https://doi.org/10.17632/nw5m4hs3jr.1

  • Aydin D, Yilmaz E (2018) Modified spline regression based on randomly right-censored data: a comparative study. Commun Stat Simul Comput 47(9):2587–2611

  • Buckley J, James I (1979) Linear regression with censored data. Biometrika 66(3):429–436

  • Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models. Ann Stat 17(2):453–510

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC, New York

  • Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31(4):377–403

  • Delaigle A, Meister A (2007) Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. J Am Stat Assoc 102(480):1416–1426

  • Delecroix M, Lopez O, Patilea V (2008) Nonlinear censored regression using synthetic data. Scand J Stat 35(2):248–265

  • De Boor C (1978) A practical guide to splines, vol 27. Springer, New York, p 325

  • Fan J (1991) On the optimal rates of convergence for nonparametric deconvolution problems. Ann Stat 19(3):1257–1272

  • Belomestny D, Goldenshluger A (2021) Density deconvolution under general assumptions on the distribution of measurement errors. Ann Stat 49(2):615–649

  • Fan J, Gijbels I (1994) Censored regression: local linear approximations and their applications. J Am Stat Assoc 89(426):560–570

  • Fan J, Truong YK (1993) Nonparametric regression with errors in variables. Ann Stat 21(4):1900–1925

  • Fan J, Gijbels I, Hu TC, Huang LS (1996) A study of variable bandwidth selection for local polynomial regression. Stat Sin 6(1):113–127

  • El Ghouch A, Van Keilegom I (2008) Non-parametric regression with dependent censored data. Scand J Stat 35(2):228–247

  • Glasson S (2007) Censored regression techniques for credit scoring. Doctoral dissertation, RMIT University

  • Han K, Park BU (2018) Smooth backfitting for errors-in-variables additive models. Ann Stat 46:2216–2250

  • Hazelton ML, Turlach BA (2009) Nonparametric density deconvolution by weighted kernel estimators. Stat Comput 19(3):217–228

  • James IR, Smith PJ (1984) Consistency results for linear regression with censored data. Ann Stat 12(2):590–600

  • Khardani S, Lemdani M, Said EO (2012) On the strong uniform consistency of the mode estimator for censored time series. Metrika 75(2):229–241

  • Koul H, Susarla V, Van Ryzin J (1981) Regression analysis with randomly right-censored data. Ann Stat 9(6):1276–1288

  • Lee YK, Mammen E, Park BU (2010) Backfitting and smooth backfitting for additive quantile models. Ann Stat 38:2857–2883

  • Liang H, Wang N (2005) Partially linear single-index measurement error models. Stat Sin 15(1):99–116

  • Li T, Vuong Q (1998) Nonparametric estimation of the measurement error model using multiple indicators. J Multivar Anal 65(2):139–165

  • Meier P (1975) Estimation of a distribution function from incomplete observations. J Appl Probab 12(S1):67–87

  • Miller RG (1976) Least squares regression with censored data. Biometrika 63(3):449–464

  • Moffatt JL, Scarf P (2016) Sequential regression measurement error models with application. Stat Model 16(6):454–476

  • Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9(1):141–142

  • Osman M, Ghosh SK (2012) Nonparametric regression models for right-censored data using Bernstein polynomials. Comput Stat Data Anal 56(3):559–573

  • Orbe J, Ferreira E, Núñez-Antón V (2003) Censored partial regression. Biostatistics 4(1):109–121

  • Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression, vol 12. Cambridge University Press, Cambridge

  • Stefanski LA, Carroll RJ (1990) Deconvolving kernel density estimators. Statistics 21(2):169–184

  • Stute W (1999) Nonlinear censored regression. Stat Sin 9(4):1089–1102

  • Tekwe CD, Carter RL, Cullings HM (2016) Generalized multiple indicators, multiple causes measurement error models. Stat Model 16(2):140–159

  • Wang XF, Wang B (2011) Deconvolution estimation in measurement error models: the R package decon. J Stat Softw 39(10):i10

  • Watson GS (1964) Smooth regression analysis. Sankhya Indian J Stat Ser A 26(4):359–372

  • Aydin D, Ahmed SE, Yilmaz E (2021) Right-censored time series modeling by modified semi-parametric A-spline estimator. Entropy 23(12):1586

  • Zhang S, Karunamuni RJ (2009) Deconvolution boundary kernel method in nonparametric density estimation. J Stat Plan Inference 139(7):2269–2283

Acknowledgements

We express our sincere gratitude to the Editor and anonymous reviewers whose meticulous and insightful feedback significantly contributed to the refinement and enhancement of this paper.

Author information

Corresponding author

Correspondence to Ersin Yılmaz.

Ethics declarations

Conflicts of interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A1. Proof of Lemma 4.1

Differentiating \({\text {AMISE}}\left\{ {\hat{m}}_{h}(x)\right\} \) with respect to h and setting the derivative equal to zero yields

$$\begin{aligned} \frac{\partial }{\partial h} {\text {AMISE}}\left\{ {\hat{m}}_{h}(x)\right\} =\frac{-n[R(K) V(x)]}{(n h)^{2}}+\frac{4 \cdot 4 h^{3} V(K)^{2} R\left( m^{\prime \prime }\right) }{16}=-\frac{R(K) V(x)}{n h^{2}}+h^{3} V(K)^{2} R\left( m^{\prime \prime }\right) =0 \end{aligned}$$
(A2.1)

Multiplying Eq. (A2.1) by \(n h^{2}\) and rearranging, we obtain

$$\begin{aligned} n h^{5} V(K)^{2} R\left( m^{\prime \prime }\right) =R(K) V(x) \end{aligned}$$

Solving this equation for h shows that the optimal value of the parameter h is

$$\begin{aligned} h_{o p t}=\left[ \frac{R(K) V(x)}{V(K)^{2} R\left( m^{\prime \prime }\right) n}\right] ^{1 / 5}, \end{aligned}$$

as claimed.
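The closed form can be checked numerically. The sketch below is illustrative only: it assumes the AMISE has the form \(R(K)V(x)/(nh)+h^{4}V(K)^{2}R(m^{\prime \prime })/4\), which is consistent with the derivative in Eq. (A2.1) but is not stated in this appendix, and all constants (a Gaussian kernel, n = 200, \(V(x)=1\), \(R(m^{\prime \prime })=2\)) are hypothetical values chosen for the example.

```python
import numpy as np

def amise(h, n, R_K, V_x, V_K, R_m2):
    """Assumed AMISE: variance term R(K)V(x)/(nh) plus bias term h^4 V(K)^2 R(m'')/4."""
    return R_K * V_x / (n * h) + (h ** 4) * (V_K ** 2) * R_m2 / 4.0

def h_opt(n, R_K, V_x, V_K, R_m2):
    """Closed-form minimiser [R(K)V(x) / (V(K)^2 R(m'') n)]^(1/5)."""
    return (R_K * V_x / ((V_K ** 2) * R_m2 * n)) ** (1.0 / 5.0)

# Hypothetical constants for illustration (not taken from the paper).
n = 200
R_K = 1.0 / (2.0 * np.sqrt(np.pi))  # roughness of the standard Gaussian kernel
V_K = 1.0                           # second moment of the standard Gaussian kernel
V_x = 1.0                           # assumed variance term V(x)
R_m2 = 2.0                          # assumed roughness of m''

h_star = h_opt(n, R_K, V_x, V_K, R_m2)

# Brute-force minimisation of the assumed AMISE over a fine grid of bandwidths.
grid = np.linspace(0.05, 2.0, 4000)
h_grid = grid[np.argmin(amise(grid, n, R_K, V_x, V_K, R_m2))]

print(f"closed form: {h_star:.4f}, grid search: {h_grid:.4f}")
```

The two values should agree up to the grid spacing, which is what the \(n^{-1/5}\) formula predicts for any positive choice of the constants.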

Appendix A2. Proof of Lemma 4.2

Proving Lemma 4.2 requires strong assumptions and conditions concerning nonparametric smoothing, measurement error, censoring, and the Buckley–James procedure. These restrictions are as follows:

Conditions for the Buckley–James (BJ) Method

Following James and Smith (1984) and Meier (1975), let \(\Xi =\sup \{\xi :F(\xi )<1\}<\infty \) and let \(N(\xi )\) denote the expected number of data points (censored or not) for which \(\varepsilon ^*=Y^*-m(W)>\xi \). In addition, let \(\varepsilon _C=C-m(W)\). To obtain the variance expression given in Lemma 4.2, the following assumptions are needed:

BJ1. \(N(\xi ) \rightarrow \infty \) as \(n \rightarrow \infty \) for all \(\xi <\Xi \).

BJ2. \(\sum _{i=1}^{n}(W_i-{\bar{W}})^2\rightarrow \infty \) as \(n \rightarrow \infty \).

BJ3. \(\limsup _{n\rightarrow \infty }\left\{ \frac{\sum _{i=1}^{n} F(\varepsilon _{C_i}) \left| W_i-{\bar{W}}\right| }{\sum _{i=1}^{n} (W_i-{\bar{W}})^2}\right\} <\infty \).

BJ4. To ensure BJ3, the following condition is needed: \(\liminf _{n\rightarrow \infty } (1/n_c)\sum _{i=1}^{n}(W_i-{\bar{W}})^2>0\), where \(n_c\) is the number of censored observations.

In addition to these conditions, assumptions (A4) and (B1–B5) are required for \(\hat{{\textbf{m}}}_h(W)\rightarrow \hat{{\textbf{m}}}_h(X)\) based on the corresponding smoothing matrix. Assumptions (A1–A3) are also needed to account for censoring. Under these restrictions, \(MSSE({\hat{\textbf{m}}})\) can be decomposed as follows:

$$\begin{aligned} {\text {MSSE}}\left( {\hat{m}}_{h}\right)&=\sum \limits _{i=1}^{n} E\left[ \left\{ {\hat{m}}_{h}\left( W_{i}\right) -m\left( W_{i}\right) \right\} ^{2}\right] =\sum \limits _{i=1}^{n} {\text {MSE}}\left[ {\hat{m}}_{h}\left( W_{i}\right) \right] \\&=\sum \limits _{i=1}^{n}\left[ \left\{ E\left( {\hat{m}}_{h}\left( W_{i}\right) \right) -m\left( W_{i}\right) \right\} ^{2}+{\text {Var}}\left[ {\hat{m}}_{h}\left( W_{i}\right) \right] \right] \\&=\sum \limits _{i=1}^{n}\left[ \left\{ E\left( ({\textbf{S}}_{h} {\textbf{Y}}^{*})_{i}\right) -m(W_{i})\right\} ^{2}+{\text {Var}}\left( ({\textbf{S}}_{h} {\textbf{Y}}^{*})_{i}\right) \right] \\&=\left\| E\left( {\textbf{S}}_{h} {\textbf{Y}}^{*}\right) -{\varvec{m}}\right\| ^{2}+{\text {tr}}\left[ {\text {Cov}}\left( {\textbf{S}}_{h} {\textbf{Y}}^{*}\right) \right] \\&=\left\| \left( {\textbf{S}}_{h}-{\textbf{I}}\right) {\varvec{m}}\right\| ^{2}+{\text {tr}}\left[ {\textbf{S}}_{h} {\text {Cov}}({\textbf{Y}}^{*}) {\textbf{S}}_{h}^{\prime }\right] \end{aligned}$$

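The last equality uses \(E({\textbf{Y}}^{*})={\varvec{m}}\), which the Buckley–James response satisfies by construction. Because of censoring and the transformed response variable \({\textbf{Y}}^*\), the covariance \({\text {Cov}}({\textbf{Y}}^{*})\) takes the form given in (4.20), namely \({\text {Cov}}({\textbf{Y}}^{*})=\sigma _{*}^{2} {\textbf{I}}\) with \(\sigma _{*}^{2}=\sigma ^2_\varepsilon -Var(Y \mid Y>C){\bar{F}}(C)\). Substituting this into the last line of the decomposition produces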

$$\begin{aligned} {\text {MSSE}}\left( \widehat{{\varvec{m}}}_{h}\right) =\left\| \left( {\textbf{S}}_{h}-{\textbf{I}}\right) {\varvec{m}}\right\| ^{2}+\sigma _{*}^{2} {\text {tr}}\left( {\textbf{S}}_{h} {\textbf{S}}_{h}^{\prime }\right) \end{aligned}$$
(A3.1)

This establishes Eq. (4.19), as claimed.
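The decomposition in Eq. (A3.1) holds for any linear smoother \({\textbf{S}}_{h}\) applied to a response with mean \({\varvec{m}}\) and covariance \(\sigma _{*}^{2}{\textbf{I}}\), so it can be illustrated with a small Monte Carlo check. The sketch below is a simplified stand-in, not the paper's estimator: it assumes an ordinary Nadaraya–Watson smoother matrix with a Gaussian kernel (no deconvolution), replaces the Buckley–James response by synthetic draws with the assumed mean and variance, and uses hypothetical values for n, h, \(\sigma _{*}^{2}\) and the regression function.

```python
import numpy as np

def nw_smoother_matrix(w, h):
    """Nadaraya-Watson smoother matrix with a Gaussian kernel (rows sum to one)."""
    d = (w[:, None] - w[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    return k / k.sum(axis=1, keepdims=True)

def msse_closed_form(S, m, sigma2):
    """||(S - I) m||^2 + sigma2 * tr(S S'), the right-hand side of Eq. (A3.1)."""
    bias_sq = np.sum(((S - np.eye(len(m))) @ m) ** 2)
    variance = sigma2 * np.trace(S @ S.T)
    return bias_sq + variance

# Hypothetical setting for illustration (not taken from the paper).
rng = np.random.default_rng(0)
n, h, sigma2 = 50, 0.1, 0.25
w = np.sort(rng.uniform(0.0, 1.0, n))   # covariate values
m = np.sin(2.0 * np.pi * w)             # assumed true regression function
S = nw_smoother_matrix(w, h)

closed = msse_closed_form(S, m, sigma2)

# Monte Carlo estimate of sum_i E[(m_hat(W_i) - m(W_i))^2].
reps = 20000
Y = m + rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))  # stand-in for Y*
fits = Y @ S.T                                            # each row is S @ y
mc = np.mean(np.sum((fits - m) ** 2, axis=1))

print(f"closed form: {closed:.3f}, Monte Carlo: {mc:.3f}")
```

The two numbers should agree up to simulation noise. Varying h in this sketch shows the usual trade-off: the squared-bias term grows with h while the trace term shrinks.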

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aydın, D., Yılmaz, E., Chamidah, N. et al. Right-censored nonparametric regression with measurement error. Metrika (2024). https://doi.org/10.1007/s00184-024-00953-5

  • DOI: https://doi.org/10.1007/s00184-024-00953-5
