Abstract
This paper focuses on nonparametric regression modeling of time-series observations subject to data irregularities, such as censoring at a cutoff value. Censored cases are generally avoided in time-series analyses because they tend to bias the results. In this paper, we present an imputation algorithm for handling auto-correlated censored data based on a class of autoregressive nonparametric time-series models. The algorithm estimates the parameters by imputing the censored values with draws from a truncated normal distribution, and thereby recovers the unobservable values of the response variable. In this sense, the censored time-series observations are analyzed by nonparametric smoothing techniques instead of the usual parametric methods in order to reduce modelling bias, with the smoothing methods updated to accommodate the censored observations. We use Monte Carlo simulations based on right-censored data to compare the performance and accuracy of the estimates obtained from the smoothing methods. Finally, the smoothing methods are illustrated using a meteorological time series and an unemployment dataset, where the observations are subject to the detection limit of the recording tool.
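The truncated-normal imputation step described above can be sketched as follows. This is an illustrative sketch only, not the paper's algorithm: the mean and standard deviation are naive plug-in estimates from the uncensored points, whereas the paper obtains them from the fitted autoregressive model. Sampling from the truncated normal on \((c,\infty)\) is done here by simple rejection sampling with numpy (`scipy.stats.truncnorm` would be a standard alternative).

```python
import numpy as np

# Hedged sketch of truncated-normal imputation for right-censored data:
# entries flagged as censored at the detection limit c are replaced by
# draws from a normal distribution truncated to (c, inf), via rejection
# sampling. mu and sd below are illustrative plug-in estimates, NOT the
# model-based estimates used in the paper's algorithm.

def impute_censored(y, censored, mu, sd, c, rng):
    """Replace flagged entries of y with draws from N(mu, sd) truncated to (c, inf)."""
    y = y.copy()
    n_need = int(censored.sum())
    samples = []
    while len(samples) < n_need:
        cand = rng.normal(mu, sd, 4 * n_need + 8)
        samples.extend(cand[cand > c])      # keep only draws above the cutoff
    y[censored] = samples[:n_need]
    return y

rng = np.random.default_rng(0)
true_y = rng.normal(5.0, 2.0, 200)
c = 7.0
censored = true_y > c                       # right-censored at the cutoff c
obs = np.where(censored, c, true_y)         # recorded series caps at c
mu, sd = obs[~censored].mean(), obs[~censored].std()
imputed = impute_censored(obs, censored, mu, sd, c, rng)
```

The uncensored observations are left untouched; only the capped entries receive imputed values above the detection limit.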
Acknowledgements
We would like to thank the editor, the associate editor, and the anonymous referees for beneficial comments and suggestions.
Appendices: Supplemental Technical Materials
1.1 Appendix 1: Derivation of Equation (15)
Let \({\varvec{\mathrm {Z}}}^{(k)}\) be the vector \((Z^{(k)}_1,\ldots ,Z^{(k)}_n)^{\top }\). The residual sum of squares about \(f\) can be rewritten as
\[ RSS({\varvec{\mathrm {f}}})=({\varvec{\mathrm {Z}}}^{(k)}-{\varvec{\mathrm {f}}})^{\top }({\varvec{\mathrm {Z}}}^{(k)}-{\varvec{\mathrm {f}}}), \]
since \({\varvec{\mathrm {f}}}\) is precisely the vector of the values \(f(X_t)\). Writing the penalty term \(\int ^b_a{\{f''(x)\}^2dx}\) in (11) as \({\varvec{\mathrm {f}}}^{\top }{\varvec{\mathrm {K}}}{\varvec{\mathrm {f}}}\) by (13), the penalized residual sum of squares criterion (14) becomes
\[ PRSS_{\lambda }({\varvec{\mathrm {f}}})=({\varvec{\mathrm {Z}}}^{(k)}-{\varvec{\mathrm {f}}})^{\top }({\varvec{\mathrm {Z}}}^{(k)}-{\varvec{\mathrm {f}}})+\lambda {\varvec{\mathrm {f}}}^{\top }{\varvec{\mathrm {K}}}{\varvec{\mathrm {f}}}. \]
Taking the derivative with respect to \({\varvec{\mathrm {f}}}\) gives
\[ \frac{\partial PRSS_{\lambda }({\varvec{\mathrm {f}}})}{\partial {\varvec{\mathrm {f}}}}=-2({\varvec{\mathrm {Z}}}^{(k)}-{\varvec{\mathrm {f}}})+2\lambda {\varvec{\mathrm {K}}}{\varvec{\mathrm {f}}}. \tag{42} \]
Setting Eq. (42) equal to zero and replacing \({\varvec{\mathrm {f}}}\) by \({\widehat{\varvec{\mathrm {f}}}}^{\,SS}_{\lambda }\), we obtain
\[ ({\varvec{\mathrm {I}}}+\lambda {\varvec{\mathrm {K}}})\,{\widehat{\varvec{\mathrm {f}}}}^{\,SS}_{\lambda }={\varvec{\mathrm {Z}}}^{(k)}. \]
Hence, multiplying both sides of this equation by \(({\varvec{\mathrm {I}}}+\lambda {\varvec{\mathrm {K}}})^{-1}\), we obtain the smoothing spline solution
\[ {\widehat{\varvec{\mathrm {f}}}}^{\,SS}_{\lambda }=({\varvec{\mathrm {I}}}+\lambda {\varvec{\mathrm {K}}})^{-1}{\varvec{\mathrm {Z}}}^{(k)}, \]
as expressed in Eq. (15).
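The closed-form solution can be checked numerically. In the sketch below (an illustration, not the paper's implementation) the penalty matrix \({\varvec{\mathrm {K}}}\) is built from second differences, a common discrete stand-in for the integrated-squared-second-derivative penalty; the paper's \({\varvec{\mathrm {K}}}\) comes from the spline basis in its Eq. (13).

```python
import numpy as np

# Sketch of the Appendix 1 solution: f_hat = (I + lam*K)^{-1} Z.
# K = D2^T D2 uses second differences as a discrete stand-in for the
# roughness penalty (assumption; the paper derives K from its Eq. (13)).

def smooth_fit(z, lam):
    n = len(z)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
    K = D2.T @ D2                          # discrete penalty matrix
    return np.linalg.solve(np.eye(n) + lam * K, z)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
z = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 50)
f_hat = smooth_fit(z, lam=5.0)
```

As \(\lambda \rightarrow 0\) the fit interpolates the data, and larger \(\lambda\) yields a visibly rougher-to-smoother trade-off, mirroring the role of the smoothing parameter in Eq. (15).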
1.2 Appendix 2: Derivation of Equation (25)
Consider the model \({\varvec{\mathrm {Z}}}^{(k)}=\varvec{\mathrm {Xb}}+{\varvec{e}}\), where \(\varvec{\mathrm {b}}=(b_0,b_1,\ldots ,b_q,b_{q+1},\ldots ,b_{q+m})^{\top }\), with \(b_{q+r}\) the coefficient of the \(r\)th knot, \(r=1,\ldots ,m\). The vector of ordinary least squares residuals can be written as \({\varvec{e}}={\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}}\), and hence
\[ \Vert {\varvec{e}}\Vert ^2=({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})^{\top }({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}}). \]
The primary interest is to obtain the estimate \(\widehat{\varvec{\mathrm {b}}}\) that minimizes this residual sum of squares. Note that unrestricted estimation of the \(b_{q+r}\) leads to a wiggly fit. To overcome this problem, we impose the constraint \(\sum ^m_{r=1}{b^2_{q+r}}<C\) (where \(C>0\)) on the coefficients \(b_{q+r}\). Also, we assume that \({\varvec{\mathrm {S}}}^{PS}_{\lambda }\) is a positive definite and symmetric smoother matrix based on the penalized spline, and \({\varvec{\mathrm {D}}}\) is a \((m+2)\times (m+2)\) dimensional diagonal penalty matrix whose first \((q+1)\) entries are 0 and whose remaining entries are 1. Then Eq. (23) can be written as the constrained minimization
\[ \min_{\varvec{\mathrm {b}}}\;({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})^{\top }({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})\quad \text{subject to}\quad {\varvec{\mathrm {b}}}^{\top }{\varvec{\mathrm {D}}}{\varvec{\mathrm {b}}}<C. \]
By the Lagrange multiplier method, this constrained optimization problem is equivalent to minimizing the penalized residual sum of squares
\[ PRSS_{\lambda }({\varvec{\mathrm {b}}})=({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})^{\top }({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})+\lambda {\varvec{\mathrm {b}}}^{\top }{\varvec{\mathrm {D}}}{\varvec{\mathrm {b}}}. \tag{45} \]
Similar to (42), taking the derivative with respect to \(\varvec{\mathrm {b}}\) in (45) gives
\[ \frac{\partial PRSS_{\lambda }({\varvec{\mathrm {b}}})}{\partial \varvec{\mathrm {b}}}=-2{\varvec{\mathrm {X}}}^{\top }({\varvec{\mathrm {Z}}}^{(k)}-\varvec{\mathrm {Xb}})+2\lambda {\varvec{\mathrm {D}}}{\varvec{\mathrm {b}}}. \tag{46} \]
Setting (46) equal to zero and replacing \(\varvec{\mathrm {b}}\) by \(\widehat{\varvec{\mathrm {b}}}\), we obtain the penalized least squares normal equations
\[ ({\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {X}}}+\lambda {\varvec{\mathrm {D}}})\widehat{\varvec{\mathrm {b}}}={\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {Z}}}^{(k)}. \tag{47} \]
From (47), the estimated regression coefficients are simply
\[ \widehat{\varvec{\mathrm {b}}}=({\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {X}}}+\lambda {\varvec{\mathrm {D}}})^{-1}{\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {Z}}}^{(k)}. \]
Hence, the fitted values \(\widehat{\varvec{\mathrm {f}}}\) based on the penalized spline are given by
\[ \widehat{\varvec{\mathrm {f}}}={\varvec{\mathrm {X}}}\widehat{\varvec{\mathrm {b}}}={\varvec{\mathrm {X}}}({\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {X}}}+\lambda {\varvec{\mathrm {D}}})^{-1}{\varvec{\mathrm {X}}}^{\top }{\varvec{\mathrm {Z}}}^{(k)}={\varvec{\mathrm {S}}}^{PS}_{\lambda }{\varvec{\mathrm {Z}}}^{(k)}, \]
as claimed in Eq. (25).
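The penalized spline estimate has a direct computational form. The sketch below is an illustration under stated assumptions: a truncated power basis of degree \(q\) with \(m\) interior knots at equally spaced sample quantiles (the knot rule is a conventional choice, not necessarily the paper's), and the diagonal penalty \({\varvec{\mathrm {D}}}\) with the first \(q+1\) entries zero.

```python
import numpy as np

# Sketch of the penalized spline estimate from Appendix 2:
#   b_hat = (X^T X + lam*D)^{-1} X^T z,   f_hat = X b_hat.
# X is a truncated power basis of degree q with m interior knots; the
# knots are placed at equally spaced quantiles (illustrative assumption).

def pspline_fit(x, z, lam, q=1, m=5):
    knots = np.quantile(x, np.linspace(0, 1, m + 2)[1:-1])
    X = np.column_stack([x ** j for j in range(q + 1)] +
                        [np.maximum(x - k, 0.0) ** q for k in knots])
    D = np.diag([0.0] * (q + 1) + [1.0] * m)   # first q+1 entries unpenalized
    b_hat = np.linalg.solve(X.T @ X + lam * D, X.T @ z)
    return X @ b_hat

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
z = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)
f_hat = pspline_fit(x, z, lam=1.0)
```

Because the polynomial coefficients are unpenalized, polynomials up to degree \(q\) are reproduced exactly for any \(\lambda\), which is a useful sanity check on the penalty structure.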
1.3 Appendix 3: Derivation of Equation (28)
Consider the \(RSS\left( \lambda \right)\) defined in Eq. (27). It can be rewritten as the quadratic form
\[ RSS(\lambda )={\varvec{\mathrm {Z}}}^{(k)\top }({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda })^{\top }({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda }){\varvec{\mathrm {Z}}}^{(k)}. \]
Taking the expected value of this quadratic form, with \(E({\varvec{\mathrm {Z}}}^{(k)})={\varvec{\mathrm {f}}}\) and \(Cov({\varvec{\mathrm {Z}}}^{(k)})={\sigma }^2{\varvec{\mathrm {I}}}\), and using the identity \(E({\varvec{\mathrm {Z}}}^{\top }{\varvec{\mathrm {A}}}{\varvec{\mathrm {Z}}})=\mathrm{tr}\{{\varvec{\mathrm {A}}}\,Cov({\varvec{\mathrm {Z}}})\}+E({\varvec{\mathrm {Z}}})^{\top }{\varvec{\mathrm {A}}}\,E({\varvec{\mathrm {Z}}})\), we obtain
\[ E\{RSS(\lambda )\}={\sigma }^2\,\mathrm{tr}\{({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda })^{\top }({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda })\}+{\varvec{\mathrm {f}}}^{\top }({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda })^{\top }({\varvec{\mathrm {I}}}-{\varvec{\mathrm {S}}}_{\lambda }){\varvec{\mathrm {f}}}, \]
as defined in Eq. (28).
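The expectation formula can be verified by Monte Carlo simulation. In the sketch below, the smoother matrix is a row-normalized Gaussian kernel smoother chosen purely for illustration (an assumption, not the paper's smoother); the check holds for any fixed linear smoother.

```python
import numpy as np

# Monte Carlo check of E{RSS(lam)} against the closed form
#   sigma^2 tr{(I-S)^T(I-S)} + f^T (I-S)^T (I-S) f,
# using an illustrative row-normalized Gaussian kernel smoother S.

rng = np.random.default_rng(1)
n, sigma = 30, 0.5
x = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * x)

W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
S = W / W.sum(axis=1, keepdims=True)          # fixed linear smoother matrix
A = (np.eye(n) - S).T @ (np.eye(n) - S)

theory = sigma ** 2 * np.trace(A) + f @ A @ f # closed-form expectation
draws = f + rng.normal(0.0, sigma, (20000, n))
mc = np.einsum('ij,jk,ik->i', draws, A, draws).mean()
```

With 20,000 replications the Monte Carlo average agrees with the closed form to well within a few percent.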
1.4 Appendix 4: Proof of the Lemma 4.1
By definition, \(SMSE({\widehat{\varvec{\mathrm {f}}}}_{\lambda },{\varvec{\mathrm {f}}})=E\Vert {\widehat{\varvec{\mathrm {f}}}}_{\lambda }-{\varvec{\mathrm {f}}}\Vert ^2\), where \({\widehat{\varvec{\mathrm {f}}}}_{\lambda }={\varvec{\mathrm {S}}}_{\lambda }{\varvec{\mathrm {Z}}}^{(k)}\). The SMSE can then be written as the sum of the trace of the variance and the squared bias:
\[ SMSE({\widehat{\varvec{\mathrm {f}}}}_{\lambda },{\varvec{\mathrm {f}}})=\mathrm{tr}\{Var({\widehat{\varvec{\mathrm {f}}}}_{\lambda })\}+\Vert bias({\widehat{\varvec{\mathrm {f}}}}_{\lambda })\Vert ^2. \tag{50} \]
As for the usual linear smoother, the bias and variance terms in (50) can be written, respectively, as
\[ bias({\widehat{\varvec{\mathrm {f}}}}_{\lambda })=E({\widehat{\varvec{\mathrm {f}}}}_{\lambda })-{\varvec{\mathrm {f}}}=({\varvec{\mathrm {S}}}_{\lambda }-{\varvec{\mathrm {I}}}){\varvec{\mathrm {f}}} \tag{51} \]
and
\[ Var({\widehat{\varvec{\mathrm {f}}}}_{\lambda })={\varvec{\mathrm {S}}}_{\lambda }\,Cov({\varvec{\mathrm {Z}}}^{(k)})\,{\varvec{\mathrm {S}}}^{\top }_{\lambda }. \tag{52} \]
Assuming \(Cov({\varvec{\mathrm {Z}}}^{(k)})={\sigma }^2{\varvec{\mathrm {I}}}\) in (52), substituting (51) and (52) into (50) yields
\[ SMSE({\widehat{\varvec{\mathrm {f}}}}_{\lambda },{\varvec{\mathrm {f}}})={\sigma }^2\,\mathrm{tr}({\varvec{\mathrm {S}}}_{\lambda }{\varvec{\mathrm {S}}}^{\top }_{\lambda })+{\varvec{\mathrm {f}}}^{\top }({\varvec{\mathrm {S}}}_{\lambda }-{\varvec{\mathrm {I}}})^{\top }({\varvec{\mathrm {S}}}_{\lambda }-{\varvec{\mathrm {I}}}){\varvec{\mathrm {f}}}. \]
This completes the proof of Lemma 4.1.
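The variance-plus-squared-bias decomposition can also be checked numerically. As in the previous sketch, the kernel smoother below is an illustrative stand-in for any fixed linear smoother matrix \({\varvec{\mathrm {S}}}_{\lambda }\).

```python
import numpy as np

# Numeric check of Lemma 4.1: for f_hat = S Z with Cov(Z) = sigma^2 I,
#   SMSE = sigma^2 tr(S S^T) + f^T (S-I)^T (S-I) f
# should match the Monte Carlo average of ||f_hat - f||^2. The kernel
# smoother S is an illustrative choice, not the paper's.

rng = np.random.default_rng(2)
n, sigma = 25, 0.3
x = np.linspace(0.0, 1.0, n)
f = np.cos(np.pi * x)

W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.15) ** 2)
S = W / W.sum(axis=1, keepdims=True)

bias_term = f @ (S - np.eye(n)).T @ (S - np.eye(n)) @ f
theory = sigma ** 2 * np.trace(S @ S.T) + bias_term
draws = f + rng.normal(0.0, sigma, (20000, n))   # rows: replicated Z vectors
mc = np.mean(np.sum((draws @ S.T - f) ** 2, axis=1))
```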
Cite this article
Aydin, D., Yilmaz, E. Censored Nonparametric Time-Series Analysis with Autoregressive Error Models. Comput Econ 58, 169–202 (2021). https://doi.org/10.1007/s10614-020-10010-8