Abstract
We study renewable estimation and algorithms for nonparametric models with streaming data. In our method, the nonparametric function of interest is expressed as a functional of a weight function and a conditional distribution function (CDF). The CDF is estimated by renewable kernel estimation combined with function interpolation, and on this basis we propose renewable weighted composite quantile regression (WCQR). By fully exploiting the model structure, we then obtain new selectors for the weight function, so that the WCQR achieves asymptotic unbiasedness when estimating specific functions in the model. We also propose practical bandwidth selectors for streaming data and find the optimal weight function by minimizing the asymptotic variance. The asymptotic results show that our estimator is almost equivalent to the oracle estimator computed from the entire data set at once. In addition, our method is adaptive to error distributions, robust to outliers, and efficient in both estimation and computation. Simulation studies and real data analyses further confirm our theoretical findings.
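The idea of a renewable CDF estimate with interpolation can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a fixed evaluation point x0, a fixed bandwidth (the paper proposes streaming bandwidth selectors, which are not reproduced here), a local-constant Epanechnikov kernel estimate on a fixed grid of y values, and equal weights in the composite step. The names RenewableCondCDF and composite_quantile are hypothetical and chosen only for this illustration.

```python
import numpy as np


class RenewableCondCDF:
    """Running kernel estimate of F(y | x0) on a fixed y-grid (illustrative only)."""

    def __init__(self, x0, y_grid, bandwidth):
        self.x0 = float(x0)
        self.y_grid = np.asarray(y_grid, dtype=float)
        self.h = float(bandwidth)               # assumption: fixed bandwidth
        self.num = np.zeros_like(self.y_grid)   # running sum of K_i * 1{Y_i <= y_j}
        self.den = 0.0                          # running sum of K_i

    def update(self, x_batch, y_batch):
        """Fold one batch into the summary statistics; raw data can then be discarded."""
        x = np.asarray(x_batch, dtype=float)
        y = np.asarray(y_batch, dtype=float)
        u = (x - self.x0) / self.h
        k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov kernel
        indicators = (y[:, None] <= self.y_grid[None, :]).astype(float)
        self.num += k @ indicators
        self.den += k.sum()

    def _grid_values(self):
        if self.den <= 0.0:
            raise ValueError("no observations within the kernel window yet")
        return self.num / self.den

    def cdf(self, y):
        """Evaluate the CDF off the grid by linear interpolation."""
        return np.interp(y, self.y_grid, self._grid_values())

    def quantile(self, tau):
        """Invert the (nondecreasing) grid CDF by linear interpolation."""
        return np.interp(tau, self._grid_values(), self.y_grid)


def composite_quantile(est, taus, weights):
    """Weighted combination of estimated conditional quantiles."""
    return float(np.dot(weights, est.quantile(taus)))


# Simulated stream: Y = sin(2*pi*X) + noise, arriving in 20 batches of 500.
rng = np.random.default_rng(0)
y_grid = np.linspace(-3.0, 3.0, 301)
stream_est = RenewableCondCDF(x0=0.25, y_grid=y_grid, bandwidth=0.1)
full_est = RenewableCondCDF(x0=0.25, y_grid=y_grid, bandwidth=0.1)

batches = []  # kept only to build the full-data comparison below
for _ in range(20):
    x = rng.uniform(0.0, 1.0, size=500)
    y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(500)
    stream_est.update(x, y)
    batches.append((x, y))

# Full-data ("oracle"-style) estimate computed in a single pass.
full_est.update(np.concatenate([b[0] for b in batches]),
                np.concatenate([b[1] for b in batches]))

taus = np.arange(1, 10) / 10.0                  # quantile levels 0.1, ..., 0.9
weights = np.full(taus.size, 1.0 / taus.size)   # assumption: equal weights
estimate = composite_quantile(stream_est, taus, weights)
```

Because the update only adds to two running sums, the batch-by-batch estimate in this sketch coincides, up to floating-point rounding, with the one computed from all data at once. With symmetric errors and symmetric quantile levels, the equally weighted combination targets the regression function at x0, which equals 1 in this simulation.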
Funding
The research was supported by the NNSF project of China (11971265) and the National Key R&D Program of China (2018YFA0703900).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Chen, Y., Fang, S. & Lin, L. Renewable composite quantile method and algorithm for nonparametric models with streaming data. Stat Comput 34, 43 (2024). https://doi.org/10.1007/s11222-023-10352-x
DOI: https://doi.org/10.1007/s11222-023-10352-x