Adaptive robust local online density estimation for streaming data


Accurate online density estimation is crucial to numerous applications that involve streaming data. Existing online approaches to density estimation lack prompt adaptability and robustness when facing concept-drifting and noisy streaming data, resulting in delayed or even deteriorated approximations. To alleviate this issue, we first propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimate by factoring in the statistical hallmarks of the latest learned distribution and any potential distributional changes introduced by each incoming instance. A robust variant of ALoKDE, R-ALoKDE, is further developed to effectively handle data streams with varied types and levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE and derive their theoretical error bounds regarding bias, variance, MSE and MISE. Extensive comparative studies on various artificial and real-world (noisy) streaming data demonstrate the efficacy of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).



  1. This kernel is asymptotically optimal in efficiency among all kernel functions [7].

  2. “Gaussian”, “Skewed unimodal”, “Strongly skewed”, “Kurtotic unimodal”, “Outlier”, “Bimodal”, “Separated bimodal”, “Skewed bimodal”, “Trimodal”, “Claw”, “Double Claw”, “Symmetric Claw”, “Asymmetric Double Claw”, “Smooth Comb”, and “Discrete Comb”.



  1. Gama J, Žliobaitė I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):44

  2. Cesa-Bianchi N, Shalev-Shwartz S, Shamir O (2011) Online learning of noisy data. IEEE Trans Inf Theory 57(12):7907–7931

  3. Deng C, Yang E, Liu T, Tao D (2019) Two-stream deep hashing with class-specific centers for supervised image search. IEEE Trans Neural Netw Learn Syst 31(6):2189–2201

  4. Procopiuc CM, Procopiuc O (2005) Density estimation for spatial data streams. In: Bauzer Medeiros C, Egenhofer MJ, Bertino E (eds) Proceedings of advances in spatial and temporal databases, Heidelberg, August 2005. Lecture notes in computer science, vol 3633. Springer, Berlin, pp 109–126

  5. Heinz C, Seeger B (2008) Cluster kernels: resource-aware kernel density estimators over streaming data. IEEE Trans Knowl Data Eng 20(7):880–893

  6. Kristan M, Leonardis A, Skočaj D (2011) Multivariate online kernel density estimation with Gaussian kernels. Pattern Recognit 44(10–11):2630–2642

  7. Qahtan A, Wang S, Zhang X (2017) KDE-Track: an efficient dynamic density estimator for data streams. IEEE Trans Knowl Data Eng 29(3):642–655

  8. Zhang P, Zhu X, Shi Y, Guo L, Wu X (2011) Robust ensemble learning for mining noisy data streams. Decis Support Syst 50(2):469–479

  9. Krawczyk B, Cano A (2018) Online ensemble learning with abstaining classifiers for drifting and noisy data streams. Appl Soft Comput 68(7):677–692

  10. Verma N, Branson K (2015) Sample complexity of learning Mahalanobis distance metrics. In: Proceedings of the advances in neural information processing systems, Montréal, December 2015, pp 2584–2592

  11. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the SIAM international conference on data mining, Minneapolis, April 2007, pp 443–448

  12. Scott DW (2015) Multivariate density estimation: theory, practice, and visualization. Wiley, New York

  13. Sheather SJ (2004) Density estimation. Stat Sci 19(4):588–597

  14. Banerjee A, Burlina P (2010) Efficient particle filtering via sparse kernel density estimation. IEEE Trans Image Process 19(9):2480–2490

  15. Hong X, Chen S, Qatawneh A, Daqrouq K, Sheikh M, Morfeq A (2013) Sparse probability density function estimation using the minimum integrated square error. Neurocomputing 115:122–129

  16. Carbone P, Petri D, Barbé K (2017) Nonparametric probability density estimation via interpolation filtering. IEEE Trans Instrum Meas 66(4):681–690

  17. Zhang L, Lin J, Karim R (2018) Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl Based Syst 139:50–63

  18. Wahid A, Rao A (2019) RKDOS: a relative kernel density-based outlier score. IETE Tech Rev 8:1–12

  19. Deng C, Liu X, Li C, Tao D (2018) Active multi-kernel domain adaptation for hyperspectral image classification. Pattern Recognit 77:306–315

  20. Yang M, Deng C, Nie F (2019) Adaptive-weighting discriminative regression for multi-view classification. Pattern Recognit 88:236–245

  21. Zhou A, Cai Z, Wei L, Qian W (2003) M-kernel merging: towards density estimation over data streams. In: Proceedings of the IEEE international conference on database systems for advanced applications, Kyoto, March 2003, pp 285–292

  22. Cao Y, He H, Man H (2012) SOMKE: kernel density estimation over data streams by sequences of self-organizing maps. IEEE Trans Neural Netw Learn Syst 23(8):1254–1268

  23. Kristan M, Leonardis A (2014) Online discriminative kernel density estimator with Gaussian kernels. IEEE Trans Cybern 44(3):355–365

  24. Qiu T, Shen F, Zhao J (2015) Local adaptive and incremental Gaussian mixture for online density estimation. In: Cao T, Lim EP, Zhou ZH, Ho TB, Cheung D, Motoda H (eds) Proceedings of the advances in knowledge discovery and data mining, Ho Chi Minh City, May 2015. Lecture notes in computer science, vol 9077. Springer, Cham, pp 418–428

  25. Wilcox R (2005) Kolmogorov–Smirnov test. Encyclopedia of biostatistics

  26. Lall A (2015) Data streaming algorithms for the Kolmogorov-Smirnov test. In: Proceedings of the IEEE international conference on big data, Santa Clara, October 2015, pp 95–104

  27. Duong T, Hazelton ML (2005) Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 32(3):485–506

  28. Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: Proceedings of the SIAM international conference on data mining, Bethesda, April 2006, pp 524–528

  29. Yang C, Duraiswami R, Gumerov NA, Davis L (2003) Improved fast gauss transform and efficient kernel density estimation. In: Proceedings of the IEEE international conference on computer vision, Nice, October 2003, pp 464

  30. Botev ZI, Grotowski JF, Kroese DP (2010) Kernel density estimation via diffusion. Ann Stat 38(5):2916–2957

  31. Foote J (2000) Automatic audio segmentation using a measure of audio novelty. In: Proceedings of the IEEE international conference on multimedia and expo, New York, July 2000, pp 452–455

  32. Losing V, Hammer B, Wersing H (2016) KNN classifier with self adjusting memory for heterogeneous concept drift. In: Proceedings of the IEEE international conference on data mining, Barcelona, December 2016, pp 291–300

  33. Boedihardjo A, Liu C, Chen F (2015) Fast adaptive kernel density estimator for data streams. Knowl Inf Syst 42(2):285–317

  34. Masud M, Gao J, Khan L, Han J, Thuraisingham B (2010) Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans Knowl Data Eng 23(6):859–874

  35. Masud M, Chen Q, Khan L, Aggarwal C, Gao J, Han J, Srivastava A, Oza N (2012) Classification and adaptive novel class detection of feature-evolving data streams. IEEE Trans Knowl Data Eng 25(7):1484–1497



This publication was made possible by funding from the DOD ARO Grant #W911NF-20-1-0249, and the NIH grants 5U54MD007595, 5P20GM103424, 5U19AG055373 and U54GM104940.

Author information


Corresponding author

Correspondence to Kun Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



1.1 Appendix 1: Proof of Theorem 1

Equation (6) is equivalent to the following equation,

$$\begin{aligned} \begin{aligned} \lambda _{t}&= \mathop {\mathrm{argmin}}_{\lambda \in [0,1]} \int [\lambda E(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})) + (1-\lambda )E(\hat{f}_{t}({{\varvec{x}}}))\\&\quad -f({{\varvec{x}}})]^2 d {{\varvec{x}}} + \int Var \{\lambda \hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})+ (1-\lambda )\hat{f}_{t}({{\varvec{x}}}) \} d {{\varvec{x}}}\\&=\mathop {\mathrm{argmin}}_{\lambda \in [0,1]} \lambda ^2 \int [E(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})) - f({{\varvec{x}}})]^2 d {{\varvec{x}}} \\&\quad +(1-\lambda )^2 \int [E(\hat{f}_{t}({{\varvec{x}}})) - f({{\varvec{x}}})]^2 d {{\varvec{x}}} \\&\quad +2 \lambda (1-\lambda ) \int [E(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})) - f({{\varvec{x}}})]\\&[E(\hat{f}_{t}({{\varvec{x}}})) - f({{\varvec{x}}})] d {{\varvec{x}}}\\&\quad + \lambda ^2 \int Var \{ \hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}}) \} d {{\varvec{x}}} + (1-\lambda )^2 \int Var \{ \hat{f}_{t}({{\varvec{x}}}) \} d {{\varvec{x}}}\\&\quad + 2 \lambda (1-\lambda ) \int Cov \{ \hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}}),\hat{f}_{t}({{\varvec{x}}}) \} d {{\varvec{x}}}\\&=\mathop {\mathrm{argmin}}_{\lambda \in [0,1]} \lambda ^2 A_t + (1-\lambda )^2 B_t + 2 \lambda (1-\lambda ) C_t\\&=\mathop {\mathrm{argmin}}_{\lambda \in [0,1]} (A_t + B_t - 2C_t)\lambda ^2 - 2(B_t - C_t)\lambda \end{aligned} \end{aligned}$$

Here \(A_{t}\) is the MISE of \(\hat{f}_{t}^{kde }\), \(B_t\) is the MISE of \(\hat{f}_{t}\), and \(C_t\) is the integrated cross term between \(\hat{f}_{t}^{kde }\) and \(\hat{f}_{t}\), i.e., the integral of the product of their biases plus the integral of their covariance. In practice, these quantities can be computed efficiently by the second-order Taylor series approximation [29], where we can use an estimate of the curvature \(\int |f''({{\varvec{x}}})|^2 d {{\varvec{x}}}\) and choose \(h_{opt}\) accordingly.

Furthermore, we obtain \(A_t + B_t \ge 2C_t\). This follows by applying \(x^2 + y^2 \ge 2xy\) to the bias terms and \(Var\{X - Y\} = Var\{X\} + Var\{Y\} - 2 Cov\{X, Y\} \ge 0\), i.e., \(Var\{X\} + Var\{Y\} \ge 2 Cov\{X, Y\}\), to the variance terms. Note that the equality holds if and only if \(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})=\hat{f}_{t}({{\varvec{x}}})\), which is unlikely to happen since we utilize a local sampling strategy to construct \(\hat{f}_{t}^{kde }\) when concept drift occurs. The solution of the optimization problem in Eq. (7) is then discussed case by case:

  • Case 1: if \(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}})=\hat{f}_{t}({{\varvec{x}}})\), then \(A_t = B_t = C_t\) and the objective is constant in \(\lambda\), so any \(\lambda _t \in [0,1]\) is a minimizer;

  • Case 2: if \(\hat{f}_{t}^{kde }({{\varvec{x}}};h_{\mathcal {D}_{t}},h_{{{\varvec{x}}}_{t}}) \ne \hat{f}_{t}({{\varvec{x}}})\), then \(A_t + B_t > 2C_t\).

    • Case 2-1: if \(B_t < C_t\), then \(A_t > C_t\), \(\lambda _t = 0\);

    • Case 2-2: if \(B_t \ge C_t\) and \(A_t < C_t\), then \(\lambda _t = 1\);

    • Case 2-3: if \(B_t \ge C_t\) and \(A_t \ge C_t\), then \(\lambda _t = \frac{B_{t}-C_{t}}{A_{t}+B_{t}-2C_{t}}\), \(1 - \lambda _t = \frac{A_{t}-C_{t}}{A_{t}+B_{t}-2C_{t}}\).

Therefore, we have a closed-form solution of Eq. (7) for Case 2 (i.e., \(A_t + B_t > 2C_t\)):

$$\begin{aligned} \lambda _{t} = \max {\left\{ 0, \min {\left\{ 1,\frac{B_{t}-C_{t}}{A_{t}+B_{t}-2C_{t}}\right\} }\right\} }. \end{aligned}$$

This concludes the proof of Theorem 1.
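As a quick numerical illustration of the closed form above, the following Python sketch clamps the unconstrained minimizer to \([0,1]\) and compares it with a brute-force grid search over the quadratic objective \(\lambda ^2 A_t + (1-\lambda )^2 B_t + 2\lambda (1-\lambda )C_t\). The function names (`objective`, `optimal_lambda`, `grid_minimizer`) and the sample values of \(A_t\), \(B_t\), \(C_t\) are illustrative assumptions and are not taken from the original work.

```python
def objective(lam, a, b, c):
    """Quadratic objective lam^2*A + (1-lam)^2*B + 2*lam*(1-lam)*C."""
    return lam**2 * a + (1 - lam) ** 2 * b + 2 * lam * (1 - lam) * c


def optimal_lambda(a, b, c):
    """Closed-form minimizer on [0, 1], assuming A + B > 2C (Case 2)."""
    denom = a + b - 2 * c
    if denom <= 0:
        raise ValueError("closed form requires A + B > 2C")
    return max(0.0, min(1.0, (b - c) / denom))


def grid_minimizer(a, b, c, steps=10_000):
    """Brute-force minimizer over an evenly spaced grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: objective(lam, a, b, c))


if __name__ == "__main__":
    # Hypothetical (A, B, C) triples, one for each of Cases 2-1, 2-2 and 2-3.
    for a, b, c in [(3.0, 1.0, 1.5), (1.0, 3.0, 1.5), (3.0, 2.0, 1.0)]:
        print((a, b, c), optimal_lambda(a, b, c), grid_minimizer(a, b, c))
```

In this sketch the first triple has \(B_t < C_t\) and is clamped to 0, the second has \(A_t < C_t\) and is clamped to 1, and the third falls in the interior case.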

1.2 Appendix 2: Proof of Theorem 2

Since \(\lambda _1=\cdots =\lambda _{t-1}=\lambda _{t}=\lambda\), the following equation holds:

$$\begin{aligned} E(\hat{f}_{t+1}(x)) = \lambda E(\hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}})) + (1-\lambda )E(\hat{f}_{t}(x)). \end{aligned}$$

Applying this equation repeatedly, we have:

$$\begin{aligned} \begin{aligned} E(\hat{f}_{t+1}(x))&= \lambda \sum _{j=0}^{t-1} (1-\lambda )^{j} E(\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}}))\\&\quad + (1-\lambda )^{t} E(\hat{f}_{1}(x)). \end{aligned} \end{aligned}$$

Lemma 1 implies that: \(\lim \limits _{m_{t-j}\rightarrow \infty }E(\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}}))=f(x)\), where \(j=0,1,\ldots ,t-1\).

On the other hand, since \(m_1=\cdots =m_{t}\), \(h_{\mathcal {D}_{1}}=\cdots =h_{\mathcal {D}_{t}}\), we have:

$$\begin{aligned} \begin{aligned}&\lim \limits _{m_{t}\rightarrow \infty }E(\hat{f}_{t+1}(x)) = \lambda \sum _{j=0}^{t-1} (1-\lambda )^{j} f(x) + (1-\lambda )^{t} E(\hat{f}_{1}(x))\\&\quad = (1-(1-\lambda )^{t})f(x)+(1-\lambda )^{t} E(\hat{f}_{1}(x)). \end{aligned} \end{aligned}$$

Letting \(t \rightarrow \infty\), the term \((1-\lambda )^{t}\) vanishes for arbitrary \(\lambda \in (0,1]\), and we have: \(\lim \nolimits _{m_{t}\rightarrow \infty ,t\rightarrow \infty }E(\hat{f}_{t+1}(x))=f(x)\).
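The unrolled expectation can be sanity-checked numerically. The Python sketch below iterates the one-step recursion with a single stand-in value `kde_mean` for every \(E(\hat{f}_{t-j}^{kde}(x))\) and `init_mean` for \(E(\hat{f}_{1}(x))\), then compares the result with the closed form \((1-(1-\lambda )^{t})\,\cdot\) `kde_mean` \(+\,(1-\lambda )^{t}\,\cdot\) `init_mean`. All names and numeric values are hypothetical and chosen only for illustration.

```python
def recursive_expectation(lam, kde_mean, init_mean, t):
    """Apply e <- lam*kde_mean + (1-lam)*e exactly t times, starting from init_mean."""
    e = init_mean
    for _ in range(t):
        e = lam * kde_mean + (1 - lam) * e
    return e


def closed_form_expectation(lam, kde_mean, init_mean, t):
    """(1 - (1-lam)^t) * kde_mean + (1-lam)^t * init_mean."""
    decay = (1 - lam) ** t
    return (1 - decay) * kde_mean + decay * init_mean


if __name__ == "__main__":
    lam, kde_mean, init_mean = 0.3, 0.4, 0.1  # illustrative values only
    for t in (1, 5, 25, 200):
        print(t,
              recursive_expectation(lam, kde_mean, init_mean, t),
              closed_form_expectation(lam, kde_mean, init_mean, t))
```

As `t` grows, the influence of the initial value decays geometrically, which is the mechanism behind the asymptotic unbiasedness argument.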

We then consider the variance. Since \(\mathcal {D}_{j}\) and \(\mathcal {D}_{k}\) (\(j \ne k\)) are two independent sets, \(Cov\{\hat{f}_{j}^{kde}(x;h_{\mathcal {D}_{j}},h_{x_{j}}), \hat{f}_{k}^{kde}(x;h_{\mathcal {D}_{k}},h_{x_{k}}) \} = 0\). Since \(\hat{f}_{1}(x)\) is initialized by two given constants, for arbitrary t we have \(Cov\{\hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}),\hat{f}_{1}(x)\} = 0\). Because \(\hat{f}_{t}(x)\) is a linear combination of \(\hat{f}_{1}^{kde},\ldots ,\hat{f}_{t-1}^{kde}\) and \(\hat{f}_{1}\), it follows that \(Cov\{\hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}),\hat{f}_{t}(x)\} = 0\) as well. Hence, the following equation holds:

$$\begin{aligned} \begin{aligned}&Var\{\hat{f}_{t+1}(x)\} = Var\{ \lambda \hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}) + (1-\lambda )\hat{f}_{t}(x) \} \\&\quad = \lambda ^2 Var\{ \hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}) \} + (1-\lambda )^2 Var\{ \hat{f}_{t}(x) \}\\&\qquad + 2\lambda (1-\lambda )Cov\{ \hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}),\hat{f}_{t}(x) \} \\&\quad = \lambda ^2 Var\{ \hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}) \} + \lambda ^2 (1-\lambda )^2 Var\{ \hat{f}_{t-1}^{kde}(x;h_{\mathcal {D}_{t-1}},h_{x_{t-1}}) \}\\&\qquad + (1-\lambda )^4 Var\{ \hat{f}_{t-1}(x) \} \\&\qquad + 2 \lambda (1-\lambda ) \left[ Cov \{ \hat{f}_{t}^{kde}(x;h_{\mathcal {D}_{t}},h_{x_{t}}),\hat{f}_{t}(x) \} \right. \\&\qquad \left. + (1-\lambda )^2 Cov\{ \hat{f}_{t-1}^{kde}(x;h_{\mathcal {D}_{t-1}},h_{x_{t-1}}),\hat{f}_{t-1}(x) \} \right] \\&\quad = \cdots \\&\quad = \lambda ^2 \sum _{j=0}^{t-1} (1-\lambda )^{2j} Var\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\}\\&\qquad +(1-\lambda )^{2t} Var\{\hat{f}_{1}(x)\} \\&\qquad + 2 \lambda (1-\lambda ) \sum _{j=0}^{t-1} (1-\lambda )^{2j} Cov\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}}),\hat{f}_{t-j}(x) \} \\&\quad = \lambda ^2 \sum _{j=0}^{t-1} (1-\lambda )^{2j} Var\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\}\\&\qquad +(1-\lambda )^{2t} Var\{\hat{f}_{1}(x)\}. \end{aligned} \end{aligned}$$

Lemma 1 implies that: \(\lim \nolimits _{m_{t-j}\rightarrow \infty }m_{t-j}h_{\mathcal {D}_{t-j}}Var\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\}=f(x)\int _{-\infty }^{\infty }K^2(u)du\), where \(j=0,1,\ldots ,t-1\). Since \(m_1=\cdots =m_{t}\), \(h_{\mathcal {D}_{1}}=\cdots =h_{\mathcal {D}_{t}}\), we have:

$$\begin{aligned} \begin{aligned}&\lim \limits _{m_{t}\rightarrow \infty }m_{t}h_{\mathcal {D}_{t}}Var\{\hat{f}_{t+1}(x)\}\\&\quad = \lambda ^2 \sum _{j=0}^{t-1} (1-\lambda )^{2j} f(x)\int _{-\infty }^{\infty }K^2(u)du \\&\qquad +(1-\lambda )^{2t} Var\{\hat{f}_{1}(x)\}\\&\quad = \frac{\lambda ^2(1-(1-\lambda )^{2t})}{1-(1-\lambda )^2}f(x)\int _{-\infty }^{\infty }K^2(u)du + (1-\lambda )^{2t} Var\{ \hat{f}_{1}(x) \}. \end{aligned} \end{aligned}$$

Now with \(t \rightarrow \infty\), for arbitrary \(\lambda \in (0, 1]\), we have: \(\lim \nolimits _{m_{t}\rightarrow \infty }m_{t}h_{\mathcal {D}_{t}}Var\{\hat{f}_{t+1}(x)\}= \frac{\lambda f(x)}{2-\lambda }\int _{-\infty }^{\infty }K^2(u)du\). This concludes the proof of Theorem 2.
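The limiting factor \(\lambda /(2-\lambda )\) can be checked with a short deterministic computation. The Python sketch below tracks only the multiplier of the common per-step KDE variance, assuming (as an illustration) that every \(Var\{\hat{f}_{t-j}^{kde}\}\) takes the same value, that all cross-covariances vanish, and that \(Var\{\hat{f}_{1}(x)\}=0\). The function names are hypothetical.

```python
def variance_factor(lam, t):
    """Iterate w <- lam^2 + (1-lam)^2 * w for t steps, starting from w = 0.

    Under the stated assumptions, Var{f_{t+1}} = w * v, where v is the
    common variance of each per-step KDE.
    """
    w = 0.0
    for _ in range(t):
        w = lam**2 + (1 - lam) ** 2 * w
    return w


def finite_t_factor(lam, t):
    """Geometric-sum form lam^2 * (1 - (1-lam)^(2t)) / (1 - (1-lam)^2)."""
    q = (1 - lam) ** 2
    return lam**2 * (1 - q**t) / (1 - q)


def limiting_factor(lam):
    """Limit of the factor as t grows: lam / (2 - lam)."""
    return lam / (2 - lam)


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):  # illustrative values only
        print(lam, variance_factor(lam, 500), limiting_factor(lam))
```

Because \(\lambda /(2-\lambda ) \le 1\) on \((0,1]\), the weighted recursion never inflates the variance relative to a single per-step KDE under these assumptions.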

1.3 Appendix 3: Proof of Theorem 3

We consider the mean squared error of \(\hat{f}_{t+1}(x)\), \(MSE (\hat{f}_{t+1}(x))=Var\{\hat{f}_{t+1}(x)\}+(Bias\{\hat{f}_{t+1}(x)\})^2\). According to Theorem 2, the estimator \(\hat{f}_{t+1}(x)\) is asymptotically unbiased and therefore, \(\lim \nolimits _{m_{t}\rightarrow \infty }Bias\{\hat{f}_{t+1}(x)\}=0\). Since \(\lim \nolimits _{m_{t}\rightarrow \infty }m_{t}h_{\mathcal {D}_{t}} = \infty\) and \(f(x)\int _{-\infty }^{\infty }K^2(u)du<\infty\), then for arbitrary \(\lambda \in (0,1]\), we have, \(\lim \nolimits _{m_{t}\rightarrow \infty ,t\rightarrow \infty }Var\{\hat{f}_{t+1}(x)\}=\lim \limits _{m_{t}\rightarrow \infty ,t\rightarrow \infty }\frac{\lambda f(x)\int _{-\infty }^{\infty }K^2(u)du}{(2-\lambda )m_{t}h_{\mathcal {D}_{t}}}=0\). Therefore, \(\hat{f}_{t+1}(x)\) converges to f(x) in \(L^2\) and hence is weakly consistent. This concludes the proof of Theorem 3.

1.4 Appendix 4: Proof of Theorem 4

According to the proof of Theorem 2 (\(\lambda _1=\cdots =\lambda _{t-1}=\lambda _{t}=\lambda\)), the following equation holds:

$$\begin{aligned} \begin{aligned}&Bias\{\hat{f}_{t+1}(x)\} = E(\hat{f}_{t+1}(x)) -f(x)=\lambda \sum _{j=0}^{t-1} (1-\lambda )^{j} \cdot \\&E\left( \hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\right) + (1-\lambda )^{t} E(\hat{f}_{1}(x)) - f(x). \end{aligned} \end{aligned}$$

According to the properties of KDE by Taylor expansion [29], we have, \(Bias\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\}= \frac{1}{2}f''(x)h_{\mathcal {D}_{t-j}}^2\int _{-\infty }^{\infty }u^2K(u)du+o(h_{\mathcal {D}_{t-j}}^2)\), where \(j=0,1,\ldots ,t-1\).

Thus, with \(m_1=\cdots =m_{t}\), \(h_{\mathcal {D}_{1}}=\cdots =h_{\mathcal {D}_{t}}\), we have:

$$\begin{aligned} \begin{aligned} Bias\{\hat{f}_{t+1}(x)\}&= (1-(1-\lambda )^{t}) \left\{ \frac{1}{2}f''(x)h_{\mathcal {D}_{t}}^2 \int _{-\infty }^{\infty }u^2K(u)du\right\} \\&\quad + o(h_{\mathcal {D}_{t}}^2)+(1-\lambda )^{t} E(\hat{f}_{1}(x)- f(x)). \end{aligned} \end{aligned}$$

For arbitrary \(\lambda \in (0,1]\), when t is large enough that \((1-\lambda )^t \approx 0\), Eq. (9) follows immediately from the above equation.
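To make "t large enough" concrete, the Python sketch below counts how many updates are needed before the transient weight \((1-\lambda )^t\) falls to a chosen tolerance. The function name `steps_until_negligible` and the tolerance values are assumptions made for illustration; the source does not prescribe a specific threshold.

```python
def steps_until_negligible(lam, eps):
    """Smallest t >= 1 such that (1 - lam)^t <= eps, for lam in (0, 1], eps in (0, 1)."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    t, decay = 0, 1.0
    while decay > eps:
        decay *= 1.0 - lam  # one more update shrinks the transient weight
        t += 1
    return t


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.5, 1.0):  # illustrative values only
        print(lam, steps_until_negligible(lam, 1e-6))
```

Smaller values of \(\lambda\) retain the initial estimate for longer, so more updates are needed before the transient term can be ignored.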

For the variance, since Eq. (18) holds and according to the properties of KDE by Taylor expansion [29], we have, \(Var\{\hat{f}_{t-j}^{kde}(x;h_{\mathcal {D}_{t-j}},h_{x_{t-j}})\} = \frac{f(x)}{m_{t-j}h_{\mathcal {D}_{t-j}}}\int _{-\infty }^{\infty }K^2(u)du + o(\frac{1}{m_{t-j}})\), where \(j=0,1,\ldots ,t-1\). Since \(m_1=\cdots =m_{t}\) and \(h_{\mathcal {D}_{1}}=\cdots =h_{\mathcal {D}_{t}}\), we have:

$$\begin{aligned} \begin{aligned} Var\{\hat{f}_{t+1}(x)\}&= \frac{\lambda ^2(1-(1-\lambda )^{2t})}{1-(1-\lambda )^2} \left\{ \frac{f(x)}{m_{t}h_{\mathcal {D}_{t}}}\int _{-\infty }^{\infty }K^2(u)du \right\} \\&\quad + o\left( \frac{1}{m_{t}}\right) +(1-\lambda )^{2t} Var\{\hat{f}_{1}(x) \}. \end{aligned} \end{aligned}$$

For arbitrary \(\lambda \in (0,1]\), when t is large enough that \((1-\lambda )^t \approx 0\), Eq. (10) follows immediately from the above equation. This concludes the proof of Theorem 4.
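For intuition, the leading terms that remain in the bias and variance expressions above once \((1-\lambda )^t \approx 0\) can be evaluated for a concrete case. The Python sketch below assumes a standard normal target density \(f\) and a standard Gaussian kernel \(K\) (so \(\int u^2K(u)du = 1\) and \(\int K^2(u)du = 1/(2\sqrt{\pi })\)), drops the \(o(\cdot )\) remainders, and uses the bandwidth rate \(h = m^{-1/5}\). These choices and all function names are illustrative assumptions, not the settings used in the original experiments.

```python
import math

# Assumed constants for a standard Gaussian kernel K (illustrative choice).
R_K = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K(u)^2 du
MU2_K = 1.0                             # integral of u^2 K(u) du


def normal_pdf(x):
    """Standard normal density, used here as the assumed true f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_pdf_dd(x):
    """Second derivative of the standard normal density."""
    return (x * x - 1.0) * normal_pdf(x)


def leading_bias(x, h):
    """Leading bias term: 0.5 * f''(x) * h^2 * integral of u^2 K(u) du."""
    return 0.5 * normal_pdf_dd(x) * h**2 * MU2_K


def leading_variance(x, lam, m, h):
    """Leading variance term: lam/(2-lam) * f(x) * integral of K^2 / (m*h)."""
    return lam / (2.0 - lam) * normal_pdf(x) * R_K / (m * h)


def leading_mse(x, lam, m, h):
    """Leading-order MSE: variance plus squared bias."""
    return leading_variance(x, lam, m, h) + leading_bias(x, h) ** 2


if __name__ == "__main__":
    for m in (100, 1_000, 10_000):
        h = m ** (-0.2)  # assumed rate, so h -> 0 while m*h -> infinity
        print(m, leading_mse(0.0, 0.5, m, h))
```

Under these assumptions both terms shrink as \(m\) grows, consistent with the weak consistency result of Theorem 3.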

Cite this article

Chen, Z., Fang, Z., Sheng, V. et al. Adaptive robust local online density estimation for streaming data. Int. J. Mach. Learn. & Cyber. 12, 1803–1824 (2021).
