Efficient Estimation of Dynamic Density Functions with Applications in Data Streams

  • Abdulhakim Qahtan
  • Suojin Wang
  • Xiangliang Zhang
Part of the Studies in Big Data book series (SBD, volume 41)


Many applications, such as network monitoring, traffic management, and environmental studies, generate huge amounts of data that cannot fit in computer memory. The data in such applications arrive continuously in the form of streams. The main challenges in mining data streams are the high speed and the large volume of the arriving data. A typical solution is to learn a model that is compact enough to fit in memory. However, the underlying distributions of streaming data change over time in unpredictable ways. The learned models should therefore be updated continuously and rely more on the most recent data in the streams.

In this chapter, we present an online density estimator that builds a model called KDE-Track for characterizing the dynamic density of data streams. KDE-Track summarizes the distribution of a data stream by estimating the Probability Density Function (PDF) of the stream at a set of resampling points. KDE-Track is shown to be more accurate (as reflected by smaller error values) and more computationally efficient (as reflected by shorter running times) than existing density estimation techniques. We demonstrate the usefulness of KDE-Track in visualizing the dynamic density of data streams and in detecting changes in them.
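To make the idea of keeping density estimates at a set of resampling points concrete, the following is a minimal sketch and not the KDE-Track algorithm itself. It assumes a fixed uniform grid, a Gaussian kernel with a fixed bandwidth, exponential forgetting to favor recent data, and linear interpolation between grid points; the class name `GridDensityTracker` and all of its parameters are hypothetical and chosen only for illustration.

```python
import math
import random
from bisect import bisect_right


class GridDensityTracker:
    """Toy online density estimate kept at fixed grid points.

    Hypothetical illustration only; this is not the authors' KDE-Track.
    """

    def __init__(self, lo, hi, num_points, bandwidth, decay):
        step = (hi - lo) / (num_points - 1)
        self.grid = [lo + i * step for i in range(num_points)]
        self.values = [0.0] * num_points  # density estimate at each grid point
        self.bandwidth = bandwidth        # assumed fixed kernel bandwidth
        self.decay = decay                # weight kept by the old estimate, in [0, 1)

    @staticmethod
    def _gaussian(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def update(self, x):
        """Blend the kernel centered at the new point x into every grid value."""
        h = self.bandwidth
        for i, g in enumerate(self.grid):
            contribution = self._gaussian((g - x) / h) / h
            self.values[i] = (self.decay * self.values[i]
                              + (1.0 - self.decay) * contribution)

    def density(self, x):
        """Linearly interpolate the stored grid values at an arbitrary x."""
        if x <= self.grid[0]:
            return self.values[0]
        if x >= self.grid[-1]:
            return self.values[-1]
        i = min(bisect_right(self.grid, x) - 1, len(self.grid) - 2)
        t = (x - self.grid[i]) / (self.grid[i + 1] - self.grid[i])
        return (1.0 - t) * self.values[i] + t * self.values[i + 1]


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = GridDensityTracker(lo=-4.0, hi=8.0, num_points=121,
                                 bandwidth=0.3, decay=0.99)
    for _ in range(2000):                 # first regime: mean 0
        tracker.update(rng.gauss(0.0, 1.0))
    print("before shift:", tracker.density(0.0), tracker.density(4.0))
    for _ in range(2000):                 # second regime: mean 4
        tracker.update(rng.gauss(4.0, 1.0))
    print("after shift: ", tracker.density(0.0), tracker.density(4.0))
```

In this sketch each arriving point costs one pass over the grid, and memory stays constant regardless of stream length. The decay factor controls how quickly the estimate forgets old data: values closer to 1 give smoother but slower-adapting estimates.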


Keywords (machine-generated, not supplied by the authors): Data Stream; Re-sampled Points; Kullback-Leibler Importance Estimation Procedure (KLIEP); Pick-up Event; Kernel Merging



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Abdulhakim Qahtan (1)
  • Suojin Wang (2)
  • Xiangliang Zhang (3, corresponding author)
  1. Qatar Computing Research Institute (QCRI), HBKU, Doha, Qatar
  2. Department of Statistics, TAMU, College Station, USA
  3. CEMSE, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
