Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions

  • Michael Geilke
  • Andreas Karwath
  • Stefan Kramer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9851)

Abstract

The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the estimates, it enables the density estimation of higher-dimensional data and approaches the true density with increasing dimensionality of the vector space. Moreover, it is not only designed to estimate homogeneous data, i.e., where all variables are nominal or all variables are numeric, but it can also estimate heterogeneous data. The evaluation is conducted on synthetic and real-world data. The software related to this paper is available at https://github.com/geilke/mideo.
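The idea of summarizing a stream with a small set of representatives can be illustrated with a minimal sketch. This is not the paper's algorithm and not the API of the linked repository. It assumes numeric data only, a fixed budget of representatives seeded from the first points of the stream, nearest-representative assignment by Euclidean distance, and a diagonal Gaussian per representative whose exponent is a Mahalanobis distance with a diagonal covariance matrix. The names `Representative` and `RepresentativeDensityEstimator` are hypothetical.

```python
import math
import random


class Representative:
    """Running summary (count, mean, per-dimension variance) of assigned points."""

    def __init__(self, x):
        self.n = 1
        self.mean = list(x)
        self.m2 = [0.0] * len(x)  # sums of squared deviations (Welford's method)

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def variances(self, floor=1e-3):
        # The floor (an assumption of this sketch) keeps the density finite
        # for representatives that have seen very few points.
        if self.n < 2:
            return [floor] * len(self.mean)
        return [max(m2 / (self.n - 1), floor) for m2 in self.m2]

    def mahalanobis_sq(self, x):
        # Squared Mahalanobis distance under a diagonal covariance matrix.
        return sum((xi - m) ** 2 / v
                   for xi, m, v in zip(x, self.mean, self.variances()))

    def pdf(self, x):
        log_norm = sum(math.log(2.0 * math.pi * v) for v in self.variances())
        return math.exp(-0.5 * (log_norm + self.mahalanobis_sq(x)))


class RepresentativeDensityEstimator:
    """Online mixture estimate built from a bounded set of representatives."""

    def __init__(self, max_representatives=8):
        self.max_representatives = max_representatives
        self.representatives = []
        self.count = 0

    def update(self, x):
        self.count += 1
        # Seed representatives from the first points of the stream.
        if len(self.representatives) < self.max_representatives:
            self.representatives.append(Representative(x))
            return
        # Afterwards, fold each point into its nearest representative.
        nearest = min(
            self.representatives,
            key=lambda r: sum((xi - m) ** 2 for xi, m in zip(x, r.mean)),
        )
        nearest.update(x)

    def density(self, x):
        if self.count == 0:
            return 0.0
        # Mixture of the representatives, weighted by the share of points each absorbed.
        return sum(r.n / self.count * r.pdf(x) for r in self.representatives)


# Synthetic two-cluster stream, processed one instance at a time.
rng = random.Random(0)
estimator = RepresentativeDensityEstimator(max_representatives=4)
for t in range(2000):
    center = 0.0 if t % 2 == 0 else 10.0
    estimator.update([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])

print(estimator.density([0.0, 0.0]), estimator.density([50.0, 50.0]))
```

Memory stays bounded by the number of representatives rather than by the length of the stream, which is the property that makes such summaries attractive for streaming. Handling nominal or mixed variables, as the abstract describes, would need a different per-representative model and is outside this sketch.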

Keywords

Vector Space, Data Stream, Mahalanobis Distance, Density Estimator, Joint Density

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Michael Geilke (1)
  • Andreas Karwath (1)
  • Stefan Kramer (1)
  1. Johannes Gutenberg University Mainz, Mainz, Germany
