Datenbank-Spektrum, Volume 12, Issue 1, pp 43–50

An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

Technical Contribution

Abstract

Due to the ever-growing presence of data streams, there has been a considerable amount of research on stream data mining in recent years. Anytime algorithms are particularly well suited to stream mining: they flexibly use all the time available on streams with varying data rates, and they have also been shown to outperform traditional budget approaches on streams with constant data rates. In this article, we present an index-inspired algorithm for Bayesian anytime classification on evolving data streams and evaluate its performance on benchmark data sets.
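The abstract describes anytime classification only at a high level. As a rough illustration of the general idea, and not of the article's actual algorithm, the following Python sketch keeps, for each class, a hierarchy of Gaussian summaries built by recursive median splits (loosely index-like). A query starts from one coarse Gaussian per class and, for as many refinement steps as time permits, replaces a summary node by its children; after any step it can return the currently most probable class. The tree construction, the refinement heuristic, and all names (`AnytimeBayesClassifier`, `build`, `budget`, `leaf_size`) are assumptions made for this illustration.

```python
# Hypothetical sketch of anytime Bayesian classification over a per-class
# hierarchy of Gaussian summaries. NOT the algorithm from the article.
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

Point = Sequence[float]


@dataclass(eq=False)  # identity comparison, so frontier nodes stay distinct
class Node:
    """A level-of-detail summary: a diagonal Gaussian over n training points."""
    n: int
    mean: List[float]
    var: List[float]
    children: List["Node"] = field(default_factory=list)


def build(points: List[Point], leaf_size: int = 2, min_var: float = 1e-3) -> Node:
    """Recursively median-split on the widest dimension (assumes leaf_size >= 1)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [max(min_var, sum((p[j] - mean[j]) ** 2 for p in points) / n) for j in range(d)]
    node = Node(n, mean, var)
    if n > leaf_size:
        dim = max(range(d), key=lambda j: var[j])
        ordered = sorted(points, key=lambda p: p[dim])
        half = n // 2
        node.children = [build(ordered[:half], leaf_size, min_var),
                         build(ordered[half:], leaf_size, min_var)]
    return node


def gauss(x: Point, node: Node) -> float:
    """Diagonal Gaussian density of x under the node's summary."""
    log_p = 0.0
    for xi, m, v in zip(x, node.mean, node.var):
        log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return math.exp(log_p)


class AnytimeBayesClassifier:
    def __init__(self, data: Dict[str, List[Point]], leaf_size: int = 2):
        total = sum(len(pts) for pts in data.values())
        self.prior = {c: len(pts) / total for c, pts in data.items()}
        self.roots = {c: build(pts, leaf_size) for c, pts in data.items()}

    def classify(self, x: Point, budget: int) -> str:
        """Return the best class after at most `budget` refinement steps."""
        frontier = {c: [root] for c, root in self.roots.items()}

        def score(c: str) -> float:
            # prior * mixture density over the class's current frontier nodes
            n_c = self.roots[c].n
            return self.prior[c] * sum(nd.n / n_c * gauss(x, nd) for nd in frontier[c])

        for _ in range(budget):  # each iteration is one interruptible step
            candidates = [c for c in frontier if any(nd.children for nd in frontier[c])]
            if not candidates:
                break  # every class is fully refined
            c = max(candidates, key=score)  # heuristic: refine the leading class
            expandable = [nd for nd in frontier[c] if nd.children]
            best = max(expandable, key=lambda nd: nd.n * gauss(x, nd))
            frontier[c] = [nd for nd in frontier[c] if nd is not best] + best.children
        return max(frontier, key=score)
```

Here `budget` stands in for the time available before the next stream item arrives; a real anytime classifier would check a clock or an interrupt signal between steps instead. The sketch multiplies raw densities, which can underflow to zero for distant queries, so a log-space formulation with log-sum-exp would be more robust. It also builds each hierarchy once, in batch, and therefore does not capture the evolving-stream setting the abstract targets.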

Keywords

Stream processing, Data mining, Anytime algorithms


Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
  2. Department of Computer Science, Aarhus University, Aarhus, Denmark
