Data Mining and Knowledge Discovery, Volume 19, Issue 2, pp 245–260

Harnessing the strengths of anytime algorithms for constant data streams

Article

Abstract

Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strengths are, first, the ability to provide a result after a very short initialization and, second, to improve that result as additional time becomes available. Anytime algorithms have therefore so far been used when the available processing time varies, e.g., on varying data streams. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result with respect to the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
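To make the setting concrete, the following minimal Python sketch (not taken from the paper) illustrates the contract assumed above: an anytime classifier that yields a usable answer after its first refinement step, improves it while time remains, and can be interrupted at any point; on a constant data stream every item then receives the same time allowance, equal to the inter-arrival time. The class name, the toy `_refine` step, and the confidence values are illustrative assumptions only, not the paper's algorithms.

```python
import time


class AnytimeClassifier:
    """Toy anytime classifier (hypothetical interface): returns a usable
    answer after the first refinement step and improves it while the
    per-item time budget has not expired."""

    def __init__(self, max_steps=1000):
        self.max_steps = max_steps

    def classify(self, item, budget_s):
        deadline = time.perf_counter() + budget_s
        label, confidence = None, 0.0
        for step in range(1, self.max_steps + 1):
            # One improvement step; a real anytime classifier would, e.g.,
            # descend one more index level or evaluate more neighbours.
            label, confidence = self._refine(item, step)
            if time.perf_counter() >= deadline:
                break  # interrupted: return the best result so far
        return label, confidence

    def _refine(self, item, step):
        # Placeholder refinement: dummy label, confidence grows with steps.
        return hash(item) % 2, min(1.0, step / self.max_steps)


def process_constant_stream(stream, arrival_interval_s, clf):
    """On a constant data stream each item gets the same time allowance,
    so the per-item budget equals the constant inter-arrival time."""
    return [clf.classify(item, budget_s=arrival_interval_s) for item in stream]


if __name__ == "__main__":
    clf = AnytimeClassifier()
    print(process_constant_stream(["a", "b", "c"], arrival_interval_s=0.001, clf=clf))
```

A budget algorithm, by contrast, is tuned to exactly one fixed time budget; the paper's point is that even under a constant allowance the interruptible, improving behaviour of anytime algorithms can be exploited to raise overall result quality.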

Keywords

Anytime algorithms · Stream data mining · Classification confidence



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
