Towards an Adaptive Approach for Mining Data Streams in Resource Constrained Environments

  • Mohamed Medhat Gaber
  • Arkady Zaslavsky
  • Shonali Krishnaswamy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3181)

Abstract

Mining data streams in resource-constrained environments has emerged as a challenging research issue for the data mining community in the past two years. Several approaches have been proposed to address the limited capabilities of the small devices that generate or receive data streams. These approaches aim to approximate the mining results with acceptable accuracy while remaining efficient in space and time. However, they are not resource-aware. This paper presents a thorough discussion of the state of the art in mining data streams, followed by a formalization of our Algorithm Output Granularity (AOG) approach to mining data streams. We then show how AOG can be incorporated into a generic ubiquitous data mining system architecture and discuss industrial applications of AOG-based mining techniques.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Mohamed Medhat Gaber (1)
  • Arkady Zaslavsky (1)
  • Shonali Krishnaswamy (1)
  1. School of Computer Science and Software Engineering, Monash University, Caulfield East, Australia
