On Futuristic Query Processing in Data Streams

  • Charu C. Aggarwal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Recent advances in hardware technology have made it possible to collect and process large amounts of data. In many cases, the data is collected continuously over time; such continuous collections are referred to as data streams. One interesting problem in data stream mining is predictive query processing, which is useful for the many data mining applications that need to estimate the future behavior of a data stream. In this paper, we discuss the problem from the point of view of predictive summarization: we would like to store statistical characteristics of the data stream that are useful for estimating queries about the future behavior of the stream. The example used in this paper is selectivity estimation of range queries. For this purpose, we propose a technique that combines a local predictive approach with a careful choice of which statistical characteristics of the data to store and summarize. We use this summarization technique to estimate the future selectivity of range queries, though the results can be used to estimate a variety of futuristic queries. We test the approach on a variety of data sets and illustrate its effectiveness.
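
To make the idea of predictive summarization concrete, the following is a minimal sketch and not the technique proposed in the paper. It assumes a one-dimensional stream, keeps only the mean and variance of each recent window as the stored summary, extrapolates both with a least-squares line, and then estimates the future selectivity of a range query under a Gaussian assumption. The function names (window_stats, linear_extrapolate, predict_selectivity) and the choice of summary statistics are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist


def window_stats(values):
    """Return (mean, population variance) of one window of stream values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def linear_extrapolate(ys, steps_ahead):
    """Fit y = a + b*t by least squares over t = 0..len(ys)-1, then
    evaluate the fitted line steps_ahead windows past the last one.
    Requires at least two observations."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / denom
    return y_mean + slope * (n - 1 + steps_ahead - t_mean)


def predict_selectivity(windows, low, high, steps_ahead=1):
    """Estimate the fraction of future points falling in [low, high].

    Assumption (not from the paper): the future window is Gaussian with
    mean and variance obtained by linear extrapolation of past windows.
    """
    stats = [window_stats(w) for w in windows]
    mean = linear_extrapolate([m for m, _ in stats], steps_ahead)
    # Clamp the variance so a downward trend cannot make it non-positive.
    var = max(linear_extrapolate([v for _, v in stats], steps_ahead), 1e-12)
    dist = NormalDist(mean, math.sqrt(var))
    return dist.cdf(high) - dist.cdf(low)


# Three past windows whose mean drifts upward by 1 per window.
history = [[-1.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
estimate = predict_selectivity(history, low=2.0, high=4.0)
```

In this toy history the window means are 0, 1, and 2 with constant variance 1, so the sketch extrapolates a mean of 3 for the next window and reports the Gaussian mass within one standard deviation of it for the range [2, 4]. A real stream would call for richer summaries and a multidimensional model, which is the subject of the paper itself.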

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Charu C. Aggarwal
    IBM T.J. Watson Research Center, Hawthorne