Knowledge and Information Systems, Volume 12, Issue 3, pp 301–329

Answering ad hoc aggregate queries from data streams using prefix aggregate trees

  • Moonjung Cho
  • Jian Pei
  • Ke Wang
Regular Paper

Abstract

In some business applications, such as trading management in financial institutions, ad hoc aggregate queries over data streams must be answered accurately. Materializing and incrementally maintaining a full data cube, or even a compression or approximation of it, over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, those methods cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and for answering ad hoc aggregate queries. Often, a data stream can be partitioned into a historical segment, which is stored in a traditional data warehouse, and a transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although answering a query with a PAT costs more than answering it from a fully materialized data cube, the query answering time remains linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and scalability of our design.
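The abstract does not spell out the internal layout of a PAT, so the following is only a minimal sketch of the general idea of a prefix tree that carries aggregates, not the authors' actual structure. It assumes each stream tuple has a fixed number of dimension values plus one numeric measure, keeps a running SUM and COUNT at every node, and uses `None` as a wildcard in queries. The names `Node`, `PrefixAggregateSketch`, `insert`, and `query` are hypothetical. Under these assumptions, each tuple adds at most one node per dimension (so the tree grows linearly with the number of tuples), each tuple is read once, and a query visits each node at most once.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Sequence, Tuple


@dataclass
class Node:
    """One prefix of dimension values, with aggregates over all tuples sharing it."""
    total: float = 0.0
    count: int = 0
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


class PrefixAggregateSketch:
    """Illustrative prefix tree with per-node SUM and COUNT (hypothetical, not the paper's PAT)."""

    def __init__(self, num_dims: int) -> None:
        self.num_dims = num_dims
        self.root = Node()

    def insert(self, dims: Sequence[Hashable], measure: float) -> None:
        """Add one stream tuple in a single pass along its dimension path."""
        if len(dims) != self.num_dims:
            raise ValueError("tuple has the wrong number of dimensions")
        node = self.root
        node.total += measure
        node.count += 1
        for value in dims:
            # Reuse the existing child for a shared prefix, or create it.
            node = node.children.setdefault(value, Node())
            node.total += measure
            node.count += 1

    def query(self, pattern: Sequence[Optional[Hashable]]) -> Tuple[float, int]:
        """Return (SUM, COUNT) for tuples matching pattern; None matches any value."""
        if len(pattern) != self.num_dims:
            raise ValueError("pattern has the wrong number of dimensions")
        # Index of the last constrained dimension; below it, stored
        # prefix aggregates can be used without descending further.
        last = max((i for i, v in enumerate(pattern) if v is not None), default=-1)
        return self._walk(self.root, pattern, 0, last)

    def _walk(self, node: Node, pattern: Sequence[Optional[Hashable]],
              depth: int, last: int) -> Tuple[float, int]:
        if depth > last:
            return node.total, node.count
        wanted = pattern[depth]
        if wanted is None:
            targets = list(node.children.values())
        else:
            child = node.children.get(wanted)
            targets = [child] if child is not None else []
        total, count = 0.0, 0
        for child in targets:
            sub_total, sub_count = self._walk(child, pattern, depth + 1, last)
            total += sub_total
            count += sub_count
        return total, count


if __name__ == "__main__":
    # Made-up trading tuples: (exchange, symbol, side) with a traded amount.
    tree = PrefixAggregateSketch(num_dims=3)
    tree.insert(("NYSE", "IBM", "buy"), 100.0)
    tree.insert(("NYSE", "IBM", "sell"), 40.0)
    tree.insert(("NASDAQ", "MSFT", "buy"), 25.0)
    print(tree.query(("NYSE", None, None)))
    print(tree.query((None, None, "buy")))
```

In this sketch, a query whose constraints fall on the leading dimensions is answered from a stored prefix aggregate, while a query that constrains a late dimension has to visit more of the tree. That mirrors the trade-off stated in the abstract: query answering costs more than a lookup in a fully materialized cube, but stays bounded by the size of the stored segment.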

Keywords

Data warehousing; Data cube; Data stream; Online analytic processing (OLAP); Aggregate query

Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  1. Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA
  2. School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, Canada
