The VLDB Journal, Volume 19, Issue 4, pp 557–583

Streaming multiple aggregations using phantoms

  • Rui Zhang
  • Nick Koudas
  • Beng Chin Ooi
  • Divesh Srivastava
  • Pu Zhou
Regular Paper

Abstract

Data streams characterize the high-speed, large-volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling: the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks such as traffic analysis and statistics collection, aggregation is a primitive operation. Various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams, based on the two-level query processing architecture of GS, a real data stream management system deployed in AT&T. We discern that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and to determine the allocation of resources (particularly, space) to compute the aggregations. Experiments show that our algorithm achieves near-optimal computation costs and outperforms the best adapted algorithm by more than an order of magnitude.
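The idea of sharing computation through a phantom can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `direct` and `with_phantom`, the attributes `A`, `B`, `C`, the synthetic stream, and the use of "number of table updates" as the cost are all assumptions made for illustration. The sketch also uses unbounded tables, so it ignores the space allocation problem the abstract refers to.

```python
from collections import Counter

def direct(stream, queries):
    """Keep one count table per query; every tuple updates every table."""
    tables = {q: Counter() for q in queries}
    probes = 0  # hypothetical cost measure: number of table updates
    for rec in stream:
        for q in queries:
            tables[q][tuple(rec[a] for a in q)] += 1
            probes += 1
    return tables, probes

def with_phantom(stream, queries, phantom):
    """Update only the finer-granularity phantom per tuple, then
    roll its groups up into each query once the stream has been read."""
    fine = Counter()
    probes = 0
    for rec in stream:
        fine[tuple(rec[a] for a in phantom)] += 1
        probes += 1
    tables = {q: Counter() for q in queries}
    for key, count in fine.items():
        row = dict(zip(phantom, key))
        for q in queries:
            tables[q][tuple(row[a] for a in q)] += count
            probes += 1
    return tables, probes

# Synthetic stream with few distinct (A, B, C) groups (an assumption).
stream = [{"A": i % 2, "B": i % 3, "C": i % 2} for i in range(600)]
queries = [("A", "B"), ("B", "C"), ("A", "C")]

t1, p1 = direct(stream, queries)
t2, p2 = with_phantom(stream, queries, ("A", "B", "C"))
assert t1 == t2  # both strategies produce the same aggregates
print(p1, p2)
```

Under these assumptions the phantom pays off because the stream has far fewer distinct `(A, B, C)` groups than tuples, so the roll-up step touches the query tables only a handful of times. With many distinct groups, or with limited space, the trade-off changes, which is why choosing the phantoms and their space allocation is an optimization problem.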

Keywords

Data stream, Aggregation, Multiple-query optimization, Phantom, GS

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Rui Zhang (1)
  • Nick Koudas (2)
  • Beng Chin Ooi (3)
  • Divesh Srivastava (4)
  • Pu Zhou (1)

  1. University of Melbourne, Parkville, Australia
  2. University of Toronto, Toronto, Canada
  3. National University of Singapore, Singapore, Singapore
  4. AT&T Labs–Research, Middletown, USA
