The VLDB Journal

, Volume 19, Issue 4, pp 557–583

Streaming multiple aggregations using phantoms


    • University of Melbourne
  • Nick Koudas
    • University of Toronto
  • Beng Chin Ooi
    • National University of Singapore
  • Divesh Srivastava
    • AT&T Labs–Research
  • Pu Zhou
    • University of Melbourne
Regular Paper

DOI: 10.1007/s00778-010-0180-z

Cite this article as:
Zhang, R., Koudas, N., Ooi, B.C. et al. The VLDB Journal (2010) 19: 557. doi:10.1007/s00778-010-0180-z


Data streams characterize the high speed and large volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling one—the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks such as traffic analysis and statistics collection, aggregation is a primitive operation. Various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams based on the two-level query processing architecture of GS, a real data stream management system deployed in AT & T. We discern that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and determine allocation of resources (particularly, space) to compute the aggregations. Experiments show that our algorithm achieves near-optimal computation costs, which outperforms the best adapted algorithm by more than an order of magnitude.


Data streamAggregationMultiple-query optimizationPhantomGS

Copyright information

© Springer-Verlag 2010