Incremental Aggregation on Multiple Continuous Queries

  • Chun Jin
  • Jaime Carbonell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4203)


Continuously monitoring large-scale aggregates over data streams is important for many stream processing applications, e.g. collaborative intelligence analysis, and presents new challenges to data management systems. The first challenge is to efficiently compute the updated aggregate values and deliver the new results to users as new tuples arrive. We implemented an incremental aggregation mechanism that does this for arbitrary algebraic aggregate functions, including user-defined ones, by maintaining up-to-date finite data summaries. The second challenge is to construct shared query evaluation plans that support large-scale query workloads effectively. Since multiple query optimization is NP-complete and queries generally arrive asynchronously, we apply an incremental sharing approach to obtain shared plans that perform reasonably well. The system is built as part of ARGUS, a stream processing system built atop a DBMS. The evaluation study shows that our approaches are effective and efficient on typical collaborative intelligence analysis data and queries.
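To make the idea of "finite data summaries" for algebraic aggregates concrete, here is a minimal sketch, not the ARGUS implementation. It assumes that AVG, VARIANCE and STDDEV are maintained from the three distributive sub-aggregates (count, sum, sum of squares), kept in one hash-map entry per group, and that only insertions occur (no deletions or windows). The class names `MomentSummary` and `IncrementalGroupBy`, the `rollup` method, and the money-transfer columns `src`, `dst` and `amount` are hypothetical. `rollup` illustrates one possible way queries with coarser GROUP BY columns could reuse finer summaries; the abstract does not say this is how the paper shares plans.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MomentSummary:
    """Hypothetical finite summary for one group: (count, sum, sum of squares).

    AVG, VARIANCE and STDDEV are algebraic: each is a fixed function of these
    three distributive values, so the summary size does not grow with the stream.
    """
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x: float) -> None:
        # Fold one newly arrived value into the summary in O(1).
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other: MomentSummary) -> None:
        # Combine two summaries; valid because every component is distributive.
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq

    def avg(self) -> float:
        return self.total / self.count

    def variance(self) -> float:
        # Population variance computed as E[x^2] - (E[x])^2.
        mean = self.avg()
        return self.total_sq / self.count - mean * mean

    def stddev(self) -> float:
        # Clamp at zero to absorb tiny negative values from rounding.
        return math.sqrt(max(self.variance(), 0.0))


class IncrementalGroupBy:
    """Hypothetical incremental GROUP BY: one summary per group key."""

    def __init__(self, key_cols: tuple[str, ...], value_col: str):
        self.key_cols = key_cols
        self.value_col = value_col
        self.groups: dict[tuple, MomentSummary] = {}

    def insert(self, row: dict) -> tuple:
        # Only the group touched by the new tuple is updated; no rescan.
        key = tuple(row[c] for c in self.key_cols)
        self.groups.setdefault(key, MomentSummary()).add(row[self.value_col])
        return key

    def rollup(self, keep: tuple[str, ...]) -> dict[tuple, MomentSummary]:
        # Derive a coarser grouping (a subset of key_cols) by merging the
        # finer summaries instead of reprocessing raw tuples.
        idx = [self.key_cols.index(c) for c in keep]
        out: dict[tuple, MomentSummary] = {}
        for key, summary in self.groups.items():
            coarse = tuple(key[i] for i in idx)
            out.setdefault(coarse, MomentSummary()).merge(summary)
        return out


# Usage with hypothetical money-transfer tuples.
agg = IncrementalGroupBy(("src", "dst"), "amount")
for r in [{"src": "A", "dst": "X", "amount": 10.0},
          {"src": "A", "dst": "Y", "amount": 30.0},
          {"src": "B", "dst": "X", "amount": 5.0}]:
    agg.insert(r)
by_src = agg.rollup(("src",))
print(by_src[("A",)].avg())     # 20.0
print(by_src[("A",)].stddev())  # 10.0
```

The design choice is that each arriving tuple costs one hash lookup and a constant amount of arithmetic, and any algebraic function expressible over the stored components (including a user-defined one) can be answered from the summary alone. Holistic aggregates such as the exact median do not have a bounded summary of this kind and fall outside this sketch.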


Keywords: Execution Time, Data Stream, Algebraic Function, Query Plan, Continuous Query





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chun Jin (1)
  • Jaime Carbonell (1)
  1. Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
