Joining Punctuated Streams

  • Luping Ding
  • Nishant Mehta
  • Elke A. Rundensteiner
  • George T. Heineman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

We focus on optimizing stream joins by exploiting constraints that are dynamically embedded into data streams to signal that no further tuples with certain attribute values will be transmitted. These constraints are called punctuations. Our stream join operator, PJoin, uses punctuations to remove no-longer-useful data from its state in a timely manner, thus reducing memory overhead and improving the efficiency of probing. We equip PJoin with several alternative strategies for purging the state and for propagating punctuations to benefit downstream operators. We also present an extensive experimental study that explores the performance gains achieved by purging state as well as the trade-offs between different purge strategies. Our experimental comparison of PJoin with XJoin, a stream join operator without a constraint-exploiting mechanism, shows that PJoin significantly outperforms XJoin with regard to both memory overhead and throughput.
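To make the idea of punctuation-driven purging concrete, here is a minimal sketch in Python. It is not the PJoin implementation: the class name PunctuatedJoin, its methods, the single-key equi-join, and the eager purge-on-arrival policy are all assumptions chosen for illustration. It shows only the core mechanism the abstract describes: when one stream punctuates a join value, the tuples held for that value on the opposite side can never match again and may be dropped, and once both streams have punctuated the value, a punctuation could be propagated downstream.

```python
from collections import defaultdict


class PunctuatedJoin:
    """Toy symmetric hash join on one key with punctuation-driven purging.

    Hypothetical illustration only; not the operator from the paper.
    """

    def __init__(self):
        # state[side] maps join key -> payloads seen so far on that side.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}
        # closed[side] holds keys that side has promised never to send again.
        self.closed = {"A": set(), "B": set()}

    @staticmethod
    def _other(side):
        return "B" if side == "A" else "A"

    def on_tuple(self, side, key, payload):
        """Probe the opposite state, then keep the tuple if it can still match."""
        other = self._other(side)
        matches = self.state[other].get(key, [])
        if side == "A":
            results = [(key, payload, p) for p in matches]
        else:
            results = [(key, p, payload) for p in matches]
        # Store the tuple only if the opposite stream may still deliver this key.
        if key not in self.closed[other]:
            self.state[side][key].append(payload)
        return results

    def on_punctuation(self, side, key):
        """`side` signals that no more tuples with `key` will arrive on it.

        Returns the number of tuples purged from the opposite side's state.
        """
        self.closed[side].add(key)
        purged = self.state[self._other(side)].pop(key, [])
        return len(purged)

    def can_propagate(self, key):
        """True once neither input can produce further results for `key`."""
        return key in self.closed["A"] and key in self.closed["B"]

    def state_size(self):
        return sum(len(v) for s in self.state.values() for v in s.values())


if __name__ == "__main__":
    join = PunctuatedJoin()
    join.on_tuple("A", 1, "a1")
    print(join.on_tuple("B", 1, "b1"))   # joins with the stored A tuple
    print(join.on_punctuation("B", 1))   # A-side tuples for key 1 are purged
    print(join.state_size())
```

This sketch purges eagerly, as soon as each punctuation arrives. The abstract mentions several alternative purge strategies; a batched variant would, for example, collect punctuations and purge only after a threshold is reached, trading memory for fewer purge passes.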

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Luping Ding (1)
  • Nishant Mehta (1)
  • Elke A. Rundensteiner (1)
  • George T. Heineman (1)
  1. Department of Computer Science, Worcester Polytechnic Institute, Worcester
