A Disk-Based, Adaptive Approach to Memory-Limited Computation of Windowed Stream Joins

  • Abhirup Chakraborty
  • Ajit Singh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6261)

Abstract

We consider the problem of computing exact results for sliding window joins over data streams with limited memory. Existing approaches either (a) deal with memory limitations by shedding load, and therefore cannot provide exact or even highly accurate results for sliding window joins over data streams with time-varying arrival rates, or (b) suffer from large I/O overhead due to random disk flushes and disk-to-disk stages within a stream join, which makes them inefficient for sliding window joins. We present an Adaptive, Hash-partitioned Exact Window Join (AH-EWJ) algorithm that incorporates disk storage as an archive. The algorithm periodically spills window data to disk, refines the output by retrieving the disk-resident data, and maximizes the output rate by managing the memory blocks and continuously adjusting the memory allocated to the stream windows. The problem of managing the window blocks in memory, which is similar in nature to caching, captures both the temporal and the frequency-related properties of the stream arrivals. The algorithm adapts memory allocation at both the window level and the partition level. We provide experimental results demonstrating the performance and effectiveness of the proposed algorithm.
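
To make the setting concrete, the following is a minimal sketch of a symmetric, hash-partitioned sliding-window equi-join that keeps everything in memory. It is not the AH-EWJ algorithm: the disk archive, the periodic spilling, and the adaptive memory allocation described in the abstract are all omitted. The class name `WindowedHashJoin`, its parameters, the stream labels `"R"` and `"S"`, the time-based window, and the assumption that timestamps arrive in non-decreasing order are illustrative choices, not taken from the paper.

```python
from collections import deque


class WindowedHashJoin:
    """Symmetric, hash-partitioned sliding-window equi-join (in-memory sketch).

    Hypothetical illustration only: no disk spilling, no adaptive memory.
    """

    def __init__(self, window, num_partitions=4):
        self.window = window  # window length in time units (assumed)
        self.num_partitions = num_partitions
        # For each stream, one deque per hash partition; each deque holds
        # (timestamp, key, value) tuples in arrival order.
        self.parts = {
            side: [deque() for _ in range(num_partitions)]
            for side in ("R", "S")
        }

    def _expire(self, part, now):
        # Drop tuples that have slid out of the window (ts <= now - window).
        while part and part[0][0] <= now - self.window:
            part.popleft()

    def insert(self, side, ts, key, value):
        """Add a tuple to stream `side`; return the join results it produces.

        Results are (R value, S value) pairs. Timestamps are assumed to be
        non-decreasing across calls.
        """
        other = "S" if side == "R" else "R"
        p = hash(key) % self.num_partitions
        own, opposite = self.parts[side][p], self.parts[other][p]
        self._expire(own, ts)
        self._expire(opposite, ts)
        # Probe only the matching partition of the opposite window.
        matches = [
            (value, v) if side == "R" else (v, value)
            for (_, k, v) in opposite
            if k == key
        ]
        own.append((ts, key, value))
        return matches
```

Calling `insert("R", 1, "a", "r1")` followed by `insert("S", 2, "a", "s1")` on an instance with `window=10` yields one result pair from the second call. In this sketch memory grows with the window size and the arrival rate, which is the limitation that motivates archiving part of the window on disk.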

Keywords

Arrival Rate, Data Stream, Continuous Query, Part Partition, Stream Arrival
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Abhirup Chakraborty (1)
  • Ajit Singh (1)
  1. Dept. of Electrical and Computer Engineering, University of Waterloo, Canada