Data Streams pp 209–236

A Survey of Join Processing in Data Streams

  • Chapter

Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

Given the fundamental role played by joins in querying relational databases, it is not surprising that stream join has also been the focus of much research on streams. Recall that the relational (theta) join between two non-streaming relations $R_1$ and $R_2$, denoted $R_1 \bowtie_\theta R_2$, returns the set of all pairs $(r_1, r_2)$, where $r_1 \in R_1$, $r_2 \in R_2$, and the join condition $\theta(r_1, r_2)$ evaluates to true. A straightforward extension of join to streams gives the following semantics (in rough terms): at any time $t$, the set of output tuples generated thus far by the join between two streams $S_1$ and $S_2$ should be the same as the result of the relational (non-streaming) join between the sets of input tuples that have arrived thus far in $S_1$ and $S_2$.

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Xie, J., Yang, J. (2007). A Survey of Join Processing in Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_10

  • DOI: https://doi.org/10.1007/978-0-387-47534-9_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)