A Survey of Join Processing in Data Streams

Xie, Junyi; Yang, Jun

doi:10.1007/978-0-387-47534-9_10

Junyi Xie & Jun Yang

Chapter

Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

Given the fundamental role played by joins in querying relational databases, it is not surprising that stream join has also been the focus of much research on streams. Recall that the relational (theta) join between two non-streaming relations R₁ and R₂, denoted R₁ ⋈_θ R₂, returns the set of all pairs (r₁, r₂), where r₁ ∈ R₁, r₂ ∈ R₂, and the join condition θ(r₁, r₂) evaluates to true. A straightforward extension of join to streams gives the following semantics (in rough terms): at any time t, the set of output tuples generated thus far by the join between two streams S₁ and S₂ should be the same as the result of the relational (non-streaming) join between the sets of input tuples that have arrived thus far in S₁ and S₂.
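The semantics stated in the abstract can be illustrated with a minimal Python sketch. It is not an algorithm taken from the chapter. It assumes set semantics, an equality condition on the first field as θ, and that every arrived tuple is kept in memory. The names `StreamJoin`, `arrive`, `same_key`, and `run_demo` are hypothetical. After each arrival, the sketch checks that the output produced so far equals the relational join of the inputs seen so far.

```python
from itertools import product


def relational_join(r1, r2, theta):
    """Non-streaming theta join: the set of pairs (a, b) satisfying theta."""
    return {(a, b) for a, b in product(r1, r2) if theta(a, b)}


class StreamJoin:
    """Incremental join that keeps every tuple seen so far (assumption:
    memory is unbounded, so no windowing or state eviction is modeled)."""

    def __init__(self, theta):
        self.theta = theta
        self.seen = ([], [])   # tuples that have arrived in S1 and S2
        self.output = set()    # output tuples generated thus far

    def arrive(self, side, tup):
        """Handle one arrival on S1 (side=0) or S2 (side=1).

        Returns the list of result pairs produced by this arrival.
        """
        new_results = []
        # Probe the tuples already stored for the opposite stream.
        for other in self.seen[1 - side]:
            pair = (tup, other) if side == 0 else (other, tup)
            if self.theta(*pair):
                new_results.append(pair)
        self.seen[side].append(tup)
        self.output.update(new_results)
        return new_results


def same_key(a, b):
    """Hypothetical join condition: equality on the first field."""
    return a[0] == b[0]


def run_demo():
    join = StreamJoin(same_key)
    arrivals = [
        (0, ("k1", "a")),
        (1, ("k1", "x")),
        (1, ("k2", "y")),
        (0, ("k2", "b")),
        (0, ("k1", "c")),
    ]
    for side, tup in arrivals:
        join.arrive(side, tup)
        # Invariant from the abstract: output so far equals the
        # relational join of the input prefixes seen so far.
        assert join.output == relational_join(
            join.seen[0], join.seen[1], same_key
        )
    return join.output


if __name__ == "__main__":
    for pair in sorted(run_demo()):
        print(pair)
```

Because the sketch stores every arrived tuple, its state grows without bound on infinite streams, so it serves only to make the semantics concrete.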

Author information

Authors and Affiliations

Department of Computer Science, Duke University, USA
Junyi Xie & Jun Yang

Editor information

Editors and Affiliations

IBM, Thomas J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532
Charu C. Aggarwal

About this chapter

Cite this chapter

Xie, J., Yang, J. (2007). A Survey of Join Processing in Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_10

DOI: https://doi.org/10.1007/978-0-387-47534-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)
