Abstract
Given the fundamental role played by joins in querying relational databases, it is not surprising that stream join has also been the focus of much research on streams. Recall that the relational (theta) join between two non-streaming relations R1 and R2, denoted R1 ⋈θ R2, returns the set of all pairs (r1, r2), where r1 ∈ R1, r2 ∈ R2, and the join condition θ(r1, r2) evaluates to true. A straightforward extension of join to streams gives the following semantics (in rough terms): at any time t, the set of output tuples generated thus far by the join between two streams S1 and S2 should be the same as the result of the relational (non-streaming) join between the sets of input tuples that have arrived thus far in S1 and S2.
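The semantics stated above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the names `StreamJoin`, `insert`, and `relational_join` are hypothetical, tuples are assumed to be distinct within each stream, and all arrived tuples are kept in memory (no windows or state purging), which a practical system could not afford on unbounded streams.

```python
from itertools import product


def relational_join(r1, r2, theta):
    """Non-streaming theta join: all pairs (a, b) with theta(a, b) true."""
    return {(a, b) for a, b in product(r1, r2) if theta(a, b)}


class StreamJoin:
    """Illustrative incremental join of two streams (hypothetical helper).

    Assumption: every arrived tuple is retained, so state grows without bound.
    """

    def __init__(self, theta):
        self.theta = theta
        self.seen = {1: [], 2: []}   # tuples arrived so far on S1 and S2
        self.output = []             # join results emitted so far

    def insert(self, stream_id, t):
        """Process one arrival on stream 1 or 2; return the new results."""
        if stream_id == 1:
            # New S1 tuple: probe everything that has arrived on S2.
            new = [(t, u) for u in self.seen[2] if self.theta(t, u)]
        else:
            # New S2 tuple: probe everything that has arrived on S1.
            new = [(u, t) for u in self.seen[1] if self.theta(u, t)]
        self.seen[stream_id].append(t)
        self.output.extend(new)
        return new


def equal(a, b):
    """Example join condition: simple equality."""
    return a == b


# Interleaved arrivals as (stream id, tuple) pairs.
arrivals = [(1, 3), (2, 3), (2, 5), (1, 5), (1, 4)]
join = StreamJoin(equal)
for sid, t in arrivals:
    join.insert(sid, t)
    # Invariant: output so far equals the relational join of the prefixes.
    assert set(join.output) == relational_join(join.seen[1], join.seen[2], equal)
```

The assertion inside the loop checks the stated property after every arrival: the results emitted so far coincide with the non-streaming join over the tuples received so far on each stream.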
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Xie, J., Yang, J. (2007). A Survey of Join Processing in Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_10
DOI: https://doi.org/10.1007/978-0-387-47534-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)