Abstract
Data stream processing has become increasingly important as many emerging applications, such as stock trading surveillance, network traffic monitoring, and sensor data analysis, call for sophisticated real-time processing over data streams. Stream joins, which can be used to detect linkages and correlations between different data streams, are among the most important stream processing operations. One major challenge in processing stream joins is handling continuous, high-volume, and time-varying data streams under resource constraints. In this paper, we present a novel load diffusion system that enables scalable execution of resource-intensive stream joins using an ensemble of server hosts. Load diffusion is achieved by a simple correlation-aware stream partition algorithm. Unlike previous work, the load diffusion system can (1) achieve fine-grained load sharing in the distributed stream processing system; and (2) produce exact query answers without missing any join results or generating duplicate join results. Our experimental results show that the load diffusion scheme can greatly improve system throughput and achieve a more balanced load distribution.
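To make the exactness property concrete, the following is a minimal sketch of a windowed equi-join spread over several hosts. It is not the correlation-aware partition algorithm of the paper, which the abstract does not detail. It assumes plain partitioning on the join key, in-order arrival, and unique integer timestamps; the names `WindowJoinHost`, `run_join`, and `make_tuples` are hypothetical. Because tuples with equal keys always meet on the same host, the distributed run should yield the same result set as a single host, with no missing and no duplicate pairs.

```python
import random
from collections import deque


class WindowJoinHost:
    """One server host running a symmetric sliding-window equi-join."""

    def __init__(self, window):
        self.window = window
        self.buffers = {"A": deque(), "B": deque()}  # buffered (ts, key) per stream
        self.processed = 0  # number of tuples routed to this host

    def process(self, stream, ts, key):
        # Expire tuples that have fallen out of the sliding window.
        for buf in self.buffers.values():
            while buf and buf[0][0] < ts - self.window:
                buf.popleft()
        self.processed += 1
        # Probe the opposite stream: same key within the window is a match.
        other = "B" if stream == "A" else "A"
        matches = [
            (ts, o_ts, key) if stream == "A" else (o_ts, ts, key)
            for o_ts, o_key in self.buffers[other]
            if o_key == key
        ]
        self.buffers[stream].append((ts, key))
        return matches


def run_join(tuples, num_hosts, window):
    """Route each tuple to one host by join key (assumption: integer keys)."""
    hosts = [WindowJoinHost(window) for _ in range(num_hosts)]
    results = []
    for stream, ts, key in tuples:
        host = hosts[key % num_hosts]
        results.extend(host.process(stream, ts, key))
    return results, [h.processed for h in hosts]


def make_tuples(n, num_keys, seed=0):
    """Synthetic input: one tuple per timestamp, random stream and key."""
    rng = random.Random(seed)
    return [(rng.choice("AB"), ts, rng.randrange(num_keys)) for ts in range(n)]


tuples = make_tuples(n=200, num_keys=8)
single, _ = run_join(tuples, num_hosts=1, window=10)
spread, load = run_join(tuples, num_hosts=4, window=10)
print(sorted(single) == sorted(spread))  # same join results on 1 and 4 hosts
print(load)  # tuples processed per host
```

Key-based partitioning of this kind shares load only at the granularity of whole keys, so a skewed key distribution can still overload one host; the abstract's claim of fine-grained load sharing refers to the paper's own partition algorithm, not to this sketch.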
Copyright information
© 2005 IFIP International Federation for Information Processing
About this paper
Cite this paper
Gu, X., Yu, P.S. (2005). Adaptive Load Diffusion for Stream Joins. In: Alonso, G. (ed.) Middleware 2005. Lecture Notes in Computer Science, vol 3790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587552_22
DOI: https://doi.org/10.1007/11587552_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30323-7
Online ISBN: 978-3-540-32269-6
eBook Packages: Computer Science, Computer Science (R0)