Abstract
Distributed stream processing is necessary for a large class of stream-based applications. To exploit the full power of distributed computation, effective load distribution techniques must be developed to optimize system performance and cope with time-varying loads. When traditional load balancing or load sharing strategies are applied to such systems, we find that they either fall short of achieving good load distribution or fail to maintain a good task partition in the long run.
In this paper, we study two important issues of dynamic load distribution in the context of data-intensive stream processing. The first one is how to allocate processing resources for push-based tasks such that the average end-to-end data processing latency can be minimized. The second issue is how to maintain a good load distribution dynamically for long running continuous queries. We propose a new hybrid load distribution strategy that addresses the above concerns by load clustering. To achieve scalability, our algorithm is completely decentralized and asynchronous.
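The abstract does not describe the load-clustering algorithm itself. As a point of reference only, the sketch below shows a generic decentralized, asynchronous load-exchange step of the nearest-neighbor kind: a node wakes up, compares its load with one neighbor, and migrates a single operator if the imbalance exceeds a threshold. This is a swapped-in textbook technique, not the author's hybrid strategy. The names `Node`, `pairwise_balance`, `run_rounds` and the `threshold` parameter are hypothetical, and operator loads are assumed to be known scalar values.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical processing node hosting stream operators."""
    name: str
    # Assumed model: operator id -> scalar load (e.g. CPU share).
    operators: dict[str, float] = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.operators.values())


def pairwise_balance(a: Node, b: Node, threshold: float) -> str | None:
    """Move at most one operator from the heavier to the lighter node.

    Returns the id of the migrated operator, or None if no move is made.
    Only the two nodes involved are consulted, so no global state is needed.
    """
    heavy, light = (a, b) if a.load >= b.load else (b, a)
    gap = heavy.load - light.load
    if gap <= threshold:
        return None  # imbalance too small to justify migration
    # Moving load w changes the pair's gap from `gap` to |gap - 2w|;
    # choose the operator that minimizes the resulting gap.
    best = min(heavy.operators, key=lambda op: abs(gap - 2 * heavy.operators[op]))
    if abs(gap - 2 * heavy.operators[best]) >= gap:
        return None  # no single move improves the pair's balance
    light.operators[best] = heavy.operators.pop(best)
    return best


def run_rounds(nodes: list[Node], neighbors: dict[str, list[Node]],
               rounds: int, threshold: float, seed: int = 0) -> None:
    """Simulate asynchrony: in each round a random node contacts a random neighbor."""
    rng = random.Random(seed)
    for _ in range(rounds):
        node = rng.choice(nodes)
        peer = rng.choice(neighbors[node.name])
        pairwise_balance(node, peer, threshold)
```

Because each accepted move leaves both nodes' loads strictly between the pair's old loads, total load is conserved and the global max-min spread never grows; the threshold keeps small fluctuations from triggering repeated operator migration.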
© 2004 Springer-Verlag Berlin Heidelberg
Xing, Y. (2004). Load Distribution for Distributed Stream Processing. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds) Current Trends in Database Technology - EDBT 2004 Workshops. EDBT 2004. Lecture Notes in Computer Science, vol 3268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30192-9_11
DOI: https://doi.org/10.1007/978-3-540-30192-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23305-3
Online ISBN: 978-3-540-30192-9
eBook Packages: Computer Science, Computer Science (R0)