Costs and Benefits of Load Sharing in the Computational Grid
We present an analysis of the costs and benefits of load sharing of parallel jobs in the computational grid. We begin with a workload generation model that captures the essential properties of parallel jobs and use it as input to a grid simulation model. Our experiments cover both homogeneous and heterogeneous grids. We measure average job slowdown for both local and remote jobs. Under reasonable assumptions about the migration policy, load sharing proves beneficial when the grid is homogeneous, but it can adversely affect job slowdown for lightly loaded machines in a heterogeneous grid. The benefits of load sharing do not scale well with the number of sites in a grid: small to modest-size grids can employ load sharing as effectively as large-scale grids. We also present and evaluate an effective scheduling heuristic for migrating a job within the grid.
Keywords: Load Sharing, Queue Time, Workload Model, Heterogeneous Grid, Average Slowdown
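The abstract reports results in terms of average job slowdown but does not define the metric. As a minimal sketch, assuming the commonly used definition of slowdown as (queue time + run time) / run time, the average could be computed as below. The `Job` class, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative assumptions.
    wait_time: float  # seconds spent waiting in the queue
    run_time: float   # seconds spent executing (must be > 0)


def slowdown(job: Job) -> float:
    # Assumed definition: response time divided by run time.
    return (job.wait_time + job.run_time) / job.run_time


def average_slowdown(jobs: list[Job]) -> float:
    # Arithmetic mean of per-job slowdowns over a non-empty job list.
    return sum(slowdown(j) for j in jobs) / len(jobs)


# Illustrative values only: one job that starts immediately, one that queues.
sample_jobs = [Job(wait_time=0.0, run_time=100.0),
               Job(wait_time=300.0, run_time=100.0)]
print(average_slowdown(sample_jobs))
```

A slowdown of 1.0 means a job never waited, so under this assumed definition a migration policy helps a job only when the queue time saved outweighs any added cost at the remote site.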