Costs and Benefits of Load Sharing in the Computational Grid

  • Darin England
  • Jon B. Weissman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3277)

Abstract

We present an analysis of the costs and benefits of load sharing of parallel jobs in the computational grid. We begin with a workload generation model that captures the essential properties of parallel jobs and use it as input to a grid simulation model. Our experiments are performed for both homogeneous and heterogeneous grids. We measure average job slowdown with respect to both local and remote jobs. We show that, under some reasonable assumptions concerning the migration policy, load sharing is beneficial when the grid is homogeneous, but that it can adversely affect job slowdown for lightly loaded machines in a heterogeneous grid. With respect to the number of sites in a grid, we find that the benefits obtained by load sharing do not scale well: small to modest-size grids can employ load sharing as effectively as large-scale grids. We also present and evaluate an effective scheduling heuristic for migrating a job within the grid.

Keywords

Load Sharing, Queue Time, Workload Model, Heterogeneous Grid, Average Slowdown
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Darin England (1)
  • Jon B. Weissman (1)
  1. Department of Computer Science and Engineering, University of Minnesota, Twin Cities
