Scheduling of Parallel Jobs in a Heterogeneous Multi-site Environment

  • Gerald Sabin
  • Rajkumar Kettimuthu
  • Arun Rajan
  • Ponnuswamy Sadayappan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2862)

Abstract

Most previous research on job scheduling for heterogeneous systems considers a scenario in which each job or task is mapped to a single processor, while research on parallel job scheduling has concentrated primarily on the homogeneous context. In this paper, we address the scheduling of parallel jobs in a heterogeneous multi-site environment, where each site has a homogeneous cluster of processors but processors at different sites have different speeds. Starting with a simple greedy scheduling strategy, we propose several enhancements and evaluate them using trace-driven simulations. We consider the use of multiple simultaneous reservations at different sites and the use of relative job efficacy as a queuing priority, and we compare conservative and aggressive backfilling. We find that, unlike in the single-site case, conservative backfilling is consistently superior to aggressive backfilling in the heterogeneous multi-site environment.
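A minimal sketch of what a greedy multi-site placement could look like, under assumptions made here purely for illustration and not taken from the paper: a job's runtime scales inversely with the speed of the site it runs on, jobs are taken in queue order and placed at whichever site would complete them earliest, and no backfilling is done (a job starts only once enough processors are free, without filling earlier holes). The names `Site`, `Job`, and `greedy_schedule` are hypothetical; relative efficacy, multiple simultaneous reservations, and backfilling are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A homogeneous cluster of `procs` identical processors.

    Assumption: `speed` is relative to a reference speed of 1.0, so a
    speed-2.0 site runs any job in half its reference runtime.
    """
    name: str
    procs: int
    speed: float
    free_at: list = field(default_factory=list)  # next-free time per processor

    def __post_init__(self):
        if not self.free_at:
            self.free_at = [0.0] * self.procs

    def runtime(self, base_runtime):
        # Assumed model: runtime scales inversely with processor speed.
        return base_runtime / self.speed

    def earliest_start(self, width, submit):
        # Start once `width` processors are free; no backfilling into holes.
        return max(submit, sorted(self.free_at)[width - 1])

    def commit(self, width, start, runtime):
        # Occupy the `width` earliest-free processors until start + runtime.
        order = sorted(range(self.procs), key=lambda i: self.free_at[i])
        for i in order[:width]:
            self.free_at[i] = start + runtime


@dataclass
class Job:
    name: str
    width: int           # number of processors requested
    base_runtime: float  # runtime on a speed-1.0 site
    submit: float = 0.0


def greedy_schedule(jobs, sites):
    """Place jobs in queue order at the site giving the earliest completion."""
    placements = []
    for job in jobs:
        best = None  # (site, start, finish)
        for site in sites:
            if job.width > site.procs:
                continue  # the job does not fit on this cluster at all
            start = site.earliest_start(job.width, job.submit)
            finish = start + site.runtime(job.base_runtime)
            if best is None or finish < best[2]:
                best = (site, start, finish)
        if best is None:
            raise ValueError(f"job {job.name} fits on no site")
        site, start, finish = best
        site.commit(job.width, start, finish - start)
        placements.append((job.name, site.name, start, finish))
    return placements


if __name__ == "__main__":
    sites = [Site("slow", 16, 1.0), Site("fast", 8, 2.0)]
    jobs = [Job("A", 8, 100.0), Job("B", 8, 120.0), Job("C", 16, 40.0)]
    for row in greedy_schedule(jobs, sites):
        print(row)
    # ('A', 'fast', 0.0, 50.0)
    # ('B', 'fast', 50.0, 110.0)
    # ('C', 'slow', 0.0, 40.0)
```

In this toy run, job B waits behind A at the fast site (finishing at 110) rather than starting immediately on the idle slow site (where it would finish at 120). A heterogeneous scheduler has to weigh this kind of trade-off between queue wait and execution speed, which has no counterpart when all sites are identical.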

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Gerald Sabin (1)
  • Rajkumar Kettimuthu (1)
  • Arun Rajan (1)
  • Ponnuswamy Sadayappan (1)

  (1) The Ohio State University, Columbus, USA