The Impact of More Accurate Requested Runtimes on Production Job Scheduling Performance

  • Su-Hui Chiang
  • Andrea Arpaci-Dusseau
  • Mary K. Vernon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2537)

Abstract

The question of whether more accurate requested runtimes can significantly improve production parallel system performance has previously been studied for the FCFS-backfill scheduler, using a limited set of system performance measures. This paper examines the question for higher-performance backfill policies, heavier system loads such as those observed in current leading-edge production systems like the large Origin 2000 system at NCSA, and a broader range of system performance measures. The new results show that more accurate requested runtimes can improve system performance much more significantly than previous results suggested. For example, average slowdown decreases by a factor of two to six, depending on system load and the fraction of jobs that have the more accurate requests. The new results also show that (a) nearly all of the performance improvement is realized even if the more accurate runtime requests are a factor of two higher than the actual runtimes, (b) most of the performance improvement is achieved when test runs are used to obtain more accurate runtime requests, and (c) in systems where only a fraction (e.g., 60%) of the jobs provide approximately accurate runtime requests, the users who provide the approximately accurate requests achieve even greater improvements in performance, such as an order of magnitude improvement in average slowdown for jobs with runtimes of up to fifty hours.
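
For readers unfamiliar with the metric, the following is a minimal sketch of how average slowdown is commonly computed, assuming the usual definition of slowdown as turnaround time (queue wait plus actual runtime) divided by actual runtime. The abstract does not spell out the definition, and the Job class, function names, and job values below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    wait_hours: float     # time spent queued before the job starts
    runtime_hours: float  # actual execution time (not the requested runtime)


def slowdown(job: Job) -> float:
    """Turnaround time divided by actual runtime; 1.0 means no queueing delay."""
    return (job.wait_hours + job.runtime_hours) / job.runtime_hours


def average_slowdown(jobs: list[Job]) -> float:
    """Mean of the per-job slowdowns over a non-empty list of jobs."""
    return sum(slowdown(j) for j in jobs) / len(jobs)


# Hypothetical jobs, invented for illustration; not data from the paper.
jobs = [Job(1.0, 1.0), Job(3.0, 1.0), Job(0.0, 5.0)]
print(average_slowdown(jobs))
```

Under this definition, short jobs that wait a long time dominate the average, which is one reason the metric is sensitive to how well a backfill scheduler can fit jobs into idle processors.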

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Su-Hui Chiang
    • 1
  • Andrea Arpaci-Dusseau
    • 1
  • Mary K. Vernon
    • 1
  1. Computer Sciences Department, University of Wisconsin, Madison
