Parallel Job Scheduling under Dynamic Workloads

  • Eitan Frachtenberg
  • Dror G. Feitelson
  • Juan Fernandez
  • Fabrizio Petrini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2862)


Jobs running on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the multiprogramming level and the scheduling time quantum, so a careful evaluation is required to find parameter values that lead to optimal performance. We perform a detailed performance evaluation of three factors affecting scheduling systems running dynamic workloads: multiprogramming level, time quantum, and the use of backfilling for queue management. We also examine how these factors depend on offered load. Our evaluation is based on synthetic MPI applications running on a real cluster that implements the various scheduling schemes. Our results demonstrate the importance of both components of the combination of gang scheduling and backfilling: gang scheduling reduces response time and slowdown, and backfilling allows doing so with a limited multiprogramming level. Performance improves further when flexible coscheduling is used instead of strict gang scheduling, as this relaxes the scheduling constraints and allows for a denser packing.
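As a rough illustration of how the multiprogramming level limits what gang scheduling can pack, the sketch below places jobs into a scheduling matrix whose rows are time slots and whose columns are processors. It is not the scheduler evaluated in the paper: the function names, the first-fit policy, and the example job sizes are all assumptions made for illustration. The `slowdown` helper uses the common definition of response time divided by run time.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[Optional[str]]]


def place_jobs(job_sizes: Dict[str, int], num_procs: int, mpl: int) -> Tuple[Matrix, List[str]]:
    """First-fit placement of jobs into a gang-scheduling matrix (hypothetical sketch).

    Rows are time slots (at most `mpl` of them), columns are processors.
    Returns (matrix, queued): matrix[r][c] holds a job name or None,
    and `queued` lists jobs that found no row with enough free processors.
    """
    matrix: Matrix = [[None] * num_procs for _ in range(mpl)]
    queued: List[str] = []
    for name, size in job_sizes.items():
        for row in matrix:
            free = [c for c, owner in enumerate(row) if owner is None]
            if len(free) >= size:
                # All processes of the job share one row, so they run together.
                for c in free[:size]:
                    row[c] = name
                break
        else:
            # No time slot can hold the whole job: it waits in the queue.
            queued.append(name)
    return matrix, queued


def slowdown(wait_time: float, run_time: float) -> float:
    """Slowdown = response time / run time, with response = wait + run."""
    return (wait_time + run_time) / run_time


if __name__ == "__main__":
    jobs = {"A": 4, "B": 2, "C": 3, "D": 2}  # assumed job sizes (processors)
    for level in (2, 3):
        matrix, queued = place_jobs(jobs, num_procs=4, mpl=level)
        print(f"MPL={level}: queued={queued}")
        for row in matrix:
            print("  ", row)
```

With these assumed job sizes on four processors, a multiprogramming level of 2 leaves job C queued, while a level of 3 accommodates all four jobs. A queue policy such as backfilling would instead choose which waiting jobs to try first, which is how a limited multiprogramming level can still be used well.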


Cluster computing, dynamic workloads, job scheduling, gang scheduling, parallel architectures, heterogeneous clusters, STORM, flexible coscheduling





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Eitan Frachtenberg (1, 2)
  • Dror G. Feitelson (2)
  • Juan Fernandez (1)
  • Fabrizio Petrini (1)
  1. CCS-3 Modeling, Algorithms, and Informatics Group, Computer and Computational Sciences (CCS) Division, Los Alamos National Laboratory (LANL)
  2. School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
