
Parallel Job Scheduling — A Status Report

  • Dror G. Feitelson
  • Larry Rudolph
  • Uwe Schwiegelshohn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3277)

Abstract

The popularity of research on the scheduling of parallel jobs demands a periodic review of the status of the field. Indeed, several surveys have been written on this topic in the context of parallel supercomputers [17, 20]. The purpose of the present paper is to update that material, and to extend it to include work concerning clusters and the grid.

Keywords

Parallel Processing, Parallel Processor, Advance Reservation, Batch Schedule, Grid Schedule
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Alverson, G., Kahan, S., Korry, R., McCann, C., Smith, B.: Scheduling on the Tera MTA. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 19–44. Springer, Heidelberg (1995)
  2. Banen, S., Bucur, A.I.D., Epema, D.H.J.: A measurement-based simulation study of processor co-allocation in multicluster systems. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 105–128. Springer, Heidelberg (2003)
  3. Batat, A., Feitelson, D.G.: Gang scheduling with memory considerations. In: 14th Intl. Parallel & Distributed Processing Symp., May 2000, pp. 109–114 (2000)
  4. Bucur, A.I.D., Epema, D.H.J.: The influence of communication on the performance of co-allocation. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 66–86. Springer, Heidelberg (2001)
  5. Bucur, A.I.D., Epema, D.H.J.: The influence of the structure and sizes of jobs on the performance of co-allocation. In: Feitelson, D.G., Rudolph, L. (eds.) IPDPS-WS 2000 and JSSPP 2000. LNCS, vol. 1911, pp. 154–173. Springer, Heidelberg (2000)
  6. Chiang, S.-H., Arpaci-Dusseau, A., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 103–127. Springer, Heidelberg (2002)
  7. Cirne, W., Berman, F.: Adaptive selection of partition size for supercomputer requests. In: Feitelson, D.G., Rudolph, L. (eds.) IPDPS-WS 2000 and JSSPP 2000. LNCS, vol. 1911, pp. 187–207. Springer, Heidelberg (2000)
  8. Das Sharma, D., Pradhan, D.K.: Job scheduling in mesh multicomputers. In: Intl. Conf. Parallel Processing, August 1994, vol. II, pp. 251–258 (1994)
  9. Ernemann, C., Hamscher, V., Schwiegelshohn, U., Streit, A., Yahyapour, R.: Enhanced Algorithms for Multi-Site Scheduling. In: Parashar, M. (ed.) GRID 2002. LNCS, vol. 2536, pp. 219–231. Springer, Heidelberg (2002)
  10. Ernemann, C., Hamscher, V., Schwiegelshohn, U., Streit, A., Yahyapour, R.: On Advantages of Grid Computing for Parallel Job Scheduling. In: Proc. 2nd IEEE/ACM Int’l Symp. on Cluster Computing and the Grid (CCGRID 2002), May 2002, IEEE Press, Berlin (2002)
  11. Ernemann, C., Hamscher, V., Streit, A., Yahyapour, R.: On Effects of Machine Configurations on Parallel Job Scheduling in Computational Grids. In: International Conference on Architecture of Computing Systems, ARCS, April 2002, pp. 169–179. VDE, Karlsruhe (2002)
  12. Ernemann, C., Hamscher, V., Yahyapour, R.: Economic Scheduling in Grid Computing. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 128–152. Springer, Heidelberg (2002)
  13. Ernemann, C., Yahyapour, R.: Grid Resource Management - State of the Art and Future Trends. In: Applying Economic Scheduling Methods to Grid Environments, pp. 491–506. Kluwer Academic Publishers, Dordrecht (2003)
  14. Etsion, Y., Feitelson, D.G.: User-level communication in a system with gang scheduling. In: 15th Intl. Parallel & Distributed Processing Symp. (April 2001)
  15. Feitelson, D.G.: Experimental Analysis of the Root Causes of Performance Evaluation Results: A Backfilling Case Study. Technical Report 2002–4, School of Computer Science and Engineering, Hebrew University (March 2002)
  16. Feitelson, D.G.: Metric and workload effects on computer systems evaluation. Computer 36(9), 18–25 (2003)
  17. Feitelson, D.G.: A Survey of Scheduling in Multiprogrammed Parallel Systems. Research Report RC 19790 (87657), IBM T. J. Watson Research Center (October 1994)
  18. Feitelson, D.G., Mu’alem Weil, A.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th Intl. Parallel Processing Symp., April 1998, pp. 542–546 (1998)
  19. Feitelson, D.G., Rudolph, L.: Gang scheduling performance benefits for fine-grain synchronization. J. Parallel & Distributed Comput. 16(4), 306–318 (1992)
  20. Feitelson, D.G., Rudolph, L.: Parallel job scheduling: issues and approaches. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 1–18. Springer, Heidelberg (1995)
  21. Feitelson, D.G.: The Supercomputer Industry in Light of the Top500 Data. Comput. in Science & Engineering 7(1), 42–47 (2004)
  22. Foster, I., Kesselman, C.: The Globus toolkit. In: Foster, I., Kesselman, C. (eds.) The Grid: Blueprint for a New Computing Infrastructure, pp. 259–278. Morgan Kaufmann, San Francisco (1999)
  23. Frachtenberg, E., Feitelson, D.G., Fernandez, J., Petrini, F.: Parallel job scheduling under dynamic workloads. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 208–227. Springer, Heidelberg (2003)
  24. Frachtenberg, E., Feitelson, D.G., Petrini, F., Fernandez, J.: Flexible coscheduling: mitigating load imbalance and improving utilization of heterogeneous resources. In: 17th Intl. Parallel & Distributed Processing Symp. (April 2003)
  25. Frachtenberg, E., Petrini, F., Fernandez, J., Pakin, S., Coll, S.: STORM: lightning-fast resource management. In: Supercomputing (November 2002)
  26. Hamscher, V., Schwiegelshohn, U., Streit, A., Yahyapour, R.: Evaluation of Job-Scheduling Strategies for Grid Computing. In: Buyya, R., Baker, M. (eds.) GRID 2000. LNCS, vol. 1971, pp. 191–202. Springer, Heidelberg (2000)
  27. Henderson, R.L.: Job scheduling under the portable batch system. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 279–294. Springer, Heidelberg (1995)
  28. Holt, G.: Time-Critical Scheduling on a Well Utilised HPC System Using Resource Reservations. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 102–124. Springer, Heidelberg (2005)
  29. Intel Corp., iPSC/860 Multi-User Accounting, Control, and Scheduling Utilities Manual. Order number 312261-002 (May 1992)
  30. Jackson, D., Snell, Q., Clement, M.: Core algorithms of the Maui scheduler. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 87–102. Springer, Heidelberg (2001)
  31. Lagerstrom, R., Gipp, S.: PScheD: Political Scheduling on the CRAY T3E. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291, pp. 117–138. Springer, Heidelberg (1997)
  32. Lee, C.B., Schwartzman, Y., Hardy, J., Snavely, A.: Are user runtime estimates inherently inaccurate? In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 253–263. Springer, Heidelberg (2005)
  33. Lifka, D.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995)
  34. Litzkow, M.J., Livny, M., Mutka, M.W.: Condor - a hunter of idle workstations. In: 8th Intl. Conf. Distributed Comput. Syst., June 1988, pp. 104–111 (1988)
  35. Moreira, J.E., Chan, W., Fong, L.L., Franke, H., Jette, M.A.: An infrastructure for efficient parallel job execution in terascale computing environments. In: Supercomputing 1998 (November 1998)
  36. Mraz, R.: Reducing the variance of point-to-point transfers for parallel real-time programs. IEEE Parallel & Distributed Technology 2(4), 20–31 (Winter 1994)
  37. Mu’alem, W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel & Distributed Syst. 12(6), 529–543 (2001)
  38. Ousterhout, J.K.: Scheduling techniques for concurrent systems. In: 3rd Intl. Conf. Distributed Comput. Syst., October 1982, pp. 22–30 (1982)
  39. Petrini, F., Feng, W.-c.: Buffered coscheduling: a new methodology for multitasking parallel jobs on distributed systems. In: 14th Intl. Parallel & Distributed Processing Symp., May 2000, pp. 439–444 (2000)
  40. Petrini, F., Feng, W.-c.: Time-sharing parallel jobs in the presence of multiple resource requirements. In: Feitelson, D.G., Rudolph, L. (eds.) IPDPS-WS 2000 and JSSPP 2000. LNCS, vol. 1911, pp. 113–136. Springer, Heidelberg (2000)
  41. Petrini, F., Kerbyson, D.J., Pakin, S.: The case of missing supercomputer performance: achieving optimal performance on the 8,192 processors of ASCI Q. In: Supercomputing (November 2003)
  42. Pruyne, J., Livny, M.: Parallel processing on dynamic resources with CARMI. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 259–278. Springer, Heidelberg (1995)
  43. Schwiegelshohn, U., Yahyapour, R.: Analysis of First-Come-First-Serve Parallel Job Scheduling. In: Proceedings of the 9th SIAM Symposium on Discrete Algorithms, January 1998, pp. 629–638 (1998)
  44. Schwiegelshohn, U., Yahyapour, R.: Fairness in Parallel Job Scheduling. Journal of Scheduling 3(5), 297–320 (2000)
  45. Schwiegelshohn, U., Yahyapour, R.: Grid Resource Management - State of the Art and Future Trends. In: Attributes for Communication Between Grid Scheduling Instances, pp. 41–52. Kluwer Academic Publishers, Dordrecht (2003)
  46. Rudolph, L., Smith, P.: Valuation of Ultra-scale Computing Systems. In: Feitelson, D.G., Rudolph, L. (eds.) IPDPS-WS 2000 and JSSPP 2000. LNCS, vol. 1911, pp. 39–55. Springer, Heidelberg (2000)
  47. Shmueli, E., Feitelson, D.G.: Backfilling with lookahead to optimize the performance of parallel job scheduling. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 228–251. Springer, Heidelberg (2003)
  48. Sinaga, J.M.P., Mohammed, H.H., Epema, D.H.J.: A dynamic co-allocation service in multicluster systems. In: 10th Job Scheduling Strategies for Parallel Processing (June 2004)
  49. Snell, Q., Clement, M., Jackson, D., Gregory, C.: The performance impact of advance reservation meta-scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) IPDPS-WS 2000 and JSSPP 2000. LNCS, vol. 1911, pp. 137–153. Springer, Heidelberg (2000)
  50. Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Selective reservation strategies for backfill job scheduling. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 55–71. Springer, Heidelberg (2002)
  51. Talby, D., Feitelson, D.G.: Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling. In: 13th Intl. Parallel Processing Symp., April 1999, pp. 513–517 (1999)
  52. Tullsen, D.M., Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R.: Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In: 23rd Annual International Symposium on Computer Architecture (May 1996)
  53. Tsafrir, D.: (in preparation)
  54. Uno, A., Aoyagi, T., Tani, K.: Job scheduling on the earth simulator. NEC Res. & Develop. 44(1), 47–52 (2003)
  55. Schwiegelshohn, U., Yahyapour, R.: GGF-GFD.6: Attributes for Communication between Scheduling Instances (December 2001), http://www.ggf.org/documents/GFD/GFDI-6.pdf
  56. Wiseman, Y., Feitelson, D.G.: Paired gang scheduling. IEEE Trans. Parallel & Distributed Syst. 14(6), 581–592 (2003)
  57. Yoo, A.B., Jette, M.A., Grondona, M.: SLURM: simple Linux utility for resource management. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 44–60. Springer, Heidelberg (2003)
  58. Zhang, Y., Franke, H., Moreira, J., Sivasubramaniam, A.: An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. IEEE Trans. Parallel & Distributed Syst. 14(3), 236–247 (2003)
  59. Zhang, Y., Franke, H., Moreira, J.E., Sivasubramaniam, A.: Improving parallel job scheduling by combining gang scheduling and backfilling techniques. In: 14th Intl. Parallel & Distributed Processing Symp., May 2000, pp. 133–142 (2000)
  60. Zhou, S., Zheng, X., Wang, J., Delisle, P.: Utopia: a load sharing facility for large, heterogeneous distributed computer systems. Software: Pract. & Exp. 23(12), 1305–1336 (1993)
  61. Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: 8th Intl. Symp. High Performance Distributed Comput. (August 1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Dror G. Feitelson (1)
  • Larry Rudolph (2)
  • Uwe Schwiegelshohn (3)

  1. School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  2. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, USA
  3. Computer Engineering Institute, Universität Dortmund, Dortmund, Germany
