
Toward convergence in job schedulers for parallel supercomputers

  • Dror G. Feitelson
  • Larry Rudolph
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1162)

Abstract

The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
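To make the cost-function idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes all jobs arrive at time zero, request a fixed number of processors, and run non-preemptively, and it greedily starts whichever waiting job would lose the most value from further delay. All names (Job, greedy_schedule, value_fn) are illustrative.

    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Job:
        name: str
        procs: int                           # processors requested
        runtime: float                       # estimated execution time
        value_fn: Callable[[float], float]   # profit as a function of response time

    def greedy_schedule(jobs: List[Job], total_procs: int) -> float:
        """Greedily run jobs to accumulate profit; returns the total earned.
        A heuristic sketch, not an optimal scheduler."""
        now, free, profit = 0.0, total_procs, 0.0
        pending = list(jobs)
        running: List[Tuple[float, int]] = []     # (finish_time, procs)
        while pending or running:
            # Start jobs while processors suffice, preferring the job whose
            # value decays fastest (a one-time-unit finite difference).
            while True:
                fits = [j for j in pending if j.procs <= free]
                if not fits:
                    break
                j = max(fits, key=lambda j: j.value_fn(now) - j.value_fn(now + 1.0))
                pending.remove(j)
                free -= j.procs
                running.append((now + j.runtime, j.procs))
                # All jobs arrive at time 0, so response time = finish time.
                profit += j.value_fn(now + j.runtime)
            if not running:
                break                             # leftover jobs exceed machine size
            # Advance the clock to the next completion; reclaim its processors.
            running.sort()
            finish, procs = running.pop(0)
            now, free = finish, free + procs
        return profit

    # Example: three jobs on a 4-processor machine with linearly decaying value.
    jobs = [Job("a", 2, 3.0, lambda t: max(0.0, 10.0 - t)),
            Job("b", 2, 1.0, lambda t: max(0.0, 8.0 - 2.0 * t)),
            Job("c", 4, 2.0, lambda t: max(0.0, 12.0 - t))]
    print(greedy_schedule(jobs, 4))               # total "profit": 20.0

In the example, the fast-decaying job b runs first, the machine-wide job c waits for a full drain, and the profit reflects how well each job's placement fit its value function; different value functions encode different system goals (response time, fairness, priority) within the same scheduling mechanism.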

Keywords

Cost Function, Parallel Processing, Parallel Machine, Average Response Time, Partition Size

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Dror G. Feitelson (1)
  • Larry Rudolph (1)
  1. Institute of Computer Science, The Hebrew University, Jerusalem, Israel
