Cluster Computing

Volume 14, Issue 2, pp 165–182

Service control with the preemptive parallel job scheduler Scojo-PECT

Article

Abstract

User satisfaction and scheduling on grids make predictability of response times and quality-of-service highly desirable. However, existing approaches for response-time prediction still show significant prediction errors, mostly due to problems in dynamic arrival of jobs with potentially higher priority and hard-to-anticipate packing and backfilling effects. The same problems imply that quality-of-service cannot be solved with standard approaches from communication systems. Thus, this paper presents a scheduling approach which provides a more suitable framework for service guarantees and predictability. The approach is based on coarse-grain preemption, combined with an innovative separation of job classes. Resource shares can be determined as necessary to meet target service levels. A further extension permits limited dynamic resource allocation to adapt to variations in machine load and job mixes. The feasibility of service control is demonstrated with various workloads.

Keywords

Job scheduling · Gang scheduling · Preemption · Quality-of-service · Prediction



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Computer Science, University of Windsor, Windsor, Canada
