Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems

  • Shuangcheng Niu
  • Jidong Zhai
  • Xiaosong Ma
  • Mingliang Liu
  • Yan Zhai
  • Wenguang Chen
  • Weimin Zheng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7698)

Abstract

The FCFS-based backfill algorithm is widely used to schedule jobs on high-performance computer systems. The algorithm relies on job runtime estimates, which are provided by users. However, statistics show that the accuracy of these user-provided estimates is poor: users are very likely to give a runtime estimate much longer than the job's actual execution time.
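
To make the dependence on runtime estimates concrete, the following is a minimal sketch of the admission test used in EASY-style backfilling, a common FCFS-based variant. It is an illustration under stated assumptions, not necessarily the exact algorithm the paper evaluates; the names `Job`, `can_backfill`, `shadow_time`, and `extra_nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int        # number of nodes requested
    estimate: float   # user-provided runtime estimate, in seconds


def can_backfill(job, now, free_nodes, shadow_time, extra_nodes):
    """Return True if `job` may start now without delaying the queue head.

    shadow_time: earliest time the head-of-queue job can start (its reservation).
    extra_nodes: nodes that remain free even after the head job starts.
    """
    if job.nodes > free_nodes:
        return False
    # The scheduler only knows the estimate, not the real runtime.
    finishes_before_reservation = now + job.estimate <= shadow_time
    fits_beside_head = job.nodes <= extra_nodes
    return finishes_before_reservation or fits_beside_head


# Hypothetical scenario: 4 nodes are idle for the next hour, none are spare afterwards.
overestimated = Job("job-a", nodes=4, estimate=7200.0)  # user asks for 2 h
accurate = Job("job-a", nodes=4, estimate=600.0)        # same job, estimated at 10 min
print(can_backfill(overestimated, 0.0, 4, 3600.0, 0))
print(can_backfill(accurate, 0.0, 4, 3600.0, 0))
```

In this scenario the overestimated job is rejected and the nodes sit idle, while the same job with an accurate estimate would have been admitted. This is the kind of lost opportunity that inaccurate estimates cause.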

In this paper, we propose an aggressive backfilling approach with checkpoint-based preemption to address the inaccuracy of user-provided runtime estimates. The approach is evaluated with real workload traces. The results show that, compared with the FCFS-based backfill algorithm, our scheme improves job scheduling performance in waiting time, slowdown, and mean queue length by up to 40%. Meanwhile, only 4% of the jobs need to perform checkpoints.
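
For readers unfamiliar with the metrics named above, the sketch below computes waiting time and slowdown using their conventional definitions in the job scheduling literature. The abstract does not state the exact definitions used in the paper (for example, whether slowdown is bounded), so these formulas and the function names are assumptions for illustration.

```python
def waiting_time(submit, start):
    """Time a job spends in the queue before it starts running."""
    return start - submit


def slowdown(submit, start, end):
    """Response time divided by runtime: (wait + run) / run."""
    run = end - start
    if run <= 0:
        raise ValueError("runtime must be positive")
    return (waiting_time(submit, start) + run) / run


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical trace records: (submit, start, end) in seconds.
trace = [(0, 300, 400), (10, 10, 110), (20, 120, 320)]
print(mean(waiting_time(s, b) for s, b, _ in trace))
print(mean(slowdown(s, b, e) for s, b, e in trace))
```

Slowdown penalizes delays to short jobs more heavily than delays to long ones, which is why it is usually reported alongside raw waiting time.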

Keywords

job scheduling, backfill algorithm, runtime estimate, checkpoint/restart

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Shuangcheng Niu (1)
  • Jidong Zhai (1)
  • Xiaosong Ma (2)
  • Mingliang Liu (1)
  • Yan Zhai (1)
  • Wenguang Chen (1)
  • Weimin Zheng (1)

  1. Tsinghua University, Beijing, China
  2. North Carolina State University and Oak Ridge National Laboratory, USA
