Abstract
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing the applications one by one, each with a maximum degree of parallelism, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one after another. A pack comprises several applications, each with an assigned number of processors, under the constraint that the total number of processors assigned within a pack does not exceed the number of available processors. The objective is to determine a partition into packs and an assignment of processors to applications that together minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem and propose several heuristics that perform very well on a variety of workloads whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (40% improvement on average over traditional scheduling) and to faster response times (50% improvement). Hence, co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
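The objective stated in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: the execution time of a pack is taken to be that of its longest application, and each application follows a hypothetical Amdahl-style profile. The names `exec_time` and `schedule_cost`, and the sample workload, are illustrative only and are not the authors' heuristics.

```python
from typing import Dict, List, Tuple

# An application is described by (total work, sequential fraction).
# This Amdahl-style profile is an assumption made for illustration.
App = Tuple[float, float]
# A pack maps an application name to its assigned number of processors.
Pack = Dict[str, int]


def exec_time(work: float, seq_fraction: float, procs: int) -> float:
    """Hypothetical execution time of one application on `procs` processors."""
    return work * (seq_fraction + (1.0 - seq_fraction) / procs)


def schedule_cost(packs: List[Pack], apps: Dict[str, App], total_procs: int) -> float:
    """Sum of pack execution times; packs are executed one after another."""
    total = 0.0
    for pack in packs:
        if not pack:
            continue  # an empty pack contributes no time
        if sum(pack.values()) > total_procs:
            raise ValueError("pack uses more processors than are available")
        # Assumption: a pack finishes when its longest application finishes.
        total += max(exec_time(*apps[name], procs=q) for name, q in pack.items())
    return total


# Illustrative workload on 8 processors (values are made up).
apps = {"A": (100.0, 0.1), "B": (80.0, 0.2), "C": (60.0, 0.5)}
one_by_one = schedule_cost([{"A": 8}, {"B": 8}, {"C": 8}], apps, total_procs=8)
co_scheduled = schedule_cost([{"A": 4, "B": 2, "C": 2}], apps, total_procs=8)
print(f"one by one: {one_by_one:.2f}, co-scheduled: {co_scheduled:.2f}")
```

In this made-up instance, the single pack finishes sooner than running the three applications one by one, because the applications scale sublinearly under the assumed profile. The actual problem also requires choosing the partition and the processor assignment, which this sketch only evaluates.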
Acknowledgments
Anne Benoit and Yves Robert are with the Institut Universitaire de France (IUF). This work was supported in part by the ANR RESCUE project. The research of Padma Raghavan and Manu Shantharam was supported in part by the U.S. National Science Foundation through grants CCF 0963839, 1018881 and 1319448.
Cite this article
Aupy, G., Shantharam, M., Benoit, A. et al. Co-scheduling algorithms for high-throughput workload execution. J Sched 19, 627–640 (2016). https://doi.org/10.1007/s10951-015-0445-x
DOI: https://doi.org/10.1007/s10951-015-0445-x