Co-scheduling algorithms for high-throughput workload execution

Aupy, Guillaume; Shantharam, Manu; Benoit, Anne; Robert, Yves; Raghavan, Padma

doi:10.1007/s10951-015-0445-x

Published: 30 August 2015

Volume 19, pages 627–640, (2016)

Journal of Scheduling

Guillaume Aupy¹,
Manu Shantharam³,
Anne Benoit¹,
Yves Robert¹,² &
Padma Raghavan⁴

Abstract

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (40% improvement on average over traditional scheduling) and to faster response times (50% improvement). Hence, co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
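
The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a pack finishes when its slowest application finishes, so the pack's execution time is the maximum over its applications, and it uses a hypothetical Amdahl-style profile (`amdahl_profile`) as a stand-in for the execution-time profiles, which the abstract does not specify. All function and variable names are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Execution-time profile: maps a processor count to a run time.
Profile = Callable[[int], float]
# A pack lists (application name, number of processors assigned to it).
Pack = List[Tuple[str, int]]


def amdahl_profile(work: float, seq_fraction: float) -> Profile:
    """Hypothetical Amdahl-style profile; an assumption, not the paper's model."""
    def run_time(procs: int) -> float:
        return work * (seq_fraction + (1.0 - seq_fraction) / procs)
    return run_time


def pack_time(pack: Pack, profiles: Dict[str, Profile]) -> float:
    """Assumed pack duration: the pack ends when its slowest application ends."""
    return max(profiles[name](procs) for name, procs in pack)


def schedule_cost(packs: List[Pack], profiles: Dict[str, Profile],
                  total_procs: int) -> float:
    """Sum of pack durations, after checking the constraints from the abstract."""
    scheduled = set()
    cost = 0.0
    for pack in packs:
        if not pack:
            raise ValueError("a pack must contain at least one application")
        # Processors assigned within a pack must not exceed those available.
        if sum(procs for _, procs in pack) > total_procs:
            raise ValueError("pack uses more processors than available")
        for name, procs in pack:
            if procs < 1:
                raise ValueError(f"{name} needs at least one processor")
            if name in scheduled:
                raise ValueError(f"{name} is scheduled more than once")
            scheduled.add(name)
        cost += pack_time(pack, profiles)
    # The packs must form a partition of the whole application set.
    if scheduled != set(profiles):
        raise ValueError("packs do not cover every application exactly once")
    return cost


# Two made-up applications on a 4-processor platform.
profiles = {
    "A": amdahl_profile(work=100.0, seq_fraction=0.1),
    "B": amdahl_profile(work=80.0, seq_fraction=0.5),
}
one_by_one = schedule_cost([[("A", 4)], [("B", 4)]], profiles, total_procs=4)
co_scheduled = schedule_cost([[("A", 3), ("B", 1)]], profiles, total_procs=4)
print(one_by_one, co_scheduled)
```

With these made-up numbers, running the two applications one after the other on all four processors costs roughly 82.5 time units, while a single pack giving three processors to A and one to B costs about 80. The sketch only evaluates a given partition; choosing the packs and the processor assignment is the optimization problem the paper studies.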



Acknowledgments

Anne Benoit and Yves Robert are with the Institut Universitaire de France (IUF). This work was supported in part by the ANR RESCUE project. The research of Padma Raghavan and Manu Shantharam was supported in part by the U.S. National Science Foundation through grants CCF 0963839, 1018881 and 1319448.

Author information

Authors and Affiliations

LIP, ENS Lyon, Lyon, France
Guillaume Aupy, Anne Benoit & Yves Robert
University of Tennessee, Knoxville, USA
Yves Robert
San Diego Supercomputer Center, La Jolla, USA
Manu Shantharam
Pennsylvania State University, State College, USA
Padma Raghavan

Corresponding author

Correspondence to Guillaume Aupy.

About this article

Cite this article

Aupy, G., Shantharam, M., Benoit, A. et al. Co-scheduling algorithms for high-throughput workload execution. J Sched 19, 627–640 (2016). https://doi.org/10.1007/s10951-015-0445-x

Published: 30 August 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10951-015-0445-x
