Abstract
Scheduling and dispatching are critical enabling technologies in supercomputing and grid computing. In these contexts scalability is the central issue: up to tens of thousands of tasks must be allocated and scheduled on tens of thousands of resources. A problem of this scale is out of reach for complete, centralized scheduling approaches. We propose DARDIS, a distributed allocation and scheduling paradigm that is lightweight, scalable, and fully customizable across many domains. In DARDIS, each task offloads to the available resources the computation of a probability index for each possible start time of that task on that resource. The task then selects a resource and a start time on the basis of these probability indices. The scheduler can be customized with different policies to target different objective functions, such as load balancing or makespan. We evaluate our approach in the domains of grids and supercomputers, comparing DARDIS with the algorithms most widely used in these domains, and show that it reaches better solutions in several cases.
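The dispatching loop described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract does not define how the probability index is computed, so the weighting rule used here (free capacity left over the task's time window) is a hypothetical stand-in, and the names `Resource`, `indices`, `commit`, and `dispatch` are invented for illustration. In the actual paradigm the per-resource computation would run on the resources themselves; here it is simulated by method calls.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource with fixed capacity over a discrete time horizon (assumed model)."""
    name: str
    capacity: int
    horizon: int
    load: list = field(default_factory=list)

    def __post_init__(self):
        if not self.load:
            self.load = [0] * self.horizon

    def indices(self, duration, demand):
        """Resource-side step: return {start_time: weight} for feasible starts.

        Hypothetical policy: the weight is the capacity still free at the
        busiest instant of the window, which favors lightly loaded slots.
        """
        out = {}
        for start in range(self.horizon - duration + 1):
            peak = max(self.load[start:start + duration])
            if peak + demand <= self.capacity:
                out[start] = self.capacity - peak
        return out

    def commit(self, start, duration, demand):
        """Reserve capacity for a task placed at `start`."""
        for t in range(start, start + duration):
            self.load[t] += demand


def dispatch(duration, demand, resources, rng):
    """Task-side step: collect indices from all resources, then sample one
    (resource, start) pair with probability proportional to its weight."""
    candidates, weights = [], []
    for res in resources:
        for start, weight in res.indices(duration, demand).items():
            candidates.append((res, start))
            weights.append(weight)
    if not candidates:
        return None  # no feasible placement within the horizon
    res, start = rng.choices(candidates, weights=weights, k=1)[0]
    res.commit(start, duration, demand)
    return res.name, start


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [Resource("r0", capacity=2, horizon=4),
            Resource("r1", capacity=2, horizon=4)]
    for task_id in range(5):
        print(task_id, dispatch(duration=2, demand=1, resources=pool, rng=rng))
```

Swapping the body of `indices` for a different weighting rule is one way to picture the policy customization the abstract mentions, for example a rule that prefers early start times when makespan matters more than load balancing.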
Acknowledgments
This work was partially supported by the FP7 ERC Advanced project MULTITHERMAN (g.a. 291125), by the YINS RTD project (no. 20NA21 150939), evaluated by the Swiss NSF and funded by Nano-Tera.ch with Swiss Confederation financing, and by CINECA.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Bridi, T., Lombardi, M., Bartolini, A., Benini, L., Milano, M. (2016). DARDIS: Distributed And Randomized DIspatching and Scheduling. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds) AI*IA 2016 Advances in Artificial Intelligence. AI*IA 2016. Lecture Notes in Computer Science, vol 10037. Springer, Cham. https://doi.org/10.1007/978-3-319-49130-1_36
DOI: https://doi.org/10.1007/978-3-319-49130-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49129-5
Online ISBN: 978-3-319-49130-1
eBook Packages: Computer Science, Computer Science (R0)