Communication Models Insights Meet Simulations
It is well known that taking communications into account when scheduling jobs on large-scale parallel computing platforms is a crucial issue. In modern hierarchical platforms, communication times differ greatly depending on whether communication occurs inside a cluster or between clusters. Thus, allocating jobs while taking locality constraints into account is a key factor in achieving good performance. However, several theoretical results prove that imposing such constraints reduces the solution space and can thus degrade performance. In practice, such constraints simplify implementations and most often lead to better results.
Our aim in this work is to bridge theoretical and practical intuitions and to examine, through simulations, the differences between constrained and unconstrained schedules (namely with respect to locality and node contiguity). We have developed a generic tool, using SimGrid as the base simulator, that enables interactions with external batch schedulers in order to evaluate their scheduling policies. The results confirm that insights gained through theoretical models are ill-suited to current architectures and should be re-evaluated.
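The claim that locality and contiguity constraints shrink the solution space can be illustrated with a minimal sketch. It assumes a simplified, hypothetical platform model that is not taken from the paper: each cluster is a list of node states (True means free), node indices run consecutively across clusters, "contiguous" means k free nodes with consecutive global indices, and "local" is simplified to all k nodes inside a single cluster. The function names are illustrative only.

```python
from typing import List

# Hypothetical platform model (an assumption for illustration):
# clusters[i][j] is True when node j of cluster i is free, and global node
# indices run consecutively from the first cluster to the last.


def fits_unconstrained(clusters: List[List[bool]], k: int) -> bool:
    """Any k free nodes anywhere on the platform."""
    return sum(sum(c) for c in clusters) >= k


def fits_local(clusters: List[List[bool]], k: int) -> bool:
    """Simplified locality constraint: all k nodes in one cluster."""
    return any(sum(c) >= k for c in clusters)


def fits_contiguous(clusters: List[List[bool]], k: int) -> bool:
    """Contiguity constraint: k free nodes with consecutive global indices."""
    if k <= 0:
        return True
    run = 0
    for node_free in (n for c in clusters for n in c):
        run = run + 1 if node_free else 0  # length of current free run
        if run >= k:
            return True
    return False


# Two clusters of four nodes each; 5 nodes are free in total.
platform = [[True, False, True, True], [True, True, False, False]]

for k in (3, 4, 5):
    print(k, fits_unconstrained(platform, k),
          fits_contiguous(platform, k), fits_local(platform, k))
# 3 True True True
# 4 True True False
# 5 True False False
```

Under these assumptions, a 4-node job can be placed contiguously (spanning both clusters) but not locally, and a 5-node job can only be placed without constraints. Every constrained allocation is also an unconstrained one, which is the solution-space argument; whether constrained schedules actually perform worse in practice is the question the simulations address.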
Keywords: FCFS with backfilling, Simulations, Heterogeneity
The work is partially supported by the ANR project MOEBUS. Experiments presented in this paper were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr).