Abstract
Today’s scientific computing applications require many different kinds of tasks and computational resources. The success of scientific computing hinges on the development of High Performance Computing (HPC) systems, whose role is to reduce execution time. This support has been further enhanced by the advent of accelerators such as the Graphics Processing Unit (GPU) and the Intel Xeon Phi (MIC) coprocessor. However, problems related to underutilization of the MIC coprocessor can lead to thread and memory over-subscription. In addition, based on logs of the runtime behavior of scientific applications, job scheduling usually carries constraints on job completion time in the form of deadlines or due-date assignments. These problems can be addressed, and performance improved, by suitable methods such as scheduling or assigning priorities to submitted jobs. In this paper, we propose SCOUT, a scheduling module that exploits factors derived from application performance to improve the scheduler on a CPU/coprocessor-based cluster. SCOUT focuses on application performance and on reducing execution time on the Xeon Phi accelerator. Furthermore, our scheduling module decides the order of job execution to increase throughput and minimize delay time. On a set of popular scientific applications, the experimental results show that SCOUT achieves better performance and throughput than the other policies compared. In particular, we implement the entire module as a seamless plug-in for the HPC workload manager PBS Professional and demonstrate its efficiency in practice.
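As a rough illustration of why the order of job execution matters when jobs carry due dates, the sketch below orders jobs by earliest due date and compares the maximum tardiness with the submission order. It is not SCOUT's algorithm, which the abstract does not specify; the `Job` fields, the job names, and the single-device back-to-back execution model are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str       # hypothetical job identifier
    runtime: float  # estimated execution time (arbitrary units)
    due: float      # due date, measured from time 0


def edd_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by earliest due date; ties go to the shorter job."""
    return sorted(jobs, key=lambda j: (j.due, j.runtime))


def max_tardiness(schedule: List[Job]) -> float:
    """Run jobs back to back on one device; return the worst lateness."""
    clock = 0.0
    worst = 0.0  # a job that finishes early contributes zero tardiness
    for job in schedule:
        clock += job.runtime
        worst = max(worst, clock - job.due)
    return worst


# Illustrative (made-up) jobs in submission order.
jobs = [Job("A", 4, 12), Job("B", 2, 3), Job("C", 3, 6)]

submitted = max_tardiness(jobs)             # order A, B, C
reordered = max_tardiness(edd_order(jobs))  # order B, C, A
print(f"submission order: {submitted}, earliest-due-date order: {reordered}")
```

In this toy instance, reordering removes all lateness without changing any job's runtime. A real scheduler on a coprocessor-based cluster would also need to account for core and memory availability, which this sketch ignores.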
Acknowledgements
We thank the anonymous reviewers for their insightful comments. This research was conducted within the project “Studying Tools to Support Applications Running on Powerful Clusters & Big Data Analytics (HPDA phase I 2018–2020)”, funded by the Ho Chi Minh City Department of Science and Technology (under grant number 46/2018/HD-QKHCN).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chung, M.T., Pham, K.T., Nguyen, MT., Thoai, N. (2021). SCOUT: Scheduling Core Utilization to Optimize the Performance of Scientific Computing Applications on CPU/Coprocessor-Based Cluster. In: Bock, H.G., Jäger, W., Kostina, E., Phu, H.X. (eds) Modeling, Simulation and Optimization of Complex Processes HPSC 2018. Springer, Cham. https://doi.org/10.1007/978-3-030-55240-4_6
DOI: https://doi.org/10.1007/978-3-030-55240-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55239-8
Online ISBN: 978-3-030-55240-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)