Effect of MPI tasks location on cluster throughput using NAS

Rodríguez-Pascual, Manuel; Moríñigo, José A.; Mayo-García, Rafael

doi:10.1007/s10586-018-02898-7

Published: 03 January 2019

Cluster Computing, volume 22, pages 1187–1198 (2019)

Manuel Rodríguez-Pascual¹,
José A. Moríñigo¹ (ORCID: orcid.org/0000-0003-2528-7485) &
Rafael Mayo-García¹

Abstract

In this work, the Numerical Aerodynamic Simulation (NAS) benchmarks have been executed systematically on two clusters with markedly different architectures and CPUs, in order to identify dependencies between the mapping of MPI tasks and the resulting speedup or resource occupation. To this end, series of experiments with the NAS kernels have been designed to account for the complexity of the context in which scientific applications run on HPC environments (CPU-, I/O- or memory-bound behaviour, execution time, degree of parallelism, dedicated computational resources, and strong- and weak-scaling behaviour, to cite some). This context includes scheduling decisions, which have a great influence on application performance and make it difficult to achieve an optimal, cost-effective exploitation of HPC resources. An analysis is provided of how task grouping strategies under various cluster setups drive the execution time of jobs and the infrastructure throughput. As a result, criteria for cluster setup emerge, aimed at maximizing the performance of individual jobs, maximizing total cluster throughput or achieving better scheduling. In this respect, a criterion for execution decisions is suggested. This work is expected to be of interest for the design of scheduling policies and useful to HPC administrators.
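
As an illustration only (it is not taken from the article), the notion of task grouping can be sketched by listing the ways in which a fixed number of MPI tasks may be spread evenly over nodes, and by comparing measured run times through the usual speedup ratio. The function names, the 16-task job and the 8-core node below are hypothetical choices made for this sketch, and the run times are placeholders rather than measured results.

```python
def even_groupings(n_tasks, cores_per_node):
    """List (nodes, tasks_per_node) pairs that place n_tasks evenly.

    Hypothetical helper: only layouts with the same number of tasks
    on every node are considered, and a node never hosts more tasks
    than it has cores.
    """
    groupings = []
    for tasks_per_node in range(1, min(n_tasks, cores_per_node) + 1):
        if n_tasks % tasks_per_node == 0:
            groupings.append((n_tasks // tasks_per_node, tasks_per_node))
    return groupings


def speedup(t_reference, t_measured):
    """Classical speedup: reference run time divided by measured run time."""
    return t_reference / t_measured


if __name__ == "__main__":
    # Assumed example: a 16-task job on nodes with 8 cores each.
    for nodes, per_node in even_groupings(16, 8):
        print(f"{nodes} node(s) x {per_node} task(s) per node")

    # Placeholder timings in seconds, not data from the article.
    print("speedup:", speedup(100.0, 25.0))
```

Each layout produced this way corresponds to one candidate mapping of the same job; timing the job under every layout is one simple way to observe how grouping affects execution time.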

Acknowledgements

This work was partially funded by the Spanish Ministry of Economy, Industry and Competitiveness project CODEC2 (TIN2015-63562-R), with European Regional Development Fund (ERDF) support, and was carried out on computing facilities provided by the CYTED Network RICAP (517RT0529).

Author information

Authors and Affiliations

Department of Technology, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas, CIEMAT, Madrid, Spain
Manuel Rodríguez-Pascual, José A. Moríñigo & Rafael Mayo-García

Corresponding author

Correspondence to José A. Moríñigo.

About this article

Cite this article

Rodríguez-Pascual, M., Moríñigo, J.A. & Mayo-García, R. Effect of MPI tasks location on cluster throughput using NAS. Cluster Comput 22, 1187–1198 (2019). https://doi.org/10.1007/s10586-018-02898-7

Received: 23 January 2018
Revised: 18 July 2018
Accepted: 21 December 2018
Published: 03 January 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10586-018-02898-7
