Abstract
Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab-initio quantum-mechanical molecular dynamics (MD) simulation package, to investigate how tuning at the software, hardware, and network levels boosts performance on a Dell PowerEdge R815 HPC cluster with AMD “Interlagos” and “Abu-Dhabi” processors. We implement code changes with a free software stack that supports the FMA and AVX CPU instructions on the Bulldozer/Piledriver architecture. We analyze the MPI communications by profiling, compare the scalability of different interconnects, and discuss various MPI tuning parameters. We show the effects of advanced features that are crucial to the scalability of InfiniBand, including MXM and SRQ, which optimize network resources for MPI communications. Finally, we investigate the importance of MPI process placement and introduce a process allocation tool that facilitates affinity grouping on a multicore architecture.
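To make the idea of affinity grouping concrete, the following minimal sketch maps MPI ranks to cores under two common placement policies. It is an illustration only and is not the process allocation tool introduced in the paper: the function names `place_ranks` and `ranks_per_node`, the policy names, and the topology of 8 NUMA nodes with 8 cores each are all assumptions chosen for the example.

```python
from collections import defaultdict


def place_ranks(num_ranks, numa_nodes, cores_per_node, policy="compact"):
    """Map each MPI rank to a (numa_node, global_core_id) pair.

    Hypothetical helper for illustration; the topology values passed in
    are assumptions, not measurements from the paper.
    """
    total_cores = numa_nodes * cores_per_node
    if num_ranks > total_cores:
        raise ValueError("more ranks than available cores")

    placement = []
    for rank in range(num_ranks):
        if policy == "compact":
            # Fill one NUMA node completely before moving to the next,
            # so neighbouring ranks share local memory.
            node, local_core = divmod(rank, cores_per_node)
        elif policy == "scatter":
            # Deal ranks round-robin across NUMA nodes, so each rank
            # gets a larger share of per-node memory bandwidth.
            node = rank % numa_nodes
            local_core = rank // numa_nodes
        else:
            raise ValueError(f"unknown policy: {policy}")
        placement.append((node, node * cores_per_node + local_core))
    return placement


def ranks_per_node(placement):
    """Count how many ranks landed on each NUMA node."""
    counts = defaultdict(int)
    for node, _core in placement:
        counts[node] += 1
    return dict(counts)


if __name__ == "__main__":
    for policy in ("compact", "scatter"):
        layout = place_ranks(16, numa_nodes=8, cores_per_node=8, policy=policy)
        print(policy, ranks_per_node(layout))
```

With 16 ranks, the compact policy packs them onto two NUMA nodes while the scatter policy spreads two ranks onto each of the eight nodes. Which layout performs better depends on whether the application is bound by inter-rank communication or by memory bandwidth.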
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shainer, G. et al. (2013). Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_17
DOI: https://doi.org/10.1007/978-3-642-38750-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)