High Performance Optimizations for Nuclear Physics Code MFDn on KNL

  • Brandon Cook
  • Pieter Maris
  • Meiyue Shao
  • Nathan Wichmann
  • Marcus Wagner
  • John O’Neill
  • Thanh Phung
  • Gaurav Bansal
Conference paper

DOI: 10.1007/978-3-319-46079-6_26

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)
Cite this paper as:
Cook B. et al. (2016) High Performance Optimizations for Nuclear Physics Code MFDn on KNL. In: Taufer M., Mohr B., Kunkel J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham

Abstract

We present initial optimization strategies and results for MFDn, a large-scale nuclear physics application, running on a single KNL node. The code constructs a very large sparse real symmetric matrix and computes a few of its lowest eigenvalues and eigenvectors through iterative methods. Challenges addressed include effectively utilizing MCDRAM with representative input data for production runs on 5,000 KNL nodes that require over 80 GB of memory per node, using OpenMP 4 to parallelize functions in the construction phase of the sparse matrix, and vectorizing those functions despite while-loops, conditionals, and lookup tables with indirect indexing. Moreover, hybrid MPI/OpenMP is employed not only to maximize the total problem size that can be solved per node, but also to minimize parallel scaling overhead by choosing the best combination of MPI ranks per node and OpenMP threads. We describe a vectorized version of the popcount operation that avoids serialization on the popcnt intrinsic, which operates only on scalar registers. Additionally, we leverage SSE 4.2 string comparison instructions to determine nonzero matrix elements. By utilizing MCDRAM, we achieve excellent sparse matrix–matrix multiplication performance; in particular, multiplying by blocks of 8 vectors leads to a speedup of 6.4× on KNL and 2.9× on Haswell compared to repeated SpMVs. This optimization was essential in achieving a 1.6× improvement on KNL over Haswell.

Keywords

Vectorization · MCDRAM · KNL · MFDn · Sparse matrix · SpMV

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Brandon Cook¹
  • Pieter Maris²
  • Meiyue Shao¹
  • Nathan Wichmann³
  • Marcus Wagner³
  • John O’Neill⁴
  • Thanh Phung⁴
  • Gaurav Bansal⁴

  1. Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Department of Physics and Astronomy, Iowa State University, Ames, USA
  3. Cray Inc., Seattle, USA
  4. Software and Services Group, Intel Corporation, Santa Clara, USA
