Optimizing the Advanced Accelerator Simulation Framework Synergia Using OpenMP

  • Hongzhang Shan
  • Erich Strohmaier
  • James Amundson
  • Eric G. Stern
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7312)

Abstract

Synergia is an advanced accelerator simulation framework widely used in the accelerator community. Unfortunately, its performance and scalability suffer significantly from very high communication requirements. In this paper, we address this issue by replacing the flat MPI programming model with a hybrid OpenMP+MPI programming model. We describe in detail how the code has been parallelized with OpenMP and what the challenges are. The improved hybrid code performs over 1.7 times better than the original program on a realistic benchmark problem.

Keywords

Beam Dynamics, Time Breakdown, OpenMP, Thread-Local Charge Density, NUMA Architecture



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hongzhang Shan (1)
  • Erich Strohmaier (1)
  • James Amundson (2)
  • Eric G. Stern (2)
  1. Future Technology Group, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley
  2. Fermi National Accelerator Laboratory, Batavia
