Multithreaded Geant4: Semi-automatic Transformation into Scalable Thread-Parallel Software

  • Xin Dong
  • Gene Cooperman
  • John Apostolakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6272)

Abstract

This work presents an application case study. Geant4 is a 750,000-line toolkit first designed in the mid-1990s and originally intended only for sequential computation. Intel’s promise of an 80-core CPU meant that Geant4 users would have to struggle in the future with 80 processes on one CPU chip, each one having a gigabyte memory footprint. Thread parallelism would be desirable. A semi-automatic methodology to parallelize the Geant4 code is presented in this work. Our experimental tests demonstrate linear speedup in a range from one thread to 24 on a 24-core computer. To achieve this performance, we needed to write a custom, thread-private memory allocator, and to detect and eliminate excessive cache misses. Without these improvements, there was almost no performance improvement when going beyond eight cores. Finally, in order to guarantee the run-time correctness of the transformed code, a dynamic method was developed to capture possible bugs and either immediately generate a fault, or optionally recover from the fault.
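The abstract does not describe how the thread-private allocator works internally. As a rough illustration of the general idea only, the following C++ sketch gives each thread its own fixed-size block pool through `thread_local` storage, so allocation and deallocation never take a lock or touch another thread's free list. The names `ThreadPrivatePool`, `local_pool`, and `run_workers`, the block size of 64 bytes, and the chunk size of 256 blocks are all hypothetical choices made for this sketch; none of them comes from the paper or from Geant4.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

// Hypothetical fixed-size block pool. Each thread owns one instance,
// so no synchronization is needed inside allocate()/deallocate().
class ThreadPrivatePool {
    struct Node { Node* next; };
    static constexpr std::size_t kBlocksPerChunk = 256;  // assumed chunk size

public:
    explicit ThreadPrivatePool(std::size_t block_size)
        : block_size_(round_up(block_size < sizeof(Node) ? sizeof(Node)
                                                         : block_size)) {}

    ~ThreadPrivatePool() {
        for (void* chunk : chunks_) ::operator delete(chunk);
    }

    ThreadPrivatePool(const ThreadPrivatePool&) = delete;
    ThreadPrivatePool& operator=(const ThreadPrivatePool&) = delete;

    // Pop a block from this thread's free list, refilling if it is empty.
    void* allocate() {
        if (free_list_ == nullptr) refill();
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    // Push a block back onto this thread's free list (LIFO order).
    // The block must have come from this same pool.
    void deallocate(void* p) {
        free_list_ = new (p) Node{free_list_};
    }

private:
    // Round sizes up so every block can hold a suitably aligned Node.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    // Grab one large chunk from the global heap and carve it into blocks.
    void refill() {
        char* chunk =
            static_cast<char*>(::operator new(block_size_ * kBlocksPerChunk));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < kBlocksPerChunk; ++i)
            deallocate(chunk + i * block_size_);
    }

    std::size_t block_size_;
    Node* free_list_ = nullptr;
    std::vector<void*> chunks_;
};

// One pool per thread; 64-byte blocks are an arbitrary choice for the sketch.
ThreadPrivatePool& local_pool() {
    thread_local ThreadPrivatePool pool(64);
    return pool;
}

// Each worker allocates and frees `rounds` blocks from its own pool.
// Returns the total number of blocks handled across all workers.
std::size_t run_workers(unsigned num_threads, std::size_t rounds) {
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counts, t, rounds] {
            ThreadPrivatePool& pool = local_pool();
            std::size_t done = 0;  // local counter: avoids repeated writes
                                   // to a cache line shared between threads
            for (std::size_t i = 0; i < rounds; ++i) {
                void* p = pool.allocate();
                *static_cast<unsigned char*>(p) = 1;  // touch the block
                pool.deallocate(p);
                ++done;
            }
            counts[t] = done;  // single write per thread at the end
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

A caller could start, for example, `run_workers(4, 1000)` to exercise four independent pools. The sketch deliberately ignores cases a production allocator must handle, such as variable block sizes and blocks freed by a thread other than the one that allocated them.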

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Xin Dong (1)
  • Gene Cooperman (1)
  • John Apostolakis (2)
  1. College of Computer Science, Northeastern University, Boston, USA
  2. PH/SFT, CERN, Geneva 23, Switzerland
