Abstract
This work presents an application case study. Geant4 is a 750,000-line toolkit first designed in the mid-1990s and originally intended only for sequential computation. Intel’s promise of an 80-core CPU meant that Geant4 users would eventually have to run 80 processes on one CPU chip, each with a memory footprint of about one gigabyte. Thread parallelism, in which all threads share one address space, would therefore be desirable. This work presents a semi-automatic methodology to parallelize the Geant4 code. Our experiments demonstrate linear speedup from 1 to 24 threads on a 24-core computer. To achieve this performance, we had to write a custom thread-private memory allocator and to detect and eliminate excessive cache misses. Without these changes, adding cores beyond eight yielded almost no further speedup. Finally, to guarantee the run-time correctness of the transformed code, we developed a dynamic method that captures possible bugs and either immediately generates a fault or, optionally, recovers from it.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, X., Cooperman, G., Apostolakis, J. (2010). Multithreaded Geant4: Semi-automatic Transformation into Scalable Thread-Parallel Software. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_27
DOI: https://doi.org/10.1007/978-3-642-15291-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science, Computer Science (R0)