Donath S., Götz J., Feichtinger C., Iglberger K., Rüde U. (2010) waLBerla: Optimization for Itanium-based Systems with Thousands of Processors. In: Wagner S., Steinmetz M., Bode A., Müller M. (eds) High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, Berlin, Heidelberg
Performance optimization matters at several levels, particularly for computation- and communication-intensive codes such as the free surface lattice Boltzmann method. This method is used to simulate liquid-gas flow phenomena such as bubbly flows and foams. Because the gas phase receives special treatment, bubble volume data must be aggregated in every time step. To achieve efficient parallel scaling, the all-to-all communication schemes used so far had to be replaced with more sophisticated patterns that operate only within a local neighborhood. With this approach, scaling improved to the point that simulation runs on up to 9 152 processor cores are possible with more than 90% efficiency. Because surface tension effects must be computed, the method is also computationally intensive, so optimizing single-core performance plays an important role as well. The characteristics of the Itanium processor require programming techniques that help the compiler vectorize code efficiently, especially for complex C++ codes like the waLBerla framework. An approach using variable-length arrays shows promising results.
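
The idea of aggregating bubble volume data within a local neighborhood rather than across all processes can be illustrated with a minimal serial sketch. It is not the waLBerla implementation: the type `LocalDeltas`, the function `aggregateLocally`, and the representation of processes as entries of a `std::vector` are assumptions made for illustration only, and no real message passing takes place. Each simulated process stores the volume change it observed for every bubble it holds, and the sums are formed only among the processes that share a given bubble.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-process state: bubble id -> volume change seen locally
// in the current time step. Not a waLBerla data structure.
using LocalDeltas = std::map<int, double>;

// Sum each bubble's volume change only among the processes that hold part
// of that bubble. Processes without the bubble take no part in the exchange.
// Returns, for every process, the merged change of each bubble it holds.
std::vector<LocalDeltas> aggregateLocally(const std::vector<LocalDeltas>& procs) {
    // Collect, per bubble, the indices of the processes that hold it.
    // (Assumption: a parallel code would know this set from its neighborhood.)
    std::map<int, std::vector<std::size_t>> holders;
    for (std::size_t p = 0; p < procs.size(); ++p) {
        for (const auto& entry : procs[p]) {
            holders[entry.first].push_back(p);
        }
    }

    std::vector<LocalDeltas> merged(procs.size());
    for (const auto& h : holders) {
        const int bubble = h.first;
        double total = 0.0;
        for (std::size_t p : h.second) {
            total += procs[p].at(bubble);   // contribution of one holder
        }
        for (std::size_t p : h.second) {
            merged[p][bubble] = total;      // only holders receive the sum
        }
    }
    return merged;
}
```

In this sketch the amount of data exchanged for a bubble depends on the number of processes that hold it, not on the total number of processes, which is the property the local communication patterns aim for.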