Cache Performance Optimizations for Parallel Lattice Boltzmann Codes

  • Jens Wilke
  • Thomas Pohl
  • Markus Kowarschik
  • Ulrich Rüde
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2790)


When designing and implementing highly efficient scientific applications for parallel computers such as clusters of workstations, it is essential to consider and optimize the single-CPU performance of the codes. For this purpose, it is particularly important that the codes respect the hierarchical memory designs that computer architects employ to hide the effects of the growing gap between CPU performance and main memory speed. In this paper, we present techniques to enhance the single-CPU efficiency of lattice Boltzmann methods, which are commonly used in computational fluid dynamics. We present a range of performance results that demonstrate the effectiveness of our optimization techniques.
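The abstract does not spell out the individual techniques. As a hedged illustration of what "respecting the memory hierarchy" can mean for a lattice Boltzmann code, the sketch below shows one widely used cache-oriented idea: fusing the streaming and collision steps so the grid is traversed once per time step instead of twice. It is a minimal sketch, not the authors' code, and it assumes a D2Q9 lattice, a BGK collision operator with relaxation parameter `omega`, periodic boundaries, and an array-of-structures data layout. The names `lbm_init`, `lbm_step_fused`, `feq`, and `IDX` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* D2Q9 lattice (assumed): discrete velocities and weights. */
enum { Q = 9 };
static const int CX[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int CY[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double W[Q] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                            1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

/* Hypothetical array-of-structures index: the nine distributions of one
   cell are contiguous, so all writes for a cell land in one short run of
   memory. A structure-of-arrays layout would instead use
   q*nx*ny + y*nx + x. */
#define IDX(x, y, q, nx) ((((size_t)(y) * (nx)) + (x)) * Q + (q))

/* BGK equilibrium: w_q * rho * (1 + 3 c.u + 4.5 (c.u)^2 - 1.5 u.u). */
static double feq(int q, double rho, double ux, double uy)
{
    double cu = 3.0 * (CX[q] * ux + CY[q] * uy);
    double uu = 1.5 * (ux * ux + uy * uy);
    return W[q] * rho * (1.0 + cu + 0.5 * cu * cu - uu);
}

/* Fill every cell with the equilibrium for density rho and velocity (ux, uy). */
void lbm_init(double *f, int nx, int ny, double rho, double ux, double uy)
{
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x)
            for (int q = 0; q < Q; ++q)
                f[IDX(x, y, q, nx)] = feq(q, rho, ux, uy);
}

/* One fused stream-and-collide sweep with periodic boundaries. Each cell
   pulls its incoming distributions from src, relaxes them toward
   equilibrium, and writes the result to dst, so the grid is traversed once
   per time step rather than once for streaming and once for collision. */
void lbm_step_fused(const double *src, double *dst, int nx, int ny, double omega)
{
    assert(nx > 0 && ny > 0);
    for (int y = 0; y < ny; ++y) {
        for (int x = 0; x < nx; ++x) {
            double f[Q], rho = 0.0, ux = 0.0, uy = 0.0;
            for (int q = 0; q < Q; ++q) {
                /* Pull from the upstream neighbour, wrapping periodically. */
                int xs = (x - CX[q] + nx) % nx;
                int ys = (y - CY[q] + ny) % ny;
                f[q] = src[IDX(xs, ys, q, nx)];
                rho += f[q];
                ux += CX[q] * f[q];
                uy += CY[q] * f[q];
            }
            ux /= rho;
            uy /= rho;
            /* BGK collision, written straight to the destination grid. */
            for (int q = 0; q < Q; ++q)
                dst[IDX(x, y, q, nx)] = f[q] - omega * (f[q] - feq(q, rho, ux, uy));
        }
    }
}
```

A caller would allocate two arrays of `nx * ny * 9` doubles, call `lbm_step_fused(src, dst, ...)` with `omega` in (0, 2), and swap the two pointers after each time step. Keeping two grids means no cell overwrites values that a neighbour still has to pull, at the cost of doubling the memory footprint; in-place schemes and blocking of the loops are common refinements of this basic pattern.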







Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jens Wilke (1)
  • Thomas Pohl (1)
  • Markus Kowarschik (1)
  • Ulrich Rüde (1)

  1. Lehrstuhl für Systemsimulation (Informatik 10), Institut für Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
