LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU

  • Conference paper
Parallel Computational Technologies (PCT 2019)

Abstract

We present an implementation of the Lattice Boltzmann Method (LBM) with Locally Recursive non-Locally Asynchronous (LRnLA) algorithms on GPU and CPU. The algorithm is based on the recursive subdivision of the dD1T space-time simulation domain and loosens the memory-bound limit for numerical schemes with local dependencies. We show that the LRnLA algorithm makes it possible to overcome the main memory bandwidth limitations in both CPU and GPU implementations. For CPU, we find a data layout that provides the alignment needed for full use of AVX2/AVX512 vectorization. For GPU, we devise a procedure for pairwise CUDA-block synchronization and apply it to the implementation of the LRnLA algorithm, which previously worked only on CPU. The performance on GPU is higher, as is usual in modern implementations. However, the performance gap in our implementation is smaller, thanks to a more efficient CPU version. Through a detailed comparison, we show possible future applications for both the CPU and the GPU implementations of the lattice Boltzmann method in complex settings.
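
To illustrate what the memory-bound limit means for a scheme like LBM, the following is a minimal back-of-the-envelope sketch, not the authors' performance model. It assumes a D3Q19 lattice in double precision (the abstract does not state the lattice or precision), assumes that a stepwise implementation reads and writes every population once per time step, and idealizes space-time blocking as keeping data in cache for `fused_steps` consecutive updates with no halo overhead. The function name and all parameters are hypothetical.

```python
def memory_bound_glups(bandwidth_gb_s, q=19, bytes_per_value=8, fused_steps=1):
    """Rough ceiling on lattice updates per second, in units of 1e9 updates/s,
    when main-memory traffic is the only limiting factor.

    bandwidth_gb_s  -- sustained memory bandwidth in GB/s (1 GB = 1e9 bytes)
    q               -- populations per cell (assumed: 19, as in D3Q19)
    bytes_per_value -- size of one population (assumed: 8, double precision)
    fused_steps     -- time steps performed per load/store of a cell
                       (1 = stepwise; >1 = idealized space-time blocking)
    """
    if fused_steps < 1:
        raise ValueError("fused_steps must be at least 1")
    # One load and one store of all q populations, amortized over fused steps.
    bytes_per_update = 2 * q * bytes_per_value / fused_steps
    return bandwidth_gb_s / bytes_per_update


if __name__ == "__main__":
    # Hypothetical bandwidth figure, chosen only to make the arithmetic easy.
    for steps in (1, 4, 16):
        print(steps, memory_bound_glups(304.0, fused_steps=steps))
```

Under these assumptions the ceiling grows linearly with the number of fused steps, which is the sense in which a space-time decomposition can loosen the memory-bound limit. In practice, cache capacity, halo traffic, and the compute ceiling cap the gain.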

Acknowledgments

The work was supported by the Russian Science Foundation (grant No. 18-71-10004).

Author information

Corresponding author

Correspondence to Vadim Levchenko.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Levchenko, V., Zakirov, A., Perepelkina, A. (2019). LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2019. Communications in Computer and Information Science, vol 1063. Springer, Cham. https://doi.org/10.1007/978-3-030-28163-2_10

  • DOI: https://doi.org/10.1007/978-3-030-28163-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28162-5

  • Online ISBN: 978-3-030-28163-2

  • eBook Packages: Computer Science, Computer Science (R0)
