Implementation of LB Simulations

The Lattice Boltzmann Method

Abstract

After reading this chapter, you will understand the fundamentals of high-performance computing and how to write efficient code for lattice Boltzmann method simulations. You will know how to optimise sequential codes and develop parallel codes for multi-core CPUs, computing clusters, and graphics processing units. The code listings in this chapter allow you to quickly get started with an efficient code and show you how to optimise your existing code.

Notes

  1. https://github.com/lbm-principles-practice

  2. This is analogous to dividing decimal numbers by powers of 10 by “shifting” right.

  3. Pointers are variables that hold the address of another variable. See Appendix A.9.6 for more details.

  4. For readers unfamiliar with assembly language or the instructions shown here for a typical modern 64-bit Intel processor, push and pop are instructions that save and retrieve their parameter from the “stack,” a special memory region where data can be stored temporarily. The instruction mov dst,src copies the contents of src to dst, where src and dst may be locations in memory or registers. QWORD PTR [addr] refers to the contents of the quadword (four words, which is eight bytes) at the location addr in memory. Numbers written as 0xhh represent the value hh in base 16 (hexadecimal). The symbols rax, rbp, and rsp denote 64-bit general-purpose registers, and xmm0, xmm1, and xmm2 are registers for floating-point values. Note that these are 128-bit floating-point registers that can store two double-precision values or four single-precision values, but in this code only the lower 64 bits are used. The instruction movsd dst,src means “move scalar double” and copies src to dst using only the lowest 64 bits if a register is specified. movapd dst,src moves the full 128-bit value from src to dst. The instructions addsd dst,src and mulsd dst,src are scalar addition and multiplication instructions, respectively, that store the result of adding/multiplying dst and src to dst. The function’s parameters are provided in the registers xmm0-2 and its result is returned in xmm0. Execution continues in the calling function after the instruction ret.

  5. Only calculating these values is not enough; they must be used somehow or the compiler will discard the unnecessary calculations.

  6. On a command line, we can do this with output redirection. For example, ./sim > sim.out on a Unix command line (or sim.exe > sim.out in Windows) runs the program sim and saves its output to the text file sim.out. The output is not shown on the screen. To both display and save the output we can use (on Unix systems) the tee command: ./sim | tee sim.out where we have used a pipe, |, to send the output of one program to the input of another, in this case tee.

  7. Where the term “node” is potentially ambiguous, we use the more specific terms “computing node” and “lattice node” for clarity.

  8. http://aws.amazon.com/hpc/

  9. Depending on how the MPI implementation combines the output from the different processes, this synchronisation might not have the desired effect. Later output from rank 1, for example, might appear before any output from rank 0. If it is essential for the order of output to be synchronised, the data to be output from all ranks can be sent to one rank that displays it all in the correct order.

  10. MPI_Testall is a variant of MPI_Test. The variants of MPI_Test are analogous to those of MPI_Wait: MPI_Testany, MPI_Testsome, and MPI_Testall.

  11. This matches the size of size_t on the systems used for testing, but it is not portable and may need to be changed for other systems.

  12. mpirun, mpiexec, and orterun are synonyms in Open MPI.

  13. The analysis of how a program uses memory and computing resources is called profiling. Automatic profiling software typically reports the time taken by the most time-consuming functions in a program and is useful for optimisation.

  14. In textile weaving, a warp is a collection of parallel threads through which another thread, called the weft, is interlaced.

  15. A macro is a compiler shortcut that allows programmers to conveniently use a fragment of code in many places. When preparing code for compilation, the compiler system replaces the name of the macro with the corresponding code fragment.

  16. This strict ordering of memory accesses is not necessary in general. The memory accesses within a warp are combined as long as they involve a contiguous block of memory regardless of the details of which threads access which locations in memory.


© 2017 Springer International Publishing Switzerland

Cite this chapter

Krüger, T., Kusumaatmaja, H., Kuzmin, A., Shardt, O., Silva, G., Viggen, E.M. (2017). Implementation of LB Simulations. In: The Lattice Boltzmann Method. Graduate Texts in Physics. Springer, Cham. https://doi.org/10.1007/978-3-319-44649-3_13
