Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units

Freytag, Gabriel; Navaux, Philippe Olivier Alexandre; Lima, João Vicente Ferreira; Schnorr, Lucas Mello; Rech, Paolo

doi:10.1007/978-3-030-15996-2_8

Conference paper
First Online: 26 March 2019

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11333)

Abstract

The use of heterogeneous architectures has become indispensable in optimizing application performance. Nowadays, one of the most popular heterogeneous architectures is the discrete CPU+GPU. Despite the high computational power of such architectures, memory transfers between CPU and GPU are in many cases significant performance bottlenecks. To mitigate the cost of these data transfers, chipmakers started to integrate CPU and GPU cores in the same fabric, sharing the same main memory but with different memory address spaces, in architectures called APUs (Accelerated Processing Units). To exploit heterogeneous CPU+GPU architectures efficiently, the data must be split so that both processing units (PUs) can perform the computations in parallel. Although this approach yields significant performance improvements, some applications can also be functionally split, as is the case for the Lattice-Boltzmann Method (LBM). In this work, we evaluate the performance of each kernel resulting from the functional decomposition of an OpenCL LBM implementation under non-uniform domain decompositions between the CPU and GPU of an APU, to better understand how different non-uniform decompositions affect each kernel. Experimental results on an AMD A10-7870K APU show that domain decompositions that are uniform across the kernels on the same PU but non-uniform between CPU and GPU affect each kernel differently. These results suggest that making the domain decomposition non-uniform across the kernels on the same PU, and not only between the different PUs, can further improve application performance.
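
To make the terminology concrete, the following is a minimal sketch of how a non-uniform domain decomposition might be expressed. It is not the authors' implementation: it assumes a lattice partitioned by rows, and the function names, the `gpu_fraction` parameter, and the kernel names `collide` and `stream` are hypothetical, introduced only for illustration.

```python
def split_rows(n_rows, gpu_fraction):
    """Split n_rows lattice rows into a CPU range and a GPU range.

    gpu_fraction = 0.5 is the uniform case; any other value gives a
    non-uniform decomposition. Ranges are half-open: (start, stop).
    This row-wise layout is an assumption, not taken from the paper.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in [0, 1]")
    gpu_rows = round(n_rows * gpu_fraction)
    cpu_rows = n_rows - gpu_rows
    # The CPU takes the first block of rows, the GPU the remainder.
    return (0, cpu_rows), (cpu_rows, n_rows)


def per_kernel_decomposition(n_rows, gpu_fraction_by_kernel):
    """Give each kernel its own CPU/GPU split.

    Using the same fraction for every kernel corresponds to a split that
    is uniform across kernels on a PU; different fractions per kernel
    make the decomposition non-uniform across kernels as well.
    """
    return {
        kernel: split_rows(n_rows, fraction)
        for kernel, fraction in gpu_fraction_by_kernel.items()
    }


if __name__ == "__main__":
    # Hypothetical kernel names and fractions, chosen only for the example.
    plan = per_kernel_decomposition(100, {"collide": 0.75, "stream": 0.5})
    for kernel, (cpu_range, gpu_range) in plan.items():
        print(kernel, "CPU rows", cpu_range, "GPU rows", gpu_range)
```

In an OpenCL setting, each range could plausibly map to the global work offset and size passed when a kernel is enqueued on the corresponding device, although the abstract does not say how the authors realize the split.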

Author information

Authors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, 9500, Brazil
Gabriel Freytag, Philippe Olivier Alexandre Navaux, Lucas Mello Schnorr & Paolo Rech
Universidade Federal de Santa Maria, Santa Maria, RS, 1000, Brazil
João Vicente Ferreira Lima

Corresponding author

Correspondence to Gabriel Freytag.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, São Paulo, Brazil
Hermes Senger
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Osni Marques
Universidade Estadual Paulista Júlio de Mesquita Filho, Presidente Prudente, São Paulo, Brazil
Rogerio Garcia
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Tatiana Pinheiro de Brito
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Rogério Iope
Universidade Estadual Paulista Júlio de Mesquita Filho, São Paulo, São Paulo, Brazil
Silvio Stanzani
Universidad Nacional de San Luis, San Luis, Argentina
Veronica Gil-Costa

About this paper

Cite this paper

Freytag, G., Navaux, P.O.A., Lima, J.V.F., Schnorr, L.M., Rech, P. (2019). Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_8

DOI: https://doi.org/10.1007/978-3-030-15996-2_8
Published: 26 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)
