Abstract
Heterogeneous architectures have become indispensable for optimizing application performance. One of the most popular heterogeneous architectures today is the discrete CPU+GPU system. Despite the high computational power of such architectures, data transfers between CPU and GPU memory are in many cases a significant performance bottleneck. To mitigate the cost of these transfers, chipmakers started to integrate CPU and GPU cores on the same die, sharing the same main memory but with different memory address spaces, in architectures called APUs (Accelerated Processing Units). To exploit heterogeneous CPU+GPU architectures efficiently, the data must be split so that both processing units (PUs) can perform the computations in parallel. Although this approach yields significant performance improvements, some applications can also be split functionally, as is the case of the Lattice-Boltzmann Method (LBM). In this work, we evaluate the performance of each kernel resulting from the functional decomposition of an OpenCL LBM implementation, using non-uniform domain decomposition between CPU and GPU on an APU, to better understand how different non-uniform decompositions between CPU and GPU affect each kernel. Experimental results on an AMD A10-7870K APU show that domain decompositions that are uniform across the kernels on the same PU but non-uniform between CPU and GPU affect each kernel differently. These results suggest that decompositions that are non-uniform across the kernels on the same PU, and not only between the different PUs, can further improve the performance of the application.
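The idea of giving each kernel its own CPU/GPU split can be illustrated with a minimal sketch. It is not the authors' implementation: the row-wise split, the function names `split_rows` and `plan`, the kernel names `collide` and `stream`, and the fractions are all hypothetical, chosen only to show what a non-uniform decomposition per kernel might look like.

```python
def split_rows(n_rows, gpu_fraction):
    """Split lattice rows [0, n_rows) into a CPU part and a GPU part.

    gpu_fraction is the share of rows assigned to the GPU; any value other
    than 0.5 gives a non-uniform decomposition between the two PUs.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in [0, 1]")
    gpu_rows = round(n_rows * gpu_fraction)
    cpu_rows = n_rows - gpu_rows
    # CPU takes the first block of rows, GPU takes the rest.
    return range(0, cpu_rows), range(cpu_rows, n_rows)


def plan(n_rows, fractions):
    """Build one (cpu_rows, gpu_rows) split per kernel."""
    return {name: split_rows(n_rows, f) for name, f in fractions.items()}


# Hypothetical kernel names and GPU shares, for illustration only.
kernel_gpu_fraction = {"collide": 0.75, "stream": 0.5}
decomposition = plan(100, kernel_gpu_fraction)

for name, (cpu, gpu) in decomposition.items():
    print(name, "CPU rows:", len(cpu), "GPU rows:", len(gpu))
```

Using the same fraction for every kernel would correspond to a decomposition that is uniform across kernels on the same PU; different fractions per kernel, as in this sketch, correspond to the non-uniform case the abstract suggests exploring.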
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Freytag, G., Navaux, P.O.A., Lima, J.V.F., Schnorr, L.M., Rech, P. (2019). Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_8
DOI: https://doi.org/10.1007/978-3-030-15996-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)