Abstract
This paper presents the on-node performance tuning of a multi-block Euler solver for turbomachinery computations.
Our work focuses on vertical and horizontal scaling within an x86 multi-socket compute node, exploiting the fine-grained parallelism available through SIMD instructions at the core level and thread-level parallelism across the die through shared memory. We report on the challenges encountered in enabling efficient vectorization using both compiler directives and intrinsics, with an emphasis on data structure transformations and their performance impact on vector computations.
Finally, we present the solver performance on different grid sizes running on Intel Sandy Bridge and Ivy Bridge processors.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hadade, I., di Mare, L. (2014). Exploiting SIMD and Thread-Level Parallelism in Multiblock CFD. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_26
DOI: https://doi.org/10.1007/978-3-319-07518-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science (R0)