Using Intel Xeon Phi Coprocessor to Accelerate Computations in MPDATA Algorithm

Szustak, Lukasz; Rojek, Krzysztof; Gepner, Pawel

doi:10.1007/978-3-642-55224-3_54

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms, and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model.

The Intel Xeon Phi coprocessor is the first product based on the Intel Many Integrated Core (Intel MIC) architecture. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. The approach combines temporal and spatial blocking techniques, which eases memory and communication bounds and better exploits the theoretical floating-point efficiency of the target computing platforms. To utilize the computing resources available in Intel Xeon Phi, the proposed approach employs two main levels of parallelism: (i) task parallelism, which allows for utilization of more than 200 logical cores, and (ii) data parallelism, which makes efficient use of the 512-bit vector processing units.
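As an illustration only, the idea of combining temporal and spatial blocking can be sketched on a toy problem. The sketch below is not the MPDATA implementation from the paper: it assumes a 1D three-point averaging stencil with fixed end points, uses overlapped tiles (each tile recomputes a halo as wide as the number of fused time steps), and all names (`stencil_step`, `naive_sweep`, `blocked_sweep`) are hypothetical. Python is used for readability; it says nothing about how the authors map tiles to cores or vector units.

```python
def stencil_step(u):
    """One sweep of a 3-point averaging stencil; the two end points stay fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive_sweep(u, steps):
    """Reference version: every time step streams over the whole array."""
    for _ in range(steps):
        u = stencil_step(u)
    return u


def blocked_sweep(u, steps, block):
    """Spatial blocking (tiles of `block` points) fused with temporal blocking
    (`steps` time steps per tile) using overlapped, redundantly computed halos.
    """
    n = len(u)
    result = u[:]
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Extend the tile by one halo point per fused time step, clamped
        # to the array bounds (assumption: fixed-value boundaries).
        lo = max(start - steps, 0)
        hi = min(stop + steps, n)
        tile = u[lo:hi]
        # All time steps run on the small tile, which is the part that is
        # meant to stay in cache on real hardware.
        for _ in range(steps):
            tile = stencil_step(tile)
        # Halo points next to an artificial tile edge are stale after the
        # sweeps; only the tile's own points are copied back.
        result[start:stop] = tile[start - lo:stop - lo]
    return result


# Small self-check: the blocked sweep reproduces the naive sweep.
u0 = [float((i * 7) % 11) for i in range(50)]
reference = naive_sweep(u0, 4)
blocked = blocked_sweep(u0, 4, 8)
max_diff = max(abs(a - b) for a, b in zip(reference, blocked))
```

In this sketch the tiles are independent of one another, so in a compiled implementation they could be distributed across threads (task parallelism), while the inner loop of `stencil_step` is the kind of loop a compiler can vectorize (data parallelism). The price is the redundant halo work, which grows with the number of fused time steps.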

We discuss performance results obtained on two platforms: one with two Intel Xeon E5-2643 CPUs and an Intel Xeon Phi 3120A, and the other with two Intel Xeon E5-2697 v2 CPUs and an Intel Xeon Phi 7120P. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results in all tests. Notably, this coprocessor executes the MPDATA algorithm 2 times faster than two Intel Xeon E5-2697 v2 CPUs, and 2.86 times faster than two Intel Xeon E5-2643 processors. Both the utilization of the many cores of Intel Xeon Phi and vectorization play a leading role in achieving this performance.

Acknowledgments

This work was supported in part by the Polish National Science Centre under grant no. UMO-2011/03/B/ST6/03500.

We gratefully acknowledge the help and support provided by Jamie Wilcox from Intel EMEA Technical Marketing HPC Lab.

Author information

Authors and Affiliations

Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Lukasz Szustak & Krzysztof Rojek
Intel Corporation, Swindon, UK
Pawel Gepner

Corresponding author

Correspondence to Lukasz Szustak.

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Department of Computer Science, Knoxville, Tennessee, USA
Jack Dongarra
Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski
Technical University of Denmark Informatics and Mathematical Modelling, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Szustak, L., Rojek, K., Gepner, P. (2014). Using Intel Xeon Phi Coprocessor to Accelerate Computations in MPDATA Algorithm. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_54

DOI: https://doi.org/10.1007/978-3-642-55224-3_54
Published: 06 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55223-6
Online ISBN: 978-3-642-55224-3
eBook Packages: Computer Science, Computer Science (R0)
