Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

Schmidl, Dirk; Cramer, Tim; Wienke, Sandra; Terboven, Christian; Müller, Matthias S.

doi:10.1007/978-3-642-40047-6_56

Dirk Schmidl^19,21,
Tim Cramer^19,21,
Sandra Wienke^19,21,
Christian Terboven^19,21 &
…
Matthias S. Müller^19,20,21

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8097))

Included in the following conference series:

European Conference on Parallel Processing

4052 Accesses
36 Citations

Abstract

The Intel Xeon Phi has been introduced as a new type of compute accelerator that is capable of executing native x86 applications. It supports programming models that are well-established in the HPC community, namely MPI and OpenMP, thus removing the necessity to refactor codes for using accelerator-specific programming paradigms. Because of its native x86 support, the Xeon Phi may also be used stand-alone, meaning codes can be executed directly on the device without the need for interaction with a host. In this sense, the Xeon Phi resembles a big SMP on a chip if its 240 logical cores are compared to a common Xeon-based compute node offering up to 32 logical cores. In this work, we compare a Xeon-based two-socket compute node with the Xeon Phi stand-alone in scalability and performance using OpenMP codes. Considering both as individual SMP systems, they come at a very similar price and power envelope, but our results show significant differences in absolute application performance and scalability. We also show in how far common programming idioms for the Xeon multi-core architecture are applicable for the Xeon Phi many-core architecture and which challenges the changing ratio of core count to single core performance poses for the application programmer.

Parts of this work were funded by the German Federal Ministry of Research and Education (BMBF) under Grant No. 01IH11006.

Download to read the full chapter text

Chapter PDF

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Portability and Scalability of OpenMP Offloading on State-of-the-Art Accelerators

HIPLZ: Enabling Performance Portability for Exascale Systems

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Fatoohi, R.A., Frederickson, P.O., Lasinski, T.A., Simon, H.D., Venkatakrishnan, V., Weeratunga, S.K.: The nas parallel benchmarks. Technical report, NASA Ames Research Center (1991)
Google Scholar
Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proc. of the Conference on High Performance Computing Networking, Storage and Analysis, SC 2009, pp. 18:1–18:11. ACM, New York (2009)
Google Scholar
Bücker, H.M., Beucker, R., Rupp, A.: Parallel Minimum p-Norm Solution of the Neuromagnetic Inverse Problem for Realistic Signals Using Exact Hessian-Vector Products. SIAM J. on Scientific Computing 30(6), 2905–2921 (2008)
Article MATH Google Scholar
Bull, J.M.: Measuring Synchronisation and Scheduling Overheads in OpenMP. In: Proc. of First European Workshop on OpenMP, pp. 99–105 (1999)
Google Scholar
Terboven, C., an Mey, D., Schmidl, D., Jin, H., Wagner, M.: Data and Thread Affinity in OpenMP Programs. In: Proc. of the 2008 Workshop on Memory Access on Future Processors: A Solved Problem?, MAW 2008, pp. 377–384. ACM (2008)
Google Scholar
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance study of general-purpose applications on graphics processors using CUDA. J. Parallel Distrib. Comput. 68(10), 1370–1380 (2008)
Article Google Scholar
Cramer, T., Schmidl, D., Klemm, M., an Mey, D.: OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison. In: Proc. of the Many-core Applications Research Community Symposium, pp. 38–44 (November 2012)
Google Scholar
Davis, T.A.: University of Florida Sparse Matrix Collection. NA Digest 92 (1994)
Google Scholar
Deselaers, T., Keysers, D., Ney, H.: Features for image retrieval: an experimental comparison. Information Retrieval 11(2), 77–107 (2008)
Article Google Scholar
Gerndt, A., Sarholz, S., Wolter, M., an Mey, D., Bischof, C., Kuhlen, T.: Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets. In: SC 2006 Conference, Proc. of the ACM/IEEE 2006, p. 46 (November 2006)
Google Scholar
Hestenes, M.R., Stiefel, E.: Methods of Conjugate Gradients for Solving Linear Systems. J. of Research of the National Bureau of Standards 49(6), 409–436 (1952)
Article MathSciNet MATH Google Scholar
McCalpin, J.: STREAM: Sustainable Memory Bandwidth in High Performance Computers
Google Scholar
McVoy, L., Staelin, C.: lmbench: portable tools for performance analysis. In: Proc. of the 1996 Annual Conference on USENIX, ATEC 1996, p. 23. USENIX Association, Berkeley (1996)
Google Scholar
Park, J., Tang, P.T.P., Smelyanskiy, M., Kim, D., Benson, T.: Efficient backprojection-based synthetic aperture radar computation with many-core processors. In: Proc. of the Int. Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 28:1–28:11. IEEE Computer Society Press, Los Alamitos (2012)
Google Scholar
Schulz, K.W., Ulerich, R., Malaya, N., Bauman, P.T., Stogner, R., Simmons, C.: Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform. Technical report, TACC-Intel Highly Parallel Computing Symposium (April 2012)
Google Scholar
Terboven, C., Schmidl, D., Cramer, T., an Mey, D.: Assessing OpenMP Tasking Implementations on NUMA Architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 182–195. Springer, Heidelberg (2012)
Chapter Google Scholar
Terboven, C., Schmidl, D., Cramer, T., an Mey, D.: Task-Parallel Programming on NUMA Architectures. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 638–649. Springer, Heidelberg (2012)
Chapter Google Scholar
Volkov, V., Demmel, J.W.: Benchmarking GPUs to tune dense linear algebra. In: Proc. of the 2008 ACM/IEEE Conference on Supercomputing, SC 2008 (2008)
Google Scholar
Wienke, S., Plotnikov, D., an Mey, D., Bischof, C., Hardjosuwito, A., Gorgels, C., Brecher, C.: Simulation of bevel gear cutting with GPGPUs - performance and productivity. Computer Science - Research and Development 26, 165–174 (2011)
Article Google Scholar
Williams, S., Kalamkar, D.D., Singh, A., Deshpande, A.M., Van Straalen, B., Smelyanskiy, M., Almgren, A., Dubey, P., Shalf, J., Oliker, L.: Optimization of geometric multigrid for emerging multi- and manycore processors. In: Proc. of the Int. Conference on HPC, Networking, Storage and Analysis, SC 2012 (2012)
Google Scholar
Williams, S., Waterman, A., Patterson, D.: Roofline: An Insightful Visual Performance Model for Multicore Architectures. Commun. ACM 52(4), 65–76 (2009)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Center for Computing and Communication, RWTH Aachen University, D - 52074, Aachen, Germany
Dirk Schmidl, Tim Cramer, Sandra Wienke, Christian Terboven & Matthias S. Müller
Chair for High Performance Computing, RWTH Aachen University, D - 52074, Aachen, Germany
Matthias S. Müller
JARA High-Performance Computing, Schinkelstrae 2, D 52062, Aachen, Germany
Dirk Schmidl, Tim Cramer, Sandra Wienke, Christian Terboven & Matthias S. Müller

Authors

Dirk Schmidl
View author publications
You can also search for this author in PubMed Google Scholar
Tim Cramer
View author publications
You can also search for this author in PubMed Google Scholar
Sandra Wienke
View author publications
You can also search for this author in PubMed Google Scholar
Christian Terboven
View author publications
You can also search for this author in PubMed Google Scholar
Matthias S. Müller
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

German Research School for Simulation Sciences, RWTH Aachen, Schinkelstr. 2a, 52062, Aachen, Germany
Felix Wolf
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Station 22,, 52425, Jülich, Germany
Bernd Mohr
Center for Computing and Communication, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schmidl, D., Cramer, T., Wienke, S., Terboven, C., Müller, M.S. (2013). Assessing the Performance of OpenMP Programs on the Intel Xeon Phi. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_56

Download citation

DOI: https://doi.org/10.1007/978-3-642-40047-6_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

Abstract

Chapter PDF

Similar content being viewed by others

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Portability and Scalability of OpenMP Offloading on State-of-the-Art Accelerators

HIPLZ: Enabling Performance Portability for Exascale Systems

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

Abstract

Chapter PDF

Similar content being viewed by others

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Portability and Scalability of OpenMP Offloading on State-of-the-Art Accelerators

HIPLZ: Enabling Performance Portability for Exascale Systems

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation