Assessing the Performance of OpenMP Programs on the Knights Landing Architecture

  • Conference paper
Scaling OpenMP for Exascale Performance and Portability (IWOMP 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10468)

Abstract

Intel’s Knights Landing processor (KNL) is the latest product in the Xeon Phi product line. As a self-hosted system, it is the first commercially available many-core architecture that can run unmodified applications. This makes KNL a very interesting option for HPC centers that have to support many different applications, including community and ISV codes, where code changes are hard or impossible. Of course, running an application and running it efficiently are not the same, so it remains to be investigated how efficiently KNL executes unmodified codes from x86 servers.

In this work, we investigate the Knights Landing architecture with a focus on its ability to run OpenMP applications efficiently. Kernel benchmarks are used to investigate basic characteristics such as memory latency and bandwidth. Furthermore, application-like benchmarks such as the NAS Parallel Benchmarks and the SPEC OpenMP benchmarks are used, as well as real applications from RWTH Aachen University. The performance is compared to that of a 2-socket Broadwell system. We consider this a fair comparison, as both architectures are state of the art today and both cost roughly the same amount of money and consume the same amount of energy.

Author information

Corresponding author

Correspondence to Dirk Schmidl.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schmidl, D., Wang, B., Müller, M.S. (2017). Assessing the Performance of OpenMP Programs on the Knights Landing Architecture. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_20

  • DOI: https://doi.org/10.1007/978-3-319-65578-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65577-2

  • Online ISBN: 978-3-319-65578-9

  • eBook Packages: Computer Science, Computer Science (R0)
