Skip to main content

SCIPHI Score-P and Cube Extensions for Intel Phi

  • Conference paper
  • First Online:
Tools for High Performance Computing 2017 (PTHPC 2017)

Abstract

The Knights Landing processors offers unique features with regards to memory hierarchy and vectorization capabilities. To improve tool support within these two areas, we present extensions to the Score-P measurement infrastructure and the Cube report explorer. With the Knights Landing edition, Intel introduced a new memory architecture, utilizing two types of memory, MCDRAM and DDR4 SDRAM. To assist the user in the decision where to place data structures, we introduce a MCDRAM candidate metric to the Cube report explorer. In addition we track all MCDRAM allocations through the hbwmalloc interface, providing memory metrics like leaked memory or the high-water mark on a per-region basis, as already known for the ubiquitous malloc/free. A Score-P metric plugin that records memory statistics via numastat on a per process level enables a timeline analysis using the Vampir toolset. To get the best performance out of , the large vector processing units need to be utilized effectively. The ratio between computation and data access and the vector processing unit (VPU) intensity are introduced as metrics to identify vectorization candidates on a per-region basis. The Portable Hardware Locality (hwloc) Broquedis et al. (hwloc: a generic framework for managing hardware affinities in hpc applications, 2010 [2]) library allows us to visualize the distribution of the KNL-specific performance metrics within the Cube report explorer, taking the hardware topology consisting of processor tiles and cores into account.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://github.com/memkind.

  2. 2.

    http://jemalloc.net/.

  3. 3.

    malloc,realloc,calloc,free,memalign,posix_memalign,valloc.

References

  1. Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: Hpctoolkit: tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exper., 22(6):685–701, April 2010 http://hpctoolkit.org

  2. Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in hpc applications. In IEEE, editor, PDP: The 18th Euromicro International Conference on Parallel, p. 2010. Distributed and Network-Based Computing, Pisa, Italy, February (2010)

    Google Scholar 

  3. Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)

    Article  Google Scholar 

  4. Eschweiler, D., Wagner, M., Geimer, M., Knüpfer, A., Nagel, W.E., Wolf, F.: Open trace format 2—the next generation of scalable trace formats and support libraries. In: Proceedings of the International Conference on Parallel Computing (ParCo), Ghent, Belgium, August 30–September 2 2011, vol. 22 of Advances in Parallel Computing, pp. 481–490. IOS Press (2012)

    Google Scholar 

  5. Intel Corporation. Intel architecture instruction set extensions programming reference. https://software.intel.com/isa-extensions

  6. Intel\(^{R}\) VTune\(^{{\rm TM}}\) amplifier. https://software.intel.com/en-us/intel-vtune-amplifier-xe

  7. Jurenz, M., Brendel, R., Knüpfer, A., Müller, M., Nagel, W.E.: Memory allocation tracing with VampirTrace, pp. 839–846. Springer, Berlin, Heidelberg (2007)

    Google Scholar 

  8. Knüpfer, A., Brunst, H., Doleschal, J., Jurenz, M., Lieber, M., Mickler, H., Müller, M.S., Nagel, W.E.: The Vampir performance analysis tool-set, pp. 139–155. Springer, Berlin, Heidelberg (2008)

    Google Scholar 

  9. Knüpfer, A., Rössel, C., an Mey, D., Biersdorff, S., Diethelm, K., Eschweiler, D., Geimer, M., Gerndt, M., Lorenz, D., Malony, A.D., Nagel, W,E., Oleynik, Y., Philippen, P., Saviankou, P., Schmidl, D., Shende, S.S., Tschüter, R., Wagner, M., Wesarg, B., Wolf, F.: Score-P—a joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir. In: Proceedings of 5th Parallel Tools Workshop, 2011, Dresden, Germany, pp. 79–91. Springer, Berlin, Heidelberg, September 2012

    Google Scholar 

  10. Liu, X., Mellor-Crummey, J.: A data-centric profiler for parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2013, pp. 28:1–28:12. ACM, New York, NY, USA (2013)

    Google Scholar 

  11. Liu, X., Wu, B.: Scaanalyzer: a tool to identify memory scalability bottlenecks in parallel programs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pages 47:1–47:12. ACM, New York, NY, USA (2015)

    Google Scholar 

  12. Lorenz, D., Böhme, D., Mohr, B., Strube, A., Szebenyi, Z.: Extending Scalasca’s analysis features. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M.M., Nagel, W.E. (eds.) Tools for High Performance Computing 2012, pp. 115–126. Springer, Berlin, Heidelberg (2013)

    Chapter  Google Scholar 

  13. Mallinson, A.C., Beckingsale, D.A., Gaudin, W.P., Herdman, J.A., Levesque, J.M., Jarvis, S.A.: Cloverleaf: preparing hydrodynamics codes for exascale. In: A New Vintage of Computing: CUG2013. Cray User Group, Inc. (2013)

    Google Scholar 

  14. Marconi, new CINECA tier-0 system. http://www.hpc.cineca.it/hardware/marconi

  15. Reinders, J., Jeffers, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming Knights, Landing edn. Morgan Kaufmann Publishers Inc., Boston, MA, USA (2016)

    Google Scholar 

  16. Saviankou, P., Knobloch, M., Visser, A., Mohr, B.: Cube v4 From performance report explorer to performance analysis tool. Proced. Comput. Sci. 51, 1343–1352 (2015)

    Article  Google Scholar 

  17. Schöne, R., Tschüter, R., Ilsche, T., Hackenberg, D.: The VampirTrace Plugin Counter Interface: Introduction and Examples, pp. 501–511. Springer, Berlin, Heidelberg (2011)

    Google Scholar 

  18. Shende, S.S., Malony, A.D.: The tau parallel performance system. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006)

    Article  Google Scholar 

  19. Treibig, J., Hager, G., Wellein, G.: Likwid: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of PSTI 2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA (2010)

    Google Scholar 

  20. Van der Wijngaart, R.F., Jin, H.: NAS Parallel Benchmarks, Multi-Zone versions. Technical Report NAS-03-010, NASA Ames Research Center, Moffett Field, CA, USA, July 2003. http://www.nas.nasa.gov/Software/NPB/

  21. Wylie, B.J.N., Mohr, B., Wolf, F.: Holistic hardware counter performance analysis of parallel programs. In: Proceedings of the Conference on Parallel Computing (ParCo), Malaga, Spain, pp. 187–194, September 2005

    Google Scholar 

Download references

Acknowledgements

We would like to express our thanks to Intel Corporation, who supported this work by the Intel Gift Grant “SCIPHI—Score-P and Cube extensions for Intel PHI”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marc Schlütter .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Schlütter, M., Feld, C., Saviankou, P., Knobloch, M., Hermanns, MA., Mohr, B. (2019). SCIPHI Score-P and Cube Extensions for Intel Phi. In: Niethammer, C., Resch, M., Nagel, W., Brunst, H., Mix, H. (eds) Tools for High Performance Computing 2017. PTHPC 2017. Springer, Cham. https://doi.org/10.1007/978-3-030-11987-4_6

Download citation

Publish with us

Policies and ethics