SCIPHI Score-P and Cube Extensions for Intel Phi

  • Marc Schlütter
  • Christian Feld
  • Pavel Saviankou
  • Michael Knobloch
  • Marc-André Hermanns
  • Bernd Mohr
Conference paper


The Intel Xeon Phi Knights Landing processors offer unique features with regard to memory hierarchy and vectorization capabilities. To improve tool support within these two areas, we present extensions to the Score-P measurement infrastructure and the Cube report explorer. With the Knights Landing edition, Intel introduced a new memory architecture that uses two types of memory, MCDRAM and DDR4 SDRAM. To assist the user in deciding where to place data structures, we introduce an MCDRAM candidate metric to the Cube report explorer. In addition, we track all MCDRAM allocations made through the hbwmalloc interface, providing memory metrics such as leaked memory or the high-water mark on a per-region basis, as already known for the ubiquitous malloc/free. A Score-P metric plugin that records memory statistics via numastat on a per-process level enables a timeline analysis using the Vampir toolset. To get the best performance out of these processors, the large vector processing units need to be utilized effectively. The ratio between computation and data access and the vector processing unit (VPU) intensity are introduced as metrics to identify vectorization candidates on a per-region basis. The Portable Hardware Locality (hwloc) library [2] allows us to visualize the distribution of the KNL-specific performance metrics within the Cube report explorer, taking the hardware topology consisting of processor tiles and cores into account.
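To make the per-region memory metrics concrete, the following is a minimal sketch of how a high-water mark and leaked bytes could be derived from a stream of allocation and free events. It is an illustration only and not Score-P's implementation: the class `AllocationTracker`, its method names, the region names, and the addresses are all hypothetical, and memory is attributed to the region that allocated it, which is an assumption.

```python
from collections import defaultdict


class AllocationTracker:
    """Toy per-region allocation bookkeeping (hypothetical, not Score-P code)."""

    def __init__(self):
        self.live = {}                      # address -> (size, allocating region)
        self.current = defaultdict(int)     # region -> bytes currently allocated
        self.high_water = defaultdict(int)  # region -> peak bytes seen so far

    def on_alloc(self, region, address, size):
        # Record the allocation and update the region's peak if it grew.
        self.live[address] = (size, region)
        self.current[region] += size
        self.high_water[region] = max(self.high_water[region],
                                      self.current[region])

    def on_free(self, address):
        # Credit the bytes back to the region that allocated them.
        size, region = self.live.pop(address)
        self.current[region] -= size

    def leaked(self):
        # Anything still live at the end of the run counts as leaked.
        out = defaultdict(int)
        for size, region in self.live.values():
            out[region] += size
        return dict(out)


# Assumed event stream: two allocations in "solver", one freed;
# one allocation in "io", freed again.
tracker = AllocationTracker()
tracker.on_alloc("solver", 0x1000, 4096)
tracker.on_alloc("solver", 0x2000, 8192)
tracker.on_free(0x1000)
tracker.on_alloc("io", 0x3000, 1024)
tracker.on_free(0x3000)

print(dict(tracker.high_water))
print(tracker.leaked())
```

In a real measurement system the events would come from wrapped allocation calls (for MCDRAM, the hbwmalloc functions) rather than from hand-written calls as in this sketch.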



We would like to express our thanks to Intel Corporation, who supported this work by the Intel Gift Grant “SCIPHI—Score-P and Cube extensions for Intel PHI”.


  1. Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: HPCToolkit: tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exper. 22(6), 685–701 (2010)
  2. Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: PDP 2010: The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Pisa, Italy, February 2010. IEEE (2010)
  3. Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)
  4. Eschweiler, D., Wagner, M., Geimer, M., Knüpfer, A., Nagel, W.E., Wolf, F.: Open trace format 2—the next generation of scalable trace formats and support libraries. In: Proceedings of the International Conference on Parallel Computing (ParCo), Ghent, Belgium, August 30–September 2 2011, vol. 22 of Advances in Parallel Computing, pp. 481–490. IOS Press (2012)
  5. Intel Corporation: Intel architecture instruction set extensions programming reference.
  6. Intel® VTune™ Amplifier.
  7. Jurenz, M., Brendel, R., Knüpfer, A., Müller, M., Nagel, W.E.: Memory allocation tracing with VampirTrace, pp. 839–846. Springer, Berlin, Heidelberg (2007)
  8. Knüpfer, A., Brunst, H., Doleschal, J., Jurenz, M., Lieber, M., Mickler, H., Müller, M.S., Nagel, W.E.: The Vampir performance analysis tool-set, pp. 139–155. Springer, Berlin, Heidelberg (2008)
  9. Knüpfer, A., Rössel, C., an Mey, D., Biersdorff, S., Diethelm, K., Eschweiler, D., Geimer, M., Gerndt, M., Lorenz, D., Malony, A.D., Nagel, W.E., Oleynik, Y., Philippen, P., Saviankou, P., Schmidl, D., Shende, S.S., Tschüter, R., Wagner, M., Wesarg, B., Wolf, F.: Score-P—a joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir. In: Proceedings of 5th Parallel Tools Workshop, 2011, Dresden, Germany, pp. 79–91. Springer, Berlin, Heidelberg (2012)
  10. Liu, X., Mellor-Crummey, J.: A data-centric profiler for parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2013, pp. 28:1–28:12. ACM, New York, NY, USA (2013)
  11. Liu, X., Wu, B.: ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 47:1–47:12. ACM, New York, NY, USA (2015)
  12. Lorenz, D., Böhme, D., Mohr, B., Strube, A., Szebenyi, Z.: Extending Scalasca’s analysis features. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M.M., Nagel, W.E. (eds.) Tools for High Performance Computing 2012, pp. 115–126. Springer, Berlin, Heidelberg (2013)
  13. Mallinson, A.C., Beckingsale, D.A., Gaudin, W.P., Herdman, J.A., Levesque, J.M., Jarvis, S.A.: CloverLeaf: preparing hydrodynamics codes for exascale. In: A New Vintage of Computing: CUG2013. Cray User Group, Inc. (2013)
  14. Marconi, new CINECA tier-0 system.
  15. Reinders, J., Jeffers, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming, Knights Landing edn. Morgan Kaufmann Publishers Inc., Boston, MA, USA (2016)
  16. Saviankou, P., Knobloch, M., Visser, A., Mohr, B.: Cube v4: From performance report explorer to performance analysis tool. Proced. Comput. Sci. 51, 1343–1352 (2015)
  17. Schöne, R., Tschüter, R., Ilsche, T., Hackenberg, D.: The VampirTrace Plugin Counter Interface: Introduction and Examples, pp. 501–511. Springer, Berlin, Heidelberg (2011)
  18. Shende, S.S., Malony, A.D.: The TAU parallel performance system. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006)
  19. Treibig, J., Hager, G., Wellein, G.: LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of PSTI 2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA (2010)
  20. Van der Wijngaart, R.F., Jin, H.: NAS Parallel Benchmarks, Multi-Zone versions. Technical Report NAS-03-010, NASA Ames Research Center, Moffett Field, CA, USA (2003)
  21. Wylie, B.J.N., Mohr, B., Wolf, F.: Holistic hardware counter performance analysis of parallel programs. In: Proceedings of the Conference on Parallel Computing (ParCo), Malaga, Spain, pp. 187–194 (2005)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marc Schlütter (1)
  • Christian Feld (1)
  • Pavel Saviankou (1)
  • Michael Knobloch (1)
  • Marc-André Hermanns (2)
  • Bernd Mohr (2)

  1. Forschungszentrum Jülich GmbH JSC, Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
  2. JARA-HPC, Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
