Abstract
The Intel Knights Landing (KNL) processor offers unique features with regard to its memory hierarchy and vectorization capabilities. To improve tool support in these two areas, we present extensions to the Score-P measurement infrastructure and the Cube report explorer. With Knights Landing, Intel introduced a new memory architecture that combines two types of memory: MCDRAM and DDR4 SDRAM. To help users decide where to place data structures, we introduce an MCDRAM candidate metric in the Cube report explorer. In addition, we track all MCDRAM allocations made through the hbwmalloc interface, providing per-region memory metrics such as leaked memory and the high-water mark, as already available for the ubiquitous malloc/free. A Score-P metric plugin that records per-process memory statistics via numastat enables timeline analysis with the Vampir toolset. To get the best performance out of KNL, its large vector processing units (VPUs) must be utilized effectively. We therefore introduce the ratio of computation to data access and the VPU intensity as metrics for identifying vectorization candidates on a per-region basis. The Portable Hardware Locality (hwloc) library (Broquedis et al., 2010 [2]) allows us to visualize the distribution of these KNL-specific performance metrics in the Cube report explorer, taking into account the hardware topology of processor tiles and cores.
Notes
- 1.
- 2.
- 3. malloc, realloc, calloc, free, memalign, posix_memalign, valloc.
References
Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: HPCToolkit: tools for performance analysis of optimized parallel programs. Concurr. Comput.: Pract. Exper. 22(6), 685–701 (2010). http://hpctoolkit.org
Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: PDP 2010: The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Pisa, Italy, February 2010. IEEE (2010)
Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)
Eschweiler, D., Wagner, M., Geimer, M., Knüpfer, A., Nagel, W.E., Wolf, F.: Open trace format 2: the next generation of scalable trace formats and support libraries. In: Proceedings of the International Conference on Parallel Computing (ParCo), Ghent, Belgium, August 30–September 2 2011, vol. 22 of Advances in Parallel Computing, pp. 481–490. IOS Press (2012)
Intel Corporation. Intel architecture instruction set extensions programming reference. https://software.intel.com/isa-extensions
Intel® VTune™ Amplifier. https://software.intel.com/en-us/intel-vtune-amplifier-xe
Jurenz, M., Brendel, R., Knüpfer, A., Müller, M., Nagel, W.E.: Memory allocation tracing with VampirTrace, pp. 839–846. Springer, Berlin, Heidelberg (2007)
Knüpfer, A., Brunst, H., Doleschal, J., Jurenz, M., Lieber, M., Mickler, H., Müller, M.S., Nagel, W.E.: The Vampir performance analysis tool-set, pp. 139–155. Springer, Berlin, Heidelberg (2008)
Knüpfer, A., Rössel, C., an Mey, D., Biersdorff, S., Diethelm, K., Eschweiler, D., Geimer, M., Gerndt, M., Lorenz, D., Malony, A.D., Nagel, W.E., Oleynik, Y., Philippen, P., Saviankou, P., Schmidl, D., Shende, S.S., Tschüter, R., Wagner, M., Wesarg, B., Wolf, F.: Score-P: a joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir. In: Proceedings of 5th Parallel Tools Workshop, 2011, Dresden, Germany, pp. 79–91. Springer, Berlin, Heidelberg, September 2012
Liu, X., Mellor-Crummey, J.: A data-centric profiler for parallel programs. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2013, pp. 28:1–28:12. ACM, New York, NY, USA (2013)
Liu, X., Wu, B.: ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 47:1–47:12. ACM, New York, NY, USA (2015)
Lorenz, D., Böhme, D., Mohr, B., Strube, A., Szebenyi, Z.: Extending Scalasca’s analysis features. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M.M., Nagel, W.E. (eds.) Tools for High Performance Computing 2012, pp. 115–126. Springer, Berlin, Heidelberg (2013)
Mallinson, A.C., Beckingsale, D.A., Gaudin, W.P., Herdman, J.A., Levesque, J.M., Jarvis, S.A.: CloverLeaf: preparing hydrodynamics codes for exascale. In: A New Vintage of Computing: CUG2013. Cray User Group, Inc. (2013)
Marconi, new CINECA tier-0 system. http://www.hpc.cineca.it/hardware/marconi
Reinders, J., Jeffers, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming, Knights Landing edn. Morgan Kaufmann Publishers Inc., Boston, MA, USA (2016)
Saviankou, P., Knobloch, M., Visser, A., Mohr, B.: Cube v4: From performance report explorer to performance analysis tool. Proced. Comput. Sci. 51, 1343–1352 (2015)
Schöne, R., Tschüter, R., Ilsche, T., Hackenberg, D.: The VampirTrace Plugin Counter Interface: Introduction and Examples, pp. 501–511. Springer, Berlin, Heidelberg (2011)
Shende, S.S., Malony, A.D.: The TAU parallel performance system. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006)
Treibig, J., Hager, G., Wellein, G.: LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of PSTI 2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA (2010)
Van der Wijngaart, R.F., Jin, H.: NAS Parallel Benchmarks, Multi-Zone versions. Technical Report NAS-03-010, NASA Ames Research Center, Moffett Field, CA, USA, July 2003. http://www.nas.nasa.gov/Software/NPB/
Wylie, B.J.N., Mohr, B., Wolf, F.: Holistic hardware counter performance analysis of parallel programs. In: Proceedings of the Conference on Parallel Computing (ParCo), Malaga, Spain, pp. 187–194, September 2005
Acknowledgements
We would like to express our thanks to Intel Corporation, who supported this work by the Intel Gift Grant “SCIPHI—Score-P and Cube extensions for Intel PHI”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Schlütter, M., Feld, C., Saviankou, P., Knobloch, M., Hermanns, MA., Mohr, B. (2019). SCIPHI: Score-P and Cube Extensions for Intel Phi. In: Niethammer, C., Resch, M., Nagel, W., Brunst, H., Mix, H. (eds) Tools for High Performance Computing 2017. PTHPC 2017. Springer, Cham. https://doi.org/10.1007/978-3-030-11987-4_6
DOI: https://doi.org/10.1007/978-3-030-11987-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11986-7
Online ISBN: 978-3-030-11987-4
eBook Packages: Mathematics and Statistics (R0)