Abstract
Performance anomalies involving interconnection networks have largely remained a “black box” for developers relying on traditional CPU profilers. Network-side profilers collect aggregate statistics and lack source-code attribution. We have incorporated an effective protocol extension into the Gen-Z communication protocol for tagging network packets in an interconnection network; additionally, we have backed the protocol extension with hardware and software enhancements that allow tracking the flow of a network transaction through every hop in the interconnection network and associating it back to the application source code. The result is a first-of-its-kind hardware-assisted telemetry of disparate, autonomous interconnection networking components with application source-code association, which offers developers better insight. Our scheme works on a sampling basis, which keeps runtime overhead low and the volume of generated data modest. Simulation of our methods in the open-source Structural Simulation Toolkit (SST/Macro) shows their effectiveness: the developer gains deep insight into the underlying network details at minimal overhead.
M. Chabbi—Work done while at Hewlett Packard Labs.
Acknowledgments
This work was supported (in part) by the US Department of Energy (DOE) under Cooperative Agreement DE-SC0012199, the Blackcomb 2 Project.
Appendices
A NWChem Profiles from HPCToolkit
B Profiles of NCAST Program
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Yoga, A., Chabbi, M. (2018). Path-Synchronous Performance Monitoring in HPC Interconnection Networks with Source-Code Attribution. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2017. Lecture Notes in Computer Science, vol 10724. Springer, Cham. https://doi.org/10.1007/978-3-319-72971-8_11
DOI: https://doi.org/10.1007/978-3-319-72971-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72970-1
Online ISBN: 978-3-319-72971-8
eBook Packages: Computer Science (R0)