The Pitfalls of Provisioning Exascale Networks: A Trace Replay Analysis for Understanding Communication Performance

  • Joseph P. Kenny
  • Khachik Sargsyan
  • Samuel Knight
  • George Michelogiannakis
  • Jeremiah J. Wilke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10876)

Abstract

Data movement is considered the main performance concern for exascale systems, encompassing both on-node memory traffic and off-node network communication. Indeed, many application traces show significant time spent in MPI calls, potentially indicating that faster networks must be provisioned for scalability. However, equating MPI time with network communication delay ignores synchronization delays and software overheads that are independent of the network hardware. Using point-to-point protocol details, we explore the decomposition of MPI time into communication, synchronization, and software-stack components using architecture simulation. Detailed validation using Bayesian inference identifies the sensitivity of performance to specific latency/bandwidth parameters for different network protocols and quantifies the associated uncertainties. The inference, combined with trace replay, shows that synchronization and MPI software-stack overhead are at least as important as the network itself in determining time spent in communication routines.
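As a toy illustration of the decomposition described above (not the paper's simulator), the time a receiver spends inside a blocking MPI receive can be split into a late-sender synchronization delay, a network-transfer term under a simple latency/bandwidth ("postal") model, and a fixed software-stack overhead. All parameter values below are hypothetical placeholders, not measurements from the paper.

```python
# Sketch: decompose time spent in a blocking MPI receive into
# synchronization, network, and software-stack components.
# Parameters (alpha, beta, sw_overhead) are illustrative only.

def decompose_recv_time(recv_post, send_post, size,
                        alpha=1.5e-6,      # per-message latency [s] (hypothetical)
                        beta=10e9,         # link bandwidth [B/s] (hypothetical)
                        sw_overhead=0.5e-6):  # matching/protocol cost [s]
    """Return (sync, network, software) components in seconds.

    recv_post / send_post are the times at which receiver and sender
    enter their MPI calls. If the sender arrives late, the receiver's
    wait includes a synchronization delay that no faster network removes.
    """
    sync = max(0.0, send_post - recv_post)  # late-sender delay
    network = alpha + size / beta           # postal-model transfer time
    software = sw_overhead                  # MPI software-stack overhead
    return sync, network, software

if __name__ == "__main__":
    sync, net, sw = decompose_recv_time(recv_post=0.0, send_post=5e-6,
                                        size=1 << 20)
    total = sync + net + sw
    print(f"sync={sync:.2e}s net={net:.2e}s sw={sw:.2e}s "
          f"network fraction={net / total:.0%}")
```

The point of the sketch is that the total "MPI time" observed in a trace overstates the network's contribution whenever the synchronization and software terms are nonzero, which is the effect the trace-replay analysis quantifies.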

Acknowledgment

This work was funded by Sandia National Laboratories, which is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) under contract DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

References

  1. MPI: A Message-Passing Interface Standard, Version 3.1 (2015). http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
  2. Open|SpeedShop (2017). https://openspeedshop.org/
  3.
  4.
  5.
  6. Vampir - Performance Optimization (2017). https://www.vampir.eu/
  7. ASCAC Subcommittee, Lucas, et al.: Top ten exascale research challenges. US Department of Energy Report (2014)
  8. Carlin, B.P., Louis, T.A.: Bayesian Methods for Data Analysis. Chapman and Hall/CRC, Boca Raton (2011)
  9. Casanova, H., et al.: Versatile, scalable, and accurate simulation of distributed applications and platforms. J. Parallel Distrib. Comput. 74(10), 2899–2917 (2014)
  10. Chan, C.P., et al.: Topology-aware performance optimization and modeling of adaptive mesh refinement codes for exascale. In: International Workshop on Communication Optimizations in HPC (COMHPC), pp. 17–28. IEEE (2016)
  11. Christensen, R.: Plane Answers to Complex Questions: The Theory of Linear Models, 3rd edn. Springer, New York (2002). https://doi.org/10.1007/978-1-4419-9816-3
  12. Degomme, A., Legrand, A., Markomanolis, G.S., Quinson, M., Stillwell, M., Suter, F.: Simulating MPI applications: the SMPI approach. IEEE Trans. Parallel Distrib. Syst. 28, 2387–2400 (2017)
  13. Eberius, D., Patinyasakdikul, T., Bosilca, G.: Using software-based performance counters to expose low-level Open MPI performance information. In: Proceedings of the 24th European MPI Users' Group Meeting, pp. 7:1–7:8 (2017)
  14. Gamerman, D., Lopes, H.F.: Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman and Hall/CRC, Boca Raton (2006)
  15. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer-Verlag, New York (1991)
  16. Haario, H., Saksman, E., Tamminen, J.: An adaptive Metropolis algorithm. Bernoulli 7, 223–242 (2001)
  17. Hoefler, T., Schneider, T., Lumsdaine, A.: LogGOPSim: simulating large-scale applications in the LogGOPS model. In: HPDC 2010: 19th ACM International Symposium on High Performance Distributed Computing, pp. 597–604 (2010)
  18. Hoefler, T., Schneider, T., Lumsdaine, A.: Characterizing the influence of system noise on large-scale applications by simulation. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE Computer Society (2010)
  19. Islam, T., Mohror, K., Schulz, M.: Exploring the capabilities of the new MPI_T interface. In: Proceedings of the 21st European MPI Users' Group Meeting, pp. 91:91–91:96 (2014)
  20. Jain, N., et al.: Evaluating HPC networks via simulation of parallel workloads. In: SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 154–165 (2016)
  21. Jain, N., et al.: Evaluating HPC networks via simulation of parallel workloads. In: SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 154–165. IEEE (2016)
  22. Jain, N., et al.: Predicting the performance impact of different fat-tree configurations (2017)
  23. Jiang, N., Becker, D.U., Michelogiannakis, G., Balfour, J.D., Towles, B., Shaw, D.E., Kim, J., Dally, W.J.: A detailed and flexible cycle-accurate network-on-chip simulator. In: ISPASS, pp. 86–96 (2013)
  24. Jones, T., Ostrouchov, G., Koenig, G.A., Mondragon, O.H., Bridges, P.G.: An evaluation of the state of time synchronization on leadership class supercomputers. Concurr. Comput. Pract. Exp. e4341. https://doi.org/10.1002/cpe.4341
  25. Keller, R., Bosilca, G., Fagg, G., Resch, M., Dongarra, J.J.: Implementation and usage of the PERUSE-interface in Open MPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) EuroPVM/MPI 2006. LNCS, vol. 4192, pp. 347–355. Springer, Heidelberg (2006). https://doi.org/10.1007/11846802_48
  26. Kim, J., Dally, W.J., Scott, S., Abts, D.: Technology-driven, highly-scalable dragonfly topology. In: Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA 2008, pp. 77–88 (2008)
  27. Knüpfer, A., et al.: Score-P: a joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir, January 2012
  28. Le Maître, O., Knio, O.: Spectral Methods for Uncertainty Quantification. Springer, New York (2010). https://doi.org/10.1007/978-90-481-3520-2
  29. Michelogiannakis, G., et al.: APHiD: hierarchical task placement to enable a tapered fat tree topology for lower power and cost in HPC networks. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 228–237. IEEE Press (2017)
  30. Minkenberg, C.: HPC networks: challenges and the role of optics. In: Optical Fiber Communications Conference and Exhibition (OFC), pp. 1–3. IEEE (2015)
  31. National Energy Research Scientific Computing Center: Characterization of the DOE Mini-apps (2017). https://portal.nersc.gov/project/CAL/doe-miniapps.htm
  32. Petras, K.: Smolyak cubature of given polynomial degree with few nodes for increasing dimension. Numerische Mathematik 93, 729–753 (2003)
  33. Pritchard, H., Gorodetsky, I., Buntinas, D.: A uGNI-based MPICH2 Nemesis network module for the Cray XE. In: 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface, pp. 110–119 (2011)
  34. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41(1), 1–28 (2005)
  35. Ramesh, S., et al.: MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU. In: Proceedings of the 24th European MPI Users' Group Meeting, EuroMPI 2017, pp. 16:1–16:11 (2017)
  36. Rodrigues, A.F., et al.: The structural simulation toolkit. ACM SIGMETRICS Perform. Eval. Rev. 38(4), 37–42 (2011)
  37. Rumley, S., Bahadori, M., Polster, R., Hammond, S.D., Calhoun, D.M., Wen, K., Rodrigues, A., Bergman, K.: Optical interconnects for extreme scale computing systems. Parallel Comput. 64, 65–80 (2017)
  38. Sargsyan, K., Safta, C., Najm, H., Debusschere, B., Ricciuto, D., Thornton, P.: Dimensionality reduction for complex models via Bayesian compressive sensing. Int. J. Uncertain. Quantif. 4(1), 63–93 (2014)
  39. Sivia, D.S., Skilling, J.: Data Analysis: A Bayesian Tutorial, 2nd edn. Oxford University Press, New York (2006)
  40. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. 4, 240–243 (1963)
  41. Sobol, I.M.: Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993)
  42. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. (2007). https://doi.org/10.1016/j.ress.2007.04.002
  43. Sudret, B.: Meta-models for structural reliability and uncertainty quantification. In: Asian-Pacific Symposium on Structural Reliability and its Applications, pp. 1–24 (2012)
  44. Susukita, R., et al.: Performance prediction of large-scale parallel systems and applications using macro-level simulation. In: SC 2008: International Conference for High Performance Computing, Networking, Storage and Analysis (2008)
  45. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19(1), 49–66 (2005)
  46. Totoni, E., et al.: Simulation-based performance analysis and tuning for a two-level directly connected system. In: IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), pp. 340–347. IEEE (2011)
  47. Wilke, J.J., Sargsyan, K., Kenny, J.P., Debusschere, B., Najm, H.N., Hendry, G.: Validation and uncertainty assessment of extreme-scale HPC simulation through Bayesian inference. In: Wolf, F., Mohr, B., an Mey, D. (eds.) Euro-Par 2013. LNCS, vol. 8097, pp. 41–52. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40047-6_7
  48. Yoga, A., Chabbi, M.: Path-synchronous performance monitoring in HPC interconnection networks with source-code attribution. In: Jarvis, S., Wright, S., Hammond, S. (eds.) PMBS 2017. LNCS, vol. 10724, pp. 221–235. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72971-8_11

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Joseph P. Kenny (1)
  • Khachik Sargsyan (1)
  • Samuel Knight (1)
  • George Michelogiannakis (2)
  • Jeremiah J. Wilke (1)

  1. Sandia National Laboratories, Livermore, USA
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
