Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11725)


Stealing network bandwidth helps a variety of HPC runtimes and services run additional operations in the background without negatively affecting the applications. A key ingredient in making this possible is an accurate prediction of future network utilization, which enables the runtime to plan background operations in advance so that they do not compete with the application for network bandwidth. In this paper, we propose a portable deep learning predictor that uses only the information available through MPI introspection to construct a recurrent sequence-to-sequence neural network capable of forecasting network utilization. We leverage the fact that most HPC applications exhibit periodic behavior to enable predictions far into the future (at least the length of one period). Our online approach has no initial training phase; instead, it continuously improves itself during application execution without incurring significant computational overhead. Experimental results on two representative applications show better accuracy and lower computational overhead than the state of the art.
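To make the role of periodicity concrete, the following minimal sketch forecasts one full period ahead from a utilization trace and derives the bandwidth left over for background operations. It is not the recurrent sequence-to-sequence predictor proposed in the paper: it substitutes a much simpler seasonal-naive baseline (estimate the period by autocorrelation, then repeat the last observed period). The function names, the sampling of utilization as one value per time slot, and the link capacity of 10 units are all assumptions made for illustration.

```python
from typing import List, Sequence


def estimate_period(samples: Sequence[float], min_lag: int = 2) -> int:
    """Return the lag (in samples) with the highest mean autocorrelation.

    Ties are resolved in favour of the shortest lag, so a trace with
    period p reports p rather than 2p or 3p.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        overlap = n - lag
        # Average over the overlapping terms so long lags are not penalised.
        score = sum(centered[i] * centered[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def forecast_one_period(samples: Sequence[float], period: int) -> List[float]:
    """Seasonal-naive forecast: assume the next period repeats the last one."""
    return list(samples[-period:])


def spare_bandwidth(forecast: Sequence[float], capacity: float) -> List[float]:
    """Bandwidth a background task could use in each forecast slot."""
    return [max(0.0, capacity - used) for used in forecast]


if __name__ == "__main__":
    # Hypothetical trace: a compute phase (low traffic) followed by a
    # communication burst, repeated six times.
    trace = [1.0, 1.0, 6.0, 6.0, 1.0] * 6
    period = estimate_period(trace)
    prediction = forecast_one_period(trace, period)
    print(period, prediction, spare_bandwidth(prediction, capacity=10.0))
```

In an online setting, the same two steps could be re-run on a sliding window as new samples arrive; a learned model such as the one proposed here is meant to handle the noisy and drifting patterns for which this baseline is too rigid.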



This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This material was based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant No. #1664142. The experiments presented in this paper were carried out using the Grid’5000/ALADDIN-G5K experimental testbed, an initiative of the French Ministry of Research through the ACI GRID incentive action, INRIA, CNRS and RENATER and other contributing partners (see



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of California Irvine, Irvine, USA
  2. Argonne National Laboratory, Lemont, USA
  3. University of Tennessee Knoxville, Knoxville, USA
  4. INRIA Bordeaux, Talence, France
