Using Deep Learning for Automated Communication Pattern Characterization: Little Steps and Big Challenges

  • Philip C. Roth
  • Kevin Huck
  • Ganesh Gopalakrishnan
  • Felix Wolf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11027)

Abstract

Characterization of a parallel application’s communication patterns can be useful for performance analysis, debugging, and system design. However, obtaining and interpreting a characterization can be difficult. AChax implements an approach that uses search and a library of known communication patterns to automatically characterize communication patterns. Our approach has some limitations that reduce its effectiveness for the patterns and pattern combinations used by some real-world applications. By viewing AChax’s pattern recognition problem as an image recognition problem, it may be possible to use deep learning to address these limitations. In this position paper, we present our current ideas regarding the benefits and challenges of integrating deep learning into AChax and our conclusion that a hybrid approach combining deep learning classification, regression, and the existing AChax approach may be the best long-term solution to the problem of parameterizing recognized communication patterns.

Keywords

Deep learning, Automation, Application characterization

Notes

Acknowledgments

We thank David Poliakoff of Lawrence Livermore National Laboratory for his helpful feedback about this paper and the tools workshop presentation that motivated it.

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under contract number DE-AC05-00OR22725.

This work is supported in part by the U.S. Department of Energy, Office of Science, SciDAC RAPIDS project under subcontract 4000159855 to the University of Oregon from Oak Ridge National Laboratory.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Oak Ridge National Laboratory, Oak Ridge, USA
  2. University of Oregon, Eugene, USA
  3. University of Utah, Salt Lake City, USA
  4. Technische Universität Darmstadt, Darmstadt, Germany