Using Deep Learning for Automated Communication Pattern Characterization: Little Steps and Big Challenges
Characterization of a parallel application’s communication patterns can be useful for performance analysis, debugging, and system design. However, obtaining and interpreting a characterization can be difficult. AChax implements an approach that uses search and a library of known communication patterns to automatically characterize communication patterns. Our approach has some limitations that reduce its effectiveness for the patterns and pattern combinations used by some real-world applications. By viewing AChax’s pattern recognition problem as an image recognition problem, it may be possible to use deep learning to address these limitations. In this position paper, we present our current ideas regarding the benefits and challenges of integrating deep learning into AChax. We conclude that a hybrid approach combining deep learning classification, regression, and the existing AChax approach may be the best long-term solution to the problem of parameterizing recognized communication patterns.
Keywords: Deep learning, Automation, Application characterization
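To make the image-recognition view in the abstract concrete, the following minimal sketch builds a rank-by-rank communication matrix for one simple pattern and rescales it to a grayscale image of the kind an image classifier could take as input. The ring pattern, the function names `ring_pattern` and `to_image`, the `volume` parameter, and the peak normalization are all illustrative assumptions. None of them is taken from AChax or its pattern library.

```python
def ring_pattern(num_ranks, volume=1.0):
    """Hypothetical pattern generator (not part of AChax): each rank sends
    `volume` bytes to its left and right neighbors in a periodic 1D ring."""
    matrix = [[0.0] * num_ranks for _ in range(num_ranks)]
    for rank in range(num_ranks):
        matrix[rank][(rank + 1) % num_ranks] = volume
        matrix[rank][(rank - 1) % num_ranks] = volume
    return matrix


def to_image(comm_matrix):
    """Rescale a communication matrix to [0, 1] so that it can be treated
    as a grayscale image (assumed normalization: divide by the peak value)."""
    peak = max(max(row) for row in comm_matrix)
    if peak == 0:
        # An all-zero matrix has no traffic to scale; return a copy as-is.
        return [row[:] for row in comm_matrix]
    return [[value / peak for value in row] for row in comm_matrix]


# Row index = sending rank, column index = receiving rank.
matrix = ring_pattern(8, volume=4096.0)
image = to_image(matrix)
```

In this view, classification would amount to naming the pattern visible in `image`, while recovering a parameter such as `volume` would be the separate parameterization step that the abstract associates with regression and the existing search-based approach.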
We thank David Poliakoff of Lawrence Livermore National Laboratory for his helpful feedback about this paper and the tools workshop presentation that motivated it.
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under contract number DE-AC05-00OR22725.
This work is supported in part by the US Department of Energy Office of Science SciDAC RAPIDS project under subcontract 4000159855 to the University of Oregon from Oak Ridge National Laboratory.