Abstract
Performance models can be very useful for understanding the behavior of applications and guiding design and optimization decisions. Unfortunately, performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both. In this paper, we present the Meliora framework for machine learning-based performance model generation of arbitrary codes based on static analysis of intermediate language representations. We demonstrate good accuracy in matching known codes and show how Meliora can be used to optimize new codes through reusing optimization knowledge, either manually or in conjunction with an autotuner. When autotuning, Meliora eliminates or dramatically reduces the empirical search space, while generally achieving competitive performance.
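To illustrate the general idea of reusing optimization knowledge via code matching (this is a hypothetical sketch, not Meliora's actual implementation, which matches learned representations of intermediate-language code graphs): a new kernel's static-feature vector is compared against a database of previously tuned kernels, and the optimization parameters of the closest match are reused to seed or replace an empirical autotuning search. All feature names, kernels, and parameters below are invented for illustration.

```python
# Hypothetical sketch of knowledge reuse via static-feature matching.
# Feature vectors and tuning parameters here are illustrative only.
import math

# Toy database: static features (loop_depth, arith_op_ratio, mem_op_ratio,
# branch_ratio) paired with previously found best tuning parameters.
TUNED_KERNELS = {
    "mm_blocked": ([3, 0.60, 0.30, 0.1], {"tile": 32, "unroll": 4}),
    "stencil_2d": ([2, 0.40, 0.50, 0.1], {"tile": 16, "unroll": 2}),
    "spmv_csr":   ([1, 0.30, 0.50, 0.2], {"tile": 1,  "unroll": 8}),
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_kernel(features):
    """Return (name, reused_params, similarity) of the closest tuned kernel."""
    name, (vec, params) = max(
        TUNED_KERNELS.items(), key=lambda kv: cosine(features, kv[1][0])
    )
    return name, params, cosine(features, vec)

# A new, unseen kernel whose static features resemble blocked matrix multiply:
new_kernel = [3, 0.55, 0.35, 0.1]
name, params, sim = match_kernel(new_kernel)
```

Here the reused parameters would either be applied directly (manual reuse) or used to shrink the autotuner's search space to a neighborhood of the matched configuration, which is the source of the search-space reduction the abstract describes.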
Notes
1. Speedup with respect to the original code version.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Meng, K., Norris, B. (2022). Guiding Code Optimizations with Deep Learning-Based Code Matching. In: Chapman, B., Moreira, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2020. Lecture Notes in Computer Science(), vol 13149. Springer, Cham. https://doi.org/10.1007/978-3-030-95953-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95952-4
Online ISBN: 978-3-030-95953-1