A Similarity Measure for GPU Kernel Subgraph Matching

Lim, Robert; Norris, Boyana; Malony, Allen

doi:10.1007/978-3-030-34627-0_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11882)

Included in the following conference series: International Workshop on Languages and Compilers for Parallel Computing

Abstract

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) operations in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across the CFGs of various kernels to gain insight into an application's resource requirements, based on the shape and traversal of the graph, the instruction operations executed, and the registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel control flow characteristics that help end users, autotuners, and compilers generate high-performing code.
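To make the idea of comparing kernels by their control flow concrete, here is a minimal sketch in Python. It is not CUDAflow's implementation, and the similarity function is not the measure defined in the paper: it assumes each CFG is given as a dictionary mapping (source block, target block) edges to measured execution counts, that basic block labels are comparable across kernels, and it uses plain cosine similarity over normalized edge frequencies as a stand-in. The names `transition_weights`, `cfg_similarity`, and the example kernels are hypothetical.

```python
import math


def transition_weights(edge_counts):
    """Normalize edge execution counts so that they sum to 1."""
    total = sum(edge_counts.values())
    if total == 0:
        return {edge: 0.0 for edge in edge_counts}
    return {edge: count / total for edge, count in edge_counts.items()}


def cfg_similarity(cfg_a, cfg_b):
    """Cosine similarity between two CFGs' normalized edge-frequency vectors.

    Assumption: an edge is identified by its (source, target) block labels,
    so both CFGs must use a shared labeling. Real subgraph matching would
    search for a structural correspondence instead of relying on labels.
    """
    weights_a = transition_weights(cfg_a)
    weights_b = transition_weights(cfg_b)
    edges = set(weights_a) | set(weights_b)
    dot = sum(weights_a.get(e, 0.0) * weights_b.get(e, 0.0) for e in edges)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical CFGs: edge -> number of times the edge was traversed.
loop_kernel = {("entry", "loop"): 1, ("loop", "loop"): 99, ("loop", "exit"): 1}
scaled_loop_kernel = {("entry", "loop"): 2, ("loop", "loop"): 198, ("loop", "exit"): 2}
branch_kernel = {
    ("entry", "then"): 50,
    ("entry", "else"): 50,
    ("then", "exit"): 50,
    ("else", "exit"): 50,
}

same_shape_score = cfg_similarity(loop_kernel, scaled_loop_kernel)
different_shape_score = cfg_similarity(loop_kernel, branch_kernel)
```

In this sketch, two loop-dominated kernels whose edge counts differ only by a constant factor score close to 1, while a loop kernel and a branch-dominated kernel that share no edges score 0. Matching on labels is a deliberate simplification; comparing kernels whose blocks are named differently would need a graph isomorphism or alignment step first.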

Author information

Authors and Affiliations

University of Oregon, Eugene, OR, USA
Robert Lim, Boyana Norris & Allen Malony

Corresponding author

Correspondence to Robert Lim.

Editor information

Editors and Affiliations

University of Utah, Salt Lake City, UT, USA
Mary Hall
University of Utah, Salt Lake City, UT, USA
Hari Sundar

About this paper

Cite this paper

Lim, R., Norris, B., Malony, A. (2019). A Similarity Measure for GPU Kernel Subgraph Matching. In: Hall, M., Sundar, H. (eds) Languages and Compilers for Parallel Computing. LCPC 2018. Lecture Notes in Computer Science, vol 11882. Springer, Cham. https://doi.org/10.1007/978-3-030-34627-0_3

DOI: https://doi.org/10.1007/978-3-030-34627-0_3
Published: 13 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34626-3
Online ISBN: 978-3-030-34627-0
eBook Packages: Computer Science, Computer Science (R0)
