Euro-Par 2011 Parallel Processing, pp. 425–437
Lessons Learned from Exploring the Backtracking Paradigm on the GPU
Abstract
We explore the backtracking paradigm, which has properties seen as sub-optimal for GPU architectures, using the maximal clique enumeration problem as a case study, and find that the presence of these properties limits GPU performance to approximately 1.4–2.25 times that of a single CPU core. The GPU performance "lessons" we find critical to achieving this performance include a coarse-and-fine-grain parallelization of the search space; a low-overhead, load-balanced distribution of work; global memory latency hiding through coalescence, saturation, and shared memory utilization; and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global problem structure representation that bounds any efficiencies gained from these lessons, and we discuss the implications of these results for backtracking problems in general.
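As an illustration of the case study only, the following minimal sketch enumerates maximal cliques by backtracking in the style of Bron and Kerbosch (without pivoting) and splits the search space into one independent subtree per vertex, which is one plausible reading of the "coarse-grain" decomposition mentioned above. It is a sequential CPU sketch, not the authors' GPU implementation; the names `expand`, `root_tasks`, and `maximal_cliques` and the adjacency-set graph representation are assumptions made for this example.

```python
from typing import Dict, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]  # assumed representation: vertex -> set of neighbours


def expand(graph: Graph, r: Set[int], p: Set[int], x: Set[int]) -> Iterator[List[int]]:
    """One backtracking subtree: r is the current clique, p the candidate
    vertices that can still extend it, x the vertices already explored."""
    if not p and not x:
        # Nothing can extend r and no earlier branch covers it: r is maximal.
        yield sorted(r)
        return
    for v in sorted(p):
        yield from expand(graph, r | {v}, p & graph[v], x & graph[v])
        # Backtrack: v is finished, so move it from candidates to excluded.
        p = p - {v}
        x = x | {v}


def root_tasks(graph: Graph) -> List[Tuple[Set[int], Set[int], Set[int]]]:
    """Split the search space into one independent (r, p, x) task per vertex.
    Each task could be handed to a separate worker (hypothetical mapping)."""
    order = sorted(graph)
    tasks = []
    for i, v in enumerate(order):
        earlier, later = set(order[:i]), set(order[i + 1:])
        tasks.append(({v}, graph[v] & later, graph[v] & earlier))
    return tasks


def maximal_cliques(graph: Graph) -> List[List[int]]:
    """Run every root task in turn and collect all maximal cliques."""
    found: List[List[int]] = []
    for r, p, x in root_tasks(graph):
        found.extend(expand(graph, r, p, x))
    return found
```

The subtrees produced by `root_tasks` vary widely in size and in the number of cliques they emit, which is the kind of irregular workload that the load-balancing and output-buffering lessons in the abstract address.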
Keywords
Shared Memory; Connectivity Query; Memory Operation; Frequent Itemset Mining; Candidate Path