Lessons Learned from Exploring the Backtracking Paradigm on the GPU

  • John Jenkins
  • Isha Arkatkar
  • John D. Owens
  • Alok Choudhary
  • Nagiza F. Samatova
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)


We explore the backtracking paradigm, whose properties are generally seen as sub-optimal for GPU architectures, using the maximal clique enumeration problem as a case study, and find that the presence of these properties limits GPU performance to approximately 1.4–2.25 times that of a single CPU core. The GPU performance “lessons” we find critical to achieving this performance include a coarse- and fine-grain parallelization of the search space, a low-overhead load-balanced distribution of work, global memory latency hiding through coalescence, saturation, and shared memory utilization, and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global problem-structure representation that bounds any efficiencies gained from these lessons, and discuss the implications of these results for backtracking problems in general.


Keywords: Shared Memory, Connectivity Query, Memory Operation, Frequent Itemset Mining, Candidate Path





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • John Jenkins (1, 2)
  • Isha Arkatkar (1, 2)
  • John D. Owens (3)
  • Alok Choudhary (4)
  • Nagiza F. Samatova (1, 2)
  1. North Carolina State University, Raleigh, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
  3. University of California, Davis, USA
  4. Northwestern University, Evanston, USA
