LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10136)

Abstract

We propose LightHouse, a GPU code generator for Green-Marl, a graph language for which a multicore CPU backend already exists. It lets a user generate both multicore and GPU code from the same specification of a graph algorithm, without modifying the language. This restriction poses several challenges, because we must work with the language's existing abstract syntax tree, which is not tailored to GPUs. LightHouse overcomes these challenges with optimizations such as reducing the number of atomics and collapsing loops. We illustrate its effectiveness by generating efficient CUDA code for four graph analytic algorithms and comparing its performance against the multicore OpenMP versions generated by Green-Marl. In particular, our generated CUDA code performs comparably to OpenMP versions running 4 to 64 threads, depending on the algorithm.

Notes

  1. LightHouse code is available at http://pace.cse.iitm.ac.in/tools.php.

Author information

Corresponding authors

Correspondence to G. Shashidhar or Rupesh Nasre .

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shashidhar, G., Nasre, R. (2017). LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_18

  • DOI: https://doi.org/10.1007/978-3-319-52709-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52708-6

  • Online ISBN: 978-3-319-52709-3

  • eBook Packages: Computer Science, Computer Science (R0)
