Language and Hardware Acceleration Backend for Graph Processing
Abstract
Graphs are important in many applications. However, their analysis on conventional computer architectures is generally inefficient because traversing vertices and edges involves highly irregular memory access. For example, when finding a path from a source vertex to a target vertex, performance is typically limited by the memory bottleneck, whereas the actual computation is trivial. This paper presents a methodology for embedding graphs into silicon, where graph vertices become finite state machines that communicate via the graph edges. With this approach, many common graph analysis tasks can be performed by propagating signals through the physical graph and measuring the signal propagation time using the on-chip clock distribution network. This eliminates the memory bottleneck and allows thousands of vertices to be processed in parallel. We present a domain-specific language for graph description and transformation, and demonstrate how it can be used to map application graphs onto an FPGA board, where they can be analyzed up to 1000× faster than on a conventional computer.
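The signal-propagation idea can be illustrated in software. The following is a minimal sketch, not the paper's hardware design or its domain-specific language: it assumes an undirected graph, assumes each vertex is a one-bit state machine that becomes active one clock cycle after any of its neighbours, and reads the number of cycles until the target becomes active as the shortest path length. The names `make_undirected` and `propagation_time` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def make_undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an adjacency map in which every edge carries signals both ways."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def propagation_time(graph: Graph, source: Hashable, target: Hashable) -> Optional[int]:
    """Count simulated clock cycles until a signal from source reaches target.

    Assumption: all vertices update simultaneously on each cycle, which makes
    this a level-synchronous breadth-first search. Returns None when the
    target is unreachable or either vertex is missing from the graph.
    """
    if source not in graph or target not in graph:
        return None
    active = {source}  # vertices whose state machine has already fired
    cycles = 0
    while target not in active:
        # One clock cycle: every active vertex drives all of its neighbours.
        next_active = active | {v for u in active for v in graph[u]}
        if next_active == active:  # the signal can spread no further
            return None
        active = next_active
        cycles += 1
    return cycles


if __name__ == "__main__":
    g = make_undirected([("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")])
    print(propagation_time(g, "a", "d"))
```

In this software model each cycle still loops over the active vertices one by one; the approach described in the abstract would perform that update for all vertices at once in hardware, which is where the parallelism comes from.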
Keywords
Graph processing, Average shortest path, Breadth-first search, Hardware acceleration, FPGA, Drug discovery, Domain-specific language, Haskell
Acknowledgements
This research was funded by EPSRC Impact Acceleration Account (EP/K503885/1, project Fantasi), EPSRC Programme Grant Poets (EP/N031768/1), Newcastle University and e-Therapeutics PLC.