International Journal of Parallel Programming, Volume 45, Issue 2, pp 320–339

Towards Systematic Parallelization of Graph Transformations Over Pregel

  • Le-Duc Tung
  • Zhenjiang Hu


Graphs can model many kinds of data, from traditional datasets to social networks and semi-structured datasets. Many systems have been proposed for processing large graphs; among them, the Pregel programming model is popular thanks to its scalability. Although Pregel is simple to understand and use, it is low-level: developers must write programs that are hard to maintain and that need careful optimization. Structural recursion, on the other hand, is a powerful tool for systematically constructing efficient parallel programs on lists, arrays, and trees, but it has not yet been applied to graphs. In this paper, we propose an efficient method for the parallel evaluation of structural recursion on graphs that is suitable for Pregel. We design and implement a high-level parallel programming framework that provides a domain-specific language (DSL) to ease the programming task. Specifications written in the DSL are automatically compiled into Pregel programs that scale to large graphs. Experimental results show that our framework outperforms the original evaluation of structural recursion and achieves good scalability and speedup on real datasets.


Keywords: Structural recursion, Graph transformation, Parallel programming, Pregel programming model



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan
  2. SOKENDAI/National Institute of Informatics (NII), Tokyo, Japan
