Advertisement

The ubiquity of large graphs and surprising challenges of graph processing: extended survey

  • Siddhartha SahuEmail author
  • Amine Mhedhbi
  • Semih Salihoglu
  • Jimmy Lin
  • M. Tamer Özsu
Special Issue Paper
  • 183 Downloads

Abstract

Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We performed an extensive study that consisted of an online survey of 89 users, a review of the mailing lists, source repositories, and white papers of a large suite of graph software products, and in-person interviews with 6 users and 2 developers of these products. Our online survey aimed at understanding: (i) the types of graphs users have; (ii) the graph computations users run; (iii) the types of graph software users use; and (iv) the major challenges users face when processing their graphs. We describe the participants’ responses to our questions highlighting common patterns and challenges. Based on our interviews and survey of the rest of our sources, we were able to answer some new questions that were raised by participants’ responses to our online survey and understand the specific applications that use graph data and software. Our study revealed surprising facts about graph processing in practice. In particular, real-world graphs represent a very diverse range of entities and are often very large, scalability and visualization are undeniably the most pressing challenges faced by participants, and data integration, recommendations, and fraud detection are very popular applications supported by existing graph software. We hope these findings can guide future research.

Keywords

User survey Graph processing Graph databases RDF systems 

Notes

Acknowledgements

We are grateful to Chen Zou for helping us in using online survey tools and drafting an early version of this survey. We are also grateful to Nafisa Anzum, Jeremy Chen, Pranjal Gupta, Chathura Kankanamge, and Shahid Khaliq for their valuable comments on the survey and help in categorizing the academic publications, user emails, and issues. We would like to thank our online survey participants and our interviewees: Brad Bebee, Scott Brave, Jordan Crombie, Luna Dong, William Hayes, Thomas Hubauer, Peter Lawrence, Stephen Ruppert, Chen Yuan, with a special thanks to Zhengping Qian for the many hours of follow-up discussions after our interview. Finally, we would like to thank the anonymous reviewers for their valuable comments. This research was partially supported by multiple Discovery Grants from the Natural Sciences and Engineering Council (NSERC) of Canada.

Supplementary material

778_2019_548_MOESM1_ESM.xlsx (83 kb)
Supplementary material 1 (xlsx 83 KB)

References

  1. 1.
  2. 2.
  3. 3.
    Aggarwal, C.C., Wang, H.: Graph Data Management and Mining: A Survey of Algorithms and Applications, pp. 13–68. Springer, Berlin (2010)CrossRefGoogle Scholar
  4. 4.
  5. 5.
  6. 6.
  7. 7.
    Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified Stress Testing of RDF Data Management Systems. In: ISWC (2014)Google Scholar
  8. 8.
    Amer-Yahia, S., Pei, J. (eds.): PVLDB, Volume 11 (2017–2018). http://www.vldb.org/pvldb/vol11.html
  9. 9.
    Angles, R., Arenas, M., Barceló, P., Boncz, P.A., Fletcher, G.H.L., Gutierrez, C., Lindaaker, T., Paradies, M., Plantikow, S., Sequeda, J.F., van Rest, O., Voigt, H.: G-CORE: a core for future graph query languages. In: Proceedings of International Conference on Management of Data (2018)Google Scholar
  10. 10.
    Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J.L., Vrgoc, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5), 68 (2017)CrossRefGoogle Scholar
  11. 11.
  12. 12.
    Arpaci-Dusseau, A.C., Voelker, G. (eds.): Proceedings of the Symposium on Operating Systems Design and Implementation. USENIX Association (2018). https://www.usenix.org/conference/osdi18
  13. 13.
  14. 14.
  15. 15.
    Balcan, M., Weinberger, K.Q. (eds.): Proceedings of the International Conference on Machine Learning. JMLR.org (2016). http://jmlr.org/proceedings/papers/v48
  16. 16.
    Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S.M.R., Barnawi, A., Sakr, S.: Large scale graph processing systems: survey and an experimental evaluation. Cluster Comput. 18(3), 1189–1213 (2015)CrossRefGoogle Scholar
  17. 17.
    Basic Linear Algebra Subprograms. http://www.netlib.org/blas
  18. 18.
    Boncz, P., Salem, K. (eds.): PVLDB, Volume 10 (2016–2017). http://www.vldb.org/pvldb/vol10.html
  19. 19.
    Bridgeman, S., Tamassia, R.: A User Study in Similarity Measures for Graph Drawing, pp. 19–30. Springer, Berlin (2001)zbMATHGoogle Scholar
  20. 20.
  21. 21.
    Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., Muthukrishnan, S.: One trillion edges: graph processing at facebook-scale. PVLDB 8(12), 1804–1815 (2015)Google Scholar
  22. 22.
  23. 23.
    Conceptual Graphs. http://conceptualgraphs.org
  24. 24.
    Cui, W., Qu, H.: A Survey on Graph Visualization. PhD Qualifying Exam Report, Computer Science Department, Hong Kong University of Science and Technology (2007)Google Scholar
  25. 25.
  26. 26.
  27. 27.
    DTD and XSD XML Schemas. https://www.w3.org/standards/xml/schema
  28. 28.
    Dy, J.G., Krause, A. (eds.): Proceedings of the International Conference on Machine Learning. JMLR.org (2018). http://jmlr.org/proceedings/papers/v80/
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
    Graph for Scala. http://www.scala-graph.org
  36. 36.
    Graph 500 Benchmarks. http://graph500.org
  37. 37.
  38. 38.
  39. 39.
  40. 40.
    Apache Spark GraphX. https://spark.apache.org/graphx
  41. 41.
    Apache TinkerPop. https://tinkerpop.apache.org
  42. 42.
    Group, W.: Common format for exchange of solved load flow data. IEEE Trans. Power App. Syst. 92(6), 1916–1925 (1973)CrossRefGoogle Scholar
  43. 43.
  44. 44.
    Haase, P., Broekstra, J., Eberhart, A., Volz, R.: A Comparison of RDF Query Languages, pp. 502–517. Springer, Berlin (2004)Google Scholar
  45. 45.
    Herman, I., Melançon, G., Marshall, M.S.: Graph visualization and navigation in information visualization: a survey. IEEE Trans. Vis. Comput. Graph. 6(1), 24–43 (2000)CrossRefGoogle Scholar
  46. 46.
    Holten, D., van Wijk, J.J.: A User Study on Visualizing Directed Edges in Graphs. In: Proceedings of International Conference on Human Factors in Computing Systems (2009)Google Scholar
  47. 47.
    Holzschuher, F., Peinl, R.: Performance of graph query languages: comparison of Cypher, Gremlin and Native Access in Neo4j. In: Proceedings of the Joint EDBT/ICDT Workshops (2013)Google Scholar
  48. 48.
    Jagadish, H.V., Zhou, A. (eds.): PVLDB, Vol. 7 (2013–2014). http://www.vldb.org/pvldb/vol7.html
  49. 49.
  50. 50.
    Jayaram, N., Khan, A., Li, C., Yan, X., Elmasri, R.: Querying knowledge graphs by example entity tuples. In: Proceedings of International Conference on Data Engineering (2016)Google Scholar
  51. 51.
  52. 52.
  53. 53.
    Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., Giannopoulou, E.: Ontology visualization methods: a survey. ACM Comput. Surv. 39(4), 10 (2007)CrossRefGoogle Scholar
  54. 54.
    Proceedings of the International Conference on Knowledge Discovery and Data Mining. ACM (2015). http://dl.acm.org/citation.cfm?id=2783258
  55. 55.
    Proceedings of the International Conference on Knowledge Discovery and Data Mining. ACM (2017). http://dl.acm.org/citation.cfm?id=3097983
  56. 56.
    Proceedings of the International Conference on Knowledge Discovery and Data Mining. ACM (2018). http://dl.acm.org/citation.cfm?id=3219819
  57. 57.
    Keeton, K., Roscoe, T. (eds.): Proceedings of the Symposium on Operating Systems Design and Implementation. USENIX Association (2016). https://www.usenix.org/conference/osdi16
  58. 58.
    Knowledge Graph at Siemens. https://youtu.be/9pmQXua9LWA?t=1109
  59. 59.
  60. 60.
    Letunic, I., Bork, P.: Interactive tree of life: an online tool for phylogenetic tree display and annotation. Bioinformatics 23(1), 127–128 (2006)CrossRefGoogle Scholar
  61. 61.
    Lu, Y., Cheng, J., Yan, D., Wu, H.: Large-scale Distributed graph computing systems: an experimental evaluation. PVLDB 8(3), 281–292 (2014)Google Scholar
  62. 62.
    Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: Proceedings of International Conference on Management of Data (2010)Google Scholar
  63. 63.
  64. 64.
    Mattson, T., Bader, D.A., Berry, J.W., Buluç, A., Dongarra, J.J., Faloutsos, C., Feo, J., Gilbert, J.R., Gonzalez, J., Hendrickson, B., Kepner, J., Leiserson, C.E., Lumsdaine, A., Padua, D.A., Poole, S., Reinhardt, S.P., Stonebraker, M., Wallach, S., Yoo, A.: Standards for graph algorithm primitives. In: Proceedings of High Performance Extreme Computing Conference (2013)Google Scholar
  65. 65.
  66. 66.
    Detect Fraud in Real Time with Graph Databases. https://neo4j.com/whitepapers/fraud-detection-graph-databases
  67. 67.
    The 2016 State of the Graph Report. https://neo4j.com/resources/2016-state-of-the-graph
  68. 68.
  69. 69.
  70. 70.
  71. 71.
  72. 72.
  73. 73.
  74. 74.
    Pienta, R., Tamersoy, A., Endert, A., Navathe, S., Tong, H., Chau, D.H.: VISAGE: Interactive visual graph querying. In: Proceedings of International Working Conference on Advanced Visual Interfaces (2016)Google Scholar
  75. 75.
    Precup, D., Teh, Y.W. (eds.): Proceedings of the International Conference on Machine Learning. JMLR.org (2017). http://jmlr.org/proceedings/papers/v70
  76. 76.
    Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle detection in large dynamic graphs. PVLDB 11(12), 1876–1888 (2018)Google Scholar
  77. 77.
    Rath, M., Akehurst, D., Borowski, C., Mäder, P.: Are graph query languages applicable for requirements traceability analysis? In: Proceedings of International Conference on Requirements Engineering: Foundation for Software Quality (2017)Google Scholar
  78. 78.
    van Rest, O., Hong, S., Kim, J., Meng, X., Chafi, H.: PGQL: a property graph query language. In: Proceedings of Graph Data Management Experiences and Systems (2016)Google Scholar
  79. 79.
    Rodriguez, M.A.: The Gremlin Graph Traversal Machine and Language. CoRR arXiv:1508.03843 (2015)
  80. 80.
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press (2016). https://dl.acm.org/citation.cfm?id=3014904
  81. 81.
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press (2017). https://dl.acm.org/citation.cfm?id=3126908
  82. 82.
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press (2018). https://dl.acm.org/citation.cfm?id=3291656
  83. 83.
    Sharma, A., Jiang, J., Bommannavar, P., Larson, B., Lin, J.: GraphJet: real-time content recommendations at twitter. PVLDB 9(13), 1281–1292 (2016)Google Scholar
  84. 84.
    SNAP: Standford Network Analysis Project. https://snap.stanford.edu
  85. 85.
    Proceedings of the Symposium on Cloud Computing. ACM (2015). http://dl.acm.org/citation.cfm?id=2806777
  86. 86.
    Proceedings of the Symposium on Cloud Computing. ACM (2017). http://dl.acm.org/citation.cfm?id=3127479
  87. 87.
    Proceedings of the Symposium on Cloud Computing. ACM (2018). http://dl.acm.org/citation.cfm?id=3267809
  88. 88.
    Proceedings of the Symposium on Operating Systems Principles. ACM (2017). http://dl.acm.org/citation.cfm?id=3132747
  89. 89.
    Apache Spark—Preparing for the Next Wave of Reactive Big Data. https://info.lightbend.com/white-paper-spark-survey-trends-adoption-report-register.html
  90. 90.
  91. 91.
  92. 92.
  93. 93.
  94. 94.
    The TPC-C Benchmark. http://www.tpc.org/tpcc
  95. 95.
    Vehlow, C., Beck, F., Weiskopf, D.: Visualizing group structures in graphs: a survey. Computer Graphics Forum 36(6), 201–225 (2017)CrossRefGoogle Scholar
  96. 96.
  97. 97.
    Wang, C., Tao, J.: Graphs in scientific visualization: a survey. Computer Graphics Forum 36(1), 263–287 (2017)CrossRefGoogle Scholar
  98. 98.
    Zhao, Y., Yuan, C., Liu, G., Grinberg, I.: Graph-based preconditioning conjugate gradient algorithm for “N-1” contingency analysis. In: IEEE Power Energy Society General Meeting (2018)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.University of WaterlooWaterlooCanada

Personalised recommendations