Big Data Analytics: Exploring Graphs with Optimized SQL Queries

  • Sikder Tahsin Al-Amin
  • Carlos Ordonez
  • Ladjel Bellatreche
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 903)

Abstract

Nowadays there is an abundance of tools and systems to analyze large graphs. In general, the goal is to summarize the graph and discover interesting patterns hidden in it. On the other hand, a lot of data stored in DBMSs can potentially be analyzed as graphs. External graph data sets can be loaded quickly, and SQL can help prepare graph data sets from raw data. In this paper, we show that SQL queries on a graph stored in relational form as triples can reveal many interesting properties and patterns of the graph in a more flexible and efficient manner than existing systems. We explain how many interesting statistics on the graph can be derived with queries combining joins and aggregations. In addition, linearly recursive queries can summarize interesting patterns including reachability, paths, and connected components. We experimentally show that exploratory queries can be evaluated efficiently based on the input edges and that they perform better than Spark. We also show that vertices with skewed degree, cycles, and cliques are the main reasons exploratory queries become slow.
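
To make the two query families named in the abstract concrete, the following is a minimal sketch, not the authors' queries. It assumes a hypothetical edge table E(i, j, v) holding one triple per edge (source vertex, destination vertex, weight) and uses SQLite through Python's standard library purely as a stand-in for the parallel DBMS studied in the paper. The table name, column names, and toy data are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy graph as (i, j, v) triples: edge i -> j with weight v.
# It contains one cycle (1 -> 2 -> 3 -> 1) and a separate component (5 -> 6).
EDGES = [(1, 2, 1.0), (2, 3, 1.0), (3, 1, 1.0), (3, 4, 1.0), (5, 6, 1.0)]


def explore(edges, source=1):
    """Run three exploratory queries on an edge table E(i, j, v)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, v REAL)")
    conn.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)

    # Aggregation only: out-degree of every vertex that has outgoing edges.
    out_degree = dict(conn.execute("SELECT i, COUNT(*) FROM E GROUP BY i"))

    # Join plus aggregation: number of paths of length 2 (i -> k -> j).
    paths2 = conn.execute(
        "SELECT COUNT(*) FROM E AS a JOIN E AS b ON a.j = b.i"
    ).fetchone()[0]

    # Linearly recursive query: vertices reachable from `source`.
    # R appears once in the recursive step, and UNION (not UNION ALL)
    # discards duplicates, so the recursion terminates despite the cycle.
    reachable = [
        row[0]
        for row in conn.execute(
            """
            WITH RECURSIVE R(j) AS (
                SELECT j FROM E WHERE i = ?
                UNION
                SELECT E.j FROM R JOIN E ON E.i = R.j
            )
            SELECT j FROM R ORDER BY j
            """,
            (source,),
        )
    ]
    conn.close()
    return {"out_degree": out_degree, "paths2": paths2, "reachable": reachable}


if __name__ == "__main__":
    print(explore(EDGES))
```

On this toy graph the source vertex 1 appears in its own reachable set because it lies on a cycle. Swapping UNION for UNION ALL would make the recursive query loop forever on that cycle, which gives a small-scale intuition for why cycles are listed among the causes of slow exploratory queries.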

Keywords

Graph, Parallel DBMS, SQL


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sikder Tahsin Al-Amin (1)
  • Carlos Ordonez (1)
  • Ladjel Bellatreche (2)
  1. University of Houston, Houston, USA
  2. LIAS/ISAE-ENSMA, Poitiers, France
