
Compact and efficient representation of general graph databases

  • Sandra Álvarez-García
  • Borja Freire
  • Susana Ladra
  • Óscar Pedreira
Regular Paper

Abstract

In this paper, we propose a compact data structure for storing labeled, attributed graphs. It is based on the \(k^2\)-tree, a very compact data structure designed to represent simple directed graphs, and can be seen as an extension of the \(k^2\)-tree that supports property graphs. In addition to the static approach, we propose a dynamic version of the representation, which allows flexible schemas and the insertion and deletion of data. We provide an implementation of a basic set of operations that can be combined to form complex queries over these attributed graphs. We compare the performance of our proposal against existing graph database systems and show that our compact attributed graph representation also achieves competitive running times.
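As background for the abstract, the following minimal sketch illustrates the plain \(k^2\)-tree for a simple directed graph, not the attributed extension proposed in the paper. The adjacency matrix is split recursively into \(k \times k\) submatrices; each submatrix contributes one bit (1 if it contains any edge), and only non-empty submatrices are subdivided further. The function names `build_k2tree`, `make_rank` and `has_edge` are hypothetical, and plain Python lists stand in for the compressed bitmaps with constant-time rank support that a real implementation would use.

```python
from collections import deque
from itertools import accumulate


def build_k2tree(edges, n, k=2):
    """Build the level-order bitmap of a k^2-tree for an n x n adjacency matrix.

    Returns (bits, size): the concatenated bitmap of all levels and the
    padded side length of the matrix (a power of k).
    """
    size = k
    while size < n:              # pad the side up to a power of k
        size *= k
    bits = []
    queue = deque([(0, 0, size, list(edges))])
    while queue:                 # breadth-first, so bits come out level by level
        r0, c0, s, sub = queue.popleft()
        cs = s // k              # side of each of the k*k child submatrices
        for i in range(k):
            for j in range(k):
                cr, cc = r0 + i * cs, c0 + j * cs
                inside = [(r, c) for r, c in sub
                          if cr <= r < cr + cs and cc <= c < cc + cs]
                bits.append(1 if inside else 0)
                if inside and cs > 1:    # only non-empty areas are subdivided
                    queue.append((cr, cc, cs, inside))
    return bits, size


def make_rank(bits):
    """rank[i] = number of 1 bits in bits[0..i] (inclusive)."""
    return list(accumulate(bits))


def has_edge(bits, rank, size, r, c, k=2):
    """Check whether the edge (r, c) exists by descending one path of the tree."""
    start, s = 0, size
    while True:
        cs = s // k
        idx = start + (r // cs) * k + (c // cs)   # child covering cell (r, c)
        if not bits[idx]:
            return False         # empty submatrix: no edge below this node
        if cs == 1:
            return True          # reached a single matrix cell
        r, c, s = r % cs, c % cs, cs
        start = rank[idx] * k * k    # children of the j-th 1 bit start at j*k^2


# Example: a 4-node directed graph.
edges = [(0, 1), (1, 2), (3, 0), (2, 2)]
bits, size = build_k2tree(edges, n=4)
rank = make_rank(bits)
print(has_edge(bits, rank, size, 1, 2), has_edge(bits, rank, size, 2, 1))
```

The sketch covers only edge existence on an unlabeled graph; storing labels and attributes, as well as supporting insertions and deletions, is what the paper's proposal adds on top of this structure.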

Keywords

Compression · Graph databases · Property graphs · Attributed graphs · Compact data structures · Dynamic graphs

Notes

Acknowledgements

This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [Grant Agreement No 690941]; from the Ministerio de Economía y Competitividad (PGE and ERDF) [Grant Numbers TIN2015-69951-R; TIN2016-77158-C4-3-R]; and from Xunta de Galicia (co-funded with ERDF) [Grant Numbers ED431C 2017/58; ED431G/01]. We also thank Nieves R. Brisaboa for her contributions during the initial discussions of this work.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Indra, A Coruña, Spain
  2. Enxenio S.L., A Coruña, Spain
  3. CITIC, Database Laboratory, Universidade da Coruña, A Coruña, Spain
