Knowledge and Information Systems, Volume 44, Issue 2, pp 439–474

Compressed vertical partitioning for efficient RDF management

  • Sandra Álvarez-García
  • Nieves Brisaboa
  • Javier D. Fernández
  • Miguel A. Martínez-Prieto
  • Gonzalo Navarro
Regular Paper

Abstract

The Web of Data has been gaining momentum in recent years. This has led to the publication of more and more semi-structured datasets that, in many cases, follow the RDF (Resource Description Framework) data model, which is based on atomic triples of subject, predicate, and object. Although the model is very simple, specific compression methods become necessary because datasets are increasingly large and various scalability issues arise around their organization and storage. This requirement is even more restrictive in RDF stores, because efficient SPARQL resolution on the compressed RDF datasets is also required. This article introduces a novel RDF indexing technique that supports efficient SPARQL resolution in compressed space. Our technique, called \(k^2\)-triples, uses the predicate to vertically partition the dataset into disjoint subsets of (subject, object) pairs, one per predicate. These subsets are represented as binary matrices of subjects \(\times\) objects in which 1-bits mean that the corresponding triple exists in the dataset. This model results in very sparse matrices, which are efficiently compressed using \(k^2\)-trees. We enhance this model with two compact indexes that list the predicates related to each different subject and object in the dataset, in order to address the specific weaknesses of vertically partitioned representations. The resulting technique not only achieves by far the most compressed representations, but also achieves the best overall performance for RDF retrieval in our experimental setup. Our approach uses up to 10 times less space than a state-of-the-art baseline and outperforms it in query time by several orders of magnitude on the most basic query patterns. In addition, we optimize traditional join algorithms on \(k^2\)-triples and define a novel one that leverages its specific features.
Our experimental results show that our technique also outperforms traditional vertical partitioning for join resolution, reporting the best numbers for joins in which the non-joined nodes are provided, and remaining competitive in most other cases.
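The vertical-partitioning model described in the abstract can be illustrated with a small Python sketch. It assumes triples are given as string tuples, maps subjects and objects to a single shared integer ID space (an assumption made here for simplicity), builds one \(k^2\)-tree per predicate over its subjects \(\times\) objects matrix, and uses plain sets as stand-ins for the two indexes listing the predicates of each subject and object. The names `K2Tree`, `K2Triples`, `query`, and `match` are hypothetical; the bitmaps are uncompressed Python lists with a prefix-count array in place of a succinct rank structure, and nothing here reflects the authors' actual implementation or their join algorithms.

```python
from collections import defaultdict


class K2Tree:
    """Static k^2-tree over an n x n binary matrix (n a power of k, n >= k).

    T stores the bits of every level but the last and L stores the last level.
    The k^2 children of the 1-bit at position z of T start at rank1(T, z) * k^2.
    """

    def __init__(self, points, n, k=2):
        self.n, self.k = n, k
        levels, frontier, size = [], [(0, 0, list(points))], n
        while size > 1:  # one bit level per subdivision step
            sub, bits, nxt = size // k, [], []
            for r0, c0, pts in frontier:
                buckets = [[] for _ in range(k * k)]
                for r, c in pts:  # row-major index of the child submatrix
                    buckets[k * ((r - r0) // sub) + (c - c0) // sub].append((r, c))
                for idx, b in enumerate(buckets):
                    bits.append(1 if b else 0)
                    if b:  # only non-empty submatrices are refined further
                        nxt.append((r0 + (idx // k) * sub, c0 + (idx % k) * sub, b))
            levels.append(bits)
            frontier, size = nxt, sub
        self.T = [b for lvl in levels[:-1] for b in lvl]
        self.L = levels[-1]
        # prefix[i] = number of 1s in T[0:i]; a plain list stands in for a
        # succinct rank structure (hypothetical simplification).
        self.prefix = [0]
        for b in self.T:
            self.prefix.append(self.prefix[-1] + b)

    def _collect(self, size, z, r0, c0, row, col, out):
        # z is the position of the current node's bit (-1 for the root).
        if z >= len(self.T):  # last level: one bit per cell
            if self.L[z - len(self.T)]:
                out.append((r0, c0))
            return
        if z != -1 and not self.T[z]:  # empty submatrix: prune
            return
        sub = size // self.k
        base = 0 if z == -1 else self.prefix[z + 1] * self.k * self.k
        rows = range(self.k) if row is None else [(row - r0) // sub]
        cols = range(self.k) if col is None else [(col - c0) // sub]
        for i in rows:
            for j in cols:
                self._collect(sub, base + self.k * i + j,
                              r0 + i * sub, c0 + j * sub, row, col, out)

    def query(self, row=None, col=None):
        """1-cells, optionally restricted to a fixed row and/or column."""
        out = []
        self._collect(self.n, -1, 0, 0, row, col, out)
        return out


class K2Triples:
    """Vertical partitioning: one K2Tree of subjects x objects per predicate."""

    def __init__(self, triples, k=2):
        # Assumption: subjects and objects share one integer ID space.
        self.names = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
        self.ids = {v: i for i, v in enumerate(self.names)}
        n = k
        while n < len(self.names):  # pad the matrix side to a power of k
            n *= k
        pairs = defaultdict(set)  # predicate -> {(subject ID, object ID)}
        self.sp, self.op = defaultdict(set), defaultdict(set)
        for s, p, o in triples:
            pairs[p].add((self.ids[s], self.ids[o]))
            self.sp[s].add(p)  # predicates in which s appears as subject
            self.op[o].add(p)  # predicates in which o appears as object
        self.trees = {p: K2Tree(pts, n, k) for p, pts in pairs.items()}

    def match(self, s=None, p=None, o=None):
        """Solve one triple pattern; None acts as an unbound variable."""
        if any(x is not None and x not in self.ids for x in (s, o)):
            return []
        if p is not None:
            preds = [p] if p in self.trees else []
        else:  # use the per-node predicate lists instead of probing every tree
            cand = set(self.trees)
            if s is not None:
                cand &= self.sp.get(s, set())
            if o is not None:
                cand &= self.op.get(o, set())
            preds = sorted(cand)
        row = None if s is None else self.ids[s]
        col = None if o is None else self.ids[o]
        return [(self.names[r], q, self.names[c])
                for q in preds for r, c in self.trees[q].query(row, col)]
```

For example, `K2Triples([("alice", "knows", "bob"), ("bob", "knows", "carol"), ("alice", "likes", "carol")]).match(s="alice")` returns `[("alice", "knows", "bob"), ("alice", "likes", "carol")]`: since the subject is bound and the predicate is not, only the trees of the predicates listed for `alice` are probed, each with a single-row query. Bound subjects and objects become a fixed row or column, so the same traversal covers the basic triple patterns.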

Keywords

RDF · Compressed index · Vertical partitioning · Memory-based SPARQL solution · \(k^2\)-tree

Notes

Acknowledgments

This work was partially funded by the Spanish Ministry of Economy and Competitiveness (PGE & FEDER), grants TIN2009-14560-C03-02 (first and second authors) and TIN2013-46238-C4-3-R (first, second, third, and fourth authors); CDTI, Spanish Ministry of Economy and Competitiveness, and Axencia Galega de Innovación (CDTI EXP 00064563 / ITC-20133062), and the Xunta de Galicia with FEDER ref. GRC2013/053 (first and second authors); and Chilean Fondecyt, refs. 1-110066 and 1-140796. The first author holds a grant from the Spanish Ministry of Economy and Competitiveness, ref. BES-2010-039022. The third author holds a grant from the Regional Government of Castilla y León (Spain) and the European Social Fund. The fourth author holds an Ibero-American Young Teachers and Researchers Grant funded by Santander Universidades.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Sandra Álvarez-García (1)
  • Nieves Brisaboa (1)
  • Javier D. Fernández (2, 3)
  • Miguel A. Martínez-Prieto (2, 3, 4)
  • Gonzalo Navarro (3)

  1. Database Lab, Facultade de Informática, University of A Coruña, A Coruña, Spain
  2. DataWeb Research, Department of Computer Science, University of Valladolid, Valladolid, Spain
  3. Department of Computer Science, University of Chile, Santiago, Chile
  4. Escuela Universitaria de Informática, Campus María Zambrano, Segovia, Spain
