International Symposium on String Processing and Information Retrieval

SPIRE 2015: String Processing and Information Retrieval, pp. 103–115

A Compact RDF Store Using Suffix Arrays

  • Nieves R. Brisaboa
  • Ana Cerdeira-Pena
  • Antonio Fariña
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9309)

Abstract

RDF has become a standard format to describe resources in the Semantic Web and other scenarios. RDF data is composed of triples (subject, predicate, object), referring respectively to a resource, a property of that resource, and the value of that property. Compact storage schemes allow larger datasets to fit in main memory for faster processing. On the other hand, supporting efficient SPARQL queries on RDF datasets requires index data structures to accompany the data, which hampers compactness. As has been done for text collections, we introduce a self-index for RDF data, which combines the data and its index in a single representation that takes less space than the raw triples and efficiently supports basic SPARQL queries. Our storage format, RDFCSA, builds on compressed suffix arrays. Although more compact representations of RDF data exist, RDFCSA uses about half of the space of the raw data (and replaces it) and displays much more robust and predictable query times, around 1–2 microseconds per retrieved triple. RDFCSA is 3 orders of magnitude faster than representations like MonetDB or RDF-3X, while using the same space as the former and 6 times less space than the latter. It is also faster than the more compact representations on most queries, in some cases by 2 orders of magnitude.
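The abstract describes RDFCSA as a self-index built on compressed suffix arrays that answers basic SPARQL triple patterns. As a minimal sketch of why a suffix-array layout suits triple patterns, the toy below rests on several assumptions not stated in the abstract: terms are already dictionary-encoded as integers, subject, predicate and object IDs are shifted into disjoint ranges, and each triple is read as a cycle so that the bound components of every pattern form a prefix of some rotation. It uses a plain, uncompressed sorted array (the class `ToyRdfSuffixArray` and its parameters are hypothetical), so it illustrates only the search logic, not RDFCSA's compression or its actual implementation.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]


class ToyRdfSuffixArray:
    """Hypothetical, uncompressed illustration: sorted cyclic rotations of ID triples."""

    def __init__(self, triples: List[Triple], ns: int, np_: int) -> None:
        self.ns, self.np = ns, np_
        # Concatenate triples, shifting IDs so subjects (1..ns), predicates
        # (ns+1..ns+np) and objects (ns+np+1..) never share a symbol value.
        self.text: List[int] = []
        for s, p, o in sorted(set(triples)):
            if not (1 <= s <= ns and 1 <= p <= np_ and o >= 1):
                raise ValueError(f"ID out of range in {(s, p, o)}")
            self.text += [s, ns + p, ns + np_ + o]

        def rotation(i: int) -> Tuple[int, ...]:
            # The three symbols of position i's triple, read cyclically from i.
            base = i - i % 3
            return tuple(self.text[base + (i % 3 + k) % 3] for k in range(3))

        # "Suffix array": every text position, sorted by its cyclic rotation.
        self.sa = sorted(range(len(self.text)), key=rotation)
        self.keys = [rotation(i) for i in self.sa]

    def query(self, s: Optional[int] = None, p: Optional[int] = None,
              o: Optional[int] = None) -> List[Triple]:
        """Answer a triple pattern; None marks an unbound (variable) component."""
        S = s
        P = None if p is None else self.ns + p
        O = None if o is None else self.ns + self.np + o
        if s is not None and p is None and o is not None:
            prefix: Tuple[int, ...] = (O, S)  # (S, ?P, O): read the cycle as O, S, P
        else:
            prefix = tuple(x for x in (S, P, O) if x is not None)
        # All rotations that start with `prefix` form one contiguous range.
        lo = bisect_left(self.keys, prefix)
        hi = bisect_right(self.keys, prefix + (float("inf"),))
        found = set()
        for i in self.sa[lo:hi]:
            base = i - i % 3
            ts, tp, to = self.text[base:base + 3]
            found.add((ts, tp - self.ns, to - self.ns - self.np))
        # The set removes duplicates: the fully unbound pattern hits all 3 rotations.
        return sorted(found)
```

For example, `ToyRdfSuffixArray([(1, 1, 2), (1, 2, 3)], ns=1, np_=2).query(s=1, o=3)` returns `[(1, 2, 3)]`. Of the eight triple patterns, (S, ?P, O) is the only one whose bound components are not contiguous in subject-predicate-object order; reading the cycle as object-subject-predicate turns them into a prefix, so every pattern reduces to a single binary-searched range.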

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nieves R. Brisaboa (1)
  • Ana Cerdeira-Pena (1)
  • Antonio Fariña (1)
  • Gonzalo Navarro (2)
  1. Database Lab., University of A Coruña, Coruña, Spain
  2. Department of Computer Science, University of Chile, Santiago, Chile