Massive-Scale RDF Processing Using Compressed Bitmap Indexes

  • Kamesh Madduri
  • Kesheng Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6809)

Abstract

The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multigraph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, this paper presents a new strategy based on bitmap indexes. We store the RDF data in column-oriented compressed bitmap structures, along with two dictionaries. We find that our bitmap index-based query evaluation approach is up to an order of magnitude faster than the state-of-the-art system RDF-3X, for a variety of SPARQL queries on gigascale RDF data sets.
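
The idea of answering a join with bitmap operations can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, the function names, and the choice to index subjects by (predicate, object) are all assumptions made for illustration. It uses a single subject dictionary, whereas the abstract mentions two dictionaries without saying what they hold, and it uses plain Python integers as uncompressed bitsets where the paper uses compressed bitmaps.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("alice", "worksAt", "lbnl"),
    ("bob", "worksAt", "lbnl"),
    ("carol", "worksAt", "mit"),
    ("alice", "authorOf", "paper1"),
    ("carol", "authorOf", "paper2"),
]


def build_dictionary(terms):
    """Map each distinct term to a dense integer ID, in first-seen order."""
    ids = {}
    for term in terms:
        ids.setdefault(term, len(ids))
    return ids


def build_index(triples):
    """Build subject bitmaps keyed by (predicate, object) and by predicate.

    Each bitmap is a Python int used as an uncompressed bitset:
    bit i is set when the subject with ID i matches the key.
    """
    subj_ids = build_dictionary(s for s, _, _ in triples)
    po_bitmaps = defaultdict(int)  # (predicate, object) -> subject bitset
    p_bitmaps = defaultdict(int)   # predicate -> subjects with any object
    for s, p, o in triples:
        bit = 1 << subj_ids[s]
        po_bitmaps[(p, o)] |= bit
        p_bitmaps[p] |= bit
    return subj_ids, po_bitmaps, p_bitmaps


def decode(bitmap, subj_ids):
    """Turn a bitset back into a sorted list of subject names."""
    return sorted(name for name, i in subj_ids.items() if (bitmap >> i) & 1)


subj_ids, po_bitmaps, p_bitmaps = build_index(TRIPLES)

# Graph pattern: ?x worksAt lbnl . ?x authorOf ?y
# The join on ?x reduces to a bitwise AND of two subject bitmaps.
matches = po_bitmaps[("worksAt", "lbnl")] & p_bitmaps["authorOf"]
result = decode(matches, subj_ids)
print(result)
```

In this sketch the join on the shared variable never materializes intermediate tuples; it only combines two bitsets. A compressed representation would keep that property while using far less memory on sparse bitmaps.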

Keywords

semantic data; RDF; SPARQL; query optimization; compressed bitmap indexes; large-scale data analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kamesh Madduri (1)
  • Kesheng Wu (1)
  1. Lawrence Berkeley National Laboratory, Berkeley, USA