An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce

  • Padmashree Ravindra
  • HyeongSik Kim
  • Kemafor Anyanwu
Conference paper

DOI: 10.1007/978-3-642-21064-8_4

Volume 6644 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Ravindra P., Kim H., Anyanwu K. (2011) An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce. In: Antoniou G. et al. (eds) The Semantic Web: Research and Applications. ESWC 2011. Lecture Notes in Computer Science, vol 6644. Springer, Berlin, Heidelberg

Abstract

Existing MapReduce systems support relational-style join operators that translate multi-join query plans into several MapReduce cycles. This leads to high I/O and communication costs due to the multiple data transfer steps between the map and reduce phases. SPARQL graph pattern matching is dominated by join operations, and these costs are prohibitive for RDF graph pattern matching queries, which typically involve several joins; such queries are therefore unlikely to be processed efficiently using existing techniques. In this paper, we propose an approach for optimizing graph pattern matching by reinterpreting certain join tree structures as grouping operations. This enables a greater degree of parallelism in join processing, resulting in more “bushy” query execution plans with fewer MapReduce cycles. This approach requires that intermediate results be managed as sets of groups of triples, or TripleGroups. We therefore propose a data model and algebra, the Nested TripleGroup Algebra (NTGA), for capturing and manipulating TripleGroups. The relationship with the traditional relational-style algebra used in Apache Pig is discussed. A comparative performance evaluation of the traditional Pig approach and RAPID+ (Pig extended with NTGA) for graph pattern matching queries on the BSBM benchmark dataset is presented. Results show up to 60% performance improvement of our approach over traditional Pig for some tasks.
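
The idea of reinterpreting a star-shaped subpattern (several triple patterns sharing a subject) as a grouping operation can be illustrated with a minimal sketch. It assumes triples are plain (subject, property, object) tuples, simulates a single map/shuffle/reduce cycle in memory, and uses hypothetical short property names instead of the actual BSBM vocabulary. The function names are illustrative only; this is not the RAPID+ implementation or the formal NTGA operator definitions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def map_phase(triples: List[Triple]) -> Iterator[Tuple[str, Triple]]:
    # Map: emit every triple keyed by its subject.
    for s, p, o in triples:
        yield s, (s, p, o)


def reduce_phase(pairs: Iterator[Tuple[str, Triple]]) -> Dict[str, List[Triple]]:
    # Shuffle + reduce: all triples sharing a subject form one TripleGroup.
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for subject, triple in pairs:
        groups[subject].append(triple)
    return dict(groups)


def match_stars(groups: Dict[str, List[Triple]],
                stars: Dict[str, Set[str]]) -> Dict[str, List[List[Triple]]]:
    # A group matches a star pattern if it contains every required property;
    # the match keeps only the triples whose property the star asks for.
    matches: Dict[str, List[List[Triple]]] = {name: [] for name in stars}
    for group in groups.values():
        props = {p for _, p, _ in group}
        for name, required in stars.items():
            if required <= props:
                matches[name].append([t for t in group if t[1] in required])
    return matches


# Hypothetical toy data loosely modeled on products and producers.
triples = [
    ("prod1", "label", "Widget"),
    ("prod1", "producer", "prodr1"),
    ("prod1", "price", "10"),
    ("prod2", "label", "Gadget"),
    ("prodr1", "country", "US"),
    ("prodr1", "label", "Acme"),
]
stars = {"product": {"label", "producer"}, "producer": {"country", "label"}}

result = match_stars(reduce_phase(map_phase(triples)), stars)
print(result)
```

Both star patterns are matched from the same subject-keyed TripleGroups, i.e., in one grouping cycle rather than one join cycle per additional triple pattern. Joining the two stars (the product's producer object to the producer group's subject) would still require a further cycle, which this sketch omits.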

Keywords

MapReduce, RDF graph pattern matching, optimization techniques


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Padmashree Ravindra (1)
  • HyeongSik Kim (1)
  • Kemafor Anyanwu (1)
  1. Department of Computer Science, North Carolina State University, Raleigh, USA