Chapter

Semantic Technology

Volume 8943 of the series Lecture Notes in Computer Science pp 230-235

G-Diff: A Grouping Algorithm for RDF Change Detection on MapReduce

  • Jinhyun Ahn (Biomedical Knowledge Engineering Laboratory, Seoul National University; Dental Research Institute, Seoul National University)
  • Dong-Hyuk Im (Department of Computer and Information Engineering, Hoseo University)
  • Jae-Hong Eom (Biomedical Knowledge Engineering Laboratory, Seoul National University; Dental Research Institute, Seoul National University)
  • Nansu Zong (Biomedical Knowledge Engineering Laboratory, Seoul National University)
  • Hong-Gee Kim (Biomedical Knowledge Engineering Laboratory, Seoul National University; Dental Research Institute, Seoul National University)

Abstract

Linked Data is a collection of RDF data that can grow exponentially and change over time. Detecting changes in RDF data is important for supporting version management in applications that consume Linked Data. Traditional change detection approaches do not scale, which has led researchers to devise algorithms on the MapReduce framework. Most existing work simply takes a URI as the Map key. We observed that this is inefficient for RDF data with a large number of distinct URIs, because many Reduce tasks have to be created. Even though the Reduce tasks are scheduled to run simultaneously, too many small Reduce tasks increase the overall running time. In this paper, we propose G-Diff, an efficient MapReduce algorithm for RDF change detection. G-Diff groups triples by URI during the Map phase and sends the triples of a group to one particular Reduce task rather than to multiple Reduce tasks. Experiments on real datasets showed that the proposed approach takes less running time than previous works.
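The grouping idea can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the G-Diff algorithm itself: the abstract does not say how triples are assigned to groups, so the example assumes a CRC32 hash of the subject URI taken modulo a fixed number of groups. The names `NUM_GROUPS`, `group_of`, `map_phase`, `shuffle`, `reduce_phase`, and `detect_changes` are hypothetical, and change detection is reduced to plain set difference between two versions.

```python
import zlib
from collections import defaultdict

NUM_GROUPS = 4  # hypothetical number of groups (and thus of Reduce tasks)


def group_of(uri, num_groups=NUM_GROUPS):
    """Assumed grouping function: stable hash of the URI modulo num_groups."""
    return zlib.crc32(uri.encode("utf-8")) % num_groups


def map_phase(triples, version, num_groups=NUM_GROUPS):
    """Emit (group id, (version tag, triple)) instead of (URI, ...)."""
    for s, p, o in triples:
        yield group_of(s, num_groups), (version, (s, p, o))


def shuffle(pairs):
    """Collect all values that share a key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values):
    """Compare old and new triples that landed in the same group."""
    old = {t for version, t in values if version == "old"}
    new = {t for version, t in values if version == "new"}
    return new - old, old - new  # (added, deleted)


def detect_changes(old_triples, new_triples, num_groups=NUM_GROUPS):
    pairs = list(map_phase(old_triples, "old", num_groups))
    pairs += list(map_phase(new_triples, "new", num_groups))
    grouped = shuffle(pairs)
    added, deleted = set(), set()
    for values in grouped.values():
        a, d = reduce_phase(values)
        added |= a
        deleted |= d
    return added, deleted, len(grouped)


# Made-up example data: two versions of a tiny RDF graph.
old_version = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:c", "ex:p", "ex:d"),
]
new_version = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:c", "ex:p", "ex:e"),
    ("ex:f", "ex:p", "ex:g"),
]

added, deleted, reduce_tasks = detect_changes(old_version, new_version)
print("added:", sorted(added))
print("deleted:", sorted(deleted))
print("reduce tasks used:", reduce_tasks)
```

Because the group depends only on the subject URI, the same triple falls into the same group in both versions, so the union of the per-group differences equals the difference of the whole graphs. The number of Reduce-side groups is bounded by `NUM_GROUPS` rather than by the number of distinct URIs, which is the effect the abstract describes.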