Effects of Network Structure Improvement on Distributed RDF Querying
- Cite this paper as:
- Ali L., Janson T., Lausen G., Schindelhauer C. (2013) Effects of Network Structure Improvement on Distributed RDF Querying. In: Hameurlain A., Rahayu W., Taniar D. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2013. Lecture Notes in Computer Science, vol 8059. Springer, Berlin, Heidelberg
In this paper, we analyze the performance of distributed RDF systems in a peer-to-peer (P2P) environment. We compare P2P networks based on Distributed Hash Tables (DHTs) with search-tree based networks. Our simulations show a performance improvement by a factor of 2 when using search-tree based networks. This is achieved by grouping related data that tends to be accessed together in a query into branches of the tree, e.g. the data of a university domain lies in one branch. We observe a strongly unbalanced data distribution when indexing the RDF triples by subject, predicate, and object, which raises the question of scalability for huge data sets, e.g. the peer responsible for the predicate 'type' is overloaded. However, we show how to exploit this unbalanced data distribution and how to speed up query evaluation dramatically with only a few additional routing links, so-called shortcuts, to these frequently occurring triple components. These routing shortcuts can be established with only a constant increase in the size of each peer's routing table. To cope with load-balancing hotspots, we propose a novel indexing scheme in which triples are indexed 'six instead of three times', with only 23% data overhead in experiments and the possibility of more parallelism in query processing. For our experiments, we use the LUBM data set and benchmark queries.
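The imbalance caused by indexing each triple by subject, predicate, and object can be illustrated with a minimal sketch. It is not the paper's simulator: the toy triples, the peer count, and the helper names `peer_for`, `index_triples`, and `key_frequencies` are all assumptions made for illustration, and SHA-1 modulo the number of peers merely stands in for a real DHT key space.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical LUBM-like toy data; not taken from the paper's experiments.
TRIPLES = [
    ("ex:alice", "rdf:type", "ub:Student"),
    ("ex:bob", "rdf:type", "ub:Student"),
    ("ex:dave", "rdf:type", "ub:Student"),
    ("ex:carol", "rdf:type", "ub:Professor"),
    ("ex:alice", "ub:takesCourse", "ex:course1"),
    ("ex:bob", "ub:takesCourse", "ex:course1"),
    ("ex:carol", "ub:teacherOf", "ex:course1"),
]


def peer_for(key: str, num_peers: int) -> int:
    """Map an index key to a peer id (assumed stand-in for DHT hashing)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def index_triples(triples, num_peers):
    """Store every triple three times: under its subject, predicate, and object."""
    stored = defaultdict(list)  # peer id -> list of (index key, triple)
    for triple in triples:
        for component in triple:
            stored[peer_for(component, num_peers)].append((component, triple))
    return stored


def key_frequencies(triples):
    """Count how many index entries each key (triple component) attracts."""
    counts = Counter()
    for triple in triples:
        counts.update(triple)
    return counts


NUM_PEERS = 8  # assumed network size for the sketch
index = index_triples(TRIPLES, NUM_PEERS)
frequencies = key_frequencies(TRIPLES)
hot_key, hot_count = frequencies.most_common(1)[0]
hot_peer = peer_for(hot_key, NUM_PEERS)

print("most frequent key:", hot_key, "with", hot_count, "entries")
print("entries on the peer responsible for it:", len(index[hot_peer]))
```

In this toy data the key `rdf:type` attracts more index entries than any other component, so the peer responsible for it carries a disproportionate share of the load. Keys of this kind are the frequently occurring components that the routing shortcuts would point to.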