Distributed Efficient Provenance-Aware Regular Path Queries on Large RDF Graphs

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Abstract

With the proliferation of knowledge graphs, massive RDF graphs have been published on the Web. Regular Path Queries (RPQs), an essential type of query over RDF graphs, have attracted increasing research effort. However, existing query processing approaches focus mainly on the standard semantics of RPQs, which cannot provide the provenance of the answer sets. We propose dProvRPQ, a distributed approach to evaluating provenance-aware RPQs over large RDF graphs. Our Pregel-based method employs Glushkov automata to keep track of the matching processes of RPQs in parallel. In addition, we devise four optimization strategies (edge filtering, candidate states, message compression, and message selection) that dramatically reduce the intermediate results of the basic dProvRPQ algorithm and mitigate the counting-paths problem to some extent. Extensive experiments on both synthetic and real-world datasets verify the proposed algorithms and show that our approach can efficiently answer provenance-aware RPQs over large RDF graphs.
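
To make the idea of a provenance-aware RPQ concrete, the following is a minimal single-machine sketch, not the dProvRPQ algorithm itself. It assumes a toy triple set, a hand-built Glushkov-style automaton for the hypothetical query `knows* / worksAt`, and a simplified reading of provenance as "the set of triples that lie on at least one matching path". The names `TRIPLES`, `DELTA`, `FINAL`, and `prov_rpq` are illustrative and do not come from the paper. Each loop iteration loosely mimics a Pregel superstep: active (node, state) pairs send messages along matching edges, and each pair is activated at most once, so paths are never enumerated individually.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("bob", "worksAt", "initech"),
    ("dave", "likes", "acme"),
]

# Hand-built Glushkov-style automaton for the query  knows* / worksAt.
# State 0 is the initial state; states 1 and 2 stand for the positions of
# `knows` and `worksAt` in the expression. There are no epsilon transitions.
DELTA = {
    (0, "knows"): {1},
    (0, "worksAt"): {2},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}
FINAL = {2}


def prov_rpq(triples, delta, final, source):
    """Return (answers, provenance) for paths starting at `source`.

    answers: sorted list of nodes reachable via a path matching the query.
    provenance: set of triples lying on at least one matching path
    (an assumed, simplified notion of provenance).
    """
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    visited = {(source, 0)}
    frontier = {(source, 0)}
    # back[(node, state)] records how that pair was reached:
    # a set of ((previous node, previous state), triple used).
    back = defaultdict(set)

    # Forward phase: one iteration per "superstep".
    while frontier:
        messages = set()
        for node, state in frontier:
            for p, o in out_edges[node]:
                for nxt in delta.get((state, p), ()):
                    back[(o, nxt)].add(((node, state), (node, p, o)))
                    messages.add((o, nxt))
        frontier = messages - visited  # activate each pair only once
        visited |= frontier

    # Backward phase: keep only triples that lead to an accepting pair.
    accepting = {(n, q) for (n, q) in visited if q in final}
    provenance = set()
    seen = set(accepting)
    stack = list(accepting)
    while stack:
        current = stack.pop()
        for prev, triple in back[current]:
            provenance.add(triple)
            if prev not in seen:
                seen.add(prev)
                stack.append(prev)

    answers = sorted({n for n, _ in accepting})
    return answers, provenance


if __name__ == "__main__":
    answers, provenance = prov_rpq(TRIPLES, DELTA, FINAL, "alice")
    print(answers)
    for triple in sorted(provenance):
        print(triple)
```

Under these assumptions, a query from `alice` returns the answer nodes together with the four triples that witness them, while the unrelated `likes` triple is excluded. A distributed implementation would partition the vertices across workers and exchange the (node, state) messages between them; that part is omitted here.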

Notes

  1. https://cloud.tencent.com/

  2. http://swat.cse.lehigh.edu/projects/lubm/

  3. http://dsg.uwaterloo.ca/watdiv/

  4. http://wiki.dbpedia.org/

  5. http://dsg.uwaterloo.ca/watdiv/watdiv-data-model.txt

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61572353, 61772361), the National High-tech R&D Program of China (863 Program) (2013AA013204), and the Natural Science Foundation of Tianjin (17JCYBJC15400).

Author information

Correspondence to Xin Wang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Xin, Y., Wang, X., Jin, D., Wang, S. (2018). Distributed Efficient Provenance-Aware Regular Path Queries on Large RDF Graphs. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_49

  • DOI: https://doi.org/10.1007/978-3-319-91452-7_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91451-0

  • Online ISBN: 978-3-319-91452-7

  • eBook Packages: Computer Science, Computer Science (R0)
