Enabling Fine-Grained RDF Data Completeness Assessment

  • Fariz Darari
  • Simon Razniewski
  • Radityo Eko Prasojo
  • Werner Nutt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9671)

Abstract

Nowadays, more and more RDF data is becoming available on the Semantic Web. While the Semantic Web is generally incomplete by nature, on certain topics it already contains complete information, and thus queries may return all answers that exist in reality. In this paper we develop a technique to check query completeness based on RDF data annotated with completeness information, taking into account data-specific inferences that lead to an inference problem which is \(\Pi^P_2\)-complete. We then identify a practically relevant fragment of completeness information, suitable for crowdsourced, entity-centric RDF data sources such as Wikidata, for which we develop an indexing technique that allows completeness reasoning to scale to Wikidata-scale data sources. We verify the applicability of our framework using Wikidata and develop COOL-WD, a completeness tool for Wikidata, used to annotate Wikidata with completeness statements and to reason about the completeness of query answers over Wikidata. The tool is available at http://cool-wd.inf.unibz.it/.

Keywords

RDF, Data completeness, SPARQL, Query completeness, Wikidata

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Fariz Darari (1)
  • Simon Razniewski (1)
  • Radityo Eko Prasojo (1)
  • Werner Nutt (1)

  1. Free University of Bozen-Bolzano, Bolzano, Italy
