Enabling Fine-Grained RDF Data Completeness Assessment
Abstract
An increasing amount of RDF data is becoming available on the Semantic Web. While the Semantic Web is generally incomplete by nature, on certain topics it already contains complete information, and thus queries may return all answers that exist in reality. In this paper we develop a technique to check query completeness based on RDF data annotated with completeness information. The technique takes into account data-specific inferences, which lead to an inference problem that is \(\varPi ^P_2\)-complete. We then identify a practically relevant fragment of completeness information, suitable for crowdsourced, entity-centric RDF data sources such as Wikidata, and develop an indexing technique for it that allows completeness reasoning to scale to Wikidata-scale data sources. We verify the applicability of our framework using Wikidata and develop COOL-WD, a completeness tool for Wikidata, which is used to annotate Wikidata with completeness statements and to reason about the completeness of query answers over Wikidata. The tool is available at http://cool-wd.inf.unibz.it/.
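To give an intuition for what reasoning with entity-centric completeness statements can look like, the following Python sketch checks a basic graph pattern against a small graph. It is an illustration only and not the algorithm of the paper: it assumes, hypothetically, that each completeness statement is a (subject, predicate) pair asserting that the graph holds all real-world triples for that pair, that the statements are indexed in a hash set, and that the check only needs to be sufficient (a `False` result means "cannot be guaranteed", not "incomplete"). All identifiers (`query_is_complete`, the `ex:` names) are invented for the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def substitute(pattern: Triple, var: str, value: str) -> Triple:
    """Replace every occurrence of `var` in a triple pattern by `value`."""
    return tuple(value if t == var else t for t in pattern)


def query_is_complete(patterns: List[Triple],
                      graph: Set[Triple],
                      sp_statements: Set[Tuple[str, str]]) -> bool:
    """Sufficient completeness check for a basic graph pattern.

    Assumption (hypothetical): (s, p) in sp_statements means the graph
    contains every real-world triple with subject s and predicate p.
    """
    if not patterns:
        return True
    for i, (s, p, o) in enumerate(patterns):
        # Only patterns with constant subject and predicate that are covered
        # by a statement can be resolved against the graph.
        if is_var(s) or is_var(p) or (s, p) not in sp_statements:
            continue
        rest = patterns[:i] + patterns[i + 1:]
        if not is_var(o):
            # A covered triple that is absent cannot exist in reality either,
            # so the query has no answers at all and is trivially complete.
            if (s, p, o) not in graph:
                return True
            return query_is_complete(rest, graph, sp_statements)
        # Data-specific step: the known objects are all objects there are,
        # so instantiate the variable with each and check the remainder.
        values = {obj for (gs, gp, obj) in graph if gs == s and gp == p}
        return all(
            query_is_complete([substitute(t, o, v) for t in rest],
                              graph, sp_statements)
            for v in values
        )
    return False  # no remaining pattern is covered by a statement


# Toy data with made-up identifiers.
graph = {
    ("ex:alice", "ex:child", "ex:bob"),
    ("ex:alice", "ex:child", "ex:carol"),
    ("ex:bob", "ex:birthYear", "2001"),
    ("ex:carol", "ex:birthYear", "2003"),
}
statements = {
    ("ex:alice", "ex:child"),
    ("ex:bob", "ex:birthYear"),
    ("ex:carol", "ex:birthYear"),
}
query = [("ex:alice", "ex:child", "?c"), ("?c", "ex:birthYear", "?y")]

complete_with_all = query_is_complete(query, graph, statements)
complete_without_carol = query_is_complete(
    query, graph, statements - {("ex:carol", "ex:birthYear")})
```

In this toy setting the query is guaranteed complete only while every child's birth year is covered by a statement; dropping the statement for `ex:carol` makes the guarantee fail. The set lookup `(s, p) in sp_statements` stands in for an index over statements; how the paper's index is actually organized is not stated in the abstract.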
Keywords
RDF, Data completeness, SPARQL, Query completeness, Wikidata
Notes
Acknowledgments
We would like to thank Sebastian Rudolph for his feedback on an earlier version of this paper. The research was supported by the projects “CANDy: Completeness-Aware Querying and Navigation on the Web of Data” and “TaDaQua - Tangible Data Quality with Object Signatures” of the Free University of Bozen-Bolzano, and “MAGIC: Managing Completeness of Data” of the province of Bozen-Bolzano.