CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources
With the increased availability of data on the Semantic Web, the question of whether data sources offer data of appropriate quality for a given purpose becomes an issue. With CORNER, we specifically address the data quality aspect of completeness. CORNER supports SPARQL BGP queries and can take RDFS ontologies into account in its analysis. If a query can only be answered completely by a combination of sources, CORNER rewrites the original query into one with SPARQL SERVICE calls, which assigns each query part to a suitable source, and executes it over those sources. CORNER builds upon previous work by Darari et al. [1] and is implemented using standard Semantic Web frameworks.
Keywords: Data quality, Data completeness, Query completeness, SPARQL
1 Introduction
In recent years, large amounts of data have been made available on the Semantic Web, which can be accessed by posing queries to SPARQL endpoints. As more data become available, quality of data becomes an issue since data in different sources may be suitable for different usages. In particular, data completeness may vary among data sources. Consequently, users who pose a query to different sources may get answers with different degrees of completeness. The question is how to support users in choosing sources over which their queries can retrieve complete answers.
For relational databases, Levy [2] proposed a format for statements about data completeness and studied how to assess the completeness of a query in the presence of such statements. Razniewski and Nutt [3] introduced a general reasoning technique for this problem and provided a comprehensive complexity analysis. Darari et al. [1] developed a framework for completeness reasoning on the Semantic Web. The framework lets one describe which parts of a data source are complete, by means of completeness statements, and check whether a given query over such a data source returns a complete result, by means of query completeness checks. The framework supports basic graph pattern (BGP) queries and can take into account RDFS ontologies featuring subclass, subproperty, domain and range. Moreover, if a query can be ensured to be complete over a combination of data sources, the framework tells one how to produce a federated rewriting of the query that contains SERVICE calls, with query parts that are to be sent to the relevant data sources.
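To make the four RDFS features named above concrete, the following is a minimal Python sketch of how subclass, subproperty, domain and range axioms entail additional triples. It is a naive forward-chaining loop over plain tuples, not the Jena-based reasoning used in CORNER, and all `ex:` names are hypothetical and chosen only for illustration.

```python
# Triples are (subject, predicate, object) tuples of strings.
# All "ex:" identifiers below are hypothetical, chosen only for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(data, ontology):
    """Return data plus every triple entailed by the given RDFS axioms."""
    graph = set(data)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in graph:
            for (a, rel, b) in ontology:
                if rel == SUBPROP and p == a:
                    new.add((s, b, o))            # s a o  =>  s b o
                elif rel == DOMAIN and p == a:
                    new.add((s, RDF_TYPE, b))     # subject gets the domain class
                elif rel == RANGE and p == a:
                    new.add((o, RDF_TYPE, b))     # object gets the range class
                elif rel == SUBCLASS and p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))     # instance of a is instance of b
        if not new <= graph:
            graph |= new
            changed = True                        # repeat until a fixpoint is reached
    return graph


data = {("ex:tarantino", "ex:actorIn", "ex:pulpFiction")}
ontology = {
    ("ex:actorIn", SUBPROP, "ex:contributedTo"),
    ("ex:actorIn", DOMAIN, "ex:Actor"),
    ("ex:actorIn", RANGE, "ex:Movie"),
    ("ex:Actor", SUBCLASS, "ex:Person"),
}
closure = rdfs_closure(data, ontology)
```

In this toy example the single data triple entails four more, including that `ex:tarantino` is an `ex:Person`. A completeness reasoner that takes RDFS into account has to consider such entailed triples as well as the asserted ones.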
We have implemented the reasoning techniques of Darari et al. [1] in a system called CORNER, using standard Semantic Web frameworks that can process RDF data and SPARQL queries and reason with RDFS ontologies. Moreover, we have built a Web-based demo to show the functionality of CORNER, which can be accessed at http://corner.inf.unibz.it/. While our implementation is based on Apache Jena, the approach would also be applicable to other Semantic Web frameworks such as OpenRDF Sesame. As a demo for our system, we show various aspects of completeness reasoning in the domain of movies, using the LinkedMDB and DBpedia data sources, which are RDF versions of IMDb and Wikipedia, respectively. Interestingly, IMDb already contains assertions in English about the completeness of the cast and crew of movies, which are not yet reflected in its RDF counterpart, LinkedMDB.
2 Motivating Examples
We attach this statement to LinkedMDB but not to DBpedia, since DBpedia is actually missing information about some movies in which Tarantino starred. CORNER then analyzes the query and the statement, and concludes that the query over LinkedMDB can be answered completely, while it cannot give such a guarantee for DBpedia.
We imagine that such statements could be part of the meta-information about a data source like the ones provided by VoID descriptions. In fact, completeness statements in RDF syntax can be embedded into VoID descriptions. Alternatively, there could be query hubs that contain such metadata about sources, propose sources suitable for a given query and execute the query over those sources. CORNER demonstrates the second possibility.
3 System Architecture
As shown in Fig. 1, CORNER consists of two main components, built on top of the Linked Data layer.
The processes inside the backend are controlled by the CORNER business logic, which implements the completeness reasoning technique in [1], consisting of the following steps. From the query \(Q\), CORNER generates an initial RDF graph \(G^i_Q\) that represents the information needed for answering the query. Moreover, every completeness statement \(C\) is translated into a SPARQL CONSTRUCT query \(Q_C\). Applying all the queries \(Q_C\) to the graph \(G^i_Q\) results in a graph \(G^a_Q\), which is a subgraph of \(G^i_Q\) and represents the parts of the query for which data are complete. By evaluating \(Q\) over \(G^a_Q\), CORNER tests whether the complete data are sufficient to answer \(Q\). Finally, if \(Q\) can be answered completely, CORNER uses the data source information attached to the completeness statements that contributed to generating \(G^a_Q\) to distribute the query parts of \(Q\) to suitable, complete data sources.
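The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CORNER's Jena-based code: triples are plain tuples, variables are strings starting with `?`, \(G^i_Q\) is obtained by replacing each variable with a fresh constant, and a completeness statement is modelled as a hypothetical pair `(pattern, condition)` whose CONSTRUCT query is emulated as `CONSTRUCT { pattern } WHERE { pattern . condition }`. RDFS reasoning is left out, and the `ex:` vocabulary is made up.

```python
def is_var(term):
    return term.startswith("?")


def freeze(bgp):
    """Build G^i_Q: replace every variable ?x by the fresh constant frozen:x."""
    return {tuple("frozen:" + t[1:] if is_var(t) else t for t in triple)
            for triple in bgp}


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(bgp, graph, binding=None):
    """Yield every binding under which all patterns of bgp occur in graph."""
    binding = {} if binding is None else binding
    if not bgp:
        yield binding
        return
    for triple in graph:
        extended = unify(bgp[0], triple, binding)
        if extended is not None:
            yield from match(bgp[1:], graph, extended)


def apply_statement(statement, graph):
    """Emulate CONSTRUCT { pattern } WHERE { pattern . condition } on graph."""
    pattern, condition = statement
    out = set()
    for b in match(pattern + condition, graph):
        out |= {tuple(b.get(t, t) for t in triple) for triple in pattern}
    return out


def is_complete(query, statements):
    g_init = freeze(query)                        # G^i_Q
    g_avail = set()                               # G^a_Q, a subgraph of G^i_Q
    for statement in statements:
        g_avail |= apply_statement(statement, g_init)
    frozen = {t: "frozen:" + t[1:]
              for triple in query for t in triple if is_var(t)}
    # Q is complete if the frozen answer is still found over G^a_Q.
    return any(b == frozen for b in match(query, g_avail))


# Hypothetical vocabulary and statements, for illustration only.
query = [("?m", "ex:actor", "ex:tarantino"), ("?m", "ex:budget", "?b")]
all_tarantino_movies = ([("?m", "ex:actor", "ex:tarantino")], [])
budgets_of_his_movies = ([("?m", "ex:budget", "?b")],
                         [("?m", "ex:actor", "ex:tarantino")])
```

With both statements, `is_complete(query, [all_tarantino_movies, budgets_of_his_movies])` holds, because every triple of the frozen query is reconstructed. With only the first statement, the budget triple is missing from \(G^a_Q\) and the check fails.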
4 Demo Description
Figure 2 shows the example of the query about budget and box-office gross of movies starring Quentin Tarantino, mentioned above. We first specify the SPARQL query in the query panel of the Web UI. Then, in the ontology panel, we specify which ontologies we want to use. In this case, we only need to activate the mapping ontology for LinkedMDB and DBpedia. After that, in the completeness statements panel, we select the statements about data sources to be used for query completeness checking. The figure shows the two completeness statements we mentioned above.
To start completeness reasoning, the user clicks the execution button at the bottom of the UI. CORNER then returns the query results together with information stating that the completeness of the query can be guaranteed. CORNER also provides debugging information about the completeness reasoning and shows the federated rewriting of the query that was executed over the data sources.
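As an illustration of what such a federated rewriting might look like, the sketch below groups triple patterns by the source they were assigned to and emits one SPARQL 1.1 `SERVICE` block per source. The function name, the endpoint URLs and the property IRIs are all hypothetical placeholders, and the output format is an assumption rather than the exact text CORNER produces.

```python
def federated_rewrite(select_vars, assignment):
    """Build a SELECT query with one SERVICE block per assigned endpoint.

    assignment is a list of (triple_pattern, endpoint_url) pairs, where each
    triple pattern is a tuple of terms already written in SPARQL syntax.
    """
    groups = {}                                   # endpoint -> list of patterns
    for pattern, endpoint in assignment:
        groups.setdefault(endpoint, []).append(pattern)

    lines = ["SELECT " + " ".join(select_vars) + " WHERE {"]
    for endpoint, patterns in groups.items():     # insertion order is preserved
        lines.append("  SERVICE <%s> {" % endpoint)
        for s, p, o in patterns:
            lines.append("    %s %s %s ." % (s, p, o))
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Placeholder endpoints and IRIs; not the real LinkedMDB or DBpedia addresses.
SOURCE_A = "http://example.org/source-a/sparql"
SOURCE_B = "http://example.org/source-b/sparql"
assignment = [
    (("?m", "<http://example.org/actor>", "<http://example.org/tarantino>"), SOURCE_A),
    (("?m", "<http://example.org/budget>", "?b"), SOURCE_B),
    (("?m", "<http://example.org/gross>", "?g"), SOURCE_B),
]
rewritten = federated_rewrite(["?m", "?b", "?g"], assignment)
```

Here the two patterns assigned to the second source end up in the same `SERVICE` block, so each endpoint is contacted once with all the query parts for which it is known to be complete.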
This work has been partially supported by the project “MAGIC: Managing Completeness of Data” funded by the province of Bozen-Bolzano, and the European Master’s Program in Computational Logic (EMCL).
References
1. Darari, F., Nutt, W., Pirrò, G., Razniewski, S.: Completeness statements about RDF data sources and their use for query answering. In: Alani, H., et al. (eds.) ISWC 2013, Part I. LNCS, vol. 8218, pp. 66–83. Springer, Heidelberg (2013)
2. Levy, A.Y.: Obtaining complete answers from incomplete databases. In: VLDB (1996)
3. Razniewski, S., Nutt, W.: Completeness of queries over incomplete databases. In: PVLDB (2011)
4. Harris, S., Seaborne, A.: SPARQL 1.1 query language. Technical report, W3C (2013)