1 Introduction

One of the main aspects that characterises Linked Open Data (LOD) is heterogeneity: it is not hard to find RDF datasets that describe overlapping domains using different vocabularies [11]. A long-standing challenge raised by this state of affairs concerns the access and retrieval of data. In particular, a common scenario, central to a number of tasks including integrating, enriching and comparing data from several RDF datasets, can be described as follows: given a query \(Q_{o}\) (e.g. Select distinct ?mod ?title where { ?mod a ou:Module. ?mod dc:title ?title } Footnote 1) formulated w.r.t a source RDF dataset \(D_{s}\) (e.g. \(D_{ou}\) = http://data.open.ac.uk/query), we need to reformulate it w.r.t another, similar target RDF dataset \(D_{t}\) (e.g. \(D_{su}\) = http://sparql.data.southampton.ac.uk). Achieving this goal usually involves quite intensive and time-consuming ad-hoc pre-processing [12]. In particular, it requires spending time exploring and understanding the target RDF dataset’s data model and content, and then iteratively reformulating and testing SPARQL queries until the user reaches a formulation that fits his/her needs [3]. Reformulating a query over many RDF datasets can be very laborious; if aided by tool support that recognises similarities and provides prototypical queries that can be tested without prior knowledge of the dataset, the time and effort could be significantly reduced.

In this demo paper, we propose a novel approach and a tool (called SQUIRE) that, given a SPARQL query \(Q_{o}\) that is satisfiable w.r.t a source RDF dataset \(D_{s}\), provides query recommendations by automatically reformulating \(Q_{o}\) into queries \(Q_{r_{i}}\) that are satisfiable w.r.t a target RDF dataset \(D_{t}\). In contrast with existing approaches (see Sect. 2), SQUIRE aims at recommending queries whose reformulations: (i) reflect as much as possible the same intended meaning, structure, type of results and result size as the original query, and (ii) do not require an ontology mapping and/or instance matching between the datasets. We have prototyped our approach based on a set of criteria that measure the similarity between the user-provided query \(Q_{o}\) and the recommended ones \(Q_{r_{i}}\). Demo session attendants will have the opportunity to experiment with SQUIRE over real-world SPARQL endpoints, thus demonstrating the feasibility of the underlying query reformulation and query recommendation processes.

The paper is structured as follows: Sect. 2 discusses existing work, Sect. 3 details SQUIRE’s approach and its implementation, and Sect. 4 concludes and points out future research.
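
For concreteness, the scenario can be sketched with the two queries below. The first is \(Q_{ou}\) as given above (the prefix declarations are our own assumptions); the second is one possible recommended reformulation over \(D_{su}\), whose target vocabulary (aiiso:Module, rdfs:label) is purely hypothetical and only meant to illustrate the intended outcome.

  # Q_o: original query over D_ou (http://data.open.ac.uk/query)
  PREFIX ou: <http://data.open.ac.uk/saou/ontology#>   # assumed prefix IRI
  PREFIX dc: <http://purl.org/dc/terms/>               # assumed prefix IRI
  SELECT DISTINCT ?mod ?title
  WHERE { ?mod a ou:Module .
          ?mod dc:title ?title }

  # Q_r: one possible recommended reformulation over D_su
  # (the target vocabulary shown here is hypothetical)
  PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
  PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
  SELECT DISTINCT ?mod ?title
  WHERE { ?mod a aiiso:Module .
          ?mod rdfs:label ?title }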

2 Related Work

To the best of our knowledge, there is no other study investigating SPARQL query recommendations over unmapped RDF datasets that takes user queries into account. On the contrary, several solutions exist that address SPARQL query rewriting for implementing data integration over linked data. For instance, [7] devised a query rewriting approach that makes full use of schema mappings, whereas [2] relies on an explicit ontology alignment between the source \(D_{s}\) and the target \(D_{t}\). Similarly, [9] described a method for query approximation where the entities appearing in the query can be generalized w.r.t a given ontology mapping. Moreover, several systems have been proposed, with very good results, to support users with no knowledge of SPARQL or RDF in building appropriate queries from scratch. Just to mention a few, Sparklis [4], QUICK [12] and QueryMed [10] are designed around a query building process based on guided, interactive questions and answers. The authors of RDF-GL [6] designed a method based on a visual query language where a query can be viewed both in a natural language-like form and in a graphical form. Along the same lines, but hiding the SPARQL syntax, SparqlFilterFlow [5] and SPARQLViz [1] proposed approaches based on visual interfaces where queries can be created entirely with graphical elements.

Closer to our goal, [3] aims at alleviating the effort of understanding the potential use of an RDF dataset by automatically extracting relevant natural language questions that could be formulated and executed over it. Although all the above studies were useful to us, as they contribute interesting elements to build on, they are mainly driven by a context where the user is not familiar with the underlying technologies (which is not our case), with the goal of making semantic access and retrieval of data more usable through appropriate natural language based systems. In contrast, we focus on a method that makes SPARQL query recommendations by reformulating a user query for accessing and retrieving data from unmapped RDF datasets.

3 Method and Implementation

To achieve our goal, SQUIRE proposes and implements a mechanism based on three steps: Generalization, Specialization and Evaluation. To present each of them, let us consider the case in which we want to build recommendations for the example query \(Q_{ou}\) w.r.t the Southampton University RDF dataset \(D_{su}\).

Fig. 1. Part of the Specialized Query Tree obtained by applying the two operations.

Fig. 2. SQUIRE prototype screenshot.

  • Generalization aims at generalizing the entities (classes, properties, individuals and literals) of \(Q_{o}\) that are not present in \(D_{t}\) into variables (marked as template variables)Footnote 2. By applying this step, we build what we call the Generalized Query Template (GQT). Back to the query \(Q_{ou}\), its GQT is obtained by turning the class ou:Module and the datatype property dc:title into two template variables, ?ct1 and ?dtp1 respectively. The result is shown in the root node of the tree in Fig. 1, and written out in SPARQL in the sketch after this list.

  • Specialization aims at consistently specializing the obtained GQT by applying two main operations: (a) Instantiation (I) instantiates a template variable with a corresponding concrete value that belongs to \(D_{t}\) (e.g. we instantiate ?ct1 [?dtp1] over each class [datatype property] of \(D_{su}\)); and (b) Removal (R) deletes an entire triple pattern from \(Q_{o}\)’s GQT. We call the output of this step the Specialized Query Tree; Fig. 1 shows part of it for \(Q_{ou}\), and two candidate nodes are sketched after this list.

  • Evaluation. As a result of the previous two steps, each tree node is considered a reformulated query that is a candidate for recommendation. However, some candidates are more “similar” to the original query than others. Thus, the main question is: how can we capture and compute such similarity to provide a score-based ranking? Agreeing with [8] that there is no universal way of measuring the distance and/or similarity between two formal queries, we base our approach on a linear combination of the following criteriaFootnote 3 (a possible combination is sketched at the end of this section): (a) Result Type Similarity, measuring the overlap of the types of results (URI or literal) between \(Q_{o}\) and any recommendation \(Q_{r_{i}}\); (b) Query Result Size Similarity, measuring the result size ratio (normalized w.r.t the dataset sizes) between \(Q_{o}\) and any recommended \(Q_{r_{i}}\); (c) Query Root Distance, measuring the cost of the operations applied from the root node to the node containing the recommended \(Q_{r_{i}}\); it takes into account the distance-based matching of the replaced entities and the structure (as a set of triple patterns) between \(Q_{o}\) and any recommended \(Q_{r_{i}}\); and (d) Query Specificity Distance, measuring the distance between \(Q_{o}\) and any recommended \(Q_{r_{i}}\) based on the sets of variables (total shared variables/total variables).
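
The GQT of \(Q_{ou}\) referred to above can be written out as the following SPARQL query, a minimal sketch in which the template variables ?ct1 and ?dtp1 stand for the generalized class and datatype property:

  # GQT of Q_ou: dataset-specific entities replaced by template variables
  SELECT DISTINCT ?mod ?title
  WHERE { ?mod a ?ct1 .        # ?ct1 generalizes the class ou:Module
          ?mod ?dtp1 ?title }  # ?dtp1 generalizes the datatype property dc:title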

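Two nodes of the Specialized Query Tree for \(Q_{ou}\) are sketched below, one per operation. The concrete class IRI used in the Instantiation example is an assumption, since the actual values depend on the classes and datatype properties present in \(D_{su}\).

  # Node obtained by one Instantiation (I) step: ?ct1 is bound to a
  # concrete class of D_su (the class IRI shown here is hypothetical)
  SELECT DISTINCT ?mod ?title
  WHERE { ?mod a <http://purl.org/vocab/aiiso/schema#Module> .
          ?mod ?dtp1 ?title }

  # Node obtained by one Removal (R) step: the first triple pattern is dropped
  SELECT DISTINCT ?mod ?title
  WHERE { ?mod ?dtp1 ?title }
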
A screenshot of the implemented tool is shown in Fig. 2. Basically, SQUIRE allows the user to (1) refer to a source RDF dataset, either as an RDF file or as the URL of a SPARQL endpoint; (2) write down the query \(Q_{o}\) w.r.t \(D_{s}\); and (3) specify the target RDF dataset. Once the user clicks the Recommend button, SQUIRE executes the method described above and returns a list of scored recommended queries, sorted from highest to lowest. Another distinctive characteristic of SQUIRE is that the recommended queries are not only expressed in terms of the target dataset, but also guaranteed to be satisfiable (i.e. the result set is not empty), and can therefore be used directly to access and retrieve data from the target dataset.
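
As a rough illustration of the ranking, the overall score of a recommendation can be thought of as a weighted linear combination of the four criteria above (using obvious abbreviations of their names), where the distance-based criteria (c) and (d) are first converted into similarities in \([0,1]\); the weights \(w_{1},\dots,w_{4}\) and this normalization are assumptions of the sketch, since only the use of a linear combination is stated above:

\[ score(Q_{r_{i}}) = w_{1}\,RTS(Q_{o},Q_{r_{i}}) + w_{2}\,QRS(Q_{o},Q_{r_{i}}) + w_{3}\,QRD(Q_{o},Q_{r_{i}}) + w_{4}\,QSD(Q_{o},Q_{r_{i}}), \qquad \sum_{j=1}^{4} w_{j} = 1. \]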

4 Conclusion and Discussion

SQUIRE, as an approach and a tool, enables SPARQL query recommendations by reformulating a user query that is satisfiable w.r.t a source RDF dataset \(D_{s}\) into others that are satisfiable w.r.t a target (and unmapped) RDF dataset \(D_{t}\). One of the advantages of SQUIRE is that it not only helps in learning the data model and content of a dataset, which usually requires a considerable initial effort, but also enables its straightforward use without prior knowledge of the dataset. However, the problem is not fully solved. One of the aspects we plan to investigate is the case where the reformulation is based on other types of operations (e.g. adding a triple pattern or, more generally, replacing a graph pattern with another one). Moreover, we want to extend this work to cover, apart from SELECT queries (which are the main focus here), other types of queries such as DESCRIBE, CONSTRUCT and ASK. Finally, we believe that the outcomes of research on SPARQL query profiling can be combined with ours to improve the corresponding approaches.