Automated conversion from natural language query to SPARQL query

Abstract

Domain ontologies are widely used as background knowledge bases. However, end users of ontology-based question answering (QA) systems are often unaware of the major concepts of the ontology or the structure of its schema, so an efficient method for bridging this gap is essential. The critical issue for ontology-based QA systems is therefore how to generate a SPARQL query from a user’s natural language query (NLQ). We propose a method that generates SPARQL queries from Korean natural language queries. An input query is split into a set of tokens, and each token is mapped to resources in the ontology. A graph generation process then creates multiple “query graphs” by arranging the resources and identifying the relationships between them. To identify these relationships, we apply a path search algorithm based on the structure of the domain ontology schema. We score each query graph by the degree to which it reflects the general user’s intent, and convert the best-rated query graph into a SPARQL query. We implemented a prototype system to evaluate the proposed method on a music domain ontology and conclude that our query conversion process can convert Korean natural language queries into semantically equivalent SPARQL queries. We anticipate that, after appropriate modification, the process can be applied to other languages.
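
As a rough illustration of the pipeline outlined above (token mapping, path search over the schema, SPARQL generation), the following minimal Python sketch may help. It is not the authors' implementation. The `ex:` schema, the lexicon, and all function names are hypothetical; plain breadth-first search stands in for the path search algorithm, which the abstract does not specify; English tokens replace Korean morphological analysis; and the query graph scoring step is omitted.

```python
from collections import deque

# Hypothetical toy schema as (subject class, property, object class) triples.
SCHEMA = [
    ("ex:Artist", "ex:made", "ex:Track"),
    ("ex:Album", "ex:hasTrack", "ex:Track"),
]

# Hypothetical lexicon mapping query tokens to ontology classes.
LEXICON = {"artist": "ex:Artist", "album": "ex:Album", "track": "ex:Track"}


def map_tokens(query):
    """Split the query on whitespace and keep the tokens found in the lexicon."""
    return [LEXICON[t] for t in query.lower().split() if t in LEXICON]


def find_path(start, goal):
    """Breadth-first search over the schema, traversing edges in both directions.

    Returns the connecting schema triples (in schema direction), or None.
    """
    if start == goal:
        return []
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        for s, p, o in SCHEMA:
            if node == s:
                nxt = o
            elif node == o:
                nxt = s
            else:
                continue
            if nxt in seen:
                continue
            new_path = path + [(s, p, o)]
            if nxt == goal:
                return new_path
            seen.add(nxt)
            queue.append((nxt, new_path))
    return None


def var(cls):
    """Derive a SPARQL variable name from a class name, e.g. ex:Track -> ?track."""
    return "?" + cls.split(":")[1].lower()


def to_sparql(classes):
    """Link consecutive classes via schema paths and emit a SELECT query.

    PREFIX declarations are omitted for brevity.
    """
    if not classes:
        raise ValueError("no ontology resources matched the query")
    patterns = []
    for cls in classes:
        line = f"{var(cls)} a {cls} ."
        if line not in patterns:
            patterns.append(line)
    for a, b in zip(classes, classes[1:]):
        path = find_path(a, b)
        if path is None:
            raise ValueError(f"no schema path between {a} and {b}")
        for s, p, o in path:
            line = f"{var(s)} {p} {var(o)} ."
            if line not in patterns:
                patterns.append(line)
    body = "\n  ".join(patterns)
    return f"SELECT {var(classes[0])} WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(to_sparql(map_tokens("artist of album")))
```

In this toy setting the two mapped classes are not directly related, so the search connects them through the intermediate `ex:Track` class. A fuller system would generate several candidate graphs and rank them before conversion.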

Notes

  1. http://lucene.apache.org/core/

  2. https://jena.apache.org/documentation/query/

  3. http://musicontology.com/

Acknowledgements

This research was supported by the project “Development of NLQ-KBQA Model for KB Query” of SK Telecom Co., Ltd.

Author information

Corresponding author

Correspondence to Wooju Kim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jung, H., Kim, W. Automated conversion from natural language query to SPARQL query. J Intell Inf Syst 55, 501–520 (2020). https://doi.org/10.1007/s10844-019-00589-2

Keywords

  • SPARQL generation
  • Domain ontology
  • Natural language query
  • Korean language