xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph

Usta, Arif; Karakayali, Akifhan; Ulusoy, Özgür

doi:10.1007/s00778-023-00809-w

Regular Paper
Published: 23 August 2023

Volume 33, pages 301–321 (2024)

The VLDB Journal

Abstract

Recently, researchers have proposed numerous studies that attack the natural language interfaces to databases (NLIDB) problem with either a conventional pipeline-based or an end-to-end deep-learning-based solution. Although each approach has its own advantages and drawbacks, both exhibit a black-box nature, which makes it difficult for potential users to comprehend the rationale behind the decisions the system makes in producing the translated SQL. Given that NLIDB targets users with little to no technical background, interpretable and explainable solutions are crucial, yet recent studies have overlooked them. To this end, we propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. We also evaluate xDBTagger quantitatively on three real-world relational databases. The evaluation results indicate that, in addition to being lightweight, fast, and fully explainable, xDBTagger is competitive in translation accuracy with both pipeline-based and end-to-end deep learning approaches.
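The abstract does not detail the pipeline itself. As a rough illustration only of how keyword mappings and a schema graph could combine to produce SQL together with a textual explanation, the following minimal sketch may help. It is not the authors' implementation: the toy schema, the tuple format of the mappings, and the names `SCHEMA_GRAPH`, `join_path`, and `translate` are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy schema graph (an assumption, not taken from the paper):
# tables are nodes, foreign-key links are edges labeled with a join condition.
SCHEMA_GRAPH = {
    "author": {"writes": "author.aid = writes.aid"},
    "writes": {
        "author": "author.aid = writes.aid",
        "publication": "writes.pid = publication.pid",
    },
    "publication": {"writes": "writes.pid = publication.pid"},
}


def join_path(graph, start, goal):
    """Breadth-first search for a shortest chain of tables from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def translate(mappings, graph):
    """Return (sql, explanation) for a list of keyword mappings.

    Each mapping is a hypothetical (keyword, role, table, column, value)
    tuple, where role is "select" or "filter" and value is None for "select".
    Values are inlined naively; a real system would use query parameters.
    """
    explanation, select_cols, filters, tables = [], [], [], []
    for keyword, role, table, column, value in mappings:
        if table not in tables:
            tables.append(table)
        if role == "select":
            select_cols.append(f"{table}.{column}")
            explanation.append(f"'{keyword}' maps to output column {table}.{column}")
        else:
            filters.append(f"{table}.{column} = '{value}'")
            explanation.append(f"'{keyword}' maps to a filter on {table}.{column}")

    # Connect every mapped table to the first one through the schema graph.
    from_tables, joins = [tables[0]], []
    for target in tables[1:]:
        path = join_path(graph, tables[0], target)
        if path is None:
            raise ValueError(f"no join path from {tables[0]} to {target}")
        for left, right in zip(path, path[1:]):
            if right not in from_tables:
                from_tables.append(right)
                joins.append(graph[left][right])
                explanation.append(f"join {left} with {right} on {graph[left][right]}")

    sql = f"SELECT {', '.join(select_cols)} FROM {', '.join(from_tables)}"
    where = joins + filters
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, explanation


# Usage: a query such as "papers by John Smith" with two assumed mappings.
sql, steps = translate(
    [
        ("papers", "select", "publication", "title", None),
        ("John Smith", "filter", "author", "name", "John Smith"),
    ],
    SCHEMA_GRAPH,
)
print(sql)
for step in steps:
    print("-", step)
```

Recording one explanation line per mapping and per join is the point of the sketch: every clause in the generated SQL can be traced back to a keyword or to an edge in the schema graph, which is the kind of traceability the abstract argues for.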

Notes

Available at https://github.com/arifusta/DBTagger

Acknowledgements

This research is supported by The Scientific and Technological Research Council of Türkiye (TÜBİTAK) under grant no. 118E724.

Author information

Authors and Affiliations

University of Waterloo, Waterloo, ON, Canada
Arif Usta
The Central Bank of the Republic of Türkiye, Ankara, Turkey
Akifhan Karakayali
Bilkent University, Ankara, Turkey
Özgür Ulusoy

Corresponding author

Correspondence to Arif Usta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Usta, A., Karakayali, A. & Ulusoy, Ö. xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph. The VLDB Journal 33, 301–321 (2024). https://doi.org/10.1007/s00778-023-00809-w

Received: 10 October 2022
Revised: 26 April 2023
Accepted: 31 July 2023
Published: 23 August 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00778-023-00809-w
