Abstract
Accessing data stored in relational databases requires an understanding of the database schema and, above all, of a query language such as SQL, which, while powerful, is difficult to master. Recent research has therefore sought to build systems that make this task easier, in particular Text-to-SQL models that map a question in Natural Language (NL) to the corresponding SQL query. In this paper, we present COMBINE, a pipeline for SQL generation from NL that combines two existing models, RAT-SQL (we used RAT-SQL v3+BERT; paper: arxiv.org/abs/1911.04942) and BRIDGE (we used BRIDGE v1+BERT; paper: aclweb.org/anthology/2020.findings-emnlp.438/), both of which build on recent advances in Deep Learning (DL) for Natural Language Processing (NLP). We evaluate our model on the Spider challenge using the Exact Matching Accuracy (EMA) and Execution Accuracy (EA) metrics. Our experiments show that COMBINE outperforms both of its constituent models on the same challenge and, at the time of writing, achieves state-of-the-art EA (70%) and competitive EMA (71.4%) on the Spider development set.
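To make the Execution Accuracy metric concrete, the following is a simplified sketch, not the official Spider evaluation script and not the authors' code. It runs each predicted query and its gold query against the same SQLite database and counts a hit when the results agree. The function names, the toy `singer` table, and the rule for deciding whether row order matters (assumed here to be "the gold query contains ORDER BY") are illustrative assumptions. Exact Matching, which in Spider compares the decomposed SQL clauses rather than raw strings, is not reproduced here.

```python
import sqlite3
from collections import Counter


def execute(conn: sqlite3.Connection, sql: str):
    """Run a query and return its rows, or None if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def results_match(pred_rows, gold_rows, ordered: bool) -> bool:
    """Compare two result sets; row order matters only when `ordered` is True."""
    if pred_rows is None or gold_rows is None:
        return False
    if ordered:
        return pred_rows == gold_rows
    # Assumption: unordered results are compared as multisets of rows.
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, gold) SQL pairs whose execution results agree."""
    if not pairs:
        return 0.0
    hits = 0
    for pred_sql, gold_sql in pairs:
        # Crude heuristic (assumption): order matters if the gold query sorts.
        ordered = "order by" in gold_sql.lower()
        gold_rows = execute(conn, gold_sql)
        pred_rows = execute(conn, pred_sql)
        hits += int(results_match(pred_rows, gold_rows, ordered))
    return hits / len(pairs)


# Toy database and (predicted, gold) pairs, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ann', 30), ('Bob', 25), ('Cid', 41);
""")
pairs = [
    # Different SQL, same result rows: counts as a hit.
    ("SELECT name FROM singer WHERE age > 28", "SELECT name FROM singer WHERE age >= 29"),
    ("SELECT count(*) FROM singer", "SELECT count(name) FROM singer"),
    # Same rows, wrong order under ORDER BY: a miss.
    ("SELECT name FROM singer ORDER BY age", "SELECT name FROM singer ORDER BY age DESC"),
    # Predicted query fails to execute (unknown column): a miss.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
print(execution_accuracy(conn, pairs))  # 0.5
```

Because the comparison is made on execution results, a prediction can score on EA while differing textually from the gold query, as in the first two pairs; this is one reason EA and EMA can diverge for the same model.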
References
Wang, B., Shin, R., Liu, X., et al.: RAT-SQL: relation-aware schema encoding and linking for text-to-SQL parsers. arXiv preprint arXiv:1911.04942 (2019)
Lin, X.V., Socher, R., Xiong, C.: Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event, 16–20 November 2020, pp. 4870–4888. Association for Computational Linguistics (2020)
Zelle, J.M., Mooney, R.J.: Learning to parse database queries using inductive logic programming. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI) (1996)
Zettlemoyer, L.S., Collins, M.: Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In: Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI) (2005)
Clarke, J., Goldwasser, D., Chang, M.-W., Roth, D.: Driving semantic parsing from the world’s response. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL) (2010)
Liang, P., Jordan, M., Klein, D.: Learning dependency-based compositional semantics. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), Portland, Oregon, USA, pp. 590–599. Association for Computational Linguistics (2011)
Yih, W., He, X., Meek, C.: Semantic parsing for single-relation question answering. In: ACL (2014)
Matuszek, C., Herbst, E., Zettlemoyer, L., Fox, D.: Learning to parse natural language commands to a robot control system. In: Desai, J., Dudek, G., Khatib, O., Kumar, V. (eds.) Experimental Robotics. Springer Tracts in Advanced Robotics, vol. 88, pp. 403–415. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-319-00065-7_28
Graesser, A.C., Chipman, P., Haynes, B.C., Olney, A.: AutoTutor: an intelligent tutoring system with mixed-initiative dialogue. IEEE Trans. Educ. 48, 612–618 (2005)
Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K.S., Krishnaprasad, M.: SRQL: sorted relational query language. In: Proceedings of the Tenth International Conference on Scientific and Statistical Database Management (Cat. No. 98TB100243), pp. 84–95. IEEE (1998)
Xu, X., Liu, C., Song, D.: SQLNet: generating structured queries from natural language without reinforcement learning. Computing Research Repository. arXiv:1711.04436 (2017)
Yu, T., Li, Z., Zhang, Z., Zhang, R., Radev, D.: TypeSQL: knowledge-based type-aware neural text-to-SQL generation. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 588–594 (2018)
Shi, T., Tatwawadi, K., Chakrabarti, K., Mao, Y., Polozov, O., Chen, W.: IncSQL: training incremental text-to-SQL parsers with non-deterministic oracles. Computing Research Repository. arXiv:1809.05054 (2018)
Dong, L., Lapata, M.: Coarse-to-fine decoding for neural semantic parsing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 731–742 (2018)
Hwang, W., Yim, J., Park, S., Seo, M.: A comprehensive exploration on WikiSQL with table-aware word contextualization. Computing Research Repository. arXiv:1902.01069 (2019)
He, P., Mao, Y., Chakrabarti, K., Chen, W.: X-SQL: reinforce schema representation with context. Computing Research Repository. arXiv:1908.08113 (2019)
Yu, T., et al.: Spider: a large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3911–3921 (2018)
Warren, D.H.D., Pereira, F.C.N.: An efficient easily adaptable system for interpreting natural language queries. Comput. Linguist. 8(3–4), 110–122 (1982)
Androutsopoulos, I., Ritchie, G.D., Thanisch, P.: Natural language interfaces to databases - an introduction. CoRR, cmp-lg/9503016 (1995)
Popescu, A.-M., Armanasu, A., Etzioni, O., Ko, D., Yates, A.: Modern natural language interfaces to databases: composing statistical parsing with semantic tractability. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING '04, USA, p. 141-es. Association for Computational Linguistics (2004)
Simitsis, A., Koutrika, G., Ioannidis, Y.: Précis: from unstructured keywords as queries to structured databases as answers. VLDB J. Int. J. Very Large Data Bases 17(1), 117–149 (2008)
Blunschi, L., Jossen, C., Kossmann, D., Mori, M., Stockinger, K.: SODA: generating SQL for business users. Proc. VLDB Endow. 5(10), 932–943 (2012)
Bast, H., Haussmann, E.: More accurate question answering on freebase. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1431–1440. ACM (2015)
Zheng, W., Cheng, H., Zou, L., Yu, J.X., Zhao, K.: Natural language question/answering: let users talk with the knowledge graph. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 217–226. ACM (2017)
Song, D., et al.: TR discover: a natural language interface for querying and analyzing interlinked datasets. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 21–37. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_2
Ferré, S.: Sparklis: an expressive query builder for SPARQL endpoints with guidance in natural language. Semant. Web 8(3), 405–418 (2017)
Affolter, K., Stockinger, K., Bernstein, A.: A comparative survey of recent natural language interfaces for databases. VLDB J. 28(5), 793–819 (2019). https://doi.org/10.1007/s00778-019-00567-8
Dong, L., Lapata, M.: Language to logical form with neural attention. CoRR, abs/1601.01280 (2016)
Zhong, V., Xiong, C., Socher, R.: Seq2SQL: generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103 (2017)
Vaswani, A., et al.: Attention is all you need. CoRR, abs/1706.03762 (2017)
Hwang, W., Yim, J., Park, S., Seo, M.: A comprehensive exploration on WikiSQL with table-aware word contextualization. CoRR, abs/1902.01069 (2019)
Deriu, J., et al.: A methodology for creating question answering corpora using inverse data annotation (2020)
Guo, J., et al.: Towards complex text-to-SQL in cross-domain database with intermediate representation. CoRR, abs/1905.08205 (2019)
Yin, P., Neubig, G., Yih, W.T., Riedel, S.: TaBERT: pretraining for joint understanding of textual and tabular data. arXiv preprint arXiv:2005.08314 (2020)
Herzig, J., Nowak, P.K., Müller, T., Piccinno, F., Eisenschlos, J.M.: TAPAS: weakly supervised table parsing via pre-training. arXiv preprint arXiv:2004.02349 (2020)
Yu, T., Wu, C.-S., Lin, X.V., et al.: GraPPa: grammar-augmented pre-training for table semantic parsing. arXiv preprint arXiv:2009.13845 (2020)
Choi, D.H., Shin, M.C., Kim, E.G., Shin, D.R.: RYANSQL: recursively applying sketch-based slot fillings for complex text-to-SQL in cross-domain databases. CoRR, abs/2004.03125 (2020)
Rubin, O., Berant, J.: SmBoP: semiautoregressive bottom-up semantic parsing. CoRR, abs/2010.12412 (2020)
Wang, B., Shin, R., Liu, X., Polozov, O., Richardson, M.: RAT-SQL: relation-aware schema encoding and linking for text-to-SQL parsers. ArXiv, abs/1911.04942 (2019)
Zhong, V., Lewis, M., Wang, S.I., et al.: Grounded adaptation for zero-shot executable semantic parsing. arXiv preprint arXiv:2009.07396 (2020)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mellah, Y., Rhouati, A., Ettifouri, E.H., Bouchentouf, T., Belkasmi, M.G. (2021). COMBINE: A Pipeline for SQL Generation from Natural Language. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds) Advances in Computing and Data Sciences. ICACDS 2021. Communications in Computer and Information Science, vol 1441. Springer, Cham. https://doi.org/10.1007/978-3-030-88244-0_10
DOI: https://doi.org/10.1007/978-3-030-88244-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88243-3
Online ISBN: 978-3-030-88244-0
eBook Packages: Computer Science, Computer Science (R0)