
COMBINE: A Pipeline for SQL Generation from Natural Language

  • Conference paper
Advances in Computing and Data Sciences (ICACDS 2021)

Abstract

Accessing data stored in relational databases requires an understanding of the database schema and, above all, of a query language such as SQL, which, while powerful, is difficult to master. Recent research has therefore sought to ease this task, in particular through Text-to-SQL models that map a question in Natural Language (NL) to the corresponding SQL query. In this paper, we present COMBINE, a pipeline for SQL generation from NL that combines two existing models, RATSQL (we used RATSQL v3+BERT; paper: arxiv.org/abs/1911.04942) and BRIDGE (we used BRIDGE v1+BERT; paper: aclweb.org/anthology/2020.findings-emnlp.438/), both based on recent advances in Deep Learning (DL) for Natural Language Processing (NLP). We evaluate our model on the Spider challenge using the Exact Matching Accuracy (EMA) and Execution Accuracy (EA) metrics. Our experiments show that COMBINE outperforms both constituent models on the same challenge and, at the time of writing, achieves state-of-the-art EA (70%) and a competitive EMA (71.4%) on the Spider Dev Set.
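To make the difference between the two metrics concrete, here is a minimal sketch that scores predictions both ways on a toy SQLite table. It is a simplified illustration under stated assumptions, not the official Spider evaluation script: the `singer` table, the query pairs and the helper names (`exact_match`, `execution_match`, `accuracy`) are hypothetical; the exact-match check is plain normalized string equality, whereas Spider's EMA compares parsed SQL components; and the execution check ignores row order.

```python
import sqlite3
from collections import Counter


def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace and drop a trailing semicolon."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    # Simplified stand-in for EMA: normalized string equality.
    # Spider's official evaluation compares parsed SQL clauses instead,
    # so this toy check can reject queries the real metric would accept.
    return normalize_sql(pred) == normalize_sql(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # EA-style check: run both queries and compare the returned rows as
    # multisets (row order is ignored, which is a simplification).
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def accuracy(flags) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 45), (3, 'Cy', 52);
    """
)

# (predicted SQL, gold SQL) pairs for the hypothetical question
# "What are the names of singers older than 40?"
pairs = [
    ("SELECT name FROM singer WHERE age > 40;", "SELECT name FROM singer WHERE age > 40"),
    ("SELECT name FROM singer WHERE age >= 41", "SELECT name FROM singer WHERE age > 40"),
    ("SELECT name FROM singer", "SELECT name FROM singer WHERE age > 40"),
]

ema = accuracy(exact_match(p, g) for p, g in pairs)
ea = accuracy(execution_match(conn, p, g) for p, g in pairs)
print(f"EMA = {ema:.3f}, EA = {ea:.3f}")  # EMA = 0.333, EA = 0.667
```

The second prediction differs textually from the gold query but returns the same rows, so it counts toward EA but not toward EMA; this is one reason the two metrics can rank systems differently.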

Notes

  1. https://drive.google.com/uc?export=download&id=1_AckYkinAnhqmRQtGsQgUKAnTHxxX5J0.

Author information

Corresponding author

Correspondence to Youssef Mellah.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mellah, Y., Rhouati, A., Ettifouri, E.H., Bouchentouf, T., Belkasmi, M.G. (2021). COMBINE: A Pipeline for SQL Generation from Natural Language. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds) Advances in Computing and Data Sciences. ICACDS 2021. Communications in Computer and Information Science, vol 1441. Springer, Cham. https://doi.org/10.1007/978-3-030-88244-0_10

  • DOI: https://doi.org/10.1007/978-3-030-88244-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88243-3

  • Online ISBN: 978-3-030-88244-0

  • eBook Packages: Computer Science, Computer Science (R0)