Abstract
The key challenge of cross-domain context-dependent text-to-SQL generation lies in capturing the relations between natural language utterances and SQL queries across turns. One line of work addresses this challenge by capturing the overlaps among consecutively generated SQL queries. Existing models generate the SQL query for each turn sequentially and model SQL overlaps by copying tokens or segments generated in previous turns. However, they are not flexible enough to capture overlaps at different granularities, e.g., columns, filters, or even the whole query, because they neglect the intrinsic structure of SQL queries. In this paper, we employ a tree-structured intermediate representation of SQL queries, i.e., SemQL, for SQL generation and propose a novel subtree-copy mechanism to characterize the SQL overlaps. At each turn, we encode the interaction questions and the previously generated trees as context and decode the SemQL tree in a top-down fashion. Each node is either generated according to the SemQL grammar or copied from previously generated SemQL subtrees. Our model can capture overlaps at various granularities by copying nodes at different levels of the SemQL trees. We evaluate our approach on the SParC dataset, and the experimental results show that our model outperforms state-of-the-art baselines.
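To make the idea of copying at different granularities concrete, the following is a minimal sketch and not the model described in the paper. It has no encoder, decoder, or learned copy scores; it only shows, on toy trees, which subtrees of a new query could be copied from the previous turn's tree during a top-down pass. The `Node` class, the `shared_subtrees` helper, and the node labels (loosely modelled on SemQL, not its exact grammar) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Toy tree node (hypothetical): a label plus ordered children.

    Frozen with tuple children, so nodes compare and hash structurally.
    """
    label: str
    children: Tuple["Node", ...] = ()


def subtrees(root: Node) -> List[Node]:
    """Enumerate every subtree of `root` in pre-order."""
    out = [root]
    for child in root.children:
        out.extend(subtrees(child))
    return out


def shared_subtrees(prev: Node, curr: Node) -> List[Node]:
    """Walk `curr` top-down and collect the largest subtrees found in `prev`.

    When a node of `curr` matches a subtree of `prev`, the whole subtree is
    treated as one copy and the walk does not descend further.
    """
    candidates = set(subtrees(prev))

    def walk(node: Node) -> List[Node]:
        if node in candidates:
            return [node]
        found: List[Node] = []
        for child in node.children:
            found.extend(walk(child))
        return found

    return walk(curr)


# Illustrative labels only: "C:" marks a column, "T:" a table, "V:" a value.
name = Node("A", (Node("C:name"), Node("T:student")))
major = Node("A", (Node("C:major"), Node("T:student")))
age_filter = Node("Filter", (
    Node(">", (Node("A", (Node("C:age"), Node("T:student"))), Node("V:20"))),
))

# Turn 1: names of students older than 20. Turn 2: also return the major.
prev_tree = Node("Root", (Node("Select", (name,)), age_filter))
curr_tree = Node("Root", (Node("Select", (name, major)), age_filter))

copied = shared_subtrees(prev_tree, curr_tree)
print([n.label for n in copied])
```

In this toy interaction the second turn reuses a column-level subtree, a single table leaf, and the entire filter subtree, while only the new `C:major` column has to be generated. If the two trees were identical, the root itself would match and the whole query would be copied in one step.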
Acknowledgments
This work is funded by the National Natural Science Foundation of China under Grant Nos. 91746301, 62002347 and 61902380. Huawei Shen is also funded by the Beijing Academy of Artificial Intelligence (BAAI) and the K.C. Wong Education Foundation.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, R., Gao, J., Shen, H., Cheng, X. (2021). Capturing SQL Query Overlapping via Subtree Copy for Cross-Domain Context-Dependent SQL Generation. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_53
DOI: https://doi.org/10.1007/978-3-030-75765-6_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6
eBook Packages: Computer Science (R0)