Explaining Natural Language query results

Deutch, Daniel; Frost, Nave; Gilad, Amir

doi:10.1007/s00778-019-00584-7

Special Issue Paper
Published: 02 November 2019

Volume 29, pages 485–508, (2020)

The VLDB Journal

Abstract

Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are based on the provenance of tuples in the query result, and detail not only the results but also their explanations. We develop a novel method for transforming provenance information to NL by leveraging the original NL query structure. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization, with novel desiderata relevant to the NL case, and one that is based on summarization. We have implemented our solution in an end-to-end system supporting questions, answers and provenance, all expressed in NL. Our experiments, including a user study, indicate the quality of our solution and its scalability.
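
To give a rough intuition for what provenance factorization means, the following is a minimal sketch and not the authors' algorithm. It assumes the provenance of one answer is a sum of products of tuple annotations, with each product standing for one derivation. The function name `factorize`, the greedy choice of the most frequent annotation, and the annotation names in the usage line are all hypothetical, and the NL-specific desiderata mentioned in the abstract are not modelled.

```python
from collections import Counter


def product(monomial):
    """Render one derivation (a set of annotations) as a product."""
    return "*".join(sorted(monomial)) or "1"


def factorize(monomials):
    """Greedily factor a sum of products into a nested expression string.

    `monomials` is a list of sets of annotation names. Each set is one
    derivation (a product); the list as a whole is their sum.
    """
    monomials = [frozenset(m) for m in monomials]
    if not monomials:
        return "0"

    # Count in how many derivations each annotation occurs.
    counts = Counter(v for m in monomials for v in m)
    shared = [v for v, c in counts.items() if c > 1]
    if not shared:
        # Nothing is shared between derivations: emit a plain sum.
        return " + ".join(product(m) for m in monomials)

    # Illustrative heuristic: pull out the most frequent annotation,
    # breaking ties alphabetically so the output is deterministic.
    x = min(shared, key=lambda v: (-counts[v], v))
    with_x = [m - {x} for m in monomials if x in m]
    without_x = [m for m in monomials if x not in m]

    result = f"{x}*({factorize(with_x)})"
    if without_x:
        result += " + " + factorize(without_x)
    return result


# Two derivations that share the annotations `a` and `c`.
print(factorize([{"a", "p1", "c"}, {"a", "p2", "c"}]))
```

In this sketch the shared annotations `a` and `c` are each mentioned once in the factorized form instead of once per derivation, which is the kind of repetition a textual explanation would want to avoid.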

Notes

We are extremely grateful to Fei Li and H.V. Jagadish for generously sharing with us the source code of NaLIR, and providing invaluable support.

Acknowledgements

This research has been funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 804302), the Israeli Science Foundation (ISF) Grant No. 978/17, and the Google Ph.D. Fellowship. The contribution of Amir Gilad is part of a Ph.D. thesis research conducted at Tel Aviv University.

Author information

Authors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Daniel Deutch, Nave Frost & Amir Gilad

Corresponding author

Correspondence to Amir Gilad.

About this article

Cite this article

Deutch, D., Frost, N. & Gilad, A. Explaining Natural Language query results. The VLDB Journal 29, 485–508 (2020). https://doi.org/10.1007/s00778-019-00584-7

Received: 03 December 2018
Revised: 02 July 2019
Accepted: 19 October 2019
Published: 02 November 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s00778-019-00584-7
