Explaining Natural Language query results

Abstract

Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. Crucially, the answers that we present are based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information into NL by leveraging the structure of the original NL query. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one based on provenance factorization, with novel desiderata relevant to the NL case, and one based on summarization. We have implemented our solution in an end-to-end system supporting questions, answers and provenance, all expressed in NL. Our experiments, including a user study, indicate the quality of our solution and its scalability.

Notes

  1. We are extremely grateful to Fei Li and H.V. Jagadish for generously sharing with us the source code of NaLIR, and providing invaluable support.

Acknowledgements

This research has been funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 804302), the Israeli Science Foundation (ISF) Grant No. 978/17, and the Google Ph.D. Fellowship. The contribution of Amir Gilad is part of a Ph.D. thesis research conducted at Tel Aviv University.

Author information

Corresponding author

Correspondence to Amir Gilad.

About this article

Cite this article

Deutch, D., Frost, N. & Gilad, A. Explaining Natural Language query results. The VLDB Journal 29, 485–508 (2020). https://doi.org/10.1007/s00778-019-00584-7

  • DOI: https://doi.org/10.1007/s00778-019-00584-7

Keywords

  • Provenance
  • CQ
  • UCQ
  • NL
  • Natural Language