The VLDB Journal, Volume 28, Issue 1, pp 47–71

PUG: a framework and practical implementation for why and why-not provenance

  • Seokki Lee
  • Bertram Ludäscher
  • Boris Glavic
Regular Paper


Explaining why an answer is (or is not) returned by a query is important for many applications, including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and a Datalog query as input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answering the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. Our experimental evaluation demonstrates the efficiency of the approach.
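To make the idea of why and why-not questions over a query with negation concrete, the following is a minimal Python sketch. It is a toy illustration under stated assumptions, not PUG's implementation (which, per the abstract, generates a Datalog program). The rule `only2hop`, the relation `TRAIN`, its data, and the functions `goals` and `explain` are all hypothetical names chosen for this example. The sketch enumerates every binding of the existential variable over the active domain and records which goals of the rule succeed or fail.

```python
# Hypothetical edge relation; the name and the data are illustrative assumptions.
TRAIN = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}


def active_domain(relation):
    """All constants that occur in the relation, in sorted order."""
    return sorted({value for row in relation for value in row})


def goals(x, y, z, train):
    """Evaluate each goal of the (assumed) rule
    only2hop(X,Y) :- train(X,Z), train(Z,Y), not train(X,Y)
    for one binding of X, Y, Z. Returns (goal text, truth value) pairs."""
    return [
        (f"train({x},{z})", (x, z) in train),
        (f"train({z},{y})", (z, y) in train),
        (f"not train({x},{y})", (x, y) not in train),
    ]


def explain(x, y, train):
    """Return a 'why' explanation (the successful rule derivations) if
    only2hop(x, y) holds, otherwise a 'why-not' explanation (every failed
    derivation together with the goals that made it fail)."""
    derivations = []
    for z in active_domain(train):
        evaluated = goals(x, y, z, train)
        derivations.append({
            "Z": z,
            "success": all(ok for _, ok in evaluated),
            "failed_goals": [goal for goal, ok in evaluated if not ok],
        })
    successes = [d for d in derivations if d["success"]]
    if successes:
        return {"question": "why", "derivations": successes}
    failures = [d for d in derivations if not d["success"]]
    return {"question": "why-not", "derivations": failures}


if __name__ == "__main__":
    print(explain("b", "d", TRAIN))  # answer exists: successful derivation(s)
    print(explain("a", "c", TRAIN))  # answer missing: failed derivations
```

In this toy setting, a missing answer is explained by all failed derivations at once, which is why the why-not case has to consider every binding over the active domain rather than stopping at the first failure.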


Keywords: Datalog, Provenance, Missing answers, Semirings



This work was supported by NSF Awards OAC-{1640864, 1541450} and SMA-1637155. Opinions and findings expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Illinois Institute of Technology, Chicago, USA
  2. University of Illinois, Urbana-Champaign (UIUC), Champaign, USA
