Using SQL for Efficient Generation and Querying of Provenance Information

  • Boris Glavic
  • Renée J. Miller
  • Gustavo Alonso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8000)

Abstract

In applications such as data warehousing or data exchange, the ability to efficiently generate and query provenance information is crucial to understand the origin of data. In this chapter, we review some of the main contributions of Perm, a DBMS that generates different types of provenance information for complex SQL queries (including nested and correlated subqueries and aggregation). The two key ideas behind Perm are representing data and its provenance together in a single relation and relying on query rewrites to generate this representation. Through this, Perm supports fully integrated, on-demand provenance generation and querying using SQL. Since Perm rewrites a query requesting provenance into a regular SQL query and generates easily optimizable SQL code, its performance greatly benefits from the query optimization techniques provided by the underlying DBMS.
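
The two ideas named above, keeping data and its provenance in one relation and producing that relation by rewriting the query into ordinary SQL, can be illustrated with a small hand-written sketch. This is not Perm's rewrite algorithm or its syntax: the tables `shop` and `sales`, the `prov_`-prefixed column names, and the helper functions are all assumptions made up for this example, and SQLite stands in for the underlying DBMS. The rewritten query joins each aggregate result back to the input tuples that contributed to it, so the result row is repeated once per contributing tuple with the provenance attributes appended.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE shop(name TEXT, city TEXT);
        CREATE TABLE sales(shop_name TEXT, amount INTEGER);
        INSERT INTO shop VALUES ('Merdies', 'Zurich'), ('Joba', 'Toronto');
        INSERT INTO sales VALUES ('Merdies', 10), ('Merdies', 25), ('Joba', 7);
        """
    )
    return conn


# The query whose provenance we want: total sales per shop.
ORIGINAL = """
    SELECT s.name AS name, SUM(a.amount) AS total
    FROM shop s JOIN sales a ON s.name = a.shop_name
    GROUP BY s.name
"""

# Hand-written "rewritten" query: a regular SQL query that returns each
# original result row together with the input tuples it was derived from.
# The prov_* naming is an assumption for illustration only.
REWRITTEN = f"""
    SELECT q.name, q.total,
           s.name      AS prov_shop_name,
           s.city      AS prov_shop_city,
           a.shop_name AS prov_sales_shop_name,
           a.amount    AS prov_sales_amount
    FROM ({ORIGINAL}) q
    JOIN shop s  ON s.name = q.name
    JOIN sales a ON a.shop_name = s.name
"""


def provenance_rows(conn):
    """Return result rows extended with their provenance attributes."""
    sql = f"SELECT * FROM ({REWRITTEN}) ORDER BY name, prov_sales_amount"
    return conn.execute(sql).fetchall()


def contributing_sales(conn, min_total):
    """Query the provenance with plain SQL: which sales amounts
    contributed to result rows whose total exceeds min_total?"""
    sql = f"""
        SELECT DISTINCT prov_sales_amount
        FROM ({REWRITTEN})
        WHERE total > ?
        ORDER BY prov_sales_amount
    """
    return conn.execute(sql, (min_total,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in provenance_rows(conn):
        print(row)
    print(contributing_sales(conn, 20))
```

Because the rewritten query is itself plain SQL, it can be nested inside further queries (as `contributing_sales` does) and is planned by the database's regular optimizer, which is the property the abstract attributes to the rewrite-based approach.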

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Boris Glavic (1)
  • Renée J. Miller (2)
  • Gustavo Alonso (3)
  1. Illinois Institute of Technology, USA
  2. University of Toronto, Canada
  3. ETH Zurich, Switzerland
