Provenance as Dependency Analysis

  • James Cheney
  • Amal Ahmed
  • Umut A. Acar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4797)


Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.
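
As a rough illustration of the dynamic side of this idea, the sketch below labels each input value and propagates sets of labels through a tiny query, so that each part of the output records which parts of the input it may depend on. The names `Tagged`, `add`, and `select_where`, and the toy table, are hypothetical and are not taken from the paper, whose formal definitions are more general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value paired with the set of input labels it may depend on."""
    value: object
    deps: frozenset = frozenset()


def add(x: Tagged, y: Tagged) -> Tagged:
    # The sum may depend on everything either operand depends on.
    return Tagged(x.value + y.value, x.deps | y.deps)


def select_where(rows, pred_field, pred, out_field):
    """Return out_field of every row whose pred_field satisfies pred.

    Each result also inherits the labels of the tested field, because
    changing that field could change whether the row is selected at all.
    """
    out = []
    for row in rows:
        test = row[pred_field]
        if pred(test.value):
            cell = row[out_field]
            out.append(Tagged(cell.value, cell.deps | test.deps))
    return out


# Toy input table (an assumption for illustration): each cell has its own label.
rows = [
    {"a": Tagged(1, frozenset({"r1.a"})), "b": Tagged(10, frozenset({"r1.b"}))},
    {"a": Tagged(5, frozenset({"r2.a"})), "b": Tagged(20, frozenset({"r2.b"}))},
]

result = select_where(rows, "a", lambda v: v > 2, "b")
total = add(rows[0]["b"], rows[1]["b"])
```

The sketch annotates only individual elements. It does not record that the shape of the result collection also depends on rows whose test failed, so it should be read as a simplification rather than a faithful dependency-tracking semantics.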


Keywords (added by machine, not by the authors): Dependency Analysis, Collection Type, Typing Judgment, Data Provenance, Semantic Characterization





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • James Cheney (University of Edinburgh)
  • Amal Ahmed (Toyota Technological Institute at Chicago)
  • Umut A. Acar (Toyota Technological Institute at Chicago)
