BNCOD 2013: Big Data, pp. 7-12

The Providence of Provenance

  • Peter Buneman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7968)

Abstract

For many years and under various names, provenance has been modelled, theorised about, standardised and implemented in various ways; it has become part of mainstream database research. Moreover, the topic has now infected nearly every branch of computer science: provenance is a problem for everyone. But what exactly is the problem? And has the copious research had any real effect on how we use databases or, more generally, how we use computers?

This is a brief attempt to summarise the research on provenance and what practical impact it has had. Although much of the research has yet to come to market, there is an increasing interest in the topic from industry; moreover, it has had a surprising impact in tangential areas such as data integration and data citation. However, we are still lacking basic tools to deal with provenance and we need a culture shift if ever we are to make full use of the technology that has already been developed.

Keywords

Data Integration; Data Citation; Database Query; Data Provenance; Provenance Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peter Buneman, School of Informatics, University of Edinburgh, UK

Personalised recommendations