A Python Library for Provenance Recording and Querying

  • Carsten Bochner
  • Roland Gude
  • Andreas Schreiber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5272)

Abstract

In many application domains the provenance of data plays an important role. It is often required to store detailed information about the underlying processes that led to the data (e.g., results of numerical simulations), either for documentation or for checking the process for compliance with applicable regulations. Especially in science and engineering, more and more applications are being developed in Python, which is used either for development of the whole application or as a glue language for coordinating codes written in other programming languages. To easily integrate provenance recording into applications developed in Python, a provenance client library with a suitable Python API is useful. In this paper we present such a Python client library for recording and querying provenance information. We show an exemplary application, explain the overall architecture of the library, and give some details on the technologies used for the implementation.

Keywords

Application Developer, Globus Toolkit, Provenance Information, SOAP Message, Object-Oriented Programming Language
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Carsten Bochner (1)
  • Roland Gude (1)
  • Andreas Schreiber (1)
  1. Simulation and Software Technology, German Aerospace Center, Cologne, Germany
