
Recording and Reasoning over Data Provenance in Web and Grid Services

  • Conference paper
On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE (OTM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2888)

Abstract

Large-scale, dynamic and open environments such as the Grid and Web Services build upon existing computing infrastructures to supply dependable and consistent large-scale computational systems. The business and scientific communities have adopted this kind of architecture to exploit extensive and diverse computing resources for complex data processing tasks. In such systems, results are often derived by composing multiple, geographically distributed, heterogeneous services as specified by intricate workflows. This leads to the undesirable situation where the results are known, but the means by which they were achieved is not. In both scientific experiments and business transactions, lineage and dataset derivation are of paramount importance, since without them information is potentially worthless. We address data provenance, the description of the origin of a piece of data, in these environments, setting out its requirements, uses and implementation difficulties. We propose infrastructure-level support for recording provenance in service-oriented architectures such as the Grid and Web Services. We also offer services to view and retrieve provenance, and we provide a mechanism by which provenance is used to determine whether previously computed results are still up to date.
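The abstract names three capabilities: recording provenance at the infrastructure level, retrieving it, and using it to decide whether a previously computed result is still up to date. The Python sketch below shows one way these could fit together. It is not the authors' implementation. The class and method names (`InvocationRecord`, `ProvenanceStore`, `record`, `lineage`, `is_up_to_date`), the in-memory store, the logical clock, the assumption that derivations are acyclic, and the staleness rule (a result is stale if a service in its derivation has since been upgraded, or if one of its inputs was re-derived after it was produced) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class InvocationRecord:
    """Hypothetical provenance record: one service call, what it consumed and produced."""
    output_id: str
    service: str
    service_version: int
    input_ids: Tuple[str, ...]
    timestamp: int  # logical clock tick at which the output was produced


class ProvenanceStore:
    """Hypothetical in-memory provenance store (a sketch, not the paper's system)."""

    def __init__(self) -> None:
        self._by_output: Dict[str, InvocationRecord] = {}
        self._service_versions: Dict[str, int] = {}
        self._clock = 0

    def register_service(self, service: str, version: int = 1) -> None:
        """Register a service, or record that it has been upgraded to a new version."""
        self._service_versions[service] = version

    def record(self, service: str, input_ids: Iterable[str], output_id: str) -> InvocationRecord:
        """Recording: store which service produced output_id, from which inputs, and when."""
        self._clock += 1
        rec = InvocationRecord(output_id, service, self._service_versions[service],
                               tuple(input_ids), self._clock)
        self._by_output[output_id] = rec  # a re-derivation replaces the older record
        return rec

    def lineage(self, data_id: str) -> List[InvocationRecord]:
        """Retrieval: every record in the derivation of data_id, depth-first."""
        rec = self._by_output.get(data_id)
        if rec is None:
            return []  # raw input data: nothing was recorded for it
        result = [rec]
        for inp in rec.input_ids:
            result.extend(self.lineage(inp))
        return result

    def is_up_to_date(self, data_id: str) -> bool:
        """Reasoning: is a previously computed result still current?"""
        rec = self._by_output.get(data_id)
        if rec is None:
            return True  # raw input data has no derivation that can go stale
        if self._service_versions[rec.service] != rec.service_version:
            return False  # the service was upgraded after this result was computed
        for inp in rec.input_ids:
            inp_rec = self._by_output.get(inp)
            if inp_rec is not None and inp_rec.timestamp > rec.timestamp:
                return False  # an input was re-derived after this result was produced
            if not self.is_up_to_date(inp):
                return False  # a stale input makes this result stale too
        return True


if __name__ == "__main__":
    store = ProvenanceStore()
    store.register_service("align")
    store.register_service("build_tree")
    store.record("align", ["seqs.fasta"], "alignment-1")
    store.record("build_tree", ["alignment-1"], "tree-1")
    print([r.service for r in store.lineage("tree-1")])  # ['build_tree', 'align']
    print(store.is_up_to_date("tree-1"))  # True
    store.register_service("align", version=2)  # the alignment service is upgraded
    print(store.is_up_to_date("tree-1"))  # False
```

Keying records by output identifier turns retrieval into a walk over the derivation graph, and the up-to-date check becomes a second walk over the same graph. A real Grid or Web Services deployment would also need persistent storage and data identifiers shared across services, which this sketch does not attempt.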


References

  1. Buneman, P., Deutsch, A., Tan, W.-C.: A deterministic model for semistructured data. In: Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats (1998)

  2. Buneman, P., Khanna, S., Tan, W.-C.: Why and Where: A Characterization of Data Provenance. In: International Conference on Database Theory, ICDT (2001)

  3. Buneman, P., Khanna, S., Tan, W.-C.: Computing provenance and annotations for views. Published at [18] (October 2002)

  4. Curbera, F., Goland, Y., Klein, J., Leymann, F., Roller, D., Thatte, S., Weerawarana, S.: Business Process Execution Language for Web Services (BPEL4WS) (2002), http://www.ibm.com/developerworks/library/ws-bpel/

  5. de Roure, D., Jennings, N.R., Shadbolt, N.: The semantic grid: A future e-science infrastructure. International Journal of Concurrency and Computation: Practice and Experience (2003)

  6. Foster, I., Voeckler, J., Wilde, M., Zhao, Y.: Chimera: A virtual data system for representing, querying and automating data derivation. In: Proceedings of the 14th Conference on Scientific and Statistical Database Management, Edinburgh, Scotland (July 2002)

  7. Foster, I., Kesselman, C., Nick, J.M., Tuecke, S.: The Physiology of the Grid – An Open Grid Services Architecture for Distributed Systems Integration. Technical report, Argonne National Laboratory (2002)

  8. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications (2001)

  9. Foster, I., Voeckler, J., Wilde, M., Zhao, Y.: The virtual data grid: A new model and architecture for data-intensive collaboration. Published at [18] (October 2002)

  10. Frew, J., Bose, R.: Lineage issues for scientific data and information. Published at [18] (October 2002)

  11. Grid Computing Environments Working Group at the Global Grid Forum (November 2002), http://www.computingportals.org/

  12. Goble, C.: Position statement: Musings on provenance, workflow and (semantic web) annotations for bioinformatics. Published at [18] (October 2002)

  13. Greenwood, M., Goble, C., Stevens, R., Zhao, J., Addis, M., Marvin, D., Moreau, L., Oinn, T.: Provenance of e-science experiments – experience from bioinformatics. In: Proceedings of the UK OST e-Science Second All Hands Meeting 2003 (AHM 2003), p. 4, Nottingham, UK (September 2003)

  14. Leymann, F.: Web Services Flow Language (WSFL). Technical report, IBM (May 2001)

  15. Luck, M., McBurney, P., Preist, C.: Agent Technology: Enabling Next Generation Computing. AgentLink (2003)

  16. Moreau, L., Miles, S., Goble, C., Greenwood, M., Dialani, V., Addis, M., Alpdemir, N., Cawley, R., De Roure, D., Ferris, J., Gaizauskas, R., Glover, K., Greenhalgh, C., Li, P., Liu, X., Lord, P., Luck, M., Marvin, D., Oinn, T., Paton, N., Pettifer, S., Radenkovic, M.V., Roberts, A., Robinson, A., Rodden, T., Senger, M., Sharman, N., Stevens, R., Warboys, B., Wipat, A., Wroe, C.: On the Use of Agents in a BioInformatics Grid. In: Lee, S., Sekiguchi, S., Matsuoka, S., Sato, M. (eds.) Proceedings of the Third IEEE/ACM CCGRID 2003 Workshop on Agent Based Cluster and Grid Computing, Tokyo, Japan, May 2003, pp. 653–661 (2003)

  17. Pearson, D.: Data requirements for the grid – scoping study report. Status: Draft (February 2002)

  18. Data Provenance/Derivation Workshop (October 2002), http://people.cs.uchicago.edu/yongzh/position_papers.html

  19. Saltz, J.: Data provenance. Published at [18] (October 2002)

  20. Tan, H.K., Moreau, L.: Extending Execution Tracing for Mobile Code Security. In: Fischer, K., Hutter, D. (eds.) Second International Workshop on Security of Mobile MultiAgent Systems (SEMAS 2002), DFKI Research Report, RR-02-03, pp. 51–59, Bologna, Italy, DFKI Saarbrücken (June 2002)

  21. Thatte, S.: XLANG: Web Services for Business Process Design (2001)


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szomszor, M., Moreau, L. (2003). Recording and Reasoning over Data Provenance in Web and Grid Services. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39964-3_39


  • DOI: https://doi.org/10.1007/978-3-540-39964-3_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20498-5

  • Online ISBN: 978-3-540-39964-3

  • eBook Packages: Springer Book Archive
