Abstract
Large-scale, dynamic and open environments such as the Grid and Web Services build upon existing computing infrastructures to supply dependable and consistent large-scale computational systems. This kind of architecture has been adopted by the business and scientific communities, allowing them to exploit extensive and diverse computing resources to perform complex data processing tasks. In such systems, results are often derived by composing multiple, geographically distributed, heterogeneous services as specified by intricate workflow management. This leads to the undesirable situation where the results are known, but the means by which they were achieved are not. In both scientific experiments and business transactions, the notion of lineage and dataset derivation is of paramount importance, since without it information is potentially worthless. We address the issue of data provenance, the description of the origin of a piece of data, in these environments, showing the requirements, uses and implementation difficulties. We propose infrastructure-level support for a provenance recording capability for service-oriented architectures such as the Grid and Web Services. We also offer services to view and retrieve provenance, and we provide a mechanism by which provenance is used to determine whether previously computed results are still up to date.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szomszor, M., Moreau, L. (2003). Recording and Reasoning over Data Provenance in Web and Grid Services. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39964-3_39
DOI: https://doi.org/10.1007/978-3-540-39964-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20498-5
Online ISBN: 978-3-540-39964-3
eBook Packages: Springer Book Archive