Journal of Grid Computing, Volume 5, Issue 1, pp 1–25

The Requirements of Using Provenance in e-Science Experiments

  • Simon Miles
  • Paul Groth
  • Miguel Branco
  • Luc Moreau


In e-Science experiments, it is vital to record the experimental process for later use, such as interpreting results, verifying that the correct process took place, or tracing where data came from. The process that led to some data is called the provenance of that data, and a provenance architecture is the software architecture for a system that provides the functionality needed to record, store and use process documentation to determine the provenance of data items. However, there has been little principled analysis of what is actually required of a provenance architecture, so it is impossible to determine the functionality such an architecture would ideally support. In this paper, we present use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyse the use cases to determine the technical requirements of a generic, technology- and application-independent architecture. We propose an architecture that meets these requirements, analyse its features compared with other approaches, and evaluate a preliminary implementation by attempting to realise two of the use cases.

Key words

e-Science, Grid, provenance, requirements, use case, workflow



Candidate Gene Experiment


Intron Compressibility Experiment


Provenance-Aware Service-Oriented Architecture


Particle Detection Experiment


Protein Identification Experiment


Second Harmonic Generation Experiment


Service-Oriented Architecture


Service Reliability Experiment


Security Testing Experiment


Virtual Data System





Copyright information

© Springer Science + Business Media B.V. 2006

Authors and Affiliations

  • Simon Miles (1), corresponding author
  • Paul Groth (1)
  • Miguel Branco (1)
  • Luc Moreau (1)
  1. School of Electronics and Computer Science, University of Southampton, Southampton, UK
