Enabling Provenance on Large Scale e-Science Applications

  • Miguel Branco
  • Luc Moreau
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4145)


Large-scale e-Science experiments present unprecedented data handling requirements, with data stores reaching multiple petabytes. Complex software applications, such as those of the ATLAS High Energy Physics experiment at CERN, run at Grid computing sites distributed around the world, with scientists concurrently analysing data and producing new data products that are shared across the collaboration. In this paper, we introduce a multi-phase infrastructure to achieve data provenance for an e-Science experiment. We propose an infrastructure that integrates provenance into an existing legacy application, with strong emphasis on scalability. We also explore the relationship between provenance and metadata, introducing a model in which data provenance is made available as metadata through a separate reasoning phase.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Miguel Branco (1, 2)
  • Luc Moreau (2)

  1. CERN, European Organization for Nuclear Research, Genève
  2. University of Southampton, Southampton, United Kingdom
