Tracking Files in the Kepler Provenance Framework

  • Pierre Mouallem
  • Roselyne Barreto
  • Scott Klasky
  • Norbert Podhorszki
  • Mladen Vouk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5566)


Workflow Management Systems (WFMS), such as Kepler, are proving to be an important tool in scientific problem solving. They can automate and manage complex processes and the huge amounts of data produced by petascale simulations. Typically, the produced data need to be properly visualized and analyzed by scientists in order to achieve the desired scientific goals. Both run-time and post-run analysis may benefit from, or even require, additional meta-data in the form of provenance information. One of the challenges in this context is tracking the data files that can be produced in very large numbers during stages of the workflow, such as visualization. The Kepler provenance framework collects all or part of the raw information flowing through the workflow graph. This information then needs to be parsed further to extract the meta-data of interest, which can be done through add-on tools and algorithms. We show how to automate the tracking of specific information, such as data file locations.
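As a rough illustration of the parsing step described in the abstract, the sketch below scans raw provenance records for file paths and notes which workflow actor mentioned each one. It is not the authors' tool: the record layout (actor name plus token text), the sample values, the `PATH_RE` pattern, and the function name `extract_file_locations` are all assumptions made for this example, and the actual Kepler provenance schema may look quite different.

```python
import re
from collections import defaultdict

# Hypothetical raw provenance records as (actor name, token text) pairs.
# This layout is an assumption for illustration, not Kepler's actual schema.
RAW_TOKENS = [
    ("ProcessRun", "wrote /scratch/run42/xgc.0001.bp"),
    ("MakePlots", "input=/scratch/run42/xgc.0001.bp "
                  "output=/scratch/run42/img/plot_0001.png"),
    ("Archiver", "step 1 complete"),
]

# Matches absolute POSIX-style paths whose last component has an extension.
PATH_RE = re.compile(r"/(?:[\w.\-]+/)*[\w.\-]+\.\w+")


def extract_file_locations(records):
    """Map each file path found in the token text to the actors that mention it."""
    seen = defaultdict(list)
    for actor, value in records:
        for path in PATH_RE.findall(value):
            if actor not in seen[path]:
                seen[path].append(actor)
    return dict(seen)


if __name__ == "__main__":
    for path, actors in extract_file_locations(RAW_TOKENS).items():
        print(path, "<-", ", ".join(actors))
```

A real deployment would presumably read these records from the provenance store rather than from an in-memory list, and would need a path pattern suited to the file systems in use.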


Keywords: Data Tracking; Data Provenance; Scientific Data Management; Scientific Workflows




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Pierre Mouallem (1)
  • Roselyne Barreto (2)
  • Scott Klasky (2)
  • Norbert Podhorszki (2)
  • Mladen Vouk (1)
  1. North Carolina State University, Raleigh, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
