Abstract
Workflow Management Systems (WFMS), such as Kepler, are proving to be important tools in scientific problem solving. They can automate and manage complex processes and the huge amounts of data produced by petascale simulations. Typically, the produced data must be visualized and analyzed by scientists to achieve the desired scientific goals. Both run-time and post-run analysis may benefit from, or even require, additional meta-data in the form of provenance information. One challenge in this context is tracking the data files, which can be produced in very large numbers during stages of the workflow such as visualization. The Kepler provenance framework collects all or part of the raw information flowing through the workflow graph. This information then needs to be parsed further to extract the meta-data of interest, which can be done through add-on tools and algorithms. We show how to automate the tracking of specific information such as data file locations.
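To make the idea of parsing raw provenance for file locations concrete, the following is a minimal sketch and not the method described in the paper. It assumes a hypothetical record layout of (actor, direction, token value) and a hypothetical convention that files appear as absolute POSIX paths with an extension; neither is taken from the Kepler provenance schema. The function `index_files` and the sample records are illustrative names only.

```python
import re
from collections import defaultdict

# Hypothetical raw records: (actor, direction, token_value).
# This layout is an assumption for illustration, not the Kepler schema.
RAW_RECORDS = [
    ("Simulation", "write", "wrote /scratch/run42/out.0001.bp"),
    ("Plotter", "read", "/scratch/run42/out.0001.bp"),
    ("Plotter", "write", "image saved: /scratch/run42/img/out.0001.png"),
    ("Archiver", "read", "/scratch/run42/img/out.0001.png"),
    ("Plotter", "write", "step=1 status=ok"),  # carries no file path
]

# Assumed convention: absolute POSIX paths that end in a file extension.
PATH_RE = re.compile(r"/[\w./\-]+\.\w+")


def index_files(records):
    """Map each file path found in the raw tokens to its producers and users."""
    index = defaultdict(lambda: {"produced_by": set(), "used_by": set()})
    for actor, direction, value in records:
        for path in PATH_RE.findall(value):
            key = "produced_by" if direction == "write" else "used_by"
            index[path][key].add(actor)
    return dict(index)


if __name__ == "__main__":
    for path, roles in sorted(index_files(RAW_RECORDS).items()):
        print(path, sorted(roles["produced_by"]), sorted(roles["used_by"]))
```

A per-file index of this kind lets a later query ask which workflow step wrote a given file and which steps read it, without rescanning the raw token stream.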
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mouallem, P., Barreto, R., Klasky, S., Podhorszki, N., Vouk, M. (2009). Tracking Files in the Kepler Provenance Framework. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_21
DOI: https://doi.org/10.1007/978-3-642-02279-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)