Abstract
Data processing is often not implemented by a workflow system or an integration application. Instead, it is performed manually by humans following a more or less specified procedure. In such semi-structured processes, the collection of provenance information cannot be automated in the usual way. Collecting provenance information manually, however, is error-prone and time-consuming. We therefore propose to infer provenance information from users' file read and write accesses. The inferred provenance information is complete but has low precision. We therefore further propose introducing organizational guidelines to improve the precision of the inferred provenance information.
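The idea of inferring provenance from file accesses can be illustrated with a minimal sketch. The sketch makes several assumptions that do not come from the paper. It uses a hypothetical access log of (time, user, operation, path) records. It also applies the simplest possible rule, which is not necessarily the one the authors use: a written file is linked to every file the same user read before. The `same_task_folder` check stands in for an organizational guideline and is purely hypothetical. The sketch shows why such inference is complete but imprecise, and how a guideline can raise precision.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Access:
    time: int   # logical timestamp of the file access (hypothetical log format)
    user: str   # user who performed the access
    op: str     # "read" or "write"
    path: str   # file that was accessed


def infer_provenance(
    log: Iterable[Access],
    guideline: Optional[Callable[[str, str], bool]] = None,
) -> Dict[str, Set[str]]:
    """Map each written file to the files it may have been derived from.

    Without a guideline, every file the same user read earlier becomes a
    candidate source. No source read beforehand is missed, but many
    candidates are spurious. If a guideline(source, target) predicate is
    given, candidates it rejects are discarded.
    """
    read_so_far: Dict[str, Set[str]] = defaultdict(set)
    provenance: Dict[str, Set[str]] = defaultdict(set)
    for a in sorted(log, key=lambda a: a.time):
        if a.op == "read":
            read_so_far[a.user].add(a.path)
        elif a.op == "write":
            for src in read_so_far[a.user]:
                if src == a.path:
                    continue  # file versions are not modelled in this sketch
                if guideline is None or guideline(src, a.path):
                    provenance[a.path].add(src)
    return dict(provenance)


def precision_recall(inferred: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Compare inferred sources against the (normally unknown) true sources."""
    tp = len(inferred & truth)
    precision = tp / len(inferred) if inferred else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


def same_task_folder(src: str, dst: str) -> bool:
    # Hypothetical guideline: all files of one task live in one top-level folder.
    return src.split("/", 1)[0] == dst.split("/", 1)[0]


log = [
    Access(1, "alice", "read", "taskA/raw.csv"),
    Access(2, "alice", "read", "misc/todo.txt"),
    Access(3, "alice", "write", "taskA/clean.csv"),
    Access(4, "bob", "read", "taskB/other.csv"),
]
truth = {"taskA/raw.csv"}

naive = infer_provenance(log)["taskA/clean.csv"]
guided = infer_provenance(log, same_task_folder)["taskA/clean.csv"]
print(precision_recall(naive, truth))   # (0.5, 1.0)
print(precision_recall(guided, truth))  # (1.0, 1.0)
```

Every earlier read becomes a candidate, so recall stays at 1.0, while precision drops as users open unrelated files. A rule that narrows the candidate set, such as the folder convention here, recovers precision. In practice, though, it could drop true sources if users do not follow the guideline.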
Copyright information
© 2013 IFIP International Federation for Information Processing
About this paper
Cite this paper
Wombacher, A., Huq, M.R. (2013). Towards Automatic Capturing of Semi-structured Process Provenance. In: Cudre-Mauroux, P., Ceravolo, P., Gašević, D. (eds) Data-Driven Process Discovery and Analysis. SIMPDA 2012. Lecture Notes in Business Information Processing, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40919-6_5
DOI: https://doi.org/10.1007/978-3-642-40919-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40918-9
Online ISBN: 978-3-642-40919-6
eBook Packages: Computer Science, Computer Science (R0)