Towards Automatic Capturing of Semi-structured Process Provenance

  • Andreas Wombacher
  • Mohammad Rezwanul Huq
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 162)


Often, data processing is not implemented by a workflow system or an integration application but is performed manually by humans following a more or less specified procedure. Collecting provenance information in such semi-structured processes cannot be automated, and collecting it manually is error-prone and time-consuming. Therefore, we propose to infer provenance information from users' file read and write accesses. The derived provenance information is complete but has low precision. We therefore further propose introducing organizational guidelines to improve the precision of the inferred provenance information.
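The inference idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical access log of `(time, user, op, path)` records and treats every file a user read before writing a file as a possible source of that file. Because every real input must have been read before the output was written, this over-approximates the dependencies: complete, but with low precision. The `same_directory` predicate is a hypothetical organizational guideline ("a task's inputs and outputs share one folder") used only to show how such a rule can prune candidates. This is not the authors' implementation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable


@dataclass(frozen=True)
class Access:
    """One logged file access (hypothetical log format)."""
    time: int   # logical timestamp; smaller means earlier
    user: str
    op: str     # "read" or "write"
    path: str


def infer_provenance(
    log: Iterable[Access],
    same_scope: Callable[[str, str], bool] = lambda src, dst: True,
) -> set[tuple[str, str]]:
    """Return candidate (source, derived) provenance edges.

    Every file a user has read before writing a file is treated as a possible
    source of that file. With the default `same_scope` the result contains all
    true dependencies (assuming all accesses are logged) plus many spurious
    ones; a stricter predicate models a guideline that removes candidates.
    """
    reads: dict[str, set[str]] = {}  # user -> files read so far
    edges: set[tuple[str, str]] = set()
    for a in sorted(log, key=lambda a: a.time):
        if a.op == "read":
            reads.setdefault(a.user, set()).add(a.path)
        elif a.op == "write":
            for src in reads.get(a.user, set()):
                # Skip in-place edits (src == dst) to avoid self-loops.
                if src != a.path and same_scope(src, a.path):
                    edges.add((src, a.path))
    return edges


def same_directory(src: str, dst: str) -> bool:
    """Hypothetical guideline: inputs and outputs of one task share a folder."""
    return PurePosixPath(src).parent == PurePosixPath(dst).parent


log = [
    Access(1, "alice", "read", "/task1/raw.csv"),
    Access(2, "alice", "read", "/task2/notes.txt"),
    Access(3, "alice", "write", "/task1/clean.csv"),
]
print(sorted(infer_provenance(log)))
print(sorted(infer_provenance(log, same_directory)))
```

Without a guideline, `/task2/notes.txt` is reported as a possible source of `/task1/clean.csv` merely because Alice opened it earlier; applying the hypothetical `same_directory` guideline leaves only the `/task1/raw.csv` edge. The trade-off is that a guideline only preserves completeness if users actually follow it.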





Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Andreas Wombacher (1)
  • Mohammad Rezwanul Huq (1)
  1. University of Twente, Enschede, The Netherlands
