Issues in Automatic Provenance Collection

  • Uri Braun
  • Simson Garfinkel
  • David A. Holland
  • Kiran-Kumar Muniswamy-Reddy
  • Margo I. Seltzer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4145)


Automatic provenance collection describes systems that observe processes and data transformations, inferring, collecting, and maintaining provenance about them. Automatic collection is a powerful tool for analysis of objects and processes, providing a level of transparency and pervasiveness not found in more conventional provenance systems. Unfortunately, automatic collection is also difficult. We discuss the challenges we encountered and the issues we exposed as we developed an automatic provenance collector that runs at the operating system level.


Keywords (added by machine, not by the authors): File System; Semantic Knowledge; Virtual Node; Virtual Data; Provenance Security



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Uri Braun (1)
  • Simson Garfinkel (1)
  • David A. Holland (1)
  • Kiran-Kumar Muniswamy-Reddy (1)
  • Margo I. Seltzer (1)
  1. Harvard University, Cambridge
