Towards a Universal Data Provenance Framework Using Dynamic Instrumentation

  • Eleni Gessiou
  • Vasilis Pappas
  • Elias Athanasopoulos
  • Angelos D. Keromytis
  • Sotiris Ioannidis
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 376)


The advantages of collecting data provenance information have driven research both on extending or modifying applications and systems to provide it, and on creating architectures that are built from the ground up with provenance capabilities. In this paper, we propose a universal data provenance framework, based on dynamic instrumentation, which gathers data provenance information for real-world applications without any code modifications. Our framework simplifies the task of finding the right points to instrument, which can be cumbersome in large and complex systems. We have built a proof-of-concept implementation of the framework on top of DTrace. Moreover, we evaluated its functionality in three different scenarios: file-system operations, database transactions, and web browser HTTP requests. Based on our experience, we believe that it is possible to provide data provenance, transparently, at any layer of the software stack.
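As a rough illustration of the file-system scenario, the sketch below groups traced events by file to form a per-file provenance history. It is not the framework's implementation: the one-line-per-event text format (`timestamp pid execname operation path`), the `Event` record, and the helper names are all assumptions made for this example, standing in for whatever output an instrumentation script would actually emit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One traced file-system operation (hypothetical record layout)."""
    timestamp: int
    pid: int
    execname: str
    operation: str
    path: str


def parse_event(line):
    """Parse 'timestamp pid execname operation path' (assumed format).

    The path is the last field, so it may contain spaces.
    """
    ts, pid, execname, operation, path = line.split(maxsplit=4)
    return Event(int(ts), int(pid), execname, operation, path)


def build_provenance(lines):
    """Map each file path to its events, ordered by timestamp."""
    history = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the trace
        event = parse_event(line)
        history[event.path].append(event)
    for events in history.values():
        events.sort(key=lambda e: e.timestamp)
    return dict(history)


# Made-up sample trace, used only to exercise the functions above.
SAMPLE_TRACE = """\
100 41 vim open /tmp/report.txt
105 41 vim write /tmp/report.txt
230 77 cp read /tmp/report.txt
231 77 cp write /tmp/report.bak
"""

if __name__ == "__main__":
    provenance = build_provenance(SAMPLE_TRACE.splitlines())
    for path, events in provenance.items():
        print(path)
        for e in events:
            print(f"  t={e.timestamp} pid={e.pid} {e.execname} {e.operation}")
```

Keeping the history keyed by path makes the question "which processes touched this file, and in what order?" a single lookup. A real deployment would also need to handle renames, file descriptors, and events from other layers such as database or HTTP activity.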


System Call; Data Provenance; Provenance Information; IEEE Internet Computing; USENIX Association
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Eleni Gessiou (1)
  • Vasilis Pappas (2)
  • Elias Athanasopoulos (2)
  • Angelos D. Keromytis (2)
  • Sotiris Ioannidis (3)
  1. Computer Science & Engineering Department, Polytechnic Institute of NYU, USA
  2. Department of Computer Science, Columbia University, USA
  3. Institute of Computer Science, Foundation for Research and Technology - Hellas, Greece
