SPADE: Support for Provenance Auditing in Distributed Environments

  • Ashish Gehani
  • Dawood Tariq
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7662)

Abstract

SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance metadata. At its core is a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records. It operates as a service, peering with remote instances to enable distributed provenance queries. The provenance kernel on each host handles the buffering, filtering, and multiplexing of incoming metadata from multiple sources, including the operating system, applications, and manual curation. Provenance elements can be located locally with queries that use wildcard, fuzzy, proximity, range, and Boolean operators. Ancestor and descendant queries are transparently propagated across hosts until a terminating expression is satisfied, while distributed path queries are accelerated with provenance sketches.
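The graph-based data model and the ancestor queries described above can be illustrated with a minimal sketch. It is not SPADE's API: the `ProvenanceGraph` class, its method names, and the `max_depth` cutoff (standing in for a terminating expression) are all hypothetical. Only the vertex and edge type names are taken from the Open Provenance Model, whose edges point from an effect to its cause.

```python
from collections import deque

# Node and dependency types named in the Open Provenance Model (v1.1).
VERTEX_TYPES = {"Artifact", "Process", "Agent"}
EDGE_TYPES = {"Used", "WasGeneratedBy", "WasTriggeredBy",
              "WasDerivedFrom", "WasControlledBy"}


class ProvenanceGraph:
    """Hypothetical in-memory provenance store (illustration only)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> (vertex type, annotations)
        self.causes = {}    # effect id -> list of (edge type, cause id)

    def add_vertex(self, vid, vtype, **annotations):
        if vtype not in VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vtype}")
        self.vertices[vid] = (vtype, annotations)

    def add_edge(self, effect, cause, etype):
        # OPM edges are directed from the effect to the cause.
        if etype not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {etype}")
        if effect not in self.vertices or cause not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.causes.setdefault(effect, []).append((etype, cause))

    def ancestors(self, start, max_depth=None):
        """Breadth-first walk from effect to cause.

        max_depth is an assumed stand-in for a terminating expression:
        vertices more than max_depth edges away are not visited.
        """
        seen, order = {start}, []
        queue = deque([(start, 0)])
        while queue:
            vid, depth = queue.popleft()
            if max_depth is not None and depth >= max_depth:
                continue
            for _etype, cause in self.causes.get(vid, ()):
                if cause not in seen:
                    seen.add(cause)
                    order.append(cause)
                    queue.append((cause, depth + 1))
        return order


# Example: a process run by a user reads one file and writes another.
g = ProvenanceGraph()
g.add_vertex("in.txt", "Artifact")
g.add_vertex("out.txt", "Artifact")
g.add_vertex("sort", "Process", pid=4242)
g.add_vertex("alice", "Agent")
g.add_edge("sort", "in.txt", "Used")
g.add_edge("out.txt", "sort", "WasGeneratedBy")
g.add_edge("sort", "alice", "WasControlledBy")

print(g.ancestors("out.txt"))               # all ancestors of out.txt
print(g.ancestors("out.txt", max_depth=1))  # direct causes only
```

In a distributed setting, a walk like this would need to continue on a remote host whenever an edge crosses a host boundary; the sketch covers only the single-host case.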

Keywords

  • Snow Leopard
  • Data Provenance
  • Provenance Information
  • Provenance Event
  • Audit Record
These keywords were added by machine and not by the authors.

Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Ashish Gehani (SRI International, US)
  • Dawood Tariq (SRI International, US)
