Hiding Data and Structure in Workflow Provenance

  • Susan Davidson
  • Zhuowei Bao
  • Sudeepa Roy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7108)


In this paper we discuss the use of views to address the problem of providing useful answers to provenance queries while ensuring that privacy concerns are met. In particular, we propose a hierarchical workflow model, based on context-free graph grammars, in which fine-grained dependencies between the inputs and outputs of a module are explicitly specified. Using this model, we examine how privacy concerns surrounding data, module function, and workflow structure can be addressed.


Keywords: Data Item, Output Port, Input Port, Privacy Concern, Graph Grammar
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Susan Davidson (1)
  • Zhuowei Bao (1)
  • Sudeepa Roy (1)
  1. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
