pp 1–15

On privacy-aware eScience workflows

  • Khalid Belhajjame (corresponding author)
  • Noura Faci
  • Zakaria Maamar
  • Vanilson Burégio
  • Edvan Soares
  • Mahmoud Barhamgi


Computing-intensive experiments in modern science have become increasingly data-driven, perfectly illustrating the Big Data era. These experiments are usually specified and enacted in the form of workflows that need to manage (i.e., read, write, store, and retrieve) highly sensitive data such as persons’ medical records. In this work, we assume that the operations that constitute a workflow are 1-to-1 operations, in the sense that they produce a single output data record for each input data record. While there is an active body of research on how to protect sensitive data by, for instance, anonymizing datasets, few approaches assist scientists in identifying which of the datasets generated by a workflow need to be anonymized and in setting the anonymization degree that must be met. In this paper, we present a solution for identifying the privacy requirements of datasets used and generated by a workflow execution. We also present a technique for anonymizing workflow data given an anonymity degree.
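The abstract does not state which anonymity model the "anonymity degree" refers to. As an illustration only, the Python sketch below assumes it is the k of k-anonymity: a dataset meets degree k when every combination of quasi-identifier values it contains is shared by at least k records. The field names (`age`, `zip`, `diagnosis`), the choice of quasi-identifiers, and the helpers `is_k_anonymous` and `generalize_age` are hypothetical; age-range generalization stands in for whatever anonymization technique the paper actually uses. Note that `generalize_age` is itself a 1-to-1 operation in the abstract's sense: it emits exactly one output record per input record.

```python
from collections import Counter
from typing import Iterable


def is_k_anonymous(records: list[dict], quasi_identifiers: Iterable[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values that
    occurs in `records` is shared by at least k records (k-anonymity)."""
    qis = tuple(quasi_identifiers)
    # Count how many records fall into each quasi-identifier equivalence class.
    groups = Counter(tuple(r[q] for q in qis) for r in records)
    return all(size >= k for size in groups.values())


def generalize_age(records: list[dict], width: int) -> list[dict]:
    """Hypothetical 1-to-1 anonymization step: replace each exact age with
    an age range of the given width. One output record per input record."""
    out = []
    for r in records:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out


# Hypothetical sensitive input dataset (illustrative values only).
patients = [
    {"age": 34, "zip": "75016", "diagnosis": "flu"},
    {"age": 36, "zip": "75016", "diagnosis": "asthma"},
    {"age": 52, "zip": "69003", "diagnosis": "diabetes"},
    {"age": 58, "zip": "69003", "diagnosis": "flu"},
]
qis = ["age", "zip"]

print(is_k_anonymous(patients, qis, k=2))     # False: every class has one record
generalized = generalize_age(patients, width=10)
print(len(generalized) == len(patients))      # True: the step is 1-to-1
print(is_k_anonymous(generalized, qis, k=2))  # True: classes (30-39, 75016) and (50-59, 69003)
```

Under these assumptions, because each step is 1-to-1, every record in an intermediate dataset traces back to exactly one input record, so an intermediate dataset can expose the same individuals as the workflow input; this is one reason a check like `is_k_anonymous` may be worth applying to intermediate and output datasets, not only to inputs.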


Keywords: Privacy, e-Science, Workflow






Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2020

Authors and Affiliations

  • Khalid Belhajjame (1)
  • Noura Faci (2)
  • Zakaria Maamar (3)
  • Vanilson Burégio (4)
  • Edvan Soares (4)
  • Mahmoud Barhamgi (2)

  1. LAMSADE, PSL, Université Paris-Dauphine, Paris, France
  2. LIRIS, UDL, Université de Lyon, Lyon, France
  3. Zayed University, Dubai, United Arab Emirates
  4. Federal Rural University of Pernambuco, Recife, Brazil
