Towards a Toolkit for Utility and Privacy-Preserving Transformation of Semi-structured Data Using Data Pseudonymization

  • Saffija Kasem-Madani
  • Michael Meier
  • Martin Wehner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10436)


We present a flexibly configurable toolkit for the automatic pseudonymization of datasets that retains a defined level of utility. The toolkit can be used to pseudonymize data so that the privacy of the data owners is preserved during data processing and the requirements of the new European General Data Protection Regulation are met. We define a set of possible utility requirements and the corresponding utility options that a pseudonym can satisfy. Based on these, we define a policy language for writing machine-readable utility policies. The utility policies configure the toolkit so that it produces a pseudonymized dataset offering the requested utility options. The toolkit follows a confidentiality-by-default principle: only the data named in the policy is transformed and included in the pseudonymized dataset, and all remaining data is kept confidential. This contrasts with common pseudonymization techniques, which replace only the personal or sensitive data in a dataset with pseudonyms and keep all other information in plaintext. If applied appropriately, our approach yields pseudonymized datasets that contain less information that could be misused to infer personal information about the individuals the data belong to.
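The confidentiality-by-default principle can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the toolkit's actual policy language or pseudonymization functions: the policy is a plain dictionary, fields of a semi-structured record are addressed by hypothetical dotted paths, the option names `linkable` and `plaintext` are invented, and keyed HMAC-SHA256 stands in for a linkable pseudonym. The point it shows is that fields absent from the policy never reach the output.

```python
import hashlib
import hmac
import secrets

# Hypothetical utility policy: dotted field path -> utility option.
# Option names are illustrative assumptions, not the paper's syntax.
POLICY = {
    "patient.id": "linkable",   # equal IDs -> equal pseudonyms (same key)
    "diagnosis": "plaintext",   # released unchanged
}

_MISSING = object()  # sentinel distinguishing "absent" from a None value


def get_path(record: dict, path: str):
    """Follow a dotted path through nested dicts; return _MISSING if absent."""
    node = record
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def linkable_pseudonym(value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: deterministic per key, so records stay linkable."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(record: dict, policy: dict, key: bytes) -> dict:
    """Confidentiality by default: only fields named in the policy are emitted."""
    out = {}
    for path, option in policy.items():
        value = get_path(record, path)
        if value is _MISSING:
            continue
        if option == "linkable":
            out[path] = linkable_pseudonym(str(value), key)
        elif option == "plaintext":
            out[path] = str(value)
        else:
            raise ValueError(f"unknown utility option: {option!r}")
    return out


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # secret pseudonymization key
    record = {
        "patient": {"id": "P-1042", "name": "Alice"},
        "diagnosis": "J45",
        "zip": "53113",
    }
    # "patient.name" and "zip" are not in the policy, so they are dropped.
    print(pseudonymize(record, POLICY, key))
```

The design choice mirrors the contrast drawn in the abstract: rather than listing sensitive fields to replace, the sketch iterates over the policy, so any field the policy author forgot is withheld instead of leaked in plaintext.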


Keywords: Privacy, Pseudonymization, Data utility, Confidentiality, Policy language, Utility requirements

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Bonn, Bonn, Germany
