Data Anonymization as a Vector Quantization Problem: Control Over Privacy for Health Data

  • Yoan Miche
  • Ian Oliver
  • Silke Holtmanns
  • Aapo Kalliola
  • Anton Akusok
  • Amaury Lendasse
  • Kaj-Mikael Björk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9817)

Abstract

This paper approaches data anonymization from a vector quantization point of view. The stated goal of this work is to provide a means of anonymizing data so that neither single individuals nor groups can be re-identified from a data set, while preserving data integrity and structure as much as possible (in a very specific sense). The structure of the data is first captured by clustering (with a vector quantization approach), and we propose to use the properties of this vector quantization to anonymize the data. Under some assumptions about the computations that may be performed on the data, we give a framework for identifying outliers and “pushing them back into the crowd”, in this clustering sense, as well as for anonymizing cluster members while preserving cluster-level statistics and structure as defined by the assumptions (density, pairwise distances, cluster shape and members...).
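
The general idea in the abstract can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given here. It assumes plain k-means (Lloyd's algorithm) as the vector quantizer and swaps in the simplest possible anonymization step: each record is replaced by the mean of its cluster, and records in clusters smaller than a threshold are first reassigned to the nearest sufficiently large cluster. The names `nearest`, `kmeans`, `anonymize`, and `min_size` are hypothetical and chosen for illustration only. Replacing members by their mean preserves cluster sizes and cluster means, but it does not preserve within-cluster pairwise distances or shape, which the abstract lists among the properties of interest.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm (an assumed quantizer); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


def anonymize(points, k, min_size=3, seed=0):
    """Release one value per cluster; undersized clusters are merged away."""
    centroids, labels = kmeans(points, k, seed=seed)
    big = [j for j in range(k) if labels.count(j) >= min_size]
    if not big:
        raise ValueError("no cluster reaches min_size; lower k or min_size")
    # Outliers (members of undersized clusters) join the nearest large cluster.
    labels = [
        lab if lab in big else min(big, key=lambda j: math.dist(p, centroids[j]))
        for p, lab in zip(points, labels)
    ]
    # Recompute each cluster mean after the reassignment, then release it
    # in place of every member of that cluster.
    means = {}
    for j in big:
        members = [p for p, lab in zip(points, labels) if lab == j]
        means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [means[lab] for lab in labels], labels


# Toy data: two tight groups plus one isolated record (illustrative values only).
records = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (5.0, 30.0),
]
released, labels = anonymize(records, k=3, min_size=3)
```

In this sketch every released value is shared by at least `min_size` records, so the isolated record can no longer be singled out, and the overall mean of the data set is unchanged. The price is that all variation inside a cluster is lost, which is why a method that also aims to keep density and pairwise distances needs more than centroid replacement.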


Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Yoan Miche (1)
  • Ian Oliver (1)
  • Silke Holtmanns (1)
  • Aapo Kalliola (1, 4)
  • Anton Akusok (3)
  • Amaury Lendasse (2)
  • Kaj-Mikael Björk (3)

  1. Bell Labs, Nokia, Finland
  2. Department of Mechanical and Industrial Engineering and the Iowa Informatics Initiative, The University of Iowa, Iowa City, USA
  3. Arcada University of Applied Sciences, Helsinki, Finland
  4. Aalto University, Espoo, Finland
