Publishing Differentially Private Medical Events Data

  • Sigal Shaked
  • Lior Rokach
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9817)


Sequential data has been widely collected in the past few years; in the public health domain it appears as collections of medical events such as lab results, electronic chart records, or hospitalization transactions. Publicly available sequential datasets for research purposes promise new insights, such as understanding patient types and recognizing emerging diseases. Unfortunately, the publication of sequential data presents a significant threat to users’ privacy. Since data owners prefer to avoid such risks, much of the collected data is currently unavailable to researchers. Existing anonymization techniques that aim at preserving sequential patterns lack two important features: handling long sequences and preserving occurrence times. In this paper, we address this challenge by employing an ensemble of Markovian models trained on the source data. The ensemble takes several optional periodicity levels into consideration. Each model captures transitions between times and states according to shorter parts of the sequence, which is eventually reconstructed. Anonymity is provided by utilizing only elements of the model that guarantee differential privacy. In this way, we develop a solution for generating differentially private sequential data, which brings us one step closer to publicly available medical datasets. We applied this method to two real medical events datasets and obtained encouraging results, demonstrating that the proposed method can be used to publish high-quality anonymized data.
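To make the general idea concrete, the following is a minimal sketch of how a single first-order Markov model over event states could be released under differential privacy and then used to generate synthetic sequences. It is not the method of the paper: it omits the ensemble, the periodicity levels, and the occurrence times, and it simply applies the standard Laplace mechanism to transition counts. The function names, the state labels, the `max_transitions` truncation bound, and the assumption that the state set is fixed in advance (not derived from the private data) are all illustrative assumptions.

```python
import random
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_transition_matrix(sequences, states, epsilon, max_transitions, rng):
    """Noisy first-order transition probabilities (illustrative sketch).

    Assumes `states` is a public, fixed list and that each patient
    contributes exactly one sequence.
    """
    counts = {pair: 0.0 for pair in product(states, repeat=2)}
    for seq in sequences:
        pairs = [p for p in zip(seq, seq[1:]) if p in counts]
        # Truncation bounds each patient's contribution, so adding or
        # removing one sequence changes the counts by at most
        # `max_transitions` in L1 norm.
        for pair in pairs[:max_transitions]:
            counts[pair] += 1.0

    scale = max_transitions / epsilon  # Laplace scale = sensitivity / epsilon
    # Noise is added to every cell, including empty ones; clipping at zero
    # is post-processing and does not affect the privacy guarantee.
    noisy = {p: max(0.0, c + laplace_noise(scale, rng)) for p, c in counts.items()}

    matrix = {}
    for a in states:
        total = sum(noisy[(a, b)] for b in states)
        if total > 0:
            matrix[a] = {b: noisy[(a, b)] / total for b in states}
        else:
            # Fall back to a uniform row when all noisy counts were clipped.
            matrix[a] = {b: 1.0 / len(states) for b in states}
    return matrix


def sample_sequence(matrix, start, length, rng):
    # Generate a synthetic sequence by walking the noisy chain.
    seq = [start]
    for _ in range(length - 1):
        row = matrix[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    states = ["lab", "chart", "admit"]  # hypothetical event types
    data = [["lab", "chart", "admit"], ["lab", "lab", "chart", "admit"]]
    model = private_transition_matrix(data, states, epsilon=1.0,
                                      max_transitions=5, rng=rng)
    print(sample_sequence(model, "lab", 6, rng))
```

In this sketch a smaller `epsilon` or a larger `max_transitions` increases the noise scale, which illustrates why long sequences are hard to handle with a single model and why modeling shorter parts of the sequence can be attractive.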


Keywords: Data synthetization; Privacy preserving data publishing; Markov model; Clustering; Sequential patterns; Differential privacy; Medical events



Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  1. The Department of Information Systems Engineering, Ben-Gurion University, Beersheba, Israel
