Towards Identifying De-anonymisation Risks in Distributed Health Data Silos

  • Conference paper
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

Accessing distributed and isolated data repositories, such as medical research and treatment data, in a privacy-preserving manner is a challenging problem. Furthermore, in the context of high-dimensional datasets, adhering to strict privacy legislation can be projected to a W[2]-complete problem whereby all privacy-violating attribute combinations must be identified. Traditional anonymisation algorithms incur high levels of information loss when applied to high-dimensional data, yet they often do not guarantee privacy, which defeats the purpose of anonymisation. In this paper, we extend our previous work and address these issues by using Bayesian networks to handle data transformation for anonymisation [29]. Computing the conditional probabilities that link attribute pairs, for all attribute pair combinations, allows the privacy exposure risk to be assessed. Attribute pairs differing by a high conditional probability indicate a high risk of de-anonymisation, similar to quasi-identifiers in syntactic anonymisation schemes, and can be separated instead of deleted. Attribute compartmentation removes the risk of privacy exposure, and deletion avoidance results in a significant reduction in information loss. In other words, assimilating the conditional probability of outliers directly in the adjacency matrix in a greedy fashion is efficient and privacy-preserving. Further, we offer deeper evaluation insights for optimising Bayesian networks with a multigrid solver for aggregating state space explosion.
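
The pairwise assessment described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it replaces the Bayesian network with plain frequency counts, and the function names, the toy records, the scoring rule (the expected value of P(b | a) over all rows) and the 0.9 threshold are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy records; attribute names and values are illustrative only.
records = [
    {"zip": "10115", "age": 34, "diagnosis": "flu"},
    {"zip": "10115", "age": 34, "diagnosis": "asthma"},
    {"zip": "10117", "age": 51, "diagnosis": "flu"},
    {"zip": "10119", "age": 47, "diagnosis": "asthma"},
]


def pairwise_conditional_scores(rows):
    """Score every ordered attribute pair (a, b).

    The score is the average, over all rows, of the empirical
    conditional probability P(b = row[b] | a = row[a]).
    A score near 1 means that knowing a almost determines b.
    (Assumed scoring rule, not taken from the paper.)
    """
    attributes = sorted(rows[0].keys())
    scores = {}
    for a, b in permutations(attributes, 2):
        joint = Counter((row[a], row[b]) for row in rows)
        marginal = Counter(row[a] for row in rows)
        # Each (a_val, b_val) combination occurs `count` times and each
        # occurrence contributes P(b_val | a_val) = count / marginal[a_val].
        total = sum(
            count * (count / marginal[a_val])
            for (a_val, _b_val), count in joint.items()
        )
        scores[(a, b)] = total / len(rows)
    return scores


def risky_pairs(rows, threshold=0.9):
    """Return ordered pairs whose score reaches the (assumed) threshold."""
    scores = pairwise_conditional_scores(rows)
    return sorted(pair for pair, score in scores.items() if score >= threshold)


scores = pairwise_conditional_scores(records)
flagged = risky_pairs(records)
```

In this toy data `zip` and `age` determine each other, so both orderings of that pair are flagged. Following the abstract, such a pair would be a candidate for separation into different compartments instead of deletion.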

Notes

  1. https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-does-general-data-protection-regulation-gdpr-govern_en

References

  1. An, X., Jutla, D., Cercone, N.: A Bayesian network approach to detecting privacy intrusion. In: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 73–76. IEEE Computer Society (2006)

  2. Aue, G., Biesdorf, S., Henke, N.: ehealth 2.0: how health systems can gain a leadership role in digital health. McKinsey & Company, December 2015

  3. Barbaro, M., Zeller, T.: A face is exposed for AOL searcher no. 4417749, August 2006. http://www.nytimes.com/2006/08/09/technology/09aol.html

  4. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: 2005 Proceedings of 21st International Conference on Data Engineering, ICDE 2005, pp. 217–228. IEEE (2005)

  5. Bläsius, T., Friedrich, T., Schirneck, M.: The parameterized complexity of dependency detection in relational databases. In: Guo, J., Hermelin, D. (eds.) 11th International Symposium on Parameterized and Exact Computation (IPEC 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 63, pp. 6:1–6:13. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2017). http://drops.dagstuhl.de/opus/volltexte/2017/6920

  6. Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial. SIAM, Philadelphia (2000)

  7. Carr, J.: Applications of Centre Manifold Theory, vol. 35. Springer, New York (2012)

  8. Chickering, D.M., Geiger, D., Heckerman, D., et al.: Learning Bayesian networks is NP-hard. Technical Report, MSR-TR-94-17, Microsoft Research (1994)

  9. Crossfield, S.S., Clamp, S.: Electronic health records research in a health sector environment with multiple provider types. In: HEALTHINF, pp. 104–111 (2013)

  10. Dagum, P., Luby, M.: Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artif. Intell. 60(1), 141–153 (1993)

  11. De Montjoye, Y.A., Hidalgo, C.A., Verleysen, M., Blondel, V.D.: Unique in the crowd: the privacy bounds of human mobility. Sci. Rep. 3, 1376 (2013)

  12. Dwork, C.: Differential privacy. In: van Tilborg, H.C.A., Jajodia, S. (eds.) Encyclopedia of Cryptography and Security, pp. 338–340. Springer, Boston (2011). https://doi.org/10.1007/978-1-4419-5906-5_752

  13. Efron, B.: Bayes’ theorem in the 21st century. Science 340(6137), 1177–1178 (2013)

  14. European Commission: opinion 05/2014 on anonymisation techniques, April 2014. https://www.pdpjournals.com/docs/88197.pdf

  15. Fulton, S.R., Ciesielski, P.E., Schubert, W.H.: Multigrid methods for elliptic problems: a review. Mon. Weather Rev. 114(5), 943–959 (1986)

  16. Kayyali, B., Knott, D., Van Kuiken, S.: The big-data revolution in US health care: accelerating value and innovation, April 2013

  17. Kifer, D., Machanavajjhala, A.: No free lunch in data privacy. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, pp. 193–204. ACM, New York (2011). https://doi.org/10.1145/1989323.1989345

  18. Kohlmayer, F., Prasser, F., Eckert, C., Kuhn, K.A.: A flexible approach to distributed data anonymization. J. Biomed. Inform. 50, 62–76 (2014)

  19. Leoni, D.: Non-interactive differential privacy: a survey. In: Proceedings of the First International Workshop on Open Data, pp. 40–52. ACM (2012)

  20. Lin, T., Zha, H.: Riemannian manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 796–809 (2008)

  21. Liu, F.: Generalized Gaussian mechanism for differential privacy. arXiv preprint arXiv:1602.06028 (2016)

  22. Massey, R.: How the GDPR will impact life sciences and health care, February 2017

  23. Meng, D., Sivakumar, K., Kargupta, H.: Privacy-sensitive Bayesian network parameter learning. In: 2004 Fourth IEEE International Conference on Data Mining, ICDM 2004, pp. 487–490. IEEE (2004)

  24. Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM (2004)

  25. Mohammed, N., Fung, B., Hung, P.C., Lee, C.K.: Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans. Knowl. Discov. Data (TKDD) 4(4), 18 (2010)

  26. Narayanan, A., Shmatikov, V.: How to break anonymity of the Netflix prize dataset. CoRR abs/cs/0610105 (2006). http://arxiv.org/abs/cs/0610105

  27. Podlesny, N.J., Kayem, A.V.D.M., von Schorlemer, S., Uflacker, M.: Minimising information loss on anonymised high dimensional data with greedy in-memory processing. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R.R. (eds.) DEXA 2018. LNCS, vol. 11029, pp. 85–100. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98809-2_6

  28. Olson, L.N., Schroder, J.B.: PyAMG: algebraic multigrid solvers in Python v4.0 (2018). release 4.0, https://github.com/pyamg/pyamg

  29. Podlesny, N., Kayem, A.V., Meinel, C.: Identifying data exposure across high-dimensional health data silos through Bayesian networks optimised by multigrid and manifold. In: 2019 IEEE 17th International Conference on Dependable, Autonomic and Secure Computing (DASC). IEEE (2019)

  30. Podlesny, N.J.: Enriched health dataset (2017). https://github.com/jaSunny/MA-enriched-Health-Data

  31. Rubinstein, I.S., Hartzog, W.: Anonymization and risk. 91 Washington Law Review, p. 703 (2016)

  32. Sajda, P.: Machine learning for detection and diagnosis of disease. Annu. Rev. Biomed. Eng. 8, 537–565 (2006)

  33. Schadt, E., Chilukuri, S.: The role of big data in medicine, November 2015

  34. Smith, G.: Recent developments in quantitative information flow (invited tutorial). In: Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 23–31. IEEE Computer Society (2015)

  35. Stüben, K.: An introduction to algebraic multigrid. Multigrid, pp. 413–532 (2001)

  36. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(05), 571–588 (2002)

  37. Takbiri, N., Houmansadr, A., Goeckel, D.L., Pishro-Nik, H.: Fundamental limits of location privacy using anonymization. In: 2017 51st Annual Conference on Information Sciences and Systems (CISS), pp. 1–6. IEEE (2017)

  38. Terrovitis, M., Mamoulis, N., Kalnis, P.: Privacy-preserving anonymization of set-valued data. Proc. VLDB Endow. 1(1), 115–125 (2008)

  39. Vaněk, P., Mandel, J., Brezina, M.: Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 56(3), 179–196 (1996)

  40. Vessenes, P., Seidensticker, R.: System and method for analyzing transactions in a distributed ledger, US Patent 9,298,806, 29 March 2016. https://www.google.com/patents/US9298806

  41. Wang, J., Zhang, Z., Zha, H.: Adaptive manifold learning. In: Advances in Neural Information Processing Systems, pp. 1473–1480 (2005)

  42. Wright, R., Yang, Z.: Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 713–718. ACM (2004)

  43. Zhang, B., Dave, V., Mohammed, N., Hasan, M.A.: Feature selection for classification under anonymity constraint. arXiv preprint arXiv:1512.07158 (2015)

  44. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: PrivBayes: private data release via Bayesian networks. ACM Trans. Database Syst. (TODS) 42(4), 25 (2017)

  45. Zhang, X., Yang, L.T., Liu, C., Chen, J.: A scalable two-phase top-down specialization approach for data anonymization using MapReduce on cloud. IEEE Trans. Parallel Distrib. Syst. 25(2), 363–373 (2014)

  46. Zillner, S., Neururer, S.: Big data in the health sector. In: Cavanillas, J.M., Curry, E., Wahlster, W. (eds.) New Horizons for a Data-Driven Economy, pp. 179–194. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-21569-3_10

Author information

Corresponding authors

Correspondence to Nikolai J. Podlesny, Anne V. D. M. Kayem or Christoph Meinel.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Podlesny, N.J., Kayem, A.V.D.M., Meinel, C. (2019). Towards Identifying De-anonymisation Risks in Distributed Health Data Silos. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_3

  • DOI: https://doi.org/10.1007/978-3-030-27615-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27614-0

  • Online ISBN: 978-3-030-27615-7

  • eBook Packages: Computer Science, Computer Science (R0)
