Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8401)

Abstract

With formidable recent improvements in data processing and information retrieval, knowledge discovery/data mining, business intelligence, content analytics and other emerging empirical approaches have enormous potential, particularly for the data-intensive biomedical sciences. For results derived using empirical methods, the underlying data set should be made available, at least to the reviewers during the review process, to ensure the quality of the research, to prevent fraud and errors, and to enable the replication of studies. In medicine and the life sciences in particular, however, this leads to a conflict: disclosing research data raises considerable privacy concerns, because researchers bear full responsibility for protecting their (volunteer) subjects and must therefore adhere to the respective ethical policies. One solution to this problem lies in protecting sensitive information in medical data sets by applying appropriate anonymization. This paper provides an overview of the most important and well-researched approaches and discusses open research problems in this area, with the goal of serving as a starting point for further investigation.

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kieseberg, P., Hobel, H., Schrittwieser, S., Weippl, E., Holzinger, A. (2014). Protecting Anonymity in Data-Driven Biomedical Science. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_17

  • DOI: https://doi.org/10.1007/978-3-662-43968-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43967-8

  • Online ISBN: 978-3-662-43968-5

  • eBook Packages: Computer Science (R0)
