Personal Privacy Protection in Time of Big Data

Challenges in Computational Statistics and Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 605)

Abstract

Personal privacy protection is increasingly a matter of protecting privacy in electronic data. It also showcases both the advantages and the challenges of the Big Data phenomenon. The accumulation of massive data volumes, combined with the development of intelligent Data Mining algorithms, allows more data to be analysed and linked. An unintended consequence of Big Data analytics is the increasing risk of discovering new information about individuals. There are several approaches to protecting the privacy of individuals in large data sets, privacy-preserving Data Mining being one example. In this paper, we discuss content-aware prevention of data leaks. We concentrate on the protection of personal health information (PHI), arguably the most vulnerable type of personal information. We discuss the methods applied and the challenges that arise when we want to keep health information private. PHI leak prevention on the Web and in online social networks is our case study.

Author information

Corresponding author

Correspondence to Stan Matwin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sokolova, M., Matwin, S. (2016). Personal Privacy Protection in Time of Big Data. In: Matwin, S., Mielniczuk, J. (eds) Challenges in Computational Statistics and Data Mining. Studies in Computational Intelligence, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-319-18781-5_18

  • DOI: https://doi.org/10.1007/978-3-319-18781-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18780-8

  • Online ISBN: 978-3-319-18781-5

  • eBook Packages: Engineering, Engineering (R0)
