Detecting Health-Related Privacy Leaks in Social Networks Using Text Mining Tools

  • Kambiz Ghazinour
  • Marina Sokolova
  • Stan Matwin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7884)


In social media, especially in social networks, users routinely share personal information. In doing so, they may inadvertently reveal personal health information (PHI), an essential part of their private information. In this work, we present a tool for detecting PHI on a social network site, MySpace. We analyze the PHI using two well-known medical resources, MedDRA and SNOMED. We introduce a new measure, the Risk Factor of Personal Information, that assesses how likely a term is to disclose personal health information. We synthesize a profile of a potential PHI leak in a social network, and we demonstrate that this task benefits from an emphasis on MedDRA and SNOMED terms.
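
The dictionary-driven detection described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the term set below is a toy stand-in for MedDRA and SNOMED entries, the function name `find_health_terms` and the sample posts are hypothetical, and the Risk Factor of Personal Information is not implemented because the abstract does not define it.

```python
import re

# Toy stand-in for terms drawn from MedDRA / SNOMED (assumption: the real
# vocabularies are far larger and are not reproduced here).
TOY_MEDICAL_TERMS = {"migraine", "asthma", "chest pain", "insulin"}


def find_health_terms(text, terms=TOY_MEDICAL_TERMS):
    """Return the sorted dictionary terms that occur in `text` as whole words."""
    lowered = text.lower()
    hits = []
    for term in terms:
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, lowered):
            hits.append(term)
    return sorted(hits)


# Hypothetical posts, invented for illustration only.
posts = [
    "Had a terrible migraine all weekend, stayed in bed.",
    "Great concert last night!",
    "My chest pain is back, seeing the doctor about my asthma too.",
]

# Keep only the posts that mention at least one dictionary term.
flagged = {}
for index, post in enumerate(posts):
    hits = find_health_terms(post)
    if hits:
        flagged[index] = hits
```

A plain term match like this only shows that a post mentions a health term; it cannot tell whether the post discloses the writer's own health information, which is the question a term-level risk measure is meant to address.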


Keywords: Medical electronic dictionaries; Personal health information; Social networks; Machine learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kambiz Ghazinour (1, 3)
  • Marina Sokolova (1, 2, 3)
  • Stan Matwin (1, 4)
  1. School of Electrical Engineering and Computer Science, University of Ottawa, Canada
  2. Faculty of Medicine, University of Ottawa, Canada
  3. Electronic Health Information Lab, CHEO Research Institute, Canada
  4. Faculty of Computer Science, Dalhousie University, Canada
