Peer-to-Peer Networking and Applications, Volume 9, Issue 5, pp 864–875

A semi-supervised privacy-preserving clustering algorithm for healthcare

  • Meiyu Huang
  • Yiqiang Chen
  • Bo-Wei Chen
  • Junfa Liu
  • Seungmin Rho
  • Wen Ji


With the proliferation of healthcare data, cloud mining technology for e-health services and applications has become an active research topic. However, these rapidly evolving cloud mining technologies and their deployment in healthcare systems also pose potential threats to patients' data privacy. To address the privacy problem in cloud mining, this paper proposes a semi-supervised privacy-preserving clustering algorithm. Using a small amount of supervised information, the method first learns a Large Margin Nearest Cluster metric by convex optimization. Then, according to the trained metric, the method applies a multiplicative perturbation to the original data, which changes the shape of the data distribution and thus protects private information while preserving high data usability. Experimental results on the brain fiber dataset provided by the 2009 PBC demonstrate that the proposed method not only protects data privacy against security attacks but also improves clustering purity.
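The two-step pipeline described in the abstract (learn a metric from a few labeled samples, then multiply the data by a matrix derived from that metric) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function names are hypothetical, and the metric is a simple stand-in (the inverse of the regularized within-cluster scatter), not the paper's Large Margin Nearest Cluster metric, which is learned by convex optimization. The sketch only shows how a positive definite metric M can be factored as M = LᵀL and applied multiplicatively, so that Euclidean distances in the released data equal distances under M in the original data, and how clustering purity can be computed.

```python
import numpy as np


def learn_metric_stand_in(X_labeled, y_labeled, reg=1e-3):
    """Stand-in metric (assumption): inverse of the regularized
    within-cluster scatter of the labeled samples.
    This is NOT the paper's Large Margin Nearest Cluster metric."""
    d = X_labeled.shape[1]
    scatter = np.zeros((d, d))
    for label in np.unique(y_labeled):
        pts = X_labeled[y_labeled == label]
        centered = pts - pts.mean(axis=0)
        scatter += centered.T @ centered
    scatter = scatter / len(X_labeled) + reg * np.eye(d)
    M = np.linalg.inv(scatter)
    return (M + M.T) / 2.0  # symmetrize against round-off


def perturb(X, M):
    """Multiplicative perturbation: factor M = L^T L and release X L^T."""
    # Cholesky yields M = C C^T with C lower-triangular, so L = C^T
    # and the row-wise transform x -> L x becomes X @ C.
    C = np.linalg.cholesky(M)
    return X @ C


def purity(y_true, y_pred):
    """Fraction of points that belong to the majority true class
    of their assigned cluster (labels must be non-negative ints)."""
    total = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]
        total += np.bincount(members).max()
    return total / len(y_true)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))          # synthetic data, not brain fibers
    y = np.array([0] * 10 + [1] * 10)     # the "small amount" of labels
    M = learn_metric_stand_in(X, y)
    X_released = perturb(X, M)
    print(X_released.shape)
```

Because the released data are related to the originals by a fixed linear map, any distance-based clustering run on `X_released` behaves as if it were run on the original data under the metric M. Whether this particular stand-in offers meaningful privacy is not claimed here; the paper's own perturbation and attack analysis should be consulted for that.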


Privacy-preserving cloud clustering; Data perturbation; Semi-supervised; Large margin nearest cluster metric; Brain fiber



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Meiyu Huang (1, 2)
  • Yiqiang Chen (1)
  • Bo-Wei Chen (3)
  • Junfa Liu (1)
  • Seungmin Rho (4)
  • Wen Ji (1)

  1. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Department of Electrical Engineering, Princeton University, Princeton, USA
  4. Department of Multimedia, Sungkyul University, Sungkyul, South Korea
