Privacy-Preserving Data Mining for Distributed Medical Scenarios

  • Simone Scardapane
  • Rosa Altilio
  • Valentina Ciccarelli
  • Aurelio Uncini
  • Massimo Panella
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 69)


In this paper, we consider the application of data mining methods in medical contexts, wherein the data to be analysed (e.g. records from different patients) is distributed among multiple clinical parties. Although inference procedures could provide meaningful medical information (such as an optimal clustering of the subjects), each party is forbidden from disclosing its local dataset to a centralized location, due to privacy concerns over sensitive portions of the dataset. To this end, we propose a general framework that enables the parties involved to perform, in a decentralized fashion, any data mining procedure relying solely on the Euclidean distance between patterns, including kernel methods, spectral clustering, and so on. Specifically, the problem is recast as a decentralized matrix completion problem whose proposed solution does not require a centralized coordinator. Full privacy of the original data can be ensured through different strategies, including random multiplicative updates for the secure computation of distances. Experimental results support our proposal as an efficient tool for performing clustering and classification in distributed medical contexts. As an example, on the well-known Pima Indians Diabetes dataset we obtain a Rand index for clustering of 0.52, against 0.54 for the (infeasible) centralized solution, while on the Parkinson speech database the Rand index increases from 0.45 to 0.50.
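
As a rough illustration of why procedures that depend only on Euclidean distances lend themselves to multiplicative masking, the following sketch checks numerically that multiplying the data by a random orthogonal matrix leaves all pairwise distances unchanged, so a Gaussian kernel block can be built from masked data alone. This is not the protocol of the paper: the two synthetic "party" datasets, the shared rotation `q`, the helper names, and the value of `gamma` are all assumptions made for the example, and a single shared rotation is not by itself a privacy guarantee.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Draw a random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def squared_distances(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


rng = np.random.default_rng(0)

# Hypothetical local datasets held by two clinical parties (synthetic values).
party_a = rng.standard_normal((5, 8))
party_b = rng.standard_normal((4, 8))

# Assumption of this sketch: both parties agree on the same secret rotation.
q = random_orthogonal(8, rng)
masked_a = party_a @ q
masked_b = party_b @ q

# Cross-party distances computed from raw data and from masked data.
d_true = squared_distances(party_a, party_b)
d_masked = squared_distances(masked_a, masked_b)
distances_preserved = np.allclose(d_true, d_masked)

# A distance-only method (here a Gaussian kernel) needs nothing beyond d_masked.
gamma = 0.5  # illustrative kernel width, not taken from the paper
kernel_block = np.exp(-gamma * d_masked)
```

Because the kernel block is a function of the distances only, any method built on it (kernel classifiers, spectral clustering) gives the same result on masked and unmasked data in this toy setting.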


Keywords: Distributed learning, Biomedicine, Kernel methods, Spectral clustering, Privacy



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Simone Scardapane (1)
  • Rosa Altilio (1)
  • Valentina Ciccarelli (1)
  • Aurelio Uncini (1)
  • Massimo Panella (1), corresponding author
  1. Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome “La Sapienza”, Rome, Italy
