Have You Forgotten? A Method to Assess if Machine Learning Models Have Forgotten Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12261)

Abstract

In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity. Consider a scenario in which several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but one of the providers then decides to leave. This provider requests not only that their data (hereafter the query dataset) be removed from the databases, but also that the model ‘forget’ their data. In this paper, for the first time, we address the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and of the distribution of the model’s outputs. We establish statistical methods that compare the target model’s outputs with the outputs of models trained on different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage studies on what information a model retains and to inspire extensions to more complex settings.
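
The abstract's core idea, comparing the target model's output distribution on the query dataset against reference models trained with and without those data, can be illustrated with a two-sample Kolmogorov-Smirnov test (one of the paper's keywords). The sketch below is a rough illustration only, not the authors' implementation: the reference models, variable names, and synthetic output scores are placeholder assumptions, and only the SciPy KS-test call is a real library function.

```python
# Minimal sketch, NOT the authors' implementation: all variable names and the
# synthetic numbers below are placeholders. It illustrates comparing output
# distributions with a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)

# Stand-ins for per-sample output statistics on the query dataset
# (e.g. max softmax confidence); in practice these come from real models.
out_target = rng.beta(8.0, 2.0, size=500)         # target model under audit
out_with_query = rng.beta(8.0, 2.0, size=500)     # reference trained WITH the query data
out_without_query = rng.beta(5.0, 3.0, size=500)  # reference trained WITHOUT the query data

# Two-sample KS test: a smaller statistic means more similar output distributions.
res_with = ks_2samp(out_target, out_with_query)
res_without = ks_2samp(out_target, out_without_query)

print(f"KS distance to 'trained with query' reference:    {res_with.statistic:.3f} "
      f"(p = {res_with.pvalue:.3g})")
print(f"KS distance to 'trained without query' reference: {res_without.statistic:.3f} "
      f"(p = {res_without.pvalue:.3g})")

# A target that sits closer (smaller KS distance) to the 'without query'
# reference behaves as if the query data had been forgotten.
```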

Keywords

Privacy · Statistical measure · Kolmogorov-Smirnov

Notes

Acknowledgment

This work was supported by the University of Edinburgh through a PhD studentship. This work was partially supported by the Alan Turing Institute under EPSRC grant EP/N510129/1. S.A. Tsaftaris acknowledges the support of the Royal Academy of Engineering under the Research Chairs and Senior Research Fellowships scheme, and the partial support of the Industrial Centre for AI Research in digital Diagnostics (iCAIRD), which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) [project number: 104690] (https://icaird.com/).

Supplementary material

Supplementary material 1: 505204_1_En_10_MOESM1_ESM.pdf (PDF, 109 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Engineering, University of Edinburgh, Edinburgh, UK
  2. The Alan Turing Institute, London, UK
