
Migrating federated learning to centralized learning with the leverage of unlabeled data

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Federated learning performs cooperative training without sharing local data, and the resulting global model generally outperforms independently trained local models. Because raw data never leave the clients, federated learning preserves the privacy of local users. However, the performance of the global model may degrade when clients hold non-IID training data, since the differing distributions of local data cause the weights of local models to diverge. In this paper, we introduce a novel teacher–student framework to alleviate the negative impact of non-IID data: it retains the privacy-preserving advantage of federated learning while exploiting the accuracy advantage of centralized learning. We use unlabeled data and global models as teachers to generate a pseudo-labeled dataset, which significantly improves the performance of the global model; as a teacher, the global model also provides more accurate pseudo-labels. In addition, we perform a model rollback to mitigate the impact of latent noisy labels and data imbalance in the pseudo-labeled dataset. Extensive experiments verify that our teacher ensemble yields more robust training, and the empirical study shows that relying on centralized pseudo-labeled data renders the global model almost immune to non-IID data.
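To make the two mechanisms in the abstract concrete, the following is a minimal PyTorch-style sketch of one plausible reading: confidence-filtered pseudo-labeling by the global-model teacher, followed by centralized fine-tuning with a validation-based rollback. This is not the authors' implementation; the function names, the 0.9 confidence threshold, and the use of validation accuracy as the rollback criterion are all illustrative assumptions.

    import copy
    import torch
    import torch.nn.functional as F

    def evaluate(model, loader):
        """Top-1 accuracy of `model` on a labeled loader."""
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        return correct / total

    def build_pseudo_labeled_set(global_model, unlabeled_loader, threshold=0.9):
        """Global model acts as teacher: keep only unlabeled samples whose
        top predicted probability exceeds `threshold` (assumed value)."""
        global_model.eval()
        xs, ys = [], []
        with torch.no_grad():
            for x in unlabeled_loader:          # loader yields raw input batches
                probs = F.softmax(global_model(x), dim=1)
                conf, pred = probs.max(dim=1)
                keep = conf >= threshold        # confidence filter
                xs.append(x[keep])
                ys.append(pred[keep])
        return torch.cat(xs), torch.cat(ys)

    def finetune_with_rollback(model, pseudo_set, val_loader, epochs=5, lr=0.01):
        """Fine-tune on the pseudo-labeled set; after each epoch, roll the
        model back to its best snapshot if validation accuracy dropped,
        guarding against noisy pseudo-labels and class imbalance."""
        x, y = pseudo_set
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        best_acc = evaluate(model, val_loader)
        best_state = copy.deepcopy(model.state_dict())
        for _ in range(epochs):
            model.train()
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()   # full-batch step for brevity
            opt.step()
            acc = evaluate(model, val_loader)
            if acc < best_acc:
                model.load_state_dict(best_state)     # model rollback
            else:
                best_acc = acc
                best_state = copy.deepcopy(model.state_dict())
        return model

In the paper's setting, the pseudo-labeled set would presumably be built server-side after aggregation, so the subsequent fine-tuning step is effectively centralized learning on centrally held pseudo-labeled data.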


Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61972366).

Author information


Contributions

XW wrote the main manuscript and conducted the experiments; TZ proposed the key idea of the research; WR developed the idea in detail; DZ assisted with the experiments; PX helped to extend the idea and provided revision comments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Tianqing Zhu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Zhu, T., Ren, W. et al. Migrating federated learning to centralized learning with the leverage of unlabeled data. Knowl Inf Syst 65, 3725–3752 (2023). https://doi.org/10.1007/s10115-023-01869-8

