Abstract
Federated learning often generalizes poorly when client data are non-IID (not independent and identically distributed), because the global model fails to exploit information shared across all clients. Meanwhile, theoretical understanding of this setting remains limited. In this paper, we present an excess risk bound for federated learning on non-IID data, which measures the gap between the federated model and the optimal centralized model. Specifically, we introduce a novel error decomposition strategy that splits the excess risk into three terms: agnostic error, federated error, and approximation error. By estimating these terms, we find that Rademacher complexity and discrepancy distance are the key quantities governing learning performance. Motivated by these theoretical findings, we propose FedAvgR, which improves performance by adding regularizers that lower the excess risk. Experimental results demonstrate the effectiveness of our algorithm and agree with our theory.
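The abstract names the three-term decomposition without stating it. The following schematic telescoping, in notation we introduce here (not the paper's definitions), illustrates one way such a decomposition can be arranged: \(\hat{f}\) is the federated model, \(f^{*}\) the optimal centralized model, \(\mathcal{L}\) the risk under the target distribution, and \(\bar{\mathcal{L}}\) the risk under the clients' mixture distribution.

```latex
% Schematic only: the intermediate quantities are our assumptions,
% not the paper's exact definitions. The three differences telescope.
\mathcal{L}(\hat{f}) - \mathcal{L}(f^{*})
  = \underbrace{\mathcal{L}(\hat{f}) - \bar{\mathcal{L}}(\hat{f})}_{\text{agnostic error}}
  + \underbrace{\bar{\mathcal{L}}(\hat{f}) - \bar{\mathcal{L}}(f^{*})}_{\text{federated error}}
  + \underbrace{\bar{\mathcal{L}}(f^{*}) - \mathcal{L}(f^{*})}_{\text{approximation error}}
```

Likewise, a minimal runnable sketch of the algorithmic idea: FedAvg-style training where each client's local objective carries extra regularizers. The two penalties below (an L2 capacity penalty and a proximal term toward the global model) are illustrative stand-ins chosen by us; the paper's actual FedAvgR regularizers, which target Rademacher complexity and discrepancy distance, are not specified in this excerpt.

```python
# Hypothetical sketch of FedAvg with added regularizers; NOT the paper's FedAvgR.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w_global, X, y, lr=0.02, steps=20, lam=1e-2, mu=1e-1):
    """Local training of a linear least-squares model on one client.

    The base objective is mean squared error; two illustrative penalties
    are added (our assumptions, not the paper's FedAvgR terms):
      lam * ||w||^2            -- capacity control (Rademacher-style proxy)
      mu  * ||w - w_global||^2 -- keeps clients near the global model
                                  (crude stand-in for a discrepancy penalty)
    """
    w = w_global.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient
        grad += 2 * lam * w                     # L2 penalty gradient
        grad += 2 * mu * (w - w_global)         # proximal penalty gradient
        w -= lr * grad
    return w

# Non-IID toy data: each client draws inputs from a shifted distribution.
d, n_clients, n_local = 5, 4, 50
w_true = rng.normal(size=d)
clients = []
for k in range(n_clients):
    X = rng.normal(loc=0.5 * k, size=(n_local, d))  # client-specific shift
    y = X @ w_true + 0.1 * rng.normal(size=n_local)
    clients.append((X, y))

w = np.zeros(d)
for _ in range(50):                                 # communication rounds
    local_models = [local_update(w, X, y) for X, y in clients]
    w = np.mean(local_models, axis=0)               # server: FedAvg aggregation

print("distance to w_true:", np.linalg.norm(w - w_true))
```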
Keywords
- Federated learning
- Non-IID
- Excess risk bound
Change history
31 March 2022
In the originally published version of chapter 3 the second affiliation of the author Bojian Wei was incorrect. The second affiliation of the author Bojian Wei has been corrected as “School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China”.
Acknowledgement
This work was supported in part by the Excellent Talents Program of the Institute of Information Engineering, CAS, the Special Research Assistant Project of CAS (No. E0YY231114), the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), the National Natural Science Foundation of China (Nos. 62076234 and 62106257), and the Beijing Municipal Science and Technology Commission under Grant Z191100007119002.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, B., Li, J., Liu, Y., Wang, W. (2021). Federated Learning for Non-IID Data: From Theory to Algorithm. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 13031. Springer, Cham. https://doi.org/10.1007/978-3-030-89188-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89187-9
Online ISBN: 978-3-030-89188-6