Abstract
Models trained in federated settings often suffer from degraded performance and poor generalization, especially in heterogeneous scenarios. In this work, we investigate this behavior through the lens of the geometry of the loss surface and the eigenspectrum of the Hessian, linking the model's lack of generalization capacity to the sharpness of the reached solution. Motivated by prior studies connecting the sharpness of the loss surface with the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods with uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of these optimizers across a variety of benchmark vision datasets (e.g., CIFAR-10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).
D. Caldarola and M. Ciccone—Equal contribution.
Official code: https://github.com/debcaldarola/fedsam.
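The two ingredients named in the abstract can be sketched in a few lines. The snippet below is a hypothetical toy illustration, not the paper's implementation (see the official code for that): a single SAM step on a simple quadratic loss L(w) = 0.5·||w||², plus the running weight average that underlies SWA. The function names, learning rate, and radius `rho` are illustrative assumptions.

```python
import numpy as np

def loss_grad(w):
    # Gradient of the toy loss L(w) = 0.5 * ||w||^2 is simply w.
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One sharpness-aware update: perturb towards the worst-case
    point in an L2 ball of radius rho, then descend from there."""
    g = loss_grad(w)
    # 1) Ascent step to the (first-order) worst-case neighbor.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) Descent step using the gradient at the perturbed weights.
    return w - lr * loss_grad(w + eps)

def swa_update(w_swa, w_new, n_models):
    """Running average of model weights, as in stochastic weight
    averaging; on the server this would average received rounds."""
    return (w_swa * n_models + w_new) / (n_models + 1)

w = np.array([1.0, -2.0])
w_swa, n = w.copy(), 1
for _ in range(100):
    w = sam_step(w)
    w_swa = swa_update(w_swa, w, n)
    n += 1
# Both the iterate and its average shrink towards the minimum at 0.
```

In the actual method, `sam_step` is run by each client on its local data between communication rounds, while the server aggregates the received models and maintains the SWA average.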
Acknowledgments
We thank L. Fantauzzo for her help with the semantic segmentation experiments. We acknowledge the CINECA HPC infrastructure. This work was funded by CINI.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Caldarola, D., Caputo, B., Ciccone, M. (2022). Improving Generalization in Federated Learning by Seeking Flat Minima. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_38
Print ISBN: 978-3-031-20049-6
Online ISBN: 978-3-031-20050-2