Communication-Efficient Model Fusion


Abstract

We consider the problem of learning a federated model when the number of communication rounds is severely limited. We discuss recent works on model fusion, a special case of Federated Learning in which only a single communication round is allowed. A unique feature of this setting is that it suffices for clients to have a pre-trained model; the data itself is not needed. Data storage regulations such as GDPR make this setting appealing, since the data can be deleted immediately after the local model is trained, before FL even starts. However, model fusion methods are limited to relatively shallow neural network architectures. We therefore discuss extensions of model fusion applicable to deep learning models that require more than one communication round but remain very efficient in terms of communication budget, i.e., the number of communication rounds and the size of the messages exchanged between the clients and the server. We consider both homogeneous and heterogeneous client data scenarios, including scenarios where training on the aggregated data is suboptimal due to biases in the data. In addition to deep learning methods, we cover unsupervised settings such as mixture models, topic models, and hidden Markov models.
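To make the single-round setting concrete, the following is a minimal sketch of one way two locally pre-trained networks could be fused in one communication round: hidden units of one client's model are aligned to the other's with the Hungarian algorithm before the weights are averaged, since hidden units are only defined up to permutation and naive averaging can cancel out useful features. This is only an illustration of the permutation-matching idea; the function and variable names are hypothetical and the algorithms discussed in the chapter differ in their details.

    # Illustrative single-round fusion of two one-hidden-layer MLPs
    # (hypothetical names; not the chapter's algorithm).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def fuse_two_mlps(W1_a, W1_b, W2_a, W2_b):
        """Fuse client models a and b.

        W1_*: (hidden, input) first-layer weights; W2_*: (output, hidden).
        """
        # Cost of matching hidden unit j of b to hidden unit i of a:
        # squared distance between their incoming weight vectors.
        cost = ((W1_a[:, None, :] - W1_b[None, :, :]) ** 2).sum(-1)
        _, col = linear_sum_assignment(cost)  # Hungarian algorithm

        # Permute b's hidden units to align with a, then average.
        W1_b_aligned = W1_b[col]
        W2_b_aligned = W2_b[:, col]
        return 0.5 * (W1_a + W1_b_aligned), 0.5 * (W2_a + W2_b_aligned)

With more than two clients, the same alignment step would typically be applied against a common reference (e.g., the running fused model) before averaging; as the abstract notes, handling deeper networks requires matching layer by layer, which costs additional, but still few, communication rounds.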

We compare the statistical efficiency of model fusion to that of a hypothetical centralized approach in which a learner with unlimited compute and storage capacity simply aggregates data from all clients and trains a model in a non-federated way. As we shall see, although the model fusion approach generally matches the convergence rate of the (hypothetical) centralized approach, it may not have the same efficiency. Further, this discrepancy between the centralized and federated approaches is amplified when client data is heterogeneous.
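As a simple back-of-the-envelope illustration of this gap (not a result quoted from the chapter), suppose J clients each hold n i.i.d. samples, N = Jn in total, and the server averages the clients' local maximum-likelihood estimates in a single round:

\[
\bar{\theta}_{\mathrm{fed}} \;=\; \frac{1}{J}\sum_{j=1}^{J}\hat{\theta}_{j},
\qquad
\hat{\theta}_{j} \;=\; \arg\max_{\theta}\sum_{i=1}^{n}\log p(x_{ji}\mid\theta).
\]

Averaging reduces variance by a factor of J, so \(\bar{\theta}_{\mathrm{fed}}\) converges at the same \(O(1/\sqrt{N})\) rate as a centralized estimator trained on all N samples. It does not, however, reduce the \(O(1/n)\) bias that each nonlinear local solve incurs, so the constants, and hence the statistical efficiency, can be worse; the effect is more pronounced when n is small or when the clients' data distributions differ.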


Notes

  1. To keep things simple, we assume the set of all possible parameters Θ, called the parameter space, is an open subset of \({\mathbb {R}}^d\).

  2. Its variance is O(1).


Author information

Correspondence to Mikhail Yurochkin.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Yurochkin, M., Sun, Y. (2022). Communication-Efficient Model Fusion. In: Ludwig, H., Baracaldo, N. (eds) Federated Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-96896-0_7


  • DOI: https://doi.org/10.1007/978-3-030-96896-0_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96895-3

  • Online ISBN: 978-3-030-96896-0

  • eBook Packages: Computer Science, Computer Science (R0)