Abstract
Federated learning (FL) is an increasingly popular paradigm for collaboratively training high-quality machine learning (ML) models over massive amounts of data stored in geo-distributed datacenters. However, the communication cost of gradient aggregation during training is a primary bottleneck that impedes the adoption of FL, especially in cross-silo settings, because the available bandwidth of the inter-datacenter links connecting data silos is often very limited. To improve the training efficiency of cross-silo FL between datacenters, we propose TopoAdopt, a communication topology design for gradient aggregation that overcomes this bottleneck. TopoAdopt uses multiple aggregators to share the aggregation load and tree-based hierarchical aggregation to reduce bandwidth consumption from clients to aggregators. For better performance, it jointly optimizes the assignment of parameters among aggregators and the construction of the aggregation trees. We formulate this joint optimization as a mixed-integer nonlinear program and develop efficient algorithms that find satisfactory communication topologies in reasonable computational time. Experimental results show that TopoAdopt speeds up gradient aggregation completion time by up to 5.2× compared to existing solutions.
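To make the two ideas in the abstract concrete, the following is a minimal sketch (not the paper's implementation) of partitioned, tree-based gradient aggregation: the gradient vector is split into shards, one per aggregator, and each shard is summed up a per-aggregator aggregation tree so that every silo forwards only a single already-aggregated message upstream. All names, the 4-silo setup, and the example trees are illustrative assumptions; the paper's contribution is choosing the shard assignment and tree shapes jointly, which this sketch fixes by hand.

```python
def partition(num_params, num_aggregators):
    """Split parameter indices [0, num_params) into contiguous shards."""
    base, rem = divmod(num_params, num_aggregators)
    shards, start = [], 0
    for i in range(num_aggregators):
        end = start + base + (1 if i < rem else 0)
        shards.append((start, end))
        start = end
    return shards

def tree_aggregate(node, children, grads, lo, hi):
    """Sum the [lo, hi) gradient shard over the subtree rooted at `node`."""
    total = list(grads[node][lo:hi])
    for child in children.get(node, []):
        for j, v in enumerate(tree_aggregate(child, children, grads, lo, hi)):
            total[j] += v
    return total

# Example: 4 silos, 2 aggregators (silos 0 and 2 serve as tree roots).
grads = {i: [float(i + 1)] * 8 for i in range(4)}  # each silo's local gradient
shards = partition(8, 2)                            # [(0, 4), (4, 8)]
trees = {                                           # root -> child adjacency
    0: {0: [1, 2], 2: [3]},  # aggregator 0 collects shard 0 from all silos
    2: {2: [0, 3], 0: [1]},  # aggregator 2 collects shard 1 from all silos
}
aggregated = [0.0] * 8
for (lo, hi), root in zip(shards, trees):
    aggregated[lo:hi] = tree_aggregate(root, trees[root], grads, lo, hi)
# aggregated now holds the element-wise sum of all four silos' gradients.
```

In the hand-picked topology above, interior silos (e.g. silo 2 in aggregator 0's tree) pre-sum their children's shards before forwarding, which is what cuts the traffic on the bandwidth-limited links into each aggregator; TopoAdopt's MINLP searches over exactly these shard boundaries and tree structures.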
This work was funded by the National Natural Science Foundation of China (62102066), the Open Research Projects of Zhejiang Lab (No. 2022QA0AB02), and CNKLSTISS.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Luo, L., Yang, S., Feng, W., Yu, H., Sun, G., Lei, B. (2023). Optimizing Communication Topology for Collaborative Learning Across Datacenters. In: Quan, W. (eds) Emerging Networking Architecture and Technologies. ICENAT 2022. Communications in Computer and Information Science, vol 1696. Springer, Singapore. https://doi.org/10.1007/978-981-19-9697-9_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9696-2
Online ISBN: 978-981-19-9697-9