Data Source Selection in Federated Learning: A Submodular Optimization Approach

Zhang, Ruisheng; Wang, Yansheng; Zhou, Zimu; Ren, Ziyao; Tong, Yongxin; Xu, Ke

doi:10.1007/978-3-031-00126-0_43


Conference paper
First Online: 08 April 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13246)

Abstract

Federated learning is a learning paradigm that jointly trains a model from multiple data sources without sharing raw data. In practical deployments, data source selection is necessary because communication and budget are limited in real-world applications. The need for selection is further amplified in the presence of data heterogeneity among clients. Prior solutions either are inefficient, with exponential time cost, or lack theoretical guarantees. Inspired by the diminishing marginal accuracy phenomenon in federated learning, we study the problem from the perspective of submodular optimization, aiming at efficient data source selection with theoretical guarantees. We prove that data source selection in federated learning is a monotone submodular maximization problem and propose FDSS, an efficient algorithm with a constant approximation ratio. Furthermore, we extend FDSS to FDSS-d for dynamic data source selection. Extensive experiments on CIFAR10 and CIFAR100 validate the efficiency and effectiveness of our algorithms.
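
To make the submodular view concrete, here is a minimal sketch of the classical greedy heuristic for maximizing a monotone submodular set function under a cardinality budget. It is not the FDSS algorithm, whose details are not given in this abstract. The function name greedy_select, the toy SOURCES table, and the label-coverage utility are all hypothetical stand-ins chosen for illustration; in a federated setting the utility would be some measure of model quality, which is far more expensive to evaluate.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def greedy_select(
    candidates: Iterable[str],
    utility: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Pick up to k candidates, each time adding the one with the largest marginal gain."""
    chosen: FrozenSet[str] = frozenset()
    remaining: Set[str] = set(candidates)
    selected: List[str] = []
    for _ in range(min(k, len(remaining))):
        base = utility(chosen)
        best, best_gain = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for c in sorted(remaining):
            gain = utility(chosen | {c}) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            # No candidate adds value; stop early.
            break
        selected.append(best)
        chosen = chosen | {best}
        remaining.remove(best)
    return selected


# Hypothetical toy data: each source holds samples from a few class labels.
SOURCES: Dict[str, Set[int]] = {
    "a": {0, 1, 2},
    "b": {2, 3},
    "c": {0, 1},
    "d": {4},
}


def coverage(chosen: FrozenSet[str]) -> float:
    """Number of distinct labels covered; a monotone submodular function."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= SOURCES[name]
    return float(len(covered))


if __name__ == "__main__":
    print(greedy_select(SOURCES, coverage, k=2))
```

With a budget of two sources, the sketch first picks "a" (three new labels) and then "b" (one new label, tied with "d" and chosen by name order). The shrinking gain from the first pick to the second is the diminishing-returns property that submodularity formalizes. For monotone submodular utilities under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value.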

Acknowledgments

We are grateful to the anonymous reviewers for their constructive comments. This work is partially supported by the National Key Research and Development Program of China under Grant No. 2018AAA0101100, the National Science Foundation of China (NSFC) under Grant Nos. U21A20516, 61822201, U1811463 and 62076017, the State Key Laboratory of Software Development Environment Open Funding No. SKLSDE-2020ZX-07, and the WeBank Scholars Program.

Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
Ruisheng Zhang, Yansheng Wang, Ziyao Ren, Yongxin Tong & Ke Xu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing, China
Ruisheng Zhang, Yansheng Wang, Ziyao Ren, Yongxin Tong & Ke Xu
Singapore Management University, Singapore, Singapore
Zimu Zhou

Corresponding author

Correspondence to Yongxin Tong.

Editor information

Editors and Affiliations

Dept. of Computer Science & Engr., Indian Institute of Technology, Kanpur, Uttar Pradesh, India
Arnab Bhattacharya
National University of Singapore, Singapore, Singapore
Janice Lee Mong Li
University of California, Santa Barbara, Santa Barbara, CA, USA
Divyakant Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mukesh Mohania
Ashoka University, Sonepat, Haryana, India
Anirban Mondal
Indraprastha Institute of Information Te, New Delhi, India
Vikram Goyal
University of Aizu, Aizu, Japan
Rage Uday Kiran

Cite this paper

Zhang, R., Wang, Y., Zhou, Z., Ren, Z., Tong, Y., Xu, K. (2022). Data Source Selection in Federated Learning: A Submodular Optimization Approach. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13246. Springer, Cham. https://doi.org/10.1007/978-3-031-00126-0_43

DOI: https://doi.org/10.1007/978-3-031-00126-0_43
Published: 08 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00125-3
Online ISBN: 978-3-031-00126-0
eBook Packages: Computer Science, Computer Science (R0)