Data Source Selection in Federated Learning: A Submodular Optimization Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13246)

Abstract

Federated learning is a learning paradigm that jointly trains a model from multiple data sources without sharing raw data. In practical deployments, data source selection is necessary because communication and budget are limited in real-world applications. The need for data source selection is further amplified in the presence of data heterogeneity among clients. Prior solutions either incur exponential time cost or lack theoretical guarantees. Inspired by the diminishing marginal accuracy phenomenon in federated learning, we study the problem from the perspective of submodular optimization, aiming at efficient data source selection with theoretical guarantees. We prove that data source selection in federated learning is a monotone submodular maximization problem and propose FDSS, an efficient algorithm with a constant approximation ratio. Furthermore, we extend FDSS to FDSS-d for dynamic data source selection. Extensive experiments on CIFAR10 and CIFAR100 validate the efficiency and effectiveness of our algorithms.
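To make the submodular framing concrete, the following is a minimal sketch of the classic greedy procedure for monotone submodular maximization under a cardinality budget. It is not the FDSS algorithm from the paper, whose details are not given in the abstract. The names `greedy_select`, `coverage`, and `SOURCES` are hypothetical, and the toy utility (number of distinct class labels covered by the chosen sources) merely stands in for model accuracy.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy data: each data source holds samples from some class labels.
SOURCES: Dict[str, Set[int]] = {
    "a": {0, 1, 2, 3},
    "b": {2, 3, 4},
    "c": {4, 5},
    "d": {0, 1},
}


def coverage(selected: FrozenSet[str]) -> float:
    """Toy monotone submodular utility: number of distinct labels covered."""
    covered: Set[int] = set()
    for name in selected:
        covered |= SOURCES[name]
    return float(len(covered))


def greedy_select(
    candidates: Iterable[str],
    utility: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[str]:
    """Repeatedly add the source with the largest marginal gain."""
    selected: List[str] = []
    remaining = set(candidates)
    current = utility(frozenset())
    while remaining and len(selected) < budget:
        best, best_gain = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for cand in sorted(remaining):
            gain = utility(frozenset(selected) | {cand}) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no candidate improves the utility
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


if __name__ == "__main__":
    chosen = greedy_select(SOURCES, coverage, budget=2)
    print(chosen, coverage(frozenset(chosen)))
```

In this toy instance, source "b" adds only one new label once "a" has been chosen, which is the diminishing marginal gain that submodularity formalizes. For monotone submodular utilities with a cardinality budget, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value.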

Acknowledgments

We are grateful to the anonymous reviewers for their constructive comments. This work is partially supported by the National Key Research and Development Program of China under Grant No. 2018AAA0101100, the National Science Foundation of China (NSFC) under Grant Nos. U21A20516, 61822201, U1811463 and 62076017, the State Key Laboratory of Software Development Environment Open Funding No. SKLSDE-2020ZX-07, and the WeBank Scholars Program.

Author information

Corresponding author

Correspondence to Yongxin Tong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, R., Wang, Y., Zhou, Z., Ren, Z., Tong, Y., Xu, K. (2022). Data Source Selection in Federated Learning: A Submodular Optimization Approach. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13246. Springer, Cham. https://doi.org/10.1007/978-3-031-00126-0_43

  • DOI: https://doi.org/10.1007/978-3-031-00126-0_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-00125-3

  • Online ISBN: 978-3-031-00126-0

  • eBook Packages: Computer Science, Computer Science (R0)
