Abstract
Federated Learning (FL), a paradigm that enables collaborative model training across distributed devices, has attracted substantial attention for its potential to address privacy concerns and data-localization requirements. However, the inherent inaccessibility of client data makes it difficult to ensure data quality in FL systems. Consequently, FL systems face a range of data-related issues, including erroneous samples, imbalanced data distributions, and data skew, all of which have a significant impact on model performance. Judicious selection of the data used for training is therefore of paramount importance in mitigating these challenges.
This paper tackles a crucial but often overlooked concern: the presence of low-quality data samples. We introduce an algorithm that strategically curates a subset of data for each training round, with the objective of maximizing model accuracy while simultaneously preserving privacy and reducing communication costs. Our primary innovation lies in selecting data globally, across all clients, in contrast to the conventional approach of individualized, client-specific data selection.
Furthermore, we introduce a novel medical dataset tailored for classification tasks. The dataset intentionally incorporates various attributes of low-quality data to replicate real-world conditions. Through rigorous empirical evaluation on this dataset, we demonstrate the effectiveness of our algorithm: it improves model performance by approximately 2–3%, particularly in scenarios characterized by imbalanced data distributions.
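The abstract describes the selection step only at a high level. The sketch below illustrates one plausible reading of a global selection round, in which clients score their samples locally and the server applies a single, cross-client threshold. All names here (score_samples, global_select, keep_fraction) and the loss-based scoring rule are hypothetical illustrations under stated assumptions, not the authors' actual method.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical sketch of one round of global (cross-client) data selection.
# Scoring by per-sample loss is an assumption; the paper's criterion may differ.

def score_samples(model, loader, device="cpu"):
    """Each client scores its own samples, e.g. by per-sample loss."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            loss = F.cross_entropy(logits, y.to(device), reduction="none")
            scores.append(loss.cpu().numpy())
    return np.concatenate(scores)

def global_select(client_scores, keep_fraction=0.8):
    """Server pools the scores from all clients and applies one global
    threshold, rather than letting each client keep its own top fraction."""
    pooled = np.concatenate(list(client_scores.values()))
    threshold = np.quantile(pooled, keep_fraction)  # drop the highest-loss tail
    return {cid: np.where(s <= threshold)[0]
            for cid, s in client_scores.items()}
```

The design point a single global threshold captures: a client whose data are uniformly corrupted contributes fewer samples to the round, whereas per-client selection would force every client to keep the same fraction regardless of data quality.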
Notes
1. The configuration of this experiment is described in Sect. 4.1.
2. Datasets are available at https://github.com/duclong1009/S-Selection.
3. By using the ImageEnhance transformation functions of the PIL library; a sketch follows this list.
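Note 3 names PIL's ImageEnhance module. Below is a minimal sketch of how quality distortions of this kind can be produced; the choice of enhancers and the specific factor values are illustrative assumptions, not the paper's exact degradation protocol.

```python
from PIL import Image, ImageEnhance

def degrade(image: Image.Image,
            brightness: float = 0.5,
            contrast: float = 0.5,
            sharpness: float = 0.3) -> Image.Image:
    """Simulate a low-quality sample by reducing brightness, contrast,
    and sharpness. A factor of 1.0 is the identity; factors < 1.0
    weaken the corresponding attribute."""
    image = ImageEnhance.Brightness(image).enhance(brightness)
    image = ImageEnhance.Contrast(image).enhance(contrast)
    image = ImageEnhance.Sharpness(image).enhance(sharpness)
    return image

# Example usage (assumes a local file "sample.png"):
# degraded = degrade(Image.open("sample.png").convert("RGB"))
```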
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, D.L., Nguyen, P.L., Truong, T.N. (2024). Combating Quality Distortion in Federated Learning with Collaborative Data Selection. In: Yang, D.N., Xie, X., Tseng, V.S., Pei, J., Huang, J.W., Lin, J.C.W. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_14
Print ISBN: 978-981-97-2261-7
Online ISBN: 978-981-97-2259-4