Abstract
The distributed optimization problem has become increasingly relevant in recent years. It offers clear advantages, such as processing large amounts of data in less time than non-distributed methods. However, most distributed approaches suffer from a significant bottleneck: the cost of communication. A large body of recent research has therefore been directed at mitigating this problem. One such approach exploits local data similarity; in particular, there exists an algorithm that provably exploits the similarity property in an optimal way. However, this result, like results in other works, addresses the communication bottleneck only through the assumption that communication is significantly more expensive than local computation; it does not take into account the varying capacities of network devices or the different possible ratios between communication time and local computation cost. We consider this setup, and the objective of this study is to find the optimal split of data between the server and the local machines for arbitrary costs of communication and local computation. We compare the running time of the network under the uniform and the optimal data distributions. The superior theoretical performance of our solution is validated experimentally.
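As a rough illustration of the general idea only (not the algorithm analyzed in the paper), the toy Python sketch below chooses the fraction p of the dataset kept on the server so that one round of parallel computation followed by a communication step is as short as possible; all names and cost parameters (t_server, t_worker, t_comm) and the brute-force search over p are assumptions introduced purely for illustration.

    # Toy model: server holds a fraction p of n_samples, the rest is split evenly
    # among n_workers; server and workers compute in parallel, then one
    # communication of fixed (assumed) cost t_comm happens.
    def round_time(p, n_samples, n_workers, t_server, t_worker, t_comm):
        server_part = p * n_samples * t_server                      # server compute time
        worker_part = (1 - p) * n_samples / n_workers * t_worker    # slowest worker time
        return max(server_part, worker_part) + t_comm

    def best_split(n_samples, n_workers, t_server, t_worker, t_comm, grid=1000):
        # One-dimensional brute-force search over the split fraction p in [0, 1].
        candidates = [i / grid for i in range(grid + 1)]
        return min(candidates,
                   key=lambda p: round_time(p, n_samples, n_workers,
                                            t_server, t_worker, t_comm))

    if __name__ == "__main__":
        # Hypothetical costs: the server is twice as fast per sample as a worker.
        n, m = 1_000_000, 10
        uniform = 1 / (m + 1)   # uniform split over server + m workers
        optimal = best_split(n, m, t_server=1e-6, t_worker=2e-6, t_comm=0.05)
        print("uniform round time:", round_time(uniform, n, m, 1e-6, 2e-6, 0.05))
        print("optimal round time:", round_time(optimal, n, m, 1e-6, 2e-6, 0.05))

In this toy model the optimal fraction simply equalizes the server and worker compute times for the given (assumed) per-sample costs, which already shortens a round relative to the uniform split; the paper studies this trade-off rigorously for arbitrary communication and computation costs.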
Funding
The research of A. Beznosikov was supported by the Russian Science Foundation (project no. 23-11-00229).
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Cite this article
Medyakov, D., Molodtsov, G., Beznosikov, A. et al. Optimal Data Splitting in Distributed Optimization for Machine Learning. Dokl. Math. 108 (Suppl 2), S465–S475 (2023). https://doi.org/10.1134/S1064562423701600