Large-scale asynchronous distributed learning based on parameter exchanges

  • Bikash Joshi
  • Franck Iutzeler
  • Massih-Reza Amini
Regular Paper

Abstract

In many distributed learning problems, the heterogeneous loading of the computing machines may harm the overall performance of synchronous strategies: each machine begins its new computations only after receiving aggregated information from a master, so any delay in sending local information to the latter can become a bottleneck. In this paper, we propose an effective asynchronous distributed framework for the minimization of a sum of smooth functions, where each machine performs iterations in parallel on its local function and updates a shared parameter asynchronously. In this way, all machines can work continuously, even when they do not hold the latest version of the shared parameter. We prove the convergence and consistency of this general distributed asynchronous method for gradient iterations and then show its efficiency on the matrix factorization problem for recommender systems and on binary classification.
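
To make this concrete, the sketch below illustrates the asynchronous parameter-exchange principle under simplifying assumptions; it is not the implementation used in the paper. Python threads stand in for machines, each local function is a simple quadratic, and the names (Master, worker) and the step size are chosen purely for illustration. Each worker repeatedly computes a gradient of its local function on a possibly stale copy of the shared parameter and pushes its update to the master without waiting for the other machines.

    # Illustrative sketch only (not the paper's code): minimize a sum of smooth
    # local functions f_i(x) = 0.5 * ||x - c_i||^2 by asynchronous parameter
    # exchanges; the sum is minimized at the mean of the c_i.
    import threading
    import numpy as np

    DIM, N_WORKERS, LOCAL_STEPS, STEP_SIZE = 5, 4, 200, 0.05

    rng = np.random.default_rng(0)
    centers = [rng.normal(size=DIM) for _ in range(N_WORKERS)]

    class Master:
        """Holds the shared parameter; workers read and update it with no barrier."""
        def __init__(self, dim):
            self.x = np.zeros(dim)
            self.lock = threading.Lock()

        def read(self):
            with self.lock:
                return self.x.copy()      # the copy may already be stale when used

        def apply_update(self, delta):
            with self.lock:
                self.x += delta           # applied asynchronously, no synchronization round

    def worker(master, c_i):
        # Each machine iterates on its local function using a possibly outdated
        # version of the shared parameter, then pushes its update to the master.
        for _ in range(LOCAL_STEPS):
            x_stale = master.read()
            grad = x_stale - c_i          # gradient of the local quadratic f_i
            master.apply_update(-STEP_SIZE * grad)

    master = Master(DIM)
    threads = [threading.Thread(target=worker, args=(master, c)) for c in centers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("asynchronous estimate:", np.round(master.x, 3))
    print("minimizer of the sum :", np.round(np.mean(centers, axis=0), 3))

Because the workers never synchronize, every update is computed from a possibly outdated parameter; for a small enough step size the iterates of this toy example nevertheless approach the minimizer of the sum, which mirrors the setting the paper analyzes for gradient iterations.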

Keywords

Large-scale learning · Asynchronous framework · Recommender systems · Binary classification · Distributed matrix factorization

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
  2. Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
