
Asynchronous level bundle methods

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming

Abstract

In this paper, we consider nonsmooth convex optimization problems with an additive structure featuring independent oracles (black boxes) working in parallel. Existing methods for solving these distributed problems in a general form are synchronous, in the sense that they wait for the responses of all the oracles before performing a new iteration. We propose level bundle methods handling asynchronous oracles. These methods require original upper bounds (obtained via upper-models or scarce coordinations) to deal with asynchronicity. We prove their convergence using variational-analysis techniques and illustrate their practical performance on a Lagrangian decomposition problem.
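To make the setting concrete: the paper considers problems of the form \(\min_{x\in X} f(x) = \sum_{i=1}^m f^i(x)\), where each component \(f^i\) is known only through an oracle returning \((f^i(x), g^i)\) with \(g^i\) a subgradient. The following sketch is purely illustrative and is not the authors' level bundle method: the toy components, the simulated delays, and the plain subgradient-style update are all assumptions made for the example. It shows only the asynchronous mechanism at the heart of the paper: the master iterates as soon as any single oracle responds, re-querying only that oracle, so subgradients may have been computed at outdated points.

```python
# Illustrative sketch of the asynchronous oracle pattern (not the authors'
# algorithm): the master updates as soon as ANY of the m oracles answers.
# The toy components f^i, the random delays and the subgradient-style update
# are assumptions made only for this example.
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

import numpy as np

def oracle(i, x):
    """Exact oracle for the toy component f^i(x) = ||x - c_i||^2 / 2."""
    time.sleep(random.uniform(0.01, 0.2))   # heterogeneous response times
    c_i = np.full_like(x, float(i))         # toy data defining component i
    return i, 0.5 * np.dot(x - c_i, x - c_i), x - c_i  # (i, f^i(x), g^i)

m, x = 4, np.zeros(3)
with ThreadPoolExecutor(max_workers=m) as pool:
    pending = {pool.submit(oracle, i, x) for i in range(m)}
    for _ in range(25):                     # a few master iterations
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for fut in done:
            i, f_i, g_i = fut.result()      # g_i was computed at an old point
            x = x - 0.05 * g_i              # placeholder step; the paper's
                                            # level bundle update goes here
            pending.add(pool.submit(oracle, i, x))  # re-query only oracle i
print("final point:", x)
```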


Notes

  1. To better underline our contributions on asynchronicity, we first consider only exact oracles of the \(f^i\) as above. Later, in Sect. 5, we explain how our developments extend easily to the case of inexact oracles providing noisy approximations of \((f^i({x}), g^i)\).

  2. Note that Algorithm 2 still needs initial bounds \(f^{{{\,\mathrm{up}\,}}}_1\) and \(f^{{{\,\mathrm{low}\,}}}_1\). These bounds can often be estimated easily from the data of the problem. Otherwise, we can use the standard initialization: call the m oracles at an initial point \(x_1\) and wait for their first responses, from which we can compute \(f^{{{\,\mathrm{up}\,}}}_1 = f(x_1) = \sum _i f^i(x_1)\) and \(f^{{{\,\mathrm{low}\,}}}_1\) as the minimum of the linearization \(f(x_1) + {\langle g_1,x-x_1 \rangle }\) over the compact set X (see the sketch after these notes). If we do not want this synchronous initial step, we may alternatively estimate \(f^{{{\,\mathrm{lev}\,}}}\) and set \(f^{{{\,\mathrm{up}\,}}}_1=+\infty \) and \(f^{{{\,\mathrm{low}\,}}}_1=-\infty \). This would require small changes in the algorithm (in line 15) and in its proof (in Lemma 3). For the sake of clarity, we stick with the simplest version of the algorithm and the most frequent situation, where \(f^{{{\,\mathrm{up}\,}}}_1\) and \(f^{{{\,\mathrm{low}\,}}}_1\) can be estimated easily.

  3. As the oracles are assumed to respond in finite time, the inequality \(\min _{j=1,\ldots ,m} {{\mathtt {a}}(j)} \ge {\bar{k}}\) is guaranteed to be satisfied for k large enough.

  4. By substituting \(f^i_{x}= f^i({x}) - \eta _{x}^{v,i}\) in the inequality \(f^i(\cdot ) \ge f^i_{x}+ \langle g^i_{x}, \cdot -{x} \rangle -\eta _{x}^{s,i}\) and evaluating at x, we get that \(f^i(x)\ge f^i(x) - \eta _x^{v,i} - \eta _x^{s,i}\). This shows that \(\eta ^{v,i}_x+\eta ^{s,i}_x\ge 0\) and, in fact, that \(g_x^i \in \partial _{(\eta ^{v,i}_x+\eta ^{s,i}_x)} f^i(x)\) (written out after these notes).

  5. As in Sect. 3, \({{\mathtt {a}}(i)}\) is the iteration index of the latest information previously provided by oracle i; see Algorithm 2.
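As announced in Note 2, here is a minimal sketch of the synchronous initialization, assuming X is a box \([\ell, u]\) (so the linearization is minimized coordinatewise at the endpoints) and assuming an illustrative oracle signature oracle(i, x) returning \((f^i(x), g^i)\); both the box and the signature are assumptions of the example, not part of Algorithm 2.

```python
# Minimal sketch (under assumptions) of the synchronous initialization in
# Note 2: one round of oracle calls at x1 gives f_up_1 = f(x1); minimizing
# the linearization f(x1) + <g1, x - x1> over an assumed box X = [lo, hi]
# gives f_low_1. The signature oracle(i, x) -> (f^i(x), g^i) is hypothetical.
import numpy as np

def initial_bounds(oracle, m, x1, lo, hi):
    values, grads = zip(*(oracle(i, x1) for i in range(m)))
    f_up = sum(values)                 # f_up_1 = f(x1) = sum_i f^i(x1)
    g1 = np.sum(grads, axis=0)         # aggregate subgradient at x1
    # Minimum of a linear function over a box: each coordinate j picks the
    # endpoint lo_j or hi_j that minimizes g1_j * (x_j - x1_j).
    f_low = f_up + np.sum(np.minimum(g1 * (lo - x1), g1 * (hi - x1)))
    return f_up, f_low
```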
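The computation of Note 4, written out in full (same notation as the note; the inequality below holds for every point y):

```latex
% Substituting f^i_x = f^i(x) - \eta^{v,i}_x into the linearization
% inequality of Note 4, valid for every point y:
\begin{align*}
  f^i(y) \;\ge\; f^i_x + \langle g^i_x,\, y - x \rangle - \eta^{s,i}_x
         \;=\; f^i(x) + \langle g^i_x,\, y - x \rangle
               - \bigl(\eta^{v,i}_x + \eta^{s,i}_x\bigr).
\end{align*}
% Taking y = x gives \eta^{v,i}_x + \eta^{s,i}_x \ge 0; since the display
% holds for all y, it is exactly the definition of
% g^i_x \in \partial_{(\eta^{v,i}_x + \eta^{s,i}_x)} f^i(x).
```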


Acknowledgements

We are grateful to the two referees for their rich feedback on the initial version of our paper. We would like to acknowledge the partial financial support of PGMO (Gaspard Monge Program for Optimization and Operations Research) of the Hadamard Mathematics Foundation, through the project “Advanced nonsmooth optimization methods for stochastic programming”.

Author information


Correspondence to Welington de Oliveira.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Iutzeler, F., Malick, J. & de Oliveira, W. Asynchronous level bundle methods. Math. Program. 184, 319–348 (2020). https://doi.org/10.1007/s10107-019-01414-y

