
Implementation of an oracle-structured bundle method for distributed optimization

  • Research Article
  • Published in Optimization and Engineering

Abstract

We consider the problem of minimizing a function that is a sum of convex agent functions plus a convex common public function that couples them. The agent functions can only be accessed via a subgradient oracle; the public function is assumed to be structured and expressible in a domain-specific language (DSL) for convex optimization. We focus on the case when evaluating the agent oracles can require significant effort, which justifies solution methods that carry out significant computation in each iteration. To solve this problem we integrate multiple known techniques (or adaptations of known techniques) for bundle-type algorithms, obtaining a method that has a number of practical advantages over other methods compatible with our access model, such as proximal subgradient methods. First, it is reliable, and works well across a number of applications. Second, it has very few parameters that need to be tuned, and works well with sensible default values. Third, it typically produces a reasonable approximate solution in just a few tens of iterations. This paper is accompanied by an open-source implementation of the proposed solver, available at https://github.com/cvxgrp/OSBDO.
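
To make the access model concrete, the following is a minimal sketch of the problem form described above: agent functions reachable only through a value-and-subgradient oracle, and a public function written directly in a DSL for convex optimization (CVXPY is used here as an example). The class, data, and names are illustrative assumptions for exposition, not the API of the OSBDO package.

```python
import numpy as np
import cvxpy as cp

# Illustrative sketch only: the class and variable names below are
# assumptions for exposition, not the API of the OSBDO package.
class AgentOracle:
    """Agent function f_i(x) = ||A_i x - b_i||_1, hidden behind an oracle."""

    def __init__(self, A, b):
        self.A, self.b = A, b

    def query(self, x):
        """Return f_i(x) and one subgradient of f_i at x."""
        r = self.A @ x - self.b
        # sign(r) is a subgradient of ||.||_1 at r; A^T maps it to x-space
        return np.abs(r).sum(), self.A.T @ np.sign(r)

np.random.seed(0)
n = 10
oracles = [AgentOracle(np.random.randn(20, n), np.random.randn(20))
           for _ in range(3)]

# Public (structured) coupling function, written directly in the DSL:
x = cp.Variable(n)
g = 0.1 * cp.sum_squares(x)

value, subgrad = oracles[0].query(np.zeros(n))  # one oracle call
```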


References

  • Agrawal A, Verschueren R, Diamond S, Boyd S (2018) A rewriting system for convex optimization problems. J Control Decis 5(1):42–60
  • Atkinson D, Vaidya P (1995) A cutting plane algorithm for convex programming that uses analytic centers. Math Program 69:1–43
  • Bacaud L, Lemaréchal C, Renaud A, Sagastizábal C (2001) Bundle methods in stochastic optimal power management: a disaggregated approach using preconditioners. Comput Optim Appl 20:227–244
  • Belloni A (2005) Lecture notes for IAP 2005 course Introduction to bundle methods. Operations Research Center, MIT, version of February 11
  • Ben Amor H, Desrosiers J, Frangioni A (2009) On the choice of explicit stabilizing terms in column generation. Discret Appl Math 157(6):1167–1184
  • Birgin E, Martínez J, Raydan M (2003) Inexact spectral projected gradient methods on convex sets. IMA J Numer Anal 23(4):539–559
  • Boyd S, Duchi J, Pilanci M, Vandenberghe L (2022) Stanford EE 364b, lecture notes on subgradients. https://web.stanford.edu/class/ee364b/lectures/subgradients_notes.pdf
  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
  • Boyd S, Parikh N, Chu E (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
  • Bradley A (2010) Algorithms for the equilibration of matrices and their application to limited-memory quasi-Newton methods. PhD thesis, Stanford University, CA
  • Bruck R (1975) An iterative solution of a variational inequality for certain monotone operators in Hilbert space. Bull Am Math Soc 81:890–892
  • Burachik R, Martínez-Legaz J, Rezaie M, Théra M (2015) An additive subfamily of enlargements of a maximally monotone operator. Set-Valued Variat Anal 23:643–665
  • Burke J, Qian M (2000) On the superlinear convergence of the variable metric proximal point algorithm using Broyden and BFGS matrix secant updating. Math Program 88:157–181
  • Chen X, Fukushima M (1999) Proximal quasi-Newton methods for nondifferentiable convex optimization. Math Program 85(2):313–334
  • Chen G, Rockafellar R (1997) Convergence rates in forward-backward splitting. SIAM J Optim 7(2):421–444
  • Cheney E, Goldstein A (1959) Newton’s method for convex programming and Tchebycheff approximation. Numer Math 1:253–268
  • Choi Y, Lim Y (2016) Optimization approach for resource allocation on cloud computing for IoT. Int J Distrib Sens Netw 12(3):3479247
  • Combettes P, Pesquet J-C (2011) Proximal splitting methods in signal processing. In: Fixed-point algorithms for inverse problems in science and engineering. Springer, Berlin, pp 185–212
  • Concus P, Golub G, Meurant G (1985) Block preconditioning for the conjugate gradient method. SIAM J Sci Stat Comput 6(1):220–252
  • Correa R, Lemaréchal C (1993) Convergence of some algorithms for convex minimization. Math Program 62:261–275
  • de Oliveira W, Solodov M (2016) A doubly stabilized bundle method for nonsmooth convex optimization. Math Program 156(1):125–159
  • de Oliveira W, Solodov M (2020) Bundle methods for inexact data. In: Numerical nonsmooth optimization. Springer, Berlin, pp 417–459
  • de Oliveira W, Sagastizábal C, Lemaréchal C (2014) Convex proximal bundle methods in depth: a unified analysis for inexact oracles. Math Program 148:241–277
  • de Oliveira W, Eckstein J (2015) A bundle method for exploiting additive structure in difficult optimization problems. Optimization Online
  • Dem’yanov V, Vasil’ev L (1985) Nondifferentiable optimization. Translations series in mathematics and engineering. Springer, New York
  • Diamond S, Boyd S (2016) CVXPY: a Python-embedded modeling language for convex optimization. J Mach Learn Res 17(83):1–5
  • Díaz M (2021) proximal-bundle-method. Julia software package, available at https://github.com/mateodd25/proximal-bundle-method
  • Díaz M, Grimmer B (2023) Optimal convergence rates for the proximal bundle method. SIAM J Optim 33(2):424–454
  • Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12(7):2121–2159
  • Elzinga J, Moore T (1975) A central cutting plane algorithm for the convex programming problem. Math Program 8:134–145
  • Emiel G, Sagastizábal C (2010) Incremental-like bundle methods with application to energy planning. Comput Optim Appl 46(2):305–332
  • Fischer F (2022) An asynchronous proximal bundle method. Optimization Online
  • Frangioni A (2002) Generalized bundle methods. SIAM J Optim 13(1):117–156
  • Frangioni A (2020) Standard bundle methods: untrusted models and duality. In: Numerical nonsmooth optimization. Springer, Berlin, pp 61–116
  • Frangioni A, Gorgone E (2014) Bundle methods for sum-functions with “easy” components: applications to multicommodity network design. Math Program 145:133–161
  • Fuduli A, Gaudioso M, Giallombardo G (2004) Minimizing nonconvex nonsmooth functions via cutting planes and proximity control. SIAM J Optim 14(3):743–756
  • Gonzaga C, Polak E (1979) On constraint dropping schemes and optimality functions for a class of outer approximations algorithms. SIAM J Control Optim 17(4):477–493
  • Grant M, Boyd S, Ye Y (2006) Disciplined convex programming. In: Global optimization. Springer, Berlin, pp 155–210
  • Haarala M, Miettinen K, Mäkelä M (2004) New limited memory bundle method for large-scale nonsmooth optimization. Optim Methods Softw 19(6):673–692
  • Haarala N, Miettinen K, Mäkelä M (2007) Globally convergent limited memory bundle method for large-scale nonsmooth optimization. Math Program 109:181–205
  • Han Z, Liu K (2008) Resource allocation for wireless networks: basics, techniques, and applications. Cambridge University Press, Cambridge
  • Hare W, Sagastizábal C, Solodov M (2016) A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput Optim Appl 63(1):1–28
  • Helmberg C, Rendl F (2000) A spectral bundle method for semidefinite programming. SIAM J Optim 10(3):673–696
  • Helmberg C, Pichler A (2017) Dynamic scaling and submodel selection in bundle methods for convex optimization. https://www.tu-chemnitz.de/mathematik/preprint/2017/PREPRINT_04.pdf
  • Hestenes M, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49(6):409–436
  • Hintermüller M (2001) A proximal bundle method based on approximate subgradients. Comput Optim Appl 20(3):245–266
  • Hiriart-Urruty J-B, Lemaréchal C (1996) Convex analysis and minimization algorithms II: advanced theory and bundle methods. Grundlehren der mathematischen Wissenschaften. Springer, Berlin
  • Hiriart-Urruty J-B, Lemaréchal C (2013) Convex analysis and minimization algorithms I: fundamentals, vol 305. Springer Science & Business Media, Berlin
  • Iutzeler F, Malick J, de Oliveira W (2020) Asynchronous level bundle methods. Math Program 184:319–348
  • Jacobi C (1845) Ueber eine neue Auflösungsart der bei der Methode der kleinsten Quadrate vorkommenden lineären Gleichungen. Astron Nachr 22(20):297–306
  • Kairouz P, McMahan H, Avent B, Bellet A, Bennis M, Bhagoji A, Bonawitz K, Charles Z, Cormode G, Cummings R et al (2021) Advances and open problems in federated learning. Found Trends Mach Learn 14(1–2):1–210
  • Karmitsa N (2016) Proximal bundle method. http://napsu.karmitsa.fi/proxbundle/
  • Karmitsa N (2007) LMBM—FORTRAN subroutines for large-scale nonsmooth minimization: user’s manual. TUCS Tech Rep 77:856
  • Karmitsa N, Mäkelä M (2010) Limited memory bundle method for large bound constrained nonsmooth optimization: convergence analysis. Optim Methods Softw 25(6):895–916
  • Kelley J (1960) The cutting-plane method for solving convex programs. J Soc Ind Appl Math 8(4):703–712
  • Kim K, Petra C, Zavala V (2019) An asynchronous bundle-trust-region method for dual decomposition of stochastic mixed-integer programming. SIAM J Optim 29(1):318–342
  • Kim K, Zhang W, Nakao H, Schanen M (2021) BundleMethod.jl: implementation of bundle methods in Julia
  • Kiwiel K (1983) An aggregate subgradient method for nonsmooth convex minimization. Math Program 27:320–341
  • Kiwiel K (1985) An algorithm for nonsmooth convex minimization with errors. Math Comput 45(171):173–180
  • Kiwiel K (1990) Proximity control in bundle methods for convex nondifferentiable minimization. Math Program 46(1–3):105–122
  • Kiwiel K (1995) Approximations in proximal bundle methods and decomposition of convex programs. J Optim Theory Appl 84(3):529–548
  • Kiwiel K (1996) Restricted step and Levenberg–Marquardt techniques in proximal bundle methods for nonconvex nondifferentiable optimization. SIAM J Optim 6(1):227–249
  • Kiwiel K (1999) A bundle Bregman proximal method for convex nondifferentiable minimization. Math Program 85(2):241–258
  • Kiwiel K (2000) Efficiency of proximal bundle methods. J Optim Theory Appl 104(3):589–603
  • Kiwiel K (2006) A proximal bundle method with approximate subgradient linearizations. SIAM J Optim 16(4):1007–1023
  • Lemaréchal C (1978) Nonsmooth optimization and descent methods. IIASA Research Report 78-4
  • Lemaréchal C (1975) An extension of Davidon methods to nondifferentiable problems. Math Program Study 3:95–109
  • Lemaréchal C (2001) Lagrangian relaxation. In: Computational combinatorial optimization. Springer, Berlin, pp 112–156
  • Lemaréchal C, Sagastizábal C (1994) An approach to variable metric bundle methods. In: System modelling and optimization. Springer, Berlin, pp 144–162
  • Lemaréchal C, Sagastizábal C (1997) Variable metric bundle methods: from conceptual to implementable forms. Math Program 76:393–410
  • Lemaréchal C, Nemirovskii A, Nesterov Y (1995) New variants of bundle methods. Math Program 69(1):111–147
  • Lemaréchal C, Ouorou A, Petrou G (2009) A bundle-type algorithm for routing in telecommunication data networks. Comput Optim Appl 44:385–409
  • Lemaréchal C, Sagastizábal C, Pellegrino F, Renaud A (1996) Bundle methods applied to the unit-commitment problem. In: System modelling and optimization: proceedings of the seventeenth IFIP TC7 conference on system modelling and optimization, 1995. Springer, Berlin, pp 395–402
  • Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag 37(3):50–60
  • Lions P, Mercier B (1979) Splitting algorithms for the sum of two nonlinear operators. SIAM J Numer Anal 16(6):964–979
  • Liu Y, Zhao S, Du X, Li S (2005) Optimization of resource allocation in construction using genetic algorithms. In: 2005 international conference on machine learning and cybernetics, vol 6. IEEE, pp 3428–3432
  • Lukšan L, Vlček J (1998) A bundle-Newton method for nonsmooth unconstrained minimization. Math Program 83:373–391
  • Lukšan L, Vlček J (1999) Globally convergent variable metric method for convex nonsmooth unconstrained minimization. J Optim Theory Appl 102:593–613
  • Lv J, Pang L, Meng F (2018) A proximal bundle method for constrained nonsmooth nonconvex optimization with inexact information. J Global Optim 70(3):517–549
  • Mäkelä M (2003) Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing B 13/2003
  • Mäkelä M, Karmitsa N, Wilppu O (2016) Proximal bundle method for nonsmooth and nonconvex multiobjective optimization. In: Mathematical modeling and optimization of complex structures. Springer, pp 191–204
  • Marsten R, Hogan W, Blankenship J (1975) The boxstep method for large-scale optimization. Oper Res 23(3):389–405
  • Mifflin R (1977) Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim 15(6):959–972
  • Mifflin R (1996) A quasi-second-order proximal bundle algorithm. Math Program 73(1):51–72
  • Nesterov Y (1983) A method for solving the convex programming problem with convergence rate \(\mathcal{O}(1/k^2)\). Proc USSR Acad Sci 269:543–547
  • Nocedal J, Wright S (1999) Numerical optimization. Springer, Berlin
  • Ouorou A, Mahey P, Vial J-Ph (2000) A survey of algorithms for convex multicommodity flow problems. Manage Sci 46(1):126–147
  • Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
  • Passty G (1979) Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J Math Anal Appl 72(2):383–390
  • Rey P, Sagastizábal C (2002) Dynamical adjustment of the prox-parameter in bundle methods. Optimization 51(2):423–447
  • Rockafellar R (1981) The theory of subgradients and its applications to problems of optimization. Heldermann Verlag, Berlin
  • Schechtman S (2022) Stochastic proximal subgradient descent oscillates in the vicinity of its accumulation set. Optim Lett, 1–14
  • Schramm H, Zowe J (1992) A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J Optim 2(1):121–152
  • Shor N (2012) Minimization methods for non-differentiable functions, vol 3. Springer Science & Business Media, Berlin
  • Sinkhorn R (1964) A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann Math Stat 35(2):876–879
  • Sra S, Nowozin S, Wright S (2012) Optimization for machine learning. MIT Press, Cambridge
  • Takapoui R, Javadi H (2016) Preconditioning via diagonal scaling. arXiv preprint arXiv:1610.03871
  • Teo C, Vishwanathan S, Smola A, Le Q (2010) Bundle methods for regularized risk minimization. J Mach Learn Res 11(1)
  • Trisna T, Marimin M, Arkeman Y, Sunarti T (2016) Multi-objective optimization for supply chain management problem: a literature review. Decis Sci Lett 5(2):283–316
  • van Ackooij W, Frangioni A (2018) Incremental bundle methods using upper models. SIAM J Optim 28:379–410
  • van Ackooij W, Frangioni A, de Oliveira W (2016) Inexact stabilized Benders’ decomposition approaches with application to chance-constrained problems with finite support. Comput Optim Appl 65:637–669
  • van Ackooij W, Berge V, de Oliveira W, Sagastizábal C (2017) Probabilistic optimization via approximate \(p\)-efficient points and bundle methods. Comput Oper Res 77:177–193
  • Wei F, Zhang X, Xu J, Bing J, Pan G (2020) Simulation of water resource allocation for sustainable urban development: an integrated optimization approach. J Clean Prod 273:122537
  • Westerlund T, Pettersson F (1995) An extended cutting plane method for solving convex MINLP problems. Comput Chem Eng 19:131–136
  • Yin P, Wang J (2006) Ant colony optimization for the nonlinear resource allocation problem. Appl Math Comput 174(2):1438–1453
  • Zhou B, Bao J, Li J, Lu Y, Liu T, Zhang Q (2021) A novel knowledge graph-based optimization approach for resource allocation in discrete manufacturing workshops. Robot Comput Integr Manuf 71:102160


Acknowledgements

We thank Parth Nobel, Nikhil Devanathan, Garrett van Ryzin, Dominique Perrault-Joncas, Lee Dicker, and Manan Chopra for very helpful discussions about the problem and formulation. The supply chain example was suggested by van Ryzin, Perrault-Joncas, and Dicker. The communication layer for the implementation with structured variables, to be described in a future paper, was designed by Parth Nobel and Manan Chopra. We thank Mateo Díaz for pointing us to some very relevant literature that we had missed in an early version of this paper. We thank three anonymous reviewers who gave extensive and helpful feedback on an early version of this paper. We gratefully acknowledge support from Amazon, Stanford Graduate Fellowship, Office of Naval Research, and the Oliger Memorial Fellowship. This research was partially supported by ACCESS – AI Chip Center for Emerging Smart Systems, sponsored by InnoHK funding, Hong Kong SAR.

Author information

Corresponding author

Correspondence to Tetiana Parshakova.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Convergence proof

In this section we give a proof of convergence of the bundle method for oracle-structured optimization. Our proof uses well-known ideas, and borrows heavily from Belloni (2005). We make one additional (and traditional) assumption: that f and g are Lipschitz continuous on \(\mathop {\textbf{dom}} g\).

We say that the update was accepted in iteration k if \(x^{k+1}={\tilde{x}}^{k+1}\). Suppose this occurs in iterations \(k_1< k_2< \cdots \). We let \(K=\{k_1, k_2, \ldots \}\) denote the set of iterations where the update was accepted. We distinguish two cases: \(|K| = \infty \) and \(|K| < \infty \).
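
For concreteness, the following is a minimal Python sketch of the proximal bundle iteration analyzed in this appendix, with the quantities named as in the text: the model \(\hat{h}^k\), the gap \(\delta ^k\), the prox parameter \(\rho \), and the acceptance parameter \(\eta \). The interface and defaults are illustrative assumptions, not the accompanying OSBDO implementation.

```python
import numpy as np
import cvxpy as cp

def bundle_method(f_oracle, g_num, g_cvx, n, x0,
                  rho=1.0, eta=0.5, iters=50, tol=1e-8):
    """Sketch of the proximal bundle iteration analyzed in this appendix.

    f_oracle(x) returns (f(x), q) with q a subgradient of f at x;
    g_num(x) evaluates g at a point; g_cvx(xvar) builds g as a CVXPY
    expression. Names and defaults are illustrative assumptions.
    """
    xk = np.asarray(x0, dtype=float)
    f_xk, q = f_oracle(xk)
    cuts = [(f_xk, q, xk.copy())]   # bundle of linearizations of f
    h_xk = f_xk + g_num(xk)         # h(x^k) = f(x^k) + g(x^k)
    for _ in range(iters):
        xvar = cp.Variable(n)
        # cutting-plane model f-hat^k(x) = max_t { f(x^t) + (q^t)^T (x - x^t) }
        fhat = cp.max(cp.hstack([fv + qv @ (xvar - xv) for fv, qv, xv in cuts]))
        gexpr = g_cvx(xvar)
        # prox subproblem: minimize h-hat^k(x) + (rho/2) ||x - x^k||_2^2
        cp.Problem(cp.Minimize(
            fhat + gexpr + (rho / 2) * cp.sum_squares(xvar - xk))).solve()
        x_tilde = xvar.value
        # delta^k = h(x^k) - h-hat^k(x-tilde) - (rho/2) ||x-tilde - x^k||_2^2
        delta = h_xk - (fhat.value + gexpr.value
                        + (rho / 2) * np.sum((x_tilde - xk) ** 2))
        if delta <= tol:
            break                   # delta^k small: x^k is near optimal
        f_new, q_new = f_oracle(x_tilde)   # one (possibly expensive) oracle call
        cuts.append((f_new, q_new, x_tilde.copy()))
        h_new = f_new + g_num(x_tilde)
        if h_xk - h_new >= eta * delta:    # acceptance test (step 6)
            xk, h_xk = x_tilde, h_new      # update accepted: move the center
        # otherwise a null step: keep x^k; the new cut tightens the model
    return xk, h_xk
```

Note that the cut obtained at the query point is added to the bundle whether or not the update is accepted; this is what drives \(\delta ^k\) to zero in both cases treated below.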

1.1 Infinite updates

We assume \(|K| = \infty \). First we establish that \(\delta ^{k_s}\rightarrow 0\) as \(s\rightarrow \infty \). Since \(k=k_s\) is an accepted step, from step 6 of the algorithm we have

$$\begin{aligned} \eta \delta ^{k_s} \le h(x^{k_s}) - h(x^{k_{s}+1}) = h(x^{k_s}) - h(x^{k_{s+1}}). \end{aligned}$$

Summing this inequality from \(s=1\) to \(s=l\) and dividing by \(\eta \) gives

$$\begin{aligned} \sum _{s=1}^l\delta ^{k_s} \le \frac{h(x^{k_1}) - h(x^{k_{l+1}})}{\eta } \le \frac{h(x^0) - h^\star }{\eta }, \end{aligned}$$

which implies that \(\delta ^{k_s}\) is summable, and so converges to zero as \(s \rightarrow \infty \).

Since \({\tilde{x}}^{k_s+1}\) minimizes \({\hat{h}}^{k_s}(x)+(\rho /2)\Vert x-x^{k_s}\Vert _2^2\), we have

$$\begin{aligned} \partial \hat{h}^{k_s}\left( {\tilde{x}}^{k_s+1} \right) + \rho ( {\tilde{x}}^{k_s+1}-x^{k_s}) \ni 0. \end{aligned}$$

Using \({\tilde{x}}^{k_s+1} = x^{k_s+1} = x^{k_{s+1}}\), we have

$$\begin{aligned} \rho (x^{k_s}-x^{k_{s+1}}) \in \partial \hat{h}^{k_s}\left( x^{k_{s+1}} \right) . \end{aligned}$$

It follows that

$$\begin{aligned} h^\star = h(x^\star ) \ge {\hat{h}}^{k_s}(x^\star ) \ge {\hat{h}}^{k_s}(x^{k_{s+1}}) + \rho (x^{k_s}-x^{k_{s+1}}) ^T (x^\star - x^{k_{s+1}}). \end{aligned}$$

We first rewrite this as

$$\begin{aligned} \frac{h^\star - {\hat{h}}^{k_s}(x^{k_{s+1}})}{\rho } &\ge (x^{k_s}-x^{k_{s+1}})^T (x^\star - x^{k_s}) + (x^{k_s}-x^{k_{s+1}})^T (x^{k_s}- x^{k_{s+1}})\\ &= (x^{k_s}-x^{k_{s+1}})^T (x^\star - x^{k_s}) + \Vert x^{k_s}-x^{k_{s+1}}\Vert _2^2, \end{aligned}$$

and then in the form we will use below,

$$\begin{aligned} 2(x^{k_s}-x^{k_{s+1}}) ^T (x^\star - x^{k_s}) \le (2/\rho )\left( h^\star - {\hat{h}}^{k_s}(x^{k_{s+1}})\right) - 2\Vert x^{k_s}-x^{k_{s+1}}\Vert _2^2. \end{aligned}$$

Now we use a standard subgradient algorithm argument. We have

$$\begin{aligned} \Vert x^{k_{s+1}} - x^\star \Vert _2^2 &= \Vert x^{k_s} - x^\star \Vert _2^2 + \Vert x^{k_{s+1}} - x^{k_s}\Vert _2^2 + 2 (x^{k_s} - x^{k_{s+1}})^T (x^\star - x^{k_s}) \\ &\le \Vert x^{k_s} - x^\star \Vert _2^2+(2/\rho ) \left( h^\star - \hat{h}^{k_s}(x^{k_{s+1}}) \right) - \Vert x^{k_{s+1}} - x^{k_s}\Vert _2^2 \\ &= \Vert x^{k_s} - x^\star \Vert _2^2+(2/\rho ) \left( h^\star - h(x^{k_s}) +\delta ^{k_s} \right) . \end{aligned}$$

Summing this inequality from \(s=1\) to \(s=l\) and re-arranging yields

$$\begin{aligned} (2/\rho ) \sum _{s=1}^l \left( h(x^{k_s}) - h^\star \right) &\le \Vert x^{k_1} - x^\star \Vert _2^2 - \Vert x^{k_{l+1}} - x^\star \Vert _2^2 + (2/\rho )\sum _{s=1}^l\delta ^{k_s} \\ &\le \Vert x^{k_1} - x^\star \Vert _2^2 + 2(h(x^0) - h^\star )/(\eta \rho ). \end{aligned}$$

It follows that the series \(\sum _s \left( h(x^{k_s}) - h^\star \right) \) of nonnegative terms converges, and therefore \(h(x^{k_s})\rightarrow h^\star \) as \(s \rightarrow \infty \).

1.2 Finite updates

We assume \(|K| < \infty \), with \(p = \max K\) its largest entry. It follows that for any \(k>p\), we have \( h(x^k) - h\left( \tilde{x}^{k+1}\right) <\eta \delta ^k \). Note that \(x^k=x^p\) for all \(k\ge p+1\). Moreover, using

$$\begin{aligned} \Vert \tilde{x}^{k+2} - x^p \Vert _2^2 = \Vert \tilde{x}^{k+2} - \tilde{x}^{k+1} \Vert _2^2 + \left\| \tilde{x}^{k+1} - x^p\right\| _2^2 - 2\left( x^p - \tilde{x}^{k+1}\right) ^T(\tilde{x}^{k+2} -\tilde{x}^{k+1}) \end{aligned}$$

with \(\rho (x^p - \tilde{x}^{k+1}) \in \partial \hat{h}^k\left( \tilde{x}^{k+1} \right) \) and \(\hat{h}^{k+1}(\tilde{x}^{k+2})\ge \hat{h}^k(\tilde{x}^{k+2})\), we get

$$\begin{aligned} \delta ^k - \delta ^{k+1} &\ge \hat{h}^{k+1}(\tilde{x}^{k+2})-\hat{h}^k\left( \tilde{x}^{k+1} \right) - \rho \left( x^p - \tilde{x}^{k+1}\right) ^T(\tilde{x}^{k+2} -\tilde{x}^{k+1}) + (\rho /2)\left\| \tilde{x}^{k+2} -\tilde{x}^{k+1}\right\| _2^2\\ &\ge (\rho /2)\left\| \tilde{x}^{k+2} -\tilde{x}^{k+1}\right\| _2^2. \end{aligned}$$

Therefore, \(\delta ^k \ge \delta ^{k+1}+ (\rho /2)\left\| \tilde{x}^{k+2} - \tilde{x}^{k+1}\right\| _2^2\) for all \(k \ge p+1\). Then from

$$\begin{aligned} \hat{h}^k(x^p) &\ge \hat{h}^k\left( \tilde{x}^{k+1}\right) + \rho (x^p - \tilde{x}^{k+1})^T(x^p - \tilde{x}^{k+1})\\ &= h(x^k)-\delta ^k+(\rho /2)\left\| x^p - \tilde{x}^{k+1}\right\| _2^2, \end{aligned}$$

it follows that \(\left\| x^p - \tilde{x}^{k+1}\right\| _2^2\le 2\delta ^k/\rho \le 2\delta ^p/\rho \).

Now we use the assumption that f and g are Lipschitz continuous on \(\mathop {\textbf{dom}} g\), with Lipschitz constant L. Every \(q\in \partial \hat{f}^k(x)\) has the form \(q= \sum _{t \le k} \theta _t q^t\), with \(\theta _t \ge 0\) and \(\sum _t \theta _t=1\), a convex combination of subgradients \(q^t \in \partial f(x^t)\) associated with the cuts active at x. Since \(\Vert q^t\Vert _2 \le L\) for each t, we have \(\Vert q\Vert _2 \le L\), so \(\hat{f}^k\) is L-Lipschitz and \(\hat{h}^k(x)=\hat{f}^k (x)+g(x)\) is 2L-Lipschitz continuous.

Combining this with

$$\begin{aligned} \delta ^k \le h(x^k) - \hat{h}^k\left( \tilde{x}^{k+1}\right) , \qquad -\eta \delta ^k \le {h}\left( \tilde{x}^{k+1}\right) - h(x^k), \end{aligned}$$

we have, using \(\hat{h}^k(\tilde{x}^k)\ge h(\tilde{x}^k)\) (the cut generated at \(\tilde{x}^k\) belongs to the bundle at iteration k),

$$\begin{aligned} (1-\eta )\delta ^k \le {h}\left( \tilde{x}^{k+1}\right) - h\left( \tilde{x}^k\right) + \hat{h}^k\left( \tilde{x}^k\right) - \hat{h}^k\left( \tilde{x}^{k+1}\right) \le 4L \left\| \tilde{x}^k - \tilde{x}^{k+1} \right\| _2. \end{aligned}$$

Therefore, from

$$\begin{aligned} \frac{(1-\eta )^2\rho }{32L^2} \sum _{k\ge p} \left( \delta ^k\right) ^2 \le \sum _{k\ge p} \left( \delta ^k-\delta ^{k+1}\right) \le \delta ^{p}, \end{aligned}$$

we can establish that \(\delta ^{k}\) converges to zero as \(k \rightarrow \infty \). This implies

$$\begin{aligned} \underset{k\rightarrow \infty }{\lim }\left( \hat{h}^k\left( \tilde{x}^{k+1}\right) +(\rho /2) \left\| \tilde{x}^{k+1} - x^p\right\| _2^2 \right) = h(x^p). \end{aligned}$$

Also from \(\underset{k\rightarrow \infty }{\lim }\ \left( h(x^p) - h\left( \tilde{x}^{k+1}\right) \right) = 0\) and \(\left\| {x}^{p} - \tilde{x}^{k+1}\right\| _2^2 \le \frac{2\delta ^k}{\rho }\), it follows that

$$\begin{aligned} \underset{k\rightarrow \infty }{\lim }\ \hat{h}^k\left( \tilde{x}^{k+1} \right) = h(x^p), \qquad \underset{k\rightarrow \infty }{\lim }\ \left\| {x}^{p} - \tilde{x}^{k+1}\right\| _2^2 = 0. \end{aligned}$$

Hence, we get \(0 \in \partial h(x^p)\), which implies \(h(x^p)=h^\star \).
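
Tying the sketches together, a hypothetical driver (reusing the illustrative `oracles`, `n`, and `bundle_method` defined above) aggregates the agent oracles into a single oracle for f and runs the iteration; its `delta <= tol` exit corresponds to the vanishing of \(\delta ^k\) established by this proof.

```python
import numpy as np
import cvxpy as cp

# Hypothetical driver reusing `oracles`, `n`, and `bundle_method` from the
# sketches above (illustrative names, not OSBDO).
def f_oracle(x):
    results = [oracle.query(x) for oracle in oracles]
    value = sum(v for v, _ in results)
    subgrad = sum(q for _, q in results)  # a subgradient of the sum
    return value, subgrad

g_num = lambda z: 0.1 * float(z @ z)             # g evaluated at a point
g_cvx = lambda xvar: 0.1 * cp.sum_squares(xvar)  # g as a DSL expression

x_best, h_best = bundle_method(f_oracle, g_num, g_cvx, n, x0=np.zeros(n))
```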

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Parshakova, T., Zhang, F. & Boyd, S. Implementation of an oracle-structured bundle method for distributed optimization. Optim Eng (2023). https://doi.org/10.1007/s11081-023-09859-z

