A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems

Abstract

We contribute improvements to a Lagrangian dual solution approach applied to large-scale optimization problems whose objective functions are convex, continuously differentiable and possibly nonlinear, while the non-relaxed constraint set is compact but not necessarily convex. Such problems arise, for example, in the split-variable deterministic reformulation of stochastic mixed-integer optimization problems. We adapt the augmented Lagrangian method framework to address the presence of nonconvexity in the non-relaxed constraint set and to enable efficient parallelization. The development of our approach is most naturally compared with the development of proximal bundle methods and especially with their use of serious step conditions. However, deviations from these developments allow for an improvement in efficiency with which parallelization can be utilized. Pivotal in our modification to the augmented Lagrangian method is an integration of the simplicial decomposition method and the nonlinear block Gauss–Seidel method. An adaptation of a serious step condition associated with proximal bundle methods allows for the approximation tolerance to be automatically adjusted. Under mild conditions optimal dual convergence is proven, and we report computational results on test instances from the stochastic optimization literature. We demonstrate improvement in parallel speedup over a baseline parallel approach.

This is a preview of subscription content, access via your institution.

Fig. 1

Notes

  1. 1.

    Implementable is used in the sense of ability to implement the required functionality of an algorithm, not the meaning of implementable specific to stochastic optimization.

References

  1. 1.

    Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, Berlin (2011)

    Google Scholar 

  2. 2.

    Shor, N.: Minimization Methods for Non-differentiable Functions. Springer, New York (1985)

    Google Scholar 

  3. 3.

    Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

    Google Scholar 

  4. 4.

    Ruszczyński, A.: Nonlinear Optimization. Princeton University Press, Princeton (2006)

    Google Scholar 

  5. 5.

    Carøe, C.C., Schultz, R.: Dual decomposition in stochastic integer programming. Oper. Res. Lett. 24(1), 37–45 (1999)

    MathSciNet  MATH  Article  Google Scholar 

  6. 6.

    Bertsekas, D.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (1982)

    Google Scholar 

  7. 7.

    Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)

    MathSciNet  MATH  Article  Google Scholar 

  8. 8.

    Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press, New York (1969)

    Google Scholar 

  9. 9.

    Rockafellar, R.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

    MathSciNet  MATH  Article  Google Scholar 

  10. 10.

    Lemaréchal, C.: An Extension of Davidon Methods to Non differentiable Problems, pp. 95–109. Springer, Berlin Heidelberg (1975)

  11. 11.

    de Oliveira, W., Sagastizábal, C.: Bundle methods in the XXIst century: a bird’s-eye view. Pesqui. Oper. 34, 647–670 (2014)

    Article  Google Scholar 

  12. 12.

    Hare, W., Sagastizábal, C., Solodov, M.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl. 63, 1–28 (2016)

    MathSciNet  MATH  Article  Google Scholar 

  13. 13.

    de Oliveira, W., Sagastizábal, C., Lemaréchal, C.: Convex proximal bundle methods in depth: a unified analysis for inexact oracles. Math. Program. 148(1), 241–277 (2014)

    MathSciNet  MATH  Article  Google Scholar 

  14. 14.

    Amor, H.B., Desrosiers, J., Frangioni, A.: On the choice of explicit stabilizing terms in column generation. Discrete Appl. Math. 157(6), 1167–1184 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  15. 15.

    Bertsekas, D.: Incremental aggregated proximal and augmented Lagrangian algorithms. arXiv preprint arXiv:1509.09257 (2015)

  16. 16.

    Fischer, F., Helmberg, C.: A parallel bundle framework for asynchronous subspace optimization of nonsmooth convex functions. SIAM J. Optim. 24(2), 795–822 (2014)

    MathSciNet  MATH  Article  Google Scholar 

  17. 17.

    Lubin, M., Martin, K., Petra, C., Sandıkçı, B.: On parallelizing dual decomposition in stochastic integer programming. Oper. Res. Lett. 41(3), 252–258 (2013)

    MathSciNet  MATH  Article  Google Scholar 

  18. 18.

    Bertsekas, D.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)

    Google Scholar 

  19. 19.

    Holloway, C.: An extension of the Frank and Wolfe method of feasible directions. Math. Program. 6(1), 14–27 (1974)

    MathSciNet  MATH  Article  Google Scholar 

  20. 20.

    Von Hohenbalken, B.: Simplicial decomposition in nonlinear programming algorithms. Math. Program. 13(1), 49–68 (1977)

    MathSciNet  MATH  Article  Google Scholar 

  21. 21.

    Bonettini, S.: Inexact block coordinate descent methods with application to non-negative matrix factorization. IMA J. Numer. Anal. 31(4), 1431–1452 (2011)

    MathSciNet  MATH  Article  Google Scholar 

  22. 22.

    Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)

    MathSciNet  MATH  Article  Google Scholar 

  23. 23.

    Hildreth, C.: A quadratic programming procedure. Nav. Res. Logist. Q. 4, 79–85 (1957). 361

    MathSciNet  Article  Google Scholar 

  24. 24.

    Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)

    MathSciNet  MATH  Article  Google Scholar 

  25. 25.

    Warga, J.: Minimizing certain convex functions. SIAM J. Appl. Math. 11, 588–593 (1963)

    MathSciNet  MATH  Article  Google Scholar 

  26. 26.

    Eckstein, J.: A practical general approximation criterion for methods of multipliers based on Bregman distances. Math. Program. 96(1), 61–86 (2003)

    MathSciNet  MATH  Article  Google Scholar 

  27. 27.

    Eckstein, J., Silva, P.: A practical relative error criterion for augmented lagrangians. Math. Program. 141(1), 319–348 (2013)

    MathSciNet  MATH  Article  Google Scholar 

  28. 28.

    Hamdi, A., Mahey, P., Dussault, J.P.: Recent Advances in Optimization: Proceedings of the 8th French–German Conference on Optimization Trier, July 21–26, 1996, chap. A New Decomposition Method in Nonconvex Programming via a Separable Augmented Lagrangian, pp. 90–104. Springer Berlin Heidelberg, Berlin, Heidelberg (1997)

  29. 29.

    Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

    MATH  Article  Google Scholar 

  30. 30.

    Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)

    MATH  Article  Google Scholar 

  31. 31.

    Glowinski, R., Marrocco, A.: Sur l’approximation, par elements finis d’ordre un, et la resolution, par penalisation-dualité, d’une classe de problems de dirichlet non lineares. Revue Française d’Automatique, Informatique, et Recherche Opérationelle 9, 41–76 (1975)

    MATH  Article  Google Scholar 

  32. 32.

    Chatzipanagiotis, N., Dentcheva, D., Zavlanos, M.: An augmented Lagrangian method for distributed optimization. Math. Program. 152(1), 405–434 (2014)

    MathSciNet  MATH  Google Scholar 

  33. 33.

    Tappenden, R., Richtárik, P., Büke, B.: Separable approximations and decomposition methods for the augmented Lagrangian. Optim. Methods Softw. 30(3), 643–668 (2015)

    MathSciNet  MATH  Article  Google Scholar 

  34. 34.

    Mulvey, J., Ruszczyński, A.: A diagonal quadratic approximation method for large scale linear programs. Oper. Res. Lett. 12(4), 205–215 (1992)

    MathSciNet  MATH  Article  Google Scholar 

  35. 35.

    Ruszczyński, A.: On convergence of an augmented Lagrangian decomposition method for sparse convex optimization. Math. Oper. Res. 20(3), 634–656 (1995)

    MathSciNet  MATH  Article  Google Scholar 

  36. 36.

    Kiwiel, K., Rosa, C., Ruszczyński, A.: Proximal decomposition via alternating linearization. SIAM J. Optim. 9(3), 668–689 (1999)

    MathSciNet  MATH  Article  Google Scholar 

  37. 37.

    Lin, X., Pham, M., Ruszczyński, A.: Alternating linearization for structured regularization problems. J. Mach. Learn. Res. 15, 3447–3481 (2014)

    MathSciNet  MATH  Google Scholar 

  38. 38.

    Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81–101 (1994)

    MathSciNet  MATH  Article  Google Scholar 

  39. 39.

    He, B., Liao, L.Z., Han, D., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92, 103–118 (2002)

    MathSciNet  MATH  Article  Google Scholar 

  40. 40.

    Boland, N., Christiansen, J., Dandurand, B., Eberhard, A., Linderoth, J., Luedtke, J., Oliveira, F.: Progressive hedging with a Frank–Wolfe based method for computing stochastic mixed-integer programming Lagrangian dual bounds. Optimization Online (2016). http://www.optimization-online.org/DB_HTML/2016/03/5391.html

  41. 41.

    Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

    MathSciNet  MATH  Article  Google Scholar 

  42. 42.

    Eckstein, J., Yao, W.: Understanding the Convergence of the Alternating Direction Method of Multipliers: Theoretical and Computational Perspectives. Rutgers University, New Brunswick (2014). Tech. rep

    Google Scholar 

  43. 43.

    Mahey, P., Oualibouch, S., Tao, P.D.: Proximal decomposition on the graph of a maximal monotone operator. SIAM J. Optim. 5(2), 454–466 (1995)

    MathSciNet  MATH  Article  Google Scholar 

  44. 44.

    Feizollahi, M.J., Costley, M., Ahmed, S., Grijalva, S.: Large-scale decentralized unit commitment. Int. J. Electr. Power Energy Syst. 73, 97–106 (2015)

    Article  Google Scholar 

  45. 45.

    Gade, D., Hackebeil, G., Ryan, S.M., Watson, J.P., Wets, R.J.B., Woodruff, D.L.: Obtaining lower bounds from the progressive hedging algorithm for stochastic mixed-integer programs. Math. Program. 157(1), 47–67 (2016)

    MathSciNet  MATH  Article  Google Scholar 

  46. 46.

    Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

    MathSciNet  MATH  Article  Google Scholar 

  47. 47.

    Rockafellar, R.: Convex Analysis. Princeton University Press, Princeton (1970)

    Google Scholar 

  48. 48.

    Hathaway, R.J., Bezdek, J.C.: Grouped coordinate minimization using Newton’s method for inexact minimization in one vector coordinate. J. Optim. Theory Appl. 71(3), 503–516 (1991)

    MathSciNet  MATH  Article  Google Scholar 

  49. 49.

    Kiwiel, K.C.: Approximations in proximal bundle methods and decomposition of convex programs. J. Optim. Theory Appl. 84(3), 529–548 (1995)

    MathSciNet  MATH  Article  Google Scholar 

  50. 50.

    Bodur, M., Dash, S., Günlük, O., Luedtke, J.: Strengthened Benders cuts for stochastic integer programs with continuous recourse (2014). http://www.optimization-online.org/DB_FILE/2014/03/4263.pdf. Last Accessed 13 Jan (2015)

  51. 51.

    Ahmed, S., Garcia, R., Kong, N., Ntaimo, L., Parija, G., Qiu, F., Sen, S.: SIPLIB: A stochastic integer programming test problem library (2015). http://www.isye.gatech.edu/sahmed/siplib

  52. 52.

    Ntaimo, L.: Decomposition algorithms for stochastic combinatorial optimization: Computational experiments and extensions. Ph.D. thesis (2004)

  53. 53.

    The MathWorks, Natick: MATLAB 2012b (2014)

  54. 54.

    IBM Corporation: IBM ILOG CPLEX Optimization Studio CPLEX Users Manual. http://www.ibm.com/support/knowledgecenter/en/SSSA5P_12.6.1/ilog.odms.studio.help/pdf/usrcplex.pdf. Last Accessed 22 Aug (2016)

  55. 55.

    IBM Corporation: IBM ILOG CPLEX V12.5. http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/. Last Accessed 28 Jan (2016)

  56. 56.

    COmputational INfrastructure for Operations Research. http://www.coin-or.org/. Last Accessed 28 Jan (2016)

  57. 57.

    National Computing Infrastructure (NCI): NCI Website. http://www.nci.org.au. Last Accessed 19 Nov 2016

  58. 58.

    Gertz, E., Wright, S.: Object-oriented software for quadratic programming. ACM Trans. Math. Softw. 29(1), 58–81 (2003)

    MathSciNet  MATH  Article  Google Scholar 

  59. 59.

    Lubin, M., Petra, C., Anitescu, M., Zavala, V.: Scalable stochastic optimization of complex energy systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 64:164:10. ACM, Seattle, WA (2011)

  60. 60.

    Clarke, F.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics (1990)

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Andrew Eberhard.

Additional information

This work was supported by the Australian Research Council (ARC) Grant ARC DP140100985.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 87 KB)

A Technical lemmas for establishing optimal convergence of SDM-GS

A Technical lemmas for establishing optimal convergence of SDM-GS

Given initial \((x^0,z^0) \in X \times Z \subset \mathbb {R}^n \times \mathbb {R}^q\), we consider the generation of the sequence \(\left\{ (x^k,z^k) \right\} \) with iterations computed using Algorithm 4, whose target problem is given by

$$\begin{aligned} \min _{x,z} \left\{ F(x,z) : x \in X, z \in Z \right\} , \end{aligned}$$
(32)

where \((x,z) \mapsto F(x,z)\) is convex and continuously differentiable over \(X \times Z\), and sets X and Z are closed and convex, with X bounded and \(z \mapsto F(x,z)\) is inf-compact for each \(x \in X\).

We define the directional derivative with respect to x as

$$\begin{aligned} F_x'(x,z;d) := \lim _{\alpha \downarrow 0} \frac{F(x+\alpha d,z) - F(x,z)}{\alpha }. \end{aligned}$$

Of interest is the satisfaction of the following local stationarity condition at \(x \in X\):

$$\begin{aligned}&F_x'({x},{z};d) \ge 0 \quad \text {for all}\; d \in X - \left\{ {x} \right\} \end{aligned}$$
(33)

for any limit point \((x,z)=(\bar{x},\bar{z})\) of some sequence \(\left\{ (x^k,z^k) \right\} \) of feasible solutions to problem (32). For the sake of nontriviality, we shall assume that the x-stationarity condition (33) never holds at \((x,z)=(x^k,z^k)\) for any \(k \ge 0\). Thus, for each \(x^k\), \(k \ge 0\), there always exists a \(d^k \in X-\left\{ x^k \right\} \) for which \(F_x'(x^k,z^k;d^k) < 0\).

Direction Assumptions (DAs) For each iteration \(k \ge 0\), given \(x^k \in X\) and \(z^k \in Z\), we have \(d^k\) chosen so that (1) \(x^k + d^k \in X\); and (2) \(F_x'(x^k,z^k;d^k) < 0\).

Gradient Related Assumption (GRA) Given a sequence \(\left\{ (x^k,z^k) \right\} \) with \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{z})\), and a bounded sequence \(\left\{ d^k \right\} \) of directions, then the existence of a direction \(\overline{d} \in X - \left\{ \overline{x} \right\} \) such that \(F_{x}'(\overline{x},\overline{z};\overline{d}) < 0\) implies that

$$\begin{aligned} \limsup _{k \rightarrow \infty } F_{x}'\left( x^k,z^k;d^k\right) < 0. \end{aligned}$$
(34)

In this case, we say that \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \). This gradient related condition is similar to the one defined in [3]. The sequence of directions \({d^k}\) is typically gradient related to \(\left\{ x^k \right\} \) by construction. (See Lemma 7.)

To state the last assumption, we require the notion of an Armijo rule step length \(\alpha ^k \in (0,1]\) given \((x^k,z^k,d^k)\) and parameters \(\beta ,\sigma \in (0,1)\).

figured

Remark 6

Under mild assumptions on F such as continuity that guarantee the existence of finite \(F_x'(x,z;d)\) for all \((x,z,d) \in \left\{ (x,z,d) : x \in X, d \in X-\left\{ x \right\} , z \in Z \right\} \), we may assume that the while loop of Lines 35 terminates after a finite number of iterations. Thus, we have \(\alpha ^k \in (0,1]\) for each \(k \ge 1\).

The last significant assumption is stated as follows.

Sufficient Decrease Assumption (SDA) For sequences \(\left\{ (x^k,z^k,d^k) \right\} \) and step lengths \(\left\{ \alpha ^k \right\} \) computed according to Algorithm 4, we assume for each \(k \ge 0\), that \((x^{k+1},z^{k+1})\) satisfies

$$\begin{aligned} F\left( x^{k+1},z^{k+1}\right) \le F\left( x^k + \alpha ^k d^k,z^k\right) . \end{aligned}$$

Lemma 6

For problem (32), let \(F : \mathbb {R}^{n_x} \times \mathbb {R}^{n_z} \mapsto \mathbb {R}\) be convex and continuously differentiable, \(X \subset \mathbb {R}^{n_x}\) convex and compact, and \(Z \subseteq \mathbb {R}^{n_z}\) closed and convex. Furthermore, assume for each \(x \in X\) that \(z \mapsto F(x,z)\) is inf-compact. If a sequence \(\left\{ (x^k,z^k,d^k) \right\} \) satisfies the DA, the GRA, and the SDA for some fixed \(\beta ,\sigma \in (0,1)\), then the sequence \({(x^k,z^k)}\) has limit points \((\overline{x},\overline{z})\), each of which satisfies the stationarity condition (33).

Proof

The existence of limit points \((\overline{x},\overline{z})\) follows from the compactness of X, the inf-compactness of \(z \mapsto F(x,z)\) for each \(x \in X\), and the SDA. In generating \(\left\{ \alpha ^k \right\} \) according to the Armijo rule as implemented in Lines 25 of Algorithm 4, we have

$$\begin{aligned} \frac{F\left( x^k + \alpha ^k d^k,z^k\right) - F\left( x^k,z^k\right) }{\alpha ^k} \le \sigma F_x'\left( x^k,z^k ; d^k\right) . \end{aligned}$$
(35)

By the DA, \(F_x'(x^k,z^k ; d^k) < 0\) and since \(\alpha ^k > 0\) for each \(k \ge 1\) by Remark 6, we infer from (35) that \(F(x^k + \alpha ^k d^k,z^k) < F(x^k,z^k)\). By construction, we have \(F(x^{k+1},z^{k+1}) \le F(x^k + \alpha ^k d^k,z^k) < F(x^k,z^k).\). By the monotonicity \(F(x^{k+1},z^{k+1}) < F(x^k,z^k)\) and F being bounded from below on \(X \times Z\), we have \(\lim _{k \rightarrow \infty } F(x^k,z^k) = \bar{F} > -\infty \). Therefore,

$$\begin{aligned} \lim _{k \rightarrow \infty } F\left( x^{k+1},z^{k+1}\right) - F\left( x^k,z^k\right) = 0, \end{aligned}$$

which implies

$$\begin{aligned} \lim _{k \rightarrow \infty } F\left( x^k + \alpha ^k d^k,z^k\right) - F\left( x^k,z^k\right) = 0. \end{aligned}$$
(36)

We assume for sake of contradiction that \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{y})\) does not satisfy the stationarity condition (33). By GRA, we have that \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \); that is,

$$\begin{aligned} \limsup _{k \rightarrow \infty } F_x'\left( x^k,z^k ; d^k\right) < 0. \end{aligned}$$
(37)

Thus, it follows from (35)–(37) that \(\lim _{k \rightarrow \infty } \alpha ^k = 0\).

Consequently, after a certain iteration \(k \ge \bar{k}\), we can define \(\left\{ \bar{\alpha }^k \right\} \), \(\bar{\alpha }^k = \alpha ^k / \beta \), where \(\bar{\alpha }^k \le 1\) for \(k \ge \bar{k}\), and so we have

$$\begin{aligned} \sigma F_x'\left( x^k,z^k ; d^k\right) < \frac{F\left( x^k + \bar{\alpha }^k d^k,z^k\right) - F\left( x^k,z^k\right) }{\bar{\alpha }^k}. \end{aligned}$$
(38)

Since F is continuously differentiable, the mean value theorem may be applied to the right-hand side of (38) to get

$$\begin{aligned} \sigma F_x'\left( x^k,z^k ; d^k\right) < F_x'\left( x^k +\widetilde{\alpha }^k d^k, z^k ; d^k\right) , \end{aligned}$$
(39)

for some \(\widetilde{\alpha }^k \in [0,\overline{\alpha }^k]\).

Again, using the assumption \(\limsup _{k \rightarrow \infty } F_x'(x^k,z^k ; d^k) < 0\), and also the compactness of \(X - X\), we take a limit point \(\overline{d}\) of \(\left\{ d^k \right\} \), with its associated subsequence index set denoted by \(\mathcal {K}\), such that \(F_x'(\overline{x},\overline{z},\overline{d}) < 0\). Taking the limits over the subsequence indexed by \(\mathcal {K}\), we have \(\lim _{k \rightarrow \infty , k \in \mathcal {K}} F_x'(x^k,z^k ; d^k) = F_x'(\overline{x},\overline{z} ; \overline{d})\) and \(\lim _{k \rightarrow \infty , k \in \mathcal {K}} F_x'(x^k +\widetilde{\alpha }^k d^k, z^k ; d^k) = F_x'(\overline{x},\overline{z} ; \overline{d})\). These two limits holds since (1) \((x,z) \mapsto F_x'(x,z;d)\) for each \(d \in X - X\) is continuous and (2) \(d \mapsto F_x'(x,z;d)\) is locally Lipschitz continuous for each \((x,z) \in X \times Z\) (e.g., Proposition 2.1.1 of [60]); these two facts together imply that \((x,z;d) \mapsto F_x'(x,z;d)\) is continuous. Then, inequality (39) becomes in the limit as \(k \rightarrow \infty \), \(k \in \mathcal {K}\),

$$\begin{aligned} \sigma F_x'(\overline{x},\overline{z} ; \overline{d}) \le F_x'(\overline{x},\overline{z} ; \overline{d}) \quad \Longrightarrow \quad 0 \le (1-\sigma ) F_x'(\overline{x},\overline{z} ; \overline{d}). \end{aligned}$$

Since \((1-\sigma ) > 0\) and \(F_x'(\overline{x},\overline{z} ; \overline{d})<0\), we have a contradiction. Thus, \(\overline{x}\) must satisfy the stationary condition (33). \(\square \)

Remark 7

Noting that \(F_x'(x^k,z^k;d^k) = \nabla _x F(x^k,z^k) ^\top d^k\) under the assumption of continuous differentiability of F, one means of constructing \(\left\{ d^k \right\} \) is as follows:

$$\begin{aligned} d^k \leftarrow {{\mathrm{argmin}}}_d \left\{ \nabla _x F(x^k,z^k)^\top d : d \in X-\left\{ x^k \right\} \right\} . \end{aligned}$$
(40)

Lemma 7

Given sequence \(\left\{ (x^k,z^k) \right\} \) with \(\lim _{k \rightarrow \infty } (x^k,z^k) = (\overline{x},\overline{z})\), let each \(d^k\), \(k \ge 1\), be generated as in (40). Then \(\left\{ d^k \right\} \) is gradient related to \(\left\{ x^k \right\} \).

Proof

By the construction of \(d^k\), \(k \ge 1\), we have

$$\begin{aligned} F_x'(x^k,z^k ; d^k) \le F_x'(x^k,z^k ; d) \quad \forall \; d \in X-\left\{ x^k \right\} . \end{aligned}$$

Taking the limit, we have

$$\begin{aligned} \limsup _{k \rightarrow \infty } F_x'(x^k, z^k ; d^k) \le \limsup _{k \rightarrow \infty } F_x'(x^k, z^k ; d) \le F_x'(\overline{x}, \overline{z}; d) \quad \forall \; d \in X-\left\{ \overline{x} \right\} , \end{aligned}$$

where the last inequality follows from the upper semicontinuity of the function \((x,z,d) \mapsto F_x'(x,z;d)\), which holds in our setting due, primarily, to Proposition 2.1.1 (b) of  [60] given that F is assumed to be convex and continuous on \(\mathbb {R}^n\). Taking

$$\begin{aligned} \overline{d} \in {{\mathrm{argmin}}}_d \left\{ F_x'(\overline{x},\overline{z} ; d) : d \in X - \left\{ \overline{x} \right\} \right\} , \end{aligned}$$

we have by the assumed nonstationarity that \(F_x'(\overline{x},\overline{z};\overline{d}) < 0\). Thus, \(\limsup _{k \rightarrow \infty } F_x'(x^k, z^k ; d^k) < 0\), and so GRA holds. \(\square \)

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Boland, N., Christiansen, J., Dandurand, B. et al. A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems. Math. Program. 175, 503–536 (2019). https://doi.org/10.1007/s10107-018-1253-9

Download citation

Keywords

  • Augmented Lagrangian method
  • Proximal bundle method
  • Nonlinear block Gauss–Seidel method
  • Simplicial decomposition method
  • Parallel computing

Mathematics Subject Classification

  • 90-08
  • 90C06
  • 90C11
  • 90C15
  • 90C25
  • 90C26
  • 90C30
  • 90C46