
A Two-Stage Active-Set Algorithm for Bound-Constrained Optimization

Journal of Optimization Theory and Applications

Abstract

In this paper, we describe a two-stage method for solving optimization problems with bound constraints. It combines the active-set estimate described in Facchinei and Lucidi (J Optim Theory Appl 85(2):265–289, 1995) with a modification of the non-monotone line search framework recently proposed in De Santis et al. (Comput Optim Appl 53(2):395–423, 2012). In the first stage, the algorithm exploits a property of the active-set estimate that ensures a significant reduction in the objective function when setting to the bounds all those variables estimated active. In the second stage, a truncated-Newton strategy is used in the subspace of the variables estimated non-active. In order to properly combine the two phases, a proximity check is included in the scheme. This new tool, together with the other theoretical features of the two stages, enables us to prove global convergence. Furthermore, under additional standard assumptions, we can show that the algorithm converges at a superlinear rate. Promising experimental results demonstrate the effectiveness of the proposed method.


References

  1. Conn, A.R., Gould, N.I., Toint, P.L.: Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM J. Numer. Anal. 25(2), 433–460 (1988)

  2. Lin, C.J., Moré, J.J.: Newton’s method for large bound-constrained optimization problems. SIAM J. Optim. 9(4), 1100–1127 (1999)

  3. Dennis, J., Heinkenschloss, M., Vicente, L.N.: Trust-region interior-point SQP algorithms for a class of nonlinear programming problems. SIAM J. Control Optim. 36(5), 1750–1794 (1998)

  4. Heinkenschloss, M., Ulbrich, M., Ulbrich, S.: Superlinear and quadratic convergence of affine-scaling interior-point Newton methods for problems with simple bounds without strict complementarity assumption. Math. Program. 86(3), 615–635 (1999)

  5. Kanzow, C., Klug, A.: On affine-scaling interior-point Newton methods for nonlinear minimization with bound constraints. Comput. Optim. Appl. 35(2), 177–197 (2006)

  6. Bertsekas, D.P.: Projected Newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20(2), 221–246 (1982)

  7. Facchinei, F., Lucidi, S., Palagi, L.: A truncated Newton algorithm for large scale box constrained optimization. SIAM J. Optim. 12(4), 1100–1125 (2002)

  8. Hager, W.W., Zhang, H.: A new active set algorithm for box constrained optimization. SIAM J. Optim. 17(2), 526–557 (2006)

  9. Schwartz, A., Polak, E.: Family of projected descent methods for optimization problems with simple bounds. J. Optim. Theory Appl. 92(1), 1–31 (1997)

  10. Facchinei, F., Júdice, J., Soares, J.: An active set Newton algorithm for large-scale nonlinear programs with box constraints. SIAM J. Optim. 8(1), 158–186 (1998)

  11. Cheng, W., Li, D.: An active set modified Polak–Ribière–Polyak method for large-scale nonlinear bound constrained optimization. J. Optim. Theory Appl. 155(3), 1084–1094 (2012)

  12. Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: Second-order negative-curvature methods for box-constrained and general constrained optimization. Comput. Optim. Appl. 45(2), 209–236 (2010)

  13. Birgin, E.G., Martínez, J.M.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Comput. Optim. Appl. 23(1), 101–125 (2002)

  14. De Santis, M., Di Pillo, G., Lucidi, S.: An active set feasible method for large-scale minimization problems with bound constraints. Comput. Optim. Appl. 53(2), 395–423 (2012)

  15. Facchinei, F., Lucidi, S.: Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems. J. Optim. Theory Appl. 85(2), 265–289 (1995)

  16. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

  17. De Santis, M., Lucidi, S., Rinaldi, F.: A fast active set block coordinate descent algorithm for \(\ell _1\)-regularized least squares. SIAM J. Optim. 26(1), 781–809 (2016)

  18. Buchheim, C., De Santis, M., Lucidi, S., Rinaldi, F., Trieu, L.: A feasible active set method with reoptimization for convex quadratic mixed-integer programming. SIAM J. Optim. 26(3), 1695–1714 (2016)

  19. Di Pillo, G., Grippo, L.: A class of continuously differentiable exact penalty function algorithms for nonlinear programming problems. In: System Modelling and Optimization, pp. 246–256. Springer, Berlin (1984)

  20. Grippo, L., Lucidi, S.: A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization 22(4), 557–578 (1991)

  21. Zhang, H., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)

  22. Dembo, R.S., Steihaug, T.: Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program. 26(2), 190–212 (1983)

  23. Grippo, L., Lampariello, F., Lucidi, S.: A class of nonmonotone stabilization methods in unconstrained optimization. Numer. Math. 59(1), 779–805 (1991)

  24. Gould, N.I., Orban, D., Toint, P.L.: GALAHAD, a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization. ACM Trans. Math. Softw. (TOMS) 29(4), 353–372 (2003)

  25. Gould, N.I., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60(3), 545–557 (2015)

  26. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

  27. Birgin, E.G., Gentil, J.M.: Evaluating bound-constrained minimization software. Comput. Optim. Appl. 53(2), 347–373 (2012)


Author information

Corresponding author

Correspondence to Francesco Rinaldi.

Appendices

Appendix A

Proof

(Proposition 3.2) Assume that \(\bar{x}\) satisfies (14)–(16). First, we show that

$$\begin{aligned} \bar{x}_i= & {} l_i, \quad \text {if } i \in A_l(\bar{x}), \end{aligned}$$
(28)
$$\begin{aligned} \bar{x}_i= & {} u_i, \quad \text {if } i \in A_u(\bar{x}). \end{aligned}$$
(29)

In order to prove (28), assume by contradiction that there exists an index \(i \in A_l(\bar{x})\) such that \(l_i < \bar{x}_i \le l_i + \epsilon \lambda _i(\bar{x})\). It follows that \(\lambda _i(\bar{x}) > 0\), and, from (12), that \(g_i(\bar{x})>0\), contradicting (14). Then, (28) holds. The same reasoning applies to prove (29).

Recalling (9), we have that \(g_i(\bar{x})>0\) for all \(i \in A_l(\bar{x})\). Combined with (28), it means that \(\bar{x}_i\) satisfies (2) for all \(i \in A_l(\bar{x})\). Similarly, since \(g_i(\bar{x})<0\) for all \(i \in A_u(\bar{x})\) and (29) holds, then \(\bar{x}_i\) satisfies (3) for all \(i \in A_u(\bar{x})\).

From (16), we also have that \(\bar{x}_i\) satisfies optimality conditions for all \(i \in N(\bar{x})\). Then, \(\bar{x}\) is a stationary point.

Now, assume that \(\bar{x}\) is a stationary point. First, we consider a generic index i such that \(\bar{x}_i = l_i\). For such an index, from (2) we get \(g_i(\bar{x})\ge 0\). If \(g_i(\bar{x})>0\), then, from (9), it follows that \(i \in A_l(\bar{x})\) and (14) is satisfied. If instead \(g_i(\bar{x})=0\), then i belongs to \(N(\bar{x})\), so that (16) is satisfied. The same reasoning applies to a generic index i such that \(\bar{x}_i = u_i\).

Finally, for every index i such that \(l_i< \bar{x}_i < u_i\), from (4) we have that \(g_i(\bar{x}) = 0\). Then, \(\bar{x}\) satisfies (14)–(16). \(\square \)
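As a concrete illustration of the optimality conditions (2)–(4) used in the proof above, the following minimal Python sketch (ours, not part of the original paper; the tolerance tol and all names are introduced here only for illustration) checks stationarity of a feasible point componentwise.

import numpy as np

def is_stationary(x, g, l, u, tol=1e-8):
    # First-order conditions for min f(x) s.t. l <= x <= u:
    # g_i(x) >= 0 if x_i = l_i, g_i(x) <= 0 if x_i = u_i, g_i(x) = 0 otherwise.
    at_lower = np.isclose(x, l)
    at_upper = np.isclose(x, u)
    free = ~(at_lower | at_upper)
    return (np.all(g[at_lower] >= -tol)
            and np.all(g[at_upper] <= tol)
            and np.all(np.abs(g[free]) <= tol))

# Example: f(x) = 0.5*||x||^2 over [1,2] x [-1,1] is minimized at (1, 0).
x = np.array([1.0, 0.0])
g = x.copy()                                  # gradient of 0.5*||x||^2
print(is_stationary(x, g, np.array([1.0, -1.0]), np.array([2.0, 1.0])))   # True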

Proof

(Proposition 3.3) Assume that (17) is verified. Namely,

$$\begin{aligned} \bar{x}_i= & {} l_i, \quad \text {if } i \in A_l(\bar{x}), \\ \bar{x}_i= & {} u_i, \quad \text {if } i \in A_u(\bar{x}). \end{aligned}$$

Recalling the definition of \(A_l(\bar{x})\) and \(A_u(\bar{x})\), the previous relations imply that (14) and (15) are verified. Then, from Proposition 3.2, \(\bar{x}\) is a stationary point if and only if \(g_i(\bar{x}) = 0\) for all \(i\in N(\bar{x})\). \(\square \)

Proof

(Proposition 3.4) Assume that condition (18) is verified. If we have

$$\begin{aligned} \{i \in A_l(\bar{x})~\,~:\, \bar{x}_i>l_i\}~\cup ~\{i \in A_u(\bar{x})~\,:\, \bar{x}_i<u_i\} = \emptyset , \end{aligned}$$

from the definition of \(A_l(\bar{x})\) and \(A_u(\bar{x})\) it follows that

$$\begin{aligned} \bar{x}_i = l_i \, \text{ and } \, g_i(\bar{x}) > 0, \quad&i \in A_l(\bar{x}),\\ \bar{x}_i = u_i \, \text{ and } \, g_i(\bar{x}) < 0, \quad&i \in A_u(\bar{x}). \end{aligned}$$

Then, conditions (2)–(4) are verified, and \(\bar{x}\) is a stationary point.

Conversely, if \(\bar{x}\) is a stationary point, we proceed by contradiction and assume that there exists an index \(i \in A_l(\bar{x}) \cup A_u(\bar{x})\) such that \(\bar{x}_i \in \, ]l_i,u_i[\). From the definition of \(A_l(\bar{x})\) and \(A_u(\bar{x})\), it follows that \(g_i(\bar{x}) \ne 0\), violating (4) and thus contradicting the fact that \(\bar{x}\) is a stationary point. \(\square \)

Proof

(Proposition 3.5) By the second-order mean value theorem, we have

$$\begin{aligned} f(\tilde{x}) = f(x) + g(x)^T(\tilde{x}-x) + \frac{1}{2}(\tilde{x}-x)^T H(z)(\tilde{x}-x), \end{aligned}$$

where \(z = x + \xi (\tilde{x}-x)\) for some \(\xi \in ]0,1[\). Therefore,

$$\begin{aligned} f(\tilde{x}) - f(x) \le g(x)^T(\tilde{x}-x) + \frac{1}{2}\bar{\lambda }\Vert x-\tilde{x}\Vert ^2. \end{aligned}$$
(30)

Recalling the definition of \(\tilde{x}\), we can also write

$$\begin{aligned} g(x)^T(\tilde{x}-x) = \sum _{i \in A_l(x)} g_i(x)(l_i-x_i) + \sum _{i \in A_u(x)} g_i(x)(u_i-x_i). \end{aligned}$$
(31)

From the definitions of \(A_l(x)\) and \(A_u(x)\), and recalling (12) and (13), we have

$$\begin{aligned} g_i(x) \ge \frac{(x_i-l_i)}{\epsilon } \left[ \frac{(l_i-x_i)^2 + (u_i-x_i)^2}{(u_i-x_i)^2} \right] , \quad&i \in A_l(x), \\ g_i(x) \le \frac{(x_i-u_i)}{\epsilon } \left[ \frac{(l_i-x_i)^2 + (u_i-x_i)^2}{(l_i-x_i)^2} \right] , \quad&i \in A_u(x), \end{aligned}$$

and we can write

$$\begin{aligned}&g_i(x)(l_i-x_i) \le -\frac{1}{\epsilon }(x_i-l_i)^2\left[ \frac{(l_i-x_i)^2 + (u_i-x_i)^2}{(u_i-x_i)^2}\right] \\&\quad \le -\frac{1}{\epsilon }(l_i-x_i)^2, \quad i \in A_l(x), \\&g_i(x)(u_i-x_i) \le -\frac{1}{\epsilon }(u_i-x_i)^2\left[ \frac{(l_i-x_i)^2 + (u_i-x_i)^2}{(l_i-x_i)^2}\right] \\&\quad \le -\frac{1}{\epsilon }(u_i-x_i)^2, \quad i \in A_u(x). \end{aligned}$$

Hence, from (31), it follows that

$$\begin{aligned} g(x)^T(\tilde{x}-x) \le -\frac{1}{\epsilon }\left[ \sum _{i\in A_l(x)}(l_i-x_i)^2 + \sum _{i\in A_u(x)}(u_i-x_i)^2\right] = -\frac{1}{\epsilon }\Vert x-\tilde{x}\Vert ^2. \end{aligned}$$
(32)

Finally, substituting (32) into (30) and writing \(-\frac{1}{\epsilon } = -\frac{1}{2\epsilon } - \frac{1}{2\epsilon }\), we have

$$\begin{aligned} f(\tilde{x})-f(x) \le \frac{1}{2}\left( \bar{\lambda }-\frac{1}{\epsilon }\right) \Vert x-\tilde{x}\Vert ^2 - \frac{1}{2\epsilon }\Vert x-\tilde{x}\Vert ^2 \le -\frac{1}{2\epsilon }\Vert x-\tilde{x}\Vert ^2, \end{aligned}$$

where the last inequality follows from equation (19) in Assumption 3.1. \(\square \)
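To make Proposition 3.5 concrete, here is a small self-contained numerical check (ours, not from the paper). On a strictly convex quadratic we take \(A_l(x)\) and \(A_u(x)\) to be the index sets on which the gradient inequalities implied by (12)–(13) and displayed in the proof hold, set those variables to the corresponding bounds, and verify the predicted decrease; treating these inequalities as the membership test, and reading (19) as \(\epsilon \le 1/\bar{\lambda }\), are simplifying assumptions of this sketch.

import numpy as np

Q = np.diag([1.0, 2.0])                 # f(x) = 0.5 x^T Q x, so lambda_bar = 2
f = lambda x: 0.5 * x @ Q @ x
l = np.array([1.0, -5.0]); u = np.array([3.0, 5.0])
x = np.array([1.1, 0.5]); eps = 0.4     # eps <= 1/lambda_bar

g = Q @ x
# Index sets where the displayed gradient inequalities hold (assumed membership test).
in_Al = g >= (x - l) / eps * ((l - x)**2 + (u - x)**2) / (u - x)**2
in_Au = g <= (x - u) / eps * ((l - x)**2 + (u - x)**2) / (l - x)**2

x_tilde = x.copy()
x_tilde[in_Al] = l[in_Al]
x_tilde[in_Au] = u[in_Au]

decrease = f(x_tilde) - f(x)                              # about -0.105
bound = -np.linalg.norm(x - x_tilde)**2 / (2 * eps)       # -0.0125
assert decrease <= bound
print(in_Al, in_Au, decrease, bound)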

Proof

(Proposition 3.6) Since the gradient is Lipschitz continuous over \([l,u]\), there exists \(L<\infty \) such that for all \(s\in [0,1]\) and for all \(\alpha \ge 0\):

$$\begin{aligned} \Vert g(\bar{x})-g(\bar{x}-s[\bar{x}-\bar{x}(\alpha )])\Vert \le s L \Vert \bar{x}-\bar{x}(\alpha )\Vert , \quad \forall \bar{x} \in [l,u]. \end{aligned}$$

By the mean value theorem, we have:

$$\begin{aligned} f(\bar{x}(\alpha ))-f(\bar{x})&= g(\bar{x})^\mathrm{T}(\bar{x}(\alpha ) - \bar{x}) + \int _0^1\big (g(\bar{x}-s[\bar{x} - \bar{x}(\alpha )])-g(\bar{x})\big )^\mathrm{T} \big (\bar{x}(\alpha ) - \bar{x}\big )\ \mathrm{d}s\nonumber \\&\le g(\bar{x})^\mathrm{T}(\bar{x}(\alpha ) - \bar{x}) +\Vert \bar{x}(\alpha ) - \bar{x}\Vert \int _{0}^{1}s L \Vert \bar{x}(\alpha ) - \bar{x}\Vert \ \mathrm{d}s\nonumber \\&= g(\bar{x})^\mathrm{T}(\bar{x}(\alpha ) - \bar{x}) + \frac{L}{2}\Vert \bar{x}(\alpha ) - \bar{x}\Vert ^{2}, \quad \forall \alpha \ge 0. \end{aligned}$$
(33)

Moreover, as the gradient is continuous and the feasible set is compact, there exists \(M > 0\) such that

$$\begin{aligned} \Vert g(\bar{x})\Vert \le M, \quad \forall \bar{x} \in [l,u]. \end{aligned}$$
(34)

From (20), (22) and (34), we can write

$$\begin{aligned} |d_i| \le \Vert d\Vert \le \sigma _2\Vert g(\bar{x})\Vert \le \sigma _2 M, \quad \forall \bar{x} \in [l,u], \quad \forall i=1,\dots ,n. \end{aligned}$$

Now, let us define \(\theta _1,\dots ,\theta _n\) as:

$$\begin{aligned} \theta _i := \left\{ \begin{array}{l@{\quad }l} \min \ \{\bar{x}_i - l_i, u_i - \bar{x}_i\} &{}\text { if } l_i< \bar{x}_i < u_i, \\ u_i - l_i &{}\text { otherwise}, \end{array}\right. \qquad i=1,\ldots ,n. \end{aligned}$$

We set

$$\begin{aligned} \tilde{\theta } := \min _{i=1,\dots ,n} \frac{\theta _i}{2}, \end{aligned}$$

and define \(\hat{\alpha }\) as follows:

$$\begin{aligned} \hat{\alpha }:= \frac{\tilde{\theta }}{\sigma _2 M}. \end{aligned}$$

In the following, we want to majorize the right-hand side of (33). First, we consider the term \(g(\bar{x})^\mathrm{T}(\bar{x}(\alpha ) - \bar{x})\). We distinguish three cases:

  1. (i)

    \(i \in N(\bar{x})\) such that \(l_i< \bar{x}_i < u_i\). We distinguish two subcases:

    • if \(d_i \ge 0\):

      $$\begin{aligned} l_i< \bar{x}_i + \alpha d_i \le \bar{x}_i + \frac{\tilde{\theta }}{\sigma _2 M} d_i \le \bar{x}_i + \tilde{\theta }< u_i, \qquad \forall \alpha \in ]0,\hat{\alpha }], \end{aligned}$$
    • else, if \(d_i < 0\):

      $$\begin{aligned} u_i> \bar{x}_i + \alpha d_i \ge \bar{x}_i + \frac{\tilde{\theta }}{\sigma _2 M} d_i \ge \bar{x}_i - \tilde{\theta } > l_i, \qquad \forall \alpha \in ]0,\hat{\alpha }]. \end{aligned}$$

    So, we have

    $$\begin{aligned} \bar{x}_i(\alpha ) = \bar{x}_i + \alpha d_i, \quad \forall \alpha \in ]0,\hat{\alpha }], \end{aligned}$$

    which implies

    $$\begin{aligned} g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) = \alpha g_i(\bar{x})d_i, \quad \forall \alpha \in ]0,\hat{\alpha }]. \end{aligned}$$
    (35)
  2. (ii)

    \(i \in N(\bar{x})\) such that \(\bar{x}_i = l_i\). Recalling the definition of \(N(\bar{x})\), it follows that \(g_i(\bar{x}) \le 0\). We distinguish two subcases:

    • if \(d_i \ge 0\):

      $$\begin{aligned} l_i \le \bar{x}_i + \alpha d_i \le \bar{x}_i + \frac{\tilde{\theta }}{\sigma _2 M} d_i \le \bar{x}_i + \tilde{\theta } < u_i, \qquad \forall \alpha \in ]0,\hat{\alpha }], \end{aligned}$$

      and then

      $$\begin{aligned} \bar{x}_i(\alpha ) = \bar{x}_i + \alpha d_i, \quad \forall \alpha \in ]0,\hat{\alpha }], \end{aligned}$$

      which implies

      $$\begin{aligned} g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) = \alpha g_i(\bar{x})d_i, \quad \forall \alpha \in ]0,\hat{\alpha }]. \end{aligned}$$
      (36)
    • else, if \(d_i < 0\), we have

      $$\begin{aligned} \bar{x}_i(\alpha ) = \bar{x}_i, \quad \forall \alpha > 0, \end{aligned}$$

      and then

      $$\begin{aligned} 0 = g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) \le \alpha g_i(\bar{x})d_i, \quad \forall \alpha > 0. \end{aligned}$$
      (37)
  3. (iii)

    \(i \in N(\bar{x})\) such that \(\bar{x}_i = u_i\). Following the same reasoning as in the previous case, we have that

    • if \(d_i \le 0\):

      $$\begin{aligned} g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) = \alpha g_i(\bar{x})d_i, \quad \forall \alpha \in ]0,\hat{\alpha }]; \end{aligned}$$
      (38)
    • else, if \(d_i > 0\), we have

      $$\begin{aligned} 0 = g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) \le \alpha g_i(\bar{x})d_i, \quad \forall \alpha > 0. \end{aligned}$$
      (39)

From (20), (35), (36), (37), (38) and (39), we obtain

$$\begin{aligned} g(\bar{x})^\mathrm{T}(\bar{x}(\alpha ) - \bar{x})&= \sum _{i \in N(\bar{x})} g_i(\bar{x})(\bar{x}_i(\alpha ) - \bar{x}_i) \nonumber \\&\le \alpha \sum _{i \in N(\bar{x})} g_i(\bar{x})d_i = \alpha g_{N(\bar{x})}(\bar{x})^\mathrm{T}d_{N(\bar{x})}, \quad \forall \alpha \in ]0,\hat{\alpha }]. \end{aligned}$$
(40)

Now, we consider the term \(\displaystyle \frac{L}{2}\Vert \bar{x}(\alpha ) - \bar{x}\Vert ^{2}\). For every \(i \in N(\bar{x})\) such that \(d_i \le 0\), we have that \(0 \le \bar{x}_i - \bar{x}_i(\alpha )\le -\alpha d_i\) holds for all \(\alpha >0\). Therefore,

$$\begin{aligned} (\bar{x}_i - \bar{x}_i(\alpha ))^2\le \alpha ^2 d_i^2, \quad \forall \alpha > 0. \end{aligned}$$
(41)

Else, for every \(i \in N(\bar{x})\) such that \(d_i > 0\), we have that \(0 \le \bar{x}_i(\alpha ) - \bar{x}_i\le \alpha d_i\) holds for all \(\alpha > 0\). Therefore,

$$\begin{aligned} 0 \le (\bar{x}_i(\alpha ) - \bar{x}_i)^2\le \alpha ^2 d_i^2, \quad \forall \alpha > 0. \end{aligned}$$
(42)

Recalling (20), from (41) and (42) we obtain

$$\begin{aligned} \Vert \bar{x}(\alpha ) - \bar{x}\Vert ^2 \le \alpha ^2 \Vert d_{N(\bar{x})}\Vert ^2, \quad \forall \alpha > 0. \end{aligned}$$

Using (21) and (22), we get

$$\begin{aligned} \Vert \bar{x}(\alpha ) - \bar{x}\Vert ^2&\le \alpha ^2 \Vert d_{N(\bar{x})}\Vert ^2 \le \alpha ^2 \sigma _2^2\Vert g_{N(\bar{x})}(\bar{x})\Vert ^2 \nonumber \\&\le -\alpha ^2 \frac{\sigma _2^2}{\sigma _1} g_{N(\bar{x})}(\bar{x})^\mathrm{T}d_{N(\bar{x})}, \quad \forall \alpha > 0. \end{aligned}$$
(43)

From (20), (33), (40) and (43), we can write

$$\begin{aligned} f(\bar{x}(\alpha ))-f(\bar{x})&\le \alpha \left( 1 - \alpha \frac{L\sigma ^2_2}{2\sigma _1}\right) g_{N(\bar{x})}(\bar{x})^\mathrm{T} d_{N(\bar{x})} \\&= \alpha \left( 1 - \alpha \frac{L\sigma ^2_2}{2\sigma _1}\right) g(\bar{x})^\mathrm{T} d, \quad \forall \alpha \in ]0,\hat{\alpha }]. \end{aligned}$$

It follows that (23) is satisfied by choosing \(\bar{\alpha }\) such that

$$\begin{aligned}&1 - \bar{\alpha }\frac{L\sigma ^2_2}{2\sigma _1} \ge \gamma , \\&\quad \bar{\alpha } \in ]0,\hat{\alpha }]. \end{aligned}$$

Thus, the proof is completed by defining

$$\begin{aligned} \bar{\alpha } := \min \left\{ \hat{\alpha },\frac{2\sigma _1(1-\gamma )}{L\sigma ^2_2}\right\} . \end{aligned}$$

\(\square \)

Appendix B

The scheme of the algorithm is reported in Algorithm 1. At Steps 10, 17 and 25, the reference value \(f^j_R\) of the non-monotone line search is updated: we set \(j := j+1\), \(l^j := k\), and the reference value is computed according to the formula

$$\begin{aligned} f^j_R := \max _{0 \le i \le \min \{j,M\}} \bigl \{f^{j-i}\bigr \}. \end{aligned}$$
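As an illustration of the reference value update and of the non-monotone acceptance test of the line search (cf. condition (63) in Appendix C), here is a minimal Python sketch. It is ours and not part of Algorithm 1: it assumes that \(f^{j-i}\) denotes the objective value stored at the \((j-i)\)th reference update, and the constants gamma, delta, M are placeholders.

import numpy as np

def update_reference(stored_values, M):
    # stored_values = [f^0, ..., f^j]; reference value = max over the last min(j, M)+1 entries.
    j = len(stored_values) - 1
    return max(stored_values[j - min(j, M):])

def project(z, l, u):
    # Box projection, denoted [.]# in Appendix C.
    return np.minimum(np.maximum(z, l), u)

def nonmonotone_backtracking(f, x_tilde, g, d, l, u, f_ref,
                             gamma=1e-4, delta=0.5, alpha0=1.0, max_iter=50):
    # Accept alpha when f([x_tilde + alpha d]#) <= f_ref + gamma * alpha * g^T d,
    # otherwise shrink alpha by the factor delta (cf. (63)).
    alpha, slope = alpha0, g @ d
    for _ in range(max_iter):
        if f(project(x_tilde + alpha * d, l, u)) <= f_ref + gamma * alpha * slope:
            return alpha
        alpha *= delta
    return alpha

# Toy usage on f(x) = 0.5*||x||^2 over [0,1] x [0,3]:
f = lambda x: 0.5 * x @ x
x_tilde = np.array([0.5, 2.0]); g = x_tilde.copy(); d = -g
print(nonmonotone_backtracking(f, x_tilde, g, d, np.array([0.0, 0.0]),
                               np.array([1.0, 3.0]), f_ref=f(x_tilde)))   # 1.0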

Appendix C

In this section, we prove Theorem 4.1. We first need to state some preliminary results.

Lemma C.1

Let Assumption 3.1 hold. Suppose that ASA-BCP produces an infinite sequence \(\{x^k\}\). Then,

  1. (i)

    \(\{f^j_R\}\) is non-increasing and converges to a value \(\bar{f}_R\);

  2. (ii)

    for any fixed \(j \ge 0\) we have:

    $$\begin{aligned} f^h_R < f^j_R, \quad \forall h > j+M. \end{aligned}$$

Proof

The proof follows from Lemma 1 in [23]. \(\square \)

Lemma C.2

Let Assumption 3.1 hold. Suppose that ASA-BCP produces an infinite sequence \(\{x^k\}\) and an infinite sequence \(\{\tilde{x}^k\}\). For any given value of k, let q(k) be the index such that

$$\begin{aligned} q(k) := \max \{ j :l^j \le k \}. \end{aligned}$$

Then, there exists a sequence \(\{\tilde{x}^{s(j)}\}\) and an integer L satisfying the following conditions:

  1. (i)

    \(f^j_R = f(\tilde{x}^{s(j)})\)

  2. (ii)

    for any integer k, there exist an index \(h^k\) and an index \(j^k\) such that:

    $$\begin{aligned}&0< h^k - k \le L, \qquad h^k = s(j^k), \\&\quad f^{j^k}_R = f(\tilde{x}^{h^k}) < f^{q(k)}_R. \end{aligned}$$

Proof

The proof follows from Lemma 2 in [23], taking into account that, for any iteration index k, there exists an integer \(\tilde{L}\) such that the condition of Step 9 is satisfied within the \((k+\tilde{L})\)th iteration. In fact, assume by contradiction that this is not true. If the condition of Step 9 is not satisfied at a generic iteration k, then \(x^{k+1} = \tilde{x}^k\). Since the sequences \(\{x^k\}\) and \(\{\tilde{x}^k\}\) are infinite, Proposition 3.4 implies that \(\tilde{x}^{k+1} \ne x^{k+1}\) and that the objective function strictly decreases. Repeating this argument, an infinite sequence of distinct points \(\{x^{k+1}, x^{k+2}, \dots \}\) is produced, and these points differ from one another only in the values of the variables at the bounds. Since the number of variables is finite, this yields a contradiction. \(\square \)

Lemma C.3

Let Assumption 3.1 hold. Suppose that ASA-BCP produces an infinite sequence \(\{x^k\}\) and an infinite sequence \(\{\tilde{x}^k\}\). Then,

$$\begin{aligned}&\lim _{k \rightarrow \infty } f(x^{k}) = \lim _{k \rightarrow \infty } f(\tilde{x}^{k}) = \lim _{j \rightarrow \infty } f^j_R = \bar{f}_R, \end{aligned}$$
(44)
$$\begin{aligned}&\lim _{k \rightarrow \infty } \Vert x^{k+1} - \tilde{x}^k \Vert = \lim _{k \rightarrow \infty } \alpha ^k \Vert d^k\Vert = 0, \end{aligned}$$
(45)
$$\begin{aligned}&\lim _{k \rightarrow \infty } \Vert \tilde{x}^k - x^k \Vert = 0. \end{aligned}$$
(46)

Proof

We build two different partitions of the iteration indices in order to analyze the computation of \(x^{k+1}\) from \(\tilde{x}^k\) and that of \(\tilde{x}^k\) from \(x^k\), respectively. From the instructions of the algorithm, it follows that \(x^{k+1}\) can be computed at Step 21, Step 29 or Step 31. Let us consider the following subsets of iteration indices:

$$\begin{aligned} \mathcal K_1 := & {} \{k :x^{k+1} \text { is computed at Step } 21\}, \\ \mathcal K_2 := & {} \{k :x^{k+1} \text { is computed at Step } 29\}, \\ \mathcal K_3 := & {} \{k :x^{k+1} \text { is computed at Step } 31\}. \end{aligned}$$

Then, we have

$$\begin{aligned} \mathcal K_1 \cup \mathcal K_2 \cup \mathcal K_3 = \{0,1,\dots \}. \end{aligned}$$

As regards the computation of \(\tilde{x}^k\), we distinguish two further subsets of iteration indices:

$$\begin{aligned} \mathcal K_4 := \{k :\tilde{x}^k \text { satisfies the test at Step } 4\}, \\ \mathcal K_5 := \{k :\tilde{x}^k \text { does not satisfy the test at Step } 4\}. \end{aligned}$$

Then, we have

$$\begin{aligned} \mathcal K_4 \cup \mathcal K_5 = \{0,1,\dots \}. \end{aligned}$$

We first point out some properties of the above subsequences. The subsequence \(\{\tilde{x}^k\}_{\mathcal K_1}\) satisfies

$$\begin{aligned} \Vert x^{k+1} - \tilde{x}^k \Vert = \alpha ^k\Vert d^k\Vert = \Vert d^k\Vert \le \beta ^t\varDelta _0, \quad k \in \mathcal K_1, \end{aligned}$$

where the integer t increases with \(k \in \mathcal K_1\). Since \(\beta \in ]0,1[\), if \(\mathcal K_1\) is infinite, we have

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_1} \Vert x^{k+1} - \tilde{x}^k \Vert = \lim _{k \rightarrow \infty , \, k \in \mathcal K_1} \alpha ^k \Vert d^k \Vert = 0. \end{aligned}$$
(47)

Moreover, since \(\alpha ^k = 0\) and \(d^k = 0\) for all \(k \in \mathcal K_3\), if \(\mathcal K_3\) is infinite, we have

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_3} \Vert x^{k+1} - \tilde{x}^k \Vert = \lim _{k \rightarrow \infty , \, k \in \mathcal K_3} \alpha ^k \Vert d^k \Vert = 0. \end{aligned}$$
(48)

The subsequence \(\{\tilde{x}^k\}_{\mathcal K_4}\) satisfies

$$\begin{aligned} \Vert \tilde{x}^k - x^k\Vert \le \beta ^t\tilde{\varDelta }_0, \quad k \in \mathcal K_4, \end{aligned}$$

where the integer t increases with \(k \in \mathcal K_4\). Since \(\beta \in ]0,1[\), if \(\mathcal K_4\) is infinite, we have

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_4} \Vert \tilde{x}^k - x^k\Vert = 0. \end{aligned}$$
(49)

Now we prove (44). Let s(j), \(h^k\) and q(k) be the indices defined in Lemma C.2. We show that for any fixed integer \(i \ge 1\), the following relations hold:

$$\begin{aligned}&\lim _{j \rightarrow \infty } \Vert \tilde{x}^{s(j)-i+1} - x^{s(j)-i+1} \Vert = 0, \end{aligned}$$
(50)
$$\begin{aligned}&\lim _{j \rightarrow \infty } \Vert x^{s(j)-i+1} - \tilde{x}^{s(j)-i} \Vert = \lim _{j \rightarrow \infty } \alpha ^{s(j)-i} \Vert d^{s(j)-i} \Vert = 0, \end{aligned}$$
(51)
$$\begin{aligned}&\lim _{j \rightarrow \infty } f(x^{s(j)-i+1}) = \bar{f}_R, \end{aligned}$$
(52)
$$\begin{aligned}&\lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)-i}) = \bar{f}_R. \end{aligned}$$
(53)

Without loss of generality, we assume that j is large enough to avoid the occurrence of negative superscripts. We proceed by induction and first show that (50)–(53) hold for \(i=1\). If \(s(j) \in \mathcal K_4\), relations (50) and (52) follow from (49) and the continuity of the objective function. If \(s(j) \in \mathcal K_5\), from the instructions of the algorithm and taking into account Proposition 3.5, we get

$$\begin{aligned} f^j_R = f(\tilde{x}^{s(j)}) \le f(x^{s(j)}) - \dfrac{1}{2\epsilon }\Vert x^{s(j)} - \tilde{x}^{s(j)}\Vert ^2 < f^{j-1}_R, \end{aligned}$$

from which we get

$$\begin{aligned} f^j_R = f(\tilde{x}^{s(j)}) \le f(x^{s(j)}) < f^{j-1}_R, \end{aligned}$$

and then, from point (i) of Lemma C.1, it follows that

$$\begin{aligned} \lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)}) = \lim _{j \rightarrow \infty } f(x^{s(j)}) = \bar{f}_R, \end{aligned}$$
(54)

which proves (52) for \(i=1\). From the above relation, and by exploiting Proposition 3.5 again, we have that

$$\begin{aligned} \lim _{j \rightarrow \infty } \bigl ( f(\tilde{x}^{s(j)}) - f(x^{s(j)}) \bigr ) \le \lim _{j \rightarrow \infty } - \dfrac{1}{2\epsilon }\Vert x^{s(j)} - \tilde{x}^{s(j)}\Vert ^2. \end{aligned}$$

and then (50) holds for \(i=1\).

If \(s(j)-1 \in \mathcal K_1 \cup \mathcal K_3\), from (47) and (48) it is straightforward to verify that (51) holds for \(i=1\). Since (51) and (52) hold for \(i=1\), by exploiting the continuity of the objective function, also (53) is verified for \(i=1\). If \(s(j)-1 \in \mathcal {K}_2\), from the instructions of the algorithm, we obtain

$$\begin{aligned} f(x^{s(j)}) = f(\tilde{x}^{s(j)-1} + \alpha ^{s(j)-1} d^{s(j)-1}) \le f^{q(s(j)-1)}_R + \gamma \alpha ^{s(j)-1} g(\tilde{x}^{s(j)-1})^\mathrm{T} d^{s(j)-1}, \end{aligned}$$

and then

$$\begin{aligned} f(x^{s(j)}) - f^{q(s(j)-1)}_R \le \gamma \alpha ^{s(j)-1} g(\tilde{x}^{s(j)-1})^\mathrm{T} d^{s(j)-1}. \end{aligned}$$

From (54), point (i) of Lemma C.1, and recalling (20)–(22), we have that

$$\begin{aligned} \lim _{j \rightarrow \infty } \alpha ^{s(j)-1} \Vert d^{s(j)-1}\Vert = \lim _{j \rightarrow \infty } \Vert x^{s(j)} - \tilde{x}^{s(j)-1} \Vert = 0 \end{aligned}$$

for every subsequence such that \(s(j)-1 \in \mathcal K_2\). Therefore, (51) holds for \(i=1\). Recalling that \(f(x^{s(j)}) = f(\tilde{x}^{s(j)-1} + \alpha ^{s(j)-1} d^{s(j)-1})\), and since (50) and (51) hold for \(i=1\), from the continuity of the objective function it follows that also (53) holds for \(i=1\).

Now we assume that (50)–(53) hold for a given fixed \(i \ge 1\) and show that these relations must hold for \(i + 1\) as well. If \(s(j) - i \in \mathcal K_4\), by using (49), it is straightforward to verify that (50) holds with i replaced by \(i+1\). Taking into account (53), this implies that

$$\begin{aligned} \lim _{j \rightarrow \infty } f(x^{s(j) - (i+1) +1}) = \lim _{j \rightarrow \infty } f(x^{s(j) - i}) = \lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)-i}) = \bar{f}_R, \end{aligned}$$

and then (52) holds for \(i+1\). If \(s(j) - i \in \mathcal K_5\), from the instructions of the algorithm, and taking into account Proposition 3.5, we get

$$\begin{aligned} f(\tilde{x}^{s(j)-i}) \le f(x^{s(j)-i}) - \dfrac{1}{2\epsilon }\Vert x^{s(j)-i} - \tilde{x}^{s(j)-i}\Vert ^2 < f^{q(s(j)-i)-1}_R. \end{aligned}$$

Exploiting (53) and point (i) of Lemma C.1, we have that

$$\begin{aligned} \lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)-i}) = \lim _{j \rightarrow \infty } f(x^{s(j)-i}) = \bar{f}_R, \end{aligned}$$
(55)

which proves (52) for \(i+1\). From the above relation, and by exploiting Proposition 3.5 again, we can also write

$$\begin{aligned} \lim _{j \rightarrow \infty } \bigl ( f(\tilde{x}^{s(j)-i}) - f(x^{s(j)-i}) \bigr ) \le \lim _{j \rightarrow \infty } - \dfrac{1}{2\epsilon }\Vert x^{s(j)-i} - \tilde{x}^{s(j)-i}\Vert ^2. \end{aligned}$$

and then (50) holds for \(i+1\).

If \(s(j)-i-1 \in \mathcal K_1 \cup \mathcal K_3\), from (47) and (48), we obtain that (51) holds for \(i+1\). Since \(x^{s(j)-i} = \tilde{x}^{s(j)-i-1} + \alpha ^{s(j)-i-1} d^{s(j)-i-1}\), exploiting (47), (48), (55) and the continuity of the objective function, we obtain that (53) holds with i replaced by \(i+1\).

If \(s(j)-i-1 \in \mathcal {K}_2\), from the instructions of the algorithm, we obtain

$$\begin{aligned} f(x^{s(j)-i})= & {} f(\tilde{x}^{s(j)-i-1} + \alpha ^{s(j)-i-1} d^{s(j)-i-1}) \\\le & {} f^{q(s(j)-i-1)}_R + \gamma \alpha ^{s(j)-i-1} g(\tilde{x}^{s(j)-i-1})^\mathrm{T} d^{s(j)-i-1}, \end{aligned}$$

and then

$$\begin{aligned} f(x^{s(j)-i}) - f^{q(s(j)-i-1)}_R \le \gamma \alpha ^{s(j)-i-1} g(\tilde{x}^{s(j)-i-1})^\mathrm{T} d^{s(j)-i-1}. \end{aligned}$$

From (55), point (i) of Lemma C.1, and recalling (20)–(22), we have that

$$\begin{aligned} \lim _{j \rightarrow \infty } \alpha ^{s(j)-i-1} \Vert d^{s(j)-i-1}\Vert = \lim _{j \rightarrow \infty } \Vert x^{s(j)-i} - \tilde{x}^{s(j)-i-1} \Vert = 0 \end{aligned}$$

for every subsequence such that \(s(j)-i-1 \in \mathcal K_2\). Therefore, (51) holds for \(i+1\).

Recalling that \(f(x^{s(j)-i}) = f(\tilde{x}^{s(j)-i-1} + \alpha ^{s(j)-i-1} d^{s(j)-i-1})\), and since (50) and (51) hold with i replaced by \(i+1\), exploiting the continuity of the objective function, we have

$$\begin{aligned} \lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)-i-1} + \alpha ^{s(j)-i-1} d^{s(j)-i-1}) = \lim _{j \rightarrow \infty } f(x^{s(j)-i}) = \lim _{j \rightarrow \infty } f(\tilde{x}^{s(j)-i}). \end{aligned}$$

Therefore, if (53) holds at a generic \(i \ge 1\), it must hold for \(i+1\) as well. This completes the induction.

Now, for any iteration index \(k>0\), we can write

$$\begin{aligned} \tilde{x}^{h^k}= & {} x^k + \sum _{i=0}^{h^k - k} \bigl (\tilde{x}^{h^k-i} - x^{h^k-i}\bigr ) + \sum _{i=1}^{h^k - k} \alpha ^{h^k-i} d^{h^k-i},\\ \tilde{x}^{h^k}= & {} \tilde{x}^k + \sum _{i=0}^{h^k - k - 1} \bigl (\tilde{x}^{h^k-i} - x^{h^k-i}\bigr ) + \sum _{i=1}^{h^k - k} \alpha ^{h^k-i} d^{h^k-i}. \end{aligned}$$

From (50) and (51), we obtain

$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert x^k - \tilde{x}^{h^k}\Vert = \lim _{k \rightarrow \infty } \Vert \tilde{x}^k - \tilde{x}^{h^k}\Vert = 0. \end{aligned}$$

By exploiting the continuity of the objective function, and taking into account point (i) of Lemma C.1, the above relation implies that

$$\begin{aligned} \lim _{k \rightarrow \infty } f(x^k) = \lim _{k \rightarrow \infty } f(\tilde{x}^k) = \lim _{k \rightarrow \infty } f(\tilde{x}^{h^k}) = \lim _{j \rightarrow \infty } f^j_R = \bar{f}_R, \end{aligned}$$

which proves (44).

To prove (45), if \(k \in \mathcal K_1 \cup \mathcal K_3\), then from (47) and (48) we obtain

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_1 \cup \mathcal K_3} \Vert x^{k+1} - \tilde{x}^k \Vert = \lim _{k \rightarrow \infty , \, k \in \mathcal K_1 \cup \mathcal K_3} \alpha ^k \Vert d^k \Vert = 0. \end{aligned}$$
(56)

If \(k \in \mathcal K_2\), from the instructions of the algorithm, we get

$$\begin{aligned} f(x^{k+1}) \le f^{q(k)}_R + \gamma \alpha ^k g(\tilde{x}^k)^\mathrm{T} d^k, \end{aligned}$$

and then, recalling conditions (20)–(22) and (44), we can write

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_2} \Vert x^{k+1} - \tilde{x}^k \Vert = \lim _{k \rightarrow \infty , \, k \in \mathcal K_2} \alpha ^k \Vert d^k\Vert = 0. \end{aligned}$$
(57)

From (56) and (57), it follows that (45) holds.

To prove (46), if \(k \in \mathcal K_4\), then from (49) we obtain

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_4} \Vert \tilde{x}^k - x^k \Vert = 0. \end{aligned}$$
(58)

If \(k \in \mathcal K_5\), from the instructions of the algorithm and recalling Proposition 3.5, we get

$$\begin{aligned} f(\tilde{x}^k) \le f(x^k) - \dfrac{1}{2\epsilon }\Vert x^k - \tilde{x}^k\Vert ^2 < f^{q(k)-1}_R. \end{aligned}$$

From (44) and point (i) of Lemma C.1, we have that

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_5} f(\tilde{x}^k) = \lim _{k \rightarrow \infty , \, k \in \mathcal K_5} f(x^k) = \bar{f}_R. \end{aligned}$$

By exploiting Proposition 3.5 again, we can write

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_5} \bigl ( f(\tilde{x}^k) - f(x^k) \bigr ) \le \lim _{k \rightarrow \infty , \, k \in \mathcal K_5} - \dfrac{1}{2\epsilon }\Vert x^k - \tilde{x}^k\Vert ^2 = 0, \end{aligned}$$

and then

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in \mathcal K_5} \Vert \tilde{x}^k - x^k \Vert = 0. \end{aligned}$$
(59)

From (58) and (59), it follows that (46) holds. \(\square \)

The following theorem extends a known result from unconstrained optimization, guaranteeing that the sequence of the directional derivatives along the search direction converges to zero.

Theorem C.1

Let Assumption 3.1 hold. Assume that ASA-BCP does not terminate in a finite number of iterations, and let \(\{x^k\}\), \(\{\tilde{x}^k\}\) and \(\{d^k\}\) be the sequences produced by the algorithm. Then,

$$\begin{aligned} \lim _{k\rightarrow \infty } g(\tilde{x}^k)^\mathrm{T}d^k = 0. \end{aligned}$$
(60)

Proof

We can identify two iteration index subsets \(H,K \subseteq \{1,2,\dots \}\), such that:

  • \(N(\tilde{x}^k) \ne \emptyset \) and \(g_N(\tilde{x}^k) \ne 0\), for all \(k \in K\),

  • \(H := \{1,2,\dots \} \setminus K\).

By assumption, the algorithm does not terminate in a finite number of iterations, and then, at least one of the above sets is infinite. Since we are interested in the asymptotic behavior of the sequence produced by ASA-BCP, we assume without loss of generality that both H and K are infinite sets.

Taking into account Step 31 in Algorithm 1, it is straightforward to verify that

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in H} g(\tilde{x}^k)^\mathrm{T}d^k = 0. \end{aligned}$$

Therefore, we restrict our analysis to the subsequence \(\{x^k\}_K\). Let \(\bar{x}\) be any limit point of \(\{x^k\}_K\). By contradiction, we assume that (60) does not hold. Using (46) of Lemma C.3, since \(\{x^k\}\), \(\{\tilde{x}^k\}\) and \(\{d^k\}\) are bounded, and taking into account that \(A_l(x^k)\), \(A_u(x^k)\) and \(N(x^k)\) are subsets of a finite set of indices, without loss of generality we can redefine \(\{x^k\}_K \) as a subsequence such that

$$\begin{aligned} \lim _{k\rightarrow \infty ,\, k\in K}x^k = \lim _{k\rightarrow \infty ,\, k\in K}\tilde{x}^k = \bar{x}, \end{aligned}$$

and

$$\begin{aligned}&N^k := \hat{N},\quad A_l^k := \hat{A}_l,\quad A_u^k := \hat{A}_u,\quad \forall k \in K, \\&\quad \lim _{k\rightarrow \infty , \, k\in K} d^k = \hat{d}. \end{aligned}$$

Since we have assumed that (60) does not hold, the above relations, combined with (21) and the continuity of the gradient, imply that

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} g(\tilde{x}^k)^\mathrm{T}d^k = g(\bar{x})^\mathrm{T} \hat{d} = -\eta < 0. \end{aligned}$$
(61)

It follows that

$$\begin{aligned} \hat{d} \ne 0, \end{aligned}$$

and then, recalling (45) of Lemma C.3, we get

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} \alpha ^k = 0. \end{aligned}$$
(62)

Consequently, from the instructions of the algorithm, there must exist a subsequence (renamed K again) such that the line search procedure at Step 28 is performed and \(\alpha ^k < 1\) for sufficiently large k. Namely,

$$\begin{aligned} f\bigl (\bigl [\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k\bigr ]^{\sharp }\bigr )&> f^{q(k)}_R + \gamma \frac{\alpha ^k}{\delta } g(\tilde{x}^k)^\mathrm{T}d^k \nonumber \\&\ge f(\tilde{x}^k) + \gamma \frac{\alpha ^k}{\delta } g(\tilde{x}^k)^\mathrm{T}d^k, \quad \forall k \ge \bar{k}, \, k \in K, \end{aligned}$$
(63)

where \(q(k) := \max \{ j :l^j \le k \}\). We can write the point \([\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k]^{\sharp }\) as follows:

$$\begin{aligned} \bigl [\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k\bigl ]^{\sharp } = \tilde{x}^k + \frac{\alpha ^k}{\delta }d^k - y^k, \end{aligned}$$
(64)

where

$$\begin{aligned} y^k_i := \max \bigl \{0, \bigl (\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k\bigr )_i - u_i\bigr \} - \max \bigl \{0,l_i-\bigl (\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k\bigr )_i\bigr \}, \quad i=1,\dots ,n. \end{aligned}$$

Since \(\{\tilde{x}^k\}\) is a sequence of feasible points, \(\{\alpha ^k\}\) converges to zero and \(\{d^k\}\) is bounded, we get

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} y^k = 0. \end{aligned}$$
(65)

From (63) and (64), we can write

$$\begin{aligned} f\bigl (\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k - y^k\bigr ) - f(\tilde{x}^k) > \gamma \frac{\alpha ^k}{\delta } g(\tilde{x}^k)^\mathrm{T}d^k, \quad \forall k \ge \bar{k}, \, k \in K. \end{aligned}$$
(66)

By the mean value theorem, we have

$$\begin{aligned} f\bigl (\tilde{x}^k + \frac{\alpha ^k}{\delta }d^k - y^k\bigr ) = f(\tilde{x}^k) + \frac{\alpha ^k}{\delta }g(z^k)^\mathrm{T}d^k - g(z^k)^\mathrm{T}y^k, \end{aligned}$$
(67)

where

$$\begin{aligned} z^k = \tilde{x}^k + \theta ^k\bigl (\frac{\alpha ^k}{\delta }d^k - y^k\bigr ), \quad \theta ^k \in ]0,1[. \end{aligned}$$
(68)

From (62) and (65), and since \(\{d^k\}\) is bounded, we obtain

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} z^k = \bar{x}. \end{aligned}$$
(69)

Substituting (67) into (66), and multiplying each term by \(\dfrac{\delta }{\alpha ^k}\), we get

$$\begin{aligned} g(z^k)^\mathrm{T}d^k - \frac{\delta }{\alpha ^k}g(z^k)^\mathrm{T}y^k > \gamma g(\tilde{x}^k)^\mathrm{T}d^k, \quad \forall k \ge \bar{k}, \, k \in K. \end{aligned}$$
(70)

From the definition of \(y^k\), it follows that

$$\begin{aligned} y^k_i = {\left\{ \begin{array}{ll} 0, \qquad &{} \text { if } l_i \le \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i \le u_i, \\ \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i - u_i> 0, \qquad &{} \text { if } \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i> u_i, \\ \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i - l_i < 0, \qquad &{} \text { if } l_i > \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i. \end{array}\right. } \end{aligned}$$
(71)

In particular, we have

$$\begin{aligned} y^k_i {\left\{ \begin{array}{ll} = 0, \quad &{} \text { if } d^k_i = 0, \\ \in [0 , \frac{\alpha ^k}{\delta } d^k_i], \quad &{} \text { if } d^k_i > 0, \\ \in [\frac{\alpha ^k}{\delta } d^k_i, 0], \quad &{} \text { if } d^k_i < 0. \end{array}\right. } \end{aligned}$$
(72)

From the above relation, it is straightforward to verify that

$$\begin{aligned} |y^k_i| \le \frac{\alpha ^k}{\delta }|d^k_i|, \quad i=1,\dots ,n. \end{aligned}$$
(73)
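As a quick verification of (73) (added here for the reader's convenience), consider for instance the case \(d^k_i > 0\): since \(\tilde{x}^k\) is feasible, the lower-bound term in the definition of \(y^k_i\) vanishes and

$$\begin{aligned} 0 \le y^k_i = \max \Bigl \{0, \tilde{x}^k_i + \frac{\alpha ^k}{\delta } d^k_i - u_i\Bigr \} \le \frac{\alpha ^k}{\delta } d^k_i, \end{aligned}$$

where the last inequality follows from \(\tilde{x}^k_i \le u_i\); the cases \(d^k_i < 0\) and \(d^k_i = 0\) are analogous.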

In the following, we want to majorize the left-hand side of (70) by showing that \(\{\frac{\delta }{\alpha ^k}g(z^k)^\mathrm{T}y^k\}\) converges to a nonnegative value. To this aim, we analyze three different cases, depending on whether \(\bar{x}_i\) is at the bounds or is strictly feasible:

  1. (i)

    \(i \in \hat{N}\) such that \(l_i< \bar{x}_i < u_i\). As \(\{\tilde{x}^k\}\) converges to \(\bar{x}\), there exists \(\tau > 0\) such that

    $$\begin{aligned} l_i + \tau \le \tilde{x}^k_i \le u_i - \tau , \qquad k \in K, \, k \text { sufficiently large}. \end{aligned}$$

    Since \(\{\alpha ^k\}\) converges to zero and \(\{d^k\}\) is bounded, it follows that \(\frac{\alpha ^k}{\delta } |d^k_i| < \tau \), for \(k \in K\), k sufficiently large. Then,

    $$\begin{aligned} l_i< \tilde{x}^k_i + \frac{\alpha ^k}{\delta }d^k_i < u_i, \qquad k \in K, \, k \text { sufficiently large}, \end{aligned}$$

    which implies, from (71), that

    $$\begin{aligned} y^k_i = 0, \qquad k \in K, \, k \text { sufficiently large}. \end{aligned}$$
    (74)
  2. (ii)

    \(i \in \hat{N}\) such that \(\bar{x}_i = l_i\). First, we show that

    $$\begin{aligned}&g_i(\bar{x}) \le 0, \end{aligned}$$
    (75)
    $$\begin{aligned}&y^k_i \le 0, \qquad k \in K, \, k \text { sufficiently large}. \end{aligned}$$
    (76)

    To show (75), we assume by contradiction that \(g_i(\bar{x}) > 0\). From (12) and recalling that \(\Vert \tilde{x}^k - x^k\Vert \) converges to zero from (46) of Lemma C.3, it follows that

    $$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} \lambda _i(\tilde{x}^k) = \lim _{k\rightarrow \infty , \, k\in K} g_i(\tilde{x}^k) = g_i(\bar{x}) > 0. \end{aligned}$$

    Then, there exist an iteration index \(\hat{k}\) and a scalar \(\xi > 0\) such that \(\lambda _i(\tilde{x}^k) \ge \xi > 0\), for all \(k \ge \hat{k}, \, k \in K\). As \(\{\tilde{x}^k_i\}\) converges to \(l_i\), there also exists \(\tilde{k} \ge \hat{k}\) such that

    $$\begin{aligned} l_i \le \tilde{x}^k_i \le l_i + \epsilon \xi \le l_i + \epsilon \lambda _i(\tilde{x}^k),&\qquad k \in K, \, k \ge \tilde{k}, \\ g_i(\tilde{x}^k) > 0,&\qquad k \in K, \, k \ge \tilde{k}, \end{aligned}$$

    which contradicts the fact that \(i \in N(\tilde{x}^k)\) for k sufficiently large. To show (76), we observe that since \(\{\tilde{x}^k_i\}\) converges to \(l_i\), there exists \(\tau \in ]0,u_i-l_i]\) such that

    $$\begin{aligned} l_i \le \tilde{x}^k_i \le u_i-\tau , \qquad k \in K, \, k \text { sufficiently large}. \end{aligned}$$

    Moreover, since \(\{\alpha ^k\}\) converges to zero and \(\{d^k\}\) is bounded, it follows that \(\frac{\alpha ^k}{\delta } d^k_i \le \tau \), for \(k \in K\), k sufficiently large. Then,

      $$\begin{aligned} \tilde{x}^k_i + \frac{\alpha ^k}{\delta } d^k_i \le u_i, \qquad k \in K, \, k \text { sufficiently large}. \end{aligned}$$

    The above relation, combined with (71), proves (76). Now, we distinguish two subcases, depending on the sign of \(d^k_i\):

    • for every subsequence \(\bar{K} \subseteq K\) such that \(d^k_i \ge 0\), from (72) it follows that \(y^k_i \ge 0\). Consequently, from (76) we can write

      $$\begin{aligned} y^k_i = 0, \qquad k \in \bar{K}, \, k \text { sufficiently large}. \end{aligned}$$
      (77)
    • for every subsequence \(\bar{K} \subseteq K\) such that \(d^k_i < 0\), we have two further possible situations, according to (75):

      1. (a)

        \(g_i(\bar{x}) < 0\). Since \(\{z^k\}\) converges to \(\bar{x}\), we have \(g_i(z^k) \le 0\) for \(k \in \bar{K}\), k sufficiently large. From (76), we obtain

        $$\begin{aligned} \frac{\delta }{\alpha ^k}g_i(z^k)y^k_i \ge 0, \qquad k \in \bar{K}, \, k \text { sufficiently large}. \end{aligned}$$
        (78)
      2. (b)

        \(g_i(\bar{x}) = 0\). From (73), we get

        $$\begin{aligned} \frac{\delta }{\alpha ^k} |g_i(z^k)y^k_i| \le \frac{\delta }{\alpha ^k} |g_i(z^k)| |y^k_i| \le |g_i(z^k)| |d^k_i|. \end{aligned}$$

        Since \(\{d^k\}\) is bounded, \(\{z^k\}\) converges to \(\bar{x}\), and \(g_i(\bar{x}) = 0\), from the continuity of the gradient we get

        $$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in \bar{K}} \frac{\delta }{\alpha ^k} g_i(z^k)y^k_i = 0. \end{aligned}$$
        (79)
  3. (iii)

    \(i \in \hat{N}\) such that \(\bar{x}_i = u_i\). Reasoning as in the previous case, we obtain

    $$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} \frac{\delta }{\alpha ^k} g_i(z^k)y^k_i \ge 0. \end{aligned}$$
    (80)

Finally, from (74), (77), (78), (79) and (80), we have

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K} \frac{\delta }{\alpha ^k} g(z^k)^\mathrm{T}y^k \ge 0, \end{aligned}$$
(81)

and, from (61), (69), (70), (80) and (81), we obtain

$$\begin{aligned} -\eta&= \lim _{\begin{array}{c} k\rightarrow \infty \\ k\in K \end{array}} g(\tilde{x}^k)^\mathrm{T} d^k = \lim _{\begin{array}{c} k\rightarrow \infty \\ k\in K \end{array}} g(z^k)^\mathrm{T} d^k \ge \lim _{\begin{array}{c} k\rightarrow \infty \\ k\in K \end{array}} g(z^k)^\mathrm{T} d^k - \lim _{\begin{array}{c} k\rightarrow \infty \\ k\in K \end{array}} \frac{\delta }{\alpha ^k} g(z^k)^\mathrm{T}y^k \\&\ge \lim _{\begin{array}{c} k\rightarrow \infty \\ k\in K \end{array}} \gamma g(\tilde{x}^k)^\mathrm{T}d^k = -\gamma \eta . \end{aligned}$$

This contradicts the fact that we set \(\gamma < 1\) in ASA-BCP. \(\square \)

Now, we can prove Theorem 4.1.

Proof

(Theorem 4.1) Let \(x^*\) be any limit point of the sequence \(\{x^k\}\), and let \(\{x^k\}_K \) be the subsequence converging to \(x^*\). From (46) of Lemma C.3 we can write

$$\begin{aligned} \lim _{k\rightarrow \infty , \, k\in K}\tilde{x}^{k}=x^*, \end{aligned}$$
(82)

and, thanks to the fact that \(A_l(x^k)\), \(A_u(x^k)\) and \(N(x^k)\) are subsets of a finite set of indices, we can define a further subsequence \(\hat{K} \subseteq K\) such that

$$\begin{aligned} N^k := \hat{N}, \quad A_l^k := \hat{A}_l, \quad A_u^k := \hat{A}_u, \end{aligned}$$

for all \(k \in \hat{K}\). Recalling Proposition 3.2, we define the following function that measures the violation of the optimality conditions for feasible points:

$$\begin{aligned} \phi (x_i) = \min \left\{ \max \{l_i-x_i, -g_i(x)\}^2, \max \{x_i-u_i, g_i(x)\}^2 \right\} . \end{aligned}$$

By contradiction, we assume that \(x^*\) is a non-stationary point for problem (1). Then, there exists an index i such that \(\phi (x^*_i) > 0\). From (82) and the continuity of \(\phi \), there exists an index \(\tilde{k}\) such that

$$\begin{aligned} \phi (\tilde{x}^k_i) \ge \varDelta > 0, \quad \forall \, k \ge \tilde{k}, \ k \in \hat{K}. \end{aligned}$$
(83)

Now, we consider three cases:

  1. (i)

    \(i \in \hat{A}_l\). Then, \(\tilde{x}^k_i = l_i\). From (12) and (9), we get \(g_i(x^k) > 0, \, \forall k \in \hat{K}\). By continuity of the gradient, and since both \(\{\tilde{x}^k\}_{\hat{K}}\) and \(\{x^k\}_{\hat{K}}\) converge to \(x^*\), we obtain

    $$\begin{aligned} g_i(\tilde{x}^k) \ge -\frac{\varDelta }{2}, \end{aligned}$$

    for \(k \in \hat{K}\), k sufficiently large. Then, since \(\varDelta \) can be taken smaller than 4 without loss of generality, we have \(\phi (\tilde{x}^k_i) \le \frac{\varDelta ^2}{4} < \varDelta \) for \(k \in \hat{K}\), k sufficiently large. This contradicts (83).

  2. (ii)

    \(i \in \hat{A}_u\). Then, \(\tilde{x}^k_i = u_i\). The proof of this case is a verbatim repetition of the previous case.

  3. (iii)

    \(i \in \hat{N}\). Since \(\phi (x_i^*) > 0\), we have \(g_i(x^*) \ne 0\). From Theorem C.1, we have

    $$\begin{aligned} \lim _{k\rightarrow \infty ,\, k\in \hat{K}} g(\tilde{x}^k)^\mathrm{T} d^k = 0. \end{aligned}$$

    From (21), it follows that

    $$\begin{aligned} \lim _{k\rightarrow \infty ,\, k\in \hat{K}} \Vert g_{\hat{N}}(\tilde{x}^k)\Vert = \Vert g_{\hat{N}}(x^*)\Vert = 0, \end{aligned}$$

    leading to a contradiction.\(\square \)
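The violation measure \(\phi \) introduced in the proof above admits a direct componentwise implementation; the following Python sketch (ours; note that, despite the notation \(\phi (x_i)\), the measure also depends on \(g_i(x)\)) can be used, e.g., as a stopping test.

import numpy as np

def phi(x, g, l, u):
    # Componentwise violation of the optimality conditions at a feasible x;
    # x is stationary iff all components vanish.
    return np.minimum(np.maximum(l - x, -g)**2, np.maximum(x - u, g)**2)

# Example: f(x) = 0.5*||x||^2 over [1,2] x [-1,1].
g = lambda x: x
l = np.array([1.0, -1.0]); u = np.array([2.0, 1.0])
print(phi(np.array([1.0, 0.0]), g(np.array([1.0, 0.0])), l, u))   # [0. 0.]  (stationary)
print(phi(np.array([1.5, 0.5]), g(np.array([1.5, 0.5])), l, u))   # [0.25 0.25]  (not stationary)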

In order to prove Theorem 4.2, we need a further lemma.

Lemma C.4

Let Assumption 3.1 hold and assume that \(\{x^k\}\) is an infinite sequence generated by ASA-BCP. Then, there exists an iteration index \(\bar{k}\) such that \(N(x^k) \ne \emptyset \) for all \(k \ge \bar{k}\).

Proof

By contradiction, we assume that there exists an infinite index subset \(\bar{K} \subseteq \{1,2,\dots \}\) such that \(N(x^k) = \emptyset \) for all \(k \in \bar{K}\). Let \(x^*\) be a limit point of \(\{x^k\}_{\bar{K}}\), that is,

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x^k = x^*, \end{aligned}$$

where \(K \subseteq \bar{K}\). Theorem 4.1 ensures that \(x^*\) is a stationary point. From (46) of Lemma C.3, we can write

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x^k = \lim _{k \rightarrow \infty , \, k \in K} \tilde{x}^k = x^*. \end{aligned}$$

Moreover, from Proposition 3.1, there exists an index \(\hat{k}\) such that

$$\begin{aligned} \{i :x^*_i= & {} l_i, \lambda ^*_i > 0 \} \subseteq A_l(x^k) \subseteq \{i :x^*_i = l_i\}, \quad \forall \ k \ge \hat{k}, \quad k \in K, \end{aligned}$$
(84)
$$\begin{aligned} \{i :x^*_i= & {} u_i, \mu ^*_i > 0 \} \subseteq A_u(x^k) \subseteq \{i :x^*_i = u_i\}, \quad \forall \ k \ge \hat{k}, \quad k \in K. \end{aligned}$$
(85)

Let \(\tilde{k}\) be the smallest integer such that \(\tilde{k} \ge \hat{k}\) and \(\tilde{k} \in K\). From (84) and (85), we can write

$$\begin{aligned} \tilde{x}^{\tilde{k}}_i = l_i = x^*_i, \quad&\text { if } i \in A_l(x^{\tilde{k}}), \\ \tilde{x}^{\tilde{k}}_i = u_i = x^*_i, \quad&\text { if } i \in A_u(x^{\tilde{k}}). \end{aligned}$$

Since \(N(x^k)\) is empty for all \(k \in K\), we also have

$$\begin{aligned} A_l(x^k) \cup A_u(x^k) = \{1,\dots ,n\}, \quad \forall \ k \in K. \end{aligned}$$

Consequently, \(\tilde{x}^{\tilde{k}} = x^*\); since \(x^*\) is a stationary point, the algorithm would then terminate at iteration \(\tilde{k}\), contradicting the hypothesis that the sequence \(\{x^k\}\) is infinite. \(\square \)

Now, we can finally prove Theorem 4.2.

Proof

(Theorem 4.2) From Proposition 3.1, exploiting the fact that the sequence \(\{x^k\}\) converges to \(x^*\) and that strict complementarity holds, we have that, for sufficiently large k,

$$\begin{aligned}&N(x^k) = N(\tilde{x}^k) = N^*, \\&A_l(x^k) = A_l(\tilde{x}^k) = \{i :x^*_i = l_i\}, \\&A_u(x^k) = A_u(\tilde{x}^k) = \{i :x^*_i = u_i\}. \end{aligned}$$

From the instructions of the algorithm, it follows that \(\tilde{x}^k = x^k\) for sufficiently large k, and then the minimization is restricted to \(N(\tilde{x}^k)\). From Lemma C.4, we have that \(N(\tilde{x}^k) = N(x^k) = N^* \ne \emptyset \) for sufficiently large k. Furthermore, from (27), we have that \(d_{N(\tilde{x}^k)}^k\) is a truncated-Newton direction, and then the assertion follows from standard results on unconstrained minimization. \(\square \)
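For completeness, here is a minimal conjugate-gradient sketch (ours; a Dembo–Steihaug-type truncation, cf. [22]) of how a truncated-Newton direction on the variables estimated non-active can be computed; the actual rule (27) used in ASA-BCP may differ, and the truncation tolerance and the negative-curvature safeguard below are simplifying choices.

import numpy as np

def truncated_newton_direction(g_N, hess_vec, tol_factor=0.5, max_cg_iter=50):
    # Approximately solve H_{NN} d = -g_N by CG, stopping when the residual is small
    # or nonpositive curvature is detected (then fall back to steepest descent).
    d = np.zeros_like(g_N)
    r = -g_N.copy()
    p = r.copy()
    tol = tol_factor * np.linalg.norm(g_N)
    for _ in range(max_cg_iter):
        Hp = hess_vec(p)
        curv = p @ Hp
        if curv <= 1e-12 * (p @ p):
            return d if np.any(d) else -g_N
        step = (r @ r) / curv
        d = d + step * p
        r_new = r - step * Hp
        if np.linalg.norm(r_new) <= tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

# Toy usage: H = diag(2,1), g_N = (1,-2); the exact Newton direction is (-0.5, 2).
H = np.diag([2.0, 1.0])
print(truncated_newton_direction(np.array([1.0, -2.0]), lambda v: H @ v))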

Cite this article

Cristofari, A., De Santis, M., Lucidi, S. et al. A Two-Stage Active-Set Algorithm for Bound-Constrained Optimization. J Optim Theory Appl 172, 369–401 (2017). https://doi.org/10.1007/s10957-016-1024-9