1 Introduction

In this paper, we are interested in the solution of smooth constrained optimization problems of the type:

$$\begin{aligned} \begin{array}{ll} \min &{} \quad f(x)\\ \text {s.t.} &{} \quad h(x) = 0 \\ &{} \quad \ell \le x \le u, \end{array} \end{aligned}$$
(1)

where \(x,\ell ,u\in \Re ^n\), \(\ell _i < u_i\) for all \(i=1,\dots ,n\), and \(f :\Re ^n\rightarrow \Re \), \(h :\Re ^n\rightarrow \Re ^p\) are twice continuously differentiable functions. Note that the structure of Problem (1) is general enough to capture, through reformulation, problems with nonlinear inequality constraints as well. Problem (1) has been studied for decades, and many optimization methods have been proposed for its solution. Solution algorithms for (1) belong to different classes, such as sequential penalty methods [18], augmented Lagrangian methods [4] and sequential quadratic programming [21].
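
For instance, the reformulation mentioned above can be obtained by adding a bounded slack variable to each inequality constraint; a standard device (the finite upper bound \(u_s\) is needed to fit the box structure of (1) and must be chosen large enough, which is an assumption on the problem at hand) is

$$\begin{aligned} g(x) \le 0 \qquad \longrightarrow \qquad g(x) + s = 0, \quad 0 \le s \le u_s. \end{aligned}$$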

Among the algorithms based on augmented Lagrangian functions, the one implemented in the ALGENCAN [2, 3] software package is one of the most recent and most efficient. The computationally heavy part of ALGENCAN is the solution (at every outer iteration) of the subproblem, i.e., the minimization of the augmented Lagrangian merit function for given values of the penalty parameter and of the estimated Lagrange multipliers. This minimization is carried out by the inner solver GENCAN [5].

It is worth noticing that, besides the above methods, efficient local algorithms have been proposed in the literature that exploit second-order information to define superlinearly convergent Newton-like methods [4, 13, 16]. The so-called acceleration strategy of ALGENCAN is an attempt to exploit second-order information, by means of such locally convergent methods, to improve the convergence rate of the overall algorithm.

The idea that we develop in this paper is twofold. On the one hand, we propose an alternative and possibly more extensive way to use second-order information within the framework of an augmented Lagrangian algorithm. Basically, we propose a Newton-type direction to be used even when potentially far away from solution points. The use of such a Newton direction is combined with an appropriate active-set strategy. In particular, after estimating the active and non-active variables with respect to the bound constraints, we compute the Newton direction with respect to only the variables estimated as non-active, while the ones estimated as active are set to their bounds.

On the other hand, when the Newton-type direction cannot be computed or does not satisfy a proper condition, we propose to resort to the minimization of the augmented Lagrangian function, but using an efficient active-set method for bound-constrained problems [11].

The paper is organized as follows. In Sect. 2, we report some preliminary results that will be useful in the paper. In Sect. 3, we describe the procedure to compute the Newton-type direction and we study its theoretical properties. Section 4 is devoted to the description of the proposed augmented Lagrangian algorithm and to its convergence analysis. In Sect. 5, we are concerned with the analysis of the convergence rate of the proposed method. In Sect. 6, we report some numerical experiments and a comparison with existing software. Finally, in Sect. 7 we draw some conclusions.

2 Notation and Preliminary Results

Given a vector \(x \in \mathbb R^n\), we denote by \(x_i\) its ith entry and, given an index set \(T \subseteq \{1,\ldots ,n\}\), we denote by \(x_T\) the subvector obtained from x by discarding the components not belonging to T. The gradient of a function f(x) is denoted by \(\nabla f(x)\), while the Hessian matrix is denoted by \(\nabla ^2 f(x)\). We indicate by \(\nabla _{x_i} f(x)\) the ith entry of \(\nabla f(x)\). The Euclidean norm of a vector x is indicated by \(\Vert x\Vert \), while \(\Vert x\Vert _{\infty }\) denotes the sup-norm of x. Given a matrix M, we indicate by \(\Vert M\Vert \) the matrix norm induced by the Euclidean vector norm. The projection of a vector x onto a box \([a,b]\) is denoted by \(\mathcal{P}_{[a,b]}(x)\). The ith column of the identity matrix is indicated by \(e_i\).
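
Since the projection onto a box is used repeatedly in what follows, we sketch it here in code. This and the later snippets are illustrative Python sketches only; all names and signatures are ours, not part of any software referenced in the paper.

```python
import numpy as np

def proj_box(x, l, u):
    """Componentwise projection P_{[l,u]}(x) of x onto the box [l, u]."""
    return np.minimum(np.maximum(x, l), u)
```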

With reference to Problem (1), we define the Lagrangian function \(L(x,\mu )\) with respect to the equality constraints as follows:

$$\begin{aligned} L(x,\mu ) := f(x)+\mu ^T h(x), \end{aligned}$$

where \(\mu \in \Re ^p\) is the Lagrange multiplier.

Denoting the gradient of \(L(x,\mu )\) with respect to x as \(\nabla _x L(x,\mu ) = \nabla f(x) + \nabla h(x) \mu \), we say that \((x,\mu ,\sigma ,\rho )\in \Re ^{3n+p}\) is a KKT tuple for Problem (1) if

$$\begin{aligned}&\nabla _x L(x,\mu ) = \sigma - \rho , \end{aligned}$$
(2a)
$$\begin{aligned}&h(x) = 0, \end{aligned}$$
(2b)
$$\begin{aligned}&\sigma ^T (\ell - x) = 0, \end{aligned}$$
(2c)
$$\begin{aligned}&\rho ^T (x-u) = 0, \end{aligned}$$
(2d)
$$\begin{aligned}&\ell - x \le 0,\quad \sigma \ge 0, \end{aligned}$$
(2e)
$$\begin{aligned}&x - u \le 0, \quad \rho \ge 0. \end{aligned}$$
(2f)

If \(x^*\) is a local minimum of Problem (1) that satisfies some constraint qualification, then there exist KKT multipliers \(\mu ^*,\sigma ^*,\rho ^*\) such that \((x^*,\mu ^*,\sigma ^*,\rho ^*)\) is a KKT tuple. Note that the KKT conditions (2) can be rewritten as follows:

$$\begin{aligned} \nabla _{x_i} L(x,\mu )&{\left\{ \begin{array}{ll} = 0, \quad &{} \text {if } \ell _i< x_i < u_i, \\ \ge 0, \quad &{} \text {if } x_i = \ell _i, \\ \le 0, \quad &{} \text {if } x_i = u_i, \end{array}\right. } \end{aligned}$$
(3a)
$$\begin{aligned} h(x)&= 0. \end{aligned}$$
(3b)

For a KKT tuple \((x^*,\mu ^*,\sigma ^*,\rho ^*)\), we say that strict complementarity holds if \(x^*_i = \ell _i \Rightarrow \sigma ^*_i > 0\) and \(x^*_i = u_i \Rightarrow \rho ^*_i > 0\), that is, \(x^*_i = \ell _i \Rightarrow \nabla _{x_i} L(x^*,\mu ^*) > 0\) and \(x^*_i = u_i \Rightarrow \nabla _{x_i} L(x^*,\mu ^*) < 0\).

Now, let us define the multiplier functions \(\sigma (x,\mu )\) and \(\rho (x,\mu )\), which provide estimates of the KKT multipliers \(\sigma \) and \(\rho \), respectively, associated with the box constraints of Problem (1). Following the same approach used in [11, 12] for bound-constrained problems, we can first express \(\sigma (x,\mu ) = \nabla _x L(x,\mu ) + \rho (x,\mu )\) from (2a), and then compute \(\rho (x,\mu )\) by minimizing the error over (2c)–(2d) (see [12] for more details), obtaining

$$\begin{aligned} \sigma _i(x,\mu )&:= \frac{(u_i-x_i)^2}{(\ell _i-x_i)^2 + (u_i-x_i)^2} \nabla _{x_i}L(x,\mu ), \quad i = 1,\ldots ,n, \end{aligned}$$
(4)
$$\begin{aligned} \rho _i(x,\mu )&:= -\frac{(\ell _i-x_i)^2}{(\ell _i-x_i)^2 + (u_i-x_i)^2} \nabla _{x_i}L(x,\mu ), \quad i = 1,\ldots ,n. \end{aligned}$$
(5)

These multiplier functions will be employed later for defining an active-set strategy to be used in the proposed algorithm.
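
A minimal sketch of the multiplier functions (4)–(5) follows, assuming grad_L holds the vector \(\nabla _x L(x,\mu )\) (a hypothetical input of our choosing):

```python
import numpy as np

def multiplier_estimates(x, l, u, grad_L):
    """Multiplier functions (4)-(5) associated with the box constraints."""
    dl, du = l - x, u - x          # l_i - x_i and u_i - x_i
    denom = dl ** 2 + du ** 2      # strictly positive, since l_i < u_i
    sigma = (du ** 2 / denom) * grad_L    # estimate of sigma, see (4)
    rho = -(dl ** 2 / denom) * grad_L     # estimate of rho, see (5)
    return sigma, rho
```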

Moreover, now we can say that \((x^*,\mu ^*)\in \Re ^{n+p}\) is a KKT pair for Problem (1) when \((x^*,\mu ^*,\sigma (x^*,\mu ^*),\rho (x^*,\mu ^*))\) is a KKT tuple.

2.1 The Augmented Lagrangian Method

The algorithm we propose here builds upon the augmented Lagrangian method described in [3], where an augmented Lagrangian function is defined with respect to a subset of constraints and iteratively minimized over x subject to the remaining constraints. In our case, we define the augmented Lagrangian function for Problem (1) with respect to the equality constraints as

$$\begin{aligned} L_a(x,\mu ;\epsilon ) := L(x,\mu ) + \frac{1}{\epsilon }\Vert h(x)\Vert ^2, \end{aligned}$$

where \(\epsilon > 0\) is a parameter that penalizes violation of the equality constraints. Given an estimate \((x_k,\bar{\mu }_k)\) of a KKT pair and a value \(\epsilon _k\) for the penalty parameter, the new iterate \(x_{k+1}\) can thus be computed by approximately solving the following bound-constrained subproblem:

$$\begin{aligned} \begin{array}{ll} \min &{} \quad L_a(x,\bar{\mu }_k;\epsilon _k)\\ \text {s.t.} &{} \quad \ell \le x \le u. \end{array} \end{aligned}$$
(6)
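
For concreteness, here is a sketch of \(L_a\) and of its gradient with respect to x, assuming callables f, grad_f, h and jac_h are available (jac_h returning the \(p \times n\) Jacobian of h; these signatures are our assumptions):

```python
import numpy as np

def aug_lagrangian(x, mu, eps, f, h):
    """L_a(x, mu; eps) = f(x) + mu^T h(x) + (1/eps) * ||h(x)||^2."""
    hx = h(x)
    return f(x) + mu @ hx + (hx @ hx) / eps

def grad_aug_lagrangian(x, mu, eps, grad_f, h, jac_h):
    """grad_x L_a = grad f(x) + jac_h(x)^T (mu + (2/eps) h(x))."""
    return grad_f(x) + jac_h(x).T @ (mu + (2.0 / eps) * h(x))
```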

Then, according to [3], we can set

$$\begin{aligned} \mu _{k+1} = \bar{\mu }_k + \frac{2}{\epsilon _k}h(x_{k+1}) \end{aligned}$$
(7)

and update the Lagrange multiplier \(\bar{\mu }_{k+1}\) by projecting \((\mu _{k+1})_i\) onto a suitable interval \([\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\), \(i = 1,\ldots ,p\), that is,

$$\begin{aligned} (\bar{\mu }_{k+1})_i = \max \{\bar{\mu }_{\text {min}},\min \{(\mu _{k+1})_i,\bar{\mu }_{\text {max}}\}\}, \quad i=1,\ldots ,p. \end{aligned}$$
(8)

Finally, we decrease the penalty parameter, setting \(\epsilon _{k+1} < \epsilon _k\), if the constraint violation has not been sufficiently reduced, and we start a new iteration. We can summarize the method proposed in [3] in the following scheme.

[Algorithmic scheme: Augmented Lagrangian Method (see [3])]
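
In code, a simplified sketch of this outer loop could look as follows, where solve_subproblem stands for the inner bound-constrained solver (GENCAN in ALGENCAN) and all constants are illustrative defaults of ours:

```python
import numpy as np

def alm(x, mu_bar, h, solve_subproblem, eps=1.0, eta=0.5, gamma=0.1,
        mu_min=-1e20, mu_max=1e20, max_outer_it=100):
    """Simplified augmented Lagrangian scheme in the spirit of [3]."""
    h_norm_old = np.linalg.norm(h(x), np.inf)
    for k in range(max_outer_it):
        # approximately solve the bound-constrained subproblem (6)
        x = solve_subproblem(x, mu_bar, eps)
        # first-order multiplier update (7), then safeguarding projection (8)
        mu_bar = np.clip(mu_bar + (2.0 / eps) * h(x), mu_min, mu_max)
        # decrease the penalty parameter if feasibility did not improve enough
        h_norm = np.linalg.norm(h(x), np.inf)
        if h_norm > eta * h_norm_old:
            eps *= gamma
        h_norm_old = h_norm
    return x, mu_bar
```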

In the next section, we will describe how to incorporate the use of a proper second-order direction into this augmented Lagrangian framework.

3 Direction Computation

In this section, we introduce and analyze the procedure for computing a second-order direction, employing a proper active-set estimate.

3.1 Active-Set Estimate

Taking inspiration from the strategy proposed in [16], for any \(x \in [\ell ,u]\) and any \(\mu \in \mathbb R^p\), we can estimate the active constraints in a KKT point by the following sets:

$$\begin{aligned}&\mathcal{L}(x,\mu ) := \{i :\nabla _{x_i} L(x,\mu ) > 0,\ \ell _i \le x_i \le \ell _i+ \nu \sigma _i(x,\mu )\}, \end{aligned}$$
(9)
$$\begin{aligned}&\mathcal{U}(x,\mu ) := \{i :\nabla _{x_i} L(x,\mu ) < 0,\ u_i - \nu \rho _i(x,\mu ) \le x_i \le u_i\}, \end{aligned}$$
(10)

where \(\nu >0\) is a given parameter and the multiplier functions \(\sigma (x,\mu )\), \(\rho (x,\mu )\) are defined in (4) and (5), respectively.

In particular, at a given pair \((x,\mu )\), the sets \(\mathcal{L}(x,\mu )\) and \(\mathcal{U}(x,\mu )\) contain the indices of the variables that are estimated to be active at the lower bound \(\ell _i\) and at the upper bound \(u_i\), respectively, in a KKT point. As will be shown later, at each iteration of the proposed algorithm, these sets are used to compute a Newton direction with respect to only the variables that are estimated as non-active, while the variables estimated as active are set to their bounds.
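
A sketch of the estimate (9)–(10) is reported below, reusing multiplier_estimates from the sketch in Sect. 2; since \(x \in [\ell ,u]\), the inequalities \(\ell _i \le x_i\) and \(x_i \le u_i\) hold automatically and are omitted:

```python
import numpy as np

def active_set_estimate(x, l, u, grad_L, nu=1e-6):
    """Index sets L(x, mu) and U(x, mu) of (9)-(10)."""
    sigma, rho = multiplier_estimates(x, l, u, grad_L)   # see (4)-(5)
    lower = np.where((grad_L > 0) & (x <= l + nu * sigma))[0]
    upper = np.where((grad_L < 0) & (x >= u - nu * rho))[0]
    return lower, upper
```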

Using results from [16], the following identification property of the active-set estimate (9)–(10) holds.

Proposition 3.1

If \((x^*,\mu ^*,\sigma ^*,\rho ^*)\) satisfies the KKT conditions 2, then there exists a neighborhood of \((x^*,\mu ^*)\) such that, for each \((x,\mu )\) in this neighborhood, we have

$$\begin{aligned} \{i :x^*_i = \ell _i, \, \sigma ^*_i> 0 \} \subseteq \mathcal{L}(x,\mu ) \subseteq \{i :x^*_i = \ell _i\}, \\ \{i :x^*_i = u_i, \, \rho ^*_i > 0 \} \subseteq \mathcal{U}(x,\mu ) \subseteq \{i :x^*_i = u_i\}. \end{aligned}$$

In particular, if the strict complementarity holds at \((x^*,\mu ^*,\sigma ^*,\rho ^*)\), for each \((x,\mu )\) in this neighborhood we have

$$\begin{aligned} \mathcal{L}(x,\mu ) = \{i :x^*_i = \ell _i\} \quad \text {and} \quad \mathcal{U}(x,\mu ) = \{i :x^*_i = u_i\}. \end{aligned}$$

The result stated in the above proposition holds for an unknown neighborhood of the optimal solution. It would be of great interest to characterize that neighborhood, in order to bound the maximum number of iterations required by the algorithm to identify the active set. Currently, this is an open problem, and we think it may represent a possible line of future research, for example by adapting the complexity results given for ALGENCAN in [7], or by extending some results on finite active-set identification given in the literature for specific classes of algorithms [8, 10, 22].

3.2 Step Computation

In the proposed algorithm, at the beginning of every iteration k, we have a point \(x_k \in [\ell ,u]\) and Lagrange multiplier estimates \((\bar{\mu }_k)_i \in [\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\), \(i=1,\ldots ,p\).

Using (9)–(10), we estimate the active and non-active set in \((x_k,\bar{\mu }_k)\). Denoting

$$\begin{aligned} \mathcal{L}_k := \mathcal{L}(x_k,\bar{\mu }_k), \quad \mathcal{U}_k := \mathcal{U}(x_k, \bar{\mu }_k), \quad \mathcal{B}_k := \mathcal{L}_k\cup \mathcal{U}_k, \quad \mathcal{N}_k := \{1,\ldots ,n\} \setminus \mathcal{B}_k, \end{aligned}$$
(11)

we can thus partition the vector \(x_k\) as \(x_k = (x_{\mathcal{B}_k}, x_{\mathcal{N}_k})\), reordering its entries if necessary. Let us also denote

$$\begin{aligned} L_k := L(x_k,\bar{\mu }_k), \quad \nabla _{\mathcal{N}_k} L_k := [\nabla _x L_k]_{\mathcal{N}_k}, \quad h_k := h(x_k), \quad \nabla _{\mathcal{N}_k} h_k := [\nabla h_k]_{\mathcal{N}_k}, \end{aligned}$$

while \(\nabla ^2_{xx} L_k\) denotes the Hessian matrix of \(L_k\) with respect to x, and \(\nabla ^2_{\mathcal{N}_k} L_k\) denotes the submatrix obtained from \(\nabla ^2_{xx} L_k\) by discarding the rows and columns not belonging to \(\mathcal{N}_k\).

Now, consider the following system of equations with unknowns \(x_{\mathcal{N}_k}\) and \(\mu \):

$$\begin{aligned}&\nabla _{\mathcal{N}_k} L(x_{\mathcal{N}_k}, x_{\mathcal{B}_k},\mu ) = 0, \end{aligned}$$
(12a)
$$\begin{aligned}&h(x_{\mathcal{N}_k}, x_{\mathcal{B}_k}) = 0. \end{aligned}$$
(12b)

The nonlinear system (12a)–(12b) can be solved iteratively by the Newton method, where the Newton direction is computed by solving the following linear system:

$$\begin{aligned} \begin{pmatrix}\nabla ^2_{\mathcal{N}_k} L_k &{} \nabla _{\mathcal{N}_k} h_k \\ \nabla _{\mathcal{N}_k} h_k^T &{} 0\end{pmatrix} \begin{pmatrix}d_{x_{\mathcal{N}_k}} \\ d_{\mu }\end{pmatrix} = -\begin{pmatrix}\nabla _{\mathcal{N}_k}L_k \\ h_k\end{pmatrix}. \end{aligned}$$
(13)

Hence, if a solution \((d_{x_{\mathcal{N}_k}},d_{\mu })\) of (13) exists, we can set

$$\begin{aligned} d_k = (d_{x_{\mathcal{N}_k}},d_{\mu }) \end{aligned}$$

and move from \(((x_k)_{\mathcal{N}_k},\bar{\mu }_k)\) along \(d_k\), then projecting \((x_k)_{\mathcal{N}_k} + d_{x_{\mathcal{N}_k}}\) onto the box \([\ell _{\mathcal{N}_k},u_{\mathcal{N}_k}]\). In particular, we define

$$\begin{aligned} (\tilde{x}_k)_{\mathcal{N}_k} = \mathcal{P}_{[\ell _{\mathcal{N}_k},u_{\mathcal{N}_k}]}((x_k)_{\mathcal{N}_k} + d_{x_{\mathcal{N}_k}}) \end{aligned}$$
(14)

and

$$\begin{aligned} \mu _{k+1} = \bar{\mu }_k + d_{\mu }. \end{aligned}$$

As for the variables \((x_k)_{\mathcal{B}_k}\), since they are estimated as active, we set them to the corresponding bounds. Namely, we define \((\tilde{x}_k)_{\mathcal{B}_k}\) as follows:

$$\begin{aligned} (\tilde{x}_k)_i = {\left\{ \begin{array}{ll} \ell _i, \quad &{} \text {if } i\in \mathcal{L}_k, \\ u_i, \quad &{} \text {if } i\in \mathcal{U}_k. \end{array}\right. } \end{aligned}$$
(15)
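
A sketch of the whole step computation (13)–(15) follows, assuming hess_L is \(\nabla ^2_{xx} L_k\) and jac_h the \(p \times n\) Jacobian of h at \(x_k\) (system (13) may have no solution, hence the exception handling; as before, names are ours):

```python
import numpy as np

def newton_step(x, mu_bar, l, u, non_active, lower, upper,
                hess_L, jac_h, h_val, grad_L):
    """Solve (13) on the non-active variables, then apply (14)-(15)."""
    N = non_active
    H_NN = hess_L[np.ix_(N, N)]                 # Hessian block on N_k
    A_N = jac_h[:, N]                           # constraint Jacobian on N_k
    p = A_N.shape[0]
    K = np.block([[H_NN, A_N.T], [A_N, np.zeros((p, p))]])
    try:
        sol = np.linalg.solve(K, -np.concatenate([grad_L[N], h_val]))
    except np.linalg.LinAlgError:               # (13) is singular
        return None
    d_x, d_mu = sol[:len(N)], sol[len(N):]
    x_tilde = x.copy()
    x_tilde[N] = np.clip(x[N] + d_x, l[N], u[N])          # projection (14)
    x_tilde[lower], x_tilde[upper] = l[lower], u[upper]   # bounds (15)
    return x_tilde, mu_bar + d_mu, sol
```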

The following result holds.

Proposition 3.2

If the solution \(d_k\) of system (13) exists, then \((x_k,\bar{\mu }_k,\sigma _k,\rho _k)\) is a KKT tuple with \(\sigma _k = \sigma (x_k,\bar{\mu }_k)\) and \(\rho _k = \rho (x_k,\bar{\mu }_k)\) if and only if \(d_k=0\) and \((\tilde{x}_k)_{\mathcal{B}_k} = (x_k)_{\mathcal{B}_k}\).

Proof

First, assume that \(d_k=0\) and \((\tilde{x}_k)_{\mathcal{B}_k} = (x_k)_{\mathcal{B}_k}\). From (13), we have

$$\begin{aligned} \nabla _{x_{\mathcal{N}_k}} L(x_k,\bar{\mu }_k) = 0 \quad \text {and} \quad h(x_k) = 0. \end{aligned}$$

Using the expression of \(\mathcal{L}(x_k,\bar{\mu }_k)\) and \(\mathcal{U}(x_k,\bar{\mu }_k)\) given in (9)–(10), and recalling the definition of \(\rho (x,\mu )\) and \(\sigma (x,\mu )\) given in (4)–(5), we also have

$$\begin{aligned} (\sigma _k)_i&= (\rho _k)_i = \nabla _{x_i} L(x_k,\bar{\mu }_k) = 0,&\forall i \in \mathcal{N}_k, \\ (x_k)_i&= (\tilde{x}_k)_i = \ell _i, \quad (\sigma _k)_i = \nabla _{x_i} L(x_k,\bar{\mu }_k)> 0, \quad (\rho _k)_i = 0,&\forall i \in \mathcal{L}_k, \\ (x_k)_i&= (\tilde{x}_k)_i = u_i, \quad (\sigma _k)_i = 0, \quad (\rho _k)_i = -\nabla _{x_i} L(x_k,\bar{\mu }_k) > 0,&\forall i \in \mathcal{U}_k. \end{aligned}$$

It follows that the KKT conditions (2) are satisfied.

Now, assume that \((x_k,\bar{\mu }_k,\sigma _k,\rho _k)\) is a KKT tuple. Since \(\nabla _{x_{\mathcal{N}_k}} L(x_k,\bar{\mu }_k) = 0\) and \(h(x_k) = 0\), from (13) we have \(d_k=0\). Finally, using the KKT conditions written as in (3), and recalling the definitions of \(\rho (x,\mu )\) and \(\sigma (x,\mu )\) given in (4)–(5), we also have \((x_k)_i = \ell _i = (\tilde{x}_k)_i\) for all \(i \in \mathcal{L}_k\) and \((x_k)_i = u_i = (\tilde{x}_k)_i\) for all \(i \in \mathcal{U}_k\). \(\square \)

4 The Algorithm

In this section, we use the above described active-set estimate and Newton strategy to design a primal-dual augmented Lagrangian method.

At the beginning of each iteration k, we have a pair \((x_k,\bar{\mu }_k)\). We first estimate the active set \(\mathcal{L}_k \cup \mathcal{U}_k\) and the non-active set \(\mathcal{N}_k\) as in (11). If possible, we calculate a direction \(d_k = (d_{x_{\mathcal{N}_k}},d_{\mu })\) by solving the Newton system (13) and we compute \(\tilde{x}_k\) as in (14)–(15). This point is accepted and set as \(x_{k+1}\) only if \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\), where \(\Delta _k\) is iteratively decreased through the iterations by a factor \(\beta \in (0,1)\).

If this is not the case, we compute \(x_{k+1}\) as an approximate minimizer of the bound-constrained subproblem (6), such that

$$\begin{aligned} \Vert x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))\Vert _{\infty } \le \tau _k, \end{aligned}$$
(16)

with \(\{\tau _k\} \rightarrow 0\). Then, we update the multiplier estimate \(\mu _{k+1}\) by (7) and decrease the penalty parameter \(\epsilon _{k+1}\) if the constraint violation is not sufficiently reduced.

We finally terminate the iteration by setting \(\bar{\mu }_{k+1}\) as the projection of \(\mu _{k+1}\) on a prefixed box, according to (8).

The proposed method, named Primal-Dual Augmented Lagrangian Method (P-D ALM), is reported in the following algorithmic scheme. As specified later (see Sect. 6), in the practical implementation of the algorithm we use a stricter test to accept the point \(\tilde{x}_k\), also requiring a decrease of the feasibility violation at the new point \(\tilde{x}_k\). For the sake of generality, the theoretical analysis is carried out by considering only the condition \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\).

[Algorithmic scheme: Primal-Dual Augmented Lagrangian Method (P-D ALM)]
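
Omitting the stopping test, a simplified sketch of the main loop is given below; oracles bundles the problem callbacks (h, grad_L, hess_L, jac_h), solve_subproblem is the inner solver for (6) (ASA-BCP in our implementation), and the snippet reuses active_set_estimate and newton_step from the earlier sketches (again, all names and defaults are ours):

```python
import numpy as np

def pd_alm(x, mu_bar, l, u, oracles, solve_subproblem, Delta=1.0, beta=0.9,
           eps=1.0, eta=0.5, gamma=0.1, mu_min=-1e20, mu_max=1e20, max_it=400):
    """Simplified sketch of P-D ALM (Sect. 4)."""
    n = len(x)
    for k in range(max_it):
        grad_L = oracles.grad_L(x, mu_bar)
        lower, upper = active_set_estimate(x, l, u, grad_L)
        N = np.setdiff1d(np.arange(n), np.union1d(lower, upper))
        step = newton_step(x, mu_bar, l, u, N, lower, upper,
                           oracles.hess_L(x, mu_bar), oracles.jac_h(x),
                           oracles.h(x), grad_L)
        accepted = False
        if step is not None:
            x_tilde, mu_new, d = step
            # acceptance test: ||(d_k, (x_tilde - x_k)_{B_k})|| <= Delta_k
            full = np.concatenate([d, (x_tilde - x)[lower], (x_tilde - x)[upper]])
            if np.linalg.norm(full) <= Delta:
                x, mu_bar = x_tilde, np.clip(mu_new, mu_min, mu_max)
                accepted = True
        if not accepted:
            # fallback: approximate minimization (16) of L_a over [l, u],
            # followed by the multiplier and penalty updates (7)-(8)
            h_old = np.linalg.norm(oracles.h(x), np.inf)
            x = solve_subproblem(x, mu_bar, eps)
            mu_bar = np.clip(mu_bar + (2.0 / eps) * oracles.h(x), mu_min, mu_max)
            if np.linalg.norm(oracles.h(x), np.inf) > eta * h_old:
                eps *= gamma
        Delta *= beta
    return x, mu_bar
```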

The next result shows that a KKT point is obtained, as a limit point, whenever we accept the Newton direction for an infinite number of iterations.

Proposition 4.1

Let \(\{x_k\}\) be a sequence generated by the Primal-Dual Augmented Lagrangian Method and let \(\{x_k\}_K\) be a subsequence such that \(\tilde{x}_k\) is accepted (i.e., \(d_k\) is computed and \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\)) for infinitely many iterations \(k \in K\) and

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x_{k+1} = x^*. \end{aligned}$$

Then, \(x^*\) is a KKT point.

Proof

Since \(\{\bar{\mu }_k\}\) is a bounded sequence and \(\mathcal{L}_k\), \(\mathcal{U}_k\), \(\mathcal{N}_k\) are subsets of a finite set of indices, without loss of generality we can assume that \(\lim _{k \rightarrow \infty , \, k \in K} \bar{\mu }_{k+1} = \mu ^*\), \(\mathcal{L}_k = \mathcal L\), \(\mathcal{U}_k = \mathcal U\) and \(\mathcal{N}_k = \mathcal N\) (passing to a further subsequence if necessary). Moreover, since \(d_k\) is accepted for infinitely many iterations \(k \in K\), without loss of generality we can also assume that \(d_k\) is accepted for all \(k \in K\) (passing again to a further subsequence if necessary).

Since the projection is non-expansive, for all \(k \in K\) we have

$$\begin{aligned} \Vert (x_{k+1},\bar{\mu }_{k+1}) - (x_k,\bar{\mu }_k)\Vert \le \Delta _k. \end{aligned}$$

Moreover, since \(\Delta _{k+1} = \beta \Delta _k\), with \(\beta \in (0,1)\), for all \(k \in K\),

$$\begin{aligned} \lim _{k \rightarrow \infty } \Delta _k = 0 \end{aligned}$$
(17)

and

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} \Vert x_{k+1}-x_k\Vert = 0. \end{aligned}$$

Then,

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} x_k = \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} x_{k+1} = x^*. \end{aligned}$$
(18)

Since \(\Vert d_k\Vert \le \Delta _k\) for all \(k \in K\), from (17) we also have that

$$\begin{aligned} \lim _{\begin{array}{c} k \rightarrow \infty \\ k \in K \end{array}} \Vert d_k\Vert = 0. \end{aligned}$$
(19)

Using again the fact that the Newton direction is accepted at every iteration \(k \in K\), we can write

$$\begin{aligned} \left\| \begin{pmatrix}\nabla _\mathcal{N}L_k \\ h_k\end{pmatrix}\right\| = \left\| \begin{pmatrix}\nabla ^2_\mathcal{N} L_k &{} \nabla _\mathcal{N} h_k \\ \nabla _\mathcal{N} h_k^T &{} 0\end{pmatrix} \begin{pmatrix}d_{\mathcal{N}} \\ d_{\mu }\end{pmatrix}\right\| \le \left\| \begin{pmatrix}\nabla ^2_\mathcal{N} L_k &{} \nabla _\mathcal{N} h_k \\ \nabla _\mathcal{N} h_k^T &{} 0\end{pmatrix}\right\| \left\| \begin{pmatrix}d_{\mathcal{N}} \\ d_{\mu }\end{pmatrix}\right\| . \end{aligned}$$
(20)

Taking the limits for \(k \rightarrow \infty \), \(k \in K\), and using (18), we have

$$\begin{aligned} \begin{aligned}&\lim _{\begin{array}{c} k \rightarrow \infty , \\ k \in K \end{array}} \nabla _\mathcal{N}L_k = \lim _{\begin{array}{c} k \rightarrow \infty , \\ k \in K \end{array}} \nabla _\mathcal{N}L_{k+1} = \nabla _\mathcal{N}L(x^*,\mu ^*) \quad \text {and} \\&\lim _{\begin{array}{c} k \rightarrow \infty , \\ k \in K \end{array}} h_k = \lim _{\begin{array}{c} k \rightarrow \infty , \\ k \in K \end{array}} h_{k+1} = h(x^*). \end{aligned} \end{aligned}$$

Taking into account (19) and (20), we can write

$$\begin{aligned} \nabla _\mathcal{N} L(x^*,\mu ^*) = 0 \quad \text {and} \quad h(x^*) = 0. \end{aligned}$$

To conclude the proof, we have to show that the KKT conditions are satisfied with respect to \(\nabla _{x_\mathcal{L}} L(x^*,\mu ^*)\) and \(\nabla _{x_\mathcal{U}} L(x^*,\mu ^*)\) as well. From the instructions of the algorithm, \((x_{k+1})_\mathcal{L} = (\tilde{x}_k)_\mathcal{L} = \ell _\mathcal{L}\) and \((x_{k+1})_\mathcal{U} = (\tilde{x}_k)_\mathcal{U} = u_\mathcal{U}\) for all \(k \in K\). Consequently,

$$\begin{aligned} x^*_i = {\left\{ \begin{array}{ll} \ell _i, &{} \quad \text {if } i \in \mathcal L, \\ u_i, &{} \quad \text {if } i \in \mathcal U. \end{array}\right. } \end{aligned}$$

So, using (3), the KKT conditions with respect to \(\nabla _{x_\mathcal{L}} L(x^*,\mu ^*)\) and \(\nabla _{x_\mathcal{U}} L(x^*,\mu ^*)\) hold if and only if

$$\begin{aligned} \nabla _{x_i} L(x^*,\mu ^*) {\left\{ \begin{array}{ll} \ge 0, &{} \quad \text {if } i \in \mathcal L, \\ \le 0, &{} \quad \text {if } i \in \mathcal U. \end{array}\right. } \end{aligned}$$
(21)

For any index \(i\in \mathcal L\), from the active-set estimate (9) we have \(0 \ge (d_k)_i = \ell _i - (x_k)_i \ge -\nu \sigma _i(x_k,\bar{\mu }_k)\) and, using the definition of \(\sigma _i(x,\mu )\) given in (4), we get

$$\begin{aligned} \nabla _{x_i} L_k \ge -\frac{(\ell _i-(x_k)_i)^2 + (u_i-(x_k)_i)^2}{\nu (u_i-(x_k)_i)^2} \, (d_k)_i. \end{aligned}$$

Similarly, for any index \(i\in \mathcal U\) we have \(0 \le (d_k)_i = u_i - (x_k)_i \le \nu \rho _i(x_k,\bar{\mu }_k)\) and then

$$\begin{aligned} \nabla _{x_i} L_k \le -\frac{(\ell _i-(x_k)_i)^2 + (u_i-(x_k)_i)^2}{\nu (\ell _i-(x_k)_i)^2} \, (d_k)_i. \end{aligned}$$

Taking the limits for \(k \rightarrow \infty \), \(k \in K\), and using (18)–(19), we obtain (21). \(\square \)

In the following result, we show that any limit point of the sequence \(\{x_k\}\) is either feasible for Problem (1) or stationary for the penalty term \(\Vert h(x)\Vert ^2\) of the augmented Lagrangian function, measuring the violation with respect to the equality constraints.

Proposition 4.2

Let \(\{x_k\}\) be a sequence generated by the Primal-Dual Augmented Lagrangian Method and let \(\{x_k\}_K\) be a subsequence such that

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x_{k+1} = x^*. \end{aligned}$$

The following holds:

  • if \(\lim _{k \rightarrow \infty } \epsilon _k > 0\), then \(x^*\) is feasible;

  • if \(\tilde{x}_k\) is accepted (i.e., \(d_k\) is computed and \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\)) for infinitely many iterations \(k \in K\), then \(x^*\) is feasible (indeed, it is a KKT point);

  • in all other cases, \(x^*\) is a KKT point of the problem \(\min _{\ell \le x \le u}\Vert h(x)\Vert ^2\).

Proof

Let us analyze the three cases separately.

  • If \(\lim _{k \rightarrow \infty } \epsilon _k > 0\), from the instructions of the algorithm there exists an iteration \(\hat{k}\) such that \(\epsilon _{k+1} = \epsilon _k\) for all \(k\ge \hat{k}\). Therefore, \(\Vert h(x_{k+1})\Vert _{\infty } \le \eta \Vert h(x_k)\Vert _{\infty }\), with \(\eta \in (0,1)\), for all \(k \ge \hat{k}\), and then \(\{h(x_k)\} \rightarrow 0\), implying that \(x^*\) is feasible.

  • If \(\tilde{x}_k\) is accepted for infinitely many iterations \(k \in K\), from Proposition 4.1 we have that \(x^*\) is a KKT point, and thus, it is feasible.

  • In all the other cases, we want to show that

    $$\begin{aligned}{}[\nabla h(x^*) h(x^*)]_i {\left\{ \begin{array}{ll} \ge 0, &{} \quad \text {if } x^*_i = \ell _i, \\ = 0, &{} \quad \text {if } x^*_i \in (\ell _i,u_i), \\ \le 0, &{} \quad \text {if } x^*_i = u_i. \end{array}\right. } \end{aligned}$$
    (22)

    Since \(\{\bar{\mu }_k\}\) is a bounded sequence, without loss of generality we can assume that \(\lim _{k \rightarrow \infty , \, k \in K} \bar{\mu }_{k+1} = \mu ^*\) (passing to a further subsequence if necessary). Moreover, note that there exists an iteration \(\hat{k} \in K\) such that, for all \(k \ge \hat{k}\), \(k \in K\), the Newton direction \(d_k\) is not accepted, that is, we compute \(x_{k+1}\) such that (16) holds. Since \(\{\tau _k\} \rightarrow 0\), it follows that

    $$\begin{aligned} \lim _{k \rightarrow \infty } \Vert x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))\Vert _{\infty } = 0. \end{aligned}$$
    (23)

    Now, we distinguish three subcases.

    1. (i)

      \(x^*_i \in (\ell _i,u_i)\). Since \(\{x_{k+1}\}_K \rightarrow x^*\), there exists an iteration \(\hat{k} \in K\) such that \((x_{k+1})_i \in (\ell _i,u_i)\) for all \(k \ge \hat{k}\), \(k \in K\). In view of (23), it follows that

      $$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} (\nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i = 0 \end{aligned}$$

      (otherwise, we would have \(\limsup _{k \rightarrow \infty , \, k \in K} |(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i| > 0\), contradicting (23)). So, there exists an iteration, which we still denote by \(\hat{k} \in K\) without loss of generality, such that \((x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i \in [\ell _i,u_i]\) for all \(k \ge \hat{k}\), \(k \in K\). Hence, for all \(k \ge \hat{k}\), \(k \in K\), we can write

      $$\begin{aligned} \begin{aligned} \tau _k&\ge \Vert x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))\Vert _{\infty } \\&\ge \bigl |(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i\bigr | \\&= \bigl |(\nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i\bigr | \\&= \biggl |\biggl (\nabla f(x_{k+1}) + \nabla h(x_{k+1}) \bar{\mu }_k + \frac{2}{\epsilon _k} \nabla h(x_{k+1}) h(x_{k+1})\biggr )_i\biggr |. \end{aligned} \end{aligned}$$

      Multiplying the first and the last term in the above chain of inequalities by \(\epsilon _k\), we get

      $$\begin{aligned} \epsilon _k \tau _k \ge | (\epsilon _k \nabla f(x_{k+1}) + \epsilon _k \nabla h(x_{k+1}) \bar{\mu }_k + 2 \nabla h(x_{k+1}) h(x_{k+1}))_i|, \end{aligned}$$

      for all \(k \ge \hat{k}\), \(k \in K\). Taking the limits in the above inequality for \(k \rightarrow \infty \), \(k \in K\), the left-hand side converges to zero, since both \(\{\epsilon _k\}\) and \(\{\tau _k\}\) converge to zero, while the right-hand side converges to \(|(2 \nabla h(x^*) h(x^*))_i|\), since \(\{\epsilon _k\} \rightarrow 0\), \(\{\nabla f(x_{k+1})\}_K \rightarrow \nabla f(x^*)\), \(\{\nabla h(x_{k+1})\}_K \rightarrow \nabla h (x^*)\), \(\{h(x_{k+1})\}_K \rightarrow h(x^*)\) and \(\{\bar{\mu }_k\}_K \rightarrow \mu ^*\). We thus conclude that \((\nabla h(x^*) h(x^*))_i=0\).

    2. (ii)

      \(x^*_i = \ell _i\). Since \(\{x_{k+1}\}_K \rightarrow x^*\), there exists an iteration \(\hat{k} \in K\) such that \((x_{k+1})_i \in [\ell _i,u_i)\) for all \(k \ge \hat{k}\), \(k \in K\). In view of (23), it follows that

      $$\begin{aligned} \liminf _{k \rightarrow \infty , \, k \in K} (\nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i \ge 0 \end{aligned}$$

      (otherwise, we would have \(\limsup _{k \rightarrow \infty , \, k \in K} |(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i| > 0\), contradicting (23)). So, we can write

      $$\begin{aligned} \liminf _{k \rightarrow \infty , \, k \in K} \biggl (\nabla f(x_{k+1}) + \nabla h(x_{k+1}) \bar{\mu }_k + \frac{2}{\epsilon _k} \nabla h(x_{k+1}) h(x_{k+1})\biggr )_i \ge 0. \end{aligned}$$

      Multiplying the terms of the above inequality by \(\epsilon _k\), and taking into account that \(\{\epsilon _k\}~\rightarrow ~0\), \(\{\nabla f(x_{k+1})\}_K \rightarrow \nabla f(x^*)\), \(\{\nabla h(x_{k+1})\}_K \rightarrow \nabla h (x^*)\), \(\{h(x_{k+1})\}_K \rightarrow h(x^*)\) and \(\{\bar{\mu }_k\}_K \rightarrow \mu ^*\), we get

      $$\begin{aligned} \begin{aligned}&\liminf _{k \rightarrow \infty , \, k \in K} (\epsilon _k \nabla f(x_{k+1}) + \epsilon _k \nabla h(x_{k+1}) \bar{\mu }_k + 2\nabla h(x_{k+1}) h(x_{k+1}))_i =\\&\qquad \qquad = 2(\nabla h(x^*) h(x^*))_i \ge 0. \end{aligned} \end{aligned}$$
    3. (iii)

      \(x^*_i = u_i\). We obtain \((\nabla h(x^*) h(x^*))_i \le 0\) using the same arguments as in the previous case.\(\square \)

In order to show convergence of the algorithm to KKT points, we need to point out some properties of the approximate minimizers of the augmented Lagrangian function. In particular, in the next lemma we show that, when we cannot use the Newton direction, the approximate minimizers of the augmented Lagrangian function computed as in (16), with \(\{\tau _k\} \rightarrow 0\), satisfy the conditions stated in [3] for the solutions of the subproblems (see Step 2 of Algorithm 3.1 in [3]).

Lemma 4.1

Let \(\{x_k\}\) be a sequence generated by the Primal-Dual Augmented Lagrangian Method, and let \(\{x_k\}_K\) be a subsequence such that

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x_{k+1} = x^*, \end{aligned}$$

with \(x^*\) feasible and, for all \(k \in K\), either the Newton direction \(d_k\) cannot be computed (i.e., system (13) does not have solutions) or \(\tilde{x}_k\) is not accepted (i.e., \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert > \Delta _k\)). Then, for all \(k \in K\) there exist \(\tau _{k,1} \ge 0\), \(\tau _{k,2} \ge 0\), \((v_k)_i\),\((w_k)_i\),\(i=1,\ldots ,n\), such that

$$\begin{aligned}&\biggl \Vert \nabla _x L_a(x_{k+1},\bar{\mu }_k; \epsilon _k) + \sum _{i=1}^n ((v_k)_i-(w_k)_i)\, e_i \biggr \Vert _{\infty } \le \tau _{k,1}, \end{aligned}$$
(24)
$$\begin{aligned}&(v_k)_i \ge 0, \quad (w_k)_i \ge 0 \quad \text {and} \quad \ell _i - \tau _{k,2} \le (x_{k+1})_i \le u_i + \tau _{k,2}, \quad i = 1,\ldots ,n, \end{aligned}$$
(25)
$$\begin{aligned}&(x_{k+1})_i > \ell _i + \tau _{k,2} \, \Rightarrow \, (w_k)_i = 0, \quad i = 1,\ldots ,n, \end{aligned}$$
(26)
$$\begin{aligned}&(x_{k+1})_i < u_i - \tau _{k,2} \, \Rightarrow \, (v_k)_i = 0, \quad i = 1,\ldots ,n, \end{aligned}$$
(27)
$$\begin{aligned}&\lim _{k \rightarrow \infty , \, k \in K} \tau _{k,1} = \lim _{k \rightarrow \infty , \, k \in K} \tau _{k,2} = 0. \end{aligned}$$
(28)

Proof

First, note that the conditions on \(x_{k+1}\) in (25) are satisfied for any \(\tau _{k,2} \ge 0\), since we maintain feasibility with respect to the constraints \(\ell \le x \le u\). Without loss of generality, we can limit ourselves to proving that an iteration \(\hat{k} \in K\) exists such that (24)–(27) hold for all \(k \ge \hat{k}\), \(k \in K\), and (28) is satisfied (for the iterations \(k < \hat{k}\), \(k \in K\), we can choose arbitrary \(\tau _{k,1} \ge 0\), \(\tau _{k,2} \ge 0\), \((v_k)_i\), \((w_k)_i\), \(i=1,\ldots ,n\), with \(\tau _{k,1}\) sufficiently large, satisfying (24)–(27)).

From the instructions of the algorithm, at every iteration \(k \in K\) we compute \(x_{k+1}\) such that (16) holds, with \(\{\tau _k\} \rightarrow 0\). So, we can choose \(\hat{k}\) as the first iteration such that

$$\begin{aligned} \tau _k < \min _{i=1,\ldots ,n}\{u_i - \ell _i\}, \quad \forall k \ge \hat{k}. \end{aligned}$$
(29)

Since the index set \(\{1,\ldots ,n\}\) is finite, without loss of generality we can define the subsets \(I_1\), \(I_2\), \(I_3\) and \(I_4\) (passing to a further subsequence if necessary) such that:

$$\begin{aligned} I_1&= \{i :(x_{k+1})_i \in (\ell _i,u_i) \, \forall k \in K \text { and } x^*_i \in (\ell _i,u_i)\}, \\ I_2&= \{i :(x_{k+1})_i \in (\ell _i,u_i) \, \forall k \in K \text { and } x^*_i \in \{\ell _i,u_i\}\}, \\ I_3&= \{i :(x_{k+1})_i = \ell _i \, \forall k \in K\}, \\ I_4&= \{i :(x_{k+1})_i = u_i \, \forall k \in K\}. \end{aligned}$$

From (16) and (29), for all \(k \ge \hat{k}\), \(k \in K\), we can write

$$\begin{aligned}&|(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i| \le \tau _k, \quad&i \in I_1 \cup I_2, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \ge -\tau _k, \quad&i \in I_3, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \le \tau _k, \quad&i \in I_4. \end{aligned}$$

For every variable \((x_{k+1})_i\) with \(i \in I_1\), we also have that

$$\begin{aligned} (\mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i = (x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i \end{aligned}$$

for all sufficiently large \(k \in K\) (this follows from the fact that \(\{(x_{k+1})_i\}_K \rightarrow x^*_i \in (\ell _i,u_i)\) and \(\tau _k \rightarrow 0\)). So, without loss of generality we can also assume that \(\hat{k}\) is large enough to satisfy

$$\begin{aligned}&|\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)| \le \tau _k, \quad&i \in I_1, \\&|(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i| \le \tau _k, \quad&i \in I_2, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \ge -\tau _k, \quad&i \in I_3, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \le \tau _k, \quad&i \in I_4. \end{aligned}$$

Let us rewrite the quantities within the absolute value in the second inequality as follows:

$$\begin{aligned}&(x_{k+1} - \mathcal{P}_{[\ell ,u]}(x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)))_i \\&\quad = \nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) - (y'_k)_i+ (y''_k)_i, \end{aligned}$$

where \((y'_k)_i, (y''_k)_i \ge 0\) are proper scalars. In more detail, if \(p := (x_{k+1} - \nabla _x L_a(x_{k+1},\bar{\mu }_k;\epsilon _k))_i\) is in \([\ell _i,u_i]\), then \((y'_k)_i = (y''_k)_i = 0\). On the other hand, if \(p - \mathcal{P}_{[\ell _i,u_i]}(p) < 0\), then \((y'_k)_i > 0\) and \((y''_k)_i = 0\); otherwise, i.e., if \(p - \mathcal{P}_{[\ell _i,u_i]}(p) > 0\), then \((y'_k)_i = 0\) and \((y''_k)_i > 0\). Therefore, we obtain

$$\begin{aligned}&|\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)| \le \tau _k, \quad&i \in I_1, \\&|\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) - (y'_k)_i + (y''_k)_i| \le \tau _k, \quad&i \in I_2, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \ge -\tau _k, \quad&i \in I_3, \\&\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k) \le \tau _k, \quad&i \in I_4. \end{aligned}$$

We conclude that (24)–(27) hold for all \(k \ge \hat{k}\), \(k \in K\), with

$$\begin{aligned} \tau _{k,1}&= \tau _k, \\ \tau _{k,2}&= {\left\{ \begin{array}{ll} \min _{i \in I_2 }\{\min \{(x_k)_i-\ell _i,u_i-(x_k)_i\}\}, \quad &{} \text {if } I_2 \ne \emptyset , \\ 0, \quad &{} \text {if } I_2 = \emptyset , \end{array}\right. } \\ (v_k)_i&= {\left\{ \begin{array}{ll} 0, \quad &{} i \in I_1\cup I_3, \\ (y''_k)_i,\quad &{} i\in I_2,\\ \max \{0,-\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)\}, \quad &{} i \in I_4, \end{array}\right. } \\ (w_k)_i&= {\left\{ \begin{array}{ll} 0, \quad &{} i \in I_1 \cup I_4, \\ (y'_k)_i, \quad &{} i \in I_2, \\ \max \{0,\nabla _{x_i} L_a(x_{k+1},\bar{\mu }_k;\epsilon _k)\}, \quad &{} i \in I_3, \end{array}\right. } \end{aligned}$$

and, from the above definitions, also (28) is satisfied. \(\square \)

Combining the above results with those stated in [3], we can finally show the convergence of the proposed algorithm to stationary points. In particular, as in [3], we use the constant positive linear dependence (CPLD) condition as constraint qualification.

Definition 4.1

A point x is said to satisfy the CPLD condition for Problem (1) if, whenever there exist scalars \(\lambda _1,\ldots ,\lambda _p\), \(\pi _i \ge 0\), \(i \in \mathcal{L}(x)\), \(\varphi _j \ge 0\), \(j \in \mathcal{U}(x)\), not all zero, such that \(\sum _{t=1}^p \lambda _t \nabla h_t(x) - \sum _{i \in \mathcal{L}(x)} \pi _i e_i + \sum _{j \in \mathcal{U}(x)} \varphi _j e_j = 0\), then, for all z in a neighborhood of x, the vectors \(\nabla h_1(z), \ldots , \nabla h_p(z)\), \(-e_i\), \(i \in \mathcal{L}(x)\), \(e_j\), \(j \in \mathcal{U}(x)\) are linearly dependent, where \(\mathcal{L}(x) := \{i :x_i = \ell _i\}\), \(\mathcal{U}(x) := \{i :x_i = u_i\}\) and \(\mathcal{N}(x) := \{1,\ldots ,n\} \setminus (\mathcal{L}(x) \cup \mathcal{U}(x)).\)

For more details on CPLD and the relations with other constraint qualification conditions, see also [1, 23].

Theorem 4.1

Let \(\{x_k\}\) be a sequence generated by the Primal-Dual Augmented Lagrangian Method and let \(\{x_k\}_K\) be a subsequence such that

$$\begin{aligned} \lim _{k \rightarrow \infty , \, k \in K} x_{k+1} = x^*. \end{aligned}$$

The following holds:

  • if \(\tilde{x}_k\) is accepted (i.e., \(d_k\) is computed and \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\)) for infinitely many iterations \(k \in K\), then \(x^*\) is a KKT point;

  • else, if \(x^*\) satisfies the CPLD constraint qualification, then \(x^*\) is a KKT point.

Proof

If \(\tilde{x}_k\) is accepted for infinitely many iterations \(k \in K\), then \(x^*\) is a KKT point from Proposition 4.1. Else, there exists an iteration \(\hat{k} \in K\) such that \(d_k\) is not accepted for any \(k \ge \hat{k}\), \(k \in K\), and the algorithm reduces to a classical Augmented Lagrangian method. Then, using Lemma 4.1, the conditions stated in [3] for the solutions of the subproblems are satisfied and the result is obtained by the same arguments given in the proof of Theorem 4.2 in [3]. \(\square \)

5 Convergence Rate Analysis

In this section, we analyze the convergence rate of the proposed algorithm. We will show that, for sufficiently large iterations, the primal-dual sequence \((x_k,\bar{\mu }_k)\) converges to an optimal solution \((x^*,\mu ^*)\) at a quadratic rate.

In the literature, standard assumptions to prove the convergence rate of an augmented Lagrangian scheme are the linear independence constraint qualification (LICQ), strict complementarity and the second-order sufficient condition (SOSC). For Problem (1), let us denote by \(\sigma ^*\) and \(\rho ^*\) the KKT multipliers at \(x^*\) associated with the bound constraints \(x \ge \ell \) and \(x \le u\), respectively, and

$$\begin{aligned} \mathcal{L}^* := \{i :x^*_i = \ell _i\}, \quad \mathcal{U}^* := \{i :x^*_i = u_i\}, \quad \mathcal{N}^* := \{1,\ldots ,n\} \setminus (\mathcal{L}^* \cup \mathcal{U}^*). \end{aligned}$$

Then,

  • LICQ means that the vectors \(\nabla h_1(x^*),\ldots ,\nabla h_p(x^*)\), \(-e_i\), \(i\in \mathcal{L}^*\), \(e_j\), \(j\in \mathcal{U}^*\), are linearly independent;

  • SOSC means that \(y^T \nabla ^2_{xx} L(x^*,\mu ^*) y > 0\) for all \(y \in T(x^*) \setminus \{0\}\), where

    $$\begin{aligned} \begin{aligned}&T(x^*) := \{y \in \mathbb R^n :\nabla h(x^*)^T y = 0, \\&e_i^T y = 0, \quad i \in I_0(x^*), \\&e_i^T y \le 0, \quad i \in I_1(x^*)\}, \end{aligned} \end{aligned}$$

    with \(I_0(x^*) := (\mathcal{L}^*\cap \{i :\sigma _i^*> 0\}) \cup (\mathcal{U}^*\cap \{i :\rho _i^* > 0\})\) and \(I_1(x^*) := (\mathcal{L}^* \cup \mathcal{U}^*) \setminus I_0(x^*)\).

Under LICQ, strict complementarity and SOSC, if the penalty parameter \(\epsilon _k \rightarrow 0\), it is usually possible to show a superlinear convergence rate for augmented Lagrangian methods (see, e.g., [4, 17] and the references therein). Moreover, a superlinear convergence rate is proved in [17], when \(\epsilon _k \rightarrow 0\), even without any constraint qualification, but requiring the starting multiplier to be in a neighborhood of a KKT multiplier satisfying SOSC.

Here, a quadratic convergence rate is obtained by assuming that \(\mu ^*_i\in [\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\) for all \(i = 1,\ldots ,p\), under LICQ and the strong second-order sufficient condition (SSOSC), where the latter means that

$$\begin{aligned} y^T \nabla ^2_{xx} L(x^*,\mu ^*) y > 0, \quad \forall y \in T'(x^*) \setminus \{0\}, \end{aligned}$$

with

$$\begin{aligned} T'(x^*) := \{y \in \mathbb R^n :\nabla h(x^*)^T y = 0, \,\, e_i^T y = 0, \, i \in I_0(x^*)\}. \end{aligned}$$

Interestingly, our results do not need the convergence of \(\{\epsilon _k\}\) to 0.

First, we state an intermediate result ensuring that, if a sequence converges to a point where the conditions for superlinear convergence rate of the Newton direction are satisfied, then the direction is eventually accepted by the algorithm.

Proposition 5.1

Let \(\{(z_k,\bar{d}_k)\}\) be a sequence of vectors such that

$$\begin{aligned} \lim _{k\rightarrow \infty } z_k = z^* \quad \text {and} \quad \Vert z_{k} + \bar{d}_k - z^* \Vert \le \alpha _k\Vert z_k-z^*\Vert , \end{aligned}$$

with \(\{\alpha _k\} \rightarrow 0\). Then, for k sufficiently large,

$$\begin{aligned} \Vert \bar{d}_k \Vert \le \beta ^k \Delta _0, \end{aligned}$$

for any given \(\beta \in (0,1)\) and \(\Delta _0>0\).

Proof

Let \(\bar{k}\) and \(\bar{\alpha }\) be such that, for all \(k\ge \bar{k}\),

$$\begin{aligned} \alpha _k<\bar{\alpha }<\beta <1. \end{aligned}$$
(30)

Therefore, we can write

$$\begin{aligned} \Vert z_{k}-z^*\Vert&\le \bar{\alpha }^{k-\bar{k}}\Vert z_{\bar{k}}-z^*\Vert , \\ \Vert z_{k}+\bar{d}_k-z^*\Vert&\le \bar{\alpha }^{k+1-\bar{k}}\Vert z_{\bar{k}}-z^*\Vert , \end{aligned}$$

from which we obtain:

$$\begin{aligned} \Vert \bar{d}_k\Vert \le \Vert z_{k}+\bar{d}_k-z^*\Vert + \Vert z_{k}-z^*\Vert \le \bar{\alpha }^{k}\,\frac{(\bar{\alpha }+1)}{\bar{\alpha }^{\bar{k}}}\,\Vert z_{\bar{k}}-z^*\Vert . \end{aligned}$$

In view of (30), we can write

$$\begin{aligned} \bar{\alpha }=\rho \beta , \qquad \text {with } \rho := \bar{\alpha }/\beta \in (0,1). \end{aligned}$$

Then, we have

$$\begin{aligned} \Vert \bar{d}_k\Vert \le \beta ^k \rho ^{k}\,\frac{(\bar{\alpha }+1)}{\bar{\alpha }^{\bar{k}}}\,\Vert z_{\bar{k}}-z^*\Vert . \end{aligned}$$
(31)

Since \(\rho \in (0,1)\), we can conclude that, for k sufficiently large,

$$\begin{aligned} \rho ^{k}\,\frac{(\bar{\alpha }+1)}{\bar{\alpha }^{\bar{k}}}\,\Vert z_{\bar{k}}-z^*\Vert \le \Delta _0. \end{aligned}$$
(32)

Now, (31) and (32) conclude the proof. \(\square \)

Finally, we are ready to show the asymptotic quadratic rate of the primal-dual sequence \(\{(x_k,\bar{\mu }_k)\}\), under LICQ and SSOSC, if \(\mu ^*_i\in [\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\) for all \(i = 1,\ldots ,p\).

Theorem 5.1

Let \(\{x_k\}\) and \(\{\bar{\mu }_k\}\) be the sequences generated by the Primal-Dual Augmented Lagrangian Method and assume that

$$\begin{aligned} \lim _{k \rightarrow \infty } x_k = x^*, \quad \lim _{k\rightarrow \infty }\bar{\mu }_k = \mu ^*, \end{aligned}$$

with \(\mu ^*_i\in [\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\) for all \(i = 1,\ldots ,p\). Also assume that the LICQ and SSOSC hold at \((x^*,\mu ^*)\). Then, \(\{(x_k,\bar{\mu }_k)\}\) converges to \((x^*,\mu ^*)\) with a quadratic rate asymptotically, i.e.,

$$\begin{aligned} \left\| \begin{matrix}x_{k+1} - x^* \\ \bar{\mu }_{k+1} - \mu ^*\end{matrix}\right\| \le K \left\| \begin{matrix}x_k - x^* \\ \bar{\mu }_k - \mu ^*\end{matrix}\right\| ^2 \end{aligned}$$

for all sufficiently large k and some constant K.

Proof

Since LICQ and SSOSC hold at \((x^*,\mu ^*)\), using [16, Proposition 3.1] it follows that the following matrix is invertible for k sufficiently large:

$$\begin{aligned} \begin{pmatrix} \nabla ^2 L_k &{} \quad \nabla h_k &{} \quad -I_{\mathcal{L}_k} &{} \quad I_{\mathcal{U}_k} \\ \nabla h_k^T &{} \quad 0 &{} \quad 0 &{} \quad 0\\ -I_{\mathcal{L}_k}^T &{} \quad 0 &{} \quad 0 &{} \quad 0\\ I_{\mathcal{U}_k}^T &{} \quad 0 &{} \quad 0 &{} \quad 0 \end{pmatrix}, \end{aligned}$$

where \(I_{\mathcal{L}_k}\) and \(I_{\mathcal{U}_k}\) denote the submatrices obtained from the identity matrix by discarding the columns whose indices do not belong to \(\mathcal{L}_k\) and \(\mathcal{U}_k\), respectively. Consequently, for all sufficiently large k, the Newton direction can be computed.

Let us define \(\bar{d}_k\) as the Newton direction \(d_k = ((d_x)_k,(d_{\mu })_k)\) augmented with the components in \(\mathcal{B}_k\). Namely, \(\bar{d}_k := ((\bar{d}_x)_k,(\bar{d}_{\mu })_k)\), where

$$\begin{aligned} (\bar{d}_x)_k := ((d_x)_k, (\tilde{x}_k - x_k)_{\mathcal{B}_k}) \quad \text {and} \quad (\bar{d}_{\mu })_k = (d_{\mu })_k \end{aligned}$$

(by properly reordering the entries of \(x_k\)). We note that

$$\begin{aligned} \left\| \begin{matrix}x_k + (\bar{d}_{x})_k - x^* \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^*\end{matrix}\right\| = \left\| \begin{matrix} (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{L}_k} \\ (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{U}_k} \\ (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{N}_k} \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^* \end{matrix}\right\| . \end{aligned}$$

By the instructions of the algorithm, when a Newton direction is used, we have

$$\begin{aligned} (x_{k+1})_{\mathcal{L}_k} = (\tilde{x}_k)_{\mathcal{L}_k} = (x_k + (\bar{d}_{x})_k)_{\mathcal{L}_k} = \ell _{\mathcal{L}_k} \end{aligned}$$

and

$$\begin{aligned} (x_{k+1})_{\mathcal{U}_k} = (\tilde{x}_k)_{\mathcal{U}_k} = (x_k + (\bar{d}_{x})_k)_{\mathcal{U}_k} = u_{\mathcal{U}_k}. \end{aligned}$$

So, using Proposition 3.1, for all sufficiently large k we have that

$$\begin{aligned} (x_{k+1} - x^*)_{\mathcal{L}_k}&= (\ell - x^*)_{\mathcal{L}_k} = 0, \end{aligned}$$
(33a)
$$\begin{aligned} (x_{k+1} - x^*)_{\mathcal{U}_k}&= (u - x^*)_{\mathcal{U}_k} = 0, \end{aligned}$$
(33b)

and then,

$$\begin{aligned} \left\| \begin{matrix}x_k + (\bar{d}_{x})_k - x^* \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^*\end{matrix}\right\| = \left\| \begin{matrix} (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{N}_k} \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^* \end{matrix}\right\| . \end{aligned}$$

For all sufficiently large k, by the same arguments given in the proof of [13, Proposition 4], there exists a constant K such that

$$\begin{aligned} \begin{aligned} \left\| \begin{matrix}x_k + (\bar{d}_{x})_k - x^* \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^*\end{matrix}\right\|&= \left\| \begin{matrix} (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{N}_k} \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^* \end{matrix}\right\| \le K \left\| \begin{matrix} (x_k - x^*)_{\mathcal{N}_k} \\ \bar{\mu }_k - \mu ^* \end{matrix}\right\| ^2 \\&\le K \left\| \begin{matrix} x_k - x^* \\ \bar{\mu }_k - \mu ^* \end{matrix}\right\| ^2. \end{aligned} \end{aligned}$$
(34)

The above relation implies that \(\bar{d}_k\) satisfies the assumptions of Proposition 5.1 (with \(z_k = (x_k,\bar{\mu }_k)\) and \(\alpha _k = K \Vert (x_k,\bar{\mu }_k) - (x^*,\mu ^*)\Vert \)). Since \(\Vert d_k\Vert \le \Vert \bar{d}_k\Vert \), by the instructions of the algorithm, the Newton direction \(d_k\) is accepted for all sufficiently large k, so that

$$\begin{aligned} (x_{k+1})_{\mathcal{N}_k} = (\tilde{x}_k)_{\mathcal{N}_k} = \mathcal{P}_{[\ell _{\mathcal{N}_k},u_{\mathcal{N}_k}]}((x_k)_{\mathcal{N}_k} + \bar{d}_{x_{\mathcal{N}_k}}) \end{aligned}$$

and

$$\begin{aligned} (\bar{\mu }_{k+1})_i = \max \{\bar{\mu }_{\text {min}},\min \{\bar{\mu }_{\text {max}},(\bar{\mu }_k + (\bar{d}_{\mu })_k)_i\}\}, \quad i=1,\ldots ,p. \end{aligned}$$

Using (33), we get

$$\begin{aligned} \begin{aligned} \Vert x_{k+1} - x^*\Vert&= \Vert (x_{k+1} - x^*)_{\mathcal{N}_k}\Vert = \Vert \mathcal{P}_{[\ell _{\mathcal{N}_k},u_{\mathcal{N}_k}]}((x_k)_{\mathcal{N}_k} + \bar{d}_{x_{\mathcal{N}_k}}) - (x^*)_{\mathcal{N}_k} \Vert \\&\le \Vert (x_k + (\bar{d}_{x})_k - x^*)_{\mathcal{N}_k}\Vert \le \Vert x_k + (\bar{d}_{x})_k - x^*\Vert , \end{aligned} \end{aligned}$$

where the first inequality follows from the fact that the projection operator is non-expansive and that, for all sufficiently large k, from Proposition 3.1 we have \(\mathcal{N}^* \subseteq \mathcal{N}_k\), implying that \((x^*)_{\mathcal{N}_k} \in (\ell _{\mathcal{N}_k},u_{\mathcal{N}_k})\). Similarly, using again the non-expansivity of the projection operator and the assumption that \(\mu ^*_i\in [\bar{\mu }_{\text {min}},\bar{\mu }_{\text {max}}]\) for all \(i = 1,\ldots ,p\), we have

$$\begin{aligned} \begin{aligned} \Vert \bar{\mu }_{k+1} - \mu ^*\Vert&= \Vert \mathcal{P}_{[\bar{\mu }_{\text {min}}\mathbf {1},\bar{\mu }_{\text {max}}\mathbf {1}]}(\bar{\mu }_k + (\bar{d}_{\mu })_k) - \mu ^*\Vert \\&\le \Vert \bar{\mu }_k + (\bar{d}_{\mu })_k - \mu ^*\Vert , \end{aligned} \end{aligned}$$

where \(\mathbf {1}\) denotes the vector of all ones (of appropriate dimensions). Combining these relations with (34), for all sufficiently large k we obtain

$$\begin{aligned} \left\| \begin{matrix}x_{k+1} - x^* \\ \bar{\mu }_{k+1} - \mu ^*\end{matrix}\right\| \le \left\| \begin{matrix}x_k + (\bar{d}_{x})_k - x^* \\ \bar{\mu }_k + (\bar{d}_\mu )_k - \mu ^*\end{matrix}\right\| \le K \left\| \begin{matrix}x_k - x^* \\ \bar{\mu }_k - \mu ^*\end{matrix}\right\| ^2, \end{aligned}$$

concluding the proof. \(\square \)

6 Numerical Experiments

This section is devoted to the description of our numerical experience with the proposed algorithm and to its comparison with other publicly available algorithms. All the numerical experiments have been carried out on an Intel Xeon CPU E5-1650 v2 @ 3.50GHz with 12 cores and 64 GB of RAM.

Problem set description. We considered a set of 362 general constrained problems from the CUTEst collection [20], with number of variables \(n \in [90, 906]\) and number of general constraints (equalities and inequalities) \(m \in [1,8958]\). In particular, among the whole CUTEst collection, we selected all constrained problems (i.e., with at least one constraint besides the bound constraints on the variables) having:

  1. (i)

    number of variables and constraints “user modifiable”, or

  2. (ii)

    number of variables “user modifiable” and a fixed number of constraints, or

  3. (iii)

    at least 100 variables.

Figure 1 describes the distribution of the number of variables and number of general constraints of the considered problems.

Fig. 1: Problem set composition. The two curves represent the number of problems that have at most a given number of variables or general constraints, respectively.

Algorithms used in the comparison. We used the following algorithms:

  • the augmented Lagrangian method implemented in the ALGENCAN (v.3.1.1) software package [2, 3];

  • the augmented Lagrangian method implemented in LANCELOT (rev.B) [9, 19];

  • our proposed primal-dual augmented Lagrangian method P-D ALM (as described in Sect. 4).

Both ALGENCAN and LANCELOT have been run using their default parameters. Note that, in its default setting, ALGENCAN uses second-order information exploiting a so-called acceleration strategy, which is activated when the current primal-dual pair is sufficiently close to a KKT pair of the problem.

Our method has been implemented by modifying the code of ALGENCAN in two points:

  • at the beginning of each iteration k, we inserted the computation of the active-set estimate and the Newton direction \(d_k\), according to the algorithmic scheme reported in Sect. 4;

  • the approximate minimization of the augmented Lagrangian function is carried out by means of the ASA-BCP method proposed in [11], in place of GENCAN [5].

In more detail, at every iteration k, we set \(\nu = \min \{10^{-6}, \Vert x_k - \mathcal{P}_{[\ell ,u]}(x_k - \nabla _x L(x_k,\bar{\mu }_k)) \Vert ^{-3} \}\) in the active-set estimate (9)–(10), and the linear system (13) was solved by means of the MA57 library [15]. Note that we used the same library also in ALGENCAN. As for the inner solver ASA-BCP, it is an active-set method where, at each iteration, the variables estimated as active are set to the bounds, while those estimated as non-active are moved along a truncated-Newton direction. In ASA-BCP, here we employed a monotone line search and, to compute the truncated-Newton direction by the conjugate gradient method, we used the preconditioning technique described in [6], based on quasi-Newton formulas.

It is worth noticing that, in our implementation of P-D ALM, the test for accepting the point \(\tilde{x}_k\) is made of two conditions, both of which must be satisfied for acceptance. The first condition is the one reported in Sect. 4, i.e., \(\Vert (d_k,(\tilde{x}_k-x_k)_{\mathcal{B}_k})\Vert \le \Delta _k\), while the second is that \(\Vert h(\tilde{x}_k)\Vert _{\infty } \le \eta \Vert h(x_k)\Vert _{\infty }\), i.e., the feasibility violation at \(\tilde{x}_k\) must be sufficiently smaller than at \(x_k\). In our experience, adding this condition leads to better results in practice.

In our experiments, for all the considered methods we used the same stopping conditions. Namely, the algorithms were stopped when the following two conditions were both satisfied:

$$\begin{aligned} \Vert x_k - \mathcal{P}_{[\ell ,u]}(x_k - \nabla _x L(x_k,\bar{\mu }_k)) \Vert _\infty&\le \epsilon _{\text {opt}} \max \{1,\Vert \nabla f(x_k)\Vert _\infty \}, \\ \Vert h(x_k)\Vert _\infty&\le \epsilon _{\text {feas}}\Vert h(x_0)\Vert _\infty , \end{aligned}$$

where \(x_0\) is the initial point and \((x_k,\bar{\mu }_k)\) is the primal-dual pair at iteration k, with \(\epsilon _{\text {opt}} = \epsilon _{\text {feas}} = 10^{-6}\). Moreover, we set a maximum number of (outer) iterations equal to 400 and a time limit of 3600 s.
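
In code, the stopping test reads as in the following sketch (grad_f, grad_L and h are the usual hypothetical callbacks):

```python
import numpy as np

def stopping_test(x, mu_bar, x0, l, u, grad_f, grad_L, h,
                  eps_opt=1e-6, eps_feas=1e-6):
    """True when both the optimality and the feasibility conditions hold."""
    stat = np.linalg.norm(x - np.clip(x - grad_L(x, mu_bar), l, u), np.inf)
    opt_ok = stat <= eps_opt * max(1.0, np.linalg.norm(grad_f(x), np.inf))
    feas_ok = (np.linalg.norm(h(x), np.inf)
               <= eps_feas * np.linalg.norm(h(x0), np.inf))
    return opt_ok and feas_ok
```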

In Fig. 2, we start by comparing P-D ALM against ALGENCAN with and without the acceleration phase (note that the acceleration phase in ALGENCAN is where second-order information comes into play), using performance profiles [14] with respect to CPU time. The performance profiles are obtained on the subset of problems where at least one solver requires more than 10 s of CPU time. As can be seen, ALGENCAN (using second-order information) is the most efficient solver but the least robust one. On the other hand, P-D ALM is considerably more robust than both versions of ALGENCAN. One possible reason for P-D ALM being less efficient than ALGENCAN is the following: in P-D ALM we try to use the second-order direction as much as possible, whereas second-order information is used in ALGENCAN only when the current primal-dual point is sufficiently close to a KKT pair. This could explain our larger computational times and the behavior of the reported performance profiles.
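
For reference, a performance profile in the sense of [14] can be computed from a solvers-by-problems matrix of CPU times as in the following sketch (np.nan marks a failure; we assume every problem is solved by at least one solver):

```python
import numpy as np

def performance_profile(times):
    """times[s, p] = CPU time of solver s on problem p (np.nan = failure)."""
    t = np.where(np.isnan(times), np.inf, times)
    ratios = t / t.min(axis=0)                   # performance ratios r_{s,p}
    alphas = np.unique(ratios[np.isfinite(ratios)])
    # rho_s(alpha) = fraction of problems solved within alpha * best time
    profile = np.array([[np.mean(ratios[s] <= a) for a in alphas]
                        for s in range(t.shape[0])])
    return alphas, profile
```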

Fig. 2: Comparison between P-D ALM and ALGENCAN with and without the acceleration step, using performance profiles with respect to CPU time. Note that “ALGENCAN 3.1.1 no acc.” refers to the version of ALGENCAN not using second-order information, i.e., skipping the so-called acceleration phase.

Fig. 3: a Comparison between P-D ALM and ALGENCAN, using performance profiles with respect to CPU time. b Comparison between P-D ALM and LANCELOT, using performance profiles with respect to CPU time.

Fig. 4: Comparison between P-D ALM, ALGENCAN and LANCELOT.

In Fig. 3a, we report the comparison between ALGENCAN and P-D ALM. We note that, even though ALGENCAN is slightly better than P-D ALM in terms of efficiency, it is outperformed by our proposed method in terms of robustness. Furthermore, we note that the two performance profiles intersect at, approximately, \(\alpha \simeq 5\), i.e., both algorithms solve the same percentage of problems in at most 5 times the CPU time of the best performing solver.

In Fig. 3b, we report the comparison between P-D ALM and LANCELOT (rev. B). In this case, P-D ALM is clearly the best performing solver both in terms of efficiency and robustness.

Finally, we notice that ALGENCAN, LANCELOT and P-D ALM solve, respectively, 272, 232 and 290 problems out of 362. The comparison among the three solvers is reported in Fig. 4.

7 Conclusions

In this paper, we presented a new method for nonlinear optimization problems with equality constraints and bound constraints. Starting from the augmented Lagrangian scheme implemented in ALGENCAN, we used a tailored active-set strategy to compute a Newton-type direction with respect to the variables estimated as non-active, while the variables estimated as active are set to their bounds. When this direction cannot be computed or does not satisfy a proper acceptance test, the augmented Lagrangian function is minimized by means of an efficient solver recently proposed in the literature. We proved convergence to stationary points and, under standard assumptions, an asymptotic quadratic convergence rate. The numerical results show the effectiveness of the proposed method.