1 Introduction

In recent years, cardinality-constrained optimization problems (CCOP) have received increasing attention due to their far-reaching applications, including portfolio optimization [11, 12, 15] and statistical regression [11, 22]. Unfortunately, these problems are notoriously difficult to solve; even testing feasibility is already NP-complete [11].

A recurrent strategy in mathematics is to cast a difficult problem into a simpler one for which well-established solution techniques already exist. For CCOP, the recent paper [17] was written precisely in this spirit. There, the authors reformulate the problem as a continuous optimization problem with orthogonality-type constraints. This approach parallels the one taken in the context of sparse optimization problems [23]. It should be noted, however, that due to its similarities with mathematical programs with complementarity constraints (MPCC), the proposed reformulation from [17] is, unfortunately, also highly degenerate in the sense that even weak standard constraint qualifications (CQ) such as the Abadie CQ are often violated at points of interest. In addition, sequential optimality conditions like AKKT (approximate KKT) are known to be satisfied at every feasible point of cardinality-constrained problems, see [32], and are therefore also useless for identifying suitable candidates for local minima in this context.

These observations make a direct application of most standard nonlinear programming (NLP) methods to the reformulated problem rather challenging, since such methods typically require a stronger standard CQ to hold at a limit point to ensure stationarity. To overcome these difficulties with CQs, CCOP-tailored CQs were introduced in [17, 19]. Regularization methods, which are standard techniques for attacking MPCCs, were subsequently proposed in [15, 17], where convergence towards a stationary point is proved using these CQs. This is not the path we take here. In this paper, we are interested in the viability of ALGENCAN [2, 3, 13], a well-established and open-source standard NLP solver based on an augmented Lagrangian method (ALM), for solving the reformulated problem directly and without any problem-specific modifications.

ALMs are among the classical solution methods for NLPs. Up to the mid-2000s, however, their popularity was largely overshadowed by other techniques, in particular sequential quadratic programming (SQP) and interior point methods. Since then, beginning with [2, 3], a particular variant of ALMs, which employs the Powell–Hestenes–Rockafellar (PHR) augmented Lagrangian function as well as safeguarded multipliers, has experienced renewed interest. The aforementioned ALGENCAN implements this variant. For NLPs, it has been shown that this variant possesses strong convergence properties even under very mild assumptions [4, 6]. It has since been applied to various other problem classes, including MPCC [7, 26], quasi-variational inequalities [27], generalized Nash equilibrium problems [16, 29], and semidefinite programming [8, 14].

Due to the structure of the reformulated problems, particularly relevant to us is the paper [26], where the authors prove global convergence of the method towards an MPCC-C-stationary point under MPCC-LICQ; see also [5] for a more recent discussion under weakened assumptions. However, even though the problems with orthogonality-type constraints resulting from the reformulation of CCOP can be viewed as MPCC in case nonnegativity constraints are present [19], we would like to stress that the results obtained in our paper are not simple corollaries of [26]. For one, we do not assume the presence of nonnegativity constraints here, making our results applicable in the general setting. Moreover, even in the presence of nonnegativity constraints, it was shown in [19, Remark 5.7 (f)] that MPCC-LICQ, which was used to guarantee convergence to a stationary point in [26], is often violated at points of interest for the reformulated problems. Instead, we therefore employ a CCOP-analogue of the quasinormality CQ [10], which is weaker than the CCOP-CPLD introduced in [17], to prove global convergence of the method.

To this end, we first recall some important properties of the CCOP-reformulation in Sect. 2 and define a CCOP-version of the quasinormality CQ. The ALM algorithm is introduced in Sect. 3, and its convergence properties under said quasinormality CQ are analyzed in Sect. 4. Numerical experiments illustrating the performance of ALGENCAN for the reformulated problem are then presented in Sect. 5. We close with some final remarks in Sect. 6.

Notation: For a given vector \(x \in {\mathbb {R}}^n\), we define the two index sets

$$\begin{aligned} I_\pm (x) := \{ i \in \{1, \dots , n\} \mid x_i \ne 0\} \quad \text {and} \quad I_0(x) := \{ i \in \{1, \dots , n\} \mid x_i = 0\}. \end{aligned}$$

Clearly, the two sets are disjoint and satisfy \(\{1, \dots , n\} = I_\pm (x) \cup I_0(x)\). Given the constraint function \(g\) from problem (2.1) below, we further write \(I_g(x) := \{ i \in \{1, \dots , m\} \mid g_i(x) = 0\}\) for the set of active inequality constraints. For two vectors \(a, b \in {\mathbb {R}}^n\), the terms \(\max \{a,b\}, \min \{a,b\} \in {\mathbb {R}}^n\) denote the componentwise maximum/minimum of these vectors. A frequently used special case hereof is \(a_+ := \max \{a,0\} \in {\mathbb {R}}^n\). We denote the Hadamard product of two vectors \(x, y \in {\mathbb {R}}^n\) with \(x \circ y\), and we define \(e := (1, \dots , 1)^T \in {\mathbb {R}}^n\).
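These index sets and componentwise operations translate directly into NumPy. The following minimal sketch (the tolerance parameter is our own addition for floating-point use; the analysis itself works with exact zeros) illustrates the notation:

```python
import numpy as np

def I_pm(x, tol=0.0):
    """Indices of the (numerically) nonzero components of x."""
    return np.flatnonzero(np.abs(x) > tol)

def I_0(x, tol=0.0):
    """Indices of the (numerically) zero components of x."""
    return np.flatnonzero(np.abs(x) <= tol)

x = np.array([1.5, 0.0, -2.0, 0.0])
print(I_pm(x))           # [0 2]  (0-based indices)
print(I_0(x))            # [1 3]
print(np.maximum(x, 0))  # x_+ = max{x, 0} componentwise
print(x * x)             # Hadamard product x o x
```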

2 Preliminaries

In this paper, we consider cardinality-constrained optimization problems of the form

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x \in {\mathbb {R}}^n} \ f(x) & \text {s.t.} & g(x) \le 0, \quad h(x) = 0, \\ & & \Vert x\Vert _0 \le s, \end{array} \end{aligned}$$
(2.1)

where \(f \in C^1({\mathbb {R}}^n,{\mathbb {R}})\), \(g \in C^1({\mathbb {R}}^n,{\mathbb {R}}^m)\), \(h \in C^1({\mathbb {R}}^n,{\mathbb {R}}^p)\), and \(\Vert x \Vert _0\) denotes the number of nonzero components of the vector x. Occasionally, this problem is also called a sparse optimization problem [36], but sparse optimization typically refers to programs that have a sparsity term within the objective function.

Throughout this paper, we assume \(s < n\), since the cardinality constraint would be redundant otherwise. Following the approach from [17], we introduce an auxiliary variable \(y \in {\mathbb {R}}^n\) and obtain the following relaxed program

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x,y \in {\mathbb {R}}^n} \ f(x) & \text { s.t. } & g(x) \le 0, \quad h(x) = 0, \\ & & n - e^T y \le s, \\ & & y \le e, \\ & & x \circ y = 0. \end{array} \end{aligned}$$
(2.2)

Observe that the relaxed reformulation we use here is slightly different from the one in [17], because we omit the constraint \(y \ge 0\), leading to a larger feasible set. Nonetheless, one can easily see that all results obtained in [17, Section 3] are applicable for (2.2) as well. We shall now gather some of these results, which are relevant for this paper. Their proofs can be found in [17].

Theorem 2.1

Let \({\hat{x}}\in {\mathbb {R}}^n\). Then the following statements hold:

(a)

    \({\hat{x}}\) is feasible for (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2).

(b)

    \({\hat{x}}\) is a global optimizer of (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of (2.2).

(c)

    If \({\hat{x}}\in {\mathbb {R}}^n\) is a local minimizer of (2.1), then there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2). Conversely, if \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2) satisfying \( \Vert {\hat{x}}\Vert _0 = s \), then \( {\hat{x}}\) is a local minimizer of (2.1).

Theorem 2.1 shows that the relaxed problem (2.2) is equivalent to the original problem (2.1) in terms of feasible points and global minima, whereas the equivalence of local minima requires an extra condition (namely that the cardinality constraint be active). Hence the two problems (2.1) and (2.2) may essentially be viewed as equivalent, and it is therefore natural to solve the given cardinality-constrained problem (2.1) via the relaxed program (2.2).
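Since we later solve (2.1) through the relaxed program (2.2), it is useful to be able to test feasibility for (2.2) directly. A minimal NumPy sketch (function name and tolerance are our own choices) could look as follows:

```python
import numpy as np

def feasible_for_relaxed(x, y, g, h, s, tol=1e-8):
    """Check feasibility of (x, y) for the relaxed problem (2.2);
    g and h are callables returning NumPy arrays."""
    n = x.size
    return (np.all(g(x) <= tol)               # g(x) <= 0
            and np.all(np.abs(h(x)) <= tol)   # h(x) = 0
            and n - y.sum() - s <= tol        # n - e^T y <= s
            and np.all(y <= 1.0 + tol)        # y <= e
            and np.all(np.abs(x * y) <= tol)) # x o y = 0
```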

Let us now recall the stationarity concepts introduced in [17].

Definition 2.2

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). Then \(({\hat{x}}, {\hat{y}})\) is called

(a)

    CCOP-M-stationary, if there exist multipliers \(\lambda \in {\mathbb {R}}^m\), \(\mu \in {\mathbb {R}}^p\), and \(\gamma \in {\mathbb {R}}^n\) such that

    • \(0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),

    • \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),

    • \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\).

(b)

    CCOP-S-stationary, if \( ({\hat{x}}, {\hat{y}}) \) is CCOP-M-stationary with \(\gamma _i = 0\) for all \( i \in I_0({\hat{y}}) \).

As remarked in [17], CCOP-S-stationarity corresponds to the KKT condition of (2.2). In contrast, CCOP-M-stationarity does not depend on the auxiliary variables y and is the KKT condition of the following tightened nonlinear program TNLP(\({\hat{x}}\))

$$\begin{aligned} \begin{array}{llll} \displaystyle \min _{x} \ f(x) & \text { s.t. } & g(x) \le 0, & h(x) = 0 , \\ & & x_i = 0 & \forall i \in I_0({\hat{x}}). \end{array} \end{aligned}$$
(2.3)

Observe that every local minimizer \({\hat{x}}\) of (2.1) is also a local minimizer of the corresponding tightened problem (2.3). This justifies the definition of CCOP-M-stationarity. Suppose now that \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) is feasible for (2.2). By the orthogonality constraint, we clearly have \(I_\pm ({\hat{x}}) \subseteq I_0({\hat{y}})\) (with equality if \(\Vert {\hat{x}}\Vert _0 = s\)). Hence, if \(({\hat{x}}, {\hat{y}})\) is a CCOP-S-stationary point, then it is also CCOP-M-stationary. The converse is not true in general, see [17, Example 4].

It was shown in [19] that a CCOP-tailored version of Guignard CQ, which coincides with the standard Guignard CQ for (2.2), is sufficient to guarantee CCOP-S-stationarity of local minima of (2.2). This is a major difference to MPCCs, where one typically needs MPCC-LICQ to guarantee S-stationarity of local minima and has to rely on M-stationarity under weaker MPCC-CQs. Since local minima of (2.2) are CCOP-S-stationary under CCOP-CQs, CCOP-M-stationary points may seem to be undesirable solution candidates. Fortunately, if \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary, one can simply replace \({\hat{y}}\) with another auxiliary variable \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary, as the next proposition shows. Note that the proof of this result is constructive.

Proposition 2.3

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). If \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point, then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary.

Proof

By Theorem 2.1, \({\hat{x}}\) is feasible for (2.1). Now define \({\hat{z}}\in {\mathbb {R}}^n\) such that

$$\begin{aligned} {\hat{z}}_i := \begin{cases} 0 & \text {if } i \in I_\pm ({\hat{x}}), \\ 1 & \text {if } i \in I_0({\hat{x}}). \end{cases} \end{aligned}$$

Then \(({\hat{x}}, {\hat{z}})\) is obviously feasible for (2.2), cf. also the proof of [17, Theorem 3.1]. By assumption, there exist multipliers \((\lambda , \mu , \gamma ) \in {\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n\) certifying that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. Since \(I_\pm ({\hat{x}}) = I_0({\hat{z}})\), we can conclude from Definition 2.2 that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary with the same multipliers \((\lambda , \mu , \gamma )\). \(\square \)

This shows that the difference between S- and M-stationarity is not as big in this setting as it is for MPCCs. More precisely, a feasible point \({\hat{x}}\) of (2.1) is CCOP-M-stationary if and only if there exists \({\hat{z}}\) such that the pair \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary. Consequently, any constraint qualification which guarantees that a local minimum \({\hat{x}}\) of (2.1) is CCOP-M-stationary also yields the existence of a CCOP-S-stationary point \(({\hat{x}}, {\hat{z}})\). Numerically, this implies that any method which merely generates a sequence converging to a CCOP-M-stationary point essentially provides a CCOP-S-stationary point as well.
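Since the proof of Proposition 2.3 is constructive, this repair step is trivial to implement. A short NumPy sketch (names are ours):

```python
import numpy as np

def s_stationary_partner(x_hat):
    """Construct z_hat as in the proof of Proposition 2.3:
    z_i = 0 on I_pm(x_hat) and z_i = 1 on I_0(x_hat)."""
    return np.where(x_hat != 0, 0.0, 1.0)

x_hat = np.array([0.5, 0.0, -1.2, 0.0])
z_hat = s_stationary_partner(x_hat)
print(z_hat)                       # [0. 1. 0. 1.]
print(np.all(x_hat * z_hat == 0))  # orthogonality constraint: True
```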

Utilizing (2.3), CCOP-tailored CQs were introduced in [17]. We shall now follow this approach and introduce a CCOP-tailored quasinormality condition.

Definition 2.4

A point \({\hat{x}}\in {\mathbb {R}}^n\), feasible for (2.1), satisfies the CCOP-quasinormality condition if there exists no \((\lambda , \mu , \gamma ) \in ({\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n) \setminus \{(0,0,0)\}\) such that the following conditions are satisfied:

(a)

    \(0 = \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),

(b)

    \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),

(c)

    \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\),

(d)

    \(\exists \{x^k\} \subseteq {\mathbb {R}}^n\) with \(\{x^k\} \rightarrow {\hat{x}}\) such that, for all \(k \in {\mathbb {N}}\), we have

    • \(\forall i \in \{1, \dots , m\}\) with \(\lambda _i> 0: \ \lambda _i g_i(x^k) > 0\),

    • \(\forall i \in \{1, \dots , p\}\) with \(\mu _i \ne 0: \ \mu _i h_i(x^k) > 0\),

    • \(\forall i \in \{1, \dots , n\}\) with \(\gamma _i \ne 0: \ \gamma _i x_i^k > 0\).

Obviously, CCOP-quasinormality corresponds to the (standard) quasinormality CQ of (2.3). By [1], CCOP-CPLD introduced in [17] thus implies CCOP-quasinormality.

3 An Augmented Lagrangian Method

Let us now describe the algorithm. For a given penalty parameter \(\alpha > 0\) the PHR augmented Lagrangian function for (2.2) is given by

$$\begin{aligned} L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) := f(x) + \alpha \pi ((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) \end{aligned}$$

with \((\lambda , \mu , \zeta , \eta , \gamma ) \in {\mathbb {R}}^m_+ \times {\mathbb {R}}^p \times {\mathbb {R}}_+ \times {\mathbb {R}}^n_+ \times {\mathbb {R}}^n\) and

$$\begin{aligned} \pi ((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) := \frac{1}{2} \left\| \begin{pmatrix} \left( g(x) + \frac{\lambda }{\alpha }\right) _+ \\ h(x) + \frac{\mu }{\alpha } \\ \left( n - e^T y -s + \frac{\zeta }{\alpha }\right) _+ \\ \left( y - e + \frac{\eta }{\alpha }\right) _+ \\ x \circ y + \frac{\gamma }{\alpha } \end{pmatrix} \right\| _2^2, \end{aligned}$$

the shifted quadratic penalty term, cf. [13, Chapter 4]. The algorithm is then stated below.
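For later reference, here is a minimal NumPy sketch of these two functions (the function names, signatures, and the convention that f, g, h are callables are our own choices):

```python
import numpy as np

def phr_penalty(x, y, lam, mu, zeta, eta, gamma, alpha, g, h, s):
    """Shifted quadratic penalty term pi((x, y), lam, mu, zeta, eta, gamma; alpha)
    for the relaxed problem (2.2); g and h are callables, s is the sparsity level."""
    n = x.size
    r = np.concatenate([
        np.maximum(g(x) + lam / alpha, 0.0),                # (g(x) + lam/alpha)_+
        h(x) + mu / alpha,                                  # h(x) + mu/alpha
        np.maximum([n - y.sum() - s + zeta / alpha], 0.0),  # cardinality part
        np.maximum(y - 1.0 + eta / alpha, 0.0),             # (y - e + eta/alpha)_+
        x * y + gamma / alpha,                              # orthogonality part
    ])
    return 0.5 * np.dot(r, r)

def phr_lagrangian(x, y, lam, mu, zeta, eta, gamma, alpha, f, g, h, s):
    """PHR augmented Lagrangian L((x, y), ...; alpha) = f(x) + alpha * pi(...)."""
    return f(x) + alpha * phr_penalty(x, y, lam, mu, zeta, eta, gamma, alpha, g, h, s)
```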

Algorithm 3.1

(Safeguarded Augmented Lagrangian Method)

\((S_0)\):

Initialization: Choose parameters \(\lambda _{\max } > 0\), \(\mu _{\min } < \mu _{\max }\), \(\zeta _{\max } > 0\), \(\eta _{\max } > 0\), \(\gamma _{\min } < \gamma _{\max }\), \(\tau \in (0,1)\), \(\sigma > 1\) and \(\{\epsilon _k\} \subseteq {\mathbb {R}}_+\) such that \(\{\epsilon _k\} \downarrow 0\).

Choose initial values \({\bar{\lambda }}^1 \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^1 \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^1 \in [0, \zeta _{\max }]\), \({\bar{\eta }}^1 \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^1 \in [\gamma _{\min }, \gamma _{\max }]^n\), \(\alpha _1 > 0\), and set \(k \leftarrow 1\).

\((S_1)\):

Update of the iterates: Compute \((x^k, y^k)\) as an approximate solution of

$$\begin{aligned} \displaystyle \min _{(x,y) \in {\mathbb {R}}^{2n}} \ L((x,y), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \end{aligned}$$

satisfying

$$\begin{aligned} \Vert \nabla _{(x,y)} L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \Vert \le \epsilon _k. \end{aligned}$$
(3.1)
\((S_2)\):

Update of the approximate multipliers:

$$\begin{aligned} \begin{array}{ll} \lambda ^k & := (\alpha _k g(x^k) + {\bar{\lambda }}^k)_+ \\ \mu ^k & := \alpha _k h(x^k) + {\bar{\mu }}^k \\ \zeta ^k & := (\alpha _k (n - e^T y^k - s) + {\bar{\zeta }}^k)_+ \\ \eta ^k & := ( \alpha _k (y^k - e) + {\bar{\eta }}^k)_+ \\ \gamma ^k & := \alpha _k x^k \circ y^k + {\bar{\gamma }}^k \end{array} \end{aligned}$$
\((S_3)\):

Update of the penalty parameter: Define

$$\begin{aligned} U^k&:= \min \big \{ -g(x^k), \tfrac{{\bar{\lambda }}^k}{\alpha _k} \big \}, \quad V^k := \min \big \{ -(n - e^T y^k - s), \tfrac{{\bar{\zeta }}^k}{\alpha _k} \big \}, \\ W^k&:= \min \big \{ -(y^k - e), \tfrac{{\bar{\eta }}^k}{\alpha _k} \big \}. \end{aligned}$$

If \(k = 1\) or

$$\begin{aligned} \begin{array}{l} \max \left\{ \Vert U^k\Vert , \; \Vert h(x^k)\Vert , \; \Vert V^k\Vert , \; \Vert W^k\Vert , \; \Vert x^k \circ y^k\Vert \right\} \\ \quad \le \tau \max \left\{ \Vert U^{k - 1}\Vert , \; \Vert h(x^{k - 1})\Vert , \; \Vert V^{k - 1}\Vert , \; \Vert W^{k - 1}\Vert , \; \Vert x^{k - 1} \circ y^{k - 1}\Vert \right\} , \end{array} \end{aligned}$$
(3.2)

set \(\alpha _{k + 1} = \alpha _k\). Otherwise set \(\alpha _{k + 1} = \sigma \alpha _k\).

\((S_4)\):

Update of the safeguarded multipliers: Choose \({\bar{\lambda }}^{k + 1} \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^{k + 1} \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^{k + 1} \in [0, \zeta _{\max }]\), \({\bar{\eta }}^{k + 1} \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^{k + 1} \in [\gamma _{\min }, \gamma _{\max }]^n\).

\((S_5)\):

Set \(k \leftarrow k + 1\) and go to \((S_1)\).

Note that Algorithm 3.1 is exactly the safeguarded augmented Lagrangian method from [13]. The only difference from the classical augmented Lagrangian method, see, e.g., [9, 34], lies in the more careful updating of the Lagrange multipliers: The safeguarded method contains the bounded auxiliary sequences \({{\bar{\lambda }}}^k, {{\bar{\mu }}}^k, \ldots \), which replace the multiplier estimates \(\lambda ^k, \mu ^k, \ldots \) in certain places. Note that these bounded auxiliary sequences are chosen by the user and that there is considerable freedom in their choice. In principle, one can simply take \({{\bar{\lambda }}}^k = 0, {{\bar{\mu }}}^k = 0, \ldots \) for all \( k \in {\mathbb {N}}\), in which case Algorithm 3.1 boils down to the classical quadratic penalty method. A more practical choice is to compute \({{\bar{\lambda }}}^{k+1}, {{\bar{\mu }}}^{k+1}, \ldots \) by projecting the multiplier estimates \(\lambda ^{k}, \mu ^{k}, \ldots \) onto the respective sets \( [0, \lambda _{\max }]^m, [ \mu _{\min }, \mu _{\max }]^p, \ldots \). With this choice and sufficiently large parameters \( \lambda _{\max }, \mu _{\min }, \mu _{\max }, \ldots \), the safeguarded ALM often coincides with the classical ALM. Differences occur, however, in those situations where the classical ALM generates unbounded Lagrange multiplier estimates. This has a significant influence on the (global) convergence theory of both methods: While there is a very satisfactory theory for the safeguarded method, see [13], a counterexample from [30] shows that the corresponding properties do not hold for the classical approach.

We have not specified a termination condition for the algorithm here. However, the convergence analysis in the next section suggests stopping the algorithm, e.g., as soon as the M-stationarity conditions are satisfied up to a given tolerance.
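To make the structure of Algorithm 3.1 concrete, the following condensed Python sketch implements the outer loop, reusing phr_lagrangian from above. It is purely illustrative: the subproblem in \((S_1)\) is handed to SciPy's BFGS rather than to the specialized subsolver inside ALGENCAN, all parameter values are our own illustrative defaults, and the safeguarding in \((S_4)\) uses the projection rule discussed above.

```python
import numpy as np
from scipy.optimize import minimize

def safeguarded_alm(f, g, h, s, n, m, p, x0, y0, tau=0.5, sigma=10.0,
                    alpha=10.0, lam_max=1e8, mu_bnd=1e8, zeta_max=1e8,
                    eta_max=1e8, gamma_bnd=1e8, K=50):
    x, y = x0.copy(), y0.copy()
    lam_b, mu_b = np.zeros(m), np.zeros(p)          # safeguarded multipliers
    zeta_b, eta_b, gamma_b = 0.0, np.zeros(n), np.zeros(n)
    viol_old = np.inf
    for k in range(1, K + 1):
        eps_k = 10.0 ** (-k)                        # {eps_k} -> 0
        # (S_1): approximate minimization of the augmented Lagrangian
        obj = lambda z: phr_lagrangian(z[:n], z[n:], lam_b, mu_b, zeta_b,
                                       eta_b, gamma_b, alpha, f, g, h, s)
        res = minimize(obj, np.concatenate([x, y]), method="BFGS",
                       options={"gtol": eps_k})     # enforces (3.1) approximately
        x, y = res.x[:n], res.x[n:]
        # (S_2): multiplier updates
        lam = np.maximum(alpha * g(x) + lam_b, 0.0)
        mu = alpha * h(x) + mu_b
        zeta = max(alpha * (n - y.sum() - s) + zeta_b, 0.0)
        eta = np.maximum(alpha * (y - 1.0) + eta_b, 0.0)
        gamma = alpha * x * y + gamma_b
        # (S_3): increase penalty unless the violation measure (3.2) decreases
        U = np.minimum(-g(x), lam_b / alpha)
        V = min(-(n - y.sum() - s), zeta_b / alpha)
        W = np.minimum(-(y - 1.0), eta_b / alpha)
        viol = max(np.linalg.norm(U), np.linalg.norm(h(x)), abs(V),
                   np.linalg.norm(W), np.linalg.norm(x * y))
        if viol > tau * viol_old:                   # at k = 1, viol_old = inf
            alpha *= sigma
        viol_old = viol
        # (S_4): safeguarding by projection onto the bounded boxes
        lam_b = np.clip(lam, 0.0, lam_max)
        mu_b = np.clip(mu, -mu_bnd, mu_bnd)
        zeta_b = min(zeta, zeta_max)
        eta_b = np.clip(eta, 0.0, eta_max)
        gamma_b = np.clip(gamma, -gamma_bnd, gamma_bnd)
    return x, y
```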

In the subsequent discussion of the convergence properties of this algorithm, we often make use of the fact that the PHR augmented Lagrangian function is continuously differentiable with the gradient

$$\begin{aligned}&\nabla _x L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) \\&\quad = \nabla f(x) + \alpha \left[ \nabla g(x) \left( g(x) + \tfrac{\lambda }{\alpha }\right) _+ + \nabla h(x) \left( h(x) + \tfrac{\mu }{\alpha }\right) + \left( x \circ y + \tfrac{\gamma }{\alpha }\right) \circ y\right] , \\&\nabla _y L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha )\\&\quad = \alpha \left[ -\left( n-e^Ty-s + \tfrac{\zeta }{\alpha }\right) _+ e + \left( y-e + \tfrac{\eta }{\alpha }\right) _+ + \left( x \circ y + \tfrac{\gamma }{\alpha }\right) \circ x \right] , \end{aligned}$$

where \( \nabla g(x) \) and \( \nabla h(x) \) denote the transposed Jacobian matrices of g and h at x, respectively. Consequently, the multipliers in \((S_2)\) are chosen exactly such that

$$\begin{aligned} \nabla _x L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)&= \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k, \\ \nabla _y L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)&= -\zeta ^k e + \eta ^k + \gamma ^k \circ x^k \end{aligned}$$

holds for all \(k \in {\mathbb {N}}\).
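In NumPy, these gradients can be sketched as follows (names are ours; Jg and Jh return the Jacobians of g and h, i.e. the transposes of \( \nabla g \) and \( \nabla h \) in the notation above). Checking such an implementation against finite differences is a useful sanity test:

```python
import numpy as np

def grad_x_L(x, y, lam, mu, gamma, alpha, grad_f, Jg, Jh, g, h):
    """nabla_x L; Jg(x) and Jh(x) return the m x n and p x n Jacobians."""
    return (grad_f(x)
            + alpha * (Jg(x).T @ np.maximum(g(x) + lam / alpha, 0.0)
                       + Jh(x).T @ (h(x) + mu / alpha)
                       + (x * y + gamma / alpha) * y))

def grad_y_L(x, y, zeta, eta, gamma, alpha, s):
    """nabla_y L for the relaxed problem (2.2)."""
    n = x.size
    return alpha * (-max(n - y.sum() - s + zeta / alpha, 0.0) * np.ones(n)
                    + np.maximum(y - 1.0 + eta / alpha, 0.0)
                    + (x * y + gamma / alpha) * x)
```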

4 Convergence Analysis

The aim of this section is to prove global convergence of Algorithm 3.1 to CCOP-M-stationary points under the fairly mild CCOP-quasinormality condition. To this end, we begin with an auxiliary result, which states that the sequence \(\{ y^k \}\) remains bounded on any subsequence on which \(\{ x^k \}\) is bounded. In particular, if \(\{ x^k \}\) converges on a subsequence, this allows us to extract a limit point of the sequence \(\{(x^k, y^k)\}\).

Proposition 4.1

Let \(\{x^k\} \subseteq {\mathbb {R}}^n\) be a sequence generated by Algorithm 3.1. Assume that \(\{x^k\}\) is bounded on a subsequence. Then the auxiliary sequence \(\{y^k\}\) is bounded on the same subsequence.

Proof

In order to avoid taking further subsequences, let us assume that the entire sequence \(\{ x^k \}\) is bounded. We then show that the whole sequence \(\{ y^k \}\) is bounded as well. Define, for each \(k \in {\mathbb {N}}\),

$$\begin{aligned} B^k := \nabla _y L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) = -\zeta ^k e + \eta ^k + \gamma ^k \circ x^k. \end{aligned}$$
(4.1)

By (3.1), we know that \(\{B^k\} \rightarrow 0\). We first show that the sequence \(\{ y^k \}\) is bounded from above and then verify that it is also bounded from below.

\(\{y^k\}~\textit{is bounded above}\) We claim that there exists a \(c \in {\mathbb {R}}\) such that \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). Suppose, by contradiction, that there is an index \( j \in \{1, \dots , n\} \) and a subsequence \(\{y^{k_l}_j\}\) such that \(\{y^{k_l}_j\} \rightarrow + \infty \). Since \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\) and \({\bar{\eta }}_j^{k_l}\) is bounded by definition, we then obtain

$$\begin{aligned} \big \{ \alpha _{k_l}(y^{k_l}_j - 1) + {\bar{\eta }}_j^{k_l} \big \} \rightarrow + \infty . \end{aligned}$$
(4.2)

This implies \( \eta _j^{k_l} = \alpha _{k_l}(y^{k_l}_j - 1) + {\bar{\eta }}_j^{k_l} \) for all \( l \in {\mathbb {N}}\) sufficiently large and, hence, by (4.2), we have \(\{ \eta _j^{k_l} \} \rightarrow +\infty \). Observe that, for each \(l \in {\mathbb {N}}\) sufficiently large, we have

$$\begin{aligned} \gamma _j^{k_l} x_j^{k_l} = \big ( \alpha _{k_l} x_j^{k_l} y_j^{k_l} + {\bar{\gamma }}_j^{k_l} \big ) x_j^{k_l} = \alpha _{k_l} (x_j^{k_l})^2 y_j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \ge {\bar{\gamma }}_j^{k_l} x_j^{k_l}. \end{aligned}$$

From (4.1), we then obtain for these \(l \in {\mathbb {N}}\) that \( B^{k_l}_j = -\zeta ^{k_l} + \eta _j^{k_l} + \gamma _j^{k_l} x_j^{k_l} \ge -\zeta ^{k_l} + \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \), which is equivalent to \( \zeta ^{k_l} \ge \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} - B^{k_l}_j \). Since \(\{B^{k_l}_j\} \rightarrow 0\) and \(\{{\bar{\gamma }}_j^{k_l} x_j^{k_l}\}\) is bounded, the right-hand side converges to \( +\infty \). Consequently, we have \(\{\zeta ^{k_l}\} \rightarrow +\infty \). The definition of \(\{\zeta ^{k_l}\}\) therefore yields \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) + {\bar{\zeta }}^{k_l} \} \rightarrow +\infty \). Since \(\{{\bar{\zeta }}^{k_l}\}\) is a bounded sequence, we get \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) \} \rightarrow + \infty \). We therefore have

$$\begin{aligned} n - e^T y^{k_l} - s > 0 \quad \forall l \in {\mathbb {N}}\text { sufficiently large.} \end{aligned}$$
(4.3)

We now claim that

$$\begin{aligned} \exists i \in \{1,\dots ,n\} \setminus \{j\}: \ \{y_i^{k_l}\} \text { is unbounded from below.} \end{aligned}$$
(4.4)

Assume that there exists \( d \in {\mathbb {R}}\) such that \( y^{k_l}_i \ge d\) for all \( i \in \{ 1, \ldots , n \} \setminus \{ j \} \) and all \( l \in {\mathbb {N}}\). We then obtain

$$\begin{aligned} n - e^T y^{k_l} - s = n - \displaystyle \sum _{i = 1, i \ne j}^n y_i^{k_l} - y_j^{k_l} - s \le n - (n-1) d - y_j^{k_l} - s \rightarrow - \infty . \end{aligned}$$

We therefore get \( n - e^T y^{k_l} - s < 0 \) for all \( l \in {\mathbb {N}}\) sufficiently large, but this contradicts (4.3), hence (4.4) holds. For this particular index i, we can construct a subsequence \(\{y_i^{k_{l_t}}\}\) such that \(\{y_i^{k_{l_t}}\} \rightarrow -\infty \). Since \(\{{\bar{\eta }}_i^{k_{l_t}}\}\) is bounded, we then have \( \big \{\alpha _{k_{l_t}} (y_i^{k_{l_t}} - 1) + {\bar{\eta }}_i^{k_{l_t}} \big \} \rightarrow -\infty \). This implies \( \eta _i^{k_{l_t}} = 0 \) for all \( t \in {\mathbb {N}}\) sufficiently large. We therefore obtain from (4.1) that

$$\begin{aligned} B_i^{k_{l_t}}&= -\zeta ^{k_{l_t}} + \eta _i^{k_{l_t}} + \gamma _i^{k_{l_t}} x_i^{k_{l_t}} \ = \ -\zeta ^{k_{l_t}} + \gamma _i^{k_{l_t}} x_i^{k_{l_t}} \ = \ -\zeta ^{k_{l_t}} + \big ( \alpha _{k_{l_t}} x_i^{k_{l_t}} y_i^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} \big ) x_i^{k_{l_t}} \\&= -\zeta ^{k_{l_t}} + \alpha _{k_{l_t}} (x_i^{k_{l_t}})^2 y_i^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}} \ \le \ -\zeta ^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}} \end{aligned}$$

for all \( t \in {\mathbb {N}}\) large enough. Since \(\{{\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}}\}\) is a bounded sequence and \(\{\zeta ^{k_l}\} \rightarrow + \infty \), we get \( \{B_i^{k_{l_t}}\} \rightarrow -\infty \), which leads to a contradiction. Thus, \(\{y^k\}\) is bounded above.

\({\{y^k\}~\textit{is bounded below}}\) We claim that there exists a \(d \in {\mathbb {R}}\) such that \(y^k \ge d e\) for all \(k \in {\mathbb {N}}\). Assume, by contradiction, that there is an index \( j \in \{1,\dots ,n\} \) such that \(\{y^{k_l}_j\} \rightarrow -\infty \) on a suitable subsequence. Then, we have \( y_j^{k_l} < 0 \) and \( \eta _j^{k_l} = 0 \) for all \( l \in {\mathbb {N}}\) large enough, and similar to the previous case, it therefore follows that \( B_j^{k_l} \le -\zeta ^{k_l} + {\bar{\gamma }}_j^{k_l}x_j^{k_l} \). This can be rewritten as \( \zeta ^{k_l} \le {\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l} \). Since \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l}\}\) is bounded and \(\{B_j^{k_l}\} \rightarrow 0\), the sequence \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l}\}\) is bounded. This implies, in particular, that \(\{\zeta ^{k_l}\}\) is bounded above, i.e.,

$$\begin{aligned} \exists r \in {\mathbb {R}}\ \forall l \in {\mathbb {N}}: \ \zeta ^{k_l} \le r. \end{aligned}$$
(4.5)

On the other hand, we already know \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). We therefore get

$$\begin{aligned} n - e^T y^{k_l} - s \ge n - (n-1)c - y_j^{k_l} - s \rightarrow + \infty . \end{aligned}$$

This implies

$$\begin{aligned} \left\{ \alpha _{k_l} \left( n - e^T y^{k_l} - s\right) + {\bar{\zeta }}^{k_l} \right\} \rightarrow + \infty \end{aligned}$$

due to the boundedness of the sequence \(\{ {\bar{\zeta }}^{k_l} \}\) and \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). The definition of \(\zeta ^{k_l}\) then yields

$$\begin{aligned} \zeta ^{k_l} = \alpha _{k_l} \left( n - e^T y^{k_l} - s\right) + {\bar{\zeta }}^{k_l} \rightarrow + \infty , \end{aligned}$$

which contradicts (4.5). Hence, \(\{y^k\}\) is bounded below. \(\square \)

As for all penalty-type methods, one has to distinguish two aspects in a corresponding global convergence theory, namely the feasibility issue and an optimality statement. Without further assumptions, feasibility of the limit point cannot be guaranteed (for nonconvex constraints). However, there is a standard result in [13], which shows that the limit point of our stationary sequence is at least a stationary point of the constraint violation. To this end, we measure the infeasibility of a point \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) for (2.2) by using the unshifted quadratic penalty term

$$\begin{aligned} \pi _{0,1}(x,y) := \pi ((x,y), 0, 0, 0, 0, 0; 1). \end{aligned}$$

Clearly, \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2) if and only if \(\pi _{0,1}({\hat{x}},{\hat{y}}) = 0\). Since \(\pi _{0,1}\) is nonnegative, this in turn implies that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of \(\pi _{0,1}\). In particular, we then must have \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).
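In terms of the sketch from Sect. 3, this infeasibility measure is simply the penalty term with zero multipliers and unit penalty parameter (again with our own naming):

```python
import numpy as np

def infeasibility(x, y, g, h, s):
    """Unshifted quadratic penalty pi_{0,1}(x, y); vanishes exactly at the
    feasible points of (2.2). Reuses phr_penalty from Sect. 3."""
    m, p, n = g(x).size, h(x).size, x.size
    return phr_penalty(x, y, np.zeros(m), np.zeros(p), 0.0,
                       np.zeros(n), np.zeros(n), 1.0, g, h, s)
```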

Theorem 4.2

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of the sequence \(\{(x^k, y^k)\}\) generated by Algorithm 3.1. Then \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).

We omit the proof here, since it is identical to [13, Theorem 6.3] and [31, Theorem 6.2]. Instead, we turn to an optimality result for Algorithm 3.1. Suppose that the sequence \(\{x^k\}\) generated by Algorithm 3.1 has a limit point \({\hat{x}}\). Proposition 4.1 then suggests that we can extract a limit point \(({\hat{x}}, {\hat{y}})\) of the sequence \(\{(x^k, y^k)\}\). Under the additional assumptions that \({\hat{x}}\) satisfies CCOP-quasinormality and \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2), we can show that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Theorem 4.3

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Proof

To simplify the notation, we assume, throughout this proof, that the entire sequence \(\{ (x^k, y^k) \}\) converges to \(({\hat{x}}, {\hat{y}})\). For each \(k \in {\mathbb {N}}\), we define

$$\begin{aligned} A^k:= & {} \nabla _x L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)\\= & {} \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k. \end{aligned}$$

Furthermore, let \(B^k\) be given as in (4.1). By (3.1) and since \(\{\epsilon _k\} \downarrow 0\), we know that \(\{A^k\} \rightarrow 0\) and \(\{B^k\} \rightarrow 0\). Observe that, by \((S_2)\), we have \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\). Furthermore, by \((S_3)\), the sequence of penalty parameters \(\{\alpha _k\}\) satisfies \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). Let us now distinguish two cases.

Case 1 \(\{\alpha _k\}\) is bounded. Then \(\{\alpha _k\}\) is eventually constant, say \( \alpha _k = \alpha _K \) for all \( k \ge K \) with some sufficiently large \( K \in {\mathbb {N}}\). Now, let us take a closer look at \((S_2)\). The boundedness of \(\{\alpha _k\}\) immediately implies that the sequences \(\{\mu ^k\}\) and \(\{\gamma ^k \circ y^k\}\) are bounded. By passing onto subsequences if necessary, we can assume w.l.o.g. that these sequences converge, i.e. \(\{\mu ^k\} \rightarrow {{\hat{\mu }}}\) and \(\{\gamma ^k \circ y^k\} \rightarrow {{\hat{\gamma }}}\). For all \(i \in I_\pm ({\hat{x}})\) the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i = 0\). Since, in this case, we have \(\{y_i^k\} \rightarrow 0\), it follows that

$$\begin{aligned} {\hat{\gamma }}_i = \displaystyle \lim _{k \rightarrow \infty }\gamma _i^k y_i^k = \lim _{k \rightarrow \infty } \alpha _k x_i^k (y_i^k)^2 + \lim _{k \rightarrow \infty } {\bar{\gamma }}_i^k y_i^k = \alpha _K \cdot 0 + \lim _{k \rightarrow \infty } {\bar{\gamma }}_i^k y_i^k = 0 \quad \forall i \in I_\pm ({\hat{x}}). \end{aligned}$$

Next, observe that, for each \(i \in \{1, \dots , m\}\), we have \( 0 \le \lambda _i^k \le |\alpha _k g_i(x^k) + {\bar{\lambda }}_i^k| \) for all \( k \in {\mathbb {N}}\). Thus, \(\{\lambda _i^k\}\) is bounded as well and has a convergent subsequence. Hence, we can assume w.l.o.g. that \(\{\lambda ^k\} \rightarrow {\hat{\lambda }}\) on the whole sequence. Now, the boundedness of \(\{\alpha _k\}\) and \((S_3)\) also imply \(\{ \Vert U^k \Vert \} \rightarrow 0\). Let \(i \notin I_g({\hat{x}})\). Since, by definition, \(\{{\bar{\lambda }}^k \}\) is bounded, \(\left\{ \frac{{\bar{\lambda }}_i^k}{\alpha _k} \right\} \) is bounded as well and therefore has a convergent subsequence. Assume w.l.o.g. that this sequence converges to some limit point \(a_i\). Then

$$\begin{aligned} 0 = \displaystyle \lim _{k \rightarrow \infty } \Vert U_i^k \Vert = \Vert \min \{-g_i({\hat{x}}), a_i\} \Vert \quad \Rightarrow \quad \min \{-g_i({\hat{x}}), a_i\} = 0. \end{aligned}$$

Since \(-g_i({\hat{x}}) > 0\), we get \(a_i = 0\). This implies

$$\begin{aligned} \left\{ g_i(x^k) + \tfrac{{\bar{\lambda }}_i^k}{\alpha _k} \right\} \rightarrow g_i({\hat{x}}) + a_i = g_i({\hat{x}}) < 0. \end{aligned}$$

Thus, by \((S_2)\) we have

$$\begin{aligned} \lambda _i^k = \max \left\{ 0, \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \right\} = 0 \quad \forall k \in {\mathbb {N}}\text { sufficiently large}. \end{aligned}$$
(4.6)

As the limit of \(\{\lambda _i^k\}\), we then also have \({\hat{\lambda }}_i = 0\). Letting \( k \rightarrow \infty \) in the definition of \(A^k\) then yields

$$\begin{aligned} 0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) {\hat{\lambda }} + \nabla h({\hat{x}}) {\hat{\mu }} + {\hat{\gamma }}. \end{aligned}$$

Altogether, it follows that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Case 2 \(\{\alpha _k\}\) is unbounded. Then, we have \(\{\alpha _k\} \rightarrow +\infty \). Now define, for each \(k \in {\mathbb {N}}\),

$$\begin{aligned} {\tilde{\gamma }}_i^k := \gamma _i^k y_i^k \quad \forall i \in \{1, \dots , n\}. \end{aligned}$$

We claim that the sequence \(\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \}\) is bounded. By contradiction, assume that \(\{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert \} \rightarrow \infty \), w.l.o.g. on the whole sequence. The corresponding normalized sequence \(\left\{ \frac{\left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \right\} \) is bounded and therefore, again w.l.o.g. on the whole sequence, convergent to a (nontrivial) limit, i.e.

$$\begin{aligned} \left\{ \frac{\left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \right\} \rightarrow \left( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }}\right) \ne 0. \end{aligned}$$

We show that this limit, together with the sequence \(\{x^k\}\), contradicts CCOP-quasinormality in \({\hat{x}}\): Since \( \lambda ^k \ge 0 \) for all k, it follows that \({\tilde{\lambda }} \ge 0\). Now, take an index \(i \notin I_g({\hat{x}})\), i.e. \(g_i({\hat{x}}) < 0\). Since \(\left\{ {\bar{\lambda }}_i^k\right\} \) is bounded, it follows that \( \left\{ \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \right\} \rightarrow - \infty \). This implies \( \lambda _i^k = 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large, hence we get

$$\begin{aligned} {\tilde{\lambda }}_i = \displaystyle \lim _{k \rightarrow \infty }\frac{\lambda _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = 0 \quad \forall i \notin I_g({\hat{x}}). \end{aligned}$$
(4.7)

Next take an index \(i \in I_\pm ({\hat{x}})\). Since \(({\hat{x}}, {\hat{y}})\) is feasible, we then have \({\hat{y}}_i = 0\). The boundedness of \(\{{\bar{\eta }}_i^k\}\) therefore yields \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Consequently, we obtain

$$\begin{aligned} \eta _i^k = 0 \quad \forall i \in I_\pm ({\hat{x}}) \ \forall k \in {\mathbb {N}}\text { sufficiently large}. \end{aligned}$$
(4.8)

Now, we claim that \({\tilde{\gamma }}_i = 0\) holds for such an index i. Suppose not. Then \({\tilde{\gamma }}_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large. Since \({\tilde{\gamma }}_i^k = \gamma _i^k y_i^k\), this implies \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) large enough. We then have

$$\begin{aligned} B^k_i = -\zeta ^k + \eta _i^k + \gamma _i^k x_i^k {\mathop {=}\limits ^{(4.8)}} -\zeta ^k + \gamma _i^k x_i^k = -\zeta ^k + \frac{{\tilde{\gamma }}_i^k}{y_i^k} x_i^k. \end{aligned}$$
(4.9)

Rearranging and dividing (4.9) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) then gives

$$\begin{aligned} \frac{B_i^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \cdot x_i^k \cdot \frac{1}{y_i^k}. \end{aligned}$$
(4.10)

Observe that the left-hand side of (4.10) converges. On the other hand, since

$$\begin{aligned} \left\{ \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } x_i^k \right\} \rightarrow {\tilde{\gamma }}_i {\hat{x}}_i \ne 0 \end{aligned}$$

and \(\{y_i^k\} \rightarrow 0\), the right-hand side diverges. This contradiction shows that

$$\begin{aligned} {\tilde{\gamma }}_i = 0 \quad \forall i \in I_\pm ({\hat{x}}). \end{aligned}$$
(4.11)

Now, we claim that \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\). Suppose not. Then, since \(\left( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }}\right) \ne 0\), it follows that \(\left( {\tilde{\zeta }}, {\tilde{\eta }} \right) \ne 0\). Consider an index \(i \in I_0({\hat{y}})\). Since \(\{y_i^k\} \rightarrow {\hat{y}}_i\) and \(\{{\bar{\eta }}_i^k\}\) is a bounded sequence, we have \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Just like before, we can then assume w.l.o.g. that

$$\begin{aligned} \eta _i^k = 0 \quad \forall i \in I_0({\hat{y}}) \ \forall k \in {\mathbb {N}}\end{aligned}$$
(4.12)

which implies \( {\tilde{\eta }}_i = 0 \). Hence, we have

$$\begin{aligned} \left( {\tilde{\zeta }}, {\tilde{\eta }}_i \ \left( i \in I_\pm ({\hat{y}})\right) \right) \ne 0. \end{aligned}$$
(4.13)

Now let \(i \in I_\pm ({\hat{y}})\). Since \({\hat{y}}_i \ne 0\) and \(\{y_i^k\} \rightarrow {\hat{y}}_i\), we can assume w.l.o.g. that \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\). We then get, for each \(k \in {\mathbb {N}}\), that

$$\begin{aligned} B_i^k = -\zeta ^k + \eta _i^k + \gamma _i^k x_i^k = -\zeta ^k + \eta _i^k + \frac{{\tilde{\gamma }}_i^k}{y_i^k} x_i^k. \end{aligned}$$
(4.14)

Rearranging and dividing (4.14) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) yields

$$\begin{aligned} \frac{B_i^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \frac{\eta _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \cdot x_i^k \cdot \frac{1}{y_i^k}. \end{aligned}$$
(4.15)

By assumption, \({\tilde{\gamma }}_i = 0\). Consequently, letting \(k \rightarrow \infty \) in (4.15) yields

$$\begin{aligned} {\tilde{\zeta }} = {\tilde{\eta }}_i + 0 \cdot {\hat{x}}_i \cdot \frac{1}{{\hat{y}}_i} = {\tilde{\eta }}_i. \end{aligned}$$
(4.16)

From (4.13) we then obtain \( {\tilde{\zeta }} \ne 0 \) and \( {\tilde{\eta }}_i = {\tilde{\zeta }} \ne 0 \) for all \( i \in I_\pm ({\hat{y}}) \). Since \(\zeta ^k \ge 0 \) for all \( k \in {\mathbb {N}}\), we have \({\tilde{\zeta }} \ge 0\) and, therefore, \({\tilde{\zeta }} > 0\). Hence, we can assume w.l.o.g. that \(\zeta ^k > 0 \) for all \( k \in {\mathbb {N}}\). This implies \( \zeta ^k = \alpha _k \left( n - e^T y^k - s \right) + {\bar{\zeta }}^k \). We then have

$$\begin{aligned} 0 < {\tilde{\zeta }}&= \displaystyle \lim _{k \rightarrow \infty } \frac{\zeta ^k}{\left\| \left( \lambda ^k, \mu ^k,{\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( n - e^T y^k - s \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\zeta }}^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( n - e^T y^k - s \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }, \end{aligned}$$

since \(\{{\bar{\zeta }}^k\}\) is bounded by definition. Consequently, we can assume w.l.o.g. that

$$\begin{aligned} n - e^T y^k - s > 0 \quad \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.17)

By assumption, \(({\hat{x}}, {\hat{y}})\) is feasible and, hence, \(n - e^T {\hat{y}}- s \le 0\). Thus, we obtain from (4.17) that \(n - e^T y^k -s > n - e^T {\hat{y}}- s\) and, therefore,

$$\begin{aligned} e^T {\hat{y}}> e^T y^k \quad \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.18)

Furthermore, since \({\tilde{\zeta }} > 0\), by (4.16), we also have that \({\tilde{\eta }}_i > 0 \) for all \( i \in I_\pm ({\hat{y}})\). This implies \(\eta _i^k > 0 \) for all sufficiently large \( k \in {\mathbb {N}}\). Consequently, we have \( \eta _i^k = \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \) for all \(k \in {\mathbb {N}}\) large enough. We then obtain

$$\begin{aligned} 0 < {\tilde{\eta }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\eta _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( y_i^k - 1 \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\eta }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( y_i^k - 1 \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }, \end{aligned}$$

since \(\{{\bar{\eta }}_i^k\}\) is bounded by definition. Hence, we can assume w.l.o.g. that \( y_i^k > 1 \) for all \( k \in {\mathbb {N}}\). On the other hand, the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i \le 1 \) for all \( i \in \{1, \dots , n\} \). Consequently, we obtain

$$\begin{aligned} {\hat{y}}_i < y_i^k \quad \forall i \in I_\pm ({\hat{y}}) \ \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.19)

Together, this implies

$$\begin{aligned}&\sum _{i \in I_\pm ({\hat{y}})} {\hat{y}}_i = e^T{{\hat{y}}} > e^T y^k = \sum _{i \in I_\pm ({\hat{y}})} y^k_i + \sum _{i \in I_0({\hat{y}})} y^k_i \ge \sum _{i \in I_\pm ({\hat{y}})} {\hat{y}}_i + \sum _{i \in I_0({\hat{y}})} y^k_i \quad \\&\quad \Longrightarrow \quad \sum _{i \in I_0({\hat{y}})} y^k_i < 0 \end{aligned}$$

for all \(k \in {\mathbb {N}}\). By passing to a subsequence, we can therefore assume w.l.o.g. that there exists a \(j \in I_0({\hat{y}})\) with \(y^k_j < 0\) for all \(k \in {\mathbb {N}}\). Since \(j \in I_0({\hat{y}})\), by (4.12), we have \(\eta _j^k = 0 \) for all \( k \in {\mathbb {N}}\) and, hence, \( B_j^k = -\zeta ^k + \gamma _j^k x_j^k \) or, equivalently, \( B_j^k + \zeta ^k = \gamma _j^k x_j^k \). Since \(y_j^k \le 0\), we then have

$$\begin{aligned} \gamma _j^k x_j^k = \left( \alpha _k x_j^k y_j^k + {\bar{\gamma }}_j^k \right) x_j^k = \alpha _k (x_j^k)^2 y_j^k + {\bar{\gamma }}_j^k x_j^k \le {\bar{\gamma }}_j^k x_j^k. \end{aligned}$$

Consequently, we have \( B_j^k + \zeta ^k \le {\bar{\gamma }}_j^k x_j^k \) and, therefore,

$$\begin{aligned} \frac{B_j^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \le \frac{{\bar{\gamma }}_j^k x_j^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Since \(\{ {\bar{\gamma }}_j^k x_j^k \}\) is bounded, letting \(k \rightarrow \infty \) then yields the contradiction \( 0 < {\tilde{\zeta }} \le 0 \). Hence we have \( ({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0 \).

Dividing \(A^k\) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) and letting \(k \rightarrow \infty \) then yields

$$\begin{aligned} 0 = \displaystyle \sum _{i = 1}^m {\tilde{\lambda }}_i \nabla g_i({\hat{x}}) + \sum _{i = 1}^p {\tilde{\mu }}_i \nabla h_i({\hat{x}}) + \sum _{i = 1}^n {\tilde{\gamma }}_i e_i \end{aligned}$$

where \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\) and, in view of (4.7) and (4.11), \( {\tilde{\lambda }} \in {\mathbb {R}}^m_+ \), \( {\tilde{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \), and \( {\tilde{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). This shows that \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }})\) satisfies conditions (a)–(c) from Definition 2.4. We now verify that the three conditions from part (d) hold as well.

For this purpose, let \(i \in \{1, \dots , m\}\) such that \({\tilde{\lambda }}_i > 0\) holds. Then, we can assume w.l.o.g. that \(\lambda _i^k > 0 \) for all \( k \in {\mathbb {N}}\) and, thus, \(\lambda _i^k = \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k\). Consequently, we have

$$\begin{aligned} 0 < {\tilde{\lambda }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\lambda _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k g_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\lambda }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k g_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \end{aligned}$$

by the boundedness of \(\{{\bar{\lambda }}_i^k\}\). Thus, we have \( g_i(x^k) > 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large and, therefore, also \( {\tilde{\lambda }}_i g_i(x^k) > 0 \) for all these \( k \in {\mathbb {N}}\).

Next consider an index \(i \in \{1, \dots , p\}\) such that \({\tilde{\mu }}_i \ne 0\). The boundedness of \(\{{\bar{\mu }}_i^k\}\) then implies

$$\begin{aligned} {\tilde{\mu }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\mu _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \lim _{k \rightarrow \infty } \frac{\alpha _k h_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&\quad + \lim _{k \rightarrow \infty } \frac{{\bar{\mu }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k h_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Since \(\alpha _k > 0\), this implies that \({\tilde{\mu }}_i\) and \(h_i(x^k)\) have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\mu }}_i h_i(x^k) > 0 \) for all these k.

Finally, consider an index \(i \in \{1, \dots , n\}\) such that \({\tilde{\gamma }}_i \ne 0\). The boundedness of \(\{{\bar{\gamma }}_i^k\}\) yields

$$\begin{aligned} {\tilde{\gamma }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \lim _{k \rightarrow \infty } \frac{\gamma _i^k y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{(\alpha _k x_i^k y_i^k + {\bar{\gamma }}_i^k) y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k x_i^k (y_i^k)^2 }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\gamma }}_i^k y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k x_i^k (y_i^k)^2}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Hence, \({\tilde{\gamma }}_i\) and \(x_i^k\) also have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\gamma }}_i x_i^k > 0 \).

Altogether, this contradicts the assumed CCOP-quasinormality of \({\hat{x}}\). Thus, \(\left\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k )\right\} \) is bounded and therefore has a convergent subsequence. Assume w.l.o.g. that the whole sequence converges, i.e.,

$$\begin{aligned} \exists \big ( {\hat{\lambda }}, {\hat{\mu }}, {\hat{\gamma }}, {\hat{\zeta }}, {\hat{\eta }} \big ): \ \big \{\big ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \big )\big \} \rightarrow \big ( {\hat{\lambda }}, {\hat{\mu }}, {\hat{\gamma }}, {\hat{\zeta }}, {\hat{\eta }} \big ). \end{aligned}$$

Since \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\), we also have \({\hat{\lambda }} \in {\mathbb {R}}^m_+\). Consider an index \(i \notin I_g({\hat{x}})\). Then, just like for \({\tilde{\lambda }}_i\), one can show that \({\hat{\lambda }}_i = 0\). Similarly, for \(i \in I_\pm ({\hat{x}})\), following the argument for \({\tilde{\gamma }}_i\), one also gets \({\hat{\gamma }}_i = 0\). Taking \( k \rightarrow \infty \) in the definition of \(A^k\), we then obtain

$$\begin{aligned} 0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) {\hat{\lambda }} + \nabla h({\hat{x}}) {\hat{\mu }} + {\hat{\gamma }}, \end{aligned}$$

where \( {\hat{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \) and \( {\hat{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). Thus, we conclude that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. \(\square \)

It is known from [6, Corollary 4.2] that accumulation points \(({{\hat{x}}}, {{\hat{y}}})\) of Algorithm 3.1, where standard quasinormality holds, are KKT points and thus CCOP-S-stationary. To compare this result with Theorem 4.3, first note that CCOP-quasinormality only depends on \({{\hat{x}}}\), whereas standard quasinormality for (2.2) depends on the whole pair \(({{\hat{x}}}, {{\hat{y}}})\). In case \(\{i \mid {{\hat{y}}}_i \ne 0\} = I_0({{\hat{x}}})\), standard quasinormality in \(({{\hat{x}}}, {{\hat{y}}})\) is equivalent to CCOP-quasinormality in \({{\hat{x}}}\), and CCOP-S- and CCOP-M-stationarity coincide. Thus, in this situation, the statement of Theorem 4.3 can also be derived via [6, Corollary 4.2]. However, in case \(\{i \mid {{\hat{y}}}_i \ne 0\} \subsetneq I_0({{\hat{x}}})\), standard quasinormality is always violated in \(({{\hat{x}}}, {{\hat{y}}})\), and thus [6, Corollary 4.2] cannot be applied. In the latter situation, in general, we can only guarantee CCOP-M-stationarity of the limit \(({\hat{x}}, {\hat{y}})\). But, using Proposition 2.3, it is still possible to ensure CCOP-S-stationarity of a potentially modified point \(({\hat{x}}, {\hat{z}})\).

Corollary 4.4

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is a CCOP-S-stationary point.

5 Numerical Results

In this section, we compare the performance of ALGENCAN with the Scholtes regularization method from [15] as well as the Kanzow–Schwartz regularization method from [17]. All experiments were conducted in Python together with the NumPy library. We used ALGENCAN 2.4.0, compiled with the MA57 library [25] and called through its Python interface with user-supplied gradient of the objective function, sparse Jacobian of the constraints, and sparse Hessian of the Lagrangian. As a subsolver for the two regularization methods, we used the SQP solver WORHP version 1.14 [18], which is freely available for academic use, called through its Python interface. For the Scholtes regularization method, WORHP was called with user-supplied sparse gradient of the objective function, sparse Jacobian of the constraints, and sparse Hessian of the Lagrangian. For the Kanzow–Schwartz regularization method, in contrast, the corresponding NCP-function is not twice differentiable, so no analytical Hessian exists; we therefore called WORHP with user-supplied sparse gradient of the objective function and sparse Jacobian of the constraints only, and the Hessian of the Lagrangian was approximated using the BFGS method. Throughout the experiments, both ALGENCAN and WORHP were called with their respective default settings.

We applied ALGENCAN directly to the relaxed reformulation (2.2) of the test problems, i.e. without a lower bound on the auxiliary variable y. In contrast, following [15, 17], for both regularization methods we bounded y from below by 0. For each test problem, we started both regularization methods with an initial regularization parameter \(t_0 = 1.0\) and decreased \(t_k\) in each iteration by a factor of 0.01. The regularization methods were terminated as soon as either \(t_k < 10^{-8}\) or \(\left\| x^k \circ y^k \right\| _\infty \le 10^{-6}\).
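In code, the outer loop shared by both regularization methods thus takes the following simple form, where solve_regularized is a placeholder for the WORHP call on the regularized NLP with parameter t, warm-started at the previous solution:

```python
import numpy as np

def regularization_driver(solve_regularized, x0, y0, t0=1.0, factor=0.01,
                          t_min=1e-8, comp_tol=1e-6):
    """Outer loop of the regularization methods; the inner solver is a
    placeholder (WORHP in our experiments)."""
    x, y, t = x0, y0, t0
    while True:
        x, y = solve_regularized(x, y, t)  # solve the regularized NLP for fixed t
        t *= factor
        if t < t_min or np.max(np.abs(x * y)) <= comp_tol:
            return x, y
```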

5.1 Pilot Test

Let us begin by considering the following academic example

$$\begin{aligned} \displaystyle \min _{x \in {\mathbb {R}}^2} x_1 + 10 x_2 \quad \text {s.t.}\quad \left( x_1 - \tfrac{1}{2}\right) ^2 + \left( x_2 - 1\right) ^2 \le 1, \ \Vert x\Vert _0 \le 1 \end{aligned}$$

which is taken from [17]. This problem has a local minimizer at \(\left( 0, 1 - \frac{1}{2}\sqrt{3}\right) \) and an isolated global minimizer at \(\left( \frac{1}{2}, 0\right) \). Following [17], we discretised the rectangle \(\left[ -1, \frac{3}{2}\right] \times \left[ -\frac{1}{2},2\right] \), resulting in 441 starting points for the considered methods. For each of these starting points, ALGENCAN converged to the global minimizer \(\left( \frac{1}{2}, 0\right) \). The same behaviour was observed for the Scholtes regularization method. The Kanzow–Schwartz regularization method, on the other hand, was slightly less successful, converging to the global minimizer in 437 cases; in the remaining 4 cases, it converged to the local minimizer. This behaviour might be due to the BFGS approximation of the Hessian of the Lagrangian used by WORHP. Indeed, running the Scholtes regularization method without a user-supplied Hessian of the Lagrangian, letting WORHP approximate the Hessian by BFGS instead, yielded convergence to the global minimizer in only 394 cases; in the other 47 cases, the Scholtes regularization method only found the local minimizer.
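The two minimizers are easy to verify by brute force: with \(n = 2\) and \(s = 1\), one can simply minimize over each one-sparse support pattern separately, e.g. on a fine grid (a throwaway sketch of ours, not part of the actual experiments):

```python
import numpy as np

f = lambda x1, x2: x1 + 10.0 * x2
ball = lambda x1, x2: (x1 - 0.5) ** 2 + (x2 - 1.0) ** 2 - 1.0

# x = 0 is infeasible (ball(0, 0) = 0.25 > 0), so it suffices to minimize
# over the two one-sparse support patterns, here on a fine grid.
t = np.linspace(-1.0, 2.0, 300001)
for label, x1, x2 in [("support {1}, x2 = 0", t, np.zeros_like(t)),
                      ("support {2}, x1 = 0", np.zeros_like(t), t)]:
    vals = np.where(ball(x1, x2) <= 1e-9, f(x1, x2), np.inf)
    i = np.argmin(vals)
    print(label, "-> minimizer ~", (x1[i], x2[i]), "value ~", vals[i])
# support {1} yields ~ (0.5, 0) with value 0.5 (the global minimizer),
# support {2} yields ~ (0, 1 - sqrt(3)/2) with value ~ 1.34
```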

5.2 Portfolio Optimization Problems

Following [17], we consider a classical portfolio optimization problem

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x \in {\mathbb {R}}^n} \ x^T Q x & \text {s.t.} & \mu ^T x \ge \rho , \; e^T x \le 1, \; 0 \le x \le u, \\ & & \left\| x\right\| _0 \le s, \end{array} \end{aligned}$$
(5.1)

where Q and \(\mu \) are the covariance matrix and the mean return vector of n possible assets, respectively, and \(e^T x \le 1\) is the budget constraint, see [12, 20]. We generated the test problems using the data from [24], considering \(s = 5, 10, 20\) for each dimension \(n = 200, 300, 400\), which resulted in 270 test problems, see also [17]. Here, we considered six approaches in total:

  • ALGENCAN without a lower bound on y

  • ALGENCAN with an additional lower bound \(y \ge 0\)

  • Scholtes and Kanzow–Schwartz regularization for cardinality-constrained problems [15, 17] with a regularization of both upper quadrants \(x_i \ge 0, y_i \ge 0\) and \(x_i \le 0, y_i \ge 0\)

  • Scholtes and Kanzow–Schwartz regularization for MPCCs [28, 35] with a regularization of the upper right quadrant \(x_i \ge 0, y_i \ge 0\) only.

As discussed before, introducing a lower bound \(y \ge 0\) in (2.2) is possible without changing the theoretical properties of the reformulation. Similarly, due to the constraint \(x \ge 0\) in (5.1), the feasible set of the reformulated problem actually has the classical MPCC structure, and thus only one regularization function in the first quadrant suffices. This motivates the modifications of both ALGENCAN and the two regularization methods described above, which should theoretically not have any effect on the performance of the solution algorithms.

For each test problem, we used the initial values \(x^0 = 0\) and \(y^0 = e\). As a performance measure for the considered methods, we compared the attained objective function values and generated a performance profile as suggested in [21], where we set the objective function value of a method for a problem to \(\infty \) if the method failed to find a feasible point of the problem within a tolerance of \(10^{-6}\).
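For reference, a simplified version of such a profile can be generated along the following lines (a sketch of ours; it assumes that the best attained value per problem is finite and positive, as one may expect for the objective \(x^T Q x\) of (5.1) on the feasible set):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(F, labels, tau_max=10.0):
    """Simplified performance profile in the sense of [21]: F[i, j] is the
    objective value solver j attained on problem i (np.inf on failure)."""
    best = F.min(axis=1, keepdims=True)
    ratios = F / best                       # performance ratios r_ij >= 1
    taus = np.linspace(1.0, tau_max, 200)
    for j, label in enumerate(labels):
        rho = [(ratios[:, j] <= t).mean() for t in taus]
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel("performance ratio"); plt.ylabel("fraction of problems")
    plt.legend(); plt.show()
```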

Fig. 1 Comparing the performance of ALGENCAN and the regularization methods for (5.1)

As can be seen from Fig. 1, ALGENCAN worked very reliably with regard to feasibility of the solutions. It often outperformed the regularization methods in terms of the objective function value of the computed solution, especially for larger values of s. Although introducing the lower bound \(y \ge 0\) has no theoretical effect on ALGENCAN, the numerical results suggest that it can bring slight improvements to ALGENCAN's performance.

6 Final Remarks

This paper shows that the safeguarded augmented Lagrangian method applied directly and without problem-specific modifications to the continuous reformulation of cardinality-constrained problems converges to suitable (M-, essentially even S-) stationary points under a weak problem-tailored CQ called CCOP-quasinormality. On the other hand, it is known that this safeguarded ALM generates so-called AKKT sequences (AKKT = approximate KKT) which, under suitable constraint qualifications, lead to KKT points and, hence, to S-stationary points. In the context of cardinality constraints, however, the AKKT concept is useless as an optimality criterion since any feasible point is known to be an AKKT point, cf. [32].

On the other hand, there are some recent works which define a problem-tailored AKKT-type condition for cardinality-constrained problems, see [32, 33] (the latter in a more general context). Algorithmic applications of these AKKT-type conditions are not discussed in those papers; we therefore plan to investigate this topic in our future research. Note that a corresponding convergence theory based on AKKT-type conditions for cardinality-constrained problems will differ from our current theory based on CCOP-quasinormality, since it is already known for standard NLPs that quasinormality and AKKT regularity conditions are two independent concepts, cf. [4].