1 Introduction

In recent years, cardinality-constrained optimization problems (CCOP) have received increasing attention due to their far-reaching applications, including portfolio optimization [11, 12, 15] and statistical regression [11, 22]. Unfortunately, these problems are notoriously difficult to solve; even testing feasibility is already NP-complete [11].

A recurrent strategy in mathematics is to cast a difficult problem into a simpler one for which well-established solution techniques already exist. For CCOP, the recent paper [17] was written precisely in this spirit. There, the authors reformulate the problem as a continuous optimization problem with orthogonality-type constraints. This approach parallels the one taken in the context of sparse optimization problems [23]. It should be noted, however, that due to its similarities with mathematical programs with complementarity constraints (MPCC), the proposed reformulation from [17] is, unfortunately, also highly degenerate in the sense that even weak standard constraint qualifications (CQ) such as the Abadie CQ are often violated at points of interest. In addition, sequential optimality conditions like AKKT (approximate KKT) are known to be satisfied at every feasible point of cardinality-constrained problems, see [32], and are therefore also useless for identifying suitable candidates for local minima in this context.

These observations make a direct application of most standard nonlinear programming (NLP) methods to the reformulated problem rather challenging, since such methods typically require a stronger standard CQ to hold at a limit point to ensure stationarity. To overcome these difficulties with CQs, CCOP-tailored CQs were introduced in [17, 19]. Regularization methods, which are standard techniques for attacking MPCCs, were subsequently proposed in [15, 17], where convergence towards a stationary point is proved using these CQs. This is not the path we take here. In this paper, we are interested in the viability of ALGENCAN [2, 3, 13], a well-established and open-source standard NLP solver based on an augmented Lagrangian method (ALM), for solving the reformulated problem directly and without any problem-specific modifications.

ALMs are among the classical solution methods for NLPs. Up to the mid-2000s, however, their popularity was largely overshadowed by other techniques, in particular sequential quadratic programming (SQP) and interior point methods. Since then, beginning with [2, 3], a particular variant of ALMs, which employs the Powell–Hestenes–Rockafellar (PHR) augmented Lagrangian function as well as safeguarded multipliers, has experienced renewed interest. The aforementioned ALGENCAN implements this variant. For NLPs, it has been shown that this variant possesses strong convergence properties even under very mild assumptions [4, 6]. It has since been applied to various other problem classes, including MPCC [7, 26], quasi-variational inequalities [27], generalized Nash equilibrium problems [16, 29], and semidefinite programming [8, 14].

Due to the structure of the reformulated problems, particularly relevant to us is the paper [26], where the authors prove global convergence of the method towards an MPCC-C-stationary point under MPCC-LICQ; see also [5] for a more recent discussion under weakened assumptions. However, even though the problems with orthogonality-type constraints resulting from the reformulation of CCOP can be viewed as MPCC in case nonnegativity constraints are present [19], we would like to stress that the results obtained in our paper are not simple corollaries of [26]. For one, we do not assume the presence of nonnegativity constraints here, making our results applicable in the general setting. Moreover, even in the presence of nonnegativity constraints, it was shown in [19, Remark 5.7 (f)] that MPCC-LICQ, which was used to guarantee convergence to a stationary point in [26], is often violated at points of interest for the reformulated problems. Instead, we therefore employ a CCOP-analogue of the quasinormality CQ [10], which is weaker than the CCOP-CPLD introduced in [17], to prove global convergence of the method.

To this end, we first recall some important properties of the CCOP-reformulation in Sect. 2 and define a CCOP-version of the quasinormality CQ. The ALM algorithm is introduced in Sect. 3, and its convergence properties under said quasinormality CQ are analyzed in Sect. 4. Numerical experiments illustrating the performance of ALGENCAN for the reformulated problem are then presented in Sect. 5. We close with some final remarks in Sect. 6.

Notation: For a given vector \(x \in {\mathbb {R}}^n\), we define the two index sets

$$\begin{aligned} I_\pm (x) := \{ i \in \{1, \dots , n\} \mid x_i \ne 0\} \quad \text {and} \quad I_0(x) := \{ i \in \{1, \dots , n\} \mid x_i = 0\}. \end{aligned}$$

Clearly, the two sets are disjoint and satisfy \(\{1, \dots , n\} = I_\pm (x) \cup I_0(x)\). Given the constraint function \(g\) from problem (2.1) below, we further write \(I_g(x) := \{ i \in \{1, \dots , m\} \mid g_i(x) = 0\}\) for the set of active inequality constraints. For two vectors \(a, b \in {\mathbb {R}}^n\), the terms \(\max \{a,b\}, \min \{a,b\} \in {\mathbb {R}}^n\) denote the componentwise maximum/minimum of these vectors. A frequently used special case hereof is \(a_+ := \max \{a,0\} \in {\mathbb {R}}^n\). We denote the Hadamard product of two vectors \(x, y \in {\mathbb {R}}^n\) with \(x \circ y\), and we define \(e := (1, \dots , 1)^T \in {\mathbb {R}}^n\).
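These index sets and componentwise operations translate directly into NumPy. The following minimal sketch (the tolerance parameter is our own addition for floating-point use; the analysis itself works with exact zeros) illustrates the notation:

```python
import numpy as np

def I_pm(x, tol=0.0):
    """Indices of the (numerically) nonzero components of x."""
    return np.flatnonzero(np.abs(x) > tol)

def I_0(x, tol=0.0):
    """Indices of the (numerically) zero components of x."""
    return np.flatnonzero(np.abs(x) <= tol)

x = np.array([1.5, 0.0, -2.0, 0.0])
print(I_pm(x))           # [0 2]  (0-based indices)
print(I_0(x))            # [1 3]
print(np.maximum(x, 0))  # x_+ = max{x, 0} componentwise
print(x * x)             # Hadamard product x o x
```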

2 Preliminaries

In this paper, we consider cardinality-constrained optimization problems of the form

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x \in {\mathbb {R}}^n} \ f(x) & \text {s.t.} & g(x) \le 0, \quad h(x) = 0, \\ & & \Vert x\Vert _0 \le s, \end{array} \end{aligned}$$
(2.1)

where \(f \in C^1({\mathbb {R}}^n,{\mathbb {R}})\), \(g \in C^1({\mathbb {R}}^n,{\mathbb {R}}^m)\), \(h \in C^1({\mathbb {R}}^n,{\mathbb {R}}^p)\), and \(\Vert x \Vert _0\) denotes the number of nonzero components of the vector x. Occasionally, this problem is also called a sparse optimization problem [36], but sparse optimization typically refers to programs that have a sparsity term within the objective function.

Throughout this paper, we assume \(s < n\), since the cardinality constraint would be redundant otherwise. Following the approach from [17], we introduce an auxiliary variable \(y \in {\mathbb {R}}^n\) and obtain the following relaxed program

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x,y \in {\mathbb {R}}^n} \ f(x) & \text { s.t. } & g(x) \le 0, \quad h(x) = 0, \\ & & n - e^T y \le s, \\ & & y \le e, \\ & & x \circ y = 0. \end{array} \end{aligned}$$
(2.2)

Observe that the relaxed reformulation we use here is slightly different from the one in [17], because we omit the constraint \(y \ge 0\), leading to a larger feasible set. Nonetheless, one can easily see that all results obtained in [17, Section 3] are applicable for (2.2) as well. We shall now gather some of these results, which are relevant for this paper. Their proofs can be found in [17].

Theorem 2.1

Let \({\hat{x}}\in {\mathbb {R}}^n\). Then the following statements hold:

(a)

    \({\hat{x}}\) is feasible for (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2).

(b)

    \({\hat{x}}\) is a global optimizer of (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of (2.2).

(c)

    If \({\hat{x}}\in {\mathbb {R}}^n\) is a local minimizer of (2.1), then there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2). Conversely, if \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2) satisfying \( \Vert {\hat{x}}\Vert _0 = s \), then \( {\hat{x}}\) is a local minimizer of (2.1).

Theorem 2.1 shows that the relaxed problem (2.2) is equivalent to the original problem (2.1) in terms of feasible points and global minima, whereas the equivalence of local minima requires an extra condition (namely that the cardinality constraint be active). Hence the two problems (2.1) and (2.2) may essentially be viewed as equivalent, and it is therefore natural to solve the given cardinality-constrained problem (2.1) via the relaxed program (2.2).
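Since we later solve (2.1) through the relaxed program (2.2), it is useful to be able to test feasibility for (2.2) directly. A minimal NumPy sketch (function name and tolerance are our own choices) could look as follows:

```python
import numpy as np

def feasible_for_relaxed(x, y, g, h, s, tol=1e-8):
    """Check feasibility of (x, y) for the relaxed problem (2.2);
    g and h are callables returning NumPy arrays."""
    n = x.size
    return (np.all(g(x) <= tol)               # g(x) <= 0
            and np.all(np.abs(h(x)) <= tol)   # h(x) = 0
            and n - y.sum() - s <= tol        # n - e^T y <= s
            and np.all(y <= 1.0 + tol)        # y <= e
            and np.all(np.abs(x * y) <= tol)) # x o y = 0
```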

Let us now recall the stationarity concepts introduced in [17].

Definition 2.2

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). Then \(({\hat{x}}, {\hat{y}})\) is called

(a)

    CCOP-M-stationary, if there exist multipliers \(\lambda \in {\mathbb {R}}^m\), \(\mu \in {\mathbb {R}}^p\), and \(\gamma \in {\mathbb {R}}^n\) such that

    • \(0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),

    • \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),

    • \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\).

(b)

    CCOP-S-stationary, if \( ({\hat{x}}, {\hat{y}}) \) is CCOP-M-stationary with \(\gamma _i = 0\) for all \( i \in I_0({\hat{y}}) \).

As remarked in [17], CCOP-S-stationarity corresponds to the KKT condition of (2.2). In contrast, CCOP-M-stationarity does not depend on the auxiliary variables y and is the KKT condition of the following tightened nonlinear program TNLP(\({\hat{x}}\))

$$\begin{aligned} \begin{array}{llll} \displaystyle \min _{x} \ f(x) & \text { s.t. } & g(x) \le 0, & h(x) = 0 , \\ & & x_i = 0 & \forall i \in I_0({\hat{x}}). \end{array} \end{aligned}$$
(2.3)

Observe that every local minimizer \({\hat{x}}\) of (2.1) is also a local minimizer of the corresponding tightened problem (2.3). This justifies the definition of CCOP-M-stationarity. Suppose now that \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) is feasible for (2.2). By the orthogonality constraint, we clearly have \(I_\pm ({\hat{x}}) \subseteq I_0({\hat{y}})\) (with equality if \(\Vert {\hat{x}}\Vert _0 = s\)). Hence, if \(({\hat{x}}, {\hat{y}})\) is a CCOP-S-stationary point, then it is also CCOP-M-stationary. The converse is not true in general, see [17, Example 4].

It was shown in [19] that a CCOP-tailored version of Guignard CQ, which coincides with the standard Guignard CQ for (2.2), is sufficient to guarantee CCOP-S-stationarity of local minima of (2.2). This is a major difference to MPCCs, where one typically needs MPCC-LICQ to guarantee S-stationarity of local minima and has to rely on M-stationarity under weaker MPCC-CQs. Since local minima of (2.2) are CCOP-S-stationary under CCOP-CQs, CCOP-M-stationary points may seem to be undesirable solution candidates. Fortunately, if \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary, one can simply replace \({\hat{y}}\) with another auxiliary variable \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary, as the next proposition shows. Note that the proof of this result is constructive.

Proposition 2.3

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). If \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point, then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary.

Proof

By Theorem 2.1, \({\hat{x}}\) is feasible for (2.1). Now define \({\hat{z}}\in {\mathbb {R}}^n\) such that

$$\begin{aligned} {\hat{z}}_i := \begin{cases} 0 & \text {if } i \in I_\pm ({\hat{x}}), \\ 1 & \text {if } i \in I_0({\hat{x}}). \end{cases} \end{aligned}$$

Then \(({\hat{x}}, {\hat{z}})\) is obviously feasible for (2.2), cf. also the proof of [17, Theorem 3.1]. By assumption, there exist multipliers \((\lambda , \mu , \gamma ) \in {\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n\) certifying that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. Since \(I_\pm ({\hat{x}}) = I_0({\hat{z}})\), we can conclude from Definition 2.2 that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary with the same multipliers \((\lambda , \mu , \gamma )\). \(\square \)

This shows that the difference between S- and M-stationarity is not as big in this setting as it is for MPCCs. More precisely, a feasible point \({\hat{x}}\) of (2.1) is CCOP-M-stationary if and only if there exists \({\hat{z}}\) such that the pair \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary. Consequently, any constraint qualification which guarantees that a local minimum \({\hat{x}}\) of (2.1) is CCOP-M-stationary also yields the existence of a CCOP-S-stationary point \(({\hat{x}}, {\hat{z}})\). Numerically, this implies that any method which merely generates a sequence converging to a CCOP-M-stationary point essentially provides a CCOP-S-stationary point as well.
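Since the proof of Proposition 2.3 is constructive, this repair step is trivial to implement. A short NumPy sketch (names are ours):

```python
import numpy as np

def s_stationary_partner(x_hat):
    """Construct z_hat as in the proof of Proposition 2.3:
    z_i = 0 on I_pm(x_hat) and z_i = 1 on I_0(x_hat)."""
    return np.where(x_hat != 0, 0.0, 1.0)

x_hat = np.array([0.5, 0.0, -1.2, 0.0])
z_hat = s_stationary_partner(x_hat)
print(z_hat)                       # [0. 1. 0. 1.]
print(np.all(x_hat * z_hat == 0))  # orthogonality constraint: True
```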

Utilizing (2.3), CCOP-tailored CQs were introduced in [17]. We shall now follow this approach and introduce a CCOP-tailored quasinormality condition.

Definition 2.4

A point \({\hat{x}}\in {\mathbb {R}}^n\), feasible for (2.1), satisfies the CCOP-quasinormality condition if there exists no \((\lambda , \mu , \gamma ) \in ({\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n) \setminus \{(0,0,0)\}\) such that the following conditions are satisfied:

(a)

    \(0 = \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),

(b)

    \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),

(c)

    \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\),

(d)

    \(\exists \{x^k\} \subseteq {\mathbb {R}}^n\) with \(\{x^k\} \rightarrow {\hat{x}}\) such that, for all \(k \in {\mathbb {N}}\), we have

    • \(\forall i \in \{1, \dots , m\}\) with \(\lambda _i> 0: \ \lambda _i g_i(x^k) > 0\),

    • \(\forall i \in \{1, \dots , p\}\) with \(\mu _i \ne 0: \ \mu _i h_i(x^k) > 0\),

    • \(\forall i \in \{1, \dots , n\}\) with \(\gamma _i \ne 0: \ \gamma _i x_i^k > 0\).

Obviously, CCOP-quasinormality corresponds to the (standard) quasinormality CQ of (2.3). By [1], CCOP-CPLD introduced in [17] thus implies CCOP-quasinormality.

3 An Augmented Lagrangian Method

Let us now describe the algorithm. For a given penalty parameter \(\alpha > 0\) the PHR augmented Lagrangian function for (2.2) is given by

$$\begin{aligned} L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) := f(x) + \alpha \pi ((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) \end{aligned}$$

with \((\lambda , \mu , \zeta , \eta , \gamma ) \in {\mathbb {R}}^m_+ \times {\mathbb {R}}^p \times {\mathbb {R}}_+ \times {\mathbb {R}}^n_+ \times {\mathbb {R}}^n\) and

$$\begin{aligned} \pi ((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) := \frac{1}{2} \left\| \begin{pmatrix} \left( g(x) + \frac{\lambda }{\alpha }\right) _+ \\ h(x) + \frac{\mu }{\alpha } \\ \left( n - e^T y -s + \frac{\zeta }{\alpha }\right) _+ \\ \left( y - e + \frac{\eta }{\alpha }\right) _+ \\ x \circ y + \frac{\gamma }{\alpha } \end{pmatrix} \right\| _2^2, \end{aligned}$$

the shifted quadratic penalty term, cf. [13, Chapter 4]. The algorithm is then stated below.
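For later reference, here is a minimal NumPy sketch of these two functions (the function names, signatures, and the convention that f, g, h are callables are our own choices):

```python
import numpy as np

def phr_penalty(x, y, lam, mu, zeta, eta, gamma, alpha, g, h, s):
    """Shifted quadratic penalty term pi((x, y), lam, mu, zeta, eta, gamma; alpha)
    for the relaxed problem (2.2); g and h are callables, s is the sparsity level."""
    n = x.size
    r = np.concatenate([
        np.maximum(g(x) + lam / alpha, 0.0),                # (g(x) + lam/alpha)_+
        h(x) + mu / alpha,                                  # h(x) + mu/alpha
        np.maximum([n - y.sum() - s + zeta / alpha], 0.0),  # cardinality part
        np.maximum(y - 1.0 + eta / alpha, 0.0),             # (y - e + eta/alpha)_+
        x * y + gamma / alpha,                              # orthogonality part
    ])
    return 0.5 * np.dot(r, r)

def phr_lagrangian(x, y, lam, mu, zeta, eta, gamma, alpha, f, g, h, s):
    """PHR augmented Lagrangian L((x, y), ...; alpha) = f(x) + alpha * pi(...)."""
    return f(x) + alpha * phr_penalty(x, y, lam, mu, zeta, eta, gamma, alpha, g, h, s)
```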

Algorithm 3.1

(Safeguarded Augmented Lagrangian Method)

\((S_0)\):

Initialization: Choose parameters \(\lambda _{\max } > 0\), \(\mu _{\min } < \mu _{\max }\), \(\zeta _{\max } > 0\), \(\eta _{\max } > 0\), \(\gamma _{\min } < \gamma _{\max }\), \(\tau \in (0,1)\), \(\sigma > 1\) and \(\{\epsilon _k\} \subseteq {\mathbb {R}}_+\) such that \(\{\epsilon _k\} \downarrow 0\).

Choose initial values \({\bar{\lambda }}^1 \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^1 \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^1 \in [0, \zeta _{\max }]\), \({\bar{\eta }}^1 \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^1 \in [\gamma _{\min }, \gamma _{\max }]^n\), \(\alpha _1 > 0\), and set \(k \leftarrow 1\).

\((S_1)\):

Update of the iterates: Compute \((x^k, y^k)\) as an approximate solution of

$$\begin{aligned} \displaystyle \min _{(x,y) \in {\mathbb {R}}^{2n}} \ L((x,y), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \end{aligned}$$

satisfying

$$\begin{aligned} \Vert \nabla _{(x,y)} L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \Vert \le \epsilon _k. \end{aligned}$$
(3.1)
\((S_2)\):

Update of the approximate multipliers:

$$\begin{aligned} \begin{array}{ll} \lambda ^k & := (\alpha _k g(x^k) + {\bar{\lambda }}^k)_+ \\ \mu ^k & := \alpha _k h(x^k) + {\bar{\mu }}^k \\ \zeta ^k & := (\alpha _k (n - e^T y^k - s) + {\bar{\zeta }}^k)_+ \\ \eta ^k & := ( \alpha _k (y^k - e) + {\bar{\eta }}^k)_+ \\ \gamma ^k & := \alpha _k x^k \circ y^k + {\bar{\gamma }}^k \end{array} \end{aligned}$$
\((S_3)\):

Update of the penalty parameter: Define

$$\begin{aligned} U^k&:= \min \big \{ -g(x^k), \tfrac{{\bar{\lambda }}^k}{\alpha _k} \big \}, \quad V^k := \min \big \{ -(n - e^T y^k - s), \tfrac{{\bar{\zeta }}^k}{\alpha _k} \big \}, \\ W^k&:= \min \big \{ -(y^k - e), \tfrac{{\bar{\eta }}^k}{\alpha _k} \big \}. \end{aligned}$$

If \(k = 1\) or

$$\begin{aligned} \begin{array}{l} \max \left\{ \Vert U^k\Vert , \; \Vert h(x^k)\Vert , \; \Vert V^k\Vert , \; \Vert W^k\Vert , \; \Vert x^k \circ y^k\Vert \right\} \\ \quad \le \tau \max \left\{ \Vert U^{k - 1}\Vert , \; \Vert h(x^{k - 1})\Vert , \; \Vert V^{k - 1}\Vert , \; \Vert W^{k - 1}\Vert , \; \Vert x^{k - 1} \circ y^{k - 1}\Vert \right\} , \end{array} \end{aligned}$$
(3.2)

set \(\alpha _{k + 1} = \alpha _k\). Otherwise set \(\alpha _{k + 1} = \sigma \alpha _k\).

\((S_4)\):

Update of the safeguarded multipliers: Choose \({\bar{\lambda }}^{k + 1} \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^{k + 1} \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^{k + 1} \in [0, \zeta _{\max }]\), \({\bar{\eta }}^{k + 1} \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^{k + 1} \in [\gamma _{\min }, \gamma _{\max }]^n\).

\((S_5)\):

Set \(k \leftarrow k + 1\) and go to \((S_1)\).

Note that Algorithm 3.1 is exactly the safeguarded augmented Lagrangian method from [13]. The only difference from the classical augmented Lagrangian method, see, e.g., [9, 34], lies in the more careful updating of the Lagrange multipliers: The safeguarded method contains the bounded auxiliary sequences \({{\bar{\lambda }}}^k, {{\bar{\mu }}}^k, \ldots \), which replace the multiplier estimates \(\lambda ^k, \mu ^k, \ldots \) in certain places. Note that these bounded auxiliary sequences are chosen by the user and that there is considerable freedom in their choice. In principle, one can simply take \({{\bar{\lambda }}}^k = 0, {{\bar{\mu }}}^k = 0, \ldots \) for all \( k \in {\mathbb {N}}\), in which case Algorithm 3.1 boils down to the classical quadratic penalty method. A more practical choice is to compute \({{\bar{\lambda }}}^{k+1}, {{\bar{\mu }}}^{k+1}, \ldots \) by projecting the multiplier estimates \(\lambda ^{k}, \mu ^{k}, \ldots \) onto the respective sets \( [0, \lambda _{\max }]^m, [ \mu _{\min }, \mu _{\max }]^p, \ldots \). With this choice and sufficiently large parameters \( \lambda _{\max }, \mu _{\min }, \mu _{\max }, \ldots \), the safeguarded ALM often coincides with the classical ALM. Differences occur, however, in those situations where the classical ALM generates unbounded Lagrange multiplier estimates. This has a significant influence on the (global) convergence theory of both methods: While there is a very satisfactory theory for the safeguarded method, see [13], a counterexample from [30] shows that the corresponding properties do not hold for the classical approach.

We have not specified a termination condition for the algorithm here. However, the convergence analysis in the next section suggests stopping the algorithm, e.g., as soon as the M-stationarity conditions are satisfied up to a given tolerance.
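To make the structure of Algorithm 3.1 concrete, the following condensed Python sketch implements the outer loop, reusing phr_lagrangian from above. It is purely illustrative: the subproblem in \((S_1)\) is handed to SciPy's BFGS rather than to the specialized subsolver inside ALGENCAN, all parameter values are our own illustrative defaults, and the safeguarding in \((S_4)\) uses the projection rule discussed above.

```python
import numpy as np
from scipy.optimize import minimize

def safeguarded_alm(f, g, h, s, n, m, p, x0, y0, tau=0.5, sigma=10.0,
                    alpha=10.0, lam_max=1e8, mu_bnd=1e8, zeta_max=1e8,
                    eta_max=1e8, gamma_bnd=1e8, K=50):
    x, y = x0.copy(), y0.copy()
    lam_b, mu_b = np.zeros(m), np.zeros(p)          # safeguarded multipliers
    zeta_b, eta_b, gamma_b = 0.0, np.zeros(n), np.zeros(n)
    viol_old = np.inf
    for k in range(1, K + 1):
        eps_k = 10.0 ** (-k)                        # {eps_k} -> 0
        # (S_1): approximate minimization of the augmented Lagrangian
        obj = lambda z: phr_lagrangian(z[:n], z[n:], lam_b, mu_b, zeta_b,
                                       eta_b, gamma_b, alpha, f, g, h, s)
        res = minimize(obj, np.concatenate([x, y]), method="BFGS",
                       options={"gtol": eps_k})     # enforces (3.1) approximately
        x, y = res.x[:n], res.x[n:]
        # (S_2): multiplier updates
        lam = np.maximum(alpha * g(x) + lam_b, 0.0)
        mu = alpha * h(x) + mu_b
        zeta = max(alpha * (n - y.sum() - s) + zeta_b, 0.0)
        eta = np.maximum(alpha * (y - 1.0) + eta_b, 0.0)
        gamma = alpha * x * y + gamma_b
        # (S_3): increase penalty unless the violation measure (3.2) decreases
        U = np.minimum(-g(x), lam_b / alpha)
        V = min(-(n - y.sum() - s), zeta_b / alpha)
        W = np.minimum(-(y - 1.0), eta_b / alpha)
        viol = max(np.linalg.norm(U), np.linalg.norm(h(x)), abs(V),
                   np.linalg.norm(W), np.linalg.norm(x * y))
        if viol > tau * viol_old:                   # at k = 1, viol_old = inf
            alpha *= sigma
        viol_old = viol
        # (S_4): safeguarding by projection onto the bounded boxes
        lam_b = np.clip(lam, 0.0, lam_max)
        mu_b = np.clip(mu, -mu_bnd, mu_bnd)
        zeta_b = min(zeta, zeta_max)
        eta_b = np.clip(eta, 0.0, eta_max)
        gamma_b = np.clip(gamma, -gamma_bnd, gamma_bnd)
    return x, y
```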

In the subsequent discussion of the convergence properties of this algorithm, we often make use of the fact that the PHR augmented Lagrangian function is continuously differentiable with the gradient

$$\begin{aligned}&\nabla _x L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) \\&\quad = \nabla f(x) + \alpha \left[ \nabla g(x) \left( g(x) + \tfrac{\lambda }{\alpha }\right) _+ + \nabla h(x) \left( h(x) + \tfrac{\mu }{\alpha }\right) + \left( x \circ y + \tfrac{\gamma }{\alpha }\right) \circ y\right] , \\&\nabla _y L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha )\\&\quad = \alpha \left[ -\left( n-e^Ty-s + \tfrac{\zeta }{\alpha }\right) _+ e + \left( y-e + \tfrac{\eta }{\alpha }\right) _+ + \left( x \circ y + \tfrac{\gamma }{\alpha }\right) \circ x \right] , \end{aligned}$$

where \( \nabla g(x) \) and \( \nabla h(x) \) denote the transposed Jacobian matrices of g and h at x, respectively. Consequently, the multipliers in \((S_2)\) are chosen exactly such that

$$\begin{aligned} \nabla _x L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)&= \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k, \\ \nabla _y L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)&= -\zeta ^k e + \eta ^k + \gamma ^k \circ x^k \end{aligned}$$

holds for all \(k \in {\mathbb {N}}\).
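In NumPy, these gradients can be sketched as follows (names are ours; Jg and Jh return the Jacobians of g and h, i.e. the transposes of \( \nabla g \) and \( \nabla h \) in the notation above). Checking such an implementation against finite differences is a useful sanity test:

```python
import numpy as np

def grad_x_L(x, y, lam, mu, gamma, alpha, grad_f, Jg, Jh, g, h):
    """nabla_x L; Jg(x) and Jh(x) return the m x n and p x n Jacobians."""
    return (grad_f(x)
            + alpha * (Jg(x).T @ np.maximum(g(x) + lam / alpha, 0.0)
                       + Jh(x).T @ (h(x) + mu / alpha)
                       + (x * y + gamma / alpha) * y))

def grad_y_L(x, y, zeta, eta, gamma, alpha, s):
    """nabla_y L for the relaxed problem (2.2)."""
    n = x.size
    return alpha * (-max(n - y.sum() - s + zeta / alpha, 0.0) * np.ones(n)
                    + np.maximum(y - 1.0 + eta / alpha, 0.0)
                    + (x * y + gamma / alpha) * x)
```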

4 Convergence Analysis

The aim of this section is to prove global convergence of Algorithm 3.1 to CCOP-M-stationary points under the fairly mild CCOP-quasinormality condition. To this end, we begin with an auxiliary result, which states that the sequence \(\{ y^k \}\) remains bounded on any subsequence on which \(\{ x^k \}\) is bounded. In particular, if \(\{ x^k \}\) converges on a subsequence, this allows us to extract a limit point of the sequence \(\{(x^k, y^k)\}\).

Proposition 4.1

Let \(\{x^k\} \subseteq {\mathbb {R}}^n\) be a sequence generated by Algorithm 3.1. Assume that \(\{x^k\}\) is bounded on a subsequence. Then the auxiliary sequence \(\{y^k\}\) is bounded on the same subsequence.

Proof

In order to avoid taking further subsequences, let us assume that the entire sequence \(\{ x^k \}\) is bounded. We then show that the whole sequence \(\{ y^k \}\) is bounded as well. Define, for each \(k \in {\mathbb {N}}\),

$$\begin{aligned} B^k := \nabla _y L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) = -\zeta ^k e + \eta ^k + \gamma ^k \circ x^k. \end{aligned}$$
(4.1)

By (3.1), we know that \(\{B^k\} \rightarrow 0\). We first show that the sequence \(\{ y^k \}\) is bounded from above and then verify that it is also bounded from below.

\(\{y^k\}~\textit{is bounded above}\) We claim that there exists a \(c \in {\mathbb {R}}\) such that \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). Suppose, by contradiction, that there is an index \( j \in \{1, \dots , n\} \) and a subsequence \(\{y^{k_l}_j\}\) such that \(\{y^{k_l}_j\} \rightarrow + \infty \). Since \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\) and \({\bar{\eta }}_j^{k_l}\) is bounded by definition, we then obtain

$$\begin{aligned} \big \{ \alpha _{k_l}(y^{k_l}_j - 1) + {\bar{\eta }}_j^{k_l} \big \} \rightarrow + \infty . \end{aligned}$$
(4.2)

This implies \( \eta _j^{k_l} = \alpha _{k_l}(y^{k_l}_j - 1) + {\bar{\eta }}_j^{k_l} \) for all \( l \in {\mathbb {N}}\) sufficiently large and, hence, by (4.2), we have \(\{ \eta _j^{k_l} \} \rightarrow +\infty \). Observe that, for each \(l \in {\mathbb {N}}\) sufficiently large, we have

$$\begin{aligned} \gamma _j^{k_l} x_j^{k_l} = \big ( \alpha _{k_l} x_j^{k_l} y_j^{k_l} + {\bar{\gamma }}_j^{k_l} \big ) x_j^{k_l} = \alpha _{k_l} (x_j^{k_l})^2 y_j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \ge {\bar{\gamma }}_j^{k_l} x_j^{k_l}. \end{aligned}$$

From (4.1), we then obtain for these \(l \in {\mathbb {N}}\) that \( B^{k_l}_j = -\zeta ^{k_l} + \eta _j^{k_l} + \gamma _j^{k_l} x_j^{k_l} \ge -\zeta ^{k_l} + \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \), which is equivalent to \( \zeta ^{k_l} \ge \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} - B^{k_l}_j \). Since \(\{B^{k_l}_j\} \rightarrow 0\) and \(\{{\bar{\gamma }}_j^{k_l} x_j^{k_l}\}\) is bounded, the right-hand side converges to \( +\infty \). Consequently, we have \(\{\zeta ^{k_l}\} \rightarrow +\infty \). The definition of \(\{\zeta ^{k_l}\}\) therefore yields \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) + {\bar{\zeta }}^{k_l} \} \rightarrow +\infty \). Since \(\{{\bar{\zeta }}^{k_l}\}\) is a bounded sequence, we get \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) \} \rightarrow + \infty \). We therefore have

$$\begin{aligned} n - e^T y^{k_l} - s > 0 \quad \forall l \in {\mathbb {N}}\text { sufficiently large.} \end{aligned}$$
(4.3)

We now claim that

$$\begin{aligned} \exists i \in \{1,\dots ,n\} \setminus \{j\}: \ \{y_i^{k_l}\} \text { is unbounded from below.} \end{aligned}$$
(4.4)

Assume that there exists \( d \in {\mathbb {R}}\) such that \( y^{k_l}_i \ge d\) for all \( i \in \{ 1, \ldots , n \} \setminus \{ j \} \) and all \( l \in {\mathbb {N}}\). We then obtain

$$\begin{aligned} n - e^T y^{k_l} - s = n - \displaystyle \sum _{i = 1, i \ne j}^n y_i^{k_l} - y_j^{k_l} - s \le n - (n-1) d - y_j^{k_l} - s \rightarrow - \infty . \end{aligned}$$

We therefore get \( n - e^T y^{k_l} - s < 0 \) for all \( l \in {\mathbb {N}}\) sufficiently large, but this contradicts (4.3), hence (4.4) holds. For this particular index i, we can construct a subsequence \(\{y_i^{k_{l_t}}\}\) such that \(\{y_i^{k_{l_t}}\} \rightarrow -\infty \). Since \(\{{\bar{\eta }}_i^{k_{l_t}}\}\) is bounded, we then have \( \big \{\alpha _{k_{l_t}} (y_i^{k_{l_t}} - 1) + {\bar{\eta }}_i^{k_{l_t}} \big \} \rightarrow -\infty \). This implies \( \eta _i^{k_{l_t}} = 0 \) for all \( t \in {\mathbb {N}}\) sufficiently large. We therefore obtain from (4.1) that

$$\begin{aligned} B_i^{k_{l_t}}&= -\zeta ^{k_{l_t}} + \eta _i^{k_{l_t}} + \gamma _i^{k_{l_t}} x_i^{k_{l_t}} \ = \ -\zeta ^{k_{l_t}} + \gamma _i^{k_{l_t}} x_i^{k_{l_t}} \ = \ -\zeta ^{k_{l_t}} + \big ( \alpha _{k_{l_t}} x_i^{k_{l_t}} y_i^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} \big ) x_i^{k_{l_t}} \\&= -\zeta ^{k_{l_t}} + \alpha _{k_{l_t}} (x_i^{k_{l_t}})^2 y_i^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}} \ \le \ -\zeta ^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}} \end{aligned}$$

for all \( t \in {\mathbb {N}}\) large enough. Since \(\{{\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}}\}\) is a bounded sequence and \(\{\zeta ^{k_l}\} \rightarrow + \infty \), we get \( \{B_i^{k_{l_t}}\} \rightarrow -\infty \), which leads to a contradiction. Thus, \(\{y^k\}\) is bounded above.

\({\{y^k\}~\textit{is bounded below}}\) We claim that there exists a \(d \in {\mathbb {R}}\) such that \(y^k \ge d e\) for all \(k \in {\mathbb {N}}\). Assume, by contradiction, that there is an index \( j \in \{1,\dots ,n\} \) such that \(\{y^{k_l}_j\} \rightarrow -\infty \) on a suitable subsequence. Then, we have \( y_j^{k_l} < 0 \) and \( \eta _j^{k_l} = 0 \) for all \( l \in {\mathbb {N}}\) large enough, and similar to the previous case, it therefore follows that \( B_j^{k_l} \le -\zeta ^{k_l} + {\bar{\gamma }}_j^{k_l}x_j^{k_l} \). This can be rewritten as \( \zeta ^{k_l} \le {\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l} \). Since \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l}\}\) is bounded and \(\{B_j^{k_l}\} \rightarrow 0\), the sequence \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l}\}\) is bounded. This implies, in particular, that \(\{\zeta ^{k_l}\}\) is bounded above, i.e.,

$$\begin{aligned} \exists r \in {\mathbb {R}}\ \forall l \in {\mathbb {N}}: \ \zeta ^{k_l} \le r. \end{aligned}$$
(4.5)

On the other hand, we already know \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). We therefore get

$$\begin{aligned} n - e^T y^{k_l} - s \ge n - (n-1)c - y_j^{k_l} - s \rightarrow + \infty . \end{aligned}$$

This implies

$$\begin{aligned} \left\{ \alpha _{k_l} \left( n - e^T y^{k_l} - s\right) + {\bar{\zeta }}^{k_l} \right\} \rightarrow + \infty \end{aligned}$$

due to the boundedness of the sequence \(\{ {\bar{\zeta }}^{k_l} \}\) and \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). The definition of \(\zeta ^{k_l}\) then yields

$$\begin{aligned} \zeta ^{k_l} = \alpha _{k_l} \left( n - e^T y^{k_l} - s\right) + {\bar{\zeta }}^{k_l} \rightarrow + \infty , \end{aligned}$$

which contradicts (4.5). Hence, \(\{y^k\}\) is bounded below. \(\square \)

As for all penalty-type methods, one has to distinguish two aspects in a corresponding global convergence theory, namely the feasibility issue and an optimality statement. Without further assumptions, feasibility of the limit point cannot be guaranteed (for nonconvex constraints). However, there is a standard result in [13], which shows that the limit point of our stationary sequence is at least a stationary point of the constraint violation. To this end, we measure the infeasibility of a point \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) for (2.2) by using the unshifted quadratic penalty term

$$\begin{aligned} \pi _{0,1}(x,y) := \pi ((x,y), 0, 0, 0, 0, 0; 1). \end{aligned}$$

Clearly, \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2) if and only if \(\pi _{0,1}({\hat{x}},{\hat{y}}) = 0\). Since \(\pi _{0,1}\) is nonnegative, this in turn implies that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of \(\pi _{0,1}\). In particular, we then must have \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).
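In terms of the sketch from Sect. 3, this infeasibility measure is simply the penalty term with zero multipliers and unit penalty parameter (again with our own naming):

```python
import numpy as np

def infeasibility(x, y, g, h, s):
    """Unshifted quadratic penalty pi_{0,1}(x, y); vanishes exactly at the
    feasible points of (2.2). Reuses phr_penalty from Sect. 3."""
    m, p, n = g(x).size, h(x).size, x.size
    return phr_penalty(x, y, np.zeros(m), np.zeros(p), 0.0,
                       np.zeros(n), np.zeros(n), 1.0, g, h, s)
```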

Theorem 4.2

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of the sequence \(\{(x^k, y^k)\}\) generated by Algorithm 3.1. Then \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).

We omit the proof here, since it is identical to [13, Theorem 6.3] and [31, Theorem 6.2]. Instead, we turn to an optimality result for Algorithm 3.1. Suppose that the sequence \(\{x^k\}\) generated by Algorithm 3.1 has a limit point \({\hat{x}}\). Proposition 4.1 then suggests that we can extract a limit point \(({\hat{x}}, {\hat{y}})\) of the sequence \(\{(x^k, y^k)\}\). Under the additional assumptions that \({\hat{x}}\) satisfies CCOP-quasinormality and \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2), we can show that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Theorem 4.3

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Proof

To simplify the notation, we assume, throughout this proof, that the entire sequence \(\{ (x^k, y^k) \}\) converges to \(({\hat{x}}, {\hat{y}})\). For each \(k \in {\mathbb {N}}\), we define

$$\begin{aligned} A^k:= & {} \nabla _x L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k)\\= & {} \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k. \end{aligned}$$

Furthermore, let \(B^k\) be given as in (4.1). By (3.1) and since \(\{\epsilon _k\} \downarrow 0\), we know that \(\{A^k\} \rightarrow 0\) and \(\{B^k\} \rightarrow 0\). Observe that, by \((S_2)\), we have \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\). Furthermore, by \((S_3)\), the sequence of penalty parameters \(\{\alpha _k\}\) satisfies \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). Let us now distinguish two cases.

Case 1 \(\{\alpha _k\}\) is bounded. Then \(\{\alpha _k\}\) is eventually constant, say \( \alpha _k = \alpha _K \) for all \( k \ge K \) with some sufficiently large \( K \in {\mathbb {N}}\). Now, let us take a closer look at \((S_2)\). The boundedness of \(\{\alpha _k\}\) immediately implies that the sequences \(\{\mu ^k\}\) and \(\{\gamma ^k \circ y^k\}\) are bounded. By passing onto subsequences if necessary, we can assume w.l.o.g. that these sequences converge, i.e. \(\{\mu ^k\} \rightarrow {{\hat{\mu }}}\) and \(\{\gamma ^k \circ y^k\} \rightarrow {{\hat{\gamma }}}\). For all \(i \in I_\pm ({\hat{x}})\) the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i = 0\). Since, in this case, we have \(\{y_i^k\} \rightarrow 0\), it follows that

$$\begin{aligned} {\hat{\gamma }}_i = \displaystyle \lim _{k \rightarrow \infty }\gamma _i^k y_i^k = \lim _{k \rightarrow \infty } \alpha _k x_i^k (y_i^k)^2 + \lim _{k \rightarrow \infty } {\bar{\gamma }}_i^k y_i^k = \alpha _K \cdot 0 + \lim _{k \rightarrow \infty } {\bar{\gamma }}_i^k y_i^k = 0 \quad \forall i \in I_\pm ({\hat{x}}). \end{aligned}$$

Next, observe that, for each \(i \in \{1, \dots , m\}\), we have \( 0 \le \lambda _i^k \le |\alpha _k g_i(x^k) + {\bar{\lambda }}_i^k| \) for all \( k \in {\mathbb {N}}\). Thus, \(\{\lambda _i^k\}\) is bounded as well and has a convergent subsequence. Hence, we can assume w.l.o.g. that \(\{\lambda ^k\} \rightarrow {\hat{\lambda }}\) on the whole sequence. Now, the boundedness of \(\{\alpha _k\}\) and \((S_3)\) also imply \(\{ \Vert U^k \Vert \} \rightarrow 0\). Let \(i \notin I_g({\hat{x}})\). Since, by definition, \(\{{\bar{\lambda }}^k \}\) is bounded, \(\left\{ \frac{{\bar{\lambda }}_i^k}{\alpha _k} \right\} \) is bounded as well and therefore has a convergent subsequence. Assume w.l.o.g. that this sequence converges to some limit point \(a_i\). Then

$$\begin{aligned} 0 = \displaystyle \lim _{k \rightarrow \infty } \Vert U_i^k \Vert = \Vert \min \{-g_i({\hat{x}}), a_i\} \Vert \quad \Rightarrow \quad \min \{-g_i({\hat{x}}), a_i\} = 0. \end{aligned}$$

Since \(-g_i({\hat{x}}) > 0\), we get \(a_i = 0\). This implies

$$\begin{aligned} \left\{ g_i(x^k) + \tfrac{{\bar{\lambda }}_i^k}{\alpha _k} \right\} \rightarrow g_i({\hat{x}}) + a_i = g_i({\hat{x}}) < 0. \end{aligned}$$

Thus, by \((S_2)\) we have

$$\begin{aligned} \lambda _i^k = \max \left\{ 0, \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \right\} = 0 \quad \forall k \in {\mathbb {N}}\text { sufficiently large}. \end{aligned}$$
(4.6)

As the limit of \(\{\lambda _i^k\}\), we then also have \({\hat{\lambda }}_i = 0\). Letting \( k \rightarrow \infty \) in the definition of \(A^k\) then yields

$$\begin{aligned} 0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) {\hat{\lambda }} + \nabla h({\hat{x}}) {\hat{\mu }} + {\hat{\gamma }}. \end{aligned}$$

Altogether, it follows that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.

Case 2 \(\{\alpha _k\}\) is unbounded. Then, we have \(\{\alpha _k\} \rightarrow +\infty \). Now define, for each \(k \in {\mathbb {N}}\),

$$\begin{aligned} {\tilde{\gamma }}_i^k := \gamma _i^k y_i^k \quad \forall i \in \{1, \dots , n\}. \end{aligned}$$

We claim that the sequence \(\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \}\) is bounded. By contradiction, assume that \(\{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert \} \rightarrow \infty \), w.l.o.g. on the whole sequence. The corresponding normalized sequence \(\left\{ \frac{\left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \right\} \) is bounded and therefore, again w.l.o.g. on the whole sequence, convergent to a (nontrivial) limit, i.e.

$$\begin{aligned} \left\{ \frac{\left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \right\} \rightarrow \left( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }}\right) \ne 0. \end{aligned}$$

We show that this limit, together with the sequence \(\{x^k\}\), contradicts CCOP-quasinormality in \({\hat{x}}\): Since \( \lambda ^k \ge 0 \) for all k, it follows that \({\tilde{\lambda }} \ge 0\). Now, take an index \(i \notin I_g({\hat{x}})\), i.e. \(g_i({\hat{x}}) < 0\). Since \(\left\{ {\bar{\lambda }}_i^k\right\} \) is bounded, it follows that \( \left\{ \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \right\} \rightarrow - \infty \). This implies \( \lambda _i^k = 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large, hence we get

$$\begin{aligned} {\tilde{\lambda }}_i = \displaystyle \lim _{k \rightarrow \infty }\frac{\lambda _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = 0 \quad \forall i \notin I_g({\hat{x}}). \end{aligned}$$
(4.7)

Next take an index \(i \in I_\pm ({\hat{x}})\). Since \(({\hat{x}}, {\hat{y}})\) is feasible, we then have \({\hat{y}}_i = 0\). The boundedness of \(\{{\bar{\eta }}_i^k\}\) therefore yields \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Consequently, we obtain

$$\begin{aligned} \eta _i^k = 0 \quad \forall i \in I_\pm ({\hat{x}}) \ \forall k \in {\mathbb {N}}\text { sufficiently large}. \end{aligned}$$
(4.8)

Now, we claim that \({\tilde{\gamma }}_i = 0\) holds for such an index i. Suppose not. Then \({\tilde{\gamma }}_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large. Since \({\tilde{\gamma }}_i^k = \gamma _i^k y_i^k\), this implies \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) large enough. We then have

$$\begin{aligned} B^k_i = -\zeta ^k + \eta _i^k + \gamma _i^k x_i^k {\mathop {=}\limits ^{(4.8)}} -\zeta ^k + \gamma _i^k x_i^k = -\zeta ^k + \frac{{\tilde{\gamma }}_i^k}{y_i^k} x_i^k. \end{aligned}$$
(4.9)

Rearranging and dividing (4.9) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) then gives

$$\begin{aligned} \frac{B_i^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \cdot x_i^k \cdot \frac{1}{y_i^k}. \end{aligned}$$
(4.10)

Observe that the left-hand side of (4.10) converges. On the other hand, since

$$\begin{aligned} \left\{ \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } x_i^k \right\} \rightarrow {\tilde{\gamma }}_i {\hat{x}}_i \ne 0 \end{aligned}$$

and \(\{y_i^k\} \rightarrow 0\), the right-hand side diverges. This contradiction shows that

$$\begin{aligned} {\tilde{\gamma }}_i = 0 \quad \forall i \in I_\pm ({\hat{x}}). \end{aligned}$$
(4.11)

Now, we claim that \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\). Suppose not. Then, since \(\left( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }}\right) \ne 0\), it follows that \(\left( {\tilde{\zeta }}, {\tilde{\eta }} \right) \ne 0\). Consider an index \(i \in I_0({\hat{y}})\). Since \(\{y_i^k\} \rightarrow {\hat{y}}_i\) and \(\{{\bar{\eta }}_i^k\}\) is a bounded sequence, we have \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Just like before, we can then assume w.l.o.g. that

$$\begin{aligned} \eta _i^k = 0 \quad \forall i \in I_0({\hat{y}}) \ \forall k \in {\mathbb {N}}\end{aligned}$$
(4.12)

which implies \( {\tilde{\eta }}_i = 0 \). Hence, we have

$$\begin{aligned} \left( {\tilde{\zeta }}, {\tilde{\eta }}_i \ \left( i \in I_\pm ({\hat{y}})\right) \right) \ne 0. \end{aligned}$$
(4.13)

Now let \(i \in I_\pm ({\hat{y}})\). Since \({\hat{y}}_i \ne 0\) and \(\{y_i^k\} \rightarrow {\hat{y}}_i\), we can assume w.l.o.g. that \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\). We then get, for each \(k \in {\mathbb {N}}\), that

$$\begin{aligned} B_i^k = -\zeta ^k + \eta _i^k + \gamma _i^k x_i^k = -\zeta ^k + \eta _i^k + \frac{{\tilde{\gamma }}_i^k}{y_i^k} x_i^k. \end{aligned}$$
(4.14)

Rearranging and dividing (4.14) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) yields

$$\begin{aligned} \frac{B_i^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \frac{\eta _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \cdot x_i^k \cdot \frac{1}{y_i^k}. \end{aligned}$$
(4.15)

By assumption, \({\tilde{\gamma }}_i = 0\). Consequently, letting \(k \rightarrow \infty \) in (4.15) yields

$$\begin{aligned} {\tilde{\zeta }} = {\tilde{\eta }}_i + 0 \cdot {\hat{x}}_i \cdot \frac{1}{{\hat{y}}_i} = {\tilde{\eta }}_i. \end{aligned}$$
(4.16)

From (4.13) we then obtain \( {\tilde{\zeta }} \ne 0 \) and \( {\tilde{\eta }}_i = {\tilde{\zeta }} \ne 0 \) for all \( i \in I_\pm ({\hat{y}}) \). Since \(\zeta ^k \ge 0 \) for all \( k \in {\mathbb {N}}\), we have \({\tilde{\zeta }} \ge 0\) and, therefore, \({\tilde{\zeta }} > 0\). Hence, we can assume w.l.o.g. that \(\zeta ^k > 0 \) for all \( k \in {\mathbb {N}}\). This implies \( \zeta ^k = \alpha _k \left( n - e^T y^k - s \right) + {\bar{\zeta }}^k \). We then have

$$\begin{aligned} 0 < {\tilde{\zeta }}&= \displaystyle \lim _{k \rightarrow \infty } \frac{\zeta ^k}{\left\| \left( \lambda ^k, \mu ^k,{\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( n - e^T y^k - s \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\zeta }}^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( n - e^T y^k - s \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }, \end{aligned}$$

since \(\{{\bar{\zeta }}^k\}\) is bounded by definition. Consequently, we can assume w.l.o.g. that

$$\begin{aligned} n - e^T y^k - s > 0 \quad \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.17)

By assumption, \(({\hat{x}}, {\hat{y}})\) is feasible and, hence, \(n - e^T {\hat{y}}- s \le 0\). Thus, we obtain from (4.17) that \(n - e^T y^k -s > n - e^T {\hat{y}}- s\) and, therefore,

$$\begin{aligned} e^T {\hat{y}}> e^T y^k \quad \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.18)

Furthermore, since \({\tilde{\zeta }} > 0\), by (4.16), we also have that \({\tilde{\eta }}_i > 0 \) for all \( i \in I_\pm ({\hat{y}})\). This implies \(\eta _i^k > 0 \) for all sufficiently large \( k \in {\mathbb {N}}\). Consequently, we have \( \eta _i^k = \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \) for all \(k \in {\mathbb {N}}\) large enough. We then obtain

$$\begin{aligned} 0 < {\tilde{\eta }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\eta _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( y_i^k - 1 \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\eta }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k \left( y_i^k - 1 \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }, \end{aligned}$$

since \(\{{\bar{\eta }}_i^k\}\) is bounded by definition. Hence, we can assume w.l.o.g. that \( y_i^k > 1 \) for all \( k \in {\mathbb {N}}\). On the other hand, the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i \le 1 \) for all \( i \in \{1, \dots , n\} \). Consequently, we obtain

$$\begin{aligned} {\hat{y}}_i < y_i^k \quad \forall i \in I_\pm ({\hat{y}}) \ \forall k \in {\mathbb {N}}. \end{aligned}$$
(4.19)

Together, this implies

$$\begin{aligned}&\sum _{i \in I_\pm ({\hat{y}})} {\hat{y}}_i = e^T{{\hat{y}}} > e^T y^k = \sum _{i \in I_\pm ({\hat{y}})} y^k_i + \sum _{i \in I_0({\hat{y}})} y^k_i \ge \sum _{i \in I_\pm ({\hat{y}})} {\hat{y}}_i + \sum _{i \in I_0({\hat{y}})} y^k_i \quad \\&\quad \Longrightarrow \quad \sum _{i \in I_0({\hat{y}})} y^k_i < 0 \end{aligned}$$

for all \(k \in {\mathbb {N}}\). By passing to a subsequence, we can therefore assume w.l.o.g. that there exists a \(j \in I_0({\hat{y}})\) with \(y^k_j < 0\) for all \(k \in {\mathbb {N}}\). Since \(j \in I_0({\hat{y}})\), by (4.12), we have \(\eta _j^k = 0 \) for all \( k \in {\mathbb {N}}\) and, hence, \( B_j^k = -\zeta ^k + \gamma _j^k x_j^k \) or, equivalently, \( B_j^k + \zeta ^k = \gamma _j^k x_j^k \). Since \(y_j^k \le 0\), we then have

$$\begin{aligned} \gamma _j^k x_j^k = \left( \alpha _k x_j^k y_j^k + {\bar{\gamma }}_j^k \right) x_j^k = \alpha _k (x_j^k)^2 y_j^k + {\bar{\gamma }}_j^k x_j^k \le {\bar{\gamma }}_j^k x_j^k. \end{aligned}$$

Consequently, we have \( B_j^k + \zeta ^k \le {\bar{\gamma }}_j^k x_j^k \) and, therefore,

$$\begin{aligned} \frac{B_j^k + \zeta ^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \le \frac{{\bar{\gamma }}_j^k x_j^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Since \(\{ {\bar{\gamma }}_j^k x_j^k \}\) is bounded, letting \(k \rightarrow \infty \) then yields the contradiction \( 0 < {\tilde{\zeta }} \le 0 \). Hence we have \( ({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0 \).

Dividing \(A^k\) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) and letting \(k \rightarrow \infty \) then yields

$$\begin{aligned} 0 = \displaystyle \sum _{i = 1}^m {\tilde{\lambda }}_i \nabla g_i({\hat{x}}) + \sum _{i = 1}^p {\tilde{\mu }}_i \nabla h_i({\hat{x}}) + \sum _{i = 1}^n {\tilde{\gamma }}_i e_i \end{aligned}$$

where \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\) and, in view of (4.7) and (4.11), \( {\tilde{\lambda }} \in {\mathbb {R}}^m_+ \), \( {\tilde{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \), and \( {\tilde{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). This shows that \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }})\) satisfies conditions (a)–(c) from Definition 2.4. We now verify that the three conditions from part (d) hold as well.

For this purpose, let \(i \in \{1, \dots , m\}\) such that \({\tilde{\lambda }}_i > 0\) holds. Then, we can assume w.l.o.g. that \(\lambda _i^k > 0 \) for all \( k \in {\mathbb {N}}\) and, thus, \(\lambda _i^k = \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k\). Consequently, we have

$$\begin{aligned} 0 < {\tilde{\lambda }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\lambda _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k g_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\lambda }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k g_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \end{aligned}$$

by the boundedness of \(\{{\bar{\lambda }}_i^k\}\). Thus, we have \( g_i(x^k) > 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large and, therefore, also \( {\tilde{\lambda }}_i g_i(x^k) > 0 \) for all these \( k \in {\mathbb {N}}\).

Next consider an index \(i \in \{1, \dots , p\}\) such that \({\tilde{\mu }}_i \ne 0\). The boundedness of \(\{{\bar{\mu }}_i^k\}\) then implies

$$\begin{aligned} {\tilde{\mu }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{\mu _i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \lim _{k \rightarrow \infty } \frac{\alpha _k h_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&\quad + \lim _{k \rightarrow \infty } \frac{{\bar{\mu }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k h_i(x^k)}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Since \(\alpha _k > 0\), this implies that \({\tilde{\mu }}_i\) and \(h_i(x^k)\) have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\mu }}_i h_i(x^k) > 0 \) for all these k.

Finally, consider an index \(i \in \{1, \dots , n\}\) such that \({\tilde{\gamma }}_i \ne 0\). The boundedness of \(\{{\bar{\gamma }}_i^k\}\) yields

$$\begin{aligned} {\tilde{\gamma }}_i&= \displaystyle \lim _{k \rightarrow \infty } \frac{{\tilde{\gamma }}_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } = \lim _{k \rightarrow \infty } \frac{\gamma _i^k y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{(\alpha _k x_i^k y_i^k + {\bar{\gamma }}_i^k) y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k x_i^k (y_i^k)^2 }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } + \lim _{k \rightarrow \infty } \frac{{\bar{\gamma }}_i^k y_i^k}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \\&= \lim _{k \rightarrow \infty } \frac{\alpha _k x_i^k (y_i^k)^2}{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| }. \end{aligned}$$

Hence, \({\tilde{\gamma }}_i\) and \(x_i^k\) also have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\gamma }}_i x_i^k > 0 \).

Altogether, this contradicts the assumed CCOP-quasinormality of \({\hat{x}}\). Thus, \(\left\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k )\right\} \) is bounded and therefore has a convergent subsequence. Assume w.l.o.g. that the whole sequence converges, i.e.,

$$\begin{aligned} \exists \big ( {\hat{\lambda }}, {\hat{\mu }}, {\hat{\gamma }}, {\hat{\zeta }}, {\hat{\eta }} \big ): \ \big \{\big ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \big )\big \} \rightarrow \big ( {\hat{\lambda }}, {\hat{\mu }}, {\hat{\gamma }}, {\hat{\zeta }}, {\hat{\eta }} \big ). \end{aligned}$$

Since \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\), we also have \({\hat{\lambda }} \in {\mathbb {R}}^m_+\). Consider an index \(i \notin I_g({\hat{x}})\). Then, just like for \({\tilde{\lambda }}_i\), one can show that \({\hat{\lambda }}_i = 0\). Similarly, for \(i \in I_\pm ({\hat{x}})\), following the argument for \({\tilde{\gamma }}_i\), one also gets \({\hat{\gamma }}_i = 0\). Taking \( k \rightarrow \infty \) in the definition of \(A^k\), we then obtain

$$\begin{aligned} 0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) {\hat{\lambda }} + \nabla h({\hat{x}}) {\hat{\mu }} + {\hat{\gamma }}, \end{aligned}$$

where \( {\hat{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \) and \( {\hat{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). Thus, we conclude that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. \(\square \)

It is known from [6, Corollary 4.2] that accumulation points \(({{\hat{x}}}, {{\hat{y}}})\) of Algorithm 3.1, where standard quasinormality holds, are KKT points and thus CCOP-S-stationary. To compare this result with Theorem 4.3, first note that CCOP-quasinormality only depends on \({{\hat{x}}}\), whereas standard quasinormality for (2.2) depends on the whole pair \(({{\hat{x}}}, {{\hat{y}}})\). In case \(\{i \mid {{\hat{y}}}_i \ne 0\} = I_0({{\hat{x}}})\), standard quasinormality in \(({{\hat{x}}}, {{\hat{y}}})\) is equivalent to CCOP-quasinormality in \({{\hat{x}}}\), and CCOP-S- and CCOP-M-stationarity coincide. Thus, in this situation, the statement of Theorem 4.3 can also be derived via [6, Corollary 4.2]. However, in case \(\{i \mid {{\hat{y}}}_i \ne 0\} \subsetneq I_0({{\hat{x}}})\), standard quasinormality is always violated in \(({{\hat{x}}}, {{\hat{y}}})\), and thus [6, Corollary 4.2] cannot be applied. In the latter situation, in general, we can only guarantee CCOP-M-stationarity of the limit \(({\hat{x}}, {\hat{y}})\). But, using Proposition 2.3, it is still possible to ensure CCOP-S-stationarity of a potentially modified point \(({\hat{x}}, {\hat{z}})\).

Corollary 4.4

Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is a CCOP-S-stationary point.

5 Numerical Results

In this section, we compare the performance of ALGENCAN with the Scholtes regularization method from [15] as well as the Kanzow–Schwartz regularization method from [17]. All experiments were conducted in Python together with the NumPy library. We used ALGENCAN 2.4.0, compiled with the MA57 library [25] and called through its Python interface with user-supplied gradient of the objective function, sparse Jacobian of the constraints, and sparse Hessian of the Lagrangian. As a subsolver for the two regularization methods, we used the SQP solver WORHP version 1.14 [18], which is freely available for academic use, called through its Python interface. For the Scholtes regularization method, WORHP was called with user-supplied sparse gradient of the objective function, sparse Jacobian of the constraints, and sparse Hessian of the Lagrangian. For the Kanzow–Schwartz regularization method, in contrast, the corresponding NCP-function is not twice differentiable, so no analytical Hessian exists; we therefore called WORHP with user-supplied sparse gradient of the objective function and sparse Jacobian of the constraints only, and the Hessian of the Lagrangian was approximated using the BFGS method. Throughout the experiments, both ALGENCAN and WORHP were called with their respective default settings.

We applied ALGENCAN directly to the relaxed reformulation (2.2) of the test problems, i.e. without a lower bound on the auxiliary variable y. In contrast, following [15, 17], for both regularization methods we bounded y from below by 0. For each test problem, we started both regularization methods with an initial regularization parameter \(t_0 = 1.0\) and decreased \(t_k\) in each iteration by a factor of 0.01. The regularization methods were terminated as soon as either \(t_k < 10^{-8}\) or \(\left\| x^k \circ y^k \right\| _\infty \le 10^{-6}\).
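In code, the outer loop shared by both regularization methods thus takes the following simple form, where solve_regularized is a placeholder for the WORHP call on the regularized NLP with parameter t, warm-started at the previous solution:

```python
import numpy as np

def regularization_driver(solve_regularized, x0, y0, t0=1.0, factor=0.01,
                          t_min=1e-8, comp_tol=1e-6):
    """Outer loop of the regularization methods; the inner solver is a
    placeholder (WORHP in our experiments)."""
    x, y, t = x0, y0, t0
    while True:
        x, y = solve_regularized(x, y, t)  # solve the regularized NLP for fixed t
        t *= factor
        if t < t_min or np.max(np.abs(x * y)) <= comp_tol:
            return x, y
```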

5.1 Pilot Test

Let us begin by considering the following academic example

$$\begin{aligned} \displaystyle \min _{x \in {\mathbb {R}}^2} x_1 + 10 x_2 \quad \text {s.t.}\quad \left( x_1 - \tfrac{1}{2}\right) ^2 + \left( x_2 - 1\right) ^2 \le 1, \ \Vert x\Vert _0 \le 1 \end{aligned}$$

which is taken from [17]. This problem has a local minimizer at \(\left( 0, 1 - \frac{1}{2}\sqrt{3}\right) \) and an isolated global minimizer at \(\left( \frac{1}{2}, 0\right) \). Following [17], we discretised the rectangle \(\left[ -1, \frac{3}{2}\right] \times \left[ -\frac{1}{2},2\right] \), resulting in 441 starting points for the considered methods. For each of these starting points, ALGENCAN converged to the global minimizer \(\left( \frac{1}{2}, 0\right) \). The same behaviour was observed for the Scholtes regularization method. The Kanzow–Schwartz regularization method, on the other hand, was slightly less successful, converging to the global minimizer in 437 cases; in the remaining 4 cases, it converged to the local minimizer. This behaviour might be due to the BFGS approximation of the Hessian of the Lagrangian used by WORHP. Indeed, running the Scholtes regularization method without a user-supplied Hessian of the Lagrangian, letting WORHP approximate the Hessian by BFGS instead, yielded convergence to the global minimizer in only 394 cases; in the other 47 cases, the Scholtes regularization method only found the local minimizer.
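The two minimizers are easy to verify by brute force: with \(n = 2\) and \(s = 1\), one can simply minimize over each one-sparse support pattern separately, e.g. on a fine grid (a throwaway sketch of ours, not part of the actual experiments):

```python
import numpy as np

f = lambda x1, x2: x1 + 10.0 * x2
ball = lambda x1, x2: (x1 - 0.5) ** 2 + (x2 - 1.0) ** 2 - 1.0

# x = 0 is infeasible (ball(0, 0) = 0.25 > 0), so it suffices to minimize
# over the two one-sparse support patterns, here on a fine grid.
t = np.linspace(-1.0, 2.0, 300001)
for label, x1, x2 in [("support {1}, x2 = 0", t, np.zeros_like(t)),
                      ("support {2}, x1 = 0", np.zeros_like(t), t)]:
    vals = np.where(ball(x1, x2) <= 1e-9, f(x1, x2), np.inf)
    i = np.argmin(vals)
    print(label, "-> minimizer ~", (x1[i], x2[i]), "value ~", vals[i])
# support {1} yields ~ (0.5, 0) with value 0.5 (the global minimizer),
# support {2} yields ~ (0, 1 - sqrt(3)/2) with value ~ 1.34
```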

5.2 Portfolio Optimization Problems

Following [17], we consider a classical portfolio optimization problem

$$\begin{aligned} \begin{array}{lll} \displaystyle \min _{x \in {\mathbb {R}}^n} \ x^T Q x & \text {s.t.} & \mu ^T x \ge \rho , \; e^T x \le 1, \; 0 \le x \le u, \\ & & \left\| x\right\| _0 \le s, \end{array} \end{aligned}$$
(5.1)

where Q and \(\mu \) are the covariance matrix and the mean return vector of n possible assets, respectively, and \(e^T x \le 1\) is the budget constraint, see [12, 20]. We generated the test problems using the data from [24], considering \(s = 5, 10, 20\) for each dimension \(n = 200, 300, 400\), which resulted in 270 test problems, see also [17]. Here, we considered six approaches in total:

  • ALGENCAN without a lower bound on y

  • ALGENCAN with an additional lower bound \(y \ge 0\)

  • Scholtes and Kanzow–Schwartz regularization for cardinality-constrained problems [15, 17] with a regularization of both upper quadrants \(x_i \ge 0, y_i \ge 0\) and \(x_i \le 0, y_i \ge 0\)

  • Scholtes and Kanzow–Schwartz regularization for MPCCs [28, 35] with a regularization of the upper right quadrant \(x_i \ge 0, y_i \ge 0\) only.

As discussed before, introducing a lower bound \(y \ge 0\) in (2.2) is possible without changing the theoretical properties of the reformulation. Similarly, due to the constraint \(x \ge 0\) in (5.1), the feasible set of the reformulated problem actually has the classical MPCC structure, and thus only one regularization function in the first quadrant suffices. This motivates the modifications of both ALGENCAN and the two regularization methods described above, which should theoretically not have any effect on the performance of the solution algorithms.

For each test problem, we used the initial values \(x^0 = 0\) and \(y^0 = e\). As a performance measure for the considered methods, we compared the attained objective function values and generated a performance profile as suggested in [21], where we set the objective function value of a method for a problem to \(\infty \) if the method failed to find a feasible point of the problem within a tolerance of \(10^{-6}\).
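For reference, a simplified version of such a profile can be generated along the following lines (a sketch of ours; it assumes that the best attained value per problem is finite and positive, as one may expect for the objective \(x^T Q x\) of (5.1) on the feasible set):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(F, labels, tau_max=10.0):
    """Simplified performance profile in the sense of [21]: F[i, j] is the
    objective value solver j attained on problem i (np.inf on failure)."""
    best = F.min(axis=1, keepdims=True)
    ratios = F / best                       # performance ratios r_ij >= 1
    taus = np.linspace(1.0, tau_max, 200)
    for j, label in enumerate(labels):
        rho = [(ratios[:, j] <= t).mean() for t in taus]
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel("performance ratio"); plt.ylabel("fraction of problems")
    plt.legend(); plt.show()
```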

Fig. 1 Comparing the performance of ALGENCAN and the regularization methods for (5.1)

As can be seen from Fig. 1, ALGENCAN worked very reliably with regard to feasibility of the solutions. It often outperformed the regularization methods in terms of the objective function value of the computed solution, especially for larger values of s. Although introducing the lower bound \(y \ge 0\) has no theoretical effect on ALGENCAN, the numerical results suggest that it can bring slight improvements to ALGENCAN's performance.

6 Final Remarks

This paper shows that the safeguarded augmented Lagrangian method applied directly and without problem-specific modifications to the continuous reformulation of cardinality-constrained problems converges to suitable (M-, essentially even S-) stationary points under a weak problem-tailored CQ called CCOP-quasinormality. On the other hand, it is known that this safeguarded ALM generates so-called AKKT sequences (AKKT = approximate KKT) which, under suitable constraint qualifications, lead to KKT points and, hence, to S-stationary points. In the context of cardinality constraints, however, the AKKT concept is useless as an optimality criterion since any feasible point is known to be an AKKT point, cf. [32].

On the other hand, there are some recent works which define a problem-tailored AKKT-type condition for cardinality-constrained problems, see [32, 33] (the latter in a more general context). Algorithmic applications of these AKKT-type conditions are not discussed in those papers; we therefore plan to investigate this topic in our future research. Note that a corresponding convergence theory based on AKKT-type conditions for cardinality-constrained problems will differ from our current theory based on CCOP-quasinormality, since it is already known for standard NLPs that quasinormality and AKKT regularity conditions are two independent concepts, cf. [4].