Abstract
A reformulation of cardinality-constrained optimization problems into continuous nonlinear optimization problems with an orthogonality-type constraint has gained some popularity during the last few years. Due to the special structure of the constraints, the reformulation violates many standard assumptions and therefore is often solved using specialized algorithms. In contrast to this, we investigate the viability of using a standard safeguarded multiplier penalty method without any problem-tailored modifications to solve the reformulated problem. We prove global convergence towards an (essentially strongly) stationary point under a suitable problem-tailored quasinormality constraint qualification. Numerical experiments illustrating the performance of the method in comparison to regularization-based approaches are provided.
1 Introduction
In recent years, cardinality-constrained optimization problems (CCOP) have received an increasing amount of attention due to their far-reaching applications, including portfolio optimization [11, 12, 15] and statistical regression [11, 22]. Unfortunately, these problems are notoriously difficult to solve; even testing feasibility is NP-complete [11].
A recurrent strategy in mathematics is to recast a difficult problem as a simpler one for which well-established solution techniques already exist. For CCOP, the recent paper [17] was written precisely in this spirit. There, the authors reformulate the problem as a continuous optimization problem with orthogonality-type constraints. This approach parallels the one taken in the context of sparse optimization problems [23]. It should be noted, however, that due to its similarities with mathematical programs with complementarity constraints (MPCC), the proposed reformulation from [17] is, unfortunately, also highly degenerate in the sense that even weak standard constraint qualifications (CQ) such as Abadie CQ are often violated at points of interest. In addition, sequential optimality conditions like AKKT (approximate KKT) are known to be satisfied at any feasible point of cardinality-constrained problems, see [32], and are therefore also useless for identifying suitable candidates for local minima in this context.
These observations make a direct application of most standard nonlinear programming (NLP) methods to the reformulated problem rather challenging, since they typically require the fulfillment of a stronger standard CQ at a limit point to ensure stationarity. To overcome these difficulties, CCOP-tailored CQs were introduced in [17, 19]. Regularization methods, which are standard techniques for attacking MPCC, were subsequently proposed in [15, 17], where convergence towards a stationary point is proved using these CQs. This is not the path that we shall walk here. In this paper, we are interested in the viability of ALGENCAN [2, 3, 13], a well-established and open-source standard NLP solver based on an augmented Lagrangian method (ALM), to solve the reformulated problem directly without any problem-specific modifications.
ALMs are among the classical solution methods for NLPs. However, up to the mid-2000s, their popularity was largely overshadowed by other techniques, in particular sequential quadratic programming (SQP) and interior point methods. Since then, beginning with [2, 3], a particular variant of ALMs, which employs the Powell–Hestenes–Rockafellar (PHR) augmented Lagrangian function as well as safeguarded multipliers, has been experiencing rejuvenated interest. The aforementioned ALGENCAN implements this variant. For NLPs it has been shown that this variant possesses strong convergence properties even under very mild assumptions [4, 6]. It has since been applied to solve various other problems, including MPCC [7, 26], quasi-variational inequalities [27], generalized Nash equilibrium problems [16, 29], and semidefinite programming [8, 14].
Due to the structure of the reformulated problems, particularly relevant to us is the paper [26], where the authors prove global convergence of the method towards an MPCC-C-stationary point under MPCC-LICQ; see also [5] for a more recent discussion under weakened assumptions. However, even though the problems with orthogonality-type constraints resulting from the reformulation of CCOP can be viewed as MPCC in case nonnegativity constraints are present [19], we would like to stress that the results obtained in our paper are not simple corollaries of [26]. For one, we do not assume the presence of nonnegativity constraints here, making our results applicable in the general setting. Moreover, even in the presence of nonnegativity constraints, it was shown in [19, Remark 5.7 (f)] that MPCC-LICQ, which was used to guarantee convergence to a stationary point in [26], is often violated at points of interest for the reformulated problems. Instead, we therefore employ a CCOP-analogue of the quasinormality CQ [10], which is weaker than the CCOP-CPLD introduced in [17], to prove the global convergence of the method.
To this end, we first recall some important properties of the CCOP-reformulation in Sect. 2 and define a CCOP-version of the quasinormality CQ. The ALM algorithm is introduced in Sect. 3, and its convergence properties under said quasinormality CQ are analyzed in Sect. 4. Numerical experiments illustrating the performance of ALGENCAN for the reformulated problem are then presented in Sect. 5. We close with some final remarks in Sect. 6.
Notation: For a given vector \(x \in {\mathbb {R}}^n\), we define the two index sets
$$\begin{aligned} I_\pm (x) := \{ i \in \{1, \dots , n\} : x_i \ne 0 \} \quad \text {and} \quad I_0(x) := \{ i \in \{1, \dots , n\} : x_i = 0 \}. \end{aligned}$$
Clearly, the two sets are disjoint and we have \(\{1, \dots , n\} = I_\pm (x) \cup I_0(x)\). For two vectors \(a, b \in {\mathbb {R}}^n\), the terms \(\max \{a,b\}, \min \{a,b\} \in {\mathbb {R}}^n\) denote the componentwise maximum/minimum of these vectors. A frequently used special case hereof is \(a_+ := \max \{a,0\} \in {\mathbb {R}}^n\). We denote the Hadamard product of two vectors \(x, y \in {\mathbb {R}}^n\) with \(x \circ y\), and we define \(e := (1, \dots , 1)^T \in {\mathbb {R}}^n\).
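These notational conventions are straightforward to mirror in code. The following purely illustrative Python snippet (all function and variable names are ours, not from the paper) computes the index sets and the componentwise operations:

```python
import numpy as np

def index_sets(x):
    """Index sets from the notation above: I_pm(x) collects the nonzero
    components of x, I_0(x) the zero components."""
    nonzero = x != 0
    return np.flatnonzero(nonzero), np.flatnonzero(~nonzero)

x = np.array([1.5, 0.0, -2.0, 0.0])
I_pm, I_0 = index_sets(x)
print(I_pm.tolist(), I_0.tolist())        # [0, 2] [1, 3]
print(np.maximum(x, 0.0).tolist())        # x_+ = [1.5, 0.0, 0.0, 0.0]

# Hadamard product x ∘ y; choosing y with y_i = 1 exactly on I_0(x) gives x ∘ y = 0
y = np.zeros_like(x); y[I_0] = 1.0
print(bool(np.all(x * y == 0)))           # True
```

This orthogonality of x and a 0/1-vector y supported on the zero set of x is exactly the mechanism the reformulation below exploits.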
2 Preliminaries
In this paper, we consider cardinality-constrained optimization problems of the form
$$\begin{aligned} \min _{x \in {\mathbb {R}}^n} \ f(x) \quad \text {s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad \Vert x \Vert _0 \le s, \end{aligned}$$(2.1)
where \(f \in C^1({\mathbb {R}}^n,{\mathbb {R}})\), \(g \in C^1({\mathbb {R}}^n,{\mathbb {R}}^m)\), \(h \in C^1({\mathbb {R}}^n,{\mathbb {R}}^p)\), and \(\Vert x \Vert _0\) denotes the number of nonzero components of a vector x. Occasionally, this problem is also called a sparse optimization problem [36], but sparse optimization typically refers to programs that have a sparsity term within the objective function.
Throughout this paper, we assume \(s < n\), since the cardinality constraint would be redundant otherwise. Following the approach from [17], by introducing an auxiliary variable \(y \in {\mathbb {R}}^n\), we obtain the following relaxed program
$$\begin{aligned} \min _{(x,y) \in {\mathbb {R}}^{2n}} \ f(x) \quad \text {s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad n - e^T y - s \le 0, \quad y \le e, \quad x \circ y = 0. \end{aligned}$$(2.2)
Observe that the relaxed reformulation we use here is slightly different from the one in [17], because we omit the constraint \(y \ge 0\), leading to a larger feasible set. Nonetheless, one can easily see that all results obtained in [17, Section 3] are applicable for (2.2) as well. We shall now gather some of these results, which are relevant for this paper. Their proofs can be found in [17].
Theorem 2.1
Let \({\hat{x}}\in {\mathbb {R}}^n\). Then the following statements hold:
- (a) \({\hat{x}}\) is feasible for (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2).
- (b) \({\hat{x}}\) is a global minimizer of (2.1) if and only if there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of (2.2).
- (c) If \({\hat{x}}\in {\mathbb {R}}^n\) is a local minimizer of (2.1), then there exists \({\hat{y}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2). Conversely, if \(({\hat{x}}, {\hat{y}})\) is a local minimizer of (2.2) satisfying \( \Vert {\hat{x}}\Vert _0 = s \), then \( {\hat{x}}\) is a local minimizer of (2.1).
Theorem 2.1 shows that the relaxed problem (2.2) is equivalent to the original problem (2.1) in terms of feasible points and global minima, whereas the equivalence of local minima requires some extra condition (namely the cardinality constraint to be active). Hence, essentially, the two problems (2.1) and (2.2) may be viewed as being equivalent, and it is therefore natural to solve the given cardinality problem (2.1) via the relaxed program (2.2).
Let us now recall the stationarity concepts introduced in [17].
Definition 2.2
Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). Then \(({\hat{x}}, {\hat{y}})\) is called
- (a) CCOP-M-stationary, if there exist multipliers \(\lambda \in {\mathbb {R}}^m\), \(\mu \in {\mathbb {R}}^p\), and \(\gamma \in {\mathbb {R}}^n\) such that
  - \(0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),
  - \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),
  - \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\);
- (b) CCOP-S-stationary, if \( ({\hat{x}}, {\hat{y}}) \) is CCOP-M-stationary with \(\gamma _i = 0\) for all \( i \in I_0({\hat{y}}) \).
As remarked in [17], CCOP-S-stationarity corresponds to the KKT condition of (2.2). In contrast, CCOP-M-stationarity does not depend on the auxiliary variable y and is the KKT condition of the following tightened nonlinear program TNLP(\({\hat{x}}\))
$$\begin{aligned} \min _{x \in {\mathbb {R}}^n} \ f(x) \quad \text {s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad x_i = 0 \ (i \in I_0({\hat{x}})). \end{aligned}$$(2.3)
Observe that every local minimizer of (2.1) is also a local minimizer of (2.3). This justifies the definition of CCOP-M-stationarity. Suppose now that \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) is feasible for (2.2). By the orthogonality constraint, we clearly have \(I_\pm ({\hat{x}}) \subseteq I_0({\hat{y}})\) (with equality if \(\Vert {\hat{x}}\Vert _0 = s\)). Hence, if \(({\hat{x}}, {\hat{y}})\) is a CCOP-S-stationary point, then it is also CCOP-M-stationary. The converse is not true in general, see [17, Example 4].
It was shown in [19] that a CCOP-tailored version of Guignard CQ, which is the same as standard Guignard CQ for (2.2), is sufficient to guarantee CCOP-S-stationarity of local minima of (2.2). This is a major difference from MPCCs, where one typically needs MPCC-LICQ to guarantee S-stationarity of local minima and has to rely on M-stationarity under weaker MPCC-CQs. Since local minima of (2.2) are CCOP-S-stationary under CCOP-CQs, CCOP-M-stationary points seem to be undesirable solution candidates. Fortunately, if \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary, one can simply replace \({\hat{y}}\) with another auxiliary variable \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary, as the next proposition shows. Note that the proof of this result is constructive.
Proposition 2.3
Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be feasible for (2.2). If \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point, then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary.
Proof
By Theorem 2.1, \({\hat{x}}\) is feasible for (2.1). Now define \({\hat{z}}\in {\mathbb {R}}^n\) such that
$$\begin{aligned} {\hat{z}}_i := {\left\{ \begin{array}{ll} 0, &{} i \in I_\pm ({\hat{x}}), \\ 1, &{} i \in I_0({\hat{x}}). \end{array}\right. } \end{aligned}$$
Then \(({\hat{x}}, {\hat{z}})\) is obviously feasible for (2.2), cf. also the proof of [17, Theorem 3.1]. By assumption, there exist multipliers \((\lambda , \mu , \gamma ) \in {\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. Since \(I_\pm ({\hat{x}}) = I_0({\hat{z}})\), using Definition 2.2, we can conclude that \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary with \((\lambda , \mu , \gamma )\) from before as corresponding multipliers. \(\square \)
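Since the proof is constructive, the replacement of \({\hat{y}}\) by \({\hat{z}}\) is easy to carry out numerically. The following Python sketch (the function name is ours) builds \({\hat{z}}\) from a feasible \({\hat{x}}\) and verifies the y-part of the constraints of (2.2), namely \(x \circ z = 0\), \(z \le e\), and \(n - e^T z - s \le 0\):

```python
import numpy as np

def construct_z(x_hat, s):
    """Auxiliary variable from the proof of Proposition 2.3:
    z_i = 1 for i in I_0(x_hat) and z_i = 0 for i in I_pm(x_hat),
    so that I_pm(x_hat) = I_0(z)."""
    z = np.where(x_hat == 0.0, 1.0, 0.0)
    n = x_hat.size
    # y-part of the constraints of (2.2): x ∘ z = 0, z <= e, n - e^T z - s <= 0
    assert np.all(x_hat * z == 0)
    assert np.all(z <= 1.0)
    assert n - z.sum() - s <= 0, "requires ||x_hat||_0 <= s"
    return z

x_hat = np.array([0.7, 0.0, 0.0, -1.2, 0.0])   # ||x_hat||_0 = 2
print(construct_z(x_hat, s=2).tolist())        # [0.0, 1.0, 1.0, 0.0, 1.0]
```

The feasibility assertion \(n - e^T z - s \le 0\) holds automatically here because \(\Vert {\hat{x}}\Vert _0 \le s\) implies \(|I_0({\hat{x}})| \ge n - s\).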
This shows that the difference between S- and M-stationarity in this setting is not as big as for MPCCs. More precisely, a feasible point \({\hat{x}}\) of (2.1) is CCOP-M-stationary if and only if there exists \({\hat{z}}\) such that the pair \(({\hat{x}}, {\hat{z}})\) is CCOP-S-stationary. Consequently, any constraint qualification which guarantees that a local minimum \({\hat{x}}\) of (2.1) satisfies CCOP-M-stationarity also yields the existence of a CCOP-S-stationary point \(({\hat{x}}, {\hat{z}})\). Numerically, this implies that any method whose iterates converge only to a CCOP-M-stationary point essentially yields a CCOP-S-stationary point as well.
Utilizing (2.3), CCOP-tailored CQs were introduced in [17]. We shall now follow this approach and introduce a CCOP-tailored quasinormality condition.
Definition 2.4
A point \({\hat{x}}\in {\mathbb {R}}^n\), feasible for (2.1), satisfies the CCOP-quasinormality condition if there exists no \((\lambda , \mu , \gamma ) \in {\mathbb {R}}^m \times {\mathbb {R}}^p \times {\mathbb {R}}^n \setminus \{(0,0,0)\}\) such that the following conditions are satisfied:

- (a) \(0 = \nabla g({\hat{x}}) \lambda + \nabla h({\hat{x}}) \mu + \gamma \),
- (b) \(\lambda \ge 0\) and \(\lambda _i g_i({\hat{x}}) = 0\) for all \(i = 1, \dots , m\),
- (c) \(\gamma _i = 0\) for all \(i \in I_\pm ({\hat{x}})\),
- (d) there exists a sequence \(\{x^k\} \subseteq {\mathbb {R}}^n\) with \(\{x^k\} \rightarrow {\hat{x}}\) such that, for all \(k \in {\mathbb {N}}\), we have
  - \(\lambda _i g_i(x^k) > 0\) for all \(i \in \{1, \dots , m\}\) with \(\lambda _i > 0\),
  - \(\mu _i h_i(x^k) > 0\) for all \(i \in \{1, \dots , p\}\) with \(\mu _i \ne 0\),
  - \(\gamma _i x_i^k > 0\) for all \(i \in \{1, \dots , n\}\) with \(\gamma _i \ne 0\).
Obviously, CCOP-quasinormality corresponds to the (standard) quasinormality CQ of (2.3). By [1], CCOP-CPLD introduced in [17] thus implies CCOP-quasinormality.
3 An Augmented Lagrangian Method
Let us now describe the algorithm. For a given penalty parameter \(\alpha > 0\), the PHR augmented Lagrangian function for (2.2) is given by
$$\begin{aligned} L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) := f(x) + \frac{\alpha }{2} \left[ \Big \Vert \Big ( g(x) + \frac{\lambda }{\alpha } \Big )_+ \Big \Vert ^2 + \Big \Vert h(x) + \frac{\mu }{\alpha } \Big \Vert ^2 + \Big ( \Big ( n - e^T y - s + \frac{\zeta }{\alpha } \Big )_+ \Big )^2 + \Big \Vert \Big ( y - e + \frac{\eta }{\alpha } \Big )_+ \Big \Vert ^2 + \Big \Vert x \circ y + \frac{\gamma }{\alpha } \Big \Vert ^2 \right] \end{aligned}$$
with \((\lambda , \mu , \zeta , \eta , \gamma ) \in {\mathbb {R}}^m_+ \times {\mathbb {R}}^p \times {\mathbb {R}}_+ \times {\mathbb {R}}^n_+ \times {\mathbb {R}}^n\), where the bracketed expression, scaled by \(\alpha /2\), is the shifted quadratic penalty term, cf. [13, Chapter 4]. The algorithm is then stated below.
Algorithm 3.1
(Safeguarded Augmented Lagrangian Method)
- \((S_0)\): Initialization: Choose parameters \(\lambda _{\max } > 0\), \(\mu _{\min } < \mu _{\max }\), \(\zeta _{\max } > 0\), \(\eta _{\max } > 0\), \(\gamma _{\min } < \gamma _{\max }\), \(\tau \in (0,1)\), \(\sigma > 1\), and \(\{\epsilon _k\} \subseteq {\mathbb {R}}_+\) such that \(\{\epsilon _k\} \downarrow 0\). Choose initial values \({\bar{\lambda }}^1 \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^1 \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^1 \in [0, \zeta _{\max }]\), \({\bar{\eta }}^1 \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^1 \in [\gamma _{\min }, \gamma _{\max }]^n\), \(\alpha _1 > 0\), and set \(k \leftarrow 1\).
- \((S_1)\): Update of the iterates: Compute \((x^k, y^k)\) as an approximate solution of
$$\begin{aligned} \displaystyle \min _{(x,y) \in {\mathbb {R}}^{2n}} \ L((x,y), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \end{aligned}$$
satisfying
$$\begin{aligned} \Vert \nabla _{(x,y)} L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) \Vert \le \epsilon _k. \end{aligned}$$(3.1)
- \((S_2)\): Update of the approximate multipliers:
$$\begin{aligned} \begin{array}{ll} \lambda ^k &{} := (\alpha _k g(x^k) + {\bar{\lambda }}^k)_+ , \\ \mu ^k &{} := \alpha _k h(x^k) + {\bar{\mu }}^k , \\ \zeta ^k &{} := (\alpha _k (n - e^T y^k - s) + {\bar{\zeta }}^k)_+ , \\ \eta ^k &{} := ( \alpha _k (y^k - e) + {\bar{\eta }}^k)_+ , \\ \gamma ^k &{} := \alpha _k x^k \circ y^k + {\bar{\gamma }}^k . \end{array} \end{aligned}$$
- \((S_3)\): Update of the penalty parameter: Define
$$\begin{aligned} U^k := \min \big \{ -g(x^k), \tfrac{{\bar{\lambda }}^k}{\alpha _k} \big \}, \quad V^k := \min \big \{ -(n - e^T y^k - s), \tfrac{{\bar{\zeta }}^k}{\alpha _k} \big \}, \quad W^k := \min \big \{ -(y^k - e), \tfrac{{\bar{\eta }}^k}{\alpha _k} \big \}. \end{aligned}$$
If \(k = 1\) or
$$\begin{aligned} \max \left\{ \Vert U^k\Vert , \; \Vert h(x^k)\Vert , \; \Vert V^k\Vert , \; \Vert W^k\Vert , \; \Vert x^k \circ y^k\Vert \right\} \le \tau \max \left\{ \Vert U^{k - 1}\Vert , \; \Vert h(x^{k - 1})\Vert , \; \Vert V^{k - 1}\Vert , \; \Vert W^{k - 1}\Vert , \; \Vert x^{k - 1} \circ y^{k - 1}\Vert \right\} , \end{aligned}$$(3.2)
set \(\alpha _{k + 1} = \alpha _k\). Otherwise set \(\alpha _{k + 1} = \sigma \alpha _k\).
- \((S_4)\): Update of the safeguarded multipliers: Choose \({\bar{\lambda }}^{k + 1} \in [0, \lambda _{\max }]^m\), \({\bar{\mu }}^{k + 1} \in [\mu _{\min }, \mu _{\max }]^p\), \({\bar{\zeta }}^{k + 1} \in [0, \zeta _{\max }]\), \({\bar{\eta }}^{k + 1} \in [0, \eta _{\max }]^n\), \({\bar{\gamma }}^{k + 1} \in [\gamma _{\min }, \gamma _{\max }]^n\).
- \((S_5)\): Set \(k \leftarrow k + 1\) and go to \((S_1)\).
Note that Algorithm 3.1 is exactly the safeguarded augmented Lagrangian method from [13]. The only difference to the classical augmented Lagrangian method, see, e.g., [9, 34], lies in the more careful updating of the Lagrange multipliers: the safeguarded method contains the bounded auxiliary sequences \({{\bar{\lambda }}}^k, {{\bar{\mu }}}^k, \ldots \), which replace the multiplier estimates \(\lambda ^k, \mu ^k, \ldots \) in certain places. Note that these bounded auxiliary sequences are chosen by the user and that there is quite some freedom in their choice. In principle, one can simply take \({{\bar{\lambda }}}^k = 0, {{\bar{\mu }}}^k = 0, \ldots \) for all \( k \in {\mathbb {N}}\), in which case Algorithm 3.1 boils down to the classical quadratic penalty method. A more practical choice is to compute \({{\bar{\lambda }}}^{k+1}, {{\bar{\mu }}}^{k+1}, \ldots \) by projecting the multiplier estimates \(\lambda ^{k}, \mu ^{k}, \ldots \) onto the respective sets \( [0, \lambda _{\max }]^m, [ \mu _{\min }, \mu _{\max }]^p, \ldots \). With this choice, for sufficiently large bounds \( \lambda _{\max }, \mu _{\max }, \ldots \) (and sufficiently small \(\mu _{\min }, \gamma _{\min }\)), the safeguarded ALM often coincides with the classical ALM. Differences occur, however, in those situations where the classical ALM generates unbounded Lagrange multiplier estimates. This has a significant influence on the (global) convergence theory of both methods: while there is a very satisfactory theory for the safeguarded method, see [13], a counterexample from [30] shows that these properties do not hold for the classical approach.
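To make the structure of Algorithm 3.1 concrete, the following self-contained Python sketch applies it to (2.2) in the special case without the constraints g and h. It is an illustration only, not a substitute for ALGENCAN: the inner solver in \((S_1)\) is plain gradient descent with Armijo backtracking, the penalty-update test replaces the min-based quantities of (3.2) by the plain constraint violations, and all names and parameter values are ours.

```python
import numpy as np

def alm_sketch(f, grad_f, n, s, x, y, outer=30, alpha=1.0,
               sigma=2.0, tau=0.5, bound=10.0):
    """Illustrative sketch of Algorithm 3.1 for (2.2) without g and h,
    i.e. with the constraints n - e^T y - s <= 0, y <= e, x ∘ y = 0 only."""
    zb, eb, gb = 0.0, np.zeros(n), np.zeros(n)     # safeguarded multipliers
    viol_old = np.inf

    def aug_lag(x, y, a):
        # PHR augmented Lagrangian with the current safeguarded multipliers
        pen = max(n - y.sum() - s + zb / a, 0.0) ** 2
        pen += np.sum(np.maximum(y - 1.0 + eb / a, 0.0) ** 2)
        pen += np.sum((x * y + gb / a) ** 2)
        return f(x) + 0.5 * a * pen

    for _ in range(outer):
        for _ in range(500):                       # (S_1): inner gradient descent
            zeta = max(alpha * (n - y.sum() - s) + zb, 0.0)     # (S_2) estimates
            eta = np.maximum(alpha * (y - 1.0) + eb, 0.0)
            gamma = alpha * x * y + gb
            gx = grad_f(x) + gamma * y             # exact gradient of aug_lag
            gy = -zeta * np.ones(n) + eta + gamma * x
            g2 = np.dot(gx, gx) + np.dot(gy, gy)
            if g2 < 1e-12:
                break
            t, val = 1.0, aug_lag(x, y, alpha)
            while aug_lag(x - t * gx, y - t * gy, alpha) > val - 1e-4 * t * g2 and t > 1e-14:
                t *= 0.5                           # Armijo backtracking
            x, y = x - t * gx, y - t * gy
        viol = np.linalg.norm(np.concatenate((
            [max(n - y.sum() - s, 0.0)], np.maximum(y - 1.0, 0.0), x * y)))
        if viol > tau * viol_old:                  # (S_3), simplified test
            alpha *= sigma
        viol_old = viol
        zb = min(zeta, bound)                      # (S_4): projected estimates
        eb = np.minimum(eta, bound)
        gb = np.clip(gamma, -bound, bound)
    return x, y

# toy instance: min (x_1 - 1)^2 + (x_2 - 1/2)^2  s.t.  ||x||_0 <= 1
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 0.5) ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 0.5)])
x, y = alm_sketch(f, grad_f, n=2, s=1,
                  x=np.array([0.9, 0.1]), y=np.array([0.5, 0.5]))
print("x =", np.round(x, 3), " y =", np.round(y, 3))
```

On this toy instance, the iterates approach the CCOP-S-stationary pair \(x \approx (1, 0)\), \(y \approx (0, 1)\): the penalty on \(x \circ y\) drives \(y_1\) to zero, the constraint \(e^T y \ge n - s\) forces \(y_2\) towards one, and the multiplier estimate for \(\gamma _2\) absorbs the pull of the objective on \(x_2\).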
We have not specified a termination condition for the algorithm here. However, the convergence analysis in the next section suggests to stop the algorithm, e.g., if the M-stationarity conditions are satisfied up to a given tolerance.
In the subsequent discussion of the convergence properties of this algorithm, we often make use of the fact that the PHR augmented Lagrangian function is continuously differentiable with the gradient
$$\begin{aligned} \nabla _{(x,y)} L((x,y), \lambda , \mu , \zeta , \eta , \gamma ; \alpha ) = \begin{pmatrix} \nabla f(x) + \nabla g(x) (\alpha g(x) + \lambda )_+ + \nabla h(x) (\alpha h(x) + \mu ) + (\alpha \, x \circ y + \gamma ) \circ y \\ - (\alpha (n - e^T y - s) + \zeta )_+ \, e + (\alpha (y - e) + \eta )_+ + (\alpha \, x \circ y + \gamma ) \circ x \end{pmatrix}, \end{aligned}$$
where \( \nabla g(x) \) and \( \nabla h(x) \) denote the transposed Jacobian matrices of g and h at x, respectively. Consequently, the multipliers in \((S_2)\) are chosen exactly such that
$$\begin{aligned} \nabla _{(x,y)} L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) = \begin{pmatrix} \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k \\ - \zeta ^k e + \eta ^k + \gamma ^k \circ x^k \end{pmatrix} \end{aligned}$$
holds for all \(k \in {\mathbb {N}}\).
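As a sanity check, the relation between \(\nabla _{(x,y)} L\) and the multiplier estimates from \((S_2)\) can be verified numerically. The snippet below (our construction, with g and h omitted for brevity and \(f(x) = \tfrac{1}{2} x^T x\)) compares the analytic gradient with central finite differences at a point where none of the max-terms sits at its kink:

```python
import numpy as np

n, s, alpha = 4, 2, 3.0
x = np.array([0.5, -1.0, 2.0, 0.0])
y = np.array([1.5, 0.3, -0.7, 2.0])
lb_z, lb_e, lb_g = 0.7, np.array([0.2, 0.1, 0.4, 0.3]), np.array([0.5, -0.3, 0.1, 0.2])

def L(x, y):
    """PHR augmented Lagrangian of (2.2) with g and h omitted; f(x) = 0.5 x^T x."""
    pen = max(n - y.sum() - s + lb_z / alpha, 0.0) ** 2
    pen += np.sum(np.maximum(y - 1.0 + lb_e / alpha, 0.0) ** 2)
    pen += np.sum((x * y + lb_g / alpha) ** 2)
    return 0.5 * np.dot(x, x) + 0.5 * alpha * pen

# multiplier estimates from step (S_2)
zeta = max(alpha * (n - y.sum() - s) + lb_z, 0.0)
eta = np.maximum(alpha * (y - 1.0) + lb_e, 0.0)
gamma = alpha * x * y + lb_g
grad_x = x + gamma * y                     # nabla f(x) = x here
grad_y = -zeta * np.ones(n) + eta + gamma * x

# compare with central finite differences
eps, E = 1e-6, np.eye(n)
fd_x = np.array([(L(x + eps * E[i], y) - L(x - eps * E[i], y)) / (2 * eps) for i in range(n)])
fd_y = np.array([(L(x, y + eps * E[i]) - L(x, y - eps * E[i])) / (2 * eps) for i in range(n)])
print(np.allclose(grad_x, fd_x, atol=1e-4), np.allclose(grad_y, fd_y, atol=1e-4))  # True True
```
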
4 Convergence Analysis
The aim of this section is to prove global convergence of Algorithm 3.1 to CCOP-M-stationary points under the fairly mild CCOP-quasinormality condition. To this end, we begin with an auxiliary result, which states that the sequence \(\{ y^k \}\) remains bounded on any subsequence on which \(\{ x^k \}\) is bounded. In particular, if \(\{ x^k \}\) converges on a subsequence, this allows us to extract a limit point of the sequence \(\{(x^k, y^k)\}\).
Proposition 4.1
Let \(\{x^k\} \subseteq {\mathbb {R}}^n\) be a sequence generated by Algorithm 3.1. Assume that \(\{x^k\}\) is bounded on a subsequence. Then the auxiliary sequence \(\{y^k\}\) is bounded on the same subsequence.
Proof
In order to avoid taking further subsequences, let us assume that the entire sequence \(\{ x^k \}\) remains bounded. We then show that the whole sequence \(\{ y^k \}\) is also bounded. Define, for each \(k \in {\mathbb {N}}\),
$$\begin{aligned} B^k := \nabla _y L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) = - \zeta ^k e + \eta ^k + \gamma ^k \circ x^k . \end{aligned}$$(4.1)
By (3.1), we know that \(\{B^k\} \rightarrow 0\). We first show that the sequence \(\{ y^k \}\) is bounded from above and then verify that it is also bounded from below.
\(\{y^k\}~\textit{is bounded above}\) We claim that there exists a \(c \in {\mathbb {R}}\) such that \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). Suppose, by contradiction, that there is an index \( j \in \{1, \dots , n\} \) and a subsequence \(\{y^{k_l}_j\}\) such that \(\{y^{k_l}_j\} \rightarrow + \infty \). Since \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\) and \(\{{\bar{\eta }}_j^{k_l}\}\) is bounded by definition, we then obtain
$$\begin{aligned} \big \{ \alpha _{k_l} \big ( y_j^{k_l} - 1 \big ) + {\bar{\eta }}_j^{k_l} \big \} \rightarrow +\infty . \end{aligned}$$(4.2)
This implies \( \eta _j^{k_l} = \alpha _{k_l}(y^{k_l}_j - 1) + {\bar{\eta }}_j^{k_l} \) for all \( l \in {\mathbb {N}}\) sufficiently large and, hence, by (4.2), we have \(\{ \eta _j^{k_l} \} \rightarrow +\infty \). Observe that, for each \(l \in {\mathbb {N}}\) sufficiently large, we have \(y_j^{k_l} > 0\) and therefore
$$\begin{aligned} \gamma _j^{k_l} x_j^{k_l} = \alpha _{k_l} \big ( x_j^{k_l} \big )^2 y_j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \ge {\bar{\gamma }}_j^{k_l} x_j^{k_l}. \end{aligned}$$
From (4.1), we then obtain for these \(l \in {\mathbb {N}}\) that \( B^{k_l}_j = -\zeta ^{k_l} + \eta _j^{k_l} + \gamma _j^{k_l} x_j^{k_l} \ge -\zeta ^{k_l} + \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} \), which is equivalent to \( \zeta ^{k_l} \ge \eta _j^{k_l} + {\bar{\gamma }}_j^{k_l} x_j^{k_l} - B^{k_l}_j \). Since \(\{B^{k_l}_j\} \rightarrow 0\) and \(\{{\bar{\gamma }}_j^{k_l} x_j^{k_l}\}\) is bounded, the right-hand side converges to \( +\infty \). Consequently, we have \(\{\zeta ^{k_l}\} \rightarrow +\infty \). The definition of \(\{\zeta ^{k_l}\}\) therefore yields \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) + {\bar{\zeta }}^{k_l} \} \rightarrow +\infty \). Since \(\{{\bar{\zeta }}^{k_l}\}\) is a bounded sequence, we get \( \{ \alpha _{k_l} (n - e^T y^{k_l} - s) \} \rightarrow + \infty \). We therefore have
$$\begin{aligned} n - e^T y^{k_l} - s > 0 \quad \text {for all } l \in {\mathbb {N}} \text { sufficiently large.} \end{aligned}$$(4.3)
We now claim that
$$\begin{aligned} \text {there exists an index } i \in \{1, \dots , n\} \setminus \{j\} \text { such that } \{y_i^{k_l}\} \text { is unbounded below.} \end{aligned}$$(4.4)
Assume, to the contrary, that there exists \( d \in {\mathbb {R}}\) such that \( y^{k_l}_i \ge d\) for all \( i \in \{ 1, \ldots , n \} \setminus \{ j \} \) and all \( l \in {\mathbb {N}}\). We then obtain
$$\begin{aligned} n - e^T y^{k_l} - s = n - s - y_j^{k_l} - \sum _{i \ne j} y_i^{k_l} \le n - s - y_j^{k_l} - (n - 1) d \rightarrow -\infty . \end{aligned}$$
We therefore get \( n - e^T y^{k_l} - s < 0 \) for all \( l \in {\mathbb {N}}\) sufficiently large, but this contradicts (4.3); hence (4.4) holds. For this particular index i, we can construct a subsequence \(\{y_i^{k_{l_t}}\}\) such that \(\{y_i^{k_{l_t}}\} \rightarrow -\infty \). Since \(\{{\bar{\eta }}_i^{k_{l_t}}\}\) is bounded, we then have \( \big \{\alpha _{k_{l_t}} (y_i^{k_{l_t}} - 1) + {\bar{\eta }}_i^{k_{l_t}} \big \} \rightarrow -\infty \). This implies \( \eta _i^{k_{l_t}} = 0 \) for all \( t \in {\mathbb {N}}\) sufficiently large. Since, in addition, \(y_i^{k_{l_t}} < 0\) for these t, we obtain from (4.1) that
$$\begin{aligned} B_i^{k_{l_t}} = - \zeta ^{k_{l_t}} + \gamma _i^{k_{l_t}} x_i^{k_{l_t}} \le - \zeta ^{k_{l_t}} + {\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}} \end{aligned}$$
for all \( t \in {\mathbb {N}}\) large enough. Since \(\{{\bar{\gamma }}_i^{k_{l_t}} x_i^{k_{l_t}}\}\) is a bounded sequence and \(\{\zeta ^{k_l}\} \rightarrow + \infty \), we get \( \{B_i^{k_{l_t}}\} \rightarrow -\infty \), which leads to a contradiction. Thus, \(\{y^k\}\) is bounded above.
\(\{y^k\}~\textit{is bounded below}\) We claim that there exists a \(d \in {\mathbb {R}}\) such that \(y^k \ge d e\) for all \(k \in {\mathbb {N}}\). Assume, by contradiction, that there is an index \( j \in \{1,\dots ,n\} \) such that \(\{y^{k_l}_j\} \rightarrow -\infty \) on a suitable subsequence. Then, we have \( y_j^{k_l} < 0 \) and \( \eta _j^{k_l} = 0 \) for all \( l \in {\mathbb {N}}\) large enough and, similar to the previous case, it therefore follows that \( B_j^{k_l} \le -\zeta ^{k_l} + {\bar{\gamma }}_j^{k_l}x_j^{k_l} \). This can be rewritten as \( \zeta ^{k_l} \le {\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l} \). Since \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l}\}\) is bounded and \(\{B_j^{k_l}\} \rightarrow 0\), the sequence \(\{{\bar{\gamma }}_j^{k_l}x_j^{k_l} - B_j^{k_l}\}\) is bounded. This implies, in particular, that \(\{\zeta ^{k_l}\}\) is bounded above, i.e., there exists a \(C \in {\mathbb {R}}\) such that
$$\begin{aligned} \zeta ^{k_l} \le C \quad \text {for all } l \in {\mathbb {N}}. \end{aligned}$$(4.5)
On the other hand, we already know \(y^k \le c e\) for all \(k \in {\mathbb {N}}\). We therefore get
$$\begin{aligned} n - e^T y^{k_l} - s = n - s - y_j^{k_l} - \sum _{i \ne j} y_i^{k_l} \ge n - s - y_j^{k_l} - (n - 1) c \rightarrow +\infty . \end{aligned}$$
This implies
$$\begin{aligned} \big \{ \alpha _{k_l} (n - e^T y^{k_l} - s) + {\bar{\zeta }}^{k_l} \big \} \rightarrow +\infty \end{aligned}$$
due to the boundedness of the sequence \(\{ {\bar{\zeta }}^{k_l} \}\) and \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). The definition of \(\zeta ^{k_l}\) then yields
$$\begin{aligned} \{ \zeta ^{k_l} \} \rightarrow +\infty , \end{aligned}$$
which contradicts (4.5). Hence, \(\{y^k\}\) is bounded below. \(\square \)
As for all penalty-type methods, one has to distinguish two aspects in a corresponding global convergence theory, namely the feasibility issue and an optimality statement. Without further assumptions, feasibility of the limit point cannot be guaranteed (for nonconvex constraints). However, there is a standard result in [13], which shows that the limit point of our stationary sequence is at least a stationary point of the constraint violation. To this end, we measure the infeasibility of a point \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) for (2.2) by using the unshifted quadratic penalty term
$$\begin{aligned} \pi _{0,1}(x,y) := \frac{1}{2} \Big [ \Vert g(x)_+ \Vert ^2 + \Vert h(x) \Vert ^2 + \big ( (n - e^T y - s)_+ \big )^2 + \Vert (y - e)_+ \Vert ^2 + \Vert x \circ y \Vert ^2 \Big ]. \end{aligned}$$
Clearly, \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2) if and only if \(\pi _{0,1}({\hat{x}},{\hat{y}}) = 0\). Since \(\pi _{0,1} \ge 0\) everywhere, this implies that \(({\hat{x}}, {\hat{y}})\) is a global minimizer of \(\pi _{0,1}\). In particular, we then have \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).
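For intuition, this infeasibility measure is cheap to evaluate. The following sketch (our transcription, with the g- and h-terms omitted for brevity) confirms that a feasible pair for (2.2) has zero infeasibility:

```python
import numpy as np

def pi01(x, y, n, s):
    """Unshifted quadratic penalty for (2.2), with the g- and h-terms omitted."""
    return 0.5 * (max(n - y.sum() - s, 0.0) ** 2
                  + np.sum(np.maximum(y - 1.0, 0.0) ** 2)
                  + np.sum((x * y) ** 2))

n, s = 3, 1
x = np.array([2.0, 0.0, 0.0])   # ||x||_0 = 1 <= s
y = np.array([0.0, 1.0, 1.0])   # e^T y = 2 = n - s, y <= e, x ∘ y = 0
print(pi01(x, y, n, s))         # 0.0
```

Any violated constraint (e.g. a nonzero product \(x_i y_i\)) contributes a strictly positive term, so the measure vanishes exactly on the feasible set.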
Theorem 4.2
Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of the sequence \(\{(x^k, y^k)\}\) generated by Algorithm 3.1. Then \(\nabla \pi _{0,1}({\hat{x}},{\hat{y}}) = 0\).
We omit the proof here, since it is identical to the proofs of [13, Theorem 6.3] and [31, Theorem 6.2]. Instead, we turn to an optimality result for Algorithm 3.1. Suppose that the sequence \(\{x^k\}\) generated by Algorithm 3.1 has a limit point \({\hat{x}}\). Proposition 4.1 then guarantees that we can extract a limit point \(({\hat{x}}, {\hat{y}})\) of the sequence \(\{(x^k, y^k)\}\). Under the additional assumptions that \({\hat{x}}\) satisfies CCOP-quasinormality and \(({\hat{x}}, {\hat{y}})\) is feasible for (2.2), we can show that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.
Theorem 4.3
Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.
Proof
To simplify the notation, we assume, throughout this proof, that the entire sequence \(\{ (x^k, y^k) \}\) converges to \(({\hat{x}}, {\hat{y}})\). For each \(k \in {\mathbb {N}}\), we define
$$\begin{aligned} A^k := \nabla _x L((x^k,y^k), {\bar{\lambda }}^k, {\bar{\mu }}^k, {\bar{\zeta }}^k, {\bar{\eta }}^k, {\bar{\gamma }}^k; \alpha _k) = \nabla f(x^k) + \nabla g(x^k) \lambda ^k + \nabla h(x^k) \mu ^k + \gamma ^k \circ y^k . \end{aligned}$$
Furthermore, let \(B^k\) be given as in (4.1). By (3.1) and since \(\{\epsilon _k\} \downarrow 0\), we know that \(\{A^k\} \rightarrow 0\) and \(\{B^k\} \rightarrow 0\). Observe that, by \((S_2)\), we have \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\). Furthermore, by \((S_3)\), the sequence of penalty parameters \(\{\alpha _k\}\) satisfies \(\alpha _k \ge \alpha _1 > 0\) for all \(k \in {\mathbb {N}}\). Let us now distinguish two cases.
Case 1 \(\{\alpha _k\}\) is bounded. Then \(\{\alpha _k\}\) is eventually constant, say \( \alpha _k = \alpha _K \) for all \( k \ge K \) with some sufficiently large \( K \in {\mathbb {N}}\). Now, let us take a closer look at \((S_2)\). The boundedness of \(\{\alpha _k\}\) immediately implies that the sequences \(\{\mu ^k\}\), \(\{\gamma ^k\}\), and \(\{\gamma ^k \circ y^k\}\) are bounded. By passing onto subsequences if necessary, we can assume w.l.o.g. that the latter sequences converge, i.e. \(\{\mu ^k\} \rightarrow {{\hat{\mu }}}\) and \(\{\gamma ^k \circ y^k\} \rightarrow {{\hat{\gamma }}}\). For all \(i \in I_\pm ({\hat{x}})\), the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i = 0\). Since, in this case, we have \(\{y_i^k\} \rightarrow 0\) and \(\{\gamma _i^k\}\) is bounded, it follows that
$$\begin{aligned} {\hat{\gamma }}_i = \lim _{k \rightarrow \infty } \gamma _i^k y_i^k = 0 \quad \text {for all } i \in I_\pm ({\hat{x}}). \end{aligned}$$
Next, observe that, for each \(i \in \{1, \dots , m\}\), we have \( 0 \le \lambda _i^k \le |\alpha _k g_i(x^k) + {\bar{\lambda }}_i^k| \) for all \( k \in {\mathbb {N}}\). Thus, \(\{\lambda _i^k\}\) is bounded as well and has a convergent subsequence, so we can assume w.l.o.g. that \(\{\lambda ^k\} \rightarrow {\hat{\lambda }}\) on the whole sequence. Now, the boundedness of \(\{\alpha _k\}\) and \((S_3)\) also imply \(\{ \Vert U^k \Vert \} \rightarrow 0\). Let \(i \notin I_g({\hat{x}}) := \{ i : g_i({\hat{x}}) = 0 \}\), i.e. \(g_i({\hat{x}}) < 0\). Since, by definition, \(\{{\bar{\lambda }}^k \}\) is bounded, \(\left\{ \frac{{\bar{\lambda }}_i^k}{\alpha _k} \right\} \) is bounded as well and therefore has a convergent subsequence. Assume w.l.o.g. that this sequence converges to some limit \(a_i\). Then
$$\begin{aligned} 0 = \lim _{k \rightarrow \infty } U_i^k = \lim _{k \rightarrow \infty } \min \Big \{ -g_i(x^k), \frac{{\bar{\lambda }}_i^k}{\alpha _k} \Big \} = \min \{ -g_i({\hat{x}}), a_i \}. \end{aligned}$$
Since \(-g_i({\hat{x}}) > 0\), we get \(a_i = 0\). This implies
$$\begin{aligned} \Big \{ g_i(x^k) + \frac{{\bar{\lambda }}_i^k}{\alpha _k} \Big \} \rightarrow g_i({\hat{x}}) < 0. \end{aligned}$$
Thus, by \((S_2)\), we have
$$\begin{aligned} \lambda _i^k = \big ( \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \big )_+ = \Big ( \alpha _k \Big ( g_i(x^k) + \frac{{\bar{\lambda }}_i^k}{\alpha _k} \Big ) \Big )_+ = 0 \end{aligned}$$
for all \(k \in {\mathbb {N}}\) sufficiently large. As its limit, we then also have \({\hat{\lambda }}_i = 0\). Letting \( k \rightarrow \infty \), the definition of \(A^k\) then yields
$$\begin{aligned} 0 = \nabla f({\hat{x}}) + \nabla g({\hat{x}}) {\hat{\lambda }} + \nabla h({\hat{x}}) {\hat{\mu }} + {\hat{\gamma }}. \end{aligned}$$
Altogether, it follows that \(({\hat{x}}, {\hat{y}})\) is a CCOP-M-stationary point.
Case 2 \(\{\alpha _k\}\) is unbounded. Then, we have \(\{\alpha _k\} \rightarrow +\infty \). Now define, for each \(k \in {\mathbb {N}}\),
$$\begin{aligned} {\tilde{\gamma }}^k := \gamma ^k \circ y^k . \end{aligned}$$
We claim that the sequence \(\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \}\) is bounded. By contradiction, assume that \(\{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert \} \rightarrow \infty \), w.l.o.g. on the whole sequence. The corresponding normalized sequence \(\left\{ \frac{\left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) }{\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| } \right\} \) is bounded and therefore, again w.l.o.g. on the whole sequence, convergent to a (nontrivial) limit, i.e.
$$\begin{aligned} \left\{ \frac{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } \right\} \rightarrow ( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }} ) \ne 0. \end{aligned}$$
We show that this limit, together with the sequence \(\{x^k\}\), contradicts CCOP-quasinormality in \({\hat{x}}\): Since \( \lambda ^k \ge 0 \) for all k, it follows that \({\tilde{\lambda }} \ge 0\). Now, take an index \(i \notin I_g({\hat{x}})\), i.e. \(g_i({\hat{x}}) < 0\). Since \(\left\{ {\bar{\lambda }}_i^k\right\} \) is bounded, it follows that \( \left\{ \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k \right\} \rightarrow - \infty \). This implies \( \lambda _i^k = 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large, hence we get
$$\begin{aligned} {\tilde{\lambda }}_i = 0 \quad \text {for all } i \notin I_g({\hat{x}}). \end{aligned}$$
Next take an index \(i \in I_\pm ({\hat{x}})\). Since \(({\hat{x}}, {\hat{y}})\) is feasible, we then have \({\hat{y}}_i = 0\). The boundedness of \(\{{\bar{\eta }}_i^k\}\) therefore yields \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Consequently, we obtain
$$\begin{aligned} \eta _i^k = \big ( \alpha _k ( y_i^k - 1 ) + {\bar{\eta }}_i^k \big )_+ = 0 \quad \text {for all } k \in {\mathbb {N}} \text { sufficiently large.} \end{aligned}$$
Now, we claim that \({\tilde{\gamma }}_i = 0\) holds for such an index i. Suppose not. Then \({\tilde{\gamma }}_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large. Since \({\tilde{\gamma }}_i^k = \gamma _i^k y_i^k\), this implies \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\) large enough. Using \(\eta _i^k = 0\) for these k, we then have
$$\begin{aligned} B_i^k = - \zeta ^k + \gamma _i^k x_i^k = - \zeta ^k + {\tilde{\gamma }}_i^k \frac{x_i^k}{y_i^k}. \end{aligned}$$(4.9)
Rearranging and dividing (4.9) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) then gives
$$\begin{aligned} \frac{ B_i^k + \zeta ^k }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } = \frac{ {\tilde{\gamma }}_i^k }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } \cdot \frac{x_i^k}{y_i^k}. \end{aligned}$$(4.10)
Observe that the left-hand side of (4.10) converges. On the other hand, since
$$\begin{aligned} \{ x_i^k \} \rightarrow {\hat{x}}_i \ne 0 \end{aligned}$$
and \(\{y_i^k\} \rightarrow 0\), the right-hand side diverges. This contradiction shows that
$$\begin{aligned} {\tilde{\gamma }}_i = 0 \quad \text {for all } i \in I_\pm ({\hat{x}}). \end{aligned}$$
Now, we claim that \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\). Suppose not. Then, since \(\left( {\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}, {\tilde{\zeta }}, {\tilde{\eta }}\right) \ne 0\), it follows that \(\left( {\tilde{\zeta }}, {\tilde{\eta }} \right) \ne 0\). Consider an index \(i \in I_0({\hat{y}})\). Since \(\{y_i^k\} \rightarrow {\hat{y}}_i = 0\) and \(\{{\bar{\eta }}_i^k\}\) is a bounded sequence, we have \( \left\{ \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \right\} \rightarrow -\infty \). Just like before, we can then assume w.l.o.g. that
$$\begin{aligned} \eta _i^k = 0 \quad \text {for all } k \in {\mathbb {N}} \text { and all } i \in I_0({\hat{y}}), \end{aligned}$$(4.12)
which implies \( {\tilde{\eta }}_i = 0 \) for all \(i \in I_0({\hat{y}})\). Hence, we have
$$\begin{aligned} {\tilde{\zeta }} \ne 0 \quad \text {or} \quad {\tilde{\eta }}_i \ne 0 \ \text {for some } i \in I_\pm ({\hat{y}}). \end{aligned}$$(4.13)
Now let \(i \in I_\pm ({\hat{y}})\). Since \({\hat{y}}_i \ne 0\) and \(\{y_i^k\} \rightarrow {\hat{y}}_i\), we can assume w.l.o.g. that \(y_i^k \ne 0 \) for all \( k \in {\mathbb {N}}\). We then get, for each \(k \in {\mathbb {N}}\), that
$$\begin{aligned} B_i^k = - \zeta ^k + \eta _i^k + {\tilde{\gamma }}_i^k \frac{x_i^k}{y_i^k}. \end{aligned}$$(4.14)
Rearranging and dividing (4.14) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) yields
$$\begin{aligned} \frac{ B_i^k }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } = \frac{ - \zeta ^k + \eta _i^k }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } + \frac{ {\tilde{\gamma }}_i^k }{ \Vert ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k ) \Vert } \cdot \frac{x_i^k}{y_i^k}. \end{aligned}$$(4.15)
By assumption, \({\tilde{\gamma }}_i = 0\). Consequently, letting \(k \rightarrow \infty \) in (4.15) yields
$$\begin{aligned} {\tilde{\eta }}_i = {\tilde{\zeta }} \quad \text {for all } i \in I_\pm ({\hat{y}}). \end{aligned}$$(4.16)
From (4.13) we then obtain \( {\tilde{\zeta }} \ne 0 \) and \( {\tilde{\eta }}_i = {\tilde{\zeta }} \ne 0 \) for all \( i \in I_\pm ({\hat{y}}) \). Since \(\zeta ^k \ge 0 \) for all \( k \in {\mathbb {N}}\), we have \({\tilde{\zeta }} \ge 0\) and, therefore, \({\tilde{\zeta }} > 0\). Hence, we can assume w.l.o.g. that \(\zeta ^k > 0 \) for all \( k \in {\mathbb {N}}\). This implies \( \zeta ^k = \alpha _k \left( n - e^T y^k - s \right) + {\bar{\zeta }}^k \). We then have
since \(\{{\bar{\zeta }}^k\}\) is bounded by definition. Consequently, we can assume w.l.o.g. that
By assumption, \(({\hat{x}}, {\hat{y}})\) is feasible and, hence, \(n - e^T {\hat{y}}- s \le 0\). Thus, we obtain from (4.17) that \(n - e^T y^k -s > n - e^T {\hat{y}}- s\) and, therefore,
Furthermore, since \({\tilde{\zeta }} > 0\), by (4.16), we also have that \({\tilde{\eta }}_i > 0 \) for all \( i \in I_\pm ({\hat{y}})\). This implies \(\eta _i^k > 0 \) for all sufficiently large \( k \in {\mathbb {N}}\). Consequently, we have \( \eta _i^k = \alpha _k \left( y_i^k - 1 \right) + {\bar{\eta }}_i^k \) for all \(k \in {\mathbb {N}}\) large enough. We then obtain
since \(\{{\bar{\eta }}_i^k\}\) is bounded by definition. Hence, we can assume w.l.o.g. that \( y_i^k > 1 \) for all \( k \in {\mathbb {N}}\). On the other hand, the feasibility of \(({\hat{x}}, {\hat{y}})\) implies \({\hat{y}}_i \le 1 \) for all \( i \in \{1, \dots , n\} \). Consequently, we obtain
Together, this implies
for all \(k \in {\mathbb {N}}\). By passing to a subsequence, we can therefore assume w.l.o.g. that there exists a \(j \in I_0({\hat{y}})\) with \(y^k_j < 0\) for all \(k \in {\mathbb {N}}\). Since \(j \in I_0({\hat{y}})\), by (4.12), we have \(\eta _j^k = 0 \) for all \( k \in {\mathbb {N}}\) and, hence, \( B_j^k = -\zeta ^k + \gamma _j^k x_j^k \) or, equivalently, \( B_j^k + \zeta ^k = \gamma _j^k x_j^k \). Since \(y_j^k \le 0\), we then have
Consequently, we have \( B_j^k + \zeta ^k \le {\bar{\gamma }}_j^k x_j^k \) and, therefore,
Since \(\{ {\bar{\gamma }}_j^k x_j^k \}\) is bounded, letting \(k \rightarrow \infty \) then yields the contradiction \( 0 < {\tilde{\zeta }} \le 0 \). Hence we have \( ({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0 \).
Dividing \(A^k\) by \(\left\| \left( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k \right) \right\| \) and letting \(k \rightarrow \infty \) then yields
where \(({\tilde{\lambda }}, {\tilde{\mu }}, {\tilde{\gamma }}) \ne 0\) and, in view of (4.7) and (4.11), \( {\tilde{\lambda }} \in {\mathbb {R}}^m_+ \), \( {\tilde{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \), and \( {\tilde{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). This shows that \( {\hat{x}}\) satisfies properties (a)–(c) from Definition 2.4. We now verify that also the three conditions from part (d) hold.
For this purpose, let \(i \in \{1, \dots , m\}\) be an index with \({\tilde{\lambda }}_i > 0\). Then, we can assume w.l.o.g. that \(\lambda _i^k > 0 \) for all \( k \in {\mathbb {N}}\) and, thus, \(\lambda _i^k = \alpha _k g_i(x^k) + {\bar{\lambda }}_i^k\). Consequently, we have
by the boundedness of \(\{{\bar{\lambda }}_i^k\}\). Thus, we have \( g_i(x^k) > 0 \) for all \( k \in {\mathbb {N}}\) sufficiently large and, therefore, also \( {\tilde{\lambda }}_i g_i(x^k) > 0 \) for all these \( k \in {\mathbb {N}}\).
Next consider an index \(i \in \{1, \dots , p\}\) such that \({\tilde{\mu }}_i \ne 0\). The boundedness of \(\{{\bar{\mu }}_i^k\}\) then implies
Since \(\alpha _k > 0\), this implies that \({\tilde{\mu }}_i\) and \(h_i(x^k)\) have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\mu }}_i h_i(x^k) > 0 \).
Finally, consider an index \(i \in \{1, \dots , n\}\) such that \({\tilde{\gamma }}_i \ne 0\). The boundedness of \(\{{\bar{\gamma }}_i^k\}\) yields
Hence, \({\tilde{\gamma }}_i\) and \(x_i^k\) also have the same sign for all \( k \in {\mathbb {N}}\) sufficiently large, i.e. \( {\tilde{\gamma }}_i x_i^k > 0 \).
Altogether, this contradicts the assumed CCOP-quasinormality of \({\hat{x}}\). Thus, \(\left\{ ( \lambda ^k, \mu ^k, {\tilde{\gamma }}^k, \zeta ^k, \eta ^k )\right\} \) is bounded and therefore has a convergent subsequence. Assume w.l.o.g. that the whole sequence converges, i.e.,
Since \(\{\lambda ^k\} \subseteq {\mathbb {R}}^m_+\), we also have \({\hat{\lambda }} \in {\mathbb {R}}^m_+\). Consider an index \(i \notin I_g({\hat{x}})\). Then, just like for \({\tilde{\lambda }}_i\), one can show that \({\hat{\lambda }}_i = 0\). Similarly, for \(i \in I_\pm ({\hat{x}})\), following the argument for \({\tilde{\gamma }}_i\), one also gets \({\hat{\gamma }}_i = 0\). Taking \( k \rightarrow \infty \) in the definition of \(A^k\), we then obtain
where \( {\hat{\lambda }}_i = 0 \) for all \( i \notin I_g({\hat{x}}) \) and \( {\hat{\gamma }}_i = 0 \) for all \( i \in I_\pm ({\hat{x}}) \). Thus, we conclude that \(({\hat{x}}, {\hat{y}})\) is CCOP-M-stationary. \(\square \)
It is known from [6, Corollary 4.2] that accumulation points \(({{\hat{x}}}, {{\hat{y}}})\) of Algorithm 3.1, where standard quasinormality holds, are KKT points and thus CCOP-S-stationary. To compare this result with Theorem 4.3, first note that CCOP-quasinormality only depends on \({{\hat{x}}}\), whereas standard quasinormality for (2.2) depends on both \(({{\hat{x}}}, {{\hat{y}}})\). In case \(\{i \mid {{\hat{y}}}_i \ne 0\} = I_0({{\hat{x}}})\), standard quasinormality in \(({{\hat{x}}}, {{\hat{y}}})\) is equivalent to CCOP-quasinormality in \({{\hat{x}}}\), and CCOP-S- and CCOP-M-stationarity coincide. Thus, in this situation, the statement from Theorem 4.3 can also be derived via [6, Corollary 4.2]. However, in case \(\{i \mid {{\hat{y}}}_i \ne 0\} \subsetneq I_0({{\hat{x}}})\), standard quasinormality is always violated in \(({{\hat{x}}}, {{\hat{y}}})\) and thus [6, Corollary 4.2] cannot be applied. In the latter situation, in general, we can only guarantee CCOP-M-stationarity of limits \(({\hat{x}}, {\hat{y}})\). But, using Proposition 2.3, it is still possible to ensure CCOP-S-stationarity of a potentially modified point \(({\hat{x}}, {\hat{z}})\).
Corollary 4.4
Let \(({\hat{x}}, {\hat{y}}) \in {\mathbb {R}}^n \times {\mathbb {R}}^n\) be a limit point of \(\{(x^k, y^k)\}\) generated by Algorithm 3.1 that is feasible for (2.2) and where \({\hat{x}}\) satisfies CCOP-quasinormality. Then there exists \({\hat{z}}\in {\mathbb {R}}^n\) such that \(({\hat{x}}, {\hat{z}})\) is a CCOP-S-stationary point.
5 Numerical Results
In this section, we compare the performance of ALGENCAN with the Scholtes regularization method from [15] and the Kanzow–Schwartz regularization method from [17]. All experiments were conducted in Python together with the NumPy library. We used ALGENCAN 2.4.0, compiled with the MA57 library [25] and called through its Python interface with user-supplied gradients of the objective functions, sparse Jacobians of the constraints, and sparse Hessians of the Lagrangian. As a subsolver for the two regularization methods, we used the SQP solver WORHP version 1.14 [18], which is freely available for academic use, also called through its Python interface. For the Scholtes regularization method, WORHP was supplied with sparse gradients of the objective functions, sparse Jacobians of the constraints, and sparse Hessians of the Lagrangian. For the Kanzow–Schwartz regularization method, on the other hand, the analytical Hessian does not exist because the corresponding NCP-function is not twice differentiable; here, WORHP was supplied with sparse gradients of the objective functions and sparse Jacobians of the constraints only, and the Hessian of the Lagrangian was approximated by the BFGS method. Throughout the experiments, both ALGENCAN and WORHP were run with their respective default settings.
We applied ALGENCAN directly to the relaxed reformulation (2.2) of the test problems, i.e. without a lower bound on the auxiliary variable y. In contrast, following [15, 17], for both regularization methods we bounded y from below by 0. For each test problem, both regularization methods were started with an initial regularization parameter \(t_0 = 1.0\), and \(t_k\) was decreased in each iteration by a factor of 0.01. The regularization methods were terminated if either \(t_k < 10^{-8}\) or \(\left\| x^k \circ y^k \right\| _\infty \le 10^{-6}\).
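The outer loop of the two regularization methods, with the parameters stated above, can be sketched as follows. Here `solve_subproblem` is a hypothetical placeholder for a call to an NLP solver (WORHP in the experiments); this is not the solver's actual API, only the structure of the scheme:

```python
import numpy as np

def regularization_outer_loop(solve_subproblem, x0, y0,
                              t0=1.0, sigma=0.01, t_min=1e-8, tol=1e-6):
    """Outer loop shared by both regularization methods (sketch).

    solve_subproblem(t, x, y) is a hypothetical callback: it should solve
    the regularized NLP with parameter t, warm-started at (x, y), and
    return an approximate solution (x, y).
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    t = t0
    while t >= t_min:                        # stop once t_k < 10^-8 ...
        x, y = solve_subproblem(t, x, y)
        if np.max(np.abs(x * y)) <= tol:     # ... or ||x ∘ y||_inf <= 10^-6
            break
        t *= sigma                           # decrease t_k by a factor of 0.01
    return x, y
```

The warm start of each subproblem at the previous iterate mirrors the usual practice for such homotopy-type schemes.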
5.1 Pilot Test
Let us begin by considering the following academic example
which is taken from [17]. This problem has a local minimizer at \(\left( 0, 1 - \frac{1}{2}\sqrt{3}\right) \) and an isolated global minimizer at \(\left( \frac{1}{2}, 0\right) \). Following [17], we discretised the rectangle \(\left[ -1, \frac{3}{2}\right] \times \left[ -\frac{1}{2},2\right] \), resulting in 441 starting points for the considered methods. For each of these starting points, ALGENCAN converged to the global minimizer \(\left( \frac{1}{2}, 0\right) \), and the same behaviour was observed for the Scholtes regularization method. The Kanzow–Schwartz regularization method was slightly less successful, converging to the global minimizer in 437 cases; in the remaining 4 cases, it converged to the local minimizer. This behaviour might be due to the BFGS approximation of the Hessian of the Lagrangian used by WORHP. Indeed, running the Scholtes regularization method without a user-supplied Hessian of the Lagrangian, with the Hessian approximated by the BFGS method instead, yielded convergence to the global minimizer in only 394 cases; in the remaining 47 cases, the Scholtes regularization method only found the local minimizer.
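The 441 starting points correspond to a \(21 \times 21\) grid over the rectangle; a minimal reproduction is given below (the per-axis resolution of 21 points is our assumption, since the text only reports the total count):

```python
import numpy as np

# Discretise [-1, 3/2] x [-1/2, 2] with 21 equidistant points per axis,
# giving 21 * 21 = 441 starting points. The per-axis resolution is an
# assumption; the paper only reports the total of 441 points.
xs = np.linspace(-1.0, 1.5, 21)
ys = np.linspace(-0.5, 2.0, 21)
starting_points = [(a, b) for a in xs for b in ys]
```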
5.2 Portfolio Optimization Problems
Following [17], we consider a classical portfolio optimization problem
where Q and \(\mu \) are the covariance matrix and the mean of n possible assets and \(e^T x \le 1\) is the budget constraint, see [12, 20]. We generated the test problems using the data from [24], considering \(s = 5, 10, 20\) for each dimension \(n = 200, 300, 400\), which resulted in 270 test problems, see also [17]. Here, we considered six approaches in total:
-
ALGENCAN without a lower bound on y
-
ALGENCAN with an additional lower bound \(y \ge 0\)
-
Scholtes and Kanzow–Schwartz regularization for cardinality-constrained problems [15, 17] with a regularization of both upper quadrants \(x_i \ge 0, y_i \ge 0\) and \(x_i \le 0, y_i \ge 0\)
-
Scholtes and Kanzow–Schwartz regularization for MPCCs [28, 35] with a regularization of the upper right quadrant \(x_i \ge 0, y_i \ge 0\) only.
As discussed before, introducing a lower bound \(y \ge 0\) in (2.2) is possible without changing the theoretical properties of the reformulation. Similarly, due to the constraint \(x \ge 0\) in (5.1), the feasible set of the reformulated problem actually has the classical MPCC structure, and thus only one regularization function in the first quadrant suffices. This motivates the modifications of both ALGENCAN and the two regularization methods described above, which should theoretically not have any effect on the performance of the solution algorithms.
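To make the role of the auxiliary variable concrete, the following sketch checks feasibility of a pair \((x, y)\) for the orthogonality-type constraints of the reformulation from [17], namely \(e^T y \ge n - s\), \(x \circ y = 0\) and \(y \le e\); the optional lower bound \(y \ge 0\) corresponds to the modified variants discussed above:

```python
import numpy as np

def reformulation_feasible(x, y, s, tol=1e-6, lower_bound_y=False):
    """Feasibility check (sketch) for the orthogonality-type constraints
    of the continuous reformulation of the cardinality constraint
    ||x||_0 <= s:  e^T y >= n - s,  x ∘ y = 0,  y <= e,
    and optionally y >= 0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    ok = (n - y.sum() - s <= tol               # cardinality: e^T y >= n - s
          and np.max(np.abs(x * y)) <= tol     # orthogonality: x ∘ y = 0
          and np.all(y <= 1.0 + tol))          # upper bound: y <= e
    if lower_bound_y:
        ok = ok and bool(np.all(y >= -tol))    # optional lower bound y >= 0
    return bool(ok)
```

Feasible y then have \(y_i = 0\) on at least \(n - s\) of the components where \(x_i\) may be nonzero, which is exactly how the reformulation encodes the cardinality bound.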
For each test problem, we used the initial values \(x^0 = 0\) and \(y^0 = e\). As a performance measure for the considered methods we compared the attained objective function values and generated a performance profile as suggested in [21], where we set the objective function value of a method for a problem to be \(\infty \), if the method failed to find a feasible point of the problem within a tolerance of \(10^{-6}\).
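A performance profile in the sense of [21] can be computed as in the following sketch, written for a nonnegative "smaller is better" measure with \(\infty\) assigned to failed (infeasible) runs as described above; note that raw objective values may be negative and would then need a shift before forming ratios:

```python
import numpy as np

def performance_profile(values, taus):
    """Dolan-Moré performance profile (sketch).

    values: (n_problems, n_solvers) array of a nonnegative
            'smaller is better' measure; np.inf marks a failed run.
    taus:   thresholds at which to evaluate the profile.

    Returns rho with rho[t, s] = fraction of problems on which
    solver s is within a factor taus[t] of the best solver.
    """
    values = np.asarray(values, dtype=float)
    best = values.min(axis=1, keepdims=True)        # best value per problem
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = values / best                      # performance ratios
    ratios[np.isnan(ratios)] = np.inf               # all solvers failed
    return np.array([np.mean(ratios <= tau, axis=0) for tau in taus])
```

Plotting each column of the returned array against \(\tau\) gives one curve per method, as in Fig. 1.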
As can be seen from Fig. 1, ALGENCAN worked very reliably with regard to feasibility of the solutions. It often outperformed the regularization methods in terms of the objective function value of the solution, especially for larger values of s. Although introducing the lower bound \(y \ge 0\) has no theoretical effect on ALGENCAN, the numerical results suggest that it can bring slight improvements to ALGENCAN’s performance.
6 Final Remarks
This paper shows that the safeguarded augmented Lagrangian method applied directly and without problem-specific modifications to the continuous reformulation of cardinality-constrained problems converges to suitable (M-, essentially even S-) stationary points under a weak problem-tailored CQ called CCOP-quasinormality. On the other hand, it is known that this safeguarded ALM generates so-called AKKT sequences (AKKT = approximate KKT) which, under suitable constraint qualifications, lead to KKT points and, hence, to S-stationary points. In the context of cardinality constraints, however, the AKKT concept is useless as an optimality criterion since any feasible point is known to be an AKKT point, cf. [32].
On the other hand, there are some recent works that define problem-tailored AKKT-type conditions for cardinality-constrained problems, see [32, 33] (the latter in a more general context). Algorithmic applications of these AKKT-type conditions are not discussed in these papers; we therefore plan to investigate this topic in our future research. Note that a corresponding convergence theory based on AKKT-type conditions for cardinality-constrained problems will differ from our current theory based on CCOP-quasinormality, since it is already known for standard NLPs that quasinormality and AKKT regularity conditions are two independent concepts, cf. [4].
References
Andreani, R., Martínez, J.M., Schuverdt, M.L.: On the relation between constant positive linear dependence condition and quasinormality constraint qualification. J. Optim. Theory Appl. 125(2), 473–485 (2005)
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2007)
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: Augmented Lagrangian methods under the constant positive linear dependence constraint qualification. Math. Program. 111(1–2, Ser. B), 5–32 (2008)
Andreani, R., Martínez, J.M., Ramos, A., Silva, P.J.S.: A cone-continuity constraint qualification and algorithmic consequences. SIAM J. Optim. 26(1), 96–110 (2016)
Andreani, R., Secchin, L.D., Silva, P.J.S.: Convergence properties of a second order augmented Lagrangian method for mathematical programs with complementarity constraints. SIAM J. Optim. 28(3), 2574–2600 (2018)
Andreani, R., Fazzio, N.S., Schuverdt, M.L., Secchin, L.D.: A sequential optimality condition related to the quasi-normality constraint qualification and its algorithmic consequences. SIAM J. Optim. 29(1), 743–766 (2019a)
Andreani, R., Haeser, G., Secchin, L.D., Silva, P.J.S.: New sequential optimality conditions for mathematical programs with complementarity constraints and algorithmic consequences. SIAM J. Optim. 29(4), 3201–3230 (2019b)
Andreani, R., Haeser, G., Viana, D.S.: Optimality conditions and global convergence for nonlinear semidefinite programming. Math. Program. 180(1–2, Ser. A), 203–235 (2020)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press, New York-London (1982)
Bertsekas, D.P., Ozdaglar, A.E.: Pseudonormality and a Lagrange multiplier theory for constrained optimization. J. Optim. Theory Appl. 114(2), 287–343 (2002)
Bertsimas, D., Shioda, R.: Algorithm for cardinality-constrained quadratic optimization. Comput. Optim. Appl. 43(1), 1–22 (2009)
Bienstock, D.: Computational study of a family of mixed-integer quadratic programming problems. Math. Program. 74(2, Ser. A), 121–140 (1996)
Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, volume 10 of Fundamentals of Algorithms. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2014)
Birgin, E.G., Gómez, W., Haeser, G., Mito, L.M., Santos, D.O.: An augmented Lagrangian algorithm for nonlinear semidefinite programming applied to the covering problem. Comput. Appl. Math. 39(1): Paper No. 10, 21 (2020)
Branda, M., Bucher, M., Červinka, M., Schwartz, A.: Convergence of a Scholtes-type regularization method for cardinality-constrained optimization problems with an application in sparse robust portfolio optimization. Comput. Optim. Appl. 70(2), 503–530 (2018)
Bueno, L.F., Haeser, G., Rojas, F.N.: Optimality conditions and constraint qualifications for generalized Nash equilibrium problems and their practical implications. SIAM J. Optim. 29(1), 31–54 (2019)
Burdakov, O.P., Kanzow, C., Schwartz, A.: Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM J. Optim. 26(1), 397–425 (2016)
Büskens, C., Wassel, D.: The ESA NLP solver WORHP. In: Modeling and Optimization in Space Engineering, volume 73 of Springer Optimization and Applications, pp. 85–110. Springer, New York (2013)
Červinka, M., Kanzow, C., Schwartz, A.: Constraint qualifications and optimality conditions of cardinality-constrained optimization problems. Math. Program. 160(1), 353–377 (2016)
Di Lorenzo, D., Liuzzi, G., Rinaldi, F., Schoen, F., Sciandrone, M.: A concave optimization-based approach for sparse portfolio selection. Optim. Methods Softw. 27(6), 983–1000 (2012)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2, Ser. A), 201–213 (2002)
Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176(1–2, Ser. B), 95–135 (2019)
Feng, M., Mitchell, J.E., Pang, J.-S., Shen, X., Wächter, A.: Complementarity formulations of \(\ell _0\)-norm optimization problems. Pac. J. Optim. 14(2), 273–305 (2018)
Frangioni, A., Gentile, C.: SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Oper. Res. Lett. 35(2), 181–185 (2007)
HSL: a collection of Fortran codes for large-scale scientific computation. http://www.hsl.rl.ac.uk/
Izmailov, A.F., Solodov, M.V., Uskov, E.I.: Global convergence of augmented Lagrangian methods applied to optimization problems with degenerate constraints, including problems with complementarity constraints. SIAM J. Optim. 22(4), 1579–1606 (2012)
Kanzow, C.: On the multiplier-penalty-approach for quasi-variational inequalities. Math. Program. 160(1–2, Ser. A), 33–63 (2016)
Kanzow, C., Schwartz, A.: A new regularization method for mathematical programs with complementarity constraints with strong convergence properties. SIAM J. Optim. 23(2), 770–798 (2013)
Kanzow, C., Steck, D.: Augmented Lagrangian methods for the solution of generalized Nash equilibrium problems. SIAM J. Optim. 26(4), 2034–2058 (2016)
Kanzow, C., Steck, D.: An example comparing the standard and safeguarded augmented Lagrangian methods. Oper. Res. Lett. 45(6), 598–603 (2017)
Kanzow, C., Steck, D., Wachsmuth, D.: An augmented Lagrangian method for optimization problems in Banach spaces. SIAM J. Control Optim. 56(1), 272–291 (2018)
Krulikovski, E.H.M., Ribeiro, A.A., Sachine, M.: A sequential optimality condition for mathematical programs with cardinality constraints. ArXiv e-prints (2020). arXiv:2008.03158
Mehlitz, P.: Asymptotic stationarity and regularity for nonsmooth optimization problems. J. Nonsmooth Anal. Optim. 4, 5 (2020). https://doi.org/10.46298/jnsao-2020-6575
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
Scholtes, S.: Convergence properties of a regularization scheme for mathematical programs with complementarity constraints. SIAM J. Optim. 11(4), 918–936 (2001)
Zhao, C., Xiu, N., Qi, H.-D., Luo, Z.: A Lagrange–Newton algorithm for sparse nonlinear programming. ArXiv e-prints, (2020). arXiv:2004.13257
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Communicated by Ebrahim Sarabi.
Kanzow, C., Raharja, A.B. & Schwartz, A. An Augmented Lagrangian Method for Cardinality-Constrained Optimization Problems. J Optim Theory Appl 189, 793–813 (2021). https://doi.org/10.1007/s10957-021-01854-7