On nondegenerate M-stationary points for sparsity constrained nonlinear optimization

We study sparsity constrained nonlinear optimization (SCNO) from a topological point of view. Special focus will be on M-stationary points from Burdakov et al. (SIAM J Optim 26:397–425, 2016), also introduced as $N^C$-stationary points in Pan et al. (J Oper Res Soc China 3:421–439, 2015). We introduce nondegenerate M-stationary points and define their M-index. We show that all M-stationary points are generically nondegenerate. In particular, the sparsity constraint is active at all local minimizers of a generic SCNO. Some relations to other stationarity concepts, such as S-stationarity, basic feasibility, and CW-minimality, are discussed in detail. By doing so, the issues of instability and degeneracy of points due to different stationarity concepts are highlighted. The concept of M-stationarity allows us to adequately describe the global structure of SCNO along the lines of Morse theory. For that, we study topological changes of lower level sets while passing an M-stationary point. As a novelty for SCNO, multiple cells of dimension equal to the M-index need to be attached. This intriguing fact is in strong contrast with other optimization problems considered before, where just one cell suffices. As a consequence, we derive a Morse relation for SCNO, which relates the numbers of local minimizers and M-stationary points of M-index equal to one. The appearance of such saddle points thus cannot be neglected from the perspective of global optimization. Due to the multiplicity phenomenon in cell-attachment, a saddle point may lead to more than two different local minimizers. We conclude that the relatively involved structure of saddle points is the source of the well-known difficulty in solving SCNO to global optimality.


Introduction
We consider the sparsity constrained nonlinear optimization problem
$$\text{SCNO}: \quad \min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad \|x\|_0 \le s,$$
where the so-called $\ell_0$ "norm" counts the non-zero entries of $x$:
$$\|x\|_0 = \left| \left\{ i \in \{1, \ldots, n\} \mid x_i \neq 0 \right\} \right|.$$
The objective function $f \in C^2(\mathbb{R}^n, \mathbb{R})$ is twice continuously differentiable, and $s \in \{0, 1, \ldots, n-1\}$ is an integer. The difficulty of solving SCNO comes from the combinatorial nature of the sparsity constraint $\|x\|_0 \le s$. The requirement of sparsity is, however, motivated by various applications, such as compressed sensing, model selection, and image processing.
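The combinatorial nature of the constraint can be made concrete with a few lines of code. The following Python sketch is an illustration only, not taken from the paper: it evaluates the $\ell_0$ "norm", tests feasibility $\|x\|_0 \le s$, and lists the supports of size exactly $s$, whose number $\binom{n}{s}$ a naive enumeration would have to visit. The helper names and the zero tolerance `tol` are hypothetical choices (in exact arithmetic `tol` is 0), and indices are 0-based.

```python
from itertools import combinations
from math import comb


def l0_norm(x, tol=0.0):
    """Number of entries of x with absolute value larger than tol."""
    return sum(1 for xi in x if abs(xi) > tol)


def is_scno_feasible(x, s, tol=0.0):
    """Feasibility for SCNO: at most s non-zero entries."""
    return l0_norm(x, tol) <= s


def supports_of_size(n, s):
    """All index sets J of {0, ..., n-1} with |J| = s."""
    return list(combinations(range(n), s))


x = [0.0, 1.5, 0.0, -2.0]
print(l0_norm(x), is_scno_feasible(x, 1), is_scno_feasible(x, 2))  # 2 False True
print(len(supports_of_size(4, 2)), comb(4, 2), comb(50, 5))  # 6 6 2118760
```

Already for $n = 50$ and $s = 5$ there are more than two million candidate supports, which is why the paper studies the problem through stationary points rather than enumeration.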
We refer e. g. to [7,25], and [22] for further details on the relevant applications.
In the seminal paper [1], necessary optimality conditions for SCNO have been stated. Namely, the notions of basic feasibility (BF-vector), L-stationarity, and CW-minimality have been introduced and studied there. Note that the formulation of L-stationarity mimics techniques from convex optimization by using the orthogonal projection onto the SCNO feasible set. The notion of CW-minimum incorporates coordinate-wise optimality along the axes. Based on both stationarity concepts, algorithms that find points satisfying these conditions have been developed: the iterative hard thresholding method, as well as the greedy and partial sparse-simplex methods. A series of subsequent papers [2,3] elaborated the algorithmic approach for SCNO based on L-stationarity and CW-minimality.
Another line of research started with [5], where additionally smooth equality and inequality constraints have been incorporated into SCNO. For that, the authors coin the new term of mathematical programs with cardinality constraints (MPCC). The key idea in [5] is to provide a mixed-integer formulation whose standard relaxation still has the same solutions as MPCC. For the relaxation the notion of S-stationary points is proposed. S-stationarity corresponds to the standard Karush-Kuhn-Tucker condition for the relaxed program. The techniques applied follow mainly those for mathematical programs with complementarity constraints. In particular, an appropriate regularization method for solving MPCC is suggested. The latter is proved to converge towards so-called M-stationary points. M-stationarity corresponds to the standard Karush-Kuhn-Tucker condition of the tightened program, where zero entries of an MPCC feasible point remain locally vanishing. Further research in this direction is presented in a series of subsequent papers [4,6].
Finally, we would like to mention stationarity concepts for SCNO based on normal cones of the sparsity constrained feasible set. In [20], the Bouligand and Clarke normal cones of the SCNO feasible set are used to derive $N^B$- and $N^C$-stationarity, respectively. Corresponding second-order necessary and sufficient optimality conditions are stated there. These findings were generalized for MPCC in [19]. In [16], the Fréchet and Mordukhovich normal cones of the SCNO feasible set are used to derive $\hat N$- and $N$-stationarity, respectively. These notions were generalized for the intersection of the sparsity constrained feasible set with a polyhedral set. In [17], a penalty decomposition method essentially based on the notion of $N$-stationarity is proposed for solving MPCC under Robinson's constraint qualification.
The goal of this paper is the study of SCNO from a topological point of view. The topological approach to optimization was pioneered by [12,13] for nonlinear programming problems, and has been successfully developed for mathematical programs with complementarity constraints, mathematical programs with vanishing constraints, general semi-infinite programming, bilevel optimization, semi-definite programming, disjunctive programming, etc., see e. g. [23] and references therein. The main idea of the topological approach is to identify stationary points which, roughly speaking, induce the global structure of the underlying optimization problem. The stationary points include minimizers, but also all kinds of saddle points, just in analogy to the unconstrained case. It turns out that for SCNO the concept of M-stationarity from [5], coinciding with $N^C$-stationarity from [20], is the adequate stationarity concept, at least from the topological perspective. We outline our main findings and results:
1. We introduce nondegenerate M-stationary points along with their associated M-indices. As usual, the M-index subsumes the quadratic part, i.e. the number of negative eigenvalues of the objective's Hessian restricted to the non-vanishing variables. As a novelty, the sparsity constraint contributes an additional term to the M-index, namely, the difference between the bound $s$ and the current number of non-zero variables at a nondegenerate M-stationary point. We prove that all M-stationary points are generically nondegenerate. In particular, it follows that all local minimizers of SCNO are generically nondegenerate with vanishing M-index; hence, the sparsity constraint is active at them. Note that M-stationary points with non-vanishing M-index correspond to saddle points. The local structure of SCNO around a nondegenerate M-stationary point is fully described by its M-index alone, at least up to a differentiable change of coordinates.
2. We thoroughly discuss the relation of M-stationarity to S-stationarity, basic feasibility, and CW-minimality for SCNO. It turns out that nondegenerate M-stationary points may cause degeneracies of S-stationary points viewed as Karush-Kuhn-Tucker points for the relaxed problem. Moreover, even if the cardinality constrained second-order sufficient optimality condition from [4] holds at an S-stationary point, the corresponding M-stationary point need not be a nondegenerate local minimizer for SCNO. As for CW-minima, we show that they are not stable with respect to data perturbations in SCNO: after an arbitrarily small $C^2$-perturbation of $f$, a locally unique CW-minimum may bifurcate into multiple CW-minima. More importantly, this bifurcation unavoidably causes the emergence of M-stationary points which are different from the CW-minima. Despite this instability phenomenon, if a BF-vector, in particular a CW-minimum, happens to be nondegenerate as an M-stationary point, then the sparsity constraint is necessarily active.
3. We use the concept of M-stationarity in order to describe the global structure of SCNO. To this aim, we study topological properties of its lower level sets. As in standard Morse theory, see e. g. [10,18], we focus on the topological changes of the lower level sets as their levels vary. Appropriate versions of the deformation and cell-attachment theorems are shown to hold for SCNO. Whereas the deformation is standard, the cell-attachment reveals an essentially new phenomenon not observed in nonsmooth optimization before: in SCNO, multiple cells of the same dimension need to be attached, see Theorem 5. Determining the number of these attached cells turns out to be a challenging combinatorial problem from algebraic topology, see Lemma 1.
4. As a consequence of the proposed Morse theory, we derive a Morse relation for SCNO, which relates the numbers of local minimizers and M-stationary points of M-index equal to one. The appearance of such saddle points thus cannot be neglected from the perspective of global optimization. As a novelty for SCNO, a saddle point may lead to more than two different local minimizers. This is in strong contrast with other nonsmooth optimization problems studied before, see e. g. [23], where a saddle point leads to at most two of them. We conclude that the relatively involved structure of saddle points is the source of the well-known difficulty in solving SCNO to global optimality.
The paper is organized as follows. In Sect. 2 we discuss the notion of M-stationarity for SCNO. Section 3 is devoted to the relation of M-stationarity to other stationarity concepts from the literature. In Sect. 4 the global structure of SCNO is described within the scope of Morse theory.
Our notation is standard. The cardinality of a finite set $S$ is denoted by $|S|$. The $n$-dimensional Euclidean space is denoted by $\mathbb{R}^n$ with the coordinate vectors $e_i$, $i = 1, \ldots, n$. For $J \subset \{1, \ldots, n\}$ we denote by $\operatorname{conv}\{e_j, j \in J\}$ and $\operatorname{span}\{e_j, j \in J\}$ the convex and linear hull of the coordinate vectors $e_j$, $j \in J$, respectively. Given a twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, $\nabla f$ denotes its gradient, and $D^2 f$ stands for its Hessian.

M-stationarity
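For $0 \le k \le n$ we use the notation
$$\mathbb{R}^{n,k} = \left\{ x \in \mathbb{R}^n \mid \|x\|_0 \le k \right\}.$$
Using the latter, the feasible set of SCNO can be written as $\mathbb{R}^{n,s}$. For a feasible point $x \in \mathbb{R}^{n,s}$ we define the following complementary index sets:
$$I_0(x) = \left\{ i \in \{1, \ldots, n\} \mid x_i = 0 \right\}, \qquad I_1(x) = \left\{ i \in \{1, \ldots, n\} \mid x_i \neq 0 \right\}.$$
Without loss of generality, we assume throughout the whole paper that at the particular point of interest $\bar x \in \mathbb{R}^{n,s}$ with $\|\bar x\|_0 = k$ it holds:
$$I_0(\bar x) = \{1, \ldots, n-k\}, \qquad I_1(\bar x) = \{n-k+1, \ldots, n\}.$$
Using this convention, the following local description of the SCNO feasible set can be deduced. Let $\bar x \in \mathbb{R}^{n,s}$ be a feasible point for SCNO with $\|\bar x\|_0 = k$. Then, there exist neighborhoods $U_{\bar x}$ and $V_0$ of $\bar x$ and $0$, respectively, such that under the linear coordinate transformation $\Phi(x) = x - \bar x$ we have locally:
$$\Phi\left( \mathbb{R}^{n,s} \cap U_{\bar x} \right) = \left( \mathbb{R}^{n-k,s-k} \times \mathbb{R}^k \right) \cap V_0. \qquad (1)$$
Indeed, the non-vanishing entries of $\bar x$ stay non-zero near $\bar x$, so that at most $s - k$ of the remaining $n - k$ entries may become non-zero.

Definition 1 (M-stationarity) A feasible point $\bar x \in \mathbb{R}^{n,s}$ is called M-stationary for SCNO if
$$\frac{\partial f}{\partial x_i}(\bar x) = 0 \quad \text{for all } i \in I_1(\bar x).$$

Obviously, a local minimizer of SCNO is an M-stationary point.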
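As an illustration (not part of the original analysis), the index sets $I_0, I_1$ and the M-stationarity test of Definition 1, i.e. vanishing partial derivatives on $I_1(\bar x)$, can be sketched in Python as follows. The gradient is assumed to be supplied by the caller, indices are 0-based, the tolerance `tol` is an assumption, and the objective $f(x) = (x_1 - 1)^2 + (x_2 - 2)^2$ is purely hypothetical.

```python
def index_sets(x):
    """Return (I0, I1): 0-based indices of vanishing and non-vanishing entries of x."""
    i0 = [i for i, xi in enumerate(x) if xi == 0]
    i1 = [i for i, xi in enumerate(x) if xi != 0]
    return i0, i1


def is_m_stationary(x, grad, s, tol=1e-10):
    """x is feasible (||x||_0 <= s) and df/dx_i(x) = 0 for every i in I1(x).

    `grad` is the gradient of f at x; `tol` decides when a derivative counts as zero.
    """
    _, i1 = index_sets(x)
    if len(i1) > s:
        return False  # infeasible for SCNO
    return all(abs(grad[i]) <= tol for i in i1)


def grad_f(x):
    # hypothetical objective f(x) = (x1 - 1)^2 + (x2 - 2)^2
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)]


s = 1
for x in ([1.0, 0.0], [0.0, 0.0], [2.0, 0.0]):
    print(x, index_sets(x), is_m_stationary(x, grad_f(x), s))
# M-stationary: True, True, False
```

Note that the origin is M-stationary for every objective, since $I_1(0)$ is empty; this already hints that M-stationarity is a weak condition, which is why nondegeneracy and the M-index are needed to classify such points.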

Definition 2 (Nondegenerate M-stationarity)
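An M-stationary point $\bar x \in \mathbb{R}^{n,s}$ with $\|\bar x\|_0 = k$ is called nondegenerate if the following conditions hold:
ND1: if $k < s$, then $\frac{\partial f}{\partial x_i}(\bar x) \neq 0$ for all $i \in I_0(\bar x)$,
ND2: the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar x) \right)_{i,j \in I_1(\bar x)}$ is nonsingular.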

Definition 3 (M-Index)
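Let $\bar x \in \mathbb{R}^{n,s}$ be a nondegenerate M-stationary point with $\|\bar x\|_0 = k$. The number of negative eigenvalues of the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar x) \right)_{i,j \in I_1(\bar x)}$ is called its quadratic index ($QI$). The number $s - k + QI$ is called the M-index of $\bar x$.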
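The conditions ND1-ND2 and the M-index only involve the gradient and the Hessian block indexed by $I_1(\bar x)$, so they can be evaluated numerically. The following Python sketch is a hedged illustration under stated assumptions: ND1 is checked as non-vanishing partial derivatives on $I_0(\bar x)$ whenever $k < s$, eigenvalues are computed by the classical cyclic Jacobi method (a swapped-in standard technique, adequate for the small dense blocks considered here), a fixed tolerance decides whether a number is zero, and the test function $f(x) = (x_1 - 1)^2 - (x_2 - 1)^2 + x_3$ with $n = 3$, $s = 2$ is hypothetical.

```python
import math


def symmetric_eigenvalues(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [[float(v) for v in row] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * c
                for k in range(n):  # A <- A R, acting on columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - sn * akq, sn * akp + c * akq
                for k in range(n):  # A <- R^T A, acting on rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - sn * aqk, sn * apk + c * aqk
    return [a[i][i] for i in range(n)]


def m_index(x, grad, hess, s, tol=1e-10):
    """M-index of x, or None if x is not a nondegenerate M-stationary point.

    grad and hess are the gradient and Hessian of f evaluated at x.
    """
    i0 = [i for i, xi in enumerate(x) if xi == 0]
    i1 = [i for i, xi in enumerate(x) if xi != 0]
    k = len(i1)
    if k > s or any(abs(grad[i]) > tol for i in i1):
        return None  # infeasible or not M-stationary
    if k < s and any(abs(grad[i]) <= tol for i in i0):
        return None  # ND1 violated
    block = [[hess[i][j] for j in i1] for i in i1]
    eigs = symmetric_eigenvalues(block)
    if any(abs(lam) <= tol for lam in eigs):
        return None  # ND2 violated: restricted Hessian is singular
    quadratic_index = sum(1 for lam in eigs if lam < 0)
    return s - k + quadratic_index


# hypothetical test function f(x) = (x1 - 1)^2 - (x2 - 1)^2 + x3
def grad_f(x):
    return [2.0 * (x[0] - 1.0), -2.0 * (x[1] - 1.0), 1.0]


HESS_F = [[2.0, 0.0, 0.0], [0.0, -2.0, 0.0], [0.0, 0.0, 0.0]]

for point in ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]):
    print(point, m_index(point, grad_f(point), HESS_F, s=2))
# M-indices 1, 1 and 2, respectively
```

The three points illustrate the two sources of the M-index: at $(1, 1, 0)$ the sparsity constraint is active and the index comes from the negative square, while at $(1, 0, 0)$ and $(0, 0, 0)$ the slack $s - k$ contributes.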
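Theorem 1 (Morse lemma for SCNO) Suppose that $\bar x$ is a nondegenerate M-stationary point for SCNO with $\|\bar x\|_0 = k$ and quadratic index $QI$. Then, there exist neighborhoods $U_{\bar x}$ and $V_0$ of $\bar x$ and $0$, respectively, and a local $C^1$-coordinate system $\Phi: U_{\bar x} \to V_0$ of $\mathbb{R}^n$ around $\bar x$ such that:
$$f \circ \Phi^{-1}(y) = f(\bar x) + \sum_{i=1}^{n-k} y_i + \sum_{j=n-k+1}^{n} \pm y_j^2, \qquad (2)$$
where $y \in \mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$. Moreover, there are exactly $QI$ negative squares in (2).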
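Proof Without loss of generality, we may assume $f(\bar x) = 0$. By using $\Phi$ from (1), we put $\tilde f = f \circ \Phi^{-1}$. At the origin we have:
(i) if $k < s$, then $\frac{\partial \tilde f}{\partial y_i}(0) \neq 0$ for $i = 1, \ldots, n-k$,
(ii) $\frac{\partial \tilde f}{\partial y_j}(0) = 0$ for $j = n-k+1, \ldots, n$,
(iii) the matrix $\left( \frac{\partial^2 \tilde f}{\partial y_i \partial y_j}(0) \right)_{i,j=n-k+1,\ldots,n}$ is nonsingular.
We denote $\tilde f$ by $f$ again. Under the following coordinate transformations, the set $\mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$ is transformed into itself. We put $y = (Y_{n-k}, Y_k)$, where $Y_{n-k} = (y_1, \ldots, y_{n-k})$ and $Y_k = (y_{n-k+1}, \ldots, y_n)$. It holds:
$$f(y) = f(0, Y_k) + \sum_{i=1}^{n-k} y_i \, d_i(y), \qquad \text{where} \quad d_i(y) = \int_0^1 \frac{\partial f}{\partial y_i}(t Y_{n-k}, Y_k) \, dt.$$
Note that $d_i \in C^1$, $i = 1, \ldots, n-k$. Due to (ii)-(iii), we may apply the standard Morse lemma to the $C^2$-function $f(0, Y_k)$ without affecting the coordinates $Y_{n-k}$, see e. g. [13]. The corresponding coordinate transformation is of class $C^1$. Denoting the transformed functions again by $f$ and $d_i$, we obtain
$$f(y) = \sum_{j=n-k+1}^{n} \pm y_j^2 + \sum_{i=1}^{n-k} y_i \, d_i(y).$$
In case $k = s$, we need to consider $f$ locally around the origin on the set $\{0\}^{n-s} \times \mathbb{R}^s$. Hence, $y_i = 0$ for $i = 1, \ldots, n-k$, and we immediately obtain the representation (2).
In case $k < s$, we have $d_i(0) = \frac{\partial f}{\partial y_i}(0) \neq 0$, $i = 1, \ldots, n-k$, due to (i). Hence, we may take $y_i d_i(y)$, $i = 1, \ldots, n-k$, and $y_j$, $j = n-k+1, \ldots, n$, as new local $C^1$-coordinates by a straightforward application of the inverse function theorem. Since $d_i$ does not vanish near the origin, $y_i d_i(y) = 0$ holds if and only if $y_i = 0$, so that the set $\mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$ is preserved. Denoting the transformed function again by $f$, we obtain (2). Here, the coordinate transformation is understood as the composition of all previous ones.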

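Proposition 1 (Nondegenerate minimizers) Let $\bar x$ be a nondegenerate M-stationary point for SCNO. Then, $\bar x$ is a local minimizer for SCNO if and only if its M-index vanishes.
Proof Let $\bar x$ be a nondegenerate M-stationary point for SCNO with $\|\bar x\|_0 = k$. The Morse lemma from Theorem 1 provides neighborhoods $U_{\bar x}$ and $V_0$ of $\bar x$ and $0$, respectively, and a local $C^1$-coordinate system $\Phi: U_{\bar x} \to V_0$ of $\mathbb{R}^n$ around $\bar x$ such that:
$$f \circ \Phi^{-1}(y) = f(\bar x) + \sum_{i=1}^{n-k} y_i + \sum_{j=n-k+1}^{n} \pm y_j^2, \qquad (3)$$
where $y \in \mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$. Therefore, $\bar x$ is a local minimizer for SCNO if and only if $0$ is a local minimizer of $f \circ \Phi^{-1}$ on the set $\left( \mathbb{R}^{n-k,s-k} \times \mathbb{R}^k \right) \cap V_0$. If the M-index of $\bar x$ vanishes, we have $k = s$ and $QI = 0$, and (3) reads as
$$f \circ \Phi^{-1}(y) = f(\bar x) + \sum_{j=n-s+1}^{n} y_j^2, \qquad (4)$$
where $y \in \{0\}^{n-s} \times \mathbb{R}^s$. Thus, $0$ is a local minimizer for (4). Vice versa, if $0$ is a local minimizer for (3), then $k = s$ and $QI = 0$, hence, the M-index of $\bar x$ vanishes. Indeed, if $k < s$, then the feasible points $-t e_1$ with small $t > 0$ yield the values $f(\bar x) - t$, and if $QI > 0$, then moving along a coordinate with a negative square decreases $f \circ \Phi^{-1}$ as well.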
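The dichotomy in the proof can be checked mechanically on the normal form (3). The Python sketch below is a hypothetical illustration, not taken from the paper: given $n$, $k$, $s$ and the signs of the $k$ squares, it returns a feasible direction along which the normal form (with the constant $f(\bar x)$ dropped) decreases, or `None` exactly when the M-index $s - k + QI$ vanishes. A direction counts as feasible for $\mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$ if at most $s - k$ of its first $n - k$ entries are non-zero; indices are 0-based.

```python
def normal_form(y, n, k, signs):
    """Normal form (3) without the constant f(x_bar).

    g(y) = y_1 + ... + y_{n-k} + sum_j signs[j] * y_{n-k+j}^2, with len(signs) == k.
    """
    linear = sum(y[: n - k])
    quadratic = sum(sg * yj ** 2 for sg, yj in zip(signs, y[n - k:]))
    return linear + quadratic


def m_index(k, s, signs):
    """s - k + QI, where QI counts the negative squares."""
    return s - k + sum(1 for sg in signs if sg < 0)


def descent_direction(n, k, s, signs):
    """Feasible d with g(t d) < g(0) for small t > 0, or None when the M-index vanishes.

    Feasible means: at most s - k non-zero entries among the first n - k coordinates.
    """
    if k < s:
        d = [0.0] * n
        d[0] = -1.0  # k < s <= n - 1 leaves room for one more non-zero entry
        return d
    for j, sg in enumerate(signs):
        if sg < 0:
            d = [0.0] * n
            d[n - k + j] = 1.0  # a negative square decreases g along this axis
            return d
    return None


cases = [(3, 1, 1, [1]), (3, 1, 2, [1]), (3, 2, 2, [1, -1]), (3, 2, 2, [1, 1]), (2, 0, 1, [])]
for n, k, s, signs in cases:
    d = descent_direction(n, k, s, signs)
    print((n, k, s, signs), m_index(k, s, signs), d)
```

The sketch only covers the "descent exists" half of Proposition 1; the other half is immediate from (4), where the normal form is a sum of squares on $\{0\}^{n-s} \times \mathbb{R}^s$.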
Let $C^2(\mathbb{R}^n, \mathbb{R})$ be endowed with the strong (or Whitney) $C^2$-topology, denoted by $C^2_s$ (see e. g. [11]). The $C^2_s$-topology is generated by allowing perturbations of the functions, their gradients, and Hessians, which are controlled by means of continuous positive functions. We say that a set is $C^2_s$-generic if it contains a countable intersection of $C^2_s$-open and -dense subsets. Since $C^2(\mathbb{R}^n, \mathbb{R})$ endowed with the $C^2_s$-topology is a Baire space, generic sets are in particular dense.
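Theorem 2 (Genericity and stability) Let $\mathcal{F}$ denote the set of objective functions $f \in C^2(\mathbb{R}^n, \mathbb{R})$ for which all M-stationary points of SCNO are nondegenerate. Then, $\mathcal{F}$ is $C^2_s$-open and -dense.
Proof For $k \in \{0, \ldots, s\}$, index sets $D \subset \{1, \ldots, n\}$ with $|D| = k$ and $E \subset \{1, \ldots, n\} \setminus D$, and $r \in \{0, \ldots, k\}$, we consider the set $\Sigma_{k,D,E,r}$ of points $x \in \mathbb{R}^n$ fulfilling:
(m1) $x_i = 0$ for $i \notin D$, and $x_i \neq 0$ for $i \in D$,
(m2) $\frac{\partial f}{\partial x_i}(x) = 0$ for $i \in D$,
(m3) $\frac{\partial f}{\partial x_i}(x) = 0$ for $i \in E$, where $E = \emptyset$ if $k = s$,
(m4) the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right)_{i,j \in D}$ has rank $r$.
Note that (m1) refers to feasibility, (m2) to M-stationarity, and (m3)-(m4) describe possible violations of ND1-ND2, respectively. Now, for the density part it suffices to show that all $\Sigma_{k,D,E,r}$ are generically empty whenever $E$ is nonempty or the rank $r$ is less than $k$. By setting $I_1(x) = D$ and $I_0(x) = \{1, \ldots, n\} \setminus D$, this would mean, respectively, that at least one of the derivatives $\frac{\partial f}{\partial x_i}(x)$, $i \in I_0(x)$, vanishes in ND1, or that the matrix in ND2 is singular. In fact, the available degrees of freedom of the variables involved in each $\Sigma_{k,D,E,r}$ are $n$. The loss of freedom caused by (m1) is $n - k$, and the loss of freedom caused by (m2) is $k$. Hence, the total loss of freedom is $n$. We conclude that a further degeneracy would exceed the total available degrees of freedom $n$. By virtue of the jet transversality theorem from [13], generically the sets $\Sigma_{k,D,E,r}$ must be empty.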
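Written out for a fixed quadruple $(k, D, E, r)$, the dimension count behind the argument reads as follows. This is only a restatement of the counting above; the codimension $(k-r)(k-r+1)/2$ of the set of symmetric $k \times k$ matrices of rank $r$ is a standard fact that we add for concreteness.

```latex
\underbrace{n}_{\text{free variables}}
\;-\; \underbrace{(n-k)}_{\text{(m1)}}
\;-\; \underbrace{k}_{\text{(m2)}}
\;=\; 0,
\qquad
\text{(m3)}:\ |E| \ge 1 \ \text{further equations},
\qquad
\text{(m4)}:\ \frac{(k-r)(k-r+1)}{2} \ge 1 \ \text{further equations if } r < k.
```

Any violation of ND1 or ND2 therefore leaves a negative number of degrees of freedom, which transversality rules out for generic $f$.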
For the openness result, we argue in a standard way. Locally, M-stationarity can be written via stable equations. Then, the implicit function theorem for Banach spaces can be applied to follow M-stationary points with respect to (local) C 2 -perturbations of defining functions. Finally, a standard globalization procedure exploiting the specific properties of the strong C 2 s -topology can be used to construct a (global) C 2 s -neighborhood of problem data for which the nondegeneracy property is stable.

Theorem 3 (Genericity for minimizers) Generically, all local minimizers of SCNO are nondegenerate with vanishing M-index.
Proof Note that every local minimizer of SCNO has to be M-stationary. Nondegenerate M-stationary points are generic by Theorem 2. Hence, generically, local minimizers are nondegenerate. Due to Proposition 1, they have vanishing M-index.
By recalling Definition 3 of the M-index, we deduce the following important Corollary 1 on the structure of minimizers for SCNO.
Corollary 1 (Sparsity constraint at minimizers) At each local minimizer $\bar x \in \mathbb{R}^{n,s}$ of a generic SCNO the sparsity constraint is active, i. e. $\|\bar x\|_0 = s$.

Relation to other stationarity concepts
We relate M-stationarity to other well-known stationarity concepts for SCNO from the literature. First, we focus on S-stationarity introduced in [5]. Then, the notions of basic feasibility and CW-minimality from [1] will be discussed.

S-stationarity
In [5] the following observation has been made: $\bar x$ solves SCNO if and only if there exists $\bar y$ such that $(\bar x, \bar y)$ solves the following mixed-integer program: Using the standard relaxation of the binary constraints $y_i \in \{0, 1\}$, the authors arrive at the following continuous optimization problem: As pointed out in [5], SCNO and the optimization problem (6) are closely related: $\bar x$ solves SCNO if and only if there exists a vector $\bar y$ such that $(\bar x, \bar y)$ solves (6). Additionally, the concept of S-stationarity is proposed for (6). For its formulation the following index sets are needed: and, additionally, it holds:

Remark 1 (M-stationarity)
We point out that [5] initially defined the concept of M-stationarity for the relaxed optimization problem (6). Namely, a feasible point $(\bar x, \bar y)$ of (6) is called M-stationary if just (7) is valid. Due to the feasibility of $(\bar x, \bar y)$, we have $\bar y_i = 0$ if $\bar x_i \neq 0$ for all $i = 1, \ldots, n$. Hence, it holds: and M-stationarity is independent of the auxiliary variable $\bar y$. Thus, already in [4] it is sometimes said that a feasible point $\bar x$ of SCNO is M-stationary itself. We use M-stationarity exactly in this sense, cf. Definition 1.
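In order to relate M- and S-stationarity, we introduce the canonical choice of the auxiliary variables $\bar y$ for a feasible point $\bar x$ of SCNO:
$$\bar y_i = \begin{cases} 1, & \text{if } \bar x_i = 0, \\ 0, & \text{if } \bar x_i \neq 0, \end{cases} \qquad i = 1, \ldots, n. \qquad (8)$$
The auxiliary variables $\bar y$ can be seen as counters of the zero entries of $\bar x$. Note that $(\bar x, \bar y)$ becomes feasible for (6).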
The importance of S-stationary points is due to the following Proposition 3.

Proposition 3 (S-stationarity and KKT-points, [5]) A feasible point $(\bar x, \bar y)$ satisfies the Karush-Kuhn-Tucker condition if and only if it is S-stationary for (6).
Despite this appealing relation, nondegenerate M-stationary points of SCNO may cause degeneracies of the corresponding S-stationary points. This means that the latter become degenerate Karush-Kuhn-Tucker points for (6), i. e. the linear independence constraint qualification is not fulfilled, strict complementarity is violated, or the second derivative of the corresponding Lagrange function restricted to the tangential space becomes singular. The appearance of these degeneracies is mainly due to the fact that the objective function in (6) does not depend on the y-variables. We illustrate this phenomenon by means of the following Example 1.

Example 1 (S-stationarity and degeneracies)
We consider the following SCNO with n = 2 and s = 1: It is easy to see that the feasible point $\bar x = (0, 0)$ is M-stationary with $\|\bar x\|_0 = k = 0$. Moreover, it is nondegenerate with quadratic index $QI = 0$. For its M-index we have $s - k + QI = 1 - 0 + 0 = 1$, meaning that $\bar x$ is a saddle point which connects the two minimizers $(1, 0)$ and $(0, 1)$. Further, by the canonical choice (8) of auxiliary y-variables, we obtain the corresponding S-stationary point $(\bar x, \bar y) = (0, 0, 1, 1)$. Due to Proposition 3, $(\bar x, \bar y)$ is also a Karush-Kuhn-Tucker point for the optimization problem (6): The gradients of the active constraints at $(\bar x, \bar y)$ are linearly independent: Hence, the linear independence constraint qualification holds at $(\bar x, \bar y)$. Let us determine the unique Lagrange multipliers from the Karush-Kuhn-Tucker condition: We get $\mu_1 = \mu_2 = 0$ and $\lambda_1 = \lambda_2 = -2$. Hence, strict complementarity is violated at $(\bar x, \bar y)$. Finally, the tangential space to the feasible set vanishes at $(\bar x, \bar y)$. Hence, the second derivative of the corresponding Lagrange function restricted to the tangential space is trivially nonsingular. Overall, $(\bar x, \bar y)$ is a degenerate Karush-Kuhn-Tucker point for (6) due to the lack of strict complementarity. It remains to note that the degeneracy of S-stationary points $(\bar x, y)$ prevails if other choices of auxiliary y-variables are made.
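The objective of Example 1 is not reproduced in this text. Purely for illustration, the following Python sketch assumes the hypothetical objective $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$, which we chose only because it is consistent with the properties stated above: $\bar x = (0, 0)$ is M-stationary with $k = 0$, ND1 holds since the gradient $(-2, -2)$ has no zero entry, $QI = 0$ because $I_1(\bar x)$ is empty, and $(1, 0)$ and $(0, 1)$ minimize $f$ on the two coordinate axes, i.e. on the feasible set for $s = 1$.

```python
def grad_f(x):
    # hypothetical objective f(x) = (x1 - 1)^2 + (x2 - 1)^2 (not taken from Example 1)
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]


def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2


n, s = 2, 1
x_bar = [0.0, 0.0]
k = sum(1 for xi in x_bar if xi != 0)  # ||x_bar||_0 = 0
g = grad_f(x_bar)  # [-2.0, -2.0]
nd1 = k == s or all(g[i] != 0 for i in range(n) if x_bar[i] == 0)
quadratic_index = 0  # I1(x_bar) is empty, so the restricted Hessian is void
print(k, g, nd1, s - k + quadratic_index)  # 0 [-2.0, -2.0] True 1

# restricted to each coordinate axis, f(t e_i) = (t - 1)^2 + 1 is minimized at t = 1
for x in ([1.0, 0.0], [0.0, 1.0]):
    print(x, f(x))  # value 1.0 at both points
```

Under this assumption the M-index 1 of the origin matches the picture of a saddle point connecting two minimizers of equal value.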
An attempt to define a tailored notion of nondegeneracy for S-stationary points of (6) has recently been undertaken in [4]. Let us briefly recall their main idea. For that, the so-called CC-linearization cone $L_{CC}(\bar x, \bar y)$ at a feasible point $(\bar x, \bar y)$ of (6) is used, cf. [6]. Namely, a direction in $L_{CC}(\bar x, \bar y)$ satisfies by definition the following conditions: Here, the new index sets are
Definition 5 (CC-SOSC, [4]) Let $(\bar x, \bar y)$ be an S-stationary point for (6). If for all directions then the cardinality constrained second-order sufficient optimality condition (CC-SOSC) is said to hold at $(\bar x, \bar y)$.
The role of CC-SOSC can be seen from the following Proposition 4.
for all feasible points $(x, y)$ of (6) taken sufficiently close to $(\bar x, \bar y)$, and fulfilling $x \neq \bar x$.
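We relate the concepts of nondegeneracy for M-stationary points and of CC-SOSC for S-stationary points. The following Proposition 5 mainly follows from Corollary 3.2 a) in [4]. We prove it here for the sake of completeness.
Proposition 5 Let $\bar x$ be an M-stationary point for SCNO with $\|\bar x\|_0 = s$. Assume that CC-SOSC holds at the corresponding S-stationary point $(\bar x, \bar y)$ for (6) with the canonical choice (8) of auxiliary variables $\bar y$. Then, $\bar x$ is a nondegenerate local minimizer for SCNO.
Proof By Proposition 4, $(\bar x, \bar y)$ is a local minimizer of (6) with respect to $x$. For all $x \in \mathbb{R}^{n,s}$ sufficiently close to $\bar x$ we have, because of $\|\bar x\|_0 = s$, that $(x, \bar y)$ is feasible for (6). Thus, $\bar x$ is a local minimizer for SCNO. Due to the canonical choice (8) of auxiliary variables $\bar y$, the index sets from the definition of the CC-linearization cone $L_{CC}(\bar x, \bar y)$ are Due to $\|\bar x\|_0 = s$, we additionally have Hence, it holds: so that CC-SOSC says that the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar x) \right)_{i,j \in I_1(\bar x)}$ is positive definite. Hence, the minimizer $\bar x$ is nondegenerate.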
If the sparsity constraint is not active for an M-stationary point $\bar x$ of SCNO, i. e. $\|\bar x\|_0 < s$, the implication in Proposition 5 does not hold in general anymore. Namely, $\bar x$ does not need to be a local minimizer for SCNO, even if CC-SOSC holds at the corresponding S-stationary point $(\bar x, \bar y)$ with the canonical choice (8) of auxiliary variables $\bar y$. This is illustrated by means of the following Example 2.

Example 2 (Sparsity constraint and CC-SOSC)
We consider the following SCNO with n = 2 and s = 1: It is easy to see that the feasible point $\bar x = (0, 0)$ is M-stationary. Note that the sparsity constraint is not active at $\bar x$, since $k = \|\bar x\|_0 = 0 < 1 = s$. By the canonical choice (8) of auxiliary y-variables, we obtain the corresponding S-stationary point $(\bar x, \bar y) = (0, 0, 1, 1)$. Analogously to the proof of Proposition 5 and by recalling (9), Note that here $I_1(\bar x) = \emptyset$ and $I_0(\bar x) = \{1, 2\}$. Hence, the CC-linearization cone is Overall, CC-SOSC trivially holds at $(\bar x, \bar y)$, and, as follows from Proposition 4, $(\bar x, \bar y)$ is a strict local minimizer of (6) with respect to $x$. Nevertheless, $\bar x$ is not a local minimizer for SCNO. Actually, it is a nondegenerate M-stationary point with quadratic index $QI = 0$. For its M-index we have $s - k + QI = 1 - 0 + 0 = 1$. We conclude that $\bar x$ is rather a saddle point for SCNO.
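Why CC-SOSC does not detect the saddle can be seen numerically. The sketch below uses a hypothetical objective, $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$, which is not taken from Example 2 and only shares the stated properties at $\bar x = (0, 0)$ (M-stationary, $k = 0$, $QI = 0$, M-index 1). It assumes, as in [5], that (6) contains the constraints $x_i y_i = 0$. Along the feasible points $(t, 0)$ of SCNO the objective decreases, but every auxiliary vector $y$ compatible with $x_1 y_1 = 0$ has $y_1 = 0$, so the corresponding points of (6) stay at distance at least 1 from $(\bar x, \bar y) = (0, 0, 1, 1)$, i.e. outside the neighborhood addressed by Proposition 4.

```python
import math


def f(x):
    # hypothetical objective, chosen only to match the properties stated in Example 2
    return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2


x_bar, y_bar = (0.0, 0.0), (1.0, 1.0)
for t in (1e-1, 1e-2, 1e-3):
    x = (t, 0.0)  # feasible for SCNO: one non-zero entry and s = 1
    y = (0.0, 1.0)  # x_1 * y_1 = 0 with x_1 != 0 forces y_1 = 0
    decrease = f(x) - f(x_bar)  # equals t**2 - 2*t < 0
    dist = math.dist(x + y, x_bar + y_bar)  # equals sqrt(t**2 + 1) >= 1
    print(t, decrease, dist)
```

The descent happens in $x$ alone, whereas the relaxed problem (6) only "sees" it after a jump of the auxiliary variable $y_1$ from 1 to 0.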

Basic feasibility and CW-minimality
We proceed by discussing stationarity concepts from [1]. Inspired by linear programming terminology, they first introduce the notion of a basic feasible vector for SCNO.
Definition 6 (Basic feasibility, [1]) A vector $\bar x \in \mathbb{R}^{n,s}$ with $\|\bar x\|_0 = k$ is called basic feasible (BF) for SCNO if the following conditions are fulfilled:
BF1: in case $k < s$, it holds: $\nabla f(\bar x) = 0$,
BF2: in case $k = s$, it holds: $\frac{\partial f}{\partial x_i}(\bar x) = 0$ for all $i \in I_1(\bar x)$.
Attention has also been paid to the notion of coordinate-wise minimum for SCNO. Basic feasibility and CW-minimality can be viewed as necessary optimality conditions for SCNO.
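For comparison with M-stationarity, the conditions BF1 and BF2 can be tested directly. The following Python sketch is illustrative only: it encodes BF1 as $\nabla f(\bar x) = 0$ when $k < s$ and BF2 as vanishing partial derivatives on $I_1(\bar x)$ when $k = s$, uses a hypothetical tolerance and 0-based indices, and evaluates both tests for the hypothetical objective $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$ with $n = 2$, $s = 1$. The origin turns out to be M-stationary but not basic feasible, in line with M-stationarity being the weaker notion.

```python
def support(x):
    return [i for i, xi in enumerate(x) if xi != 0]


def is_basic_feasible(x, grad, s, tol=1e-10):
    """BF1 (k < s): grad f(x) = 0;  BF2 (k = s): df/dx_i(x) = 0 for i in I1(x)."""
    i1 = support(x)
    if len(i1) > s:
        return False  # infeasible
    if len(i1) < s:
        return all(abs(g) <= tol for g in grad)
    return all(abs(grad[i]) <= tol for i in i1)


def is_m_stationary(x, grad, s, tol=1e-10):
    """df/dx_i(x) = 0 for i in I1(x), for feasible x."""
    i1 = support(x)
    return len(i1) <= s and all(abs(grad[i]) <= tol for i in i1)


def grad_f(x):
    # hypothetical objective f(x) = (x1 - 1)^2 + (x2 - 1)^2
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]


for x in ([0.0, 0.0], [1.0, 0.0]):
    g = grad_f(x)
    print(x, is_m_stationary(x, g, s=1), is_basic_feasible(x, g, s=1))
# [0.0, 0.0] True False
# [1.0, 0.0] True True
```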

Proposition 6 (BF-vector and CW-minimum, [1]) Every global minimizer for SCNO is a CW-minimum, and every CW-minimum for SCNO is a BF-vector.
It is claimed in [1] that the basic feasibility condition is quite weak, namely, there are many BF-vectors that are not optimal for SCNO. The notion of CW-minimum provides a much stricter necessary optimality condition. Based on the latter, a greedy sparse-simplex method for the numerical treatment of SCNO is proposed in [1]. Let us now examine the relation between M-stationarity, basic feasibility, and CW-minimality.

Proposition 7 (M-stationarity, BF-vector, and CW-minimum) Every BF-vector for SCNO is an M-stationary point, in particular, so is every CW-minimum.
Proof Let $\bar x$ be a BF-vector for SCNO with $\|\bar x\|_0 = k$. If $k < s$, then BF1 implies M-stationarity of $\bar x$. If $k = s$, then BF2 coincides with the latter property. Since every CW-minimum for SCNO is a BF-vector according to Proposition 6, the assertion follows.
Proposition 7 says that M-stationarity is an even weaker condition than basic feasibility and CW-minimality. Why should we care about M-stationarity then? Is it not enough to focus on the stricter necessary optimality condition of CW-minimality as in [1]? It turns out that CW-minima need not be stable with respect to data perturbations. Namely, after an arbitrarily small $C^2$-perturbation of $f$, a locally unique CW-minimum may bifurcate into multiple CW-minima. More importantly, this bifurcation unavoidably causes the emergence of M-stationary points which are different from CW-minima. The following Example 3 illustrates this instability phenomenon.

Example 3 (CW-minimum and instability)
We consider the following SCNO with n = 2 and s = 1: Obviously, $\bar x = (0, 0)$ is the unique minimizer of (10). Due to Proposition 6, it is also a CW-minimum, as well as a BF-vector. Further, let us perturb (10) by using an arbitrarily small $\varepsilon > 0$ as follows: It is easy to see that the perturbed problem (11) now has two solutions $\bar x^1 = (\varepsilon, 0)$ and $\bar x^2 = (0, \varepsilon)$. Both are CW-minima and, hence, BF-vectors. Here, we observe a bifurcation of the CW-minimum $\bar x$ of the original problem (10) into two CW-minima $\bar x^1$ and $\bar x^2$ of the perturbed problem (11). Let us explain this bifurcation in terms of M-stationarity. The bifurcation is caused by the degeneracy of $\bar x$ viewed as an M-stationary point of the original problem (10): ND1 is violated at $\bar x$ for (10). More interestingly, although $\bar x$ is neither a CW-minimum nor a BF-vector of (11) anymore, it becomes a new M-stationary point for the perturbed problem. In fact, due to $\|\bar x\|_0 = k = 0$ and the validity of ND1, $\bar x$ is a nondegenerate M-stationary point of (11) with quadratic index $QI = 0$. For its M-index we have $s - k + QI = 1 - 0 + 0 = 1$, meaning that $\bar x$ is a saddle point which connects the two nondegenerate minimizers $\bar x^1$ and $\bar x^2$ of (11). Overall, we conclude that the degenerate CW-minimum $\bar x$ of the original problem (10) is not stable. Moreover, it bifurcates into two nondegenerate CW-minima $\bar x^1$ and $\bar x^2$, and it gives rise to one nondegenerate saddle point $\bar x$ of the perturbed problem (11).
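The bifurcation can be reproduced numerically under an explicit assumption. The formulas (10) and (11) are not reproduced in this text; the Python sketch below uses the hypothetical family $f_\varepsilon(x) = (x_1 - \varepsilon)^2 + (x_2 - \varepsilon)^2$ with $n = 2$, $s = 1$, which matches the described behaviour, with $\varepsilon = 0$ playing the role of (10) and $\varepsilon > 0$ that of (11). For such separable objectives, $x$ is M-stationary exactly if $x_i = \varepsilon$ on its support and $x_i = 0$ elsewhere, the Hessian $2I$ makes ND2 automatic with $QI = 0$, and ND1 (needed only if $k < s$) reduces to $\varepsilon \neq 0$.

```python
from itertools import combinations


def m_stationary_points(eps, s=1, n=2):
    """M-stationary points and M-indices of f(x) = sum_i (x_i - eps)^2, ||x||_0 <= s.

    Hypothetical objective: eps = 0 plays the role of (10), eps > 0 of (11).
    The Hessian is 2*I, so ND2 always holds and the quadratic index is 0.
    """
    points = {}
    for k in range(s + 1):
        for support in combinations(range(n), k):
            if k > 0 and eps == 0:
                continue  # x_i = eps = 0 contradicts i in I1(x)
            x = tuple(eps if i in support else 0.0 for i in range(n))
            # ND1 (needed only if k < s): df/dx_i = -2*eps != 0 off the support
            nd1 = k == s or eps != 0
            points[x] = s - k if nd1 else None  # None marks a degenerate point
    return points


print(m_stationary_points(0.0))  # {(0.0, 0.0): None}
print(m_stationary_points(0.01))  # {(0.0, 0.0): 1, (0.01, 0.0): 0, (0.0, 0.01): 0}
```

For $\varepsilon = 0$ the only M-stationary point is the degenerate origin; for any $\varepsilon > 0$ it splits into two nondegenerate minimizers of M-index 0 and the origin reappears as a nondegenerate saddle point of M-index 1, exactly as described above.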
Example 3 suggests considering nondegenerate BF-vectors or nondegenerate CW-minima for SCNO in order to guarantee their stability with respect to sufficiently small data perturbations. Then, however, the sparsity constraint turns out to be active. This means that BF1 in Definition 6 and CW1 in Definition 7 become redundant.

Proposition 8 (BF-vector, CW-minimum, and nondegeneracy) Let $\bar x$ be a BF-vector for SCNO with $\|\bar x\|_0 = k$. If it is nondegenerate as an M-stationary point for SCNO, then $k = s$. The same applies to CW-minima.
Proof Assume that $k < s$. Then, ND1 contradicts BF1 whenever $I_0(\bar x) \neq \emptyset$. Otherwise, we have $k = n$, and, hence, $n < s$, a contradiction to $s \le n - 1$. It remains to note that every CW-minimum for SCNO is a BF-vector due to Proposition 6.

Normal cone stationarity
In [20], the Bouligand and Clarke normal cones of the SCNO feasible set are used to derive corresponding stationarity concepts. Let $N^B_{\mathbb{R}^{n,s}}(\bar x)$ stand for the Bouligand and $N^C_{\mathbb{R}^{n,s}}(\bar x)$ for the Clarke normal cone of $\mathbb{R}^{n,s}$ at $\bar x$, see e. g. [21] for details.

Definition 8 ($N^B$- and $N^C$-stationarity, [20]) A feasible point $\bar x \in \mathbb{R}^{n,s}$ is called $N^B$- and $N^C$-stationary for SCNO if it respectively holds:
$$0 \in \nabla f(\bar x) + N^B_{\mathbb{R}^{n,s}}(\bar x), \qquad 0 \in \nabla f(\bar x) + N^C_{\mathbb{R}^{n,s}}(\bar x).$$

We relate the normal cone stationarity to the previously discussed concepts in the context of SCNO. As a consequence, $N^B$- and $N^C$-stationarity can be viewed as necessary optimality conditions for SCNO. Note that the equivalence of basic feasibility and $N^B$-stationarity for SCNO has already been mentioned in [20].

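Proposition 9 The notions of basic feasibility and $N^B$-stationarity for SCNO coincide, and so do the notions of M- and $N^C$-stationarity.
Proof Theorems 2.1 and 2.2 from [20] provide explicit formulas for the Bouligand and Clarke normal cones of $\mathbb{R}^{n,s}$ at an SCNO feasible point $\bar x$ with $\|\bar x\|_0 = k$:
$$N^B_{\mathbb{R}^{n,s}}(\bar x) = \begin{cases} \operatorname{span}\{e_i \mid i \in I_0(\bar x)\}, & \text{if } k = s, \\ \{0\}, & \text{if } k < s, \end{cases} \qquad N^C_{\mathbb{R}^{n,s}}(\bar x) = \operatorname{span}\{e_i \mid i \in I_0(\bar x)\}.$$
Comparing these formulas with Definitions 1 and 6, the conclusion immediately follows.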
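With the cone formulas of Theorems 2.1 and 2.2 of [20], namely $N^B_{\mathbb{R}^{n,s}}(\bar x) = \operatorname{span}\{e_i \mid i \in I_0(\bar x)\}$ if $k = s$ and $\{0\}$ if $k < s$, and $N^C_{\mathbb{R}^{n,s}}(\bar x) = \operatorname{span}\{e_i \mid i \in I_0(\bar x)\}$, the membership tests become one-liners. The Python sketch below is an illustration with hypothetical names, a hypothetical tolerance, and the hypothetical objective $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$ with $s = 1$; it checks $-\nabla f(\bar x)$ against both cones, so that $N^B$- and $N^C$-stationarity reproduce basic feasibility and M-stationarity, respectively.

```python
def nonzero_indices(x):
    return [i for i, xi in enumerate(x) if xi != 0]


def in_bouligand_normal_cone(d, x, s, tol=1e-10):
    """N^B at feasible x: span{e_i : i in I0(x)} if ||x||_0 = s, {0} if ||x||_0 < s."""
    i1 = nonzero_indices(x)
    if len(i1) < s:
        return all(abs(di) <= tol for di in d)
    return all(abs(d[i]) <= tol for i in i1)


def in_clarke_normal_cone(d, x, tol=1e-10):
    """N^C at feasible x: span{e_i : i in I0(x)}, i.e. d vanishes on I1(x)."""
    return all(abs(d[i]) <= tol for i in nonzero_indices(x))


def grad_f(x):
    # hypothetical objective f(x) = (x1 - 1)^2 + (x2 - 1)^2
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)]


s = 1
for x in ([0.0, 0.0], [1.0, 0.0]):
    d = [-g for g in grad_f(x)]  # stationarity means -grad f(x) lies in the cone
    print(x, in_bouligand_normal_cone(d, x, s), in_clarke_normal_cone(d, x))
# [0.0, 0.0] False True
# [1.0, 0.0] True True
```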
Let us now comment on the second-order sufficient condition introduced in [20] for $N^C$-stationary points. For that, we denote by $T^C_{\mathbb{R}^{n,s}}(\bar x)$ the Clarke tangential cone of $\mathbb{R}^{n,s}$ at $\bar x$, see e. g. [21] for details.
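Proposition 10 (Second-order sufficient optimality, [20]) Let $\bar x$ be an $N^C$-stationary point for SCNO. Assume the second-order sufficient condition (SOSC) to hold at $\bar x$:
$$d^T D^2 f(\bar x) \, d > 0 \quad \text{for all } d \in T^C_{\mathbb{R}^{n,s}}(\bar x) \setminus \{0\}.$$
Then, $\bar x$ is a strict local minimizer of $f$ restricted to the set $\mathbb{R}^{n,s} \cap \operatorname{span}\{e_i \mid i \in I_1(\bar x)\}$.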
It turns out that CC-SOSC from [4] and SOSC from [20] are equivalent.

Proposition 11 (CC-SOSC and SOSC) SOSC holds at an M-stationary point $\bar x$ for SCNO if and only if CC-SOSC holds at the S-stationary point $(\bar x, \bar y)$ for (6) with the canonical choice (8) of auxiliary variables $\bar y$.
Proof Due to the canonical choice (8) of auxiliary variables $\bar y$, the index sets from the definition of the CC-linearization cone $L_{CC}(\bar x, \bar y)$ are Case 1: $\|\bar x\|_0 = s$. Then, we additionally have Hence, it holds: Case 2: $\|\bar x\|_0 < s$. Then, we additionally have Hence, it holds: In both cases, CC-SOSC says that the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar x) \right)_{i,j \in I_1(\bar x)}$ is positive definite. This is exactly what SOSC requires. In fact, Theorem 2.2 from [20] gives the explicit representation of the Clarke tangential cone of $\mathbb{R}^{n,s}$ at $\bar x$:
$$T^C_{\mathbb{R}^{n,s}}(\bar x) = \operatorname{span}\{e_i \mid i \in I_1(\bar x)\}.$$
Now, the assertion follows due to Proposition 2.
We can easily relate the concepts of nondegeneracy and SOSC.

Proposition 12 (Nondegeneracy and SOSC) Let $\bar x$ be an M-stationary point for SCNO with $\|\bar x\|_0 = s$. Assume that SOSC holds at $\bar x$. Then, $\bar x$ is a nondegenerate local minimizer for SCNO.
Proof Due to Proposition 10, $\bar x$ is a local minimizer of $f$ on the set $\mathbb{R}^{n,s} \cap \operatorname{span}\{e_i \mid i \in I_1(\bar x)\}$. Since $\|\bar x\|_0 = s$, we have $I_1(x) = I_1(\bar x)$ for all $x \in \mathbb{R}^{n,s}$ sufficiently close to $\bar x$. Hence, $\bar x$ is actually a local minimizer for SCNO. Moreover, it is nondegenerate, because SOSC means that the matrix $\left( \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar x) \right)_{i,j \in I_1(\bar x)}$ is positive definite.
If the sparsity constraint is not active for an M-stationary point $\bar x$ of SCNO, i. e. $\|\bar x\|_0 < s$, the implication in Proposition 12 does not hold in general anymore. Namely, $\bar x$ does not need to be a local minimizer for SCNO, even in the presence of SOSC. We note that this observation has already been made in Example 2.12 from [20]. Let us reconsider this example by using the notion of nondegeneracy.

Example 4 ([20]) We consider the following SCNO with n = 3 and s = 2:
In [16], the Fréchet and Mordukhovich normal cones of the SCNO feasible set are used to derive corresponding stationarity concepts. Let $\hat N_{\mathbb{R}^{n,s}}(\bar x)$ stand for the Fréchet and $N_{\mathbb{R}^{n,s}}(\bar x)$ for the Mordukhovich normal cone of $\mathbb{R}^{n,s}$ at $\bar x$, see e. g. [21] for details.

Definition 9 ($\hat N$- and $N$-stationarity, [16]) A feasible point $\bar x \in \mathbb{R}^{n,s}$ is called $\hat N$- and $N$-stationary for SCNO if it respectively holds:
$$0 \in \nabla f(\bar x) + \hat N_{\mathbb{R}^{n,s}}(\bar x), \qquad 0 \in \nabla f(\bar x) + N_{\mathbb{R}^{n,s}}(\bar x).$$

Note that $\hat N$- and $N$-stationarity can be viewed as necessary optimality conditions for SCNO, see [16]. The relation of $\hat N$- and $N$-stationarity to the previously discussed concepts in the context of SCNO has essentially been elaborated in [16].

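Proposition 13 The notions of basic feasibility and $\hat N$-stationarity coincide. Every $N$-stationary point for SCNO is also M-stationary.
Proof Theorem 3.1 from [16] provides explicit formulas for the Fréchet and Mordukhovich normal cones of $\mathbb{R}^{n,s}$ at an SCNO feasible point $\bar x$ with $\|\bar x\|_0 = k$:
$$\hat N_{\mathbb{R}^{n,s}}(\bar x) = \begin{cases} \operatorname{span}\{e_i \mid i \in I_0(\bar x)\}, & \text{if } k = s, \\ \{0\}, & \text{if } k < s, \end{cases} \qquad N_{\mathbb{R}^{n,s}}(\bar x) = \bigcup_{J \in \mathcal{J}(\bar x)} \operatorname{span}\{e_i \mid i \notin J\},$$
where $\mathcal{J}(\bar x) = \left\{ J \subset \{1, \ldots, n\} \mid I_1(\bar x) \subset J, \ |J| = s \right\}$. The equivalence of basic feasibility and $\hat N$-stationarity follows immediately. For the second assertion, we assume that $\bar x$ is $N$-stationary and let $i \in I_1(\bar x)$ be arbitrary, but fixed. Then, $i \in J$ for every $J \in \mathcal{J}(\bar x)$, and, hence, $\frac{\partial f}{\partial x_i}(\bar x) = 0$. Thus, $\bar x$ is M-stationary.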
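The Mordukhovich cone is a finite union and can be tested by enumerating the index family $\mathcal{J}(\bar x) = \{J \mid I_1(\bar x) \subset J, \ |J| = s\}$, a vector lying in $\operatorname{span}\{e_i \mid i \notin J\}$ exactly when it vanishes on $J$. The following Python sketch is illustrative only (hypothetical names, 0-based indices, a hypothetical tolerance, feasible input assumed). Applied to the origin with $s = 1$ and $d = (2\varepsilon, 2\varepsilon)$, which is $-\nabla f(0)$ for the hypothetical perturbed objective $(x_1 - \varepsilon)^2 + (x_2 - \varepsilon)^2$, it exhibits a point that is M-stationary but neither $\hat N$- nor $N$-stationary.

```python
from itertools import combinations


def index_family(x, s):
    """J(x) = {J : I1(x) subset of J, |J| = s} for feasible x (0-based)."""
    i1 = {i for i, xi in enumerate(x) if xi != 0}
    rest = [i for i in range(len(x)) if i not in i1]
    return [i1 | set(extra) for extra in combinations(rest, s - len(i1))]


def in_mordukhovich_normal_cone(d, x, s, tol=1e-10):
    """N(x): union over J in J(x) of span{e_i : i not in J}, i.e. d vanishes on some J."""
    return any(all(abs(d[i]) <= tol for i in J) for J in index_family(x, s))


def in_frechet_normal_cone(d, x, s, tol=1e-10):
    """Fréchet cone: span{e_i : i in I0(x)} if ||x||_0 = s, and {0} if ||x||_0 < s."""
    i1 = [i for i, xi in enumerate(x) if xi != 0]
    if len(i1) < s:
        return all(abs(di) <= tol for di in d)
    return all(abs(d[i]) <= tol for i in i1)


eps = 0.01
origin = [0.0, 0.0]
d = [2 * eps, 2 * eps]  # -grad f(0) for the hypothetical f(x) = (x1 - eps)^2 + (x2 - eps)^2
print(index_family(origin, 1))  # [{0}, {1}]
print(in_mordukhovich_normal_cone(d, origin, 1))  # False
print(in_frechet_normal_cone(d, origin, 1))  # False
print(in_mordukhovich_normal_cone([0.0, 3.0], origin, 1))  # True
```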

Remark 2 ($N$-stationarity and instability)
Proposition 13 says that M-stationarity is a weaker condition than $N$-stationarity. However, it turns out that $N$-stationary points need not be stable with respect to data perturbations. Namely, after an arbitrarily small $C^2$-perturbation of $f$, a locally unique $N$-stationary point may bifurcate into multiple $N$-stationary points. More importantly, this bifurcation unavoidably causes the emergence of M-stationary points that are not $N$-stationary. This is in full analogy with CW-minima and BF-vectors. Example 3 illustrates the instability phenomenon for $N$-stationary points as well. It is worth mentioning that the bifurcation there occurs even though the $N$-stationary point under consideration fulfils SOSC.
In order to better understand the relations of M-stationarity to other stationarity concepts discussed in Sect. 3, we provide the following diagram:

Global results
Let us study the topological properties of lower level sets
$$M^a := \{x \in \mathbb{R}^{n,s} \mid f(x) \le a\},$$
where $a \in \mathbb{R}$ is varying. For that, we define intermediate sets for $a < b$:
$$M^b_a := \{x \in \mathbb{R}^{n,s} \mid a \le f(x) \le b\}.$$
For the topological concepts used below we refer to [24]. Let us start with Assumption 1, which is usual within the scope of Morse theory, cf. [10]. It rules out asymptotic effects at infinity.

Assumption 1
The restriction $f|_{\mathbb{R}^{n,s}}$ of the objective function to the SCNO feasible set is proper, i.e. $f^{-1}(K) \cap \mathbb{R}^{n,s}$ is compact for any compact set $K \subset \mathbb{R}$.

Theorem 4 (Deformation for SCNO) Let Assumption 1 be fulfilled and let $M^b_a$ contain no M-stationary points for SCNO. Then, $M^a$ is homeomorphic to $M^b$.
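Proof We apply Proposition 3.2 from Part I in [10], which provides the deformation for general Whitney stratified sets with respect to critical points of proper maps. Note that the SCNO feasible set admits a Whitney stratification:
$$\mathbb{R}^{n,s} = \bigcup_{I \subset \{1,\dots,n\},\ |I| \le s} Z_I,$$
where
$$Z_I := \{x \in \mathbb{R}^n \mid x_i \ne 0,\ i \in I,\ \ x_i = 0,\ i \in I^c\}.$$
The notion of criticality used in [10] can be stated for SCNO as follows. A point $\bar{x} \in \mathbb{R}^{n,s}$ is called critical for $f|_{\mathbb{R}^{n,s}}$ if it holds:
$$Df(\bar{x})\big|_{T_{\bar{x}} Z} = 0,$$
where $Z$ is the stratum of $\mathbb{R}^{n,s}$ which contains $\bar{x}$, and $T_{\bar{x}} Z$ is the tangent space of $Z$ at $\bar{x}$. By identifying $I = I_1(\bar{x})$ and, hence, $I^c = I_0(\bar{x})$, we obtain $T_{\bar{x}} Z = \operatorname{span}\{e_i \mid i \in I_1(\bar{x})\}$, so that the concepts of criticality and M-stationarity coincide. This concludes the assertion.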
Let us now turn our attention to the topological changes of lower level sets when passing an M-stationary level. Traditionally, they are described by means of the so-called cell-attachment. We first consider a special case of cell-attachment. For that, let $N^\varepsilon$ denote the lower level set of a special linear function on $\mathbb{R}^{p,q}$, i.e.
where $\varepsilon \in \mathbb{R}$, and the integers $q < p$ are nonnegative.
In terms of upper level sets, Lemma 1 can obviously be reformulated as follows: For any $\varepsilon > 0$ the set $N^{-\varepsilon}$ is homotopy-equivalent to $N^\varepsilon$ with $\binom{p-1}{q}$ cells of dimension $q$ attached. Let us show the latter assertion.
First, we note that the sets $N^0$ and $N^{-\varepsilon}$ are contractible. The contraction is performed via the mapping
For the lower level set $N^\varepsilon$ we have the representation
Note that $N^{\varepsilon,J}$ is homotopy-equivalent to the set $N^J$; here, $\operatorname{conv}\{e_j, j \in J\}$ is the $(|J|-1)$-dimensional simplex of $\mathbb{R}^p$. In fact, the map can be used for all $N^J$. Altogether, $N^\varepsilon$ is homotopy-equivalent to
$$\bigcup_{J \subset \{1,\dots,p\},\ |J| = q} \operatorname{conv}\{e_j, j \in J\}. \qquad (13)$$
Note that the set in (13) is the $(q-1)$-skeleton of the $(p-1)$-dimensional simplex of $\mathbb{R}^p$. The $(q-1)$-skeleton of the $(p-1)$-dimensional simplex is the union of its faces up to dimension $q-1$, see e.g. [9]. Within the $(q-1)$-skeleton (13), we close all $q$-dimensional holes by attaching $q$-dimensional cells from the collection of simplices
$$\left\{\operatorname{conv}\{e_j, j \in J\} \,\middle|\, J \subset \{1,\dots,p\},\ |J| = q+1\right\}.$$
The attachment should result in a contractible set, as $N^0$ actually is. We note that the union of the collection
$$\left\{\operatorname{conv}\{e_j, j \in J\} \,\middle|\, J \subset \{1,\dots,p\},\ 1 \in J,\ |J| = q+1\right\} \qquad (14)$$
is also contractible, namely, to $e_1$. To see this, we may use the map $(x,t) \mapsto (1-t)x + t e_1$, $t \in [0,1]$, which leaves every simplex in (14) invariant, since each of them is convex and contains $e_1$. Furthermore, none of the relative interiors of the simplices in (14) can be deleted. In fact, deleting one of them gives rise to the boundary of a $q$-dimensional simplex, and the latter is not contractible. On the other hand, for any $J^* \subset \{1,\dots,p\} \setminus \{1\}$ with $|J^*| = q+1$ the union
$$\operatorname{conv}\{e_j, j \in J^*\} \cup \bigcup_{J^{**} \subset J^*,\ |J^{**}| = q} \operatorname{conv}\{e_j, j \in J^{**} \cup \{1\}\} \qquad (15)$$
forms the boundary of the $(q+1)$-dimensional simplex $\operatorname{conv}\{e_j, j \in J^* \cup \{1\}\}$. Hence, the set in (15) is not contractible. Altogether, precisely the $q$-dimensional cells in (14) can be attached to the $(q-1)$-skeleton (13) in order to obtain a contractible set. Their number obviously equals $\binom{p-1}{q}$. This completes the proof.
Proof Theorem 4 allows deformations up to an arbitrarily small neighborhood of the M-stationary point $\bar{x}$. In such a neighborhood, we may assume without loss of generality that $\bar{x} = 0$ and that $f$ has the following form as from Theorem 1:
where $x \in \mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$, and the number of negative squares in (16) equals $Q_I$. In terms of [10], the set $\mathbb{R}^{n-k,s-k} \times \mathbb{R}^k$ can be interpreted as the product of the tangential part $\mathbb{R}^k$ and the normal part $\mathbb{R}^{n-k,s-k}$. The cell-attachment along the tangential part is standard.
Analogously to the unconstrained case, one $Q_I$-dimensional cell has to be attached on $\mathbb{R}^k$. The cell-attachment along the normal part is more involved. Due to Lemma 1, we need to attach $\binom{n-k-1}{s-k}$ cells on $\mathbb{R}^{n-k,s-k}$, each of dimension $s-k$. Finally, we apply Theorem 3.7 from Part I in [10], which says that the local Morse data is the product of tangential and normal Morse data. Hence, the dimensions of the attached cells add up. Here, we thus have to attach $\binom{n-k-1}{s-k}$ cells, each of dimension $s-k+Q_I$.
Let us put Theorem 5 into the context of Morse theory as developed in the literature for other nonsmooth optimization problems. The new issue for SCNO is the multiplicity of attached cells.
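The combinatorics behind Lemma 1 and Theorem 5 can be checked mechanically. The Python sketch below (all helper names hypothetical) encodes faces of the $(p-1)$-simplex as index sets $J$, verifies that attaching the $\binom{p-1}{q}$ cells in (14) to the $(q-1)$-skeleton (13) yields Euler characteristic one, as any contractible set must have, checks that (15) consists exactly of the facets of a $(q+1)$-simplex, and returns the number and dimension of the cells from Theorem 5. The Euler characteristic is only a necessary condition for contractibility, so this is a sanity check, not a proof.

```python
from itertools import combinations
from math import comb


def faces(p, dim):
    """Index sets J of the dim-dimensional faces conv{e_j, j in J} of the (p-1)-simplex."""
    return [frozenset(J) for J in combinations(range(1, p + 1), dim + 1)]


def skeleton_euler_characteristic(p, q):
    """Euler characteristic of the (q-1)-skeleton (13): alternating count of faces of dimension < q."""
    return sum((-1) ** d * len(faces(p, d)) for d in range(q))


def attached_cells(p, q):
    """The q-dimensional simplices in (14), i.e. those having e_1 as a vertex."""
    return [J for J in faces(p, q) if 1 in J]


def closes_sphere(j_star):
    """Check that (15) lists exactly the facets of the (q+1)-simplex spanned by j_star and e_1."""
    simplex = frozenset(j_star) | {1}
    facets = {simplex - {v} for v in simplex}
    union_15 = {frozenset(j_star)} | {
        frozenset(jj) | {1} for jj in combinations(sorted(j_star), len(j_star) - 1)
    }
    return facets == union_15


def cell_attachment(n, s, k, q_i):
    """Number and dimension of the cells attached at a nondegenerate M-stationary point (Theorem 5)."""
    return comb(n - k - 1, s - k), s - k + q_i
```

For example, `cell_attachment(n, s, s - 1, 0)` returns $(n-s, 1)$, which recovers the $n-s$ one-dimensional cells attached at an M-stationary point with $\|\bar{x}\|_0 = s-1$ and $Q_I = 0$.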

Remark 3 (Multiplicity of attached cells)
We recall that for nonlinear programming problems (NLP) the dimension of the cell to be attached while passing a critical point equals its quadratic index, see e.g. [13]. The situation changes if we consider mathematical programs with complementarity constraints (MPCC). Here, the dimension of attached cells equals the so-called C-index of C-stationary points, see [14]. In addition to the quadratic part, the C-index also has a bi-active part. The latter counts negative pairs of Lagrange multipliers corresponding to the bi-active complementarity constraints. The cell-attachment for mathematical programs with vanishing constraints (MPVC) is similar, see [8]. Here, the dimension of attached cells equals the so-called T-index of T-stationary points. The T-index again consists of quadratic and bi-active parts. We emphasize that the cell-attachment for SCNO considerably differs from the described cases of NLP, MPCC, and MPVC. The main difference is that multiple cells are involved in the cell-attachment procedure for SCNO. The multiplicity of attached cells is a novel and striking phenomenon in nonsmooth optimization, not observed in the literature before. From a technical point of view, this makes the cell-attachment result for SCNO rather challenging. Note that determining the number of attached cells becomes an involved combinatorial problem from algebraic topology, see Lemma 1.
Let us present a global interpretation of our results for SCNO. For that, we need to state another assumption. The following Assumption 2 is standard in the context of SCNO, cf. [1], and gives a necessary condition for its solvability.
where $r$ is the number of local minimizers of SCNO, and $r_I$ and $r_{II}$ are the numbers of M-stationary points with M-index equal to one which correspond to the types (I) and (II), respectively.
Proof We assume without loss of generality that the objective function $f$ has pairwise different values at all M-stationary points of SCNO. If this is not the case, we may enforce this property by sufficiently small perturbations of the objective function. Due to the openness part in Theorem 2, all M-stationary points of such a perturbed SCNO remain nondegenerate. Moreover, formula (17) remains valid, since it does not depend on the functional values of $f$. Further, let $q^a$ denote the number of connected components of the lower level set $M^a$. We focus on how $q^a$ changes as $a \in \mathbb{R}$ increases. Due to Theorem 4, $q^a$ can change only when passing through a value corresponding to an M-stationary point $\bar{x}$, i.e. $a = f(\bar{x})$. In fact, Theorem 4 allows homeomorphic deformations of lower level sets up to an arbitrarily small neighborhood of the M-stationary point $\bar{x}$. Then, we have to estimate the difference between $q^a$ and $q^{a-\varepsilon}$, where $\varepsilon > 0$ is arbitrary but sufficiently small, and $a = f(\bar{x})$. This is done by a local argument. For that, let the M-index of $\bar{x}$ be $s - k + Q_I$ with $\|\bar{x}\|_0 = k$. We use Theorem 5, which says that $M^a$ is homotopy-equivalent to $M^{a-\varepsilon}$ with a cell-attachment of
$$\left\{\operatorname{conv}\{e_j, j \in J\} \times [0,1]^{Q_I} \,\middle|\, J \subset \{1,\dots,n-k\},\ 1 \in J,\ |J| = s-k+1\right\}. \qquad (18)$$
Let us distinguish the following cases:
1) $\bar{x}$ is a local minimizer with vanishing M-index, i.e. $k = s$ and $Q_I = 0$. Then, by (18) we attach to $M^{a-\varepsilon}$ the cell $\operatorname{conv}(e_1)$ of dimension zero. Consequently, a new connected component is created, and it holds:
$$q^a = q^{a-\varepsilon} + 1.$$
2) $\bar{x}$ is of type (I) with M-index equal to one, i.e. $k = s$ and $Q_I = 1$. Then, by (18) we attach to $M^{a-\varepsilon}$ the cell $\operatorname{conv}(e_1) \times [0,1]$ of dimension one. Consequently, at most one connected component disappears, and it holds:
$$q^{a-\varepsilon} - 1 \le q^a \le q^{a-\varepsilon}.$$
This case is well known from nonlinear programming, see e.g. [13].
3) $\bar{x}$ is of type (II) with M-index equal to one, i.e. $k = s-1$ and $Q_I = 0$. Then, by (18) we attach to $M^{a-\varepsilon}$ as many as $n-s$ cells of dimension one, namely:
$$\operatorname{conv}\{e_1, e_j\}, \quad j = 2, \dots, n-s+1.$$
Consequently, at most $n-s$ connected components disappear, and it holds:
$$q^{a-\varepsilon} - (n-s) \le q^a \le q^{a-\varepsilon}.$$
For illustration we refer to Fig. 1.
Now, we proceed with the global argument. Assumption 2 implies that there exists $c \in \mathbb{R}$ such that $M^c$ is empty, thus, $q^c = 0$. Additionally, there exists $d \in \mathbb{R}$ such that $M^d$ is connected and contains all M-stationary points, thus, $q^d = 1$. Due to Assumption 1, $M^d_c$ is compact; moreover, it contains all M-stationary points. Since nondegenerate M-stationary points are in particular isolated, we conclude that there are only finitely many of them. Let us now increase the level $a$ from $c$ to $d$ and describe how the number $q^a$ of connected components of the lower level sets $M^a$ changes. It follows from the local argument that $r$ new connected components are created, where $r$ is the number of local minimizers for SCNO. Let $q_I$ and $q_{II}$ denote the actual numbers of disappearing connected components when passing the levels corresponding to M-stationary points of types (I) and (II), respectively. The local argument shows that at most $r_I$ and $(n-s)\,r_{II}$ connected components may disappear while doing so, i.e.
$$q_I \le r_I, \qquad q_{II} \le (n-s)\, r_{II}.$$
M-stationary points with M-index at least two do not change the number of connected components, since the cells attached there have dimension at least two and, hence, connected boundaries.
Altogether, we have:
$$r - r_I - (n-s)\, r_{II} \le r - q_I - q_{II} = q^d - q^c.$$
By recalling that $q^d = 1$ and $q^c = 0$, we get Morse relation (17).
We illustrate Theorem 6 by discussing the same SCNO as in Example 1.
Hence, there should exist an additional M-stationary point with M-index one. In fact, $(0,0)$ is this nondegenerate M-stationary point of type (II), cf. Example 1. Note that, due to $r_I = 0$ and $r_{II} = 1$, Morse relation (17) holds with equality here.
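The counting argument in the proof of Theorem 6 can be replayed numerically. The Python sketch below uses a hypothetical objective $f(x) = (x_1-1)^2 + (x_2-1)^2$ on $\mathbb{R}^{2,1}$ (not the function of Example 1, which is not reproduced here), samples the two coordinate axes on a grid, and counts the connected components of $M^a$ with a union-find structure. For this $f$, the M-stationary points are the local minimizers $(1,0)$ and $(0,1)$ at level 1 and the type (II) point $(0,0)$ at level 2, so $r = 2$, $r_I = 0$, $r_{II} = 1$, and (17) holds with equality. The grid step `h` and range `m` are arbitrary choices.

```python
def f(x1, x2):
    """Hypothetical objective on R^{2,1}; not the function of Example 1."""
    return (x1 - 1.0) ** 2 + (x2 - 1.0) ** 2


def count_components(a, h=0.01, m=300):
    """Count connected components of M^a = {x in R^{2,1} : f(x) <= a}.

    R^{2,1} is the union of the two coordinate axes; each axis is sampled at
    j*h for j = -m, ..., m, and both axes share the node at the origin.
    """
    parent = {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def node(axis, j):
        return "origin" if j == 0 else (axis, j)

    def point(axis, j):
        return (j * h, 0.0) if axis == 1 else (0.0, j * h)

    # Grid nodes belonging to the lower level set.
    for axis in (1, 2):
        for j in range(-m, m + 1):
            if f(*point(axis, j)) <= a:
                parent.setdefault(node(axis, j), node(axis, j))

    # Join neighbouring grid nodes on the same axis.
    for axis in (1, 2):
        for j in range(-m, m):
            u, v = node(axis, j), node(axis, j + 1)
            if u in parent and v in parent:
                parent[find(u)] = find(v)

    return len({find(u) for u in parent})


for level in (0.5, 1.5, 2.5):
    print(level, count_components(level))
```

The loop reports 0, 2 and 1 components for the levels 0.5, 1.5 and 2.5: the lower level set is empty below the minimal value 1, consists of two components between the levels of the minimizers and of the saddle point, and becomes connected once the level 2 of $(0,0)$ is passed, where $n - s = 1$ one-dimensional cell is attached.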
Let us briefly comment on the applicability of the deformation and cell-attachment results in Theorems 4 and 5, respectively, to the least squares loss function.

Remark 4 (Least squares loss function)
We take the least squares loss as the objective function in SCNO, i.e.
where $A \in \mathbb{R}^{m \times n}$ can be viewed as a sensing matrix and $b \in \mathbb{R}^m$ as a measurement vector. Then, SCNO corresponds to the problem of sparse recovery from compressed sensing, see e.g. [1]. Here, it is convenient to assume that the bound on the number of nonzero entries of the signal does not exceed the number of measurements, i.e. $s \le m$. Let us examine whether Assumption 1 is fulfilled for the least squares loss function. It turns out that the so-called $s$-regularity of $A$ is sufficient for this. Recall from [1] that a matrix $A \in \mathbb{R}^{m \times n}$ is called $s$-regular if for every index set $I \subset \{1,\dots,n\}$ with $|I| = s$ it holds:
$$\operatorname{rank}(A_I) = s,$$
where $A_I$ denotes the submatrix of $A$ with the columns corresponding to the set $I$, and $\operatorname{rank}(A_I)$ stands for its rank. In the presence of $s$-regularity of $A$, it is shown in [15] that the lower level sets are bounded for all $a \in \mathbb{R}$. Hence, the restriction of the least squares loss function to $\mathbb{R}^{n,s}$ is in this case proper, i.e. Assumption 1 is satisfied. Note that Assumption 2 trivially holds for the least squares loss function, since it is nonnegative. Finally, we refer to [15] for a detailed exposition of the topological approach as applied to sparse recovery.
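For small instances, $s$-regularity can be verified by brute force over all $\binom{n}{s}$ column subsets, as in the following Python sketch. The helper name is hypothetical, and the numerical rank returned by `numpy.linalg.matrix_rank` stands in for the exact rank, which is an assumption for floating-point data. The combinatorial cost makes this impractical for large $n$, so it only illustrates the definition.

```python
from itertools import combinations

import numpy as np


def is_s_regular(A, s):
    """Return True if every submatrix A_I with |I| = s has rank s."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if s > m:
        return False  # s linearly independent columns cannot exist in R^m
    return all(
        np.linalg.matrix_rank(A[:, list(I)]) == s
        for I in combinations(range(n), s)
    )
```

For instance, the matrix with columns $(1,0)$, $(0,1)$, $(1,1)$ is 2-regular, whereas the matrix with columns $(1,0)$, $(2,0)$, $(0,1)$ is not, since its first two columns are parallel.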
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.