A Low-Cost Alternating Projection Approach for a Continuous Formulation of Convex and Cardinality Constrained Optimization

We consider convex constrained optimization problems that also include a cardinality constraint. In general, optimization problems with cardinality constraints are difficult mathematical programs which are usually solved by global techniques from discrete optimization. We assume that the region defined by the convex constraints can be written as the intersection of a finite collection of convex sets, such that it is easy and inexpensive to project onto each one of them (e.g., boxes, hyperplanes, or half-spaces). Taking advantage of a recently developed continuous reformulation that relaxes the cardinality constraint, we propose a specialized penalty gradient projection scheme combined with alternating projection ideas to compute a solution candidate for these problems, i.e., a local (possibly non-global) solution. To illustrate the proposed algorithm, we focus on the standard mean-variance portfolio optimization problem for which we can only invest in a pre-established limited number of assets. For these portfolio problems with cardinality constraints, we present a numerical study on a variety of data sets involving real-world capital market indices from major stock markets. In many cases, we observe that the proposed scheme converges to the global solution. On those data sets, we illustrate the practical performance of the proposed scheme to produce the efficient frontiers for different values of the limited number of allowed assets.


Introduction
We are interested in convex constrained optimization problems with an additional cardinality constraint. In other words, we are interested in finding sparse solutions of those optimization problems, i.e., solutions with a limited number of nonzero elements, as required in many areas including image and signal processing, mathematical statistics, machine learning, and portfolio optimization, among others. One effective way to ensure the sparsity of the obtained solution is to impose a cardinality constraint, in which the number of nonzero elements of the solution is bounded in advance.
To be precise, let us consider the following constrained optimization problem:

min_x f(x) subject to x ∈ Ω and ‖x‖₀ ≤ α,    (1)

where f : Rⁿ → R is continuously differentiable, 1 ≤ α < n is a given natural number, Ω is a convex subset of Rⁿ (that will change depending on the considered application), and the L₀ (quasi) norm ‖x‖₀ denotes the number of nonzero components of x. The sparsity constraint ‖x‖₀ ≤ α is also called the cardinality constraint. Of course, we assume that α < n, since otherwise the cardinality constraint could be discarded.
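As a quick illustration (our own, not from the paper), ‖x‖₀ simply counts nonzero entries, so checking the cardinality constraint is a one-liner; the snippet also shows why ‖·‖₀ is only a quasi-norm (it is not positively homogeneous):

```python
import numpy as np

x = np.array([0.5, 0.0, 0.3, 0.0, 0.2])  # a candidate portfolio-like vector
alpha = 3

card = np.count_nonzero(x)                 # this is ||x||_0
print(card)                                # 3 nonzero components
print(card <= alpha)                       # the cardinality constraint holds

# ||.||_0 is not a true norm: scaling x does not change the count
print(np.count_nonzero(10.0 * x) == card)  # True
```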
The main difference between problem (1) and a standard convex constrained optimization problem is that the cardinality constraint, despite the notation, is neither a norm, nor continuous, nor convex. Because of the intractability of the so-called zero norm ‖x‖₀, the 1-norm ‖x‖₁ has also frequently been used to develop good approximate algorithms. Clearly, however, to impose a required level of sparsity, the use of the zero norm in (1) is much more effective.
Optimization problems with cardinality constraints are very hard problems which are typically solved by global techniques from discrete or combinatorial optimization. However, in a more general setting, a continuous reformulation has recently been proposed and analyzed in [11] to deal with this difficult cardinality constraint. The main idea is to address the continuous counterpart of problem (1):

min_{x,y} f(x) subject to: x ∈ Ω, eᵀy ≥ n − α, x_i y_i = 0 for all 1 ≤ i ≤ n, 0 ≤ y_i ≤ 1 for all 1 ≤ i ≤ n,    (2)

where e ∈ Rⁿ denotes the vector of ones. We note that the last n constraints define a simple box on the auxiliary variable vector y ∈ Rⁿ. A more difficult reformulation replaces the simple box by a set of binary constraints: either y_i = 0 or y_i = 1 for all i. In that case, the problem is an integer programming problem (much harder to solve) for which several algorithmic ideas have already been developed; see, e.g., [3,4,13,16,28]. Here, we focus on the continuous formulation (2), which will play a key role in our algorithmic proposal. For additional theoretical properties, including the equivalence between the original version (1) and the continuous relaxed version (2), see [11,22,24,25].
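To make the reformulation concrete, here is a small sketch (our own illustration, not code from the paper) of one direction of the equivalence: any x feasible for (1) can be completed to a pair (x, y) feasible for (2) by setting y_i = 1 exactly on the zero entries of x.

```python
import numpy as np

def complete_to_pair(x, alpha):
    """Given x with ||x||_0 <= alpha, build y so that (x, y) is feasible for (2)."""
    assert np.count_nonzero(x) <= alpha, "x violates the cardinality constraint"
    y = (x == 0).astype(float)          # y_i = 1 where x_i = 0, else 0
    n = x.size
    # check the constraints of the continuous reformulation (2)
    assert np.all(x * y == 0)           # Hadamard constraint
    assert y.sum() >= n - alpha         # e^T y >= n - alpha
    assert np.all((0 <= y) & (y <= 1))  # box constraint on y
    return y

x = np.array([0.7, 0.0, 0.3, 0.0, 0.0])
y = complete_to_pair(x, alpha=2)
print(y)  # [0. 1. 0. 1. 1.]
```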
As a consequence of the so-called Hadamard constraint (x • y = 0, i.e., x_i y_i = 0 for all i), formulation (2) is a nonconvex problem, even when the original cardinality constrained problem (apart from the cardinality constraint, of course) was convex. Thus, one cannot in general expect to obtain global minima. But if one is interested, for example, in obtaining local solutions or good starting points for a global method, this continuous formulation (2) can be useful.
In this work, we pay special attention to those problems for which the set Ω is the intersection of a finite collection of convex sets, in such a way that it is very easy to project onto each one of them. In that case, the main idea is to take advantage of the fact that two of the constraints in (2), namely eᵀy ≥ n − α and 0 ≤ y_i ≤ 1 for all i, are also "easy-to-project" convex sets, and so an alternating projection scheme can be conveniently applied to project onto the intersection of all the constraints involved in (2), except for the Hadamard constraint. For solving the continuous formulation (2) we can then use a suitable low-cost convex constrained scheme, such as a gradient-type method in which the objective function includes f(x) plus a suitable penalization term that guarantees that the Hadamard constraint is also satisfied at the solution. In Section 2, we describe and analyze a general penalty method to satisfy the Hadamard constraint that appears in the relaxed formulation (2). In Section 3, we describe a suitable alternating projection scheme as well as a suitable low-cost gradient-type projection method that can be combined with the penalty method of Section 2. Concerning specific applications, in Section 4 we consider in detail the standard mean-variance limited diversified portfolio selection problem (see, e.g., [12,13,14,16,18,20]). In Section 5, we present a numerical study to illustrate the computational performance of the proposed scheme on a variety of data sets involving real-world capital market indices from major stock markets. For each considered data set, we focus our attention on the efficient frontier produced for different values of the limited number of allowed assets. In Section 6, we present some final comments and perspectives.

A penalization strategy for the Hadamard constraint
Let us consider again the continuous formulation (2), and let us focus our attention on the Hadamard constraint x • y = 0 (i.e., x_i y_i = 0 for all i). This particular constraint is the only one that does not define a convex set; the others define convex sets onto which it is easy to project, as discussed in the previous section. To see that the set of vectors (x, y) ∈ R²ⁿ such that x • y = 0 is not convex, it is enough to consider the two 2-dimensional pairs (x¹, y¹) = (1, 0, 0, 1) and (x², y²) = (0, 1, 1, 0). Both pairs are clearly in that set, but their convex combination ½(x¹, y¹) + ½(x², y²) = ½e is not.
A classical and straightforward approach to force the Hadamard condition at the solution, while keeping the feasible set of our problem as the intersection of a finite collection of easy convex sets, is to add a penalization term τ h(x, y) to the objective function and consider instead the following formulation:

min_{x,y} f(x) + τ h(x, y) subject to: x ∈ Ω, eᵀy ≥ n − α, 0 ≤ y_i ≤ 1 for all 1 ≤ i ≤ n,    (3)

where τ > 0 is a penalization parameter that needs to be properly chosen, and the function h : R²ⁿ → R is continuously differentiable and chosen to satisfy the following two properties: h(x, y) ≥ 0 for all feasible vectors x and y, and h(x, y) = 0 if and only if x • y = 0. Clearly, the function h(x, y) is crucial and should be conveniently chosen depending on the considered application.
Applying now a penalty scheme, problem (3) can be reduced to a sequence of convex constrained problems of the following form:

min_{x,y} f(x) + τ_k h(x, y) subject to (x, y) ∈ Ω̂,    (4)

where τ_k > 0 is the penalty parameter, increased at every k to penalize the violation of the Hadamard constraint, and the closed convex set Ω̂ ⊆ R²ⁿ is given by

Ω̂ = {(x, y) ∈ R²ⁿ : x ∈ Ω, eᵀy ≥ n − α, 0 ≤ y_i ≤ 1 for all 1 ≤ i ≤ n};

with a slight abuse of notation, we will also denote this joint feasible set simply by Ω when no confusion arises. Under some mild assumptions and a specific choice of the sequence {τ_k}, it can be established that the sequence of solutions of problem (4) converges to a solution of (2); see, e.g., [19] and [26, Sect. 12.1]. Let us assume that problem (2) attains global minimizers. Since f is a continuous function, it is enough to assume that one of the closed and convex sets involved in the definition of the feasible set of (2) is bounded. Here, for the sake of completeness, we summarize the convergence properties of the proposed penalty scheme (4).
Theorem 1. If for all k, τ_{k+1} > τ_k > 0 and (x_k, y_k) is a global solution of (4), then

f(x_k) + τ_k h(x_k, y_k) ≤ f(x_{k+1}) + τ_{k+1} h(x_{k+1}, y_{k+1}),
h(x_{k+1}, y_{k+1}) ≤ h(x_k, y_k),   and   f(x_k) ≤ f(x_{k+1}).

Finally, if τ_k → ∞ and {(x_k, y_k)} is the sequence of global minimizers obtained by solving (4), then any limit point of {(x_k, y_k)} is a global minimizer of (2).

Remark 1. In the proof of the last statement of Theorem 1 (see, e.g., [26, Sect. 12.1]), the requirement τ_k → ∞ is used only to guarantee that h(x_k, y_k) → 0 when k → ∞, i.e., to guarantee that x_k • y_k → 0. What matters for the convergence result is that the Hadamard product itself goes to zero, even if 0 < τ_k < ∞ for all k. This fact will play a key role in our numerical study (Section 5).
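The penalty mechanism, and the fact observed in Remark 1 that a finite final τ can suffice, can be seen on a toy instance (our own example, not from the paper): f(x, y) = (x − 1)² + (y − 1)² on the box [0, 1]² with h(x, y) = xy, solving each subproblem (4) by a plain projected gradient loop.

```python
def solve_subproblem(tau, x, y, iters=2000):
    """Projected gradient on f + tau*h over the box [0,1]^2 (fixed 1/L step)."""
    step = 1.0 / (2.0 + tau)                  # Lipschitz constant is 2 + tau here
    for _ in range(iters):
        gx = 2.0 * (x - 1.0) + tau * y        # d/dx [(x-1)^2 + (y-1)^2 + tau*x*y]
        gy = 2.0 * (y - 1.0) + tau * x
        x = min(max(x - step * gx, 0.0), 1.0)  # projection onto [0,1]
        y = min(max(y - step * gy, 0.0), 1.0)
    return x, y

x, y = 0.9, 0.4
tau = 1.0
for k in range(12):                            # increasing penalty parameters tau_k
    x, y = solve_subproblem(tau, x, y)
    if x * y < 1e-8:                           # Hadamard product driven to zero
        break
    tau *= 4.0

print(round(x * y, 6))  # 0.0: the Hadamard condition holds at the computed solution
```

Note that the loop stops as soon as h(x, y) = xy is (numerically) zero, at a finite τ, exactly as Remark 1 suggests.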
We would like to close this section with a pertinent result ([24, Theorem 4]) that establishes a one-to-one correspondence between minimizers of problems (1) and (2) whenever the obtained solution x̄ satisfies the cardinality constraint with equality, i.e., ‖x̄‖₀ = α.

Theorem 2. Let (x̄, ȳ) be a local minimizer of the relaxed problem (2). Then ‖x̄‖₀ = α if and only if ȳ is unique, that is, if there exists exactly one ȳ such that (x̄, ȳ) is a local minimizer of (2). In this case, the components of ȳ are binary (i.e., ȳ_i = 0 or ȳ_i = 1 for all 1 ≤ i ≤ n) and x̄ is a local minimizer of (1).

Dykstra's method and the SPG method
For every k, a low-cost projected gradient method can be used to solve the optimization problem (4). Notice that Ω is the intersection of finitely many "easy" convex sets. A convenient tool for finding the required projections onto Ω is Dykstra's alternating projection algorithm [10], which we now describe in a general setting in Rⁿ. Roughly speaking, Dykstra's algorithm projects in a clever way onto the easy convex sets individually to complete a cycle which is repeated iteratively, and, as any other iterative method, it can be stopped prematurely.
For a given x ∈ Rⁿ and finitely many closed convex sets, say Ω_1, …, Ω_p in Rⁿ, we consider the best approximation problem: find the closest point to x in Ω = ∩_{i=1}^p Ω_i ≠ ∅, which can be stated as the optimization problem

min_z ‖x − z‖² subject to z ∈ Ω,    (5)

where, for any z ∈ Rⁿ, ‖z‖² = ⟨z, z⟩. The unique solution x* of problem (5) is called the projection of x onto Ω and is denoted by P_Ω(x). In Dykstra's method it is assumed that the projections onto each of the individual sets Ω_i are relatively simple to compute, e.g., boxes, spheres, subspaces, half-spaces, and hyperplanes, among others. The algorithm has been adapted and used for solving a wide range of different applications. For a review of Dykstra's method, its properties and applications, as well as many other alternating projection schemes, see, e.g., [15,17].
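For the individual sets used later (boxes, hyperplanes, half-spaces), the projections are available in closed form. A short sketch (our own code, using the standard formulas):

```python
import numpy as np

def proj_box(z, lo, hi):
    """Projection onto {z : lo <= z <= hi} is componentwise clipping."""
    return np.clip(z, lo, hi)

def proj_hyperplane(z, a, b):
    """Projection onto {z : a^T z = b}: shift along the normal a."""
    return z + (b - a @ z) / (a @ a) * a

def proj_halfspace(z, a, b):
    """Projection onto {z : a^T z >= b} (identity if already inside)."""
    t = (b - a @ z) / (a @ a)
    return z + max(t, 0.0) * a

z = np.array([0.8, -0.2, 0.6])
e = np.ones(3)
print(proj_box(z, 0.0, 1.0))        # [0.8 0.  0.6]
print(proj_hyperplane(z, e, 1.0))   # shifts each entry by (1 - 1.2)/3
print(proj_halfspace(z, e, 1.0))    # unchanged: e^T z = 1.2 >= 1 already
```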
Dykstra's algorithm solves (5) by generating two sequences: the iterates {x_ℓ^i} and the increments {I_ℓ^i}. These sequences are defined by the following recursive formulae:

x_ℓ^0 = x_{ℓ−1}^p,
x_ℓ^i = P_{Ω_i}(x_ℓ^{i−1} − I_{ℓ−1}^i),   i = 1, 2, …, p,    (6)
I_ℓ^i = x_ℓ^i − (x_ℓ^{i−1} − I_{ℓ−1}^i),

for ℓ ∈ Z⁺, with initial values x_0^p = x and I_0^i = 0 for i = 1, 2, …, p.
Notice that the increment I_{ℓ−1}^i associated with Ω_i in the previous cycle is always subtracted before projecting onto Ω_i. The sequence of increments plays a fundamental role in the convergence of the sequence {x_ℓ^i} to the unique optimal solution x* = P_Ω(x) of problem (5). Notice also that, for the sake of simplicity in our presentation, the projecting cyclic control index i(ℓ) used in (6) is the most common one: i(ℓ) = ℓ mod p + 1 for all ℓ ≥ 0. However, more advanced control indices can also be used, as long as they satisfy some minimal theoretical requirements; see, e.g., [17].
Boyle and Dykstra [10] established the key convergence theorem associated with algorithm (6).
Theorem 3. Let Ω_1, …, Ω_p be closed and convex sets of Rⁿ such that Ω = ∩_{i=1}^p Ω_i ≠ ∅. For any i = 1, 2, …, p and any x ∈ Rⁿ, the sequence {x_ℓ^i} generated by (6) converges to x* = P_Ω(x) (i.e., ‖x_ℓ^i − x*‖ → 0 as ℓ → ∞).

Concerning the rate of convergence, it is well known that Dykstra's algorithm exhibits a linear rate of convergence in the polyhedral case ([15,17]), which is the case in all problems considered here; see Section 5. Finally, the stopping criterion associated with Dykstra's algorithm is a delicate issue. A discussion of this topic and the development of some robust stopping criteria are fully described in [9]. Based on that, here we stop the iterations when

c_ℓ := Σ_{i=1}^p ‖I_{ℓ−1}^i − I_ℓ^i‖² ≤ ε,    (7)

where ε > 0 is a small given tolerance.
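The recursion (6) together with the stopping rule (7) translates into a few lines of code. The following sketch (ours, not the paper's Matlab implementation) projects onto the intersection of a hyperplane and a box:

```python
import numpy as np

def dykstra(x, projections, eps=1e-10, max_cycles=1000):
    """Dykstra's alternating projections onto the intersection of convex sets."""
    z = np.asarray(x, dtype=float).copy()
    incs = [np.zeros_like(z) for _ in projections]   # increments I^i
    for _ in range(max_cycles):
        c = 0.0
        for i, P in enumerate(projections):
            w = z - incs[i]                  # subtract previous increment first
            z = P(w)
            inc_new = z - w
            c += np.sum((incs[i] - inc_new) ** 2)    # stopping quantity as in (7)
            incs[i] = inc_new
        if c <= eps:
            break
    return z

e = np.ones(4)
P1 = lambda w: w + (1.0 - e @ w) / 4.0 * e   # hyperplane e^T z = 1
P2 = lambda w: np.clip(w, 0.0, 1.0)          # box [0,1]^4
z = dykstra(np.array([0.8, 0.8, 0.8, 0.8]), [P1, P2])
print(z)  # [0.25 0.25 0.25 0.25]
```

By symmetry, the projection of (0.8, 0.8, 0.8, 0.8) onto the intersection is the uniform vector, which the scheme recovers in two cycles.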
Since the gradient ∇f(x, y) of f(x, y) = f(x) + τ h(x, y) is available for each fixed τ > 0, Projected Gradient (PG) methods provide an interesting low-cost option for solving (4). They are simple and easy to code, and avoid the need for matrix factorizations (no Hessian matrix is used). There have been many different variations of the early PG methods. They all have the common property of maintaining feasibility of the iterates by frequently projecting trial steps onto the feasible convex set. In particular, a well-established and effective scheme is the so-called Spectral Projected Gradient (SPG) method; see Birgin et al. [5,6,7,8].
The SPG algorithm starts with (x_0, y_0) ∈ R²ⁿ and moves at every iteration j along the internal projected gradient direction d_j = P_Ω((x_j, y_j) − α_j ∇f(x_j, y_j)) − (x_j, y_j), where d_j ∈ R²ⁿ and α_j is the well-known spectral choice of step length (see [8]):

α_j = ⟨s_{j−1}, s_{j−1}⟩ / ⟨s_{j−1}, z_{j−1}⟩,

where s_{j−1} = (x_j, y_j) − (x_{j−1}, y_{j−1}) and z_{j−1} = ∇f(x_j, y_j) − ∇f(x_{j−1}, y_{j−1}). In case of rejection of the first trial point, (x_j, y_j) + d_j, the next ones are computed along the same direction, i.e., (x₊, y₊) = (x_j, y_j) + λ d_j, using a nonmonotone line search to choose 0 < λ ≤ 1 such that the following condition holds:

f(x₊, y₊) ≤ max_{0 ≤ i ≤ min{j, M−1}} f(x_{j−i}, y_{j−i}) + γ λ ⟨d_j, ∇f(x_j, y_j)⟩,

where M ≥ 1 is a given integer and γ is a small positive number. Therefore, the projection onto Ω must be performed only once per iteration. More details can be found in [5] and [6]. In practice γ = 10⁻⁴, and a typical value for the nonmonotone parameter is M = 10, but the performance of the method may vary with this parameter, and fine tuning may be adequate for specific applications.
Another key feature of the SPG method is that it accepts the initial spectral step length as often as possible while ensuring global convergence. For this reason, the SPG method employs a nonmonotone line search that does not impose functional decrease at every iteration. The global convergence of the SPG method, combined with Dykstra's algorithm to obtain the required projection at each iteration, can be found in [7, Section 3].
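A compact sketch of the SPG iteration (our simplified version: spectral step, nonmonotone backtracking with parameters γ and M as in the text, and a safeguarded step length), applied to a simple box-constrained quadratic:

```python
import numpy as np

def spg(f, grad, proj, x0, M=10, gamma=1e-4, max_iter=500, tol=1e-10):
    """Spectral Projected Gradient method (simplified sketch)."""
    x = proj(np.asarray(x0, dtype=float))
    g = grad(x)
    alpha = 1.0                              # initial (safeguarded) step length
    f_hist = [f(x)]                          # recent values for the nonmonotone search
    for _ in range(max_iter):
        d = proj(x - alpha * g) - x          # projected gradient direction
        if np.linalg.norm(d) <= tol:
            break
        lam, f_ref, gtd = 1.0, max(f_hist[-M:]), g @ d
        while f(x + lam * d) > f_ref + gamma * lam * gtd and lam > 1e-10:
            lam *= 0.5                       # nonmonotone backtracking
        x_new = x + lam * d
        g_new = grad(x_new)
        s, z = x_new - x, g_new - g
        sz = s @ z
        alpha = (s @ s) / sz if sz > 1e-12 else 1.0   # spectral step length
        x, g = x_new, g_new
        f_hist.append(f(x))
    return x

# minimize 0.5*||x - c||^2 over the box [0,1]^3; the solution is clip(c, 0, 1)
c = np.array([2.0, -1.0, 0.5])
sol = spg(lambda x: 0.5 * np.sum((x - c) ** 2),
          lambda x: x - c,
          lambda x: np.clip(x, 0.0, 1.0),
          np.zeros(3))
print(sol)  # [1.  0.  0.5]
```

In this toy run the projection is a simple clip; in the method of this paper the projection onto Ω is computed by Dykstra's algorithm instead.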

Cardinality constrained optimal portfolio problem
Let the vector v ∈ Rⁿ and the symmetric positive semi-definite matrix Q ≡ [σ_ij]_{i,j=1,…,n} ∈ R^{n×n} be the given mean return vector and variance-covariance matrix of the n available risky assets, respectively. The entry σ_ij of Q is the covariance between assets i and j for i, j = 1, …, n, with σ_ii = σ_i² and σ_ij = σ_ji. As a consequence of the pioneering work of Markowitz [27], the mean-variance portfolio selection problem can be formulated as (1), where the objective function is given by

f(x) = ½ xᵀQx,    (8)

and the convex set Ω = {x ∈ Rⁿ : vᵀx ≥ ρ, eᵀx = 1, 0 ≤ x ≤ up} collects the constraints of minimum expected return level ρ, the budget constraint (eᵀx = 1 means that all available wealth will be invested), and lower (x ≥ 0 excludes short sales) and upper bounds for each x_i, respectively. Notice that the minimization of f(x), involving the given covariance matrix Q, accounts for the minimization of the variance, while the return is required to be at least ρ. Notice also that, as previously discussed, in this case the set Ω is the intersection of three easy convex sets: a half-space, a hyperplane, and a box. The additional constraint in (1), ‖x‖₀ ≤ α for 0 < α < n, plays a key role here, and indicates that among the n available risky options we can invest in at most α assets (cardinality constraint). The solution vector x denotes an investment portfolio, and each x_i represents the fraction held of asset i. It should be mentioned that other inequality and/or equality constraints can be added to the problem, as they represent additional real-life requirements, e.g., transaction costs [2,23].

Now, as discussed above, our main idea is to consider the continuous formulation (2) instead of the optimization problem (1). For the portfolio selection problem we end up with the following problem involving the auxiliary vector y:

min_{x,y} ½ xᵀQx subject to: vᵀx ≥ ρ, eᵀx = 1, 0 ≤ x ≤ up, x • y = 0, eᵀy ≥ n − α, 0 ≤ y_i ≤ 1 for all i,    (9)

where the upper bound vector up ∈ Rⁿ and ρ > 0 are given. Note that the vector y appears only in the last three constraints, and the vector x appears in the first three constraints but also in the (non-convex) Hadamard constraint x • y = 0.

As discussed in Section 2, the best option to force the Hadamard condition at the solution, while keeping the feasible set of our problem as the intersection of a finite collection of easy convex sets, is to add the term τ h(x, y) to the objective function, where our convenient choice is h(x, y) = xᵀy:

f(x, y) = ½ xᵀQx + τ xᵀy,    (10)

where τ > 0 is a penalization parameter that needs to be properly chosen, as described in Section 2. Since the vectors x and y are forced by the alternating projection scheme to have all their entries greater than or equal to zero, h(x, y) = xᵀy ≥ 0 for any feasible pair (x, y), and forcing τ xᵀy = 0 is equivalent to forcing the Hadamard condition x_i y_i = 0 for all i. Notice that setting τ = 0 when solving (9) with f(x, y) given by (10) minimizes the risk independently of the Hadamard condition. On the other hand, if τ > 0 is sufficiently large compared to the size of Q, then the term xᵀy must be zero at the solution. Hence, choosing τ > 0 represents an explicit trade-off between the risk and the Hadamard condition.
Our algorithmic proposal consists in solving a sequence of penalized problems, as described in Section 2, using the SPG scheme together with Dykstra's alternating projection method (from now on denoted the SPG-Dykstra method) to solve problem (9) without the complementarity constraint x • y = 0, using the objective function given by (10). That is, for a sequence of increasing penalty terms τ_k > 0, we solve the following problems:

min_{x,y} ½ xᵀQx + τ_k xᵀy subject to: vᵀx ≥ ρ, eᵀx = 1, 0 ≤ x ≤ up, eᵀy ≥ n − α, 0 ≤ y_i ≤ 1 for all i.    (11)

Since the function h(x, y) = xᵀy satisfies the properties mentioned in Section 2, if we choose the sequence of parameters {τ_k} such that h(x_k, y_k) goes to zero as k goes to infinity, then Theorem 1 guarantees the convergence of the proposed scheme.
Before showing some computational results in the next section, let us recall that the gradient and the Hessian of the objective function f in (10) at every pair (x, y) are given by

∇f(x, y) = (Qx + τy, τx)   and   ∇²f(x, y) = [ Q  τI ; τI  0 ]  (in 2 × 2 block form).

Notice that, for any τ_k > 0, ∇²f(x, y) is symmetric and indefinite.
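Assuming the scaling f(x, y) = ½xᵀQx + τxᵀy (the paper's constant factor may differ, which does not affect the minimizers), the gradient can be checked against central finite differences; a sanity check of our own, with a small hypothetical covariance matrix:

```python
import numpy as np

tau = 2.0
Q = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.05, 0.02],
              [0.00, 0.02, 0.10]])   # hypothetical covariance matrix (symmetric PSD)
n = Q.shape[0]

def fobj(z):
    x, y = z[:n], z[n:]
    return 0.5 * x @ Q @ x + tau * (x @ y)

def grad(z):
    x, y = z[:n], z[n:]
    return np.concatenate([Q @ x + tau * y,   # gradient w.r.t. x
                           tau * x])          # gradient w.r.t. y

z = np.concatenate([np.array([0.2, 0.5, 0.3]), np.array([1.0, 0.0, 0.0])])
eps = 1e-6
num = np.array([(fobj(z + eps * e) - fobj(z - eps * e)) / (2 * eps)
                for e in np.eye(2 * n)])
err = np.max(np.abs(num - grad(z)))
print(err < 1e-6)  # True: analytic and numerical gradients agree
```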

Computational results
To add understanding and to illustrate the advantages of our proposed combined scheme, we present the results of some numerical experiments on a simple academic problem (n = 6), and also on some data sets involving real-world capital market indices from major stock markets. All the experiments were performed using Matlab R2022 with double precision on an Intel Quad-Core i7-1165G7 at 4.70 GHz with 16GB of RAM, under 64-bit Windows 10 Pro.
The algorithm we use in this section was indicated in the previous sections; for completeness, we now state it as Algorithm Penalty-SPG-Dykstra (PSPGD).
For our experiments, we set tol₁ = 10⁻⁶ and tol₂ = 10⁻⁸. We note that at any iteration k ≥ 1, Step S2 of Algorithm PSPGD starts from (x_{k−1}, y_{k−1}), the previous solution of (11) obtained using τ_{k−1}. We also note that to stop the SPG-Dykstra iterations we monitor the value of

‖P_Ω((x_k, y_k) − ∇f(x_k, y_k)) − (x_k, y_k)‖₂,

which is denoted as the pgnorm at iteration k in the tables below. It is worth recalling that if ‖P_Ω((x, y) − ∇f(x, y)) − (x, y)‖₂ = 0, then (x, y) ∈ Ω is stationary for problem (11); see, e.g., [5,7]. Concerning the nonmonotone line search strategy used by the SPG method, we set γ = 10⁻⁴ and M = 10. Each SPG iteration uses Dykstra's alternating projection scheme to obtain the required projection onto Ω, and this internal iterative process is stopped when (7) is satisfied with ε = 10⁻⁸.
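The overall structure of the method can be sketched end-to-end on a tiny hypothetical instance. This is our own simplified rendition, not the paper's Matlab code: the inner solver is a plain fixed-step projected gradient loop instead of SPG, and the penalty parameter is increased geometrically instead of via the paper's controlled rule (12); the data (Q, v, ρ, α) are made up for illustration.

```python
import numpy as np

# Hypothetical toy instance: n = 4 assets, at most alpha = 2 of them, return >= rho.
n, alpha, rho = 4, 2, 0.08
Q = np.diag([0.04, 0.05, 0.10, 0.02])      # toy diagonal covariance matrix
v = np.array([0.10, 0.12, 0.07, 0.03])     # mean returns
e = np.ones(n)

def dykstra(z, projections, eps=1e-10, max_cycles=100):
    incs = [np.zeros_like(z) for _ in projections]
    for _ in range(max_cycles):
        c = 0.0
        for i, P in enumerate(projections):
            w = z - incs[i]
            z = P(w)
            inc_new = z - w
            c += np.sum((incs[i] - inc_new) ** 2)
            incs[i] = inc_new
        if c <= eps:
            break
    return z

def proj_feasible(z):
    """Project (x, y) onto the convex constraints of (11) via Dykstra."""
    def halfspace(a, b):                   # {z : a^T z >= b}
        return lambda w: w + max((b - a @ w) / (a @ a), 0.0) * a
    def hyperplane(a, b):                  # {z : a^T z = b}
        return lambda w: w + (b - a @ w) / (a @ a) * a
    ax = np.concatenate([v, np.zeros(n)])  # return constraint acts on x only
    ae = np.concatenate([e, np.zeros(n)])  # budget constraint acts on x only
    ay = np.concatenate([np.zeros(n), e])  # e^T y >= n - alpha
    box = lambda w: np.clip(w, 0.0, 1.0)   # 0 <= x <= e and 0 <= y <= e
    return dykstra(z, [halfspace(ax, rho), hyperplane(ae, 1.0),
                       halfspace(ay, n - alpha), box])

def grad(z, tau):
    x, y = z[:n], z[n:]
    return np.concatenate([Q @ x + tau * y, tau * x])

z = np.concatenate([e / n, (n - alpha) / n * e])   # feasible-ish starting pair
zq = Q @ e
tau = (zq @ Q @ zq) / (zq @ zq)            # Rayleigh-quotient estimate of lambda_max(Q)
for k in range(8):                          # outer penalty loop
    step = 1.0 / (np.linalg.norm(Q, 2) + tau)   # 1/L step for the inner PG loop
    for _ in range(150):
        z = proj_feasible(z - step * grad(z, tau))
    x, y = z[:n], z[n:]
    if x @ y < 1e-6:                        # Hadamard product small enough: stop
        break
    tau *= 10.0                             # simple geometric increase (not rule (12))

print(np.round(x, 3), x @ y < 1e-3)         # a sparse portfolio; x^T y ~ 0
```

The final x concentrates its weight on at most α assets, while the box, budget, and return constraints are satisfied up to the Dykstra tolerance.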
To explore the behavior of Algorithm PSPGD, we will vary the minimum expected return parameter ρ > 0 and the cardinality bound α, a positive integer with 1 ≤ α < n. In all cases, we set the upper bound vector up = e, where e is the vector of ones. Of course, for certain combinations of these parameters the problem might be infeasible. We will discuss possible choices of the parameters that guarantee that the feasible region of problem (11) is nonempty.
To keep a balanced trade-off between the risk and the Hadamard condition, it is convenient to choose the initial parameter τ₋₁ > 0 of the same order of magnitude as the largest eigenvalue of Q. For that, we proceed as follows: set z = Qe and τ₋₁ = zᵀQz/(zᵀz), i.e., a Rayleigh quotient of Q with a suitable vector z, which produces a good estimate of λ_max(Q). This choice worked well for the vast majority of the test examples. According to Remark 1, to observe convergence we need to drive the inner product x_kᵀy_k down to zero. For that, we increase the penalization parameter via the update rule (12), which in practice enlarges the penalty parameter in a controlled way, taking into account the ratio between the absolute value of the current return |vᵀx_{k+1}| and the current risk x_{k+1}ᵀQx_{k+1}. In all the reported experiments, the controlled sequence {τ_k} given by (12) was enough to guarantee that the Hadamard product goes down to zero.
Concerning the choice of the expected return, based on [12,28], in order to consider feasible problems we study the behavior of our combined scheme on an interval [ρ_min, ρ_max] of possible values of the parameter ρ, obtained as follows. Let ρ_min = vᵀx_min and ρ_max = vᵀx_max, where x_min and x_max are obtained from two auxiliary problems with objective xᵀQx + τ xᵀy, subject to the constraints of (11) without the return constraint vᵀx ≥ ρ, so that ρ_min and ρ_max are, respectively, the smallest and largest attainable expected returns. These two auxiliary optimization problems are solved in advance, only once for each considered problem, using in turn the proposed Algorithm PSPGD with the same parameters and starting values indicated above. Once the interval [ρ_min, ρ_max] has been obtained, a suitable return ρ can be chosen as follows. For a fixed 0 < ε̃ < 1, if ρ_min + ε̃(ρ_max − ρ_min) ≥ 0 we set ρ = ρ_min + ε̃(ρ_max − ρ_min); else, if |ρ_min| ≤ v_max, we set ρ = ε̃|ρ_min|; otherwise we set ρ = ε̃ v_max. Here, v_min = min{v_1, …, v_n} and v_max = max{v_1, …, v_n}.
The key properties, to be discussed and illustrated in the rest of this section, are the influence of the cardinality constraint on the feasible set in the risk-return plane, the efficient frontier, and the quality of the solutions obtained by Algorithm PSPGD. The feasible set is usually represented in the risk-return plane, displaying all possible combinations of assets that satisfy the constraints. In general, the feasible set for the classical problem without the cardinality constraint has the so-called bullet shape. The efficient frontier is the set of optimal portfolios that offer the highest expected return for a given level of risk, or the lowest risk for a given level of expected return. Clearly, in the risk-return plane, the efficient frontier is the upper boundary of the feasible set.
Introducing the cardinality constraint might complicate the feasible set in the sense that the set shrinks, as we now show. Starting with the feasible interval for the expected return, we report in Table 1 that ρ_max ≤ v_max and ρ_min ≥ v_min for α = 5 and for all the considered data sets.

Let us now take a closer look at the Simple-case. If we solve the original Markowitz problem [27] (the minimal variance portfolio, i.e., min_x ½xᵀQx subject to eᵀx = 1) for the Simple-case problem, we obtain x = (0.0961, 0.1168, 0.2625, 0.2140, 0.1429, 0.1677)ᵀ, risk xᵀQx = 0.1379, and expected return vᵀx = −0.0079. Solving the same problem with the additional constraint x ≥ 0, we get the same solution. Thus, the minimal variance portfolio is the same as the minimal variance portfolio without short sales. In Figure 1, we present for the Simple-case problem the return and risk of all 6 assets, the minimal variance portfolio, denoted MVP, the classical Markowitz portfolio without short sales and with the expected return constraint vᵀx ≥ ρ = 0.002, denoted MP, as well as the efficient frontier for different values of the cardinality bound α. Clearly, for α = 6, i.e., without the cardinality constraint, we get a classical convex efficient frontier, while for smaller α values the curves are deformed.

Figure 1: Risk versus return, using Algorithm PSPGD for the Simple-case problem.
For the Simple-case problem, with n = 6 available assets, the feasible set is shown in Figure 2. We note that for larger values of α we get a larger feasible set. We also note that the bullet shape is not affected by the cardinality constraint but, as expected, the set shrinks as the number of zero elements increases. The same conclusions apply to the larger data sets coming from real assets. Below, in Figures 3 and 4, we show the feasible sets for Port1 and Port2. We note that once again the area shrinks when α decreases, and the same is true for all considered cases. The efficient frontiers for all data sets are shown in Figures 5-10. Again, we observe that the efficient frontier is deformed by the value of the cardinality bound, and when α < n it is not a convex curve. For the sake of completeness, in the Appendix we provide tables with more detailed results, varying the cardinality constraint, for all considered data sets. We can observe in all those figures and tables the effectiveness of our low-cost continuous approach (Algorithm PSPGD).

Additionally, we compare our approach with IBM ILOG CPLEX Optimization Studio, Version 22.1.0.0. CPLEX is a quadratic mixed integer programming solver. The goal of the comparison is to investigate the quality of the solutions obtained by PSPGD and CPLEX in terms of risk and return. We also report CPU time, although CPLEX is implemented in a low-level language and therefore requires significantly less execution time than our high-level Matlab implementation; hence, CPU time might be misleading. For solving the problems with CPLEX we consider the following mixed integer formulation:

min_{x,y} ½ xᵀQx subject to: vᵀx ≥ ρ, eᵀx = 1, 0 ≤ x ≤ up, x_i + y_i ≤ 1 for all i, eᵀy ≥ n − α, y_i ∈ {0, 1} for all i.

Notice that in this formulation we do not have the Hadamard constraint; instead we have x_i + y_i ≤ 1 together with y_i ∈ {0, 1}. CPLEX is designed to work with linear constraints, and since y_i = 0 or y_i = 1, the constraint x_i + y_i ≤ 1 imposes the same condition: whenever y_i = 1, we must have x_i = 0.
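The logic of this linear reformulation can be checked by brute force on a tiny instance (our own verification, with hypothetical n and α): for every binary y with eᵀy ≥ n − α, the constraints 0 ≤ x ≤ 1 and x_i + y_i ≤ 1 force x_i = 0 wherever y_i = 1, so x can have at most α nonzero components.

```python
import numpy as np
from itertools import product

n, alpha = 4, 2
max_card = 0
for bits in product([0, 1], repeat=n):
    y = np.array(bits, dtype=float)
    if y.sum() >= n - alpha:               # e^T y >= n - alpha
        x_upper = 1.0 - y                  # bound on x implied by x_i + y_i <= 1
        max_card = max(max_card, np.count_nonzero(x_upper))
print(max_card)  # 2: at most alpha components of x can be nonzero
```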
The details of the tests for all considered data sets are presented in Tables 3-9 in the Appendix. Notice that the CPLEX data is missing for Port6, as we were not able to solve that problem with CPLEX. One can easily see that PSPGD produces solutions with slightly higher risk and significantly better return. In Table 5 we observe that CPLEX needs a very large number of iterations to solve the problem for α ≤ 20, which corresponds to the fact that PSPGD needed a special value of τ₋₁ for these values of α and large values of the penalty parameter τ. Thus, this behavior is associated with the data of Port2. In some other cases, reported in the tables in the Appendix, we can observe a rather large number of CPLEX iterations for small values of α, while PSPGD solved the same problems with reasonably small values of the penalty parameters.
An interesting observation from the literature, confirmed by our experiments, is that the optimal portfolio without the cardinality constraint is in fact sparse. In Table 2 we report the number of assets obtained by our algorithm and by CPLEX, which is in accordance with the results reported in [12, Figure 5] and [13, Section 5.2.2]. We can observe that the number of assets in the unconstrained mean-variance optimal portfolio satisfies ‖x*‖₀ ≤ 12 for Port1, ‖x*‖₀ ≤ 40 for Port4, and ‖x*‖₀ ≤ 15 for Port5. As noticed above, the feasible set of (9) is contained in the feasible set of (11). In addition, since a solution of (11) that satisfies the Hadamard condition is feasible for (9), such a solution is also a solution of (9). Then, by Theorem 2, if (x*, y*) is a local minimizer of (11) satisfying ‖x*‖₀ = α, the components of y* are binary, y* is unique, and x* is a local minimizer of (1). In fact, for the solutions reported in Tables 3, 4, and 6 in the Appendix, whenever ‖x*‖₀ = α the components of y* are binary. The vector y* may also turn out binary when the cardinality constraint is not active: for instance, for Port1 with α = n = 31 we obtain a binary y* although ‖x*‖₀ = 12. Another interesting example is detected for Port3 with α = n = 89, in which we obtain a binary y* but ‖x*‖₀ = 34.

Conclusions and final remarks
Taking advantage of a recently developed continuous formulation, we have developed and analyzed a low-cost and effective scheme for solving convex constrained optimization problems that also include a "hard-to-deal-with" cardinality constraint. As in many applications, we assume that the region defined by the convex constraints can be written as the intersection of a finite collection of "easy-to-project" convex sets. Under this continuous formulation, to fulfill the cardinality constraint, the Hadamard condition x • y = 0 must be satisfied between the solution vector x and an auxiliary vector y. In our scheme this condition is achieved by adding a non-negative penalty term h(x, y) and using a classical penalization strategy. For each penalty subproblem, a convex constrained problem must be solved, which in our proposal is accomplished by combining two low-cost computational schemes: the spectral projected gradient (SPG) method and Dykstra's alternating projection method.
To illustrate the computational performance of our combined scheme, we have considered in detail the standard mean-variance limited diversified portfolio selection problem, which involves obtaining the proportion of the initial budget that should be allocated to a limited number of the available assets. For this specific application, we proposed a natural differentiable choice of the penalty term (given by h(x, y) = xᵀy) that must be driven to zero, which allowed us to develop a simple way of increasing the associated penalty parameter in a controlled and bounded way. In our numerical study we have included a variety of data sets involving real-world capital market indices. For these data sets we have produced the feasible sets and also the efficient frontiers (curves illustrating the trade-off between risk and return) for different values of the limited number of allowed assets. In each case, we highlighted the differences that arise in the shape of these efficient frontiers as compared with the unconstrained one. The presented numerical study includes a comparison with CPLEX, a professional solver for general mixed integer programming problems. The comparison is presented in terms of quality of solution (higher return, lower risk), and PSPGD appears to be competitive. Furthermore, PSPGD was successfully applied to a large portfolio problem with 600 assets, while CPLEX failed to solve this particular problem.
In our modeling of the portfolio problem we have bounded the proportion to be invested in each of the selected assets between 0 and 1. However, without altering our proposed scheme, stricter upper limits (less than 1) can be imposed on particular assets. Clearly, this would require a more careful analysis of the feasible options for the expected return. Moreover, it could also be interesting, from a portfolio point of view, to allow negative entries in some of the proportions to be invested, which can be accomplished by allowing negative values in the lower bounds of the solution vector. In that case, the penalization term used to force the Hadamard condition needs to be chosen accordingly (e.g., h(x, y) = Σ_{i=1}^n x_i² y_i).

Table 1 :
Return value with α = 5 for all data sets.

Table 2 :
Performance of Algorithm PSPGD for all cases when n = α.

Table 3 :
Performance of PSPGD and CPLEX for the Simple case.

Table 4 :
Performance of Algorithm PSPGD and CPLEX for problem Port1.

Table 5 :
Performance of Algorithm PSPGD and CPLEX for problem Port2.

Table 6 :
Performance of Algorithm PSPGD and CPLEX for problem Port3.

Table 7 :
Performance of Algorithm PSPGD and CPLEX for problem Port4.

Table 8 :
Performance of Algorithm PSPGD and CPLEX for problem Port5.

Table 9 :
Performance of Algorithm PSPGD for problem Port6.