Dual-density-based reweighted $\ell_{1}$-algorithms for a class of $\ell_{0}$-minimization problems

Optimization problems with sparsity arise in many areas of science and engineering, such as compressed sensing, image processing, statistical learning and sparse data approximation. In this paper, we study dual-density-based reweighted $\ell_{1}$-algorithms for a class of $\ell_{0}$-minimization models which can be used to model a wide range of practical problems. This class of algorithms is based on certain convex relaxations of a reformulation of the underlying $\ell_{0}$-minimization model. The reformulation is a special bilevel optimization problem which, in theory, is equivalent to the underlying $\ell_{0}$-minimization problem under the assumption of strict complementarity. Some basic properties of these algorithms are discussed, and numerical experiments have been carried out to demonstrate their efficiency. A comparison of the numerical performance of the proposed methods and the classic reweighted $\ell_1$-algorithms is also made in this paper.


Introduction
Let ‖x‖_0 denote the number of nonzero components of the vector x. We consider the ℓ_0-minimization problem

min { ‖x‖_0 : ‖y − Ax‖_2 ≤ ϵ, Bx ≤ b },    (1)

where A ∈ R^{m×n} and B ∈ R^{l×n} are two matrices with m ≪ n and l ≤ n, y ∈ R^m and b ∈ R^l are two given vectors, ϵ ≥ 0 is a given parameter, and ‖x‖_2 = (∑_{i=1}^n |x_i|^2)^{1/2} is the ℓ_2-norm of x. In compressed sensing (CS), the parameter ϵ denotes the level of the measurement error η = y − Ax. Clearly, the problem (1) is to find the sparsest point in the convex set

T = { x ∈ R^n : ‖y − Ax‖_2 ≤ ϵ, Bx ≤ b }.    (2)

The constraint Bx ≤ b is motivated by practical applications. For instance, many signal recovery models include extra constraints reflecting certain special structures or prior information of the target signals. The model (1) is general enough to cover several important applications in compressed sensing [4,5,11,12], 1-bit compressed sensing [18,22,32] and statistical regression [20,23,27,30]. The following two models, (C1) and (C2), are clearly special cases of (1). The problem (C1) is often called the standard ℓ_0-minimization problem [6,15,32]. Some structured sparsity models, including the nonnegative sparsity model [5,6,15,32] and the monotonic sparsity model (isotonic regression) [31], are also special cases of the model (1).
Clearly, directly solving the problem (1) is generally very difficult since the ℓ_0-norm is a nonlinear, nonconvex and discrete function. Algorithms have been developed for special cases of the problem, such as (C1) and (C2), over the past decade, including convex optimization and heuristic methods [11,13,15,32]. For instance, by replacing the ℓ_0-norm in problem (1) with the ℓ_1-norm, we immediately obtain the ℓ_1-minimization problem

min { ‖x‖_1 : x ∈ T },    (3)

where T is given by (2). A more efficient class of models than (3) is the so-called weighted ℓ_1-minimization model [7,16,32,35], which for (C1) and (C2) replaces the objective by ‖Wx‖_1, where W = diag(w) is a diagonal matrix and w ∈ R^n_+ is a weight vector. A single weighted ℓ_1-minimization is not efficient enough to outperform the standard ℓ_1-minimization. As a result, the reweighted ℓ_1-algorithm has been developed, which consists of solving a series of individual weighted ℓ_1-minimization problems [1,2,7,16,32,35]. Taking (C1) as an example, this method solves a series of reweighted ℓ_1-problems in which k denotes the kth iteration and the weight w^k is updated by a certain rule; for example, a first-order method yields a good updating scheme for w^k. The convergence of some reweighted algorithms has been shown under certain conditions [8,21,32,35]. Reweighted ℓ_1-minimization may perform better than ℓ_1-minimization in sparse signal recovery when the initial point is suitably chosen (see, e.g., [7,8,14,21,32,35]). Although this paper focuses on reweighted algorithms, it is worth mentioning that other types of algorithms for ℓ_0-minimization problems have also been widely studied in the CS literature, such as orthogonal matching pursuit [13,24,29], compressive sampling matching pursuit [15,25], subspace pursuit [9,15], thresholding algorithms [3,10,13,15], and the newly developed optimal k-thresholding algorithms [33].
Recently, a new framework of reweighted algorithms for sparse optimization problems was proposed in [34,36], derived from the perspective of the dual density. The key idea is to use the complementarity between the solutions of the ℓ_0-minimization problem and a theoretically equivalent weighted ℓ_1-minimization problem. This complementarity property makes it possible to reformulate the ℓ_0-minimization problem as an equivalent bilevel optimization problem which seeks the densest solution of the dual problem of a weighted ℓ_1-problem. In this paper, we generalize this idea to the ℓ_0-minimization problem (1) and develop new dual-density-based algorithms through convex relaxations of the bilevel optimization. More specifically, to solve the model (1), we consider the problem

min { ‖Wx‖_1 : x ∈ T },    (4)

which is the weighted ℓ_1-minimization problem associated with the problem (1) for a given weight w ∈ R^n_+. The dual-density-based reweighted ℓ_1-algorithms for (1) are directly derived from the relaxation of the bilevel-optimization reformulation of the problem (1). To this end, we develop a sufficient condition for strict complementarity between the solutions of the weighted ℓ_1-minimization problem associated with (1) and the solutions of its dual problem. We propose three types of convex relaxations of the bilevel optimization problem in order to develop our dual-density-based ℓ_1-algorithms for the problem (1).
The paper is organized as follows. In Section 2, we recall merit functions for sparsity, give a few examples of such functions, and introduce the classic reweighted ℓ_1-algorithms. Section 3 is devoted to the development of a sufficient condition for the strict complementarity property to hold. In Section 4, we show that the ℓ_0-problem (1) can be reformulated equivalently as a bilevel optimization problem which, in theory, can generate an optimal weight for weighted ℓ_1-minimization problems. In Section 5, we discuss several new relaxation strategies for this bilevel optimization problem, based on which we develop the dual-density-based reweighted ℓ_1-algorithms for the problem (1). Finally, we demonstrate some numerical results for the proposed algorithms.
Notation: The ℓ_p-norm on R^n is defined as ‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p}, where p ≥ 1. The identity matrix of a suitable size is denoted by I. The n-dimensional Euclidean space is denoted by R^n; R^n_+ and R^n_{++} are the sets of nonnegative and positive vectors, respectively, and R^n_− is the set of nonpositive vectors. The complement of a set S ⊆ {1, ..., n} is denoted by S̄, i.e., S̄ = {1, ..., n} \ S. For a given vector x ∈ R^n and S ⊆ {1, ..., n}, x_S is the subvector of x supported on S.

Preliminary
In this section, we recall the notion of merit functions for sparsity and list a few examples. We also briefly outline the classic reweighted ℓ_1-methods for the problem (1). A function is called a merit function for sparsity if it approximates the ℓ_0-norm in some sense [32,35]. Certain concave functions are known to be good candidates for merit functions for sparsity [7,19,32,34,35]. As pointed out in [35,36], we may choose a family of merit functions of the separable form

Ψ_ε(s) = ∑_{i=1}^n ϕ_ε(s_i),

where ϕ_ε is a function from R_+ to R_+. Ψ_ε(s) satisfies the following properties:
• (P1) for any given s ∈ R^n_+, Ψ_ε(s) tends to ‖s‖_0 as ε tends to 0;
• (P2) Ψ_ε(s) is twice continuously differentiable with respect to s in an open neighborhood of R^n_+;
• (P3) ϕ_ε(s_i) is concave and strictly increasing with respect to every s_i ∈ R_+.
We denote the set of such merit functions by F. The following merit functions, in which ε ∈ (0, 1), satisfy (P1)-(P3) and have been used in [35,36]. In this paper, we also use the merit function (8), where ε > 0; it is easy to show that (8) belongs to the set F.
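As a concrete illustration of property (P1) (this numerical check is ours, not from the paper), consider the separable merit function built from the rational component function ϕ_ε(s) = s/(s + ε), which appears later in the paper in connection with DDA(III). Summing ϕ_ε over the absolute values of the components gives a smooth quantity that tends to the number of nonzeros as ε shrinks:

```python
# Sketch: a separable merit function Psi_eps(s) = sum_i phi_eps(s_i)
# with phi_eps(s) = s/(s + eps) approximates the "l0-norm" (the number
# of nonzero components) as eps -> 0, which is property (P1).

def psi(s, eps):
    """Separable merit function Psi_eps(s) = sum_i s_i/(s_i + eps), s >= 0."""
    return sum(si / (si + eps) for si in s)

def l0_norm(x):
    """Number of nonzero components of x."""
    return sum(1 for xi in x if xi != 0)

x = [3.0, 0.0, -0.5, 0.0, 1e-2]
s = [abs(xi) for xi in x]          # Psi_eps is applied to |x|
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, psi(s, eps))        # tends to ||x||_0 = 3 as eps shrinks
```

Each zero component contributes exactly 0 to the sum, while each nonzero component contributes a value approaching 1, which is why the approximation sharpens monotonically as ε decreases.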
In order to compare with the algorithms proposed in later sections, we briefly introduce the classic reweighted ℓ_1-method. Following the ideas in [35] and [32], replacing ‖x‖_0 with Ψ_ε(|x|) ∈ F leads to an approximation (9) of the problem (1). By using the first-order approximation of Ψ_ε(t) ∈ F at the point t^k, the problem (9) can be approximated by a linear optimization problem (10), which is used to generate the new iterate (x^{k+1}, t^{k+1}). Since Ψ_ε(t) is strictly increasing with respect to each t_i ∈ R_+, the iterate (x^k, t^k) must satisfy t^k = |x^k|. This yields the classic reweighted ℓ_1-minimization method described in [32].
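The first-order step just described can be spelled out explicitly (a sketch consistent with the derivation above; the paper's own display may differ in bookkeeping). With the separable form Ψ_ε(t) = ∑_i ϕ_ε(t_i), linearizing at t^k = |x^k| gives

```latex
% First-order (linear) approximation of the merit function at t^k:
\Psi_\varepsilon(t) \;\approx\; \Psi_\varepsilon(t^k)
      + \nabla\Psi_\varepsilon(t^k)^T (t - t^k),
\qquad
\big(\nabla\Psi_\varepsilon(t^k)\big)_i = \varphi_\varepsilon'(t_i^k),
% so the linearized subproblem is a weighted l1-step with weights
w_i^{k} \;=\; \varphi_\varepsilon'\big(|x_i^{k}|\big), \qquad i = 1,\dots,n.
```

Concavity of ϕ_ε (property (P3)) makes ϕ'_ε nonincreasing, so smaller components of |x^k| receive larger weights, which is exactly the mechanism that drives the iterates toward sparsity.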
Algorithm 1 Reweighted ℓ_1-algorithm (RA). Input: initial weight w^0, iteration index k, and the maximum number of iterations k_max.

Main step:
At the current iterate x^{k−1}, solve the weighted ℓ_1-minimization problem with the weight w^{k−1} to obtain x^k, and update the weight accordingly. Repeat the above main step until k = k_max (or some other stopping criterion is met).
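To make the main step concrete, here is a minimal, self-contained toy (ours, not the paper's Matlab/CVX implementation): a two-variable instance whose feasible set is the line {x ∈ R^2 : x_1 + x_2 = 1}, so each weighted ℓ_1-subproblem can be solved by brute force over a grid of the parameter t in x = (t, 1 − t). The weight update w_i = 1/(|x_i| + ε) is the common first-order rule for a log-type merit function, used here only for illustration:

```python
# Toy sketch of Algorithm 1 (RA). The feasible set is the line
# {x in R^2 : x_1 + x_2 = 1}, parametrized by x = (t, 1 - t).
# Each weighted l1-subproblem min ||Wx||_1 is solved by brute force
# over a grid of t; the update w_i = 1/(|x_i| + eps) is the standard
# first-order rule (an illustrative choice, not the paper's exact scheme).

def solve_weighted_l1(w, grid):
    """Brute-force argmin of w[0]*|t| + w[1]*|1 - t| over the grid."""
    t = min(grid, key=lambda u: w[0] * abs(u) + w[1] * abs(1.0 - u))
    return (t, 1.0 - t)

def reweighted_l1(k_max=5, eps=1e-4):
    grid = [i / 100.0 for i in range(-200, 201)]   # t in [-2, 2]
    w = (1.0, 1.0)                  # w^0: plain l1-minimization
    x = solve_weighted_l1(w, grid)
    for _ in range(k_max):
        # first-order weight update: small components get large weights
        w = (1.0 / (abs(x[0]) + eps), 1.0 / (abs(x[1]) + eps))
        x = solve_weighted_l1(w, grid)
    return x

x = reweighted_l1()
print(x)   # -> (0.0, 1.0), a 1-sparse point of the feasible line
```

After the first iteration the smaller component receives a very large weight, so subsequent subproblems pin it to zero; the iterates settle at a 1-sparse feasible point, illustrating why a series of reweighted subproblems can outperform a single ℓ_1-minimization.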
Based on the generic convergence of revised Frank-Wolfe algorithms (FW-RD) for a class of concave functions in [26], the generic convergence of the algorithm RA can be obtained (see [26] for details): there exists a family of merit functions Ψ_ε ∈ F such that RA converges to a stationary point of the problem. The convergence of RA to a sparse point in the case of linear-system constraints can be found in [32].

Duality, strict complementarity and optimality condition
To develop the dual-density-based reweighted ℓ_1-algorithms, we first discuss the duality and optimality conditions of the model (4), and we give a sufficient condition for strict complementarity to be satisfied for the model (4).

Duality and complementary condition
By introducing two variables t ∈ R^n and γ ∈ R^m such that |x| ≤ t and γ = y − Ax, we can rewrite (4) as the following problem:

min { w^T t : ‖γ‖_2 ≤ ϵ, γ = y − Ax, |x| ≤ t, Bx ≤ b }.    (11)

Obviously, (11) is equivalent to (4). Additionally, if w ∈ R^n_{++}, then any solution (x*, t*, γ*) to (11) must satisfy |x*| = t* and γ* = y − Ax*, and the following relation between the solutions of (4) and (11) is obvious.
If x is optimal to the problem (4), then (x, |x|, y − Ax) is optimal to the problem (11). Moreover, if (x, t, γ) is optimal to the problem (11), then x is optimal to the problem (4).
Let λ = (λ_1, ..., λ_6) be the dual variables; the dual problem of (11) is denoted by (12). Strong duality between (11) and (12) can be guaranteed under suitable conditions, and the following result follows from classic optimization theory [28]. Lemma 3.2. Let the Slater condition hold for the convex problem (11), i.e., there exists a feasible point of (11) lying in ri(T), where ri(T) is the relative interior of T. Then there is no duality gap between (11) and its dual problem (12). Moreover, if the optimal value of (11) is finite, then there exists at least one optimal Lagrangian multiplier such that the dual optimal value is attained.
In this paper, we assume that the Slater condition holds for (11). Clearly, the optimal value of (11) is finite when w is a given vector, and hence strong duality holds for (11) and (12) and the dual optimal value is attained. In practice, the set Ω = {x : Ax = y, Bx ≤ b} is not empty, since y and b are measurements of signals. Thus the Slater condition is a very mild sufficient condition for strong duality to hold for the problems (11) and (12). It is well known that for any convex minimization problem with differentiable objective and constraint functions for which strong duality holds, the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient optimality conditions for the problem and its dual [28]. Since the Slater condition holds for (11), by Lemma 3.2, the optimality condition for (11) can be stated as follows.

Strict complementarity
For nonlinear optimization models, the strict complementarity property might not hold in general. However, it is possible to develop a condition under which strict complementarity holds for the model (4) or (11). We now develop such a condition for the problems (11) and (12) under the following assumption.
Assumption 3.6. Let W = diag(w) satisfy the following properties:
• (G1) the problem (4) with w has an optimal solution which is a relative interior point of the feasible set T, denoted by x* ∈ ri(T);
• (G2) the optimal value Z* of (4) is finite and positive, i.e., Z* ∈ (0, ∞);
• (G3) w_j ∈ (0, ∞] for all 1 ≤ j ≤ n.
Next we prove the following theorem concerning strict complementarity for (11) and (12) under Assumption 3.6. Theorem 3.8. Let y and b be two given vectors, A ∈ R^{m×n} and B ∈ R^{l×n} be two given matrices, and w be a given weight satisfying Assumption 3.6. Then there exists a pair ((x*, t*, γ*), λ*), where (x*, t*, γ*) is an optimal solution to (11) and λ* = (λ*_1, ..., λ*_6) is an optimal solution to (12), such that t* and λ*_6 are strictly complementary. Proof. Note that (G1) in Assumption 3.6 implies the Slater condition for (11). This, combined with (G2), indicates by Lemma 3.2 that the duality gap is zero and the optimal value Z* of (12) is attained. For any given index j with 1 ≤ j ≤ n, we consider a series of minimization problems (16). The dual problem (17) of (16) can be obtained by the same method used to derive the dual problem of (11). Here e^j is the vector whose jth component is 1 and whose remaining components are 0, i.e., e^j_i = 1 if i = j and e^j_i = 0 if i ≠ j.
Next we show that (16) and (17) satisfy the strong duality property under Assumption 3.6. It can be seen that (x, t, γ) is a feasible solution to (16) if and only if (x, t, γ) is an optimal solution of (11), i.e., if and only if x is optimal to (4). If w satisfies the conditions in Assumption 3.6, then there exists an optimal solution x̂ of (4) such that ‖y − Ax̂‖_2 < ϵ, Bx̂ ≤ b and w^T|x̂| = Z*, which means there is a relative interior point (x̂, t̂, γ̂) of the feasible set of (16). As a result, strong duality holds for (16) and (17) for all j. Moreover, due to (G2) and (G3), w is positive and Z* is finite, so t_j cannot be ∞; thus the optimal value of every jth minimization problem (16) is finite. It follows from Lemma 3.2 that for each j the duality gap between (16) and (17) is zero, and each dual problem (17) attains its optimal value. We use ξ*_j to denote the optimal value of the jth primal problem (16). Clearly, ξ*_j is nonpositive, i.e., ξ*_j ≤ 0.
This means that the supports of all strictly complementary pairs of (11) and (12) are invariant. Otherwise, there would exist an index j such that t^1_j > 0 and (λ^2_6)_j > 0, leading to a contradiction.

Bilevel model for optimal weights
For weighted ℓ_1-minimization, how to determine a weight that guarantees exact recovery, sign recovery or support recovery of sparse signals is an important issue in CS theory. Based on the complementarity condition and the strict complementarity discussed above, we may develop a bilevel optimization model for such a weight, which is called an optimal weight in [34], [36] and [32]. Definition 4.1 (Optimal weight). A weight is called an optimal weight if the solution of the weighted ℓ_1-problem with this weight is one of the optimal solutions of the ℓ_0-minimization problem.
If w* satisfies Assumption 3.6, then the Slater condition is automatically satisfied for (11) with w*, and (21) is also valid. Moreover, by Theorem 3.8, there exists a strictly complementary pair (|x*(w*)|, λ*_6(w*)). If w* is an optimal weight (see Definition 4.1), then λ*_6(w*) must be the densest slack variable among all w ∈ ζ, and locating a sparse vector can be converted into seeking the densest dual variable. Inspired by this fact, we develop a theorem under Assumption 4.2 which claims that finding a sparsest point in T is equivalent to seeking a weight w such that the dual problem (12) has the densest optimal variable λ_6. Such weights are optimal weights and can be determined by a certain bilevel optimization problem. This idea was first introduced by Zhao and Kočvara [34] (and also by Zhao and Luo [36]) to solve the standard ℓ_0-minimization problem (C1). In this paper, we generalize their idea to solve the model (1) by developing new convex relaxation techniques for the underlying bilevel optimization problem. Before that, we make the following assumption. Assumption 4.2. Let ν be an arbitrary sparsest point in T given in (2). There exists a weight ŵ ≥ 0 such that
• (H1) the problem (4) with ŵ has an optimal solution x̂ such that ‖x̂‖_0 = ‖ν‖_0;
• (H2) there exists an optimal variable of (12) with ŵ, denoted λ̂, such that λ̂_6 and x̂ are strictly complementary;
• (H3) the optimal value of (4) with ŵ is finite and positive.
An example for the existence of a weight satisfying Assumption 4.2 is given in the remark following the next theorem.
where W = diag(w) and T is given by (2). If (w*, λ*) is an optimal solution to the above optimization problem (22), then any optimal solution x* of the weighted ℓ_1-minimization problem (4) with the weight w* is a sparsest point in T.
Given Assumption 4.2 and the Slater condition, finding a sparsest point in T is equivalent to seeking the densest dual solution via the bilevel model (22).
By the definition of optimal weights, Theorem 4.3 implies that w* is an optimal weight, by which a sparsest point can be obtained via (4). If no weight satisfies the properties in Assumption 4.2, a heuristic method for finding a sparse point in T can still be developed from (21), since an increase in ‖λ_6(w)‖_0 leads to a decrease of ‖x(w)‖_0 to a certain level. Before we close this section, we make some remarks on Assumption 4.2.

Dual-density-based algorithms
Note that bilevel optimization problems are generally difficult to solve. We now develop three types of relaxation models for the bilevel optimization problem (22).

Relaxation models
Zhao and Luo [36] presented a method to relax a bilevel problem similar to (22). Motivated by their idea, we now relax our bilevel model. We focus on relaxing the difficult constraint −λ_1 ϵ − λ_2^T b + λ_3^T y = min_x { ‖Wx‖_1 : x ∈ T } in (22). By replacing the objective function ‖λ_6‖_0 in (22) with Ψ_ε(λ_6) ∈ F, where λ_6 ≥ 0, we obtain an approximation problem (29) of (22). We recall the set of weights ζ given in (20). It can be seen that w being feasible to (29) implies that (11) and (12) satisfy strong duality and have the same finite optimal value, which is equivalent to w ∈ ζ when the Slater condition holds for (11). Moreover, the constraints of (29) indicate that for any given w ∈ ζ, any λ satisfying the constraints of (29) is optimal to (12). Therefore the purpose of (29) is to find the densest dual optimal variable λ_6 over all w ∈ ζ; thus (29) can be rewritten as the problem (30), which in turn can be presented as (32). An optimal solution of (32) can be obtained by maximizing Ψ_ε(λ_6), which is based on maximizing −λ_1 ϵ − λ_2^T b + λ_3^T y over the feasible set of (32). Therefore, Ψ_ε(λ_6) and −λ_1 ϵ − λ_2^T b + λ_3^T y are required to be maximized over the dual constraints λ ∈ D(w) for all w ∈ ζ. To maximize both objective functions, we consider the following model as the first relaxation of (30):

max { Ψ_ε(λ_6) + α(−λ_1 ϵ − λ_2^T b + λ_3^T y) : λ ∈ D(w), w ∈ ζ },

where α > 0 is a given small parameter. We can then develop the second type of relaxation of the bilevel optimization (22); note that under the Slater condition, for all w ∈ ζ, the dual objective −λ_1 ϵ − λ_2^T b + λ_3^T y equals the finite optimal value of (11).

The entries of A and B (if B is not deterministic) are generated from Gaussian random variables with zero mean and unit variance. For each generated (x*, A, B), we set y and b accordingly, where d ∈ R^l_+ has entries generated as absolute values of Gaussian random variables with zero mean and unit variance, and c_1 ∈ R and c ∈ R^m are generated as Gaussian random variables with zero mean and unit variance. Then the convex
set T is generated; all examples of T are generated in this way. Our default stopping criterion is (49), where x′ is the solution found by the algorithm, and a success is counted whenever (49) is satisfied. In our experiments, we run 200 random examples for each sparsity level. All the algorithms are implemented in Matlab 2018a, and all the convex problems are solved by CVX (Grant and Boyd [17]).
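The experimental setup described above can be sketched as follows. This is a hedged reconstruction, not the paper's Matlab/CVX driver: the noiseless choice y = Ax* and the tolerance `tol=1e-4` in the success test are our assumptions for illustration, since the exact forms of y and of criterion (49) are not reproduced here.

```python
import random

def gen_instance(m, n, l, k, seed=0):
    """Generate one random trial in the spirit of the paper's setup:
    Gaussian A (m x n) and B (l x n), a k-sparse Gaussian target x*,
    and measurements y = A x* (noiseless -- an assumed choice) and
    b = B x* + d with slack d >= 0, so that x* satisfies Bx <= b."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    B = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(l)]
    x_star = [0.0] * n
    for i in rng.sample(range(n), k):      # k randomly placed nonzeros
        x_star[i] = rng.gauss(0, 1)
    y = [sum(Ai[j] * x_star[j] for j in range(n)) for Ai in A]
    d = [abs(rng.gauss(0, 1)) for _ in range(l)]   # nonnegative slack
    b = [sum(Bi[j] * x_star[j] for j in range(n)) + d[i]
         for i, Bi in enumerate(B)]
    return A, B, y, b, x_star

def is_success(x_found, x_star, tol=1e-4):
    """Success test in the spirit of (49): the recovered vector is
    within tol of the target in the l2-norm (tol is an assumed value)."""
    err = sum((a - c) ** 2 for a, c in zip(x_found, x_star)) ** 0.5
    return err <= tol
```

Repeating `gen_instance` over 200 seeds per sparsity level k and counting `is_success` outcomes reproduces the shape of the success-rate curves reported in the figures.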
To demonstrate the performance of the dual-density-based reweighted ℓ_1-algorithms listed in Table 1, we mainly consider two cases in our experiments, the first being (N1) B = 0 and b = 0 (in which case the constraint Bx ≤ b is trivial). For all cases, we implement the algorithms DRA(I)-DRA(VI) and compare their performance in finding sparse vectors in T with that of ℓ_1-minimization and the algorithm PRA with different merit functions. Before that, we test the performance of the one-step dual-density-based algorithm and compare it with ℓ_1-minimization.
We choose (5) and (6) for DDA(II), and ϕ_ε((λ_6)_i) = (λ_6)_i/((λ_6)_i + ε), (λ_6)_i ∈ R_+, in f(λ_6) for DDA(III). Setting the parameters (m, n, ϵ, ε, α, γ, σ^2) = (50, 200, 10^{−4}, 10^{−5}, 10^{−5}, 1, 1) and running 200 random examples for each sparsity level (ranging from 1 to 25), we carry out the experiments for DDA(II) with (6) and DDA(III) with (J3), and compare their performances with ℓ_1-minimization, as shown in Figure 1. Clearly, in this case, the performance of these algorithms is quite similar to that of ℓ_1-minimization (3). It can be seen that the dual-density-based reweighted algorithms perform better as the number of iterations increases, and all of them outperform ℓ_1-minimization, while the performance of DRA(I) with one or five iterations is similar to that of ℓ_1-minimization. Panels (i)-(iii) indicate the same phenomenon: the algorithms based on (47) may achieve more improvement than the ones based on (46) as the number of iterations increases. For example, in (ii), the success rate of DRA(VI) with five iterations improves by nearly 25% over one iteration for each sparsity level from 14 to 20, while DRA(V) improves by only about 10% after increasing the number of iterations. We select the best-performing algorithms from panels (i)-(iii) of Figure 2 and merge them into panel (iv) of Figure 2.
It can be seen that DRA(II), DRA(IV) and DRA(VI) slightly outperform CWB with ε = 0.1 and ARCTAN with ε = 0.1; CWB is one of the most efficient choices among the existing reweighted algorithms. We compare the reweighted ℓ_1-algorithms with updating rules (46) and (47) in panels (i) and (ii) of Figure 4, respectively. For the algorithms using (46) with 5 iterations, Figure 4 (i) shows that DRA(III) and DRA(V) perform much better than DRA(I). For the algorithms using (47) with 5 iterations, Figure 4 (ii) indicates that the success rates of DRA(II) and DRA(VI) in finding sparse vectors in T are very similar. The other behaviors are similar to the case B = 0 and b = 0. Finally, we carry out experiments to show how the parameter ε of the merit functions affects the performance of the dual-density-based reweighted ℓ_1-algorithms in locating sparse vectors in T. Numerical results for PRA-type algorithms and dual-density-based reweighted algorithms with different ε indicate that the performance of the DRA-type algorithms is relatively insensitive to the choice of small ε compared with the PRA-type algorithms.

Conclusions
In this paper, we have studied a class of algorithms for the ℓ_0-minimization problem (1): the one-step dual-density-based algorithms (DDA) and the dual-density-based reweighted ℓ_1-algorithms (DRA). These algorithms are based on new relaxations of the equivalent bilevel optimization reformulation of the underlying ℓ_0-minimization problem. Unlike PRA, the DRA can automatically generate an initial iterate instead of obtaining it by solving an ℓ_1-minimization problem. Numerical experiments show that in cases such as (N1) and (N2), the dual-density-based methods proposed in this paper can perform better than ℓ_1-minimization in solving the sparse optimization problem (1), and are comparable to some existing reweighted ℓ_1-methods.

Figure 1.: The performance of DDA(I) and DDA(II) in finding the sparsest points

Figure 3.: (i)-(iii) Comparison of the performance of DRA with one iteration and five iterations. (iv) Comparison of the performance of the DRA and PRA.