The Boosted DC Algorithm for linearly constrained DC programming

The Boosted Difference of Convex functions Algorithm (BDCA) has been recently introduced to accelerate the performance of the classical Difference of Convex functions Algorithm (DCA). This acceleration is achieved thanks to an extrapolation step from the point computed by DCA via a line search procedure. In this work, we propose an extension of BDCA that can be applied to difference of convex functions programs with linear constraints, and prove that every cluster point of the sequence generated by this algorithm is a Karush--Kuhn-Tucker point of the problem if the feasible set has a Slater point. When the objective function is quadratic, we prove that any sequence generated by the algorithm is bounded and R-linearly (geometrically) convergent. Finally, we present some numerical experiments where we compare the performance of DCA and BDCA on some challenging problems: to test the copositivity of a given matrix, to solve one-norm and infinity-norm trust-region subproblems, and to solve piecewise quadratic problems with box constraints. Our numerical results demonstrate that this new extension of BDCA outperforms DCA.

Observe that we can rewrite problem (P) as an unconstrained nonsmooth DC optimization problem, whose objective function is g + ι F − h, where ι F denotes the indicator function of the feasible set For solving this problem, one can apply the classical DC Algorithm (DCA) in [28,15].DC programming and the DCA have been developed and studied for more than 30 years [15].The DCA has been successfully applied in different fields such as machine learning, financial optimization, supply chain management, and telecommunication, see, e.g.[17,11,24].Nowadays, DCA has become a useful method to solve nonconvex problems.
To accelerate the convergence of DCA, which can be slow for some problems, a new method called Boosted DC Algorithm (BDCA) has been recently proposed in [1,3].The key idea of BDCA is to perform an extrapolation step via a line search procedure at the point computed by DCA at each iteration.This step allows the algorithm to take longer steps than the classical DCA, achieving in this way a larger reduction of the objective value per iteration.In addition to accelerating its convergence, BDCA may have better chances to escape from bad local optima thanks to the line search procedure, see [3,Example 3.3].Therefore, BDCA is not only faster than DCA but also can provide better solutions.Extensive numerical experiments in diverse applications such as biochemistry [1], machine learning [36,22] and data science [3] have been performed where BDCA clearly outperforms DCA.Note that all these applications are modeled as DC problems with g smooth.However, it is important to emphasize that, for unconstrained DC programs, the BDCA proposed in [1,3] is not applicable when the function g in (P) is nonsmooth (see [3,Example 3.4]).
The aim of this paper is to show that BDCA can still be applied if the nonsmooth function g is the sum of a smooth convex function and the indicator function of a polyhedral set.More precisely, we will show that it is possible to use BDCA for solving DC programs with linear constraints of the form (P). To compare the performance of DCA and BDCA, we provide numerical experiments on various challenging problems: to test the copositivity of a given matrix, to solve 1 and ∞ trust-region subproblems, and to find the minimum of a piecewise quadratic function with box constraints.The three first problems have a quadratic objective function and are known to be NP-hard [23] (the first was already heuristically investigated in [8] using DCA), while the last problem has a nonsmooth objective function.Our results confirm that BDCA significantly outperforms DCA in these applications.
The rest of this paper is organized as follows.Section 2 recalls some preliminary results.In Section 3, we propose a new variant of BDCA for solving (P) and prove that the objective value of the sequence generated by the algorithm is monotonically decreasing, and that any limit point of the sequence is a KKT point of the problem.The R-linear convergence of any sequence generated by BDCA in the special case of quadratic objective functions is derived in Section 4. In Section 5, we provide some numerical experiments for testing the copositivity of a given matrix, for solving 1 and ∞ trust-region subproblems and for solving piecewise quadratic problems with box constraints, where we compare BDCA and DCA.Finally, some conclusions and future research are briefly discussed in Section 6.

Preliminaries
In this section, we state our assumptions imposed on (P).We also recall some preliminary and basic results which will be used in the sequel.
Let f : R n → R ∪ {+∞} be a proper extended real-valued convex function.The set dom f := {x ∈ R n | f (x) < +∞} denotes its (effective) domain, and Recall that f is said to be strongly convex with strong convexity parameter Given a set C ⊆ R n , its interior, its relative interior and the convex cone generated by C are denoted by intC, riC and coneC, respectively.The function denotes the indicator function of C, and C is convex if and only if ι C is so.Moreover, its subdifferential ∂ ι C is the normal cone to C, which is given by For proving our convergence results, we will make use of the following assumptions.
Assumption 1.Both g and h are strongly convex on their domain with the same strong convexity parameter ρ > 0.
Assumption 2. The function h is subdifferentiable at every point in dom h; i.e., ∂ h(x) = / 0 for all x ∈ dom h.The function g is continuously differentiable on an open set containing dom h and inf Assumption 3. The feasible set F has a Slater point, that is, there exists x ∈ R n such that a i , x < b i , for all i = 1, . . ., p.
Remark 2.1.Assumption 1 is not restrictive, in the sense that any DC decomposition of φ as φ = g − h, can be expressed as φ = (g 0 holds for all x ∈ ri dom h (by [34,Theorem 23.4]), so the first part of Assumption 2 is clearly satisfied if dom h = R n .A key point of our method is the smoothness of g in Assumption 2, which cannot be in general omitted (see [3,Example 3.4]).
Associated with problem (P), we can construct its dual as which is also a DC program with the same optimal value.Indeed, The DC algorithm is a primal-dual method in the sense that it is aimed to find solutions for both (P) and (D), simultaneously.Our goal then is to design a BDCA variant that allow us to find KKT points of (P) and critical points of (D).
To finish this section, we need to introduce the following notions regarding the geometry of the feasible set F .The cone of feasible directions at x ∈ F is denoted by Due to the polyhedral structure of F , its normal cone coincides with the active cone; i.e., where I(x) stands for the set of active constraints at x, i.e., Since we deal with affine constraints, we have (see e.g.[2, Proposition 4.14]) 3 The Boosted DC Algorithm and its convergence For solving (P), we propose the following method, Algorithm 1, which can tackle more general problems than the Boosted DC Algorithm proposed in [3].
(i) Lines 1 to 6 of Algorithm 1 correspond to the classical DCA for solving (P).Thus, when λ k = 0 for all k, Algorithm 1 coincides with DCA.
(ii) Lines 7 to 15 present the boosting step.It first checks if d k is a feasible direction at y k ∈ F .If so, it then performs a line search step along the direction d k which maintains feasibility to improve the objective value φ .Otherwise, the boosting step is skipped and we simply use the DCA point y k .
(iii) In terms of per-iteration complexity, the boosting step requires to check the feasibility of direction d k , which can be done by comparing the sets of active constraints at x k and y k (see Lemma 3.1).It also requires evaluating the objective function and checking the feasibility of the trial step y k + λ k d k .The computational effort of this task will depend on the particular structure of φ and F .In the case of box constraints, the largest step-size that makes y k + λ k d k feasible can be efficiently computed in Line 8, see Remark 3.1 below.In Section 5 we show some computational results for the trustregion subproblems with 1 norm constraints (where feasibility needs to be checked) and ∞ norm constraints (where the largest step-size can be readily computed).
(iv) When h is differentiable, the algorithms introduced by Fukushima and Mine in [10,20] can be applied in our setting.On the one hand, the algorithm in [10] performs a line search which is similar to the one in Lines 9-11 of Algorithm 1, but with the significant difference that it is performed at the point x k , instead of doing it at the DCA point y k ; that is, it searches for the smallest non-negative integer l such that where 0 < β < 1.Thus, the largest step-size allowed by their algorithm is 1, which corresponds with the point determined by the DCA, since x k + d k = y k .On the other hand, the algorithm defined in [20] performs an exact line search in the direction d k , which may be unaffordable in many practical applications.
Remark 3.1.An explicit formula to compute the largest step-size that makes y k + λ k d k feasible in Line 8 is provided in [9].More precisely, feasibility of the trial step size is guaranteed if λ k is chosen so that see [9,Lemma 5.4(ii)].In principle, this permits to avoid the extra time consumed for checking feasibility in Line 8.However, the computation of λ k in (4) may be even more expensive and inefficient than checking feasibility of λ k in some applications.This is the case, for instance, of the 1 -norm trust region subproblem, whose reformulation as a linearly constrained DC program requires an exponential number of constraints (see Section 5.2).
The next auxiliary lemma shows the equivalence between Line 7 of Algorithm 1 and checking the feasibility of the direction generated by DCA.
Proof.Observe that, for any i ∈ I(y k ), it holds that Hence, the result easily follows by taking into account (3).
In the following proposition, we collect some key inequalities which are useful in the sequel for the convergence analysis of Algorithm 1. Proposition 3.1.Under Assumptions 1 and 2, for all k ∈ N, the next statements hold: (iii) if the condition at Line 7 of Algorithm 1 holds, then there exists some δ k > 0 such that y k + λ k d k ∈ F and Consequently, the backtracking step at Lines 9-11 of Algorithm 1 terminates after a finite number of iterations.
Proof.The proof of (i) is similar to the one of [1, Proposition 3] and is therefore omitted.
To prove (ii), pick any v ∈ ∂ h(y k ).Note that the one-sided directional derivative φ (y k ; d k ) is given by by convexity of h.Since y k is the unique solution of the strongly convex problem (P k ), we can write down the KKT conditions (see, e.g., [2,Theorem 4.20]) of this problem as The fact that h is strongly convex with a parameter ρ implies, by [35,Exercise 12.59], that ∂ h is strongly monotone with constant ρ.Therefore, since v ∈ ∂ h(y k ) and Hence, combining these expressions, together with the fact that x k ∈ F , we can derive and the result follows by combining the last inequality with (5).
Having in mind the condition at Line 7 of Algorithm 1 and Lemma 3.1, we observe that the proof of (iii) is similar to the one of [3, Proposition 3.1], so we omit it for brevity.Remark 3.2 (General convex constraints).Consider a generalized version of (P) where the feasible set is formed by arbitrary convex constraints, i.e., where g and h satisfy Assumptions 1 and 2 and c i : R n → R are smooth, proper, closed and convex functions, for i = 1, . . ., p.Note that problem (P) is a particular instance of (P ) with c i (x) := a i , x − b i , for i = 1, . . ., p.The assertion in Proposition 3.1(ii) still holds true for the more general problem (P ); that is, the direction generated by DCA remains a descent direction provided that x k is feasible for (P ).To confirm this, one can easily check that the proof can be rewritten by replacing the linearity of the gradients by the inequality However, Line 7 of Algorithm 1 is no longer useful to verify if d k is a feasible direction, as the equality in (3) only holds for affine constraints.For general convex constraints, we have the inclusion Therefore, one possibility would be to run the boosting step whenever ∇c i (y k ), d k < 0 for all i ∈ I(y k ).Nevertheless, this will never be the case because x k is feasible for (P ).Indeed, from ( 7), we obtain that In fact, it can be proved that if y k + λ d k ∈ F for some particular λ > 0, then the points in the segment [x k , y k + λ d k ] must be active for all i ∈ I(y k ).
We are now in the position to establish the main convergence result of Algorithm 1.
Theorem 3.1.Under Assumptions 1, 2 and 3, for any x 0 ∈ F , either BDCA returns a KKT point of (P) or it generates an infinite sequence such that the following statements hold.
(i) φ (x k ) is monotonically decreasing and hence convergent to some φ .
(ii) Suppose that {x k } and {u k } are bounded.Then any limit point of {x k } is a KKT point of (P) and any limit point of {u k } is a critical point of (D).
Proof.If Algorithm 1 is terminated at Line 5 and returns x k , then x k = y k .From ( 2) and ( 6), it is clear that x k is a KKT point of (P).Otherwise, by Proposition 3.1 and Line 15 of Algorithm 1, we have where λ k ≥ 0. Therefore, the sequence {φ (x k )} converges to some φ , since it is monotonically decreasing and bounded from below, by (1).As a consequence, we obtain . Now, if x is a limit point of {x k }, then there exists a subsequence x k j converging to x.Then, as y k j − x k j → 0, we have y k j → x.From (6), we obtain Since the sequence {u k } is bounded by assumption, without lost of generality we may assume that u k j → u.It also holds that ∇g(y k j ) → ∇g(x) by continuity of ∇g.The sequence of Lagrange multipliers {µ k j } ∈ R p must also be bounded.Indeed, suppose to the contrary that µ k j → ∞ and assume without loss of generality that lim j→∞ µ k j µ k j = µ * , with µ * ∈ R p + (the non-negative orthant) satisfying µ * = 1.Then, dividing the first equality in (9) by µ k j and letting j → ∞, we obtain Performing the same procedure in the second equality in (9) gives so we deduce that µ * i = 0 for all i ∈ I(x).Thus, Thanks to the Slater Assumption 3, we know that there exists x ∈ R n such that a i , x < b i for all i ∈ I(x).Hence, which implies µ * i = 0 for all i ∈ I(x), since µ * ∈ R p + .This implies that µ * = 0 p , a contradiction with the fact that µ * = 1.
Taking the limit as j → ∞ in (9), thanks to the closedness of the graph of ∂ h (see [34,Theorem 24.4]), we obtain which means that x is a KKT point of (P).From (11) we derive that x ∈ ∂ h * (u) and also that which is equivalent to x ∈ ∂ (g + ι F ) * (u) and, hence, u is a critical point of (D).The proof of (iii) is similar to that of [1, Proposition 5(iii)] and is thus omitted.
Remark 3.3.The assertion in Theorem 3.1(ii) remains valid for any limit point of {x k } in the interior of dom h without requiring the dual sequence {u k } to be bounded.Indeed, let x ∈ int dom (h) be a cluster point of {x k } and let {x k j } be a subsequence converging to x.We can apply [34,Corollary 24.5.1] to obtain that for any ε > 0, there exists a positive integer where B denotes the Euclidean unit ball of R n .Hence, since ∂ h(x) is a bounded set, the dual subsequence {u k j } is also bounded and the proof of Theorem 3.1(ii) stands.
Remark 3.4.Similar to [1, Theorem 1] and [3, Theorem 4.3 and Theorem 4.9], if we further assume that the function φ satisfies the Kurdyka-Łojasiewicz property, then it can be proved that the sequence {x k } converges to a KKT point of (P).Moreover, convergence rates can also be deduced depending on the Łojasiewicz exponent.Especially, when the objective function φ is quadratic (e.g., in some of our numerical experiments), it was proved [18, Theorem 4.2] that the function φ + ι F satisfies the Kurdyka-Łojasiewicz property with exponent 1 2 .Combining this with the technique in [1, Theorem 1], it is a routine task to derive the linear convergence of the sequence {x k } when the sequence has a cluster point.The purpose of the next section is to prove, using a similar technique to [32, Theorem 2.1], that the latter condition is not needed: for quadratic functions, BDCA always generates a sequence which is linearly convergent to a KKT point of the problem without requiring the existence of a cluster point.

Linear convergence for quadratic objective functions
In this section we prove the R-linear convergence of BDCA when the objective function φ of (P) is quadratic, that is, for problems of the form where Q ∈ R n×n is a symmetric matrix, q ∈ R n , and a i ∈ R n and b i ∈ R, for i = 1, . . ., p.In this setting, x is a KKT point of (P Q ) if there exist multipliers µ 1 , µ 2 , . . ., µ p ∈ R such that We denote by F the set of all KKT points of (P).
As Q is not required to be positive semidefinite, (P Q ) is a nonconvex problem.However, the matrix Q can be easily decomposed as , where λ max (Q) is the largest eigenvalue of Q and I denotes the identity matrix.Thus, (P Q ) can be equivalently written in the form of (P) as with g(x) := σ 2 x 2 + q, x and h(x Observe that both functions g and h are strongly convex with parameters σ and σ −λ max (Q), respectively.Thus, Assumption 1 holds for while Assumption 2 trivially holds.By using the indicator function ι F of the feasible set F , we can rewrite problem (P Q ) as an unconstrained nonsmooth DC optimization problem, which can be tackled by the DCA.In this case, the DCA becomes the projected gradient method from convex programming [30] where P F denotes the projection mapping onto the feasible set F .
To derive the R-linear convergence of the iterative sequence generated by the BDCA, we will use the following lemmas.The first one is a classical result regarding the connected components of the KKT set F and can be found in [19,Lemma 3.1].
Lemma 4.1.Let F 1 , F 2 , . . ., F r be the connected components of the KKT set F .Then we have and the following properties hold: (i) each F i is the union of finitely many polyhedral convex sets; (ii) the sets F i , i = 1, 2, . . ., r, are properly separated from each others, that is The second one is a local error bound result originally stated in [19,Theorem 2.3] and later extended in [32, Lemma 2.1].

Lemma 4.2.
There exist scalars ε > 0 and τ > 0 such that for all x ∈ F with We are now in a position to establish the R-linear (aka geometric) convergence of the iterative sequence {x k } generated by BDCA.Geometric convergence is a special type of Rlinear convergence, which is implied by Q-linear convergence (see, e.g.[2, Section 5.2.1]).The next result extends [32, Theorem 2.1] to the BDCA.Its proof employs similar techniques, which are originally based on [19].
Theorem 4.1.If (P Q ) has a solution and Assumption 3 holds, then the sequence {x k } generated by BDCA converges geometrically to a KKT point x of (P Q ), that is, there exist some constants C > 0 and η ∈ ]0, 1[ such that Proof.First, observe that by (8) we have By Theorem 3.1(i), we know that the right-hand side of this inequality converges to zero as k → ∞.Thus, lim Let ε > 0 and τ > 0 be such that ( 16) and ( 17) hold.Since y k = P F x k − 1 σ (Qx k + q) , there exists some k 0 ∈ N such that Hence, we obtain Due to the fact that F is nonempty and closed, there exists which implies lim it follows from ( 19) and ( 21) that Now, let F 1 , F 2 , . . ., F r be the connected components of F .By Lemma 4.1(ii) and ( 22) there exists The last assertion of Lemma 4.1 implies that Note that, since z k is a KKT point of (P Q ), we have From Theorem 3.1(i) we know that lim k→∞ φ (x k ) = φ .Hence, for all k ≥ k 1 , We prove now that φ ≤ c.Indeed, on the one hand, from ( 8) and ( 23), we have for all On the other hand, since y k = P F x k − 1 σ (Qx k + q) and z k ∈ F , we deduce (see, e.g., [4, Theorem 3.16]) that Therefore, Combining ( 25) and ( 26), we obtain for all k ≥ k 1 that From (20), we have Hence, we deduce from (27) that where . Passing (28) to the limit as k → ∞, we get φ = lim k→∞ φ (x k+1 ) ≤ c.
The latter together with (24) imply that φ = c.Therefore, it follows from ( 28) and (8) that or, equivalently, Since {φ (x k )} is monotonically decreasing to φ , we deduce from the last inequality that where Hence, from ( 8) and ( 29), we obtain Then, for all k ≥ k 1 and m ≥ 1, we have This implies that {x k } is a Cauchy sequence and hence converges to a point x ∈ F .According to Theorem 3.1(ii) and Remark 3.3, combined with the fact that dom h = R n is an open set, we must have that x is a KKT point of (P Q ).Moreover, passing the inequality (30) to the limit as m → ∞, we obtain which concludes the proof.

Numerical experiments
The purpose of this section is to compare the performance of BDCA (Algorithm 1) against the classical DCA.We refrain from comparing the algorithms against other methods, as our only aim here is to show that it is more advantageous to apply BDCA than DCA whenever this is possible, not to show that these methods are the most efficient for the problems that we consider.Consequently, we tested both algorithms in the three different settings that can occur, depending on whether the feasible set is unbounded, bounded with general constraints, and bounded with box constraints (so the feasibility in the line search step can be ensured).We applied both algorithms for testing copositivity [8] (unbounded) and for solving the 1 and ∞ trust-region subproblem [7,12] (bounded, with box constraints in the case of ∞ ).In our last experiment we test the performance on piecewise quadratic problems with box constraints, whose objective function is thus nonsmooth.All these problems clearly satisfy Assumptions 1-3 and admit a DC decomposition with g being a quadratic function.Thus, subproblem (P k ) can be solved by computing a projection onto the feasible set F , analogous to that stated in equation (15).The projector onto the feasible set in our applications can be directly computed.All the codes were written in Python 3.7 and the tests were run on an Intel Core i7-4770 CPU 3.40GHz with 32GB RAM, under Windows 10 (64-bit).

Testing copositivity
Recall that a given n × n matrix A is said to be copositive if Ax, x ≥ 0, for all x ∈ R n + , where R n + stands for the non-negative orthant.Copositivity has recently attracted considerable attention in mathematical optimization, see e.g.[5,6,8,25].The problem of determining whether a given matrix is not copositive is known to be NP-complete [23] and it can be recast as the following non-convex optimization problem min The copositivity of A is now equivalent to min x∈R n + φ (x) = 0.In [8], the authors reformulated (31) as a DC problem according to the decomposition in ( 12)-( 13), and applied DCA as a heuristic for testing whether a matrix is not copositive.To be more specific, given σ > max {λ max (A), 0}, the reformulation of problem (31) as a DC problem becomes Under this decomposition, DCA is applied as a heuristic to determine the copositivity of a given matrix as follows: if at some iterate φ (x k ) < 0, then the matrix is non-copositive; otherwise, if a critical point is reached, the instance is undecidable.
Copositive matrices play an important role in graph theory.The size of the largest complete subgraph contained in a given graph G, denoted by γ(G), is known as the clique number of G.If A and E are the adjacency matrix of G and the matrix of all ones, respectively, it can be shown (see [14,Corollary 2.4]) that Therefore, the matrix µ(E − A) − E will be copositive if µ ≥ γ(G) and non-copositive otherwise.Furthermore, in the latter case, the matrix will be closer to the copositive cone as µ approaches γ(G) from the left.
In our tests we considered matrices constructed as follows.Let G be the cycle graph of n nodes whose adjacency matrix, A cycle = (a i j ) ∈ R n×n , is given component-wise by Its clique number is clearly γ(G) = 2. Hence, the matrix is copositive for all µ ≥ 2 and non-copositive for µ < 2. In fact, when µ = 2 it coincides with the so-called Horn matrix H n (see, e.g., [13,Section 4]).For instance, the Horn matrix H 5 takes the form Experiments In our numerical tests we used the parameter setting as α := 0.01 and β := 0.1.
The trial step-size λ k in the boosting step of BDCA (Line 8 of Algorithm 1) was chosen to be self-adaptive as in [3].This technique proceeds as follows.At the first iteration, choose any λ 0 > 0.Then, for k ≥ 1, if the line search has never been used, we take λ k = λ 0 .Otherwise, if the two previous trial step-sizes have been directly accepted (without being reduced by the backtracking step), then the last accepted positive λ is scaled by a factor of γ > 1 and used as the current trial step-size.If that is not the case, the trial step-size is set as the last positive value of λ accepted in previous iterations.In our tests we used λ 0 := 1 and γ := 2.
In our first numerical experiment, we considered Horn matrices of different sizes, H n , for n ∈ {1000, 1250, . . ., 5000}.For each size, DCA and BDCA were run from the same 100 starting points randomly generated in the intersection of the non-negative orthant with the unit ball.We stopped the algorithms when d k ≤ 10 −9 for the first time.The results are shown in Figure 1, where we can observe that, on average, BDCA was more than 15 times faster than DCA for all sizes.As expected, since Horn matrices are copositive, both algorithms converged to critical points with a positive objective value very close to 0. It is worth to mention that the objective function at the points found by BDCA was usually smaller than at the ones found by DCA.We also show in Figure 2 the percentage of iterations at which the boosting step was activated, that is, when condition in line 7 of Algorithm 1 was fulfilled.We observed that, for all sizes, the linesearch was performed in around the 45% of the iterations.In Figure 3 we show the behavior of both algorithms in a particular instance for testing the copositivity of H 1000 .2: Ratio of the number of iterations at which the boosting step of BDCA was activated with respect the number of iterations performed for testing the copositivity of Horn matrices of order n ∈ {1000, 1250, . . ., 5000}.For each size, we represent this ratio for 100 random starting points (green crosses) and the median ratio among all of them (white circle).In our second experiment we considered matrices of the form Q µ n as defined in (32).In order to generate hard instances (those which are close to be copositive) we took µ := 1.9.For each size n ∈ {1000, 1250, . . ., 5000}, DCA and BDCA were run from the same 100 random starting points generated as in our previous experiment.In this case, we let the algorithms run until they find a negative objective value (which exists because of the noncopositivity of the matrices).We used two stopping criteria, whose results are depicted in Figure 4: on the left, the algorithms were stopped when any negative objective value was found; on the right, the objective value was required to be smaller than −10 −4 .We do not show any results on the second criterion for n greater than 2000 because DCA becomes extremely slow (for n = 2000, the instances solved by DCA required more than 5 minutes on average).This time the advantage of BDCA with respect to DCA increased with the size n, and was significantly greater than in the previous experiment when the second criterion was used.
Remark 5.1.Clearly, BDCA iterations will be more time consuming than those of DCA due to the need of checking condition in Line 7 and the linesearch procedure.Note that the linesearch requires of function evaluations of f , which may be expensive at some applications.For this reason, in our experiments we compare the total CPU running time of the algorithm, as it includes the possibly wasted extra time of the boosting step.

Solving the 1 and ∞ trust-region subproblem
The trust-region subproblem (TRSP) arises in trust-region methods, which are optimization algorithms that consist in replacing the original objective function by a model which is a good approximation of the original function at the current iterate.The models are usually defined to be a quadratic function, in which case the task consists in solving nonconvex  The application of the DCA for solving the Euclidean trust-region subproblem (the most common choice for the norm) was first proposed in [29] and its convergence was further studied in [16,31,33].Thanks to the structure of the trust-region subproblem, the implementation of the DCA is very simple and efficient, as the iterations are given by (15) and only require matrix-vector products.On the other hand, as observed in [7,Section 7.8] and [12], the choice of the Euclidean norm for the definition of the trust-region has several drawbacks, especially when the problem at hand is box-constrained, in which case the intersection of the Euclidean ball of the trust region with the feasible set has a complicated structure.The choice of the ∞ norm • ∞ for the trust-region when the problem involves bounds of the type l ≤ x ≤ u permits to ensure feasibility by simply requiring similar bounds on the trust-region subproblem.Unfortunately, the 1 and ∞ trust-region subproblem are NP-hard [23], a class of problems for which it is commonly believed that there are no polynomial time algorithms.
In this subsection we compare the performance of DCA and BDCA on the challenging 1 and ∞ trust-region subproblems.To this aim, we consider the DC decomposition ( 12)-( 13) of the (possibly) nonconvex part of the objective function.Thus, given σ > max {λ max (A), 0}, the subproblems for the 1 norm take the form (P ∞ ) Observe that the feasible set of (P 1 ) can be equivalently written in terms of 2 n linear constraints of the type ±x 1 ± x 2 ± • • • ± x n ≤ r, so the problem is a particular instance of (P).In principle, as mentioned in Remark 3.1, the largest step-size derived in [9] ensuring feasibility could be computed.However, as the feasible set has 2 n linear constraints, computing (4) would be very time-consuming.Nonetheless, given a point y k and a feasible direction d k , an upper bound of the maximum value of the step-size λ k for (P 1 ) is given by the Euclidean norm, since where so one must always choose λ k ≤ λ k to avoid extra time in checking feasibility.On the other hand, for the ∞ norm, the situation is more favorable, as feasibility can be guaranteed whenever which coincides with (4), so no time will be wasted in Step 8 of Algorithm 1 if one takes λ k satisfying the latter inequalities.
Experiments We used the same parameter setting as in the previous section for Algorithm 1, except for the value of γ, which was increased to 20.The matrix A and the vector b were generated with coordinates randomly and uniformly chosen in ] − 1, 1[, while the value of r was randomly and uniformly chosen in the range ]0, √ n/4[ for (P 1 ) and ]0, 1/4[ for (P ∞ ).For each size n ∈ {1000, 1500, . . ., 5000}, DCA and BDCA were run from the same 100 starting points randomly picked inside the trust-region.We stopped the algorithms when d k / x k ≤ 10 −8 for the first time.The time comparison for both norms are summarized in Figure 5. BDCA was consistently faster than DCA.On average, it was 3.8 times faster for solving (P 1 ) and 3.65 times faster for (P ∞ ).
We also show in Figure 6 the percentage of iterations at which the boosting step was activated.We observe that, for all sizes, the linesearch was performed on average in around 80% of the iterations for the case of the 1 norm, and 40% for the ∞ norm.
When applied to (P 1 ), both algorithms obtained the same value in 875 instances, BDCA reached a smaller objective value in 18 instances, while DCA did so in 7. When applied to (P ∞ ), BDCA obtained a smaller objective value in 461 instances, while DCA did so in 417 (the value was the same in 22 instances).In Figure 7 we show the behavior of both algorithms for a particular instance with n = 1000, which was chosen so that both algorithms attained the same objective value.
We conclude this section with some comments about the rate of convergence of DCA and BDCA.According to Theorem 4.1, both methods are R-linearly convergent, so by (18),    6: Ratio of the number of iterations at which the boosting step of BDCA was activated with respect the number of iterations performed for solving randomly generated 1 (left) and ∞ (right) trust-region subproblems in R n , with n ∈ {1000, 1250, . . ., 5000}.For each size, we represent this ratio for 100 random starting points (green crosses) and the median ratio among all of them (white circle).
for some η ∈ ]0, 1[.In Figure 8, for a particular random instance, we have represented the sequence { x k − x 1/k }, which was computed for both algorithms after obtaining the limit point x with a higher precision.We observe that the value of η is very close to 1 for DCA, while line-searches clearly helped improving this slow rate.A possible explanation about the bad R-linear convergence rate of DCA can be deduced from the proof of Theorem 4.1.The upper bound obtained there is β ρ+β , with β > σ .When λ max (Q) > 0, Assumption 1 holds for ρ as in (14), which would then be equal to σ − λ max (Q).Therefore, where the last inequality holds since the left term is an increasing function with respect to β .In our numerical tests we took σ − λ max (Q) = 0.01 (we numerically observed how larger values slowed down both DCA and BDCA).In the random instances shown in Figure 8 λ max (Q) was equal to 25.77 for the 1 norm and 25.89 for the ∞ norm, which would give upper bounds on the R-linear convergence rate of 0.9996 for both problems.According to the figures, these bounds seem to be tight, particularly for the ∞ norm.

Piecewise quadratic problems with box constraints
Finally, we test BDCA on optimization problems with nonsmooth objective function.To this aim, we consider piecewise quadratic problems with linear constraints of the form Experiments For our numerical tests, we generated problems of the form (33) as follows.
First, the vector l ∈ R n was generated with coordinates randomly chosen in ] − 5, 5[.Then, we randomly generated a vector e ∈ R n with coordinates in ]0, 5[ and we set u := l + e.Finally, the points c 1 , . . ., c m ∈ R n were generated so that they became unfeasible for all constraints as follows: each component i ∈ {1, . . ., n} of each of these vectors was randomly picked in one of the intervals ]l i − 10, l i [ or ]u i , u i + 10[.In our experiment, for each n ∈ {100, 150, . . ., 1000} and each m ∈ {100, 150, . . ., 1000}, DCA and BDCA were run from the same 100 starting points randomly picked inside the feasible set.We used the same parameter setting and stopping criterion for the algorithms as in the previous TRSP experiments.The time comparison is shown in Figure 9, where we represent the median ration between DCA and BDCA among the 100 instances.Detailed results for all 100 runs are shown in Figure 10

Concluding remarks
We have extended the Boosted DC Algorithm for solving linearly constrained DC programming.The algorithm is proved to provide KKT points of the constrained problem.In addition, we have shown why this approach cannot be extended to more general convex constraints.The theoretical results were confirmed by some numerical experiments for testing the copositivity of matrices and for solving 1 and ∞ trust-region subproblems.For copositive matrices, BDCA was on average fifteen times faster than DCA; for non-copositive ones, this advantage was much more superior; and for trust-region subproblems, BDCA was more than three times faster than DCA.We also considered piecewise quadratic problems with box constraints, which thus have nonsmooth objective functions.We observed again that BDCA was faster than DCA and, further, that the advantage was more noticeable as the number of pieces of the objective function increased.Future research includes investigation of alternative approaches to derive a Boosted DCA that permits to address any type of constrained DC programs.It would also be interesting to combine BDCA with the inertial technique employed in [27].Figure 11: Ratio of the number of iterations at which the boosting step of BDCA was activated with respect the number of iterations performed for solving randomly generated piecewise quadratic problems with m pieces in R n , for n ∈ {100, 200, . . ., 1000} and m = 500 (left), and for n = 500 and m ∈ {100, 200, . . ., 1000} (right).For each size, we represent this ratio for 100 random starting points (green crosses) and the median ratio among all of them (white circle).

Lemma 3 . 1 .
If x k and y k are generated by Algorithm 1, then

Figure 1 :
Figure1: Comparison between DCA and BDCA for checking the copositivity of Horn matrices of order n ∈ {1000, 1250, . . ., 5000}.For each size, we represent the ratios of the running time between DCA and BDCA for 100 random starting points (blue crosses) and the median ratio among all of them (white circle).

Figure
Figure2: Ratio of the number of iterations at which the boosting step of BDCA was activated with respect the number of iterations performed for testing the copositivity of Horn matrices of order n ∈ {1000, 1250, . . ., 5000}.For each size, we represent this ratio for 100 random starting points (green crosses) and the median ratio among all of them (white circle).

Figure 3 :
Figure3: Value of the objective function of DCA and BDCA (using logarithmic scale in the left axis) as well as the step-size used in BDCA (right axis, dotted blue line), with respect to the iteration, for checking the copositivity of the Horn matrix of order n = 1000 from the same random starting point.

Figure 4 : 1 2
Figure 4: Comparison between DCA and BDCA for detecting the non-copositivity of matrices of various orders n.For each size, we represent the ratios of the running time between DCA and BDCA for 100 random starting points (blue crosses) and the median ratio among all of them (white circle).

Figure 5 :
Figure5: Comparison between DCA and BDCA for solving randomly generated 1 (left) and ∞ (right) trust-region subproblems in R n , with n ∈ {1000, 1250, . . ., 5000}.For each size, we represent the ratios of the running time between DCA and BDCA for 100 random starting points (blue crosses) and the median ratio among all of them (white circle).

Figure
Figure6: Ratio of the number of iterations at which the boosting step of BDCA was activated with respect the number of iterations performed for solving randomly generated 1 (left) and

Figure 7 :
Figure 7: Value of the objective function of DCA and BDCA (using logarithmic scale in the left axis) as well as the step-size used in BDCA (right axis, dotted blue line), with respect to the iteration, for solving a randomly generated 1 (top) and ∞ (bottom) trustregion subproblem in R 1000 from the same random starting point.Both algorithms attained the same objective value φ .

Figure 8 : 7 .x
Figure 8: Comparison on the rate of R-linear convergence of DCA and BDCA for solving the same randomly generated 1 (top) and ∞ (bottom) trust-region subproblems in R 1000 from Figure 7.
(a)  for fixed n = 500 and Figure10(b) for fixed m = 500.From the results of this comparison we infer that the superiority of BDCA increases as m does, while it stabilizes as n increases.As in previous experiments, we show in Figure11the percentage of iterations at which the boosting step was activated.The objective value attained by DCA and BDCA were the same in all instances.

Figure 9 :
Figure 9: Comparison between DCA and BDCA for solving randomly generated piecewise quadratic problems with m pieces in R n , for n ∈ {100, 200, . . ., 1000} and m ∈ {100, 200, . . ., 1000}.For each pair (n, m), we represent the median of the ratios of the running time between DCA and BDCA for 100 random starting points.

Figure 10 :
Figure 10: Comparison between DCA and BDCA for solving randomly generated piecewise quadratic problems with m pieces in R n , for n ∈ {100, 200, . . ., 1000} and m = 500 (left), and for n = 500 and m ∈ {100, 200, . . ., 1000} (right).For each problem, we represent the ratios of the running time between DCA and BDCA for 100 random starting points (blue crosses) and the median ratio among all of them (white circle).