A revision of the rectangular algorithm for a class of DC optimization problems

Every continuously differentiable function can be represented as a difference between a convex function and an additively separable convex function. We show that a DC function with this structure can be optimized using the rectangular algorithm for separable nonconvex optimization, and develop a revision to this algorithm for practical use. We also report some numerical results which indicate the effectiveness of the revision.


Introduction
The DC optimization includes most optimization problems arising in a variety of applications because every continuously twice differentiable function is a DC function, represented as a Difference of two Convex functions on any compact convex set. In fact, for such a class of function f : R n → R and a convex set ⊂ R n , it is shown in [6,14] that g(x) = f (x) + ρ x 2 is convex if ρ ≥ −(1/2) min y T ∇ 2 f (x)y | x ∈ , y = 1 , and therefore f is decomposable into a DC function: To solve this optimization problem of high generality, Tuy proposed a conical algorithm for global optimization in 1987 [13], and Tao and Hoai-An published in 1997 the so-called DCA (DC Algorithm) based on local optimization [12]. Since the 2000s, various studies with DC functions have been conducted mainly in the field of machine learning, and accordingly local optimization algorithms such as DCA have also been widely used there (see e.g., [1,7,16]). In contrast to this, any of global optimization algorithms, including the conical algorithm, The author was partially supported by a Grant-in-Aid for Scientific Research (C) 19K11837 from Japan Society for the Promotion of Sciences.
B Takahito  have not yet reached the stage of practical application, though discussed in detail in textbooks [9,10,15]. In general, it is not easy to detect a DC representation suitable for each algorithm, such as the conical algorithm and DCA, from a given continuous function. However, even a simple DC representation like (1) has the noteworthy feature that the subtrahend is additively separable. In this paper, we show that a DC function with the same structure can be globally optimized on a bounded polyhedron using the rectangular algorithm [3], a branch-and-bound algorithm for separable nonconvex optimization problems. Since the algorithm requires solving a sequence of convex minimization problems, we revise it so that the convex minimization procedure can be warm-started for efficient processing.
The organization of this paper is as follows. In Section 2, after formalizing the problem setting, we illustrate how the rectangular algorithm behaves on this problem. In Section 3, in order to warm-start the convex minimization procedure, we further relax the convex relaxation solved in the bounding process. We also show that the branching process can be carried out with ω-subdivision using the optimal solution of the modified convex relaxation. In Section 4, we summarize the algorithm and analyze its convergence. Finally, in Section 5, we report the numerical results, which indicate that the revised rectangular algorithm is promising for practical application in some fields.

DC optimization and the rectangular algorithm
Let g be a convex function from R n to R, and h 1 , . . . , h n strictly convex functions from R to R. The problem considered in this paper is a DC optimization problem of the form: where A ∈ R m×n and b ∈ R m . Let us denote the feasible set by We assume that D is a bounded polyhedron with a nonempty interior and contained in the effective domain of g. Let For each j, we have s 1 j < t 1 j , and assume that the interval [s 1 j , t 1 j ] lies in the effective domain of h j . We can locate a globally optimal solution of (2) by almost directly applying the rectangular algorithm designed for separable nonconvex optimization [3], as is illustrated below.

Behavior of the rectangular algorithm on (2)
In the rectangular algorithm, M 1 is subdivided into a number of subrectangles M i , i ∈ I , such that where int( · ) represents the set of interior points. Let M = [s, t] = M i for an arbitrary i ∈ I . If D ∩ M = ∅, then M contains no optimal solution of (2) and is discarded from consideration. Otherwise, bounding is performed to see if an optimal solution of (2) is given by a subproblem Since this is essentially the same problem as (2), we cannot solve (P) directly. Instead, (P) is approximated into a convex minimization problem by replacing h j 's involved in the objective function f with linear functions: Thus a convex relaxation of (P) is constructed as follows: We can apply any one of the standard optimization algorithms to (R) (see e.g., [2]) and obtain an optimal solution, denoted ω M . From the definition of c M j , it is easy to see that c M Furthermore, from the strict convexity of h j , we have the following. .
Since both c M j and d are linear, d(x) < c M j (x) holds for any x ∈ (u, v). As noticed above, u cannot agree with s j , and hence s j lies in (u, v).
We see from Proposition 2.1 that where the equality holds if x lies on a vertex of M, while Therefore, the optimal value f M (ω M ) of (R) is a lower bound for the subproblem (P). If f M (ω M ) ≥ f (x * ) holds for the incumbent x * , the best known feasible solution of (2), then M contains no feasible solution better than x * and is excluded from further consideration. Otherwise, branching is performed and M is subdivided for further examination.
How to subdivide M greatly affects the convergence of the algorithm. It is naturally guaranteed with an exhaustive subdivision rule such as bisection, but the algorithm also converges under the ω-subdivision rule given below. ω-subdivision Whatever the subdivision rule is, if the algorithm does not terminate, it generates a sequence of nested rectangles: Let f i and ω i denote the objective function and the optimal solution, respectively, of (R) defined for M = M i . Under the ω-subdivision rule, we have the following result, the proof of which is almost the same as for separable concave minimization (see e.g., 7.1.6 in [15]).

Proposition 2.2 If the sequence (4) is generated according to
The rectangular algorithm repeats updating the incumbent x * while alternating the above branching and bounding processes. If we select the rectangle M with the smallest f M (ω M ) to subdivide in the branching process, Proposition 2.2 guarantees that every accumulation point of the sequence {ω i } is an optimal solution of (2). To terminate the algorithm in a finite amount of time, M is discarded usually if the following pruning criterion is satisfied for a prescribed tolerance ∈ (0, 1):

Revision of the rectangular algorithm
The rectangular algorithm presented in the preceding section requires solving the convex relaxation (R) iteratively. To solve the target problem (2) with sufficient accuracy, we have to solve tens of thousands of convex minimization problems even for a small scale instance, as will be seen in Section 5. Solving them one by one from scratch would take a huge amount of time and ruin the potential of the rectangular algorithm. In this section, to avoid such a case, we discuss a 'warm-start' which allows a convex minimization algorithm to solve the current relaxation with the solution of the last solved relaxation as the initial solution.

Relaxation of the convex relaxation (R)
Let ω denote the solution to the last solved relaxation. After obtaining ω , which rectangle to select next for bounding depends on the rule of searching the branching tree, and the rectangle M defining the current relaxation (R) is not guaranteed to contain ω . In the numerical experiments reported in Section 5, the Frank-Wolfe algorithm [4] was used to solve (R). If ω is feasible for (R), the algorithm uses ω as the initial solution, but otherwise, it requires preprocessing to find a feasible solution. In order to save this effort, we propose to relax (R) further into the following, so that ω is feasible for the current relaxation without fail: Since the feasible set is the same as (2), this problem is assumed to have an optimal solution, denoted . This weakness is, however, offset by the advantage that ω can be used as the initial solution for a convex minimization algorithm of the type generating a sequence of feasible solutions. In addition to the benefit of warm-start, when D has some favorable structure like a graph, (R) might be solved efficiently without damaging that structure. What seems to be problematic is that ω M is not always a point in M. If ω M / ∈ M, the ω-subdivision rule described in the preceding section might fail to subdivide the rectangle M. As is shown below, such a concern is unnecessary.
Let us try to apply the ω-subdivision rule to M at ω M . In In that case, the rectangle M contains no feasible solution better than ω M . In fact, we have which implies that Even if D ∩ M = ∅, the following chain of inequalities holds: We can therefore remove M from further consideration. In either case, ω M is a feasible solution of (2), and so the incumbent x * can be updated with

Revised rectangular algorithm
The revised rectangular algorithm we propose can be summarized as follows.
As for the rule of selecting M from the list L , which is not specified in the branching process of this description, we can adopt any one of the rules commonly used in ordinary branchand-bound algorithms, e.g., the best-bound rule that selects the M with the smallest β M , the depth-first rule that selects the last generated M, and so forth. If the algorithm terminates, then x * is output as an -optimal solution of the problem (2), which satisfies The idea of removing constraints to the rectangle M from the relaxation is also adopted in Soland [11] for separable concave minimization. If the convex function g is absent from the objective function, (R) is a linear program and some vertex of the feasible set D provides an optimal solution. Soland has exploited the finiteness of the vertices to show the finite convergence of his rectangular algorithm. However, once g exists as a nonlinear function, there is no guarantee that the optimal solution ω M of (R) lies on a vertex of D. In general, when the tolerance is set to zero, the algorithm does not terminate and generates an infinite sequence of nested rectangles in L . Let us observe the behavior of this sequence in the next section, and analyze the convergence of revised_rectangular.

Convergence of the revised rectangular algorithm
Suppose that the algorithm revised_rectangular generates a sequence of nested rectangles: (5) To simplify notation, we omit 'M' from the superscript 'M i ' such as ω M i j . Since the sequence Since λ i ∈ (0, 1) for every i ∈ K , we can assume λ i → λ 0 ∈ [0, 1] as i ∈ K → ∞, by passing to a subsequence if necessary. Therefore, the sequence {c i where h + and h − are the right and left derivative functions of h 1 . The subdifferential of h 1 at ω 0 1 is given by Even assuming ω i 1 − t i 1 → 0, the result is the same. If s 0 1 < t 0 1 , then we have If s 0 1 = t 0 1 , then {c i 1 | i ∈ K } converges to c 0 1 pointwise, as seen above, and again we have Now, we are ready to prove the convergence of revised_rectangular.

Theorem 4.3
Concerning the convergence of the algorithm revised_rectangular, the following results hold: (a) If > 0, then revised_rectangular terminates in a finite number of iterations and returns an -optimal solution x * of (2). (b) If = 0 and revised_rectangular terminates, then x * is an optimal solution of (2). (c) Even if revised_rectangular does not terminate, if the best-bound rule is applied to the branching process, then every accumulation point of the sequence {ω i } is an optimal solution of (2).

Proof Since (b) is obvious, let us prove (a) and (c).
(a) Assume that revised_rectangular does not terminate and generates the sequence of nested rectangles (5). By passing to a subsequence of K , we can assume that {ω i | i ∈ K } converges to some ω 0 ∈ D, and so c i However, for each i ∈ K , we have which implies δ j ≤ 0 for every j by Lemma 4.2, and hence n j=1 δ j ≤ 0. This is a contradiction because the list L should be empty after finitely many iterations, and the algorithm should terminate. (c) Let x ∈ D be any feasible solution of (2). Also let M denote the rectangle to which x belongs when M i is selected in the branching process according to the best-bound rule. Then we have the following for each i ∈ K : by noting n j=1 δ j ≤ 0. Thus, ω 0 is an optimal solution by the arbitrariness of x ∈ D.

Computational experiments
So far, we have assumed that the objective function f involves n univariate functions h j 's, but in fact some of them can be missing. For example, if f (x) = g(x) − p j=1 h j (x j ) for p ≤ n − 1, we only need to linearize p strictly convex functions, and obtain a lower bound when s j ≤ x j ≤ t j , j = 1, . . . , p, from a convex minimization problem: where c M j is defined in the same way as in (3). This relaxation corresponds to (R), and if s j ≤ x j ≤ t j , j = 1, . . . , p, are dropped from the constraint, it corresponds to our proposed relaxation (R). In the branching process, we can subdivide the rectangle M = p j=1 [s j , t j ] in the p-dimensional space, instead of the n-dimensional whole space. Therefore, if p is small, even a large-scale problem might be solved in a realistic computational time.
To confirm the above, we randomly generated quadratic optimization problems of the following form, and solved them using the algorithm revised_rectangular: where θ is a positive parameter, and e ∈ R m is the all-ones vector. All entries of the last row of A ∈ R m×n were set to 1/n to make the feasible set bounded. The remaining components of A were randomly generated in the interval [−0.5, 1.0] so that the percentages of zeros and negative numbers were about 20% and 10%, respectively. As regard to Q ∈ R n×n , the tridiagonal components of the pth principal submatrix were random numbers in [−1.0, 1.0], and the other components were all set to zero. Also the components of c ∈ R n were random numbers in [−1.0, 1.0]. The DC representation of f was given as follows: where q i j is the (i, j)-component of Q. We also solved the following concave minimization problem derived from (6): With slight modification to solve (6) and (7), we coded revised_rectangular in GNU Octave 4.0.0 [5], a numerical computing environment, and tested it on one core of Intel Core i7 (3.70GHz). As the procedure for solving the convex relaxation, we used the Frank-Wolfe algorithm, which was also coded from scratch in Octave. In the branching process, we adopted the depth-first rule, and in the bounding process, we replaced the pruning operation where was set to 10 −5 , in order to prevent the convergence from being affected by the magnitude of the optimal value. For the purpose of comparison, we also coded the usual rectangular algorithm, which uses (R) as the convex relaxation, and refer to this code as usual_rectangular. It should be noted that usual_rectangular behaves exactly the same way as the standard algorithm in textbooks [9,10,15] for separable concave minimization problems like (7). Using these two program codes, we solved ten instances of (6) and (7) for each set of parameters m, n, p, θ, and measured their average performances.  Figure 1 shows the change in the average number of convex relaxations solved in each code to process one instance when (m, n, θ) = (20, 100, 5.0), and the number p of nonlinear variables was increased by 10 from 20 to 80. The solid lines are the results for (6), and the dashed lines are the results for (7). There is little difference in the behavior between revised_rectangular and usual_rectangular for (6), and in both codes the number of solved relaxations increases exponentially with respect to p. Figure 2 shows the change in the average CPU time (in seconds) required by each code under the same conditions. We can see from this figure that revised_rectangular is fairly improved over usual_rectangular in terms of time efficiency. This indicates that the Frank-Wolfe algorithm was successfully warm-started as intended in Section 3.1. Also, comparing the solid and dashed lines, we see that (6) can be solved in two or three times the computational time taken to solve a separable concave minimization problem of the same scale using the standard algorithm. Figures 3 and 4 show the average behavior of both codes when p was fixed at 50 and the weight θ of the linear term in the objective function was increased from 1.0 to 10.0. Again, there is no difference in the number of relaxations solved in both codes for (6), but the CPU time tends to be less for revised_rectangular than for usual_rectangular, especially when θ is large.

Numerical results
The numerical results on larger-scale instances of (6) are summarized in Table 1, where the column labeled '#' lists the average number of solved relaxations and the column labeled 'time' the average CPU time (in seconds) when (m, n, p) ranged up to (60, 200, 100) with θ fixed at 5.0. Values in parentheses represent the standard deviation. As the problem size increases, the number of relaxations needed for revised_rectangular increases slightly faster than for usual_rectangular. However, for any size, revised_rectangular requires only about half the CPU time taken by usual_rectangular, and solves an instance of (m, n, p) = (60, 200, 100) in less than ten minutes on average. Although this problem size might not be so large for DCA and heuristics, the output solution of revised_rectangular is assured with any desired accuracy, which would make it a promising algorithm for applications demanding rigorous accuracy.

Comparison with a general global optimization algorithm
Lastly, let us discuss a little about the comparison of revised_rectangular with other global optimization algorithms. To our knowledge, with the exception of revised_rectangular and usual_rectangular, no algorithm has yet been proposed aimed at globally optimizing DC functions with a separable subtrahend. For this reason, we tried to compare revised_rectangular with a global optimization algorithm for general DC optimization problems. Among such algorithms, typical ones that often appear in textbooks are the conical algorithm and the simplicial algorithm, the latter of which we chose to implement in Octave.
The simplicial algorithm first proposed in [8] is a kind of branch-and-bound algorithm, which performs branching by subdividing a simplex enclosing the feasible set D into subsimplices. In contrast to our approach, the minuend term g of the objective function f is linearized and the resulting concave underestimator is used to obtain a lower bound for f over each subsimplex. The convergence is guaranteed by any exhaustive subdivision rule, e.g., bisection which we adopted. In addition to relaxing the tolerance for pruning to 10 −2 , we also set each subsimplex to be pruned if its longest edge is less than 10% of the initial simplex. Despite those loose settings, the performance of the simplicial algorithm for problem (6) was considerably poor, and it could only solve small instances involving up to four nonlinear variables. As shown in Table 2, both revised_rectangular and usual_rectangular are incomparably superior to the simplicial algorithm in average performance, even though their settings were the same as before. However, the performance of the simplicial algorithm is expected to improve by strengthening the lower bound. In that case, the utility of the simplicial algorithm would be significantly enhanced because it does not require any extra structure for the objective function. Next time, we will report on a revision to the simplicial algorithm elsewhere.
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.