DC programming in communication systems: challenging problems and methods

Nonconvex optimization has become an indispensable and powerful tool for the analysis and design of Communication Systems (CS) over the last decade. As an innovative approach to nonconvex programming, Difference of Convex functions (DC) programming and DC Algorithms (DCA) are increasingly used by researchers in this field. The objective of this paper is to show that many challenging problems in CS can be modeled as DC programs and solved by DCA-based algorithms. We offer the community of researchers in CS promising approaches in a unified DC programming framework to tackle various applications, such as routing, power control, congestion control of the Internet, resource allocation in networks, etc.

An increasing amount of effort has been put into nonconvex optimization to deal with challenging problems arising in many applications of this field (in fact, most real-life problems are of a nonconvex nature). The absence of convexity creates difficulties of all kinds, in particular the distinction between local and global minima and the nonexistence of verifiable characterizations of global solutions, which cause the jump in computational complexity when passing from convex to nonconvex programming. In general, unlike in convex programming, there is no iterative algorithm converging directly to a global solution of a nonconvex program. Finding a global solution of a nonconvex program, especially in the large-scale setting, is the holy grail of optimizers.
The special context of practical problems in CS, along with the dramatic progress of novel technologies, requires well-adapted optimization techniques. For example, solution methods for network management in the context of mobile services should take into account the following issues:
• The topology of networks is dynamic and real-time data transmissions are needed; hence, real-time algorithms are expected.
• Self-organization and self-configuration require all protocols in mobile networks to be distributive and collaborative; consequently, distributed algorithms are necessary.
• Location/tracking management, in addition to handover management and routing: in the case of hybrid communication networks, the choice of the best access gateway among a number of available access technologies becomes an important consideration. Routing in hybrid networks should be handled by finding a suitable mathematical model and efficient algorithms.
• Multi-user communication services involve large-scale optimization problems; therefore, the algorithms should be able to solve large-size problems.
Many challenging issues arise from nonconvex optimization in communication systems, especially how to design suitable models and develop efficient, fast, scalable and distributed algorithms to tackle large-scale practical problems in the areas of wireless networking, Internet engineering, and mobile services in self-organized hybrid networks.
As an innovative approach to nonconvex programming, Difference of Convex functions (DC) programming and DC Algorithms (DCA) are increasingly used by researchers in CS (see e.g. [1,2,16,22,45,54] and references therein). The objective of this paper is to show that many challenging problems in CS can be modeled as DC programs and solved by DCA-based algorithms. We offer the community of researchers in CS promising approaches in a unified DC programming framework to tackle various applications such as routing, power control, congestion control of the Internet, resource allocation in networks, etc.

Nonconvex optimization problems in CS
In terms of optimization, nonconvex problems appearing in CS can be divided into three classes:
- minimizing a nonconvex function over a convex set;
- minimizing a convex/nonconvex function over a nonconvex set;
- minimizing a convex/nonconvex function over a convex/nonconvex set with integer variables.
The reader will see that these classes of nonconvex programs in CS can be formulated or reformulated as DC programs and solved by DCA.

Why DC programming and DCA?
DC programming is an extension of convex programming which is sufficiently large to cover almost all nonconvex optimization problems, but not too much, so as to still allow using the arsenal of powerful tools of convex analysis and convex optimization. DC programming and DCA constitute the backbone of nonconvex programming and global optimization. The use of DCA for solving nonconvex optimization problems in CS is motivated by the following facts:
• DCA is a philosophy rather than an algorithm. For each problem, we can design a family of DCA-based algorithms. The flexibility of DCA in the choice of DC decomposition yields DCA schemes having the potential to outperform standard methods.
• By exploiting the nice effect of a DC decomposition of the objective function, we can build distributed algorithms. This issue is very important in communication networks that involve multiple users, in particular for the purpose of personalized mobile services.
• Convex analysis provides powerful tools to prove the convergence of DCA in a generic framework. Hence, any DCA-based algorithm enjoys (at least) the general convergence properties of the generic DCA scheme that are already available.
• DCA is an efficient, fast and scalable method for smooth/nonsmooth nonconvex programming. To the best of our knowledge, DCA is actually one of the rare algorithms for nonsmooth nonconvex programming which allows one to solve large-scale DC programs. DCA has been successfully applied to many diverse nonconvex optimization problems, for which it quite often gave global solutions and proved to be more robust and more efficient than related standard methods. In particular, DCA has already efficiently solved large-scale DC programs in network optimization (see [1,2,24-27,29-31,45-51,54] and the list of references in [22]).
We will show how to solve these three classes of problems in CS by DC programming and DCA. To begin, Sect. 2 gives a brief introduction to DC programming and DCA. The solution methods for each class of problems are presented in Sects. 3, 4, and 5 where, in addition to the development of generic models and algorithms, methods for typical applications in CS are illustrated. In Sect. 6, we mention another issue in CS for which DCA can also be investigated. Section 7 concludes the paper.

A brief introduction
We work with the space X = R^n, equipped with the canonical inner product ⟨·, ·⟩ and the corresponding Euclidean norm ‖·‖; thus the dual space Y of X can be identified with X itself. We follow [13,37] for the definitions of the usual tools of modern convex analysis, where functions may take the value +∞. A function θ : X → R ∪ {+∞} is said to be proper if it is not identically equal to +∞. The effective domain of θ, denoted by dom θ, is

dom θ := {x ∈ X : θ(x) < +∞}.

The indicator function χ_C of a nonempty closed convex set C is defined by χ_C(x) = 0 if x ∈ C, +∞ otherwise. The set of all lower semicontinuous proper convex functions on X is denoted by Γ0(X). Let θ ∈ Γ0(X); then the conjugate function of θ, denoted by θ*, is defined by

θ*(y) := sup{⟨x, y⟩ − θ(x) : x ∈ X}.

We have θ* ∈ Γ0(Y) and θ** = θ.
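As a quick numerical illustration of conjugacy (this sketch is ours, not part of the development above), a discretized Legendre–Fenchel transform checks that θ(x) = x²/2 is self-conjugate and that the biconjugate identity θ** = θ holds:

```python
import numpy as np

# Discretized Legendre-Fenchel conjugate theta*(y) = sup_x { x*y - theta(x) },
# illustrating theta* and the biconjugate identity theta** = theta.
def conjugate(vals, grid, y_grid):
    # vals[i] = theta(grid[i]); returns theta*(y) evaluated on y_grid
    return np.array([np.max(y * grid - vals) for y in y_grid])

grid = np.linspace(-10.0, 10.0, 4001)
theta = 0.5 * grid**2                      # theta(x) = x^2/2 is self-conjugate
theta_star = conjugate(theta, grid, grid)  # should approximate y^2/2
theta_bistar = conjugate(theta_star, grid, grid)

# compare on an inner window to avoid grid-boundary effects
mask = np.abs(grid) <= 5.0
err_star = np.max(np.abs(theta_star - theta)[mask])
err_bi = np.max(np.abs(theta_bistar - theta)[mask])
print(err_star, err_bi)  # both tiny: discretization error only
```
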
Nonsmooth convex functions are handled using the concept of subdifferential. For θ ∈ Γ0(X) and x0 ∈ dom θ, ∂θ(x0) denotes the subdifferential of θ at x0, defined by

∂θ(x0) := {y ∈ Y : θ(x) ≥ θ(x0) + ⟨x − x0, y⟩, ∀x ∈ X}.

Recall the well-known property relating subdifferentials and conjugacy for θ ∈ Γ0(X):

y0 ∈ ∂θ(x0) ⟺ x0 ∈ ∂θ*(y0) ⟺ θ(x0) + θ*(y0) = ⟨x0, y0⟩.

A function θ ∈ Γ0(X) is said to be polyhedral convex if

θ(x) = max{⟨a_i, x⟩ − b_i : i = 1, …, m} + χ_C(x), ∀x ∈ X,

where the a_i ∈ Y, b_i ∈ R, and C is a nonempty polyhedral convex set in X. A DC program is of the form

(P_dc)  α = inf{f(x) := g(x) − h(x) : x ∈ X}  (4)

with g, h ∈ Γ0(X). Such a function f is called a DC function, g − h a DC decomposition of f, and the convex functions g and h DC components of f. In (P_dc) the nonconvexity comes from the concavity of −h (except when h is affine, in which case (P_dc) is a convex program). It should be noted that a DC program with a convex constraint set C, inf{g(x) − h(x) : x ∈ C}, can be expressed in the form (4) by using the indicator function of C, that is, by replacing g with g + χ_C. Hence, throughout this paper, a DC program of the form (4) is referred to as a "standard DC program".
Polyhedral DC programs (P_dc) (i.e., when g or h is polyhedral convex) play a key role in nonconvex programming (see [25,34,35] and references therein), and enjoy interesting properties related to local optimality and the convergence of DCA.
The DC duality is based on conjugate functions and the fundamental characterization of a convex function θ ∈ Γ0(X) as the pointwise supremum of a collection of its affine minorants:

θ(x) = sup{⟨x, y⟩ − θ*(y) : y ∈ Y}, ∀x ∈ X.

It associates with the primal DC program (4) (P_dc) its dual DC program (D_dc), defined by

(D_dc)  α_D = inf{h*(y) − g*(y) : y ∈ Y},

and investigates their mutual relations. We observe a perfect symmetry between primal and dual DC programs: the dual of (D_dc) is exactly (P_dc). It is worth noting the wealth of the vector space DC(X) = Γ0(X) − Γ0(X) spanned by the "convex cone" Γ0(X) [34,35]: it contains most realistic objective functions and is closed under the operations usually considered in optimization.
The complexity of DC programs resides, of course, in the lack of verifiable globality conditions. Let us recall the general local optimality condition in DC programming (a subdifferential inclusion): if x* is a local solution of (P_dc), then

∅ ≠ ∂h(x*) ⊂ ∂g(x*).  (7)

The condition (7) is also sufficient (for local optimality) in many important classes of DC programs (see [34,35]).
A point x* is said to be a critical point of g − h (or a generalized KKT point for (P_dc)) if

∂h(x*) ∩ ∂g(x*) ≠ ∅.  (8)

Note that, by symmetry, the dual counterparts of (7) and (8) for (D_dc) follow trivially.
DC programming and DCA were introduced in their preliminary form by Pham Dinh Tao in 1985. These theoretical and algorithmic tools have been extensively developed by Le Thi Hoai An and Pham Dinh Tao since 1994, and have now become classic and increasingly popular. DCA is a continuous primal-dual subgradient approach based on local optimality and duality in DC programming for solving standard DC programs (P_dc).

DCA's philosophy
The key idea behind DCA is to replace in (P_dc), at the current point x^k, the second component h with its affine minorant

h_k(x) := h(x^k) + ⟨x − x^k, y^k⟩, y^k ∈ ∂h(x^k),

to give birth to the primal convex program

(P_k)  inf{g(x) − ⟨x, y^k⟩ : x ∈ X},

whose solution set is ∂g*(y^k). The next iterate x^{k+1} is taken in ∂g*(y^k).
Dually, a solution x^{k+1} of (P_k) is then used to define the dual convex program (D_{k+1}), obtained from (D_dc) by replacing g* with its affine minorant

(g*)_k(y) := g*(y^k) + ⟨y − y^k, x^{k+1}⟩,

to obtain the convex program

(D_{k+1})  inf{h*(y) − ⟨y, x^{k+1}⟩ : y ∈ Y},

whose solution set is ∂h(x^{k+1}). The next iterate y^{k+1} is chosen in ∂h(x^{k+1}).
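The interleaved sequences reduce to the scheme y^k ∈ ∂h(x^k), x^{k+1} ∈ ∂g*(y^k). The following toy Python sketch (ours, with an assumed DC decomposition g = ½‖·‖², h = ‖·‖₂, whose minimizers form the unit sphere) makes the primal half of this iteration concrete:

```python
import numpy as np

# Toy DC program  min f(x) = g(x) - h(x)  with
#   g(x) = 0.5 ||x||^2   and   h(x) = ||x||_2 .
# One DCA step:  y^k in dh(x^k) (a subgradient of the norm), then
#   x^{k+1} = argmin { g(x) - <x, y^k> } = y^k   (an element of dg*(y^k)).
def dca(x0, iters=20):
    x = np.asarray(x0, dtype=float)
    obj = [0.5 * x @ x - np.linalg.norm(x)]
    for _ in range(iters):
        n = np.linalg.norm(x)
        y = x / n if n > 0 else np.zeros_like(x)  # subgradient of ||.||_2
        x = y                                      # exact convex subproblem
        obj.append(0.5 * x @ x - np.linalg.norm(x))
    return x, obj

x_star, obj = dca([0.3, -0.4])
print(x_star, obj[-1])  # a unit vector; objective value -0.5
```

Note that the objective sequence is monotonically decreasing, as guaranteed by the generic DCA convergence theory.
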

DCA's convergence properties:
Convergence properties of DCA and its theoretical basis can be found in [25,34,35]. For instance, it is important to mention that:
i) DCA is a descent method without linesearch but with global convergence: the sequences {g(x^k) − h(x^k)} and {h*(y^k) − g*(y^k)} are decreasing.
ii) If the optimal value α of problem (P_dc) is finite and the infinite sequences {x^k} and {y^k} are bounded, then every limit point x* (resp. y*) of the sequence {x^k} (resp. {y^k}) is a critical point of g − h (resp. h* − g*), i.e., ∂h(x*) ∩ ∂g(x*) ≠ ∅ (resp. ∂h*(y*) ∩ ∂g*(y*) ≠ ∅).
iii) DCA has a linear convergence rate for general DC programs.
iv) DCA has finite convergence for polyhedral DC programs.
For a complete study of DC programming and DCA, the reader is referred to [25,34,35] and the references therein. Without going into details, let us mention two key properties of DCA. First, DCA consists in an iterative approximation of a DC program by a sequence of convex programs that are solved by appropriate convex optimization algorithms; this property is called successive convex approximation (SCA) in some recent works in CS. Second, versatility: with suitable DC decompositions, DCA generates most standard algorithms in convex and nonconvex optimization. Hence DCA offers a wide framework for solving convex/nonconvex optimization problems. In particular, DCA is a global approach to convex programming, i.e., it converges to optimal solutions of convex programs reformulated as DC programs. Consequently, it can be used to build efficient customized algorithms for solving the convex subproblems generated by DCA itself.

How to apply DCA for solving practical problems
It would be wrong to think that using DCA to solve a practical problem is a simple procedure. Indeed, the generic DCA scheme is an overall philosophical idea rather than a single algorithm: there is not only one DCA but a family of DCAs for a given problem. While DC decompositions exist for a very large class of functions, there is no general procedure for determining them. The design of an efficient DCA for a concrete problem is an art which should be based on theoretical tools and on the problem's special structure. It consists of the following five steps:
a. Find a DC formulation of the considered optimization problem. This can be done directly if the feasible domain C is a convex set and the objective function f is DC; otherwise, we must use approximation or reformulation techniques based on relevant theoretical tools.
b. Design a DCA scheme for (P_dc). This consists of i) computing a subgradient of h and ii) solving the convex program of the form (P_k). In the ideal case (which is not always possible, especially for nondifferentiable DC programs) an optimal solution of (P_k) is explicitly determined, which corresponds to the explicit computation of a subgradient of g*. Otherwise, one should find efficient convex optimization algorithms suitably adapted to the specific structure of (P_k) in order to save computation time.
c. Search for "good" DC decompositions. If the computations in i) and ii) are not satisfactory (costly, or the solutions computed by DCA are not sufficiently good, …), then one has to find a more suitable DC decomposition. That is a difficult issue and should be done by exploiting distinctive features of the class of DC programs at hand. Here reformulation techniques play a key role in obtaining suitable models; they should be diversified and have recourse to solid mathematical backgrounds in numerical analysis and optimization.
d. Search for "good" starting points.
This can be done by combining DCA with other approaches (heuristic or local search methods, or solutions of convex relaxation problems in global optimization methods). Another efficient way consists of finding a convex minorant of the objective function on the feasible set C and solving the resulting convex program, whose solution is used to initialize DCA. This strategy must be developed in depth and specifically, by exploiting the structure of the problem (P_dc) at hand.
e. Globalize DCA. To guarantee the globality of computed solutions, or to improve their quality, it is advisable to combine DCA with global optimization techniques.
It goes without saying that the two steps (c) and (d) and solution methods for convex programs (P k ) (if necessary) constitute the key issues for successful applications of DC programming and DCA. We show below how to use DCA for solving the three classes of nonconvex problems in CS.

Minimizing a nonconvex function under a convex constraint set
The mathematical formulation of this class of problems is given by

(P)  min{f(x) : x ∈ C},  (9)

where C ⊂ R^n is a nonempty closed convex set and f is a nonconvex function on R^n. Many applications in CS can be formulated in the form of (P); typical examples are Network Utility Maximization (NUM) [9,30], the power control problem [2,6,48,54], dynamic spectrum management in DSL systems [27,53], MIMO relay optimization [15], and sum-rate maximization, proportional-fairness and max-min optimization of SISO/MISO/MIMO ad hoc networks [14,17,43,44].
We will consider two examples of applications of DCA: NUM [30] and DSL [27]. Complete works on NUM and power control using DCA can be found in [24,31,48].

Can one get a DC formulation for any problem in this class?
The answer is "yes": one can always formulate (P) as a standard DC program of the form (P_dc). In fact, as indicated above, the vector space DC(X) contains most realistic objective functions. In the rare cases where f is not DC (for example, when f is a discontinuous function), one can approximate f by a DC function or use reformulation techniques to get an equivalent DC program.

Useful DC decompositions and corresponding DCA
To illustrate the way to construct DC decompositions and design the resulting DCA, let us present two useful DC decompositions for the problem (P) in (9) and discuss their effectiveness.
Assume that there exists a nonnegative number η (resp. ρ) such that the function f + (η/2)‖·‖² (resp. (ρ/2)‖·‖² − f) is convex. We can then write (P) in the form of a DC program (P_dc) with, for example, the two following DC decompositions:

f + χ_C = [f + (η/2)‖·‖² + χ_C] − (η/2)‖·‖²  (10)

and

f + χ_C = [(ρ/2)‖·‖² + χ_C] − [(ρ/2)‖·‖² − f].  (11)

The DCA applied to (P) with decomposition (10) and/or (11) can be described as follows.
Algorithm DCAP1. Let x^0 be given in R^n; set k ← 0.
Repeat: calculate x^{k+1} by solving the convex program

min{f(x) + (η/2)‖x − x^k‖² : x ∈ C},  (12)

and set k ← k + 1, until convergence.
Algorithm DCAP2. Let x^0 be given in R^n; set k ← 0.
Repeat: compute y^k = ρx^k − ∇f(x^k) and calculate

x^{k+1} ∈ argmin{(ρ/2)‖x‖² − ⟨x, y^k⟩ : x ∈ C},  (13)

i.e., x^{k+1} = Proj_C(y^k/ρ), and set k ← k + 1, until convergence.
Here, Proj_C stands for the orthogonal projection onto C.
As indicated above, we are greatly interested in the choice of DC decompositions: which is "the best" among (10) and (11)? The answer depends on C and f. In fact, the performance of DCA depends upon that of the algorithm used to solve the convex programs (12) and (13). For certain problems, for example box-constrained and ball-constrained quadratic programming, Algorithm DCAP2 is much less expensive than Algorithm DCAP1, because the orthogonal projection onto C is then given in explicit form (see, for example, [35]). In practice, when f is differentiable, its gradient is not difficult to compute, and the projection onto C can be determined inexpensively, the use of DCAP2 is highly recommended.
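To make this concrete, here is a small Python sketch (ours, with assumed toy data) of Algorithm DCAP2 on an indefinite box-constrained quadratic program, where the projection onto C is a simple clipping:

```python
import numpy as np

# DCAP2 on the toy nonconvex program
#   min f(x) = 0.5 x^T Q x - b^T x   over the box C = [-1, 1]^2,
# with Q indefinite.  Taking rho >= lambda_max(Q) makes
# (rho/2)||x||^2 - f(x) convex, and each iteration is
#   y^k = rho x^k - grad f(x^k),   x^{k+1} = Proj_C(y^k / rho).
Q = np.array([[2.0, 0.0], [0.0, -1.0]])  # indefinite Hessian
b = np.array([1.0, 0.0])
rho = np.max(np.linalg.eigvalsh(Q))      # rho = 2 suffices here

def f(x):
    return 0.5 * x @ Q @ x - b @ x

x = np.array([0.0, 0.1])                 # starting point
vals = [f(x)]
for _ in range(60):
    y = rho * x - (Q @ x - b)            # y^k, gradient of h = rho/2||.||^2 - f
    x = np.clip(y / rho, -1.0, 1.0)      # explicit projection onto the box
    vals.append(f(x))

print(x, vals[-1])  # converges to (0.5, 1) with f = -0.75
```

Each iteration costs one gradient evaluation and one clipping, illustrating why DCAP2 is preferable when Proj_C is explicit.
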
For using the above DC decompositions, the crucial question is how to determine a nonnegative number η (resp. ρ) such that the function f + (η/2)‖·‖² (resp. (ρ/2)‖·‖² − f) is convex. In many practical problems such η and ρ exist and can be computed from the properties of the function f. For example, when f is a smooth function with Lipschitz continuous gradient, ρ can be taken as the Lipschitz constant of ∇f.

DCA for network utility maximization
Network utility maximization (NUM) has many applications in network rate allocation algorithms and Internet congestion control protocols. Consider a communication network with L links, each with a fixed capacity of c_l bps, and S sources (i.e., end users), each transmitting at a source rate of x_s bps. Each source s emits one flow, using a fixed set L(s) of links in its path, and has a utility function U_s(x_s). Each link l is shared by a set S(l) of sources (the set of users using link l). NUM, in its basic version, consists of maximizing the total utility of the network Σ_s U_s(x_s) over the source rates x, subject to linear flow constraints for all links l:

max{Σ_{s∈S} U_s(x_s) : Σ_{s∈S(l)} x_s ≤ c_l ∀l, x ≥ 0},  (14)

where S denotes the set of users. Here the vector variable is x = (x_s)_{s∈S} ∈ R^S and the constraint set is a well-defined convex polytope. The basic NUM model has many nice properties due to several simplifying assumptions on the utility functions and flow constraints, which provide the mathematical tractability of problem (14) but also limit its applicability. In particular, the utility functions U_s are usually assumed to be concave and increasing. In that case, the optimization problem (14) is a convex program, and thus easy to solve. In the past, maximization of concave utility functions and the resulting distributed rate allocation for elastic traffic have received extensive attention. Based on the concavity and continuity assumptions on utility functions and the elasticity assumption on application traffic, rigorous mathematical frameworks for standard price-based distributed algorithms have been investigated. However, it is known that for many multimedia applications, user satisfaction may be a nonconcave function of the allocated rate. Furthermore, in some other models of utility functions, the concavity assumption on U_s(x_s) is also tied to the elasticity assumption on the users' rate demands.
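For the concave case just described, the standard price-based distributed algorithm can be sketched as follows (a toy Python example of ours, with assumed log-utilities and a 2-link/3-source network; not taken from the paper):

```python
import numpy as np

# Price-based dual decomposition for the concave basic NUM (14) with
# assumed utilities U_s(x_s) = log x_s.  Each link l updates a price
# lam[l]; each source reacts with x_s = argmax_x { log x - x * path_price }
# = 1 / path_price.  Source 0 uses both links; sources 1 and 2 one each.
A = np.array([[1, 1, 0],          # A[l, s] = 1 iff source s uses link l
              [1, 0, 1]], dtype=float)
c = np.array([1.0, 1.0])          # link capacities c_l
lam = np.ones(2)                  # initial link prices
step = 0.05

for _ in range(5000):
    path_price = A.T @ lam                            # price seen by each source
    x = 1.0 / path_price                              # sources' optimal reaction
    lam = np.maximum(lam + step * (A @ x - c), 1e-6)  # dual gradient ascent

print(x)  # converges to the optimum x = (1/3, 2/3, 2/3)
```

The KKT conditions give x = (1/3, 2/3, 2/3) here, since the shared source pays the sum of both link prices.
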
When the demands for x_s are not perfectly elastic, U_s(x_s) may not be concave. In this case, the resulting NUM becomes nonconvex and significantly harder to analyze and solve. Since inelastic flows with nonconcave utility functions represent important applications in practice, solving the NUM problem with nonconcave utility functions remains today a challenge in the analysis and design of communication systems by nonconvex optimization techniques.
As an illustrative example, we consider the NUM problem with Sigmoidal-like utility functions [30] that are used in many multimedia applications and Internet congestion control (for example, the utility for voice applications is modeled by a Sigmoidal function with a convex part at low rate and a concave part at high rate). Other useful utility functions can also be solved by DCA-based algorithms (see [24]).
Consider sigmoidal utilities in the standard form

U_s(x_s) = 1 / (1 + e^{−(a_s x_s + b_s)}),

where a_s > 0 and b_s < 0. The sigmoidal function is neither convex nor concave, but it is DC (a difference of convex functions); the resulting NUM problem is therefore a DC program. We now present a DC decomposition of the sigmoidal function.
We have −U_s = g_s − h_s with

g_s(x_s) := e^{2(a_s x_s + b_s)} / (1 + e^{a_s x_s + b_s}), h_s(x_s) := e^{a_s x_s + b_s}.

Since g_s and h_s are convex functions, the functions g := Σ_{s∈S} g_s and h := Σ_{s∈S} h_s are convex too (note also that g and h are differentiable). Hence the sigmoidal NUM problem is a DC program that can be written in the standard form

min{g(x) − h(x) : x ∈ K},  (15)

where K denotes the convex polytope of feasible rates in (14). According to the general DCA scheme, applying DCA to (15) amounts to computing two sequences {y^k} and {x^k} with y^k = ∇h(x^k) and x^{k+1} an optimal solution of the corresponding convex program (P_k). Hence the algorithm can be described as follows.
Algorithm: DCA for sigmoidal utility maximization. 1. Choose x^0 ∈ R^S as the initial point. Let ε > 0 be sufficiently small; set k ← 0.

Repeat
Set y^k = (a_1 e^{a_1 x^k_1 + b_1}, …, a_S e^{a_S x^k_S + b_S}). Set x^{k+1} as an optimal solution of the convex program

min{Σ_{s∈S} e^{2(a_s x_s + b_s)} / (1 + e^{a_s x_s + b_s}) − ⟨x, y^k⟩ : x ∈ K},  (16)

and set k ← k + 1, until ‖x^{k+1} − x^k‖ ≤ ε. Note that the convex problem (16) can be implemented distributively using the Lagrangian dual decomposition method, as shown in [30].
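An end-to-end toy run of this algorithm can be sketched as follows (ours, with assumed parameters; the convex subproblem (16) is solved here by grid search over a 2-D feasible set purely for illustration, whereas [30] uses dual decomposition):

```python
import numpy as np

# DCA for sigmoidal NUM: two sources share one link of capacity 6,
# with assumed parameters a = (1, 2), b = (-4, -4).
a = np.array([1.0, 2.0])
b = np.array([-4.0, -4.0])
cap = 6.0

def g_comp(x):   # convex component  sum_s e^{2 t_s} / (1 + e^{t_s}),  t = a x + b
    t = a * x + b
    return np.sum(np.exp(2 * t) / (1.0 + np.exp(t)), axis=-1)

def h_comp(x):   # convex component  sum_s e^{t_s}
    return np.sum(np.exp(a * x + b), axis=-1)

# grid over K = {x >= 0, x_1 + x_2 <= cap}
g1 = np.linspace(0.0, cap, 121)
X = np.stack(np.meshgrid(g1, g1, indexing="ij"), axis=-1)  # (121, 121, 2)
feasible = X.sum(axis=-1) <= cap + 1e-12

x = np.array([0.0, 0.0])              # initial rates (a grid point)
fvals = [g_comp(x) - h_comp(x)]       # DC objective = -sum_s U_s(x_s)
for _ in range(40):
    y = a * np.exp(a * x + b)         # y^k = grad h(x^k)
    sub = g_comp(X) - X @ y           # subproblem (16) objective on the grid
    sub[~feasible] = np.inf
    i, j = np.unravel_index(np.argmin(sub), sub.shape)
    x = X[i, j]
    fvals.append(g_comp(x) - h_comp(x))

utility = -fvals[-1]                  # total achieved utility
print(x, utility)
```

The DC objective decreases monotonically (the DCA descent property), and the rates are driven toward the capacity boundary where the sigmoids saturate.
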

Spectrum management problem (SMP)
Discrete multitone (DMT) [42] has been adopted as the standard in various DSL applications such as asymmetric DSL (ADSL) and, more recently, very-high-bit-rate digital subscriber line (VDSL) by the International Telecommunication Union (ITU). For a sufficiently large number of subcarriers, DMT transmission over a frequency-selective fading channel can be modeled as a set of K parallel independent flat-fading subcarrier AWGN (additive white Gaussian noise) channels. Under this Gaussian assumption, the achievable bit-loading rate of user n on tone k is

r^n_k = log_2(1 + SINR^n_k),

where SINR^n_k denotes the signal-to-interference-plus-noise ratio of user n on tone k, a function of the transmit powers of all users on that tone. The goal of the spectrum management problem is to achieve the best possible rate tradeoff among the users in the network, i.e., to find the boundary of the rate region. Assume that each user is subject to an individual total transmission power constraint. One way to define the SMP in the literature is to consider the following optimization problem:

max_{p_1, p_2, …, p_K} R_1 s.t. R_n ≥ T_n, ∀n > 1; Σ_k p^n_k ≤ P^n, ∀n,  (17)

where R_n := Σ_k r^n_k, T_n is the minimum target rate of user n, and P^n is the maximum total transmission power of user n. The SMP (17) aims to maximize the rate of user 1 while guaranteeing that the achievable rates of the other users stay above their required minimum target rates T_n. Spectral mask constraints p^n_k ≤ p^{n,mask}_k may also be applied if needed.
Among the various Dynamic Spectrum Management (DSM) techniques, centralized Optimal Spectrum Balancing (OSB) achieves the maximum data rates by computing the optimal PSDs (power spectral densities) for all modems in DSL systems. The centralized algorithm based on dual decomposition for OSB, proposed in [3], decouples the joint optimization across all tones so that the problem becomes solvable on a per-tone basis. If the rate region is convex (an assumption justified in [3] for the two-user DSL system, with the same logic applying to the multiple-user case), solving problem (17) amounts to solving the following weighted sum-rate optimization problem [5]:

max{Σ_n ω_n R_n : Σ_k p^n_k ≤ P^n, ∀n; p ≥ 0},  (18)

where the weight of user 1, ω_1, is set to unity, resulting in the maximization of the rate of user 1, whereas the ω_n ≥ 0, n ≠ 1, can be adjusted to guarantee the target rate of user n. In [27], we investigated DC programming and DCA for solving (18).

A nice DC formulation of SMP (18)
First, we write Problem (18) in the minimization form

min{f(p) := −Σ_n ω_n R_n(p) : p ∈ C},  (19)

where C is the convex set defined by the power and spectral mask constraints. A natural DC decomposition of f (easily deduced from the definition of r^n_k) was given in [27]. However, as indicated in [27], from a numerical point of view the DCA scheme corresponding to this DC decomposition is not interesting, because it requires an iterative algorithm for solving a convex program at each iteration. In an elegant way, we introduced in [27] a nice DC reformulation of problem (19) (based on the second DC decomposition discussed in Sect. 2) for which the resulting DCA is explicitly determined via a very simple formula. Such a DC decomposition of f is inspired by a result of [27] ensuring the existence of ρ > 0 such that (ρ/2)‖p‖² − f(p) is convex on C.
Using the theorem above, we get the following DC decomposition of f:

f = g − h, with g(p) := (ρ/2)‖p‖² and h(p) := (ρ/2)‖p‖² − f(p),  (21)

and Problem (19) can now be written as

min{g(p) − h(p) : p ∈ C},

or again, in the standard form of a DC program,

min{(g + χ_C)(p) − h(p) : p ∈ R^{KN}}.

Then, DCA applied to Problem (19) is described as follows.

DCA-SMP
Initialization: let ε > 0 be given and p^(0) ∈ C be an initial point; set r := 0.
Repeat: set q^(r) := ρ p^(r) − ∇f(p^(r)) and calculate p^(r+1) ∈ ∂(g + χ_C)*(q^(r)) by solving the linearly constrained quadratic program

min{(ρ/2)‖p‖² − ⟨p, q^(r)⟩ : p ∈ C},

i.e., p^(r+1) = Proj_C(q^(r)/ρ); set r := r + 1, until ‖p^(r+1) − p^(r)‖ ≤ ε.
The advantage of the DC decomposition (21) is that the resulting DCA-SMP requires, at each iteration, only the computation of the projection of a point onto the set C (which has a very specific structure), for which efficient algorithms are available (see [27]).
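To illustrate the kind of projection involved, the following Python sketch (ours; it assumes a set of the form {p : 0 ≤ p ≤ u, Σ_k p_k ≤ P}, i.e., a box intersected with a total-power budget, a typical structure for per-user power constraints) computes Proj_C by bisection on the budget multiplier:

```python
import numpy as np

# Projection onto C = { p : 0 <= p_k <= u_k, sum_k p_k <= P }.
# KKT gives p_k(lam) = clip(q_k - lam, 0, u_k), where lam >= 0 is the
# multiplier of the budget constraint; we find lam by bisection.
def project(q, u, P, iters=60):
    p = np.clip(q, 0.0, u)
    if p.sum() <= P:                   # budget inactive: plain box clipping
        return p
    lo, hi = 0.0, float(np.max(q))     # bracket the multiplier
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.clip(q - lam, 0.0, u).sum() > P:
            lo = lam                   # too much power: raise the price
        else:
            hi = lam
    return np.clip(q - 0.5 * (lo + hi), 0.0, u)

q = np.array([2.0, 0.5, 1.5])          # point to project (e.g. q^(r)/rho)
u = np.array([1.5, 1.5, 1.5])          # assumed spectral-mask caps
p = project(q, u, P=2.0)
print(p)  # → approximately [1.25, 0., 0.75]
```

Each DCA-SMP iteration thus costs one gradient evaluation plus a logarithmic-cost bisection, which is what makes the scheme attractive at scale.
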

Nonconvex constraint set
A typical application of this class of problems is Internet routing (see, for example, [5]). Mathematically, the nonconvex constraint set is expressed as

E := {x ∈ C : f_i(x) ≤ 0, i = 1, …, m},

where C is a nonempty closed convex set in R^n, the functions f_i := g_i − h_i with g_i, h_i ∈ Γ0(R^n), i = 0, …, m, and the inequalities f_i(x) ≤ 0 are called DC constraints. The generic formulation of this class of problems takes the form

(P_dcg)  min{f_0(x) := g_0(x) − h_0(x) : x ∈ E},  (23)

where E is assumed to be nonempty. This class of nonconvex programs (called general DC programs) is the most general in DC programming and, a fortiori, more difficult to treat than standard DC programs (P_dc) because of the nonconvexity of the constraints. It is not new and has been addressed in [34]. Its renewed interest is due to the fact that this class appears, increasingly, in many models of nonconvex variational approaches.
We can solve (P_dcg) by DCA via penalty techniques: first, we transform (P_dcg)-(23) into a standard DC program (P_dc) using penalty techniques in DC programming.
Let the functions p and p⁺ be defined by

p(x) := max{f_i(x) : i = 1, …, m}, p⁺(x) := max{0, p(x)},

which are DC functions with the following DC decompositions (in case the g_i, h_i are finite on C for i = 1, …, m; see, e.g., [34]):

p = max_{i=1,…,m} [g_i + Σ_{j≠i} h_j] − Σ_{i=1}^m h_i, p⁺ = max{Σ_{i=1}^m h_i, max_{i=1,…,m} [g_i + Σ_{j≠i} h_j]} − Σ_{i=1}^m h_i,

obtained directly from those of the f_i, i = 1, …, m.
The general DC program (P_dcg)-(23) can then be formulated as

α = min{f_0(x) : x ∈ C, p⁺(x) ≤ 0},  (26)

and its penalized version is the standard DC program

(P_τ)  α(τ) = min{ϕ_τ(x) := f_0(x) + τ p⁺(x) : x ∈ C}.  (27)

Let DC decompositions of f_0 and p⁺ be given by

f_0 = g_0 − h_0, p⁺ = p_1 − p_2,

where g_0, h_0, p_1, p_2 are convex functions defined on the whole space. Then we have the following DC decomposition of ϕ_τ:

ϕ_τ = (g_0 + τ p_1 + χ_C) − (h_0 + τ p_2).

Exact penalty (relative to the constraint p⁺(x) ≤ 0) for (26) means that there is τ_0 ≥ 0 such that for every τ > τ_0 the DC programs (P_dcg)-(23) and (P_τ)-(27) are equivalent, in the sense that α(τ) = α and (P_dcg)-(23) and (P_τ)-(27) have the same (global) solution set. In this case, a solution of (P_dcg)-(23) can be obtained by applying DCA to the standard DC program (P_τ)-(27) with τ > τ_0. Exact penalty techniques in DC programming have been widely investigated in our works [25,28,34]. However, from a computational point of view, an inconvenience of this exact penalty method is that the penalty parameter is generally unknown. Moreover, there are practical optimization problems for which exact penalization does not hold. In [28], we therefore developed a DCA for solving the general DC program (P_dcg)-(23) using a penalty technique with an updated parameter.
This generalized DCA can be deduced from DCA as follows: instead of fixing the penalty parameter τ, DCA is applied to a sequence of programs (P_{τ_k}) with an increasing sequence of penalty parameters {τ_k} given by an updating rule based on the current iterate, such that x^{k+1} is the next iterate of DCA applied to (P_{τ_k}) from x^k. Our work consists in the statement of appropriate updating rules for the sequence {τ_k} and the refinement of the constraint qualifications used, in order to ensure the global convergence (to a critical point of (P_dcg)-(23)) and the efficiency of the resulting algorithm. It is also important that the sequence {τ_k} become constant after a certain rank. The penalty introduced above uses the l∞-norm, but we can also consider an l1-norm penalty with q(x) := Σ_{i=1}^m max{0, f_i(x)}. Some other DCA-based algorithms for (P_dcg)-(23) have been developed in [36].
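A one-dimensional toy sketch (ours, with an assumed DC constraint 1 − x² ≤ 0) of DCA with an updated penalty parameter:

```python
# Toy general DC program:  min (x - 0.2)^2  s.t.  f_1(x) = 1 - x^2 <= 0
# (feasible set |x| >= 1, nonconvex; constrained minimizer x* = 1).
# Here p+(x) = max(0, 1 - x^2) = max(1, x^2) - x^2, so the penalized
# objective has the DC decomposition
#   phi_tau(x) = [ (x - 0.2)^2 + tau * max(1, x^2) ] - tau * x^2 .
# Each DCA step linearizes tau * x^2 at x^k and minimizes the remaining
# convex 1-D function (by ternary search); tau doubles while infeasible.

def ternary_min(phi, lo=-10.0, hi=10.0, iters=200):
    # minimize a convex 1-D function on [lo, hi]
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x, tau = 0.2, 1.0
for _ in range(50):
    y = 2.0 * tau * x                  # gradient of h(x) = tau * x^2 at x^k
    x = ternary_min(lambda t, y=y, tau=tau:
                    (t - 0.2) ** 2 + tau * max(1.0, t * t) - y * t)
    if max(0.0, 1.0 - x * x) > 1e-8:   # still infeasible: increase penalty
        tau *= 2.0

print(x, tau)  # x approaches the constrained minimizer x* = 1
```

Note that once the iterate becomes feasible, the updating rule leaves τ constant, matching the requirement that {τ_k} stabilize after a certain rank.
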

Integer variables
Several applications in CS can be formulated as an optimization problem with (mixed) integer variables. Here we mention some classes of problems which have been successfully solved by DCA.

Cross-layer optimization in multi-hop time division multiple access (TDMA) networks
Efficient design of wireless networks is a challenging task due to the interference-limited nature of the shared wireless medium.
Recently, the concept of cross-layer design has been investigated extensively. In [26,29] a cross-layer optimization framework, i.e., joint rate control, routing, link scheduling and power control for multi-hop TDMA networks, has been considered. In particular, we studied a centralized controller that coordinates the routing process and the transmissions of links such that the network lifetime is maximized [29] and the quality-of-service (QoS) constraints on the minimum source rates are satisfied. Alternatively, energy consumption is an important design criterion for a multi-hop wireless network.
In [26], we considered the energy-minimization-based cross-layer design problem. We show below that the aforementioned problems can be formulated as mixed-integer linear programs (MILP) and then efficiently solved by DCA.
In the considered TDMA network, time is partitioned into fixed-length frames, and each frame is further divided into J time slots of unit duration. Since the resource allocation is the same in all frames, we concentrate our design on a single frame. A node may need to transmit in one or more slots for its own traffic and/or relayed traffic from other nodes. If a node transmits in a slot, its transmission power can be varied within [0, P_max], while its transmission rate is fixed at a unit rate. In the TDMA-based network, a channel is specified by a pair (j, l), j ∈ J, l ∈ L, where J = {1, 2, …, J}. For this channel, the resource allocation is denoted by (s^l_j, P^l_j), where s^l_j = 1 means that link l is active in slot j and s^l_j = 0 otherwise, and P^l_j > 0 denotes the transmission power of link l in slot j if s^l_j = 1, with P^l_j = 0 otherwise. At each node, the difference between its outgoing traffic and its incoming traffic must equal the traffic generated by the node itself, i.e.,

Σ_{l∈O(n)} Σ_{j∈J} s^l_j − Σ_{l∈I(n)} Σ_{j∈J} s^l_j = r_n, ∀n ∈ N,

where O(n) and I(n) are the sets of outgoing and incoming links at node n, respectively, and r_n is the rate generated at node n. The values of r_n for the non-source nodes are set to zero, or equivalently all the traffic entering such nodes must be routed.
The energy consumption at node n ∈ N can be written as

E_n = Σ_{l∈O(n)} ℓ_l Σ_{j∈J} s^l_j + Σ_{l∈I(n)} ε_l Σ_{j∈J} s^l_j,

where ℓ_l and ε_l denote the energy needed to transmit and to receive a unit of traffic over link l, respectively (recall that an active link carries one unit of traffic per slot). Note that ℓ_l and ε_l include the energy consumed by the signal processing blocks at the link ends.

Interference Model
The wireless channel is a shared, interference-limited medium in which links contend with each other for channel use. Interference relations among the nodes and/or links can be modeled in various ways, for example using the signal-to-interference-plus-noise-ratio (SINR) based model [32,52]. Specifically, if link l ∈ L is active in slot j (i.e., s^l_j = 1), the following inequality should hold so as to guarantee the transmission quality of the link:

SINR^l_j = P^l_j h_{ll} / (Σ_{k≠l} s^k_j P^k_j h_{kl} + η_l) ≥ γ_th,

where SINR^l_j is the SINR of link l in slot j, h_{kl} is the path gain from the transmitter of link k to the receiver of link l, η_l is the noise power at the receiver of link l, and γ_th is the required SINR threshold for accurate information transmission.
We assume that all wireless nodes are low-mobility devices and/or that the network topology is static or changes slowly enough to allow time for computing a new schedule. An example of such a network is a wireless sensor network for environmental monitoring with fixed sensor locations. In this case, a distributed implementation is not necessary.
From the preceding discussion, the energy-minimization-based cross-layer design, i.e., the joint rate control, routing, link scheduling, and power allocation problem, can be mathematically formulated as

min_{r_n, P_j^l, s_j^l} ∑_{n∈N} E_n (35a)

subject to:

∑_{l∈O(n)} ∑_{j∈J} s_j^l − ∑_{l∈I(n)} ∑_{j∈J} s_j^l = r_n, ∀n ∈ N \ {n̄}, (35b)
r_n ≥ r_n^min, ∀n ∈ N, (35c)
∑_{l∈I(n̄)} ∑_{j∈J} s_j^l − ∑_{l∈O(n̄)} ∑_{j∈J} s_j^l = ∑_{n∈N\{n̄}} r_n, (35d)
∑_{l∈O(n)∪I(n)} s_j^l ≤ 1, ∀n ∈ N, ∀j ∈ J, (35e)
h_ll P_j^l ≥ γ_th (η_l + ∑_{k≠l} h_kl P_j^k) − (1 − s_j^l) D, ∀l ∈ L, ∀j ∈ J, (35f)
0 ≤ P_j^l ≤ s_j^l P_max, ∀l ∈ L, ∀j ∈ J, (35g)
s_j^l ∈ {0, 1}, ∀l ∈ L, ∀j ∈ J, (35h)

where n̄ denotes the common sink node for all data generated in the network and D is a very large positive constant. The objective function is the energy consumption in the network. Constraints (35b) ensure that the data generated by the source nodes are routed properly. Constraints (35c) guarantee that the rate of each node is no less than a minimum rate; the minimum rates may differ among nodes and are usually determined by the network QoS, and nodes which do not generate traffic have r_n = r_n^min = 0. Constraint (35d) is the flow conservation at the traffic destination for all the sources. Constraints (35e) state that a node cannot receive and transmit simultaneously in one particular time slot. Constraints (35f) ensure that the SINR requirement is met: if a link l is active in time slot j, then the SINR at the receiver of link l must be no less than the given threshold γ_th, which also depends on the system implementation; (35f) is automatically satisfied if link l is not scheduled in time slot j. Constraints (35g) state that if a link l is scheduled in time slot j, i.e., s_j^l = 1, then the corresponding power value P_j^l must not exceed P_max; otherwise, P_j^l equals zero. We also impose the binary integer constraints (35h) on s_j^l. The cross-layer optimization problem (35a)-(35h) thus belongs to the well-known class of mixed-integer linear programs (MILPs). The combinatorial nature of (35a)-(35h) is not surprising; it has been observed in previous works, albeit with different objective functions and formulations [7,32,52]. MILPs are NP-hard in general, so exact solution becomes impractical when the dimension is large.
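To see the combinatorial nature concretely, the toy sketch below enumerates all 2^(L·J) binary schedules of a 2-link, 2-slot instance and keeps the cheapest feasible one. All numbers are illustrative assumptions, powers are fixed at P_max, and only the scheduling, SINR, and demand aspects of (35a)-(35h) are mimicked:

```python
import itertools

# Toy instance (illustrative values): 2 links, 2 slots.
L, J = 2, 2
P_max, gamma_th = 1.0, 2.0
h = [[1.0, 0.5], [0.5, 1.0]]   # h[k][l]: gain from Tx of link k to Rx of link l
eta = [0.1, 0.1]               # noise power at each receiver
e_tx = [1.0, 1.5]              # assumed per-slot transmit energy per link
demand = [1, 1]                # each link must be active in at least one slot

def feasible(slot_links):
    """SINR check for the set of links transmitting at P_max in one slot."""
    for l in slot_links:
        interf = sum(h[k][l] * P_max for k in slot_links if k != l)
        if h[l][l] * P_max < gamma_th * (eta[l] + interf):
            return False
    return True

best = None
for bits in itertools.product([0, 1], repeat=L * J):  # 2^(L*J) candidate schedules
    sched = [[bits[l * J + j] for j in range(J)] for l in range(L)]
    if any(sum(sched[l]) < demand[l] for l in range(L)):
        continue
    if not all(feasible([l for l in range(L) if sched[l][j]]) for j in range(J)):
        continue
    energy = sum(e_tx[l] * sum(sched[l]) for l in range(L))
    if best is None or energy < best[0]:
        best = (energy, sched)

print(best)  # the cross-gains force the two links into different slots
```

With the chosen gains, simultaneous transmission violates the SINR threshold, so the cheapest feasible schedule activates each link once, in separate slots.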
It has been shown in [26] that, at optimality, the source rate constraints (35c) must be met with equalities for all sources.
Note that (35a) minimizes the total energy consumption; this may cause some particular nodes to spend more energy than the others and thus to run out of energy sooner, i.e., the resulting energy distribution among nodes can be far from even. Another design objective, which helps to prevent such a situation, is as follows:

min_{r_n, P_j^l, s_j^l} max_{n∈N} E_n (36a)

subject to: the constraints (35b)-(35h). (36b)
The optimization problem (36a)-(36b) aims at minimizing the maximum energy consumed at any node. As a result, more nodes are likely to be involved in the routing, i.e., in relaying information for other nodes. For simplicity, however, the optimization problem (35a)-(35h) is more often considered in the literature.
The cross-layer optimization problem (35a)-(35h) has worst-case exponential complexity when Branch-and-Bound (BnB) methods are used to compute the solution. Moreover, when modeling practical networks, problems of large size may arise, depending on the numbers of links, nodes, and time slots. As a result, it is extremely difficult to schedule links optimally, and most research in the literature relies on heuristics at the cost of performance degradation; see, for example, [7,8,52]. In [26], we investigated a DCA scheme to solve the mixed 0-1 linear program (35a)-(35h) efficiently.

Quality of service (QoS) routing problems
The Unicast (resp. Multicast) QoS routing problem consists of finding a path (resp. a set of paths) from a source node to a destination node (resp. a set of destination nodes) satisfying the QoS requirements. Routing problems become more complex when mobile or hybrid networks are considered, because of their dynamic topology and the need for real-time routing procedures. As an example, consider a Multicast routing scenario at a car park. Mobile services are provided to each moving car, equipped with a mobile device, via a car service center located at the car park. There are m cars sending requests to the mobile car service center: they need help to find routes to their destinations under a travel time constraint, a traffic-jam latency constraint, a jitter/delay constraint, and a travel cost constraint (same source, different destinations, with local constraints for each mobile vehicle). Based on the temporarily updated data on the network state, the mobile car service system has to compute the routes and give an answer to each car within a few seconds. In this context, we need a centralized and efficient algorithm to calculate the routes.
The problem of finding a path in a network subject to multiple constraints (the MCP problem) is NP-complete. We reformulated the MCP [46] and the MCOP (multi-constrained optimal path) problem [47,51] as Binary Integer Linear Programs (BILP) and investigated DCA-based algorithms for solving them. The DCA is fast and furnishes an optimal solution in almost all cases, and a near-optimal solution in the remaining ones. For large-scale problems we used the proximal decomposition technique to solve the convex subprograms at each iteration of DCA. Computational results show that this approach is efficient, especially in large-scale settings where the powerful commercial solver CPLEX fails to be applicable.
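For intuition on what MCP asks, a brute-force sketch (purely illustrative, with hypothetical edge weights; not the BILP/DCA approach of [46]) lists every simple path whose accumulated weight vector satisfies all bounds:

```python
def mcp_paths(adj, src, dst, bounds):
    """All simple src->dst paths whose accumulated weights meet every bound.
    adj[u] = list of (v, (w1, w2, ...)); a path is feasible if, for each metric i,
    the sum of wi along the path is <= bounds[i]."""
    found = []
    def dfs(u, path, acc):
        if any(a > b for a, b in zip(acc, bounds)):
            return  # prune: additive weights only grow along a path
        if u == dst:
            found.append(list(path))
            return
        for v, w in adj.get(u, []):
            if v not in path:
                path.append(v)
                dfs(v, path, tuple(a + x for a, x in zip(acc, w)))
                path.pop()
    dfs(src, [src], (0.0,) * len(bounds))
    return found

# Toy graph with (delay, cost) on each edge; bounds: delay <= 3, cost <= 4.
adj = {'s': [('a', (1, 2)), ('b', (2, 1))], 'a': [('t', (1, 2))], 'b': [('t', (2, 1))]}
print(mcp_paths(adj, 's', 't', (3, 4)))  # → [['s', 'a', 't']]
```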

The partitioning-hub location-routing problem
The Partitioning-Hub Location-Routing Problem (PHLRP) is a hub location problem involving graph partitioning and routing features. PHLRP consists of partitioning a given network into sub-networks, locating at least one hub in each sub-network, and routing the traffic within the network at minimum cost. PHLRP has various important applications, such as the deployment of network routing protocols and the planning of freight distribution. In [50] we formulated this problem as a Binary Integer Linear Program (BILP) and then investigated DCA for solving it. Preliminary numerical results, compared with those of the well-known commercial solver CPLEX, show the efficiency and the superiority of DCA.

The car pooling problem
Car pooling is a well-known transport solution that consists of sharing a car between a driver and passengers sharing the same route, or part of it. The challenge is to minimize both the number of required cars and the additional cost in terms of time for the drivers. To solve the problem, several tasks should be performed: choosing drivers and passengers, allocating passengers to cars, and computing an optimal route for the cars. As such, the car pooling problem may be described as a kind of fleet management problem. In [49], we formulated this problem as a Mixed Integer Linear Program to which DCA has been efficiently applied. In order to solve the problem globally, we combined DCA with a classical Branch-and-Bound algorithm: DCA is used to compute upper bounds, while lower bounds are obtained from a linear relaxation. Preliminary numerical results, compared with those of CPLEX, show the efficiency and the superiority of the DCA-based algorithms.

The minimum m-dominating set problem
Let G = (V, E) be a graph, where V is the set of nodes and E is the set of edges of G. A dominating set of G is a subset of nodes D ⊆ V such that every vertex not in D is joined to at least one member of D by some edge. The domination number γ(G) is the number of vertices in a smallest dominating set of G. The dominating set problem is a classical NP-complete decision problem [10] with various applications in CS. A classical network application is to choose a set of locations at which to install relay antennas. In ad hoc networks, creating a dominating set is a way to organize the network and is generally used as a first step in generating a connected dominating set [11]. This problem is formulated as a BILP for which DCA is investigated in [39]. Numerical results show that DCA is efficient even for very large instances. Moreover, our algorithm obtained better solutions in significantly less time than CPLEX.
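The definition can be checked directly on small graphs. The exhaustive sketch below is illustrative only ([39] uses a BILP formulation and DCA, not enumeration) and returns a smallest dominating set:

```python
from itertools import combinations

def min_dominating_set(nodes, edges):
    """Smallest D ⊆ V such that every vertex is in D or adjacent to a member of D."""
    nbr = {v: {v} for v in nodes}   # closed neighborhoods: v dominates itself
    for u, v in edges:
        nbr[u].add(v)
        nbr[v].add(u)
    for k in range(1, len(nodes) + 1):   # try sets of increasing size
        for D in combinations(nodes, k):
            covered = set().union(*(nbr[v] for v in D))
            if covered == set(nodes):
                return set(D)

# Star graph: the center dominates every leaf, so γ(G) = 1.
print(min_dominating_set([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)]))  # → {0}
```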
In a general framework, we show below how to solve optimization problems with integer variables by DCA.
Consider the linearly constrained problem with binary variables

(P)  min { f(x) : x ∈ K, x_i ∈ {0, 1} ∀i ∈ J },

where K is a bounded polyhedral convex set in IR^n with 0 ≤ x_i ≤ 1 for i ∈ J included in its description, J ⊆ {1, . . . , n}, and f is a finite concave function on K. With the penalty function p(x) := ∑_{i∈J} x_i (1 − x_i), which is concave and nonnegative on K and vanishes exactly at the binary points, (P) can be rewritten as min { f(x) : x ∈ K, p(x) ≤ 0 }.
Theorem 2 [23] Let K be a nonempty bounded polyhedral convex set in IR^n and let f, p be finite concave functions on K. Assume that the feasible set of (P) is nonempty and that p is nonnegative on K. Then there exists t_0 ≥ 0 such that for every t > t_0 the following problems have the same solution sets:

(P)  min { f(x) : x ∈ K, p(x) ≤ 0 },
(P_t)  min { F_t(x) := f(x) + t p(x) : x ∈ K }.

DCA for solving (P_t). Assume that a subgradient of −f is computable. One DC decomposition of F_t can be chosen as

F_t(x) = g(x) − h(x),  g := χ_K,  h := −f − t p.

In this case, (P_t) is a polyhedral DC program because χ_K is a polyhedral convex function, and the general DCA scheme becomes:

y^k ∈ ∂h(x^k);  x^{k+1} ∈ argmin { −⟨x, y^k⟩ : x ∈ K }.
Besides the computation of subgradients of −f and of ∇(−p)(x) = (2x_i − 1)_{i∈J}, the algorithm requires solving one linear program at each iteration, namely x^{k+1} ∈ argmin { −⟨x, y^k⟩ : x ∈ K }. The convergence properties can be stated as follows:

Theorem 3 (i) DCA generates a finite sequence x^1, . . . , x^{k*} contained in the vertex set V(K) such that f(x^{k+1}) + t p(x^{k+1}) ≤ f(x^k) + t p(x^k) and p(x^{k+1}) ≤ p(x^k) for each k, and x^{k*} is a critical point of g − h. (ii) If, in addition, h is differentiable at x^{k*}, then x^{k*} is actually a local minimizer of (P_t).

Extension cases
Based on new results related to exact penalty and error bounds in DC programming [28], the same reformulation technique via exact penalty can be used for
- linearly constrained mixed zero-one DC programming problems;
- linearly constrained mixed integer DC programming problems.

Another issue: solving convex programs by DCA
Another issue which is also important in CS but was not discussed in this paper is how to solve large-scale convex programs. Although convex programming has been studied for about a century, an increasing amount of effort has recently been put into developing fast and scalable algorithms for large-scale problems. While some convex regularizations lead to convex quadratic programs (QP), for which standard QP solvers can certainly be used, many first-order methods have been developed in recent years for large-scale convex problems. Since DC programming and DCA encompass convex programming, and convex programs can be recast as (infinitely many) DC programs for which DCA is global (i.e., it provides optimal solutions), one can make use of these theoretical and algorithmic tools to better reformulate and solve convex programs.
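As a minimal illustration of this last point, the sketch below solves a one-dimensional convex quadratic by DCA with a proximal-style DC split f = g − h, where g = f + (ρ/2)x² and h = (ρ/2)x² are both convex; the instance and ρ are assumptions chosen for the example:

```python
def dca_convex_quadratic(a, b, rho=1.0, x0=0.0, iters=200):
    """Solve min f(x) = 0.5*a*x^2 - b*x (a > 0) via the DC decomposition
    g(x) = 0.5*(a+rho)*x^2 - b*x,  h(x) = 0.5*rho*x^2  (so f = g - h).
    Each DCA step linearizes h: x^{k+1} = argmin g(x) - h'(x^k)*x,
    which in closed form is x^{k+1} = (b + rho*x^k) / (a + rho)."""
    x = x0
    for _ in range(iters):
        x = (b + rho * x) / (a + rho)
    return x

# The unique minimizer of 0.5*2*x^2 - 4*x is x* = b/a = 2; DCA converges to it.
print(dca_convex_quadratic(2.0, 4.0))  # → 2.0 (up to machine precision)
```

Each iterate contracts the error by a factor ρ/(a+ρ), so for a convex objective the DCA fixed point is the global solution, matching the claim above.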