Bi-perspective functions for mixed-integer fractional programs with indicator variables

Perspective functions have long been used to convert fractional programs into convex programs. More recently, they have been used to form tight relaxations of mixed-integer nonlinear programs with so-called indicator variables. Motivated by a practical application (maximising energy efficiency in an OFDMA system), we consider problems that have a fractional objective and indicator variables simultaneously. To obtain a tight relaxation of such problems, one must consider what we call a “bi-perspective” (Bi-P) function. An analysis of Bi-P functions leads to the derivation of a new kind of cutting planes, which we call “Bi-P-cuts”. Computational results indicate that Bi-P-cuts typically close a substantial proportion of the integrality gap.

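Given a function $f(y)$, its perspective function is $t f(y/t)$, defined for $t > 0$. (By convention, the perspective function takes the value zero when $t = 0$ and $y$ is the origin.) It is known that the perspective function of $f(y)$ is convex (or concave) if and only if $f(y)$ is convex (or concave) [17].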
Perspective functions have a wide range of uses in, e.g., convex analysis, optimisation, statistics, signal processing and machine learning (see [4] for a recent survey). In this paper, we focus on two uses in optimisation. The first is to convert fractional programs into convex optimisation problems, and thereby render them easier to solve [3,19]. The second is to reformulate certain mixed-integer nonlinear programs (MINLPs), in such a way that the continuous relaxation is strengthened [6,7]. The MINLPs in question are those with so-called indicator variables.
Recently, while studying certain problems arising in the context of mobile wireless communications, we encountered an MINLP that exhibited both of these features (i.e., a fractional objective and indicator variables) simultaneously. It turns out that, in order to obtain tight convex relaxations of such problems, one needs to study a new kind of function, which we call a bi-perspective (Bi-P) function. These functions are the topic of this paper.
Unfortunately, it turns out that the Bi-P function of a concave function is not concave in general. To deal with this, we characterise the concave envelope of a Bi-P function over a rectangular domain. We then derive a family of linear inequalities, which we call Bi-P cuts, that completely describe the concave envelope. We also show how to generalise the Bi-P cuts when there are "multiple-choice" constraints, stating that two or more indicator variables cannot take the value 1 simultaneously. Finally, we report the results of some computational experiments. It turns out that, for our particular problem, the new cuts typically close over 95% of the integrality gap.
The paper is structured as follows. The relevant literature is reviewed in Sect. 2. In Sect. 3, we present the theoretical results concerning Bi-P functions and Bi-P cuts. In Sect. 4, we consider the multiple-choice case. In Sect. 5, we give details of our practical application. Computational results are given in Sect. 6, and concluding remarks are made in Sect. 7.

Literature review
Now we briefly review the relevant literature. We cover perspective functions and perspective cuts in Sects. 2.1 and 2.2, respectively. In Sect. 2.3, we give some background on optimisation in mobile wireless communications.

Perspective functions
As mentioned in the introduction, two particular uses of perspective functions will turn out to be of importance to us. These are as follows.
1. Consider a fractional program of the form
$$\max \left\{ \frac{f(y)}{g(y)} : y \in C \right\},$$
where $C \subseteq \mathbb{R}^n$ is convex, $f(y)$ is non-negative and concave over the domain $C$, and $g(y)$ is positive and convex over the domain $C$. It is known [3,19] that such a problem can be reformulated as
$$\max \left\{ t f(\tilde{y}/t) : t\, g(\tilde{y}/t) \le 1,\ \tilde{y} \in tC,\ t > 0 \right\},$$
where $t$ is a new non-negative variable representing $1/g(y)$, and $\tilde{y}$ is a new vector of variables representing $y/g(y)$. The reformulated problem can often be solved efficiently, since the objective function is concave and the feasible region is convex.
2. Consider an MINLP of minimisation type, in which the cost function contains a term $f(y)$, where $y$ is a vector of continuous variables and $f$ is a convex function. Suppose that the MINLP also contains an indicator variable, i.e., a binary variable $x$ with the property that, if $x$ takes the value zero, then all of the components of $y$ must also take the value zero. Then the continuous relaxation of the MINLP is strengthened, while maintaining convexity, if we replace $f(y)$ with the perspective function $x f(y/x)$ [6,7]. (For a generalisation, see [1].)
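As a small numerical illustration of the first use, the sketch below compares a fractional program with its perspective reformulation by brute-force grid search. The instance is an assumption chosen only for illustration: $f(y) = \sqrt{y}$, $g(y) = 1 + y$ and $C = [0, 4]$; the name `w` stands for the scaled variable representing $y/g(y)$, and the grid resolution `N` is arbitrary.

```python
import math

def f(y):
    return math.sqrt(y)      # concave and non-negative on C = [0, 4] (illustrative choice)

def g(y):
    return 1.0 + y           # convex and positive on C (illustrative choice)

N = 400  # grid resolution, chosen arbitrarily

# Original fractional program: maximise f(y)/g(y) over y in C = [0, 4].
best_ratio = max(f(4.0 * k / N) / g(4.0 * k / N) for k in range(N + 1))

# Reformulation: maximise t*f(w/t) subject to t*g(w/t) <= 1, w in t*C, t > 0,
# where t plays the role of 1/g(y) and w the role of y/g(y).
best_persp = 0.0
for i in range(1, N + 1):
    t = i / N
    for j in range(N + 1):
        w = j / N
        if w > 4.0 * t:                 # w must lie in t*C
            continue
        if t * g(w / t) > 1.0 + 1e-9:   # perspective of g is bounded by 1
            continue
        best_persp = max(best_persp, t * f(w / t))

print(best_ratio, best_persp)
```

On this instance the two grid searches should agree up to the grid resolution, which is what the reformulation predicts.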

Perspective cuts
One problem with perspective functions is that they are non-differentiable at the origin. Moreover, they become increasingly ill-conditioned as $t$ approaches zero from above. This can cause algorithms for convex optimisation (or indeed convex MINLP) to run into serious numerical difficulties. To get around this, Frangioni and Gentile [6] proposed to approximate perspective functions using linear inequalities. They show that imposing a non-linear constraint of the form $z \ge x f(y/x)$, where $f$ is a convex function and $x$ is an indicator variable, is equivalent to imposing the linear constraints
$$z \ge f(\bar{y})\, x + \nabla f(\bar{y}) \cdot (y - \bar{y} x) \qquad (1)$$
for all $\bar{y}$ in the domain of $y$. The constraints (1) are called perspective cuts. Although the perspective cuts are infinite in number, they can be very useful as cutting planes within an exact algorithm for convex MINLPs with indicator variables (see again [6,7]). Note that the classical Kelley cuts [10] for the function $f(y)$ take the form
$$z \ge f(\bar{y}) + \nabla f(\bar{y}) \cdot (y - \bar{y}).$$
Thus, the perspective cuts can be viewed as strengthened Kelley cuts.
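The relationship between the two kinds of cut can be checked numerically. The sketch below assumes the scalar convex function $f(y) = y^2$ (an illustrative choice, not taken from [6]) and evaluates the right-hand sides of a perspective cut and a Kelley cut generated at the same point.

```python
def f(y):
    return y * y             # convex example function (an assumption)

def df(y):
    return 2.0 * y           # its derivative

def perspective_cut_rhs(y_bar, x, y):
    """Right-hand side of the perspective cut generated at y_bar."""
    return df(y_bar) * y + (f(y_bar) - df(y_bar) * y_bar) * x

def kelley_cut_rhs(y_bar, y):
    """Right-hand side of the Kelley cut generated at y_bar."""
    return f(y_bar) + df(y_bar) * (y - y_bar)

x, y = 0.4, 0.6
persp_value = x * f(y / x)                   # x f(y/x) = y^2 / x
tight = perspective_cut_rhs(y / x, x, y)     # cut generated at y_bar = y/x
grid = [k / 100 for k in range(301)]         # candidate points y_bar in [0, 3]
best_cut = max(perspective_cut_rhs(yb, x, y) for yb in grid)
```

For this function the cut generated at $\bar{y} = y/x$ touches the perspective function, and every perspective cut is at least as strong as the Kelley cut generated at the same point whenever $x \le 1$.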

Optimisation in mobile wireless communications
In mobile wireless communications, mobile devices (such as smartphones or tablets) communicate with one another via so-called base stations. Each base station must periodically allocate its available resources (time, power and bandwidth) in order to receive and transmit data efficiently (see, e.g., [5,8,16]). These days, many base stations follow an Orthogonal Frequency-Division Multiple Access (OFDMA) architecture. In an OFDMA system, we have a set $I$ of communication channels, called subcarriers, and a set $J$ of users. Each subcarrier can be assigned to at most one user, but a user may be assigned to more than one subcarrier. If we let $p_i$ denote the power (in watts) assigned to subcarrier $i$, the classical Shannon-Hartley theorem [21] states that the maximum data rate (in bits per second) that can be transmitted from subcarrier $i$ is:
$$f_i(p_i) = B_i \log_2\left(1 + \frac{p_i}{N_i}\right),$$
where $B_i$ is the bandwidth of subcarrier $i$ (in hertz), and $N_i$ is the level of noise in channel $i$ (in watts). We remark that $f_i(p_i)$ is concave over the domain $p_i \ge 0$. Moreover, if we let $S_j \subset I$ denote the set of subcarriers allocated to user $j$, the total data rate for user $j$ is just $\sum_{i \in S_j} f_i(p_i)$.
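The rate function is simple to evaluate directly. The following sketch uses hypothetical helper names (`rate`, `user_rate`) and takes per-subcarrier data as plain Python lists.

```python
import math

def rate(p, bandwidth_hz, noise_w):
    """Shannon-Hartley data rate (bits per second) at power p (watts)."""
    return bandwidth_hz * math.log2(1.0 + p / noise_w)

def user_rate(powers, bandwidths, noises, subcarriers):
    """Total data rate of a user holding the given subcarrier indices."""
    return sum(rate(powers[i], bandwidths[i], noises[i]) for i in subcarriers)
```

When the power equals the noise level, the rate equals the bandwidth, since $\log_2 2 = 1$.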
A wide variety of optimisation problems concerned with OFDMA systems have been considered, with various objectives and side-constraints (see, e.g., [11,12,14,15,20,22-26]). Recently, driven by environmental considerations, some authors working on OFDMA systems have focused on maximising energy efficiency, which is defined as total data rate divided by total power (e.g., [22-24,26]). This leads immediately to a fractional objective function, which is what led us to the present paper.

Bi-perspective functions and cuts
This section is concerned with bi-perspective (Bi-P) functions and cuts. In Sect. 3.1, we define Bi-P functions and point out that they are neither convex nor concave in general. In Sect. 3.2, we show how to compute concave over-estimators of Bi-P functions. Then, in Sect. 3.3, we present the Bi-P cuts.

Bi-P functions
To begin, we give a formal definition of Bi-P functions.

Definition 1
Let $y$ be a vector of $n$ continuous variables, let $f(y)$ be a real function of $y$ that is defined over a convex domain $C \subseteq \mathbb{R}^n_+$, let $x$ be an indicator variable, and let $t$ be a continuous variable with domain $[\ell, u]$, where $0 < \ell < u$. The function
$$g(x, t, y) = x t\, f\left(\frac{y}{x t}\right),$$
with domain $x \in [0, 1]$, $t \in [\ell, u]$ and $y \in x t C$, will be called the "bi-perspective" (Bi-P) function of $f(y)$. (By convention, the Bi-P function takes the value zero when $xt = 0$ and $y$ is the origin.)
Whereas standard perspective functions preserve convexity and/or concavity, the same is not true for Bi-P functions. This is shown by the following example.
Example 1 Let $f(y) = 1$ for all $y \in C$. The Bi-P function $g(x, t, y)$ is just $xt$, with domain $x \in [0, 1]$, $t \in [\ell, u]$ and $y \in xtC$. Since it is an indefinite quadratic function, it is neither convex nor concave over the given domain.
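The failure of both convexity and concavity for the product $xt$ can be seen from two midpoint tests. The bounds used below (1 and 3) are arbitrary illustrative values for the lower and upper limits on $t$.

```python
def g(x, t):
    return x * t             # Bi-P function of the constant function f = 1

ell, u = 1.0, 3.0            # illustrative bounds on t (assumed values)

def midpoint_gap(a, b):
    """g at the midpoint of a and b, minus the average of g(a) and g(b)."""
    mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return g(*mid) - (g(*a) + g(*b)) / 2.0

gap_1 = midpoint_gap((0.0, u), (1.0, ell))   # positive, so g is not convex
gap_2 = midpoint_gap((0.0, ell), (1.0, u))   # negative, so g is not concave
```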

Concave envelope
Now suppose that $f(y)$ is concave over $C$. Since the corresponding Bi-P function $g(x, t, y)$ is not guaranteed to be concave over its domain, it is natural to seek the strongest possible concave over-estimating function (sometimes called the concave envelope). We will let $h(x, t, y)$ denote this function. The following theorem expresses $h(x, t, y)$ as the solution to an optimisation problem in the single continuous variable $t'$.

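Theorem 1 For any concave function $f(y)$ with convex domain $C$, the concave envelope $h(x, t, y)$ of the Bi-P function of $f$ is equal to
$$\max\ x t'\, f\left(\frac{y}{x t'}\right) \qquad (2)$$
$$\text{s.t.}\quad t - u(1 - x) \le x t' \le t - \ell(1 - x) \qquad (3)$$
$$\ell \le t' \le u. \qquad (4)$$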
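Proof Since $x$ is an indicator variable, the envelope is obtained by taking convex combinations of points with $x = 0$ and points with $x = 1$. As the Bi-P function is zero at the former and concave over the latter, one point of each kind suffices. Thus $h(x, t, y)$ is the optimal value of
$$\max\ \theta_0 x_0 t_0 f\left(\frac{y_0}{x_0 t_0}\right) + \theta_1 x_1 t_1 f\left(\frac{y_1}{x_1 t_1}\right)$$
subject to $\theta_0 + \theta_1 = 1$, $\theta_0 x_0 + \theta_1 x_1 = x$, $\theta_0 t_0 + \theta_1 t_1 = t$, $\theta_0 y_0 + \theta_1 y_1 = y$, $x_0 = 0$, $x_1 = 1$, $\ell \le t_0, t_1 \le u$ and $\theta_0, \theta_1 \ge 0$. Eliminating $x_0$ and $x_1$, and using the fact that $y_0$ must be the origin, this reduces to:
$$\max \left\{ \theta_1 t_1 f(y_1/t_1) : \theta_0 + \theta_1 = 1,\ \theta_1 = x,\ \theta_0 t_0 + \theta_1 t_1 = t,\ \theta_1 y_1 = y,\ \ell \le t_0, t_1 \le u \right\}.$$
We can now replace $\theta_1$ and $\theta_0$ with $x$ and $1 - x$, respectively, to give:
$$\max \left\{ x t_1 f(y_1/t_1) : (1 - x) t_0 + x t_1 = t,\ x y_1 = y,\ \ell \le t_0, t_1 \le u \right\}.$$
Next, we can replace $y_1$ with $y/x$, to give:
$$\max \left\{ x t_1 f\left(\frac{y}{x t_1}\right) : (1 - x) t_0 + x t_1 = t,\ \ell \le t_0, t_1 \le u \right\}.$$
Finally, we eliminate $t_0$, and replace $t_1$ with $t'$ to get the result.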
In general, one must use calculus to solve the optimisation problem (2)-(4). In some important special cases, however, it is possible to obtain a closed-form solution.
To show this, we will need the following two definitions.

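Definition 2 The function $f(y)$ is said to be t-increasing if $t f(y/t)$ is non-decreasing in $t$, i.e., if $(t + \epsilon) f(y/(t + \epsilon)) \ge t f(y/t)$ for all $t \ge \ell$ and $\epsilon > 0$ for which both sides are defined.

Definition 3 The function $f(y)$ is said to be t-decreasing if $t f(y/t)$ is non-increasing in $t$, i.e., if $(t + \epsilon) f(y/(t + \epsilon)) \le t f(y/t)$ for all $t \ge \ell$ and $\epsilon > 0$ for which both sides are defined.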
We then have the following two results:

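Proposition 1 If $f(y)$ is concave and t-increasing over convex domain $C$, then
$$h(x, t, y) = \min\{ux,\ t - \ell(1 - x)\}\ f\left(\frac{y}{\min\{ux,\ t - \ell(1 - x)\}}\right). \qquad (5)$$
Proof In this case, the optimum in (2)-(4) is obtained by setting $t'$ to its maximum possible value, which is $\min\{u,\ (t - \ell(1 - x))/x\}$.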
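The closed form for the t-increasing case can be sanity-checked against a direct one-dimensional search. The sketch below is only an illustration under stated assumptions: it takes the envelope to be the maximum of $x t' f(y/(x t'))$ over $t' \in [\ell, u]$ with $t - u(1-x) \le x t' \le t - \ell(1-x)$, uses the scalar function $f(y) = \sqrt{y}$ (concave with $f(0) = 0$), and fixes the bounds on $t$ at the arbitrary values 1 and 3. It requires $x > 0$.

```python
import math

ELL, U = 1.0, 3.0            # assumed bounds on t

def f(y):
    return math.sqrt(y)      # concave with f(0) = 0, hence t-increasing

def envelope_bruteforce(x, t, y, steps=2000):
    """Maximise x*tp*f(y/(x*tp)) over the feasible interval of tp (x > 0)."""
    lo = max(ELL, (t - U * (1.0 - x)) / x)
    hi = min(U, (t - ELL * (1.0 - x)) / x)
    best = -math.inf
    for k in range(steps + 1):
        tp = lo + (hi - lo) * k / steps
        s = x * tp
        best = max(best, s * f(y / s))
    return best

def envelope_closed_form(x, t, y):
    """Closed form: s*f(y/s) with s = min(u*x, t - ell*(1 - x))."""
    s = min(U * x, t - ELL * (1.0 - x))
    return s * f(y / s)

def bi_p(x, t, y):
    """The Bi-P function itself, for comparison with its over-estimator."""
    return x * t * f(y / (x * t))
```

For example, `envelope_bruteforce(0.5, 2.0, 1.0)` and `envelope_closed_form(0.5, 2.0, 1.0)` should coincide, and both should be at least `bi_p(0.5, 2.0, 1.0)`.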

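Proposition 2 If $f(y)$ is concave and t-decreasing over convex domain $C$, then
$$h(x, t, y) = \max\{\ell x,\ t - u(1 - x)\}\ f\left(\frac{y}{\max\{\ell x,\ t - u(1 - x)\}}\right).$$
Proof In this case, the optimum in (2)-(4) is obtained by setting $t'$ to its minimum possible value, which is $\max\{\ell,\ (t - u(1 - x))/x\}$.
An obvious sufficient condition for a concave function to be t-increasing is for it to be non-negative and non-increasing. The following proposition gives a rather different sufficient condition.

Proposition 3 If $f(y)$ is concave over $C$, $0 \in C$ and $f(0) \ge 0$, then $f(y)$ is t-increasing.
Proof Let $\epsilon > 0$. The point $y/(t + \epsilon)$ is a convex combination of $y/t$ and the origin, with weights $t/(t + \epsilon)$ and $\epsilon/(t + \epsilon)$, so concavity gives
$$f\left(\frac{y}{t + \epsilon}\right) \ge \frac{t}{t + \epsilon} f\left(\frac{y}{t}\right) + \frac{\epsilon}{t + \epsilon} f(0).$$
Given that $\epsilon > 0$, $t \ge \ell > 0$ and $f(0) \ge 0$, we have
$$f\left(\frac{y}{t + \epsilon}\right) \ge \frac{t}{t + \epsilon} f\left(\frac{y}{t}\right).$$
Multiplying both sides by $t + \epsilon$ yields the stated result.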
Checking whether $f(y)$ is t-decreasing, on the other hand, is more difficult. In particular, it is not sufficient for $f(y)$ to be non-positive and non-increasing. For example, the function $f(y) = -2^y$ is non-positive and non-increasing over the non-negative reals, yet $t f(y/t) = -t\, 2^{y/t}$ is increasing in $t$ whenever $y/t > 1/\ln 2$, so $f$ is not t-decreasing. The same function also shows that it is not sufficient to have $0 \in C$ and $f(0) \le 0$. On the other hand, t-decreasing functions do exist. A trivial example is $f(y) = -1$. A more complex example is $f(y) = -\sqrt{y^2 + 1}$.
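These examples can be tested numerically on a grid. The sketch below takes the three example functions to be $-2^y$, $-1$ and $-\sqrt{y^2+1}$, and treats "t-decreasing" as meaning that $t f(y/t)$ never increases as $t$ grows; the grids for $t$ and $y$ are arbitrary.

```python
import math

def perspective(f, t, y):
    return t * f(y / t)

def is_t_decreasing(f, ys, ts):
    """True if t*f(y/t) never increases along the increasing grid ts, for each y."""
    for y in ys:
        vals = [perspective(f, t, y) for t in ts]
        if any(b > a + 1e-12 for a, b in zip(vals, vals[1:])):
            return False
    return True

ts = [0.5 + 0.05 * k for k in range(51)]     # t from 0.5 to 3.0
ys = [0.25 * k for k in range(13)]           # y from 0 to 3

def exp_example(y):
    return -(2.0 ** y)

def const_example(y):
    return -1.0

def sqrt_example(y):
    return -math.sqrt(y * y + 1.0)
```

A grid test of this kind can only refute the property, not prove it, so a result of `True` is merely consistent with the function being t-decreasing.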

Bi-P cuts
Observe that the nonlinear function (5) is non-differentiable not only when $xt = 0$, but also when $ux = t - \ell(1 - x)$. This suggests that standard NLP solvers could struggle to handle functions of the form (5). Moreover, there are situations in which one might prefer to use an LP (or MILP) solver rather than an NLP (or MINLP) solver. So, following Frangioni and Gentile [6], we consider the hypograph of $h(x, t, y)$, i.e., the set
$$H = \left\{ (x, t, y, z) : x \in [0, 1],\ t \in [\ell, u],\ z \le h(x, t, y) \right\}.$$
The following proposition gives a complete description of $H$ by linear inequalities, for the case in which $f(y)$ is t-increasing.

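Proposition 4 When $f(y)$ is t-increasing over $C$, the hypograph $H$ is described by the linear inequalities
$$z \le \nabla f(\bar{y}) \cdot y + \left( f(\bar{y}) - \nabla f(\bar{y}) \cdot \bar{y} \right) u x \qquad (6)$$
$$z \le \nabla f(\bar{y}) \cdot y + \left( f(\bar{y}) - \nabla f(\bar{y}) \cdot \bar{y} \right) \left( t - \ell(1 - x) \right) \qquad (7)$$
$$0 \le x \le 1, \quad \ell \le t \le u \qquad (8)$$
for $\bar{y} \in C$.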
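Proof Let $H^+$ denote the set of 6-tuples $(x, t, y, z, \alpha_1, \alpha_2)$ satisfying (8) together with
$$z \le \alpha_k f(y/\alpha_k) \quad (k = 1, 2),$$
$$\alpha_1 = ux, \qquad (9)$$
$$\alpha_2 = t - \ell(1 - x). \qquad (10)$$
Given that the Eqs. (9) and (10) are linear, $H^+$ is an affine image of $H$. Moreover, using exactly the same argument as in [6], one can show that $H^+$ is described by the constraints (8) together with the following perspective cuts:
$$z \le \nabla f(\bar{y}) \cdot y + \left( f(\bar{y}) - \nabla f(\bar{y}) \cdot \bar{y} \right) \alpha_k \quad (k = 1, 2;\ \bar{y} \in C).$$
Eliminating $\alpha_1$ and $\alpha_2$ yields the result.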
We will call constraints (6) and (7) type-1 and type-2 Bi-P cuts, respectively. Note that the type-1 cuts can be derived as standard perspective cuts from the modified perspective function ux f (y/(ux)). The type-2 cuts, on the other hand, are harder to interpret.
We now make some remarks about Bi-P cuts.

Remark 1
The type-1 Bi-P cuts pass through all points satisfying $x = 0$, $y = 0$ and $z = 0$, along with the point $x = 1$, $t = u$, $y = u\bar{y}$ and $z = u f(\bar{y})$. Thus, they define maximal faces of $H$ whenever the point $(\bar{y}, f(\bar{y}))$ lies on a maximal face of the hypograph of the original function $f(y)$. A similar argument applies to the type-2 Bi-P cuts.

Remark 2
If we reduce the domain of t, the Bi-P cuts become stronger. Thus, if one wishes to make the cuts as tight as possible, it may be worthwhile applying "domain reduction" techniques (see, e.g., [18]) to t.

Remark 3
Let $(x^*, t^*, y^*, z^*)$ be the solution to some relaxation. To solve the separation problem for type-1 Bi-P cuts, it suffices to compute the quantity $u x^* f\left(y^*/(u x^*)\right)$. If this quantity is less than $z^*$, then a cut is violated and the vector $\bar{y}$ yielding the cut is $y^*/(u x^*)$. Similarly, to solve the separation problem for type-2 Bi-P cuts, it suffices to compute $\left(t^* - \ell(1 - x^*)\right) f\left(y^*/(t^* - \ell(1 - x^*))\right)$ and compare it with $z^*$.
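A separation routine along these lines might look as follows. This is a minimal sketch for a scalar $y$ and a differentiable $f$; the function name `separate_bi_p`, the tolerance and the example data are all assumptions. Each returned cut has the form $z \le \text{slope} \cdot y + \text{norm} \cdot s$, where $s$ is $ux$ for a type-1 cut and $t - \ell(1-x)$ for a type-2 cut.

```python
import math

def separate_bi_p(f, df, x, t, y, z, ell, u, tol=1e-9):
    """Return violated cuts as tuples (kind, y_bar, slope, norm)."""
    cuts = []
    for kind, s in (("type-1", u * x), ("type-2", t - ell * (1.0 - x))):
        if s <= tol:
            continue                      # the function is zero by convention
        if s * f(y / s) < z - tol:        # the point lies above this function
            y_bar = y / s
            slope = df(y_bar)
            norm = f(y_bar) - slope * y_bar
            cuts.append((kind, y_bar, slope, norm))
    return cuts

def log_rate(v):
    return math.log2(1.0 + v)             # example concave function

def log_rate_derivative(v):
    return 1.0 / ((1.0 + v) * math.log(2.0))

# Illustrative point: x = 0.5, t = 2, y = 1, z = 1.3, with bounds 1 and 3 on t.
cuts = separate_bi_p(log_rate, log_rate_derivative, 0.5, 2.0, 1.0, 1.3, 1.0, 3.0)
```

Because each cut is tangent to the corresponding function at the separated point, its right-hand side there equals the computed quantity, which is below $z^*$ exactly when a violation is reported.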

Bi-P cuts and multiple-choice constraints
It is very common in integer programming to encounter constraints which state that at most one of a set of binary variables can take a positive value. Such constraints go by various names, such as multiple-choice constraints, clique constraints or generalised upper bounds. In this section, we consider the case in which each of the variables in the given set is an indicator variable.
To be more precise, suppose we have:
- positive constants $\ell$, $u$ with $\ell < u$;
- an integer $m \ge 2$ and positive integers $n_1, \ldots, n_m$;
- a convex domain $C_j \subseteq \mathbb{R}^{n_j}$ for $j = 1, \ldots, m$;
- a t-increasing concave function $f_j : \mathbb{R}^{n_j} \to \mathbb{R}$ for $j = 1, \ldots, m$.
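Let $Q$ denote the set of quadruples $(x, y, z, t)$ satisfying the system (12)-(17). Note that all points in $Q$ satisfy the (non-convex) constraints
$$z_j \le t x_j f_j\left(\frac{y^j}{t x_j}\right) \quad (j = 1, \ldots, m),$$
provided that, as usual, we use the convention that the right-hand side evaluates to zero when $t x_j$ is zero. Together with Remark 3 in Sect. 3.2, this implies that all points in $Q$ satisfy the convex constraints:
$$z_j \le u x_j f_j\left(\frac{y^j}{u x_j}\right) \quad (j = 1, \ldots, m) \qquad (18)$$
$$z_j \le \left(t - \ell(1 - x_j)\right) f_j\left(\frac{y^j}{t - \ell(1 - x_j)}\right) \quad (j = 1, \ldots, m). \qquad (19)$$
From this, one can derive one family of type-1 and type-2 Bi-P cuts for each value of $j$.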
Perhaps surprisingly, the addition of the constraints (18), (19) to the system (12)-(17) does not yield a complete description of the convex hull of Q. This is shown in the following example.
Example: Let be a small positive constant. Suppose that (i) = and u = 1; and (ii) n j = 1, C j = [0, 1] and f j y j 1 = y j 1 for all j. Consider the fractional point obtained by setting t to , all x and y variables to 1/m, and all z variables to /(m ) . One can check that this point satisfies (18) and (19) for all j, and therefore all Bi-P cuts. Now, observe that all points in Q satisfy the following convex inequality: The above-mentioned fractional point does not satisfy this, since the left-and righthand sides evaluate to (m ) 1− and 1− , respectively. Accordingly, the point cannot lie in the convex hull of Q.
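Now, for any $j \in \{1, \ldots, m\}$ and any $\bar{y}^j \in C_j$, let $\|\bar{y}^j\|$ denote $f_j(\bar{y}^j) - \nabla f_j(\bar{y}^j) \cdot \bar{y}^j$. The following theorem presents a huge family of valid inequalities, which generalise the type-2 Bi-P cuts (7).

Theorem 2 Let $S$ be any non-empty subset of $\{1, \ldots, m\}$. For each $j \in S$, let $\bar{y}^j$ be any point in $C_j$ such that $\|\bar{y}^j\| > 0$. Then the linear inequality
$$t \ge \ell \left(1 - \sum_{j \in S} x_j\right) + \sum_{j \in S} \frac{z_j - \nabla f_j(\bar{y}^j) \cdot y^j}{\|\bar{y}^j\|} \qquad (20)$$
is satisfied by all points in $Q$.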
Proof We consider two cases:
Case 1: $x_j = 0$ for all $j \in S$. This forces $y^j_i$ to be zero for all $j \in S$ and for $i = 1, \ldots, n_j$. This in turn forces $z_j$ to be zero for all $j \in S$. Thus, the inequality reduces to $t \ge \ell$, which is trivially valid.
Case 2: $x_j = 1$ for some $j \in S$. This forces $x_k$ to be zero for all $k \in S \setminus \{j\}$, along with the associated $y$ and $z$ variables. Thus, the inequality reduces to
$$t \ge \frac{z_j - \nabla f_j(\bar{y}^j) \cdot y^j}{\|\bar{y}^j\|}.$$
Multiplying this by $\|\bar{y}^j\|$ and rearranging, we obtain:
$$z_j \le \nabla f_j(\bar{y}^j) \cdot y^j + \|\bar{y}^j\|\, t.$$
This last inequality can be derived from the (convex) constraint $z_j \le t f_j(y^j/t)$, in exactly the same way as the perspective cuts.
We call the constraints (20) multiple-choice Bi-P cuts, or MC cuts for short. They reduce to type-2 Bi-P cuts when |S| = 1.
The separation problem for MC cuts can be solved in polynomial time as follows. Let $(x^*, y^*, z^*, t^*)$ be the point to be separated. For $j = 1, \ldots, m$, compute $\ell x_j^* f_j\left((y^j)^*/(\ell x_j^*)\right)$. If this quantity is less than $z_j^*$, then insert $j$ into $S$ and set $\bar{y}^j$ to $(y^j)^*/(\ell x_j^*)$; otherwise, do not insert $j$ into $S$. Once this has been done for all $j$, check the MC cut for violation.
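The procedure can be sketched in a few lines. The code below is an illustration under several assumptions: each $y^j$ is a scalar, each $f_j$ is differentiable at the chosen point, and the MC cut is taken in the form $t \ge \ell(1 - \sum_{j \in S} x_j) + \sum_{j \in S} (z_j - f_j'(\bar{y}^j) y^j)/\|\bar{y}^j\|$. The function name and the example data are hypothetical.

```python
import math

def separate_mc_cut(fs, dfs, x, y, z, t, ell, tol=1e-9):
    """Return (S, y_bars, rhs, violated) for the MC cut t >= rhs."""
    S, y_bars, terms = [], {}, 0.0
    for j in range(len(x)):
        s = ell * x[j]
        if s <= tol:
            continue                          # convention: the term vanishes
        if s * fs[j](y[j] / s) >= z[j] - tol:
            continue                          # z_j does not exceed the test value
        y_bar = y[j] / s
        slope = dfs[j](y_bar)
        norm = fs[j](y_bar) - slope * y_bar   # the quantity written ||y_bar||
        if norm <= tol:
            continue                          # the theorem needs ||y_bar|| > 0
        S.append(j)
        y_bars[j] = y_bar
        terms += (z[j] - slope * y[j]) / norm
    rhs = ell * (1.0 - sum(x[j] for j in S)) + terms
    return S, y_bars, rhs, t < rhs - tol

# Illustrative data: two users, f_j = sqrt, lower bound 1 on t.
sqrt_f = [math.sqrt, math.sqrt]
sqrt_df = [lambda v: 0.5 / math.sqrt(v)] * 2
S, y_bars, rhs, violated = separate_mc_cut(
    sqrt_f, sqrt_df, [0.5, 0.5], [0.5, 0.5], [0.7, 0.7], 1.5, 1.0)
```

The loop visits each index once, so the running time is linear in $m$ apart from the function evaluations.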

Application to OFDMA systems
We now apply the theoretical results in the last section to an optimisation problem associated with OFDMA systems. In Sect. 5.1, we define our problem formally and model it as a mixed 0-1 fractional program with indicator variables. In Sect. 5.2, we reformulate the problem as a semi-infinite mixed 0-1 linear program. In Sect. 5.3, we show how to strengthen the semi-infinite formulation using Bi-P cuts.

The problem
Let I , J , B i , N i and f i be defined as in Sect. 2.3. Let P > 0 denote the maximum power available, and let σ ∈ (0, P) denote the system power, which is the amount of power needed by the OFDMA system regardless of actual data rates. Finally, suppose that each user j ∈ J has a non-negative demand d j . The task is to maximise the energy efficiency, subject to the constraint that the total data rate for each user j is at least d j .
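We call this problem the fractional subcarrier and power allocation problem with rate constraints (F-SPARC). (A related problem, called the SPARC, was studied in [12]. The difference is that the objective in the SPARC was simply to maximise the total data rate.)
A natural formulation of the F-SPARC is obtained as follows. For all $i \in I$ and $j \in J$, let $x_{ij}$ be a binary variable, indicating whether user $j$ is assigned to subcarrier $i$. Also let $p_{ij}$ be a non-negative continuous variable, which represents the amount of power supplied to subcarrier $i$ (if $x_{ij} = 1$), or zero otherwise. We then have:
$$\max\ \frac{\sum_{i \in I} \sum_{j \in J} f_i(p_{ij})}{\sigma + \sum_{i \in I} \sum_{j \in J} p_{ij}} \qquad (21)$$
$$\text{s.t.}\quad \sigma + \sum_{i \in I} \sum_{j \in J} p_{ij} \le P \qquad (22)$$
$$\sum_{j \in J} x_{ij} \le 1 \quad (i \in I) \qquad (23)$$
$$\sum_{i \in I} f_i(p_{ij}) \ge d_j \quad (j \in J), \qquad (24)$$
together with constraints that force $p_{ij}$ to be zero whenever $x_{ij}$ is zero, and the binary and non-negativity conditions on the variables.
The objective function (21) represents the total data rate divided by the total power used (including system power). The constraint (22) ensures that the total power used does not exceed the amount available. Constraints (23) ensure that each subcarrier is assigned to at most one user. Constraints (24) ensure that the user demands are met. The remaining constraints link the power variables to the assignment variables and impose the binary and non-negativity conditions.
From now on, we let $D = \sum_{j \in J} d_j$ denote the total user demand. Note that an upper bound for the F-SPARC can be computed by solving the following (continuous) fractional program:
$$\max \left\{ \frac{\sum_{i \in I} f_i(p_i)}{\sigma + \sum_{i \in I} p_i} : \sigma + \sum_{i \in I} p_i \le P,\ \sum_{i \in I} f_i(p_i) \ge D,\ p_i \ge 0\ (i \in I) \right\}.$$
This fractional program can be converted into a convex program using the transformation mentioned in Sect. 2.1. We denote the corresponding upper bound by $U$.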

Reformulation
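We now reformulate the problem to make it easier to solve. This is done in three steps.
The first step is to convert the fractional objective function into a concave function. To do this, we use the transformation mentioned in Sect. 2.1. Let $t$ be a non-negative continuous variable, representing $1/(\sigma + \sum_{i \in I} \sum_{j \in J} p_{ij})$, i.e., the reciprocal of the total power used. Also, for all $i$ and $j$, let $\tilde{p}_{ij}$ be a non-negative continuous variable, representing $t p_{ij}$. The problem is then equivalent to that of maximising
$$\sum_{i \in I} \sum_{j \in J} t f_i(\tilde{p}_{ij}/t) \qquad (28)$$
subject to the normalisation $\sigma t + \sum_{i \in I} \sum_{j \in J} \tilde{p}_{ij} = 1$, the bounds
$$1/P \le t \le 1/\sigma, \qquad (30)$$
the demand constraints
$$\sum_{i \in I} t f_i(\tilde{p}_{ij}/t) \ge d_j t \quad (j \in J), \qquad (31)$$
and the assignment, linking and binary constraints, rewritten in terms of $\tilde{p}$. The objective function (28) is now concave, and all constraints are linear, apart from (31), which are convex.
The second step is the following. For $i \in I$ and $j \in J$, define a new variable, say $z_{ij}$, representing the quantity $t f_i(\tilde{p}_{ij}/t)$. The problem then becomes that of maximising $\sum_{i \in I} \sum_{j \in J} z_{ij}$, with the demand constraints written in the linear form $\sum_{i \in I} z_{ij} \ge d_j t$, and with the additional constraints
$$z_{ij} \le t f_i(\tilde{p}_{ij}/t) \quad (i \in I,\ j \in J). \qquad (33)$$
Now, all of the nonlinearity has been "concentrated" in the (convex) constraints (33).
Finally, we replace the constraints (33) with the following linear constraints:
$$z_{ij} \le f_i'(\bar{p})\, \tilde{p}_{ij} + \left( f_i(\bar{p}) - f_i'(\bar{p})\, \bar{p} \right) t \quad (i \in I,\ j \in J,\ \bar{p} \in [0, P - \sigma]). \qquad (34)$$
These linear constraints can be derived in the same way as standard perspective cuts.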
Since they are infinite in number, we have formulated the F-SPARC as a semi-infinite mixed 0-1 linear program. It can be solved (to arbitrary fixed accuracy) with an LP-based branch-and-cut algorithm.
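The change of variables in the first step can be checked on a toy instance. In the sketch below, the list `p` holds one power value per subcarrier (that is, the assignment has already been fixed), and all numerical data are made up for illustration; the helper names are hypothetical.

```python
import math

def rate(p, B, N):
    return B * math.log2(1.0 + p / N)

def energy_efficiency(p, B, N, sigma):
    """Total data rate divided by total power, including system power sigma."""
    total_rate = sum(rate(p[i], B[i], N[i]) for i in range(len(p)))
    return total_rate / (sigma + sum(p))

def charnes_cooper(p, B, N, sigma):
    """Map powers p to (t, p_tilde) and evaluate the transformed objective."""
    t = 1.0 / (sigma + sum(p))                # reciprocal of total power
    p_tilde = [t * pi for pi in p]            # scaled powers
    objective = sum(t * rate(pt / t, B[i], N[i]) for i, pt in enumerate(p_tilde))
    return t, p_tilde, objective

# Made-up data: three subcarriers, system power of 10 watts.
p = [2.0, 0.0, 5.0]
B = [1.25e6, 1.25e6, 1.25e6]
N = [2e-6, 5e-6, 8e-6]
sigma = 10.0
t, p_tilde, objective = charnes_cooper(p, B, N, sigma)
```

The transformed objective should coincide with the energy efficiency of the original powers, and the scaled variables should satisfy the normalisation $\sigma t + \sum_i \tilde{p}_i = 1$.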

Bi-P cuts
We now use the results in Sect. 3 to strengthen our semi-infinite formulation of the F-SPARC. We start by reducing the domain of $t$. Recall that $D$ denotes the total user demand. We compute a lower bound, $P_{\min}$, on the total amount of power used. The upper bound on $t$ in (30) can then be reduced from $1/\sigma$ to $1/P_{\min}$.
Now, let $i$ and $j$ be fixed, and consider the constraints (27), (32) and (33). It is clear that all feasible triples $(x_{ij}, \tilde{p}_{ij}, z_{ij})$ satisfy
$$z_{ij} \le t x_{ij} f_i\left(\frac{\tilde{p}_{ij}}{t x_{ij}}\right),$$
where, as usual, the convention is that the function on the right-hand side takes the value zero when $\tilde{p}_{ij} = t x_{ij} = 0$. The right-hand side is clearly the Bi-P function of a t-increasing function.
The only tricky part is that the natural domain of $\tilde{p}_{ij}$ is $[0, 1]$, which is not equal to $t x_{ij} C$ for some convex domain $C$. Despite this complication, we have the following result.
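Proposition 5 For all $i \in I$, $j \in J$ and $\bar{p} \in [0, P - \sigma]$, the following type-1 and type-2 Bi-P cuts are valid for the F-SPARC:
$$z_{ij} \le f_i'(\bar{p})\, \tilde{p}_{ij} + \left( f_i(\bar{p}) - f_i'(\bar{p})\, \bar{p} \right) \frac{x_{ij}}{P_{\min}} \qquad (35)$$
$$z_{ij} \le f_i'(\bar{p})\, \tilde{p}_{ij} + \left( f_i(\bar{p}) - f_i'(\bar{p})\, \bar{p} \right) \left( t - \frac{1 - x_{ij}}{P} \right). \qquad (36)$$
Proof We already know that $t$ must lie in the interval $[1/P, 1/P_{\min}]$. Also, $p_{ij} \le P - \sigma$, or, equivalently, $\tilde{p}_{ij} \in t\,[0, P - \sigma]$. In fact, given that $\tilde{p}_{ij}$ must be zero when $x_{ij}$ is zero, we can conclude that $\tilde{p}_{ij} \in t x_{ij} [0, P - \sigma]$. We can now generate Bi-P cuts as usual.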
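Note that the type-2 cuts (36) dominate (34). Finally, since the constraints (23) are multiple-choice constraints, we can also generate MC cuts. In our preliminary experiments, we found that the most useful MC cuts were those with (a) $S = J$ and (b) $\bar{p}_{ij}$ equal to the same (positive) value $\bar{p}$ for all $j \in J$. These cuts take the form:
$$\sum_{j \in J} z_{ij} \le f_i'(\bar{p}) \sum_{j \in J} \tilde{p}_{ij} + \left( f_i(\bar{p}) - f_i'(\bar{p})\, \bar{p} \right) \left( t - \frac{1 - \sum_{j \in J} x_{ij}}{P} \right). \qquad (37)$$
We remark that the cuts (37) collectively enforce the following convex inequalities:
$$\sum_{j \in J} z_{ij} \le \left( t - \frac{1 - \sum_{j \in J} x_{ij}}{P} \right) f_i\left( \frac{\sum_{j \in J} \tilde{p}_{ij}}{t - \left(1 - \sum_{j \in J} x_{ij}\right)/P} \right) \quad (i \in I). \qquad (38)$$
From this it can be shown that the upper bound obtained by adding all of the cuts (37) to the continuous relaxation of the semi-infinite formulation is equal to $U$, the upper bound mentioned at the end of Sect. 5.1.
We remark that the separation problem for the cuts (37) can be solved simply by setting $\bar{p}$ to the current value of $\sum_{j \in J} \tilde{p}_{ij}$, and checking the inequality (38) for violation. This can be done in $O(|J|)$ time for a given $i$.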

Computational experiments
In the previous section, we described three families of cutting planes for the F-SPARC: the type-1 cuts (35), the type-2 cuts (36), and the special MC cuts (37). In this section, we present some computational results to shed light on the relative usefulness of these different kinds of cuts.

Test instances
To construct our test instances, we used the procedure described in [12], which is designed to produce instances typical of a small base station. In detail, the instances have $|I| = 72$, $|J| \in \{4, 6\}$, and $P$ set to 36 watts. The noise powers $N_i$ are random numbers distributed uniformly in $(10^{-6}, 10^{-5})$, and the bandwidths $B_i$ are all set to 1.25 MHz. The user demands follow a lognormal distribution.
Let D be the total demand, as before, and let M be the maximum possible data rate of the system. As in [12], we call the quantity D/M the demand ratio (DR) of the given instance. The user demands are scaled so that the DR takes values in {0.75, 0.8, 0.85, 0.9, 0.95}. (The closer the DR is to 1, the harder the instance tends to be.) For each combination of |J | and DR, we constructed 10 random instances. This makes 2 × 5 × 10 = 100 instances in total. These instances have been made available at: http://www.research.lancs.ac.uk/portal/en/datasets/search.html under "OFDMA Optimisation".
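The scaling of demands to a target DR can be sketched as follows. This is not the generator of [12]: the lognormal parameters `mu` and `sigma`, the value used for the maximum data rate and the function name are all assumptions, and the maximum data rate $M$ is taken as a given input.

```python
import random

def scaled_demands(num_users, max_rate, demand_ratio, rng, mu=0.0, sigma=1.0):
    """Draw lognormal demands and rescale them so that D / M equals demand_ratio."""
    raw = [rng.lognormvariate(mu, sigma) for _ in range(num_users)]
    factor = demand_ratio * max_rate / sum(raw)
    return [d * factor for d in raw]

rng = random.Random(0)                        # fixed seed for reproducibility
demands = scaled_demands(4, 1e8, 0.85, rng)   # 4 users, assumed M = 1e8, DR = 0.85
```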

Experimental setup
For each instance, we did the following. We began by computing the upper bound $U$ mentioned at the end of Sect. 5.1, using MOSEK. We then ran the heuristic described in [13] to compute a lower bound. If the bounds differed by more than 0.1%, we ran an exact algorithm, similar to the one described in [12], until the difference between the lower and upper bounds dropped below 0.1%. We remark that this exact algorithm took a long time (sometimes several minutes) to converge for some of our test instances.
The next step was to solve, for each instance, the continuous relaxation of the formulation described in Sect. 5.2. To do this, we used a (fairly standard) LP-based cutting-plane algorithm, in which the inequalities (34) were added dynamically as cutting planes. In each major iteration, all inequalities violated by more than $10^{-4}$ were added to the LP. A time limit of two minutes per instance was also imposed. (In most cases, tailing off occurred well before the time limit.) The cutting-plane algorithm was coded in Julia v0.5 and run on a virtual machine cluster with 16 CPUs (ranging from Sandy Bridge to Haswell architectures) and 16 GB of RAM, under Ubuntu 16.04.1 LTS. The program used the LP solver from the CPLEX 12.6.3 Callable Library (with default settings). More specifically, we used primal simplex to solve the initial relaxation and dual simplex to re-optimise after adding cuts.
Finally, we ran the cutting-plane algorithm again for each instance, switching on and off various combinations of the cuts (35)-(37). The purpose of this was to enable us to identify the cuts that tend to be most useful in practice.

Results
We present results for six versions of the cutting-plane algorithm:
- "∅": Constraints (34) alone, without any Bi-P cuts.
- "All": Type-1 cuts (35), type-2 cuts (36) and MC cuts (37).
Tables 1 and 2 show, for each set of 10 instances and each combination of cutting planes, the average gap between the upper bound from the cutting-plane algorithm and the lower bound from the exact algorithm, expressed as a percentage of the lower bound. From the tables, we see that the unstrengthened cuts (34) lead to extremely poor upper bounds in all cases. Moreover, the gaps for $|J| = 4$ are much worse than the gaps for $|J| = 6$. Type-1, type-2 and MC cuts all close the gap considerably, but type-1 and type-2 cuts appear to be more effective than MC cuts. Using type-1 and type-2 cuts in combination is particularly effective. In fact, the benefit gained by including MC cuts as well is rather small.

Concluding remarks
Perspective reformulations and cuts are an invaluable tool for both fractional programming and MINLP with indicator variables. We have shown that, when one is dealing with a mixed-integer fractional program with indicator variables, one needs to use "bi-perspective" reformulations and cuts in order to obtain bounds that are useful within an exact solution algorithm. We believe that extensions of perspective reformulations and cuts to other classes of problems would be a valuable topic for future research.
As for our specific application, to optimisation in OFDMA systems, we plan to look next at stochastic dynamic variants of the problem, in which users arrive and depart at random over time.