1 Introduction

In this paper, we study non-convex optimization problems of the form

$$\begin{aligned} \begin{array}{c@{\quad }l} \sup \limits _{\varvec{x}} &{} f(\varvec{x}) + g(\varvec{x}) \\ {{\,\mathrm{s.t.}\,}}&{} \varvec{A} \varvec{x} \le \varvec{b}, \end{array} \end{aligned}$$
(1)

where \(f : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) is a generic function, \(g: {\mathbb {R}}^n \mapsto {\mathbb {R}}\) is concave, \(\varvec{A} \in {\mathbb {R}}^{m \times n}\) and \(\varvec{b} \in {\mathbb {R}}^m\). Since f is not necessarily concave, problem (1) is a hard optimization problem even if P = NP [22, Theorem 1]. In the special case where f is convex, problem (1) recovers the class of DC (difference-of-convex-functions) optimization problems over a polyhedron [13]. Significant efforts have been devoted to solving problem (1) exactly (most commonly via branch-and-bound techniques) or approximately (often via convex approximations). For both tasks, the Reformulation-Linearization Technique (RLT) can be used to obtain tight yet readily solvable convex relaxations of (1).

RLT was originally introduced to equivalently reformulate binary quadratic optimization problems as mixed-binary linear optimization problems [1]. To this end, each linear constraint in the original problem is multiplied by each binary decision variable to generate implied quadratic inequalities. These inequalities are subsequently linearized through the introduction of auxiliary decision variables whose values coincide with the generated quadratic terms. This idea is reminiscent of the McCormick envelopes [17], which relax bilinear expressions by introducing implied inequalities that are subsequently linearized. RLT has since been extended to (continuous) polynomial optimization problems [26], where implied inequalities are generated by multiplying and subsequently linearizing existing bound constraints.
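As an illustration of the binary case, multiplying a linear constraint \(\varvec{a}^\top \varvec{x} \le b\) with a binary variable \(x_i \in \{ 0, 1 \}\) and with its complement \(1 - x_i\) yields the implied quadratic inequalities

$$\begin{aligned} x_i \, (b - \varvec{a}^\top \varvec{x}) \ge 0 \qquad \text {and} \qquad (1 - x_i) \, (b - \varvec{a}^\top \varvec{x}) \ge 0, \end{aligned}$$

which are linearized by substituting an auxiliary variable \(X_{ij}\) for each product \(x_i x_j\) and exploiting that \(x_i^2 = x_i\) for binary \(x_i\).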

In this work, we consider a variant of RLT—the Reformulation-Convexification Technique [27]—which applies to linearly constrained optimization problems that maximize a non-concave objective function. This RLT variant (which we henceforth simply call ‘RLT’ for ease of exposition) replaces the non-concave function f in problem (1) with an auxiliary function \(f' : {\mathbb {R}}^{n \times n} \times {\mathbb {R}}^n \mapsto {\mathbb {R}}\) that is concave over the lifted domain \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n\) and that satisfies \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). For the special case where \(f (\varvec{x}) = \varvec{x}^\top \varvec{P} \varvec{x}\) for an indefinite symmetric matrix \(\varvec{P} \in {\mathbb {S}}^n\), for example, we can choose \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{P}, \varvec{X} \rangle \). RLT then augments problem (1) with the decision matrix \(\varvec{X} \in {\mathbb {S}}^n\) and the constraints

$$\begin{aligned} \varvec{a}_i^\top \varvec{X} \varvec{a}_j - (b_i \varvec{a}_j + b_j \varvec{a}_i)^\top \varvec{x} + b_i b_j \ge 0 \qquad \forall i, j = 1, \ldots , m, \end{aligned}$$
(2)

where \(\varvec{a}_i^\top \) denotes the i-th row of the matrix \(\varvec{A}\). The constraints (2) are justified by the fact that the pairwise multiplications \((\varvec{a}_i^\top \varvec{x} - b_i) (\varvec{a}_j^\top \varvec{x} - b_j)\) of the constraints in problem (1) have to be non-negative, and those multiplications coincide with the constraints (2) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). To obtain a convex relaxation of problem (1), the non-convex constraint \(\varvec{X} = \varvec{x} \varvec{x}^\top \) is either removed (which we henceforth refer to as ‘classical RLT’, see [24]) or relaxed to the linear matrix inequality (LMI) constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \) (henceforth referred to as RLT/SDP, see [2, 3, 25]). Even though the matrix \(\varvec{X}\) linearizes quadratic terms, we emphasize that the problems we are considering are not restricted to quadratic programs since f may be a generic nonlinear function.
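To make the construction concrete, the following sketch builds the classical RLT relaxation with the constraints (2) for the quadratic special case \(f (\varvec{x}) = \varvec{x}^\top \varvec{P} \varvec{x}\) discussed above. It is written in Python with cvxpy (not the software considered later in the paper), and the box-shaped feasible region is an arbitrary choice made for illustration.

```python
import numpy as np
import cvxpy as cp

# Classical RLT relaxation of  sup { x'Px : Ax <= b }  with the lift
# f'(X, x) = <P, X>. Illustrative sketch only; the box 0 <= x <= 1 below
# is an arbitrary choice of polyhedron.
np.random.seed(0)
n = 3
A = np.vstack([np.eye(n), -np.eye(n)])       # box constraints as A x <= b
b = np.concatenate([np.ones(n), np.zeros(n)])
m = A.shape[0]
B = np.random.randn(n, n)
P = (B + B.T) / 2                            # symmetric, indefinite in general

x = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)
cons = [A @ x <= b]
for i in range(m):                           # RLT constraints (2)
    for j in range(i, m):
        cons.append(A[i] @ X @ A[j]
                    - (b[i] * A[j] + b[j] * A[i]) @ x + b[i] * b[j] >= 0)
prob = cp.Problem(cp.Maximize(cp.trace(P @ X)), cons)
prob.solve()
print(prob.value)                            # upper bound on the indefinite QP
```

Since the constraint \(\varvec{X} = \varvec{x} \varvec{x}^\top \) is dropped, the resulting problem is a linear program whose optimal value upper-bounds that of the original quadratic program.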

RLT and its extensions have been exceptionally successful in providing tight approximations to indefinite quadratic [25], polynomial [26] and generic non-convex optimization problems [15, 29], and RLT is routinely implemented in state-of-the-art optimization software, including ANTIGONE [20], CPLEX [14], GLoMIQO [19] and GUROBI [10].

In this paper, we assume that the constraints of problem (1) describe an n-dimensional simplex. Under this assumption, we show that for a large class of functions f that admit a monotone lifting (which includes, among others, various transformations of quadratic functions as well as the negative entropy), the RLT relaxation of problem (1) admits an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) that satisfies \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\). This has two important consequences. Firstly, we show that when the feasible region of problem (1) is a simplex, \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) satisfies \(\varvec{X}^\star \succeq \varvec{x}^\star \varvec{x}^{\star \top }\), that is, the RLT and RLT/SDP relaxations are equivalent, and the computationally expensive LMI constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \) can be omitted in RLT/SDP. Secondly, we do not need to introduce the decision matrix \(\varvec{X} \in {\mathbb {S}}^n\) in the RLT relaxation, which amounts to a dramatic reduction in the size of the resulting relaxation. We also discuss how our result can be extended to instances of problem (1) over the Cartesian product of two simplices, a generic polyhedron, or a non-convex feasible region, as well as to instances with an indefinite quadratic objective function.

Indefinite quadratic optimization over simplices (also known as standard quadratic optimization) has a long history, and it has found applications, among others, in mean/variance portfolio selection and the determination of the maximal cliques on a node-weighted graph [6]. More generally, non-convex polynomial optimization problems over simplices have been proposed for the global optimization of neural networks [4], portfolio optimization using the expected shortfall risk measure [5] and the computation of the Lebesgue constant for polynomial interpolation over a simplex [11]; see [8] for a general discussion. Simplicial decompositions of non-convex optimization problems are also studied extensively in the global optimization literature [12].

The remainder of this paper proceeds as follows. We analyze the RLT relaxations of simplex instances of problem (1) in Sect. 2 and report on numerical experiments in Sect. 3. “Appendix A” extends our findings to well-structured optimization problems over the Cartesian product of two simplices, specific classes of polyhedral and non-convex feasible regions, as well as indefinite quadratic objective functions. “Appendix B”, finally, contains additional numerical experiments.

Notation. We denote by \({\mathbb {R}}^n\) (\({\mathbb {R}}^n_{+}\)) the (non-negative orthant of the) n-dimensional Euclidean space and by \({\mathbb {Q}}\) the set of rational numbers. The cone of (positive semidefinite) symmetric matrices in \({\mathbb {R}}^{n \times n}\) is denoted by \({\mathbb {S}}^n\) (\({\mathbb {S}}^n_+\)). Bold lower and upper case letters denote vectors and matrices, respectively, while standard lower case letters are reserved for scalars. We denote the i-th component of a vector \(\varvec{x}\) by \(x_i\), the (ij)-th element of a matrix \(\varvec{A}\) by \(A_{ij}\) and the i-th row of a matrix \(\varvec{A}\) by \(\varvec{a}_i^\top \). We write \(\varvec{X} \succeq \varvec{Y}\) to indicate that \(\varvec{X} - \varvec{Y}\) is positive semidefinite. The trace operator is denoted by \({{\,\mathrm{tr}\,}}(\cdot )\), and the trace inner product between two symmetric matrices is given by \(\langle \cdot , \cdot \rangle \). Finally, \({{\,\mathrm{diag}\,}}(\varvec{x})\) is a diagonal matrix whose diagonal elements coincide with the components of the vector \(\varvec{x}\).

2 RLT and RLT/SDP over simplices

This section studies instances of problem (1) where the constraints \(\varvec{A} \varvec{x} \le \varvec{b}\) describe the n-dimensional probability simplex:

$$\begin{aligned} \begin{array}{c@{\quad }l} \sup \limits _{\varvec{x}} &{} f(\varvec{x}) + g(\varvec{x}) \\ {{\,\mathrm{s.t.}\,}}&{} \displaystyle \sum _{i=1}^n x_i = 1 \\ &{} \varvec{x} \in {\mathbb {R}}^n_+. \end{array} \end{aligned}$$
(3)

Assuming that the feasible region describes a probability simplex, as opposed to any other full-dimensional simplex in \({\mathbb {R}}^n\), does not restrict generality. Indeed, we can always redefine the objective function as \( f(\varvec{x}) \leftarrow f(\varvec{T} \varvec{x})\) and \(g(\varvec{x}) \leftarrow g(\varvec{T} \varvec{x})\) for the invertible matrix \(\varvec{T} \in {\mathbb {R}}^{n \times n}\) that has as columns the extreme points of the simplex to be considered. The pairwise products of the constraints \(x_i \ge 0\), \(i = 1, \ldots , n\), with each other and with \(\sum _{i=1}^n x_i = 1\) result in the RLT constraints

$$\begin{aligned} \varvec{X} \ge \mathbf {0}, \qquad \sum _{j = 1}^n X_{ij} = \sum _{j = 1}^n X_{ji} = x_i \quad \forall i = 1, \ldots , n; \end{aligned}$$

here we omit the constraint \(\sum _{i = 1}^n \sum _{j = 1}^n X_{ij} = 1\) as it is implied by the above constraints and the fact that \(\sum _{i = 1}^n x_i = 1\). Thus, the RLT relaxation of problem (3) can be written as

$$\begin{aligned} \begin{array}{c@{\quad }l@{\qquad }l} \sup \limits _{\varvec{X}, \varvec{x}} &{} f' (\varvec{X}, \varvec{x}) + g(\varvec{x}) \\ {{\,\mathrm{s.t.}\,}}&{} \displaystyle \sum _{j = 1}^n X_{ij} = \sum _{j = 1}^n X_{ji} = x_i &{} \forall i = 1, \ldots , n \\ &{} \displaystyle \sum _{i=1}^n x_i = 1 \\ &{} \varvec{X} \ge \mathbf {0}, \;\; \varvec{X} \in {\mathbb {S}}^n, \;\; \varvec{x} \in {\mathbb {R}}^n_+, \end{array} \end{aligned}$$
(4)

where the auxiliary function \(f'\) has to be suitably chosen, while the RLT/SDP relaxation contains the additional LMI constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \).
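As a concrete illustration of the relaxation (4) and its RLT/SDP counterpart, the following Python/cvxpy sketch instantiates both for the hypothetical choices \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x}\) with \(\varvec{Q} \in {\mathbb {S}}^n_+\), lifted as \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{Q}, \varvec{X} \rangle \), and \(g (\varvec{x}) = \sum _i \ln x_i\); it is a sketch under these assumptions, not the implementation used in Sect. 3.

```python
import numpy as np
import cvxpy as cp

# RLT relaxation (4) over the probability simplex for the hypothetical
# choices f(x) = x'Qx with Q psd (lifted as f'(X, x) = <Q, X>) and
# g(x) = sum_i log(x_i).
np.random.seed(1)
n = 4
L = np.random.randn(n, n)
Q = L.T @ L                                  # psd, so f is convex quadratic

x = cp.Variable(n, nonneg=True)
X = cp.Variable((n, n), symmetric=True)
cons = [X >= 0,
        cp.sum(X, axis=1) == x,              # row sums; symmetry covers columns
        cp.sum(x) == 1]
obj = cp.Maximize(cp.trace(Q @ X) + cp.sum(cp.log(x)))
rlt = cp.Problem(obj, cons)
rlt.solve(solver=cp.SCS)

# RLT/SDP adds the LMI X >= xx', encoded here via its Schur complement:
xc = cp.reshape(x, (n, 1))
sdp = cp.Problem(obj, cons + [cp.bmat([[X, xc], [xc.T, np.ones((1, 1))]]) >> 0])
sdp.solve(solver=cp.SCS)
print(rlt.value, sdp.value)                  # coincide (cf. Corollary 2 below)
```

Note that the LMI \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \) is imposed through the equivalent Schur complement condition on the bordered matrix, which is the standard way to express this constraint in conic form.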

We now define a condition that ensures that the RLT relaxation (4) of problem (3) admits an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) with \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\).

Definition 1

We say that \(f : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) has a monotone lifting if there is a concave function \(f' : {\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) such that \(f' (\varvec{X}, \varvec{x}) = f ( \varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \), as well as \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) for all \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n_+\) and all \(\varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\).

The requirement in Definition 1 that \(f' (\varvec{X}, \varvec{x}) = f ( \varvec{x})\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \) is needed for the correctness of the RLT relaxation. The concavity of \(f'\) is required for the RLT relaxation to be a convex optimization problem. The assumption that \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) for all \((\varvec{X}, \varvec{x}) \in {\mathbb {S}}^n \times {\mathbb {R}}^n_+\) and all \(\varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), finally, will allow us to deduce an optimal solution for \(\varvec{X}\) based on the value of \(\varvec{x}\). Indeed, we will see below in Theorem 1 that the RLT relaxation (4) of an instance of problem (3) admits optimal solutions \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\) whenever the auxiliary function \(f'\) in (4) is a monotone lifting of the function f in (3). Intuitively speaking, Definition 1 enables us to weakly improve any solution \((\varvec{X}, \varvec{x})\) satisfying \(\varvec{X} \ne \mathrm {diag}(\varvec{x})\) by iteratively moving off-diagonal elements of \(\varvec{X}\) to the diagonal. Before presenting the formal result, we provide some examples of functions f that admit monotone liftings.
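For instance, if \(f (\varvec{x}) = \varvec{x}^\top \varvec{P} \varvec{x}\) with \(\varvec{P} \in {\mathbb {S}}^n_+\), then \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{P}, \varvec{X} \rangle \) is a monotone lifting: it is linear and hence concave, it satisfies \(\langle \varvec{P}, \varvec{x} \varvec{x}^\top \rangle = \varvec{x}^\top \varvec{P} \varvec{x}\), and it is monotone since \(\langle \varvec{P}, \varvec{X}' - \varvec{X} \rangle \ge 0\) whenever \(\varvec{X}' \succeq \varvec{X}\) and \(\varvec{P} \succeq \mathbf {0}\). For an indefinite matrix \(\varvec{P}\), by contrast, the same choice of \(f'\) fails the monotonicity requirement.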

Proposition 1

The following function classes have monotone liftings:

  1. Generalized linearithmic functions: \(f (\varvec{x}) = \sum _{\ell = 1}^L (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell (\varvec{t}_\ell ^\top \varvec{x} + t_\ell )\) with (i) \(\varvec{t}_\ell \in {\mathbb {R}}^n_+\), \(t_\ell \in {\mathbb {R}}_+\) and \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) concave and non-decreasing, or (ii) \(\varvec{t}_\ell \in {\mathbb {R}}^n\), \(t_\ell \in {\mathbb {R}}\) and \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) affine and non-decreasing.

  2. Linear combinations: \(f(\varvec{x}) = \sum _{\ell = 1}^L t_\ell \cdot f_\ell (\varvec{x})\) with \(t_\ell \in {\mathbb {R}}_+\), where each \(f_\ell : {\mathbb {R}}^n \mapsto {\mathbb {R}}\) has a monotone lifting.

  3. Concave compositions: \(h (\varvec{x}) = g (f (\varvec{x}))\) for \(f : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) with a monotone lifting as well as a concave and non-decreasing \(g : {\mathbb {R}} \mapsto {\mathbb {R}}\).

  4. Linear pre-compositions: \(h(\varvec{x}) = f(\varvec{T} \varvec{x})\) for \(f:{\mathbb {R}}^{{p}}_+ \mapsto {\mathbb {R}}\) with a monotone lifting as well as \(\varvec{T} \in {\mathbb {R}}^{{p} \times n}\).

  5. Pointwise minima: \(h(\varvec{x}) = \min \{f_1(\varvec{x}), \ldots , f_L (\varvec{x})\}\) where each \(f_\ell : {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) has a monotone lifting.

Proof

In view of case (i) of the first statement, we choose

$$\begin{aligned} f'(\varvec{X}, \varvec{x}) \; = \; \sum _{\ell = 1}^L (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) , \end{aligned}$$

which is concave in \((\varvec{X}, \varvec{x})\) since it constitutes the sum of perspectives of concave functions [7, §3.2.2 and §3.2.6]. Whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have

$$\begin{aligned} \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \; = \; \varvec{t}_\ell ^\top \varvec{x} \varvec{x}^\top \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \; = \; (\varvec{t}_\ell ^\top \varvec{x} + t_\ell )^2, \end{aligned}$$

and thus the standard limit convention for perspective functions implies that \(f'(\varvec{X}, \varvec{x}) = f (\varvec{x})\) for all \(\varvec{x} \in {\mathbb {R}}^n_+\). Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\), we have

$$\begin{aligned} \varvec{t}_\ell ^\top \varvec{X}' \varvec{t}_\ell \;\ge \; \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell \qquad \forall \varvec{X}, \varvec{X}' \in {\mathbb {S}}^n \, : \, \varvec{X}' \succeq \varvec{X}, \end{aligned}$$

where the inequality holds since \(\varvec{X}' - \varvec{X} \succeq 0\). We conclude that

$$\begin{aligned} h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X}' \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) \; \ge \; h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) \end{aligned}$$

as \(2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \ge 0\) and \(\varvec{t}_\ell ^\top \varvec{x} + t_\ell \ge 0\) due to the non-negativity of \(\varvec{t}_\ell \), \(t_\ell \) and \(\varvec{x}\), which implies that \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) as desired.

One readily verifies that in the special case where each \(h_\ell \) is affine, the concavity of \(f'\), the agreement of \(f'\) with f when \(\varvec{X} = \varvec{x} \varvec{x}^\top \) and the monotonicity of \(f'\) with respect to \(\varvec{X}' \succeq \varvec{X}\) continue to hold even when \(\varvec{t}_\ell \) and/or \(t_\ell \) fail to be non-negative. This establishes case (ii) of the first statement.
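Concretely, when \(h_\ell (y) = \alpha _\ell y + \beta _\ell \) with \(\alpha _\ell \ge 0\), the perspective in the definition of \(f'\) collapses to

$$\begin{aligned} (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell \left( \frac{\varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2}{\varvec{t}_\ell ^\top \varvec{x} + t_\ell } \right) \; = \; \alpha _\ell \left( \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell + 2 t_\ell \varvec{t}_\ell ^\top \varvec{x} + t_\ell ^2 \right) + \beta _\ell \left( \varvec{t}_\ell ^\top \varvec{x} + t_\ell \right) , \end{aligned}$$

which is linear in \((\varvec{X}, \varvec{x})\) and non-decreasing in \(\varvec{X}\) regardless of the signs of \(\varvec{t}_\ell \) and \(t_\ell \).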

As for the second statement, let \(f_\ell ' : {\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) be monotone liftings of \(f_\ell \), \(\ell = 1, \ldots , L\). We claim that \(f' (\varvec{X}, \varvec{x}) = \sum _{\ell = 1}^L t_\ell \cdot f_\ell ' (\varvec{X}, \varvec{x})\) is a monotone lifting of f. Indeed, one readily verifies that \(f'\) inherits concavity in \((\varvec{X}, \varvec{x})\) and agreement with f when \(\varvec{X} = \varvec{x} \varvec{x}^\top \) from its constituent functions \(f'_\ell \). Moreover, since \(f'_\ell (\varvec{X}', \varvec{x}) \ge f'_\ell (\varvec{X}, \varvec{x})\) for all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) with \(\varvec{X}' \succeq \varvec{X}\), \(\ell = 1, \ldots , L\), we have \(f' (\varvec{X}', \varvec{x}) \ge f' (\varvec{X}, \varvec{x})\) as well.

In view of the third statement, let \(f'\) be a monotone lifting of f. We claim that in this case, \(h' (\varvec{X}, \varvec{x}) = g (f' (\varvec{X}, \varvec{x}))\) is a monotone lifting of h. Indeed, \(h'\) is a non-decreasing concave transformation of a concave function and is thus concave [7, §3.2.5]. Moreover, since \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) for \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have

$$\begin{aligned} h' (\varvec{X}, \varvec{x}) \; = \; g (f' (\varvec{X}, \varvec{x})) \; = \; g (f (\varvec{x})) \; = \; h (\varvec{x}) \end{aligned}$$

whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \). Finally, the monotonicity of g implies that

$$\begin{aligned} h' (\varvec{X}', \varvec{x}) \; = \; g (f' (\varvec{X}', \varvec{x})) \; \ge \; g (f' (\varvec{X}, \varvec{x})) \; = \; h' (\varvec{X}, \varvec{x}) \end{aligned}$$

for all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) with \(\varvec{X}' \succeq \varvec{X}\).

For the fourth statement, we set \(h'(\varvec{X}, \varvec{x}) = f'(\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x})\), where \(f'\) is a monotone lifting of f. The function \(h'\) is concave since it constitutes a composition of a concave function with a linear function [7, §3.2.2]. Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have

$$\begin{aligned} h' (\varvec{X}, \varvec{x}) \; = \; f' (\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x}) \; = \; f (\varvec{T} \varvec{x}), \end{aligned}$$

where the second identity holds since \(\varvec{T} \varvec{X} \varvec{T}^\top = (\varvec{T} \varvec{x}) (\varvec{T} \varvec{x})^\top \) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \) as well as \(f' (\varvec{X}, \varvec{x}) = f (\varvec{x})\) for \(\varvec{X} = \varvec{x} \varvec{x}^\top \). To see that \(h' (\varvec{X}', \varvec{x}) \ge h' (\varvec{X}, \varvec{x})\) for all \(\varvec{x} \in {\mathbb {R}}^n_+\) and all \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), we note that

$$\begin{aligned} h' (\varvec{X}', \varvec{x}) \; = \; f' (\varvec{T} \varvec{X}' \varvec{T}^\top , \varvec{T} \varvec{x}) \; \ge \; f' (\varvec{T} \varvec{X} \varvec{T}^\top , \varvec{T} \varvec{x}) \; = \; h' (\varvec{X}, \varvec{x}), \end{aligned}$$

where the inequality follows from the fact that

$$\begin{aligned} \varvec{X}' \succeq \varvec{X} \;\; \Longrightarrow \;\; \varvec{T} (\varvec{X}' - \varvec{X}) \varvec{T}^\top \succeq \mathbf {0} \;\; \Longrightarrow \;\; \varvec{T} \varvec{X}' \varvec{T}^\top \succeq \varvec{T} \varvec{X} \varvec{T}^\top \end{aligned}$$

and the assumption that \(f'\) is a monotone lifting.

For the last statement, we set \(h' (\varvec{X}, \varvec{x}) = \min \{f_1'(\varvec{X}, \varvec{x}), \ldots , f_L'(\varvec{X}, \varvec{x})\}\), where \(f'_\ell :{\mathbb {S}}^n \times {\mathbb {R}}^n_+ \mapsto {\mathbb {R}}\) is a monotone lifting of \(f_\ell \) for all \(\ell = 1,\ldots ,L\). The function \(h'\) is concave as it is a minimum of concave functions [7, §3.2.3]. Moreover, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and \(\varvec{X} = \varvec{x} \varvec{x}^\top \), we have

$$\begin{aligned} h' (\varvec{X}, \varvec{x}) \; = \; \min \{ f'_1 (\varvec{X}, \varvec{x}), \ldots , f'_L (\varvec{X}, \varvec{x}) \} \; = \; \min \{ f_1 (\varvec{x}), \ldots , f_L (\varvec{x}) \} \; = \; f (\varvec{x}), \end{aligned}$$

since each \(f'_\ell \) is a monotone lifting of \(f_\ell \). Similarly, for any \(\varvec{x} \in {\mathbb {R}}^n_+\) and any \(\varvec{X}, \varvec{X}' \in {\mathbb {S}}^n\) satisfying \(\varvec{X}' \succeq \varvec{X}\), we have

$$\begin{aligned} h' (\varvec{X}', \varvec{x}) \;&= \; \min \{ f'_1 (\varvec{X}', \varvec{x}), \ldots , f'_L (\varvec{X}', \varvec{x}) \} \\&\ge \; \min \{ f'_1 (\varvec{X}, \varvec{x}), \ldots , f'_L (\varvec{X}, \varvec{x}) \} \;\; = \; h' (\varvec{X}, \varvec{x}), \end{aligned}$$

where the inequality again follows from the fact that each \(f'_\ell \) is a monotone lifting of \(f_\ell \). This concludes the proof. \(\square \)

Through an iterative application of its rules, Proposition 1 allows us to construct a rich family of functions that admit monotone liftings. We next list several examples that are of particular interest.

Corollary 1

The functions listed below have monotone liftings.

  1. Convex quadratic functions: \(f (\varvec{x}) = \varvec{x}^\top \varvec{Q} \varvec{x} + \varvec{q}^\top \varvec{x} + q\) with \(\varvec{Q} \in {\mathbb {S}}^n_+\).

  2. Conic quadratic functions: \(f (\varvec{x}) = \Vert \varvec{F} \varvec{x} \Vert _2 + \varvec{f}^\top \varvec{x} + f\), where \(\varvec{F} \in {\mathbb {R}}^{k \times n}\), \(\varvec{f} \in {\mathbb {R}}^n\) and \(f \in {\mathbb {R}}\).

  3. Negative entropy: \(f (\varvec{x}) = \sum _{i = 1}^n c_i \cdot x_i \ln x_i\) with \(c_i \in {\mathbb {R}}_+\).

  4. Power functions: \(f(x) = x^a\) with \(a \in [1, 2] \cap {\mathbb {Q}}\).

Proof

In view of the first statement, let \(\varvec{Q} = \varvec{L}^\top \varvec{L}\) for \(\varvec{L} \in {\mathbb {R}}^{n \times n}\), where \(\varvec{L}\) can be computed from a Cholesky decomposition. Identifying \(\varvec{t}_\ell ^\top \) with the \(\ell \)-th row of \(\varvec{L}\) and setting \(t_\ell = 0\), \(\ell = 1, \ldots , n\), we then obtain

$$\begin{aligned} f (\varvec{x}) \;&= \; (\varvec{L} \varvec{x})^\top (\varvec{L} \varvec{x}) \; + \; \varvec{q}^\top \varvec{x} + q \\&= \; \sum _{\ell = 1}^n (\varvec{t}_\ell ^\top \varvec{x})^2 \; + \; \varvec{q}^\top \varvec{x} + q \\&= \; \sum _{\ell = 1}^n (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \cdot h_\ell (\varvec{t}_\ell ^\top \varvec{x} + t_\ell ) \; + \; \varvec{q}^\top \varvec{x} + q, \end{aligned}$$

where \(h_\ell : {\mathbb {R}} \mapsto {\mathbb {R}}\) is the identity function, \(\ell = 1, \ldots , n\). The first expression on the right-hand side satisfies the conditions of the first statement of Proposition 1 and thus admits a monotone lifting. The remaining term \(g (\varvec{x}) = \varvec{q}^\top \varvec{x} + q\) admits the trivial lifting \(g' (\varvec{X}, \varvec{x}) = \varvec{q}^\top \varvec{x} + q\), and the second statement of Proposition 1 thus implies that the function f has a monotone lifting as well.
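Tracing through this construction, the resulting monotone lifting admits the explicit form

$$\begin{aligned} f' (\varvec{X}, \varvec{x}) \; = \; \sum _{\ell = 1}^n \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell \; + \; \varvec{q}^\top \varvec{x} + q \; = \; \langle \varvec{Q}, \varvec{X} \rangle + \varvec{q}^\top \varvec{x} + q, \end{aligned}$$

where the second identity holds because \(\sum _{\ell = 1}^n \varvec{t}_\ell ^\top \varvec{X} \varvec{t}_\ell = {{\,\mathrm{tr}\,}}(\varvec{L} \varvec{X} \varvec{L}^\top ) = \langle \varvec{L}^\top \varvec{L}, \varvec{X} \rangle \).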

As for the second statement, we note that

$$\begin{aligned} f (\varvec{x}) \; = \; \sqrt{\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}} \; + \; \varvec{f}^\top \varvec{x} + f. \end{aligned}$$

Since \(\varvec{F}^\top \varvec{F} \succeq \varvec{0}\) by construction, the term \(\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}\) has a monotone lifting due to the first statement of this corollary. Moreover, since \(x \mapsto \sqrt{x}\) is non-decreasing and concave, the third statement of Proposition 1 implies that the expression \(\sqrt{\varvec{x}^\top \varvec{F}^\top \varvec{F} \varvec{x}}\) admits a monotone lifting. The remaining term \(g (\varvec{x}) = \varvec{f}^\top \varvec{x} + f\) again admits the trivial lifting \(g' (\varvec{X}, \varvec{x}) = \varvec{f}^\top \varvec{x} + f\), and the second statement of Proposition 1 thus implies that the function f has a monotone lifting as well.

In view of the third statement, we first note that each term \(x_i \ln x_i\) has a monotone lifting if we choose \(\varvec{t}_i = \mathbf {e}_i\), where \(\mathbf {e}_i\) denotes the i-th canonical basis vector in \({\mathbb {R}}^n\), and \(t_i = 0\) in the first statement of Proposition 1. Since f constitutes a weighted sum of these terms, the existence of its monotone lifting then follows from the second statement of Proposition 1.
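Substituting these choices into the perspective formula from the proof of Proposition 1 gives the explicit lifting

$$\begin{aligned} f' (\varvec{X}, \varvec{x}) \; = \; \sum _{i = 1}^n c_i \cdot x_i \ln \left( \frac{X_{ii}}{x_i} \right) , \end{aligned}$$

which indeed reduces to \(\sum _{i = 1}^n c_i \cdot x_i \ln x_i\) whenever \(\varvec{X} = \varvec{x} \varvec{x}^\top \), since then \(X_{ii} = x_i^2\).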

As for the last statement, we note that \(f(x) = x \cdot h(x)\) with \(h(x) = x^{a-1}\). Since h is concave and non-decreasing, the first statement of Proposition 1 implies that f has a monotone lifting. \(\square \)

Any indefinite quadratic function can be represented as the sum of a convex quadratic and a concave quadratic function [9, 23]. Thus, if problem (3) optimizes an indefinite quadratic function over a simplex (i.e., if it is a standard quadratic optimization problem), then we can redefine its objective function as a sum of a convex quadratic and a concave quadratic function and subsequently apply the first statement in Proposition 1 to the convex part of the objective function.
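A minimal sketch of one such split follows (an illustrative assumption; any decomposition \(\varvec{P} = \varvec{P}_1 + \varvec{P}_2\) with \(\varvec{P}_1 \succeq \mathbf {0}\) and \(-\varvec{P}_2 \succeq \mathbf {0}\) works). It shifts the spectrum by the most negative eigenvalue:

```python
import numpy as np

# One possible convex/concave split of an indefinite quadratic x'Px:
# P1 = P + lam*I with lam = max(0, -lambda_min(P)) is psd, and the
# remainder -lam*||x||^2 is concave and can be absorbed into g.
np.random.seed(2)
n = 4
B = np.random.randn(n, n)
P = (B + B.T) / 2                            # symmetric, indefinite in general
lam = max(0.0, -np.linalg.eigvalsh(P).min())
P1 = P + lam * np.eye(n)                     # convex part: psd by construction
assert np.linalg.eigvalsh(P1).min() >= -1e-9
# f(x) = x'P1x now admits the lifting <P1, X> (Corollary 1), while
# g(x) = -lam * x'x joins the concave term of problem (3).
```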

We are now ready to prove the main result of this section.

Theorem 1

If the function f in problem (3) has a monotone lifting \(f'\), then the corresponding RLT relaxation (4) has an optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\).

Proof

The RLT relaxation (4) maximizes the concave and, a fortiori, continuous function \(f' (\varvec{X}, \varvec{x}) + g (\varvec{x})\) over a compact feasible region. The Weierstrass theorem thus guarantees that the optimal value of problem (4) is attained.

Let \((\varvec{X}^\star , \varvec{x}^\star )\) be an optimal solution to the RLT relaxation (4). If \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\), then there is nothing to prove. If \(\varvec{X}^\star \ne {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\), on the other hand, then, since \(\varvec{X}^\star \ge \mathbf {0}\), there are \(i, j \in \{ 1, \ldots , n \}\), \(i \ne j\), such that \(X^\star _{ij} = X^\star _{ji} > 0\). Define \(\varvec{X}' \in {\mathbb {S}}^n\) as \(\varvec{X}' = \varvec{X}^\star + \varvec{T}\), where \(T_{ij} = T_{ji} = -X^\star _{ij}\), \(T_{ii} = T_{jj} = X^\star _{ij}\) and \(T_{kl} = 0\) for all other components kl. Note that \(\varvec{T} \succeq \mathbf {0}\) since \(\varvec{z}^\top \varvec{T} \varvec{z} = {X_{ij}^\star } (z_i - z_j)^2 \ge 0\) for all \(\varvec{z} \in {\mathbb {R}}^n\). We thus have \(\varvec{X}' = \varvec{X}^\star + \varvec{T} \succeq \varvec{X}^\star \), which implies that \({f'} (\varvec{X}', \varvec{x}^\star ) \ge f' (\varvec{X}^\star , \varvec{x}^\star )\) since \(f'\) is a monotone lifting of f. In addition, the row and column sums of \(\varvec{X}^\star \) and \(\varvec{X}'\) coincide by construction, and thus \((\varvec{X}', \varvec{x}^\star )\) is also feasible in the RLT relaxation (4).

By construction, the matrix \(\varvec{X}'\) contains two fewer non-zero off-diagonal elements than the matrix \(\varvec{X}^\star \). An iterative application of the argument from the previous paragraph eventually results in an optimal diagonal matrix \(\varvec{X}'\), which by the constraints of the RLT relaxation (4) must coincide with \({{\,\mathrm{diag}\,}}({\varvec{x}^\star })\). This proves the statement of the theorem. \(\square \)
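The update used in this proof is easy to verify numerically; the following sketch applies one step of the construction to a feasible pair \((\varvec{X}, \varvec{x})\) and checks that the row sums are preserved and that the perturbation \(\varvec{T}\) is positive semidefinite:

```python
import numpy as np

# One step of the construction in the proof of Theorem 1: move an
# off-diagonal pair of X to the diagonal via X' = X + T and check that
# row sums are preserved and that T (hence X' - X) is psd.
def move_offdiagonal(X, i, j):
    T = np.zeros_like(X)
    T[i, j] = T[j, i] = -X[i, j]
    T[i, i] = T[j, j] = X[i, j]
    return X + T, T

np.random.seed(3)
x = np.random.dirichlet(np.ones(4))          # point in the simplex
X = np.outer(x, x)                           # feasible in (4), not diagonal
Xp, T = move_offdiagonal(X, 0, 1)
assert np.allclose(Xp.sum(axis=1), X.sum(axis=1))   # row/column sums intact
assert np.linalg.eigvalsh(T).min() >= -1e-12        # T is psd, so Xp >= X
```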

Theorem 1 allows us to replace the \(n \times n\) decision matrix \(\varvec{X}\) in the RLT relaxation (4) of problem (3) with \({{\,\mathrm{diag}\,}}(\varvec{x})\) and thus significantly reduce the size of the optimization problem. Our numerical results (cf. Sect. 3) indicate that this can in turn result in dramatic savings in solution time. Another important consequence of Theorem 1 is given next.

Corollary 2

If the function f in problem (3) has a monotone lifting \(f'\), then the optimal value of the corresponding RLT relaxation (4) coincides with the optimal value of the corresponding RLT/SDP relaxation.

Proof

Recall that the RLT/SDP relaxation of problem (3) coincides with the RLT relaxation (4), except that it contains the additional constraint \(\varvec{X} \succeq \varvec{x} \varvec{x}^\top \). According to Theorem 1, it thus suffices to show that \({{\,\mathrm{diag}\,}}(\varvec{x}^\star ) \succeq \varvec{x}^\star \varvec{x}^\star {}^\top \) for the optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) considered in the theorem’s statement.

Note that the constraints of the RLT relaxation (4) imply that \(\varvec{x}^\star \ge \mathbf {0}\) and \(\sum _{i = 1}^n x_i^\star = 1\). For any vector \(\varvec{y} \in {\mathbb {R}}^n\), we can thus construct a random variable \({\tilde{Y}}\) that attains the value \(y_i\) with probability \(x^\star _i\), \(i = 1, \ldots , n\). We then have

$$\begin{aligned} \varvec{y}^\top {{\,\mathrm{diag}\,}}(\varvec{x}^\star ) \, \varvec{y} \; = \; {\mathbb {E}} \big [ {\tilde{Y}}^2 \big ] \; \ge \; {\mathbb {E}} \big [ {\tilde{Y}} \big ]^2 \; = \; \varvec{y}^\top \big [ \varvec{x}^\star \varvec{x}^\star {}^\top \big ] \, \varvec{y}, \end{aligned}$$

since \({\mathbb {V}}\text {ar} \big [ {\tilde{Y}} \big ] = {\mathbb {E}} \big [ {\tilde{Y}}^2 \big ] - {\mathbb {E}} \big [ {\tilde{Y}} \big ]^2 \ge 0\). We thus conclude that \({{\,\mathrm{diag}\,}}(\varvec{x}^\star ) - \varvec{x}^\star \varvec{x}^\star {}^\top \succeq \mathbf {0}\), that is, the optimal solution \((\varvec{X}^\star , \varvec{x}^\star )\) considered by Theorem 1 vacuously satisfies the LMI constraint of the RLT/SDP relaxation. \(\square \)
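The variance argument in this proof can also be checked numerically over random points in the simplex; a quick sketch:

```python
import numpy as np

# Sanity check of diag(x) >= xx' for x in the simplex (Corollary 2):
# y'(diag(x) - xx')y equals the variance of the random variable
# constructed in the proof, hence the matrix is psd.
np.random.seed(4)
for _ in range(1000):
    x = np.random.dirichlet(np.ones(6))
    M = np.diag(x) - np.outer(x, x)
    assert np.linalg.eigvalsh(M).min() >= -1e-10
```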

Corollary 2 shows that whenever f has a monotone lifting, the RLT/SDP reformulation offers no advantage over the RLT relaxation (4) of problem (3).

3 Numerical experiments

We compare our RLT formulation against standard RLT and RLT/SDP implementations on non-convex optimization problems over simplices. All experiments are run on an 8th-generation Intel(R) Core(TM) i7-8750H processor using MATLAB 2018b [28], YALMIP R20200930 [16] and MOSEK 9.2.28 [21].

We consider instances of problem (3) whose objective functions satisfy

$$\begin{aligned} f (\varvec{x}) \; = \; \left\Vert \varvec{D} \varvec{Q} (\varvec{x} - \frac{1}{n} \cdot \mathbf {1})\right\Vert _2^2 \quad \text {and} \quad g (\varvec{x}) \; = \;\frac{1}{n}\sum _{i=1}^n \ln (x_i), \end{aligned}$$

where \(\varvec{D} \in {\mathbb {S}}^n\) is a diagonal scaling matrix whose diagonal elements are chosen uniformly at random from the interval [0, 10], \(\varvec{Q} \in {\mathbb {R}}^{n \times n}\) is a uniformly sampled rotation matrix [18], and \(\mathbf {1} \in {\mathbb {R}}^n\) is the vector of all ones (cf. Fig. 1).
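A sketch of the corresponding reduced (‘Proposed’) RLT formulation follows. It writes \(f (\varvec{x}) = (\varvec{x} - \varvec{c})^\top \varvec{M} (\varvec{x} - \varvec{c})\) with \(\varvec{M} = \varvec{Q}^\top \varvec{D}^2 \varvec{Q}\) and \(\varvec{c} = \frac{1}{n} \mathbf {1}\), lifts it to \(f' (\varvec{X}, \varvec{x}) = \langle \varvec{M}, \varvec{X} \rangle - 2 \varvec{c}^\top \varvec{M} \varvec{x} + \varvec{c}^\top \varvec{M} \varvec{c}\), and substitutes \(\varvec{X} = {{\,\mathrm{diag}\,}}(\varvec{x})\) as justified by Theorem 1. The sketch uses Python/cvxpy rather than the MATLAB toolchain above, and scipy’s rotation sampler stands in for the method of [18]:

```python
import numpy as np
import cvxpy as cp
from scipy.stats import special_ortho_group

# Reduced ('Proposed') RLT formulation for the test instances, with
# f'(X, x) = <M, X> - 2c'Mx + c'Mc evaluated at X = diag(x), so that only
# the n-dimensional variable x remains.
np.random.seed(5)
n = 10
D = np.diag(np.random.uniform(0, 10, n))     # diagonal scaling matrix
Q = special_ortho_group.rvs(dim=n)           # random rotation matrix
M = Q.T @ D @ D @ Q                          # M = Q'D^2 Q is psd
c = np.ones(n) / n

x = cp.Variable(n, nonneg=True)
f_reduced = np.diag(M) @ x - 2 * (M @ c) @ x + c @ M @ c   # <M, diag(x)> terms
g = cp.sum(cp.log(x)) / n
prob = cp.Problem(cp.Maximize(f_reduced + g), [cp.sum(x) == 1])
prob.solve()
print(prob.value)                            # upper bound on problem (3)
```

Note that \(\langle \varvec{M}, {{\,\mathrm{diag}\,}}(\varvec{x}) \rangle \) is simply the linear expression \({{\,\mathrm{diag}\,}}(\varvec{M})^\top \varvec{x}\), so the relaxation involves no matrix variable at all.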

Fig. 1 Example non-convex optimization instance for \(n = 3\). The convex quadratic function f is minimized at the center of the simplex and maximized at a vertex. The addition of the concave barrier function g ensures that the overall maximum is attained in the interior of the simplex

It follows from our discussion in Sect. 2 that the optimal values of the RLT and RLT/SDP relaxations coincide for the test instances considered in this section, and there are always optimal solutions \((\varvec{X}^\star , \varvec{x}^\star )\) satisfying \(\varvec{X}^\star = {{\,\mathrm{diag}\,}}(\varvec{x}^\star )\). Figure 2 compares the runtimes of our RLT formulation, which replaces the matrix \(\varvec{X}\) with \({{\,\mathrm{diag}\,}}(\varvec{x})\), with those of the standard RLT and RLT/SDP formulations. As expected, our RLT formulation substantially outperforms both alternatives.

Fig. 2 Median solution times (in \(\log _{10}\) secs, left, and secs, right) of our RLT formulation (‘Proposed RLT’) and the standard RLT and RLT/SDP formulations over 25 non-convex simplicial optimization instances