Abstract
In this paper, we consider approximation algorithms for optimizing a generic multivariate polynomial function in discrete (typically binary) variables. Such models have natural applications in graph theory, neural networks, and error-correcting codes, among many other areas. In particular, we focus on three types of optimization models: (1) maximizing a homogeneous polynomial function in binary variables; (2) maximizing a homogeneous polynomial function in binary variables, mixed with variables under spherical constraints; (3) maximizing an inhomogeneous polynomial function in binary variables. We propose polynomial-time randomized approximation algorithms for such polynomial optimization models, and establish the approximation ratios (or relative approximation ratios whenever appropriate) for the proposed algorithms. Some examples of applications for these models and algorithms are discussed as well.
1 Introduction
This paper is concerned with optimizing a (high degree) multivariate polynomial function in (mixed) binary variables. Our basic model is to maximize a d-th degree polynomial function p(x), where x=(x 1,x 2,⋯,x n )T is chosen such that x i ∈{1,−1} for i=1,2,⋯,n. For ease of referencing, let us call this basic model \((P): \max_{\boldsymbol{x}\in\{1,-1\}^{n}} p(\boldsymbol{x})\). This type of problem arises in a great variety of application domains. For example, the following hypergraph max-covering problem, which is well studied in the literature, is precisely (P). Given a hypergraph H=(V,E), with V the set of vertices and E the set of hyperedges (subsets of V), where each hyperedge e∈E is associated with a real-valued weight w(e), the problem is to find a subset S of the vertex set V such that the total weight of the hyperedges covered by S is maximized. Let x i ∈{0,1} (i=1,2,⋯,n) indicate whether or not vertex i is selected in S. The problem is then \(\max_{\boldsymbol{x}\in\{0,1\}^{n}}\sum_{e\in E}w(e)\prod_{i\in e}x_{i}\). By the simple variable transformation x i →(x i +1)/2, this problem is transformed to (P), and vice versa.
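The equivalence between the 0/1 formulation and the ±1 formulation can be checked numerically by brute force. The following sketch uses a hypothetical toy hypergraph (the vertices, hyperedges, and weights are illustrative data, not from the paper):

```python
import itertools
import math

# Toy hypergraph on 4 vertices; hyperedges and weights are hypothetical data
# chosen only to illustrate the transformation x_i -> (x_i + 1)/2.
edges = {frozenset({0, 1}): 3.0, frozenset({1, 2, 3}): 2.0, frozenset({0, 3}): -1.0}
n = 4

def cover_value(z):
    # z in {0,1}^n; total weight of hyperedges fully contained in S = {i : z_i = 1}
    return sum(w for e, w in edges.items() if all(z[i] == 1 for i in e))

def p(x):
    # the same objective after substituting x_i -> (x_i + 1)/2, now x in {-1,1}^n
    return sum(w * math.prod((x[i] + 1) / 2 for i in e) for e, w in edges.items())

max01 = max(cover_value(z) for z in itertools.product([0, 1], repeat=n))
maxpm = max(p(x) for x in itertools.product([-1, 1], repeat=n))
```

Since the substitution is a bijection between {0,1}^n and {−1,1}^n, the two maxima coincide.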
Note that (P) is a fundamental problem in integer programming. As such it has received attention in the literature; see [17, 18]. It is also known as the Fourier support graph problem. Mathematically, a polynomial function p:{−1,1}n→ℝ has the Fourier expansion \(p(\boldsymbol{x})=\sum_{S\subseteq\{1,2,\cdots,n\}}\hat{p}(S)\prod_{i\in S}x_{i}\), whose support is also called the Fourier support graph. Assume that p has only succinctly many (polynomially many) non-zero Fourier coefficients \(\hat{p}(S)\). The question is: Can we compute the maximum value of p over the discrete cube {1,−1}n, or alternatively can we find a good approximate solution in polynomial time? The latter question is what motivates this paper. Indeed, (P) has been investigated extensively in the quadratic case, due to its connections to various graph partitioning problems, e.g., the maximum cut problem [16]. In general, (P) is closely related to finding the maximum weighted independent set in a graph. In particular, let G=(V,E) be a graph with V the set of vertices and E the set of edges, where each vertex is assigned a positive weight. A subset S⊆V is an independent set if and only if no two vertices in S share an edge. The problem is to find an independent set whose total vertex weight is maximum over all independent sets.
In fact, any unconstrained binary polynomial maximization problem can be transformed into the maximum weighted independent set problem, which is also a commonly used technique in the literature for solving (P) (see e.g., [5, 30]). The transformation uses the concept of a conflict graph of a 0–1 polynomial function. The idea is illustrated in the following example. Let us consider
Note that f(x) can be transformed to an equivalent polynomial so that all terms (except the constant term) have positive coefficients. The new polynomial involves both the variables and their complements, i.e., \(\bar{x}_{i}:=1-x_{i}\) for i=1,2,3. In our example, such a polynomial can be
The conflict graph G(f) associated with a polynomial f(x) has vertices corresponding to the terms of f(x): each vertex is associated with a term in the polynomial except for the constant term. Two vertices in G(f) are connected by an edge if and only if one of the corresponding terms contains a variable and the other contains its complement. The weight of a vertex in G(f) is the coefficient of the corresponding term in f. The conflict graph of f(x) is shown in Fig. 1. Solving the maximum weighted independent set problem on the conflict graph also solves the binary polynomial optimization problem. Beyond its connection to graph problems, (P) also has applications in neural networks [4, 8, 21], error-correcting codes [8, 29], etc. For instance, recently Khot and Naor [24] showed that it has applications in the problem of refutation of random k-CNF formulas [12, 13].
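The conflict-graph construction can be illustrated on a small hypothetical polynomial (the example polynomial and its weights below are illustrative, not the one from the paper's figure): f(x) = 2·x1x2 + 3·x̄1x3 + 1·x̄2x̄3. A brute-force check confirms that, on this instance, the maximum of f over {0,1}^3 equals the maximum weighted independent set of the conflict graph:

```python
import itertools
import math

# Hypothetical polynomial with positive coefficients, variables and complements.
# Each term: (weight, set of literals); literal (i, False) means the complement of x_i.
terms = [(2.0, {(0, True), (1, True)}),    # 2 * x1 * x2
         (3.0, {(0, False), (2, True)}),   # 3 * (1 - x1) * x3
         (1.0, {(1, False), (2, False)})]  # 1 * (1 - x2) * (1 - x3)

def lit_value(lit, x):
    i, positive = lit
    return x[i] if positive else 1 - x[i]

def f(x):
    return sum(w * math.prod(lit_value(l, x) for l in lits) for w, lits in terms)

# Conflict graph: two terms conflict iff one uses x_i and the other uses its complement
def conflict(a, b):
    return any(((i, True) in a and (i, False) in b) or
               ((i, False) in a and (i, True) in b) for i in range(3))

mwis = 0.0  # maximum weighted independent set, by brute force over term subsets
for sub in itertools.product([0, 1], repeat=len(terms)):
    chosen = [terms[k] for k in range(len(terms)) if sub[k]]
    if all(not conflict(a[1], b[1]) for a, b in itertools.combinations(chosen, 2)):
        mwis = max(mwis, sum(w for w, _ in chosen))

max_f = max(f(x) for x in itertools.product([0, 1], repeat=3))
```

Here the three terms conflict pairwise, so the conflict graph is a triangle and the independent set optimum is the largest single weight, matching the polynomial optimum.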
One important subclass of polynomial functions is the homogeneous polynomials. Likewise, the homogeneous quadratic case of (P) has been studied extensively; see e.g. [2, 16, 27, 28]. The homogeneous cubic case was also studied by Khot and Naor [24]. Another interesting problem of this class is the ∞↦1-norm of a matrix \(\boldsymbol{M}=(a_{ij})_{n_{1}\times n_{2}}\) (see e.g., [2]), i.e.,
It is quite natural to extend the problem of ∞↦1-norm to higher order tensors. In particular, the ∥F∥∞↦1 of a d-th order tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\) can be defined as
The other generalization of the matrix ∞↦1-norm is to extend the entry a ij of the matrix M to a symmetric matrix A ij , i.e., the problem of
where λ max(⋅) indicates the largest eigenvalue of a matrix. If the matrices A ij are not restricted to be symmetric, we may instead maximize the largest singular value, i.e.,
These two problems are actually equivalent to
respectively, where F is a multilinear function induced by the tensor F, whose (i,j,k,ℓ)-th entry is the (k,ℓ)-th entry of the matrix A ij .
In fact, a very interesting and succinct matrix combinatorial problem is: Given n matrices A i (i=1,2,⋯,n), find a binary combination of the matrices so as to maximize the spectral norm of the combined matrix:
This is indeed equivalent to
All the problems studied in this paper are NP-hard in general, and our focus will be on polynomial-time approximation algorithms. In the case that the objective polynomial is quadratic, a well-known example is the semidefinite programming relaxation and randomization approach for the max-cut problem due to Goemans and Williamson [16], where essentially a 0.878-approximation ratio for the model \(\max_{\boldsymbol{x}\in\{ 1,-1\}^{n}}\boldsymbol{x}^{\mathrm {T}}\boldsymbol{M}\boldsymbol{x}\) is shown, with M being the Laplacian of a given graph. In the case where M is only known to be positive semidefinite, Nesterov [27] derived a 0.636-approximation bound. Charikar and Wirth [9] considered a more general model; they proposed an \(\varOmega (\frac{1}{\log n} )\)-approximation algorithm for diagonal-free M. For the matrix ∞↦1-norm problem
Alon and Naor [2] derived a 0.56-approximation bound. We remark that all these approximation bounds remain the best available ones to date. When the degree of the polynomial function is greater than 2, to the best of our knowledge, the only known approximation result in the literature is due to Khot and Naor [24], who showed how to estimate the optimal value of the problem \(\max_{\boldsymbol{x}\in\{1,-1\} ^{n}}\sum_{1\leqslant i,j,k\leqslant n}a_{ijk}x_{i}x_{j}x_{k}\) with (a ijk ) n×n×n being square-free (a ijk =0 whenever two of the indices are equal). Specifically, they presented a polynomial-time procedure to obtain an estimated value that is no less than \(\varOmega (\sqrt{\frac {\ln n}{n}} )\) times the optimal value. However, no feasible solution can be derived from that process. Moreover, the process is highly complex and is mainly of theoretical interest.
In this paper we consider the optimization models for a general polynomial function of any fixed degree d in (mixed) binary variables, and present polynomial-time randomized approximation algorithms. The algorithms proposed are fairly simple to implement. This study is motivated by our previous investigations on polynomial optimization under quadratic constraints [19, 20], as well as recent developments on homogeneous polynomial optimization under spherical constraints, e.g., So [31] and Chen et al. [10]. However, the discrete models studied in this paper have novel features, and the analysis is therefore entirely different from previous works. This paper is organized as follows. First, we introduce the notations and models in Sect. 2. In Sect. 3, we present the new approximation results, and also sketch the main ideas, while leaving the technical details to the Appendix. In Sect. 4 we shall discuss a few more specific problems where the models introduced can be directly applied.
2 Notations and Model Descriptions
In this paper we shall use the boldface letters to denote vectors, matrices, and tensors in general (e.g., the decision variable x, the data matrix Q, and the tensor form F), while the usual lowercase letters are reserved for scalars (e.g., x 1 being the first component of the vector x).
2.1 Objective Functions
The objective functions of the optimization models studied in this paper are all multivariate polynomial functions. The following multilinear tensor function plays a major role in our discussion:
where \(\boldsymbol{x}^{k} \in\mathbb{R}^{n_{k}}\) for k=1,2,⋯,d; and the letter ‘T’ signifies the notion of tensor. In the shorthand notation we shall denote \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in \mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}\) to be a d-th order tensor, and F to be its corresponding multilinear form. Closely related with the tensor F is a general d-th degree homogeneous polynomial function f(x), where x∈ℝn. We call the tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\) super-symmetric (see [25]) if \(a_{i_{1}i_{2}\cdots i_{d}}\) is invariant under all permutations of {i 1,i 2,⋯,i d }. As any homogeneous quadratic function uniquely determines a symmetric matrix, a given d-th degree homogeneous polynomial function f(x) also uniquely determines a super-symmetric tensor. In particular, if we denote a d-th degree homogeneous polynomial function:
then its corresponding super-symmetric tensor form can be written as \(\boldsymbol {F}= (b_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n^{d}}\), with \(b_{i_{1}i_{2}\cdots i_{d}} \equiv a_{i_{1}i_{2}\cdots i_{d}} / |\varPi(i_{1},i_{2},\cdots ,i_{d})|\), where |Π(i 1,i 2,⋯,i d )| is the number of distinctive permutations of the indices {i 1,i 2,⋯,i d }. This super-symmetric tensor representation is indeed unique. Let F be its corresponding multilinear function defined by the super-symmetric tensor F, then we have \(f(\boldsymbol{x}) = F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\). The letter ‘H’ here is used to emphasize that the polynomial function in question is homogeneous.
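The correspondence between a homogeneous polynomial and its unique super-symmetric tensor can be checked numerically. The sketch below builds the tensor by spreading each coefficient \(a_{i_1 i_2 \cdots i_d}\) over its distinct index permutations, i.e., \(b_{i_1 i_2\cdots i_d} = a_{i_1 i_2\cdots i_d}/|\varPi(i_1,i_2,\cdots,i_d)|\); the particular coefficients are hypothetical:

```python
import itertools
import random

n = 3  # dimension; degree d = 3
# Hypothetical coefficients a_{i1 i2 i3} of f, indexed by sorted tuples i1 <= i2 <= i3
coeff = {(0, 0, 1): 2.0, (0, 1, 2): -1.5, (2, 2, 2): 4.0}

# Build the super-symmetric tensor: b = a / |Pi(i1,...,id)|, the coefficient
# divided by the number of distinct permutations, placed at every permutation.
F = {}
for idx, a in coeff.items():
    perms = set(itertools.permutations(idx))
    for p in perms:
        F[p] = a / len(perms)

def multilinear(x1, x2, x3):        # the multilinear form F(x^1, x^2, x^3)
    return sum(b * x1[i] * x2[j] * x3[k] for (i, j, k), b in F.items())

def f(x):                           # the homogeneous polynomial itself
    return sum(a * x[i] * x[j] * x[k] for (i, j, k), a in coeff.items())

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(n)]
```

Evaluating both sides at a random point confirms f(x) = F(x,x,x).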
We shall also consider in this paper the following:
where \(\boldsymbol{x}^{k}\in\mathbb{R}^{n_{k}}\) for k=1,2,⋯,s, d 1+d 2+⋯+d s =d, and d-th order tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}^{d_{1}}\times n_{2}^{d_{2}}\times\cdots\times n_{s}^{d_{s}}}\); the letter ‘M’ signifies the notion of mixed polynomial forms. We may without loss of generality assume that F has the partial symmetry property, namely for any fixed (x 2,x 3,⋯,x s), \(F(\underbrace{\cdot,\cdot,\cdots,\cdot}_{d_{1}}, \underbrace{\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is a super-symmetric d 1-th order tensor, and so on.
Beyond the homogeneous polynomial functions described above, a generic multivariate inhomogeneous polynomial function of degree d, p(x), can be explicitly written as a summation of homogeneous polynomial functions in decreasing degrees, namely
where x∈ℝn, f 0∈ℝ, and \(f_{k}(\boldsymbol{x})=F_{k}(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{k})\) is a homogeneous polynomial function of degree k for k=1,2,⋯,d; the letter ‘P’ signifies the notion of polynomial.
Throughout we shall adhere to the notation F for a multilinear form defined by a tensor form F, f for a homogeneous polynomial function, and p for an inhomogeneous polynomial function. Without loss of generality we assume that n 1⩽n 2⩽⋯⩽n d in the tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}\), and n 1⩽n 2⩽⋯⩽n s in the tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}^{d_{1}}\times n_{2}^{d_{2}}\times \cdots\times n_{s}^{d_{s}}}\). We also assume that at least one component of the tensor form (F in Functions T, H, M, and F d in Function P) is nonzero to avoid triviality. Finally, without loss of generality we assume that the inhomogeneous polynomial function p(x) has no constant term, i.e., f 0=0 in Function P.
2.2 Decision Variables
This paper is focused on integer and mixed integer programming with polynomial functions. In particular, two types of decision variables are considered in this paper: discrete binary variables
and continuous variables on the unit sphere:
Note that in this paper we shall by default use the Euclidean norm for vectors, matrices and tensors. The decision variables in our models range from the pure binary vector x, to a mixed one including both x \((\in\mathbb{B}^{n})\) and y \((\in\mathbb{S}^{m})\).
2.3 Model Descriptions
In this paper we consider the following binary integer optimization models with objective functions as specified in Sect. 2.1:
and their mixed models:
Let d 1+d 2+⋯+d s =d and \(d_{1}'+d_{2}'+\cdots+d_{t}'=d'\) in the above mentioned models. The degrees of the polynomial functions in these models, d for the pure binary models and d+d′ for the mixed models, are understood as fixed constants in our subsequent discussions. As before, we also assume the tensor forms of the objective functions in (H)′ and (M)′ to have the partial symmetry property, m 1⩽m 2⩽⋯⩽m d′ in (T)′, and m 1⩽m 2⩽⋯⩽m t in (M)′.
2.4 Approximation Ratios
All the optimization problems mentioned in the previous subsection are in general NP-hard when the degree of the objective polynomial function is larger than or equal to 2. This is because each one includes computing the matrix ∞↦1-norm as a subclass, i.e.,
Thus, in this paper we shall focus on polynomial-time approximation algorithms with provable worst-case performance ratios. For any maximization problem (P) defined as max x∈S f(x), we use v max(P) to denote its optimal value, and v min(P) to denote the optimal value of its minimization counterpart, i.e.,
Definition 2.1
We say that the maximization model (P) admits a polynomial-time approximation algorithm with approximation ratio τ∈(0,1], if v max(P)⩾0 and a feasible solution z∈S can be found in polynomial time such that f(z)⩾τ v max(P).
Definition 2.2
We say that the maximization model (P) admits a polynomial-time approximation algorithm with relative approximation ratio τ∈(0,1], if a feasible solution z∈S can be found in polynomial time such that f(z)−v min(P)⩾τ(v max(P)−v min(P)).
Regarding the relative approximation ratio (Definition 2.2), it is sometimes convenient to use the equivalent form: v max(P)−f(z)⩽(1−τ)(v max(P)−v min(P)).
3 Bounds on the Approximation Ratios
In this section we present our main results, viz. the approximation ratios for the discrete polynomial optimization models considered in this paper. So as not to distract from the main results, the proofs are deferred to the Appendix. To simplify the presentation, we use the notation Ω(f(n)) to denote a quantity that is at least αf(n) for some positive universal constants α and n 0 and all n⩾n 0. Throughout our discussion, we shall fix the degree of the objective polynomial function (denoted by d or d+d′ in the paper) to be a constant.
3.1 Homogeneous Polynomials in Binary Variables
Theorem 3.1
\((T){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}} F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d})\) admits a polynomial-time approximation algorithm with approximation ratio τ T , where
We remark that when d=2, (T) is to compute ∥F∥∞↦1. The current best polynomial-time approximation ratio for that problem is \(\frac{2\ln(1+\sqrt{2})}{\pi} \approx0.56\) due to Alon and Naor [2]. Huang and Zhang [22] considered similar problems for the complex variables and derived constant approximation ratios.
When d=3, (T) is a slight generalization of the model considered by Khot and Naor [24], where F was assumed to be super-symmetric (implying n 1=n 2=n 3) and square-free (i.e., a ijk =0 whenever two of the three indices are equal). In our case, we discard the assumptions on the symmetry and the square-free property altogether. The approximation bound of the optimal value given in [24] is \(\varOmega (\sqrt{\frac{\ln n_{1}}{n_{1}}} )\); however, no polynomial-time procedure is provided to find a corresponding approximate solution.
Our approximation algorithm works for general degree d based on recursion, and is fairly simple. We may take any approximation algorithm for the d=2 case, say the algorithm by Alon and Naor [2], as a basis. When d=3, noticing that any n 1×n 2×n 3 third order tensor can be written as an (n 1 n 2)×n 3 matrix by combining its first and second modes, (T) can be relaxed to
This problem is the exact form of (T) when d=2, which can be solved approximately with approximation ratio \(\frac{2\ln(1+\sqrt{2})}{\pi}\). Denote its approximate solution by \((\hat{\boldsymbol{X}}, \hat{\boldsymbol{x}}^{3})\). The next key step is to recover \((\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2})\) from \(\hat{\boldsymbol{X}}\). For this purpose, we introduce the following decomposition routine, which plays a fundamental role in our algorithms.
If we let \(\boldsymbol{M}=F(\cdot,\cdot,\hat{\boldsymbol{x}}^{3})\) and apply DR 3.1, then we can prove the output \((\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2})\) satisfies
which yields an approximation ratio for d=3. By a recursive procedure, the approximation algorithm is readily extended to solve (T) with any fixed degree d.
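The two-step scheme for d=3 can be sketched numerically on a tiny instance. In the sketch below, brute-force enumeration stands in for the Alon–Naor subroutine in Step 1, and a simple alternating sign-rounding heuristic stands in for the paper's decomposition routine DR 3.1 (whose exact statement is not reproduced here); the instance itself is hypothetical random data:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 2, 2, 3
F = rng.standard_normal((n1, n2, n3))      # small random 3rd-order tensor

# Step 1: merge modes 1 and 2, relaxing (T) to an (n1*n2) x n3 matrix problem;
# brute force stands in for the Alon-Naor approximation subroutine.
M2 = F.reshape(n1 * n2, n3)
best = -np.inf
for X in itertools.product([-1, 1], repeat=n1 * n2):
    for x3 in itertools.product([-1, 1], repeat=n3):
        v = np.array(X) @ M2 @ np.array(x3)
        if v > best:
            best, x3_hat = v, np.array(x3)

# Step 2: recover (x1, x2) from M = F(., ., x3_hat); an alternating
# sign-rounding heuristic stands in for the paper's routine DR 3.1.
M = F @ x3_hat                              # an n1 x n2 matrix
x2 = rng.choice([-1, 1], size=n2)
for _ in range(10):
    x1 = np.sign(M @ x2); x1[x1 == 0] = 1   # best x1 for fixed x2
    x2 = np.sign(M.T @ x1); x2[x2 == 0] = 1 # best x2 for fixed x1
val = float(x1 @ M @ x2)                    # feasible value for (T) with d = 3

# True optimum of (T) on this instance, for reference
opt = max(np.array(a) @ (F @ np.array(c)) @ np.array(b)
          for a in itertools.product([-1, 1], repeat=n1)
          for b in itertools.product([-1, 1], repeat=n2)
          for c in itertools.product([-1, 1], repeat=n3))
```

By construction, the relaxed value `best` upper-bounds the optimum `opt`, and the recovered solution is feasible with value `val` at most `opt`.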
Theorem 3.2
If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\) is square-free and d is odd, then \((H){:}\, \max_{\boldsymbol{x}\in\mathbb {B}^{n}} f(\boldsymbol{x})\) admits a polynomial-time approximation algorithm with approximation ratio τ H , where
Theorem 3.3
If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\) is square-free and d is even, then \((H){:}\, \max_{\boldsymbol{x}\in \mathbb{B}^{n}} f(\boldsymbol{x})\) admits a polynomial-time approximation algorithm with relative approximation ratio τ H .
The key linkage from multilinear tensor function F(x 1,x 2,⋯,x d) to the homogeneous polynomial function f(x) is the following lemma. Essentially it makes the tensor relaxation method applicable for (H).
Lemma 3.4
(He, Li, and Zhang [19])
Suppose x 1,x 2,⋯,x d∈ℝn, and ξ 1,ξ 2,⋯,ξ d are i.i.d. random variables, each taking values 1 and −1 with equal probability. For any super-symmetric d-th order tensor form F and function f(x)=F(x,x,⋯,x), it holds that
We remark that the approximation ratios for (H) hold under the square-free condition. This is because, under that condition, the objective is multilinear in the decision variables; hence, one can replace any point in the box ([−1,1]n) by one of its vertices ({−1,1}n) without decreasing the objective function value, owing to the linearity in each coordinate. Besides, in the case when d is odd, one may first relax (H) to \(\max_{\boldsymbol{x}\in[-1,1]^{n}}f(\boldsymbol{x})\), and then directly apply the approximation result for homogeneous polynomial maximization over the intersection of n co-centered ellipsoids (see [19]). Under the square-free condition, this procedure generates a feasible solution for (H) with approximation ratio \(\varOmega (n^{-\frac{d-2}{2}}\log^{-(d-1)}n )\), which is worse than τ H in Theorem 3.2. Therefore, Theorem 3.2 may be regarded as an improvement of that approximation ratio.
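The identity of Lemma 3.4, as stated in [19], can be recalled as \(\textsf{E}[\xi_1\xi_2\cdots\xi_d\, f(\xi_1\boldsymbol{x}^1+\xi_2\boldsymbol{x}^2+\cdots+\xi_d\boldsymbol{x}^d)] = d!\, F(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d)\). A quick numerical check for d=3 with a hypothetical random super-symmetric tensor, enumerating all 2^d sign patterns exactly:

```python
import itertools
import random

random.seed(1)
n, d = 3, 3
# Random super-symmetric 3rd-order tensor, obtained by symmetrizing a random one
A = {idx: random.gauss(0, 1) for idx in itertools.product(range(n), repeat=d)}
S = {idx: sum(A[p] for p in itertools.permutations(idx)) / 6.0
     for idx in itertools.product(range(n), repeat=d)}

def F(x, y, z):                     # multilinear form of S
    return sum(v * x[i] * y[j] * z[k] for (i, j, k), v in S.items())

def f(x):                           # f(x) = F(x, x, x)
    return F(x, x, x)

xs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]
# E[xi_1 xi_2 xi_3 * f(xi_1 x^1 + xi_2 x^2 + xi_3 x^3)], xi_k i.i.d. uniform +-1
lhs = sum(s[0] * s[1] * s[2]
          * f([sum(s[k] * xs[k][i] for k in range(d)) for i in range(n)])
          for s in itertools.product([-1, 1], repeat=d)) / 2 ** d
```

Only the sign patterns in which each ξ k appears an odd number of times survive the expectation, which is why the multilinear term F(x 1,x 2,x 3) is isolated with factor d!.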
We move on to consider the mixed form of discrete polynomial optimization model (M). It is a generalization of (T) and (H), making the model applicable to a wider range of practical problems.
Theorem 3.5
If \(F(\underbrace{\boldsymbol{x}^{1},\boldsymbol{x}^{1},\cdots ,\boldsymbol{x}^{1}}_{d_{1}}, \underbrace{\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is square-free in each x k (k=1,2,⋯,s), and one of d k (k=1,2,⋯,s) is odd, then \((M){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}}f(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s})\) admits a polynomial-time approximation algorithm with approximation ratio τ M , where
Theorem 3.6
If \(F(\underbrace{\boldsymbol{x}^{1},\boldsymbol{x}^{1},\cdots,\boldsymbol{x}^{1}}_{d_{1}}, \underbrace {\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is square-free in each x k (k=1,2,⋯,s), and all d k (k=1,2,⋯,s) are even, then \((M){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}}f(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s})\) admits a polynomial-time approximation algorithm with relative approximation ratio τ M .
The main idea in the proof is tensor relaxation (to relax its objective function f(x 1,x 2,⋯,x s) to a multilinear tensor function), which leads to (T). After solving (T) approximately by Theorem 3.1, we are able to adjust the solutions one by one, using Lemma 3.4.
3.2 Homogeneous Polynomials in Mixed Variables
Proposition 3.7
When d=d′=1, \((T)'{:}\, \max_{\boldsymbol{x}^{1} \in\mathbb{B}^{n_{1}},\boldsymbol{y}^{1} \in\mathbb{S}^{m_{1}}} F(\boldsymbol{x}^{1},\boldsymbol{y}^{1})\) admits a polynomial-time approximation algorithm with approximation ratio \(\sqrt{2/\pi}\).
Proposition 3.7 serves as the basis for (T)′ of general d and d′. In this particular case, (T)′ can be equivalently transformed into \(\max_{\boldsymbol{x}\in\mathbb{B}^{n_{1}}}\boldsymbol{x}^{\mathrm {T}}\boldsymbol{Q}\boldsymbol{x}\) with Q⪰0. The latter problem admits a polynomial-time approximation algorithm (SDP relaxation and randomization) with approximation ratio 2/π by Nesterov [27].
Recursion is again the tool to handle the high degree case. For the recursion on d, with discrete variables x k, DR 3.1 is applied in each recursive step. For the recursion on d′, with continuous variables y k, two decomposition routines in He, Li, and Zhang [19] are readily available, namely the eigenvalue decomposition approach (DR 2 of [19]) and the randomized decomposition approach (DR 1 of [19]), either one of them serves the purpose here.
Theorem 3.8
\((T)'{:}\,\max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\,\boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}} F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d},\boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{d'})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{T}'\), where
From Theorem 3.8, by applying Lemma 3.4 as a linkage, together with the square-free property, we are led to the following two theorems regarding (H)′.
Theorem 3.9
If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d},\underbrace{\boldsymbol{y},\boldsymbol{y},\cdots,\boldsymbol{y}}_{d'})\) is square-free in x, and either d or d′ is odd, then \((H)'{:}\, \max_{\boldsymbol{x}\in\mathbb{B}^{n},\boldsymbol{y} \in\mathbb{S}^{m}}f(\boldsymbol{x},\boldsymbol{y})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{H}'\), where
Theorem 3.10
If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d},\underbrace{\boldsymbol{y},\boldsymbol{y},\cdots,\boldsymbol{y}}_{d'})\) is square-free in x, and both d and d′ are even, then \((H)'{:}\,\max_{\boldsymbol{x}\in\mathbb{B}^{n},\boldsymbol{y} \in\mathbb{S}^{m}}f(\boldsymbol{x},\boldsymbol{y})\) admits a polynomial-time approximation algorithm with relative approximation ratio \(\tau_{H}'\).
By relaxing (M)′ to the multilinear tensor function optimization (T)′ and solving it approximately using Theorem 3.8, we may further adjust its solution one by one using Lemma 3.4, leading to the following general result.
Theorem 3.11
If
is square-free in each x k (k=1,2,⋯,s), and one of d k (k=1,2,⋯,s) or one of \(d_{\ell}'\) (ℓ=1,2,⋯,t) is odd, then \((M)'{:}\, \max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\boldsymbol{y}^{\ell} \in \mathbb{S}^{m_{\ell}}} f(\boldsymbol{x}^{1}, \boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s}, \boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{t})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{M}'\), where
Theorem 3.12
If
is square-free in each x k (k=1,2,⋯,s), and all d k (k=1,2,⋯,s) and all \(d_{\ell}'\) (ℓ=1,2,⋯,t) are even, then \((M)'{:}\,\max_{\boldsymbol{x}^{k}\in\mathbb {B}^{n_{k}},\boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}}f(\boldsymbol{x}^{1}, \boldsymbol{x}^{2},\cdots ,\boldsymbol{x}^{s}, \boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{t})\) admits a polynomial-time approximation algorithm with relative approximation ratio \(\tau_{M}'\).
3.3 Inhomogeneous Polynomials in Binary Variables
Extending the approximation algorithms and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. Technically, it is also a way to get around the square-free property, which is required for all the homogeneous polynomial models discussed in the previous subsections. The analysis here, like that in our previous paper [20], deals directly with the homogenization.
It is quite natural to introduce a new variable, say x h , which is actually set to be 1, to yield a homogeneous form for Function P:
where \(f(\bar{\boldsymbol{x}})\) is an (n+1)-dimensional homogeneous polynomial function of degree d, with variable \(\bar{\boldsymbol{x}}\), i.e., \(\boldsymbol {F}\in\mathbb{R}^{(n+1)^{d}}\) and \(\boldsymbol {\bar{x}}\in \mathbb{R}^{n+1}\). Optimization of this homogeneous form can be handled by our previous results, but in general we have no control over the value of x h in the solution, while feasibility requires it to equal 1. The following lemma ensures that construction of a feasible solution is possible.
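The homogenization step itself is mechanical: every degree-k monomial of p is padded with d−k copies of the new variable x h , and setting x h =1 recovers p. A minimal sketch with a hypothetical degree-3 polynomial:

```python
import math
import random

n, d = 3, 3
# Hypothetical inhomogeneous p(x) of degree 3, stored as {index tuple: coefficient};
# the constant term f_0 is omitted (assumed zero, as in Sect. 2.1).
p_terms = {(0,): 1.0, (1, 2): -2.0, (0, 0, 2): 0.5}

# Homogenize: pad every degree-k monomial with (d - k) copies of the new
# variable x_h, which gets index n; the result is homogeneous of degree d.
f_terms = {idx + (n,) * (d - len(idx)): c for idx, c in p_terms.items()}

def evaluate(terms, x):
    return sum(c * math.prod(x[i] for i in idx) for idx, c in terms.items())

random.seed(2)
x = [random.uniform(-1, 1) for _ in range(n)]
```

Evaluating the homogenized polynomial at (x, 1) reproduces p(x) exactly.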
Lemma 3.13
(He, Li, and Zhang [20])
Suppose \(\bar{\boldsymbol{x}}^{k}=((\boldsymbol{x}^{k})^{\mathrm{T}}, x_{h}^{k})^{\mathrm{T}}\) with \(|x_{h}^{k}|\leqslant 1\) for k=1,2,⋯,d. Let η 1,η 2,⋯,η d be independent random variables, each taking values 1 and −1 with \(\textbf{\textsf{E}} [\eta_{k}]=x_{h}^{k}\) for k=1,2,⋯,d, and let ξ 1,ξ 2,⋯,ξ d be i.i.d. random variables, each taking values 1 and −1 with equal probability (thus the mean is 0). If the last component of the tensor F is 0, then we have
and
Our last result is the following theorem.
Theorem 3.14
(P) admits a polynomial-time approximation algorithm with relative approximation ratio τ P , where
We remark that (P) is indeed a very general discrete optimization model. For example, it can be used to model the following general polynomial optimization problem in discrete values:
To see this, we observe that by adopting the Lagrange interpolation technique and letting
the original decision variables can be equivalently transformed into
where u i ∈{1,2,⋯,m i }, which can be further represented by ⌈log2 m i ⌉ independent binary variables. Combining these two steps of substitution, (D) is then reformulated as (P), with the degree of its objective polynomial function no larger than max1⩽i⩽n {d(m i −1)}, and the dimension of its decision variables being \(\sum_{i=1}^{n}\lceil\log_{2} m_{i}\rceil\).
In many real world applications, the data \(\{a_{1}^{i},a_{2}^{i},\cdots ,a_{m_{i}}^{i}\}\) (i=1,2,⋯,n) in (D) are arithmetic sequences. In that case it is much easier to transform (D) to (P), without going through the Lagrange interpolation: the degree of the objective polynomial function is unchanged, and the dimension of the decision variables is \(\sum_{i=1}^{n}\lceil\log_{2} m_{i}\rceil\).
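The first substitution step, Lagrange interpolation over the nodes u=1,2,⋯,m, can be sketched for a single discrete variable; the admissible values below are hypothetical:

```python
import math

# One discrete variable taking m = 4 hypothetical values {a_1,...,a_m},
# attached to the interpolation nodes u = 1, 2, ..., m.
a = [5.0, -1.0, 2.0, 7.0]
m = len(a)

def lagrange_basis(j, u):
    # l_j(u): equals 1 at node j and 0 at every other node
    return math.prod((u - i) / (j - i) for i in range(1, m + 1) if i != j)

def x_of_u(u):
    # degree-(m-1) polynomial satisfying x(j) = a_j for j = 1,...,m
    return sum(a[j - 1] * lagrange_basis(j, u) for j in range(1, m + 1))
```

The node u itself can then be encoded by ⌈log 2 m⌉ = 2 binary variables, as described above.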
The proofs of all the theorems presented in this section are delegated to Appendix.
4 Examples of Application
As we discussed in Sect. 1, the models studied in this paper have versatile applications. Given the generic nature of the discrete polynomial optimization models (viz. (T), (H), (M), (P), (T)′, (H)′ and (M)′), this point is perhaps self-evident. However, we believe it is helpful to present a few examples at this point with more details, to illustrate the potential modeling opportunities with the new optimization models. We present four problems in this section and show that they are readily formulated by the discrete polynomial optimization models in this paper.
4.1 The Tensor Cut-Norm Problem
The concept of cut-norm was initially defined for a real matrix \(\boldsymbol{A}=(a_{ij})\in\mathbb{R}^{n_{1}\times n_{2}}\): denoted by ∥A∥ C , it is the maximum, over all I⊆{1,2,⋯,n 1} and J⊆{1,2,⋯,n 2}, of the quantity |∑ i∈I,j∈J a ij |. This concept plays a major role in the design of efficient approximation algorithms for dense graph and matrix problems (see e.g., [3, 14]). Alon and Naor [2] proposed a randomized polynomial-time algorithm that approximates the cut-norm within a factor of at least 0.56, which is currently the best available approximation ratio. Since a matrix is a second order tensor, it is natural to extend the cut-norm to general higher order tensors; see, e.g., a recent paper by Kannan [23]. Specifically, given a d-th order tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\), its cut-norm is defined by
In fact, the cut-norm ∥F∥ C is closely related to ∥F∥∞↦1, which is exactly in the form of (T). By Theorem 3.1, there is a polynomial-time approximation algorithm which approximates ∥F∥∞↦1 within a factor of \(\varOmega ((n_{1}n_{2}\cdots n_{d-2})^{-\frac{1}{2}} )\). The following result asserts that the cut-norm of a general d-th order tensor can therefore also be approximated within a factor of \(\varOmega ((n_{1}n_{2}\cdots n_{d-2})^{-\frac{1}{2}} )\).
Proposition 4.1
For any d-th order tensor \(\boldsymbol {F}\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\), ∥F∥ C ⩽∥F∥∞↦1⩽2d∥F∥ C .
Proof
Let \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\). Recall that \(\|\boldsymbol {F}\|_{\infty\mapsto1}= \max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\ k=1,2,\cdots,d}F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d})\). Given any \(\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}\) for k=1,2,⋯,d, it follows that
which implies ∥F∥∞↦1⩽2d∥F∥ C .
Observe that \(\|\boldsymbol {F}\|_{C}=\max_{\boldsymbol{z}^{k}\in\{0,1\}^{n_{k}},\,k=1,2,\cdots ,d}|F(\boldsymbol{z}^{1}, \boldsymbol{z}^{2}, \cdots, \boldsymbol{z}^{d})|\). Given any \(\boldsymbol{z}^{k}\in\{0,1\}^{n_{k}}\) for k=1,2,⋯,d, let z k=(e+x k)/2, where e is the all one vector. Clearly \(\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}\) for k=1,2,⋯,d, and thus
which implies ∥F∥ C ⩽∥F∥∞↦1. □
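The two inequalities of Proposition 4.1 can be verified by brute force on a small random tensor (a hypothetical instance, enumerable exactly at this size):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n1, n2, n3, d = 2, 3, 2, 3
F = rng.standard_normal((n1, n2, n3))

def trilinear(x, y, z):
    return float(np.einsum('ijk,i,j,k->', F, x, y, z))

# cut-norm: maximize |F(z1,z2,z3)| over 0/1 vectors (index subsets I, J, ...)
cut = max(abs(trilinear(np.array(z1), np.array(z2), np.array(z3)))
          for z1 in itertools.product([0, 1], repeat=n1)
          for z2 in itertools.product([0, 1], repeat=n2)
          for z3 in itertools.product([0, 1], repeat=n3))

# infinity->1 norm: maximize F(x1,x2,x3) over +-1 vectors
inf1 = max(trilinear(np.array(x1), np.array(x2), np.array(x3))
           for x1 in itertools.product([-1, 1], repeat=n1)
           for x2 in itertools.product([-1, 1], repeat=n2)
           for x3 in itertools.product([-1, 1], repeat=n3))
```

On any instance, ∥F∥ C ⩽ ∥F∥∞↦1 ⩽ 2^d ∥F∥ C must hold, as the proof above shows via the substitution z k = (e + x k)/2.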
4.2 The Vector-Valued Maximum Cut Problem
Consider an undirected graph G=(V,E), where V={v 1,v 2,⋯,v n } is the set of vertices and E⊆V×V is the set of edges. Each edge e∈E carries an associated weight, which in this case is a nonnegative vector \(\boldsymbol{w}_{e} \in\mathbb{R}_{+}^{m}\). The problem is to find a cut such that the total weight of the edges crossing the cut, itself a vector, has maximum norm. More formally, this problem can be formulated as
Note that the usual max-cut problem is a special case of the above model where each weight w e ⩾0 is a scalar. Similar to the scalar case (see [16]), we may reformulate the above problem in binary variables as
where
Invoking the Cauchy-Schwarz inequality, we may further formulate the above problem as
This is the exact form of (H)′ with d=2 and d′=1. Although the square-free property in terms of x does not hold in this model (which is a condition of Theorem 3.9), one can still replace any point in the box ([−1,1]n) by one of its vertices ({−1,1}n) without decreasing the objective function value, since the matrix \(F(\cdot,\cdot,\boldsymbol{e}_{k}) = ((w_{ij}')_{k} )_{n\times n}\) is diagonally dominant for k=1,2,⋯,m. Thus, the vector-valued max-cut problem admits an approximation ratio of \(\frac{1}{2} ( \frac{2}{\pi } )^{3/2} n^{-1/2}\) by Theorem 3.9.
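As a concrete illustration, the vector-valued max-cut objective on a small graph can be evaluated by brute force over all cuts. The sketch below uses a hypothetical toy instance of our own choosing (a 4-cycle with weights \(\boldsymbol{w}_{e}\in\mathbb{R}_{+}^{2}\)) and maximizes the Euclidean norm of the aggregated weight vector:

```python
import itertools
import math

# A toy 4-cycle with nonnegative vector weights w_e in R^2 (our own data).
n = 4
edges = {(0, 1): (1.0, 0.0), (1, 2): (0.5, 2.0),
         (2, 3): (1.0, 1.0), (0, 3): (0.0, 1.5)}

def cut_vector(assign):
    """Sum of the weight vectors over the edges crossing the cut given by assign."""
    s = [0.0, 0.0]
    for (i, j), w in edges.items():
        if assign[i] != assign[j]:
            s[0] += w[0]
            s[1] += w[1]
    return s

# maximize the Euclidean norm of the aggregated weight vector over all cuts
best = max(math.hypot(*cut_vector(a))
           for a in itertools.product((-1, 1), repeat=n))
# the bipartition {0,2} | {1,3} cuts every edge of the 4-cycle; with
# nonnegative weights that cut is optimal
assert abs(best - math.hypot(2.5, 4.5)) < 1e-9
```

The enumeration is exponential in n; the polynomial-time guarantee of course comes from Theorem 3.9, not from this sketch.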
If the weights on edges are positive semidefinite matrices (i.e., W ij ∈ℝm×m, W ij ⪰0), then the matrix-valued max-cut problem can also be formulated as
where \(\boldsymbol{W}_{ij}'\) is defined similarly to (1); or equivalently,
Similar to the vector-valued case, by the diagonal-dominance property and Theorem 3.10, the above problem admits an approximation ratio of \(\frac{1}{4} ( \frac{2}{\pi} )^{3/2} (mn)^{-1/2}\). Notice that Theorem 3.10 only asserts a relative approximation ratio; however, for this problem the optimal value of its minimization counterpart is obviously nonnegative, and thus a relative approximation ratio implies a usual approximation ratio.
4.3 The Maximum Complete Satisfiability Problem
The usual maximum satisfiability problem (see e.g., [15]) is to find boolean values of the literals so as to maximize the total weighted sum of the satisfied clauses. The key point is that each clause is in disjunctive form: if one of its literals is assigned the value true, then the clause is satisfied. If the literals in each clause are instead conjunctive, then this form of the satisfiability problem is easy to solve. However, if not all clauses can be satisfied and we look for an assignment that maximizes the weighted sum of the satisfied clauses, then the problem is quite different. To distinguish it from the usual Max-SAT problem, let us call this new problem the maximum complete satisfiability problem, abbreviated Max-C-SAT. It is immediately clear that Max-C-SAT is NP-hard, since the max-cut problem is easily reduced to it, as follows. For each edge (v i ,v j ) we consider two clauses \(\{x_{i},\bar{x}_{j}\}\) and \(\{\bar{x}_{i},x_{j}\}\), both having weight w ij ; a clause is completely satisfied only when both its literals are true, so for any assignment the total weight of the completely satisfied clauses equals the weight of the corresponding cut. A Max-C-SAT solution therefore leads to a solution to the max-cut problem.
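The reduction above is easy to verify computationally: for every assignment, the total weight of the completely satisfied clauses equals the weight of the corresponding cut. A minimal sketch on a toy instance of our own choosing:

```python
import itertools

# max-cut instance: weighted edges on 3 vertices (our own toy data)
w = {(0, 1): 2.0, (1, 2): 3.0, (0, 2): 1.0}

# For each edge (i, j) build two conjunctive clauses {x_i, not x_j} and
# {not x_i, x_j}, each of weight w_ij.  A clause is *completely* satisfied
# only if all of its literals are true.
clauses = []
for (i, j), wij in w.items():
    clauses.append(((i, True), (j, False), wij))   # x_i AND (not x_j)
    clauses.append(((i, False), (j, True), wij))   # (not x_i) AND x_j

def csat_value(assign):
    """Total weight of the completely satisfied clauses under assign."""
    total = 0.0
    for (i, vi), (j, vj), wij in clauses:
        if assign[i] == vi and assign[j] == vj:
            total += wij
    return total

def cut_value(assign):
    """Weight of the cut induced by the boolean assignment."""
    return sum(wij for (i, j), wij in w.items() if assign[i] != assign[j])

# the two objectives coincide on every assignment, hence on the maximizer
for a in itertools.product((True, False), repeat=3):
    assert csat_value(a) == cut_value(a)
```

Exactly one of the two clauses of an edge is completely satisfied when its endpoints receive different values, and neither is otherwise, which is the whole content of the reduction.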
Now consider an instance of the Max-C-SAT problem with m clauses, each clause containing no more than d literals. Suppose that clause k (k=1,2,⋯,m) has the following form
where s k +t k ⩽d, associated with a weight w k ⩾0 for k=1,2,⋯,m. Then, the Max-C-SAT problem can be formulated in the form of (P) as
According to Theorem 3.14 and the nonnegativity of the objective function, the above problem admits a polynomial-time approximation algorithm with approximation ratio \(\varOmega (n^{-\frac{d-2}{2}} )\), which is independent of the number of clauses m.
4.4 The Box Constrained Diophantine Equation
Solving a system of linear equations where the variables are integers constrained to a box is an important problem in discrete optimization and linear algebra. Examples of applications include the classical Frobenius problem (see e.g., [6]) and the “market split problem” [11], as well as engineering applications in integrated circuit design and video signal processing. For more details, one is referred to Aardal et al. [1]. Essentially, the problem is to find an integer-valued x∈ℤn with 0⩽x⩽u, such that Ax=b. The problem can be formulated by the least-squares method as
According to the discussion at the end of Sect. 3.3, the above problem can be reformulated in the form of (P), whose objective function is a quadratic polynomial and whose number of decision variables is \(\sum_{i=1}^{n}\lceil\log_{2} (u_{i}+1)\rceil\). By applying Theorem 3.14, (L) admits a polynomial-time approximation algorithm with a constant relative approximation ratio.
In general, Diophantine equations are polynomial equations. Box-constrained polynomial equations can also be formulated by the least-squares method as in (L). Suppose the highest degree of the polynomial equations is d. Then this least-squares problem can be reformulated in the form of (P), with the degree of the objective polynomial being 2d and the number of decision variables being \(\sum_{i=1}^{n}\lceil\log_{2} (u_{i}+1)\rceil\). By applying Theorem 3.14, this problem admits a polynomial-time approximation algorithm with a relative approximation ratio \(\varOmega ( (\sum_{i=1}^{n}\log u_{i} )^{-(d-1)} )\).
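The binary-expansion reformulation can be illustrated on a toy system. The sketch below (hypothetical data, and brute force in place of the approximation algorithm) encodes each bounded integer variable with \(\lceil\log_{2}(u_{i}+1)\rceil\) binary digits and minimizes the least-squares residual:

```python
import itertools
import math

# Toy instance of (L): find integer x with Ax = b, 0 <= x <= u (our own data).
A = [[1, 2], [3, 1]]
b = [5, 5]
u = [3, 4]

bits = [math.ceil(math.log2(ui + 1)) for ui in u]   # binary digits per variable

def decode(y):
    """Map the concatenated binary digits y back to the integer vector x."""
    x, pos = [], 0
    for nb in bits:
        x.append(sum(2**t * y[pos + t] for t in range(nb)))
        pos += nb
    return x

def residual(x):
    """The least-squares objective ||Ax - b||^2."""
    return sum((sum(A[r][c] * x[c] for c in range(len(x))) - b[r]) ** 2
               for r in range(len(b)))

# enumerate binary digit vectors, discarding decodings that overshoot the box
feasible = (y for y in itertools.product((0, 1), repeat=sum(bits))
            if all(xi <= ui for xi, ui in zip(decode(y), u)))
best = min(feasible, key=lambda y: residual(decode(y)))
# a zero residual certifies an exact solution of Ax = b in the box
assert residual(decode(best)) == 0 and decode(best) == [1, 2]
```

Note that a plain binary expansion can exceed the upper bound u_i, so the box is re-imposed by the feasibility filter here; the enumeration itself is exponential and stands in only for the structure of the reformulation.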
We have extensively tested the numerical performance of the algorithms proposed in this paper on simulated data. In general, the results show that the algorithms are not only efficient in the theoretical sense proven in this paper, but also effective in practice. The numerical results and their discussion under various circumstances, however, are too lengthy to be included in the current paper; instead, we refer the interested reader to the recent Ph.D. thesis of one of the authors, Li [26].
References
Aardal, K., Hurkens, C.A.J., Lenstra, A.K.: Solving a system of linear Diophantine equations with lower and upper bounds on the variables. Math. Oper. Res. 25, 427–442 (2000)
Alon, N., Naor, A.: Approximating the cut-norm via Grothendieck’s inequality. SIAM J. Comput. 35, 787–803 (2006)
Alon, N., de la Vega, W.F., Kannan, R., Karpinski, M.: Random sampling and approximation of MAX-CSP problems. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 232–239 (2002)
Ansari, N., Hou, E.: Computational Intelligence for Optimization. Kluwer Academic, Norwell (1997)
Balinski, M.L.: On a selection problem. Manag. Sci. 17, 230–231 (1970)
Beihoffer, D., Hendry, J., Nijenhuis, A., Wagon, S.: Faster algorithms for Frobenius numbers. Electron. J. Comb. 12, R27 (2005)
Bertsimas, D., Ye, Y.: Semidefinite relaxations, multivariate normal distributions, and order statistics. In: Du, D.-Z., Pardalos, P.M. (eds.) Handbook of Combinatorial Optimization, vol. 3, pp. 1–19. Kluwer Academic, Norwell (1998)
Bruck, J., Blaum, M.: Neural networks, error-correcting codes, and polynomials over the binary n-cube. IEEE Trans. Inf. Theory 35, 976–987 (1989)
Charikar, M., Wirth, A.: Maximizing quadratic programs: extending Grothendieck’s inequality. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pp. 54–60 (2004)
Chen, B., He, S., Li, Z., Zhang, S.: Maximum block improvement and polynomial optimization. SIAM J. Optim. 22, 87–107 (2012)
Cornuéjols, G., Dawande, M.: A class of hard small 0–1 programs, integer programming and combinatorial optimization. Lect. Notes Comput. Sci. 1412, 284–293 (1998)
Feige, U., Ofek, E.: Easily refutable subformulas of large random 3CNF formulas. In: Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 3142, pp. 519–530 (2004)
Friedman, J., Goerdt, A., Krivelevich, M.: Recognizing more unsatisfiable random k-SAT instances efficiently. SIAM J. Comput. 35, 408–430 (2005)
Frieze, A.M., Kannan, R.: Quick approximation to matrices and applications. Combinatorica 19, 175–200 (1999)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1979)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)
Hammer, P.L., Rudeanu, S.: Boolean Methods in Operations Research. Springer, New York (1968)
Hansen, P.: Methods of nonlinear 0–1 programming. Ann. Discrete Math. 5, 53–70 (1979)
He, S., Li, Z., Zhang, S.: Approximation algorithms for homogeneous polynomial optimization with quadratic constraints. Math. Program. 125, 353–383 (2010)
He, S., Li, Z., Zhang, S.: General constrained polynomial optimization: an approximation approach. Technical report, Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong (2009)
Hopfield, J.J., Tank, D.W.: “Neural” computation of decisions in optimization problem. Biol. Cybern. 52, 141–152 (1985)
Huang, Y., Zhang, S.: Approximation algorithms for indefinite complex quadratic maximization problems. Sci. China Math. 53, 2697–2708 (2010)
Kannan, R.: Spectral methods for matrices and tensors. In: Proceedings of the 42nd Annual ACM Symposium on Theory of Computing, pp. 1–12 (2010)
Khot, S., Naor, A.: Linear equations modulo 2 and the L 1 diameter of convex bodies. In: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 318–328 (2007)
Kofidis, E., Regalia, Ph.: On the best rank-1 approximation of higher order supersymmetric tensors. SIAM J. Matrix Anal. Appl. 23, 863–884 (2002)
Li, Z.: Polynomial optimization problems—approximation algorithms and applications. Ph.D. thesis, The Chinese University of Hong Kong, Shatin, Hong Kong (2011)
Nesterov, Yu.: Semidefinite relaxation and nonconvex quadratic optimization. Optim. Methods Softw. 9, 141–160 (1998)
Nesterov, Yu.: Random walk in a simplex and quadratic optimization over convex polytopes. CORE discussion paper, UCL, Louvain-la-Neuve, Belgium (2003)
Purser, M.: Introduction to Error-Correcting Codes. Artech House, Norwood (1995)
Rhys, J.M.W.: A selection problem of shared fixed costs and network flows. Manag. Sci. 17, 200–207 (1970)
So, A.M.-C.: Deterministic approximation algorithms for sphere constrained homogeneous polynomial optimization problems. Math. Program. 129, 357–382 (2011)
Simai He was supported in part by Hong Kong General Research Fund (No. CityU143711).
Zhening Li was supported in part by Natural Science Foundation of Shanghai (No. 12ZR1410100) and Ph.D. Programs Foundation of Chinese Ministry of Education (No. 20123108120002).
Shuzhong Zhang was supported in part by U.S. National Science Foundation (No. CMMI-1161242).
Appendix: Proofs of the Theorems
1.1 A.1 Proof of Theorem 3.1
Proof
The proof is based on mathematical induction on the degree d. For the case d=2, it is exactly the algorithm of Alon and Naor [2]. For general d⩾3, let \(\boldsymbol{X}=\boldsymbol{x}^{1}(\boldsymbol{x}^{d})^{\mathrm{T}}\); (T) is then relaxed to
where we treat X as an n 1 n d -dimensional vector, and \(\boldsymbol {F}\in \mathbb{R}^{n_{1}n_{d}\times n_{2}\times\cdots\times n_{d-1}}\) as a (d−1)-th order tensor. Observe that \((\hat{T})\) is the exact form of (T) in degree d−1, and so by induction we can find \(\hat{\boldsymbol{X}}\in \mathbb{B}^{n_{1}n_{d}}\) and \(\hat{\boldsymbol{x}}^{k}\in\mathbb{B}^{n_{k}}\, (k=2,3,\cdots,d-1)\) in polynomial-time, such that
Rewrite \(\hat{\boldsymbol{X}}\) as an n 1×n d matrix, and construct
as in DR 3.1, and then randomly generate
Let \(\hat{\boldsymbol{x}}^{1}:=\mbox {sign}\,(\boldsymbol{\xi})\) and \(\hat{\boldsymbol{x}}^{d}:=\mbox {sign}\,(\boldsymbol{\eta})\). Noticing that the diagonal components of \(\tilde {\boldsymbol{X}}\) are all ones, it follows from Bertsimas and Ye [7] that for all 1⩽i⩽n 1 and 1⩽j⩽n d ,
where the last equality is due to \(|\hat{X}_{ij}|=1\). Denote matrix \(\hat{\boldsymbol{Q}}=F(\cdot,\hat{\boldsymbol{x}}^{2},\hat{\boldsymbol{x}}^{3},\cdots, \hat{\boldsymbol{x}}^{d-1},\cdot)\), then
Therefore \(\hat{\boldsymbol{x}}^{1}\) and \(\hat{\boldsymbol{x}}^{d}\) can be found by randomization, which concludes the induction step. □
Lemma A.1
If a polynomial function p(x) is square-free and z∈[−1,1]n, then \(\boldsymbol{x}'\in\mathbb{B}^{n}\) and \(\boldsymbol{x}'' \in\mathbb{B}^{n}\) can be found in polynomial-time, such that p(x′)⩽p(z)⩽p(x″).
Proof
Since p(x) is square-free, by fixing x 2,x 3,⋯,x n as constants and taking x 1 as the variable, we may rewrite
Let
Then
Repeat the same procedures for z 2,z 3,⋯,z n , and let them be replaced by \(x_{2}',x_{3}', \cdots,x_{n}'\) respectively. Then \(\boldsymbol{x}'=(x_{1}',x_{2}',\cdots ,x_{n}')^{\mathrm {T}}\in\mathbb{B}^{n}\) satisfies p(x′)⩽p(z). Using a similar argument, we may find \(\boldsymbol{x}''\in\mathbb{B}^{n}\) with p(x″)⩾p(z). □
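The procedure in the proof of Lemma A.1 is constructive and easily implemented: since a square-free polynomial is affine in each coordinate, rounding one coordinate at a time to the better endpoint of [−1,1] never moves the value in the wrong direction. A minimal sketch on a random multilinear instance of our own construction:

```python
import itertools
import math
import random

random.seed(1)
n = 4
# A random square-free (multilinear) polynomial p: one coefficient per
# nonempty subset S of variables; p is affine in each coordinate.
coef = {S: random.uniform(-1, 1)
        for r in range(1, n + 1)
        for S in itertools.combinations(range(n), r)}

def p(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in coef.items())

def round_coordinates(z, better):
    """Round z in [-1,1]^n to a vertex one coordinate at a time; since p is
    affine in each x_i, picking the better endpoint never moves p the wrong way."""
    x = list(z)
    for i in range(n):
        hi, lo = x.copy(), x.copy()
        hi[i], lo[i] = 1.0, -1.0
        x[i] = 1.0 if better(p(hi), p(lo)) else -1.0
    return x

z = [random.uniform(-1, 1) for _ in range(n)]
x_up = round_coordinates(z, lambda a, b: a >= b)    # p(x_up) >= p(z)
x_dn = round_coordinates(z, lambda a, b: a <= b)    # p(x_dn) <= p(z)
assert p(x_dn) <= p(z) + 1e-9
assert p(z) <= p(x_up) + 1e-9
assert all(abs(v) == 1.0 for v in x_up + x_dn)      # both are vertices
```

Each coordinate pass evaluates p twice, so the whole rounding runs in polynomial time, matching the claim of the lemma.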
1.2 A.2 Proof of Theorem 3.2
Proof
Let \(f(\boldsymbol{x})=F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\) with F being super-symmetric. (H) can be relaxed to
By Theorem 3.1 we are able to find a set of binary vectors \((\hat{\boldsymbol{x}}^{1},\hat{\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d})\) in polynomial-time, such that
When d is odd, let ξ 1,ξ 2,⋯,ξ d be i.i.d. random variables, each taking values 1 and −1 with equal probability. Then by Lemma 3.4 it follows that
Thus we may find a binary vector \(\boldsymbol{\beta}=(\beta_{1}, \beta_{2}, \cdots, \beta_{d})^{\mathrm {T}}\in\mathbb{B}^{d}\), such that
Now we notice that \(\frac{1}{d}\sum_{k=1}^{d} (\prod_{i\not= k} \beta_{i} )\hat{\boldsymbol{x}}^{k}\in[-1,1]^{n}\), because for all 1⩽j⩽n,
Since f(x) is square-free, by Lemma A.1 we are able to find \(\tilde{\boldsymbol{x}}\in\mathbb{B}^{n}\) in polynomial-time, such that
□
Lemma A.2
Suppose in (P): \(\max_{\boldsymbol{x}\in\mathbb{B}^{n}}p(\boldsymbol{x})\), the objective polynomial function p(x) is square-free and has no constant term. Then v min(P)⩽0⩽v max(P), and a binary vector \(\boldsymbol{x}'\in \mathbb{B}^{n}\) can be found in polynomial-time with p(x′)⩾0.
Proof
Let ξ=(ξ 1,ξ 2,⋯,ξ n )T, whose components are i.i.d. random variables and take values 1 and −1 with equal probability. Then for any term \(a_{i_{1}i_{2}\cdots i_{k}}x_{i_{1}}x_{i_{2}} \cdots x_{i_{k}}\) with degree k (1⩽k⩽d) of p(x), by the square-free property, it follows
This implies \(\textbf{\textsf{E}} [p(\boldsymbol{\xi})]=0\), and consequently v min(P)⩽0⩽v max(P). By a randomization process, a binary vector \(\boldsymbol{x}'\in \mathbb{B}^{n}\) can be found in polynomial-time with p(x′)⩾0. □
We remark that the second part of Lemma A.2 can also be proven by conducting the procedure in Lemma A.1 with the input vector 0∈[−1,1]n, since p(0)=0. Therefore, finding a binary vector \(\boldsymbol{x}'\in\mathbb{B}^{n}\) with p(x′)⩾0 can be done by either a randomized process (Lemma A.2) or a deterministic process (Lemma A.1).
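Both parts of Lemma A.2 can be checked numerically on a small random instance: the expectation of p over uniform signs vanishes term by term, so some sign vector attains a nonnegative value. A toy verification of our own construction:

```python
import itertools
import math
import random

random.seed(2)
n = 3
# square-free polynomial with no constant term: coefficients on nonempty subsets
coef = {S: random.uniform(-1, 1)
        for r in range(1, n + 1)
        for S in itertools.combinations(range(n), r)}

def p(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in coef.items())

vals = [p(x) for x in itertools.product((-1, 1), repeat=n)]
mean = sum(vals) / len(vals)       # = E[p(xi)] under uniform i.i.d. signs
assert abs(mean) < 1e-12           # every square-free term averages to zero
assert max(vals) >= 0              # hence some binary point has p(x) >= 0
```

The zero mean is exact (up to floating-point error): flipping any one variable of a term negates it, pairing the 2^n sign vectors into cancelling pairs.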
1.3 A.3 Proof of Theorem 3.3
Proof
Like in the proof of Theorem 3.2, by relaxing (H) to \((\tilde{T})\), we are able to find a set of binary vectors \((\hat{\boldsymbol{x}}^{1},\hat {\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d})\) with
Besides, we observe that \(v_{\max}(H) \leqslant v_{\max}(\tilde{T})\) and \(v_{\min}(H) \geqslant v_{\min}(\tilde{T})=-v_{\max}(\tilde{T})\). Therefore
Let ξ 1,ξ 2,⋯,ξ d be i.i.d. random variables, each taking values 1 and −1 with equal probability. Using an argument similar to (4), we have \({1\over d}\sum_{k=1}^{d} \xi_{k} \hat{\boldsymbol{x}}^{k}\in[-1,1]^{n}\). Then by Lemma A.1, there exists \(\hat{\boldsymbol{x}}\in\mathbb{B}^{n}\) such that
Applying Lemma 3.4, we have
where the last inequality is due to \(2\,v_{\max}(\tilde{T}) \geqslant v_{\max }(H)- v_{\min}(H)\). Thus we may find a binary vector \(\boldsymbol{\beta}=(\beta_{1}, \beta_{2}, \cdots, \beta_{d})^{\mathrm {T}}\in\mathbb{B}^{d}\) with \(\prod_{i=1}^{d} \beta_{i}=1\), such that
Noticing \({1\over d}\sum_{k=1}^{d}\beta_{k}\hat{\boldsymbol{x}}^{k}\in[-1,1]^{n}\) and applying Lemma A.1, by the square-free property of f(x), we are able to find \(\tilde{\boldsymbol{x}}\in\mathbb{B}^{n}\) with
□
1.4 A.4 Proof of Theorem 3.5
Proof
Like in the proof of Theorem 3.2, by relaxing model (M) to (T), we are able to find a set of binary vectors \((\hat{\boldsymbol{x}}^{1},\hat{\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d})\) with
Let ξ=(ξ 1,ξ 2,⋯,ξ d )T, whose components are i.i.d. random variables taking values 1 and −1 with equal probability. Denote
Without loss of generality, we assume d 1 to be odd. By applying Lemma 3.4 d times, it is easy to verify that
Thus we are able to find \(\boldsymbol{\beta}=(\beta_{1}, \beta_{2}, \cdots, \beta_{d})^{\mathrm {T}}\in\mathbb{B}^{d}\), such that
It is easy to verify that \(\prod_{i=1}^{d} \beta_{i}\hat{\boldsymbol{x}}_{\beta}^{1}/d_{1}\in[-1,1]^{n}\), and \(\hat{\boldsymbol{x}}_{\beta}^{k}/d_{k}\in[-1,1]^{n}\) for k=2,3,⋯,s. By the square-free property of the function f and applying Lemma A.1, we are able to find a set of binary vectors \((\tilde{\boldsymbol{x}}^{1},\tilde{\boldsymbol{x}}^{2},\cdots,\tilde{\boldsymbol{x}}^{s})\) in polynomial-time, such that
□
1.5 A.5 Proof of Theorem 3.6
Proof
The proof is analogous to the proof of Theorem 3.3. The main differences are: (i) we use
instead of invoking Lemma 3.4 directly, where \(\hat{\boldsymbol{x}}_{\xi}^{k}~(k=1,2,\cdots,s)\) is defined by (5); and (ii) we use \(f ({1\over d_{1}}\hat{\boldsymbol{x}}_{\xi}^{1},{1\over d_{2}}\hat{\boldsymbol{x}}_{\xi}^{2},\cdots,{1\over d_{s}}\hat{\boldsymbol{x}}_{\xi}^{s} )\) instead of \(f ({1\over d}\sum_{k=1}^{d}\xi_{k}\hat{\boldsymbol{x}}^{k} )\) during the randomization process. □
1.6 A.6 Proof of Proposition 3.7
Proof
When d=d′=1, (T)′ can be written as
For any fixed x, the corresponding optimal y must be \(\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{x}/\|\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{x}\|\) by the Cauchy-Schwarz inequality, and accordingly,
Thus the problem is equivalent to \(\max_{\boldsymbol{x}\in\mathbb{B}^{n_{1}}} \boldsymbol{x}^{\mathrm {T}}\boldsymbol{Q}\boldsymbol{Q}^{\mathrm {T}}\boldsymbol{x}\). Noticing that QQ T is positive semidefinite, by the result of Nesterov [27], it admits an approximation ratio 2/π. Thus the original problem admits a polynomial-time approximation algorithm with approximation ratio \(\sqrt {2/\pi}\). □
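The reduction in this proof is straightforward to verify numerically: eliminating y via the Cauchy-Schwarz inequality turns the bilinear problem into maximizing \(\|\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{x}\|\) over binary x, whose square is \(\boldsymbol{x}^{\mathrm{T}}\boldsymbol{Q}\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{x}\). A small brute-force check with hypothetical data:

```python
import itertools
import math

Q = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.5]]   # a toy 3 x 2 matrix (our own data)
n, m = len(Q), len(Q[0])

def qt_x(x):
    """Compute Q^T x for x in {-1,1}^n."""
    return [sum(Q[i][j] * x[i] for i in range(n)) for j in range(m)]

# For fixed binary x the best unit y is Q^T x / ||Q^T x||, with value ||Q^T x||,
# so the bilinear maximum equals the maximum of ||Q^T x|| over x in {-1,1}^n.
best = max(math.hypot(*qt_x(x)) for x in itertools.product((-1, 1), repeat=n))
best_quad = max(sum(v * v for v in qt_x(x))
                for x in itertools.product((-1, 1), repeat=n))
assert abs(best**2 - best_quad) < 1e-9   # same maximizer: x^T QQ^T x = ||Q^T x||^2
```

The square-root relation between the two optimal values is what yields the \(\sqrt{2/\pi}\) ratio from Nesterov's 2/π bound on the quadratic form.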
Proposition A.3
(He, Li, and Zhang [19])
\((S){:}\ \max_{\boldsymbol{y}^{1}\in\mathbb{S}^{m_{1}}, \boldsymbol{y}^{2}\in \mathbb{S}^{m_{2}}}(\boldsymbol{y}^{1})^{\mathrm {T}}\boldsymbol{Q}\boldsymbol{y}^{2}\) can be solved in polynomial-time, with \(v_{\max}(S) \geqslant \|\boldsymbol{Q}\|/\sqrt{m_{1}}\).
1.7 A.7 Proof of Theorem 3.8
Proof
The proof is based on mathematical induction on the degree d+d′. Proposition 3.7 can be used as the base for the induction process when d+d′=2.
For general d+d′⩾3, if d′⩾2, let \(\boldsymbol{Y}=\boldsymbol{y}^{1}(\boldsymbol{y}^{d'})^{\mathrm{T}}\). Noticing that \(\|\boldsymbol{Y}\|^{2}=\|\boldsymbol{y}^{1}\|^{2}\|\boldsymbol{y}^{d'}\|^{2}=1\), (T)′ can be relaxed to a case with degree d+d′−1, i.e.,
By induction, a feasible solution \((\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d},\hat{\boldsymbol{Y}},\hat{\boldsymbol{y}}^{2},\hat{\boldsymbol{y}}^{3},\cdots,\hat{\boldsymbol{y}}^{d'-1})\) can be found in polynomial-time, such that
Let us denote \(\boldsymbol{Q}=F(\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2},\cdots, \hat{\boldsymbol{x}}^{d}, \cdot,\hat{\boldsymbol{y}}^{2},\hat{\boldsymbol{y}}^{3},\cdots, \hat{\boldsymbol{y}}^{d'-1},\cdot) \in\mathbb{R}^{m_{1}\times m_{d'}}\). Then by Proposition A.3 (used in DR 2 of [19]), the problem \(\max_{\boldsymbol{y}^{1}\in\mathbb{S}^{m_{1}},\,\boldsymbol{y}^{d'}\in\mathbb {S}^{m_{d'}}}(\boldsymbol{y}^{1})^{\mathrm {T}}\boldsymbol{Q}\boldsymbol{y}^{d'}\) can be solved in polynomial-time, with its optimal solution \((\hat{\boldsymbol{y}}^{1},\hat{\boldsymbol{y}}^{d'})\) satisfying
By the Cauchy-Schwarz inequality, it follows that
Thus we conclude that
For d+d′⩾3 and d′=1, let \(\boldsymbol{X}=\boldsymbol{x}^{1}(\boldsymbol{x}^{d})^{\mathrm{T}}\). (T)′ can be relaxed to the other case with degree d−1+d′, i.e.,
By induction, the problem admits a polynomial-time approximation algorithm with approximation ratio \((2/\pi)^{\frac {2d-3}{2}}(n_{2}n_{3}\cdots n_{d-1}m_{1}m_{2}\cdots m_{d'-1})^{-\frac{1}{2}}\). In order to decompose X into x 1 and x d, we first conduct the randomization procedure (2) (DR 3.1) in the proof of Theorem 3.1, which deteriorates the ratio by an additional factor of \(\frac{2}{\pi\sqrt {n_{1}}}\), as shown in (3). Combining these two factors, we are led to the ratio \(\tau_{T}'\). □
1.8 A.8 Proof of Theorem 3.9
Proof
Like in the proof of Theorem 3.2, by relaxing model (H)′ to (T)′, we are able to find \((\hat{\boldsymbol{x}}^{1},\hat{\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d},\hat{\boldsymbol{y}}^{1},\hat{\boldsymbol{y}}^{2},\cdots,\hat{\boldsymbol{y}}^{d'})\) with \(\hat{\boldsymbol{x}}^{k}\in\mathbb{B}^{n}\) for all 1⩽k⩽d and \(\hat{\boldsymbol{y}}^{\ell}\in\mathbb{S}^{m}\) for all 1⩽ℓ⩽d′, such that
Let ξ 1,ξ 2,⋯,ξ d ,η 1,η 2,⋯,η d′ be i.i.d. random variables, each taking values 1 and −1 with equal probability. By applying Lemma 3.4 twice, we have
Thus we are able to find \(\boldsymbol{\beta}\in\mathbb{B}^{d}\) and \(\boldsymbol{\beta}'\in\mathbb{B}^{d'}\), such that
If d is odd, let \(\hat{\boldsymbol{x}} = \prod_{i=1}^{d} \beta_{i} \prod_{j=1}^{d'}\beta'_{j}\sum_{k=1}^{d}\beta_{k}\hat{\boldsymbol{x}}^{k}\) and \(\hat{\boldsymbol{y}} =\sum_{\ell=1}^{d'}\beta'_{\ell}\,\hat{\boldsymbol{y}}^{\ell}\); otherwise let \(\hat{\boldsymbol{x}} = \sum_{k=1}^{d}\beta_{k}\hat{\boldsymbol{x}}^{k}\) and \(\hat{\boldsymbol{y}} =\prod_{i=1}^{d} \beta_{i} \prod_{j=1}^{d'}\beta'_{j} \sum_{\ell=1}^{d'}\beta'_{\ell}\,\hat{\boldsymbol{y}}^{\ell}\). Noticing \(\|\hat{\boldsymbol{y}}\|\leqslant d'\) and combining the previous two inequalities, it follows that
Denote \(\tilde{\boldsymbol{y}}=\hat{\boldsymbol{y}}/\|\hat{\boldsymbol{y}}\|\in\mathbb{S}^{m}\). Since \(\hat{\boldsymbol{x}}/d\in[-1,1]^{n}\) and f(x,y) is square-free in x, by applying Lemma A.1, \(\tilde{\boldsymbol{x}}\in\mathbb{B}^{n}\) can be found in polynomial-time, with
□
1.9 A.9 Proof of Theorem 3.10
Proof
Following the same argument as in the proof of Theorem 3.9, we shall get (6), which implies
Denote \(\hat{\boldsymbol{x}}_{\xi}:=\frac{1}{d}\sum_{k=1}^{d}\xi_{k}\hat{\boldsymbol{x}}^{k}\) and \(\hat{\boldsymbol{y}}_{\eta}:=\frac{1}{d'} \sum_{\ell=1}^{d'} \eta_{\ell}\hat{\boldsymbol{y}}^{\ell}\). Clearly we have
Pick any fixed \(\boldsymbol{y}'\in\mathbb{S}^{m}\) and consider the following problem
Since f(x,y′) is square-free in x and has no constant term, by Lemma A.2, a binary vector \(\boldsymbol{x}'\in\mathbb{B}^{n}\) can be found in polynomial-time with \(f(\boldsymbol{x}', \boldsymbol{y}')\geqslant 0 \geqslant v_{\min }(\hat{H}) \geqslant v_{\min}(H')\).
Next we shall argue \(f (\hat{\boldsymbol{x}}_{\xi},\hat{\boldsymbol{y}}_{\eta} )\geqslant v_{\min}(H')\). If this were not the case, then \(f (\hat{\boldsymbol{x}}_{\xi},\hat{\boldsymbol{y}}_{\eta} )<v_{\min}(H')\leqslant 0\). By noticing \(\|\hat{\boldsymbol{y}}_{\eta}\|\leqslant 1\), this leads to
Also noticing \(\hat{\boldsymbol{x}}_{\xi}\in[-1,1]^{n}\), by applying Lemma A.1, a binary vector \(\hat{\boldsymbol{x}}\in\mathbb{B}^{n}\) can be found with
resulting in a contradiction.
Since \(f (\hat{\boldsymbol{x}}_{\xi},\hat{\boldsymbol{y}}_{\eta} )- v_{\min }(H')\geqslant 0\), it follows
Thus we are able to find \(\boldsymbol{\beta}\in\mathbb{B}^{d}\) and \(\boldsymbol{\beta}'\in\mathbb{B}^{d'}\) with \(\prod_{i=1}^{d} \beta_{i} \prod_{j=1}^{d'}\beta_{j}'=1\), such that
Denote \(\boldsymbol{y}''=\hat{\boldsymbol{y}}_{\beta'}/\|\hat{\boldsymbol{y}}_{\beta'}\|\in\mathbb {S}^{m}\). Since \(\hat{\boldsymbol{x}}_{\beta}\in[-1,1]^{n}\), by Lemma A.1, a binary vector \(\boldsymbol{x}''\in\mathbb{B}^{n}\) can be found in polynomial-time with \(f(\boldsymbol{x}'',\boldsymbol{y}'')\geqslant f (\hat {\boldsymbol{x}}_{\beta},\boldsymbol{y}'' )\). Below we shall prove either (x′,y′) or (x″,y″) will satisfy
Indeed, if \(-v_{\min}(H')\geqslant \tau_{H}' (v_{\max}(H')- v_{\min}(H') )\), then (x′,y′) satisfies (7) in this case since f(x′,y′)⩾0. Otherwise, if \(-v_{\min}(H')<\tau_{H}' (v_{\max}(H')- v_{\min}(H') )\), then
which implies
The above inequality also implies that \(f (\hat{\boldsymbol{x}}_{\beta},\hat{\boldsymbol{y}}_{\beta'} )> 0\). Therefore, we have
implying (x″,y″) satisfies (7). Finally, argmax{f(x′,y′),f(x″,y″)} satisfies (7) in both cases. □
1.10 A.10 Proof of Theorem 3.11
Proof
The proof is analogous to the proof of Theorem 3.9. We first relax (M)′ to (T)′ and get an approximate solution \((\hat{\boldsymbol{x}}^{1},\hat{\boldsymbol{x}}^{2},\cdots,\hat{\boldsymbol{x}}^{d},\hat{\boldsymbol{y}}^{1},\hat{\boldsymbol{y}}^{2},\cdots,\hat{\boldsymbol{y}}^{d'})\) using Theorem 3.8. By applying Lemma 3.4 s+t times, we have
where
and
In the above identity, as one of d k (k=1,2,⋯,s) or one of \(d_{\ell}'~({\ell}=1,2,\cdots,t)\) is odd, we are able to move \(\prod_{i=1}^{d} \xi_{i}\prod_{j=1}^{d'}\eta_{j}\) into the coefficient of the corresponding vector (\(\hat{\boldsymbol{x}}_{\xi}^{k}\) or \(\hat{\boldsymbol{y}}_{\eta}^{\ell }\) whenever appropriate) in the function f. Other derivations are essentially the same as the proof of Theorem 3.9. □
1.11 A.11 Proof of Theorem 3.12
Proof
The proof is analogous to that of Theorem 3.10. The main differences are: (i) we use
instead of (6); and (ii) we use \(f (\frac{\hat{\boldsymbol{x}}_{\xi}^{1}}{d_{1}},\frac{\hat{\boldsymbol{x}}_{\xi}^{2}}{d_{2}},\cdots, \frac{\hat{\boldsymbol{x}}_{\xi}^{s}}{d_{s}},\frac{\hat{\boldsymbol{y}}_{\eta}^{1}}{d_{1}'}, \frac{\hat{\boldsymbol{y}}_{\eta}^{2}}{d_{2}'},\cdots,\frac{\hat{\boldsymbol{y}}_{\eta}^{t}}{d_{t}'} )\) instead of \(f (\hat{\boldsymbol{x}}_{\xi},\hat{\boldsymbol{y}}_{\eta} )\). □
1.12 A.12 Proof of Theorem 3.14
Proof
We may without loss of generality assume p(x) is square-free since we have (x i )2=1 for i=1,2,⋯,n, which allows us to reduce the power of x i to 0 or 1. We may further assume p(x) to have no constant term. Thus by homogenization
where \(f(\bar{\boldsymbol{x}})=p(\boldsymbol{x})\) if x h =1, and \(f(\bar{\boldsymbol{x}})\) is an (n+1)-dimensional homogeneous polynomial function of degree d. During this proof, the ‘bar’ notation, e.g., \(\bar{\boldsymbol{x}}\), is reserved for an (n+1)-dimensional vector, with the underlying letter x referring to the vector of its first n components, and the subscript ‘h’ (the subscript of x h ) referring to its last component.
Two immediate observations are in order here. First, \(f(\bar{\boldsymbol{x}})\) is square-free with respect to all the variables x 1,x 2,⋯,x n , but is not square-free with respect to x h . Second, the last component of the tensor form F is 0, since there is no constant term in the polynomial p(x).
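Homogenization is mechanical for a multilinear polynomial: each degree-r term is padded with \(x_{h}^{d-r}\), and setting x h =1 recovers p. A minimal sketch on a toy construction of our own:

```python
import itertools
import math
import random

random.seed(3)
n, d = 3, 3
# a random square-free polynomial with no constant term, of degree at most d
coef = {S: random.uniform(-1, 1)
        for r in range(1, d + 1)
        for S in itertools.combinations(range(n), r)}

def p(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in coef.items())

def f(xbar):
    """Homogenization of p: pad each degree-r term with x_h^(d - r)."""
    x, xh = xbar[:n], xbar[n]
    return sum(c * math.prod(x[i] for i in S) * xh ** (d - len(S))
               for S, c in coef.items())

x = [random.choice((-1, 1)) for _ in range(n)]
assert abs(f(x + [1]) - p(x)) < 1e-12   # f(x, x_h) with x_h = 1 recovers p(x)
```

As observed in the text, the resulting f is square-free in x 1,⋯,x n but not in x h , and its tensor has a zero entry in the position corresponding to \(x_{h}^{d}\) because p has no constant term.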
(P) is then equivalent to
which can be relaxed to an instance of (T) as follows
Let \((\bar{\boldsymbol{u}}^{1},\bar{\boldsymbol{u}}^{2},\cdots,\bar{\boldsymbol{u}}^{d})\) be the feasible solution of \((\bar{T})\) found by Theorem 3.1 with
Denote \(\bar{\boldsymbol{v}}^{k}:=\bar{\boldsymbol{u}}^{k}/d\) for all 1⩽k⩽d, and consequently
Notice that for all 1⩽k⩽d, \(|v_{h}^{k}|=|u_{h}^{k}/d|=1/d\leqslant 1\) and the last component of tensor F is 0. By applying Lemma 3.13, it follows that
and
where (η 1,η 2,⋯,η d )=η T are independent random variables, taking values 1 and −1 with \(\textbf{\textsf{E}} [\eta_{k}]=v_{h}^{k}\) for all 1⩽k⩽d, and (ξ 1,ξ 2,⋯,ξ d )=ξ T are i.i.d. random variables, taking values 1 and −1 with equal probability. By combining these two identities, we have, for any constant c, the following identity
If we let \(c=\max_{\boldsymbol{\beta}\in\mathbb{B}^{d},\prod_{k=1}^{d}\beta_{k}=-1}\mathrm {Prob}\,\{\boldsymbol{\eta}=\boldsymbol{\beta}\}\), then in the above identity, the coefficient of each term F(⋅) is nonnegative. Therefore we are able to find \(\boldsymbol{\beta}'=(\beta_{1}', \beta_{2}', \cdots, \beta_{d}')^{\mathrm {T}} \in\mathbb{B}^{d}\) such that
with
where \(c\leqslant (\frac{1}{2}+\frac{1}{2d} )^{d}\) is applied because \(\textbf{\textsf{E}} [\eta_{k}]=v_{h}^{k}=\pm1/d\) for all 1⩽k⩽d.
Let us denote \(\bar{\boldsymbol{z}}^{k}=\binom{\boldsymbol{z}^{k}}{z_{h}^{k}}:=\binom{\beta_{k}'\boldsymbol{v}^{k}}{1}\) for all 1⩽k⩽d, and we have
For any \(\boldsymbol{\beta}=(\beta_{1}, \beta_{2}, \cdots, \beta_{d})^{\mathrm {T}}\in\mathbb {B}^{d}\), denote
By noticing \(z_{h}^{k}=1\) and \(|z_{i}^{k}|=|v_{i}^{k}|=|u_{i}^{k}|/d=1/d\) for all 1⩽k⩽d and 1⩽i⩽n, it follows that
Thus z(β)/z h (β)∈[−1,1]n. By Lemma A.1, a binary vector \(\boldsymbol{x}'\in\mathbb{B}^{n}\) can be found, such that
Moreover, we shall argue below that
If this were not the case, then \(f (\bar{\boldsymbol{z}}(\beta)/(2d) )<v_{\min}(P) \leqslant 0\) (by Lemma A.2). Notice that β 1=1 implies z h (β)>0, and thus we have
which is a contradiction.
Suppose (ξ 1,ξ 2,⋯,ξ d )=ξ T are i.i.d. random variables, taking values 1 and −1 with equal probability. By Lemma 3.4 it follows that
By inserting and canceling a constant term, noticing \(f (\bar{\boldsymbol{z}}(-\xi) )=f (-\bar{\boldsymbol{z}}(\xi) ) =(-1)^{d}f (\bar{\boldsymbol{z}}(\xi) )\), the above expression further leads to
where the last inequality is due to (10). Therefore, we are able to find \(\boldsymbol{\beta}''= (\beta_{1}'', \beta_{2}'',\cdots,\beta_{d}'')^{\mathrm {T}}\in\mathbb{B}^{d}\) with \(\beta_{1}''=\prod_{k=2}^{d}\beta_{k}''=1\), such that
where the last step is due to (9).
By Lemma A.2, a binary vector \(\boldsymbol{x}'\in\mathbb{B}^{n}\) can be found in polynomial-time with p(x′)⩾0. Since z(β″)/z h (β″)∈[−1,1]n, by Lemma A.1, a binary vector \(\boldsymbol{x}''\in\mathbb{B}^{n}\) can be found in polynomial-time with p(x″)⩾p(z(β″)/z h (β″)). Below we shall prove at least one of x′ and x″ satisfies
Indeed, if −v min(P)⩾τ P (v max(P)−v min(P)), then x′ satisfies (11). Otherwise, −v min(P)<τ P (v max(P)−v min(P)), and then
which implies
The above inequality also implies that \(f (\bar{\boldsymbol{z}}(\beta'') /(2d) )> 0\). Recall that \(\beta_{1}''=1\) implies z h (β″)>0. Therefore,
which implies x″ satisfies (11). Finally, argmax{p(x′),p(x″)} satisfies (11) in both cases. □
He, S., Li, Z. & Zhang, S. Approximation Algorithms for Discrete Polynomial Optimization. J. Oper. Res. Soc. China 1, 3–36 (2013). https://doi.org/10.1007/s40305-013-0003-1
Keywords
- Polynomial optimization problem
- Binary integer programming
- Mixed integer programming
- Approximation algorithm
- Approximation ratio