1 Introduction

This paper is concerned with optimizing a (high degree) multivariate polynomial function in (mixed) binary variables. Our basic model is to maximize a d-th degree polynomial function p(x), where \(\boldsymbol{x}=(x_1,x_2,\cdots,x_n)^{\mathrm{T}}\) is chosen such that \(x_i\in\{1,-1\}\) for i=1,2,⋯,n. For ease of reference, let us call this basic model \((P): \max_{\boldsymbol{x}\in\{1,-1\}^{n}} p(\boldsymbol{x})\). This type of problem arises in a great variety of application domains. For example, the following hypergraph max-covering problem, which is well studied in the literature, is precisely (P). Given a hypergraph H=(V,E) with V the set of vertices and E the set of hyperedges (subsets of V), where each hyperedge e∈E is associated with a real-valued weight w(e), the problem is to find a subset S of the vertex set V such that the total weight of the hyperedges covered by S is maximized. Let \(x_i\in\{0,1\}\) (i=1,2,⋯,n) indicate whether or not vertex i is selected in S. The problem is then \(\max_{\boldsymbol{x}\in\{0,1\}^{n}}\sum_{e\in E}w(e)\prod_{i\in e}x_{i}\). By the simple variable transformation \(x_i\rightarrow(x_i+1)/2\), the problem is transformed to (P), and vice versa.
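To make the transformation concrete, here is a minimal sketch (in Python with sympy, on a hypothetical two-hyperedge instance) of the substitution \(x_i\rightarrow(x_i+1)/2\) that converts the 0–1 max-covering objective into an instance of (P):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
# hypothetical weighted hyperedges: w({1,2}) = 3 and w({1,2,3}) = -2
p01 = 3*x1*x2 - 2*x1*x2*x3                        # objective over {0,1}^3
pm1 = sp.expand(p01.subs({v: (v + 1)/2 for v in (x1, x2, x3)}))
print(pm1)                                        # equivalent objective over {1,-1}^3
```

Maximizing the resulting polynomial over {1,−1}³ gives the same optimal value as maximizing the original objective over {0,1}³.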

Note that (P) is a fundamental problem in integer programming. As such it has received attention in the literature; see [17, 18]. It is also known as the Fourier support graph problem. Mathematically, a polynomial function p:{−1,1}^n→ℝ has the Fourier expansion \(p(\boldsymbol{x})=\sum_{S\subseteq\{1,2,\cdots,n\}}\hat{p}(S)\prod_{i\in S}x_{i}\); the family of index sets S with non-zero coefficients is called its Fourier support graph. Assume that p has only succinctly many (polynomially many) non-zero Fourier coefficients \(\hat{p}(S)\). The question is: can we compute the maximum value of p over the discrete cube {1,−1}^n, or alternatively, can we find a good approximate solution in polynomial time? The latter question actually motivates this paper. Indeed, (P) has been investigated extensively in the quadratic case, due to its connections to various graph partitioning problems, e.g., the maximum cut problem [16]. In general, (P) is closely related to finding a maximum weighted independent set in a graph. In particular, let G=(V,E) be a graph with V the set of vertices and E the set of edges, where each vertex is assigned a positive weight. A set S of vertices is called independent if and only if S⊆V and no two vertices in S share an edge. The problem is to find an independent set of vertices whose total weight is maximum over all possible independent sets.

In fact, any unconstrained binary polynomial maximization problem can be transformed into the maximum weighted independent set problem, which is also a commonly used technique in the literature for solving (P) (see e.g., [5, 30]). The transformation uses the concept of the conflict graph of a 0–1 polynomial function. The idea is illustrated by the following example. Let us consider

$$f(\boldsymbol{x})=-2x_1-2x_2+5x_1x_2-4x_1x_2x_3, \quad (x_1,x_2,x_3)\in\{0,1\}^3. $$

Note that f(x) can be transformed into an equivalent polynomial in which all terms (except the constant term) have positive coefficients. The new polynomial involves both the variables and their complements, i.e., \(\bar{x}_{i}:=1-x_{i}\) for i=1,2,3. In our example, such a polynomial is

$$f(\boldsymbol{x}) = -4 + 2\bar{x}_1 + 2\bar{x}_2 + x_1x_2 + 4 x_1x_2\bar{x}_3. $$

The conflict graph G(f) associated with a polynomial f(x) has one vertex for each term of f(x) other than the constant term. Two vertices in G(f) are connected by an edge if and only if one of the corresponding terms contains a variable and the other contains its complement. The weight of a vertex in G(f) is the coefficient of the corresponding term in f. The conflict graph of f(x) is shown in Fig. 1. Finding a maximum weighted independent set of the conflict graph thus solves the binary polynomial optimization problem. Beyond its connection to graph problems, (P) also has applications in neural networks [4, 8, 21], error-correcting codes [8, 29], etc. For instance, Khot and Naor [24] recently showed that it has applications in the problem of refuting random k-CNF formulas [12, 13].

Fig. 1 Conflict graph associated with \(-2x_1-2x_2+5x_1x_2-4x_1x_2x_3\)
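The reduction can be checked by brute force on this example; the following sketch enumerates independent sets of the conflict graph of Fig. 1 (the terms are encoded by hand from the rewritten polynomial):

```python
from itertools import combinations

# terms of -4 + 2*xbar1 + 2*xbar2 + x1*x2 + 4*x1*x2*xbar3,
# each encoded as (weight, plain variables, complemented variables)
terms = [(2, set(), {1}), (2, set(), {2}), (1, {1, 2}, set()), (4, {1, 2}, {3})]

def conflict(t, u):
    # an edge joins two terms iff one contains x_i and the other xbar_i
    return bool(t[1] & u[2]) or bool(t[2] & u[1])

best = 0
for r in range(1, len(terms) + 1):
    for sub in combinations(terms, r):
        if all(not conflict(t, u) for t, u in combinations(sub, 2)):
            best = max(best, sum(w for w, _, _ in sub))
print(best - 4)   # maximum of f over {0,1}^3, here 1 (attained at x = (1,1,0))
```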

One important subclass of polynomial functions is the homogeneous polynomials. The homogeneous quadratic case of (P) has been studied extensively; see e.g. [2, 16, 27, 28]. The homogeneous cubic case was also studied by Khot and Naor [24]. Another interesting problem in this class is the ∞↦1-norm of a matrix \(\boldsymbol{M}=(a_{ij})_{n_{1}\times n_{2}}\) (see e.g., [2]), i.e.,

$$\|\boldsymbol{M}\|_{\infty\mapsto1}=\max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{ 1,-1\}^{n_2}}\boldsymbol{x}^{\mathrm {T}} \boldsymbol{M}\boldsymbol{y}:= \sum_{1\leqslant i \leqslant n_1, 1\leqslant j\leqslant n_2}a_{ij}x_iy_j. $$
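As an illustration, the following brute-force sketch evaluates the ∞↦1-norm of a small hypothetical matrix directly from the definition; it is exponential in n₁+n₂ and serves only to fix ideas:

```python
import numpy as np
from itertools import product

def norm_inf_to_one(M):
    n1, n2 = M.shape
    return max(np.array(x) @ M @ np.array(y)
               for x in product((1, -1), repeat=n1)
               for y in product((1, -1), repeat=n2))

M = np.array([[1.0, -2.0], [3.0, 0.5]])
print(norm_inf_to_one(M))   # 5.5, attained at x = (1, 1), y = (1, -1)
```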

It is quite natural to extend the ∞↦1-norm to higher order tensors. In particular, the norm \(\|\boldsymbol{F}\|_{\infty\mapsto1}\) of a d-th order tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\) can be defined as

$$\|\boldsymbol {F}\|_{\infty\mapsto1}=\max_{\boldsymbol{x}^k\in\{1,-1\}^{n_k},\,k=1,2,\cdots,d}\; \sum_{1\leqslant i_1\leqslant n_1,\cdots, 1\leqslant i_d\leqslant n_d} a_{i_1i_2\cdots i_d} x^1_{i_1}x^2_{i_2}\cdots x^d_{i_d}. $$

Another generalization of the matrix ∞↦1-norm is to extend the entry \(a_{ij}\) of the matrix M to a symmetric matrix \(\boldsymbol{A}_{ij}\), i.e., the problem of

$$\max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{1,-1\}^{n_2}}\lambda_{\max} \biggl(\sum _{1\leqslant i\leqslant n_1,1\leqslant j\leqslant n_2} x_iy_j\boldsymbol{A}_{ij} \biggr), $$

where \(\lambda_{\max}(\cdot)\) indicates the largest eigenvalue of a matrix. If the matrices \(\boldsymbol{A}_{ij}\) are not restricted to be symmetric, we may instead maximize the largest singular value, i.e.,

$$\max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{1,-1\}^{n_2}}\sigma_{\max} \biggl(\sum _{1\leqslant i\leqslant n_1,1\leqslant j\leqslant n_2} x_iy_j\boldsymbol{A}_{ij} \biggr). $$

These two problems are actually equivalent to

$$\everymath{\displaystyle} \begin{array}{l} \max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{1,-1\}^{n_2}, \|\boldsymbol{z}\|_2=1} F(\boldsymbol{x},\boldsymbol{y},\boldsymbol{z},\boldsymbol{z}) \quad\mathrm{and} \\[15pt] \max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{1,-1\}^{n_2}, \|\boldsymbol{z}\|_2=\|\boldsymbol{w}\|_2=1}F(\boldsymbol{x},\boldsymbol{y},\boldsymbol{z},\boldsymbol{w}) \end{array} $$

respectively, where F is the multilinear function induced by the fourth order tensor F whose (i,j,k,ℓ)-th entry is the (k,ℓ)-th entry of the matrix \(\boldsymbol{A}_{ij}\).

In fact, a very interesting and succinct matrix combinatorial problem is: Given n matrices A i (i=1,2,⋯,n), find a binary combination of the matrices so as to maximize the spectral norm of the combined matrix:

$$\max_{\boldsymbol{x}\in\{1,-1\}^n}\sigma_{\max} \Biggl(\sum _{i=1}^nx_i\boldsymbol{A}_i \Biggr). $$

This is indeed equivalent to

$$\max_{\boldsymbol{x}\in\{1,-1\}^n, \|\boldsymbol{y}\|_2=\|\boldsymbol{z}\|_2=1}F(\boldsymbol{x},\boldsymbol{y},\boldsymbol{z}). $$
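A brute-force sketch of this matrix combinatorial problem (on hypothetical random data) reads as follows; it simply enumerates all sign vectors and records the largest spectral norm:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
A = [rng.standard_normal((3, 4)) for _ in range(4)]    # n = 4 hypothetical matrices

val, x = max(
    ((np.linalg.norm(sum(xi * Ai for xi, Ai in zip(x, A)), ord=2), x)
     for x in product((1, -1), repeat=len(A))))
print(val, x)   # sigma_max of the best binary combination, and the sign pattern
```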

All the problems studied in this paper are NP-hard in general, and our focus will be on polynomial-time approximation algorithms. In the case that the objective polynomial is quadratic, a well-known example is the semidefinite programming relaxation and randomization approach for the max-cut problem due to Goemans and Williamson [16], where essentially a 0.878-approximation ratio is shown for the model \(\max_{\boldsymbol{x}\in\{ 1,-1\}^{n}}\boldsymbol{x}^{\mathrm {T}}\boldsymbol{M}\boldsymbol{x}\) with M being the Laplacian of a given graph. In the case where M is only known to be positive semidefinite, Nesterov [27] derived a 0.636-approximation bound. Charikar and Wirth [9] considered a more general model and proposed an \(\varOmega (\frac{1}{\log n} )\)-approximation algorithm for diagonal-free M. For the matrix ∞↦1-norm problem

$$\max_{\boldsymbol{x}\in\{1,-1\}^{n_1}, \boldsymbol{y}\in\{1,-1\}^{n_2}}\boldsymbol{x}^{\mathrm {T}} \boldsymbol{M}\boldsymbol{y}, $$

Alon and Naor [2] derived a 0.56-approximation bound. We remark that all these approximation bounds remain hitherto the best available ones. When the degree of the polynomial function is greater than 2, to the best of our knowledge, the only known approximation result in the literature is due to Khot and Naor [24], where they showed how to estimate the optimal value of the problem \(\max_{\boldsymbol{x}\in\{1,-1\} ^{n}}\sum_{1\leqslant i,j,k\leqslant n}a_{ijk}x_{i}x_{j}x_{k}\) with \((a_{ijk})_{n\times n\times n}\) being square-free (\(a_{ijk}=0\) whenever two of the indices are equal). Specifically, they presented a polynomial-time procedure to get an estimated value that is no less than \(\varOmega (\sqrt{\frac {\ln n}{n}} )\) times the optimal value. No feasible solution, however, can be derived from that procedure. Moreover, the procedure is highly complex and is mainly of theoretical interest.

In this paper we consider the optimization models for a general polynomial function of any fixed degree d in (mixed) binary variables, and present polynomial-time randomized approximation algorithms. The algorithms proposed are fairly simple to implement. This study is motivated by our previous investigations on polynomial optimization under quadratic constraints [19, 20], as well as recent developments on homogeneous polynomial optimization under spherical constraints, e.g., So [31] and Chen et al. [10]. However, the discrete models studied in this paper have novel features, and the analysis is therefore entirely different from previous works. This paper is organized as follows. First, we introduce the notations and models in Sect. 2. In Sect. 3, we present the new approximation results, and also sketch the main ideas, while leaving the technical details to the Appendix. In Sect. 4 we shall discuss a few more specific problems where the models introduced can be directly applied.

2 Notations and Model Descriptions

In this paper we shall use boldface letters to denote vectors, matrices, and tensors in general (e.g., the decision variable x, the data matrix Q, and the tensor form F), while the usual lowercase letters are reserved for scalars (e.g., \(x_1\), the first component of the vector x).

2.1 Objective Functions

The objective functions of the optimization models studied in this paper are all multivariate polynomial functions. The following multilinear tensor function plays a major role in our discussion:

$$\mathrm{Function}\,\ T\quad F\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots, \boldsymbol{x}^d\bigr)=\sum_{1\leqslant i_1\leqslant n_1, 1\leqslant i_2\leqslant n_2, \cdots, 1\leqslant i_d\leqslant n_d} a_{i_1i_2\cdots i_d} x_{i_1}^1 x_{i_2}^2 \cdots x_{i_d}^d, $$

where \(\boldsymbol{x}^{k} \in\mathbb{R}^{n_{k}}\) for k=1,2,⋯,d; the letter ‘T’ signifies the notion of tensor. In the shorthand notation we shall denote by \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in \mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}\) a d-th order tensor, and by F its corresponding multilinear form. Closely related to the tensor F is a general d-th degree homogeneous polynomial function f(x), where x∈ℝ^n. We call the tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\) super-symmetric (see [25]) if \(a_{i_{1}i_{2}\cdots i_{d}}\) is invariant under all permutations of {i 1,i 2,⋯,i d }. As any homogeneous quadratic function uniquely determines a symmetric matrix, a d-th degree homogeneous polynomial function f(x) also uniquely determines a super-symmetric tensor. In particular, if we denote a d-th degree homogeneous polynomial function:

$$\mathrm{Function}\,\ H\quad f(\boldsymbol{x})=\sum_{1\leqslant i_1\leqslant i_2\leqslant \cdots \leqslant i_d\leqslant n} a_{i_1i_2\cdots i_d}x_{i_1}x_{i_2}\cdots x_{i_d}, $$

then its corresponding super-symmetric tensor form can be written as \(\boldsymbol {F}= (b_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n^{d}}\), with \(b_{i_{1}i_{2}\cdots i_{d}} \equiv a_{i_{1}i_{2}\cdots i_{d}} / |\varPi(i_{1},i_{2},\cdots ,i_{d})|\), where |Π(i 1,i 2,⋯,i d )| is the number of distinct permutations of the indices {i 1,i 2,⋯,i d }. This super-symmetric tensor representation is indeed unique. Letting F be the multilinear function defined by the super-symmetric tensor F, we have \(f(\boldsymbol{x}) = F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\). The letter ‘H’ here is used to emphasize that the polynomial function in question is homogeneous.
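The correspondence between a homogeneous polynomial and its super-symmetric tensor can be illustrated by the following sketch (d=3, with hypothetical coefficients): each coefficient is spread evenly over the distinct permutations of its index tuple, and f(x)=F(x,x,x) is then verified numerically.

```python
import numpy as np
from itertools import permutations

n, d = 3, 3
coeffs = {(0, 0, 1): 2.0, (0, 1, 2): -1.0, (2, 2, 2): 0.5}   # a_{i1 i2 i3}, i1<=i2<=i3

F = np.zeros((n,) * d)
for idx, a in coeffs.items():
    perms = set(permutations(idx))       # distinct permutations Pi(i1, i2, i3)
    for p in perms:
        F[p] = a / len(perms)            # b_{i1 i2 i3} = a_{i1 i2 i3} / |Pi(...)|

x = np.random.default_rng(1).standard_normal(n)
f_x = sum(a * np.prod([x[i] for i in idx]) for idx, a in coeffs.items())
F_x = np.einsum('ijk,i,j,k->', F, x, x, x)   # multilinear form at (x, x, x)
print(np.isclose(f_x, F_x))                  # True
```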

We shall also consider in this paper the following mixed polynomial form:

$$\mathrm{Function}\,\ M\quad f\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^s\bigr)= F\bigl(\underbrace{\boldsymbol{x}^1,\boldsymbol{x}^1,\cdots,\boldsymbol{x}^1}_{d_1}, \underbrace{\boldsymbol{x}^2,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^2}_{d_2}, \cdots, \underbrace{\boldsymbol{x}^s,\boldsymbol{x}^s,\cdots,\boldsymbol{x}^s}_{d_s}\bigr), $$

where \(\boldsymbol{x}^{k}\in\mathbb{R}^{n_{k}}\) for k=1,2,⋯,s, d 1+d 2+⋯+d s =d, and d-th order tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}^{d_{1}}\times n_{2}^{d_{2}}\times\cdots\times n_{s}^{d_{s}}}\); the letter ‘M’ signifies the notion of mixed polynomial forms. We may without loss of generality assume that F has partial symmetric property, namely for any fixed (x 2,x 3,⋯,x s), \(F(\underbrace{\cdot,\cdot,\cdots,\cdot}_{d_{1}}, \underbrace{\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is a super-symmetric d 1-th order tensor, and so on.

Beyond the homogeneous polynomial functions described above, a generic multivariate inhomogeneous polynomial function of degree d, p(x), can be explicitly written as a summation of homogeneous polynomial functions in decreasing degrees, namely

$$\textrm{Function}\ P\quad p(\boldsymbol{x}) := \sum_{k=1}^d F_k(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_k)+f_0= \sum_{k=1}^d f_k(\boldsymbol{x})+f_0, $$

where x∈ℝn, f 0∈ℝ, and \(f_{k}(\boldsymbol{x})=F_{k}(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{k})\) is a homogeneous polynomial function of degree k for k=1,2,⋯,d; the letter ‘P’ signifies the notion of polynomial.

Throughout we shall adhere to the notation F for a multilinear form defined by a tensor form F, f for a homogeneous polynomial function, and p for an inhomogeneous polynomial function. Without loss of generality we assume that \(n_1\leqslant n_2\leqslant\cdots\leqslant n_d\) in the tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}\), and \(n_1\leqslant n_2\leqslant\cdots\leqslant n_s\) in the tensor form \(\boldsymbol {F}\in\mathbb{R}^{n_{1}^{d_{1}}\times n_{2}^{d_{2}}\times \cdots\times n_{s}^{d_{s}}}\). To avoid triviality, we also assume that at least one component of the tensor form (F in Functions T, H, and M, and \(\boldsymbol{F}_d\) in Function P) is nonzero. Finally, without loss of generality we assume that the inhomogeneous polynomial function p(x) has no constant term, i.e., f 0=0 in Function P.

2.2 Decision Variables

This paper is focused on integer and mixed integer programming with polynomial functions. In particular, two types of decision variables are considered in this paper: discrete binary variables

$$\boldsymbol{x}\in\mathbb{B}^n:= \bigl\{\boldsymbol{z}\in\mathbb{R}^n \,\big|\, {z_i}^2=1,i=1,2,\cdots,n \bigr\}, $$

and continuous variables on the unit sphere:

$$\boldsymbol{y}\in\mathbb{S}^m:= \bigl\{\boldsymbol{z}\in\mathbb{R}^m\,\big|\, \|\boldsymbol{z}\| := \bigl({z_1}^2+{z_2}^2 +\cdots+{z_m}^2 \bigr)^{1/2}=1 \bigr\} . $$

Note that in this paper we shall by default use the Euclidean norm for vectors, matrices, and tensors. The decision variables in our models range from the pure binary vector x to a mixed pair consisting of both x \((\in\mathbb{B}^{n})\) and y \((\in\mathbb{S}^{m})\).

2.3 Model Descriptions

In this paper we consider the following binary integer optimization models with objective functions as specified in Sect. 2.1:

$$\begin{array}{l@{\quad}l@{\quad}l} (T) & \max & F\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d\bigr) \\[4pt] & \mbox{s.t.} & \boldsymbol{x}^k \in\mathbb{B}^{n_k}, \quad k=1,2,\cdots,d; \\[4pt] (H) & \max & f(\boldsymbol{x})=F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_d) \\ & \mbox{s.t.} & \boldsymbol{x}\in\mathbb{B}^n; \\ [5pt] (M) & \max & f\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^s\bigr)= F\bigl(\underbrace{\boldsymbol{x}^1,\boldsymbol{x}^1,\cdots,\boldsymbol{x}^1}_{d_1}, \underbrace{\boldsymbol{x}^2,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^2}_{d_2}, \cdots, \\ && \hspace{3.05cm}\underbrace{\boldsymbol{x}^s,\boldsymbol{x}^s,\cdots,\boldsymbol{x}^s}_{d_s}\bigr) \\[17pt] & \mbox{s.t.} & \boldsymbol{x}^k \in\mathbb{B}^{n_k}, \quad k=1,2,\cdots,s; \\ [3pt] (P) & \max & p(\boldsymbol{x}) = \displaystyle\sum_{k=1}^d F_k(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots ,\boldsymbol{x}}_k)+f_0 \\ & \mbox{s.t.} & \boldsymbol{x}\in\mathbb{B}^n; \end{array} $$

and their mixed models:

$$\begin{array}{l@{\quad}l@{\quad}l} (T)' & \max & F\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d,\boldsymbol{y}^1,\boldsymbol{y}^2,\cdots ,\boldsymbol{y}^{d'}\bigr)\\[4pt] & \mbox{s.t.} & \boldsymbol{x}^k \in\mathbb{B}^{n_k}, \quad k=1,2,\cdots,d, \\[4pt] & & \boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}, \quad {\ell}=1,2,\cdots,d'; \\ [5pt] (H)' & \max & f(\boldsymbol{x}, \boldsymbol{y})=F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_d,\underbrace{\boldsymbol{y},\boldsymbol{y},\cdots,\boldsymbol{y}}_{d'}) \\ & \mbox{s.t.} & \boldsymbol{x}\in\mathbb{B}^n, \\ & & \boldsymbol{y}\in\mathbb{S}^m; \\[5pt] (M)' & \max & f\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^s,\boldsymbol{y}^1,\boldsymbol{y}^2,\cdots ,\boldsymbol{y}^t\bigr) \\[4pt] & & \quad {}=F\bigl(\underbrace{\boldsymbol{x}^1,\boldsymbol{x}^1,\cdots,\boldsymbol{x}^1}_{d_1}, \cdots, \underbrace{\boldsymbol{x}^s,\boldsymbol{x}^s,\cdots,\boldsymbol{x}^s}_{d_s}, \underbrace{\boldsymbol{y}^1,\boldsymbol{y}^1,\cdots,\boldsymbol{y}^1}_{d_1'}, \cdots, \\[17pt] && \hspace{0.8cm}\underbrace{\boldsymbol{y}^t,\boldsymbol{y}^t,\cdots,\boldsymbol{y}^t}_{d_t'}\bigr) \\[17pt] & \mbox{s.t.} & \boldsymbol{x}^k \in\mathbb{B}^{n_k}, \quad k=1,2,\cdots,s, \\[4pt] & & \boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}, \quad {\ell}=1,2,\cdots,t. \end{array} $$

Let \(d_1+d_2+\cdots+d_s=d\) and \(d_{1}'+d_{2}'+\cdots+d_{t}'=d'\) in the models above. The degrees of the polynomial functions in these models, d for the pure binary models and d+d′ for the mixed models, are understood as fixed constants in our subsequent discussions. As before, we also assume that the tensor forms of the objective functions in (H)′ and (M)′ have the partial symmetric property, that \(m_1\leqslant m_2\leqslant\cdots\leqslant m_{d'}\) in (T)′, and that \(m_1\leqslant m_2\leqslant\cdots\leqslant m_t\) in (M)′.

2.4 Approximation Ratios

All the optimization problems mentioned in the previous subsection are in general NP-hard when the degree of the objective polynomial function is larger than or equal to 2. This is because each of them includes computing the matrix ∞↦1-norm as a special case, i.e.,

$$\begin{array}{rcl@{\ }l} \|\boldsymbol{Q}\|_{\infty\mapsto1} &=&\max & \bigl(\boldsymbol{x}^1\bigr)^{\mathrm {T}} \boldsymbol{Q} \boldsymbol{x}^2 \\[3pt] &&\mbox{s.t.} & \boldsymbol{x}^1\in\mathbb{B}^{n_1}, \\[3pt] && & \boldsymbol{x}^2\in\mathbb{B}^{n_2}. \end{array} $$

Thus, in this paper we shall focus on polynomial-time approximation algorithms with provable worst-case performance ratios. For any maximization problem (P) defined as max xS f(x), we use v max(P) to denote its optimal value, and v min(P) to denote the optimal value of its minimization counterpart, i.e.,

$$v_{\max}(P) := \max_{\boldsymbol{x}\in S}f(\boldsymbol{x}) \quad \mbox{and}\quad v_{\min}(P) := \min_{\boldsymbol{x}\in S}f(\boldsymbol{x}). $$

Definition 2.1

We say the maximization model (P) admits a polynomial-time approximation algorithm with approximation ratio τ∈(0,1] if v max(P)⩾0 and a feasible solution z∈S can be found in polynomial time such that f(z)⩾τv max(P).

Definition 2.2

We say the maximization model (P) admits a polynomial-time approximation algorithm with relative approximation ratio τ∈(0,1] if a feasible solution z∈S can be found in polynomial time such that f(z)−v min(P)⩾τ(v max(P)−v min(P)).

Regarding the relative approximation ratio (Definition 2.2), in some cases it is convenient to use the equivalent form: v max(P)−f(z)⩽(1−τ)(v max(P)−v min(P)).

3 Bounds on the Approximation Ratios

In this section we shall present our main results, viz. the approximation ratios for the discrete polynomial optimization models considered in this paper. In order not to distract from the main results, the proofs are postponed to the Appendix. To simplify, we write g(n)=Ω(f(n)) to signify that there are positive universal constants α and n 0 such that g(n)⩾αf(n) for all n⩾n 0. Throughout our discussion, we shall fix the degree of the objective polynomial function (denoted by d or d+d′ in the paper) to be a constant.

3.1 Homogeneous Polynomials in Binary Variables

Theorem 3.1

\((T){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}} F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d})\) admits a polynomial-time approximation algorithm with approximation ratio τ T , where

$$\tau_T:=(n_1n_2\cdots n_{d-2})^{-\frac{1}{2}}(2/ \pi)^{d-1} \ln(1+\sqrt{2}) =\varOmega \bigl((n_1n_2 \cdots n_{d-2})^{-\frac{1}{2}} \bigr). $$

We remark that when d=2, (T) is to compute \(\|\boldsymbol{F}\|_{\infty\mapsto1}\). The current best polynomial-time approximation ratio for that problem is \(\frac{2\ln(1+\sqrt{2})}{\pi} \approx0.56\), due to Alon and Naor [2]. Huang and Zhang [22] considered similar problems with complex decision variables and derived constant approximation ratios.

When d=3, (T) is a slight generalization of the model considered by Khot and Naor [24], where F was assumed to be super-symmetric (implying n 1=n 2=n 3) and square-free (i.e., \(a_{ijk}=0\) whenever two of the three indices are equal). In our case, we discard the symmetry and square-free assumptions altogether. The approximation bound of the optimal value given in [24] is \(\varOmega (\sqrt{\frac{\ln n_{1}}{n_{1}}} )\); however, no polynomial-time procedure is provided there to find a corresponding approximate solution.

Our approximation algorithm works for general degree d by recursion, and is fairly simple. We may take any approximation algorithm for the d=2 case, say the algorithm by Alon and Naor [2], as a basis. When d=3, noticing that any \(n_1\times n_2\times n_3\) third order tensor can be written as an \((n_1 n_2)\times n_3\) matrix by combining its first and second modes, (T) can be relaxed to

$$\everymath{\displaystyle}\begin{array}{l@{\quad}l} \max & F\bigl(\boldsymbol{X},\boldsymbol{x}^3\bigr):=\sum_{1\leqslant i\leqslant n_1,1\leqslant j\leqslant n_2,1\leqslant k\leqslant n_3}a_{ijk}X_{ij}x^3_k \\[15pt] \mbox{s.t.} & \boldsymbol{X}\in\mathbb{B}^{n_1n_2},\quad \boldsymbol{x}^3\in\mathbb{B}^{n_3}. \end{array} $$

This problem is exactly of the form (T) with d=2, and can be solved approximately with approximation ratio \(\frac{2\ln(1+\sqrt{2})}{\pi}\). Denote its approximate solution by \((\hat{\boldsymbol{X}}, \hat{\boldsymbol{x}}^{3})\). The next key step is to recover \((\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2})\) from \(\hat{\boldsymbol{X}}\). For this purpose, we introduce the following decomposition routine, which plays a fundamental role in our algorithms.

If we let \(\boldsymbol{M}=F(\cdot,\cdot,\hat{\boldsymbol{x}}^{3})\) and apply DR 3.1, then the output \((\hat{\boldsymbol{x}}^{1}, \hat{\boldsymbol{x}}^{2})\) can be shown to satisfy

$$F\bigl(\hat{\boldsymbol{x}}^1,\hat{\boldsymbol{x}}^2,\hat{\boldsymbol{x}}^3\bigr)\geqslant {n_1}^{-\frac{1}{2}}(2/\pi)^{2}\ln(1+\sqrt{2})\, v_{\max}(T), $$

which is exactly the approximation ratio \(\tau_T\) for d=3. By a recursive procedure, the approximation algorithm readily extends to solve (T) with any fixed degree d.

DR 3.1 (Decomposition Routine)
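Since DR 3.1 itself is displayed only as a figure in the original, the following sketch illustrates the overall recursion step for d=3 with stand-ins: brute force replaces the Alon–Naor subroutine for the d=2 instance, and a simple sign rounding of the top singular-vector pair of \(\hat{\boldsymbol{X}}\) stands in for DR 3.1. It is meant to convey the structure of the recursion, not the actual routine.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n1, n2, n3 = 2, 2, 3
F = rng.standard_normal((n1, n2, n3))
M = F.reshape(n1 * n2, n3)                 # combine modes 1 and 2: (n1*n2) x n3

# d = 2 instance, solved exactly here (stand-in for the Alon-Naor algorithm)
Xflat, x3 = max(
    ((np.array(X), np.array(y)) for X in product((1, -1), repeat=n1 * n2)
     for y in product((1, -1), repeat=n3)),
    key=lambda t: t[0] @ M @ t[1])
X = Xflat.reshape(n1, n2)

# recover (x1, x2) from X (illustrative sign rounding, not the paper's DR 3.1)
u, _, vt = np.linalg.svd(X)
x1 = np.where(u[:, 0] >= 0, 1, -1)
x2 = np.where(vt[0] >= 0, 1, -1)
if np.einsum('ijk,i,j,k->', F, x1, x2, x3) < 0:
    x1 = -x1                               # a sign flip never hurts
print(np.einsum('ijk,i,j,k->', F, x1, x2, x3))   # feasible value for (T), d = 3
```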

Theorem 3.2

If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\) is square-free and d is odd, then \((H){:}\, \max_{\boldsymbol{x}\in\mathbb {B}^{n}} f(\boldsymbol{x})\) admits a polynomial-time approximation algorithm with approximation ratio τ H , where

$$\tau_H:=d!d^{-d}n^{-\frac{d-2}{2}}(2/\pi)^{d-1} \ln(1+\sqrt{2}) =\varOmega \bigl(n^{-\frac{d-2}{2}} \bigr). $$

Theorem 3.3

If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d})\) is square-free and d is even, then \((H){:}\, \max_{\boldsymbol{x}\in \mathbb{B}^{n}} f(\boldsymbol{x})\) admits a polynomial-time approximation algorithm with relative approximation ratio τ H .

The key linkage between the multilinear tensor function F(x 1,x 2,⋯,x d) and the homogeneous polynomial function f(x) is the following lemma. Essentially, it makes the tensor relaxation method applicable to (H).

Lemma 3.4

(He, Li, and Zhang [19])

Suppose x 1,x 2,⋯,x d∈ℝn, and ξ 1,ξ 2,⋯,ξ d are i.i.d. random variables, each taking values 1 and −1 with equal probability. For any super-symmetric d-th order tensor form F and function f(x)=F(x,x,⋯,x), it holds that

$$\textbf{\textsf{E}} \Biggl[\prod_{i=1}^d \xi_if \Biggl(\sum_{k=1}^d \xi_k \boldsymbol{x}^k \Biggr) \Biggr]=d!F\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d\bigr). $$
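The identity can be checked numerically; the sketch below (hypothetical random data, d=3) averages over all 2^d equally likely sign patterns and compares with d!·F(x¹,x²,x³):

```python
import numpy as np
from itertools import permutations, product
from math import factorial

rng = np.random.default_rng(3)
n, d = 4, 3
G = rng.standard_normal((n,) * d)
F = sum(np.transpose(G, p) for p in permutations(range(d))) / factorial(d)  # super-symmetric

xs = [rng.standard_normal(n) for _ in range(d)]
f = lambda y: np.einsum('ijk,i,j,k->', F, y, y, y)          # f(y) = F(y, y, y)

lhs = np.mean([np.prod(s) * f(sum(si * xi for si, xi in zip(s, xs)))
               for s in product((1, -1), repeat=d)])         # E[xi_1 xi_2 xi_3 f(...)]
rhs = factorial(d) * np.einsum('ijk,i,j,k->', F, *xs)        # d! * F(x1, x2, x3)
print(np.isclose(lhs, rhs))                                   # True
```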

We remark that the approximation ratios for (H) hold under the square-free condition. This is because, in that case, the objective is multilinear in the decision variables; hence one can replace any point in the box [−1,1]^n by one of its vertices in {−1,1}^n without decreasing the objective function value, due to the linearity in each coordinate. Besides, when d is odd, one may first relax (H) to \(\max_{\boldsymbol{x}\in[-1,1]^{n}}f(\boldsymbol{x})\), and then directly apply the approximation result for homogeneous polynomial maximization over the intersection of n co-centered ellipsoids (see [19]). Under the square-free condition, this procedure generates a feasible solution for (H) with approximation ratio \(\varOmega (n^{-\frac{d-2}{2}}\log^{-(d-1)}n )\), which is worse than τ H in Theorem 3.2. Therefore, Theorem 3.2 can be regarded as an improvement of that approximation ratio.

We move on to consider the mixed form of discrete polynomial optimization model (M). It is a generalization of (T) and (H), making the model applicable to a wider range of practical problems.

Theorem 3.5

If \(F(\underbrace{\boldsymbol{x}^{1},\boldsymbol{x}^{1},\cdots ,\boldsymbol{x}^{1}}_{d_{1}}, \underbrace{\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is square-free in each x k (k=1,2,⋯,s), and one of d k (k=1,2,⋯,s) is odd, then \((M){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}}f(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s})\) admits a polynomial-time approximation algorithm with approximation ratio τ M , where

$$\tau_M:= \left\{ \begin{array}{l@{\hspace{-0.4cm}}r} \bigl(\frac{2}{\pi} \bigr)^{d-1} \ln (1+\sqrt{2} ) \prod_{k=1}^sd_k!{d_k}^{-d_k} ({n_1}^{d_1}{n_2}^{d_2}\cdots {n_{s-2}}^{d_{s-2}}{n_{s-1}}^{d_{s-1}-1} )^{-\frac{1}{2}} &\\[3pt] & d_s=1, \\[3pt] \bigl(\frac{2}{\pi} \bigr)^{d-1} \ln (1+\sqrt{2} ) \prod _{k=1}^sd_k!{d_k}^{-d_k} ({n_1}^{d_1}{n_2}^{d_2}\cdots {n_{s-1}}^{d_{s-1}}{n_s}^{d_s-2} )^{-\frac{1}{2}} & \\[3pt] & d_s\geqslant 2. \end{array} \right . $$

Theorem 3.6

If \(F(\underbrace{\boldsymbol{x}^{1},\boldsymbol{x}^{1},\cdots,\boldsymbol{x}^{1}}_{d_{1}}, \underbrace {\boldsymbol{x}^{2},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{2}}_{d_{2}}, \cdots, \underbrace{\boldsymbol{x}^{s},\boldsymbol{x}^{s},\cdots,\boldsymbol{x}^{s}}_{d_{s}})\) is square-free in each x k (k=1,2,⋯,s), and all d k (k=1,2,⋯,s) are even, then \((M){:}\, \max_{\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}}f(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s})\) admits a polynomial-time approximation algorithm with relative approximation ratio τ M .

The main idea in the proof is tensor relaxation (to relax its objective function f(x 1,x 2,⋯,x s) to a multilinear tensor function), which leads to (T). After solving (T) approximately by Theorem 3.1, we are able to adjust the solutions one by one, using Lemma 3.4.

3.2 Homogeneous Polynomials in Mixed Variables

Proposition 3.7

When d=d′=1, \((T)'{:}\, \max_{\boldsymbol{x}^{1} \in\mathbb{B}^{n_{1}},\boldsymbol{y}^{1} \in\mathbb{S}^{m_{1}}} F(\boldsymbol{x}^{1},\boldsymbol{y}^{1})\) admits a polynomial-time approximation algorithm with approximation ratio \(\sqrt{2/\pi}\).

Proposition 3.7 serves as the basis for (T)′ with general d and d′. In this particular case, (T)′ can be equivalently transformed into \(\max_{\boldsymbol{x}\in\mathbb{B}^{n_{1}}}\boldsymbol{x}^{\mathrm {T}}\boldsymbol{Q}\boldsymbol{x}\) with Q⪰0. The latter problem admits a polynomial-time approximation algorithm (SDP relaxation and randomization) with approximation ratio 2/π by Nesterov [27].

Recursion is again the tool to handle the high degree case. For the recursion on d, with discrete variables x k, DR 3.1 is applied in each recursive step. For the recursion on d′, with continuous variables y k, two decomposition routines in He, Li, and Zhang [19] are readily available, namely the eigenvalue decomposition approach (DR 2 of [19]) and the randomized decomposition approach (DR 1 of [19]), either one of them serves the purpose here.

Theorem 3.8

\((T)'{:}\,\max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\,\boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}} F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d},\boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{d'})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{T}'\), where

From Theorem 3.8, by applying Lemma 3.4 as a linkage, together with the square-free property, we are led to the following two theorems regarding (H)′.

Theorem 3.9

If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d},\underbrace{\boldsymbol{y},\boldsymbol{y},\cdots,\boldsymbol{y}}_{d'})\) is square-free in x, and either d or d′ is odd, then \((H)'{:}\, \max_{\boldsymbol{x}\in\mathbb{B}^{n},\boldsymbol{y} \in\mathbb{S}^{m}}f(\boldsymbol{x},\boldsymbol{y})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{H}'\), where

$$\tau_H':=d!d^{-d}d'!d'^{-d'} (2/\pi)^{\frac{2d-1}{2}}n^{-\frac{d-1}{2}}m^{-\frac{d'-1}{2}} =\varOmega \bigl(n^{-\frac{d-1}{2}}m^{-\frac{d'-1}{2}} \bigr). $$

Theorem 3.10

If \(F(\underbrace{\boldsymbol{x},\boldsymbol{x},\cdots,\boldsymbol{x}}_{d},\underbrace{\boldsymbol{y},\boldsymbol{y},\cdots,\boldsymbol{y}}_{d'})\) is square-free in x, and both d and d′ are even, then \((H)'{:}\,\max_{\boldsymbol{x}\in\mathbb{B}^{n},\boldsymbol{y} \in\mathbb{S}^{m}}f(\boldsymbol{x},\boldsymbol{y})\) admits a polynomial-time approximation algorithm with relative approximation ratio \(\tau_{H}'\).

By relaxing (M)′ to the multilinear tensor function optimization (T)′ and solving it approximately using Theorem 3.8, we may further adjust its solution one by one using Lemma 3.4, leading to the following general result.

Theorem 3.11

If

$$F\bigl(\underbrace{\boldsymbol{x}^1,\boldsymbol{x}^1,\cdots,\boldsymbol{x}^1}_{d_1}, \cdots, \underbrace{\boldsymbol{x}^s,\boldsymbol{x}^s,\cdots,\boldsymbol{x}^s}_{d_s}, \underbrace{\boldsymbol{y}^1,\boldsymbol{y}^1,\cdots,\boldsymbol{y}^1}_{d_1'}, \cdots, \underbrace{\boldsymbol{y}^t,\boldsymbol{y}^t,\cdots,\boldsymbol{y}^t}_{d_t'}\bigr) $$

is square-free in each \(\boldsymbol{x}^{k}\) (k=1,2,⋯,s), and one of \(d_{k}\) (k=1,2,⋯,s) or one of \(d_{\ell}'\) (ℓ=1,2,⋯,t) is odd, then \((M)'{:}\, \max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\boldsymbol{y}^{\ell} \in \mathbb{S}^{m_{\ell}}} f(\boldsymbol{x}^{1}, \boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{s}, \boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{t})\) admits a polynomial-time approximation algorithm with approximation ratio \(\tau_{M}'\), where

Theorem 3.12

If

$$F\bigl(\underbrace{\boldsymbol{x}^1,\boldsymbol{x}^1,\cdots,\boldsymbol{x}^1}_{d_1}, \cdots, \underbrace{\boldsymbol{x}^s,\boldsymbol{x}^s,\cdots,\boldsymbol{x}^s}_{d_s}, \underbrace{\boldsymbol{y}^1,\boldsymbol{y}^1,\cdots,\boldsymbol{y}^1}_{d_1'}, \cdots, \underbrace{\boldsymbol{y}^t,\boldsymbol{y}^t,\cdots,\boldsymbol{y}^t}_{d_t'}\bigr) $$

is square-free in each \(\boldsymbol{x}^{k}\) (k=1,2,⋯,s), and all \(d_{k}\) (k=1,2,⋯,s) and all \(d_{\ell}'\) (ℓ=1,2,⋯,t) are even, then \((M)'{:}\,\max_{\boldsymbol{x}^{k}\in\mathbb {B}^{n_{k}},\boldsymbol{y}^{\ell} \in\mathbb{S}^{m_{\ell}}}f(\boldsymbol{x}^{1}, \boldsymbol{x}^{2},\cdots ,\boldsymbol{x}^{s}, \boldsymbol{y}^{1},\boldsymbol{y}^{2},\cdots,\boldsymbol{y}^{t})\) admits a polynomial-time approximation algorithm with relative approximation ratio \(\tau_{M}'\).

3.3 Inhomogeneous Polynomials in Binary Variables

Extending the approximation algorithms and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. Technically, it is also a way to get around the square-free property, which is required for all the homogeneous polynomial models discussed in the previous subsections. The analysis here, like that in our previous paper [20], deals directly with the homogenization.

It is quite natural to introduce a new variable, say x h , which is eventually set to 1, to yield a homogeneous form of Function P:

$$f(\bar{\boldsymbol{x}}):=\sum_{k=1}^{d} f_k(\boldsymbol{x})\,x_h^{d-k}+f_0\,x_h^{d} =F(\underbrace{\bar{\boldsymbol{x}},\bar{\boldsymbol{x}},\cdots,\bar{\boldsymbol{x}}}_{d}) \quad\mbox{with } \bar{\boldsymbol{x}}=\binom{\boldsymbol{x}}{x_h}, $$

so that \(p(\boldsymbol{x})=f(\bar{\boldsymbol{x}})\) whenever x h =1, where \(f(\bar{\boldsymbol{x}})\) is an (n+1)-dimensional homogeneous polynomial function of degree d in the variable \(\bar{\boldsymbol{x}}\), i.e., \(\boldsymbol {F}\in\mathbb{R}^{(n+1)^{d}}\) and \(\bar{\boldsymbol{x}}\in \mathbb{R}^{n+1}\). Optimization of this homogeneous form can be handled by our previous results, but in general we have no control over the value of x h in the solution, which must be 1 as required by feasibility. The following lemma ensures that the construction of a feasible solution is possible.

Lemma 3.13

(He, Li, and Zhang [20])

Suppose \(\bar{\boldsymbol{x}}^{k}=\binom{\boldsymbol{x}^{k}}{x_{h}^{k}}\in\mathbb{R}^{n+1}\) with \(|x_{h}^{k}|\leqslant 1\) for k=1,2,⋯,d. Let η 1,η 2,⋯,η d be independent random variables, each taking values 1 and −1 with \(\textbf{\textsf{E}} [\eta_{k}]=x_{h}^{k}\) for k=1,2,⋯,d, and let ξ 1,ξ 2,⋯,ξ d be i.i.d. random variables, each taking values 1 and −1 with equal probability (thus with mean 0). If the last component of the tensor F is 0, then we have

$$\textbf{\textsf{E}} \Biggl[\prod_{k=1}^d \eta_kF \biggl(\binom{\eta_1\boldsymbol{x}^1}{1}, \binom{\eta_2\boldsymbol{x}^2}{1}, \cdots, \binom{\eta_d\boldsymbol{x}^d}{1} \biggr) \Biggr] = F\bigl(\bar{\boldsymbol{x}}^1,\bar{\boldsymbol{x}}^2,\cdots,\bar{\boldsymbol{x}}^d\bigr), $$

and

$$\textbf{\textsf{E}} \biggl[F \biggl(\binom{\xi_1\boldsymbol{x}^1}{1},\binom{\xi_2\boldsymbol{x}^2}{1}, \cdots, \binom{\xi_d\boldsymbol{x}^d}{1} \biggr) \biggr] = 0. $$

Our last result is the following theorem.

Theorem 3.14

(P) admits a polynomial-time approximation algorithm with relative approximation ratio τ P , where

$$\tau_P:=\frac{\ln(1+\sqrt{2})}{2(1+e)\,\pi^{d-1}}(d+1)!\, d^{-2d}(n+1)^{-\frac{d-2}{2}} =\varOmega \bigl(n^{-\frac{d-2}{2}} \bigr). $$

We remark that (P) is indeed a very general discrete optimization model. For example, it can be used to model the following general polynomial optimization problem over discrete values:

$$\begin{array}{l@{\quad}l@{\quad}l} (D) & \max & p(\boldsymbol{x}) \\[3pt] & \mbox{s.t.} & x_i\in\bigl\{a_1^i,a_2^i,\cdots,a_{m_i}^i\bigr\},\quad i=1,2,\cdots,n. \end{array} $$

To see this, we observe that by adopting the Lagrange interpolation technique and letting

$$x_i=\sum_{j=1}^{m_i}a_j^i \prod_{1\leqslant k\leqslant m_i,\,k\neq j}\frac {u_i-k}{j-k},\quad i=1,2,\cdots,n, $$

the original decision variables are equivalently represented by the new variables u i , with

$$u_i=j \quad \Longrightarrow\quad x_i=a_j^i, \quad i=1,2,\cdots,n,\ j=1,2,\cdots,m_i, $$

where u i ∈{1,2,⋯,m i }; each u i can in turn be represented by ⌈log2 m i ⌉ independent binary variables. Combining these two steps of substitution, (D) is then reformulated as (P), with the degree of its objective polynomial function no larger than max1⩽i⩽n {d(m i −1)}, and the dimension of its decision variables being \(\sum_{i=1}^{n}\lceil\log_{2} m_{i}\rceil\).
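A small sketch of the substitution (with hypothetical data a = (5, 9, 20), so m i = 3) may help; the polynomial encoding of u by two binary variables below is one simple choice with a duplicated code, not necessarily the degree-optimal one:

```python
import sympy as sp
from functools import reduce
from operator import mul

a = [5, 9, 20]                       # hypothetical values of one variable, m = 3
m = len(a)
u = sp.Symbol('u')
# Lagrange interpolation on the nodes u = 1, 2, ..., m
x = sum(a[j - 1] * reduce(mul, [(u - k) / sp.Integer(j - k)
                                for k in range(1, m + 1) if k != j])
        for j in range(1, m + 1))
assert [x.subs(u, j) for j in range(1, m + 1)] == a   # u = j  ==>  x = a_j

# encode u in {1, 2, 3} by ceil(log2 3) = 2 binary variables (code (1,1) repeats 3)
b0, b1 = sp.symbols('b0 b1')
u_of_b = 1 + b0 + 2*b1 - b0*b1       # takes values 1, 2, 3, 3 on {0,1}^2
x_of_b = sp.expand(x.subs(u, u_of_b))
print(x_of_b.subs({b0: 0, b1: 1}))   # u = 3  ->  a_3 = 20
```

The 0–1 variables b can then be mapped to {1,−1} variables exactly as in Sect. 1.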

In many real-world applications, the data \(\{a_{1}^{i},a_{2}^{i},\cdots ,a_{m_{i}}^{i}\}\) (i=1,2,⋯,n) in (D) are arithmetic sequences. Then it is much easier to transform (D) to (P), without going through the Lagrange interpolation: the transformation keeps the degree of the objective polynomial function unchanged, and the dimension of the decision variables is \(\sum_{i=1}^{n}\lceil\log_{2} m_{i}\rceil\).

The proofs of all the theorems presented in this section are delegated to the Appendix.

4 Examples of Application

As we discussed in Sect. 1, the models studied in this paper have versatile applications. Given the generic nature of the discrete polynomial optimization models (viz. (T), (H), (M), (P), (T)′, (H)′ and (M)′), this point is perhaps self-evident. However, we believe it is helpful to present a few examples in more detail, to illustrate the potential modeling opportunities offered by the new optimization models. We present four problems in this section and show that they are readily formulated by the discrete polynomial optimization models studied in this paper.

4.1 The Tensor Cut-Norm Problem

The concept of cut-norm was initially defined for a real matrix \(\boldsymbol{A}=(a_{ij})\in\mathbb{R}^{n_{1}\times n_{2}}\): denoted by \(\|\boldsymbol{A}\|_C\), it is the maximum, over all I⊆{1,2,⋯,n 1} and J⊆{1,2,⋯,n 2}, of the quantity |∑ i∈I,j∈J a ij |. This concept plays a major role in the design of efficient approximation algorithms for dense graph and matrix problems (see e.g., [3, 14]). Alon and Naor [2] proposed a randomized polynomial-time approximation algorithm that approximates the cut-norm within a factor of at least 0.56, which is currently the best available approximation ratio. Since a matrix is a second order tensor, it is natural to extend the cut-norm to general higher order tensors; see, e.g., a recent paper by Kannan [23]. Specifically, given a d-th order tensor \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\), its cut-norm is defined by

$$\|\boldsymbol {F}\|_C:=\max_{I_k\subseteq\{1,2,\cdots,n_k\},\,k=1,2,\cdots,d} \biggl \vert \sum _{i_k\in I_k,\,k=1,2,\cdots,d} a_{i_1i_2\cdots i_d}\biggr \vert . $$

In fact, the cut-norm \(\|\boldsymbol{F}\|_C\) is closely related to \(\|\boldsymbol{F}\|_{\infty\mapsto1}\), which is exactly in the form of (T). By Theorem 3.1, there is a polynomial-time approximation algorithm which approximates \(\|\boldsymbol{F}\|_{\infty\mapsto1}\) within a factor of at least \(\varOmega ((n_{1}n_{2}\cdots n_{d-2})^{-\frac{1}{2}} )\). The following result asserts that the cut-norm of a general d-th order tensor can then also be approximated within a factor of \(\varOmega ((n_{1}n_{2}\cdots n_{d-2})^{-\frac{1}{2}} )\).

Proposition 4.1

For any d-th order tensor \(\boldsymbol {F}\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\), \(\|\boldsymbol{F}\|_{C}\leqslant \|\boldsymbol{F}\|_{\infty\mapsto1}\leqslant 2^{d}\,\|\boldsymbol{F}\|_{C}\).

Proof

Let \(\boldsymbol {F}=(a_{i_{1}i_{2}\cdots i_{d}})\in\mathbb{R}^{n_{1}\times n_{2}\times \cdots\times n_{d}}\). Recall that \(\|\boldsymbol {F}\|_{\infty\mapsto1}= \max_{\boldsymbol{x}^{k} \in\mathbb{B}^{n_{k}},\ k=1,2,\cdots,d}F(\boldsymbol{x}^{1},\boldsymbol{x}^{2},\cdots,\boldsymbol{x}^{d})\). Given any \(\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}\) for k=1,2,⋯,d, write \(\boldsymbol{x}^{k}=\boldsymbol{z}_{+}^{k}-\boldsymbol{z}_{-}^{k}\) with \(\boldsymbol{z}_{\pm}^{k}:=(\boldsymbol{e}\pm\boldsymbol{x}^{k})/2\in\{0,1\}^{n_{k}}\), where e is the all-one vector. By multilinearity it follows that

$$F\bigl(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d\bigr) =\sum_{S\subseteq\{1,2,\cdots,d\}}(-1)^{|S|}F\bigl(\boldsymbol{w}^1,\boldsymbol{w}^2,\cdots,\boldsymbol{w}^d\bigr), \qquad \boldsymbol{w}^k=\left\{\begin{array}{l@{\quad}l} \boldsymbol{z}_-^k, & k\in S,\\ \boldsymbol{z}_+^k, & k\notin S, \end{array}\right. $$

and since each \(|F(\boldsymbol{w}^1,\boldsymbol{w}^2,\cdots,\boldsymbol{w}^d)|\leqslant\|\boldsymbol{F}\|_C\) (the w's being 0–1 vectors), we get \(|F(\boldsymbol{x}^1,\boldsymbol{x}^2,\cdots,\boldsymbol{x}^d)|\leqslant 2^{d}\|\boldsymbol{F}\|_C\), which implies \(\|\boldsymbol {F}\|_{\infty\mapsto1}\leqslant 2^{d}\|\boldsymbol{F}\|_{C}\).

Observe that \(\|\boldsymbol {F}\|_{C}=\max_{\boldsymbol{z}^{k}\in\{0,1\}^{n_{k}},\,k=1,2,\cdots ,d}|F(\boldsymbol{z}^{1}, \boldsymbol{z}^{2}, \cdots, \boldsymbol{z}^{d})|\). Given any \(\boldsymbol{z}^{k}\in\{0,1\}^{n_{k}}\) for k=1,2,⋯,d, let \(\boldsymbol{z}^{k}=(\boldsymbol{e}+\boldsymbol{x}^{k})/2\), where e is the all-one vector. Clearly \(\boldsymbol{x}^{k}\in\mathbb{B}^{n_{k}}\) for k=1,2,⋯,d, and also \(\boldsymbol{e}\in\mathbb{B}^{n_{k}}\); thus by multilinearity

$$\bigl|F\bigl(\boldsymbol{z}^1,\boldsymbol{z}^2,\cdots,\boldsymbol{z}^d\bigr)\bigr| =2^{-d}\biggl \vert \sum_{\boldsymbol{w}^k\in\{\boldsymbol{e},\boldsymbol{x}^k\},\,k=1,2,\cdots,d} F\bigl(\boldsymbol{w}^1,\boldsymbol{w}^2,\cdots,\boldsymbol{w}^d\bigr)\biggr \vert \leqslant \|\boldsymbol {F}\|_{\infty\mapsto1}, $$

since each of the 2^d terms is at most \(\|\boldsymbol {F}\|_{\infty\mapsto1}\) in absolute value. This implies \(\|\boldsymbol {F}\|_{C}\leqslant \|\boldsymbol {F}\|_{\infty\mapsto1}\). □
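The sandwich inequality can be verified by brute force on small random tensors; a sketch:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
d, dims = 3, (2, 3, 2)
F = rng.standard_normal(dims)

def form(vecs):
    return np.einsum('ijk,i,j,k->', F, *[np.array(v) for v in vecs])

grids = lambda vals: product(*[product(vals, repeat=n) for n in dims])
cut  = max(abs(form(zs)) for zs in grids((0, 1)))      # ||F||_C
inf1 = max(form(xs) for xs in grids((1, -1)))          # ||F||_{inf -> 1}
print(cut <= inf1 <= 2**d * cut)                        # True
```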

4.2 The Vector-Valued Maximum Cut Problem

Consider an undirected graph G=(V,E), where V={v 1,v 2,⋯,v n } is the set of vertices and E⊆V×V is the set of edges. Each edge e∈E carries an associated weight, which in this case is a nonnegative vector \(\boldsymbol{w}_{e} \in\mathbb{R}_{+}^{m}\). The problem is to find a cut such that the total weight of the cut edges, itself a vector in this case, has maximum norm. More formally, this problem can be formulated as

$$\max_{C\ \mathrm{is\ a\ cut\ of}\ G} \biggl \Vert \sum_{e\in C} \boldsymbol{w}_e \biggr \Vert . $$

Note that the usual max-cut problem is a special case of the above model where each weight w e ⩾0 is a scalar. Similar to the scalar case (see [16]), we may reformulate the above problem in binary variables as

$$\everymath{\displaystyle}\begin{array}{l@{\quad}l} \max & \biggl \Vert \sum_{1\leqslant i,j\leqslant n} x_i x_j \boldsymbol{w}'_{ij} \biggr \Vert \\[15pt] \mbox{s.t.} & \boldsymbol{x} \in\mathbb{B}^{n} , \end{array} $$

where

$$ \boldsymbol{w}'_{ij} = \left \{ \begin{array}{l@{\quad}l} -\boldsymbol{w}_{ij} & i\not= j,\\[3pt] \sum_{1\leqslant k\leqslant n, k\neq i}\boldsymbol{w}_{ik} & i=j. \end{array} \right . $$
(1)

Since \(\|\boldsymbol{v}\|=\max_{\|\boldsymbol{y}\|=1}\boldsymbol{v}^{\mathrm{T}}\boldsymbol{y}\) by the Cauchy-Schwarz inequality, we may further formulate the above problem as

$$\everymath{\displaystyle} \begin{array} {l@{\quad}l} \max& \biggl( \sum_{1\leqslant i,j\leqslant n} x_i x_j \boldsymbol{w}'_{ij} \biggr)^{\mathrm {T}} \boldsymbol{y}=F(\boldsymbol{x},\boldsymbol{x},\boldsymbol{y}) \\[15pt] \mbox{s.t.} & \boldsymbol{x} \in\mathbb{B}^{n} ,\,\boldsymbol{y} \in \mathbb{S}^m. \end{array} $$

This is the exact form of (H)′ with d=2 and d′=1. Although the square-free property in x does not hold in this model (a condition of Theorem 3.9), one can still replace any point in the box [−1,1]^n by one of its vertices in {−1,1}^n without decreasing the objective function value, since the matrix \(F(\cdot,\cdot,\boldsymbol{e}_{k}) = ((w_{ij}')_{k} )_{n\times n}\) is diagonally dominant for k=1,2,⋯,m. Thus, the vector-valued max-cut problem admits an approximation ratio of \(\frac{1}{2} ( \frac{2}{\pi } )^{3/2} n^{-1/2}\) by Theorem 3.9.
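To make the reformulation concrete, the following sketch (hypothetical random nonnegative edge weights) builds the vector weights \(\boldsymbol{w}'_{ij}\) of (1) and evaluates the trilinear objective F(x,x,y); by the Cauchy-Schwarz argument above, the best y for a fixed cut x is the normalized cut-weight vector:

```python
import numpy as np

n, m = 4, 2
rng = np.random.default_rng(5)
W = rng.random((n, n, m))
W = (W + W.transpose(1, 0, 2)) / 2          # symmetric, nonnegative weights w_e
for i in range(n):
    W[i, i] = 0                             # no self-loops

Wp = -W.copy()                              # w'_{ij} = -w_{ij} for i != j
for i in range(n):
    Wp[i, i] = W[i].sum(axis=0)             # w'_{ii} = sum_{k != i} w_{ik}

x = np.array([1, 1, -1, -1])                # a cut of the vertex set
v = np.einsum('ijm,i,j->m', Wp, x, x)       # sum_{i,j} x_i x_j w'_{ij}, prop. to cut weight
y = v / np.linalg.norm(v)                   # optimal y on the sphere for this x
print(v @ y, np.linalg.norm(v))             # equal: F(x,x,y) = ||v||
```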

If the weights on the edges are positive semidefinite matrices (i.e., \(\boldsymbol{W}_{ij}\in\mathbb{R}^{m\times m}\), \(\boldsymbol{W}_{ij}\succeq 0\)), then the matrix-valued max-cut problem can also be formulated as

$$\everymath{\displaystyle}\begin{array} {l@{\quad}l} \max& \lambda_{\max} \biggl(\sum_{1\leqslant i,j\leqslant n} x_i x_j \boldsymbol{W}'_{ij} \biggr) \\[17pt] \mbox{s.t.} & \boldsymbol{x} \in\mathbb{B}^{n} , \end{array} $$

where \(\boldsymbol{W}_{ij}'\) is defined similarly to (1); or equivalently,

$$\everymath{\displaystyle}\begin{array} {l@{\quad}l} \max& \boldsymbol{y}^{\mathrm {T}} \biggl( \sum_{1\leqslant i,j\leqslant n} x_i x_j \boldsymbol{W}'_{ij} \biggr) \boldsymbol{y} \\[17pt] \mbox{s.t.} & \boldsymbol{x} \in\mathbb{B}^{n} ,\, \boldsymbol{y} \in \mathbb{S}^m. \end{array} $$

Similar to the vector-valued case, by the diagonal dominance property and Theorem 3.10, the above problem admits an approximation ratio of \(\frac{1}{4} ( \frac{2}{\pi} )^{3/2} (mn)^{-1/2}\). Notice that Theorem 3.10 only asserts a relative approximation ratio; however, for this problem the optimal value of its minimization counterpart is obviously nonnegative, and thus a relative approximation ratio implies a usual approximation ratio.

4.3 The Maximum Complete Satisfiability Problem

The usual maximum satisfiability problem (see e.g., [15]) is to assign truth values to the variables so as to maximize the total weight of the satisfied clauses. The key feature of the problem is that each clause is in disjunctive form: it is satisfied if at least one of its literals is assigned the value true. If the literals in each clause were instead conjunctive, then this form of satisfiability problem would be easy to solve. However, if not all the clauses can be satisfied and we alternatively look for an assignment that maximizes the weighted sum of the satisfied clauses, then the problem is quite different. To distinguish it from the usual Max-SAT problem, let us call the new problem the maximum complete satisfiability problem, abbreviated Max-C-SAT. It is immediately clear that Max-C-SAT is NP-hard, since the max-cut problem easily reduces to it: for each edge \((v_{i},v_{j})\) we introduce two clauses \(\{x_{i},\bar{x}_{j}\}\) and \(\{\bar{x}_{i},x_{j}\}\), both with weight w ij ; an optimal Max-C-SAT solution then yields an optimal max-cut solution.

Now consider an instance of the Max-C-SAT problem with m clauses, each clause containing no more than d literals. Suppose that clause k (k=1,2,⋯,m) has the following form

$$\{x_{k_1},x_{k_2},\cdots,x_{k_{s_k}}, \bar{x}_{\bar{k}_1},\bar{x}_{\bar{k}_2},\cdots,\bar{x}_{\bar{k}_{t_k}}\}, $$

where \(s_k+t_k\leqslant d\), and clause k is associated with a weight \(w_k\geqslant 0\) for k=1,2,⋯,m. Then, the Max-C-SAT problem can be formulated in the form of (P) as

$$\everymath{\displaystyle}\begin{array} {l@{\quad}l} \max& \sum _{k=1}^m w_k \prod _{i=1}^{s_k} \frac{1+x_{k_i}}{2} \cdot \prod _{j=1}^{t_k} \frac{1-x_{\bar{k}_j}}{2} \\[3pt] \mbox{s.t.} & \boldsymbol{x} \in\mathbb{B}^{n}. \end{array} $$

According to Theorem 3.14 and the nonnegativity of the objective function, the above problem admits a polynomial-time approximation algorithm with approximation ratio \(\varOmega (n^{-\frac{d-2}{2}} )\), which is independent of the number of clauses m.
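For illustration, the following sketch (a hypothetical three-clause instance) assembles the Max-C-SAT objective in the form of (P):

```python
import sympy as sp
from functools import reduce
from operator import mul

x = sp.symbols('x1:5')                        # x1, ..., x4 in {1, -1}
clauses = [                                    # (weight, plain literals, negated literals)
    (2.0, [0, 1], []),                         # {x1, x2}
    (1.5, [2], [3]),                           # {x3, xbar4}
    (1.0, [], [0, 2]),                         # {xbar1, xbar3}
]
# each clause contributes w * prod (1+x_i)/2 * prod (1-x_j)/2,
# which equals w exactly when all of its literals are satisfied
p = sum(w * reduce(mul, [(1 + x[i]) / 2 for i in pos] +
                        [(1 - x[j]) / 2 for j in neg])
        for w, pos, neg in clauses)
assignment = {x[0]: 1, x[1]: 1, x[2]: -1, x[3]: -1}
print(sp.expand(p), p.subs(assignment))        # polynomial in (P) form; value 2.0
```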

4.4 The Box Constrained Diophantine Equation

Solving a system of linear equations where the variables are integers confined to a box is an important problem in discrete optimization and linear algebra. Examples of applications include the classical Frobenius problem (see e.g., [6]) and a “market split problem” [11], as well as engineering applications in integrated circuit design and video signal processing. For more details, one is referred to Aardal et al. [1]. Essentially, the problem is to find an integer-valued x∈ℤⁿ with 0⩽x⩽u such that Ax=b. The problem can be formulated by the least squares method as

$$\begin{array}{l@{\quad}l@{\quad}l} (L)&\max& - (\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b})^{\mathrm {T}}(\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}) \\[3pt] &\mbox{s.t.} & \boldsymbol{x} \in\mathbb{Z}^{n},\quad \boldsymbol{0}\leqslant \boldsymbol{x} \leqslant \boldsymbol{u}. \end{array} $$

According to the discussion at the end of Sect. 3.3, the above problem can be reformulated in the form of (P), whose objective function is a quadratic polynomial and whose number of decision variables is \(\sum_{i=1}^{n}\lceil\log_{2} (u_{i}+1)\rceil\). By applying Theorem 3.14, (L) admits a polynomial-time approximation algorithm with a constant relative approximation ratio.
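A brute-force sketch of this reformulation on hypothetical data follows; the clipping of codes above u_i is a shortcut for this sketch only, standing in for the exact range handling discussed at the end of Sect. 3.3:

```python
import numpy as np
from itertools import product
from math import ceil, log2

A = np.array([[3, 5], [2, 7]]); b = np.array([13, 16]); u = [3, 2]
bits = [ceil(log2(ui + 1)) for ui in u]       # binary digits per variable

best = None
for code in product((0, 1), repeat=sum(bits)):
    x, pos = [], 0
    for nb, ui in zip(bits, u):
        val = sum(2**t * code[pos + t] for t in range(nb))
        x.append(min(val, ui))                # keep x_i inside the box [0, u_i]
        pos += nb
    r = A @ np.array(x) - b
    obj = int(-(r @ r))                       # objective of (L)
    if best is None or obj > best[0]:
        best = (obj, x)
print(best)                                   # (0, [1, 2]): Ax = b is solvable here
```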

In general, Diophantine equations are polynomial equations. Box constrained polynomial equations can also be formulated by the least squares method, as in (L). Suppose the highest degree of the polynomial equations is d. Then this least squares problem can be reformulated in the form of (P), with the degree of the objective polynomial being 2d and the number of decision variables being \(\sum_{i=1}^{n}\lceil\log_{2} (u_{i}+1)\rceil\). By applying Theorem 3.14, this problem admits a polynomial-time approximation algorithm with relative approximation ratio \(\varOmega ( (\sum_{i=1}^{n}\log u_{i} )^{-(d-1)} )\).

We have extensively tested the numerical performance of the algorithms proposed in this paper on simulated data. In general, the results show that the algorithms are not only efficient in the theoretical sense proved in this paper, but also effective in practice. The numerical results and their discussion under various circumstances are, however, too lengthy to be included in the current paper; instead, we refer the interested reader to the recent Ph.D. thesis of one of the authors, Li [26].