Efficient encoding of the weighted MAX k-CUT on a quantum computer using QAOA

The weighted MAX k-CUT problem aims at finding a k-partition of a given weighted undirected graph G(V,E) so as to maximize the sum of the weights of the crossing edges. The problem is of particular interest as it has a multitude of practical applications. We present a formulation of the weighted MAX k-CUT problem suitable for running the quantum approximate optimization algorithm (QAOA) on noisy intermediate scale quantum (NISQ)-devices to get approximate solutions. The new formulation uses a binary encoding that requires only |V|log_2(k) qubits. This is an exponential improvement with respect to k on the number of qubits when compared to previously known encodings. We provide a decomposition of the phase separation operator into basis gates and we show that the circuit depth is very shallow if k is a power of two. The resulting circuits are implemented using Qiskit, which is used to benchmark our algorithm on a suite of test cases. The simulations on random graphs with up to 10 vertices show that we are able to achieve competitive approximation ratios for all cases. We provide an analysis of the circuit depth, which shows that the circuit depth is very shallow if k is a power of two. For small k and with further improvements when k is not a power of two, our algorithm is a possible candidate to show quantum advantage on NISQ devices.


Introduction
The search for quantum algorithms of practical interest has intensified since the announcement of quantum supremacy in [3]. For the foreseeable future, quantum hardware will limit the depth (length of the computation) and width (number of qubits) of the algorithms that can be run. Hybrid quantum-classical algorithms based on the variational principle are a promising approach to achieve an advantage over purely classical algorithms. The variational quantum eigensolver (VQE) [18]/quantum approximate optimization algorithm (QAOA) [8] is such a hybrid algorithm for approximately finding the solution of a problem encoded as the ground state of a Hamiltonian. In this early stage, even small reductions of the depth and/or width of an algorithm can make the difference between success and failure. In light of this we investigate in this article how QAOA can be used to approximately solve the MAX k-CUT problem. The problem has interesting applications that make it practically relevant. These range from placement of television commercials in program breaks, placement of containers on a ship with k bays, partition a set of items (e.g. books sold by an online shop) into k subsets, design of product modules, frequency assignment problems, scheduling problems and pattern matching [1,10]. The QAOA consists of the following main steps.
Optimization Problem (S1) Hamiltonian Quantum Device: (S2) Prepare trial state (S3) Measure Classical Device: (S4) Update control parameters solution optimize θ (S1) The solution of a problem is formulated as the ground state of a Hamiltonian H P that encodes a cost function f to be optimized. It acts diagonally on the computational states, i.e., H P |z = f (z) |z .
(S4) The cost function E(θ) ≥ E min serves a classical computer that finds the ground state energy of the cost function, i.e., finds the optimal parameter θ such that E q (θ) becomes minimal. This iterative process provides a(n approximate) solution to the problem.
The main contributions of this article are • the design of a novel encoding of the MAX k-CUT problem, • an efficient decomposition of the unitary operators into basis gates suitable for, e.g., IBM's quantum devices, • numerical simulations on a suite of test cases providing evidence of the success of the method.
The main advantages as compared to the state of the art is that our approach is efficient in the number of qubits and does not require constraints that need to be incorporated into the model.
After describing related work in Section 2, we describe our approach in Section 3. The implementation and results are presented in Section 4 followed by a conclusion in Section 6.

Related Work
In general, coloring problems are often best phrased as Potts models, see [21]. The MAX k-colorable subgraph problem is connected to the search for a ground state in the anti-ferromagnetic k-state Potts model [20]. For an overview of Ising formulations of NP problems, we refer to [16], which includes a discussion of graph coloring, but not of MAX k-CUT.
The QAOA was introduced by [8]. A general overview of hybrid quantum classical algorithms (VQE/QAOA) is provided in, e.g., [17]. The article discusses obstacles and how to overcome them in order to achieve quantum advantage on noisy intermediate scale quantum devices. An extrapolation of the QAOA compared with the classical AKMAXSAT solver on small problem instances of MAX (2-)CUT the authors in [11] estimate that a quantum speedup can be obtained with (several) hundreds of qubits. It has also been shown that the QAOA can achieve solutions of better quality [6] then the best known classical approximation algorithm. The authors in [22] introduce heuristic strategies inspired by quantum annealing to generate good initial points for the outer optimization loop. They show that this leads to large improvements in the approximation ratio achieved.
Since its inception, there have been several extensions/variants of the QAOA proposed. A recent approach presented in [23] is to create an iterative version that is problem-tailored and can adapt to specific hardware constraints. The method is exemplified on a class of MAX (2-)CUT problems, requiring fewer CNOT gates as the original method. A non-local version of QAOA that is shown to significantly outperform standard QAOA for frustrated Ising models on random 3-regular graphs for the MAX (2-)CUT problem is proposed in [4]. The quantum alternating operator ansatz (also abbreviated as QAOA) presented in [12], considers general parametrized families of unitaries. As an example, the authors present the mapping of the max k-colorable subgraph problem to QAOA, which is equivalent to the unweighted MAX k-CUT. Numerical results for XY -mixers to enforce the hard constraints induced by the one-hot encoding of the MAX k-CUT problem are studied in [19]. Analytical and numerical evidence suggests a significant improvement over the general X-mixer presented in [12].

Classical Formulation
The MAX k-CUT problem is an extension of the well known MAX (2-)CUT problem (or simply MAX CUT). Given a weigthed undirected graph G = (V, E), MAX k-CUT consists of finding a maximum-weight k-cut, that is a partition of the vertices into k subsets such that the sum of the weights of the edges that have end points on different subsets is maximized. A bit more formally, let w ij be the weight assigned to each edge (i, j) ∈ E, and let P = P 1 , . . . , P k be a partition of the vertices in V . Then the cost function for MAX k-CUT can be defined as: Alternatively, one could assign a label x i ∈ {1, . . . , k} to each vertex i ∈ V , indicating to which partition the vertex belongs to. Then the cost function for MAX k-CUT can be written as: where [·] is the Iverson bracket, which is 1 if x i = x j , and 0 otherwise. An example of an optimal solution for MAX 3-CUT is given in Figure 1.
MAX k-CUT is in general a difficult problem. Not only is it NP-complete, but it also does not admit any polynomialtime approximation scheme, for any k ≥ 2, unless P=NP [9].
On a classical computer one might consider either exact methods [7], approximation algorithms [9], or heuristics [5]. Exact methods guarantee to find an optimal solution, but they need to deal with the complexity of the problem, which can quickly become computationally intractable with growing |V | and k. On the other hand, approximation algorithms and heuristics are quicker but cannot guarantee the optimality of the solution found. A trivial algorithm that randomly assigns vertices to partitions has an approximation ratio of (1 − 1/k) (because each edge has a probability of (1 − 1/k) of having endpoints in different partitions) [9]. On the other side of the spectrum, there can be no polynomial time approximation algorithm with performance ratio (1 − 1 34k ), unless P=NP [14]. Therefore, all current best polynomial time algorithms are somewhere in between.
The approach that we present in this paper belongs to the category of heuristic algorithms, and we therefore expect to achieve an approximation ratio that is at least (1 − 1/k).

Quantum Encoding and Problem Hamiltonian H P
As a first step, we need to encode the problem described in Section 3.1 in a way that is suitable for the QAOA. There are three different possibilities (of which the first two are presented in [12], and the last is proposed in this article): • Qudit encoding: Expressing the solutions as strings of k-dits (as in Equation (2)) is a natural extension of the MAX (2-)CUT problem to k > 2. The problem can be formulated using |V | qudits. In order to be practically relevant it requires, however, the realization of a k-level quantum system.
• One-hot encoding: A second method is to use k bits for each vertex, where the single bit that is 1 encodes which set/color the vertex belongs to. Using this encoding requires k|V | qubits. A big disadvantage is that the formulation requires the introduction of constraints in order to prevent solutions where a vertex belongs to several sets of a partition or none.
• Binary encoding: For a given k we encode the information of a vertex belonging to one of the sets by |i L , which requires L = log 2 (k) qubits. Here · means rounding up to the nearest integer. This formulation can be executed on systems using qubits and requires L|V | qubits.
Binary encoding uses exponentially fewer qubits as compared to one-hot encoding. As an example, for k = 4 encoding the information of a vertex belonging to one of the four sets using one-hot encoding is done through , whereas for the binary encoding we have Observe that for one-hot encoding there are 2 4 − 4 = 12 basis states of the 4 qubits that encode multiple colors or no color, whereas all 4 basis states used in the binary encoding are valid encodings.
An encoding is not complete without defining the problem Hamiltonian. For qudit and one-hot encoding these can be found in [12]. The following describes the problem Hamiltonian for the proposed binary encoding, which is given as the sum of local terms, i.e., where w i,j is the weight of the edge between vertices i and j. The matrix H i,j is a diagonal matrix modeling the interaction between vertices i and j In the following, we consider the two diagonal matrices H P and A = aI + bH P to be equivalent for all a, b ∈ R, b = 0. The reason for this is that when we compare the unitary operators e −iθA and e −iθB , a parameter a = 0 results in applying a "global phase" which is irrelevant, and b = 0 can be combined with the parameter θ. The cost function can be easily evaluated classically, independent of the specific form of H P .
The entries of the local Hamiltonian H i,j are given by When k is not a power of two the condition ¬(l 0 ≥ k − 1 ∧ l 1 ≥ k − 1) is introduced such that the sets with number k − 1, . . . , 2 L − 1 are not distinguished and become the same set. Organizing the diagonal entries d m in a matrix of size 2 L × 2 L we get a particularly simple structure where l = 2 L − (k − 1), I is the identity matrix, J is a matrix of ones, and Γ c,d is a matrix that has a one at entry c, d and is zero otherwise. Sub-indices indicate the size of the matrix. Observe, that D = D T and that the sum involving terms Γ is zero if k is a power of two. We can construct the matrix H i,j from D through where vec() is a linear transformation which converts a matrix into a column vector by stacking the columns on top of each other, and diag(v) is a matrix with the entries of the vector v along its diagonal.
Next, we will provide a few examples.

MAX (2-)CUT.
For k = 2 we can use L = log 2 (2) = 1 qubit per vertex, where · means rounding up to the nearest integer. The matrix D and the local Hamiltonian are given by

Unitary Evolution
For the binary encoding, there are no constraints on the binary strings to be a valid solution. Therefore, there is no need to design special mixers, and the mixing Hamiltonian is given by This leads to the unitary operator Each term in the above product can be implemented with an R x -gate.
The unitary operator for phase separation is defined by where the last equality holds because the terms H i,j trivially commute, as they are diagonal matrices. Furthermore, we can use Equation (6) Again, equality holds since only diagonal matrices are involved.
The terms in the product in Equation (13) need to be decomposed using basis gates of the set {U 3 , CX}. The first term in Equation (14), can be realized through the following circuit.
The qubits are enumerated such that qubits q 0 i , · · · , q n−1 i correspond to vertex i. The logic behind the circuit shown can be understood from a classical point of view. Applying CX-gates on pairs of qubits acting on basis states between vertex i and j results in the state q 0 i · · · q L−1 i ∂ q 0 j ⊕ q 0 i · · · q L j ⊕ q L i , where the ⊕ operation is modulo 2. This means the state of the qubits belonging to j has zero entries if and only if all qubits have the same basis state. Negating the state and applying a multi-controlled U 3 (0, θ, 0)-gate therefore applies a phase if the original (basis-) states q 0 i · · · q L−1 i ∂ and q 0 j · · · q L j differ. After this one can uncompute by applying X and CX-gates in reversed order such that the overall change is that of applying a phase.
The remaining terms in Equation (14) (which vanish if k is a power of 2) can be implemented, e.g., with the help two ancillary qubits, a 0 , a 1 , in the following way.
The gates N 1 , N 2 in Equation (16) are of the form U 0 ⊗· · ·⊗U L−1 , where U i ∈ {I, X} are chose such that N 1 |0 L = |c L , and N 2 |0 L = |d L . The logic behind this circuit is that multi-controlled NOT gates are used to set two ancillary qubits to the state one if q i = |c L , and q j = |d L . Of both ancillary qubits are one, a multi-controlled U 3 (0, θ, 0)-gate is applied to change the phase, followed by a uncomputation steps. The ancillary qubits can be reused for all other pairs (i, j) ∈ E. An example for MAX 3-CUT is shown in Figure 3.
Next, we will analyse the number of gates required to decompose the basic building blocks of the phase operator U P . Throughout we assume full connectivity of the qubits, i.e., a CX-gate can be executed directly on any pair of qubits, without the need for applying SWAP or Bridge gates [13]. Furthermore, (multi-)controlled U 3 (0, θ, 0) operations can be implemented in terms of its square root V = U 3 (0, θ/2, 0), and V † , using polynomially many CX-gates, see, e.g., [15]. In order to implement the circuit shown in Equation (15) one needs 2L CX-gates, 2L X-gates and 1 (multi-)controlled U 3 -gate. When k is not a power of two we need to additionally execute the circuits of the form shown in Equation (16). This requires 2L C L X-gates, 2 C L U 3 -gates, and L X-gates. In general these need to be applied 2(2 L − k) times. In the worst case when k = 2 n + 1 we need to apply these gates 2(2 n − 1) times. Overall, Table 1 shows the width and depth requirements of the complete circuit for MAX k-CUT. We can see that low depth and width are achieved when k is a power of two. The analysis shows that, when the number of qubits are limited to a few hundred or thousand, only the cases k = 2, 4 and possibly k = 3 will be of practical interest when quantum advantage is to be achieved.

Implementation and Results
In this section we showcase numerical simulations on different types of graphs. The first example is a graph with two vertices connected by an edge. The result of the ideal simulator is shown in Figure 4. The expectation value E(θ) = H P |Ψ(γ,β) for different parameters, often referred to as the energy landscape, is very similar for all cases k ∈ 2, . . . , 8. The energy landscape in the domain [0, 2π] × [0, π/2] has a rotational symmetry of order 2 when k is Figure 2: Initial guess and (locally optimal) parameters γ, β for the graph shown in Figure 6 using the interpolationbased heuristic described in the text. The results indicate a strong correlation of the parameters between different depths p.

MAX 4-CUT
apply phase for |2 2 ⊗ |3 2 apply phase for |3 2 ⊗ |2 2 a 0 a 1 Figure 3: Implementation of the phase operator for the edge (i, j) for MAX 3-CUT, which consists of the circuit for MAX 4-CUT plus two extra circuits.
a power of two and two local minima. For the remaining cases, only one local minimum exists. The approximation ratio, i.e., the obtained value of the cost function divided by the optimal value, is (close to) 1.
Next, we will provide numerical simulations for larger instances of graphs. We briefly describe the heuristic we employ for the classical outer optimization loop. Sampling high-dimensional target functions uniformly quickly becomes intractable for depth p > 1. In order to get a good initial guess of the parameters ( γ p , β p ) at level p for the local optimization procedure we employ the interpolation-based heuristic described in [22], which is given by the following recursion, In above formula the superscript refers to either the initial parameter (superscript 0), or the local optimum (superscript L). The same formula holds for β. For depth p = 1 the expectation value is sampled on an n × m Cartesian grid over the domain [0, γ max ]×[0, β max ]. The initial parameters (γ 0 1 , β 0 1 ) are then given by identifying a pair of parameters which achieves the lowest expectation value on the grid. Using the starting point ( γ 0 p , β 0 p ) a local optimization algorithm, e.g. Nelder-Mead or COBYLA, is used to find the local minimum with ( γ L p , β L p ) . Figure 2 shows that optimal parameters are strongly correlated between different depths p, also for non-regular graphs.
The final two examples are an unweighted Erdős-Rényi graph with 10 vertices, presented in Figure 5, and a weighted Barabási-Albert graph with 10 vertices, see Figure 6 . Again, the energy landscapes for depth p = 1 and diferrent k are fairly similar in all cases, with a local minimum around the point [γ, β] = [π/4, π/8]. For higher depth we employ the interpolation based heuristic. In all cases the average approximation ratio achieved is considerably higher than the approximation ratio of randomly drawing a solution. Furthermore, the average approximation ratio increases with increasing depth. One-hot encoding in the case of MAX 3-CUT, MAX 4-CUT, and MAX 8-CUT would require 33, 44, and 88 qubits, respectively. This becomes quickly prohibitive for a simulator. Our simulations using binary encoding need only 24, 22, and 33 qubits, respectively. In this case simulations can be run on standard PC or using the IBM QASM simulator.

Availability of Data and Code
All data and the python/jupyter notebook source code of the MAX k-CUT-implementation using QAOA for reproducing the results obtained in this article are available at https://github.com/OpenQuantumComputing.

Conclusion
In this article we provide numerical evidence that NISQ device can be used to (approximately) solve the weighted MAX k-CUT. The analysis of the depth and width of the proposed binary encoding shows an exponential improvement of the number of qubits with respect to previously known results. The circuit depth required is very low when k is a power of two. When this is not the case, we provide a proof of principle implementation, which requires an exponential number of CX gates with respect to k. Future research directions are therefore to investigate more efficient ways of decomposing the phase separation operators. Another possibility might be to introduce penalty terms (in the mixing operator) such that the number of possible sets is limited to k, instead of implementing the circuits shown in Equation (16). Finally, the performance of the proposed algorithms could be tested on simulated noise models and real machines. Another factor is to analyse the balance between number of qubits and circuit depth with respect to  extra auxiliary qubits that can be introduced to minimize the number of SWAP/Bridge-gates on hardware without full qubit connectivity.