Mixer-Phaser Ansätze for Quantum Optimization with Hard Constraints

We introduce multiple parametrized circuit ansätze and present the results of a numerical study comparing their performance with a standard Quantum Alternating Operator Ansatz (QAOA) approach. The ansätze are inspired by mixing and phase separation in the QAOA, and are also motivated by compilation considerations, with the aim of running on near-term superconducting quantum processors. The methods are tested on random instances of a fully connected, weighted quadratic binary constrained optimization problem for which the space of feasible solutions has constant Hamming weight. For the parameter setting strategies and evaluation metric used, the average performance achieved by the QAOA is effectively matched by that of a "mixer-phaser" ansatz that can be compiled to less than half the depth of standard QAOA on most superconducting qubit processors.


I. INTRODUCTION
The Quantum Approximate Optimization Algorithm (QAOA) was initially introduced in Ref. [1]. Its simple structure inspired heuristic algorithms for sampling and exact optimization as well as approximate optimization that generalized the simple structure to include a broader and often more implementable set of operators.
The algorithms following the ansatz alternate $p$ times between unitary operators chosen from a one-parameter family of phase separation operators and operators chosen from a one-parameter family of mixing operators. The mixing operators do not commute with the phase separation operators, enabling exploration of the search space. The aim is to output a state that has good overlap with the low-energy eigenspace of the problem Hamiltonian after $p$ layers. Good parameters can sometimes be determined analytically, or estimated efficiently classically, or may be found using a combination of runs on a quantum processing unit (QPU) together with a classical optimization heuristic [2,3]. The simplest case, one which has been extensively studied in the QAOA literature in terms of theory, numerics, and experiments, is QAOA for MaxCut, for which the ansatz alternates layers consisting of two-qubit parity gates $U^{nm}_{ZZ}(\gamma) = \exp[i\gamma Z_n Z_m]$ with single-qubit X-rotations $U^{n}_{X}(\beta) = \exp[i\beta X_n]$ (X-mixers). In [4], the QAOA approach was generalized to the Quantum Alternating Operator Ansatz, considering unitary layers that are not necessarily linked to local Hamiltonian evolution$^1$. In particular, multi-qubit mixing operators were introduced in lieu of the X-rotations when applying the QAOA to hard-constrained optimization in order to restrict the search to the feasible subspace, the space of valid configurations obeying the hard constraints. The simplest among the advanced mixers is the two-qubit XY gate

$$U^{nm}_{XY}(\beta) = \exp[i\beta(X_n X_m + Y_n Y_m)], \qquad (1)$$

which conserves the total spin projection $Z_n + Z_m$.

* davide.venturelli@nasa.gov
1 In the remainder of this paper, we use the acronym QAOA to mean Quantum Alternating Operator Ansatz, which includes the Quantum Approximate Optimization Algorithms as a special case.
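As a quick numerical illustration of this conservation property, the following sketch builds a two-qubit XY gate as a 4x4 matrix and checks that it commutes with $Z_n + Z_m$. The normalization $\exp[i\beta(X_n X_m + Y_n Y_m)]$, the helper names (`expi`, `u_xy`) and the test angle are assumptions made for illustration only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expi(theta, H):
    """Return exp(i*theta*H) for a Hermitian matrix H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(1j * theta * evals)) @ evecs.conj().T

def u_xy(beta):
    # Assumed normalization: exp[i*beta*(XX + YY)] on two qubits.
    return expi(beta, np.kron(X, X) + np.kron(Y, Y))

beta = 0.37                                  # arbitrary test angle
U = u_xy(beta)
total_z = np.kron(Z, I2) + np.kron(I2, Z)    # Z_n + Z_m

# A vanishing commutator means the gate conserves the total spin projection.
commutator_norm = np.linalg.norm(U @ total_z - total_z @ U)
print(f"||[U_XY, Z_n + Z_m]|| = {commutator_norm:.2e}")

# In the basis |00>, |01>, |10>, |11> the gate only mixes |01> and |10>.
print(np.round(U, 3))
```

With this normalization the gate acts as the identity on $|00\rangle$ and $|11\rangle$ and as a rotation by angle $2\beta$ within the span of $|01\rangle$ and $|10\rangle$, which is why the Hamming weight of a bitstring is never changed.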
In this work, we introduce a new ansatz that combines the mixing and phase-separation operators into a more general two-parameter family of operators. For this reason, we refer to it as the "Quantum Alternate Mixer-Phaser Ansatz" (QAMPA). The main motivation for this generalization is to reduce the depth of the circuits, potentially reducing performance in the ideal case (by potentially limiting the expressibility of the ansatz) but obtaining improved performance on noisy intermediate-scale quantum (NISQ) processors (by running shorter-depth circuits that can still find good approximate solutions). We perform a numerical study of the performance of QAMPA on a weighted combinatorial optimization problem with hard constraints. For fully connected binary quadratic optimization problems, the circuits compile to roughly half the depth of standard QAOA on QPUs with nearest-neighbor connectivity. This is the case in most superconducting qubit quantum computers, in which qubits are placed on a two-dimensional grid and interact with nearest neighbors through tunable or fixed-frequency couplers. Our numerical simulations show that for the problems studied, in the noiseless case, QAMPA performs almost as well as standard QAOA in parameter regimes that are achievable in current hardware, and thus is expected to have advantages under noise given its reduced depth. This ansatz is therefore a viable and attractive approach, particularly for highly connected optimization problems with hard constraints on NISQ hardware.

II. BACKGROUND AND PRIOR WORK
Experimental benchmarks with X-mixers are numerous, especially on superconducting processors, although mostly limited to problems whose topology exactly matches the quantum processor hardware (see [5] for a review). Ref. [5] also explores optimization of the fully connected Sherrington-Kirkpatrick model, which requires significant compilation overhead. Unsurprisingly, the compilation requirements resulted in significant performance degradation with circuit depth, due to the relentless unmitigated action of noise during circuit execution on NISQ hardware.
Many techniques have been developed to optimize gate synthesis and qubit routing (i.e. compilation) for algorithms to be run on noisy intermediate scale quantum (NISQ) devices featuring a sparse native gate set. Although experimental QAOA work with XY mixers has still not appeared in the literature, numerical analyses predict long circuit durations for problems with XY mixers that are not encouraging for very near-term hardware [6]. In [7,8] hardware-efficient ansätze were proposed that match the processor topology and the native gates and use the objective function Hamiltonian only to guide the parameter setting procedure and to evaluate the final performance metric. In that approach, QAOA was used as a form of a quantum neural network that needs to be trained to act as an optimization solver, but there is concern as to how well this method, with its many parameters, would work in general.
The standard QAOA approach applied to combinatorial search was discussed in detail in [4]. For a given cost function, it starts from a superposition ideally equally distributed among all possible candidate solutions, i.e.

$$|\psi_0\rangle = \frac{1}{\sqrt{|F|}} \sum_{x \in F} |x\rangle, \qquad (2)$$

where $F$ represents the feasible subset of the optimization problem. This state is evolved to a quantum state $|\psi_F\rangle$ through a circuit composed by sequentially alternating two layers of gates for a number $p$ of rounds. Each round consists of an exploitation (phase-separation) layer $U_{PS}(\gamma)$, which introduces information related to the cost function to be extremized, and an exploration (mixing) layer $U_M(\beta)$, which rearranges probability amplitudes across $F$. The parameters $\gamma$ and $\beta$ are real numbers that need to be optimized layer by layer. In practice these layers can be decomposed into products of single- and two-qubit gates in an arbitrary order$^2$, e.g.

$$U_{PS}(\gamma) = \prod_{n,m} U^{nm}_{PS}(\gamma). \qquad (3)$$

A final quantum alternating operator ansatz looks like:

$$|\psi_F(\vec{\gamma}, \vec{\beta})\rangle = U_M(\beta_p)\, U_{PS}(\gamma_p) \cdots U_M(\beta_1)\, U_{PS}(\gamma_1)\, |\psi_0\rangle. \qquad (4)$$

2 Note that these products must be ordered when the two-qubit gates do not commute.
Most works on QAOA feature exclusively single-qubit gates as mixers. Only a few works have discussed the performance of the QAOA using XY-mixers: Ref. [9] studies Max-k-ColorableSubgraph, [10] looks at Max-k-VertexCover, and [11] considers QAOA as a quantum state transfer protocol.
III. QAMPA: QUANTUM ALTERNATE "MIXER-PHASER" ANSATZ
We now introduce the "Quantum Alternate Mixer-Phaser Ansatz" (QAMPA) unitary, which is the ordered product of two-qubit gates:

$$U_{QAMPA}(\vec{\gamma}, \vec{\beta}) = \prod_{k=1}^{p} \prod_{n,m} U^{nm}_{MP}(\gamma_k, \beta_k). \qquad (5)$$

Here, $U^{nm}_{MP}(\gamma_p, \beta_p)$ is a two-qubit operation between qubits $n$ and $m$ that is parameterized by two angles $\gamma_p, \beta_p \in \mathbb{R}$. The subscript $p$ refers to the $p$th round. We refer to this operation as a "mixer-phaser" (MP) operation in that it optionally implements mixing and phase separation depending on the value of the two independent parameters. A possible choice for the mixer-phaser operation is simply$^3$:

$$U^{nm}_{MP}(\gamma_p, \beta_p) = U^{nm}_{M}(\beta_p)\, U^{nm}_{PS}(\gamma_p). \qquad (6)$$

We will focus on this choice for the analysis in this paper. Note that

$$U^{nm}_{MP}(\gamma_p, 0) = U^{nm}_{PS}(\gamma_p), \qquad U^{nm}_{MP}(0, \beta_p) = U^{nm}_{M}(\beta_p). \qquad (7)$$

In other words, Eqs. (7) show that QAOA with $2p$ parameters $\gamma_1, \ldots, \gamma_p, \beta_1, \ldots, \beta_p$ can be mapped to QAMPA with $4p$ parameters if $\beta_{2k+1} = 0$ and $\gamma_{2k} = 0$ for $k = 0 \ldots p$ (and the non-zero parameters are identified in sequence). However, for the same number of layers the QAMPA algorithm has double the parameters. It is not clear a priori how QAOA and QAMPA would perform relative to each other when compared at a fixed, equal number of parameters. In the next subsection we will investigate this question numerically for a specific illustrative problem.
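The relation between the two ansätze can be checked on a small example. The sketch below is only an illustration under stated assumptions: it takes $\exp[i\gamma Z_n Z_m]$ as the phase separator, $\exp[i\beta(X_n X_m + Y_n Y_m)]$ as the mixer, composes the mixer-phaser gate as phase separation followed by mixing on the same pair, and uses $N = 3$ qubits with an arbitrary fixed pair ordering. All function names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(ops, N):
    """Tensor product with ops[q] on qubit q and the identity elsewhere."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(N)])

def expi(theta, H):
    """Return exp(i*theta*H) for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(1j * theta * evals)) @ evecs.conj().T

def u_ps(gamma, n, m, N):
    # Assumed phase separator: exp[i*gamma*Z_n Z_m]
    return expi(gamma, embed({n: Z, m: Z}, N))

def u_m(beta, n, m, N):
    # Assumed mixer: exp[i*beta*(X_n X_m + Y_n Y_m)]
    return expi(beta, embed({n: X, m: X}, N) + embed({n: Y, m: Y}, N))

def u_mp(gamma, beta, n, m, N):
    # Mixer-phaser gate: phase separation followed by mixing on the same pair.
    return u_m(beta, n, m, N) @ u_ps(gamma, n, m, N)

def ordered_product(gates):
    """Compose gates so that gates[0] acts first on the state."""
    return reduce(lambda acc, g: g @ acc, gates)

N, gamma, beta = 3, 0.4, 0.3
pairs = list(combinations(range(N), 2))

# One QAOA round: all phase-separation gates, then all mixing gates.
qaoa = ordered_product([u_ps(gamma, n, m, N) for n, m in pairs]
                       + [u_m(beta, n, m, N) for n, m in pairs])
# Two QAMPA rounds with angles (gamma, 0) and (0, beta).
qampa_two_layers = ordered_product([u_mp(gamma, 0.0, n, m, N) for n, m in pairs]
                                   + [u_mp(0.0, beta, n, m, N) for n, m in pairs])
# A single QAMPA round using both angles at once.
qampa_one_layer = ordered_product([u_mp(gamma, beta, n, m, N) for n, m in pairs])

print("QAOA equals two-round QAMPA:", np.allclose(qaoa, qampa_two_layers))
print("QAOA equals one-round QAMPA:", np.allclose(qaoa, qampa_one_layer))
```

The first comparison reproduces the embedding of QAOA into QAMPA with alternating zero angles. The second shows that a single QAMPA round with both angles non-zero is in general a different unitary, because a mixing gate on one pair does not commute with a phase-separation gate on an overlapping pair.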

A. Application to binary optimization with cardinality constraints
Let us consider a special case of an integer program: a quadratic binary optimization problem with $N$ variables, with a constraint that restricts the feasible subspace to bitstrings with a fixed Hamming weight $\kappa$. Mapping bits (0, 1) into spin variables (-1, +1), this results in the following Ising cost Hamiltonian and constraint:

$$H_C = \sum_{n} h_n Z_n + \sum_{n<m} J_{nm} Z_n Z_m \qquad (8)$$

subject to

$$\sum_n Z_n = 2(\kappa - N/2). \qquad (9)$$
3 Another possibility could be $U^{nm}_{PS}(\gamma_p)\, U^{nm}_{M}(\beta_p)$.
For $h_n = 0$, this problem is a weighted MaxCut with given sizes of partitions [12] (WeightedMaxCutGSP). It can also be seen as a Markowitzian portfolio optimization problem [13], where the task is to select the $\kappa$ best-performing assets in a pool, assuming correlations between their performance indicators. Note that while for small $\kappa$ the problem is clearly tractable, for the case $\kappa = N/2$ the problem turns into the NP-hard GraphBisection, which has been mapped to QAOA and studied numerically, using a sparse XY-mixer, in the unweighted case where the $J_{nm}$ are either 0 or 1 [14].
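To make the problem statement concrete, here is a minimal brute-force reference solver for small instances. It is a sketch under assumptions: the cost is taken to be $\sum_n h_n s_n + \sum_{n<m} J_{nm} s_n s_m$ with spins $s_n = \pm 1$, the objective is minimized, and the four-variable instance and the function names are made up for illustration.

```python
from itertools import combinations

def ising_cost(spins, J, h):
    """Cost sum_n h_n s_n + sum_{n<m} J_nm s_n s_m for spins in {-1, +1}."""
    N = len(spins)
    cost = sum(h[n] * spins[n] for n in range(N))
    cost += sum(J[n][m] * spins[n] * spins[m]
                for n in range(N) for m in range(n + 1, N))
    return cost

def brute_force(J, h, kappa):
    """Enumerate all bitstrings of Hamming weight kappa and return the minimum."""
    N = len(h)
    best = None
    for ones in combinations(range(N), kappa):
        spins = [-1] * N          # bit 0 -> spin -1
        for n in ones:
            spins[n] = +1         # bit 1 -> spin +1
        cost = ising_cost(spins, J, h)
        if best is None or cost < best[0]:
            best = (cost, tuple(spins))
    return best

# Hypothetical N = 4 instance; only the upper triangle of J is used.
J = [[0, 1.0, -0.5, 0.5],
     [0, 0, 1.0, -1.0],
     [0, 0, 0, 0.5],
     [0, 0, 0, 0]]
h = [0, 0, 0, 0]

best_cost, best_spins = brute_force(J, h, kappa=2)
print(best_cost, best_spins)  # -4.5 (1, -1, 1, -1)
```

The enumeration visits $\binom{N}{\kappa}$ configurations, so it is only useful as a reference for the small sizes at which the quantum circuits can also be simulated classically.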
Following the literature, we construct the QAOA and QAMPA gates using XY mixers [4,9] as:

$$U^{nm}_{PS}(\gamma) = U^{nm}_{ZZ}(\gamma J_{nm}), \qquad U^{nm}_{M}(\beta) = U^{nm}_{XY}(\beta), \qquad U^{nm}_{MP}(\gamma, \beta) = U^{nm}_{XY}(\beta)\, U^{nm}_{ZZ}(\gamma J_{nm}). \qquad (10)$$

The initial state could be taken to be an equal superposition of all solutions with $\kappa$ variables set to 1 on the qubit registers, i.e. a Dicke state [15]. By observing the periodicity of the unitaries composing the ansätze, we see that if the possible values of the coefficients are commensurable, the angle parameters can be selected within the domains $\gamma \in [0, 2\pi/\min_{>0}(|h_n|, |J_{nm}|)]$ and $\beta \in [0, \pi]$ without loss of generality.
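For classical simulation, the Dicke state can be written down directly as a state vector. The following sketch does so with NumPy; the bit-ordering convention (qubit 0 as the most significant bit) and the function name are assumptions for illustration, and the construction says nothing about how the state would be prepared on hardware.

```python
import numpy as np
from itertools import combinations
from math import comb

def dicke_state(N, kappa):
    """Equal superposition of all N-bit strings with exactly kappa ones."""
    psi = np.zeros(2 ** N)
    for ones in combinations(range(N), kappa):
        # Assumed convention: qubit 0 is the most significant bit of the index.
        index = sum(1 << (N - 1 - q) for q in ones)
        psi[index] = 1.0
    return psi / np.sqrt(comb(N, kappa))

psi0 = dicke_state(4, 2)
support = np.flatnonzero(psi0)
print(len(support), [format(i, "04b") for i in support])
```

For $N = 4$ and $\kappa = 2$ the support consists of the six bitstrings of Hamming weight two, each with amplitude $1/\sqrt{6}$.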

B. Synthesis and routing
Efficiently compiling a quantum circuit such as Eq. (4) to a real quantum processor, which has pre-defined calibrated two-qubit gates active on a sparse subset of all possible pairs of qubits, is a non-trivial planning and scheduling problem [16]. We would like to estimate the advantage of using QAMPA versus QAOA in common implementation scenarios. For WeightedMaxCutGSP, the $U_{ZZ}$ gates required to implement the objective function correspond to the $N(N-1)/2$ edges of a fully connected graph. The mixer, which is responsible for the exploration step of the algorithm while keeping the constraint (9) satisfied, is also ideally a complete mixer, since it has been shown numerically to be the best choice for Hamming weight constraints [9,10,17]. Choosing a mixer with sparser connectivity between various terms might lead to shorter circuits, but the compilation advantage of using QAMPA versus QAOA is maximal if we use the same graph for both phase-separation and mixing operations.
We should note that the initialization choice Eq. (2) (e.g. the creation of the Dicke state $|\psi_0\rangle$, which requires in principle $O(\kappa N)$ gates [15]), while being the simplest to analyze and possibly the most advantageous based on prior studies of similar problems [9], might be impractical in the near term. As discussed in [4,18], the initialization procedure could possibly be replaced by a simpler-to-realize superposition of feasible states, or by a classical warm-start candidate followed by a mixing round in QAOA. In QAMPA, the first round contains mixing, so initialization might come for free if the gate is appropriately synthesized. Hence, initialization is not a concern for the discussion around compilation efficiency.
The routing requirements to schedule gates between all possible pairs of qubits depend on the underlying topology where swap operations can be performed. For a linear device, it was shown in [19] that the most efficient swap network allowing the scheduling of the gates could be executed with maximum parallelization in N steps. For a more connected topology, the linear result is still a worst case scenario which can be implemented by defining an arbitrary Hamiltonian path on the device graph.
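The linear swap network can be sketched in a few lines. The version below is an odd-even transposition pattern, which is one common way to realize such a network and is assumed here for illustration: in each of the $N$ layers, neighboring positions interact and then exchange their logical qubits. The function name and the returned data layout are hypothetical.

```python
from itertools import combinations

def linear_swap_network(N):
    """Return (layers, final_order) for an odd-even swap network on a line.

    layers[t] lists the logical qubit pairs that interact (and are then
    swapped) in parallel at step t; final_order[i] is the logical qubit
    sitting at physical position i after the last step.
    """
    order = list(range(N))
    layers = []
    for t in range(N):
        layer = []
        # Even steps act on positions (0,1), (2,3), ...; odd steps on (1,2), (3,4), ...
        for i in range(t % 2, N - 1, 2):
            layer.append((order[i], order[i + 1]))
            order[i], order[i + 1] = order[i + 1], order[i]
        layers.append(layer)
    return layers, order

layers, final_order = linear_swap_network(4)
for t, layer in enumerate(layers):
    print(t, layer)
print("final order:", final_order)
```

Every one of the $N(N-1)/2$ logical pairs becomes adjacent exactly once, and the qubit order ends up reversed, so consecutive rounds can reuse the same pattern on the reversed chain.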
As shown in Fig. 1-(a) for an illustrative $N = 4$ case, the efficient linear compilation of QAOA at $p = 1$ is intertwined with a swap network both for the phase-separation layer (blue box) and for the mixing layer (red box). The routing overhead in this case is a total of $p(N-1)^2$ SWAP gates, increasing the depth by about $2p(N-1)$ if the SWAP gates are not simplified or optimized in synthesis. For QAMPA instead, as shown in Fig. 1-(b), a single swap network is required for the combined mixing and phase-separation layer, resulting in a clear advantage in circuit depth for the same number of parameters.
The optimal synthesis of logical gates depends on the native operations available on the quantum processor. For the sake of illustration, suppose we have access to the common set consisting of CNOT gates and parameterized single-qubit rotations about X, Y and Z. This set is universal and admits optimal synthesis formulas for any two-qubit gate utilizing at most 3 CNOTs and 15 single-qubit rotations [20]. Fig. 1-(c) illustrates the optimal synthesis of $\mathrm{SWAP}\cdot U^{nm}_{MP}(\gamma, \beta)$ in terms of the known canonical decomposition. Similar syntheses can be derived for $\mathrm{SWAP}\cdot U^{nm}_{ZZ}$ and $\mathrm{SWAP}\cdot U^{nm}_{XY}$, showing that for the pictured case there should be a factor of 2 between the resulting depths of the two ansätze. A different set of hardware primitives might increase or reduce the advantage. For instance, if the fSim gate [5] or the XY gates are available natively, then the swap could be subsumed in a renormalization of the angles, as illustrated in Fig. 1-(d), where we depict the identity $\mathrm{SWAP}\cdot U^{nm}_{MP}(\gamma, \beta) = \exp(i\pi/4)\, U^{nm}_{ZZ}(\gamma + \pi/2)\, U^{nm}_{XY}(\beta + \pi)$. If the underlying connectivity on the hardware is all-to-all, a swap network is not required. Even so, for a fixed number of angles it could still be depth-advantageous$^4$ to run QAMPA if the sum of the depths required to synthesize both $U^{nm}_{PS}$ and $U^{nm}_{M}$ is larger than the depth required to synthesize $U^{nm}_{MP}$, which is almost always the case if the QAOA gates are not natively available.
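The absorption of a SWAP into shifted angles can be verified numerically. The sketch below assumes the normalizations $U_{ZZ}(\gamma) = \exp[i\gamma ZZ]$ and $U_{XY}(\beta) = \exp[i\beta(XX + YY)]$; under that assumption both offsets come out as $\pi/4$ with a global phase $e^{-i\pi/4}$. The offsets and phase quoted in the text differ and presumably correspond to a different angle normalization, so the numbers here should be read as convention-dependent.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

ZZ = np.kron(Z, Z)
XY = np.kron(X, X) + np.kron(Y, Y)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def expi(theta, H):
    """Return exp(i*theta*H) for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(1j * theta * evals)) @ evecs.conj().T

def u_zz(gamma):
    return expi(gamma, ZZ)      # assumed normalization

def u_xy(beta):
    return expi(beta, XY)       # assumed normalization

gamma, beta = 0.4, 0.3          # arbitrary test angles

# Mixer-phaser gate followed by a SWAP on the same pair ...
lhs = SWAP @ u_xy(beta) @ u_zz(gamma)
# ... equals the same gate family with shifted angles, up to a global phase.
rhs = np.exp(-1j * np.pi / 4) * u_zz(gamma + np.pi / 4) @ u_xy(beta + np.pi / 4)

print("SWAP absorbed into angle shifts:", np.allclose(lhs, rhs))
```

The check works because $XX + YY + ZZ = 2\,\mathrm{SWAP} - I$ and the three terms commute, so a SWAP is itself a member of the gate family generated by $ZZ$ and $XX + YY$.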

IV. NUMERICAL EVALUATION
We benchmark the algorithms by numerically simulating the circuits for 40 random fully connected WeightedMaxCutGSP problems where $J_{ij} \in \{-1, -0.5, 0.5, 1\}$ (chosen uniformly), for even $N = 4, 6, \ldots, 16$ and for $\kappa = N/2$ (representing the largest search space, $|F| \sim 2^N/\sqrt{N}$). For simplicity, we set the linear Zeeman terms $h_j = 0$, since they are mostly inconsequential from the perspective of the compilation overhead. While the order of execution of the $U^{nm}_{ZZ}$ gates does not matter, different orderings of the $U^{nm}_{XY}$ and $U^{nm}_{MP}$ gates are not equivalent. All runs related to a given instance are performed with a random permutation of the gates chosen among the sequences that allow maximum parallelization and minimum depth when intertwined with a swap network.

A. Performance metric and parameter setting
We want to evaluate the optimization performance of QAOA and QAMPA in terms of their value as a WeightedMaxCutGSP solver. The metric we use is related to the goal of running the algorithm to discover as quickly as possible a bitstring associated with a sufficiently good value (as determined by a pre-defined quality threshold) of the objective function. To address this goal, our target performance metric, $\mathrm{BEST}_R$, is the expected value of the objective function when the best result in $R$ runs is selected [21]. It is computed over the feasible states $|k\rangle$, each with an objective function value $\epsilon_k$ normalized using the minimum and the maximum of the objective function spectrum, and it involves the cumulative distribution function $F(\epsilon_k) = \sum_{p<k} |\langle\psi_F(\vec{\gamma}, \vec{\beta})|p\rangle|^2$ of $\epsilon_k$, where all sums over states are taken in ascending order of $\epsilon_k$. It can be computed straightforwardly from the wavefunction of the final state, or from its sampling statistics. This metric, besides being the most relevant for optimization purposes, inherits the advantages discussed in the context of other metrics that focus on the high-quality portion of the probability distribution, such as the Conditional Value at Risk (CVaR) [22,23] or Gibbs averages [24], which are suspected to have some desirable "trainability" properties to guide parameter setting, as opposed to the more traditional $\langle\psi_F(\vec{\gamma}, \vec{\beta})|H_C|\psi_F(\vec{\gamma}, \vec{\beta})\rangle$ [25] (which is simply $\mathrm{BEST}_1$). For illustration, we will work with $\mathrm{BEST}_5$, since $R = 5$ seems to be a reasonable value to reach good approximation ratios for the moderate problem sizes that we are studying, as we will demonstrate empirically ex post.
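A best-of-$R$ expectation of this kind can be computed from a probability distribution with elementary order statistics. The sketch below is not taken from the paper: it assumes scores normalized to $[0, 1]$ with larger values being better, and it uses the fact that the probability for the best of $R$ independent samples to be state $k$ is $G_k^R - G_{k-1}^R$, where $G_k$ is the cumulative probability of all states scoring no better than $k$. The function name and the three-state distribution are hypothetical.

```python
def best_of_r(probs, scores, R):
    """Expected score of the best of R independent samples.

    probs[k] is the probability of measuring state k and scores[k] its
    normalized score (assumption: larger is better).
    """
    expected, cdf_prev = 0.0, 0.0
    for score, p in sorted(zip(scores, probs)):    # ascending score
        cdf = min(1.0, cdf_prev + p)
        # Probability that the maximum over R samples lands on this state.
        expected += score * (cdf ** R - cdf_prev ** R)
        cdf_prev = cdf
    return expected

probs = [0.5, 0.3, 0.2]
scores = [0.0, 0.5, 1.0]

for R in (1, 5, 20):
    print(R, round(best_of_r(probs, scores, R), 6))
```

For $R = 1$ the function reduces to the ordinary expectation value, and as $R$ grows it approaches the best score that has non-zero probability, which is why such a metric rewards probability mass on high-quality solutions.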
The parameter setting strategy of choice for all experiments in this paper (which we call scanlast for ease of reference; see Figure 3) follows a layerwise constructive optimization protocol employing an external black-box optimizer that guides the repeated execution of the quantum circuit. The protocol aims to identify good angles for both QAOA and QAMPA at level $p + 1$ using the information on the best angles found at level $p$. It starts with a random generation of $W_0$ pairs of angles that are then used as initializations for the runs of the optimizer. The $q$ best results found ($\gamma_1(q)$ and $\beta_1(q)$) then each seed the runs at $p = 2$. More precisely, the $p = 2$ stage consists of $q$ batches of $W$ runs, each initialized in the form $(\gamma_1(q), \beta_1(q), \gamma_2, \beta_2)$, where the last two angles are chosen randomly. Note that the best initial angles from the previous layer (e.g., $\gamma^*_1$ and $\beta^*_1$) are allowed to vary when optimizing the next layer. The full layerwise procedure applies this rule recursively: at layer $p + 1$ we launch $q$ searches where each run initializes the optimizer with $(\gamma_1(q), \beta_1(q), \ldots, \gamma_p(q), \beta_p(q), \gamma_{p+1}, \beta_{p+1})$, for a total complexity of $O((W_0 + pqW) f_{opt})$, where $f_{opt}$ is the number of function evaluations used per optimization attempt.
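The seeding logic of such a layerwise protocol can be sketched as follows. This is a toy illustration under several assumptions: the cost is minimized (one would pass the negative of the performance metric), the black-box optimizer is replaced by a simple random local search with a fixed evaluation budget, and the cost function is an artificial landscape rather than a simulated circuit. All names are hypothetical.

```python
import math
import random

def local_search(cost, x0, evals, rng, step=0.3):
    """Toy black-box optimizer using exactly `evals` cost evaluations.
    Hypothetical stand-in for the external optimizer; all angles may vary."""
    best_x, best_f = list(x0), cost(x0)
    for _ in range(evals - 1):
        cand = [xi + rng.gauss(0.0, step) for xi in best_x]
        f = cost(cand)
        if f < best_f:
            best_x, best_f = cand, f
    return best_f, best_x

def scanlast(cost, p_max, W0, q, W, f_opt, seed=0):
    """Keep the q best angle sets at each level and extend each of them with
    W random (gamma, beta) pairs to initialize the searches at the next level."""
    rng = random.Random(seed)

    def random_pair():
        return [rng.uniform(0.0, 2 * math.pi), rng.uniform(0.0, math.pi)]

    # Level p = 1: W0 independent random starts.
    runs = [local_search(cost, random_pair(), f_opt, rng) for _ in range(W0)]
    seeds = sorted(runs, key=lambda r: r[0])[:q]
    best_per_level = [seeds[0]]
    for _level in range(2, p_max + 1):
        runs = [local_search(cost, prev + random_pair(), f_opt, rng)
                for _cost, prev in seeds for _rep in range(W)]
        seeds = sorted(runs, key=lambda r: r[0])[:q]
        best_per_level.append(seeds[0])
    return best_per_level

counter = [0]   # counts cost evaluations

def toy_cost(angles):
    """Artificial smooth landscape, used only to exercise the protocol."""
    counter[0] += 1
    return sum(math.sin(a - 1.0) ** 2 for a in angles) / len(angles)

history = scanlast(toy_cost, p_max=3, W0=4, q=2, W=3, f_opt=5)
for level, (value, angles) in enumerate(history, start=1):
    print(level, round(value, 4), len(angles))
print("cost evaluations:", counter[0])
```

In this sketch the number of cost evaluations is $(W_0 + (p_{max} - 1)\,qW)\,f_{opt}$, which has the same scaling as the complexity stated above.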
Figure 2: Illustrative scanlast parameter setting procedure results for $\gamma$ (left column) and $\beta$ (right column) for an ensemble of 40 $N=16$ WeightedMaxCutGSP random instances, with the procedure stopped at $p=10$. The colored heatmap visually reflects an interpolation of the probability density function for the values of the top 10% best-performing angles (normalized in the range $0 - 2\pi$). The actual values for each instance at each iteration are marked as black dots. Blue (red) marks are the best found $\gamma$ ($\beta$) for the specific instance that returned the lowest metric score. (bottom) $N=16$ as a function of $p$ (darker marks indicate higher $p$ = 6-9).

For most of our tests, we choose $W_0 = 50$, $q = 10$, $W = 250$ and $f_{opt} = 250$, for a total number of runs of $625000 + 2500p$ per test. Moreover, we use Powell's method for the external optimization loop. While this method is not expected to be a suitable choice for a large number of variables, in our problem set it outperformed several other methods used in the literature (e.g. BFGS, Model Gradient Descent, and Nelder-Mead) and provided the closest results to brute-force search. In Fig. 2 we illustrate the results of the scanlast procedure in practice, showing the 10% best-found QAOA/QAMPA angles for each instance (black dots) that maximize $\mathrm{BEST}_5$ for 40 instances with $N = 16$ variables. The displayed results clearly illustrate that for a given instance the best-found phase separation angles $\gamma_p$ and mixing angles $\beta_p$ for QAOA and QAMPA are similar, at least for low values of $p$.
Indeed, by analyzing data from random instances of different sizes, we observe that the first angles of the sequence very often converge to a value of constant magnitude, which is often close to 0. The time-reversal symmetry is manifest in the heatmap, since solutions for which all the angles are negated are equivalent, and the choice of one over the other is likely associated with the randomness in scanlast. For the mixing angle the concentration of best-performing angles is apparent for almost all tested instances, while for the $\gamma$ angles at $p > 6$ the concentration is not as striking. This could be explained by the fact that the value of the $\mathrm{BEST}_5$ metric is already very close to the maximum, so the optimization landscape could be rather flat and difficult to optimize numerically.

B. Instance-by-instance comparison
To compare the performance of the two ansätze, we consider the instance-by-instance scatter plot comparing the $\mathrm{BEST}_5$ results for QAOA and QAMPA after the scanlast optimization.$^5$ The figure indicates with cross symbols the metric value for each instance, and with round dots the median and standard deviation of the results over all instances for a given $p$. What is clear from the results shown in Figure 4 is that as $p$ increases both ansätze perform essentially the same for all tested sizes. This convergence is not surprising since, for large $p$, QAOA and QAMPA with the same number of parameters can be interpreted as Trotter-like approximations of the same unitary evolution [26]. Similar results are also observed for different $R$ in the metric (including the common expectation value metric, $R = 1$), indicating that the output distribution of the $\epsilon_k$ at the current sizes is smooth and monotonic.

C. Additional tests and variations of QAMPA/QAOA
The approach that we benchmarked in this study offers some flexibility in its implementation. For instance, given that QAMPA is not fully grounded in insights from a specific Hamiltonian evolution, it is worth asking whether the information related to the coefficients of the objective function ($J_{nm}$) is beneficial when inserted in the circuit, or whether it is sufficient to train the parameters using that information, as is done for VQE hardware-efficient ansätze [8]. We investigated these questions empirically by comparing instance-by-instance results for QAOA and QAMPA, using scanlast on a set of instances for $N = 10$ with slightly modified parameters ($f_{opt} = 200p$). The results are statistically in line with the ones presented for $N = 16$ in Figure 4 using higher computational effort.
In Figure 5-(a),(b) we show results for the algorithms against a version where, for the circuit ansatz, all $J_{nm}$ are fixed to be 1 (we call this the -noJ version in the figures).$^6$ Note that the cost function coefficients are still used in the evaluation metric, both for the parameter setting and for scoring the performance. Another test we performed, illustrated in Figure 5-(c),(d), compares the QAMPA performance against the performance of an ansatz that includes only XY gates. For the XY-noJ variation, the phase separation step of QAOA is completely eliminated and the phase information is entirely absent. The XY label in the figure refers instead to an XY-only ansatz that mixes qubits using an angle proportional to the cost function coefficients, i.e. using gates of the form $U^{nm}_{XY}(J_{nm}\beta)$. What is observed is that the standard QAOA approach (which, as we recall, performs only slightly better than QAMPA) gives the best performance among all tested variations, and that it is always beneficial to include in the circuit ansatz, for each gate between qubits $n$ and $m$, a proportional factor multiplying $\gamma_p$ that corresponds to the objective function coefficient $J_{nm}$. These observations were validated for multiple problem sizes up to $N = 16$.

V. DISCUSSION AND CONCLUSIONS
As reviewed in [27] and [28], modern quantum algorithms for optimization on NISQ devices have significantly generalized the original structure of the QAOA circuitry. In this paper, we have presented a variation that combines the hardware-efficient spirit of Variational Quantum Eigensolvers with the advanced mixers of the Quantum Alternating Operator Ansatz and the guidance from inclusion of operators derived from the cost function, without increasing the number of parameters that need to be optimized. Our numerical results indicate that the mixer-phaser ansatz QAMPA is a compelling choice among NISQ-era quantum options; we expect these ideas can be ported beyond the WeightedMaxCutGSP problem, to provide compilation advantages to a multitude of other hard-constrained combinatorial optimization problems that require advanced mixers. Multiple questions remain, however, in order to bridge the gap from proof of principle to real-world implementation. We list here a few avenues of research towards the goal of deploying a QAMPA solver on a quantum processor, before concluding with some more general thoughts on research directions.
First, the parameter setting procedure used in this study could be refined to avoid optimization bottlenecks and limitations that affect all layerwise training protocols at scale. In particular, the Powell method, while effective at small $N$, would eventually become intractable, and alternative gradient methods might be hampered by barren plateaus if the parameter setting protocol is kept "layerwise" [29]. Recently developed analytical methods based on series expansion [30] or quantum control [31] might come in handy to further analyze the reachability deficits and strengths of these algorithms [32], to study the observed optimal parameter concentration [2], and to estimate the performance of QAMPA at scale. Ties between quantum annealing schedules and QAOA parameter setting have also been identified [33][34][35], further indicating that cross-overs between digital and analog optimization methods are an interesting possible development for QAMPA [36][37][38][39].
Figure 5: [...] are after the scanlast parameter setting. QAOA-noJ in (a) is constructed with the phase-separation gates $U^{nm}_{PS}(\gamma, \beta)|_{J_{nm}=1}$, the QAMPA-noJ ansatz in (b) is constructed with the mixer-phaser gates $\tilde{U}^{nm}_{MP}(\gamma, \beta)|_{J_{nm}=1}$, and the XY-noJ ansatz in (d) uses $\tilde{U}^{nm}_{XY}(\gamma, \beta)|_{J_{nm}=1}$. In (c) it is shown that the simplified circuit with $\gamma_p = 0$ (XY) does not perform well against the other options.

Moreover, our performance evaluation procedure based on $\mathrm{BEST}_5$ provides just an indication of the ability of a QAMPA solver to identify a good solution of WeightedMaxCutGSP in a reasonable time, and a full-fledged analysis of the comparative advantage of using this method versus other heuristics is required. This analysis has to take into account practical issues such as the effect of noise, which is going to adversely impact our performance estimation. The relative performance results will be both problem and hardware dependent. While theoretical frameworks to estimate the impact of noise in circuits featuring XY gates are being developed [40], ultimately only experimental tests on quantum hardware will be able to provide a full picture of performance, including identifying at what layer $p$ the solver is most effective. While in the ideal case performance cannot decrease with $p$, under noise the situation is different; performance is observed to decrease quickly with $p$ in current hardware [5]. QAMPA, with its circuit depth reduction compared to QAOA, will enable empirical studies for higher values of $p$ than QAOA on a wide variety of quantum hardware. Hybridization of QAMPA with adaptive [41] and recursive [42] hybrid methods may prove powerful. One advantage of the studied problem, and of others with locally conserved particle number constraints, is that it provides a natural error mitigation strategy via post-selection [43]. Namely, measured bitstrings which do not obey the Hamming weight constraint are discarded.
Post-selection has been shown to provide significant improvements in experiments on superconducting processors [44] and can generally be applied beneficially to any situation with constraints. Another recent experiment [45] shows that permuting the ordering of the qubits in the SWAP network and averaging over the results reduces the systematic coherent errors, a technique that could be generalized to our case, where the permutations are not equivalent, possibly helping the parameter setting and/or the optimization performance.
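Post-selection on the Hamming weight is straightforward to apply to measurement outcomes. The sketch below assumes the results are available as a dictionary mapping bitstrings to counts, which is a common but here hypothetical data format; the function name and the example counts are made up for illustration.

```python
def postselect(counts, kappa):
    """Keep only bitstrings with Hamming weight kappa.

    Returns the filtered counts and the fraction of shots that survive.
    """
    kept = {bits: c for bits, c in counts.items() if bits.count("1") == kappa}
    total = sum(counts.values())
    kept_fraction = sum(kept.values()) / total if total else 0.0
    return kept, kept_fraction

# Made-up measurement record for N = 4, kappa = 2.
counts = {"0101": 40, "0111": 5, "1100": 30, "0000": 25}
kept, kept_fraction = postselect(counts, kappa=2)
print(kept, kept_fraction)  # {'0101': 40, '1100': 30} 0.7
```

The surviving fraction is itself a useful diagnostic, since it shrinks as noise drives the state out of the feasible subspace and therefore determines how many extra shots are needed for a fixed number of usable samples.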
In terms of actual implementation, the QAMPA method is readily testable in superconducting processors that natively support XY interactions, such as Rigetti's devices of the Aspen family [46]. However, the initialization to a Dicke state is likely too heavy for near-term implementation, so a warm-start from a classical candidate solution obtained by greedy search is advisable [18].
We expect researchers to develop other ansätze that have sweet spots in terms of circuit depth, number of parameters, and performance. As QAMPA illustrates, there is nothing sacrosanct about QAOA; variants may perform as well or better, particularly on NISQ hardware. A rich ecosystem of ansätze that take into account hardware architectures, gate sets, and noise considerations for different types of NISQ processors will enable more rapid understanding of quantum optimization approaches for the NISQ era and beyond.