1 Introduction

The Quantum Approximate Optimization Algorithm (QAOA) was initially introduced in Ref. Farhi et al. (2014). Its simple structure inspired heuristic algorithms for sampling and exact optimization as well as approximate optimization that generalized the simple structure to include a broader and often more implementable set of operators.

The algorithms following the ansatz alternate p times between unitary operators chosen from a one-parameter family of phase separation operators and operators chosen from a one-parameter family of mixing operators. The mixing operators do not commute with the phase separation operators, enabling exploration of the search space. The aim is to output a state that has good overlap with the low-energy eigenspace of the problem Hamiltonian after p layers. Good parameters can sometimes be determined analytically, or estimated efficiently classically, or may be found using a combination of runs on an quantum processing unit (QPU) together with a classical optimization heuristic (Akshay et al. 2021; Rabinovich et al. 2021). The simplest case, one which has been extensively studied in the QAOA literature in terms of theory, numerics, and experiments, is QAOA for \(\mathtt{MaxCut}\), for which the ansatz alternates layers consisting of two-qubit parity gates \(\mathcal {U}^{nm}_{ZZ}(\gamma )=\exp [i\gamma Z_{n}Z_{m}]\) with single qubit X-rotations \(\mathcal {U}^{n}_{X}(\beta )=\exp [i\beta X_{n}]\) (X-mixers).

In Hadfield et al. (2019), the QAOA approach was generalized to the Quantum Alternating Operator Ansatz, considering unitary layers that are not necessarily linked to local Hamiltonian evolutionFootnote 1. In particular, multi-qubit mixing operators were introduced in lieu of the X-rotations when applying the QAOA to hard-constrained optimization in order to restrict the search to the feasible subspace, the space of valid configurations obeying the hard constraints. The simplest among the advanced mixers is the two-qubit XY gate

$$\mathcal{U}^{nm}_{XY}(\beta)=\exp\left[i\beta(X_{n} X_{m} +Y_{n} Y_{m})\right]$$
(1)

which conserves the total spin projection Zn + Zm.

In this work, we introduce a new ansatz that combines the mixing and phase-separation operators into a more general two-parameter family of operators. For this reason, we refer to it as the ”Quantum Alternate Mixer-Phaser Ansatz” (QAMPA). The main motivation for this generalization is to reduce the depth of the circuits, potentially reducing performance in the ideal case (by potentially limiting the expressibility of the ansatz) but obtaining improved performance on noisy intermediate-scale quantum (NISQ) processors (by running shorter-depth circuits with that can still find good approximate solutions). We perform a numerical study on the performance of QAMPA on a weighted combinatorial optimization problem with hard constraints. For fully-connected binary quadratic optimization problems, the circuits compile to roughly half the depth of standard QAOA on QPUs with nearest-neighbor connectivity. This is the case in most superconducting qubit quantum computers in which qubits are placed on a two-dimensional grid and interact with nearest neighbors through tunable or fixed frequency couplers. Our numerical simulations show that for the problems studied, in the noiseless case, QAMPA performs almost as well as standard QAOA in parameter regimes that are achievable in current hardware, and thus is expected to have advantages under noise given its reduced depth. This ansatz is therefore a viable and attractive approach, particularly for highly connected optimization problems with hard constraints on NISQ hardware.

2 Background and prior work

Experimental benchmarks with X-mixers are numerous, especially on superconducting processors, although mostly limited to problems whose topology exactly matches the quantum processor hardware (see Harrigan et al. 2021 for a review). Ref. Harrigan et al. (2021) also explores optimization of the fully connected Sherrington-Kirkpatrick model,which requires significant compilation overhead. Unsurprisingly, the compilation requirements resulted in significant performance degradation with circuit depth, due to the relentless unmitigated action of noise during circuit execution in NISQ hardware.

Many techniques have been developed to optimize gate synthesis and qubit routing (i.e. compilation) for algorithms to be run on noisy intermediate scale quantum (NISQ) devices featuring a sparse native gate set. Although experimental QAOA work with XY mixers has still not appeared in the literature, numerical analyses predict long circuit durations for problems with XY mixers that are not encouraging for very near-term hardware (Do et al. 2020). In Farhi et al. (2017) and Kandala et al. (2017) hardware-efficient ansätze were proposed that match the processor topology and the native gates and use the objective function Hamiltonian only to guide the parameter setting procedure and to evaluate the final performance metric. In that approach, QAOA was used as a form of a quantum neural network that needs to be trained to act as an optimization solver, but there is concern as to how well this method, with its many parameters, would work in general.

The standard QAOA approach applied to combinatorial search was discussed in detail in Hadfield et al. (2019). For a given cost function, it starts from a superposition ideally equally distributed among all possible candidate solutions, i.e.

$$|\psi_{0}\rangle=|\mathcal{F}|^{-\frac{1}{2}} {\sum}_{s\in \mathcal{F}} |s\rangle$$
(2)

where \(\mathcal {F}\) represents the feasible subset of the optimization problem. This state is evolved to a quantum state |ψF〉 through a circuit composed by alternating sequentially two layers of gates for a number p of rounds. Each round consists of an exploitation (phase-separation) layer \(\mathcal {U}_{PS}(\gamma )\) which introduces information related to the cost function to be extremized and an exploration (mixing) layer \(\mathcal {U}_{M}(\beta )\) which rearranges probability amplitudes across \(\mathcal {F}\). The parameters γ and β are real numbers that need to be optimized layer-by-layer. In practice these layers can be decomposed by products of single and two-qubit gates in an arbitrary order Footnote 2 , e.g. \(\mathcal {U}_{PS}(\gamma )={\prod }_{n, m}\mathcal {U}_{PS}^{nm}(\gamma )\). A final quantum alternating operator ansatz looks like:

$$\begin{array}{@{}rcl@{}} |\psi_{F}(\vec\gamma,\vec\beta)\rangle&=& |\mathcal{F}|^{-\frac{1}{2}} {\prod}_{i = 1}^{p} \left[\mathcal{U}(\gamma_{i}, \beta_{i})\right] {\sum}_{k\in\mathcal{F}}|k\rangle \end{array}$$
(3)
$$\begin{array}{@{}rcl@{}} \mathcal{U}(\gamma_{i},\beta_{i})&=& \left( {\prod}_{n,m}\mathcal{U}_{M}^{nm}(\beta_{i}) {\prod}_{n,m}\mathcal{U}_{PS}^{nm}(\gamma_{i}) \right). \end{array}$$
(4)

Note that these products must be ordered when the two-qubit gates do not commute. Most of the works on QAOA feature exclusively single qubit gates as mixers. Only a few works have discussed the performance of the QAOA using XY -mixers: Ref. Wang et al. (2020) studies \(\mathtt{MaxkColorableSubgraph}\), Cook et al. (2020) looks at \(\mathtt{MaxkVertexCover}\) and (Niu et al. 2019) considers QAOA as a quantum state transfer protocol.

3 QAMPA: Quantum alternate “Mixer-Phaser” Ansatz

We now introduce the “Quantum Alternate Mixer-Phaser Ansatz” (QAMPA) unitary, which is the ordered product of two-qubit gates:

$$\begin{array}{@{}rcl@{}} {\tilde U}(\gamma_{p},\beta_{p})&\equiv& \left( {\prod}_{n,m}{U}_{MP}^{nm}(\gamma_{p},\beta_{p}) \right) \end{array}$$
(5)

Here, \({U}_{MP}^{nm}(\gamma _{p},\beta _{p})\) is a two-qubit operation between qubits n and m that is parameterized by two angles \(\gamma _{p}, \beta _{p} \in \mathbb {R}\). The subscript p refers to the p th round. We refer to this operation as a “mixer-phaser” (MP) operation in that it optionally implements mixing and phase separation depending on the value of the two independent parameters. A possible choice for the mixer-phase operation is simplyFootnote 3:

$${U}_{MP}^{nm}(\gamma_{p},\beta_{p})={U}_{M}^{nm}(\beta_{p}){U}_{PS}^{nm}(\gamma_{p}).$$
(6)

We will focus on this choice for the analysis in this paper.

Note that

$$\begin{array}{@{}rcl@{}} {U}_{MP}^{nm}(\gamma_{p},0)&=&{U}_{PS}^{nm}(\gamma_{p})\\ {U}_{MP}^{nm}(0,\beta_{p})&=&{U}_{M}^{nm}(\beta_{p}). \end{array}$$
(7)

In other words, Eq. 7 show that QAOA with 2p parameters \(\gamma _{1},\dots , \gamma _{p}\), \(\beta _{1},\dots , \beta _{p}\) can be mapped to QAMPA with 4p parameters if β2k+ 1 = 0 and γ2k = 0 for \(k=0{\dots } p\) (and the non-zero parameters are identified in sequence). However, for the same number of layers the algorithm has double the parameters. It is not clear a priori how QAOA and QAMPA would perform relative to each other when compared at fixed, equal number of parameters. In the next subsection we will investigate this question numerically for a specific illustrative problem.

3.1 Application to binary optimization with cardinality constraints

Let’s consider a special case of an integer program, a quadratic binary optimization problem with N variables, with a constraint that restricts the feasible subspace to bitstrings with a certain fixed Hamming weight κ. Mapping bits (0, 1) into spin variables (-1, + 1), this results in the following Ising cost Hamiltonian and constraints:

$$\begin{array}{@{}rcl@{}} H_{C}&=&{\sum}_{nm}J_{nm}Z_{n} Z_{m}+{\sum}_{n} h_{n} Z_{n} \end{array}$$
(8)
$$\begin{array}{@{}rcl@{}} \text{subject to}&&{\sum}_{n} Z_{n} = 2(\kappa - N/2). \end{array}$$
(9)

For hn = 0, this problem is a weighted \(\mathtt{MaxCut}\) with given sizes of partitions (Ageev and Sviridenko 1999) (\(\mathtt{WeightedMaxCutGSP}\)). It can also be seen as a Markowitzian portfolio optimization problem (Markowitz 1952) where the task is to select the best performing κ of assets in a pool assuming correlations between their performance indicators. Note that while for small κ the problem is clearly tractable, for the case κ = N/2 the problem turns into the NP-Hard \(\mathtt{GraphBisection}\), which has been mapped to QAOA and studied numerically in the unweighted case where Jnm are either 0 or 1 using a sparse XY-mixer in (Wilson et al. 2021).

Following the literature, we construct the QAOA and QAMPA gates using XY mixers (Hadfield et al. 2019; Wang et al. 2020) as:

$$\begin{array}{@{}rcl@{}} {U}_{PS}^{nm}(\gamma)={U}_{ZZ}^{nm}(\gamma)&=&e^{i\gamma\left( J_{nm}Z_{n} Z_{m}+h_{n}Z_{n}\right)}\\ {U}_{M}^{nm}(\beta)={U}_{XY}^{nm}(\beta)&=&e^{i\beta\left( X_{n} X_{m}+Y_{n} Y_{m}\right)}\\ {\tilde{U}}_{MP}^{nm}(\gamma,\beta)&=&e^{i\left[\beta\left( X_{n} X_{m}+Y_{n} Y_{m}\right)+\gamma(J_{nm}Z_{n} Z_{m}+h_{n}Z_{n})\right]}. \end{array}$$
(10)

The initial state could be taken to be an equal superposition of all solutions with κ variables set to 1 on the qubit registers. i.e. a Dicke state (Bärtschi and Eidenbenz 2019). By observing the periodicity of the unitaries composing the ansätze, we can observe that if the possible values of the coefficients are commensurable, the angle parameters could be selected within the domains \(\gamma \in [0,2 \pi / \min \limits _{>0} (|h_{n}|, |J_{nm}|)]\) and β ∈ [0,π] without loss of generality.

3.2 Synthesis and routing

Efficiently compiling a quantum circuit such as Eq. 4 to a real quantum processors, having pre-defined calibrated two-qubit gates active on a sparse subset of all possible pairs of qubits, is a non-trivial planning and scheduling problem (Venturelli et al. 2018). We would like to estimate the advantage of using QAMPA versus QAOA in common implementation scenarios. For \(\mathtt{WeightedMaxCutGSP}\), the required UZZ gates to implement the objective function are the N(N − 1)/2 edges of a fully-connected graph. The mixer that is responsible for the exploration step of the algorithm by keeping the constraint Eq. 9 in check is also ideally a complete mixer, since it is proven numerically to be the best choice for Hamming weight constraints (Wang et al. 2020; Cook et al. 2020; Bärtschi and Eidenbenz 2020). Choosing a mixer with sparser connectivity between various terms might lead to shorter circuits, but the compilation advantage of using QAMPA versus QAOA is maximal if we use the same graph for both phase-separation and mixing operations.

We should note that the initialization choice Eq. 2 (e.g. the creation of the Dicke state |ψ0〉, which requires in principle \({O}(\kappa N)\) gates (Bärtschi and Eidenbenz 2019)), while being the simplest to analyze and possibly the most advantageous based in prior studies of similar problems (Wang et al. 2020), might be impractical in the near-term. As discussed in (Hadfield et al. 2019; Egger et al. 2021), the initialization procedure could be possibly substituted by a simpler to realize superposition of feasible states or a classical warm start candidate followed by a mixing round in QAOA. In QAMPA, the first round contains mixing so initialization might come for free if the gate is appropriately synthesized. Hence, initialization is not a concern for the discussion around compilation efficiency.

The routing requirements to schedule gates between all possible pairs of qubits depend on the underlying topology where swap operations can be performed. For a linear device, it was shown in Kivlichan et al. (2018) that the most efficient swap network allowing the scheduling of the gates could be executed with maximum parallelization in N steps. For a more connected topology, the linear result is still a worst case scenario which can be implemented by defining an arbitrary Hamiltonian path on the device graph.

As shown in Fig. 1-(a) for an illustratory N = 4 case, the linear efficient compilation of QAOA p = 1 is intertwined with a swap network both for the phase-separation layer (blue box) and for the mixing layer (red box). The routing overhead in this case is a total of p(N − 1)2 \(\mathtt{SWAP}\) gates increasing the depth of about 2p(N − 1) if the \(\mathtt{SWAP}\) gates are not simplified or optimized in synthesis. For QAMPA instead, as shown in Fig. 1-(b), a single swap network is required for the mixing and phase separation layer, resulting in a clear advantage for circuit depth for the same number of parameters.

Fig. 1
figure 1

A pictorial view of the compiled circuits of (a) QAOA and (b) QAMPA for p = 1 and a swap network that compiles all possible pairs of interactions efficiently on a line. (c) shows the canonical decomposition of the mixer-phaser gate with additional swapping \(\mathtt{SWAP}\mathcal{U}_{MP}^{nm}(\gamma,\beta)\) into 3 CNOTs plus single qubit rotations. (d) the same gate decomposed using XY and ZZ as native gates. (Jnm = 1 for simplicity in all figures)

The optimal synthesis of logical gates depends on the available native operations on the quantum processor. For the sake of illustration, suppose to have access to the common set consisting of CNOT gates and parameterized single qubit rotations about X, Y and Z. This set is universal and admits optimal synthesis formulas for any two-qubit gate utilizing at most 3 CNOTs and 15 single-qubit rotations (Vatan and Williams 2004). Figure 1-(c) illustrates the optimal synthesis of \(\mathtt{SWAP}\mathcal{U}_{MP}^{nm}(\gamma,\beta)\) in terms of the canonical known decomposition. Similar synthesis can be derived for \(\mathtt{SWAP}\mathcal{U}_{ZZ}^{nm}\) and \(\mathtt{SWAP}\mathcal{U}_{XY}^{nm}\), showing that for the pictured case there should be a factor of 2 between the resulting depths of the two ansätze. A different set of hardware primitives might increase or reduce the advantage, for instance if the fsim gate (Harrigan et al. 2021) or the XY gates are available natively then the swap could be subsumed in a renormalization of the angles, as illustrated in Fig. 1-(d) where we pictorialize the identity: \(\mathtt{SWAP}\mathcal{U}_{MP}^{nm}(\gamma,\beta)=\exp(i\pi/4)\mathcal{U}_{ZZ}^{nm}(\gamma+\pi/2)\mathcal{U}_{XY}^{nm}(\beta+\pi)\).

If the underlying connectivity on the hardware is all-to-all, a swap network is not required. Still, for fixed number of angles it could still be depth-advantageous Footnote 4 to run QAMPA if the sum of the depth required for the synthesis of both the \({U}_{PS}^{nm}\) and \({U}_{M}^{nm}\) is larger than the depth required to synthesize \({U}_{MP}^{nm}\), which is almost always the case if the QAOA gates are not natively available.

4 Numerical evaluation

We benchmark the algorithms by numerically simulating the circuits for 40 random fully-connected \(\mathtt{WeightedMaxCutGSP}\) problems where Jij ∈{− 1,− 0.5,0.5,1} (chosen uniformly) and hi = 0, for even \(N={4,6,\dots ,16}\) and for κ = N/2 (representing the largest search space, \(| {F}|\simeq 2^{N}/\sqrt {N}\)). For simplicity, we set linear Zeeman terms hj = 0, since it is mostly inconsequential from the perspective of the compilation overhead. While the order of the execution of \({U}_{ZZ}^{nm}\) gates does not matter, different orderings of the \({U}_{XY}^{nm}\) and \({U}_{MP}^{nm}\) are not equivalent. We consider that all runs related to a given instance are performed with a random permutation of the gates chosen among the sequences that are allowing maximum parallelization and minimum depth when intertwined with a swap network.

4.1 Performance metric and parameter setting

We want to evaluate the optimization performance of QAOA and QAMPA in terms of their value as a \(\mathtt{WeightedMaxCutGSP}\) solver. The metric we use is related to a goal of running the algorithm to discover as quickly as possible a bitstring associated to a sufficiently good value (as determined by a pre-defined quality threshold) of the objective function. To address this goal, our target performance metric will be the expected value of the objective function when the best result in R runs is selected (Kim et al. 2019):

$$\langle \text{BEST}_{R}\rangle = \sum\limits_{k\in{F}} \epsilon_{k} \left[\left( 1- F(\epsilon_{k-1})\right)^{R} - \left( 1-F(\epsilon_{k})\right)^{R} \right],$$
(11)

where |k〉 is a feasible state whose normalized objective function value is:

$$\epsilon_{k}=\frac{\langle k| H_{C} |k \rangle-\epsilon_{0}}{\epsilon^{\star}-\epsilon_{0}}.$$
(12)

with 𝜖0, 𝜖 being respectively the minimum and the maximum of the objective function spectrum of values. \(F(\epsilon _{k})={\sum }_{p<k}|\langle \psi _{F}(\vec \gamma ,\vec \beta ) | p \rangle |^{2}\) is the cumulative distribution function of 𝜖k, where all sum over states are meant in ascending order 𝜖k. It could be straightforwardly computed by accessing the wavefunction of the final state, or its sampling statistics. This metrics, beside being the most relevant for optimization purposes, inherits the advantage discussed in the context of other metrics that are more focused on the high-quality solutions portion of the probability, such as the Conditional Value at Risk (CVaR) (Barkoutsos et al. 2020; Díez-Valle et al. 2021) or Gibbs averages (Li et al. 2020), which are suspected to have some desirable “trainability” properties to guide parameter setting, as opposed to the more traditional \(\langle \psi _{F}(\vec \gamma ,\vec \beta ) | H_{C} | \psi _{F}(\vec \gamma ,\vec \beta )\rangle\) (McClean et al. 2018) (which is simply 〈BEST1〉). For illustration, we will work with 〈BEST5〉, since R = 5 seems to be a reasonable value to use to reach good approximation ratios for the moderate sizes of problems that we are studying, as we will demonstrate empirically ex-post (Fig. 2).

Fig. 2
figure 2

Illustrative \(\mathtt{scanlast}\) parameter setting procedure results for γ (left column) and β (right column) illustrated for an ensemble of 40 N = 16 \(\mathtt{WeightedMaxCutGSP}\) random instances for the procedure stopped at p = 10. The colored heatmap reflects visually an interpolation of the probability density function for the value of top 10% best performing angles (normalized in the range 0 − 2π). The actual values for each instance at each iteration are marked as black dots. Blue (red) marks are the best found γ (β) for the specific instance that returned the lowest metric score

The parameter setting strategy of choice for all experiments in this paper (which we call \(\mathtt{scanlast}\), for easiness of reference - see Fig. 3) follows a layerwise constructive optimization protocol employing an external blackbox optimizer that guides the repeated execution of the quantum circuit. The protocol aims to identify good angles for both QAOA and QAMPA at level p + 1 using the information of the best found at level p. It starts with a random generation of W0 pairs of angles that are then used as an initialization for each run of the optimizer. The q best found results (\(\gamma _{1}^{\star }(q)\) and \(\beta _{1}^{\star }(q)\)) are then going to be each seeding the runs at p = 2. More precisely, the p = 2 runs will be each seeded by q batches of W runs each of the form \((\gamma _{1}^{\star }(q), \beta _{1}^{\star }(q), \gamma _{2}, \beta _{2})\) where the last two angles are chosen randomly. Note that the best initial angles from the previous layer (e.g., \(\gamma _{1}^{*}\) and \(\beta _{1}^{*}\)) are allowed to vary when optimizing the next layer. The full layerwise procedure is applying this rule recursively: at layer p + 1 we would launch q searches where each run would initialize the optimizer with \((\gamma _{1}^{\star }(q), \beta _{1}^{\star }(q), \dots , \gamma _{p}^{\star }(q), \beta _{p}^{\star }(q), \gamma _{p+1}, \beta _{p+1})\) for a total complexity of \({O}((W_{0} + pqW)f_{opt})\) where fopt is the number of function evaluations used per optimization attempt.

Fig. 3
figure 3

The \(\mathtt{scanlast}\) protocol. Note that at each iteration the two combinations of angles {γp= 0, βp= 0} as well as {γp=γp− 1, βp=βp− 1} are included in each iteration

For most of our tests, we choose W0 = 50, q = 10, W = 250 and fopt = 250, for a total number of runs of 625000 + 2500p per test. Moreover we decide to use Powell’s method for the external optimization loop. While this method is not expected to be a suitable choice for large number of variables, in our problem set it outperformed several other methods used in the literature (e.g. BFGS, Model Gradient Descent, and Nelder-Mead) and provided the closest results to bruteforce search. In Fig. 2 we illustrate the results of the \(\mathtt{scanlast}\) procedure in practice, showing the 10% best-found QAOA/QAMPA for each instance (black dots) that maximize 〈BEST5〉 for 40 instances with N = 16 variables. The displayed results illustrate clearly that for a given instance the best-found phase separation \(\gamma _{p}^{\star }\) and mixing angles \(\beta _{p}^{\star }\) for QAOA and QAMPA are similar, at least for low value of p.

Indeed, by analyzing data from random instances of sizes we observe that the first angles of the sequence converge very often to a value of constant magnitude, which is often close to 0. The time-reversal symmetry is manifest in the heatmap since the solutions for which all the angles are negated are equivalent, and the choice of one over another is likely associated to the randomness in \(\mathtt{scanlast}\). For the mixing angle the concentration of best performing angles is apparent for almost all tested instances, while for the γ angles at p > 6 the concentration is not as striking. This could be explained by the fact that the value of the 〈BEST5〉 metric is already very close to the maximum, so the optimization landscape could be rather flat and difficult to numerically optimize.

4.2 Instance-by-instance comparison

To compare the performance of the two ansatz we consider the instance-by-instance scatter plot where we compare the 〈BEST5〉 results for QAOA vs QAMPA after the \(\mathtt{scanlast}\) optimization.Footnote 5 The figure indicates with cross symbols the metric value for each instance and with the round dots the median/standard deviation of the results for all the instance results associated to a given p. What is clear from the results shown in Fig. 4 is that as p increases both ansätze perform essentially the same for all tested sizes. This convergence is not surprising as for large p QAOA and QAMPA for the same number of parameters could be interpreted as a Trotter-like approximation of the same unitary evolution (Childs et al. 2021). Similar results are observed also for different R in the metric (including the common expectation value metric R = 1), indicating that the output distribution for the 𝜖k at the current sizes is smooth and monotonic.

Fig. 4
figure 4

Results obtained for the 〈BEST5〉 comparing instance-by-instance QAOA vs QAMPA. (bottom) N = 16 as a function of p (darker marks indicate higher p = 6-9)

4.2.1 Additional tests and variations of QAMPA/QAOA

The approach that we benchmarked in this study offers some flexibility on its implementation. For instance, given that QAMPA is not fully grounded on insights from a specific Hamiltonian evolution, it might be worth asking whether the information related to the coefficient of objective function (Jnm) are beneficial as inserted in the circuit or if it is sufficient to train the parameters using that information, like done for VQE hardware-efficient ansätze (Kandala et al. 2017). We investigated empirically these questions by comparing instance-by-instance results for QAOA and QAMPA, using \(\mathtt{scanlast}\) on a set of instances for N = 10 with slightly modified parameters (fopt = 200p). The results are statistically in line with the ones presented for N = 16 in Fig. 4 using higher computational effort.

In Fig. 5-(a), (b) we show results for algorithms against a version where for the circuit ansatz all Jnm are fixed to be 1 (we call this the -noJ version in the figures).Footnote 6 Note that the cost function coefficients are still used in the evaluation metric both for the parameter setting and for scoring the performance. Another test we performed, illustrated in Fig. 5-(c), (d), compares the QAMPA performance against the performance of an ansatz that includes only XY gates. For the XY-noJ variation case, the phase separation step of QAOA is completely eliminated and the phase information is entirely absent. The XY label in the figure considers instead the design of an XY-only ansatz that mixes qubits using an angle proportional to the cost function coefficients, i.e. using gates of the form \({U}^{nm}_{XY}(J_{nm} \beta )\).

Fig. 5
figure 5

N = 10 QAOA vs QAMPA instance-by instance comparison (left) and Ansatz Variations (a)-(d). All results are after the \(\mathtt{scanlast}\) parameter setting. QAOA-noJ in (a) is constructed with the phase-separation gates \({U}_{PS}^{nm}(\gamma ,\beta )|_{J_{nm}=1}\), the QAMPA-noJ ansatz in (b) is constructed with the mixer-phaser gates \({\tilde {U}}_{MP}^{nm}(\gamma ,\beta )|_{J_{nm}=1}\), and the XY-noJ ansatz in (d) uses \({\tilde {U}}_{XY}^{nm}(\gamma ,\beta )|_{J_{nm}=1}\). In (c) it is shown that the simplified circuit with γp = 0 (XY) is not performing well against other options

What is observed is that the standard QAOA approach (which performs only slightly better than QAMPA, as we recall) gives the best performance compared against all other tested variations, and it is always beneficial to include a proportional factor multiplying γp, for each gate between qubits n and m, corresponding to the objective function coefficients Jnm in the circuit ansatz. These observations were validated for multiple problem sizes up to N = 16.

5 Discussion and conclusions

As reviewed in Cerezo et al. (2020) and Bharti et al. (2021), modern quantum algorithms for optimization on NISQ devices have generalized significantly the original structure of the QAOA circuitry. In this paper, we have presented a variation that combines the hardware-efficient spirit of Variational Quantum Eigensolvers with the advanced mixers of the Quantum Alternating Operator Ansatz and the guidance from inclusion of operators derived from the cost function without increasing the number of parameters that need to be optimized. Our numerical results indicate that the mixer-phaser ansatz QAMPA is a compelling choice among NISQ era quantum options; we expect these ideas can be ported beyond the \(\mathtt{WeightedMaxCutGSP}\) problem, to provide compilation advantages to a multitude of other hard-constrained combinatorial optimization problems that require advanced mixers. Multiple questions however remain in order to bridge the gap from proof-of-principles to real-world implementation. We list here few avenues of research towards the goal of deploying a QAMPA solver in a quantum processor, before concluding with some more general thoughts on research directions.

First, the parameter setting procedure used in this study could be refined to avoid optimization bottlenecks and limitations that affect all layerwise training protocols at scale. In particular, the Powell method while effective at small N would eventually become intractable, and alternative gradient methods might be hampered by barren plateaus if the parameter setting protocol is kept to be “layerwise” (Campos et al. 2021). Recently developed analytical methods based on series expansion (Hadfield et al. 2021) or quantum control (Larocca et al. 2021) methods might come in handy to analyze further the reachability deficits and strengths of these algorithms (Akshay et al. 2020), study the observed optimal parameter concentration (Akshay et al. 2021), and to estimate the performance of QAMPA at scale. Ties between quantum annealing schedule and QAOA parameter setting (Yang et al. 2017; Zhou et al. 2020; Brady et al. 2021a), further indicating that cross-overs between digital and analog optimization methods are also an interesting possible development for QAMPA (Magann et al. 2021; Headley et al. 2020; Wiersema et al. 2020; Brady et al. 2021b).

Moreover, our performance evaluation procedure based on the 〈BEST5〉 provides just an indication of the ability of a QAMPA solver to identify a good solution of \(\mathtt{WeightedMaxCutGSP}\) in a reasonable time, and a fully-fledged analysis on comparative advantage of using this method versus other heuristics is required. This analysis has to take into account practical issues such as the effect of noise, which is going to adversely impact our performance estimation. The relative performance results will be both problem and hardware dependent. While theoretical frameworks to estimate the impact of noise in circuits featuring XY gates are being developed (Streif et al. 2021), ultimately only the experimental tests on quantum hardware will be able to provide a full picture of performance, including identifying at what layer p the solver is most effective. While in the ideal case, performance cannot decrease with p, under noise the situation is different; performance is observed to decrease quickly with p in current hardware (Harrigan et al. 2021). QAMPA, with its circuit depth reduction compared to QAOA, will enable empirical studies for higher values of p than QAOA on a wide variety of quantum hardware. Hybridization of QAMPA with adaptive (Zhu et al. 2020) and recursive (Bravyi et al. 2020) hybrid methods may prove powerful. One advantage of the studied problem, and of others with locally conserved particle number constraints, is that provides a natural error mitigation strategy via post-selection (Shaydulin and Galda 2021). Namely, measured bitstrings which do not obey the Hamming weight constraint are discarded. Post-selection has shown to provide significant improvements in experiments on superconducting processors (Google et al 2020) and can be generally applied beneficially to any situation with constraints. Another recent experiment (Hashim et al. 2021) shows that permuting the ordering of the qubits in the SWAP network and averaging over the results is reducing the systematic coherent errors, a technique that could be generalized to our case where the permutations are not equivalents, possibly helping the parameter setting and/or optimization performance.

In terms of actual implementation, the QAMPA method is readily testable in superconducting processors that natively support XY interactions, such as Rigetti’s devices of the Aspen family (Abrams et al. 2020). However, the initialization to a Dicke state is likely too heavy for near-term implementation, so a warm-start from a classical candidate solution obtained by greedy search is advisable (Egger et al. 2021).

We expect researchers to develop other ansätze that have sweet spots in terms of circuit depth, number of parameters, and performance. As QAMPA illustrates, there is nothing sacrosanct about QAOA; variants may perform as well or better, particularly on NISQ hardware. A rich ecosystem of ansätze that take into account hardware architectures, gate sets, and noise considerations for different types of NISQ processors will enable more rapid understanding of quantum optimization approaches for the NISQ era and beyond.