1 Introduction

Quantum computing has emerged as a powerful computation paradigm taking advantage of principles of quantum mechanisms. It involves two computational models that fundamentally differ in their functioning: The adiabatic quantum model is best suited for optimization problems, typically cast in quadratic unconstrained binary optimization form. Commercial devices such as d-wave annealers [1] can already solve combinatorial problems in various fields such as computer vision [2, 3] and database engineering [4, 5]. The universal model of quantum computing, also referred to as the gate-based or circuit model, is more flexible and can potentially implement any classical operation, as shown by Bennett [6]. For selected problems such as Shor’s factoring [7] and Grover’s search [8] algorithms, strong theoretical convergence properties and drastic speed-up of the universal quantum computing model over classical counterparts could be proven. However, for problem sizes of practical interest, these algorithms still require more resources and run-time than existing universal devices can provide. In the era of noisy intermediate-scale devices, it is a challenging task to find real-world applications of universal quantum computing. Hybrid quantum-classical algorithms are therefore considered to be a promising way to obtain practical quantum supremacy.

The Ising model [9] describes a quantum mechanical system with \(n\in \mathbb {N}\) particles or spins that can be in two possible states, i.e., the spin \(s_i, i = 1, \ldots , n\) can be in the state \(\pm 1\) represented by \(s_i = (-1)^{q_i}, q_i \in \left\{ 0, 1\right\} \). Each spin \(s_i\) can interact with some external energy of strength \(\mathcal {C}_{ii}\) or with an adjacent spin \(s_j\) by a mutual interaction energy \(\mathcal {C}_{ij}\). The complete system can be modeled by a general un-directed n vertices graph \( \mathcal {G} = (\mathcal {S}, \mathcal {E}, \mathcal {C}) \) with \( \mathcal {S} = \left\{ s_1, \ldots , s_n\right\} \), \( \mathcal {E} \subseteq \mathcal {S} \times \mathcal {S} \) and a cost function \( \mathcal {C}: \mathcal {E} \rightarrow \mathbb {R}\) with \(\mathcal {C}(s_i, s_j):= \mathcal {C}_{ij}\) on \( \mathcal {E} \). For a quantum system in the state \({| q\rangle } = \bigotimes _{i=1}^n{| q_i\rangle }, q_i \in \left\{ 0, 1\right\} \), the Ising model describes the total energy of the system as being the expectation \(\langle \textbf{C}\rangle := \langle q | \textbf{C} | q\rangle \) of the \((2^n \times 2^n)\) Hamiltonian

$$\begin{aligned} \textbf{C} = \sum _{i=1}^n \mathcal {C}_{ii} \textbf{Z}_i + \sum _{1 \le i < j \le n} \mathcal {C}_{ij} \textbf{Z}_i \textbf{Z}_j, \end{aligned}$$
(1)

where \(\textbf{Z}_k\) denotes the Pauli-\(\textbf{Z}\) operator acting on the kth particle of the system.

The problem is to

(2)

i.e., to search for the state \({| q^\star \rangle }\) of the system with the minimal energy according to the Ising model in Eq. (1). By setting \(\mathcal {C}_{ii} = 0\) for all i, the problem reduces to the so-called weighted maximum cut (maxcut) problem [10].

The observable \(\textbf{C}\) in Eq. (1) is a diagonal matrix and it is straightforward to verify that the expectation \(\langle \textbf{C}\rangle \) is lower-bounded by the smallest eigenvalue \(\mathcal {C}_{min}\) of \(\textbf{C}\). Indeed, measuring \(\textbf{C}\) on a quantum system prepared in the superposition state \({| \psi \rangle } = \sum _{q=0}^{2^n-1} \alpha _q {| q\rangle }\) yields

$$\begin{aligned} \langle \textbf{C}\rangle = \langle \psi | \textbf{C} | \psi \rangle&= \sum _{q=0}^{2^n-1} \left| \alpha _q \right| ^2 \langle q | \textbf{C} | q\rangle \end{aligned}$$
(3)
$$\begin{aligned}&\ge \sum _{q=0}^{2^n-1} \left| \alpha _q \right| ^2 \langle q^\star | \textbf{C} | q^\star \rangle = \mathcal {C}_{min}. \end{aligned}$$
(4)

In other words, under all norm-1 vectors, \(\langle \textbf{C}\rangle \) reaches the minimum exactly for \({| \psi \rangle }\) being an eigenvector of \(\textbf{C}\) to the lowest eigenvalue, a ground state \({| q^\star \rangle }\) of \(\textbf{C}\). Computing \(\langle \textbf{C}\rangle \) for all eigenvectors of \(\textbf{C}\)—which amounts to computing the diagonal elements of \(\textbf{C}\)—is practically hard, as it amounts to brute-forcing the problem. Instead, in hybrid quantum-classical approaches, one aims for an efficient way to search for parameters \(\Theta \in \Gamma \) that solve the variational problem

$$\begin{aligned} \begin{aligned}&\underset{\Theta \in \Gamma }{\text {minimize}}{} & {} \mathcal {L} (\Theta ), \quad \mathcal {L} (\Theta ):= \langle \psi (\Theta ) | \textbf{C} | \psi (\Theta )\rangle , \end{aligned} \end{aligned}$$
(5)

where \(\Gamma \) is a suitable parameterization of the search space.

The Ising spin model is a powerful tool describing ferromagnetism in statistical mechanics [9] as well as many practically relevant np-problems [11]. maxcut is no less important. It finds practical applications in varying fields such as computer vision [12], data clustering [13], and communications network design [14]. Thus, determining solutions of the Ising spin model or the maxcut problem, even approximately, is of great practical interest.

Adiabatic quantum computation (aqc) relies on the adiabatic theorem [15, 16] and solves the Ising problem by performing an adiabatic evolution of the quantum system from the known and easily-prepared ground state of an initial Hamiltonian to that of the problem Hamiltonian. d-wave [1] quantum annealers are intermediate-scale devices implementing aqc. It is an established fact that the run-time of aqc algorithms scales inversely with the energy gap between the ground state and the first excited state of the system Hamiltonian [15,16,17]. This introduces the necessity of carefully designing the system Hamiltonian or to use spectral gap amplification techniques [18]. Another workaround is the conception of (hybrid) gate-based algorithms that use moderate quantum resources with outer-loop classical optimization [19].

Quantum approximate optimization algorithms (qaoa), firstly introduced by Fahri et al. [19] and widely discussed in the literature [20,21,22,23], mimic the aqc model, but use a low-depth variational circuit capable to run on near-term devices and benefiting from the maturity of classical optimization. qaoa is considered as one of the major candidates for dealing with realistic real-world applications at competitive performance using the universal model. However, in practice, optimizing qaoa parameters appears to be extremely difficult due to a non-convex objective [20, 22, 23]. Also, qaoa is threefold iterative and hence computationally expensive: First, the ansatz itself involves repetitive layers of gates; second, the classical optimization routine used around qaoa often needs to be iterative as the objective function is non-convex; third, evaluating this objective function requires a large number of quantum circuit measurements.

This work aims to facilitate combinatorial optimization on universal quantum computers. We first eliminate the repetitive layers of qaoa and propose an alternative, more effective encoding of the problem on universal quantum hardware. We present a new, easy to implement and low-depth variational circuit that block-encodes the maxcut or the Ising Hamiltonian in an Hermitian and unitary operator. Measuring this circuit reveals direct information about the loss function Eq. (5). We then derive an optimization routine based on gradient descent for the proposed variational circuit in order to drive the quantum system toward the ground state of the problem Hamiltonian. We experimentally validate the novel algorithm and find that it outperforms the state-of-the-art gate-based qaoa model for solving maxcut; on the Ising model, it also challenges the specialized d-wave annealers.

This paper is structured as follows: in Sect. 2 we recall how d-wave and qaoa solve combinatorial problems; in Sect. 3, we present the new variational quantum circuit and how its parameters are optimized; Sect. 4 provides experimental results and Sect. 5 concludes the work.

2 Related work

2.1 Adiabatic quantum computing

Adiabatic quantum computation (aqc) is an optimization paradigm which relies on the adiabatic theorem [15, 16] saying that the ground state of a quantum mechanical system is the solution of an optimization problem [24]. Theoretically, the evolution of a quantum system of \(n \in \mathbb {N}\) particles at a time \(t \in [0, T]\), typically re-scaled to \(s=t/T \in [0, 1]\), can be described by a Hamiltonian \(\textbf{H}(s)\) on a Hilbert space \(\mathcal {H} = \mathbb {C}^{\otimes 2^n}\), with the state of the system given by a unit vector \({| \psi (s)\rangle } \in \mathcal {H}\).

Industrial devices such as the d-wave quantum annealer [1] have already been proposed to solve binary combinatorial problems based on the aqc optimization principle. Their main idea consists in initializing a quantum system with a Hamiltonian \(\textbf{B} = \sum _{k=1}^n \textbf{X}_k\) whose ground state \({| +\rangle }^{\otimes n}\) is the perfect superposition state and easy to prepare. As above, \(\textbf{X}_k\) denotes the Pauli-\(\textbf{X}\) operator acting on the kth particle of the system. Then, another Hamiltonian \(\textbf{C}\), the problem Hamiltonian, is prepared as in Eq. (1). As the time s evolves from 0 to 1, the initial Hamiltonian \(\textbf{B}\) is transformed into the problem Hamiltonian \(\textbf{C}\), describing a time-dependent system Hamiltonian

$$\begin{aligned} \textbf{H}(s) = \mathcal {B}(s) \cdot \textbf{B} + \mathcal {C}(s) \cdot \textbf{C}, \end{aligned}$$
(6)

where \(\mathcal {B}\) and \(\mathcal {C}\) with \(\lim _{s\rightarrow 1}\mathcal {B}(s) = 0\) and \(\lim _{s\rightarrow 1}\mathcal {C}(s) = 1\) are annealing functions. The evolution of the system generated by \(\textbf{H}(s)\) over the time \(s \in [0, 1]\) is governed by the Schrödinger equation; its solution defines a time-dependent unitary operator that transforms the ground state of \(\textbf{B}\) into the ground state of \(\textbf{C}\) with high probability if s varies sufficiently slowly [16]. The ground state of \(\textbf{C}\) is the solution of the optimization problem (2).

While being an efficient scheme for combinatorial optimization that has the potential to ultimately supercede classical computers, aqc has a caveat. It is known that the smaller the energy gap between the ground state and the first excited state of the adiabatic Hamiltonian \(\textbf{H}(s)\), the longer the required annealing time for guaranteeing the success of the optimization [15,16,17]. To overcome this, methods for universal quantum computers that take advantage of efficient classical optimization techniques have been proposed in the form of quantum approximate optimization algorithms [19].

2.2 Quantum approximate optimization algorithms

In quantum approximate optimization algorithms (qaoa, [19]), the problem (2) is relaxed to finding a state \( {| \psi \rangle } = \sum _{q} \alpha _q {| q\rangle } \) such that \(\langle \textbf{C}\rangle \) is minimized. qaoa solves the maxcut problem by trotterizing the evolution generated by Eq. (6). Its pseudo-code is recapitulated in Algorithm 1: First, the system is prepared in the perfect superposition state \({| +\rangle }^{\otimes n}\). Then, trotterization consists in repeatedly, say p times, applying to the state the unitaries \( \textbf{U}(\textbf{C}, \gamma _k):= \exp (-i \gamma _k \textbf{C}) \) and \( \textbf{U}(\textbf{B}, \beta _k):= \exp (-i \beta _k \textbf{B}) \) generated by the problem Hamiltonian \(\textbf{C}\) and the mixing Hamiltonian \(\textbf{B}\). The (small) time steps \(\gamma _k, \beta _k \in \mathbb {R}\), \(1\le k\le p\), are optimization parameters in qaoa. The resulting state is called \( {| \gamma , \beta \rangle } \):

$$\begin{aligned} {| \gamma , \beta \rangle }:= \textbf{U}(\textbf{B}, \beta _p)\textbf{U}(\textbf{C}, \gamma _p) \cdots \textbf{U}(\textbf{B}, \beta _1)\textbf{U}(\textbf{C}, \gamma _1){| +\rangle }^{\otimes n}. \end{aligned}$$
(7)

Finally, the state \( {| \gamma , \beta \rangle } \) is measured in the computational basis and is used to evaluate the cost function \(\langle \textbf{C}\rangle := \langle \gamma , \beta |\textbf{C}|\gamma , \beta \rangle \) and if necessary its derivatives \(\nabla _{(\gamma , \beta )}\langle \textbf{C}\rangle \) and \(\nabla _{(\gamma , \beta )}^2\langle \textbf{C}\rangle \). A classical optimization routine is used to update the parameters \( (\gamma , \beta ):= (\gamma _1, \ldots , \gamma _p, \beta _1, \ldots , \beta _p) \).

figure a

Quantum Approximate Optimization Algorithms [19]

For \(p \rightarrow \infty \), the results of [19] guarantee that there exist parameters \( (\gamma , \beta ) \) for which measuring \( {| \gamma , \beta \rangle } \) gives the desired ground state \({| q^\star \rangle }\) with high probability. However, the qaoa objective is difficult to optimize [20, 22, 23]. We believe that this is partially due to the fact that the qaoa ansatz encodes problem information in the argument of \(\exp \) as phases of the qubits, which is partially lost at the measurement. Another issue of qaoa is that its repetitive layers are still too expensive for running on current and near-term devices. In this work, we propose a quantum circuit that encodes the problem more effectively and does not require the repetitive layers of qaoa. Adhering to the promising concept of designing hybrid quantum algorithms, we embed the circuit in a classical optimization method to output the desired ground state \({| q^\star \rangle }\) with high probability.

3 Proposed universal quantum algorithm

Our method builds on the notion of block-encoding introduced in [25, 26] that allows to embed non-unitary matrices as the principal block of a unitary operator acting on the quantum system. Block-encoding is typically achieved by enlarging the Hilbert space of the quantum system.

Definition 1

(Block-encoding [25, 26]) Let \(a,n \in \mathbb {N}\) and \(m:=a+n\). The \(m\times m\) unitary \(\textbf{U}\) is said to be a block-encoding of the \(n\times n\) matrix \(\textbf{C}\) if there is \(\kappa \in (0,\infty )\) such that

$$\begin{aligned} \kappa \textbf{C} = \left[ \langle 0|^{\otimes a} \otimes \textbf{I}_n\right] \textbf{U} \left[ {| 0\rangle }^{\otimes a} \otimes \textbf{I}_n\right] . \end{aligned}$$
(8)

By adding a single qubit to the system, our goal is to implement the \((2^{1+n}) \times (2^{1+n})\) unitary operator

$$\begin{aligned} \textbf{U}:= \textbf{U}(\textbf{C}, K):= \left[ \begin{array}{ll} \sin (\hat{\textbf{C}}) &{} \cos (\hat{\textbf{C}})\\ \cos (\hat{\textbf{C}}) &{} -\sin (\hat{\textbf{C}}) \end{array}\right] ,\quad \hat{\textbf{C}}:= \textbf{C} / K, \end{aligned}$$
(9)

for the Ising Hamiltonian \(\textbf{C}\) from Eq. (1) and a suitably chosen constant \( K \in \mathbb {R}\). As \(\textbf{C}\) is a diagonal matrix, the \(\sin \) and \(\cos \) functions directly apply to the diagonal elements [27, Section 2.1.8]. This allows to encode information about the problem as probability amplitudes of the qubits. Note that because \(\textbf{C}\) is Hermitian, \(\textbf{U}\) is Hermitian as well and thus can serve as measurement observable. We use the constant K to re-scale all entries of \(\textbf{C}\) to \([-\pi / 2, \pi / 2]\), where \(\sin \) is strictly increasing and invertible. Specially, Eq. (9) block-encodes a bijective transformation of \(\textbf{C}\). For reasons that will become clear in Sect. 3.1, this re-scaling also allows for an efficient implementation of \(\textbf{U}\). We stress that K should not be set to large since \(\lim _{K \rightarrow \infty } \textbf{U}(\textbf{C}, K) = \textbf{X}^{\otimes (1+n)}\), i.e., for large K the operator \(\textbf{U}\) behaves like a not-gate. In Sect. 3.2, we provide an optimization routine for the proposed circuit, we discuss its scalability and complexity in Sect. 3.3, and lastly we provide a suitable choice for K in Sect. 3.4.

3.1 Implementation

For a system prepared in the basis state \({| q\rangle }\), the cost of the cut q according to the Ising model in Eq. (1) is given by

$$\begin{aligned} \langle \textbf{C}\rangle = \sum _{i=1}^n (-1)^{q_i} \mathcal {C}_{ii} + \sum _{1 \le i < j \le n} (-1)^{q_i + q_j} \mathcal {C}_{ij}. \end{aligned}$$
(10)

Crucially, for a \((1+n)\)-qubit system prepared in the state \({| \hat{\psi }, \psi \rangle }:= {| \hat{\psi }\rangle }_c \otimes {| \psi \rangle }\), where \({| \hat{\psi }\rangle }_c = \alpha {| 0\rangle }_c + \beta {| 1\rangle }_c\) is a 1-qubit register that we call the cost qubit and \({| \psi \rangle }\) is the n-qubit working register, if the cost qubit \({| \hat{\psi }\rangle }_c\) is kept in the state \({| 0\rangle }_c\), it holds for \(\textbf{U}\) defined in Eq. (9) that

$$\begin{aligned} \langle \textbf{U}\rangle = \langle \psi |\sin \left[ \dfrac{\textbf{C}}{K}\right] |\psi \rangle . \end{aligned}$$
(11)

Thus, if K is chosen such that the entries of the scaled diagonal matrix \(\frac{\textbf{C}}{K}\) fit inside the monotone region of the sine, then minimizing \(\langle \textbf{U}\rangle \) with the cost qubit set to zero is equivalent to minimizing \(\langle \textbf{C}\rangle \).

Observe that \(\textbf{U}\) is a block matrix of diagonal matrices and applying it to the basis state \({| 0, q\rangle }\) gives the same result as applying to the cost qubit \({| 0\rangle }_c\) the controlled \((2 \times 2)\)-operator

$$\begin{aligned} \textbf{U}_{2\times 2}:= \left[ \begin{array}{ll} \sin \langle \hat{\textbf{C}}\rangle &{} \cos \langle \hat{\textbf{C}}\rangle \\ \cos \langle \hat{\textbf{C}}\rangle &{} -\sin \langle \hat{\textbf{C}}\rangle \end{array}\right] . \end{aligned}$$
(12)

As derived in “Appendix A,” this operator performs a reflection in the Bloch sphere of the cost qubit about the axis \(\vec n = (1/2) \left[ \begin{array}{lll}\cos \langle \hat{\textbf{C}}\rangle&0&\sin \langle \hat{\textbf{C}}\rangle \end{array}\right] ^\top \). Up to an irrelevant phase factor, \(\textbf{U}\) can be written as

$$\begin{aligned} \textbf{U}(\textbf{C}, K)&\equiv R_y\left[ 2 \arccos (\sin \langle \hat{\textbf{C}}\rangle )\right]&\cdot \textbf{Z} \otimes \textbf{I}^{\otimes n} \nonumber \\&\equiv R_y\left[ \pi - 2 \langle \hat{\textbf{C}}\rangle \right]&\cdot \textbf{Z} \otimes \textbf{I}^{\otimes n} \nonumber \\&\equiv R_y (-2\langle \hat{\textbf{C}}\rangle ) \cdot R_y(\pi )&\cdot \textbf{Z} \otimes \textbf{I}^{\otimes n}. \end{aligned}$$
(13)

The second equation holds because \(\arccos (\sin x) = \pi /2 - x\) for \(x\in [-\pi /2, \pi /2]\). The last equation is obtained by applying the identity \(R_y(\theta _1 + \theta _2) = R_y(\theta _2)\cdot R_y(\theta _1)\) for rotations in two dimensions, where \(\theta _1, \theta _2 \in \mathbb {R}\). By recursively applying this same identity to \(R_y (-2\langle \hat{\textbf{C}}\rangle )\), we find

$$\begin{aligned} \textbf{U}(\textbf{C}, K)&\equiv \ \ \ \prod _{i=1}^n \hspace{.8cm} \textbf{X}^{q_i} \cdot R_y (-2\hat{\mathcal {C}}_{ii}) \cdot \textbf{X}^{q_i} \cdot \nonumber \\&\quad \prod _{1 \le i < j \le n} \textbf{X}^{q_i + q_j} \cdot R_y (-2\hat{\mathcal {C}}_{ij}) \cdot \textbf{X}^{q_i + q_j} \cdot \textbf{X} \otimes \textbf{I}^{\otimes n}. \end{aligned}$$
(14)

As a result, the weighted sum of Pauli-\(\textbf{Z}\) operators naturally translates into a product of unitary transformations, which is very compatible with the gate-based model of quantum computing. As the basis state \({| q\rangle }\) is chosen arbitrarily, \(\langle \textbf{U}\rangle \) outputs sine transformed costs as derived in Eq. (11) for arbitrary states. For a system prepared in the basis state \({| \hat{\psi }, \psi \rangle } = {| 0, q\rangle }\), we can even recover the exact cost by \(\langle \textbf{C}\rangle = K \arcsin \langle \textbf{U}\rangle \).

Fig. 1
figure 1

Implementation of the \((2^{1+n}) \times (2^{1+n})\) operator \( \textbf{U} \). The cost qubit is initialized in the state \({| \hat{\psi }\rangle }_c = \textbf{X}{| 0\rangle }\) and is the target of all operations. Note that this does not contradict the idea of keeping it in the \({| 0\rangle }\)-state, as the operation \(\textbf{X} = R_y(\pi ) \cdot \textbf{Z}\) implements the last part of \(\textbf{U}\) [Eq. (13)]. (Left) For each coupling edge of weight \(\mathcal {C}_{ij}\) between nodes \(q_i\) and \(q_j\), rotate the cost qubit by \(R_y (-2\hat{\mathcal {C}}_{ij})\) if their corresponding working qubits are in the same state, else by \(R_y (2\hat{\mathcal {C}}_{ij}) = \textbf{X} \cdot R_y (-2\hat{\mathcal {C}}_{ij}) \cdot \textbf{X}\). (Right) For each unary edge of weight \(\mathcal {C}_{ii}\) involving node \(q_i\), rotate the cost qubit by \(R_y (-2\hat{\mathcal {C}}_{ii})\) if its corresponding working qubit is in the \({| 0\rangle }\)-state, else by \(R_y (2\hat{\mathcal {C}}_{ii}) = \textbf{X} \cdot R_y (-2\hat{\mathcal {C}}_{ij}) \cdot \textbf{X}\)

The operator \(\textbf{U}\) can be efficiently implemented using the circuit given in Fig. 1:

  • We initialize the cost qubit in the state \({| \hat{\psi }\rangle }_c = R_y(\pi ) \cdot \textbf{Z}{| 0\rangle } = \textbf{X} {| 0\rangle }\).

  • For each weight \(\mathcal {C}_{ij}\) between two nodes \(q_i\) and \(q_j\), we rotate the cost qubit by \(R_y (-2\hat{\mathcal {C}}_{ij})\) if the corresponding working qubits are in the same state, and by \(R_y (2\hat{\mathcal {C}}_{ij}) = \textbf{X} \cdot R_y (-2\hat{\mathcal {C}}_{ij}) \cdot \textbf{X}\) if they are not.

  • For each unary weight \(\mathcal {C}_{ii}\) involving node \(q_i\), we rotate the cost qubit by \(R_y (-2\hat{\mathcal {C}}_{ii})\) if the corresponding working qubit is in the \({| 0\rangle }\)-state, and by \(R_y (2\hat{\mathcal {C}}_{ii}) = \textbf{X} \cdot R_y (-2\hat{\mathcal {C}}_{ii}) \cdot \textbf{X}\) if it is not.

Whenever the unary costs satisfy \(\mathcal {C}_{ii} = 0\) for all i, we refer to Eq. (14) as the Universal Quantum Maximum Cut (uq maxcut) model, else as the Universal Quantum Ising (uqising) model.

3.2 Optimization

3.2.1 Workflow

Fig. 2
figure 2

Complete workflow of our proposed algorithm for the Universal Quantum maxcut and Ising Model (uq maxcut and uqising). First \(\textcircled {{\textbf {1}}}\), the cost (node and edge weight) information for the given graph is used to implement the operator \(\textbf{U}:= \textbf{U}(\textbf{C}, K)\) that block-encodes the problem Hamiltonians. This operator is applied to a trial variational quantum state \({| \psi (\Theta )\rangle }\) tensored with the cost qubit. Second \(\textcircled {{\textbf {2}}}\), using the principle of implicit measurement, the expectation value \(\langle \textbf{U}\rangle \), which equals the cost \(\mathcal {L}(\Theta )\), is approximated by measuring several copies of the circuit. This computed expectation and eventually its gradient are iteratively used in a classical optimization routine that drives the parameterized state \({| \psi (\Theta )\rangle }\) toward the state \({| \psi (\Theta ^\star )\rangle }\) that potentially gives the global minimal cost value. Finally \(\textcircled {{\textbf {3}}}\), the optimal state \({| \psi (\Theta ^\star )\rangle }\) is measured in the computational basis and the most frequently measured state corresponds to the desired optimal cut of the input graph

The overall workflow of our algorithm is presented in Fig. 2. The complete circuit consists of 3 registers: a 1-qubit register containing an ancilla qubit, another 1-qubit register for the cost qubit, and an n-qubit register for the working qubits encoding the variables of the problem.

First, the working qubits are rotated by a set of angles \(\Theta = (\theta _1, \ldots , \theta _n) \in \mathbb {R}^n\), constructing the ansatz

$$\begin{aligned} {| \psi (\Theta )\rangle }:= R_y(\theta _1) \otimes \cdots \otimes R_y(\theta _n) {| \psi \rangle }, \end{aligned}$$
(15)

from a system previously prepared in the state \({| \psi \rangle }\). Here, the qubit \(q_i\), representing the ith node of the graph, is rotated by \(\theta _i\) around the y-axis. Next, a Hadamard sandwich involving a controlled version of the \(\textbf{U}\)-operator is applied to the ancilla qubit. Finally, according to the principle of implicit measurement of quantum computing [27, Section 4.4] stating that all qubits that are not measured at the end of a quantum circuit can be assumed to be measured, only the ancilla qubit is measured, leaving the cost and working qubits in the state \(\textbf{U} {| 0, \psi (\theta )\rangle } = {| 0\rangle }_c \otimes \sin (\hat{\textbf{C}}){| \psi (\theta )\rangle } + {| 1\rangle }_c \otimes \cos (\hat{\textbf{C}}){| \psi (\theta )\rangle }\).

Our goal is to solve the variational problem

$$\begin{aligned} \begin{aligned}&\underset{\Theta \in \mathbb {R}^n}{\text {minimize}}{} & {} \mathcal {L} (\Theta ), \quad \mathcal {L} (\Theta ) = \langle 0, \psi (\Theta ) | \textbf{U} | 0, \psi (\Theta )\rangle , \end{aligned} \end{aligned}$$
(16)

i.e., to find a set of angles \(\Theta ^\star \) such that \({| \psi (\Theta ^\star )\rangle } = {| q^\star \rangle }\). The cost \(\mathcal {L} (\Theta )\) can be efficiently calculated by \(\mathcal {L} (\Theta ) = p(0) - p(1)\), where p(0) and p(1) are the probabilities of measuring the ancilla qubit in the \({| 0\rangle }\) and \({| 1\rangle }\) state, see “Appendix B” for the proof. This allows us to compute the expectation \(\langle \textbf{U}\rangle \) without having to measure \(\langle 0, \psi (\Theta )|\) needed for the scalar product. For the optimization, several approaches [28, 29] have fortunately been developed for evaluating or approximating gradients and improving optimization on quantum computers.

3.2.2 Parameter shift rule

The parameter shift rule [28] is a simple but exact method for evaluating the analytical gradient of a function given in the form of an expectation as in Eq. (16) on quantum hardware. For computing the partial derivative with respect to the parameter \(\theta _i\), it uses two function evaluations with the parameter \(\theta _i\) shifted by \(\pm \pi /2\):

$$\begin{aligned} \dfrac{\partial }{\partial \theta _i} \mathcal {L} (\Theta ) = \dfrac{1}{2} \bigg [ \mathcal {L} (\Theta _{i+}) - \mathcal {L} (\Theta _{i-})\bigg ], \end{aligned}$$
(17)

where \(\Theta _{i\pm }:= (\theta _1, \ldots , \theta _{i-1}, \theta _i \pm \pi /2, \theta _{i+1}, \ldots , \theta _n)\). The parameter shift rule uses the same circuit as the function evaluation, but allows to compute the exact gradient. Its drawback is that it requires 2n function evaluations to compute the gradient of \(\mathcal {L}\).

3.2.3 Update rule

The circuits (uq maxcut, uqising) are optimized by normalized gradient descent with decreasing step size. Normalized gradient descent has recently been proposed by Suzuki et al. [30] as a powerful optimization strategy for variational quantum algorithms. Specifically, in [30] it is demonstrated experimentally that normalized gradient steps are more effective in escaping non-global minima than gradient steps. It is also known [31] that normalized gradient descent evades saddle points. At each iteration k, our update rule reads

$$\begin{aligned} \Theta ^{(k)} \leftarrow \Theta ^{(k-1)} - \biggl [ \dfrac{\pi n}{2}\biggr ]^{1/2} \exp \Biggl [-\dfrac{4k^2}{k_{max}^2}\Biggr ] \cdot \dfrac{\nabla _\Theta \mathcal {L}(\theta ^{(k-1)})}{\Vert \nabla _\Theta \mathcal {L}(\theta ^{(k-1)}) \Vert _2}. \end{aligned}$$
(18)

The design of the update rule Eq. (18) is motivated by the following consideration: We know that we produce bit-strings by either flipping \({| q_i\rangle }\) or not, thus \(\theta _i^\star = \ell \pi , \ell \in \mathbb {Z}\). At iteration \(k=0\), the update rule Eq. (18) allows each \(\theta _i\) to get updated to \(\theta _i^{1} \leftarrow \theta _i^0 \pm \pi \cdot g_i/2\), where \( g_i \in [-1, 1]\) is the normalized contribution of \(\theta _i^0\) in the loss \(\mathcal {L} (\Theta )\). Subsequently, we let the step size decay exponentially to zero when approaching the maximum number of iterations \(k_{max}\). Given the noisy nature of quantum measurement, it is helpful that this frees us from the difficult task of determining a stopping condition for the algorithm.

3.3 Scalability and computational complexity

For a given graph \( \mathcal {G} = (\mathcal {S}, \mathcal {E}, \mathcal {C}) \), the circuit construction presented in Fig. 1 requires at most one not-gate, \(|\mathcal {E}|\) single-qubit rotation gates and \(4|\mathcal {E}| - 2|\mathcal {S}|\) cnot gates. Note that since the edges can be treated in arbitrary order, two consecutive cnot gates that have the same control qubit cancel each other out as their product is the identity, further reducing the number of required cnot gates.

The controlled-\(\textbf{U}(\textbf{C}, K)\) gate appearing in Fig. 2 and conditioned by the ancilla qubit \({| \cdot \rangle }_a\) can be fully decomposed into single qubit rotations and cnot gates without using any Toffoli gates. To see this, note that it can be expressed as

$$\begin{aligned} \text {controlled-}\textbf{U}(\textbf{C}, K)&= \textbf{I} \otimes \textbf{U}(\textbf{C}, K)^{a}\end{aligned}$$
(19)
$$\begin{aligned}&\equiv \textbf{I} \otimes \ \prod _{i=1}^n \hspace{.8cm} \textbf{X}^{q_i} \cdot [R_y (-2\hat{\mathcal {C}}_{ii})]^a \cdot \textbf{X}^{q_i} \cdot \nonumber \\&\quad \quad \prod _{1 \le i < j \le n} \textbf{X}^{q_i + q_j} \cdot [R_y (-2\hat{\mathcal {C}}_{ij})]^a \cdot \textbf{X}^{q_i + q_j} \cdot \textbf{X} \otimes \textbf{I}^{\otimes n}. \end{aligned}$$
(20)

In particular, in order to control the complete \(\textbf{U}(\textbf{C}, K)\) gate, it suffices to control only the rotation gate \(R_y\).

Finally, the controlled rotation itself can be decomposed into two cnot gates and two single qubit rotations as

$$\begin{aligned}{}[R_y (\theta )]^a = \textbf{X}^a \cdot R_y (-\theta /2) \cdot \textbf{X}^a \cdot R_y (\theta /2). \end{aligned}$$
(21)

Table 1 recapitulates the main differences between uqising and a conventional qaoa of depth p. It shows that for \(p \ge 3\), qaoa requires more quantum resources than our uqising model. Further, physically mapping the qaoa ansatz onto the quantum hardware has to take into account a graph-dependent qubit connectivity, while our method, independently of the input graph, requires that one qubit (the cost qubit) is connected to all other qubits (ancilla and working qubits).

Table 1 Resources and complexity comparison of a depth p conventional qaoa and our uqising model on a graph \( \mathcal {G} = (\mathcal {S}, \mathcal {E}, \mathcal {C}) \)

3.4 Impact of the K-rescaling

Fig. 3
figure 3

Transforming the true costs \(\textrm{diag}(\textbf{C})\) into \(\textrm{diag}(\sin (\textbf{C}/K))\), where \(K = \lambda \cdot \mathcal {C}_{max}\), c.f. Eq. (22). The true costs are sorted by their values and the x-axis represents the sorted cuts in the decimal basis. The y-axis showcases the influence of \( 0.1 \le \lambda \le 1\). The z-axis reports the costs, re-scaled to \(\left[ 0, 1\right] \). The transformed costs, sorted by the sorted arguments of the true costs, show whether the order is preserved under the influence of \(\lambda \). Example: If \(\textbf{C} = [2, 5, 7, 3]\) and, for some value of K, \({\hat{\textbf{C}}} = [1, 3, 4, 2]\), then \(\textrm{sort}(\textbf{C}) = [2, 3, 5, 7]\), \(\textrm{argsort}(\textbf{C}) = [0, 3, 1, 2]\) and \({\hat{\textbf{C}}}_{\textrm{sorted}} = [1, 2, 3, 4]\). For each value of \(\lambda \), the displayed costs are the averages over 10 instances of a 10-nodes fully-connected graph. Left: Graph instances with only positive edge weights. Right: Graph instances with both positive and negative edge weights. As expected, the more \(\lambda \) and thus K grows, the better the order of the true cost agrees with the order of the transformed costs. Also, for fixed \(\lambda \) the difference between the two costs is the largest when the true cost is large in absolute value. The cyan-marked line is the profile that we selected for the experiments; it indicates the value \(\lambda = 2/\pi \), which is the minimum \(\lambda \) that guarantees an error-free transformation, cf. Eq. (22)

When applying the method, it is important to appropriately choose the constant \(K\in \mathbb {R}\) for re-scaling the costs \(\hat{\textbf{C}} = \textbf{C}/K\). The goal is to fix K such that \(\textrm{diag}(\hat{\textbf{C}}) \in [-\pi /2, \pi /2]^{2^n}\), as this guarantees that \(\sin (\textrm{diag}(\hat{\textbf{C}}))\) preserves the order of the original costs in \(\textrm{diag}(\textbf{C})\).

Although it is tempting to choose \(K \gg \mathcal {C}_{max}:= \max _k \textbf{C}_{kk}\), where \(\textbf{C}_{kk}\) denotes the kth diagonal element, recall that in Eq. (9) we have \(\lim _{K \rightarrow \infty } \textbf{U}(\textbf{C}, K) = \textbf{X}^{\otimes (n+1)}\), i.e., the larger K, the more shots are required to accurately measure the entries of \(\sin (\hat{\textbf{C}})\). Fortunately, in many problems an upper bound to the maximal cost \(\mathcal {C}_{max}\) can be computed from the original weights without knowing \(\textbf{C}\). For example, choosing

$$\begin{aligned} K = \dfrac{2}{\pi } \cdot \mathcal {C}_{max}:= \dfrac{2}{\pi } \cdot \left[ \sum _{i=1}^n \left| \mathcal {C}_{ii} \right| + \sum _{1 \le i < j \le n} \left| \mathcal {C}_{ij} \right| \right] \end{aligned}$$
(22)

guarantees an error-free transformation of the initial problem (5) into the equivalent problem (16). As presented for 10-node graph examples in Fig. 3, the choice of smaller values for K entails that the approximation of \(\arccos (\sin x)\) by \(\pi /2 - x\), used in Eq. (13), is inaccurate, with more pronounced error in the largest absolute entries of \(\textrm{diag}(\hat{\textbf{C}})\). Notably, the solution of the transformed problem will only be a solution of the initial problem as long as \(\arg \min _k \hat{\textbf{C}}_{kk} = \arg \min _k \textbf{C}_{kk}\).

4 Experimental results

In order to validate the practical usefulness of uq maxcut and uqising, we benchmark against two state-of-the-art approaches for solving binary combinatorial optimization with quantum computing: qaoa for the gate-based model and d-wave solvers for the adiabatic model. Random graphs in the experiments are generated using the Python language package NetworkX [32]. The unary and quadratic edge weights are all randomly and uniformly chosen in the range [1, 10] and the graphs are all fully connected. The gate-based circuits in the experiments (uq maxcut, uqising, qaoa) are implemented in Python and simulated in a noise-free framework using the QisKit library and the IBM-QASM simulator [33]. For the adiabatic model, d-wave solvers that run on the actual quantum hardware are used. d-wave quantum annealers are made available through the leap quantum cloud service [34], and the d-wave quantum algorithms can be implemented in Python using the Ocean software [35]. We perform 1024 measurement shots for all gate-based algorithms. On d-wave [1], the experiment is run with the default annealing time of \(20\mu s\) and 50 sample reads on the advantage topology.

4.1 Benchmark metrics

We denote by \({| q^\star \rangle }\) the ground-truth global minimizer and by \({| \psi ^\star \rangle }\) the ground-state proposal of each method (uq maxcut/uqising, qaoa and d-wave). In the experiments, we adopt the following two metrics to evaluate the performance of the methods:

  • The approximation ratio

    $$\begin{aligned} r(\psi ^\star ):= \dfrac{\langle \psi ^\star | \textbf{C} | \psi ^\star \rangle - \mathcal {C}_{max}}{\mathcal {C}_{min}- \mathcal {C}_{max}} \end{aligned}$$
    (23)

    informs about the quality of the result, i.e., how confident the method is with its solution proposal and how far the cost of this proposal is from the minimal cost \(\mathcal {C}_{min}:= \min _k \textbf{C}_{kk}\), cf. [36]. All the terms appearing in r are classically evaluated. It holds \(0 \le r(\psi ^\star ) \le 1\) and \(r(\psi ^\star )=1\) iff \(\psi ^\star = q^\star \).

  • The approximation index

    $$\begin{aligned} i(\psi ^\star ):= \textbf{1}_{\langle q^\star | \textbf{C} | q^\star \rangle = \langle q_{max} | \textbf{C} | q_{max}\rangle } \end{aligned}$$
    (24)

    is a Boolean variable that indicates whether the most likely state \({| q_{max}\rangle }\) obtained with probability \(\left| \alpha _{max} \right| ^2\) is the desired state \({| q^\star \rangle }\). It holds \(i(\psi ^\star ) = 1\) if \(\langle q^\star | \textbf{C} | q^\star \rangle = \langle q_{max} | \textbf{C} | q_{max}\rangle \) and 0 otherwise. Note that this differs from the usual approach of d-wave, where the sampled state with the minimal energy is regarded as the best solution proposal.

4.2 Benchmarking UQMaxCut against QAOA and D-Wave

4.2.1 Symmetric solutions and entanglement

Fig. 4
figure 4

Entanglement circuit to enforce symmetric solutions for maxcut. Using this circuit, the optimization is performed only on the last \(n-1\) variables and the qubit for the node \(q_1\) is kept constant in the state \({| 0\rangle }\) or \({| 1\rangle }\)

Before discussing the results on the maxcut problem, it is important to notice that for maxcut, solutions always exist in symmetric pairs. Specifically, for a basis state solution \({| q^\star \rangle } = \bigotimes _{i=1}^n{| q_i^\star \rangle }, q_i^\star \in \left\{ 0, 1\right\} \), the basis state \({| \bar{q}^\star \rangle }:= \bigotimes _{i=1}^n{| 1-q_i^\star \rangle }\) is also a solution. This feature can be enforced by introducing the entanglement circuit given in Fig. 4 after the rotation layer in Fig. 2. The circuit has the matrix representation

(25)

Also, this entanglement allows to optimize over \(n-1\) angles instead of n. Without entanglement our method just outputs one of the two possible solutions. In the experiments, uq maxcut is used without entanglement; if the algorithm outputs either one of the two solutions, we set the approximation index to 1.

4.2.2 Benchmark results

Fig. 5
figure 5

Experimental comparison of the proposed Universal Quantum maxcut (uq maxcut) algorithm with qaoa and d-wave. Results are shown for fully connected graphs of \(n = 3, 5\) and 10 nodes with strictly positive edge weights. They are averaged over 20 random graph instances for each n. Top: The approximation ratio. The lines represent the averaged ratios over the 20 instances and the shaded areas indicate the standard deviations. Bottom: The approximation index, i.e., the percentage of instances where a global solution is found. All gate-based methods hardly return a global solution for larger n. Yet, in contrast to qaoa and Competitively with d-wave, the proposed uq maxcut consistently returns very good approximate solutions, i.e., points whose function values are very close to the global minimal function value

For the outer optimization algorithm of the uq maxcut circuit, we use normalized gradient descent as described in Section 3.2. Other optimization algorithms such as vanilla gradient descent or adaptive moment estimation (adam [37]) could be used as well. However, it proved difficult in the experiments to find suitable step sizes for those methods, so we leave them for future research.

The qaoa layers depth in the experiments is set to \(p = \lceil n/2 \rceil \) to allow for a fair comparison, as then both qaoa and uq maxcut optimize over approximately n real variables. We also attempted to optimize qaoa using the same optimizer as for uq maxcut, but the results were not competitive. Hence, we also show the results of qaoa when using a gradient-free optimizer; we used the cobyla solver [38] available in the scipy library [39].

The results are presented in Fig. 5 for fully connected graphs of \(n=3, 5\) and 10 nodes with strictly positive edge weights. For each n, the results are averaged over 20 graph instances and all algorithms are tested on the same instances. The angles for uq maxcut and qaoa are all initialized to 0.

The approximation ratio for the three methods (qaoa, d-wave, uq maxcut) is not adversely affected by the number of variables n, but the approximation index drops sharply as the size of the problem increases. The gradient-free Cobyla-optimized version of qaoa performs much better than qaoa with gradient descent. The latter is on average as good as d-wave regarding the approximation ratio. We conjecture that the gradient-based optimization of qaoa often gets trapped by saddle points of the qaoa loss function landscape. In contrast, uq maxcut clearly outperforms the two qaoa variants and can compete with d-wave in producing good approximate solutions. Furthermore, the approximation index demonstrates that it returns a global minimizer significantly more often than qaoa and less often than d-wave whose architecture is specifically designed to solve such problems.

4.3 Benchmarking UQIsing against D-Wave

Fig. 6
figure 6

Experimental comparison of the proposed Universal Quantum Ising Model (uqising) algorithm with  d-wave. Results are shown for fully connected graphs of \(n = 3, 5\) and 10 nodes with strictly positive edge weights. They are averaged over 20 random graph instances for each n. Top: The approximation ratio. The lines represent the averaged ratios over the 20 instances and the shaded areas indicate the standard deviations. Bottom: The approximation index, i.e., the percentage of instances where a global solution is found. Our method can compete with d-wave solvers in predicting approximate solutions and finding the global minimum for moderate n

For the Ising model, we benchmark the proposed uqising algorithm against the d-wave annealer, the adiabatic quantum computer specialized in solving this type of problems. The variational circuit for uqising is optimized in the same way as for uq maxcut, see Sect. 3.2, with the exception that the initial angles are set to \(\pi /2\) instead of 0. The results are depicted in Fig. 6.

Notable differences in performance between uqising and d-wave are consistent with the MaxCut experiments. Specifically, its high approximation ratio, similar to that of d-wave, indicates that uqising always produces either globally optimal solutions or extremely good approximations thereof. On the other hand, the approximation index shows that d-wave identifies a globally optimal solution more often than uqising.

5 Conclusion

We have presented a new low-depth quantum circuit to tackle two important combinatorial problems on universal quantum machines. The resulting Universal Quantum maxcut (uq maxcut) approach outperforms the state-of-the-art quantum approximate optimization algorithms (qaoa) by the lower depth, by the computed approximation ratios and by a higher probability of outputting optimal solutions. It also challenges the d-wave-quantum annealers that are specifically designed to solve such combinatorial problems; on the maxcut as well as on the Ising spin model, uq maxcut, respectively, uqising can compete with d-wave in producing globally optimal solutions.

We believe that the proposed approach enables the design of new methods for solving practically-sized problems on universal quantum machines. Inspired by the novel operator \(\textbf{U}\), future work should focus on designing fully universal algorithms without the classical outer optimization loop, replacing the latter with fully universal methods like for example Grover’s search [8].