1 Introduction

Coherent quantum control has long been a key component in the effort towards future quantum technologies. It relies on the interference between multiple pathways to steer the quantum system in some desired way [1, 2]. Originally conceived for the control of chemical reactions [3-5], it has since been extended to a wide variety of applications, see Ref. [6] for a review. In this context, numerical optimal control theory (OCT) is a particularly powerful tool. OCT follows an iterative approach, improving the control field in each iteration to steer the dynamics to the optimization target. Generally, the fastest converging algorithms are those that take into account information about the gradient of the optimization functional with respect to variations in the control. Gradient-based methods assume an open-loop setup, where the entire optimization procedure is performed based on the knowledge of the system dynamics. Two widely used methods are Krotov’s method [7-9] and gradient ascent pulse engineering (GRAPE) [10]. Krotov’s method guarantees monotonic convergence for time-continuous control fields. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) method [11] can be used to extend both GRAPE [12] and Krotov’s method [13], considering not only the gradient but also an estimate for the Hessian, i.e., the second order derivative. This has been demonstrated to improve the convergence in some instances, in particular when close to the optimum [14]. In experimental setups that only allow for limited control and knowledge of the dynamics, closed-loop control schemes have often been preferred [15]. There, the controls are updated based only on a measurement of the figure of merit, e.g. using genetic algorithms [16] or other gradient-free optimization methods.

More recently, gradient-free optimization methods have also been employed in an open-loop context, prompted by the observation that evaluation of the gradient in many-body systems is often numerically infeasible. The chopped random basis (CRAB) method [17, 18] has been formulated for such applications. It expands the control in a relatively small number of randomly chosen spectral components and then applies a Nelder-Mead simplex optimization to the expansion coefficients. In principle, gradient-free methods are applicable if the control can reasonably be described by only a few free parameters and the optimization landscape has no local minima in the vicinity of the initial ‘guess’.

Optimal control theory is particularly relevant for quantum information processing. Both Krotov’s method and GRAPE have been extensively used to obtain high-fidelity quantum gates [19-24]. Short gate durations are crucial, in order to minimize detrimental effects of decoherence. With OCT, this is achieved by systematically decreasing the gate duration until no solution can be found [25, 26], thus operating at the quantum speed limit [27, 28]. Moreover, OCT may be used to actively minimize the effects of decoherence [29, 30], and to increase robustness with respect to classical noise [31]. Robustness is a requirement that is generally difficult to fulfill with analytical approaches.

Here, we explore the possibility of combining gradient-free and gradient-based methods at different stages in the optimization, exploiting the benefits of each method. The application of a simplex optimization to a guess pulse, described by only a few free parameters, efficiently yields a comparably simple first optimized pulse of moderate fidelity. This pre-optimized pulse then provides a good starting point for further optimization using a gradient-based method. The second optimization stage relaxes the restrictions on the search space implied by a simple parametrization and may then quickly converge towards a high fidelity. The simplex pre-optimization addresses the observation that typically in direct gradient-based optimizations, due to the large size of the search space, the majority of the numerical effort is spent in ‘getting off the ground’. Pre-optimization thus makes it possible to locate a region of the search space in which the gradient is large enough to provide meaningful information. Such good guess pulses may sometimes be designed by hand, but this requires a very good intuition of the underlying control mechanisms. We propose here instead to simply prune the search space for the initial phase of the optimization by reducing the complexity of the control. The second optimization stage can then more easily identify high fidelity solutions.

We illustrate the use of such hybrid optimization schemes by optimizing a quantum gate on superconducting qubits, using an example inspired by the recently proposed resonator-induced phase gate [32]. Superconducting circuits are a prime candidate for the implementation of quantum computing, due to the flexibility in qubit parameters, their inherent controllability, and the promise of scalability. Moreover, with recent advances in the transmon architecture, decoherence times are approaching 0.1 ms [33], so that fault tolerance can be reached with sufficiently fast gates. However, the flexibility and large number of different gate mechanisms [34] also imply a challenge from a control perspective, as it is not immediately obvious what good qubit parameters or good guess pulses are. This makes superconducting circuits especially well-suited for combining a coarse search using simplex methods with a subsequent refinement by a more powerful gradient-based method.

The paper is organized as follows: In Section 2, we review the realization of a geometric phase gate on a system of two transmon qubits coupled via a shared transmission line resonator, using all-microwave control. In Section 3, we first introduce a functional targeting the geometric phase gate, i.e., any diagonal perfect entangler, and then show the results of a direct optimization using the gradient-based Krotov method. In Section 4, we apply the gradient-free Nelder-Mead simplex optimization to obtain pre-optimized guess pulses, which then become the starting point for an optimization with Krotov’s method. We compare the control pulses obtained by the gradient-based, gradient-free, and hybrid schemes, and the dynamics they induce, as well as the numerical effort necessary to obtain converged results in each scheme. Section 5 concludes.

2 Model and gate mechanism

We consider a system of two transmon qubits, coupled via a shared transmission line resonator (‘cavity’) [35, 36]. The Hamiltonian reads

$$ \hat {\mathbf {H}} = \sum_{q=1,2} \biggl[ \omega_{q} \hat {\mathbf {b}}_{q}^{\dagger} \hat {\mathbf {b}}_{q} + \frac{\alpha_{q}}{2} \hat {\mathbf {b}}_{q}^{\dagger} \hat {\mathbf {b}}_{q}^{\dagger} \hat {\mathbf {b}}_{q} \hat {\mathbf {b}}_{q} + g_{q} \bigl( \hat {\mathbf {b}}_{q}^{\dagger} \hat {\mathbf {a}} + \hat {\mathbf {b}}_{q} \hat {\mathbf {a}}^{\dagger}\bigr) \biggr] + \omega_{c} \hat {\mathbf {a}}^{\dagger} \hat {\mathbf {a}} + \epsilon^{*}(t) \hat {\mathbf {a}} + \epsilon(t) \hat {\mathbf {a}}^{\dagger} , $$
(1)

where \(\omega_{c}\), \(\omega_{1}\), \(\omega_{2}\) are the frequencies of the cavity and of the first and second qubit, respectively; \(\alpha_{1}\), \(\alpha_{2}\) are the qubit anharmonicities, and \(g_{1}\), \(g_{2}\) are the coupling strengths between each qubit and the cavity. The operators \(\hat {\mathbf {a}}\), \(\hat {\mathbf {b}}_{1}\), and \(\hat {\mathbf {b}}_{2}\) are the standard annihilation operators for the cavity and the two qubits, respectively. For numerical purposes, the Hilbert space of each qubit is truncated after 6 levels and that of the cavity after 70 levels. It has been verified that the inclusion of additional levels yields no significant change in the results of the subsequent sections. The parameters take the values listed in Table 1. The system is driven by the microwave field \(\epsilon(t)\), with a pulse duration T. An off-resonant pulse results in a state-dependent shift of the resonator frequency [32]. For a slowly-varying pulse shape with \(\epsilon(0) = \epsilon(T) = 0\), such that the level shifts occur adiabatically, the dynamics result in a geometric phase on each of the qubit levels. That is, the resulting gate takes the diagonal form

$$ \hat {\mathbf {U}} = \operatorname {diag}\bigl[ e^{i \phi_{00}}, e^{i \phi_{01}}, e^{i \phi_{10}}, e^{i \phi_{11}} \bigr] . $$
(2)
Table 1 Parameters for two transmon qubits coupled via a shared transmission line resonator

The maximal reachable concurrence of such a diagonal gate is

$$ C(\gamma) = \biggl\vert \sin \biggl(\frac{\gamma}{2} \biggr)\biggr\vert ,\quad \gamma\equiv\phi_{00} - \phi_{10} - \phi_{01} + \phi_{11} , $$
(3)

where γ defines the non-local two-qubit phase [26]. The concurrence is obtained from the theory of local invariants for two-qubit gates [37, 38]. The local invariants for a diagonal gate evaluate to \(G_{1} = \cos^{2}(\gamma/2)\) and \(G_{2}=1+2\cos^{2}(\gamma/2)\). From these, the Weyl chamber coordinates may be calculated as \(c_{1} = \gamma/2\), \(c_{2} = c_{3} = 0\) [38]. Following Ref. [39], the concurrence is evaluated as a function of the Weyl chamber coordinates to yield Eq. (3). The gate is a perfect entangler for \(\gamma= \pi\).
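As a quick numerical illustration of Eq. (3), the following minimal sketch evaluates the non-local phase and the resulting concurrence from the four diagonal phases. The function names are chosen here for illustration only and are not part of the original work.

```python
import math


def nonlocal_phase(phi00, phi01, phi10, phi11):
    """Non-local two-qubit phase gamma of a diagonal gate, Eq. (3)."""
    return phi00 - phi10 - phi01 + phi11


def concurrence(gamma):
    """Maximal reachable concurrence C(gamma) = |sin(gamma / 2)|, Eq. (3)."""
    return abs(math.sin(gamma / 2.0))


# Example: the controlled-phase gate diag[1, 1, 1, -1] has gamma = pi,
# i.e., it is a perfect entangler with C = 1.
gamma_cphase = nonlocal_phase(0.0, 0.0, 0.0, math.pi)
c_cphase = concurrence(gamma_cphase)
```

A gate with \(\gamma= 0\) (e.g. the identity) gives \(C = 0\) in the same way.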

We consider pulse shapes of the form

$$ \epsilon(t) = E_{0} \sin^{2} \biggl(\pi\frac{t}{T} \biggr) \cos (\omega_{d} t) $$
(4)

with a fixed driving frequency \(\omega_{d}\) given in Table 1. For simplicity, we neglect the dephasing induced by high cavity populations, so that the dynamics under the microwave control field can be obtained by solving the time-dependent Schrödinger equation. For an arbitrarily chosen guess pulse of duration \(T = 200\mbox{ ns}\) and peak amplitude \(E_{0} = 300\mbox{ MHz}\), the population dynamics resulting from the initial condition \(\vert \Psi(t=0) \rangle = \vert 00 \rangle\) are shown in panels (a)-(d) of Figure 1.
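The pulse shape of Eq. (4) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the caller is responsible for using consistent units for \(t\), \(T\) and \(\omega_{d}\), and the value of \(\omega_{d}\) must be taken from Table 1 (it is passed as a parameter here and not filled in).

```python
import math


def envelope(t, E0, T):
    """Slowly varying envelope E0 * sin^2(pi t / T) of Eq. (4)."""
    if t < 0.0 or t > T:
        return 0.0  # the pulse is switched off outside [0, T]
    return E0 * math.sin(math.pi * t / T) ** 2


def guess_pulse(t, E0, T, omega_d):
    """Pulse of Eq. (4): envelope times the carrier cos(omega_d t).

    omega_d is an angular frequency in the inverse of the time unit
    used for t and T.
    """
    return envelope(t, E0, T) * math.cos(omega_d * t)


# Envelope of the guess pulse at its center (T = 200 ns, E0 = 300 MHz)
peak = envelope(100.0, 300.0, 200.0)
```

By construction the pulse vanishes at \(t = 0\) and \(t = T\) and the envelope reaches \(E_{0}\) at \(t = T/2\).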

Figure 1

Population dynamics under guess and optimized pulse. The figure shows the population dynamics of the initial state \(\vert \Psi \rangle(t=0) = \vert 00 \rangle \) for the guess pulse, panels (a)-(d), and the pulse obtained from direct optimization of the \(J_{T}^{\mathrm {geo}}\) functional, panels (e)-(h). In panels (a), (e), expectation value \(\langle n \rangle\) of the cavity excitation, plus-minus the standard deviation \(\sigma_{n}\). In panels (b), (f), and (c), (g), expectation values and standard deviations for the excitation of the right and left qubit, respectively. In panels (d), (h), population in the state \(\vert 00 \rangle\). The pulse shape (normalized by the peak amplitude \(E_{0} = 300\mbox{ MHz}\), cf. Figure 2) is shown in the background of panels (d), (h). The guess pulse implements a geometric phase gate with a gate error of \(\varepsilon _{\mathrm {avg}}= 8.3 \times10^{-2}\), with concurrence error \(\varepsilon _{C}= 1.9 \times10^{-1}\) and population loss from the logical subspace \(\varepsilon _{\mathrm {pop}}= 5.9 \times10^{-3}\). The optimized pulse decreases the gate error to \(\varepsilon _{\mathrm {avg}}= 1.4 \times10^{-4}\), with \(\varepsilon _{C}= 1.8 \times10^{-6}\) and \(\varepsilon _{\mathrm {pop}}= 1.4 \times10^{-4}\), see Table 2.

If the pulse induces adiabatic dynamics, it shifts the qubit and cavity levels proportionally to \(\epsilon(t)^{2} / \Delta\) where Δ is the detuning from the respective level [32]. In the original field-free frame, this is equivalent to shifting the initial wave packet proportionally to the square of the pulse. Thus, the excitation and population dynamics should smoothly follow the pulse shape. Specifically, the condition for adiabaticity is that if the pulse shape were to be stretched in time, the population dynamics would simply stretch correspondingly. For a given peak amplitude, the larger the detuning of the drive \(\omega_{d}\) from the frequencies \(\omega_{c}\), \(\omega_{2}\), \(\omega_{1}\), the smaller the excitation in the respective Hilbert space. Since the drive is detuned by only 40 MHz from the cavity, the cavity excitation, panel (a), reaches a large value \(\langle n \rangle \approx30\). The far-detuned right qubit, panel (b), and even farther detuned left qubit, panel (c), only show a small excitation. As indicated by the standard deviations shown as shaded areas in panels (a)-(c), the excitation remains relatively localized in energy. It is noteworthy, however, that the excitation curves (specifically of the cavity) show some imperfections. These ‘wobbles’ can be interpreted as deviations from the expected adiabatic dynamics, e.g. jumping over an avoided crossing between highly excited cavity states. This ultimately results in a small loss of population from the logical subspace, as the system does not perfectly return to its original state. The loss of population for the given example is \(\varepsilon _{\mathrm {pop}}= 5.9 \times 10^{-3}\). While such a small deviation is not discernible in the plot of the population in panel (d), it is nonetheless a significant error with respect to the objective of obtaining high fidelity gates below the quantum error correction limit, which typically requires gate errors below \(10^{-3}\). In principle, cavity population can be suppressed by tuning the qubit parameters and using non-trivial pulse shapes [32].

The dynamics for the remaining two-qubit basis states are similar to those of the \(\vert 00 \rangle\) state in Figure 1. The concurrence of the gate implemented by the propagated guess pulse, according to Eq. (3), yields a value of \(C \approx0.8\). This implements the closest diagonal perfect entangler with an average gate error of \(8.3 \times10^{-2}\). Both the loss of population from the logical subspace and the small value of the generated entanglement imply that the chosen pulse parameters are sub-optimal with respect to the desired geometric phase gate. We therefore turn to numerical optimal control to obtain a high fidelity gate.

3 Direct optimization with Krotov’s method

3.1 Optimization functionals and method

The standard approach for implementing a specific quantum gate \(\hat {\mathbf {O}}\) using optimal control theory is to maximize the overlap between the time evolution \(\hat {\mathbf {U}}(T,0;\epsilon(t))\) under the control \(\epsilon(t)\) and the target gate [19]. For a two-qubit gate, this is commonly expressed in the final time functional [40]

$$ J_{T}^{\mathrm {sm}} = 1 - \frac{1}{16} \Biggl\vert \sum _{k=1}^{4} \langle k \vert \hat {\mathbf {O}}^{\dagger} \hat {\mathbf {U}}\bigl(T,0;\epsilon(t)\bigr) \vert k \rangle \Biggr\vert ^{2} ,\quad \vert k \rangle \in \bigl\{ \vert 00 \rangle, \vert 01 \rangle, \vert 10 \rangle , \vert 11 \rangle \bigr\} , $$
(5)

which goes to zero as the target gate \(\hat {\mathbf {O}}\) is implemented, up to a global phase.
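For concreteness, Eq. (5) can be evaluated as in the following sketch. It assumes that both the target \(\hat {\mathbf {O}}\) and the (projected) time evolution \(\hat {\mathbf {U}}\) are available as 4×4 nested lists in the logical basis; the helper names `diag` and `j_sm` are introduced here purely for illustration.

```python
import cmath


def diag(entries):
    """Build a square matrix (nested lists) with the given diagonal."""
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def j_sm(O, U):
    """J_T^sm = 1 - |sum_k <k| O^dagger U |k>|^2 / 16, Eq. (5).

    O and U are 4x4 matrices in the basis {|00>, |01>, |10>, |11>};
    U may be non-unitary if population leaves the logical subspace.
    """
    # <k| O^dagger U |k> = sum_j conj(O[j][k]) * U[j][k]
    overlap = sum(O[j][k].conjugate() * U[j][k]
                  for j in range(4) for k in range(4))
    return 1.0 - abs(overlap) ** 2 / 16.0


# Example: a controlled-phase target, reached up to a global phase
CPHASE = diag([1.0, 1.0, 1.0, -1.0])
global_phase = cmath.exp(0.3j)
U_same = [[global_phase * x for x in row] for row in CPHASE]
j_reached = j_sm(CPHASE, U_same)                # vanishes up to rounding
j_identity = j_sm(CPHASE, diag([1.0] * 4))      # identity is far from target
```

The second value illustrates that a gate differing from the target only in the sign of one diagonal entry already gives \(J_{T}^{\mathrm {sm}} = 1 - 4/16 = 0.75\).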

Krotov’s method iteratively improves the control field, changing the control \(\epsilon^{(i)}(t)\) to the updated control \(\epsilon ^{(i+1)}(t)\) in the ith iteration. The final time functional \(J_{T}\), given e.g. by Eq. (5), is augmented with a running cost to result in the total functional

$$ J\bigl[\epsilon^{(i)}(t)\bigr] = J_{T}\bigl(\bigl\{ \phi_{k}^{(i)}(T)\bigr\} \bigr) + \int_{0}^{T} g_{a}\bigl[\epsilon^{(i)}(t)\bigr] \,dt ,\quad\quad \bigl\vert \phi_{k}^{(i)}(t) \bigr\rangle = \hat {\mathbf {U}}\bigl(t,0; \epsilon^{(i)}(t)\bigr) \vert k \rangle . $$
(6)

Monotonic convergence is ensured by the choice [9, 40]

$$ g_{a}\bigl[\epsilon^{(i)}(t)\bigr] = \frac{\lambda_{a}}{S(t)} \bigl(\Delta\epsilon(t)\bigr)^{2} ,\quad \Delta\epsilon(t) \equiv \epsilon^{(i+1)}(t) - \epsilon^{(i)}(t) , $$
(7)

where \(\lambda_{a}\) is an arbitrary scaling parameter and \(S(t) \in [0,1]\) is a shape function that ensures smooth switch-on and switch-off.

The iterative update scheme is then given in terms of three coupled equations [9],

$$\begin{aligned}& \Delta\epsilon(t) = \frac{S(t)}{\lambda_{a}} \operatorname {Im}\biggl[ \sum _{k=1}^{4} \biggl( \bigl\langle \chi_{k}^{(i)}(t) \bigr\vert \biggl( \frac{\partial \hat {\mathbf {H}}}{\partial\epsilon} \mathop{\bigg\vert _{ \phi^{(i+1)}(t) }}_{ \epsilon^{(i+1)}(t)} \biggr) \bigl\vert \phi_{k}^{(i+1)}(t) \bigr\rangle \\& \hphantom{\Delta\epsilon(t) =}{} + \frac{1}{2} \sigma(t) \bigl\langle \Delta\phi_{k}(t) \bigr\vert \biggl( \frac{\partial \hat {\mathbf {H}}}{\partial\epsilon} \mathop{\bigg\vert _{ \phi^{(i+1)}(t)}}_{ \epsilon^{(i+1)}(t)} \biggr) \bigl\vert \phi_{k}^{(i+1)}(t) \bigr\rangle \biggr) \biggr] , \end{aligned}$$
(8a)
$$\begin{aligned}& \frac{\partial}{\partial t} \bigl\vert \phi_{k}^{(i+1)}(t) \bigr\rangle = - \frac{i}{\hbar} \hat {\mathbf {H}}^{(i+1)} \bigl\vert \phi_{k}^{(i+1)}(t) \bigr\rangle ,\quad\quad \bigl\vert \phi_{k}^{(i+1)}(0) \bigr\rangle = \vert k \rangle , \end{aligned}$$
(8b)
$$\begin{aligned}& \frac{\partial}{\partial t} \bigl\vert \chi_{k}^{(i)}(t) \bigr\rangle = - \frac{i}{\hbar} \hat {\mathbf {H}}^{\dagger(i)} \bigl\vert \chi _{k}^{(i)}(t) \bigr\rangle ,\quad\quad \bigl\vert \chi_{k}^{(i)}(T) \bigr\rangle = - \frac{\partial J_{T}}{\partial\langle\phi_{k}|} \bigg\vert _{\phi _{k}^{(i)}(T)} , \end{aligned}$$
(8c)

with \(\vert\Delta\phi_{k}(t) \rangle\equiv\vert\phi_{k}^{(i+1)}(t) \rangle- \vert\phi_{k}^{(i)}(t) \rangle\). The second order contribution to the update, with the prefactor \(\sigma (t)\), is required for certain types of functionals [9], as we will see below. For the choice of \(J_{T}^{\mathrm {sm}}\), we may set \(\sigma(t) = 0\) [9].

For the gate mechanism outlined in Section 2, we obtain a phase on each of the four logical basis states. These phases should combine to produce a perfect entangler, with \(\gamma= \pi\) according to Eq. (3). The individual phases \(\phi_{00}\), \(\phi_{01}\), \(\phi_{10}\) and \(\phi_{11}\) depend delicately on the shape, amplitude, and duration of the pulse. It is therefore not known a priori for a given guess pulse which exact geometric phase gate will or can be reached, and thus what should be the target gate \(\hat {\mathbf {O}}\) of the optimization. For a gate \(\hat {\mathbf {U}}_{0}\) induced by a guess pulse, we may construct the closest diagonal perfect entangler by numerically evaluating

$$ \hat {\mathbf {O}} = \mathop{\operatorname{arg\,min}}\limits_{\phi_{00}, \phi_{01}, \phi_{10}} \bigl\Vert \hat {\mathbf {O}}_{\operatorname {diag}}( \phi_{00}, \phi_{01}, \phi_{10}) - \hat {\mathbf {U}}_{0}\bigr\Vert , $$
(9)

with

$$ \hat {\mathbf {O}}_{\operatorname {diag}}(\phi_{00}, \phi_{01}, \phi_{10}) = \operatorname {diag}\bigl[ e^{i \phi_{00}}, e^{i \phi_{01}}, e^{i \phi_{10}}, e^{i (\pi+ \phi_{01} + \phi_{10} - \phi_{00})} \bigr] , $$
(10)

which includes the condition \(\gamma= \pi\) to make the gate a perfect entangler. Using this gate as a target fully determines the optimization problem, with two caveats. First, the target gate depends on \(\hat {\mathbf {U}}_{0}\) induced by the (arbitrary) guess pulse, and second, the construction of the closest diagonal perfect entangler does not take into account the topology of the optimization landscape; the ‘closest’ gate is by no means guaranteed to be the one that is easiest to reach.

An approach that addresses these issues is to go beyond the standard functional of Eq. (5) and formulate a functional that targets the properties of the geometric phase gate specifically, without stipulating the phases on all of the logical states. We split the functional into two terms,

$$ J_{T}^{\mathrm {geo}} = \frac{1}{8} ( J_{\operatorname {diag}} + J_{\gamma} ) , $$
(11)

where \(J_{\operatorname {diag}}\) goes to zero if and only if the gate is diagonal (with arbitrary phases) and \(J_{\gamma}\) goes to zero if and only if the gate is also a perfect entangler, \(\gamma= \pi\). Thus, the functional is conceptually similar to a recently proposed functional targeting an arbitrary perfect entangler [41]. However, the additional restriction to enforce a diagonal gate is important, as the Hamiltonian also allows for non-diagonal gates, but only through undesired non-adiabatic effects.

The two terms take the form

$$\begin{aligned}& J_{\operatorname {diag}} = 4 - \tau_{00}\tau_{00}^{*} - \tau_{01}\tau_{01}^{*} - \tau_{10} \tau_{10}^{*} - \tau_{11}\tau_{11}^{*} , \end{aligned}$$
(12a)
$$\begin{aligned}& J_{\gamma} = 2 + \tau_{00} \tau_{01}^{*} \tau_{10}^{*} \tau_{11} + \tau_{00}^{*} \tau_{01} \tau_{10} \tau_{11}^{*} , \end{aligned}$$
(12b)

with

$$ \begin{aligned} &\tau_{00} \equiv \langle00|\hat {\mathbf {U}}|00 \rangle , \quad\quad\tau_{01} \equiv \langle01|\hat {\mathbf {U}}|01 \rangle , \\ &\tau_{10} \equiv \langle10| \hat {\mathbf {U}}|10 \rangle , \quad\quad\tau_{11} \equiv \langle11|\hat {\mathbf {U}}|11 \rangle . \end{aligned} $$
(12c)

The construction of \(J_{\gamma}\) is based on the observation that

$$ \gamma= \pi \quad\Longleftrightarrow\quad 2 + e^{i \gamma} + e^{-i \gamma} = 0 , $$

which becomes Eq. (12b) by associating \(\tau_{k}\) with \(e^{i \phi_{k}}\) and using the definition of γ in Eq. (3). Both \(J_{\operatorname {diag}}\) and \(J_{\gamma}\) take values \(\in[0, 4]\), hence the normalization factor \(\frac{1}{8}\) in Eq. (11) to bring the value of the functional closer to that of \(J_{T}^{\mathrm {sm}}\).
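The following sketch evaluates Eqs. (11), (12a) and (12b) from the four diagonal matrix elements of Eq. (12c). It is meant only as an illustration; the function name `j_geo` and its return convention are assumptions made here.

```python
import cmath
import math


def j_geo(tau00, tau01, tau10, tau11):
    """Return (J_T^geo, J_diag, J_gamma) according to Eqs. (11)-(12b)."""
    taus = (tau00, tau01, tau10, tau11)
    # Eq. (12a): vanishes iff all population stays on the diagonal
    j_diag = 4.0 - sum(abs(t) ** 2 for t in taus)
    # Eq. (12b): the two terms are complex conjugates, so sum = 2 Re[x]
    x = tau00 * tau01.conjugate() * tau10.conjugate() * tau11
    j_gamma = 2.0 + 2.0 * x.real
    return (j_diag + j_gamma) / 8.0, j_diag, j_gamma


# Example: a unitary diagonal gate with arbitrary phases gives
# J_diag = 0 and J_gamma = 2 + 2 cos(gamma)
phases = (0.2, 0.5, 0.7, 1.1)  # phi00, phi01, phi10, phi11
gamma = phases[0] - phases[2] - phases[1] + phases[3]
taus = [cmath.exp(1j * p) for p in phases]
j_total, j_diag, j_gamma = j_geo(*taus)
expected_j_gamma = 2.0 + 2.0 * math.cos(gamma)
```

For the controlled-phase gate, \(\tau= (1, 1, 1, -1)\), both terms vanish, whereas the identity gives \(J_{\operatorname {diag}} = 0\), \(J_{\gamma} = 4\) and hence \(J_{T}^{\mathrm {geo}} = 0.5\).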

In contrast to \(J_{T}^{\mathrm {sm}}\), the functional \(J_{T}^{\mathrm {geo}}\) is not convex, since the states enter in higher than quadratic order. The Krotov update equation (8a) must then include the second order contribution, where \(\sigma(t)\) can be determined numerically in each iteration as [9]

$$ \sigma(t) = - \max(\epsilon_{A}, 2A + \epsilon_{A}) , \quad A = \frac{2 \sum_{k=1}^{4} \operatorname {Re}[\langle\chi_{k}(T) \vert\Delta \phi _{k}(T) \rangle ] + \Delta J_{T}}{ \sum_{k=1}^{4} \vert \Delta\phi_{k}(T)\vert ^{2}} , $$
(13)

with a small non-negative number \(\epsilon_{A}\), and \(\Delta J_{T} \equiv J_{T}(\{\phi_{k}^{(i+1)}(T)\}) -J_{T}(\{\phi _{k}^{(i)}(T)\})\). The boundary condition for the backward propagated states in Eq. (8c) yields

$$\begin{aligned}& \bigl\vert \chi_{00}(T) \bigr\rangle = \bigl( \tau_{00} - \tau_{01}\tau_{10}\tau_{11}^{*} \bigr) \vert 00 \rangle, \end{aligned}$$
(14a)
$$\begin{aligned}& \bigl\vert \chi_{01}(T) \bigr\rangle = \bigl( \tau_{01} - \tau_{00}\tau_{10}^{*}\tau_{11} \bigr) \vert 01 \rangle, \end{aligned}$$
(14b)
$$\begin{aligned}& \bigl\vert \chi_{10}(T) \bigr\rangle = \bigl( \tau_{10} - \tau_{00}\tau_{01}^{*}\tau_{11} \bigr) \vert 10 \rangle, \end{aligned}$$
(14c)
$$\begin{aligned}& \bigl\vert \chi_{11}(T) \bigr\rangle = \bigl( \tau_{11} - \tau_{00}^{*}\tau_{01}\tau_{10} \bigr) \vert 11 \rangle. \end{aligned}$$
(14d)
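As an illustration of Eqs. (13) and (14), the sketch below computes the coefficients of the backward-propagated states at final time and the second order prefactor σ. State vectors are represented as plain lists of complex numbers; the function names and the handling of a vanishing \(\Delta\phi_{k}(T)\) are assumptions made here, not prescriptions from the text.

```python
def chi_coefficients(tau00, tau01, tau10, tau11):
    """Coefficients of |chi_k(T)> along |k>, Eqs. (14a)-(14d)."""
    return {
        "00": tau00 - tau01 * tau10 * tau11.conjugate(),
        "01": tau01 - tau00 * tau10.conjugate() * tau11,
        "10": tau10 - tau00 * tau01.conjugate() * tau11,
        "11": tau11 - tau00.conjugate() * tau01 * tau10,
    }


def sigma_update(chi_T, delta_phi_T, delta_J_T, eps_A=0.0):
    """sigma = -max(eps_A, 2A + eps_A), with A as defined in Eq. (13).

    chi_T and delta_phi_T are lists (one entry per basis state k) of
    state vectors at final time T, each a list of complex amplitudes.
    """
    norm_sq = sum(abs(c) ** 2 for vec in delta_phi_T for c in vec)
    if norm_sq == 0.0:
        # Assumption: if the states did not change, A is undefined and
        # we fall back to the minimal value -eps_A.
        return -eps_A
    # sum_k Re <chi_k(T) | Delta phi_k(T)>
    overlap = sum((x.conjugate() * d).real
                  for chi, dphi in zip(chi_T, delta_phi_T)
                  for x, d in zip(chi, dphi))
    A = (2.0 * overlap + delta_J_T) / norm_sq
    return -max(eps_A, 2.0 * A + eps_A)


# Example: coefficients for the controlled-phase gate diag[1, 1, 1, -1]
coeffs = chi_coefficients(1.0, 1.0, 1.0, -1.0)
```

For the controlled-phase gate, each \(\vert \chi_{k}(T) \rangle\) is parallel to the corresponding forward-propagated state, so the first order term in Eq. (8a) vanishes for a Hermitian \(\partial \hat {\mathbf {H}}/\partial\epsilon\), as expected at the optimum.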

Both \(J_{T}^{\mathrm {sm}}\) and \(J_{T}^{\mathrm {geo}}\) are only loosely connected to the average gate fidelity that is accessible to experimental measurement. In the case of a two-qubit gate and non-dissipative dynamics this can be evaluated as [42]

$$ F_{\mathrm {avg}}= \int\bigl\vert \langle\Psi \vert \hat {\mathbf {O}}^{\dagger} \hat {\mathbf {U}} \vert \Psi \rangle\bigr\vert ^{2} \,d\Psi = \frac{1}{20} \bigl( \bigl\vert \operatorname {tr}\bigl[\hat {\mathbf {O}}^{\dagger} \hat {\mathbf {U}} \bigr]\bigr\vert ^{2} + \operatorname {tr}\bigl[ \hat {\mathbf {O}}^{\dagger} \hat {\mathbf {U}} \hat {\mathbf {U}}^{\dagger} \hat {\mathbf {O}} \bigr] \bigr) . $$
(15)

Thus, \(F_{\mathrm {avg}}\), respectively the gate error \(\varepsilon _{\mathrm {avg}}\equiv1-F_{\mathrm {avg}}\), provides a well-defined measure of the optimization success independent of the choice of optimization functional. For an optimization with \(J_{T}^{\mathrm {geo}}\), we may evaluate \(\varepsilon _{\mathrm {avg}}\) with respect to the closest geometric phase gate resulting from propagation with the optimized pulse, according to Eq. (9).
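Eq. (15) can be evaluated directly from the 4×4 matrices of the target and of the time evolution projected onto the logical subspace. The sketch below uses nested lists and illustrative names (`average_gate_fidelity`, `diag`) that are assumptions made here.

```python
import math


def diag(entries):
    """Build a square matrix (nested lists) with the given diagonal."""
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def average_gate_fidelity(O, U):
    """F_avg = (|tr[O^dag U]|^2 + tr[O^dag U U^dag O]) / 20, Eq. (15)."""
    n = 4
    # M = O^dagger U
    M = [[sum(O[j][a].conjugate() * U[j][b] for j in range(n))
          for b in range(n)] for a in range(n)]
    trace = sum(M[a][a] for a in range(n))
    # tr[M M^dagger] equals the sum of |M_ab|^2 over all entries
    norm = sum(abs(M[a][b]) ** 2 for a in range(n) for b in range(n))
    return (abs(trace) ** 2 + norm) / 20.0


# Example: a uniform population loss p from the logical subspace
# translates directly into a gate error eps_avg = p.
CPHASE = diag([1.0, 1.0, 1.0, -1.0])
p = 5.9e-3
U_lossy = [[math.sqrt(1.0 - p) * x for x in row] for row in CPHASE]
eps_avg = 1.0 - average_gate_fidelity(CPHASE, U_lossy)
```

The uniform loss used in the example is a simplification for illustration; in the actual dynamics the loss generally differs between the logical basis states.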

3.2 Optimization results

The optimization starts from the guess pulse described by Eq. (4), with \(T = 200\mbox{ ns}\) and \(E_{0} = 300\mbox{ MHz}\), as discussed in Section 2, with the dynamics shown in panels (a)-(d) of Figure 1. The gate error with respect to the closest geometric phase gate for this guess is \(\varepsilon _{\mathrm {avg}}= 8.3 \times 10^{-2}\), with a loss of population from the logical subspace of \(\varepsilon _{\mathrm {pop}}= 5.9 \times 10^{-3}\). The concurrence error, defined as \(\varepsilon _{C}\equiv1 - C\), takes the value \(1.9 \times10^{-1}\).

Optimization using Krotov’s method and \(J_{T}^{\mathrm {geo}}\) as the optimization functional converges within 5,516 iterations of the algorithm. Convergence is assumed when the relative change of the functional \(\Delta J_{T} / J_{T}\) falls below \(10^{-4}\), such that no significant further improvement is to be expected. The gate error is reduced to \(\varepsilon _{\mathrm {avg}}= 1.4 \times10^{-4}\). It is dominated by the remaining loss of population from the logical subspace, \(\varepsilon _{\mathrm {pop}}\approx \varepsilon _{\mathrm {avg}}\), as the concurrence error is only \(\varepsilon _{C}= 1.8 \times10^{-6}\), see Table 2.

Table 2 Optimization success for different optimization schemes

The resulting optimized pulse is shown in Figure 2, with the guess pulse indicated by the dashed line. For the center 100 ns, there are only small deviations from the guess pulse (both in shape and phase). Significant deviations occur only at the very beginning and end, most notably the variation in the complex phase (center panel) between 20 and 40 ns. The spectral width of the pulse (bottom panel) remains well within a bandwidth of ±50 MHz. The dynamics of the \(\vert 00 \rangle\) state under the optimized pulse are shown in panels (e)-(h) of Figure 1. The difference from the dynamics under the guess pulse, panels (a)-(d), is striking; the excitations no longer smoothly follow the pulse shape, but show strong oscillations on top of the expected behavior. The features at the beginning of the optimized pulse provide a kick to the system, inducing oscillations in the populations, with a counter-kick near the end of the pulse. These kicks are clearly visible in the population of the \(\vert00 \rangle\) state in panel (h), compared to the smooth dynamics for the guess pulse in panel (d).

Figure 2

Optimized pulse resulting from direct optimization with Krotov’s method using the \(\pmb{J_{T}^{\mathrm {geo}}}\) functional. In the panels from top to bottom, absolute value of the pulse shape, complex phase, and spectrum of the optimized pulse (solid black lines) and of the guess pulse (dashed red/gray lines).

It is worth noting that the gate obtained with the optimized pulse is not the closest geometric phase gate to the gate implemented by the guess pulse, as in Eq. (9). This illustrates the benefit of using \(J_{T}^{\mathrm {geo}}\) over \(J_{T}^{\mathrm {sm}}\). The latter optimizes towards a specific, pre-determined gate, according to Eq. (9), while \(J_{T}^{\mathrm {geo}}\) can dynamically adjust which specific geometric phase gate is easiest to reach, allowing it to fulfill the objective much more easily. Table 2 shows that optimization with \(J_{T}^{\mathrm {geo}}\) requires significantly fewer propagations (which are directly proportional to CPU time) than optimization with \(J_{T}^{\mathrm {sm}}\), for both direct and pre-optimized strategies.

While the optimization yields a gate error well below the quantum error correction threshold, it deviates significantly from the simple geometric phase gate scheme, resulting in complex dynamics. The numerical effort required to obtain a high fidelity solution is considerable, with several thousand iterations (each iteration requiring two full propagations of four logical basis states). We have discussed here only the optimization for a fixed gate duration of \(T = 200\mbox{ ns}\). Generally, significantly faster quantum gates could potentially be implemented using other mechanisms, e.g. [24]. The geometric phase gate, however, relies on adiabatic shifts of the energy levels such that loss of population from the logical subspace inhibits realization of high fidelities when pushing the gate duration significantly below 200 ns.

The complexity of the optimized pulse is typical for Krotov’s method or other gradient-based optimization methods. For the present example, this clashes with the gate mechanism of the geometric phase gate that intends to use simple and smooth pulse shapes.

4 Hybrid optimization scheme

4.1 Simplex optimization

The gate error of the guess pulse is dominated by the insufficient amount of entanglement that is generated. A natural approach is to maintain the analytical pulse shape of Eq. (4) for the time being, and to vary the free parameters \(E_{0}\) and T in order to maximize the figure of merit. Such an optimization of a pulse determined by only a handful of parameters (in this case two) is easily performed using a gradient-free method such as Nelder-Mead simplex. This has the additional benefit that there are no restrictions on the choice of optimization functional. Specifically, there is no need to formulate it in such a way that derivatives can be calculated analytically. Thus, we can include the objective of minimizing the gate duration in the functional and modify Eq. (11) to read

$$ J_{T}^{\mathrm {splx}} = J_{\operatorname {diag}} + J_{\gamma} + \frac{T}{T_{0}} ,\quad T_{0} = 200 \mbox{ ns}. $$
(16)

Note that with the addition of penalizing the gate duration, the functional is no longer a distance measure that approaches zero as the target is reached; the simplex optimization will find a local minimum of Eq. (16) at a value of \(J_{T}^{\mathrm {splx}}\lesssim1\).
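A minimal sketch of how Eq. (16) could be wrapped into an objective function for a gradient-free optimizer is given below. The callable `propagate` is hypothetical: it stands for whatever routine propagates the four logical basis states under the pulse of Eq. (4) with parameters \((E_{0}, T)\) and returns the matrix elements \(\tau_{00}\), \(\tau_{01}\), \(\tau_{10}\), \(\tau_{11}\) of Eq. (12c). All names are assumptions made for illustration.

```python
def j_splx(taus, T, T0=200.0):
    """J_T^splx = J_diag + J_gamma + T / T0, Eq. (16); T and T0 in ns."""
    tau00, tau01, tau10, tau11 = taus
    j_diag = 4.0 - sum(abs(t) ** 2 for t in taus)            # Eq. (12a)
    x = tau00 * tau01.conjugate() * tau10.conjugate() * tau11
    j_gamma = 2.0 + 2.0 * x.real                             # Eq. (12b)
    return j_diag + j_gamma + T / T0


def make_objective(propagate, T0=200.0):
    """Wrap a (hypothetical) propagator into a function of (E0, T)."""
    def objective(params):
        E0, T = params
        return j_splx(propagate(E0, T), T, T0)
    return objective


# Stand-in propagator for demonstration only: it pretends that every
# pulse implements a perfect controlled-phase gate.
def ideal_propagate(E0, T):
    return (1.0, 1.0, 1.0, -1.0)


objective = make_objective(ideal_propagate)
value = objective([400.0, 185.0])  # only the duration penalty T / T0 remains
```

An objective of this form, taking a parameter vector and returning a scalar, is what generic Nelder-Mead implementations expect, for instance `scipy.optimize.minimize` with `method="Nelder-Mead"`.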

Using only 116 propagations, the algorithm converges to a solution that reduces the gate duration from 200 to 185 ns, while bringing the entanglement error down to \(\varepsilon _{C}= 2.0 \times10^{-5}\), see Table 2. The resulting dynamics are shown in panels (a)-(d) of Figure 3. They are similar to those of the original guess pulse, cf. panels (a)-(d) of Figure 1. The shorter pulse duration and larger peak amplitude (from 300 MHz in the original guess to ≈400 MHz) result in a significantly larger cavity excitation. They also lead to more non-adiabatic defects (wobbles in the cavity excitation). Consequently, the loss of population from the logical subspace is increased by about a factor of two to \(\varepsilon _{\mathrm {pop}}= 1.4 \times10^{-2}\), and limits the total gate error to \(\varepsilon _{\mathrm {avg}}= 1.4 \times10^{-2}\). Thus, while the simplex search yields a dramatic improvement over the original guess pulse, it does not reach a sufficiently high fidelity to approach the quantum error correction limit. To remedy this, we turn to a hybrid approach, combining simplex and gradient-based optimization.

Figure 3

Population dynamics under (pre-optimized) guess and optimized pulse. The figure follows the conventions of Figure 1. Panels (a)-(d) show the dynamics resulting from a simplex optimization, see text for details. The resulting pulse is the starting point for a continued optimization using Krotov’s method with the \(J_{T}^{\mathrm {sm}}\) functional. The optimized dynamics are shown in panels (e)-(h). The pulse amplitude indicated in the background of panels (d), (h) is normalized to the peak amplitude of \(E_{0} \approx 400\mbox{ MHz}\). The simplex-optimized pulse implements a geometric phase gate with a gate error of \(\varepsilon _{\mathrm {avg}}= 1.4 \times10^{-2}\), with \(\varepsilon _{C}= 2.0 \times10^{-5}\) and \(\varepsilon _{\mathrm {pop}}= 1.4 \times10^{-2}\). The continued optimization decreases the gate error to \(\varepsilon _{\mathrm {avg}}= 3.4 \times10^{-5}\), with \(\varepsilon _{C}= 5.1 \times10^{-5}\) and \(\varepsilon _{\mathrm {pop}}= 1.1 \times10^{-5}\), see Table 2.

4.2 Continued optimization with Krotov’s method

For the present example, we use the final pulse of the previous section as the starting point of an optimization with Krotov’s method. Since now the guess pulse already has a relatively high fidelity, both the \(J_{T}^{\mathrm {sm}}\) and \(J_{T}^{\mathrm {geo}}\) functionals may be used interchangeably, as we are only searching in a very small vicinity of the starting point. Both optimizations converge rapidly to \(\Delta J_{T}/J_{T} < 10^{-4}\) in under 200 iterations. The dynamics resulting from the propagation of the \(\vert 00 \rangle\) state under the pulse obtained from optimization with the \(J_{T}^{\mathrm {sm}}\) functional are shown in panels (e)-(h) of Figure 3. The comparison to the dynamics of the pre-optimized guess pulse in panels (a)-(d) is striking: the excitations now follow the pulse shape smoothly. The non-adiabatic defects, i.e., the wobbles especially in the cavity excitation in panel (a), have been corrected. Consequently, the loss of population from the logical subspace is now reduced to a value of \(\varepsilon _{\mathrm {pop}}= 1.1 \times10^{-5}\). Together with only a slight increase in the concurrence error to \(\varepsilon _{C}= 5.1 \times10^{-5}\), the overall gate error of the optimized pulse is \(\varepsilon _{\mathrm {avg}}= 3.4\times10^{-5}\). This is an improvement of half an order of magnitude compared to the direct optimization in Section 3.2. Moreover, the result has been obtained at a small fraction of the numerical cost. The pre-optimized guess and post-optimized pulse shape, indicated as the blue shaded area in panels (d) and (h), appear visually indistinguishable.

The correction to the pre-optimized guess pulse is shown in Figure 4. Indeed, the corrections are on the order of 1 MHz, much smaller than the peak amplitude of \(E_{0} \approx400\mbox{ MHz}\). The corrections follow a regular pattern, both in the shape (top panel) and the complex phase (center panel), as indicated by the existence of sharp peaks in the spectrum (bottom panel). It appears that the non-adiabatic defects in the pre-optimized guess pulse can be corrected by a series of small kicks in amplitude and phase at regular intervals. This results in an optimized pulse that is conceptually simpler and yields a higher fidelity than the direct application of Krotov’s method. The comparison illustrates the power of a hybrid approach to steer the physical characteristics of the optimized control. In our example, the parametrization used in the first optimization stage enforces the desired pulse shape. Without this restriction, pulses of undesirable complexity are obtained. On the other hand, the restriction must ultimately be relaxed to allow for the representation of the higher-frequency features that correct non-adiabatic defects. Thus, the role of the two-stage optimization is also to influence the physical features of the control, in addition to its numerical benefits.

Figure 4

Pulse corrections obtained with Krotov’s method. The figure summarizes the differences between the optimized pulse, cf. panel (h) in Figure 3, and the (pre-optimized) guess pulse, cf. panel (d) in Figure 3, also indicated by the dashed red/gray line in each panel. The panels from top to bottom show the corrections to absolute value, complex phase, and spectrum of the pulse, respectively (solid black lines). The amplitude, phase, and spectrum of the guess pulse (dashed red/gray lines) are shown using the alternative axis scaling in red/gray.
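The type of analysis summarized in Figure 4 can be sketched numerically as follows. This is only an illustrative reading of the three panels, assuming that both pulses are complex-valued and sampled on the same uniform time grid. The function name `pulse_corrections`, the Blackman-shaped toy guess pulse, and the artificial 1 MHz kicks at 100 MHz are assumptions made for this example and are not the pulses obtained in this work.

```python
import numpy as np

def pulse_corrections(guess, optimized, dt):
    """Compare two complex pulses sampled on the same uniform grid (spacing dt).

    Returns the change in absolute value, the change in complex phase
    (wrapped to (-pi, pi]), the frequency grid, and the spectrum of the
    difference between the two pulses.
    """
    guess = np.asarray(guess, dtype=complex)
    optimized = np.asarray(optimized, dtype=complex)
    d_amp = np.abs(optimized) - np.abs(guess)
    d_phase = np.angle(optimized * np.conj(guess))
    freqs = np.fft.fftfreq(guess.size, d=dt)
    spectrum = np.abs(np.fft.fft(optimized - guess))
    return d_amp, d_phase, freqs, spectrum

# Toy data (hypothetical): time in ns, amplitude in MHz, frequency in GHz.
n, dt = 2000, 0.1                      # 200 ns sampled every 0.1 ns
t = np.arange(n) * dt
f_kick = 0.1                           # assumed kick frequency: 0.1 GHz = 100 MHz
guess = 400.0 * np.blackman(n)         # smooth envelope with 400 MHz peak
optimized = (guess
             * (1.0 + 0.0025 * np.cos(2 * np.pi * f_kick * t))      # amplitude kicks
             * np.exp(1j * 0.001 * np.sin(2 * np.pi * f_kick * t)))  # phase kicks

d_amp, d_phase, freqs, spectrum = pulse_corrections(guess, optimized, dt)
peak_freq = abs(freqs[np.argmax(spectrum)])
```

For this toy input, the amplitude correction stays near 1 MHz against a 400 MHz peak, and the spectrum of the correction shows a single sharp peak at the kick frequency, which is the qualitative signature described in the text.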

The striking numerical efficiency of the hybrid optimization scheme can be seen by comparing its convergence to that of the direct optimization. This is shown in Figure 5. The direct optimization shows an extended plateau for at least the first 100 iterations, before slowly converging. This behavior is typical for ill-chosen guess pulses [43]. For a direct optimization with the \(J_{T}^{\mathrm {sm}}\) functional, the plateau extends for several thousand iterations, and does not yield convergence within 10,000 iterations. In contrast, the optimization starting from a pre-optimized pulse has no plateau (note the log-scale of the x-axis); both \(J_{T}^{\mathrm {sm}}\) and \(J_{T}^{\mathrm {geo}}\) converge at roughly the same rate. The improvement by the simplex (pre-)optimization compared to the original guess pulse can be seen from the difference in the y-intercept between the ‘direct’ and ‘pre-optimized’ curves in Figure 5; in addition, there is also a reduction in the gate duration from 200 to 185 ns at no additional numerical cost.

Figure 5

Convergence of optimization towards a geometric phase gate. Value of the final-time optimization functional \(J_{T}^{\mathrm {geo}}\), respectively \(J_{T}^{\mathrm {sm}}\), over the number of iterations using Krotov’s method. Each iteration requires two full propagations; the number of propagations is proportional to the required CPU time. The direct optimization starts from an arbitrary guess pulse, see Section 3.2 for details. In the pre-optimized case, the guess pulse was the result of a simplex optimization, see Section 4.2 for details.
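The stopping criterion used for the pre-optimized runs, \(\Delta J_{T}/J_{T} < 10^{-4}\), amounts to a relative-change check on the recorded values of the functional. A minimal sketch is given below. The function name `has_converged`, the choice to normalize by the most recent value, and the sample histories are assumptions made for illustration, not details of the optimization code used here.

```python
def has_converged(history, rel_tol=1e-4):
    """Relative-change stopping test on a list of functional values J_T.

    Returns True once |J_prev - J_last| / |J_last| < rel_tol, where J_prev
    and J_last are the two most recent entries of `history`.
    """
    if len(history) < 2:
        return False                     # need two iterations to form a difference
    j_prev, j_last = history[-2], history[-1]
    if j_last == 0.0:
        return True                      # functional has reached its optimal value
    return abs(j_prev - j_last) / abs(j_last) < rel_tol

# Hypothetical histories: a large update versus a tiny one.
still_improving = has_converged([1.0e-2, 5.0e-3])
stalled = has_converged([1.0e-3, 0.99995e-3])
```

A check of this kind cannot by itself distinguish true convergence from a very flat plateau, so the tolerance has to be small compared to the relative change per iteration on the plateau.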

The gate duration for the geometric phase gate mechanism is limited by the requirement of adiabaticity. We have also performed the hybrid optimization scheme for a gate duration of ≈100 ns. In this case, the simplex search yields significant non-adiabatic defects, with a loss of population of \(1.2 \times10^{-1}\). The concurrence error is \(2.3\times10^{-3}\) and the total gate error, \(\varepsilon _{\mathrm {avg}}= 1.2 \times10^{-1}\), is dominated by the population loss. Post-optimization using Krotov’s method significantly reduces the total gate error to \(\varepsilon _{\mathrm {avg}}= 1.4 \times10^{-2}\). The post-optimization result is obtained at a numerical cost very close to that for the \(T=200\mbox{ ns}\) gate, and yields pulse corrections very similar to those shown in Figure 4. This results in a correction of the non-adiabatic defects and thereby lowers the population loss to only \(6.5\times 10^{-3}\). The total gate error is now dominated instead by the increased concurrence error of \(1.7\times10^{-2}\), which is too large for a high-quality phase gate. The observation that an overall improvement in gate performance at \(T = 100\) ns from hybrid optimization is only possible by increasing adiabaticity at the cost of reduced entanglement indicates that a quantum speed limit has been reached for the specific gate mechanism. These results show that a hybrid optimization scheme may be used successfully even when operating close to the quantum speed limit.

While it is not the aim of this work to characterize the quantum speed limit for the coupled transmon system, we note that this could be quantified by systematically scanning over the gate duration [26]. Under the constraints of the specific gate mechanism employed here, a numerical approach appears necessary to extract the speed limit. However, for the Strauch gate of Ref. [44], an analytic estimate may be made by summing the minimal precession periods required for each component of the gate, i.e., two iSWAP gates and one controlled-Z gate. With the qubit-cavity interaction strength of \(g = 70\mbox{ MHz}\) employed here, this would suggest that gate durations as short as a few tens of nanoseconds might be attainable for this system.

We may also compare the total number of propagations necessary to obtain the optimized pulse, included in Table 2 together with measures of success: concurrence error, loss of population from the logical subspace, and gate error. Each iteration using Krotov’s method requires two propagations. For the hybrid optimization schemes (‘pre-opt s.m.’, ‘pre-opt hol.’), the number of propagations includes simplex as well as the additional propagations due to Krotov’s method. The gate error is measured with respect to the exact target gate when using the square-modulus functional, and with respect to the closest geometric phase gate to the optimized dynamics in all other cases. In terms of numerical efficiency, the hybrid schemes outperform the direct optimization by nearly two orders of magnitude while resulting in a significantly better gate fidelity.
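The bookkeeping behind such a comparison is simple and can be written out explicitly. The sketch below only encodes the counting rule stated above, two propagations per Krotov iteration plus the propagations spent in the simplex stage. The function name and the iteration counts are placeholders chosen for illustration and are not the entries of Table 2.

```python
def total_propagations(krotov_iterations, simplex_propagations=0):
    """Two propagations per Krotov iteration plus those of the simplex stage."""
    return 2 * krotov_iterations + simplex_propagations

# Placeholder counts for illustration only (not the values of Table 2).
direct = total_propagations(krotov_iterations=10_000)
hybrid = total_propagations(krotov_iterations=150, simplex_propagations=100)
speedup = direct / hybrid   # ratio of numerical cost, direct versus hybrid
```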

5 Conclusions

For the example of a geometric phase gate on a system of two transmon qubits with a shared transmission line resonator, we have considered the application of gradient-based optimization methods, specifically Krotov’s method, and a gradient-free optimization method, Nelder-Mead simplex. The objectives of the geometric phase gate can be formulated in a specialized optimization functional \(J_{T}^{\mathrm {geo}}\) that reaches its optimal value for any diagonal perfect entangler. We have shown that for a direct optimization, this functional vastly outperforms the ‘standard’ square-modulus functional for gate optimization. Convergence is aided by the use of a functional that formulates the objective as generally as possible. This is in agreement with recent results for optimization using a functional targeting arbitrary perfect entanglers [41, 45].

The direct optimization using Krotov’s method can in principle find controls that implement the desired gate with high fidelity. However, the resulting dynamics are complex and the numerical effort is dominated by an extended plateau in the initial phase of the optimization; for short gate durations, optimization becomes increasingly harder and the fidelity is limited by loss from the logical subspace. The numerical effort to reach high fidelities quickly becomes infeasible in this case. In contrast, parametrizing the pulse by its duration and peak amplitude only, and applying a simplex optimization on those parameters, we are able to find a simple analytic pulse that implements the desired gate with moderate fidelity, with a gate error still one order of magnitude above the quantum error correction limit. Thus, for the example presented here, neither the application of a gradient-based algorithm nor the simplex optimization alone yields satisfactory results; only the combination of both methods into a hybrid optimization scheme is able to obtain controls with a clear mechanism that implement a geometric phase gate to high fidelity, with a minimum of numerical effort.

These results prompt the recommendation to generally adopt hybrid optimization schemes, i.e., obtain guess pulses for gradient-based optimization from a gradient-free pre-optimization, when there is insufficient knowledge to design good guess pulses by hand. There is great flexibility in the choice of parametrization. Here, we have taken the two free parameters, peak amplitude and gate duration, in a simple fixed analytical formula for the pulse shape. Generally, one might use slightly more sophisticated parametrizations, following e.g. the CRAB approach [18]. The small number of free parameters and the relatively high quality of the original guess pulse, with a fidelity of already >90%, result in particularly fast convergence of the simplex method. For a more sophisticated example, at least several hundred propagations would probably be required in the simplex stage. However, even in that case, this numerical effort is by far outweighed by the large number of iterations required to leave the initial plateau in the optimization landscape for a bad guess pulse in a direct gradient-based optimization. Moreover, any figure of merit is suitable for optimization with the simplex method, as there is no need to derive the gradient for the optimization functional. For example, the gate duration can be included in the figure of merit, something that generally is not straightforward in gradient-based methods [46]. The hybrid scheme is aimed at providing optimal solutions in an open-loop context, and still leaves open the possibility of further combining open-loop and closed-loop optimization methods when targeting a specific experimental setup [47].

In principle, the approach could thus be extended to multiple stages, where in each stage, a different parametrization and a suitable optimization algorithm are used. There is no requirement for a specific method such as Nelder-Mead simplex or Krotov’s method to be employed. For example, a two-stage optimization using a genetic algorithm in the first stage, and a gradient algorithm in the second stage has been used for the optimization of quantum gates in strongly coupled two-level systems [48, 49]. There, starting with a simple parametrization, and then moving to a less restricted search space in the second optimization stage was also found to benefit the overall optimization performance, in agreement with the results shown here.
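The structure of such a two-stage scheme can be sketched on a toy problem. The example below is not the optimization performed in this work: the functional is a simple least-squares distance to a hypothetical reference pulse instead of a gate functional evaluated by propagation, and SciPy's L-BFGS-B with finite-difference gradients stands in for Krotov’s method in the second stage. The names `hybrid_optimize`, `functional`, `parametrize`, and `reference` are all assumptions introduced for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_optimize(functional, parametrize, x0):
    """Two-stage optimization of a sampled control.

    Stage 1: Nelder-Mead simplex over the few parameters of `parametrize`.
    Stage 2: gradient-based refinement of every sample of the pulse
             (L-BFGS-B here, as a stand-in for Krotov's method).
    """
    stage1 = minimize(lambda x: functional(parametrize(x)), x0, method="Nelder-Mead")
    guess = parametrize(stage1.x)        # pre-optimized guess pulse
    stage2 = minimize(functional, guess, method="L-BFGS-B")
    return stage1, guess, stage2

# Toy problem (hypothetical): a smooth envelope carrying a 1% ripple that the
# two-parameter shape cannot represent.
t = np.linspace(0.0, 1.0, 100)
reference = np.exp(-((t - 0.5) / 0.15) ** 2) * (1.0 + 0.01 * np.cos(20.0 * np.pi * t))

def functional(pulse):
    """Toy figure of merit: squared distance to the reference pulse."""
    return float(np.sum((pulse - reference) ** 2))

def parametrize(x):
    """Fixed analytic shape with two free parameters (amplitude, width)."""
    amplitude, width = x
    return amplitude * np.exp(-((t - 0.5) / width) ** 2)

stage1, guess, stage2 = hybrid_optimize(functional, parametrize, x0=[0.5, 0.3])
max_correction = np.max(np.abs(stage2.x - guess))
```

In this toy setting, the first stage fixes the overall shape and the second stage adds only small corrections to it, mirroring the division of labor between the two stages described above. Either stage could be replaced by a different optimizer without changing the structure of `hybrid_optimize`.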

Adding a further stage might also be beneficial when the optimization landscape is non-trivial and contains traps or saddle points. This may happen e.g. when the optimization is performed with limited resources [50]. Repeating the simplex search from a different starting point - either by systematic variation or by random search - may then find solutions that do not get stuck in traps of the optimization landscape. Of course, the truncation of the search space due to the low-dimensional parametrization may itself introduce additional traps in the landscape. However, these traps would disappear when returning to the full search space in the final optimization stage.
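A repeated simplex search by systematic variation of the starting point can be sketched as follows. The two-parameter landscape with one global and one local minimum is a hypothetical test function, not a quantum control landscape, and the names `multistart_simplex` and `trapped_landscape` are assumptions made for this example.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def multistart_simplex(figure_of_merit, bounds, points_per_axis=3):
    """Run Nelder-Mead from a regular grid of starting points.

    `bounds` is a list of (low, high) pairs, one per parameter. Returns the
    best result together with the list of all results.
    """
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in bounds]
    results = [
        minimize(figure_of_merit, np.array(x0), method="Nelder-Mead")
        for x0 in itertools.product(*axes)
    ]
    best = min(results, key=lambda res: res.fun)
    return best, results

def trapped_landscape(x):
    """Toy landscape: global minimum near x[0] = -1, local trap near x[0] = +1."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0] + x[1] ** 2

best, results = multistart_simplex(trapped_landscape, bounds=[(-2.0, 2.0), (-2.0, 2.0)])
```

Individual runs started on the wrong side of the barrier can end in the local trap, while keeping the best of all runs recovers the global minimum. A random choice of starting points would fit into the same loop.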

For the specific example of the geometric phase gate considered here, we have found the post-optimization to introduce small corrections to non-adiabatic defects. Due to the small relative strength of the correction, the experimental realization of the geometric phase gate would require extraordinary precision. Moreover, the large excitation of the cavity would limit the fidelity when dissipative effects, specifically spontaneous decay of the cavity, are taken into account. However, we stress that the method of employing hybrid optimization schemes presented here is entirely general and aimed at reducing the numerical effort in obtaining high-fidelity solutions to arbitrary quantum control problems, not limited to quantum information processing. Moreover, robustness with respect to both fluctuations in the control parameters and dissipation can be achieved using complementary advanced control techniques, such as a description in Liouville space and ensemble optimization [31, 43]. These approaches are numerically even more demanding, such that the reduction of the optimization cost achieved by the hybrid scheme discussed here may become imperative.