1 Introduction

Precision measurements play a fundamental role in physics, as they constitute a key ingredient of many state-of-the-art applications and of experiments testing the limits of scientific theories. But the accuracy to which such measurements can be performed is itself governed by the laws of physics and, ultimately, by those of quantum mechanics [1–4].

A generic measurement protocol for estimating the value of an unknown parameter consists of preparing a probe in a desired initial state, allowing it to interact with the physical system whose state depends on the parameter and, finally, obtaining a measurement result that encapsulates the information about it. This process, however, is often affected by systematic and statistical errors. While the former may stem from imperfect calibration of the measurement instruments, the latter can either be accidental, due to insufficient control of the measurement chain, or fundamental, deriving from the nature of the physical measurement [5, 6]. Fortunately, statistical errors, regardless of their origin, can be minimized by repeating the process and averaging the resulting outcomes, as a consequence of the central limit theorem [7]. This theorem states that, given a large number N of independent measurement results, the distribution of their average converges to a Gaussian with standard deviation scaling as \(1/\sqrt{N}\). In metrology, this scaling is referred to as the Standard Quantum Limit (SQL). For many practical applications that involve noisy and error-prone systems, it is essential to devise protocols that yield precision levels close to the SQL. However, it is known that this limit is not fundamental: the ultimate limit set by the laws of quantum physics is the Heisenberg Limit (HL) [8–11].

Recently, several systems have achieved sufficient levels of technological maturity to allow the experimental exploration of these limits. Due to the non-idealities present in these systems, they are typically referred to as NISQ (noisy intermediate-scale quantum) devices. Largely, the motivation for this development has been rooted in quantum information processing, where efficient and precise measurements and state manipulation are required [12], and indeed significant progress towards the implementation of quantum gates and algorithms has been made in optical setups [13, 14], NV centers in diamond [15, 16], trapped ions [17, 18] and superconducting circuits [19, 20]. Since it employs control and measurement techniques similar to those of quantum computing, the exploration of quantum-enhancement techniques has grown into a separate field, usually referred to as quantum metrology [21, 22], with applications such as ultrasensitive force detection [23], adaptive environment sensing [24], near-surface electric field measurements [25], sensing of weak signals [26] and even detection of gravitational waves [27].

The paradigmatic protocol in quantum metrology is quantum phase estimation. The quantum phase is a parameter that cannot be measured directly, and yet it contains information about other quantities of interest, such as electric or magnetic fields. The traditional quantum metrology approach to phase estimation has been through the use of highly entangled states such as NOON states [1, 3, 28], as well as other specially optimized states [29–31]. However, highly entangled states tend to be very sensitive to noise: for NOON states, even the loss of a single qubit (photon) results in a separable state. Thus, the implementation of this method is challenging for real-life applications. Fortunately, it was later realized that precise quantum phase estimation does not necessarily require entanglement. In optics, it was demonstrated that adaptive homodyne phase measurements yield uncertainties close to the quantum uncertainty of coherent states [32, 33]. It was also shown how to adaptively estimate the angle of a half waveplate that characterizes the linear polarization of photons [34, 35], as well as how to perform phase-shift estimation with polarization-encoded qubits [36]. Furthermore, the inverse quantum Fourier transform, which is the last step in Shor’s factorization algorithm and which typically requires entangling gates, can be carried out using local measurements and feedback [37]. This approach has been used to break the SQL in experiments with photons [38], superconducting qubits [39] and NV centers in diamond [40].

The incorporation of machine learning techniques in estimation protocols is a natural step forward. Seminal theoretical work on employing reinforcement learning algorithms has demonstrated the potential of these methods for reaching sensitivities below the SQL when used in conjunction with entanglement [41–47]. Recently, some of these methods have been tested experimentally in an optics setup, with their estimation precision being limited by the SQL [48]. This underscores the potential of applying machine learning to the optimization of parameter estimation under constraints on resources and noise levels, but the foundations of these methods are still poorly understood. In mathematical statistics, the limits of machine learning algorithms are an active area of investigation, for example in deriving bounds on the number of samples needed to reach a certain accuracy [49]. However, the issue at stake is that typically all the mathematical results, including the SQL, are derived in the so-called central limit \(N \gg 1\) and for ideal noiseless systems. Yet the relevant regime for the present status of quantum technologies in the NISQ era is that of moderate N and significant noise. In this context no general unbiased estimator has been found yet [28].

The objective of this work is to employ two machine learning algorithms, differential evolution (DE) and particle swarm optimization (PSO), in the design of adaptive estimation schemes capable of driving the measurement precision close to the SQL. We numerically show that the SQL is still a valid bound even in the regime of moderate N. We also demonstrate that machine learning algorithms can be used even in the presence of various types of experimental noise, consequently providing robust policies with better performance than non-adaptive protocols. For practical devices such as magnetometers based on single qubits, these methods can be directly applied. More precisely, we observe that in order to increase the precision of these devices, one straightforward option is to increase the sensitivity to the parameter to be estimated by, for example, using higher-energy states or by increasing the response at a different biasing point. However, both of these strategies result in an increased coupling to noise sources and, therefore, a compromise needs to be reached in order to achieve the maximum performance. This will be further detailed in the paper when analyzing the experimental realizations.

Both the DE and the PSO algorithm can also be employed as subroutines to enhance other quantum algorithms. For example, algorithms that use variable sensing times, or multipass techniques, are in principle able to break the SQL and reach the HL. Instead of averaging at every sensing time, which is the typical approach, one can further increase the sensitivity by using our technique. Beyond quantum metrology, machine learning protocols could become useful in other paradigmatic quantum-information problems that involve phase estimation, such as factorization, sampling, and the computation of molecular spectra [50]. In particular, our calculations are relevant for NISQ quantum technologies, where the number of qubits is limited and subject to errors and noise. Overall, by benchmarking these two important protocols for the task of quantum phase estimation, we hope that machine learning algorithms will be more prominently employed in applications such as optical interferometry and magnetometry, where the increase of precision is essential and where the aim is set on reaching the HL.

The paper is organized in the following sections. Section 2 describes the general concept of adaptive quantum phase estimation in Ramsey interferometry, discussing the updating procedure as well as the relevant sources of noise. Section 3 presents the two machine learning algorithms, DE and PSO. Section 4 presents our main results, where we show how the two algorithms allow us to approach the SQL. In this section we also provide an analysis of the effects of Gaussian noise, Random Telegraph noise (RTN) and quantum decoherence on the performance of the algorithms. Section 5 discusses the implementation of our protocol on several experimental platforms, namely Mach–Zehnder optical interferometers, superconducting qubits, trapped ions and defects in diamond. Finally, we present our conclusions in Sect. 6.

2 Adaptive quantum phase estimation scheme

To perform the estimation of an unknown phase ϕ, we use a generic adaptive scheme that is able to adjust the value of a control phase θ to match the value of ϕ, based on the results of previous measurements in a Ramsey interferometric sequence.

The schematic representation of the adaptive quantum phase estimation scheme displayed in Fig. 1 consists of a quantum circuit made of a Hadamard gate H, a phase shift gate \(\mathrm{U}_{\phi ,\theta }\), another Hadamard gate H and a detector augmented with a processing unit to calculate the value of θ for the next round of measurements. Using standard notations in quantum information, the Hadamard and phase shift gates can be respectively defined as

$$ \mathrm{H} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1& -1 \end{bmatrix} \quad \textrm{and} \quad \mathrm{U}_{\phi ,\theta } = \begin{bmatrix} 1 & 0 \\ 0 & e^{i (\phi -\theta )} \end{bmatrix} . $$
(1)
Figure 1

Adaptive quantum phase estimation scheme. A qubit m is injected into the quantum circuit either in the quantum state \(|0\rangle \) or \(|1\rangle \) and its measured outcome \(\zeta _{m} \in \{0,1\}\) is used together with some policy phase instruction \(x_{m} \in \mathopen [0,2\pi \mathclose [\) to prepare the circuit for the next round of measurements. The quantum circuit consists of two Hadamard gates H, a phase shift gate \(\mathrm{U}_{\phi ,\theta }\), a detector D, and a processing unit with phase update instructions for the controllable phase shifter

Under this adaptive quantum estimation scheme, an ensemble of N qubits is injected sequentially into the circuit in randomly chosen quantum states, either in state \(|0\rangle \) or state \(|1\rangle \), and their outcome is measured to prepare the value of the controllable phase shifter for the next incoming qubit. The input state of the quantum circuit takes the form \(|0\rangle _{1}\otimes |1\rangle _{2} \otimes |1\rangle _{3} \otimes \cdots \otimes |0\rangle _{N}\), which is manifestly separable. After the last Nth qubit is injected and its outcome measured, the final phase value of θ is considered to be the estimated value of ϕ. The initial states of the qubits can be represented in Dirac notation as

$$ |0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \textrm{and} \quad |1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix} . $$

The idea is to have at the end of the process an estimated controllable phase value θ as close as possible to the value of the unknown phase ϕ. The state of each qubit after the second Hadamard gate is

$$ |\psi _{\pm }\rangle = \frac{1}{2} \bigl[ 1 \pm e^{i (\phi - \theta )} \bigr] |0\rangle + \frac{1}{2} \bigl[ 1 \mp e^{i (\phi - \theta )} \bigr] |1\rangle , $$

where the upper sign corresponds to a qubit whose initial state was \(|0\rangle \) and the lower sign to one whose initial state was \(|1\rangle \). This yields measurement results \(\zeta \in \{0,1\}\) with probabilities

$$ \mathcal{P}_{+}(\zeta = 0\vert \phi , \theta ) = \mathcal{P}_{-}(\zeta = 1\vert \phi , \theta ) = \cos ^{2} \frac{\phi - \theta }{2}, $$
(2)

and

$$ \mathcal{P}_{+}(\zeta = 1\vert \phi , \theta ) = \mathcal{P}_{-}(\zeta = 0\vert \phi , \theta ) = \sin ^{2} \frac{\phi - \theta }{2}. $$
(3)

If the phase difference \(\phi - \theta \) is zero, then θ matches the unknown phase ϕ exactly, and a qubit prepared in state \(|0\rangle \) will also exit in state \(|0\rangle \) with certainty. The same logic applies to a qubit injected in the quantum state \(|1\rangle \).
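As a concrete illustration, the measurement statistics of Eqs. (2) and (3) are straightforward to sample numerically. The sketch below (the function names are ours, for illustration only) computes the outcome probabilities and draws a single result ζ:

```python
import numpy as np

def measurement_probability(phi, theta, initial_state):
    """Probability of measuring zeta = 0 for a qubit prepared in
    |0> (initial_state=0) or |1> (initial_state=1), per Eqs. (2)-(3)."""
    if initial_state == 0:
        return np.cos((phi - theta) / 2) ** 2
    return np.sin((phi - theta) / 2) ** 2

def measure(phi, theta, initial_state, rng):
    """Draw a single measurement outcome zeta in {0, 1}."""
    p0 = measurement_probability(phi, theta, initial_state)
    return 0 if rng.random() < p0 else 1
```

For \(\phi = \theta \), `measurement_probability` returns 1 for a qubit prepared in \(|0\rangle \) and 0 for one prepared in \(|1\rangle \), reproducing the deterministic behaviour discussed above.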

In order to estimate the phase ϕ, at every step m of the algorithm the controllable phase \(\theta _{m}\) has to be updated based on the result \(\zeta _{m-1}\) of the previous measurement. Here we adopt a standard updating rule

$$ \theta _{m} = \theta _{m-1} + (-1)^{\zeta _{m-1}} x_{m}, $$
(4)

which has already been employed successfully in similar setups [44–46], and where the initial value of θ can be fixed to \(\theta _{0}=0\) without any loss of generality.

Note that the term \(x_{m}\) in Eq. (4) represents the update policy, which will be determined by the machine learning algorithms. This update rule can be viewed as a decision tree: for each qubit m, the controllable phase shifter must update its value \(\theta _{m}\) either by adding or by subtracting \(x_{m}\), depending on the measured outcome \(\zeta _{m-1} \in \{0,1\}\) of the previous qubit. Consequently, for N qubits the number of possible \(\theta _{m}\) values across the tree grows exponentially, as \(2^{N+1}-1\). This is a Markovian feedback, since the new controllable phase \(\theta _{m}\) depends only on the latest measured outcome.
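The whole adaptive round can be condensed into a short simulation loop. The sketch below is an illustrative implementation under the ideal, noiseless probabilities of Eqs. (2) and (3), with the update rule applied after each measurement; `run_adaptive_round` is a name we introduce here, not taken from any particular codebase:

```python
import numpy as np

def run_adaptive_round(phi, policy, rng):
    """Run the N-qubit adaptive sequence: measure each qubit at the
    current theta, then update theta by +/- x_m depending on the
    outcome (the updating rule of Eq. (4)), starting from theta_0 = 0.
    `policy` is the vector (x_1, ..., x_N) supplied by the learning
    algorithm."""
    theta = 0.0
    for x in policy:
        state = rng.integers(2)            # qubit injected in |0> or |1> at random
        p0 = (np.cos((phi - theta) / 2) ** 2 if state == 0
              else np.sin((phi - theta) / 2) ** 2)
        zeta = 0 if rng.random() < p0 else 1
        theta = (theta + (-1) ** zeta * x) % (2 * np.pi)
    return theta                           # final theta_N is the estimate of phi
```

After the last qubit, the returned value of θ is the estimate of ϕ, exactly as in the scheme of Fig. 1.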

To evaluate the performance of a policy, we use the Holevo variance [29–31], defined as

$$ V_{\mathrm{H}} = S^{-2}-1 = \bigl\vert \bigl\langle e^{i(\phi -\theta )} \bigr\rangle \bigr\vert ^{-2}-1, $$
(5)

where \(\langle e^{i(\phi -\theta )} \rangle \) represents the average value of \(e^{i(\phi -\theta )}\) over the different phase values ϕ and their respective estimates θ considered in the learning process of the machine learning algorithms. Here we make the notation abbreviation \(\theta = \theta _{N}\), since the Holevo variance of a policy can only be calculated after the last qubit N is injected into the circuit and its outcome measured. The quantity \(S = \vert \langle e^{i(\phi -\theta )} \rangle \vert \in [0,1]\) is called the sharpness of the phase distribution, where the value \(S=1\) corresponds to a perfect estimation of ϕ. For periodically bounded variables such as the phase, the Holevo variance plays the role of the usual variance, \((\Delta \phi )^{2}=V_{\mathrm{H}}\). Therefore we have \(V_{\mathrm{H}} \sim 1/N\) at the SQL. It is also important for the model of the designed adaptive quantum phase estimation scheme to include the possibility of errors and imperfections that occur in a real experimental situation. This provides an important test of the robustness of the algorithms to relatively general sources of noise which can be encountered on most experimental platforms.

The first sources of noise to be considered are those affecting the application of the controlled phase shift gate \(\mathrm{U}_{\phi ,\theta }\), namely Gaussian noise and Random Telegraph noise. Note that Random Telegraph noise is particularly relevant for experiments involving solid-state qubits.

In the scenario of Gaussian noise, the noise follows a normal distribution parametrized by the standard deviation σ. Letting \(\theta ^{\mathrm{GSN}}_{m}\) represent the actual value of the controllable phase shifter subjected to the noise, the Gaussian noise distribution can be defined as:

$$ p \bigl(\theta ^{{ \mathrm{GSN}}}_{m} \bigr) = \frac{1}{\sqrt{2\pi }\sigma } \exp \biggl[-\frac{1}{2\sigma ^{2}} \bigl( \theta ^{{ \mathrm{GSN}}}_{m}-\theta _{m} \bigr)^{2} \biggr]. $$
(6)

In the scenario of Random Telegraph noise, the noise follows a discrete distribution: at each time step the controllable phase shifter value can be randomly offset by a fixed value λ with probability η. Letting \(\theta ^{\mathrm{RTN}}_{m}\) represent the value of the controllable phase shifter subject to this source of noise, the Random Telegraph noise distribution can be described as:

$$ p \bigl(\theta ^{{ \mathrm{RTN}}}_{m} \bigr) = \textstyle\begin{cases} 1 - \eta , & \theta ^{{ \mathrm{RTN}}}_{m} = \theta _{m}, \\ \eta , & \theta ^{{ \mathrm{RTN}}}_{m} = \theta _{m} + \lambda . \end{cases} $$
(7)
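Both noise models are simple to sample numerically. The sketch below (illustrative helper names of our own choosing) draws the phase that is actually applied under each model, following Eqs. (6) and (7):

```python
import numpy as np

def apply_gaussian_noise(theta, sigma, rng):
    """Gaussian control noise, Eq. (6): the applied phase is normally
    distributed around the intended value theta with std sigma."""
    return rng.normal(theta, sigma)

def apply_rtn(theta, eta, lam, rng):
    """Random Telegraph noise, Eq. (7): with probability eta the applied
    phase is offset by the fixed value lam, otherwise it is unchanged."""
    return theta + lam if rng.random() < eta else theta
```

In a noisy simulation these helpers would simply replace \(\theta _{m}\) by \(\theta ^{\mathrm{GSN}}_{m}\) or \(\theta ^{\mathrm{RTN}}_{m}\) before each measurement.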

The other important type of noise to be considered affects the qubit itself. This modifies the unitary evolution generated by a Hermitian Hamiltonian \(\mathcal{H}\) to a non-unitary evolution given by the Lindblad master equation

$$ \dot{\rho } = -\frac{i}{\hbar } [\mathcal{H}, \rho ] + \Gamma _{1} \mathcal{D}[\sigma _{-}](\rho ) + \frac{\Gamma _{\varphi }}{2} \mathcal{D}[\sigma _{z}](\rho ) , $$
(8)

where \(\Gamma _{1}\) is the relaxation rate, \(\Gamma _{\varphi }\) is the pure dephasing rate and the dissipation superoperator \(\mathcal{D}\) acting on the density matrix ρ is defined by \(\mathcal{D}[A](\rho ) = A \rho A^{\dagger } - \frac{1}{2} \{A^{\dagger }A, \rho \}\). It is also useful to introduce the standard notations \(T_{1}\) and \(T_{2}\) for the relaxation and decoherence time, \(T_{1} = 1/\Gamma _{1}\) and \(T_{2} = 1/ ( \Gamma _{1}/2 + \Gamma _{\varphi } )\).

To implement the phase gate \(U_{\phi - \theta _{m}}\) from Eq. (1) the Hamiltonian must be of \(\sigma _{z}\) type, with a component for the unknown phase ϕ and another one for the control \(\theta _{m}\). Typically, for Ramsey experiments with a phase accumulation time τ, we have \(\mathcal{H}_{m}=\frac{\hbar }{2}(\phi /\tau - \theta _{m}/\tau ) \sigma _{z}\) at the step m, with \(U_{\phi - \theta _{m}} = \exp [- i \mathcal{H}_{m} \tau /\hbar ]\), up to a global phase factor. The solution of Eq. (8) is a \(2 \times 2\) matrix with elements \(\rho _{00}(\tau ) = 1- \exp (-\tau /T_{1})\rho _{11}(0)\), \(\rho _{01}(\tau ) = \exp (-i \phi + i \theta _{m} -\tau /T_{2})\rho _{01}(0)\), \(\rho _{10}(\tau ) = \exp (i \phi - i \theta _{m} -\tau /T_{2})\rho _{10}(0)\) and \(\rho _{11}(\tau ) = \exp (- \tau /T_{1})\rho _{11}(0)\). If the state at \(\tau =0\) is prepared by the action of the Hadamard gate from either \(\vert 0\rangle \) or \(\vert 1\rangle \), corresponding respectively to the + and − signs below, and at the final time τ we apply again a Hadamard gate, we obtain that at every step m of the algorithm the probabilities are modified as

$$ P_{\pm }(\zeta _{m}\vert \phi , \theta _{m}) = \frac{1}{2} \bigl[1 \pm (-1)^{ \zeta _{m}} \nu \cos (\phi - \theta _{m}) \bigr], $$
(9)

where \(\nu = \exp ( - \tau /T_{2})\) is called interference visibility. One can check that for maximum visibility, \(\nu =1\), we recover Eqs. (2) and (3). Further considerations can be found in Appendix A.1.
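Equation (9) is also straightforward to evaluate numerically; a minimal sketch (the function name is ours) is:

```python
import numpy as np

def outcome_probability(zeta, phi, theta, tau, T2, initial_state):
    """Outcome probability with decoherence, Eq. (9):
    P_pm(zeta) = (1/2) [1 +/- (-1)^zeta * nu * cos(phi - theta)],
    where nu = exp(-tau/T2) is the interference visibility and the
    +/- sign corresponds to the initial state |0> / |1>."""
    nu = np.exp(-tau / T2)
    sign = 1 if initial_state == 0 else -1
    return 0.5 * (1 + sign * (-1) ** zeta * nu * np.cos(phi - theta))
```

For \(\tau = 0\) (maximum visibility, \(\nu = 1\)) this reproduces Eqs. (2) and (3), while for \(\tau \gg T_{2}\) the outcomes become unbiased coin flips with probability 1/2, as expected for a fully dephased qubit.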

3 Machine learning algorithms

The problem of quantum phase estimation relies on a sequential and cumulative set of measurements to drive the estimation process, thus making it an ideal problem for reinforcement learning algorithms. In this work, we considered Differential Evolution (DE) [51, 52] and Particle Swarm Optimization (PSO) [53–55], among other reinforcement learning algorithms, as they are the most commonly employed for similar tasks in the literature [41–47].

These algorithms apply a direct search method to the exploration of the search space generated by all possible policy configurations. Direct search methods use a greedy criterion to drive their exploration. Such methods guarantee fairly fast convergence, although this fast convergence often comes at the expense of becoming trapped in a local minimum. This is a result of the greedy criterion promoting decisions that lead to more immediate rewards, usually to the detriment of fully exploring the search space. To avoid this scenario, it is important to perform a thorough study of the controllable parameters of each algorithm before applying it to the quantum phase estimation task. To evaluate the performance of an algorithm there are two fundamental criteria: its ability to converge to a solution within the imposed number of iterations and its ability to converge to a valid solution.

3.1 Differential evolution

The implementation of the DE algorithm for the current problem starts with a set of P populations, each representing a candidate solution for the adaptive scheme update policy. Each of these populations is a vector of size N, with each entry representing a new phase instruction to prepare the controllable phase shifter for each of the qubits being injected into the system. At each iteration, the DE algorithm applies its direct search method to P N-dimensional vectors \(x^{G}_{i,j}\), where \(i \in \lbrace 1, 2,\ldots, P \rbrace \) indexes the candidate solution vectors, \(j \in \lbrace 1, 2,\ldots, N \rbrace \) indexes their entries, and G represents the generation of the population vectors.

Each of these vectors is initialized with random values for each entry in the interval \(x^{0}_{i,j} \in [0, 2\pi ]\). Afterwards, at each iteration, the DE algorithm generates possible new candidate solution vectors for the next generation by adding the weighted difference between four population vectors to a fifth vector. This process is referred to as mutation:

$$ \tilde{u}^{G+1}_{i,j} = x^{G}_{r_{1},j} + F \cdot \bigl(x^{G}_{r_{2},j} + x^{G}_{r_{3},j} - x^{G}_{r_{4},j} - x^{G}_{r_{5},j} \bigr). $$

Here F represents a constant value in the interval \([0,1]\) which controls the amplification of the difference between the considered populations. Hence, F will be referred to as the amplification parameter. Note as well that the indexes \(\lbrace r_{1},\ldots, r_{5}\rbrace \) are randomly chosen, mutually distinct integers. At this point, the entries of the newly mutated vectors \(\tilde{u}^{G+1}_{i,j}\) are randomly mixed with the corresponding original vectors to increase their diversity. This process is referred to as crossover:

$$ \tilde{x}^{G+1}_{i,j} = \textstyle\begin{cases} x^{G}_{i,j} ,&\text{if $R_{1} > C$ and $j \neq R_{2}$}, \\ \tilde{u}^{G+1}_{i,j} ,& \text{if $R_{1} \le C$ or $j = R_{2}$}. \end{cases} $$

This process is controlled by the crossover parameter C which can take any value in the interval \([0,1]\). Hence, a crossover only occurs if the random value \(R_{1}\), which is generated for each population member at each iteration, is below or equal to the chosen crossover parameter. The value of \(R_{2}\) is an integer randomly chosen for each population at each iteration of the evaluation process to ensure that at least one entry from the newly mutated vectors \(\tilde{u}^{G+1}_{i,j}\) is passed to the trial vector for the next generation \(\tilde{x}^{G+1}_{i,j}\).

Finally, the new trial vectors are compared against the population vectors of the previous generation to see which perform best against the cost function of the problem and, as a result, become a member of the next generation of populations. This process is referred to as selection:

$$ x^{G+1}_{i,j} = \textstyle\begin{cases} x^{G}_{i,j} ,& \text{if $f (x^{G}_{i,j} ) < f ( \tilde{x}^{G+1}_{i,j} )$}, \\ \tilde{x}^{G+1}_{i,j} , & \text{if $f (x^{G}_{i,j} ) \ge f (\tilde{x}^{G+1}_{i,j} )$}. \end{cases} $$

Here \(f(\cdot )\) represents the cost function associated with the quantum phase estimation task, which is defined by the Holevo variance in Eq. (5). Therefore, if a new trial vector \(\tilde{x}^{G+1}_{i,j}\) yields a lower cost than the corresponding previous-generation vector \(x^{G}_{i,j}\), it becomes part of the next generation of populations. Otherwise, the previous-generation population survives into the next generation. This entire process illustrates the adapted DE algorithm implemented in this work and is schematically represented in Fig. 2.
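The three stages can be condensed into a short routine. The following is a sketch of the DE loop as described above, not the authors' actual implementation; here `cost` stands for the evaluation of a candidate policy (in our setting, the Holevo variance of Eq. (5)), but any scalar cost function works:

```python
import numpy as np

def differential_evolution(cost, N, P=20, F=0.5, C=0.7, generations=100, seed=0):
    """Sketch of the DE loop: mutation with five mutually distinct
    vectors, crossover with parameter C, and greedy selection.
    `cost` maps a candidate policy vector of length N to a scalar."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0, 2 * np.pi, size=(P, N))
    fitness = np.array([cost(x) for x in pop])
    for _ in range(generations):
        for i in range(P):
            # pick five mutually distinct members, all different from i
            r1, r2, r3, r4, r5 = rng.choice(
                [k for k in range(P) if k != i], size=5, replace=False)
            # mutation: weighted combination of population differences
            u = pop[r1] + F * (pop[r2] + pop[r3] - pop[r4] - pop[r5])
            # crossover: mix mutated entries into the current vector,
            # guaranteeing at least one mutated entry survives
            mask = rng.random(N) <= C
            mask[rng.integers(N)] = True
            trial = np.where(mask, u, pop[i]) % (2 * np.pi)
            # selection: keep whichever vector has the lower cost
            f_trial = cost(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = fitness.argmin()
    return pop[best], fitness[best]
```

As a quick sanity check, minimizing a simple quadratic cost such as \(\sum_j (x_j - 1)^2\) drives the best candidate towards the all-ones vector within the default budget of generations.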

Figure 2

Overview of the three main stages of the DE algorithm. (a) Mutation: each entry of the candidate solution vector is mutated, generating a new mutated candidate solution vector; (b) Crossover: a new candidate solution vector is created with entries from the original and the newly created mutated vector; (c) Selection: the new and the original candidate solution vector are tested against the cost function and the one with the best results is propagated for the next generation

3.2 Particle swarm optimization

The implementation of the PSO algorithm starts with a set of P particles, each representing an individual candidate solution to the update policy of the adaptive scheme. Each particle can move in the N-dimensional search space associated with the N different phase instructions of the controllable phase shifter for each of the input qubits. Therefore, each particle will be represented by a position vector \(x^{G}_{i,j}\) and velocity vector \(v^{G}_{i,j}\), where \(i \in \lbrace 1, 2,\ldots, P \rbrace \) and \(j \in \lbrace 1, 2,\ldots, N \rbrace \) represent each entry and G the generation of the vectors. Note that the position vector \(x^{G}_{i,j}\) corresponds to a candidate solution vector, while the velocity vector \(v^{G}_{i,j}\) represents the change in direction of the corresponding position vector in the search space.

All entries of these vectors are initialized with random values in the interval \([0, 2\pi ]\). At each iteration, each particle evaluates its current position according to the cost function of the search problem, defined by the Holevo variance in Eq. (5), and compares its value with the positions previously visited by itself and with the positions previously visited by the entire collective of particles. If the current position is better than its own previously visited positions, the particle stores it in a variable \(\mathit{pbest}_{i}\), where \(i \in \lbrace 1,\ldots, P \rbrace \) is the identifier of that particle. If, in addition, the current position is better than all of the positions previously visited by the entire collective ensemble, the same position is stored in a variable gbest shared among all other particles. This process is illustrated in Fig. 3.

Figure 3

Visual representation of the PSO algorithm. Initially, all the particles representing the different candidate solution vectors are initialized at random positions with random velocities. At each iteration the particles explore the search space of the problem, eventually converging around the collectively found global optimum. The particles are able to share knowledge regarding the best-found position with other particles, which is generically represented by the red fuzzy lines. In our implementation the topology of inter-particle communication is such that all particles are able to share information with every other particle

Both of these variables will determine the entire exploration of the search space by the P particle candidate solutions. After each iteration, each particle will use this collective knowledge to adjust its displacement for the next turn according to

$$ x^{G+1}_{i,j} = x^{G}_{i,j} + w \cdot v^{G+1}_{i,j} $$

and

$$ v^{G+1}_{i,j} = v^{G}_{i,j} + \alpha \cdot R_{a} \cdot \bigl(\mathit{pbest}_{i,j}-x^{G}_{i,j} \bigr) + \beta \cdot R_{b} \cdot \bigl(\mathit{gbest}_{j}-x^{G}_{i,j} \bigr). $$

Here the parameter α controls the desirability of each particle to move towards its best found position, while the parameter β controls the desirability of each particle to move towards the best found solution by the entire collective. Both \(R_{a}\) and \(R_{b}\) are uniformly distributed random values in the interval \([0,1]\). In addition, the parameter w works as a damping weight controlling the change of direction imposed by the new velocity vector at the current position of each particle.

To steer each particle towards a converging candidate policy solution and to avoid overstepping the minima that have been found, an additional parameter \(v_{\max }\) is imposed on the algorithm, which determines the maximum value that each entry in the velocity vector may take. As the algorithm advances in its search for the optimal policy solution, this parameter decays with the number of iterations, consequently reducing the step size of each particle and forcing the particles to converge to a solution. This final adjustment completes the description of the adapted PSO algorithm implemented in this work.
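The full update loop can be sketched as follows. This is an illustrative implementation of the PSO variant described above, including the decaying velocity clamp; the function name and the specific decay schedule \(v_{\max }/(1+t)\) are our own choices for the sketch, not prescriptions from the paper:

```python
import numpy as np

def particle_swarm(cost, N, P=20, w=0.7, alpha=1.5, beta=1.5,
                   v_max=1.0, generations=100, seed=0):
    """Sketch of the PSO loop: position/velocity updates driven by
    pbest and gbest, with a velocity clamp that decays over iterations."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 2 * np.pi, size=(P, N))
    v = rng.uniform(-v_max, v_max, size=(P, N))
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    for t in range(generations):
        Ra, Rb = rng.random((P, N)), rng.random((P, N))
        # velocity update pulled towards personal and global bests
        v = v + alpha * Ra * (pbest - x) + beta * Rb * (gbest - x)
        clamp = v_max / (1 + t)            # decaying velocity limit
        v = np.clip(v, -clamp, clamp)
        x = (x + w * v) % (2 * np.pi)      # damped position update
        costs = np.array([cost(p) for p in x])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
        if pbest_cost.min() < gbest_cost:
            g = pbest_cost.argmin()
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest, gbest_cost
```

Because gbest is only ever overwritten by strictly better positions, the reported best cost is monotonically non-increasing over the iterations.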

3.3 Parameters analysis

It is important to understand the behaviour and performance of these two algorithms under the different possible configurations of their controllable parameters. To verify the convergence of the solutions, it is important to remember that at each iteration of both algorithms there are P different candidate solutions, each one represented by a population vector of N phase-value policies. As the algorithm iterates, the candidate solutions should move closer to each other until converging to a final solution. Thus, one way of quantifying this convergence is by calculating, for each entry, the absolute deviation of each candidate solution from the average over all candidates, and then averaging these deviations. To do so, the convergence value L is defined as

$$ L = \frac{1}{N} \sum^{N}_{j=1} \Biggl(\frac{1}{P} \sum^{P}_{i=1} \vert \bar{x}_{j}-x_{i,j} \vert \Biggr) , $$
(10)

where \(\bar{x}_{j}\) corresponds to the average value of entry j over all the candidate solutions and \(x_{i,j}\) corresponds to the entry j of the candidate solution vector i. Therefore, lower values of L occur when all the candidate solutions are relatively close to each other and the algorithm has converged to a solution. On the other hand, larger values of L indicate that the algorithm was not able to converge to a solution at the given iteration.
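In vectorized form this measure is a one-liner; in the sketch below (our own helper name), `pop` holds the P candidate solution vectors as rows of a \(P \times N\) array:

```python
import numpy as np

def convergence_value(pop):
    """Convergence measure L: the mean absolute deviation of the P
    candidate vectors from their entrywise average. `pop` has shape (P, N)."""
    return float(np.mean(np.abs(pop.mean(axis=0) - pop)))
```

A population of identical candidates gives \(L = 0\), while a widely spread population gives a large L, matching the interpretation given above.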

The algorithms should also converge to a valid solution: it is not enough for the algorithms to converge to a solution if that solution is not the correct one. Letting K represent the total number of different phase values ϕ considered in the learning task imposed on the machine learning algorithms, the performance of a policy can be evaluated by the Holevo variance in Eq. (5). This equation, however, is computationally expensive in its exact form, so instead we approximate it numerically [29, 42, 46] to reduce computational time. A more efficient evaluation of the Holevo variance can be described as

$$ V_{H} = S^{-2} - 1 = \Biggl\vert \frac{1}{K} \sum ^{K}_{k=1} e^{i [\phi ^{(k)}-\theta _{N}^{(k)} ]} \Biggr\vert ^{-2}-1 , $$
(11)

where values for \(\phi ^{(k)}-\theta _{N}^{(k)}\) close to zero signify lower values of imprecision (sharpness \(S\approx 1\)) and therefore better performance.

Additionally, the performance \(V_{H}\) of each candidate policy vector is evaluated \(M=5\) separate times and the results are averaged in order to smooth out small fluctuations. Repeating the simulation multiple times allows for a more accurate representation of the performance of each policy and thus more consistent results. The number of training instances K can be made arbitrarily large, provided that it does not make the computational time prohibitive. A large number of training instances K also ensures a faithful representation of ϕ in the interval \(\mathopen [0, 2\pi \mathclose [\), since they are sampled uniformly and randomly from that same interval. A reasonable choice that satisfies these criteria is \(K=10N^{2}\), a number sufficiently large to guarantee the convergence of the algorithms [46]. Overall, the time complexity of the algorithms scales as \(\mathcal{O}(P \cdot G \cdot N \cdot K \cdot M)\); with \(K=10N^{2}\) and P, G, M kept effectively constant, this gives \(\mathcal{O}(N^{3})\), which is polynomial in time.
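The numerical estimator of Eq. (11) reduces to a few lines of code. In the sketch below (function name is ours), the arrays `phi` and `theta` hold the K sampled true phases \(\phi ^{(k)}\) and their final estimates \(\theta _{N}^{(k)}\):

```python
import numpy as np

def holevo_variance(phi, theta):
    """Numerical Holevo variance, Eq. (11): V_H = S^{-2} - 1, where the
    sharpness S is |mean of exp(i (phi_k - theta_k))| over the K samples."""
    phase_errors = np.asarray(phi) - np.asarray(theta)
    S = np.abs(np.mean(np.exp(1j * phase_errors)))
    return float(S ** -2 - 1)
```

Perfect estimates (\(\theta _{N}^{(k)} = \phi ^{(k)}\) for all k) give \(S = 1\) and hence \(V_{H} = 0\), while any spread of the estimation errors reduces S and makes \(V_{H}\) positive.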

At this point it is possible to fully evaluate the performance of each algorithm under the different possible configurations of its controllable parameters and to choose the configuration that achieves the best results. While Eq. (10) provides a measure of the convergence of the algorithms, Eq. (11) quantifies the precision to which they are able to estimate the value of the unknown phase ϕ.

A thorough study on the performance of both algorithms under the different possible controllable parameter configurations can be found in Appendix A.2 and Appendix A.3. The optimal parameter configuration obtained for each algorithm is summarized in Table 1.

Table 1 Optimal parameters for the DE and PSO algorithms obtained through the analysis described in Appendices A.2 and A.3

Along with a fixed configuration for each algorithm, it is also important to keep all other variable parameters the same in order to draw comparable results. Thus, for the same number of input qubits N being injected into the system, the population size of candidate solution vectors P and the number of training instances K must remain the same across all scenarios. As previously mentioned, the number of training instances was set to \(K=10N^{2}\), while the population size was defined as \(P=20+2 \cdot \operatorname{int}(N/10)-1\), where \(\operatorname{int}(\cdot )\) denotes the integer part of the division. Note that for an increasing number of qubits injected into the system, both P and K must also increase to accommodate the growing complexity of the problem search space.
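For reference, these scaling rules can be transcribed directly, with \(\operatorname{int}(\cdot )\) implemented as integer division (the function names are ours):

```python
def num_training_instances(N):
    """K = 10 N^2 training phases for N input qubits."""
    return 10 * N**2

def population_size(N):
    """P = 20 + 2*int(N/10) - 1 candidate policy vectors."""
    return 20 + 2 * (N // 10) - 1

# Values at the boundaries of the range N in [5, 25] used in this work.
for N in (5, 15, 25):
    print(N, num_training_instances(N), population_size(N))
```

For \(N=5\) this gives \(K=250\) and \(P=19\); for \(N=25\), \(K=6250\) and \(P=23\).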

Ideally, both algorithms would be allowed to run until all candidate solution vectors had converged to a successful policy vector. Due to time constraints, however, the number of iterations for which both algorithms were allowed to run for each number N of input qubits was set to \(G=100\), regardless of whether convergence had been reached. Thus, both algorithms would stop either when they had converged to a solution or when they reached iteration \(G=100\), and the final policy vector was taken as the average of all candidate solution vectors at that point.

4 Results

In order to provide a benchmark for the two machine learning algorithms discussed above, we first introduce a non-adaptive protocol that can also be run in the presence of noise. This protocol has the advantage of simplicity, and we find that for moderate noise values it yields results comparable to or better than those of the machine learning algorithms. For larger noise values, on the other hand, the machine learning protocols show better results.

We start by discussing the ideal configuration, where any source of noise or quantum decoherence is neglected; we then consider the Gaussian and Random Telegraph noise configurations and, finally, the visibility loss due to decoherence. These different sources of imperfection were applied to the quantum phase estimation configuration independently. The number of qubits was varied in the interval \(N \in [5,25]\), with all remaining parameters kept the same across the different scenarios. For \(N>20\) we found that the algorithms were already taking more than five days to arrive at a successful policy vector, which ultimately made any larger value of N computationally impractical within a reasonable time.

The break in performance in non-ideal systems is clearer for larger values of noise, since increasing the noise in the estimation process leads to an increased complexity of the search space for both algorithms. Thus, for larger values of N this added complexity becomes even more evident in the scattering of the obtained results. It is important to stress, however, that whenever both algorithms had enough time to converge to a successful policy, they were able to reach precision values close to the SQL, attesting to their resilience to noise in the quantum phase estimation scheme.

4.1 A non-adaptive protocol and the standard quantum limit benchmark

Strictly speaking, the SQL is defined in the limit of large N. Since we do not work in this asymptotic regime, it is important to devise a non-adaptive protocol that reproduces the SQL for \(N \gg 1\), yet also yields results at \(N \approx 5-25\).

This protocol can be outlined as follows. We consider the random phase ϕ to be estimated and a fixed control phase θ. Based on these values, we calculate the probability \(\mathcal{P}_{\pm }(0\vert \phi , \theta ) \), see Eqs. (2), (3) or Eq. (9) with \(\nu =1\). Then, for each N we generate N uniformly random numbers in the interval \([0,1]\). If the random number is less than \(\mathcal{P}_{\pm }(0\vert \phi , \theta )\) we add it to the 0 bin, otherwise we add it to the 1 bin. Next, we count the number of elements in the 0 bin, \(N_{\pm }(0)\). Finally, we obtain an estimate of the phase, \(\phi ^{\mathrm{est}}=\theta + \arccos (\pm 2N_{\pm }(0)/N\mp 1)\), and we calculate the Holevo variance with the exponent in Eq. (11) taken as \(\exp [i(\phi -\phi ^{\mathrm{est}})]\).
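These steps can be sketched as follows, assuming the ideal outcome probability \(\mathcal{P}_{+}(0\vert \phi ,\theta ) = [1+\cos (\phi -\theta )]/2\) (our reading of Eq. (9) with \(\nu =1\)). To keep the example short we restrict ϕ − θ to \([0,\pi ]\), avoiding the arccos branch ambiguity, which the full protocol resolves using the second setting \(\mathcal{P}_{-}\):

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_phase(phi, theta, N):
    """One run of the non-adaptive protocol with N qubits.
    Outcome probability P_+(0|phi,theta) = (1 + cos(phi - theta)) / 2."""
    p0 = 0.5 * (1 + np.cos(phi - theta))
    n0 = np.sum(rng.uniform(size=N) < p0)          # counts in the 0 bin
    # invert the binomial frequency for the phase estimate
    return theta + np.arccos(np.clip(2 * n0 / N - 1, -1, 1))

def holevo_variance_nonadaptive(N, theta=0.0):
    """Average over K = 10 N^2 random phases, as in the text."""
    K = 10 * N**2
    phis = rng.uniform(0, np.pi, K)                # branch-unambiguous range
    ests = np.array([estimate_phase(p, theta, N) for p in phis])
    S = np.abs(np.mean(np.exp(1j * (phis - ests))))
    return S**-2 - 1

print(holevo_variance_nonadaptive(25))   # of order 1/N
```

A short error-propagation calculation shows why this reproduces the SQL: with \(\hat{p} = N_{+}(0)/N\), \(\operatorname{Var}(\hat{p}) = p(1-p)/N\) and \(d\phi /dp = -2/\sin (\phi -\theta )\), so \(\operatorname{Var}(\phi ^{\mathrm{est}}) \approx 1/N\) independently of ϕ away from the interval edges.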

The number of elements \(N_{+}(0)\) and \(N_{+}(1)=N-N_{+}(0)\) follows the binomial distribution, as shown in Fig. 4. This distribution was obtained from the estimation of two constant phases with 50 measurements each, repeated 250,000 times.

Figure 4

Distribution of \(N_{+}(0)\) outcomes for 25,000 experiments of fixed phase estimation for two values of \(\mathcal{P}_{+}\)

This procedure is repeated \(K=10 N^{2}\) times for each phase ϕ uniformly distributed in the interval \([0,2\pi ]\). The resulting Holevo variance is represented by the blue squares in Fig. 5. We have verified numerically that the non-adaptive method asymptotically reproduces the SQL: at \(N=100\) the difference between the simulated \(V_{H}\) and \(1/N\) is 10 %, and it reaches 4 % at \(N=800\). Also, as seen from the inset of Fig. 5, the slope for this non-adaptive protocol is −1.0265, while for the ideal SQL it should be −1.

Figure 5

Estimation precision based on the logarithm of the Holevo variance \(\ln (V_{H})\) of both the DE and PSO algorithms under the ideal configuration for \(N\in [5,25]\) qubits

The same procedure is used to simulate the variance in the presence of the different types of noise, as shown below.

4.2 Adaptive protocols in the ideal noiseless configuration

Before analysing the evolution of the performance of the DE and PSO algorithms under an increasing number of qubits N, it is important to show that this increase in N does indeed lead to better estimation values. To do so, the two algorithms were first allowed to converge to a given value of the unknown phase ϕ for different values of N. This experiment was repeated 1000 times for each value of N and, each time, the value of the estimated phase θ was recorded. Finally, for each value of N these 1000 results for θ were fitted to a probability density function (PDF), all centered around the same value \(\theta = \pi \) for easier comparison. The results for each algorithm are displayed side by side in Fig. 6.

Figure 6

Probability density function of the DE and PSO algorithms as a function of the phase value θ

Considering the results in Fig. 6, it is indeed possible to see that increasing the number of qubits N at the input leads to estimates θ converging further towards the given value of the unknown phase ϕ. It is also apparent that as N increases, this gain in precision becomes less evident as the algorithms start running into convergence issues. Given that the complexity of the search space of each algorithm scales polynomially in N and that, due to time constraints, the algorithms were not allowed to run for more than \(G=100\) iterations, a decrease in performance under this time limitation for large values of N is expected.

Having arrived at this conclusion, the final results obtained under the ideal scenario, with no source of noise or quantum decoherence, are shown in Fig. 5. Observing the results, both algorithms were able to follow the SQL in the ideal-case scenario. The results also reinforce the conclusion drawn above that an increase in N leads to better results. In fact, for \(N\in [5,15]\) the scaling of the Holevo variance is well approximated by a power law \(V_{H}\sim N^{-\alpha }\), with \(\alpha _{\mathrm{DE}}=0.75\) for the DE algorithm and \(\alpha _{\mathrm{PSO}}=0.64\) for the PSO algorithm, while the corresponding value for the SQL is \(\alpha _{\mathrm{SQL}}=1\). This also shows that DE performs slightly better than PSO at reaching low variances, which is consistent with results obtained in other contexts, such as estimating the bias in the quantum walk algorithm [43]. This scaling in precision close to the SQL for \(N\in [5,15]\) is consistent with what has been numerically observed for other algorithms [31].

The fact that the machine learning algorithms do not reach the SQL is also consistent with the known result that adaptive measurements cannot saturate the Cramér–Rao bound even for optimal measurements [56, 57]. It is also noticeable that for larger values of N the performance starts to break down due to the limit of \(G=100\) iterations imposed on the convergence of the algorithms. This is not a malfunction of the algorithms, but a direct consequence of the restricted time available: as the complexity of the search space increases, so does the number of generations required to converge to a successful policy.

4.3 Configurations with noise

Considering the results obtained under the ideal configuration, it is important to study the resilience of the algorithms to different sorts of imperfections that can be found in an experimental quantum phase estimation scheme. First, the performance of the algorithms was evaluated in the presence of Gaussian noise, followed by Random Telegraph noise and finally in the presence of quantum decoherence.

4.3.1 Gaussian noise

In this scenario the algorithms were tested for increasing amounts of Gaussian noise \(\sigma =\{0.2,0.4,0.8\}\) when dealing with the controllable phase shifter θ according to Eq. (6). The results obtained under these conditions are presented in Fig. 7.

Figure 7

Estimation precision based on the logarithm of the Holevo variance \(\ln (V_{H})\) of both the DE and PSO algorithms under increasing values of Gaussian noise σ for \(N \in [5,25]\) qubits

It is possible to see that as the noise fluctuations increase, the precision of both the adaptive and non-adaptive algorithms diminishes. This is, nevertheless, expected for any estimation process conducted under increasing values of noise. The break in performance is also perceptible for larger values of N. However, we see from Fig. 7 that the adaptive algorithms are less sensitive to increasing values of Gaussian noise, and that for \(\sigma =0.8\) the policies obtained from machine learning become clearly superior to the non-adaptive protocol.

4.3.2 Random telegraph noise

Considering now the scenario where the estimation scheme was subject to a Random Telegraph noise following Eq. (7), the algorithms were tested against increasing values of \(\lambda =\{0.2,0.4,0.8\}\) while keeping the probability of switching to the erroneous phase value fixed at \(\eta =0.4\). The results obtained by the algorithms under these configurations are displayed in Fig. 8.

Figure 8

Estimation precision based on the logarithm of the Holevo variance \(\ln (V_{H})\) of both the DE and PSO algorithms under increasing values of Random Telegraph noise λ with a fixed probability of \(\eta =0.4\) for \(N \in [5,25]\) qubits

Similarly to the Gaussian-noise case, the results in Fig. 8 show that both the adaptive and non-adaptive algorithms partly follow the SQL curve even in the presence of Random Telegraph noise. The poorer performance for larger values of noise, as well as the break in performance for larger values of N, is also evident in this scenario, for the same reason as before. We can also see that for larger values of λ the adaptive machine learning algorithms are more robust than the non-adaptive algorithm.

4.3.3 Quantum decoherence

In this last scenario, we study the impact of quantum decoherence on the performance of both algorithms. Decoherence is a key source of noise in all non-optical qubits (e.g. trapped ions, superconducting qubits, NV centers). The algorithms were tested against decreasing values of visibility \(\nu =\{0.9,0.8,0.6\}\) according to Eq. (9). Note that, unlike in the ideal scenario, the reduced visibility also modifies the SQL, see Eq. (18) in Appendix A.1. Also note that the value \(\nu = e^{-0.5} \approx 0.6\) appears in certain tasks of non-adaptive parameter estimation as the visibility corresponding to an optimal phase accumulation time τ of half the decoherence time [21]. The results obtained under this configuration are shown in Fig. 9.

Figure 9

Estimation precision based on the logarithm of the Holevo variance \(\ln (V_{H})\) of both the DE and PSO algorithms under decreasing values of visibility \(\nu =\{0.9, 0.8, 0.6\}\) for \(N \in [5,25]\) qubits

Considering the results obtained in Fig. 9, it is evident that reduced values of visibility have a significant impact on the performance of the algorithms. This is an expected behaviour, since it is known that quantum enhancement for systems operated at around the decoherence rate is asymptotically limited to only a constant factor improvement over the SQL [58]. This can be confirmed by an analysis of the Fisher information:

$$\begin{aligned} \mathcal{F}_{\phi ,\theta _{m}} = \overline{ \bigl[\partial _{\phi }\ln P_{\pm } (\zeta _{m}\vert \phi , \theta _{m}) \bigr]^{2}} = \sum_{\zeta _{m} = 0,1} \frac{ [\partial _{\phi } P_{\pm } (\zeta _{m}\vert \phi , \theta _{m}) ]^{2}}{P_{\pm } (\zeta _{m}\vert \phi , \theta _{m})} = \frac{\nu ^{2} \sin ^{2} (\phi - \theta _{m})}{1- \nu ^{2} \cos ^{2} (\phi - \theta _{m})} . \end{aligned}$$
(12)

The Fisher information quantifies the knowledge gained at each measurement. It can also be extended from projective to positive operator-valued measurements, see [59], and it is used to define the SQL (see Appendix A.1). Indeed, from Eq. (12) it can be observed that ν reduces the information extracted at each measurement from its maximum value \(\mathcal{F}_{\phi ,\theta _{m}} =1\), attained at \(\nu =1\).
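Eq. (12) can be cross-checked numerically. Below, the outcome probabilities are taken as \(P_{+}(0\vert \phi ,\theta _{m}) = [1+\nu \cos (\phi -\theta _{m})]/2\) and \(P_{+}(1\vert \phi ,\theta _{m}) = [1-\nu \cos (\phi -\theta _{m})]/2\) (our reading of Eq. (9)); the phase values are arbitrary test points:

```python
import numpy as np

def fisher_information(phi, theta, nu, dphi=1e-6):
    """Numerical Fisher information F = sum_z (d_phi P(z))^2 / P(z)
    for outcome probabilities P(0) = (1 + nu*cos(phi-theta))/2,
    P(1) = (1 - nu*cos(phi-theta))/2."""
    def probs(p):
        c = nu * np.cos(p - theta)
        return np.array([(1 + c) / 2, (1 - c) / 2])
    dP = (probs(phi + dphi) - probs(phi - dphi)) / (2 * dphi)
    return np.sum(dP**2 / probs(phi))

def fisher_closed_form(phi, theta, nu):
    """Closed form of Eq. (12)."""
    d = phi - theta
    return nu**2 * np.sin(d)**2 / (1 - nu**2 * np.cos(d)**2)

for nu in (1.0, 0.9, 0.6):
    print(nu, fisher_information(1.0, 0.3, nu),
          fisher_closed_form(1.0, 0.3, nu))
```

At \(\nu =1\) the closed form reduces to \(\sin ^{2}(\phi -\theta _{m})/[1-\cos ^{2}(\phi -\theta _{m})] = 1\), the maximum value quoted in the text.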

Similarly, here we observe that relatively large values of decoherence (visibility around 0.6) suppress the precision obtained by the non-adaptive protocol, while the adaptive DE and PSO are less sensitive.

5 Experimental implementations

Summing up the theoretical results obtained so far, we see that even for noisy systems it is possible to reduce the logarithm of the Holevo variance to values typically in the band \([-1.2, -1.7]\) (standard deviations between 0.5 and 0.4) with a moderate number of qubits \(N \in [10,25]\). This protocol can be implemented on several experimental platforms, using either optical or microwave frequencies. We describe here the main experimental platforms and show how the problem of optimizing the precision in the presence of noise can be addressed within the machine learning framework.

The most straightforward implementation is optical Mach–Zehnder interferometry. In this case, the operator \(U_{\theta -\phi }\) can be realized by placing a phase shifter ϕ in one branch of the interferometer and a variable phase shifter θ in the other branch. The latter can be realized physically as a half-wave plate placed on a computer-controlled rotation stage [60]. The states \(\vert 0\rangle \) and \(\vert 1\rangle \) correspond to a single photon in one branch or the other of the interferometer (dual-rail encoding). Our results can also be compared with those obtained by running the more powerful Kitaev algorithm, which for the same number of resources (from 10 to 25) yields standard deviations ranging from 0.35 to 0.16 [60], only marginally better than the data reported here. Also, our results are consistent with the theoretical limits reported in [61] for the case of non-entangled initial states (\(V_{H} = 0.5609\), \(\ln V_{H} =-0.25\)); we obtain approximately a factor of 2 improvement in the standard deviation. For implementations employing optical interferometry, the visibility is typically very close to 1, and the main noise sources in the feedback loop are changes in the refractive index of the beam-splitters and the variable phase shifter, as well as variations in the optical paths, typically caused by temperature.

More recently, an experiment with optical photons has tested the PSO algorithm with sequential non-entangled photons [48]. The single photons were obtained non-deterministically, by generating entangled pairs through spontaneous parametric down-conversion in a 2 mm long beta-barium borate (BBO) crystal and heralding on detection events in another detector. The unknown and control phases in the two arms of the Mach–Zehnder interferometer were realized with liquid crystal devices, where the relative phase between the horizontal and vertical polarizations can be changed depending on the applied electric field. The results obtained with the PSO algorithm agree with our analysis: PSO is quite efficient at reaching values approaching the SQL, especially when the number of resources (number of photons N) is limited. Similarly to our findings, for N exceeding approximately 15 photons the Holevo variance tends to saturate. The robustness with respect to Gaussian phase noise and dephasing noise was also demonstrated by artificially adding errors to the feedback phase.

In the case of qubits based on discrete levels (solid-state and ions/atoms), Mach–Zehnder interferometry corresponds to Ramsey interference [62, 63]. To understand how the phase information is embedded in this system, consider the following generic qubit Hamiltonian driven by a microwave field [64],

$$ \mathcal{H} = \frac{\hbar }{2}\omega _{01}\sigma _{z} + \hbar \Omega \cos \omega t \sigma _{x}. $$
(13)

In a frame defined by the unitary \(\exp (i \omega t \sigma _{z})\) (a clockwise rotation around the z-axis), using \(\exp (i \omega t \sigma _{z})\sigma _{x}\exp (-i \omega t \sigma _{z}) = \sigma _{x}\cos \omega t - \sigma _{y} \sin \omega t\) and by further applying the rotating-wave approximation we get the effective Hamiltonian

$$ \mathcal{H} = \frac{\hbar }{2}(\omega _{01}-\omega ) \sigma _{z}+ \frac{\hbar \Omega }{2}\sigma _{x}. $$
(14)

A non-zero Rabi frequency \(\Omega \neq 0\) can thus be used to create the two Hadamard gates, while the time in between is spent on the actual sensing. Indeed, if \(\Omega = 0\) for a sensing time τ, the resulting unitary becomes

$$ \mathrm{U}_{\phi ,\theta } = \begin{bmatrix} 1 & 0 \\ 0 & e^{i (\omega _{01} - \omega )\tau } \end{bmatrix} , $$
(15)

which is exactly Eq. (1) up to an overall irrelevant phase and the identification \(\phi = \omega _{01}\tau \), \(\theta = \omega \tau \).

Consider now a concrete problem: evaluating a magnetic field using a superconducting qubit, a trapped ion, or a nitrogen-vacancy (NV) center in diamond. In these cases, for typical experiments using Ramsey interferometry, the probability is given by Eq. (9), where the dependence of the visibility ν on the sensing time τ can sometimes differ from the simple exponential decay \(\nu (\tau ) = \exp [-\tau /T_{2}]\) valid for Gaussian noise (see the Lindblad equation used in Sect. 2). For example, the dependence becomes \(\nu (\tau ) = \exp [-(\tau /T_{2})^{2}]\) if the noise experienced by the qubit is 1/f, see e.g. [40, 63, 64]. Since τ is bounded by \(T_{2}\), in these setups one might attempt to increase the precision by increasing the sensitivity of \(\omega _{01}\) to the magnetic field. However, this means increasing the coupling to the magnetic field, which at the same time increases the exposure of the qubit to noise. Thus, a tradeoff must be reached between these two competing effects. Our results demonstrate that the increase in noise can be mitigated successfully by the use of machine learning strategies.

In the case of a superconducting qubit in the symmetric transmon design as used in recent magnetometry experiments [63, 65, 66], the information about magnetic field is embedded in the energy level separation as

$$ \omega _{01}(B) = \frac{1}{\hbar } \biggl( \sqrt{8 E_{\mathrm{C}} E_{\mathrm{J} \Sigma } \cos \biggl\vert \pi \frac{BS}{\Phi _{0}} \biggr\vert } - E_{\mathrm{C}} \biggr), $$
(16)

where B is the magnetic field piercing the SQUID area S of the transmon and modulating the total Josephson energy \(E_{\mathrm{J}\Sigma }\). In order to evaluate B, we can keep τ fixed and adjust the frequency ω (generated by an external signal generator) at every step. In the case of superconducting qubits, with relatively standard values \(T_{2} = 10\ \mu\text{s}\) and \(\tau =1\ \mu\text{s}\) we obtain \(\nu = 0.9\) (one of the values used in Sect. 4) if the noise is Gaussian and \(\nu = 0.99\) if it is 1/f noise.
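The two visibility values quoted here follow directly from the decay laws above; a quick numerical check, with \(T_{2}\) and τ as in the text:

```python
import numpy as np

T2 = 10e-6    # s, coherence time
tau = 1e-6    # s, sensing time

nu_gaussian = np.exp(-tau / T2)         # Gaussian noise: nu = exp(-tau/T2)
nu_one_over_f = np.exp(-(tau / T2)**2)  # 1/f noise: nu = exp(-(tau/T2)^2)
print(round(nu_gaussian, 2), round(nu_one_over_f, 2))   # 0.9 0.99
```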

In order to increase the precision of determining the magnetic field, we can use a higher excited state [65, 67], for example the second excited state \(\vert 2\rangle \): in the harmonic approximation for the transmon, the relevant accumulated phase is then \(\approx 2 \omega _{01}\tau \). Alternatively, we can bias the transmon at a magnetic field value where the slope \(d\omega _{01}/dB\) is larger. Both situations result in an increase in noise. In the first case, this is because higher energy levels couple more strongly to the electromagnetic environment, which reduces \(T_{2}\) through a reduction of both the \(T_{1}\) time and the pure dephasing time. For example, if the second excited state is used, the decay rate of the \(1-2\) transition is twice that of the \(0-1\) transition, which for \(T_{1}\)-limited qubits results in a significant reduction of the interferometric visibility. In the second case, the increase in noise is due to biasing far from the sweet spot. For simplicity, let us restrict ourselves to the interval \(BS \in [-\Phi _{0}/2, \Phi _{0}/2 ]\). We then have

$$ \frac{d\omega _{01}}{dB} = - \frac{\pi S \omega _{01}}{\Phi _{0}} \tan \biggl( \pi \frac{BS}{\Phi _{0}} \biggr). $$

This slope diverges at \(BS = \pm \Phi _{0}/2\), apparently allowing us to measure B with infinite precision. However, displacing the bias away from the sweet spot is accompanied by a significant increase in noise, since the qubit is no longer protected against linear-order fluctuations. This again results in a visibility ν below unity.

The noise in the control phase \(\theta _{m}\) is typically not caused by the electronics in the feedback loop, but results from uncontrollable frequency shifts that are poorly understood and controlled. Experimentally, these shifts are of the order of a few MHz. If τ is of the order of microseconds, the resulting values of λ and σ are well below those considered in this work. Therefore, this type of noise will not affect the performance of our protocols when run on a superconducting qubit.

In the case of trapped ions, sensitive magnetometry using the well-known \({}^{171}\mathrm{Yb}^{+}\) ion has been demonstrated [68]. This scheme uses four hyperfine states, \(|\mathrm{F} = 0, m_{\mathrm{F}} = 0\rangle \), \(|\mathrm{F} = 1, m_{\mathrm{F}} = 1\rangle \), \(|\mathrm{F} = 1, m_{\mathrm{F}} = -1\rangle \), and \(|\mathrm{F} = 1, m_{\mathrm{F}} = 0\rangle \), belonging to the \(^{2}S_{1/2}\) manifold. The latter three states are degenerate and are separated from the first state by the hyperfine splitting \(\omega _{\mathrm{hf}}/(2\pi ) = 12.642\text{ GHz}\). The degeneracy of these three states can be lifted by applying a magnetic field. To first order in the magnetic field, the state \(|\mathrm{F} = 1, m_{\mathrm{F}} = 0\rangle \) remains unmodified, while the states \(|\mathrm{F} = 1, m_{\mathrm{F}} = \pm 1\rangle \) acquire frequency shifts of \(\pm (g_{e} \mu _{\mathrm{B}}/2\hbar ) B\), where \(g_{e} \approx 2\) is the g-factor of the electron and \(\mu _{\mathrm{B}}\) is the Bohr magneton. Thus, for magnetic field detection one could in principle use the state \(|\mathrm{F} = 0, m_{\mathrm{F}} = 0\rangle \) and either of the magnetic-field-sensitive states \(|\mathrm{F} = 1, m_{\mathrm{F}} = \pm 1\rangle \), driving resonant Ramsey \(\pi /2\) microwave pulses at around 12 GHz with a time separation τ. The information about the magnetic field is then obtained from the phase \(\phi = (\omega _{\mathrm{hf}} \pm g_{e} \mu _{\mathrm{B}}B/2\hbar )\tau \). These states would be exposed not only to the magnetic field that we would like to sense, but also to magnetic field noise, making our results for the noisy case relevant.
Further improvements may be achieved by the use of a continuous dynamical decoupling technique, where one identifies \(|\mathrm{F} = 1, m_{\mathrm{F}} = 0\rangle \equiv |0\rangle \) and the dark state \(\frac{1}{\sqrt{2}} (|\mathrm{F} = 1, m_{\mathrm{F}} = - 1\rangle + |\mathrm{F} = 1, m_{\mathrm{F}} = 1\rangle ) \equiv |1\rangle \) as dressed-qubit states, with a \(T_{2}\) time exceeding one second [69], three orders of magnitude more than that of the bare atomic states, which is a few milliseconds.

Similar ideas can be applied to NV centers in diamond. These defects have very long decay times, of the order of tens of milliseconds, while their total decoherence times are of the order of microseconds and can be extended to hundreds of microseconds by the use of dynamical decoupling pulses [70]. Single NV centers have a triplet \(S=1\) ground-state structure, with the states \(m_{\mathrm{S}}=0\) and \(m_{\mathrm{S}}=\pm 1\) separated by the so-called zero-field splitting \(D = 2.87\text{ GHz}\). By applying a magnetic field along the crystal axis, the levels \(m_{\mathrm{S}}=\pm 1\) can be further split by the Zeeman effect [71]; the resulting energy difference between these levels is \(2 g_{e} \mu _{\mathrm{B}} B/\hbar = 2 \gamma _{e}B \), where \(\mu _{\mathrm{B}}\) is the Bohr magneton, \(g_{e}\approx 2\) is the electronic g-factor, and the gyromagnetic ratio is defined as \(\gamma _{e} = g_{e} \mu _{\mathrm{B}}/\hbar \). Because the triplet states can be selectively addressed with microwaves [71], we can immediately identify, for example, \(\vert 0\rangle = \vert m_{\mathrm{S}}=0 \rangle \) and \(\vert 1\rangle = \vert m_{\mathrm{S}}= 1 \rangle \) as the basis corresponding to our formalism, and we obtain \(\omega _{01} = 2\pi D + \gamma _{e} B\). The encoding of the magnetic field in the frequency, and subsequently in the phase, is linear: \(\Phi = (2\pi D + \gamma _{e} B)\tau \).
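As a small numerical illustration of this linear encoding (physical constants from CODATA; the field value B and sensing time τ are hypothetical):

```python
import numpy as np

h_bar = 1.054_571_817e-34   # J s, reduced Planck constant
mu_B = 9.274_010_0783e-24   # J/T, Bohr magneton
g_e = 2.0                   # electronic g-factor (approx.)
gamma_e = g_e * mu_B / h_bar   # gyromagnetic ratio, ~2*pi * 28 GHz/T
D = 2.87e9                     # Hz, zero-field splitting

B = 1e-3     # T, hypothetical field along the crystal axis
tau = 1e-6   # s, hypothetical sensing time

omega_01 = 2 * np.pi * D + gamma_e * B   # rad/s, qubit frequency
phase = omega_01 * tau                   # accumulated phase, linear in B
print(phase)
```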

To summarize, each of these experimental systems involves relatively well-characterized imperfections, and an evaluation of our protocols for any specific setup can be carried out by inspecting the general results obtained for the various types of noise in Sect. 4. We find that the precision can be optimized by using machine learning protocols to mitigate the noise resulting from the increase in sensitivity. We also note that this tradeoff is expected to occur if machine learning algorithms are included as subroutines in protocols that use highly entangled states to increase the precision, since highly entangled states are also very susceptible to noise.

6 Conclusion

The objective of this work was to study the performance of machine learning algorithms, namely the DE and PSO algorithms, in a quantum phase estimation scheme and to determine their ability to come close to the SQL without resorting to multi-particle entanglement. To this end, both algorithms were tested against different configuration scenarios and were able to follow the SQL curve up to a given number of input qubits N. Under the constraint of \(G=100\) iterations, the algorithms start to lose performance for larger values of N. This becomes even more relevant in the scenario with quantum decoherence. However, it is important to reiterate that this is not a deficiency of the algorithms, but a direct consequence of the time and computational resources available.

These limitations can be overcome in future work by optimizing the code and making it more time efficient, allowing both algorithms a larger number of iterations before converging to a valid solution. An immediate improvement would be to fully vectorize the code, making it significantly faster. Another direct improvement would be to parallelize the code so that it could leverage machines with a higher number of cores: the different independent simulation threads of both algorithms could then run in parallel, making the estimation process even more time efficient. One could also explore recurrent neural networks for the quantum phase estimation task, as they are particularly well suited for regression-based prediction on time series. It would be interesting to see how architectures such as the Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU) and Temporal Convolutional Networks (TCN) compare against our reinforcement learning algorithms.

Overall, we have shown that machine learning algorithms can provide noise-robust policies, and we have benchmarked their performance for various types of noise in close connection with experimental realizations. Above a certain critical value of noise, machine-learning-based protocols show better results than non-adaptive methods. This opens the door to future developments that may push the precision of quantum phase estimation even further with classical machine learning algorithms, without resorting to the preparation and measurement of highly entangled states.