1 Introduction

Ideal and fault-tolerant quantum computers will provide game-changing platforms in various areas such as security [1], chemistry [2, 3], and financial engineering [4,5,6,7,8,9,10]. Although the quantum devices available now are all small and noisy [11], the recent progress in hardware [12] is expected to bridge this gap. At the same time, various software methods that are expected to lower the hurdle for realizing quantum computing are being developed as well [13,14,15]. With careful evaluation of both of these approaches, it is important to make accurate predictions of which applications of quantum computing can be realized and when they might emerge in reality; see, for instance, [16, 17]. The quantum volume [18] is one reasonable measure for discussing such predictions.

This paper focuses on the quantum amplitude estimation algorithm [19, 20], which can typically be applied to speed up classical Monte Carlo methods [4, 5], among other tasks; more precisely, in an ideal setup the quantum amplitude estimation algorithm can quadratically reduce the number of samples, and thereby the computation time, of Monte Carlo methods. Because the standard amplitude estimation algorithm is demanding to implement, for several technical reasons including the many ancilla qubits required for the quantum Fourier transform and the many controlled gate operations, some new techniques that circumvent these challenges have recently been developed [21,22,23,24]. In particular, our approach [21] uses the maximum likelihood (ML) method to estimate the amplitude without either the ancilla qubits or the operations controlled by those qubits, thereby drastically reducing the number of quantum gates (particularly controlled-NOT gates) compared with the standard algorithm; a near-quadratic speedup was also demonstrated in a numerical simulation in the ideal setup.

This paper extends the ML method [21] so that it can be used on actual noisy quantum computers, and then gives an experimental demonstration on a superconducting quantum device. The key point of the ML method is that, while the true probability distribution generating the data is in general unknown, the ML estimator is constructed from a suitable model distribution. In our case, the true distribution is determined by the output state of a noisy quantum computer operated under several imperfections that cannot be perfectly characterized. To approximate such an unknown noise source for the purpose of constructing the ML estimator, we consider in this work the depolarizing noise model; it is often chosen as the minimal or the worst-case noise model [25,26,27] and is also used to quantify the gate fidelity of a given quantum computing device via randomized benchmarking [28,29,30]. In addition, the depolarizing noise process has a clear mathematical merit: it commutes with an arbitrary unitary operation, which enables an analytic treatment of our estimation problem. We then formulate the problem as a two-parameter estimation problem with respect to the target amplitude parameter and the noise parameter. This formulation introduces an important new aspect to near-term quantum computing, in the sense that the unavoidable noise coming from realistic imperfections also has to be estimated, as a nuisance parameter. Thanks to the property of depolarizing noise, we obtain an explicit form of the Fisher information matrix, which quantifies the achievable estimation accuracy, and thereby derive a formula specifying the noise level at which a near-quadratic speedup is achieved for a given estimation accuracy. 
Furthermore, the explicit form of the Fisher information matrix reveals the existence of an anomalous case, in which the target parameter cannot be efficiently estimated because the Fisher information matrix becomes degenerate. Note that such an anomalous target parameter appears only in the multi-parameter estimation problem. Fortunately, we can provide a simple way to circumvent this difficulty.

The organization of the paper, together with a summary of the results obtained, is as follows.

  1. Section 2:

    We take the depolarizing noise model and formulate the two-parameter estimation problem. With this noise model we obtain an analytic expression of the Fisher information matrix, which gives an asymptotically achievable lower bound on the estimation error. This result is further used to derive a condition on the noise level required for a nearly quadratic speedup to reach a given estimation error. Furthermore, we show an anomalous case in which the Fisher information matrix is degenerate and consequently the target parameter cannot be efficiently estimated; a simple strategy to circumvent this issue is provided.

  2. Section 3:

    We give an experimental demonstration of our ML method for a simple Monte Carlo integration problem, using 2- and 3-qubit IBM Quantum devices [31, 32]. In the former case, a quantum speedup in the number of queries over the classical method is observed, while the estimation error saturates as the number of queries becomes large. In the 3-qubit case, on the other hand, we found that the estimation error is always larger than that of the classical method. In both cases the saturated value is consistent with the theoretical prediction, supporting the validity of the depolarizing noise model.

  3. Section 4:

    We discuss two topics. First, based on the condition obtained in Sect. 2 together with the experimental result shown in Sect. 3, we discuss the hardware requirements, e.g., the error rates of single-qubit and CNOT gates, needed to achieve a given precision in estimating the value of a multidimensional integral. Second, we discuss the computational complexity of the proposed algorithm.

2 Analysis of estimation error under depolarizing noise

2.1 Preliminary

We briefly review the amplitude estimation algorithm that uses the ML method on parallelized circuits [21]. This algorithm can estimate the parameter (i.e., the unknown amplitude) to a given precision quadratically faster than any classical sampling method, even without the standard phase estimation subroutine. The algorithm mainly consists of two components: one is the amplitude amplification process [19, 20], which is a generalized version of Grover’s quantum search algorithm, and the other is the classical ML estimation applied to the data obtained by amplitude amplification. The essential idea of the algorithm is sketched below.

The amplitude amplification algorithm initially prepares the state given by \(|\Psi \rangle _{n+1}=\mathcal {A}|0\rangle _{n+1}\), where \(\mathcal {A}\) is a unitary operator acting on the \((n+1)\) qubits. The operator \(\mathcal {A}\) is designed to satisfy \(\mathcal {A}|0\rangle _{n+1} = \sqrt{a}|\tilde{\Psi }_1\rangle _n|1\rangle +\sqrt{1-a}|\tilde{\Psi }_0\rangle _n|0\rangle \), where \(a\in [0,1]\) is an unknown parameter to be estimated, and \(|\tilde{\Psi }_1\rangle _n\) and \(|\tilde{\Psi }_0\rangle _n\) are the n-qubit normalized “good” and “bad” states. In terms of \(\theta _a\in [0,\pi /2]\) satisfying \(\sin ^2\theta _a=a\), the prepared state \(|\Psi \rangle _{n+1}\) can also be expressed as

$$\begin{aligned} |\Psi \rangle _{n+1} = \sin {\theta _a}|\tilde{\Psi }_1\rangle _n|1\rangle +\cos {\theta _a}|\tilde{\Psi }_0\rangle _n|0\rangle . \end{aligned}$$
(1)

The probability of measuring the good state can be amplified by applying the unitary operator \(\mathbf {Q}=-\mathcal {A}\mathbf {S}_0 \mathcal {A}^{-1}\mathbf {S}_{\chi }\) to \(|\Psi \rangle _{n+1}\), where \(\mathbf {S}_{\chi }=\mathbf {I}_{n+1}-2\mathbf {I}_{n}\otimes |1\rangle \langle 1|\) and \(\mathbf {S}_0=\mathbf {I}_{n+1}-2|0\rangle _{n+1}\langle 0|\), and \(\mathbf {I}_{k}\) denotes the identity operator on k qubits. Applying \(\mathbf {Q}\) to \(|\Psi \rangle _{n+1}\) m times, we have

$$\begin{aligned} \mathbf {Q}^m|\Psi \rangle _{n+1} = \sin ((2m+1)\theta _a)|\tilde{\Psi }_1\rangle _n|1\rangle + \cos ((2 m+1)\theta _a)|\tilde{\Psi }_0\rangle _n|0\rangle . \end{aligned}$$
(2)

This equation shows that the probability of measuring the good state is amplified depending on the number of repetitions of \(\mathbf {Q}\) on \(|\Psi \rangle _{n+1}\). Note that Eq. (2) is valid only in the absence of noise.

Next, we describe the ML part. The idea is to estimate the amplitude a from the number of times the good state is measured after performing \(m_k\) (for \(0 \le k \le M\)) repetitions of \(\mathbf {Q}\). For the ideal state \(\mathbf {Q}^{m_k}|\Psi \rangle _{n+1}\), the probability of measuring the good state is \(P(m_k ; a)=\sin ^2((2m_k +1)\theta _a)\); the probability of obtaining the good state \(h_k\) times out of \(N_k\) measurement shots is then proportional to \([P(m_k ; a)]^{h_k} [1-P(m_k ; a)]^{N_k-h_k}\), and accordingly the likelihood function is

$$\begin{aligned} L(\mathbf {h}; a) = \prod _{k=0}^M [P(m_k ; a)]^{h_k} [1-P(m_k ; a)]^{N_k-h_k}, \end{aligned}$$
(3)

where \(\mathbf {h}=(h_0,h_1, \ldots ,h_M)\). The ML estimate of a is then given by \(\mathrm{argmax}_{a} L(\mathbf {h}; a)\), which asymptotically achieves the theoretical lower bound on the estimation error (details are given below). Note that the estimation precision naturally depends on the choice of the sequence of integers \(\{m_k\}\). In this paper, we consider the following Linearly Incremental Sequence (LIS) and Exponential Incremental Sequence (EIS):

$$\begin{aligned} \text{(LIS) }~~m_k = k, ~~~~~ \text{(EIS) }~~m_k=\lfloor 2^{k-1}\rfloor , ~\text {for}~0 \le k\le M, \end{aligned}$$
(4)

where hereafter we omit the floor notation for simplicity. To asymptotically achieve a given error \(\epsilon \), LIS and EIS require \(O(1/\epsilon ^{4/3})\) and \(O(1/\epsilon )\) total query calls, respectively, while the classical case (i.e., \(\forall k: m_k=0\)) requires \(O(1/\epsilon ^2)\) to achieve the same precision [21].
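To make the ideal procedure concrete, the following is a minimal NumPy sketch, assuming noiseless simulated data: it builds the LIS and EIS of Eq. (4), counts the total queries \(N_\mathrm{q}=\sum _k N_k(2m_k+1)\) used later in Sect. 2.3, and maximizes the logarithm of Eq. (3) by a brute-force grid search over \(\theta _a\). The function names are hypothetical, and the grid search is only one simple way to locate the maximum; it is not necessarily the optimizer used in [21].

```python
import numpy as np


def lis(M):
    """Linearly incremental sequence m_k = k for k = 0..M, Eq. (4)."""
    return list(range(M + 1))


def eis(M):
    """Exponential incremental sequence m_k = floor(2^(k-1)) for k = 0..M, Eq. (4)."""
    return [int(np.floor(2.0 ** (k - 1))) for k in range(M + 1)]


def total_queries(ms, n_shots):
    """Total query count N_q = sum_k N_k (2 m_k + 1)."""
    return sum(n * (2 * m + 1) for m, n in zip(ms, n_shots))


def ml_estimate_ideal(hits, ms, n_shots, grid_size=100_000):
    """Maximize ln L of Eq. (3) over a dense grid of theta_a in [0, pi/2]; returns a_hat."""
    theta = np.linspace(0.0, np.pi / 2, grid_size)
    loglik = np.zeros_like(theta)
    for h, m, n in zip(hits, ms, n_shots):
        # P(m; a) = sin^2((2m+1) theta_a); clipping keeps the logarithms finite
        p = np.clip(np.sin((2 * m + 1) * theta) ** 2, 1e-12, 1 - 1e-12)
        loglik += h * np.log(p) + (n - h) * np.log1p(-p)
    return np.sin(theta[np.argmax(loglik)]) ** 2


# Simulated noiseless data: h_k ~ Binomial(N_k, sin^2((2 m_k + 1) theta_a))
rng = np.random.default_rng(seed=1)
a_true = 0.375
ms = eis(5)
n_shots = [100] * len(ms)
theta_a = np.arcsin(np.sqrt(a_true))
hits = [rng.binomial(n, np.sin((2 * m + 1) * theta_a) ** 2) for m, n in zip(ms, n_shots)]
a_hat = ml_estimate_ideal(hits, ms, n_shots)
print(ms, total_queries(ms, n_shots), a_hat)
```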

Lastly, we give a general framework of the above method for a vector of multiple parameters \(\varvec{\theta }=[\theta _1, \ldots , \theta _K]^\top \) in a noisy environment, which is indeed the scenario studied in this paper (\(\bullet ^\top \) denotes the transpose). Let \(P(m_k; \varvec{\theta })\) be the probability of measuring the good state for the density matrix obtained by applying \(\mathbf {Q}^{m_k}\) to the state (1) in the noisy environment. Then, as above, the likelihood function for obtaining the good state \(h_k\) times in \(N_k\) measurement shots is given by

$$\begin{aligned} L(\mathbf {h};\varvec{\theta }) =\prod _{k=0}^M \left[ P(m_k;\varvec{\theta })\right] ^{h_k} \left[ 1-P(m_k;{\varvec{\theta }})\right] ^{N_k-h_k}, \end{aligned}$$
(5)

where again \(\mathbf {h}=(h_0,h_1, \ldots ,h_M)\). The ML estimate \(\hat{\varvec{\theta }}_\mathrm{ML}\) is defined as

$$\begin{aligned} \hat{\varvec{\theta }}_\mathrm{ML} ={\mathrm{argmax}}_{\varvec{\theta }} L(\mathbf {h};\varvec{\theta }) ={\mathrm{argmax}}_{\varvec{\theta }}\ln L(\mathbf {h};\varvec{\theta }). \end{aligned}$$
(6)

In general, the estimation error covariance matrix \(\mathrm {Cov}(\varvec{\hat{ \theta }})=\mathbb {E}[(\varvec{\theta }-\varvec{\hat{\theta }}) (\varvec{\theta }-\varvec{\hat{\theta }})^\top ]\) of any unbiased estimator \(\hat{\varvec{\theta }}\) satisfies the Cramér–Rao inequality:

$$\begin{aligned} {\mathrm{Cov}}(\varvec{\hat{\theta }})\ge {\mathcal {I}}^{-1}(\varvec{\theta }), \end{aligned}$$
(7)

where \(\mathcal {I}(\varvec{\theta })\) is the Fisher information matrix defined as

$$\begin{aligned} \mathcal {I}(\varvec{\theta }) =\mathbb {E}\left[ \left( \frac{\partial }{\partial \varvec{\theta }}\ln L(\mathbf {h};\varvec{\theta }) \right) \left( \frac{\partial }{\partial \varvec{\theta }}\ln L(\mathbf {h};\varvec{\theta }) \right) ^\top \right] . \end{aligned}$$
(8)

The expectation value \(\mathbb {E} \left[ \bullet \right] \) in Eq. (8) is defined as

$$\begin{aligned} \mathbb {E}\left[ X(\mathbf {h})\right] =\sum _{h_0=0}^{N_0}\sum _{h_1=0}^{N_1}\cdots \sum _{h_M=0}^{N_M}X(\mathbf {h})\left[ \prod _{k=0}^{M} \left( \begin{array}{c} N_k\\ h_k \end{array} \right) P(m_k;\varvec{\theta })^{h_k}(1-P(m_k;\varvec{\theta }))^{N_k-h_k}\right] , \end{aligned}$$
(9)

where \(X(\mathbf {h})\) is a function of the random variables \(\mathbf {h}\). It is known that the ML estimate attains the lower bound on \(\mathrm{{Cov}}(\varvec{\hat{\theta }})\) in Eq. (7) asymptotically, in the limit of a large number of samples. The elements of the Fisher information matrix are given as

$$\begin{aligned} \left[ \mathcal {I}(\varvec{\theta }) \right] _{i,j} =\mathbb {E}\left[ \left( \frac{\partial }{\partial \theta _i}\ln L(\mathbf {h};\varvec{\theta }) \right) \left( \frac{\partial }{\partial \theta _j}\ln L(\mathbf {h};\varvec{\theta }) \right) \right] . \end{aligned}$$
(10)

In the multi-parameter estimation problem, the ML estimation of the i-th and j-th (\(i\ne j\)) parameters can be performed independently if the (i, j) element of the Fisher information matrix is zero. Otherwise, the i-th and j-th parameters are correlated, and the estimation of the parameter of interest may be adversely affected by the other, nuisance parameters; this is indeed the case in this work, as shown later.

2.2 Fisher information in the presence of depolarizing noise

Quantum states are usually disturbed by noise arising from the system–environment interaction. The ideal pure state (2) is then replaced by a mixed state, which in practice cannot be perfectly identified. The ML estimator, on the other hand, is based on a model state that should approximate this unknown true mixed state well. In this work, we assume the depolarizing channel defined by [33, 34]:

$$\begin{aligned} \mathcal {D}(\rho ) = p \rho + (1-p) \frac{\mathbf {I}_{n+1}}{d}. \end{aligned}$$
(11)

Here \(\rho \) is a density matrix, \(\mathcal {D}(\rho )\) is a completely positive trace-preserving (CPTP) map representing the depolarization of the qubits, \(1-p\) is the probability that the qubits are depolarized, and d is the dimension of the quantum system, i.e., \(d=2^{n+1}\). Note that p must also be treated as an unknown parameter, in addition to a; hence we are studying a two-parameter estimation problem, which is essentially harder than the one-parameter problem studied in [21].

Now, the CPTP map of the ideal amplitude amplification channel, in terms of the density matrix, is represented as

$$\begin{aligned} \mathcal {Q}(\rho ) =\mathbf {Q} \rho \mathbf {Q}^{\dagger }. \end{aligned}$$
(12)

From Eqs. (11) and (12), the amplitude amplification process in the presence of noise is thus given by

$$\begin{aligned} \rho _\mathrm{{noise}} = \mathcal {Q}\mathcal {D} (\rho )=\mathcal {D}\mathcal {Q} (\rho ) = p \mathbf {Q} \rho \mathbf {Q}^{\dagger }+ (1-p) \frac{\mathbf {I}_{n+1}}{d}. \end{aligned}$$
(13)

Moreover, it is easy to see that repeating this noisy amplitude amplification process m times yields

$$\begin{aligned} \rho _\mathrm{{noise}}^{(m)}=(\mathcal {Q}\mathcal {D} )^{m} (\rho ) = p^m \mathbf {Q}^{m} \rho \mathbf {Q}^{\dagger m}+ (1-p^m) \frac{\mathbf {I}_{n+1}}{d}. \end{aligned}$$
(14)

Now the initial state is chosen as \(\rho =|\Psi \rangle _{n+1}\langle \Psi |\), where \(|\Psi \rangle _{n+1}\) is given in Eq. (1). As in the ideal case, we are interested in the probability of measuring the good state, i.e., of finding the last qubit in \(|1\rangle \): \(P(m;\varvec{\theta })=\mathrm{{Tr}}(\rho _\mathrm{{noise}}^{(m)} E_{1})\), where \(E_1=\mathbf {I}_{n}\otimes |1\rangle \langle 1|\). Here \(\varvec{\theta }=[a, \kappa ]^\top \) is the vector of unknown parameters, and \(\kappa =-\ln p\) is referred to as the noise level of the amplitude amplification process. Using Eqs. (2) and (14) together with \(\mathrm{{Tr}}(\mathbf {I}_{n+1}E_1)=2^n\), we have \(P(m;\varvec{\theta }) = p^m\sin ^2((2m+1)\theta _a) + (1-p^m)/2\), which can be rewritten as

$$\begin{aligned} P(m;\varvec{\theta }) = P(m; a, \kappa ) = \frac{1}{2} - \frac{1}{2} \mathrm {e}^{- \kappa m} \cos (2(2m+1)\theta _a). \end{aligned}$$
(15)
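As a sanity check of Eqs. (13) to (15), the sketch below simulates the smallest instance, n = 0, in which the register is a single qubit whose state \(|1\rangle \) plays the role of the good state. As an assumption for illustration only, \(\mathcal {A}\) is taken to be the real rotation with \(\mathcal {A}|0\rangle =\cos \theta _a|0\rangle +\sin \theta _a|1\rangle \). The depolarizing channel (11) is applied after each Grover step, and \(\mathrm{Tr}(\rho ^{(m)}_\mathrm{noise}E_1)\) is compared with Eq. (15); the helper names are hypothetical.

```python
import numpy as np


def prep_unitary(theta):
    # Toy A for n = 0 (single qubit): A|0> = cos(theta)|0> + sin(theta)|1>
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])


def depolarize(rho, p):
    # Eq. (11): D(rho) = p rho + (1 - p) I / d
    d = rho.shape[0]
    return p * rho + (1 - p) * np.eye(d) / d


a, kappa = 0.375, 0.05
theta_a = np.arcsin(np.sqrt(a))
p = np.exp(-kappa)                      # kappa = -ln p
A = prep_unitary(theta_a)
S_chi = np.diag([1.0, -1.0])            # I - 2|1><1|
S_0 = np.diag([-1.0, 1.0])              # I - 2|0><0|
Q = -A @ S_0 @ A.T @ S_chi              # A^{-1} = A^T for this real rotation
E1 = np.diag([0.0, 1.0])                # projector onto the good state |1>

psi = A @ np.array([1.0, 0.0])
rho = np.outer(psi, psi)

# Depolarization commutes with the unitary channel, as used in Eq. (13)
commutes = np.allclose(depolarize(Q @ rho @ Q.T, p), Q @ depolarize(rho, p) @ Q.T)

simulated, analytic = [], []
for m in range(8):
    simulated.append(np.trace(rho @ E1))
    analytic.append(0.5 - 0.5 * np.exp(-kappa * m) * np.cos(2 * (2 * m + 1) * theta_a))
    rho = depolarize(Q @ rho @ Q.T, p)  # one noisy amplification step, Eq. (13)

print(commutes, np.max(np.abs(np.array(simulated) - np.array(analytic))))
```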

Then the likelihood function (5) is represented as

$$\begin{aligned} L(\mathbf {h}; \varvec{\theta }) = L(\mathbf {h} ; a, \kappa ) = \prod _{k=0}^M \left[ P(m_k;a, \kappa )\right] ^{h_k} \left[ 1-P(m_k; a, \kappa )\right] ^{N_k-h_k}. \end{aligned}$$
(16)
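A minimal sketch of the two-parameter ML estimate, assuming data simulated from the model (15) itself: the logarithm of Eq. (16) is maximized over a grid in \((\theta _a, \kappa )\). The grid ranges, the true values \(a=0.375\) and \(\kappa =0.02\), and the function names are illustrative assumptions; in practice a local optimization around the grid maximum would normally follow.

```python
import numpy as np


def prob_good(m, theta, kappa):
    # Eq. (15): P(m; a, kappa) with a = sin^2(theta)
    return 0.5 - 0.5 * np.exp(-kappa * m) * np.cos(2 * (2 * m + 1) * theta)


def ml_estimate_noisy(hits, ms, n_shots, n_theta=20_000, kappa_grid=None):
    """Maximize ln L of Eq. (16) on a (theta_a, kappa) grid; returns (a_hat, kappa_hat)."""
    if kappa_grid is None:
        kappa_grid = np.linspace(0.0, 0.1, 41)
    theta = np.linspace(0.0, np.pi / 2, n_theta)[:, None]   # column: theta_a values
    kap = np.asarray(kappa_grid)[None, :]                    # row: kappa values
    loglik = np.zeros((n_theta, kap.shape[1]))
    for h, m, n in zip(hits, ms, n_shots):
        p = np.clip(prob_good(m, theta, kap), 1e-12, 1 - 1e-12)
        loglik += h * np.log(p) + (n - h) * np.log1p(-p)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return np.sin(theta[i, 0]) ** 2, kap[0, j]


rng = np.random.default_rng(seed=7)
a_true, kappa_true = 0.375, 0.02
ms = [0, 1, 2, 4, 8, 16]                 # EIS with M = 5
n_shots = [100] * len(ms)
theta_true = np.arcsin(np.sqrt(a_true))
hits = [rng.binomial(n, prob_good(m, theta_true, kappa_true)) for m, n in zip(ms, n_shots)]
a_hat, kappa_hat = ml_estimate_noisy(hits, ms, n_shots)
print(a_hat, kappa_hat)
```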

Our task is to estimate \(\varvec{\theta }=[a, \kappa ]^\top \) via the ML estimate (6), i.e., \(\hat{\varvec{\theta }}_\mathrm{ML} = \mathrm{argmax}_{\varvec{\theta }}L(\mathbf {h}; \varvec{\theta })\), where \(\mathbf {h}=(h_0, h_1, \ldots , h_M)\) is the set of data obtained in the experiment. As mentioned in the previous subsection, \(\hat{\varvec{\theta }}_\mathrm{ML}\) asymptotically achieves the lower bound in the Cramér–Rao inequality (7) if the data are generated from the model distribution; in this case, the Fisher information matrix (8) can be calculated as

$$\begin{aligned} \begin{aligned} \mathcal {I}_{1,1}(a,\kappa )&= \mathbb {E}\left[ \left( \frac{\partial }{\partial a}\ln L(\mathbf {h};a,\kappa )\right) ^2 \right] \\&=\sum _{k=0}^M \frac{N_k(2m_k+1)^2}{\sin ^2(2 \theta _a)} \frac{4\sin ^2\left( 2\left( 2 m_k + 1 \right) \theta _a \right) }{e^{2\kappa m_k}-\cos ^2\left( 2\left( 2 m_k+1 \right) \theta _a\right) }, \\ \mathcal {I}_{1,2}(a,\kappa ) = \mathcal {I}_{2,1}(a,\kappa )&= \mathbb {E}\left[ \left( \frac{\partial }{\partial a} \ln L(\mathbf {h};a,\kappa )\right) \left( \frac{\partial }{\partial \kappa }\ln L(\mathbf {h};a,\kappa )\right) \right] \\&=\sum _{k=0}^M \frac{N_km_k(2m_k+1)}{\sin (2\theta _a)} \frac{\sin \left( 4\left( 2 m_k + 1 \right) \theta _a \right) }{e^{2\kappa m_k}-\cos ^2\left( 2\left( 2 m_k+1 \right) \theta _a\right) }, \\ \mathcal {I}_{2,2}(a,\kappa )&= \mathbb {E}\left[ \left( \frac{\partial }{\partial \kappa }\ln L(\mathbf {h};a,\kappa )\right) ^2 \right] \\&=\sum _{k=0}^M N_km_k^2 \frac{\cos ^2\left( 2\left( 2 m_k + 1 \right) \theta _a \right) }{e^{2\kappa m_k}-\cos ^2\left( 2\left( 2 m_k+1 \right) \theta _a\right) }. \end{aligned} \end{aligned}$$
(17)
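The closed forms in Eq. (17) follow from the binomial Fisher information \(\sum _k N_k\,\nabla P\,\nabla P^\top /[P(1-P)]\) evaluated with Eq. (15). The sketch below (NumPy assumed, helper names hypothetical) implements Eq. (17) directly and cross-checks it against central finite-difference derivatives of Eq. (15).

```python
import numpy as np


def fisher_matrix(a, kappa, ms, ns):
    """Closed-form 2x2 Fisher information of Eq. (17) for theta = [a, kappa]."""
    ms = np.asarray(ms, dtype=float)
    ns = np.asarray(ns, dtype=float)
    theta = np.arcsin(np.sqrt(a))
    phi = 2.0 * (2.0 * ms + 1.0) * theta                 # 2(2m_k+1) theta_a
    denom = np.exp(2.0 * kappa * ms) - np.cos(phi) ** 2
    s2 = np.sin(2.0 * theta)
    i11 = np.sum(ns * (2 * ms + 1) ** 2 / s2**2 * 4.0 * np.sin(phi) ** 2 / denom)
    i12 = np.sum(ns * ms * (2 * ms + 1) / s2 * np.sin(2.0 * phi) / denom)
    i22 = np.sum(ns * ms**2 * np.cos(phi) ** 2 / denom)
    return np.array([[i11, i12], [i12, i22]])


def fisher_numeric(a, kappa, ms, ns, step=1e-6):
    """sum_k N_k grad P grad P^T / (P (1 - P)) with finite-difference gradients of Eq. (15)."""
    def prob(x, m):
        theta = np.arcsin(np.sqrt(x[0]))
        return 0.5 - 0.5 * np.exp(-x[1] * m) * np.cos(2 * (2 * m + 1) * theta)

    x0 = np.array([a, kappa])
    total = np.zeros((2, 2))
    for m, n in zip(ms, ns):
        grad = np.array([(prob(x0 + step * e, m) - prob(x0 - step * e, m)) / (2 * step)
                         for e in np.eye(2)])
        p = prob(x0, m)
        total += n * np.outer(grad, grad) / (p * (1 - p))
    return total


ms = [0, 1, 2, 4, 8, 16]
ns = [100] * len(ms)
closed = fisher_matrix(0.375, 0.01, ms, ns)
numeric = fisher_numeric(0.375, 0.01, ms, ns)
print(closed)
print(numeric)
```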

Recall that a is the parameter of our true interest; from Eq. (7), the estimation error of a is lower bounded by the (1, 1) element of the inverse of the above Fisher information matrix as

$$\begin{aligned} \epsilon = \sqrt{\mathbb {E}\Big [ (a - \hat{a} )^2 \Big ]} \ge \sqrt{\left( \mathcal {I}(a,\kappa )^{-1} \right) _{1,1}} =: \epsilon _\mathrm{min}(a,\kappa ). \end{aligned}$$
(18)

We use \(\epsilon _\mathrm{min}(a,\kappa )\) to discuss the condition on the noise level \(\kappa \) required to satisfy a given estimation precision; this is studied in detail in the next subsection.
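A short sketch of Eq. (18), reusing the closed form of Eq. (17) (function names hypothetical). It also contrasts the two-parameter bound with the one-parameter bound \(1/\sqrt{\mathcal {I}_{1,1}}\) that would apply if \(\kappa \) were known; the former can never be smaller, because \((\mathcal {I}^{-1})_{1,1}\ge 1/\mathcal {I}_{1,1}\) for any positive-definite \(\mathcal {I}\). For \(m_k=0\) the probability does not depend on \(\kappa \), and the bound reduces to the classical binomial value \(\sqrt{a(1-a)/N}\) with \(N=\sum _k N_k\).

```python
import numpy as np


def fisher_matrix(a, kappa, ms, ns):
    """Closed-form 2x2 Fisher information of Eq. (17) for theta = [a, kappa]."""
    ms = np.asarray(ms, dtype=float)
    ns = np.asarray(ns, dtype=float)
    theta = np.arcsin(np.sqrt(a))
    phi = 2.0 * (2.0 * ms + 1.0) * theta
    denom = np.exp(2.0 * kappa * ms) - np.cos(phi) ** 2
    s2 = np.sin(2.0 * theta)
    i11 = np.sum(ns * (2 * ms + 1) ** 2 / s2**2 * 4.0 * np.sin(phi) ** 2 / denom)
    i12 = np.sum(ns * ms * (2 * ms + 1) / s2 * np.sin(2.0 * phi) / denom)
    i22 = np.sum(ns * ms**2 * np.cos(phi) ** 2 / denom)
    return np.array([[i11, i12], [i12, i22]])


def eps_min(a, kappa, ms, ns):
    """Eq. (18): two-parameter bound sqrt((I^{-1})_{11}), kappa unknown."""
    return float(np.sqrt(np.linalg.inv(fisher_matrix(a, kappa, ms, ns))[0, 0]))


def eps_known_kappa(a, kappa, ms, ns):
    """One-parameter bound 1/sqrt(I_11), valid only if kappa were known exactly."""
    return float(1.0 / np.sqrt(fisher_matrix(a, kappa, ms, ns)[0, 0]))


a = 0.375
# Classical sampling (all m_k = 0): P = a, independent of kappa
classical = eps_known_kappa(a, 0.0, [0] * 5, [100] * 5)
ms = [0, 1, 2, 4, 8, 16, 32]
ns = [100] * len(ms)
known, unknown = eps_known_kappa(a, 0.01, ms, ns), eps_min(a, 0.01, ms, ns)
print(classical, np.sqrt(a * (1 - a) / 500), known, unknown)
```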

Lastly, we remark on other noise models. The most general (Markovian) noise model, which has multiple parameters, can be represented in the Kraus (operator-sum) form. In general, however, such a channel does not commute with the Grover operator, and as a result we cannot obtain an analytic expression of the Cramér–Rao lower bound. To discuss the estimation accuracy in such a case, several numerical methods have been developed in the field of quantum metrology [28]. Note that even for specific noise channels other than depolarization, it is still difficult to obtain an analytic expression of the Cramér–Rao lower bound. For instance, Ref. [34] considered amplitude damping and dephasing noise in the same amplitude estimation problem, with known noise strength, and evaluated the estimation error numerically.

2.3 Achievable estimation error and required depolarizing noise level

Fig. 1

Cramér–Rao lower bound \(\epsilon _\mathrm{min}(a,\kappa )\) on the estimation error of a, versus the total number of queries. The red line corresponds to the classical case \(\forall k: m_k=0\), while the other lines are the quantum cases of EIS with several values of noise level \(\kappa \). See the first paragraph of Sect. 2.3 for the details (Color figure online)

Figure 1 shows the Cramér–Rao lower bound of the estimation error, \(\epsilon _\mathrm{min}(a,\kappa )\), versus the total number of query calls, \(N_\mathrm{{q}}=\sum _{k=0}^{M}N_k(2 m_k+1)\), which in the classical case reduces to \(N_\mathrm{{q}}=\sum _{k=0}^{M}N_k\). The target value is \(a=\sin ^2\theta _a=0.375\), and \(N_k=100\) for all k. The (red) thin solid line represents the lower bound in the classical case. The other lines represent the lower bounds obtained using amplitude amplification with the EIS for several noise levels \(\kappa \): the (yellow) thick solid line is the bound without noise (\(\kappa =0\)), and the (green) dashed line with triangles, the (blue) dash-dotted line with squares, and the (light blue) dotted line with crosses are the lower bounds under depolarizing noise with \(\kappa =10^{-1}, 10^{-2}\), and \(10^{-3}\), respectively. Recall that, in the ideal case \(\kappa =0\), the number of query calls needed to reach a specified \(\epsilon \) is \(N_\mathrm{q}\sim O(1/\epsilon )\), i.e., Heisenberg scaling. We note that a similar dependence of \(\epsilon _\mathrm{min}(a,\kappa )\) on \(\kappa \) is observed for many values of a, but there exist cases in which \(\epsilon _\mathrm{min}(a, \kappa )\) takes much larger values than those shown in Fig. 1 even when \(\kappa \) is sufficiently small; see Sect. 2.4 for this special case.
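Curves of the kind shown in Fig. 1 can be generated along the following lines, assuming the EIS, \(N_k=100\), and \(a=0.375\) as in the text; the range \(1\le M\le 12\) and the output format are arbitrary choices, and the plotting details of Fig. 1 are not reproduced. Because each additional \(m_k\) adds a positive-semidefinite term to \(\mathcal {I}\), \(\epsilon _\mathrm{min}\) can only stay constant or decrease as M grows, so saturation shows up as a plateau rather than an increase.

```python
import numpy as np


def fisher_matrix(a, kappa, ms, ns):
    """Closed-form 2x2 Fisher information of Eq. (17) for theta = [a, kappa]."""
    ms = np.asarray(ms, dtype=float)
    ns = np.asarray(ns, dtype=float)
    theta = np.arcsin(np.sqrt(a))
    phi = 2.0 * (2.0 * ms + 1.0) * theta
    denom = np.exp(2.0 * kappa * ms) - np.cos(phi) ** 2
    s2 = np.sin(2.0 * theta)
    i11 = np.sum(ns * (2 * ms + 1) ** 2 / s2**2 * 4.0 * np.sin(phi) ** 2 / denom)
    i12 = np.sum(ns * ms * (2 * ms + 1) / s2 * np.sin(2.0 * phi) / denom)
    i22 = np.sum(ns * ms**2 * np.cos(phi) ** 2 / denom)
    return np.array([[i11, i12], [i12, i22]])


def eis(M):
    """m_k = floor(2^(k-1)) for k = 0..M, Eq. (4)."""
    return [int(np.floor(2.0 ** (k - 1))) for k in range(M + 1)]


a, n_k = 0.375, 100
curves = {}
for kappa in (0.0, 1e-3, 1e-2, 1e-1):
    curve = []
    for M in range(1, 13):                 # M = 0 alone (m_0 = 0) leaves kappa unidentifiable
        ms = eis(M)
        ns = [n_k] * len(ms)
        n_q = sum(n * (2 * m + 1) for m, n in zip(ms, ns))
        eps = np.sqrt(np.linalg.inv(fisher_matrix(a, kappa, ms, ns))[0, 0])
        curve.append((n_q, eps))
    curves[kappa] = curve

# Classical reference at the same query counts: sqrt(a (1 - a) / N_q)
classical = [(n_q, np.sqrt(a * (1 - a) / n_q)) for n_q, _ in curves[0.0]]

for kappa, curve in curves.items():
    print(f"kappa={kappa:g}:", ", ".join(f"{n_q}:{eps:.1e}" for n_q, eps in curve))
```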

A notable point in Fig. 1 is that, even under the depolarizing noise model, the estimation error \(\epsilon _\mathrm{min}(a,\kappa )\) decreases nearly according to Heisenberg scaling up to \(N_\mathrm{q}\sim 10^4\) and \(N_\mathrm{q}\sim 10^5\) for \(\kappa =0.01\) and \(\kappa =0.001\), respectively. Beyond these points, however, the error does not decrease any further, even with more queries. In other words, \(\epsilon _\mathrm{min}(a,\kappa )\) saturates at those values of \(N_\mathrm{q}\), and the algorithm should therefore be stopped there (Footnote 1). The maximum number of queries up to which Heisenberg scaling is guaranteed can be characterized formally as follows: even under the depolarizing noise model with level \(\kappa \), Heisenberg scaling is nearly preserved if the number of applications of \(\mathcal {Q}\), \(m_k\), is smaller than \(\bar{m}\), defined as the maximum integer satisfying the following inequality [28]:

$$\begin{aligned} (2\bar{m} + 1) (1 - \mathrm {e}^{-\kappa } ) \le 1. \end{aligned}$$
(19)

This condition is derived as a sufficient condition guaranteeing that the probability of measuring the good state is not affected by the noise in the limit of large samples. The star marks in Fig. 1 indicate the total query calls \(\bar{N}_\mathrm{q}\) corresponding to \(\bar{m}\) for a given \(\kappa \), showing that the estimation error indeed no longer follows the Heisenberg scaling law beyond \(\bar{N}_\mathrm{q}\). Note that \(\bar{m}\) in Eq. (19) was originally derived in the one-parameter setting (i.e., the case where \(\kappa \) is known), whereas \(\epsilon _\mathrm{min}(a,\kappa )\) is a function of the two-parameter Fisher information matrix; despite this gap, \(\bar{m}\) accurately captures the maximum number of query calls up to which the estimation error decreases according to Heisenberg scaling.
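The threshold \(\bar{m}\) of Eq. (19) and the corresponding \(\bar{N}_\mathrm{q}\) for the EIS can be computed as follows (plain Python; helper names hypothetical). With \(N_k=100\), this gives \(\bar{m}=49\) and \(\bar{N}_\mathrm{q}=13300\) for \(\kappa =10^{-2}\), and \(\bar{m}=499\) and \(\bar{N}_\mathrm{q}=103200\) for \(\kappa =10^{-3}\), in line with the saturation points \(N_\mathrm{q}\sim 10^4\) and \(10^5\) read off from Fig. 1.

```python
import math


def m_bar(kappa):
    """Largest integer m with (2m + 1)(1 - exp(-kappa)) <= 1, Eq. (19)."""
    one_minus_p = -math.expm1(-kappa)      # 1 - e^{-kappa}, accurate for small kappa
    return math.floor((1.0 / one_minus_p - 1.0) / 2.0)


def nbar_q_eis(kappa, n_k=100):
    """Total queries of the EIS truncated at m_bar, with N_k = n_k shots per step."""
    mb = m_bar(kappa)
    total, k = 0, 0
    while math.floor(2.0 ** (k - 1)) <= mb:
        m = math.floor(2.0 ** (k - 1))
        total += n_k * (2 * m + 1)
        k += 1
    return total


for kappa in (1e-1, 1e-2, 1e-3):
    print(kappa, m_bar(kappa), nbar_q_eis(kappa))
```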

Fig. 2

Relationship between the noise level \(\kappa \) and the achievable error \(\epsilon \), in the case \(a=0.375\). The ML algorithm is with \(\forall k: N_k=100\) and EIS

Based on this fact, we obtain the condition on the noise level \(\kappa \) under which the ML algorithm reaches a specified estimation error \(\epsilon \) with \(O(1/\epsilon )\) query calls, i.e., with Heisenberg scaling, even in the noisy environment. Figure 2 gives this condition: it shows the relation between \(\kappa \) and the error at \(\bar{N}_\mathrm{q}\), for \(a=\sin ^2\theta _a=0.375\), \(\forall k: N_k=100\), and EIS. For instance, to reach the estimation error \(\epsilon =10^{-4}\) using \(O(1/\epsilon )\) query calls, \(\kappa \) needs to be smaller than \(\sim 10^{-3}\). Importantly, Fig. 2 indicates a “quasi linear relation” between a specified \(\epsilon \) and the required value of \(\kappa \); that is, to decrease \(\epsilon \) to \(\epsilon /10\), \(\kappa \) should simply be improved to \(\kappa /10\). This quasi linear relation is expected from Eq. (19): for small \(\kappa \) we have \(1-\mathrm {e}^{-\kappa }\approx \kappa \) and hence \(2\bar{m}+1\approx 1/\kappa \), so that \(\bar{N}_\mathrm{q}\) grows roughly in proportion to \(1/\kappa \); combined with the Heisenberg scaling \(\bar{N}_\mathrm{q} = O(1/\epsilon )\), this gives \(\epsilon \propto \kappa \), although Eq. (19) was proven only in the one-parameter setting. The condition on \(\kappa \) can be further converted into a condition on the fidelity of the elementary gates constituting the quantum circuit. That is, Fig. 2 represents the minimum hardware specification required to apply the quantum amplitude estimation method to a concrete problem, such as the Monte Carlo integration task demonstrated later; a more detailed discussion of the hardware requirements is given in Sect. 4.1.
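The quasi linear relation can be made slightly more explicit for the EIS, where \(m_0=0\) and \(m_k=2^{k-1}\) for \(k\ge 1\). The following is a heuristic estimate that inherits the caveat that Eq. (19) was proven only in the one-parameter setting; K denotes the last index with \(m_K\le \bar{m}\), and N a common value of \(N_k\).

```latex
\begin{aligned}
1-\mathrm{e}^{-\kappa} &= \kappa + O(\kappa^2)
  \;\Rightarrow\; 2\bar{m}+1 \approx 1/\kappa, \\
\bar{N}_\mathrm{q} &= N\sum_{k=0}^{K}(2m_k+1) = N\left(2^{K+1}+K-1\right) \approx 4N m_K,
  \qquad \bar{m}/2 < m_K \le \bar{m}, \\
&\Rightarrow\; \bar{N}_\mathrm{q} = \Theta(N/\kappa)
  \quad\text{and, with } \epsilon = O(1/\bar{N}_\mathrm{q}) \text{ at fixed } N, \quad \epsilon \propto \kappa .
\end{aligned}
```

For \(\kappa =10^{-2}\) and \(N=100\), this gives \(K=6\) and \(\bar{N}_\mathrm{q}=100\times 133=13300\).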

The above discussion, as well as Figs. 1 and 2, depends entirely on the mathematical model described in Sect. 2.2. A real quantum system will not perfectly coincide with this model, so the resulting ML estimate is not guaranteed to reach the Cramér–Rao lower bound discussed here, and the quasi linear relation observed in Fig. 2 could also change. Hence, from the material presented so far alone, we cannot claim that Fig. 2 serves as a guide to the condition on \(\kappa \) needed to reach a specified estimation error with Heisenberg scaling in the real world. This motivates a detailed experiment to check whether the theory described above is consistent with experimental results, and thereby to verify its usefulness for real quantum computing applications; this topic is discussed in Sect. 3.

Note that there is still room for improvement in the update strategy for \(m_k\). For instance, if we employ \(m_k=\lfloor r^{k-1} \rfloor \) with a real number \(r>1\), the value of \(\bar{N}_\mathrm{{q}}\) changes depending on r. This indicates that the achievable estimation error in the presence of noise can be reduced by changing the update strategy. More importantly, for the two-parameter problem considered in this paper, the estimation precision can be severely limited depending on the choice of r, as discussed in the next subsection.
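To illustrate how \(\bar{N}_\mathrm{q}\) depends on r, the sketch below truncates \(m_k=\lfloor r^{k-1}\rfloor \) at \(\bar{m}\) from Eq. (19); \(N_k=100\) and the listed values of r are assumptions for illustration, and the function names are hypothetical. For \(\kappa =10^{-2}\) it yields \(\bar{N}_\mathrm{q}=13300\), 13200, and 8500 for r = 2, 2.5, and 3, respectively.

```python
import math


def m_bar(kappa):
    """Largest integer m with (2m + 1)(1 - exp(-kappa)) <= 1, Eq. (19)."""
    return math.floor((1.0 / -math.expm1(-kappa) - 1.0) / 2.0)


def sequence_r(r, m_max):
    """m_k = floor(r^(k-1)) for k = 0, 1, ..., stopping before m_k exceeds m_max (r > 1)."""
    ms, k = [], 0
    while math.floor(r ** (k - 1)) <= m_max:
        ms.append(math.floor(r ** (k - 1)))
        k += 1
    return ms


def nbar_q(kappa, r, n_k=100):
    """Total queries sum_k N_k (2 m_k + 1) of the truncated sequence."""
    return sum(n_k * (2 * m + 1) for m in sequence_r(r, m_bar(kappa)))


for r in (1.5, 2.0, 2.5, 3.0):
    print(r, sequence_r(r, m_bar(1e-2)), nbar_q(1e-2, r))
```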

2.4 Anomalous target value that induces a large estimation error

Fig. 3

Lower bound of the estimation error \(\epsilon _\mathrm{min}=\sqrt{(\mathcal {I}^{-1})_{1,1}}\) versus the noise level \(\kappa \). The target value is chosen as \(a=\sin ^2(\pi /8)\). The solid (blue) and dotted (red) lines are obtained when using \(\{m_k\}=\{0,2^0,2^1,2^2,\ldots \}\) and \(\{m_k\}=\{0,\lfloor {2.5^0}\rfloor ,\lfloor {2.5^1}\rfloor ,\lfloor {2.5^2}\rfloor ,\ldots \}\), respectively. \(\lfloor \cdot \rfloor \) is the floor function (Color figure online)

The previous discussion relies on two properties: the quasi linear relation between \(\kappa \) and \(\epsilon _\mathrm{{min}}(a,\kappa )\), and the Heisenberg scaling of the total number of queries \(N_\mathrm{{q}}=\sum _{k=0}^M N_k(2 m_k+1)\) with respect to a given estimation error \(\epsilon \) in the range \(m_k\lesssim \bar{m}\). However, these properties do not always hold. For instance, when the target value is \(a=\sin ^2 (\pi /8)\), the estimation error \(\epsilon _\mathrm{{min}}(a,\kappa )\) does not obey the quasi linear relation with respect to \(\kappa \), as shown by the blue solid line in Fig. 3; that is, even when the noise level \(\kappa \) is sufficiently small, a precise estimate of a is not possible. This stems from the multi-parameter setting, in which the estimation error covariance matrix of the parameters \(\varvec{\theta }\) satisfies the Cramér–Rao inequality (7), \(\mathrm{{Cov}}(\varvec{\hat{\theta }})\ge \mathcal {I}^{-1}(\varvec{\theta })\). Clearly, if \(\mathrm{det}\, \mathcal {I}(\varvec{\theta })\) is nearly zero at some point \(\varvec{\theta }\), the estimation of those parameters is necessarily inefficient. In the above example, the Fisher information matrix \(\mathcal {I}(a, \kappa )\) is indeed nearly degenerate around \(a=\sin ^2 (\pi /8)\); this is why the quasi linear decrease of \(\epsilon \) with respect to \(\kappa \) does not hold in this case. In this subsection, we investigate this “anomalous target” point of a in detail. Before moving forward, we emphasize that this issue never arises in the one-parameter setting. That is, as stated in Sect. 1, to achieve a quantum advantage on available noisy quantum devices, the noise parameter has to be included among the parameters to be estimated, and accordingly such a singular-point analysis needs to be carried out.

Fig. 4

Contour plot of the lower bound of the estimation error, \(\epsilon _\mathrm{min}=\sqrt{(\mathcal {I}^{-1})_{1,1}}\), as a function of the target value a and the noise level \(\kappa \)

First, to see how common such anomalous targets are, we plot the lower bound of the estimation error, \(\epsilon _\mathrm{min}=\sqrt{(\mathcal {I}^{-1})_{1,1}}\), as a function of a and \(\kappa \) in Fig. 4. This contour plot shows that, for almost all target values, \(\epsilon _\mathrm{min}\) decreases almost linearly with \(\kappa \); that is, \(\epsilon _\mathrm{min}\) decreases by approximately one order of magnitude as \(\kappa \) decreases by one order of magnitude. However, there exist anomalous target values whose estimation errors are insensitive to \(\kappa \); for instance, around \(a=\sin ^2(\pi /8)\approx 0.146\), we observe a long spike along which \(\epsilon _\mathrm{min}\) stays between \(10^{-3}\) and \(10^{-2}\) regardless of \(\kappa \).

Fig. 5

Estimation error \(\epsilon _\mathrm{min}=\sqrt{(\mathcal {I}^{-1})_{1,1}}\) represented with the solid (blue) line and the anomality \(\beta = \mathcal {I}_{1,2}^2/(\mathcal {I}_{1,1}\mathcal {I}_{2,2})\) represented with the dotted (red) line, as a function of the target value a. The noise parameter \(\kappa \) is fixed to (A) \(\kappa =10^{-2}\) and (B) \(\kappa =10^{-3}\) (Color figure online)

This insensitivity of \(\epsilon _\mathrm{min}\) at certain values of a originates from the fact that, as mentioned at the beginning of this subsection, the Fisher information matrix \(\mathcal {I}(a, \kappa )\) is nearly degenerate there. To look closely at the relation between the estimation error and the degeneracy of \(\mathcal {I}(a, \kappa )\), Fig. 5 plots \(\epsilon _\mathrm{min}\) (solid blue line) and the “anomality” \(\beta = \mathcal {I}_{1,2}^2/(\mathcal {I}_{1,1}\mathcal {I}_{2,2})\) (dotted red line) as a function of the target value a, with the noise parameter fixed to (A) \(\kappa =10^{-2}\) and (B) \(\kappa =10^{-3}\). Since \(\mathcal {I}(a, \kappa )\) is positive semidefinite and \(\mathrm{det}\, \mathcal {I}(a, \kappa )=\mathcal {I}_{1,1}\mathcal {I}_{2,2}(1-\beta )\), we have \(0\le \beta \le 1\), and \(\beta =1\) is equivalent to \(\mathrm{det}\, \mathcal {I}(a, \kappa )=0\). Moreover, \((\mathcal {I}^{-1})_{1,1}=1/[\mathcal {I}_{1,1}(1-\beta )]\), so \(\beta \sim 1\) means that such \((a, \kappa )\) are difficult to estimate precisely. Both panels (A) and (B) of Fig. 5 indeed show that the estimation error takes a relatively large value around the anomalous targets of a, where \(\beta \approx 1\).
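A sketch of the quantities plotted in Fig. 5, assuming the EIS truncated at \(\bar{m}=49\) for \(\kappa =10^{-2}\) and \(N_k=100\) (both illustrative choices, since the exact sequence used for Fig. 5 is not restated here; helper names hypothetical). It also checks numerically the identity \((\mathcal {I}^{-1})_{1,1}=1/[\mathcal {I}_{1,1}(1-\beta )]\) that links the anomality to the error bound.

```python
import numpy as np


def fisher_matrix(a, kappa, ms, ns):
    """Closed-form 2x2 Fisher information of Eq. (17) for theta = [a, kappa]."""
    ms = np.asarray(ms, dtype=float)
    ns = np.asarray(ns, dtype=float)
    theta = np.arcsin(np.sqrt(a))
    phi = 2.0 * (2.0 * ms + 1.0) * theta
    denom = np.exp(2.0 * kappa * ms) - np.cos(phi) ** 2
    s2 = np.sin(2.0 * theta)
    i11 = np.sum(ns * (2 * ms + 1) ** 2 / s2**2 * 4.0 * np.sin(phi) ** 2 / denom)
    i12 = np.sum(ns * ms * (2 * ms + 1) / s2 * np.sin(2.0 * phi) / denom)
    i22 = np.sum(ns * ms**2 * np.cos(phi) ** 2 / denom)
    return np.array([[i11, i12], [i12, i22]])


def anomality(fim):
    """beta = I_12^2 / (I_11 I_22); beta = 1 exactly when det I = 0."""
    return fim[0, 1] ** 2 / (fim[0, 0] * fim[1, 1])


kappa = 1e-2
ms = [0, 1, 2, 4, 8, 16, 32]            # EIS truncated at m_bar = 49 for kappa = 0.01
ns = [100] * len(ms)
# This grid does not contain a = 1/2, where P(m) = 1/2 for every m and I_12 = I_22 = 0
a_grid = np.linspace(0.005, 0.995, 1000)
betas = np.empty_like(a_grid)
eps = np.empty_like(a_grid)
for i, a in enumerate(a_grid):
    fim = fisher_matrix(a, kappa, ms, ns)
    betas[i] = anomality(fim)
    eps[i] = np.sqrt(np.linalg.inv(fim)[0, 0])

# Identity linking beta to the bound: (I^{-1})_{11} = 1 / (I_11 (1 - beta))
fim = fisher_matrix(0.3, kappa, ms, ns)
identity_ok = np.isclose(np.linalg.inv(fim)[0, 0], 1.0 / (fim[0, 0] * (1.0 - anomality(fim))))
print(a_grid[np.argmax(betas)], betas.max(), identity_ok)
```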

The prevalence of anomalous target values can be quantified by a linear density, defined as follows. \(N=10^5\) samples of a are drawn from the uniform distribution on [0, 1], and the fraction of samples satisfying \(\beta > 0.9\) is computed; this fraction is the linear density. Table 1 shows the linear density for several values of \(\kappa \). Importantly, the linear density is almost independent of \(\kappa \), being about \(1\%\) to \(2\%\) in all cases. This results from the combination of two properties of the anomalous targets: (i) the number of anomalous segments satisfying \(\beta > 0.9\) is inversely proportional to \(\kappa \), and (ii) the length of each anomalous segment is proportional to \(\kappa \). Since the linear density is approximately the product of the number of anomalous segments and their average length, it is almost insensitive to \(\kappa \).
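The linear density can be estimated by Monte Carlo sampling as sketched below. The assumptions are the EIS truncated at \(\bar{m}(\kappa )\) from Eq. (19), the threshold 0.9 and \(10^5\) uniform samples as in the text, and a fixed random seed; the sequence actually used for Table 1 may differ, so the printed values are not claimed to reproduce it. Note that \(\beta \) is unchanged when all \(N_k\) are scaled by a common factor, so the shot number drops out.

```python
import math
import numpy as np


def m_bar(kappa):
    """Largest integer m with (2m + 1)(1 - exp(-kappa)) <= 1, Eq. (19)."""
    return math.floor((1.0 / -math.expm1(-kappa) - 1.0) / 2.0)


def eis_up_to(m_max):
    """EIS values m_k = floor(2^(k-1)) not exceeding m_max."""
    ms, k = [], 0
    while math.floor(2.0 ** (k - 1)) <= m_max:
        ms.append(math.floor(2.0 ** (k - 1)))
        k += 1
    return ms


def anomality(a, kappa, ms):
    """beta = I_12^2 / (I_11 I_22) from Eq. (17), vectorized over an array of targets a.

    A common shot number N_k cancels in beta, so it is omitted here.
    """
    theta = np.arcsin(np.sqrt(a))[:, None]
    m = np.asarray(ms, dtype=float)[None, :]
    phi = 2.0 * (2.0 * m + 1.0) * theta
    denom = np.exp(2.0 * kappa * m) - np.cos(phi) ** 2
    s2 = np.sin(2.0 * theta)
    i11 = np.sum((2 * m + 1) ** 2 / s2**2 * 4.0 * np.sin(phi) ** 2 / denom, axis=1)
    i12 = np.sum(m * (2 * m + 1) / s2 * np.sin(2.0 * phi) / denom, axis=1)
    i22 = np.sum(m**2 * np.cos(phi) ** 2 / denom, axis=1)
    return i12**2 / (i11 * i22)


def linear_density(kappa, n_samples=100_000, threshold=0.9, seed=0):
    """Fraction of uniformly drawn targets a in [0, 1) with beta > threshold."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 1.0, n_samples)
    beta = anomality(a, kappa, eis_up_to(m_bar(kappa)))
    return float(np.mean(beta > threshold))


densities = {kappa: linear_density(kappa) for kappa in (1e-1, 1e-2, 1e-3)}
print(densities)
```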

It should be noted that the linear density of anomalous targets remains finite in the limit \(\kappa \rightarrow 0\), whereas no anomalous targets exist when \(\kappa =0\). This discontinuity stems from whether or not the noise level is known: in the absence of noise, the Fisher information \(\mathcal {I}_{1,1}\) alone gives the lower bound of the estimation error, whereas \((\mathcal {I}^{-1})_{1,1}\) must be used when the noise level is unknown.
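The link between β and this comparison follows directly from inverting the \(2\times 2\) Fisher information matrix, a short worked step using only the definition of β:

$$\begin{aligned} (\mathcal {I}^{-1})_{1,1} = \frac{\mathcal {I}_{2,2}}{\mathcal {I}_{1,1}\mathcal {I}_{2,2}-\mathcal {I}_{1,2}^2} = \frac{1}{\mathcal {I}_{1,1}(1-\beta )} \ge \frac{1}{\mathcal {I}_{1,1}}. \end{aligned}$$

Since the Cramér–Rao bound on the variance of a is the corresponding diagonal element, the standard-deviation bound with unknown noise exceeds the known-noise bound by the factor \((1-\beta )^{-1/2}\), which diverges as \(\beta \rightarrow 1\) no matter how small \(\kappa \) is.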

Table 1 The linear density of anomalous targets in the range [0, 1]

Finally, we propose two approaches to avoid the anomalous case. The first is based on the observation that the determinant of the Fisher information matrix depends on the choice of the amplitude amplification sequence \(\{m_k\}\). Therefore, if the underlying target value is detected to be anomalous, we can switch to another sequence \(\{m_k\}\) to avoid the anomaly. For instance, with \(\{m_k\} = \{0,\lfloor {2.5^0}\rfloor ,\lfloor {2.5^1}\rfloor ,\lfloor {2.5^{2}}\rfloor ,\ldots \}\), the quasi-linear relation between \(\kappa \) and \(\epsilon _\mathrm{{min}}\) is recovered even for \(a=\sin ^2(\pi /8)\), as shown by the dotted red line in Fig. 3. This clearly shows that a suitable choice of \(\{m_k\}\) keeps the determinant of the Fisher information matrix away from zero. We expect that an anomalous target could be detected by computing the empirical Fisher information, which would then allow us to tune the sequence and thereby avoid the anomaly.
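The effect of switching sequences can be checked numerically with a minimal sketch, assuming the model of Eq. (26), equal \(N_k\), and \(\kappa =10^{-2}\). It compares β at \(a=\sin ^2(\pi /8)\) for a base-2 sequence \(\{0,1,2,4,\ldots ,64\}\) (a hypothetical stand-in for the default sequence) and for the base-2.5 sequence quoted above, truncated to eight entries; the function names are illustrative.

```python
import math

import numpy as np


def anomality(a, kappa, ms):
    """beta = I_12^2 / (I_11 I_22) for the depolarizing model (equal N_k cancel)."""
    theta = math.asin(math.sqrt(a))
    m = np.asarray(ms, dtype=float)
    damp = np.exp(-kappa * m)
    phase = 2.0 * (2.0 * m + 1.0) * theta
    p = 0.5 - 0.5 * damp * np.cos(phase)
    g_theta = (2.0 * m + 1.0) * damp * np.sin(phase)
    g_kappa = 0.5 * m * damp * np.cos(phase)
    w = 1.0 / np.clip(p * (1.0 - p), 1e-12, None)
    i11, i22 = np.sum(w * g_theta ** 2), np.sum(w * g_kappa ** 2)
    i12 = np.sum(w * g_theta * g_kappa)
    return float(i12 ** 2 / (i11 * i22))


def power_sequence(base, length):
    """{0, floor(base^0), floor(base^1), ...} with `length` entries in total."""
    return [0] + [math.floor(base ** j) for j in range(length - 1)]


target = math.sin(math.pi / 8) ** 2
eis = power_sequence(2.0, 8)   # {0, 1, 2, 4, ..., 64}
alt = power_sequence(2.5, 8)   # {0, 1, 2, 6, 15, 39, 97, 244}
for name, ms in (("base 2", eis), ("base 2.5", alt)):
    print(name, ms, round(anomality(target, 1e-2, ms), 4))
```

For the base-2 sequence every \(m_k\ge 4\) gives the same phase \(\pi /4\) modulo \(2\pi \) at this target, so the gradient vectors are nearly parallel; the base-2.5 sequence breaks this alignment.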

The second approach modifies the amplitude to be estimated. After detecting the anomalous target, we introduce an extra ancilla qubit and apply to it an \(R_y(\phi )\) rotation (i.e., a single-qubit rotation around the y-axis with a fixed parameter \(\phi \), written here so that \(|0\rangle \mapsto \cos (\phi )|0\rangle +\sin (\phi )|1\rangle \)) as follows:

$$\begin{aligned} \begin{aligned} \mathcal {A}|0\rangle _{n+2} =&\sqrt{a}|\tilde{\Psi }_1\rangle _n|1\rangle |0\rangle +\sqrt{1-a}|\tilde{\Psi }_0\rangle _n|0\rangle |0\rangle \\ \xrightarrow {Ry(\phi )}&\cos (\phi )\sqrt{a}|\tilde{\Psi }_1\rangle _n|1\rangle |0\rangle +\sin (\phi )\sqrt{a}|\tilde{\Psi }_1\rangle _n|1\rangle |1\rangle \\&+\cos (\phi )\sqrt{1-a}|\tilde{\Psi }_0\rangle _n|0\rangle |0\rangle +\sin (\phi )\sqrt{1-a}|\tilde{\Psi }_0\rangle _n|0\rangle |1\rangle . \end{aligned} \end{aligned}$$
(20)

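By estimating the probability that the last two qubits are in the state \(|1\rangle |1\rangle \), we could avoid the anomalous target problem, because the amplitude to be estimated is changed from a to \(a\sin ^2(\phi )\). Since anomalous targets occupy only a small fraction of [0, 1], a suitable \(\phi \) moves the new target out of the anomalous region, and a is then recovered by dividing the estimate by \(\sin ^2(\phi )\).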
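A short numerical check of Eq. (20), as a minimal sketch: only the measurement probabilities of the last two qubits are needed, and the flag qubit alone decides whether \(|\tilde{\Psi }_1\rangle \) or \(|\tilde{\Psi }_0\rangle \) is attached, so the n-qubit register can be ignored. The rotation follows the \(\cos (\phi ), \sin (\phi )\) convention of Eq. (20); the value \(\phi =\pi /3\) and the helper names are hypothetical.

```python
import numpy as np


def flag_and_extra_probabilities(a, phi):
    """Joint probabilities of (flag qubit, extra ancilla) after the rotation in Eq. (20).

    The extra ancilla is mapped to cos(phi)|0> + sin(phi)|1>, as written in Eq. (20).
    The register is ignored: only measurement probabilities of the last two qubits matter.
    """
    flag = np.array([np.sqrt(1.0 - a), np.sqrt(a)])   # amplitudes of flag |0>, |1>
    extra = np.array([np.cos(phi), np.sin(phi)])      # amplitudes of extra |0>, |1>
    return np.outer(flag, extra) ** 2                 # probs[flag_bit, extra_bit]


def recover_target(modified_estimate, phi):
    """Undo the rescaling a -> a sin^2(phi); the error is inflated by 1/sin^2(phi)."""
    return modified_estimate / np.sin(phi) ** 2


a = np.sin(np.pi / 8) ** 2          # the anomalous example used in Fig. 3
phi = np.pi / 3                      # hypothetical choice of the fixed rotation parameter
probs = flag_and_extra_probabilities(a, phi)
print(probs[1, 1], a * np.sin(phi) ** 2, recover_target(probs[1, 1], phi))
```

The price of the detour is visible in `recover_target`: any error in the estimate of \(a\sin ^2(\phi )\) is multiplied by \(1/\sin ^2(\phi )\), so \(\phi \) should not be chosen close to zero.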

3 Experiment with a real quantum computing device

This section presents experimental results obtained with ibmq_valencia, a real device of the IBM Quantum systems [25], to evaluate the estimation performance of the proposed ML estimator and thereby the validity of the employed depolarizing noise model. In particular, we consider the Monte Carlo integration problem, whose computational (sampling) cost can be quadratically reduced via the amplitude estimation algorithm [4,5,6,7,8,9,10]. We begin with a brief explanation of the target integration problem and then show the results obtained on the real device for the two-qubit and three-qubit cases.

3.1 Monte Carlo integration via amplitude estimation

Let us consider a general one-dimensional integral \(\mathbb {E}[f(x)] = \int _{0}^{1}dx\, q(x)f(x)\), where \(f:[0,1]\rightarrow [0,1]\) is a real-valued smooth function and \(q:[0,1]\rightarrow [0,\infty )\) is a probability density function satisfying \(\int _0^1dx\, q(x) = 1\). In practice, this quantity can be obtained from the approximation

$$\begin{aligned} S(f) = \sum _{j=0}^{2^n-1}p(x_j)f(x_j), \end{aligned}$$
(21)

where \(x_j = (j + 1/2)/2^n\) and p is the probability mass function defined as \(p(x_j) = \int _{x_j-1/2^{n+1}}^{x_j+1/2^{n+1}} q(x) dx \). Note that the discretization introduces an approximation error, i.e., \(S(f)\ne \mathbb {E}[f(x)]\) in general. In our analysis, however, we evaluate the error between S(f) and the value obtained by Monte Carlo integration via amplitude estimation, in order to assess the performance of our algorithm on a real quantum device. The amplitude estimation algorithm uses the following operators:
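The discretized sum (21) can be sketched in a few lines. In this minimal sketch, \(p(x_j)\) is computed from the cumulative distribution function of q (an assumed input format), and the result is compared with the closed form \(\int _0^1\sin ^2(bx)\,dx = \frac{1}{2}-\frac{\sin (2b)}{4b}\) for the integrand used later in Eq. (25). With uniform q and \(b=2\pi /5\), it gives \(S(f)=0.375\) for \(n=1\) and about 0.381 for \(n=2\), against the exact value of about 0.383.

```python
import math


def discretized_expectation(f, cdf, n):
    """S(f) = sum_j p(x_j) f(x_j) on the 2^n midpoints x_j = (j + 1/2) / 2^n (Eq. (21)).

    p(x_j) is obtained from the cumulative distribution function `cdf` of q:
    p(x_j) = cdf(x_j + 1/2^(n+1)) - cdf(x_j - 1/2^(n+1)).
    """
    size = 2 ** n
    half_width = 1.0 / (2 * size)
    total = 0.0
    for j in range(size):
        x_j = (j + 0.5) / size
        p_j = cdf(x_j + half_width) - cdf(x_j - half_width)
        total += p_j * f(x_j)
    return total


def uniform_cdf(x):
    """CDF of the uniform density q(x) = 1 on [0, 1]."""
    return min(max(x, 0.0), 1.0)


b = 2 * math.pi / 5


def integrand(x):
    return math.sin(b * x) ** 2


exact = 0.5 - math.sin(2 * b) / (4 * b)   # closed form of int_0^1 sin^2(bx) dx
for n in (1, 2, 6):
    print(n, discretized_expectation(integrand, uniform_cdf, n), exact)
```

The gap between S(f) and the exact integral is the discretization error mentioned above; it shrinks quadratically in the bin width, as expected for the midpoint rule.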

$$\begin{aligned} \mathcal {P}|0\rangle _n = \sum _{j=0}^{2^n-1}\sqrt{p(x_j)}|j\rangle _n, ~~ \mathcal {R}|j\rangle _n|0\rangle = |j\rangle _n\left( \sqrt{f(x_j)}|1\rangle +\sqrt{1-f(x_j)}|0\rangle \right) . \end{aligned}$$
(22)

By applying these operators to the \((n+1)\)-qubit initial state, \(|0\rangle _n|0\rangle \), we obtain

$$\begin{aligned} \begin{aligned} \mathcal {R}(\mathcal {P} \otimes \mathbf {I}_{1})|0\rangle _n|0\rangle&=\sum _{j=0}^{2^n-1}\sqrt{p(x_j)}|j\rangle _n\left( \sqrt{f(x_j)}|1\rangle +\sqrt{1-f(x_j)}|0\rangle \right) \\&=\sqrt{S(f)} |\tilde{\Psi }_1\rangle |1\rangle +\sqrt{1-S(f)} |\tilde{\Psi }_0\rangle |0\rangle , \end{aligned} \end{aligned}$$
(23)

where \(|\tilde{\Psi }_1\rangle \) and \(|\tilde{\Psi }_0\rangle \) are defined as

$$\begin{aligned} |\tilde{\Psi }_1\rangle =\frac{1}{\sqrt{S(f)}}\sum _{j=0}^{2^n-1}\sqrt{p(x_j) f(x_j)}|j\rangle _n, ~~ |\tilde{\Psi }_0\rangle =\frac{1}{\sqrt{1-S(f)}}\sum _{j=0}^{2^n-1}\sqrt{p(x_j)(1-f(x_j))}|j\rangle _n. \end{aligned}$$
(24)

This is exactly the state of the form (1). Thus, the ideal amplitude estimation algorithm approximates S(f) to precision \(\epsilon \) using only \(O(1/\epsilon )\) queries.

In this paper, we consider the simple integration \( \int _0^{1} \sin ^2(bx) \ dx\) with b a constant, which is approximated as

$$\begin{aligned} S(f) = \sum _{j=0}^{2^n-1} \frac{1}{2^n} \sin ^2\left( b x_j\right) . \end{aligned}$$
(25)

In this case the operators \(\mathcal {P}\) and \(\mathcal {R}\) in Eq. (22) can be implemented only with Hadamard gates and controlled Y-rotation gates, as shown in Fig. 6.
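One way to see why Hadamard and Y-rotation gates suffice: with uniform q, \(\mathcal {P}=H^{\otimes n}\), and since \(2bx_j = b/2^n + \sum _i j_i\, 2^{i+1}b/2^n\) for the binary digits \(j_i\) of j, \(\mathcal {R}\) can be built from one unconditioned \(R_y(b/2^n)\) on the flag qubit plus n rotations \(R_y(2^{i+1}b/2^n)\) controlled by register qubit i. This decomposition is our own illustration and is not necessarily the gate layout drawn in Fig. 6. The NumPy sketch below simulates the amplitudes under the convention \(R_y(t)|0\rangle =\cos (t/2)|0\rangle +\sin (t/2)|1\rangle \) and checks that the flag-qubit probability equals S(f).

```python
import numpy as np


def ry(angle):
    """R_y(t) = [[cos(t/2), -sin(t/2)], [sin(t/2), cos(t/2)]]."""
    c, s = np.cos(angle / 2.0), np.sin(angle / 2.0)
    return np.array([[c, -s], [s, c]])


def prepared_amplitudes(n, b):
    """Amplitudes of R(P x I)|0>_n|0> for f(x) = sin^2(bx), indexed [j, flag]."""
    size = 2 ** n
    amps = np.zeros((size, 2))
    amps[:, 0] = 1.0 / np.sqrt(size)          # P = H^{x n} for uniform q
    for j in range(size):
        angle = b / size                       # unconditioned rotation on the flag qubit
        for i in range(n):
            if (j >> i) & 1:                   # rotation controlled by register qubit i
                angle += 2.0 * b * 2 ** i / size
        amps[j] = ry(angle) @ amps[j]          # net rotation angle is 2 b x_j
    return amps


def flag_probability(n, b):
    """Probability of measuring the flag (ancilla) qubit in |1>."""
    return float(np.sum(prepared_amplitudes(n, b)[:, 1] ** 2))


print(flag_probability(1, 2 * np.pi / 5), flag_probability(2, 2 * np.pi / 5))
```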

3.2 Experimental result for the two-qubit case

We now show the experimental results of applying the ML algorithm on the real device to compute Eq. (25). In this subsection we consider the simple case \(n=1\), meaning that the integral is approximated by the discrete sum S(f) over only two points, \(x_0=1/4\) and \(x_1=3/4\) (i.e., \(j=0\) and \(j=1\)). Here we take \(b=\pi /20\) in Eq. (25), which gives \(S(f)=a=\sin ^2\theta _a=0.0077\). This setting requires only two qubits; in the experiment we used qubits 0 and 1 of ibmq_valencia.

Fig. 6

The 2-qubit circuit of the unitary operators \(\mathcal {Q}\) and \(\mathcal {R}(\mathcal {P} \otimes I)\), for computing the probability \(P(m;\varvec{\theta })\) where the target value S(f) is given in Eq. (25) with \(n=1\), and \(b=2\pi /5\)

First, we show the experimental result of the quantum algorithm to compute the probability \(P(m_k;a, \kappa )\) given in Eq. (15); recall that

$$\begin{aligned} \begin{aligned} P(m_k;a, \kappa ) = \mathrm{{Tr}}(E_{1} \rho _\mathrm{{noise}}^{(m_k)} )&= \mathrm{{Tr}}\Big [ (\mathbf {I}_1\otimes |1\rangle \langle 1|) (\mathcal {Q}\mathcal {D} )^{m_k} (|\Psi \rangle _2\langle \Psi |) \Big ] \\&= \frac{1}{2} - \frac{1}{2} \mathrm {e}^{- \kappa m_k} \cos (2(2m_k+1)\theta _a), \end{aligned} \end{aligned}$$
(26)

and this is used to construct the likelihood function (16).
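For reference, the model (26) and a likelihood built from it can be written down directly. This minimal sketch assumes that the likelihood (16) is the usual binomial product \(\prod _k P_k^{h_k}(1-P_k)^{N_k-h_k}\) (binomial coefficients dropped); the parameter values below are only illustrative.

```python
import numpy as np


def hit_probability(m, a, kappa):
    """Depolarizing model of Eq. (26): P(m; a, kappa), with a = sin^2(theta_a)."""
    theta = np.arcsin(np.sqrt(a))
    return 0.5 - 0.5 * np.exp(-kappa * m) * np.cos(2.0 * (2.0 * m + 1.0) * theta)


def log_likelihood(a, kappa, ms, hits, shots):
    """ln prod_k P_k^{h_k} (1 - P_k)^{N_k - h_k}, i.e. the assumed form of Eq. (16)."""
    total = 0.0
    for m, h, n in zip(ms, hits, shots):
        p = np.clip(hit_probability(m, a, kappa), 1e-12, 1.0 - 1e-12)
        total = total + h * np.log(p) + (n - h) * np.log(1.0 - p)
    return total


# The oscillation decays towards 1/2 as m grows (illustrative a and kappa values)
print(hit_probability(np.arange(0, 60, 10), 0.375, 0.067))
```

At \(m=0\) the model reduces to \(P=a\), and for \(\kappa m\gg 1\) it tends to 1/2, which is the decaying oscillation seen in Fig. 7.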

Fig. 7

The hit rate of the ancilla qubit being 1 (i.e., \(h_k/N_k\)) versus the number \(m_k\) of \(\mathcal {Q}\) operations in amplitude amplification. The (green) round and (blue) square points show the experimental results obtained by running the ML algorithm with LIS and EIS, respectively. The (orange) line is the analytic result for the ideal probability without noise. Each point is obtained from \(N_k = 8192\) measurements (Color figure online)

The quantum circuit that executes the unitary operators \(\mathcal {Q}\) and \(\mathcal {R}(\mathcal {P} \otimes I)\) is shown in Fig. 6. Here \(|\Psi \rangle _2=\mathcal {R}(\mathcal {P} \otimes I)|0\rangle |0\rangle \) is given by Eq. (23). In Fig. 7, the green round points show the hit rate of the ancilla qubit being 1 (i.e., the rate of measuring the good state, which corresponds to \(P(m_k; a, \kappa )\)) for the LIS setting. Note that these points overlap the blue square points of the EIS setting, since every \(m_k\) in EIS also appears in LIS. Importantly, the hit rate exhibits an exponentially decaying oscillation and approaches 0.5 as the number \(m_k\) of \(\mathcal {Q}\) operations increases. As a minimal model of this decaying oscillation we take the depolarizing noise (11); indeed, the resulting probability distribution (26) fits all the experimental points well, as shown in Fig. 7. For reference, the analytic result for the ideal probability in the absence of noise (\(\kappa =0\)) is depicted by the orange line in Fig. 7.

We next executed the ML algorithm based on the model (26) on the device to estimate \(a=S(f)\) and \(\kappa \). Recall that the ML estimate of \((a, \kappa )\) is the maximizer of the likelihood function (16), where \(P(m_k;a, \kappa )\) is the model (26) and \(h_k\) is the experimentally observed number of hits for a given number \(m_k~(k=0, \ldots , M)\) of \(\mathcal {Q}\) operations; here we tested the six cases \(M = 1, \ldots , 6\). In particular, we used EIS, fixed \(N_k=100\), and set \(b=2\pi /5\), so that \(a=S(f)=0.375\). The result is given in Fig. 8, which shows the relation between the estimation error of S(f) and the total number of queries, \(N_\mathrm{{q}}=\sum _{k=0}^{M}N_k(2 m_k+1)\). The solid thin red and thick yellow lines are the theoretical Cramér–Rao lower bounds \(\epsilon _\mathrm{min}(a,\kappa )\) given in Eq. (18) for the classical method and for the ideal quantum ML method without noise (\(\kappa =0\)), respectively. The blue cross marks are the standard deviation of the ML estimates of S(f) from the true value \(S(f) = 0.375\). For instance, the blue point at \(M=4\) (equivalently \(N_\mathrm{{q}} \sim 3.5\times 10^3\)), where the estimation error is about \(0.65 \times 10^{-2}\), uses a likelihood function constructed from amplitude amplification runs with the different lengths \(m_k~(k=0, \ldots , 4)\). Further, to reduce the fluctuation of these points, we repeated the same experiment 1064 times and averaged the results to determine each point; the error bars indicate three standard errors. The green dotted line shows the Cramér–Rao lower bound with noise level \(\kappa =0.067\), which is the single ML estimate of \(\kappa \) based on all \(1064 \times N_k =1064 \times 100\) data at \(M=6\) (that is, roughly speaking, the best estimate of \(\kappa \) over the whole execution of the algorithm).
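The estimation step can be illustrated with a minimal sketch, assuming Eq. (16) is the binomial likelihood and using a brute-force grid over \((a,\kappa )\) rather than the adaptive search of Sect. 4.2. The data are simulated from the model itself, not taken from ibmq_valencia, and the grid sizes are arbitrary choices. The sketch also evaluates \(N_\mathrm{q}\): for the EIS sequence \(\{0,1,2,4,8\}\) with \(N_k=100\) it gives 3500, consistent with \(N_\mathrm{q}\sim 3.5\times 10^3\) quoted for \(M=4\).

```python
import numpy as np


def hit_probability(m, a, kappa):
    """Depolarizing model P(m; a, kappa) of Eq. (26); works on arrays of (a, kappa)."""
    theta = np.arcsin(np.sqrt(a))
    return 0.5 - 0.5 * np.exp(-kappa * m) * np.cos(2.0 * (2.0 * m + 1.0) * theta)


def grid_ml_estimate(ms, hits, shots, n_a=2001, n_kappa=301, kappa_max=0.3):
    """Brute-force maximum of the binomial log-likelihood over an (a, kappa) grid."""
    a_grid = np.linspace(0.0, 1.0, n_a)
    k_grid = np.linspace(0.0, kappa_max, n_kappa)
    A, K = np.meshgrid(a_grid, k_grid, indexing="ij")
    ll = np.zeros_like(A)
    for m, h, n in zip(ms, hits, shots):
        p = np.clip(hit_probability(m, A, K), 1e-12, 1.0 - 1e-12)
        ll += h * np.log(p) + (n - h) * np.log(1.0 - p)
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    return float(a_grid[i]), float(k_grid[j])


def total_queries(ms, shots):
    """N_q = sum_k N_k (2 m_k + 1)."""
    return sum(n * (2 * m + 1) for m, n in zip(ms, shots))


# Hypothetical run: EIS with M = 6, N_k = 100, hits simulated from the model itself
rng = np.random.default_rng(1)
ms = [0] + [2 ** j for j in range(6)]        # {0, 1, 2, 4, 8, 16, 32}
shots = [100] * len(ms)
hits = [rng.binomial(n, hit_probability(m, 0.375, 0.067)) for m, n in zip(ms, shots)]
print(grid_ml_estimate(ms, hits, shots), total_queries(ms, shots))
```

The grid has \(O(1/\epsilon ^2)\) points for resolution \(\epsilon \), which is exactly the cost problem discussed in Sect. 4.2.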

Fig. 8

Estimation error of a versus the total number of queries for the 2-qubit case. The thin red and thick yellow lines are the theoretical Cramér–Rao lower bound, obtained via the classical method and the ideal quantum ML method without noise (\(\kappa =0\)), respectively. The green dotted line shows the Cramér–Rao lower bound with noise \(\kappa =0.067\). The blue cross marks show the standard deviation between the true value \(S(f) = a = \sin ^2\theta _a = 0.375\) and the estimated values of a obtained via the ML method of experiments on ibmq_valencia (Color figure online)

As seen from Fig. 8, the estimation error obtained experimentally with the ML method (the blue points) agrees well with the theoretical Cramér–Rao lower bound (the green dotted line). The slight deviations, particularly at points where the estimation error lies below the Cramér–Rao lower bound, might be due to imperfections other than depolarizing noise, such as rotation errors of the gate operations and unwanted coupling to neighboring qubits on the device. We also emphasize that, in this example, there are several points where the experiment achieves a more precise estimate than the classical method (the red line). This is important evidence that even a noisy quantum computer can be advantageous over a classical one in terms of query complexity. Finally, we remark that similar behavior was observed in other settings using a different pair of qubits on the device and a different target value S(f); see Appendix C.

3.3 Experimental result for the three-qubit case

Fig. 9

The 3-qubit circuit for computing the probability \(P(m;a, \kappa )\) where the target value S(f) is given in Eq. (25) with \(n=2\) and \(b=2\pi /5\)

Here we present the experimental result for the case where the target integral \(\int _0^{1} \sin ^2(bx) \ dx\) with \(b=2\pi /5\) is approximated by S(f) in Eq. (25) with \(2^n=2^2\) segments. In this case, \(S(f) = a = \sin ^2\theta _a = 0.381\), and we need \(n+1=3\) qubits to implement the amplitude amplification operation.

The quantum circuit used to execute the ML algorithm is shown in Fig. 9. Because ibmq_valencia does not provide direct coupling between arbitrary pairs of qubits, we chose three qubits forming a chain. In the experiment, the qubit in the middle of the chain was used as the ancilla qubit; it corresponds to the third qubit from the top of the circuit in Fig. 9.

Fig. 10

Estimation error of a versus the total number of queries for the 3-qubit case. The thin red and thick yellow lines are the theoretical Cramér–Rao lower bound, obtained via the classical method and the ideal quantum ML method without noise (\(\kappa =0\)), respectively. The green dotted line shows the Cramér–Rao lower bound with noise \(\kappa =0.331\). The blue cross marks show the standard deviation between the true value \(S(f) = a =\sin ^2\theta _a = 0.381\) and the estimated values of a obtained via the ML method of experiments on ibmq_valencia (Color figure online)

Fig. 10 is the three-qubit counterpart of Fig. 8 and shows the relationship between the number of queries and the estimation error of \(a=S(f)\). The Cramér–Rao lower bound under noise (the green dotted line with triangles) is plotted with \(\kappa =0.331\). Because of this high noise level, and as expected from Fig. 1, this bound lies above the classical one (the red line), meaning that the quantum computation offers no advantage in this amplitude estimation task. What is important in our context, however, is that the experimental estimation error (the blue points) lies near the green line; that is, the ML estimate computed with the 3-qubit real quantum device asymptotically reaches the theoretical Cramér–Rao lower bound.

Together with the result for the 2-qubit case, we therefore conclude that Eq. (11) is a good model of the noise process, at least for devices with a small number of qubits. This suggests that the theoretical predictions illustrated in Fig. 1, and thereby Fig. 2, can serve as a practical guide for discussing the condition on the noise level \(\kappa \) needed to realize Heisenberg scaling in the quantum amplitude estimation task.

4 Discussion

This section covers two topics. In Sect. 4.1, based on the result obtained in Sect. 2.3, we show a procedure to assess the gate errors required to achieve a given task as well as the expected execution time of the algorithm, and discuss limitations such as gate error and coherence time when using IBM Quantum devices. In Sect. 4.2, we discuss the computational complexity including the classical optimization procedure used to compute the ML estimate.

4.1 Hardware specification for the amplitude estimation task

Recall that Fig. 2 is used to predict the maximum allowed noise level \(\bar{\kappa }\) for achieving a given estimation error \(\epsilon \) with the query calls obeying the Heisenberg-scaling law. Here we connect this value of \(\bar{\kappa }\) to the error of elementary gates in the quantum circuit; also the execution time of the algorithm is assessed. The entire procedure to compute these quantities is composed of the following four steps.

1. An amplitude estimation problem, together with a target estimation error \(\epsilon \), is given. The amplitude amplification operator \(\mathcal {Q}\) (plus \(\mathcal {P}\) and \(\mathcal {R}\) in the Monte Carlo case), and accordingly the elementary gates constituting these operators, are then identified; see, for instance, Fig. 6 or Fig. 9.

2. Given the target estimation error \(\epsilon \), we use Fig. 2 to obtain the maximum allowed noise level \(\bar{\kappa }\) when \(N_k\) is fixed for all k. We can then compare \(\bar{\kappa }\) with the ML estimate \(\hat{\kappa }\) obtained in an experiment on a real device, to see whether the device can produce the desired estimate \(\hat{a}\). Eq. (19) also gives the maximum number \(\bar{m}\) of \(\mathcal {Q}\) operations within which Heisenberg scaling nearly holds; more precisely, \(\bar{m} = 0.5/(e^{\bar{\kappa }} -1)\). For example, \(\bar{m}\approx 5\) when \(\bar{\kappa }=0.1\), but \(\bar{m}\approx 500\) when \(\bar{\kappa }=0.001\).

3. Furthermore, from \(\bar{m}\) and the number L of gates constituting the operator \(\mathcal {Q}\), we can roughly estimate the execution time of each quantum circuit with length \(m_1, m_2, \ldots , \bar{m}\). Both the execution time of the circuit with length \(\bar{m}\) and the total execution time over \(m_1, m_2, \ldots , \bar{m}\) can then be compared with the coherence time of an available real device to assess the feasibility of the algorithm.

4. Under the assumption that all qubits are subject only to depolarizing noise, we approximate \(p=e^{-\bar{\kappa }}\), the success probability of the operator \(\mathcal {Q}\), by \(\prod _{i=1}^{L} (1 - \epsilon _i)\), where L is the number of gates constituting \(\mathcal {Q}\) and \(\epsilon _i\) is the depolarizing error probability of the i-th gate. Inverting this relation gives a rough estimate of the required gate error from \(\bar{\kappa }\) and L. This gate error is compared with that of a real device (which may be identified via the standard randomized benchmarking test) to assess the feasibility of the algorithm.
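The arithmetic of Steps 2 to 4 can be sketched as follows. This is a rough illustration under loudly stated assumptions: the gate counts per \(\mathcal {Q}\) are hypothetical (not those of Table 2), all gates are assumed to run sequentially, state preparation is neglected, and the CNOT error is taken to be 10 times the single-gate error as in the example below. The gate and measurement times are the ibmq_valencia values quoted in the text. Note that the formula for \(\bar{m}\) evaluates to about 4.75 for \(\bar{\kappa }=0.1\), 99.75 for \(\bar{\kappa }=0.005\), and 499.75 for \(\bar{\kappa }=0.001\).

```python
import math


def max_amplification(kappa_bar):
    """Step 2: largest m with near-Heisenberg scaling, m_bar = 0.5 / (exp(kappa_bar) - 1)."""
    return 0.5 / math.expm1(kappa_bar)


def circuit_time(m, n_single, n_cnot, t_single, t_cnot, t_meas):
    """Step 3 (rough): m sequential applications of Q followed by one measurement.

    Hypothetical simplification: all gates run one after another and the
    state-preparation part is neglected; n_single / n_cnot are gate counts of one Q.
    """
    return m * (n_single * t_single + n_cnot * t_cnot) + t_meas


def single_gate_error(kappa_bar, n_single, n_cnot, ratio=10.0):
    """Step 4: solve exp(-kappa_bar) = (1 - e_s)^n_single * (1 - ratio * e_s)^n_cnot.

    The CNOT error is assumed to be `ratio` (>= 1) times the single-gate error e_s.
    The log success probability decreases monotonically in e_s, so bisection works.
    """
    def log_success(e_s):
        return n_single * math.log1p(-e_s) + n_cnot * math.log1p(-ratio * e_s)

    lo, hi = 0.0, (1.0 - 1e-12) / ratio
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_success(mid) > -kappa_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Gate times quoted for ibmq_valencia; the gate counts per Q are hypothetical.
T_SINGLE, T_CNOT, T_MEAS = 7.1e-8, 2.8e-7, 3.5e-6
kappa_bar = 0.005
m_bar = math.floor(max_amplification(kappa_bar))
n_single, n_cnot = 2_000, 1_000      # hypothetical counts, not Table 2's
print(m_bar, circuit_time(m_bar, n_single, n_cnot, T_SINGLE, T_CNOT, T_MEAS))
e_s = single_gate_error(kappa_bar, n_single, n_cnot)
print(e_s, 10 * e_s)
```

Bisection is used only because the product form has no closed-form inverse once two gate types with different errors are involved; any monotone root finder would do.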

A detailed procedure for computing the above quantities, especially the gate errors and the total execution time, is given in Appendix B.

As an example, consider a multiple integral over five variables whose integrand couples pairs of variables, e.g., \(\int (x_1x_2 + x_2x_3 + x_3x_4 + x_4x_5)d\mathbf {x}\). We choose this example because multiple integrals over five or more variables are usually computed with Monte Carlo methods rather than grid-based numerical integration. We now follow the above four steps. (Step 1) Suppose that we are required to achieve the target estimation error \(\epsilon = 0.001\) with Heisenberg-scaling query calls. (Step 2) Fig. 2 then tells us that we need \(\bar{\kappa } = 0.005\) when \(N_k=100\) for all k; the circuit requires 99 qubits, 48 of which are ancillas used for the gate decomposition. We also have \(\bar{m}=99\). (Step 3) For some quantum devices, the operation times of single-qubit gates and CNOT gates are publicly available; for ibmq_valencia, they are \(7.1 \times 10^{-8}\) s and \(2.8 \times 10^{-7}\) s, respectively, and the measurement time is \(3.5 \times 10^{-6}\) s. We further assume that the interval between a measurement and the next execution of the algorithm is 10 times the execution time of the algorithm with length \(\bar{m}\). We then find that the execution time of the algorithm with length \(\bar{m}\) is 0.54 s and that the total computation time of the entire algorithm is about 1082 s; see Table 2 in Appendix B for the detailed calculation. (Step 4) We can also assess the gate error required for the algorithm to follow Heisenberg scaling up to the given estimation error. Assuming that the CNOT error \(\epsilon _\mathrm{d}\) is 10 times larger than the single-gate error \(\epsilon _\mathrm{s}\), we obtain \(\epsilon _\mathrm{d}=2.8 \times 10^{-7}\) and \(\epsilon _\mathrm{s}=2.8 \times 10^{-8}\). For comparison, the corresponding values for ibmq_valencia are \(\epsilon _\mathrm{d}\sim 1.0 \times 10^{-2}\) and \(\epsilon _\mathrm{s}\sim 1.0\times 10^{-3}\).

An important outcome is that this procedure quantifies the gap between the gate errors of currently available devices and those of a near-ideal machine that realizes Heisenberg scaling up to the given estimation error. It also clarifies the execution time (0.54 s), which may be much longer than the coherence time of currently available devices. Although these gaps are large, they are informative, because they tell us how much improvement in hardware is needed to close them. Algorithmic improvements would also help; for example, recent research [29] suggested a method for reducing the circuit depth and the number of qubits.

Finally, we note that the estimated noise level \(\kappa \) is comparable with the value calculated from the publicly available gate errors of the IBM Quantum device. In our case, the 2-qubit Grover circuit contains 5 CNOT gates (the \(\mathcal {Q}\) part of Fig. 6), and the 3-qubit one contains 16 CNOT gates (the \(\mathcal {Q}\) part of Fig. 9); note that each controlled-\(R_y\) gate is composed of 2 CNOT gates. For the 2-qubit case, the CNOT gate error was 0.00565, so that \(\kappa \sim -\mathrm{ln}((1-0.00565)^5) \sim 0.0283\). For the 3-qubit case, we used two CNOT gates with error rates 0.008923 and 0.01119, each applied 8 times; hence \(\kappa \sim -\mathrm{ln}((1-0.008923)^8(1-0.01119)^8) \sim 0.162\). Taking into account other noise sources such as single-qubit gate errors, we expect these values to move closer to the estimated values \(\kappa = 0.067\) and \(\kappa = 0.331\).
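These back-of-the-envelope values can be reproduced directly. Like the estimate in the text, this minimal sketch counts only CNOT errors and neglects single-qubit gates and readout; the helper name is illustrative.

```python
import math


def kappa_from_gate_errors(errors_and_counts):
    """kappa ~ -ln prod_g (1 - e_g)^{n_g}, counting only the listed gates."""
    return -sum(n * math.log1p(-e) for e, n in errors_and_counts)


two_qubit = kappa_from_gate_errors([(0.00565, 5)])                      # 5 CNOTs
three_qubit = kappa_from_gate_errors([(0.008923, 8), (0.01119, 8)])     # 8 + 8 CNOTs
print(f"{two_qubit:.4f} {three_qubit:.4f}")
```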

4.2 Computational time complexity of maximum likelihood estimation

We have discussed how the ML method can reduce the number of queries relative to the classical approach under noise, and how to deal with anomalous target values. Here, assuming that anomalous targets are detectable, we show that the classical post-processing for the maximum likelihood estimation of the target value \(\hat{a}\) and the noise level \(\hat{\kappa }\) can be performed in \(O(\ln ^{5/2}(1/\epsilon ))\) time, which is much less than the cost of the quantum part.

Because the success probability under the depolarizing noise model has two parameters, the target value a and the noise level \(\kappa \), as shown in Sect. 2.2, a two-dimensional maximum likelihood estimation is needed. A straightforward random search or grid search requires \(O(1/\epsilon ^2)\) evaluations of the likelihood function to achieve the error \(\epsilon \) [30]. This completely ruins the advantage of the quantum method, because the total time complexity becomes \(O(1/\epsilon ^2)\), which is no better than that of the classical Monte Carlo approach.

To avoid this computational bottleneck, in the experiment of Sect. 3 we employed an adaptive grid search with a constant number of grid points at each stage k, \(0 \le k \le M\). Namely, at the k-th stage, we performed the grid search with a constant number of divisions only within a confidence interval whose half-width is \(C_\epsilon \) times the error estimated from the Fisher information at the \((k-1)\)-th stage. By the Chebyshev inequality, the parameter lies within this interval at each stage with probability at least \(1 - 1/C_\epsilon ^2\). To obtain an estimate within error \(\epsilon \), the number of stages M is \(O(\ln (1/\epsilon ))\). Thus, the total probability of obtaining a good parameter estimate is at least \((1-1/C_\epsilon ^2)^M\), which is \(\varOmega (1)\) when \(C_\epsilon = \varTheta (\ln ^{1/2}(1/\epsilon ))\). The total time complexity of the maximum likelihood computation is thus \(C_\epsilon \cdot M \cdot O(\ln (1/\epsilon ))\), where the last factor is the cost of evaluating the likelihood function, which contains \(M+1 = O(\ln (1/\epsilon ))\) terms. This gives the bound \(O(\ln ^{5/2}(1/\epsilon ))\) on the time complexity of the classical post-processing.
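The adaptive search can be sketched as follows. This is a minimal illustration, not the code used in the experiment: it assumes the model (26) with a binomial likelihood, and the grid size, \(C_\epsilon =3\), the upper limit on \(\kappa \), and the test for an invertible Fisher matrix are illustrative choices. The search box is kept at its full size while the Fisher matrix is singular (for instance when only \(m_0=0\) has been measured, where \(\kappa \) is not identifiable); afterwards each stage searches only \(C_\epsilon \) standard errors around the previous estimate.

```python
import numpy as np


def hit_probability(m, a, kappa):
    """Depolarizing model P(m; a, kappa) of Eq. (26)."""
    theta = np.arcsin(np.sqrt(a))
    return 0.5 - 0.5 * np.exp(-kappa * m) * np.cos(2.0 * (2.0 * m + 1.0) * theta)


def log_likelihood(A, K, ms, hits, shots):
    """Binomial log-likelihood evaluated on arrays of (a, kappa) values."""
    ll = np.zeros(np.broadcast(A, K).shape)
    for m, h, n in zip(ms, hits, shots):
        p = np.clip(hit_probability(m, A, K), 1e-12, 1.0 - 1e-12)
        ll += h * np.log(p) + (n - h) * np.log(1.0 - p)
    return ll


def fisher_a_kappa(a, kappa, ms, shots):
    """Fisher information matrix of (a, kappa) for the stages in `ms`."""
    a = min(max(a, 1e-6), 1.0 - 1e-6)           # keep d(theta)/da finite
    theta = np.arcsin(np.sqrt(a))
    m = np.asarray(ms, dtype=float)
    damp = np.exp(-kappa * m)
    phase = 2.0 * (2.0 * m + 1.0) * theta
    p = np.clip(0.5 - 0.5 * damp * np.cos(phase), 1e-9, 1.0 - 1e-9)
    g_a = (2.0 * m + 1.0) * damp * np.sin(phase) / np.sin(2.0 * theta)  # da/dtheta = sin(2 theta)
    g_k = 0.5 * m * damp * np.cos(phase)
    grads = np.stack([g_a, g_k])
    weight = np.asarray(shots, dtype=float) / (p * (1.0 - p))
    return (grads * weight) @ grads.T


def adaptive_grid_ml(ms, hits, shots, c_eps=3.0, n_grid=41, kappa_max=1.0):
    """Stage-wise grid search: stage k searches c_eps standard errors around stage k-1."""
    a_lo, a_hi, k_lo, k_hi = 0.0, 1.0, 0.0, kappa_max   # first stage: whole box
    for k in range(len(ms)):
        a_grid = np.linspace(a_lo, a_hi, n_grid)
        k_grid = np.linspace(k_lo, k_hi, n_grid)
        A, K = np.meshgrid(a_grid, k_grid, indexing="ij")
        ll = log_likelihood(A, K, ms[:k + 1], hits[:k + 1], shots[:k + 1])
        i, j = np.unravel_index(np.argmax(ll), ll.shape)
        a_hat, k_hat = float(a_grid[i]), float(k_grid[j])
        info = fisher_a_kappa(a_hat, k_hat, ms[:k + 1], shots[:k + 1])
        # Shrink the box only if the Fisher matrix is invertible.
        if abs(np.linalg.det(info)) > 1e-12 * np.trace(info) ** 2:
            sd = np.sqrt(np.diag(np.linalg.inv(info)))
            a_lo, a_hi = max(0.0, a_hat - c_eps * sd[0]), min(1.0, a_hat + c_eps * sd[0])
            k_lo, k_hi = max(0.0, k_hat - c_eps * sd[1]), min(kappa_max, k_hat + c_eps * sd[1])
    return a_hat, k_hat


ms = [0, 1, 2, 4, 8, 16, 32]                      # hypothetical EIS with M = 6
shots = [100] * len(ms)
rng = np.random.default_rng(7)
hits = [rng.binomial(n, hit_probability(m, 0.375, 0.067)) for m, n in zip(ms, shots)]
print(adaptive_grid_ml(ms, hits, shots))
```

Each stage costs \(n_\mathrm{grid}^2\) likelihood evaluations of \(k+1\) terms, so the work grows only with the number of stages rather than with \(1/\epsilon ^2\).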

We note that, except for the anomalous cases, the estimation error of the target value a is not significantly affected even when the estimation error of the noise level \(\kappa \) is large; see Appendix A for details. This may allow us to reduce the two-dimensional ML estimation to a one-dimensional one, in which case the estimates can be obtained at lower computational cost: first roughly estimate the noise level \(\kappa \) on a coarse grid, and then perform a one-dimensional ML estimation of the target value a with high accuracy. Furthermore, for a sufficiently large number of shots, the two-dimensional ML estimation can be approximated by weighted least-squares estimation, which can be solved much more efficiently in practice with dedicated algorithms such as the Levenberg–Marquardt algorithm [31, 32].

5 Conclusion

All quantum algorithms running on currently available quantum devices must take the effects of noise into account. Hence, to evaluate such algorithms appropriately, particularly those aimed at a possible quantum advantage, it is necessary to carefully model the noise, analyze its effects on the algorithm, and validate the model experimentally. This paper provides a demonstration of such an evaluation, yielding Fig. 2, a relationship between the target estimation error and the required noise threshold for the amplitude estimation problem, which is validated by experiments on quantum devices.

Finally, we again stress the importance of multi-parameter estimation in the quantum computing framework. In our case, the nuisance noise parameter must be estimated in addition to the target amplitude parameter. We have seen that the Fisher information matrix can be degenerate, in which case the target amplitude parameter cannot be efficiently estimated even by increasing the number of amplitude amplifications. This degeneracy arises only in multi-parameter estimation problems, and a similar challenge may well appear in other scenarios such as the phase estimation problem in a noisy environment. In this sense, while this paper employed the ML estimate, there are various options for designing an efficient estimator for such multi-parameter estimation problems on noisy quantum devices. For instance, Ref. [33] showed how to tune the likelihood function for Bayesian inference of the amplitude parameter and derived the run time needed to achieve a target accuracy in a noisy environment.